* [PATCH v4 00/22]  Shared virtual address IOMMU and VT-d support
From: Jacob Pan @ 2019-06-09 13:44 UTC
  To: iommu, LKML, Joerg Roedel, David Woodhouse, Eric Auger,
	Alex Williamson, Jean-Philippe Brucker
  Cc: Yi Liu, Tian, Kevin, Raj Ashok, Christoph Hellwig, Lu Baolu,
	Andriy Shevchenko, Jacob Pan

Shared virtual address (SVA), a.k.a. shared virtual memory (SVM) on Intel
platforms, allows address space sharing between device DMA and applications.
SVA can reduce programming complexity and enhance security.
This series is intended to enable SVA virtualization, i.e. sharing a guest
application address space with physical device DMA. Only the IOMMU portion
of the changes is included in this series. Additional support is needed in
VFIO and QEMU (to be submitted separately) to complete this functionality.

To make incremental changes and reduce the size of each patchset, this
series does not include support for page request services.

In the VT-d implementation, the PASID table is per device and maintained in
the host. The guest PASID table is shadowed in the VMM, where the virtual
IOMMU is emulated.

    .-------------.  .---------------------------.
    |   vIOMMU    |  | Guest process CR3, FL only|
    |             |  '---------------------------'
    .----------------/
    | PASID Entry |--- PASID cache flush -
    '-------------'                       |
    |             |                       V
    |             |                CR3 in GPA
    '-------------'
Guest
------| Shadow |--------------------------|--------
      v        v                          v
Host
    .-------------.  .----------------------.
    |   pIOMMU    |  | Bind FL for GVA-GPA  |
    |             |  '----------------------'
    .----------------/  |
    | PASID Entry |     V (Nested xlate)
    '----------------\.------------------------------.
    |             |   |SL for GPA-HPA, default domain|
    |             |   '------------------------------'
    '-------------'
Where:
 - FL = First level/stage one page tables
 - SL = Second level/stage two page tables


This work is based on collaboration with other developers on the IOMMU
mailing list. Notably,

[1] [PATCH v6 00/22] SMMUv3 Nested Stage Setup by Eric Auger
https://lkml.org/lkml/2019/3/17/124

[2] [RFC PATCH 2/6] drivers core: Add I/O ASID allocator by Jean-Philippe
Brucker
https://www.spinics.net/lists/iommu/msg30639.html

[3] [RFC PATCH 0/5] iommu: APIs for paravirtual PASID allocation by Lu Baolu
https://lkml.org/lkml/2018/11/12/1921

[4] [PATCH v5 00/23] IOMMU and VT-d driver support for Shared Virtual
    Address (SVA)
    https://lwn.net/Articles/754331/

There are roughly three parts:
1. Generic PASID allocator [1] with extension to support custom allocator
2. IOMMU cache invalidation passdown from guest to host
3. Guest PASID bind for nested translation

All generic IOMMU APIs are reused from [1], which has a v7 just published
with no real impact on the patches used here. It is worth noting that,
unlike the SMMU nested stage setup where the PASID table is owned by the
guest, the VT-d PASID table is owned by the host; individual PASIDs are
bound instead of the whole PASID table.
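
As a rough orientation, the expected passdown flow on the VMM side looks
like the sketch below. This is illustrative only; "domain" and "dev" are
placeholders obtained from VFIO setup, error handling is omitted, and the
bind call and invalidation format are defined by later patches in this
series.

    /* 1. Trap guest PASID table programming, then bind the individual
     *    guest PASID to the host (bind API added later in this series).
     */

    /* 2. Trap guest IOTLB/PASID cache flushes and pass them down. */
    struct iommu_cache_invalidate_info inv_info = {
        /* filled from the trapped guest invalidation, see patch 09/22 */
    };
    ret = iommu_cache_invalidate(domain, dev, &inv_info);

    /* 3. On teardown, unbind the guest PASID and free it. */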

This series is based on the new VT-d 3.0 Specification
(https://software.intel.com/sites/default/files/managed/c5/15/vt-directed-io-spec.pdf).
This is different from the older series in [4], which was based on the older
specification that does not have scalable mode.


ChangeLog:
	- V4
	  - Redesigned the IOASID allocator such that it can support custom
	  allocators with shared helper functions. Use a separate XArray
	  to store IOASIDs per allocator. Took advice from Eric Auger to
	  have the default allocator use the generic allocator structure.
	  Combined into one patch since the default allocator is just
	  "another" allocator now. Can be built as a module in case of
	  driver use without IOMMU.
	  - Extended bind guest PASID data to support SMMU and non-identity
	  guest to host PASID mapping https://lkml.org/lkml/2019/5/21/802
	  - Rebased on Jean's sva/api common tree; new patches start with
	   [PATCH v4 10/22]

	- V3
	  - Addressed thorough review comments from Eric Auger (Thank you!)
	  - Moved IOASID allocator from driver core to IOMMU code per
	    suggestion by Christoph Hellwig
	    (https://lkml.org/lkml/2019/4/26/462)
	  - Rebased on top of Jean's SVA API branch and Eric's v7[1]
	    (git://linux-arm.org/linux-jpb.git sva/api)
	  - All IOMMU APIs are unmodified (except the new bind guest PASID
	    call in patch 9/16)

	- V2
	  - Rebased on Joerg's IOMMU x86/vt-d branch v5.1-rc4
	  - Integrated with Eric Auger's new v7 series for common APIs
	  (https://github.com/eauger/linux/tree/v5.1-rc3-2stage-v7)
	  - Addressed review comments from Andy Shevchenko and Alex Williamson on
	    IOASID custom allocator.
	  - Support multiple custom IOASID allocators (vIOMMUs) and dynamic
	    registration.


Jacob Pan (17):
  driver core: Add per device iommu param
  iommu: Introduce device fault data
  iommu: Introduce device fault report API
  iommu: Add a timeout parameter for PRQ response
  iommu: Use device fault trace event
  iommu: Introduce attach/detach_pasid_table API
  iommu: Fix compile error without IOMMU_API
  iommu: Introduce guest PASID bind function
  iommu/vt-d: Add custom allocator for IOASID
  iommu/vt-d: Replace Intel specific PASID allocator with IOASID
  iommu/vt-d: Move domain helper to header
  iommu/vt-d: Avoid duplicated code for PASID setup
  iommu/vt-d: Add nested translation helper function
  iommu/vt-d: Clean up for SVM device list
  iommu/vt-d: Add bind guest PASID support
  iommu/vt-d: Support flushing more translation cache types
  iommu/vt-d: Add svm/sva invalidate function

Jean-Philippe Brucker (3):
  iommu: Add recoverable fault reporting
  trace/iommu: Add sva trace events
  iommu: Add I/O ASID allocator

Liu Yi L (1):
  iommu: Introduce cache_invalidate API

Lu Baolu (1):
  iommu/vt-d: Enlightened PASID allocation

 Documentation/admin-guide/kernel-parameters.txt |   8 +
 drivers/iommu/Kconfig                           |   9 +
 drivers/iommu/Makefile                          |   1 +
 drivers/iommu/dmar.c                            |  50 +++
 drivers/iommu/intel-iommu.c                     | 251 +++++++++++++-
 drivers/iommu/intel-pasid.c                     | 223 ++++++++++---
 drivers/iommu/intel-pasid.h                     |  24 +-
 drivers/iommu/intel-svm.c                       | 301 ++++++++++++++---
 drivers/iommu/ioasid.c                          | 427 ++++++++++++++++++++++++
 drivers/iommu/iommu.c                           | 282 +++++++++++++++-
 include/linux/device.h                          |   3 +
 include/linux/intel-iommu.h                     |  44 ++-
 include/linux/intel-svm.h                       |  17 +
 include/linux/ioasid.h                          |  74 ++++
 include/linux/iommu.h                           | 178 +++++++++-
 include/trace/events/iommu.h                    |  87 +++++
 include/uapi/linux/iommu.h                      | 338 +++++++++++++++++++
 17 files changed, 2190 insertions(+), 127 deletions(-)
 create mode 100644 drivers/iommu/ioasid.c
 create mode 100644 include/linux/ioasid.h
 create mode 100644 include/uapi/linux/iommu.h

-- 
2.7.4



* [PATCH v4 01/22] driver core: Add per device iommu param
From: Jacob Pan @ 2019-06-09 13:44 UTC
  To: iommu, LKML, Joerg Roedel, David Woodhouse, Eric Auger,
	Alex Williamson, Jean-Philippe Brucker
  Cc: Yi Liu, Tian, Kevin, Raj Ashok, Christoph Hellwig, Lu Baolu,
	Andriy Shevchenko, Jacob Pan

DMA faults can be detected by the IOMMU at the device level. Adding a
pointer to struct device allows the IOMMU subsystem to report relevant
faults back to the device driver for further handling.
For directly assigned devices (or user space drivers), the guest OS is
responsible for handling and responding to per-device IOMMU faults.
Therefore we need a fault reporting mechanism to propagate faults beyond
the IOMMU subsystem.

There are two other IOMMU data pointers under struct device today; here
we introduce iommu_param as a parent pointer such that all per-device
IOMMU data can be consolidated there. The idea was suggested by Greg KH
and Joerg. The name iommu_param was chosen since iommu_data is already
in use.
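
As a hypothetical illustration (the struct itself is only defined in the
following patches), consumers in the IOMMU core would then reach all
per-device IOMMU data through the single new pointer:

    /* sketch only; fault_param is introduced later in this series */
    struct iommu_param *param = dev->iommu_param;

    if (param && param->fault_param)
        ...	/* device has fault reporting data attached */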

Suggested-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Link: https://lkml.org/lkml/2017/10/6/81
---
 include/linux/device.h | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/include/linux/device.h b/include/linux/device.h
index e85264f..f0a975a 100644
--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -42,6 +42,7 @@ struct iommu_ops;
 struct iommu_group;
 struct iommu_fwspec;
 struct dev_pin_info;
+struct iommu_param;
 
 struct bus_attribute {
 	struct attribute	attr;
@@ -959,6 +960,7 @@ struct dev_links_info {
  * 		device (i.e. the bus driver that discovered the device).
  * @iommu_group: IOMMU group the device belongs to.
  * @iommu_fwspec: IOMMU-specific properties supplied by firmware.
+ * @iommu_param: Per device generic IOMMU runtime data
  *
  * @offline_disabled: If set, the device is permanently online.
  * @offline:	Set after successful invocation of bus type's .offline().
@@ -1052,6 +1054,7 @@ struct device {
 	void	(*release)(struct device *dev);
 	struct iommu_group	*iommu_group;
 	struct iommu_fwspec	*iommu_fwspec;
+	struct iommu_param	*iommu_param;
 
 	bool			offline_disabled:1;
 	bool			offline:1;
-- 
2.7.4



* [PATCH v4 02/22] iommu: Introduce device fault data
From: Jacob Pan @ 2019-06-09 13:44 UTC
  To: iommu, LKML, Joerg Roedel, David Woodhouse, Eric Auger,
	Alex Williamson, Jean-Philippe Brucker
  Cc: Yi Liu, Tian, Kevin, Raj Ashok, Christoph Hellwig, Lu Baolu,
	Andriy Shevchenko, Jacob Pan, Liu, Yi L

Device faults detected by IOMMU can be reported outside the IOMMU
subsystem for further processing. This patch introduces
a generic device fault data structure.

The fault can be either an unrecoverable fault or a page request,
also referred to as a recoverable fault.

We only care about non-internal faults that are likely to be reported
to an external subsystem.
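
As a minimal sketch, an IOMMU driver reporting a recoverable fault might
populate the structure as below (the field values are made up; the type
and flag names are defined in this patch):

    struct iommu_fault_event evt = {
        .fault = {
            .type = IOMMU_FAULT_PAGE_REQ,
            .prm = {
                .flags = IOMMU_FAULT_PAGE_REQUEST_PASID_VALID |
                         IOMMU_FAULT_PAGE_REQUEST_LAST_PAGE,
                .pasid = 1,
                .grpid = 1,
                .perm  = IOMMU_FAULT_PERM_READ,
                .addr  = 0x7ff000,
            },
        },
    };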

Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
Signed-off-by: Liu, Yi L <yi.l.liu@linux.intel.com>
Signed-off-by: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Eric Auger <eric.auger@redhat.com>
---
 include/linux/iommu.h      |  44 +++++++++++++++++
 include/uapi/linux/iommu.h | 118 +++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 162 insertions(+)
 create mode 100644 include/uapi/linux/iommu.h

diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index a815cf6..7890a92 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -25,6 +25,7 @@
 #include <linux/errno.h>
 #include <linux/err.h>
 #include <linux/of.h>
+#include <uapi/linux/iommu.h>
 
 #define IOMMU_READ	(1 << 0)
 #define IOMMU_WRITE	(1 << 1)
@@ -49,6 +50,7 @@ struct device;
 struct iommu_domain;
 struct notifier_block;
 struct iommu_sva;
+struct iommu_fault_event;
 
 /* iommu fault flags */
 #define IOMMU_FAULT_READ	0x0
@@ -58,6 +60,7 @@ typedef int (*iommu_fault_handler_t)(struct iommu_domain *,
 			struct device *, unsigned long, int, void *);
 typedef int (*iommu_mm_exit_handler_t)(struct device *dev, struct iommu_sva *,
 				       void *);
+typedef int (*iommu_dev_fault_handler_t)(struct iommu_fault_event *, void *);
 
 struct iommu_domain_geometry {
 	dma_addr_t aperture_start; /* First address that can be mapped    */
@@ -301,6 +304,46 @@ struct iommu_device {
 	struct device *dev;
 };
 
+/**
+ * struct iommu_fault_event - Generic fault event
+ *
+ * Can represent recoverable faults such as page requests or
+ * unrecoverable faults such as DMA or IRQ remapping faults.
+ *
+ * @fault: fault descriptor
+ * @iommu_private: used by the IOMMU driver for storing fault-specific
+ *                 data. Users should not modify this field before
+ *                 sending the fault response.
+ */
+struct iommu_fault_event {
+	struct iommu_fault fault;
+	u64 iommu_private;
+};
+
+/**
+ * struct iommu_fault_param - per-device IOMMU fault data
+ * @dev_fault_handler: Callback function to handle IOMMU faults at device level
+ * @data: handler private data
+ *
+ */
+struct iommu_fault_param {
+	iommu_dev_fault_handler_t handler;
+	void *data;
+};
+
+/**
+ * struct iommu_param - collection of per-device IOMMU data
+ *
+ * @fault_param: IOMMU detected device fault reporting data
+ *
+ * TODO: migrate other per device data pointers under iommu_dev_data, e.g.
+ *	struct iommu_group	*iommu_group;
+ *	struct iommu_fwspec	*iommu_fwspec;
+ */
+struct iommu_param {
+	struct iommu_fault_param *fault_param;
+};
+
 int  iommu_device_register(struct iommu_device *iommu);
 void iommu_device_unregister(struct iommu_device *iommu);
 int  iommu_device_sysfs_add(struct iommu_device *iommu,
@@ -504,6 +547,7 @@ struct iommu_ops {};
 struct iommu_group {};
 struct iommu_fwspec {};
 struct iommu_device {};
+struct iommu_fault_param {};
 
 static inline bool iommu_present(struct bus_type *bus)
 {
diff --git a/include/uapi/linux/iommu.h b/include/uapi/linux/iommu.h
new file mode 100644
index 0000000..aaa3b6a
--- /dev/null
+++ b/include/uapi/linux/iommu.h
@@ -0,0 +1,118 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+/*
+ * IOMMU user API definitions
+ */
+
+#ifndef _UAPI_IOMMU_H
+#define _UAPI_IOMMU_H
+
+#include <linux/types.h>
+
+#define IOMMU_FAULT_PERM_READ	(1 << 0) /* read */
+#define IOMMU_FAULT_PERM_WRITE	(1 << 1) /* write */
+#define IOMMU_FAULT_PERM_EXEC	(1 << 2) /* exec */
+#define IOMMU_FAULT_PERM_PRIV	(1 << 3) /* privileged */
+
+/* Generic fault types, can be expanded, e.g. for IRQ remapping faults */
+enum iommu_fault_type {
+	IOMMU_FAULT_DMA_UNRECOV = 1,	/* unrecoverable fault */
+	IOMMU_FAULT_PAGE_REQ,		/* page request fault */
+};
+
+enum iommu_fault_reason {
+	IOMMU_FAULT_REASON_UNKNOWN = 0,
+
+	/* Could not access the PASID table (fetch caused external abort) */
+	IOMMU_FAULT_REASON_PASID_FETCH,
+
+	/* PASID entry is invalid or has configuration errors */
+	IOMMU_FAULT_REASON_BAD_PASID_ENTRY,
+
+	/*
+	 * PASID is out of range (e.g. exceeds the maximum PASID
+	 * supported by the IOMMU) or disabled.
+	 */
+	IOMMU_FAULT_REASON_PASID_INVALID,
+
+	/*
+	 * An external abort occurred fetching (or updating) a translation
+	 * table descriptor
+	 */
+	IOMMU_FAULT_REASON_WALK_EABT,
+
+	/*
+	 * Could not access the page table entry (Bad address),
+	 * actual translation fault
+	 */
+	IOMMU_FAULT_REASON_PTE_FETCH,
+
+	/* Protection flag check failed */
+	IOMMU_FAULT_REASON_PERMISSION,
+
+	/* access flag check failed */
+	IOMMU_FAULT_REASON_ACCESS,
+
+	/* Output address of a translation stage caused Address Size fault */
+	IOMMU_FAULT_REASON_OOR_ADDRESS,
+};
+
+/**
+ * struct iommu_fault_unrecoverable - Unrecoverable fault data
+ * @reason: reason of the fault, from &enum iommu_fault_reason
+ * @flags: parameters of this fault (IOMMU_FAULT_UNRECOV_* values)
+ * @pasid: Process Address Space ID
+ * @perm: requested permission access used by the incoming transaction
+ *        (IOMMU_FAULT_PERM_* values)
+ * @addr: offending page address
+ * @fetch_addr: address that caused a fetch abort, if any
+ */
+struct iommu_fault_unrecoverable {
+	__u32	reason;
+#define IOMMU_FAULT_UNRECOV_PASID_VALID		(1 << 0)
+#define IOMMU_FAULT_UNRECOV_ADDR_VALID		(1 << 1)
+#define IOMMU_FAULT_UNRECOV_FETCH_ADDR_VALID	(1 << 2)
+	__u32	flags;
+	__u32	pasid;
+	__u32	perm;
+	__u64	addr;
+	__u64	fetch_addr;
+};
+
+/**
+ * struct iommu_fault_page_request - Page Request data
+ * @flags: encodes whether the corresponding fields are valid and whether this
+ *         is the last page in group (IOMMU_FAULT_PAGE_REQUEST_* values)
+ * @pasid: Process Address Space ID
+ * @grpid: Page Request Group Index
+ * @perm: requested page permissions (IOMMU_FAULT_PERM_* values)
+ * @addr: page address
+ * @private_data: device-specific private information
+ */
+struct iommu_fault_page_request {
+#define IOMMU_FAULT_PAGE_REQUEST_PASID_VALID	(1 << 0)
+#define IOMMU_FAULT_PAGE_REQUEST_LAST_PAGE	(1 << 1)
+#define IOMMU_FAULT_PAGE_REQUEST_PRIV_DATA	(1 << 2)
+	__u32	flags;
+	__u32	pasid;
+	__u32	grpid;
+	__u32	perm;
+	__u64	addr;
+	__u64	private_data[2];
+};
+
+/**
+ * struct iommu_fault - Generic fault data
+ * @type: fault type from &enum iommu_fault_type
+ * @padding: reserved for future use (should be zero)
+ * @event: Fault event, when @type is %IOMMU_FAULT_DMA_UNRECOV
+ * @prm: Page Request message, when @type is %IOMMU_FAULT_PAGE_REQ
+ */
+struct iommu_fault {
+	__u32	type;
+	__u32	padding;
+	union {
+		struct iommu_fault_unrecoverable event;
+		struct iommu_fault_page_request prm;
+	};
+};
+#endif /* _UAPI_IOMMU_H */
-- 
2.7.4



* [PATCH v4 03/22] iommu: Introduce device fault report API
From: Jacob Pan @ 2019-06-09 13:44 UTC
  To: iommu, LKML, Joerg Roedel, David Woodhouse, Eric Auger,
	Alex Williamson, Jean-Philippe Brucker
  Cc: Yi Liu, Tian, Kevin, Raj Ashok, Christoph Hellwig, Lu Baolu,
	Andriy Shevchenko, Jacob Pan

Traditionally, device specific faults are detected and handled within
their own device drivers. When an IOMMU is enabled, faults on DMA
transactions are detected by the IOMMU. There is no generic mechanism to
report these faults back to the in-kernel device driver or, in the case
of assigned devices, the guest OS.

This patch introduces a registration API for device specific fault
handlers. This differs from the existing iommu_set_fault_handler/
report_iommu_fault infrastructure in several ways:
- it allows reporting more sophisticated fault events (both
  unrecoverable faults and page request faults) due to the nature
  of the iommu_fault struct
- it is device specific rather than domain specific.

The current iommu_report_device_fault() implementation only handles
the "shoot and forget" unrecoverable fault case. Handling of page
request faults or stalled faults will come later.
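
A minimal usage sketch from a hypothetical consumer driver (the handler
signature comes from the previous patch; the mydrv_* names are made up):

    static int mydrv_fault_handler(struct iommu_fault_event *evt, void *data)
    {
        struct mydrv_ctx *ctx = data;	/* hypothetical driver context */

        /* inspect evt->fault.type and evt->fault.event/.prm here */
        return 0;
    }

    ret = iommu_register_device_fault_handler(dev, mydrv_fault_handler, ctx);
    ...
    iommu_unregister_device_fault_handler(dev);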

Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Signed-off-by: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
Signed-off-by: Eric Auger <eric.auger@redhat.com>
---
 drivers/iommu/iommu.c | 127 +++++++++++++++++++++++++++++++++++++++++++++++++-
 include/linux/iommu.h |  33 ++++++++++++-
 2 files changed, 157 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 67ee662..7955184 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -644,6 +644,13 @@ int iommu_group_add_device(struct iommu_group *group, struct device *dev)
 		goto err_free_name;
 	}
 
+	dev->iommu_param = kzalloc(sizeof(*dev->iommu_param), GFP_KERNEL);
+	if (!dev->iommu_param) {
+		ret = -ENOMEM;
+		goto err_free_name;
+	}
+	mutex_init(&dev->iommu_param->lock);
+
 	kobject_get(group->devices_kobj);
 
 	dev->iommu_group = group;
@@ -674,6 +681,7 @@ int iommu_group_add_device(struct iommu_group *group, struct device *dev)
 	mutex_unlock(&group->mutex);
 	dev->iommu_group = NULL;
 	kobject_put(group->devices_kobj);
+	kfree(dev->iommu_param);
 err_free_name:
 	kfree(device->name);
 err_remove_link:
@@ -720,7 +728,7 @@ void iommu_group_remove_device(struct device *dev)
 	sysfs_remove_link(&dev->kobj, "iommu_group");
 
 	trace_remove_device_from_group(group->id, dev);
-
+	kfree(dev->iommu_param);
 	kfree(device->name);
 	kfree(device);
 	dev->iommu_group = NULL;
@@ -855,6 +863,123 @@ int iommu_group_unregister_notifier(struct iommu_group *group,
 EXPORT_SYMBOL_GPL(iommu_group_unregister_notifier);
 
 /**
+ * iommu_register_device_fault_handler() - Register a device fault handler
+ * @dev: the device
+ * @handler: the fault handler
+ * @data: private data passed as argument to the handler
+ *
+ * When an IOMMU fault event is received, this handler gets called with the
+ * fault event and data as argument.
+ *
+ * Return 0 if the fault handler was installed successfully, or an error.
+ */
+int iommu_register_device_fault_handler(struct device *dev,
+					iommu_dev_fault_handler_t handler,
+					void *data)
+{
+	struct iommu_param *param = dev->iommu_param;
+	int ret = 0;
+
+	/*
+	 * Device iommu_param should have been allocated when device is
+	 * added to its iommu_group.
+	 */
+	if (!param)
+		return -EINVAL;
+
+	mutex_lock(&param->lock);
+	/* Only allow one fault handler registered for each device */
+	if (param->fault_param) {
+		ret = -EBUSY;
+		goto done_unlock;
+	}
+
+	get_device(dev);
+	param->fault_param =
+		kzalloc(sizeof(struct iommu_fault_param), GFP_KERNEL);
+	if (!param->fault_param) {
+		put_device(dev);
+		ret = -ENOMEM;
+		goto done_unlock;
+	}
+	param->fault_param->handler = handler;
+	param->fault_param->data = data;
+
+done_unlock:
+	mutex_unlock(&param->lock);
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(iommu_register_device_fault_handler);
+
+/**
+ * iommu_unregister_device_fault_handler() - Unregister the device fault handler
+ * @dev: the device
+ *
+ * Remove the device fault handler installed with
+ * iommu_register_device_fault_handler().
+ *
+ * Return 0 on success, or an error.
+ */
+int iommu_unregister_device_fault_handler(struct device *dev)
+{
+	struct iommu_param *param = dev->iommu_param;
+	int ret = 0;
+
+	if (!param)
+		return -EINVAL;
+
+	mutex_lock(&param->lock);
+
+	if (!param->fault_param)
+		goto unlock;
+
+	kfree(param->fault_param);
+	param->fault_param = NULL;
+	put_device(dev);
+unlock:
+	mutex_unlock(&param->lock);
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(iommu_unregister_device_fault_handler);
+
+
+/**
+ * iommu_report_device_fault() - Report fault event to device
+ * @dev: the device
+ * @evt: fault event data
+ *
+ * Called by IOMMU drivers when a fault is detected, typically in a threaded IRQ
+ * handler.
+ *
+ * Return 0 on success, or an error.
+ */
+int iommu_report_device_fault(struct device *dev, struct iommu_fault_event *evt)
+{
+	struct iommu_param *param = dev->iommu_param;
+	struct iommu_fault_param *fparam;
+	int ret = 0;
+
+	/* iommu_param is allocated when device is added to group */
+	if (!param || !evt)
+		return -EINVAL;
+
+	/* we only report device fault if there is a handler registered */
+	mutex_lock(&param->lock);
+	fparam = param->fault_param;
+	if (!fparam || !fparam->handler) {
+		ret = -EINVAL;
+		goto done_unlock;
+	}
+	ret = fparam->handler(evt, fparam->data);
+done_unlock:
+	mutex_unlock(&param->lock);
+	return ret;
+}
+EXPORT_SYMBOL_GPL(iommu_report_device_fault);
+
+/**
  * iommu_group_id - Return ID for a group
  * @group: the group to ID
  *
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 7890a92..b87b74c 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -322,9 +322,9 @@ struct iommu_fault_event {
 
 /**
  * struct iommu_fault_param - per-device IOMMU fault data
- * @dev_fault_handler: Callback function to handle IOMMU faults at device level
- * @data: handler private data
  *
+ * @handler: Callback function to handle IOMMU faults at device level
+ * @data: handler private data
  */
 struct iommu_fault_param {
 	iommu_dev_fault_handler_t handler;
@@ -341,6 +341,7 @@ struct iommu_fault_param {
  *	struct iommu_fwspec	*iommu_fwspec;
  */
 struct iommu_param {
+	struct mutex lock;
 	struct iommu_fault_param *fault_param;
 };
 
@@ -433,6 +434,15 @@ extern int iommu_group_register_notifier(struct iommu_group *group,
 					 struct notifier_block *nb);
 extern int iommu_group_unregister_notifier(struct iommu_group *group,
 					   struct notifier_block *nb);
+extern int iommu_register_device_fault_handler(struct device *dev,
+					iommu_dev_fault_handler_t handler,
+					void *data);
+
+extern int iommu_unregister_device_fault_handler(struct device *dev);
+
+extern int iommu_report_device_fault(struct device *dev,
+				     struct iommu_fault_event *evt);
+
 extern int iommu_group_id(struct iommu_group *group);
 extern struct iommu_group *iommu_group_get_for_dev(struct device *dev);
 extern struct iommu_domain *iommu_group_default_domain(struct iommu_group *);
@@ -741,6 +751,25 @@ static inline int iommu_group_unregister_notifier(struct iommu_group *group,
 	return 0;
 }
 
+static inline
+int iommu_register_device_fault_handler(struct device *dev,
+					iommu_dev_fault_handler_t handler,
+					void *data)
+{
+	return -ENODEV;
+}
+
+static inline int iommu_unregister_device_fault_handler(struct device *dev)
+{
+	return 0;
+}
+
+static inline
+int iommu_report_device_fault(struct device *dev, struct iommu_fault_event *evt)
+{
+	return -ENODEV;
+}
+
 static inline int iommu_group_id(struct iommu_group *group)
 {
 	return -ENODEV;
-- 
2.7.4



* [PATCH v4 04/22] iommu: Add recoverable fault reporting
From: Jacob Pan @ 2019-06-09 13:44 UTC
  To: iommu, LKML, Joerg Roedel, David Woodhouse, Eric Auger,
	Alex Williamson, Jean-Philippe Brucker
  Cc: Yi Liu, Tian, Kevin, Raj Ashok, Christoph Hellwig, Lu Baolu,
	Andriy Shevchenko, Jacob Pan

From: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>

Some IOMMU hardware features, for example PCI's PRI and Arm SMMU's Stall,
enable recoverable I/O page faults. Allow IOMMU drivers to report PRI Page
Requests and Stall events through the new fault reporting API. The
consumer of the fault can be either an I/O page fault handler in the host,
or a guest OS.

Once handled, the fault must be completed by sending a page response back
to the IOMMU. Add an iommu_page_response() function to complete a page
fault.
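
A sketch of a consumer completing a page request from its registered
fault handler (illustrative only; a real handler would first service the
fault, e.g. populate the page tables, before responding):

    int mydrv_handle_prq(struct device *dev, struct iommu_fault *fault)
    {
        struct page_response_msg msg = {
            .addr          = fault->prm.addr,
            .pasid         = fault->prm.pasid,
            .pasid_present = !!(fault->prm.flags &
                                IOMMU_FAULT_PAGE_REQUEST_PASID_VALID),
            .grpid         = fault->prm.grpid,
            .resp_code     = IOMMU_PAGE_RESP_SUCCESS,
        };

        /* msg.iommu_data is filled in by iommu_page_response() from the
         * matching pending event before it reaches the IOMMU driver.
         */
        return iommu_page_response(dev, &msg);
    }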

Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
---
 drivers/iommu/iommu.c | 77 ++++++++++++++++++++++++++++++++++++++++++++++++++-
 include/linux/iommu.h | 51 ++++++++++++++++++++++++++++++++++
 2 files changed, 127 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 7955184..13b301c 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -869,7 +869,14 @@ EXPORT_SYMBOL_GPL(iommu_group_unregister_notifier);
  * @data: private data passed as argument to the handler
  *
  * When an IOMMU fault event is received, this handler gets called with the
- * fault event and data as argument.
+ * fault event and data as argument. The handler should return 0 on success. If
+ * the fault is recoverable (IOMMU_FAULT_PAGE_REQ), the handler should also
+ * complete the fault by calling iommu_page_response() with one of the following
+ * response code:
+ * - IOMMU_PAGE_RESP_SUCCESS: retry the translation
+ * - IOMMU_PAGE_RESP_INVALID: terminate the fault
+ * - IOMMU_PAGE_RESP_FAILURE: terminate the fault and stop reporting
+ *   page faults if possible.
  *
  * Return 0 if the fault handler was installed successfully, or an error.
  */
@@ -904,6 +911,8 @@ int iommu_register_device_fault_handler(struct device *dev,
 	}
 	param->fault_param->handler = handler;
 	param->fault_param->data = data;
+	mutex_init(&param->fault_param->lock);
+	INIT_LIST_HEAD(&param->fault_param->faults);
 
 done_unlock:
 	mutex_unlock(&param->lock);
@@ -934,6 +943,12 @@ int iommu_unregister_device_fault_handler(struct device *dev)
 	if (!param->fault_param)
 		goto unlock;
 
+	/* we cannot unregister handler if there are pending faults */
+	if (!list_empty(&param->fault_param->faults)) {
+		ret = -EBUSY;
+		goto unlock;
+	}
+
 	kfree(param->fault_param);
 	param->fault_param = NULL;
 	put_device(dev);
@@ -958,6 +973,7 @@ EXPORT_SYMBOL_GPL(iommu_unregister_device_fault_handler);
 int iommu_report_device_fault(struct device *dev, struct iommu_fault_event *evt)
 {
 	struct iommu_param *param = dev->iommu_param;
+	struct iommu_fault_event *evt_pending;
 	struct iommu_fault_param *fparam;
 	int ret = 0;
 
@@ -972,6 +988,20 @@ int iommu_report_device_fault(struct device *dev, struct iommu_fault_event *evt)
 		ret = -EINVAL;
 		goto done_unlock;
 	}
+
+	if (evt->fault.type == IOMMU_FAULT_PAGE_REQ &&
+	    (evt->fault.prm.flags & IOMMU_FAULT_PAGE_REQUEST_LAST_PAGE)) {
+		evt_pending = kmemdup(evt, sizeof(struct iommu_fault_event),
+				      GFP_KERNEL);
+		if (!evt_pending) {
+			ret = -ENOMEM;
+			goto done_unlock;
+		}
+		mutex_lock(&fparam->lock);
+		list_add_tail(&evt_pending->list, &fparam->faults);
+		mutex_unlock(&fparam->lock);
+	}
+
 	ret = fparam->handler(evt, fparam->data);
 done_unlock:
 	mutex_unlock(&param->lock);
@@ -1513,6 +1543,51 @@ int iommu_attach_device(struct iommu_domain *domain, struct device *dev)
 }
 EXPORT_SYMBOL_GPL(iommu_attach_device);
 
+int iommu_page_response(struct device *dev,
+			struct page_response_msg *msg)
+{
+	struct iommu_param *param = dev->iommu_param;
+	int ret = -EINVAL;
+	struct iommu_fault_event *evt;
+	struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
+
+	if (!domain || !domain->ops->page_response)
+		return -ENODEV;
+
+	/*
+	 * Device iommu_param should have been allocated when device is
+	 * added to its iommu_group.
+	 */
+	if (!param || !param->fault_param)
+		return -EINVAL;
+
+	/* Only send response if there is a fault report pending */
+	mutex_lock(&param->fault_param->lock);
+	if (list_empty(&param->fault_param->faults)) {
+		pr_warn("no pending PRQ, drop response\n");
+		goto done_unlock;
+	}
+	/*
+	 * Check if we have a matching page request pending to respond,
+	 * otherwise return -EINVAL
+	 */
+	list_for_each_entry(evt, &param->fault_param->faults, list) {
+		if (evt->fault.prm.pasid == msg->pasid &&
+		    evt->fault.prm.grpid == msg->grpid) {
+			msg->iommu_data = evt->iommu_private;
+			ret = domain->ops->page_response(dev, msg);
+			list_del(&evt->list);
+			kfree(evt);
+			break;
+		}
+	}
+
+done_unlock:
+	mutex_unlock(&param->fault_param->lock);
+	return ret;
+}
+EXPORT_SYMBOL_GPL(iommu_page_response);
+
 static void __iommu_detach_device(struct iommu_domain *domain,
 				  struct device *dev)
 {
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index b87b74c..950347b 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -192,6 +192,42 @@ struct iommu_sva_ops {
 #ifdef CONFIG_IOMMU_API
 
 /**
+ * enum page_response_code - Return status of fault handlers, telling the IOMMU
+ * driver how to proceed with the fault.
+ *
+ * @IOMMU_PAGE_RESP_SUCCESS: Fault has been handled and the page tables
+ *	populated, retry the access. This is "Success" in PCI PRI.
+ * @IOMMU_PAGE_RESP_FAILURE: General error. Drop all subsequent faults from
+ *	this device if possible. This is "Response Failure" in PCI PRI.
+ * @IOMMU_PAGE_RESP_INVALID: Could not handle this fault, don't retry the
+ *	access. This is "Invalid Request" in PCI PRI.
+ */
+enum page_response_code {
+	IOMMU_PAGE_RESP_SUCCESS = 0,
+	IOMMU_PAGE_RESP_INVALID,
+	IOMMU_PAGE_RESP_FAILURE,
+};
+
+/**
+ * struct page_response_msg - Generic page response information based on PCI ATS
+ *                            and PASID spec
+ * @addr: servicing page address
+ * @pasid: contains process address space ID
+ * @pasid_present: the @pasid field is valid
+ * @resp_code: response code
+ * @grpid: page request group index
+ * @iommu_data: data private to the IOMMU
+ */
+struct page_response_msg {
+	u64 addr;
+	u32 pasid;
+	u32 pasid_present:1;
+	enum page_response_code resp_code;
+	u32 grpid;
+	u64 iommu_data;
+};
+
+/**
  * struct iommu_ops - iommu ops and capabilities
  * @capable: check capability
  * @domain_alloc: allocate iommu domain
@@ -227,6 +263,7 @@ struct iommu_sva_ops {
  * @sva_bind: Bind process address space to device
  * @sva_unbind: Unbind process address space from device
  * @sva_get_pasid: Get PASID associated to a SVA handle
+ * @page_response: handle page request response
  * @pgsize_bitmap: bitmap of all possible supported page sizes
  */
 struct iommu_ops {
@@ -287,6 +324,8 @@ struct iommu_ops {
 	void (*sva_unbind)(struct iommu_sva *handle);
 	int (*sva_get_pasid)(struct iommu_sva *handle);
 
+	int (*page_response)(struct device *dev, struct page_response_msg *msg);
+
 	unsigned long pgsize_bitmap;
 };
 
@@ -311,11 +350,13 @@ struct iommu_device {
  * unrecoverable faults such as DMA or IRQ remapping faults.
  *
  * @fault: fault descriptor
+ * @list: pending fault event list, used for tracking responses
  * @iommu_private: used by the IOMMU driver for storing fault-specific
  *                 data. Users should not modify this field before
  *                 sending the fault response.
  */
 struct iommu_fault_event {
+	struct list_head list;
 	struct iommu_fault fault;
 	u64 iommu_private;
 };
@@ -325,10 +366,14 @@ struct iommu_fault_event {
  *
  * @handler: Callback function to handle IOMMU faults at device level
  * @data: handler private data
+ * @faults: holds the pending faults which need a response, e.g. a page response.
+ * @lock: protect pending faults list
  */
 struct iommu_fault_param {
 	iommu_dev_fault_handler_t handler;
 	void *data;
+	struct list_head faults;
+	struct mutex lock;
 };
 
 /**
@@ -443,6 +488,7 @@ extern int iommu_unregister_device_fault_handler(struct device *dev);
 extern int iommu_report_device_fault(struct device *dev,
 				     struct iommu_fault_event *evt);
 
+extern int iommu_page_response(struct device *dev, struct page_response_msg *msg);
 extern int iommu_group_id(struct iommu_group *group);
 extern struct iommu_group *iommu_group_get_for_dev(struct device *dev);
 extern struct iommu_domain *iommu_group_default_domain(struct iommu_group *);
@@ -770,6 +816,11 @@ int iommu_report_device_fault(struct device *dev, struct iommu_fault_event *evt)
 	return -ENODEV;
 }
 
+static inline int iommu_page_response(struct device *dev, struct page_response_msg *msg)
+{
+	return -ENODEV;
+}
+
 static inline int iommu_group_id(struct iommu_group *group)
 {
 	return -ENODEV;
-- 
2.7.4



* [PATCH v4 05/22] iommu: Add a timeout parameter for PRQ response
From: Jacob Pan @ 2019-06-09 13:44 UTC
  To: iommu, LKML, Joerg Roedel, David Woodhouse, Eric Auger,
	Alex Williamson, Jean-Philippe Brucker
  Cc: Yi Liu, Tian, Kevin, Raj Ashok, Christoph Hellwig, Lu Baolu,
	Andriy Shevchenko, Jacob Pan

When an I/O page request is processed outside the IOMMU subsystem, the
response can be delayed or lost. Add a tunable setup parameter such that
the user can choose the timeout for the IOMMU to track pending page
requests.

This timeout mechanism is a basic safety net which can be implemented in
conjunction with credit-based or device-level page response exception
handling.
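
For example, booting with the following (illustrative) command-line
fragment would extend the timeout to 20 seconds, while a value of 0
disables timeout tracking altogether:

    iommu.prq_timeout=20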

Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
---
 Documentation/admin-guide/kernel-parameters.txt |  8 +++++++
 drivers/iommu/iommu.c                           | 29 +++++++++++++++++++++++++
 2 files changed, 37 insertions(+)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 138f666..b43f089 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -1813,6 +1813,14 @@
 			1 - Bypass the IOMMU for DMA.
 			unset - Use value of CONFIG_IOMMU_DEFAULT_PASSTHROUGH.
 
+	iommu.prq_timeout=
+			Timeout in seconds to wait for page response
+			of a pending page request.
+			Format: <integer>
+			Default: 10
+			0 - no timeout tracking
+			1 to 100 - allowed range
+
 	io7=		[HW] IO7 for Marvel based alpha systems
 			See comment before marvel_specify_io7 in
 			arch/alpha/kernel/core_marvel.c.
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 13b301c..64e87d5 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -45,6 +45,19 @@ static unsigned int iommu_def_domain_type = IOMMU_DOMAIN_DMA;
 #endif
 static bool iommu_dma_strict __read_mostly = true;
 
+/*
+ * Timeout to wait for page response of a pending page request. This is
+ * intended as a basic safety net in case a pending page request is not
+ * responded to for an exceptionally long time. A device may also implement
+ * its own protection mechanism against this exception.
+ * Units are in jiffies with a range between 1 - 100 seconds equivalent.
+ * Default to 10 seconds.
+ * Setting 0 means no timeout tracking.
+ */
+#define IOMMU_PAGE_RESPONSE_MAX_TIMEOUT (HZ * 100)
+#define IOMMU_PAGE_RESPONSE_DEF_TIMEOUT (HZ * 10)
+static unsigned long prq_timeout = IOMMU_PAGE_RESPONSE_DEF_TIMEOUT;
+
 struct iommu_group {
 	struct kobject kobj;
 	struct kobject *devices_kobj;
@@ -157,6 +170,22 @@ static int __init iommu_dma_setup(char *str)
 }
 early_param("iommu.strict", iommu_dma_setup);
 
+static int __init iommu_set_prq_timeout(char *str)
+{
+	unsigned long timeout;
+
+	if (!str)
+		return -EINVAL;
+	timeout = simple_strtoul(str, NULL, 0);
+	timeout = timeout * HZ;
+	if (timeout > IOMMU_PAGE_RESPONSE_MAX_TIMEOUT)
+		return -EINVAL;
+	prq_timeout = timeout;
+
+	return 0;
+}
+early_param("iommu.prq_timeout", iommu_set_prq_timeout);
+
 static ssize_t iommu_group_attr_show(struct kobject *kobj,
 				     struct attribute *__attr, char *buf)
 {
-- 
2.7.4



* [PATCH v4 06/22] trace/iommu: Add sva trace events
From: Jacob Pan @ 2019-06-09 13:44 UTC
  To: iommu, LKML, Joerg Roedel, David Woodhouse, Eric Auger,
	Alex Williamson, Jean-Philippe Brucker
  Cc: Yi Liu, Tian, Kevin, Raj Ashok, Christoph Hellwig, Lu Baolu,
	Andriy Shevchenko, Jacob Pan

From: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>

For development only, trace I/O page faults and responses.

Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
[JPB: removed the invalidate trace event, that will be added later]
Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
---
 include/trace/events/iommu.h | 87 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 87 insertions(+)

diff --git a/include/trace/events/iommu.h b/include/trace/events/iommu.h
index 72b4582..c8de147 100644
--- a/include/trace/events/iommu.h
+++ b/include/trace/events/iommu.h
@@ -12,6 +12,8 @@
 #define _TRACE_IOMMU_H
 
 #include <linux/tracepoint.h>
+#include <linux/iommu.h>
+#include <uapi/linux/iommu.h>
 
 struct device;
 
@@ -161,6 +163,91 @@ DEFINE_EVENT(iommu_error, io_page_fault,
 
 	TP_ARGS(dev, iova, flags)
 );
+
+TRACE_EVENT(dev_fault,
+
+	TP_PROTO(struct device *dev,  struct iommu_fault *evt),
+
+	TP_ARGS(dev, evt),
+
+	TP_STRUCT__entry(
+		__string(device, dev_name(dev))
+		__field(int, type)
+		__field(int, reason)
+		__field(u64, addr)
+		__field(u64, fetch_addr)
+		__field(u32, pasid)
+		__field(u32, grpid)
+		__field(u32, flags)
+		__field(u32, prot)
+	),
+
+	TP_fast_assign(
+		__assign_str(device, dev_name(dev));
+		__entry->type = evt->type;
+		if (evt->type == IOMMU_FAULT_DMA_UNRECOV) {
+			__entry->reason		= evt->event.reason;
+			__entry->flags		= evt->event.flags;
+			__entry->pasid		= evt->event.pasid;
+			__entry->grpid		= 0;
+			__entry->prot		= evt->event.perm;
+			__entry->addr		= evt->event.addr;
+			__entry->fetch_addr	= evt->event.fetch_addr;
+		} else {
+			__entry->reason		= 0;
+			__entry->flags		= evt->prm.flags;
+			__entry->pasid		= evt->prm.pasid;
+			__entry->grpid		= evt->prm.grpid;
+			__entry->prot		= evt->prm.perm;
+			__entry->addr		= evt->prm.addr;
+			__entry->fetch_addr	= 0;
+		}
+	),
+
+	TP_printk("IOMMU:%s type=%d reason=%d addr=0x%016llx fetch=0x%016llx pasid=%d group=%d flags=%x prot=%d",
+		__get_str(device),
+		__entry->type,
+		__entry->reason,
+		__entry->addr,
+		__entry->fetch_addr,
+		__entry->pasid,
+		__entry->grpid,
+		__entry->flags,
+		__entry->prot
+	)
+);
+
+TRACE_EVENT(dev_page_response,
+
+	TP_PROTO(struct device *dev,  struct page_response_msg *msg),
+
+	TP_ARGS(dev, msg),
+
+	TP_STRUCT__entry(
+		__string(device, dev_name(dev))
+		__field(int, code)
+		__field(u64, addr)
+		__field(u32, pasid)
+		__field(u32, grpid)
+	),
+
+	TP_fast_assign(
+		__assign_str(device, dev_name(dev));
+		__entry->code = msg->resp_code;
+		__entry->addr = msg->addr;
+		__entry->pasid = msg->pasid;
+		__entry->grpid = msg->grpid;
+	),
+
+	TP_printk("IOMMU:%s code=%d addr=0x%016llx pasid=%d group=%d",
+		__get_str(device),
+		__entry->code,
+		__entry->addr,
+		__entry->pasid,
+		__entry->grpid
+	)
+);
+
 #endif /* _TRACE_IOMMU_H */
 
 /* This part must be outside protection */
-- 
2.7.4



* [PATCH v4 07/22] iommu: Use device fault trace event
From: Jacob Pan @ 2019-06-09 13:44 UTC
  To: iommu, LKML, Joerg Roedel, David Woodhouse, Eric Auger,
	Alex Williamson, Jean-Philippe Brucker
  Cc: Yi Liu, Tian, Kevin, Raj Ashok, Christoph Hellwig, Lu Baolu,
	Andriy Shevchenko, Jacob Pan

For performance and debugging purposes, these trace events help
analyze device faults that interact with the IOMMU subsystem.
E.g.
IOMMU:0000:00:0a.0 type=2 reason=0 addr=0x00000000007ff000 pasid=1
group=1 last=0 prot=1
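
The events can be inspected at runtime through tracefs, e.g. (the mount
point may differ on a given system):

    # cd /sys/kernel/debug/tracing
    # echo 1 > events/iommu/dev_fault/enable
    # echo 1 > events/iommu/dev_page_response/enable
    # cat trace_pipe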

Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
[JPB: removed invalidate event, that will be added later]
Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
---
 drivers/iommu/iommu.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 64e87d5..166adb8 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -1032,6 +1032,7 @@ int iommu_report_device_fault(struct device *dev, struct iommu_fault_event *evt)
 	}
 
 	ret = fparam->handler(evt, fparam->data);
+	trace_dev_fault(dev, &evt->fault);
 done_unlock:
 	mutex_unlock(&param->lock);
 	return ret;
@@ -1604,6 +1605,7 @@ int iommu_page_response(struct device *dev,
 		if (evt->fault.prm.pasid == msg->pasid &&
 		    evt->fault.prm.grpid == msg->grpid) {
 			msg->iommu_data = evt->iommu_private;
+			trace_dev_page_response(dev, msg);
 			ret = domain->ops->page_response(dev, msg);
 			list_del(&evt->list);
 			kfree(evt);
-- 
2.7.4



* [PATCH v4 08/22] iommu: Introduce attach/detach_pasid_table API
From: Jacob Pan @ 2019-06-09 13:44 UTC
  To: iommu, LKML, Joerg Roedel, David Woodhouse, Eric Auger,
	Alex Williamson, Jean-Philippe Brucker
  Cc: Yi Liu, Tian, Kevin, Raj Ashok, Christoph Hellwig, Lu Baolu,
	Andriy Shevchenko, Jacob Pan, Liu, Yi L

In the virtualization use case, when a guest is assigned
a PCI host device, protected by a virtual IOMMU on the guest,
the physical IOMMU must be programmed to be consistent with
the guest mappings. If the physical IOMMU supports two
translation stages, it makes sense to program the guest mappings
onto the first stage/level (ARM/Intel terminology) while the host
owns stage/level 2.

In that case, guest configuration settings must be trapped
and passed to the physical IOMMU driver.

This patch adds a new API to the IOMMU subsystem that allows
setting/unsetting the PASID table information.

A generic iommu_pasid_table_config struct is introduced in
a new iommu.h uapi header. This is going to be used by the VFIO
user API.
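
A minimal sketch of a caller (e.g. VFIO acting on behalf of a VMM)
attaching a guest CD table on SMMUv3; the values are placeholders:

    struct iommu_pasid_table_config cfg = {
        .version    = PASID_TABLE_CFG_VERSION_1,
        .format     = IOMMU_PASID_FORMAT_SMMUV3,
        .base_ptr   = gpa_of_cd_table,	/* guest physical address */
        .pasid_bits = 20,
        .config     = IOMMU_PASID_CONFIG_TRANSLATE,
        .smmuv3 = {
            .version = PASID_TABLE_SMMUV3_CFG_VERSION_1,
        },
    };

    ret = iommu_attach_pasid_table(domain, &cfg);
    ...
    iommu_detach_pasid_table(domain);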

Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
Signed-off-by: Liu, Yi L <yi.l.liu@linux.intel.com>
Signed-off-by: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
---
 drivers/iommu/iommu.c      | 19 +++++++++++++++++
 include/linux/iommu.h      | 18 ++++++++++++++++
 include/uapi/linux/iommu.h | 52 ++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 89 insertions(+)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 166adb8..4496ccd 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -1619,6 +1619,25 @@ int iommu_page_response(struct device *dev,
 }
 EXPORT_SYMBOL_GPL(iommu_page_response);
 
+int iommu_attach_pasid_table(struct iommu_domain *domain,
+			     struct iommu_pasid_table_config *cfg)
+{
+	if (unlikely(!domain->ops->attach_pasid_table))
+		return -ENODEV;
+
+	return domain->ops->attach_pasid_table(domain, cfg);
+}
+EXPORT_SYMBOL_GPL(iommu_attach_pasid_table);
+
+void iommu_detach_pasid_table(struct iommu_domain *domain)
+{
+	if (unlikely(!domain->ops->detach_pasid_table))
+		return;
+
+	domain->ops->detach_pasid_table(domain);
+}
+EXPORT_SYMBOL_GPL(iommu_detach_pasid_table);
+
 static void __iommu_detach_device(struct iommu_domain *domain,
 				  struct device *dev)
 {
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 950347b..d3edb10 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -264,6 +264,8 @@ struct page_response_msg {
  * @sva_unbind: Unbind process address space from device
  * @sva_get_pasid: Get PASID associated to a SVA handle
  * @page_response: handle page request response
+ * @attach_pasid_table: attach a pasid table
+ * @detach_pasid_table: detach the pasid table
  * @pgsize_bitmap: bitmap of all possible supported page sizes
  */
 struct iommu_ops {
@@ -323,6 +325,9 @@ struct iommu_ops {
 				      void *drvdata);
 	void (*sva_unbind)(struct iommu_sva *handle);
 	int (*sva_get_pasid)(struct iommu_sva *handle);
+	int (*attach_pasid_table)(struct iommu_domain *domain,
+				  struct iommu_pasid_table_config *cfg);
+	void (*detach_pasid_table)(struct iommu_domain *domain);
 
 	int (*page_response)(struct device *dev, struct page_response_msg *msg);
 
@@ -434,6 +439,9 @@ extern int iommu_attach_device(struct iommu_domain *domain,
 			       struct device *dev);
 extern void iommu_detach_device(struct iommu_domain *domain,
 				struct device *dev);
+extern int iommu_attach_pasid_table(struct iommu_domain *domain,
+				    struct iommu_pasid_table_config *cfg);
+extern void iommu_detach_pasid_table(struct iommu_domain *domain);
 extern struct iommu_domain *iommu_get_domain_for_dev(struct device *dev);
 extern struct iommu_domain *iommu_get_dma_domain(struct device *dev);
 extern int iommu_map(struct iommu_domain *domain, unsigned long iova,
@@ -947,6 +955,13 @@ iommu_aux_get_pasid(struct iommu_domain *domain, struct device *dev)
 	return -ENODEV;
 }
 
+static inline
+int iommu_attach_pasid_table(struct iommu_domain *domain,
+			     struct iommu_pasid_table_config *cfg)
+{
+	return -ENODEV;
+}
+
 static inline struct iommu_sva *
 iommu_sva_bind_device(struct device *dev, struct mm_struct *mm, void *drvdata)
 {
@@ -968,6 +983,9 @@ static inline int iommu_sva_get_pasid(struct iommu_sva *handle)
 	return IOMMU_PASID_INVALID;
 }
 
+static inline
+void iommu_detach_pasid_table(struct iommu_domain *domain) {}
+
 #endif /* CONFIG_IOMMU_API */
 
 #ifdef CONFIG_IOMMU_DEBUGFS
diff --git a/include/uapi/linux/iommu.h b/include/uapi/linux/iommu.h
index aaa3b6a..3976767 100644
--- a/include/uapi/linux/iommu.h
+++ b/include/uapi/linux/iommu.h
@@ -115,4 +115,56 @@ struct iommu_fault {
 		struct iommu_fault_page_request prm;
 	};
 };
+
+/**
+ * struct iommu_pasid_smmuv3 - ARM SMMUv3 Stream Table Entry stage 1 related
+ *     information
+ * @version: API version of this structure
+ * @s1fmt: STE s1fmt (format of the CD table: single CD, linear table
+ *         or 2-level table)
+ * @s1dss: STE s1dss (specifies the behavior when @pasid_bits != 0
+ *         and no PASID is passed along with the incoming transaction)
+ * @padding: reserved for future use (should be zero)
+ *
+ * The PASID table is referred to as the Context Descriptor (CD) table on ARM
+ * SMMUv3. Please refer to the ARM SMMU 3.x spec (ARM IHI 0070A) for full
+ * details.
+ */
+struct iommu_pasid_smmuv3 {
+#define PASID_TABLE_SMMUV3_CFG_VERSION_1 1
+	__u32	version;
+	__u8	s1fmt;
+	__u8	s1dss;
+	__u8	padding[2];
+};
+
+/**
+ * struct iommu_pasid_table_config - PASID table data used to bind guest PASID
+ *     table to the host IOMMU
+ * @version: API version to prepare for future extensions
+ * @format: format of the PASID table
+ * @base_ptr: guest physical address of the PASID table
+ * @pasid_bits: number of PASID bits used in the PASID table
+ * @config: indicates whether the guest translation stage must
+ *          be translated, bypassed or aborted.
+ * @padding: reserved for future use (should be zero)
+ * @smmuv3: table information when @format is %IOMMU_PASID_FORMAT_SMMUV3
+ */
+struct iommu_pasid_table_config {
+#define PASID_TABLE_CFG_VERSION_1 1
+	__u32	version;
+#define IOMMU_PASID_FORMAT_SMMUV3	1
+	__u32	format;
+	__u64	base_ptr;
+	__u8	pasid_bits;
+#define IOMMU_PASID_CONFIG_TRANSLATE	1
+#define IOMMU_PASID_CONFIG_BYPASS	2
+#define IOMMU_PASID_CONFIG_ABORT	3
+	__u8	config;
+	__u8    padding[6];
+	union {
+		struct iommu_pasid_smmuv3 smmuv3;
+	};
+};
+
 #endif /* _UAPI_IOMMU_H */
-- 
2.7.4



* [PATCH v4 09/22] iommu: Introduce cache_invalidate API
From: Jacob Pan @ 2019-06-09 13:44 UTC
  To: iommu, LKML, Joerg Roedel, David Woodhouse, Eric Auger,
	Alex Williamson, Jean-Philippe Brucker
  Cc: Yi Liu, Tian, Kevin, Raj Ashok, Christoph Hellwig, Lu Baolu,
	Andriy Shevchenko, Jacob Pan

From: Liu Yi L <yi.l.liu@intel.com>

In any virtualization use case, when the first translation stage
is "owned" by the guest OS, the host IOMMU driver has no knowledge
of caching structure updates unless the guest invalidation activities
are trapped by the virtualizer and passed down to the host.

Since the invalidation data is obtained from user space and will be
written into the physical IOMMU, we must allow security checks at
various layers. Therefore, a generic invalidation data format is
proposed here; model specific IOMMU drivers need to convert it into
their own format.
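
A sketch of a PASID-selective invalidation passdown, assuming the
iommu_cache_invalidate_info layout documented below (the @version and
@cache values are placeholders, as their defines fall outside this
excerpt):

    struct iommu_cache_invalidate_info inv_info = {
        .version     = 1,	/* placeholder for the VERSION_1 define */
        /* .cache selector omitted; its defines are not shown here */
        .granularity = IOMMU_INV_GRANU_PASID,
        .pasid_info = {
            .flags = IOMMU_INV_PASID_FLAGS_PASID,
            .pasid = 1,
        },
    };

    ret = iommu_cache_invalidate(domain, dev, &inv_info);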

Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Signed-off-by: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
---
 drivers/iommu/iommu.c      |  10 +++++
 include/linux/iommu.h      |  14 ++++++
 include/uapi/linux/iommu.h | 110 +++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 134 insertions(+)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 4496ccd..1758b57 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -1638,6 +1638,16 @@ void iommu_detach_pasid_table(struct iommu_domain *domain)
 }
 EXPORT_SYMBOL_GPL(iommu_detach_pasid_table);
 
+int iommu_cache_invalidate(struct iommu_domain *domain, struct device *dev,
+			   struct iommu_cache_invalidate_info *inv_info)
+{
+	if (unlikely(!domain->ops->cache_invalidate))
+		return -ENODEV;
+
+	return domain->ops->cache_invalidate(domain, dev, inv_info);
+}
+EXPORT_SYMBOL_GPL(iommu_cache_invalidate);
+
 static void __iommu_detach_device(struct iommu_domain *domain,
 				  struct device *dev)
 {
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index d3edb10..7a37336 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -266,6 +266,7 @@ struct page_response_msg {
  * @page_response: handle page request response
  * @attach_pasid_table: attach a pasid table
  * @detach_pasid_table: detach the pasid table
+ * @cache_invalidate: invalidate translation caches
  * @pgsize_bitmap: bitmap of all possible supported page sizes
  */
 struct iommu_ops {
@@ -330,6 +331,8 @@ struct iommu_ops {
 	void (*detach_pasid_table)(struct iommu_domain *domain);
 
 	int (*page_response)(struct device *dev, struct page_response_msg *msg);
+	int (*cache_invalidate)(struct iommu_domain *domain, struct device *dev,
+				struct iommu_cache_invalidate_info *inv_info);
 
 	unsigned long pgsize_bitmap;
 };
@@ -442,6 +445,9 @@ extern void iommu_detach_device(struct iommu_domain *domain,
 extern int iommu_attach_pasid_table(struct iommu_domain *domain,
 				    struct iommu_pasid_table_config *cfg);
 extern void iommu_detach_pasid_table(struct iommu_domain *domain);
+extern int iommu_cache_invalidate(struct iommu_domain *domain,
+				  struct device *dev,
+				  struct iommu_cache_invalidate_info *inv_info);
 extern struct iommu_domain *iommu_get_domain_for_dev(struct device *dev);
 extern struct iommu_domain *iommu_get_dma_domain(struct device *dev);
 extern int iommu_map(struct iommu_domain *domain, unsigned long iova,
@@ -986,6 +992,14 @@ static inline int iommu_sva_get_pasid(struct iommu_sva *handle)
 static inline
 void iommu_detach_pasid_table(struct iommu_domain *domain) {}
 
+static inline int
+iommu_cache_invalidate(struct iommu_domain *domain,
+		       struct device *dev,
+		       struct iommu_cache_invalidate_info *inv_info)
+{
+	return -ENODEV;
+}
+
 #endif /* CONFIG_IOMMU_API */
 
 #ifdef CONFIG_IOMMU_DEBUGFS
diff --git a/include/uapi/linux/iommu.h b/include/uapi/linux/iommu.h
index 3976767..ca4b753 100644
--- a/include/uapi/linux/iommu.h
+++ b/include/uapi/linux/iommu.h
@@ -167,4 +167,114 @@ struct iommu_pasid_table_config {
 	};
 };
 
+/* defines the granularity of the invalidation */
+enum iommu_inv_granularity {
+	IOMMU_INV_GRANU_DOMAIN,	/* domain-selective invalidation */
+	IOMMU_INV_GRANU_PASID,	/* PASID-selective invalidation */
+	IOMMU_INV_GRANU_ADDR,	/* page-selective invalidation */
+	IOMMU_INV_GRANU_NR,	/* number of invalidation granularities */
+};
+
+/**
+ * struct iommu_inv_addr_info - Address Selective Invalidation Structure
+ *
+ * @flags: indicates the granularity of the address-selective invalidation
+ * - If the PASID bit is set, the @pasid field is populated and the invalidation
+ *   relates to cache entries tagged with this PASID and matching the address
+ *   range.
+ * - If ARCHID bit is set, @archid is populated and the invalidation relates
+ *   to cache entries tagged with this architecture specific ID and matching
+ *   the address range.
+ * - Both PASID and ARCHID can be set as they may tag different caches.
+ * - If neither PASID nor ARCHID is set, global addr invalidation applies.
+ * - The LEAF flag indicates whether only the leaf PTE caching needs to be
+ *   invalidated and other paging structure caches can be preserved.
+ * @pasid: process address space ID
+ * @archid: architecture-specific ID
+ * @addr: first stage/level input address
+ * @granule_size: page/block size of the mapping in bytes
+ * @nb_granules: number of contiguous granules to be invalidated
+ */
+struct iommu_inv_addr_info {
+#define IOMMU_INV_ADDR_FLAGS_PASID	(1 << 0)
+#define IOMMU_INV_ADDR_FLAGS_ARCHID	(1 << 1)
+#define IOMMU_INV_ADDR_FLAGS_LEAF	(1 << 2)
+	__u32	flags;
+	__u32	archid;
+	__u64	pasid;
+	__u64	addr;
+	__u64	granule_size;
+	__u64	nb_granules;
+};
+
+/**
+ * struct iommu_inv_pasid_info - PASID Selective Invalidation Structure
+ *
+ * @flags: indicates the granularity of the PASID-selective invalidation
+ * - If the PASID bit is set, the @pasid field is populated and the
+ *   invalidation relates to cache entries tagged with this PASID.
+ * - If the ARCHID bit is set, @archid is populated and the invalidation
+ *   relates to cache entries tagged with this architecture-specific ID.
+ * - Both PASID and ARCHID can be set as they may tag different caches.
+ * - At least one of PASID or ARCHID must be set.
+ * @pasid: process address space ID
+ * @archid: architecture-specific ID
+ */
+struct iommu_inv_pasid_info {
+#define IOMMU_INV_PASID_FLAGS_PASID	(1 << 0)
+#define IOMMU_INV_PASID_FLAGS_ARCHID	(1 << 1)
+	__u32	flags;
+	__u32	archid;
+	__u64	pasid;
+};
+
+/**
+ * struct iommu_cache_invalidate_info - First level/stage invalidation
+ *     information
+ * @version: API version of this structure
+ * @cache: bitfield selecting which caches to invalidate
+ * @granularity: defines the lowest granularity used for the invalidation:
+ *     domain > PASID > addr
+ * @padding: reserved for future use (should be zero)
+ * @pasid_info: invalidation data when @granularity is %IOMMU_INV_GRANU_PASID
+ * @addr_info: invalidation data when @granularity is %IOMMU_INV_GRANU_ADDR
+ *
+ * Not all the combinations of cache/granularity are valid:
+ *
+ * +--------------+---------------+---------------+---------------+
+ * | type /       |   DEV_IOTLB   |     IOTLB     |      PASID    |
+ * | granularity  |               |               |      cache    |
+ * +==============+===============+===============+===============+
+ * | DOMAIN       |       N/A     |       Y       |       Y       |
+ * +--------------+---------------+---------------+---------------+
+ * | PASID        |       Y       |       Y       |       Y       |
+ * +--------------+---------------+---------------+---------------+
+ * | ADDR         |       Y       |       Y       |       N/A     |
+ * +--------------+---------------+---------------+---------------+
+ *
+ * Invalidations by %IOMMU_INV_GRANU_DOMAIN don't take any argument other than
+ * @version and @cache.
+ *
+ * If multiple cache types are invalidated simultaneously, they all
+ * must support the used granularity.
+ */
+struct iommu_cache_invalidate_info {
+#define IOMMU_CACHE_INVALIDATE_INFO_VERSION_1 1
+	__u32	version;
+/* IOMMU paging structure cache */
+#define IOMMU_CACHE_INV_TYPE_IOTLB	(1 << 0) /* IOMMU IOTLB */
+#define IOMMU_CACHE_INV_TYPE_DEV_IOTLB	(1 << 1) /* Device IOTLB */
+#define IOMMU_CACHE_INV_TYPE_PASID	(1 << 2) /* PASID cache */
+#define IOMMU_CACHE_INV_TYPE_NR		(3)
+	__u8	cache;
+	__u8	granularity;
+	__u8	padding[2];
+	union {
+		struct iommu_inv_pasid_info pasid_info;
+		struct iommu_inv_addr_info addr_info;
+	};
+};
+
 #endif /* _UAPI_IOMMU_H */
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 64+ messages in thread

* [PATCH v4 10/22] iommu: Fix compile error without IOMMU_API
  2019-06-09 13:44 [PATCH v4 00/22] Shared virtual address IOMMU and VT-d support Jacob Pan
                   ` (8 preceding siblings ...)
  2019-06-09 13:44 ` [PATCH v4 09/22] iommu: Introduce cache_invalidate API Jacob Pan
@ 2019-06-09 13:44 ` Jacob Pan
  2019-06-18 14:10   ` Jonathan Cameron
  2019-06-09 13:44 ` [PATCH v4 11/22] iommu: Introduce guest PASID bind function Jacob Pan
                   ` (11 subsequent siblings)
  21 siblings, 1 reply; 64+ messages in thread
From: Jacob Pan @ 2019-06-09 13:44 UTC (permalink / raw)
  To: iommu, LKML, Joerg Roedel, David Woodhouse, Eric Auger,
	Alex Williamson, Jean-Philippe Brucker
  Cc: Yi Liu, Tian, Kevin, Raj Ashok, Christoph Hellwig, Lu Baolu,
	Andriy Shevchenko, Jacob Pan

struct page_response_msg needs to be defined outside CONFIG_IOMMU_API.

Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
---
 include/linux/iommu.h | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 7a37336..8d766a8 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -189,8 +189,6 @@ struct iommu_sva_ops {
 	iommu_mm_exit_handler_t mm_exit;
 };
 
-#ifdef CONFIG_IOMMU_API
-
 /**
  * enum page_response_code - Return status of fault handlers, telling the IOMMU
  * driver how to proceed with the fault.
@@ -227,6 +225,7 @@ struct page_response_msg {
 	u64 iommu_data;
 };
 
+#ifdef CONFIG_IOMMU_API
 /**
  * struct iommu_ops - iommu ops and capabilities
  * @capable: check capability
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 64+ messages in thread

* [PATCH v4 11/22] iommu: Introduce guest PASID bind function
  2019-06-09 13:44 [PATCH v4 00/22] Shared virtual address IOMMU and VT-d support Jacob Pan
                   ` (9 preceding siblings ...)
  2019-06-09 13:44 ` [PATCH v4 10/22] iommu: Fix compile error without IOMMU_API Jacob Pan
@ 2019-06-09 13:44 ` Jacob Pan
  2019-06-18 15:36   ` Jean-Philippe Brucker
  2019-07-16 16:44   ` Auger Eric
  2019-06-09 13:44 ` [PATCH v4 12/22] iommu: Add I/O ASID allocator Jacob Pan
                   ` (10 subsequent siblings)
  21 siblings, 2 replies; 64+ messages in thread
From: Jacob Pan @ 2019-06-09 13:44 UTC (permalink / raw)
  To: iommu, LKML, Joerg Roedel, David Woodhouse, Eric Auger,
	Alex Williamson, Jean-Philippe Brucker
  Cc: Yi Liu, Tian, Kevin, Raj Ashok, Christoph Hellwig, Lu Baolu,
	Andriy Shevchenko, Jacob Pan

Guest shared virtual address (SVA) may require the host to shadow guest
PASID tables. Guest PASIDs can also be allocated from the host via
enlightened interfaces. In this case, the guest needs to bind the guest
mm, i.e. the CR3 in guest physical address, to the actual PASID table in
the host IOMMU. Nesting will be turned on such that guest virtual
addresses can go through a two-level translation:
- 1st level translates GVA to GPA
- 2nd level translates GPA to HPA
This patch introduces APIs to bind guest PASID data to the assigned
device entry in the physical IOMMU. See the diagram below for a usage
explanation.

    .-------------.  .---------------------------.
    |   vIOMMU    |  | Guest process mm, FL only |
    |             |  '---------------------------'
    .----------------/
    | PASID Entry |--- PASID cache flush -
    '-------------'                       |
    |             |                       V
    |             |                      GP
    '-------------'
Guest
------| Shadow |----------------------- GP->HP* ---------
      v        v                          |
Host                                      v
    .-------------.  .----------------------.
    |   pIOMMU    |  | Bind FL for GVA-GPA  |
    |             |  '----------------------'
    .----------------/  |
    | PASID Entry |     V (Nested xlate)
    '----------------\.---------------------.
    |             |   |Set SL to GPA-HPA    |
    |             |   '---------------------'
    '-------------'

Where:
 - FL = First level/stage one page tables
 - SL = Second level/stage two page tables
 - GP = Guest PASID
 - HP = Host PASID
* Conversion is needed if the non-identity GP-HP mapping option is chosen.
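
A minimal caller sketch (not part of this patch): binding a guest mm to
a host PASID with the UAPI introduced below. The variables domain, dev,
gcr3, host_pasid and guest_pasid are assumptions for illustration.

	struct gpasid_bind_data bind_data = {
		.version	= IOMMU_GPASID_BIND_VERSION_1,
		.format		= IOMMU_PASID_FORMAT_INTEL_VTD,
		.flags		= IOMMU_SVA_GPASID_VAL,	/* non-identity GP-HP */
		.gpgd		= gcr3,		/* guest CR3, a GPA */
		.hpasid		= host_pasid,
		.gpasid		= guest_pasid,
		.addr_width	= 48,
	};
	int ret;

	ret = iommu_sva_bind_gpasid(domain, dev, &bind_data);
	if (ret)
		return ret;
	/* ... and on teardown ... */
	ret = iommu_sva_unbind_gpasid(domain, dev, host_pasid);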

Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
---
 drivers/iommu/iommu.c      | 20 ++++++++++++++++
 include/linux/iommu.h      | 21 +++++++++++++++++
 include/uapi/linux/iommu.h | 58 ++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 99 insertions(+)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 1758b57..d0416f60 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -1648,6 +1648,26 @@ int iommu_cache_invalidate(struct iommu_domain *domain, struct device *dev,
 }
 EXPORT_SYMBOL_GPL(iommu_cache_invalidate);
 
+int iommu_sva_bind_gpasid(struct iommu_domain *domain,
+			struct device *dev, struct gpasid_bind_data *data)
+{
+	if (unlikely(!domain->ops->sva_bind_gpasid))
+		return -ENODEV;
+
+	return domain->ops->sva_bind_gpasid(domain, dev, data);
+}
+EXPORT_SYMBOL_GPL(iommu_sva_bind_gpasid);
+
+int iommu_sva_unbind_gpasid(struct iommu_domain *domain, struct device *dev,
+			ioasid_t pasid)
+{
+	if (unlikely(!domain->ops->sva_unbind_gpasid))
+		return -ENODEV;
+
+	return domain->ops->sva_unbind_gpasid(dev, pasid);
+}
+EXPORT_SYMBOL_GPL(iommu_sva_unbind_gpasid);
+
 static void __iommu_detach_device(struct iommu_domain *domain,
 				  struct device *dev)
 {
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 8d766a8..560c8c8 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -25,6 +25,7 @@
 #include <linux/errno.h>
 #include <linux/err.h>
 #include <linux/of.h>
+#include <linux/ioasid.h>
 #include <uapi/linux/iommu.h>
 
 #define IOMMU_READ	(1 << 0)
@@ -267,6 +268,8 @@ struct page_response_msg {
  * @detach_pasid_table: detach the pasid table
  * @cache_invalidate: invalidate translation caches
  * @pgsize_bitmap: bitmap of all possible supported page sizes
+ * @sva_bind_gpasid: bind guest pasid and mm
+ * @sva_unbind_gpasid: unbind guest pasid and mm
  */
 struct iommu_ops {
 	bool (*capable)(enum iommu_cap);
@@ -332,6 +335,10 @@ struct iommu_ops {
 	int (*page_response)(struct device *dev, struct page_response_msg *msg);
 	int (*cache_invalidate)(struct iommu_domain *domain, struct device *dev,
 				struct iommu_cache_invalidate_info *inv_info);
+	int (*sva_bind_gpasid)(struct iommu_domain *domain,
+			struct device *dev, struct gpasid_bind_data *data);
+
+	int (*sva_unbind_gpasid)(struct device *dev, int pasid);
 
 	unsigned long pgsize_bitmap;
 };
@@ -447,6 +454,10 @@ extern void iommu_detach_pasid_table(struct iommu_domain *domain);
 extern int iommu_cache_invalidate(struct iommu_domain *domain,
 				  struct device *dev,
 				  struct iommu_cache_invalidate_info *inv_info);
+extern int iommu_sva_bind_gpasid(struct iommu_domain *domain,
+		struct device *dev, struct gpasid_bind_data *data);
+extern int iommu_sva_unbind_gpasid(struct iommu_domain *domain,
+				struct device *dev, ioasid_t pasid);
 extern struct iommu_domain *iommu_get_domain_for_dev(struct device *dev);
 extern struct iommu_domain *iommu_get_dma_domain(struct device *dev);
 extern int iommu_map(struct iommu_domain *domain, unsigned long iova,
@@ -998,6 +1009,16 @@ iommu_cache_invalidate(struct iommu_domain *domain,
 {
 	return -ENODEV;
 }
+static inline int iommu_sva_bind_gpasid(struct iommu_domain *domain,
+				struct device *dev, struct gpasid_bind_data *data)
+{
+	return -ENODEV;
+}
+
+static inline int iommu_sva_unbind_gpasid(struct iommu_domain *domain, struct device *dev, ioasid_t pasid)
+{
+	return -ENODEV;
+}
 
 #endif /* CONFIG_IOMMU_API */
 
diff --git a/include/uapi/linux/iommu.h b/include/uapi/linux/iommu.h
index ca4b753..a9cdc63 100644
--- a/include/uapi/linux/iommu.h
+++ b/include/uapi/linux/iommu.h
@@ -277,4 +277,62 @@ struct iommu_cache_invalidate_info {
 	};
 };
 
+/**
+ * struct gpasid_bind_data_vtd - Intel VT-d specific data on device and guest
+ * SVA binding.
+ *
+ * @flags:	VT-d PASID table entry attributes
+ * @pat:	Page attribute table data to compute effective memory type
+ * @emt:	Extended memory type
+ *
+ * Only guest vIOMMU selectable and effective options are passed down to
+ * the host IOMMU.
+ */
+struct gpasid_bind_data_vtd {
+#define IOMMU_SVA_VTD_GPASID_SRE	(1 << 0) /* supervisor request */
+#define IOMMU_SVA_VTD_GPASID_EAFE	(1 << 1) /* extended access enable */
+#define IOMMU_SVA_VTD_GPASID_PCD	(1 << 2) /* page-level cache disable */
+#define IOMMU_SVA_VTD_GPASID_PWT	(1 << 3) /* page-level write through */
+#define IOMMU_SVA_VTD_GPASID_EMTE	(1 << 4) /* extended mem type enable */
+#define IOMMU_SVA_VTD_GPASID_CD		(1 << 5) /* PASID-level cache disable */
+	__u64 flags;
+	__u32 pat;
+	__u32 emt;
+};
+
+/**
+ * struct gpasid_bind_data - Information about device and guest PASID binding
+ * @version:	Version of this data structure
+ * @format:	PASID table entry format
+ * @flags:	Additional information on guest bind request
+ * @gpgd:	Guest page directory base of the guest mm to bind
+ * @hpasid:	Process address space ID used for the guest mm in host IOMMU
+ * @gpasid:	Process address space ID used for the guest mm in guest IOMMU
+ * @addr_width:	Guest virtual address width
+ * @vtd:	Intel VT-d specific data
+ *
+ * Guest to host PASID mapping can be identity or non-identity, where the
+ * guest has its own PASID space. For non-identity mapping, a guest to host
+ * PASID lookup is needed when the VM programs a guest PASID into an assigned
+ * device. The VMM may trap such PASID programming, then request the host
+ * IOMMU driver to convert guest PASID to host PASID based on this bind data.
+ */
+struct gpasid_bind_data {
+#define IOMMU_GPASID_BIND_VERSION_1	1
+	__u32 version;
+#define IOMMU_PASID_FORMAT_INTEL_VTD	1
+	__u32 format;
+#define IOMMU_SVA_GPASID_VAL	(1 << 0) /* guest PASID valid */
+	__u64 flags;
+	__u64 gpgd;
+	__u64 hpasid;
+	__u64 gpasid;
+	__u32 addr_width;
+	__u8  padding[4];
+	/* Vendor specific data */
+	union {
+		struct gpasid_bind_data_vtd vtd;
+	};
+};
+
 #endif /* _UAPI_IOMMU_H */
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 64+ messages in thread

* [PATCH v4 12/22] iommu: Add I/O ASID allocator
  2019-06-09 13:44 [PATCH v4 00/22] Shared virtual address IOMMU and VT-d support Jacob Pan
                   ` (10 preceding siblings ...)
  2019-06-09 13:44 ` [PATCH v4 11/22] iommu: Introduce guest PASID bind function Jacob Pan
@ 2019-06-09 13:44 ` Jacob Pan
  2019-06-18 16:50   ` Jonathan Cameron
  2019-06-09 13:44 ` [PATCH v4 13/22] iommu/vt-d: Enlightened PASID allocation Jacob Pan
                   ` (9 subsequent siblings)
  21 siblings, 1 reply; 64+ messages in thread
From: Jacob Pan @ 2019-06-09 13:44 UTC (permalink / raw)
  To: iommu, LKML, Joerg Roedel, David Woodhouse, Eric Auger,
	Alex Williamson, Jean-Philippe Brucker
  Cc: Yi Liu, Tian, Kevin, Raj Ashok, Christoph Hellwig, Lu Baolu,
	Andriy Shevchenko, Jacob Pan

From: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>

Some devices might support multiple DMA address spaces, in particular
those that have the PCI PASID feature. PASID (Process Address Space ID)
allows sharing process address spaces with devices (SVA), partitioning a
device into VM-assignable entities (VFIO mdev), or simply providing
multiple DMA address spaces to kernel drivers. Add a global PASID
allocator usable by different drivers at the same time. Name it I/O ASID
to avoid confusion with ASIDs allocated by arch code, which are usually
a separate ID space.

The IOASID space is global. Each device can have its own PASID space,
but by convention the IOMMU ended up with a global PASID space, so that
with SVA, each mm_struct is associated with a single PASID.

The allocator is primarily used by the IOMMU subsystem, but on rare
occasions drivers may want to allocate PASIDs for devices that aren't
managed by an IOMMU, using the same ID space as the IOMMU.

There are two types of allocators:
1. default allocator - always available, uses an XArray to track the
   allocated IOASIDs
2. custom allocators - can be registered at runtime, take precedence
   over the default allocator

Custom allocators have these attributes (a registration sketch follows
this list):
- provide platform-specific alloc()/free() functions with private data.
- lookup of allocation results is not provided by the allocator; lookup
  requests are handled by the IOASID framework through its own XArray.
- allocators can be unregistered at runtime, falling back either to the
  next custom allocator or to the default allocator.
- custom allocators can share the same set of alloc()/free() helpers, in
  which case they also share the same IOASID space, thus the same XArray.
- switching between allocators requires all outstanding IOASIDs to be
  freed, unless the two allocators share the same alloc()/free() helpers.
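
As a registration sketch (not code from this series; my_alloc(),
my_free(), my_backend_*() and my_backend are hypothetical stand-ins for
a platform-specific backend):

	static ioasid_t my_alloc(ioasid_t min, ioasid_t max, void *data)
	{
		/* ask the hypothetical platform backend for an ID in [min, max] */
		return my_backend_alloc(data, min, max);
	}

	static void my_free(ioasid_t ioasid, void *data)
	{
		my_backend_free(data, ioasid);
	}

	static struct ioasid_allocator_ops my_ops = {
		.alloc	= my_alloc,
		.free	= my_free,
		.pdata	= &my_backend,
	};

	/* e.g. from driver init: takes precedence over the default allocator */
	ret = ioasid_register_allocator(&my_ops);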

Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Link: https://lkml.org/lkml/2019/4/26/462
---
 drivers/iommu/Kconfig  |   8 +
 drivers/iommu/Makefile |   1 +
 drivers/iommu/ioasid.c | 427 +++++++++++++++++++++++++++++++++++++++++++++++++
 include/linux/ioasid.h |  74 +++++++++
 4 files changed, 510 insertions(+)
 create mode 100644 drivers/iommu/ioasid.c
 create mode 100644 include/linux/ioasid.h

diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index 83664db..c40c4b5 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -3,6 +3,13 @@
 config IOMMU_IOVA
 	tristate
 
+# The IOASID allocator may also be used by non-IOMMU_API users
+config IOASID
+	tristate
+	help
+	  Enable the I/O Address Space ID allocator. A single ID space shared
+	  between different users.
+
 # IOMMU_API always gets selected by whoever wants it.
 config IOMMU_API
 	bool
@@ -207,6 +214,7 @@ config INTEL_IOMMU_SVM
 	depends on INTEL_IOMMU && X86
 	select PCI_PASID
 	select MMU_NOTIFIER
+	select IOASID
 	help
 	  Shared Virtual Memory (SVM) provides a facility for devices
 	  to access DMA resources through process address space by
diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
index 8c71a15..0efac6f 100644
--- a/drivers/iommu/Makefile
+++ b/drivers/iommu/Makefile
@@ -7,6 +7,7 @@ obj-$(CONFIG_IOMMU_DMA) += dma-iommu.o
 obj-$(CONFIG_IOMMU_IO_PGTABLE) += io-pgtable.o
 obj-$(CONFIG_IOMMU_IO_PGTABLE_ARMV7S) += io-pgtable-arm-v7s.o
 obj-$(CONFIG_IOMMU_IO_PGTABLE_LPAE) += io-pgtable-arm.o
+obj-$(CONFIG_IOASID) += ioasid.o
 obj-$(CONFIG_IOMMU_IOVA) += iova.o
 obj-$(CONFIG_OF_IOMMU)	+= of_iommu.o
 obj-$(CONFIG_MSM_IOMMU) += msm_iommu.o
diff --git a/drivers/iommu/ioasid.c b/drivers/iommu/ioasid.c
new file mode 100644
index 0000000..0919b70
--- /dev/null
+++ b/drivers/iommu/ioasid.c
@@ -0,0 +1,427 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * I/O Address Space ID allocator. There is one global IOASID space, split into
+ * subsets. Users create a subset with DECLARE_IOASID_SET, then allocate and
+ * free IOASIDs with ioasid_alloc and ioasid_free.
+ */
+#define pr_fmt(fmt)	KBUILD_MODNAME ": " fmt
+
+#include <linux/xarray.h>
+#include <linux/ioasid.h>
+#include <linux/slab.h>
+#include <linux/module.h>
+#include <linux/spinlock.h>
+
+struct ioasid_data {
+	ioasid_t id;
+	struct ioasid_set *set;
+	void *private;
+	struct rcu_head rcu;
+};
+
+/*
+ * struct ioasid_allocator_data - Internal data structure to hold information
+ * about an allocator. There are two types of allocators:
+ *
+ * - Default allocator always has its own XArray to track the IOASIDs allocated.
+ * - Custom allocators may share allocation helpers with different private data.
+ *   Custom allocators that share the same helper functions also share
+ *   the same XArray.
+ * Rules:
+ * 1. Default allocator is always available, not dynamically registered. This is
+ *    to prevent race conditions with early boot code that wants to register
+ *    custom allocators or allocate IOASIDs.
+ * 2. Custom allocators take precedence over the default allocator.
+ * 3. When all custom allocators sharing the same helper functions are
+ *    unregistered (e.g. due to hotplug), all outstanding IOASIDs must be
+ *    freed.
+ * 4. When switching between custom allocators sharing the same helper
+ *    functions, outstanding IOASIDs are preserved.
+ * 5. When switching between custom allocator and default allocator, all IOASIDs
+ *    must be freed to ensure unadulterated space for the new allocator.
+ *
+ * @ops:	allocator helper functions and its data
+ * @list:	registered custom allocators
+ * @slist:	allocators that share the same ops but different data
+ * @flags:	attributes of the allocator
+ * @users:	number of allocators sharing the same ops and XArray
+ * @xa:		XArray that holds the IOASID space
+ */
+struct ioasid_allocator_data {
+	struct ioasid_allocator_ops *ops;
+	struct list_head list;
+	struct list_head slist;
+#define IOASID_ALLOCATOR_CUSTOM BIT(0) /* Needs framework to track results */
+	unsigned long flags;
+	struct xarray xa;
+	refcount_t users;
+};
+
+static DEFINE_MUTEX(ioasid_allocator_lock);
+static LIST_HEAD(allocators_list);
+static ioasid_t default_alloc(ioasid_t min, ioasid_t max, void *opaque);
+static void default_free(ioasid_t ioasid, void *opaque);
+
+static struct ioasid_allocator_ops default_ops = {
+	.alloc = default_alloc,
+	.free = default_free
+};
+
+static struct ioasid_allocator_data default_allocator = {
+	.ops = &default_ops,
+	.flags = 0,
+	.xa = XARRAY_INIT(ioasid_xa, XA_FLAGS_ALLOC)
+};
+
+static struct ioasid_allocator_data *active_allocator = &default_allocator;
+
+static ioasid_t default_alloc(ioasid_t min, ioasid_t max, void *opaque)
+{
+	ioasid_t id;
+
+	if (xa_alloc(&default_allocator.xa, &id, opaque, XA_LIMIT(min, max), GFP_KERNEL)) {
+		pr_err("Failed to alloc ioasid from %d to %d\n", min, max);
+		return INVALID_IOASID;
+	}
+
+	return id;
+}
+
+static void default_free(ioasid_t ioasid, void *opaque)
+{
+	struct ioasid_data *ioasid_data;
+
+	ioasid_data = xa_erase(&default_allocator.xa, ioasid);
+	kfree_rcu(ioasid_data, rcu);
+}
+
+/* Allocate and initialize a new custom allocator with its helper functions */
+static inline struct ioasid_allocator_data *ioasid_alloc_allocator(struct ioasid_allocator_ops *ops)
+{
+	struct ioasid_allocator_data *ia_data;
+
+	ia_data = kzalloc(sizeof(*ia_data), GFP_KERNEL);
+	if (!ia_data)
+		return NULL;
+
+	xa_init_flags(&ia_data->xa, XA_FLAGS_ALLOC);
+	INIT_LIST_HEAD(&ia_data->slist);
+	ia_data->flags |= IOASID_ALLOCATOR_CUSTOM;
+	ia_data->ops = ops;
+
+	/* For tracking custom allocators that share the same ops */
+	list_add_tail(&ops->list, &ia_data->slist);
+	refcount_set(&ia_data->users, 1);
+
+	return ia_data;
+}
+
+static inline bool use_same_ops(struct ioasid_allocator_ops *a, struct ioasid_allocator_ops *b)
+{
+	return (a->free == b->free) && (a->alloc == b->alloc);
+}
+
+/**
+ * ioasid_register_allocator - register a custom allocator
+ * @ops: the custom allocator ops to be registered
+ *
+ * Custom allocators take precedence over the default XArray-based allocator.
+ * Private data associated with IOASIDs allocated by the custom allocators is
+ * managed by the IOASID framework, similar to data stored in the XArray by
+ * the default allocator.
+ *
+ * There can be multiple allocators registered but only one is active. In case
+ * of runtime removal of a custom allocator, the next one is activated based
+ * on the registration ordering.
+ *
+ * Multiple allocators can share the same alloc() function, in this case the
+ * IOASID space is shared.
+ *
+ */
+int ioasid_register_allocator(struct ioasid_allocator_ops *ops)
+{
+	struct ioasid_allocator_data *ia_data;
+	struct ioasid_allocator_data *pallocator;
+	int ret = 0;
+
+	mutex_lock(&ioasid_allocator_lock);
+
+	ia_data = ioasid_alloc_allocator(ops);
+	if (!ia_data) {
+		ret = -ENOMEM;
+		goto out_unlock;
+	}
+
+	/*
+	 * No particular preference, we activate the first one and keep
+	 * the later registered allocators in a list in case the first one gets
+	 * removed due to hotplug.
+	 */
+	if (list_empty(&allocators_list)) {
+		WARN_ON(active_allocator != &default_allocator);
+		/* Use this new allocator if default is not active */
+		if (xa_empty(&active_allocator->xa)) {
+			active_allocator = ia_data;
+			list_add_tail(&ia_data->list, &allocators_list);
+			goto out_unlock;
+		}
+		pr_warn("Default allocator active with outstanding IOASID\n");
+		ret = -EAGAIN;
+		goto out_free;
+	}
+
+	/* Check if the allocator is already registered */
+	list_for_each_entry(pallocator, &allocators_list, list) {
+		if (pallocator->ops == ops) {
+			pr_err("IOASID allocator already registered\n");
+			ret = -EEXIST;
+			goto out_free;
+		} else if (use_same_ops(pallocator->ops, ops)) {
+			/*
+			 * If the new allocator shares the same ops,
+			 * then they will share the same IOASID space.
+			 * We should put them under the same xarray.
+			 */
+			list_add_tail(&ops->list, &pallocator->slist);
+			refcount_inc(&pallocator->users);
+			pr_info("New IOASID allocator ops shared %u times\n",
+				refcount_read(&pallocator->users));
+			goto out_free;
+		}
+	}
+	list_add_tail(&ia_data->list, &allocators_list);
+
+	goto out_unlock;
+
+out_free:
+	kfree(ia_data);
+out_unlock:
+	mutex_unlock(&ioasid_allocator_lock);
+	return ret;
+}
+EXPORT_SYMBOL_GPL(ioasid_register_allocator);
+
+static inline bool has_shared_ops(struct ioasid_allocator_data *allocator)
+{
+	return refcount_read(&allocator->users) > 1;
+}
+
+/**
+ * ioasid_unregister_allocator - Remove a custom IOASID allocator ops
+ * @ops: the custom allocator to be removed
+ *
+ * Remove an allocator from the list, activate the next allocator in
+ * the order it was registered. Or revert to default allocator if all
+ * custom allocators are unregistered without outstanding IOASIDs.
+ */
+void ioasid_unregister_allocator(struct ioasid_allocator_ops *ops)
+{
+	struct ioasid_allocator_data *pallocator;
+	struct ioasid_allocator_ops *sops;
+
+	mutex_lock(&ioasid_allocator_lock);
+	if (list_empty(&allocators_list)) {
+		pr_warn("No custom IOASID allocators active!\n");
+		goto exit_unlock;
+	}
+
+	list_for_each_entry(pallocator, &allocators_list, list) {
+		if (use_same_ops(pallocator->ops, ops)) {
+			if (refcount_read(&pallocator->users) == 1) {
+				/* No shared helper functions */
+				list_del(&pallocator->list);
+				/*
+				 * All IOASIDs should have been freed before
+				 * the last allocator that shares the same ops
+				 * is unregistered.
+				 */
+				WARN_ON(!xa_empty(&pallocator->xa));
+				kfree(pallocator);
+				if (list_empty(&allocators_list)) {
+					pr_info("No custom IOASID allocators, switch to default.\n");
+					active_allocator = &default_allocator;
+				} else if (pallocator == active_allocator) {
+					active_allocator = list_first_entry(&allocators_list, struct ioasid_allocator_data, list);
+					pr_info("IOASID allocator changed\n");
+				}
+				break;
+			}
+			/*
+			 * Find the matching shared ops to delete,
+			 * but keep outstanding IOASIDs
+			 */
+			list_for_each_entry(sops, &pallocator->slist, list) {
+				if (sops == ops) {
+					list_del(&ops->list);
+					if (refcount_dec_and_test(&pallocator->users))
+						pr_err("no shared ops\n");
+					break;
+				}
+			}
+			break;
+		}
+	}
+
+exit_unlock:
+	mutex_unlock(&ioasid_allocator_lock);
+}
+EXPORT_SYMBOL_GPL(ioasid_unregister_allocator);
+
+/**
+ * ioasid_set_data - Set private data for an allocated ioasid
+ * @ioasid: the ID to set data
+ * @data:   the private data
+ *
+ * For an IOASID that is already allocated, private data can be set
+ * via this API. Future lookups can be done via ioasid_find().
+ */
+int ioasid_set_data(ioasid_t ioasid, void *data)
+{
+	struct ioasid_data *ioasid_data;
+	int ret = 0;
+
+	mutex_lock(&ioasid_allocator_lock);
+	ioasid_data = xa_load(&active_allocator->xa, ioasid);
+	if (ioasid_data)
+		ioasid_data->private = data;
+	else
+		ret = -ENOENT;
+	mutex_unlock(&ioasid_allocator_lock);
+
+	/* getter may use the private data */
+	synchronize_rcu();
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(ioasid_set_data);
+
+/**
+ * ioasid_alloc - Allocate an IOASID
+ * @set: the IOASID set
+ * @min: the minimum ID (inclusive)
+ * @max: the maximum ID (inclusive)
+ * @private: data private to the caller
+ *
+ * Allocate an ID between @min and @max. Return the allocated ID on success,
+ * or INVALID_IOASID on failure. The @private pointer is stored internally
+ * and can be retrieved with ioasid_find().
+ */
+ioasid_t ioasid_alloc(struct ioasid_set *set, ioasid_t min, ioasid_t max,
+		      void *private)
+{
+	ioasid_t id = INVALID_IOASID;
+	struct ioasid_data *data;
+
+	data = kzalloc(sizeof(*data), GFP_KERNEL);
+	if (!data)
+		return INVALID_IOASID;
+
+	data->set = set;
+	data->private = private;
+
+	mutex_lock(&ioasid_allocator_lock);
+
+	id = active_allocator->ops->alloc(min, max, data);
+	if (id == INVALID_IOASID) {
+		pr_err("Failed ASID allocation %lu\n", active_allocator->flags);
+			mutex_unlock(&ioasid_allocator_lock);
+			goto exit_free;
+	}
+	if (active_allocator->flags & IOASID_ALLOCATOR_CUSTOM) {
+		/* Custom allocator needs framework to store and track allocation results */
+		min = id;
+		max = id + 1;
+
+		if (xa_alloc(&active_allocator->xa, &id, data, XA_LIMIT(min, max), GFP_KERNEL)) {
+			pr_err("Failed to alloc ioasid from %u to %u\n", min, max);
+			active_allocator->ops->free(id, active_allocator->ops->pdata);
+			id = INVALID_IOASID;
+			mutex_unlock(&ioasid_allocator_lock);
+			goto exit_free;
+		}
+	}
+	data->id = id;
+
+	mutex_unlock(&ioasid_allocator_lock);
+
+exit_free:
+	if (id == INVALID_IOASID) {
+		kfree(data);
+		return INVALID_IOASID;
+	}
+	return id;
+}
+EXPORT_SYMBOL_GPL(ioasid_alloc);
+
+/**
+ * ioasid_free - Free an IOASID
+ * @ioasid: the ID to remove
+ */
+void ioasid_free(ioasid_t ioasid)
+{
+	struct ioasid_data *ioasid_data;
+
+	mutex_lock(&ioasid_allocator_lock);
+
+	ioasid_data = xa_load(&active_allocator->xa, ioasid);
+	if (!ioasid_data) {
+		pr_err("Trying to free unknown IOASID %u\n", ioasid);
+		goto exit_unlock;
+	}
+
+	active_allocator->ops->free(ioasid, active_allocator->ops->pdata);
+	/* Custom allocator needs additional steps to free the xa */
+	if (active_allocator->flags & IOASID_ALLOCATOR_CUSTOM) {
+		ioasid_data = xa_erase(&active_allocator->xa, ioasid);
+		kfree_rcu(ioasid_data, rcu);
+	}
+
+exit_unlock:
+	mutex_unlock(&ioasid_allocator_lock);
+}
+EXPORT_SYMBOL_GPL(ioasid_free);
+
+/**
+ * ioasid_find - Find IOASID data
+ * @set: the IOASID set
+ * @ioasid: the IOASID to find
+ * @getter: function to call on the found object
+ *
+ * The optional getter function allows the caller to take a reference to
+ * the found object under the rcu lock. The getter can also check if the
+ * object is still valid: if it returns false, NULL is returned.
+ *
+ * If the IOASID has been allocated for this set, return the private pointer
+ * passed to ioasid_alloc. Private data can be NULL if not set. Return an error
+ * if the IOASID is not found or does not belong to the set.
+ */
+void *ioasid_find(struct ioasid_set *set, ioasid_t ioasid,
+		  bool (*getter)(void *))
+{
+	void *priv = NULL;
+	struct ioasid_data *ioasid_data;
+
+	rcu_read_lock();
+	ioasid_data = xa_load(&active_allocator->xa, ioasid);
+	if (!ioasid_data) {
+		priv = ERR_PTR(-ENOENT);
+		goto unlock;
+	}
+	if (set && ioasid_data->set != set) {
+		/* data found but does not belong to the set */
+		priv = ERR_PTR(-EACCES);
+		goto unlock;
+	}
+	/* Now that the IOASID and its set are verified, return the private data */
+	priv = ioasid_data->private;
+	if (getter && !getter(priv))
+		priv = NULL;
+unlock:
+	rcu_read_unlock();
+
+	return priv;
+}
+EXPORT_SYMBOL_GPL(ioasid_find);
+
+MODULE_AUTHOR("Jean-Philippe Brucker <jean-philippe.brucker@arm.com>");
+MODULE_AUTHOR("Jacob Pan <jacob.jun.pan@linux.intel.com>");
+MODULE_DESCRIPTION("IO Address Space ID (IOASID) allocator");
+MODULE_LICENSE("GPL");
diff --git a/include/linux/ioasid.h b/include/linux/ioasid.h
new file mode 100644
index 0000000..8c8d1c5
--- /dev/null
+++ b/include/linux/ioasid.h
@@ -0,0 +1,74 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __LINUX_IOASID_H
+#define __LINUX_IOASID_H
+
+#define INVALID_IOASID ((ioasid_t)-1)
+typedef unsigned int ioasid_t;
+typedef ioasid_t (*ioasid_alloc_fn_t)(ioasid_t min, ioasid_t max, void *data);
+typedef void (*ioasid_free_fn_t)(ioasid_t ioasid, void *data);
+
+struct ioasid_set {
+	int dummy;
+};
+
+/**
+ * struct ioasid_allocator_ops - IOASID allocator helper functions and data
+ *
+ * @alloc:	helper function to allocate IOASID
+ * @free:	helper function to free IOASID
+ * @list:	for tracking ops that share helper functions but not data
+ * @pdata:	data belonging to the allocator, provided when calling alloc()
+ */
+struct ioasid_allocator_ops {
+	ioasid_alloc_fn_t alloc;
+	ioasid_free_fn_t free;
+	struct list_head list;
+	void *pdata;
+};
+
+#define DECLARE_IOASID_SET(name) struct ioasid_set name = { 0 }
+
+#if IS_ENABLED(CONFIG_IOASID)
+ioasid_t ioasid_alloc(struct ioasid_set *set, ioasid_t min, ioasid_t max,
+		      void *private);
+void ioasid_free(ioasid_t ioasid);
+
+void *ioasid_find(struct ioasid_set *set, ioasid_t ioasid,
+			bool (*getter)(void *));
+int ioasid_register_allocator(struct ioasid_allocator_ops *allocator);
+void ioasid_unregister_allocator(struct ioasid_allocator_ops *allocator);
+int ioasid_set_data(ioasid_t ioasid, void *data);
+
+#else /* !CONFIG_IOASID */
+static inline ioasid_t ioasid_alloc(struct ioasid_set *set, ioasid_t min,
+				    ioasid_t max, void *private)
+{
+	return INVALID_IOASID;
+}
+
+static inline void ioasid_free(ioasid_t ioasid)
+{
+}
+
+static inline void *ioasid_find(struct ioasid_set *set, ioasid_t ioasid,
+				bool (*getter)(void *))
+{
+	return NULL;
+}
+
+static inline int ioasid_register_allocator(struct ioasid_allocator_ops *allocator)
+{
+	return -ENODEV;
+}
+
+static inline void ioasid_unregister_allocator(struct ioasid_allocator_ops *allocator)
+{
+}
+
+static inline int ioasid_set_data(ioasid_t ioasid, void *data)
+{
+	return -ENODEV;
+}
+
+#endif /* CONFIG_IOASID */
+#endif /* __LINUX_IOASID_H */
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 64+ messages in thread

* [PATCH v4 13/22] iommu/vt-d: Enlightened PASID allocation
  2019-06-09 13:44 [PATCH v4 00/22] Shared virtual address IOMMU and VT-d support Jacob Pan
                   ` (11 preceding siblings ...)
  2019-06-09 13:44 ` [PATCH v4 12/22] iommu: Add I/O ASID allocator Jacob Pan
@ 2019-06-09 13:44 ` Jacob Pan
  2019-07-16  9:29   ` Auger Eric
  2019-06-09 13:44 ` [PATCH v4 14/22] iommu/vt-d: Add custom allocator for IOASID Jacob Pan
                   ` (8 subsequent siblings)
  21 siblings, 1 reply; 64+ messages in thread
From: Jacob Pan @ 2019-06-09 13:44 UTC (permalink / raw)
  To: iommu, LKML, Joerg Roedel, David Woodhouse, Eric Auger,
	Alex Williamson, Jean-Philippe Brucker
  Cc: Yi Liu, Tian, Kevin, Raj Ashok, Christoph Hellwig, Lu Baolu,
	Andriy Shevchenko, Jacob Pan

From: Lu Baolu <baolu.lu@linux.intel.com>

If the Intel IOMMU runs in caching mode, i.e. as a virtual IOMMU, the
IOMMU driver should rely on the emulation software to allocate
and free PASIDs. The Intel VT-d spec revision 3.0 defines a
register set to support this. It includes a capability register,
a virtual command register and a virtual response register. Refer
to sections 10.4.42, 10.4.43 and 10.4.44 for more information.

This patch adds the enlightened PASID allocation/free interfaces
via the virtual command register.
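
A minimal usage sketch (not part of this patch; @iommu is assumed to be
a caching-mode IOMMU exposing the virtual command capability):

	unsigned int pasid;
	int ret;

	ret = vcmd_alloc_pasid(iommu, &pasid);	/* traps to the VMM */
	if (ret)
		return ret;

	/* ... set up the PASID entry and use the PASID ... */

	vcmd_free_pasid(iommu, pasid);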

Cc: Ashok Raj <ashok.raj@intel.com>
Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
Cc: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
---
 drivers/iommu/intel-pasid.c | 76 +++++++++++++++++++++++++++++++++++++++++++++
 drivers/iommu/intel-pasid.h | 13 +++++++-
 include/linux/intel-iommu.h |  2 ++
 3 files changed, 90 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/intel-pasid.c b/drivers/iommu/intel-pasid.c
index 2fefeaf..69fddd3 100644
--- a/drivers/iommu/intel-pasid.c
+++ b/drivers/iommu/intel-pasid.c
@@ -63,6 +63,82 @@ void *intel_pasid_lookup_id(int pasid)
 	return p;
 }
 
+int vcmd_alloc_pasid(struct intel_iommu *iommu, unsigned int *pasid)
+{
+	u64 res;
+	u64 cap;
+	u8 status_code;
+	unsigned long flags;
+	int ret = 0;
+
+	if (!ecap_vcs(iommu->ecap)) {
+		pr_warn("IOMMU: %s: Hardware doesn't support virtual command\n",
+			iommu->name);
+		return -ENODEV;
+	}
+
+	cap = dmar_readq(iommu->reg + DMAR_VCCAP_REG);
+	if (!(cap & DMA_VCS_PAS)) {
+		pr_warn("IOMMU: %s: Emulation software doesn't support PASID allocation\n",
+			iommu->name);
+		return -ENODEV;
+	}
+
+	raw_spin_lock_irqsave(&iommu->register_lock, flags);
+	dmar_writeq(iommu->reg + DMAR_VCMD_REG, VCMD_CMD_ALLOC);
+	IOMMU_WAIT_OP(iommu, DMAR_VCRSP_REG, dmar_readq,
+		      !(res & VCMD_VRSP_IP), res);
+	raw_spin_unlock_irqrestore(&iommu->register_lock, flags);
+
+	status_code = VCMD_VRSP_SC(res);
+	switch (status_code) {
+	case VCMD_VRSP_SC_SUCCESS:
+		*pasid = VCMD_VRSP_RESULT(res);
+		break;
+	case VCMD_VRSP_SC_NO_PASID_AVAIL:
+		pr_info("IOMMU: %s: No PASID available\n", iommu->name);
+		ret = -ENOMEM;
+		break;
+	default:
+		ret = -ENODEV;
+		pr_warn("IOMMU: %s: Unkonwn error code %d\n",
+			iommu->name, status_code);
+	}
+
+	return ret;
+}
+
+void vcmd_free_pasid(struct intel_iommu *iommu, unsigned int pasid)
+{
+	u64 res;
+	u8 status_code;
+	unsigned long flags;
+
+	if (!ecap_vcs(iommu->ecap)) {
+		pr_warn("IOMMU: %s: Hardware doesn't support virtual command\n",
+			iommu->name);
+		return;
+	}
+
+	raw_spin_lock_irqsave(&iommu->register_lock, flags);
+	dmar_writeq(iommu->reg + DMAR_VCMD_REG, (pasid << 8) | VCMD_CMD_FREE);
+	IOMMU_WAIT_OP(iommu, DMAR_VCRSP_REG, dmar_readq,
+		      !(res & VCMD_VRSP_IP), res);
+	raw_spin_unlock_irqrestore(&iommu->register_lock, flags);
+
+	status_code = VCMD_VRSP_SC(res);
+	switch (status_code) {
+	case VCMD_VRSP_SC_SUCCESS:
+		break;
+	case VCMD_VRSP_SC_INVALID_PASID:
+		pr_info("IOMMU: %s: Invalid PASID\n", iommu->name);
+		break;
+	default:
+		pr_warn("IOMMU: %s: Unkonwn error code %d\n",
+			iommu->name, status_code);
+	}
+}
+
 /*
  * Per device pasid table management:
  */
diff --git a/drivers/iommu/intel-pasid.h b/drivers/iommu/intel-pasid.h
index 23537b3..4b26ab5 100644
--- a/drivers/iommu/intel-pasid.h
+++ b/drivers/iommu/intel-pasid.h
@@ -19,6 +19,16 @@
 #define PASID_PDE_SHIFT			6
 #define MAX_NR_PASID_BITS		20
 
+/* Virtual command interface for enlightened pasid management. */
+#define VCMD_CMD_ALLOC			0x1
+#define VCMD_CMD_FREE			0x2
+#define VCMD_VRSP_IP			0x1
+#define VCMD_VRSP_SC(e)			(((e) >> 1) & 0x3)
+#define VCMD_VRSP_SC_SUCCESS		0
+#define VCMD_VRSP_SC_NO_PASID_AVAIL	1
+#define VCMD_VRSP_SC_INVALID_PASID	1
+#define VCMD_VRSP_RESULT(e)		(((e) >> 8) & 0xfffff)
+
 /*
  * Domain ID reserved for pasid entries programmed for first-level
  * only and pass-through transfer modes.
@@ -69,5 +79,6 @@ int intel_pasid_setup_pass_through(struct intel_iommu *iommu,
 				   struct device *dev, int pasid);
 void intel_pasid_tear_down_entry(struct intel_iommu *iommu,
 				 struct device *dev, int pasid);
-
+int vcmd_alloc_pasid(struct intel_iommu *iommu, unsigned int *pasid);
+void vcmd_free_pasid(struct intel_iommu *iommu, unsigned int pasid);
 #endif /* __INTEL_PASID_H */
diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
index 6925a18..bff907b 100644
--- a/include/linux/intel-iommu.h
+++ b/include/linux/intel-iommu.h
@@ -173,6 +173,7 @@
 #define ecap_smpwc(e)		(((e) >> 48) & 0x1)
 #define ecap_flts(e)		(((e) >> 47) & 0x1)
 #define ecap_slts(e)		(((e) >> 46) & 0x1)
+#define ecap_vcs(e)		(((e) >> 44) & 0x1)
 #define ecap_smts(e)		(((e) >> 43) & 0x1)
 #define ecap_dit(e)		((e >> 41) & 0x1)
 #define ecap_pasid(e)		((e >> 40) & 0x1)
@@ -289,6 +290,7 @@
 
 /* PRS_REG */
 #define DMA_PRS_PPR	((u32)1)
+#define DMA_VCS_PAS	((u64)1)
 
 #define IOMMU_WAIT_OP(iommu, offset, op, cond, sts)			\
 do {									\
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 64+ messages in thread

* [PATCH v4 14/22] iommu/vt-d: Add custom allocator for IOASID
  2019-06-09 13:44 [PATCH v4 00/22] Shared virtual address IOMMU and VT-d support Jacob Pan
                   ` (12 preceding siblings ...)
  2019-06-09 13:44 ` [PATCH v4 13/22] iommu/vt-d: Enlightened PASID allocation Jacob Pan
@ 2019-06-09 13:44 ` Jacob Pan
  2019-07-16  9:30   ` Auger Eric
  2019-06-09 13:44 ` [PATCH v4 15/22] iommu/vt-d: Replace Intel specific PASID allocator with IOASID Jacob Pan
                   ` (7 subsequent siblings)
  21 siblings, 1 reply; 64+ messages in thread
From: Jacob Pan @ 2019-06-09 13:44 UTC (permalink / raw)
  To: iommu, LKML, Joerg Roedel, David Woodhouse, Eric Auger,
	Alex Williamson, Jean-Philippe Brucker
  Cc: Yi Liu, Tian, Kevin, Raj Ashok, Christoph Hellwig, Lu Baolu,
	Andriy Shevchenko, Jacob Pan, Liu

When the VT-d driver runs in the guest, PASID allocation must be
performed via the virtual command interface. This patch registers a
custom IOASID allocator which takes precedence over the default
XArray based allocator. The resulting IOASID allocation will always
come from the host. This ensures that the PASID namespace is
system-wide.
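
An illustrative call path (a sketch, not code from this patch): once the
custom allocator below is registered, a generic allocation request in
the guest is serviced by the host.

	/* e.g. from intel_svm_bind_mm(); priv is caller-provided */
	pasid = ioasid_alloc(NULL, PASID_MIN, PASID_MAX - 1, priv);
	/*
	 * -> active_allocator->ops->alloc == intel_ioasid_alloc
	 * -> vcmd_alloc_pasid(iommu, &pasid), which writes VCMD_CMD_ALLOC
	 *    to DMAR_VCMD_REG and polls DMAR_VCRSP_REG for the result
	 */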

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Liu, Yi L <yi.l.liu@intel.com>
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
---
 drivers/iommu/intel-iommu.c | 60 +++++++++++++++++++++++++++++++++++++++++++++
 include/linux/intel-iommu.h |  2 ++
 2 files changed, 62 insertions(+)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 09b8ff0..5b84994 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -1711,6 +1711,8 @@ static void free_dmar_iommu(struct intel_iommu *iommu)
 		if (ecap_prs(iommu->ecap))
 			intel_svm_finish_prq(iommu);
 	}
+	ioasid_unregister_allocator(&iommu->pasid_allocator);
+
 #endif
 }
 
@@ -4820,6 +4822,46 @@ static int __init platform_optin_force_iommu(void)
 	return 1;
 }
 
+#ifdef CONFIG_INTEL_IOMMU_SVM
+static ioasid_t intel_ioasid_alloc(ioasid_t min, ioasid_t max, void *data)
+{
+	struct intel_iommu *iommu = data;
+	ioasid_t ioasid;
+
+	/*
+	 * VT-d virtual command interface always uses the full 20 bit
+	 * PASID range. Host can partition guest PASID range based on
+	 * policies but it is out of guest's control.
+	 */
+	if (min < PASID_MIN || max > PASID_MAX)
+		return INVALID_IOASID;
+
+	if (vcmd_alloc_pasid(iommu, &ioasid))
+		return INVALID_IOASID;
+
+	return ioasid;
+}
+
+static void intel_ioasid_free(ioasid_t ioasid, void *data)
+{
+	void *priv;
+	struct intel_iommu *iommu = data;
+
+	if (!iommu)
+		return;
+	/*
+	 * Sanity check of the IOASID owner is done at the upper layer,
+	 * e.g. VFIO. We can only free the PASID when all devices are
+	 * unbound.
+	 */
+	priv = ioasid_find(NULL, ioasid, NULL);
+	if (IS_ERR_OR_NULL(priv)) {
+		pr_warn("Freeing unbound IOASID %u\n", ioasid);
+		return;
+	}
+	vcmd_free_pasid(iommu, ioasid);
+}
+#endif
+
 int __init intel_iommu_init(void)
 {
 	int ret = -ENODEV;
@@ -4924,6 +4966,24 @@ int __init intel_iommu_init(void)
 				       "%s", iommu->name);
 		iommu_device_set_ops(&iommu->iommu, &intel_iommu_ops);
 		iommu_device_register(&iommu->iommu);
+#ifdef CONFIG_INTEL_IOMMU_SVM
+		if (cap_caching_mode(iommu->cap) && sm_supported(iommu)) {
+			/*
+			 * Register a custom ASID allocator if we are running
+			 * in a guest; the purpose is to have a system-wide PASID
+			 * namespace among all PASID users.
+			 * There can be multiple vIOMMUs in each guest but only
+			 * one allocator is active. All vIOMMU allocators will
+			 * eventually be calling the same host allocator.
+			 */
+			iommu->pasid_allocator.alloc = intel_ioasid_alloc;
+			iommu->pasid_allocator.free = intel_ioasid_free;
+			iommu->pasid_allocator.pdata = (void *)iommu;
+			ret = ioasid_register_allocator(&iommu->pasid_allocator);
+			if (ret)
+				pr_warn("Custom PASID allocator registeration failed\n");
+		}
+#endif
 	}
 
 	bus_set_iommu(&pci_bus_type, &intel_iommu_ops);
diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
index bff907b..8605c74 100644
--- a/include/linux/intel-iommu.h
+++ b/include/linux/intel-iommu.h
@@ -31,6 +31,7 @@
 #include <linux/iommu.h>
 #include <linux/io-64-nonatomic-lo-hi.h>
 #include <linux/dmar.h>
+#include <linux/ioasid.h>
 
 #include <asm/cacheflush.h>
 #include <asm/iommu.h>
@@ -549,6 +550,7 @@ struct intel_iommu {
 #ifdef CONFIG_INTEL_IOMMU_SVM
 	struct page_req_dsc *prq;
 	unsigned char prq_name[16];    /* Name for PRQ interrupt */
+	struct ioasid_allocator_ops pasid_allocator; /* Custom allocator for PASIDs */
 #endif
 	struct q_inval  *qi;            /* Queued invalidation info */
 	u32 *iommu_state; /* Store iommu states between suspend and resume.*/
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 64+ messages in thread

* [PATCH v4 15/22] iommu/vt-d: Replace Intel specific PASID allocator with IOASID
  2019-06-09 13:44 [PATCH v4 00/22] Shared virtual address IOMMU and VT-d support Jacob Pan
                   ` (13 preceding siblings ...)
  2019-06-09 13:44 ` [PATCH v4 14/22] iommu/vt-d: Add custom allocator for IOASID Jacob Pan
@ 2019-06-09 13:44 ` Jacob Pan
  2019-06-18 15:57   ` Jonathan Cameron
  2019-06-27  1:53   ` Lu Baolu
  2019-06-09 13:44 ` [PATCH v4 16/22] iommu/vt-d: Move domain helper to header Jacob Pan
                   ` (6 subsequent siblings)
  21 siblings, 2 replies; 64+ messages in thread
From: Jacob Pan @ 2019-06-09 13:44 UTC (permalink / raw)
  To: iommu, LKML, Joerg Roedel, David Woodhouse, Eric Auger,
	Alex Williamson, Jean-Philippe Brucker
  Cc: Yi Liu, Tian, Kevin, Raj Ashok, Christoph Hellwig, Lu Baolu,
	Andriy Shevchenko, Jacob Pan

Make use of the generic IOASID code to manage PASID allocation,
freeing, and lookup, replacing the Intel-specific PASID allocator.

Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
---
 drivers/iommu/intel-iommu.c | 11 +++++------
 drivers/iommu/intel-pasid.c | 36 ------------------------------------
 drivers/iommu/intel-svm.c   | 37 +++++++++++++++++++++----------------
 3 files changed, 26 insertions(+), 58 deletions(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 5b84994..39b63fe 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -5167,7 +5167,7 @@ static void auxiliary_unlink_device(struct dmar_domain *domain,
 	domain->auxd_refcnt--;
 
 	if (!domain->auxd_refcnt && domain->default_pasid > 0)
-		intel_pasid_free_id(domain->default_pasid);
+		ioasid_free(domain->default_pasid);
 }
 
 static int aux_domain_add_dev(struct dmar_domain *domain,
@@ -5185,10 +5185,9 @@ static int aux_domain_add_dev(struct dmar_domain *domain,
 	if (domain->default_pasid <= 0) {
 		int pasid;
 
-		pasid = intel_pasid_alloc_id(domain, PASID_MIN,
-					     pci_max_pasids(to_pci_dev(dev)),
-					     GFP_KERNEL);
-		if (pasid <= 0) {
+		pasid = ioasid_alloc(NULL, PASID_MIN, pci_max_pasids(to_pci_dev(dev)) - 1,
+				domain);
+		if (pasid == INVALID_IOASID) {
 			pr_err("Can't allocate default pasid\n");
 			return -ENODEV;
 		}
@@ -5224,7 +5223,7 @@ static int aux_domain_add_dev(struct dmar_domain *domain,
 	spin_unlock(&iommu->lock);
 	spin_unlock_irqrestore(&device_domain_lock, flags);
 	if (!domain->auxd_refcnt && domain->default_pasid > 0)
-		intel_pasid_free_id(domain->default_pasid);
+		ioasid_free(domain->default_pasid);
 
 	return ret;
 }
diff --git a/drivers/iommu/intel-pasid.c b/drivers/iommu/intel-pasid.c
index 69fddd3..1e25539 100644
--- a/drivers/iommu/intel-pasid.c
+++ b/drivers/iommu/intel-pasid.c
@@ -26,42 +26,6 @@
  */
 static DEFINE_SPINLOCK(pasid_lock);
 u32 intel_pasid_max_id = PASID_MAX;
-static DEFINE_IDR(pasid_idr);
-
-int intel_pasid_alloc_id(void *ptr, int start, int end, gfp_t gfp)
-{
-	int ret, min, max;
-
-	min = max_t(int, start, PASID_MIN);
-	max = min_t(int, end, intel_pasid_max_id);
-
-	WARN_ON(in_interrupt());
-	idr_preload(gfp);
-	spin_lock(&pasid_lock);
-	ret = idr_alloc(&pasid_idr, ptr, min, max, GFP_ATOMIC);
-	spin_unlock(&pasid_lock);
-	idr_preload_end();
-
-	return ret;
-}
-
-void intel_pasid_free_id(int pasid)
-{
-	spin_lock(&pasid_lock);
-	idr_remove(&pasid_idr, pasid);
-	spin_unlock(&pasid_lock);
-}
-
-void *intel_pasid_lookup_id(int pasid)
-{
-	void *p;
-
-	spin_lock(&pasid_lock);
-	p = idr_find(&pasid_idr, pasid);
-	spin_unlock(&pasid_lock);
-
-	return p;
-}
 
 int vcmd_alloc_pasid(struct intel_iommu *iommu, unsigned int *pasid)
 {
diff --git a/drivers/iommu/intel-svm.c b/drivers/iommu/intel-svm.c
index 8f87304..9cbcc1f 100644
--- a/drivers/iommu/intel-svm.c
+++ b/drivers/iommu/intel-svm.c
@@ -25,6 +25,7 @@
 #include <linux/dmar.h>
 #include <linux/interrupt.h>
 #include <linux/mm_types.h>
+#include <linux/ioasid.h>
 #include <asm/page.h>
 
 #include "intel-pasid.h"
@@ -332,16 +333,15 @@ int intel_svm_bind_mm(struct device *dev, int *pasid, int flags, struct svm_dev_
 		if (pasid_max > intel_pasid_max_id)
 			pasid_max = intel_pasid_max_id;
 
-		/* Do not use PASID 0 in caching mode (virtualised IOMMU) */
-		ret = intel_pasid_alloc_id(svm,
-					   !!cap_caching_mode(iommu->cap),
-					   pasid_max - 1, GFP_KERNEL);
-		if (ret < 0) {
+		/* Do not use PASID 0, reserved for RID to PASID */
+		svm->pasid = ioasid_alloc(NULL, PASID_MIN,
+					pasid_max - 1, svm);
+		if (svm->pasid == INVALID_IOASID) {
 			kfree(svm);
 			kfree(sdev);
+			ret = -ENOSPC;
 			goto out;
 		}
-		svm->pasid = ret;
 		svm->notifier.ops = &intel_mmuops;
 		svm->mm = mm;
 		svm->flags = flags;
@@ -351,7 +351,7 @@ int intel_svm_bind_mm(struct device *dev, int *pasid, int flags, struct svm_dev_
 		if (mm) {
 			ret = mmu_notifier_register(&svm->notifier, mm);
 			if (ret) {
-				intel_pasid_free_id(svm->pasid);
+				ioasid_free(svm->pasid);
 				kfree(svm);
 				kfree(sdev);
 				goto out;
@@ -367,7 +367,7 @@ int intel_svm_bind_mm(struct device *dev, int *pasid, int flags, struct svm_dev_
 		if (ret) {
 			if (mm)
 				mmu_notifier_unregister(&svm->notifier, mm);
-			intel_pasid_free_id(svm->pasid);
+			ioasid_free(svm->pasid);
 			kfree(svm);
 			kfree(sdev);
 			goto out;
@@ -400,7 +400,12 @@ int intel_svm_unbind_mm(struct device *dev, int pasid)
 	if (!iommu)
 		goto out;
 
-	svm = intel_pasid_lookup_id(pasid);
+	svm = ioasid_find(NULL, pasid, NULL);
+	if (IS_ERR(svm)) {
+		ret = PTR_ERR(svm);
+		goto out;
+	}
+
 	if (!svm)
 		goto out;
 
@@ -422,7 +427,7 @@ int intel_svm_unbind_mm(struct device *dev, int pasid)
 				kfree_rcu(sdev, rcu);
 
 				if (list_empty(&svm->devs)) {
-					intel_pasid_free_id(svm->pasid);
+					ioasid_free(svm->pasid);
 					if (svm->mm)
 						mmu_notifier_unregister(&svm->notifier, svm->mm);
 
@@ -457,10 +462,11 @@ int intel_svm_is_pasid_valid(struct device *dev, int pasid)
 	if (!iommu)
 		goto out;
 
-	svm = intel_pasid_lookup_id(pasid);
-	if (!svm)
+	svm = ioasid_find(NULL, pasid, NULL);
+	if (IS_ERR(svm)) {
+		ret = PTR_ERR(svm);
 		goto out;
-
+	}
 	/* init_mm is used in this case */
 	if (!svm->mm)
 		ret = 1;
@@ -567,13 +573,12 @@ static irqreturn_t prq_event_thread(int irq, void *d)
 
 		if (!svm || svm->pasid != req->pasid) {
 			rcu_read_lock();
-			svm = intel_pasid_lookup_id(req->pasid);
+			svm = ioasid_find(NULL, req->pasid, NULL);
 			/* It *can't* go away, because the driver is not permitted
 			 * to unbind the mm while any page faults are outstanding.
 			 * So we only need RCU to protect the internal idr code. */
 			rcu_read_unlock();
-
-			if (!svm) {
+			if (IS_ERR(svm) || !svm) {
 				pr_err("%s: Page request for invalid PASID %d: %08llx %08llx\n",
 				       iommu->name, req->pasid, ((unsigned long long *)req)[0],
 				       ((unsigned long long *)req)[1]);
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 64+ messages in thread

* [PATCH v4 16/22] iommu/vt-d: Move domain helper to header
  2019-06-09 13:44 [PATCH v4 00/22] Shared virtual address IOMMU and VT-d support Jacob Pan
                   ` (14 preceding siblings ...)
  2019-06-09 13:44 ` [PATCH v4 15/22] iommu/vt-d: Replace Intel specific PASID allocator with IOASID Jacob Pan
@ 2019-06-09 13:44 ` Jacob Pan
  2019-07-16  9:33   ` Auger Eric
  2019-06-09 13:44 ` [PATCH v4 17/22] iommu/vt-d: Avoid duplicated code for PASID setup Jacob Pan
                   ` (5 subsequent siblings)
  21 siblings, 1 reply; 64+ messages in thread
From: Jacob Pan @ 2019-06-09 13:44 UTC (permalink / raw)
  To: iommu, LKML, Joerg Roedel, David Woodhouse, Eric Auger,
	Alex Williamson, Jean-Philippe Brucker
  Cc: Yi Liu, Tian, Kevin, Raj Ashok, Christoph Hellwig, Lu Baolu,
	Andriy Shevchenko, Jacob Pan

Move the domain helper to the header so that it can be used by SVA code.

Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
---
 drivers/iommu/intel-iommu.c | 6 ------
 include/linux/intel-iommu.h | 6 ++++++
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 39b63fe..7cfa0eb 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -427,12 +427,6 @@ static void init_translation_status(struct intel_iommu *iommu)
 		iommu->flags |= VTD_FLAG_TRANS_PRE_ENABLED;
 }
 
-/* Convert generic 'struct iommu_domain to private struct dmar_domain */
-static struct dmar_domain *to_dmar_domain(struct iommu_domain *dom)
-{
-	return container_of(dom, struct dmar_domain, domain);
-}
-
 static int __init intel_iommu_setup(char *str)
 {
 	if (!str)
diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
index 8605c74..b75f17d 100644
--- a/include/linux/intel-iommu.h
+++ b/include/linux/intel-iommu.h
@@ -597,6 +597,12 @@ static inline void __iommu_flush_cache(
 		clflush_cache_range(addr, size);
 }
 
+/* Convert generic 'struct iommu_domain to private struct dmar_domain */
+static inline struct dmar_domain *to_dmar_domain(struct iommu_domain *dom)
+{
+	return container_of(dom, struct dmar_domain, domain);
+}
+
 /*
  * 0: readable
  * 1: writable
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 64+ messages in thread

* [PATCH v4 17/22] iommu/vt-d: Avoid duplicated code for PASID setup
  2019-06-09 13:44 [PATCH v4 00/22] Shared virtual address IOMMU and VT-d support Jacob Pan
                   ` (15 preceding siblings ...)
  2019-06-09 13:44 ` [PATCH v4 16/22] iommu/vt-d: Move domain helper to header Jacob Pan
@ 2019-06-09 13:44 ` Jacob Pan
  2019-06-18 16:03   ` Jonathan Cameron
  2019-07-16  9:52   ` Auger Eric
  2019-06-09 13:44 ` [PATCH v4 18/22] iommu/vt-d: Add nested translation helper function Jacob Pan
                   ` (4 subsequent siblings)
  21 siblings, 2 replies; 64+ messages in thread
From: Jacob Pan @ 2019-06-09 13:44 UTC (permalink / raw)
  To: iommu, LKML, Joerg Roedel, David Woodhouse, Eric Auger,
	Alex Williamson, Jean-Philippe Brucker
  Cc: Yi Liu, Tian, Kevin, Raj Ashok, Christoph Hellwig, Lu Baolu,
	Andriy Shevchenko, Jacob Pan

After each PASID entry setup, the related translation caches must be
flushed. Combine the duplicated flush code into one helper, which is
less error-prone.

Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
---
 drivers/iommu/intel-pasid.c | 48 +++++++++++++++++----------------------------
 1 file changed, 18 insertions(+), 30 deletions(-)

diff --git a/drivers/iommu/intel-pasid.c b/drivers/iommu/intel-pasid.c
index 1e25539..1ff2ecc 100644
--- a/drivers/iommu/intel-pasid.c
+++ b/drivers/iommu/intel-pasid.c
@@ -522,6 +522,21 @@ void intel_pasid_tear_down_entry(struct intel_iommu *iommu,
 		devtlb_invalidation_with_pasid(iommu, dev, pasid);
 }
 
+static inline void pasid_flush_caches(struct intel_iommu *iommu,
+				struct pasid_entry *pte,
+				int pasid, u16 did)
+{
+	if (!ecap_coherent(iommu->ecap))
+		clflush_cache_range(pte, sizeof(*pte));
+
+	if (cap_caching_mode(iommu->cap)) {
+		pasid_cache_invalidation_with_pasid(iommu, did, pasid);
+		iotlb_invalidation_with_pasid(iommu, did, pasid);
+	} else {
+		iommu_flush_write_buffer(iommu);
+	}
+}
+
 /*
  * Set up the scalable mode pasid table entry for first only
  * translation type.
@@ -567,16 +582,7 @@ int intel_pasid_setup_first_level(struct intel_iommu *iommu,
 	/* Setup Present and PASID Granular Transfer Type: */
 	pasid_set_translation_type(pte, 1);
 	pasid_set_present(pte);
-
-	if (!ecap_coherent(iommu->ecap))
-		clflush_cache_range(pte, sizeof(*pte));
-
-	if (cap_caching_mode(iommu->cap)) {
-		pasid_cache_invalidation_with_pasid(iommu, did, pasid);
-		iotlb_invalidation_with_pasid(iommu, did, pasid);
-	} else {
-		iommu_flush_write_buffer(iommu);
-	}
+	pasid_flush_caches(iommu, pte, pasid, did);
 
 	return 0;
 }
@@ -640,16 +646,7 @@ int intel_pasid_setup_second_level(struct intel_iommu *iommu,
 	 */
 	pasid_set_sre(pte);
 	pasid_set_present(pte);
-
-	if (!ecap_coherent(iommu->ecap))
-		clflush_cache_range(pte, sizeof(*pte));
-
-	if (cap_caching_mode(iommu->cap)) {
-		pasid_cache_invalidation_with_pasid(iommu, did, pasid);
-		iotlb_invalidation_with_pasid(iommu, did, pasid);
-	} else {
-		iommu_flush_write_buffer(iommu);
-	}
+	pasid_flush_caches(iommu, pte, pasid, did);
 
 	return 0;
 }
@@ -683,16 +680,7 @@ int intel_pasid_setup_pass_through(struct intel_iommu *iommu,
 	 */
 	pasid_set_sre(pte);
 	pasid_set_present(pte);
-
-	if (!ecap_coherent(iommu->ecap))
-		clflush_cache_range(pte, sizeof(*pte));
-
-	if (cap_caching_mode(iommu->cap)) {
-		pasid_cache_invalidation_with_pasid(iommu, did, pasid);
-		iotlb_invalidation_with_pasid(iommu, did, pasid);
-	} else {
-		iommu_flush_write_buffer(iommu);
-	}
+	pasid_flush_caches(iommu, pte, pasid, did);
 
 	return 0;
 }
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 64+ messages in thread

* [PATCH v4 18/22] iommu/vt-d: Add nested translation helper function
  2019-06-09 13:44 [PATCH v4 00/22] Shared virtual address IOMMU and VT-d support Jacob Pan
                   ` (16 preceding siblings ...)
  2019-06-09 13:44 ` [PATCH v4 17/22] iommu/vt-d: Avoid duplicated code for PASID setup Jacob Pan
@ 2019-06-09 13:44 ` Jacob Pan
  2019-06-09 13:44 ` [PATCH v4 19/22] iommu/vt-d: Clean up for SVM device list Jacob Pan
                   ` (3 subsequent siblings)
  21 siblings, 0 replies; 64+ messages in thread
From: Jacob Pan @ 2019-06-09 13:44 UTC (permalink / raw)
  To: iommu, LKML, Joerg Roedel, David Woodhouse, Eric Auger,
	Alex Williamson, Jean-Philippe Brucker
  Cc: Yi Liu, Tian, Kevin, Raj Ashok, Christoph Hellwig, Lu Baolu,
	Andriy Shevchenko, Jacob Pan, Liu, Yi L

Nested translation mode is supported in the VT-d 3.0 spec, chapter 3.8.
With the PASID granular translation type set to 011b (nested), the
translation result from the first level (FL) is also subject to a
second level (SL) page table translation. This mode is used for SVA
virtualization, where the FL performs guest virtual to guest physical
translation and the SL performs guest physical to host physical
translation.
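
For reference, the caller added later in this series (guest PASID bind)
harvests the SL from the device's dmar_domain and passes the guest FL
page table root as a GPA. A minimal sketch, with hypothetical local
variable names:

	/* Sketch: program nested translation for one guest PASID.
	 * guest_cr3_gpa is the guest FL page table root (a GPA);
	 * ddomain supplies the SL (GPA->HPA) page tables.
	 */
	ret = intel_pasid_setup_nested(iommu, dev,
				       (pgd_t *)guest_cr3_gpa,
				       hpasid, flags,
				       ddomain, 48 /* guest addr width */);
	if (ret)
		dev_err(dev, "Nested PASID setup failed %d\n", ret);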

Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Signed-off-by: Liu, Yi L <yi.l.liu@linux.intel.com>
---
 drivers/iommu/intel-pasid.c | 93 +++++++++++++++++++++++++++++++++++++++++++++
 drivers/iommu/intel-pasid.h | 11 ++++++
 2 files changed, 104 insertions(+)

diff --git a/drivers/iommu/intel-pasid.c b/drivers/iommu/intel-pasid.c
index 1ff2ecc..50ec344 100644
--- a/drivers/iommu/intel-pasid.c
+++ b/drivers/iommu/intel-pasid.c
@@ -684,3 +684,96 @@ int intel_pasid_setup_pass_through(struct intel_iommu *iommu,
 
 	return 0;
 }
+
+/**
+ * intel_pasid_setup_nested() - Set up PASID entry for nested translation
+ * which is used for vSVA. The first level page tables are used for
+ * GVA-GPA translation in the guest, second level page tables are used
+ * for GPA to HPA translation.
+ *
+ * @iommu:      IOMMU to which the device belongs
+ * @dev:        Device to be set up for translation
+ * @gpgd:       FLPTPTR: First Level Page translation pointer in GPA
+ * @pasid:      PASID to be programmed in the device PASID table
+ * @flags:      Additional info such as supervisor PASID
+ * @domain:     Domain info for setting up second level page tables
+ * @addr_width: Address width of the first level (guest)
+ */
+int intel_pasid_setup_nested(struct intel_iommu *iommu,
+			struct device *dev, pgd_t *gpgd,
+			int pasid, int flags,
+			struct dmar_domain *domain,
+			int addr_width)
+{
+	struct pasid_entry *pte;
+	struct dma_pte *pgd;
+	u64 pgd_val;
+	int agaw;
+	u16 did;
+
+	if (!ecap_nest(iommu->ecap)) {
+		pr_err("IOMMU: %s: No nested translation support\n",
+		       iommu->name);
+		return -EINVAL;
+	}
+
+	pte = intel_pasid_get_entry(dev, pasid);
+	if (WARN_ON(!pte))
+		return -EINVAL;
+
+	pasid_clear_entry(pte);
+
+	/* Sanity checking performed by the caller to make sure the
+	 * address width matches in two dimensions:
+	 * 1. CPU vs. IOMMU
+	 * 2. Guest vs. Host.
+	 */
+	switch (addr_width) {
+	case 57:
+		pasid_set_flpm(pte, 1);
+		break;
+	case 48:
+		pasid_set_flpm(pte, 0);
+		break;
+	default:
+		dev_err(dev, "Invalid paging mode %d\n", addr_width);
+		return -EINVAL;
+	}
+
+	/* Setup the first level page table pointer in GPA */
+	pasid_set_flptr(pte, (u64)gpgd);
+	if (flags & PASID_FLAG_SUPERVISOR_MODE) {
+		if (!ecap_srs(iommu->ecap)) {
+			pr_err("No supervisor request support on %s\n",
+			       iommu->name);
+			return -EINVAL;
+		}
+		pasid_set_sre(pte);
+	}
+
+	/* Setup the second level based on the given domain */
+	pgd = domain->pgd;
+
+	for (agaw = domain->agaw; agaw != iommu->agaw; agaw--) {
+		pgd = phys_to_virt(dma_pte_addr(pgd));
+		if (!dma_pte_present(pgd)) {
+			dev_err(dev, "Invalid domain page table\n");
+			return -EINVAL;
+		}
+	}
+	pgd_val = virt_to_phys(pgd);
+	pasid_set_slptr(pte, pgd_val);
+	pasid_set_fault_enable(pte);
+
+	did = domain->iommu_did[iommu->seq_id];
+	pasid_set_domain_id(pte, did);
+
+	pasid_set_address_width(pte, agaw);
+	pasid_set_page_snoop(pte, !!ecap_smpwc(iommu->ecap));
+
+	pasid_set_translation_type(pte, PASID_ENTRY_PGTT_NESTED);
+	pasid_set_present(pte);
+	pasid_flush_caches(iommu, pte, pasid, did);
+
+	return 0;
+}
diff --git a/drivers/iommu/intel-pasid.h b/drivers/iommu/intel-pasid.h
index 4b26ab5..2234fd5 100644
--- a/drivers/iommu/intel-pasid.h
+++ b/drivers/iommu/intel-pasid.h
@@ -42,6 +42,7 @@
  * to vmalloc or even module mappings.
  */
 #define PASID_FLAG_SUPERVISOR_MODE	BIT(0)
+#define PASID_FLAG_NESTED		BIT(1)
 
 struct pasid_dir_entry {
 	u64 val;
@@ -51,6 +52,11 @@ struct pasid_entry {
 	u64 val[8];
 };
 
+#define PASID_ENTRY_PGTT_FL_ONLY	(1)
+#define PASID_ENTRY_PGTT_SL_ONLY	(2)
+#define PASID_ENTRY_PGTT_NESTED		(3)
+#define PASID_ENTRY_PGTT_PT		(4)
+
 /* The representative of a PASID table */
 struct pasid_table {
 	void			*table;		/* pasid table pointer */
@@ -77,6 +83,11 @@ int intel_pasid_setup_second_level(struct intel_iommu *iommu,
 int intel_pasid_setup_pass_through(struct intel_iommu *iommu,
 				   struct dmar_domain *domain,
 				   struct device *dev, int pasid);
+int intel_pasid_setup_nested(struct intel_iommu *iommu,
+			struct device *dev, pgd_t *pgd,
+			int pasid, int flags,
+			struct dmar_domain *domain,
+			int addr_width);
 void intel_pasid_tear_down_entry(struct intel_iommu *iommu,
 				 struct device *dev, int pasid);
 int vcmd_alloc_pasid(struct intel_iommu *iommu, unsigned int *pasid);
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 64+ messages in thread

* [PATCH v4 19/22] iommu/vt-d: Clean up for SVM device list
  2019-06-09 13:44 [PATCH v4 00/22] Shared virtual address IOMMU and VT-d support Jacob Pan
                   ` (17 preceding siblings ...)
  2019-06-09 13:44 ` [PATCH v4 18/22] iommu/vt-d: Add nested translation helper function Jacob Pan
@ 2019-06-09 13:44 ` Jacob Pan
  2019-06-18 16:42   ` Jonathan Cameron
  2019-06-09 13:44 ` [PATCH v4 20/22] iommu/vt-d: Add bind guest PASID support Jacob Pan
                   ` (2 subsequent siblings)
  21 siblings, 1 reply; 64+ messages in thread
From: Jacob Pan @ 2019-06-09 13:44 UTC (permalink / raw)
  To: iommu, LKML, Joerg Roedel, David Woodhouse, Eric Auger,
	Alex Williamson, Jean-Philippe Brucker
  Cc: Yi Liu, Tian, Kevin, Raj Ashok, Christoph Hellwig, Lu Baolu,
	Andriy Shevchenko, Jacob Pan

Use a combined macro, for_each_svm_dev(), to simplify SVM device list
iteration.
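
Note that the macro expands in place and relies on the local variables
svm, sdev and dev already being in scope at each call site; a minimal
usage sketch:

	struct intel_svm_dev *sdev;	/* iterator used by the macro */

	/* assumes svm and dev are already in scope */
	for_each_svm_dev() {
		/* body runs only for entries where sdev->dev == dev */
		sdev->users++;
		break;
	}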

Suggested-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
---
 drivers/iommu/intel-svm.c | 79 +++++++++++++++++++++++------------------------
 1 file changed, 39 insertions(+), 40 deletions(-)

diff --git a/drivers/iommu/intel-svm.c b/drivers/iommu/intel-svm.c
index 9cbcc1f..66d98e1 100644
--- a/drivers/iommu/intel-svm.c
+++ b/drivers/iommu/intel-svm.c
@@ -225,6 +225,9 @@ static const struct mmu_notifier_ops intel_mmuops = {
 
 static DEFINE_MUTEX(pasid_mutex);
 static LIST_HEAD(global_svm_list);
+#define for_each_svm_dev() \
+	list_for_each_entry(sdev, &svm->devs, list)	\
+	if (dev == sdev->dev)				\
 
 int intel_svm_bind_mm(struct device *dev, int *pasid, int flags, struct svm_dev_ops *ops)
 {
@@ -271,15 +274,13 @@ int intel_svm_bind_mm(struct device *dev, int *pasid, int flags, struct svm_dev_
 				goto out;
 			}
 
-			list_for_each_entry(sdev, &svm->devs, list) {
-				if (dev == sdev->dev) {
-					if (sdev->ops != ops) {
-						ret = -EBUSY;
-						goto out;
-					}
-					sdev->users++;
-					goto success;
+			for_each_svm_dev() {
+				if (sdev->ops != ops) {
+					ret = -EBUSY;
+					goto out;
 				}
+				sdev->users++;
+				goto success;
 			}
 
 			break;
@@ -409,40 +410,38 @@ int intel_svm_unbind_mm(struct device *dev, int pasid)
 	if (!svm)
 		goto out;
 
-	list_for_each_entry(sdev, &svm->devs, list) {
-		if (dev == sdev->dev) {
-			ret = 0;
-			sdev->users--;
-			if (!sdev->users) {
-				list_del_rcu(&sdev->list);
-				/* Flush the PASID cache and IOTLB for this device.
-				 * Note that we do depend on the hardware *not* using
-				 * the PASID any more. Just as we depend on other
-				 * devices never using PASIDs that they have no right
-				 * to use. We have a *shared* PASID table, because it's
-				 * large and has to be physically contiguous. So it's
-				 * hard to be as defensive as we might like. */
-				intel_pasid_tear_down_entry(iommu, dev, svm->pasid);
-				intel_flush_svm_range_dev(svm, sdev, 0, -1, 0, !svm->mm);
-				kfree_rcu(sdev, rcu);
-
-				if (list_empty(&svm->devs)) {
-					ioasid_free(svm->pasid);
-					if (svm->mm)
-						mmu_notifier_unregister(&svm->notifier, svm->mm);
-
-					list_del(&svm->list);
-
-					/* We mandate that no page faults may be outstanding
-					 * for the PASID when intel_svm_unbind_mm() is called.
-					 * If that is not obeyed, subtle errors will happen.
-					 * Let's make them less subtle... */
-					memset(svm, 0x6b, sizeof(*svm));
-					kfree(svm);
-				}
+	for_each_svm_dev() {
+		ret = 0;
+		sdev->users--;
+		if (!sdev->users) {
+			list_del_rcu(&sdev->list);
+			/* Flush the PASID cache and IOTLB for this device.
+			 * Note that we do depend on the hardware *not* using
+			 * the PASID any more. Just as we depend on other
+			 * devices never using PASIDs that they have no right
+			 * to use. We have a *shared* PASID table, because it's
+			 * large and has to be physically contiguous. So it's
+			 * hard to be as defensive as we might like. */
+			intel_pasid_tear_down_entry(iommu, dev, svm->pasid);
+			intel_flush_svm_range_dev(svm, sdev, 0, -1, 0, !svm->mm);
+			kfree_rcu(sdev, rcu);
+
+			if (list_empty(&svm->devs)) {
+				ioasid_free(svm->pasid);
+				if (svm->mm)
+					mmu_notifier_unregister(&svm->notifier, svm->mm);
+
+				list_del(&svm->list);
+
+				/* We mandate that no page faults may be outstanding
+				 * for the PASID when intel_svm_unbind_mm() is called.
+				 * If that is not obeyed, subtle errors will happen.
+				 * Let's make them less subtle... */
+				memset(svm, 0x6b, sizeof(*svm));
+				kfree(svm);
 			}
-			break;
 		}
+		break;
 	}
  out:
 	mutex_unlock(&pasid_mutex);
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 64+ messages in thread

* [PATCH v4 20/22] iommu/vt-d: Add bind guest PASID support
  2019-06-09 13:44 [PATCH v4 00/22] Shared virtual address IOMMU and VT-d support Jacob Pan
                   ` (18 preceding siblings ...)
  2019-06-09 13:44 ` [PATCH v4 19/22] iommu/vt-d: Clean up for SVM device list Jacob Pan
@ 2019-06-09 13:44 ` Jacob Pan
  2019-06-18 16:44   ` Jonathan Cameron
                     ` (2 more replies)
  2019-06-09 13:44 ` [PATCH v4 21/22] iommu/vt-d: Support flushing more translation cache types Jacob Pan
  2019-06-09 13:44 ` [PATCH v4 22/22] iommu/vt-d: Add svm/sva invalidate function Jacob Pan
  21 siblings, 3 replies; 64+ messages in thread
From: Jacob Pan @ 2019-06-09 13:44 UTC (permalink / raw)
  To: iommu, LKML, Joerg Roedel, David Woodhouse, Eric Auger,
	Alex Williamson, Jean-Philippe Brucker
  Cc: Yi Liu, Tian, Kevin, Raj Ashok, Christoph Hellwig, Lu Baolu,
	Andriy Shevchenko, Jacob Pan, Liu, Yi L

When supporting guest SVA with an emulated IOMMU, the guest PASID
table is shadowed in the VMM. Updates to the guest vIOMMU PASID table
result in PASID cache flushes, which are passed down to
the host as bind guest PASID calls.

The SL page tables are harvested from the device's
default domain (requests w/o PASID), or from the aux domain in the
case of a mediated device.

    .-------------.  .---------------------------.
    |   vIOMMU    |  | Guest process CR3, FL only|
    |             |  '---------------------------'
    .----------------/
    | PASID Entry |--- PASID cache flush -
    '-------------'                       |
    |             |                       V
    |             |                CR3 in GPA
    '-------------'
Guest
------| Shadow |--------------------------|--------
      v        v                          v
Host
    .-------------.  .----------------------.
    |   pIOMMU    |  | Bind FL for GVA-GPA  |
    |             |  '----------------------'
    .----------------/  |
    | PASID Entry |     V (Nested xlate)
    '----------------\.------------------------------.
    |             |   |SL for GPA-HPA, default domain|
    |             |   '------------------------------'
    '-------------'
Where:
 - FL = First level/stage one page tables
 - SL = Second level/stage two page tables
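
For illustration, the bind request reaching intel_svm_bind_gpasid()
carries the generic UAPI structure introduced earlier in this series;
a hedged sketch with hypothetical values:

	struct gpasid_bind_data data = {
		.version    = IOMMU_GPASID_BIND_VERSION_1,
		.format     = IOMMU_PASID_FORMAT_INTEL_VTD,
		.gpgd       = guest_cr3_gpa,	/* FL root in GPA */
		.hpasid     = hpasid,		/* host PASID backing the bind */
		.addr_width = 48,
	};

	ret = iommu_sva_bind_gpasid(domain, dev, &data);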

Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Signed-off-by: Liu, Yi L <yi.l.liu@linux.intel.com>
---
 drivers/iommu/intel-iommu.c |   4 +
 drivers/iommu/intel-svm.c   | 187 ++++++++++++++++++++++++++++++++++++++++++++
 include/linux/intel-iommu.h |  13 ++-
 include/linux/intel-svm.h   |  17 ++++
 4 files changed, 219 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 7cfa0eb..3b4d712 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -5782,6 +5782,10 @@ const struct iommu_ops intel_iommu_ops = {
 	.dev_enable_feat	= intel_iommu_dev_enable_feat,
 	.dev_disable_feat	= intel_iommu_dev_disable_feat,
 	.pgsize_bitmap		= INTEL_IOMMU_PGSIZES,
+#ifdef CONFIG_INTEL_IOMMU_SVM
+	.sva_bind_gpasid	= intel_svm_bind_gpasid,
+	.sva_unbind_gpasid	= intel_svm_unbind_gpasid,
+#endif
 };
 
 static void quirk_iommu_g4x_gfx(struct pci_dev *dev)
diff --git a/drivers/iommu/intel-svm.c b/drivers/iommu/intel-svm.c
index 66d98e1..f06a82f 100644
--- a/drivers/iommu/intel-svm.c
+++ b/drivers/iommu/intel-svm.c
@@ -229,6 +229,193 @@ static LIST_HEAD(global_svm_list);
 	list_for_each_entry(sdev, &svm->devs, list)	\
 	if (dev == sdev->dev)				\
 
+int intel_svm_bind_gpasid(struct iommu_domain *domain,
+			struct device *dev,
+			struct gpasid_bind_data *data)
+{
+	struct intel_iommu *iommu = intel_svm_device_to_iommu(dev);
+	struct intel_svm_dev *sdev;
+	struct intel_svm *svm = NULL;
+	struct dmar_domain *ddomain;
+	int ret = 0;
+
+	if (WARN_ON(!iommu) || !data)
+		return -EINVAL;
+
+	if (data->version != IOMMU_GPASID_BIND_VERSION_1 ||
+		data->format != IOMMU_PASID_FORMAT_INTEL_VTD)
+		return -EINVAL;
+
+	if (dev_is_pci(dev)) {
+		/* VT-d supports devices with full 20 bit PASIDs only */
+		if (pci_max_pasids(to_pci_dev(dev)) != PASID_MAX)
+			return -EINVAL;
+	}
+
+	/*
+	 * We only check host PASID range, we have no knowledge to check
+	 * guest PASID range nor do we use the guest PASID.
+	 */
+	if (data->hpasid <= 0 || data->hpasid >= PASID_MAX)
+		return -EINVAL;
+
+	ddomain = to_dmar_domain(domain);
+	/* REVISIT:
+	 * Sanity check address width and paging mode support
+	 * width matching in two dimensions:
+	 * 1. paging mode CPU <= IOMMU
+	 * 2. address width Guest <= Host.
+	 */
+	mutex_lock(&pasid_mutex);
+	svm = ioasid_find(NULL, data->hpasid, NULL);
+	if (IS_ERR(svm)) {
+		ret = PTR_ERR(svm);
+		goto out;
+	}
+	if (svm) {
+		/*
+		 * If we found svm for the PASID, there must be at
+		 * least one device bound, otherwise svm should be freed.
+		 */
+		BUG_ON(list_empty(&svm->devs));
+
+		for_each_svm_dev() {
+			/* In case multiple sub-devices of the same pdev are
+			 * assigned, allow multiple bind calls with the same PASID and pdev.
+			 */
+			sdev->users++;
+			goto out;
+		}
+	} else {
+		/* We come here when the PASID has never been bound to a device. */
+		svm = kzalloc(sizeof(*svm), GFP_KERNEL);
+		if (!svm) {
+			ret = -ENOMEM;
+			goto out;
+		}
+		/* REVISIT: upper layer/VFIO can track the host process that binds the PASID.
+		 * ioasid_set = mm might be sufficient for vfio to check pasid VMM
+		 * ownership.
+		 */
+		svm->mm = get_task_mm(current);
+		svm->pasid = data->hpasid;
+		if (data->flags & IOMMU_SVA_GPASID_VAL) {
+			svm->gpasid = data->gpasid;
+			svm->flags |= SVM_FLAG_GUEST_PASID;
+		}
+		refcount_set(&svm->refs, 0);
+		ioasid_set_data(data->hpasid, svm);
+		INIT_LIST_HEAD_RCU(&svm->devs);
+		INIT_LIST_HEAD(&svm->list);
+
+		mmput(svm->mm);
+	}
+	sdev = kzalloc(sizeof(*sdev), GFP_KERNEL);
+	if (!sdev) {
+		ret = -ENOMEM;
+		goto out;
+	}
+	sdev->dev = dev;
+	sdev->users = 1;
+
+	/* Set up device context entry for PASID if not enabled already */
+	ret = intel_iommu_enable_pasid(iommu, sdev->dev);
+	if (ret) {
+		dev_err(dev, "Failed to enable PASID capability\n");
+		kfree(sdev);
+		goto out;
+	}
+
+	/*
+	 * For guest bind, we need to set up PASID table entry as follows:
+	 * - FLPM matches guest paging mode
+	 * - turn on nested mode
+	 * - SL guest address width matching
+	 */
+	ret = intel_pasid_setup_nested(iommu,
+				dev,
+				(pgd_t *)data->gpgd,
+				data->hpasid,
+				data->flags,
+				ddomain,
+				data->addr_width);
+	if (ret) {
+		dev_err(dev, "Failed to set up PASID %llu in nested mode, Err %d\n",
+			data->hpasid, ret);
+		kfree(sdev);
+		goto out;
+	}
+	svm->flags |= SVM_FLAG_GUEST_MODE;
+
+	init_rcu_head(&sdev->rcu);
+	refcount_inc(&svm->refs);
+	list_add_rcu(&sdev->list, &svm->devs);
+ out:
+	mutex_unlock(&pasid_mutex);
+	return ret;
+}
+
+int intel_svm_unbind_gpasid(struct device *dev, int pasid)
+{
+	struct intel_svm_dev *sdev;
+	struct intel_iommu *iommu;
+	struct intel_svm *svm;
+	int ret = -EINVAL;
+
+	mutex_lock(&pasid_mutex);
+	iommu = intel_svm_device_to_iommu(dev);
+	if (!iommu)
+		goto out;
+
+	svm = ioasid_find(NULL, pasid, NULL);
+	if (IS_ERR(svm)) {
+		ret = PTR_ERR(svm);
+		goto out;
+	}
+
+	if (!svm)
+		goto out;
+
+	for_each_svm_dev() {
+		ret = 0;
+		sdev->users--;
+		if (!sdev->users) {
+			list_del_rcu(&sdev->list);
+			intel_pasid_tear_down_entry(iommu, dev, svm->pasid);
+			/* TODO: Drain in flight PRQ for the PASID since it
+			 * may get reused soon, and we don't want to
+			 * confuse it with its previous life.
+			 * intel_svm_drain_prq(dev, pasid);
+			 */
+			kfree_rcu(sdev, rcu);
+
+			if (list_empty(&svm->devs)) {
+				list_del(&svm->list);
+				kfree(svm);
+				/*
+				 * We do not free PASID here until explicit call
+				 * from VFIO to free. The PASID life cycle
+				 * management is largely tied to VFIO management
+				 * of assigned device life cycles. In case of
+				 * guest exit without an explicit free PASID call,
+				 * the responsibility lies in VFIO layer to free
+				 * the PASIDs allocated for the guest.
+				 * For security reasons, VFIO has to track the
+				 * PASID ownership per guest anyway to ensure
+				 * that PASID allocated by one guest cannot be
+				 * used by another.
+				 */
+				ioasid_set_data(pasid, NULL);
+			}
+		}
+		break;
+	}
+ out:
+	mutex_unlock(&pasid_mutex);
+
+	return ret;
+}
+
 int intel_svm_bind_mm(struct device *dev, int *pasid, int flags, struct svm_dev_ops *ops)
 {
 	struct intel_iommu *iommu = intel_svm_device_to_iommu(dev);
diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
index b75f17d..94d3a9a 100644
--- a/include/linux/intel-iommu.h
+++ b/include/linux/intel-iommu.h
@@ -677,7 +677,9 @@ int intel_iommu_enable_pasid(struct intel_iommu *iommu, struct device *dev);
 int intel_svm_init(struct intel_iommu *iommu);
 extern int intel_svm_enable_prq(struct intel_iommu *iommu);
 extern int intel_svm_finish_prq(struct intel_iommu *iommu);
-
+extern int intel_svm_bind_gpasid(struct iommu_domain *domain,
+		struct device *dev, struct gpasid_bind_data *data);
+extern int intel_svm_unbind_gpasid(struct device *dev, int pasid);
 struct svm_dev_ops;
 
 struct intel_svm_dev {
@@ -693,12 +695,19 @@ struct intel_svm_dev {
 
 struct intel_svm {
 	struct mmu_notifier notifier;
-	struct mm_struct *mm;
+	union {
+		struct mm_struct *mm;
+		u64 gcr3;
+	};
 	struct intel_iommu *iommu;
 	int flags;
 	int pasid;
+	int gpasid; /* Guest PASID in case of vSVA bind with non-identity host
+		     * to guest PASID mapping.
+		     */
 	struct list_head devs;
 	struct list_head list;
+	refcount_t refs; /* Number of devices sharing this PASID */
 };
 
 extern struct intel_iommu *intel_svm_device_to_iommu(struct device *dev);
diff --git a/include/linux/intel-svm.h b/include/linux/intel-svm.h
index e3f7631..577d5df 100644
--- a/include/linux/intel-svm.h
+++ b/include/linux/intel-svm.h
@@ -52,6 +52,23 @@ struct svm_dev_ops {
  * do such IOTLB flushes automatically.
  */
 #define SVM_FLAG_SUPERVISOR_MODE	(1<<1)
+/*
+ * The SVM_FLAG_GUEST_MODE flag is used when a guest process binds to a device.
+ * In this case the mm_struct is in the guest kernel or userspace, and its life
+ * cycle is managed by the VMM and the VFIO layer. For the IOMMU driver, this
+ * API provides the means to bind/unbind a guest CR3 to PASIDs allocated for a device.
+ */
+#define SVM_FLAG_GUEST_MODE	(1<<2)
+/*
+ * The SVM_FLAG_GUEST_PASID flag is used when a guest has its own PASID space,
+ * which requires guest and host PASID translation in both directions. We keep
+ * track of guest PASID in order to provide lookup service to device drivers.
+ * One such example is a physical function (PF) driver that supports mediated
+ * device (mdev) assignment. Guest programming of mdev configuration space can
+ * only be done with guest PASID, therefore PF driver needs to find the matching
+ * host PASID to program the real hardware.
+ */
+#define SVM_FLAG_GUEST_PASID	(1<<3)
 
 #ifdef CONFIG_INTEL_IOMMU_SVM
 
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 64+ messages in thread

* [PATCH v4 21/22] iommu/vt-d: Support flushing more translation cache types
  2019-06-09 13:44 [PATCH v4 00/22] Shared virtual address IOMMU and VT-d support Jacob Pan
                   ` (19 preceding siblings ...)
  2019-06-09 13:44 ` [PATCH v4 20/22] iommu/vt-d: Add bind guest PASID support Jacob Pan
@ 2019-06-09 13:44 ` Jacob Pan
  2019-07-18  8:35   ` Auger Eric
  2019-06-09 13:44 ` [PATCH v4 22/22] iommu/vt-d: Add svm/sva invalidate function Jacob Pan
  21 siblings, 1 reply; 64+ messages in thread
From: Jacob Pan @ 2019-06-09 13:44 UTC (permalink / raw)
  To: iommu, LKML, Joerg Roedel, David Woodhouse, Eric Auger,
	Alex Williamson, Jean-Philippe Brucker
  Cc: Yi Liu, Tian, Kevin, Raj Ashok, Christoph Hellwig, Lu Baolu,
	Andriy Shevchenko, Jacob Pan

When Shared Virtual Memory is exposed to a guest via a vIOMMU, scalable
mode IOTLB invalidations may be passed down from outside the IOMMU
subsystem. This patch adds invalidation functions that can be used for
the additional translation cache types.
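
For example, a page-selective PASID-based IOTLB flush of 16 pages could
be issued as below (a sketch; QI_GRAN_PSI_PASID is assumed to be the
page-selective-within-PASID granularity encoding used by the passdown
table in the last patch of this series):

	/* Flush 2^4 = 16 pages for one PASID, leaf entries only */
	qi_flush_piotlb(iommu, did, addr, pasid,
			4 /* size_order */, QI_GRAN_PSI_PASID, 1 /* ih */);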

Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
---
 drivers/iommu/dmar.c        | 50 +++++++++++++++++++++++++++++++++++++++++++++
 include/linux/intel-iommu.h | 21 +++++++++++++++----
 2 files changed, 67 insertions(+), 4 deletions(-)

diff --git a/drivers/iommu/dmar.c b/drivers/iommu/dmar.c
index 6d969a1..0cda6fb 100644
--- a/drivers/iommu/dmar.c
+++ b/drivers/iommu/dmar.c
@@ -1357,6 +1357,21 @@ void qi_flush_iotlb(struct intel_iommu *iommu, u16 did, u64 addr,
 	qi_submit_sync(&desc, iommu);
 }
 
+/* PASID-based IOTLB Invalidate */
+void qi_flush_piotlb(struct intel_iommu *iommu, u16 did, u64 addr, u32 pasid,
+		unsigned int size_order, u64 granu, int ih)
+{
+	struct qi_desc desc;
+
+	desc.qw0 = QI_EIOTLB_PASID(pasid) | QI_EIOTLB_DID(did) |
+		QI_EIOTLB_GRAN(granu) | QI_EIOTLB_TYPE;
+	desc.qw1 = QI_EIOTLB_ADDR(addr) | QI_EIOTLB_IH(ih) |
+		QI_EIOTLB_AM(size_order);
+	desc.qw2 = 0;
+	desc.qw3 = 0;
+	qi_submit_sync(&desc, iommu);
+}
+
 void qi_flush_dev_iotlb(struct intel_iommu *iommu, u16 sid, u16 pfsid,
 			u16 qdep, u64 addr, unsigned mask)
 {
@@ -1380,6 +1395,41 @@ void qi_flush_dev_iotlb(struct intel_iommu *iommu, u16 sid, u16 pfsid,
 	qi_submit_sync(&desc, iommu);
 }
 
+/* PASID-based device IOTLB Invalidate */
+void qi_flush_dev_piotlb(struct intel_iommu *iommu, u16 sid, u16 pfsid,
+		u32 pasid,  u16 qdep, u64 addr, unsigned size, u64 granu)
+{
+	struct qi_desc desc;
+
+	desc.qw0 = QI_DEV_EIOTLB_PASID(pasid) | QI_DEV_EIOTLB_SID(sid) |
+		QI_DEV_EIOTLB_QDEP(qdep) | QI_DEIOTLB_TYPE |
+		QI_DEV_IOTLB_PFSID(pfsid);
+	desc.qw1 = QI_DEV_EIOTLB_GLOB(granu);
+
+	/* If S bit is 0, we only flush a single page. If S bit is set,
+	 * the least significant zero bit of the address indicates the
+	 * size. See VT-d spec 6.5.2.6.
+	 */
+	if (!size)
+		desc.qw0 |= QI_DEV_EIOTLB_ADDR(addr) & ~QI_DEV_EIOTLB_SIZE;
+	else {
+		unsigned long mask = 1UL << (VTD_PAGE_SHIFT + size);
+
+		desc.qw1 |= QI_DEV_EIOTLB_ADDR(addr & ~mask) | QI_DEV_EIOTLB_SIZE;
+	}
+	qi_submit_sync(&desc, iommu);
+}
+
+void qi_flush_pasid_cache(struct intel_iommu *iommu, u16 did, u64 granu, int pasid)
+{
+	struct qi_desc desc;
+
+	desc.qw0 = QI_PC_TYPE | QI_PC_DID(did) | QI_PC_GRAN(granu) | QI_PC_PASID(pasid);
+	desc.qw1 = 0;
+	desc.qw2 = 0;
+	desc.qw3 = 0;
+	qi_submit_sync(&desc, iommu);
+}
 /*
  * Disable Queued Invalidation interface.
  */
diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
index 94d3a9a..1cdb35b 100644
--- a/include/linux/intel-iommu.h
+++ b/include/linux/intel-iommu.h
@@ -339,7 +339,7 @@ enum {
 #define QI_IOTLB_GRAN(gran) 	(((u64)gran) >> (DMA_TLB_FLUSH_GRANU_OFFSET-4))
 #define QI_IOTLB_ADDR(addr)	(((u64)addr) & VTD_PAGE_MASK)
 #define QI_IOTLB_IH(ih)		(((u64)ih) << 6)
-#define QI_IOTLB_AM(am)		(((u8)am))
+#define QI_IOTLB_AM(am)		(((u8)am) & 0x3f)
 
 #define QI_CC_FM(fm)		(((u64)fm) << 48)
 #define QI_CC_SID(sid)		(((u64)sid) << 32)
@@ -357,17 +357,22 @@ enum {
 #define QI_PC_DID(did)		(((u64)did) << 16)
 #define QI_PC_GRAN(gran)	(((u64)gran) << 4)
 
-#define QI_PC_ALL_PASIDS	(QI_PC_TYPE | QI_PC_GRAN(0))
-#define QI_PC_PASID_SEL		(QI_PC_TYPE | QI_PC_GRAN(1))
+/* PASID cache invalidation granu */
+#define QI_PC_ALL_PASIDS	0
+#define QI_PC_PASID_SEL		1
 
 #define QI_EIOTLB_ADDR(addr)	((u64)(addr) & VTD_PAGE_MASK)
 #define QI_EIOTLB_GL(gl)	(((u64)gl) << 7)
 #define QI_EIOTLB_IH(ih)	(((u64)ih) << 6)
-#define QI_EIOTLB_AM(am)	(((u64)am))
+#define QI_EIOTLB_AM(am)	(((u64)am) & 0x3f)
 #define QI_EIOTLB_PASID(pasid) 	(((u64)pasid) << 32)
 #define QI_EIOTLB_DID(did)	(((u64)did) << 16)
 #define QI_EIOTLB_GRAN(gran) 	(((u64)gran) << 4)
 
+/* QI Dev-IOTLB inv granu */
+#define QI_DEV_IOTLB_GRAN_ALL		1
+#define QI_DEV_IOTLB_GRAN_PASID_SEL	0
+
 #define QI_DEV_EIOTLB_ADDR(a)	((u64)(a) & VTD_PAGE_MASK)
 #define QI_DEV_EIOTLB_SIZE	(((u64)1) << 11)
 #define QI_DEV_EIOTLB_GLOB(g)	((u64)g)
@@ -658,8 +663,16 @@ extern void qi_flush_context(struct intel_iommu *iommu, u16 did, u16 sid,
 			     u8 fm, u64 type);
 extern void qi_flush_iotlb(struct intel_iommu *iommu, u16 did, u64 addr,
 			  unsigned int size_order, u64 type);
+extern void qi_flush_piotlb(struct intel_iommu *iommu, u16 did, u64 addr,
+			u32 pasid, unsigned int size_order, u64 type, int ih);
 extern void qi_flush_dev_iotlb(struct intel_iommu *iommu, u16 sid, u16 pfsid,
 			u16 qdep, u64 addr, unsigned mask);
+
+extern void qi_flush_dev_piotlb(struct intel_iommu *iommu, u16 sid, u16 pfsid,
+			u32 pasid, u16 qdep, u64 addr, unsigned size, u64 granu);
+
+extern void qi_flush_pasid_cache(struct intel_iommu *iommu, u16 did, u64 granu, int pasid);
+
 extern int qi_submit_sync(struct qi_desc *desc, struct intel_iommu *iommu);
 
 extern int dmar_ir_support(void);
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 64+ messages in thread

* [PATCH v4 22/22] iommu/vt-d: Add svm/sva invalidate function
  2019-06-09 13:44 [PATCH v4 00/22] Shared virtual address IOMMU and VT-d support Jacob Pan
                   ` (20 preceding siblings ...)
  2019-06-09 13:44 ` [PATCH v4 21/22] iommu/vt-d: Support flushing more translation cache types Jacob Pan
@ 2019-06-09 13:44 ` Jacob Pan
  21 siblings, 0 replies; 64+ messages in thread
From: Jacob Pan @ 2019-06-09 13:44 UTC (permalink / raw)
  To: iommu, LKML, Joerg Roedel, David Woodhouse, Eric Auger,
	Alex Williamson, Jean-Philippe Brucker
  Cc: Yi Liu, Tian, Kevin, Raj Ashok, Christoph Hellwig, Lu Baolu,
	Andriy Shevchenko, Jacob Pan, Liu, Yi L

When Shared Virtual Address (SVA) is enabled for a guest OS via a
vIOMMU, we need to provide invalidation support at the IOMMU API and
driver level. This patch adds an Intel VT-d specific function that
implements the IOMMU passdown invalidate API for shared virtual address.

The use case is to support caching structure invalidation
for assigned SVM-capable devices. The emulated IOMMU exposes the queued
invalidation capability and passes down all descriptors from the guest
to the physical IOMMU.

The assumption is that the guest-to-host device ID mapping is
resolved prior to calling the IOMMU driver. Based on the device handle,
the host IOMMU driver can replace certain fields before submitting to
the invalidation queue.
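
As a rough illustration of the input, a guest-originated page-selective
IOTLB flush arrives through the generic cache invalidate UAPI from
earlier in this series; a hedged sketch with hypothetical values:

	struct iommu_cache_invalidate_info inv_info = {
		.version     = IOMMU_CACHE_INVALIDATE_INFO_VERSION_1,
		.cache       = IOMMU_CACHE_INV_TYPE_IOTLB,
		.granularity = IOMMU_INV_GRANU_ADDR,
		.addr_info   = {
			.flags        = IOMMU_INV_ADDR_FLAGS_PASID,
			.pasid        = pasid,
			.addr         = iova,
			.granule_size = 4096,
			.nb_granules  = 16,	/* to_vtd_size() yields order 4 */
		},
	};

	ret = iommu_cache_invalidate(domain, dev, &inv_info);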

Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Signed-off-by: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Liu, Yi L <yi.l.liu@linux.intel.com>
---
 drivers/iommu/intel-iommu.c | 170 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 170 insertions(+)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 3b4d712..3b55b86 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -5352,6 +5352,175 @@ static void intel_iommu_aux_detach_device(struct iommu_domain *domain,
 	aux_domain_remove_dev(to_dmar_domain(domain), dev);
 }
 
+/*
+ * 2D array for converting and sanitizing IOMMU generic TLB granularity to
+ * VT-d granularity. Invalidation is typically included in the unmap operation
+ * as a result of a DMA or VFIO unmap. However, for an assigned device the
+ * guest may own the first level page tables, not shadowed by QEMU. In
+ * this case there is no pass down unmap to the host IOMMU as a result of unmap
+ * in the guest. Only invalidations are trapped and passed down.
+ * In all cases, only first level TLB invalidation (request with PASID) can be
+ * passed down, therefore we do not include IOTLB granularity for request
+ * without PASID (second level).
+ *
+ * For an example, to find the VT-d granularity encoding for IOTLB
+ * type and page selective granularity within PASID:
+ * X: indexed by iommu cache type
+ * Y: indexed by enum iommu_inv_granularity
+ * [IOMMU_CACHE_INV_TYPE_IOTLB][IOMMU_INV_GRANU_ADDR]
+ *
+ * Granu_map array indicates validity of the table. 1: valid, 0: invalid
+ *
+ */
+static const int inv_type_granu_map[IOMMU_CACHE_INV_TYPE_NR][IOMMU_INV_GRANU_NR] = {
+	/* PASID based IOTLB, support PASID selective and page selective */
+	{0, 1, 1},
+	/* PASID based dev TLBs, only support all PASIDs or single PASID */
+	{1, 1, 0},
+	/* PASID cache */
+	{1, 1, 0}
+};
+
+static const u64 inv_type_granu_table[IOMMU_CACHE_INV_TYPE_NR][IOMMU_INV_GRANU_NR] = {
+	/* PASID based IOTLB */
+	{0, QI_GRAN_NONG_PASID, QI_GRAN_PSI_PASID},
+	/* PASID based dev TLBs */
+	{QI_DEV_IOTLB_GRAN_ALL, QI_DEV_IOTLB_GRAN_PASID_SEL, 0},
+	/* PASID cache */
+	{QI_PC_ALL_PASIDS, QI_PC_PASID_SEL, 0},
+};
+
+static inline int to_vtd_granularity(int type, int granu, u64 *vtd_granu)
+{
+	if (type >= IOMMU_CACHE_INV_TYPE_NR || granu >= IOMMU_INV_GRANU_NR ||
+		!inv_type_granu_map[type][granu])
+		return -EINVAL;
+
+	*vtd_granu = inv_type_granu_table[type][granu];
+
+	return 0;
+}
+
+static inline u64 to_vtd_size(u64 granu_size, u64 nr_granules)
+{
+	u64 nr_pages = (granu_size * nr_granules) >> VTD_PAGE_SHIFT;
+
+	/* VT-d size is encoded as 2^size of 4K pages, 0 for 4K, 9 for 2MB,
+	 * etc. The IOMMU cache invalidate API passes granu_size in bytes
+	 * and the number of granules in contiguous memory.
+	 */
+	return order_base_2(nr_pages);
+}
+
+#ifdef CONFIG_INTEL_IOMMU_SVM
+static int intel_iommu_sva_invalidate(struct iommu_domain *domain,
+		struct device *dev, struct iommu_cache_invalidate_info *inv_info)
+{
+	struct dmar_domain *dmar_domain = to_dmar_domain(domain);
+	struct device_domain_info *info;
+	struct intel_iommu *iommu;
+	unsigned long flags;
+	int cache_type;
+	u8 bus, devfn;
+	u16 did, sid;
+	int ret = 0;
+	u64 size;
+
+	if (!inv_info || !dmar_domain ||
+		inv_info->version != IOMMU_CACHE_INVALIDATE_INFO_VERSION_1)
+		return -EINVAL;
+
+	if (!dev || !dev_is_pci(dev))
+		return -ENODEV;
+
+	iommu = device_to_iommu(dev, &bus, &devfn);
+	if (!iommu)
+		return -ENODEV;
+
+	spin_lock_irqsave(&device_domain_lock, flags);
+	spin_lock(&iommu->lock);
+	info = iommu_support_dev_iotlb(dmar_domain, iommu, bus, devfn);
+	if (!info) {
+		ret = -EINVAL;
+		goto out_unlock;
+	}
+	did = dmar_domain->iommu_did[iommu->seq_id];
+	sid = PCI_DEVID(bus, devfn);
+	size = to_vtd_size(inv_info->addr_info.granule_size, inv_info->addr_info.nb_granules);
+
+	for_each_set_bit(cache_type, (unsigned long *)&inv_info->cache, IOMMU_CACHE_INV_TYPE_NR) {
+		u64 granu = 0;
+		u64 pasid = 0;
+
+		ret = to_vtd_granularity(cache_type, inv_info->granularity, &granu);
+		if (ret) {
+			pr_err("Invalid cache type and granu combination %d/%d\n", cache_type,
+				inv_info->granularity);
+			break;
+		}
+
+		/* PASID is stored in different locations based on granularity */
+		if (inv_info->granularity == IOMMU_INV_GRANU_PASID)
+			pasid = inv_info->pasid_info.pasid;
+		else if (inv_info->granularity == IOMMU_INV_GRANU_ADDR)
+			pasid = inv_info->addr_info.pasid;
+		else {
+			pr_err("Cannot find PASID for given cache type and granularity\n");
+			break;
+		}
+
+		switch (BIT(cache_type)) {
+		case IOMMU_CACHE_INV_TYPE_IOTLB:
+			if (size && (inv_info->addr_info.addr & ((BIT(VTD_PAGE_SHIFT + size)) - 1))) {
+				pr_err("Address out of range, 0x%llx, size order %llu\n",
+					inv_info->addr_info.addr, size);
+				ret = -ERANGE;
+				goto out_unlock;
+			}
+
+			qi_flush_piotlb(iommu, did, mm_to_dma_pfn(inv_info->addr_info.addr),
+					pasid, size, granu, inv_info->addr_info.flags & IOMMU_INV_ADDR_FLAGS_LEAF);
+
+			/*
+			 * Always flush device IOTLB if ATS is enabled since guest
+			 * vIOMMU exposes CM = 1, no device IOTLB flush will be passed
+			 * down.
+			 */
+			if (info->ats_enabled) {
+				qi_flush_dev_piotlb(iommu, sid, info->pfsid,
+						pasid, info->ats_qdep,
+						inv_info->addr_info.addr, size,
+						granu);
+			}
+			break;
+		case IOMMU_CACHE_INV_TYPE_DEV_IOTLB:
+			if (info->ats_enabled) {
+				qi_flush_dev_piotlb(iommu, sid, info->pfsid,
+						inv_info->addr_info.pasid, info->ats_qdep,
+						inv_info->addr_info.addr, size,
+						granu);
+			} else {
+				pr_warn("Passdown device IOTLB flush w/o ATS!\n");
+			}
+			break;
+		case IOMMU_CACHE_INV_TYPE_PASID:
+			qi_flush_pasid_cache(iommu, did, granu, inv_info->pasid_info.pasid);
+
+			break;
+		default:
+			dev_err(dev, "Unsupported IOMMU invalidation type %d\n",
+				cache_type);
+			ret = -EINVAL;
+		}
+	}
+out_unlock:
+	spin_unlock(&iommu->lock);
+	spin_unlock_irqrestore(&device_domain_lock, flags);
+
+	return ret;
+}
+#endif
+
 static int intel_iommu_map(struct iommu_domain *domain,
 			   unsigned long iova, phys_addr_t hpa,
 			   size_t size, int iommu_prot)
@@ -5783,6 +5952,7 @@ const struct iommu_ops intel_iommu_ops = {
 	.dev_disable_feat	= intel_iommu_dev_disable_feat,
 	.pgsize_bitmap		= INTEL_IOMMU_PGSIZES,
 #ifdef CONFIG_INTEL_IOMMU_SVM
+	.cache_invalidate	= intel_iommu_sva_invalidate,
 	.sva_bind_gpasid	= intel_svm_bind_gpasid,
 	.sva_unbind_gpasid	= intel_svm_unbind_gpasid,
 #endif
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 64+ messages in thread

* Re: [PATCH v4 10/22] iommu: Fix compile error without IOMMU_API
  2019-06-09 13:44 ` [PATCH v4 10/22] iommu: Fix compile error without IOMMU_API Jacob Pan
@ 2019-06-18 14:10   ` Jonathan Cameron
  2019-06-24 22:28     ` Jacob Pan
  0 siblings, 1 reply; 64+ messages in thread
From: Jonathan Cameron @ 2019-06-18 14:10 UTC (permalink / raw)
  To: Jacob Pan
  Cc: iommu, LKML, Joerg Roedel, David Woodhouse, Eric Auger,
	Alex Williamson, Jean-Philippe Brucker, Tian, Kevin, Raj Ashok,
	Andriy Shevchenko

On Sun, 9 Jun 2019 06:44:10 -0700
Jacob Pan <jacob.jun.pan@linux.intel.com> wrote:

> struct page_response_msg needs to be defined outside CONFIG_IOMMU_API.

What was the error? 

If this is a fix for an earlier patch in this series, roll it in there
(or put it before it). If it's more general, we should add a Fixes tag.

Jonathan
> 
> Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
> ---
>  include/linux/iommu.h | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index 7a37336..8d766a8 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -189,8 +189,6 @@ struct iommu_sva_ops {
>  	iommu_mm_exit_handler_t mm_exit;
>  };
>  
> -#ifdef CONFIG_IOMMU_API
> -
>  /**
>   * enum page_response_code - Return status of fault handlers, telling the IOMMU
>   * driver how to proceed with the fault.
> @@ -227,6 +225,7 @@ struct page_response_msg {
>  	u64 iommu_data;
>  };
>  
> +#ifdef CONFIG_IOMMU_API
>  /**
>   * struct iommu_ops - iommu ops and capabilities
>   * @capable: check capability



^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH v4 11/22] iommu: Introduce guest PASID bind function
  2019-06-09 13:44 ` [PATCH v4 11/22] iommu: Introduce guest PASID bind function Jacob Pan
@ 2019-06-18 15:36   ` Jean-Philippe Brucker
  2019-06-24 22:24     ` Jacob Pan
  2019-07-16 16:44   ` Auger Eric
  1 sibling, 1 reply; 64+ messages in thread
From: Jean-Philippe Brucker @ 2019-06-18 15:36 UTC (permalink / raw)
  To: Jacob Pan, iommu, LKML, Joerg Roedel, David Woodhouse,
	Eric Auger, Alex Williamson
  Cc: Yi Liu, Tian, Kevin, Raj Ashok, Christoph Hellwig, Lu Baolu,
	Andriy Shevchenko

On 09/06/2019 14:44, Jacob Pan wrote:
> Guest shared virtual address (SVA) may require host to shadow guest
> PASID tables. Guest PASID can also be allocated from the host via
> enlightened interfaces. In this case, guest needs to bind the guest
> mm, i.e. cr3 in guest physical address to the actual PASID table in
> the host IOMMU. Nesting will be turned on such that guest virtual
> address can go through a two level translation:
> - 1st level translates GVA to GPA
> - 2nd level translates GPA to HPA
> This patch introduces APIs to bind guest PASID data to the assigned
> device entry in the physical IOMMU. See the diagram below for usage
> explaination.

explanation

> 
>     .-------------.  .---------------------------.
>     |   vIOMMU    |  | Guest process mm, FL only |
>     |             |  '---------------------------'
>     .----------------/
>     | PASID Entry |--- PASID cache flush -
>     '-------------'                       |
>     |             |                       V
>     |             |                      GP
>     '-------------'
> Guest
> ------| Shadow |----------------------- GP->HP* ---------
>       v        v                          |
> Host                                      v
>     .-------------.  .----------------------.
>     |   pIOMMU    |  | Bind FL for GVA-GPA  |
>     |             |  '----------------------'
>     .----------------/  |
>     | PASID Entry |     V (Nested xlate)
>     '----------------\.---------------------.
>     |             |   |Set SL to GPA-HPA    |
>     |             |   '---------------------'
>     '-------------'
> 
> Where:
>  - FL = First level/stage one page tables
>  - SL = Second level/stage two page tables
>  - GP = Guest PASID
>  - HP = Host PASID
> * Conversion needed if non-identity GP-HP mapping option is chosen.
> 
> Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
> Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
> ---
>  drivers/iommu/iommu.c      | 20 ++++++++++++++++
>  include/linux/iommu.h      | 21 +++++++++++++++++
>  include/uapi/linux/iommu.h | 58 ++++++++++++++++++++++++++++++++++++++++++++++
>  3 files changed, 99 insertions(+)
> 
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index 1758b57..d0416f60 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -1648,6 +1648,26 @@ int iommu_cache_invalidate(struct iommu_domain *domain, struct device *dev,
>  }
>  EXPORT_SYMBOL_GPL(iommu_cache_invalidate);
>  
> +int iommu_sva_bind_gpasid(struct iommu_domain *domain,
> +			struct device *dev, struct gpasid_bind_data *data)

I'm curious about the VFIO side of this. Is the ioctl on the device or
on the container fd? For bind_pasid_table, it's on the container and we
only pass the iommu_domain to the IOMMU driver, not the device (since
devices in a domain share the same PASID table).

> +{
> +	if (unlikely(!domain->ops->sva_bind_gpasid))
> +		return -ENODEV;
> +
> +	return domain->ops->sva_bind_gpasid(domain, dev, data);
> +}
> +EXPORT_SYMBOL_GPL(iommu_sva_bind_gpasid);
> +
> +int iommu_sva_unbind_gpasid(struct iommu_domain *domain, struct device *dev,
> +			ioasid_t pasid)
> +{
> +	if (unlikely(!domain->ops->sva_unbind_gpasid))
> +		return -ENODEV;
> +
> +	return domain->ops->sva_unbind_gpasid(dev, pasid);
> +}
> +EXPORT_SYMBOL_GPL(iommu_sva_unbind_gpasid);
> +
>  static void __iommu_detach_device(struct iommu_domain *domain,
>  				  struct device *dev)
>  {
> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index 8d766a8..560c8c8 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -25,6 +25,7 @@
>  #include <linux/errno.h>
>  #include <linux/err.h>
>  #include <linux/of.h>
> +#include <linux/ioasid.h>
>  #include <uapi/linux/iommu.h>
>  
>  #define IOMMU_READ	(1 << 0)
> @@ -267,6 +268,8 @@ struct page_response_msg {
>   * @detach_pasid_table: detach the pasid table
>   * @cache_invalidate: invalidate translation caches
>   * @pgsize_bitmap: bitmap of all possible supported page sizes
> + * @sva_bind_gpasid: bind guest pasid and mm
> + * @sva_unbind_gpasid: unbind guest pasid and mm
>   */
>  struct iommu_ops {
>  	bool (*capable)(enum iommu_cap);
> @@ -332,6 +335,10 @@ struct iommu_ops {
>  	int (*page_response)(struct device *dev, struct page_response_msg *msg);
>  	int (*cache_invalidate)(struct iommu_domain *domain, struct device *dev,
>  				struct iommu_cache_invalidate_info *inv_info);
> +	int (*sva_bind_gpasid)(struct iommu_domain *domain,
> +			struct device *dev, struct gpasid_bind_data *data);
> +
> +	int (*sva_unbind_gpasid)(struct device *dev, int pasid);
>  
>  	unsigned long pgsize_bitmap;
>  };
> @@ -447,6 +454,10 @@ extern void iommu_detach_pasid_table(struct iommu_domain *domain);
>  extern int iommu_cache_invalidate(struct iommu_domain *domain,
>  				  struct device *dev,
>  				  struct iommu_cache_invalidate_info *inv_info);
> +extern int iommu_sva_bind_gpasid(struct iommu_domain *domain,
> +		struct device *dev, struct gpasid_bind_data *data);
> +extern int iommu_sva_unbind_gpasid(struct iommu_domain *domain,
> +				struct device *dev, ioasid_t pasid);
>  extern struct iommu_domain *iommu_get_domain_for_dev(struct device *dev);
>  extern struct iommu_domain *iommu_get_dma_domain(struct device *dev);
>  extern int iommu_map(struct iommu_domain *domain, unsigned long iova,
> @@ -998,6 +1009,16 @@ iommu_cache_invalidate(struct iommu_domain *domain,
>  {
>  	return -ENODEV;
>  }
> +static inline int iommu_sva_bind_gpasid(struct iommu_domain *domain,
> +				struct device *dev, struct gpasid_bind_data *data)
> +{
> +	return -ENODEV;
> +}
> +
> +static inline int sva_unbind_gpasid(struct device *dev, int pasid)

The prototype above also has a domain argument

> +{
> +	return -ENODEV;
> +}
>  
>  #endif /* CONFIG_IOMMU_API */
>  
> diff --git a/include/uapi/linux/iommu.h b/include/uapi/linux/iommu.h
> index ca4b753..a9cdc63 100644
> --- a/include/uapi/linux/iommu.h
> +++ b/include/uapi/linux/iommu.h
> @@ -277,4 +277,62 @@ struct iommu_cache_invalidate_info {
>  	};
>  };
>  
> +/**
> + * struct gpasid_bind_data_vtd - Intel VT-d specific data on device and guest
> + * SVA binding.
> + *
> + * @flags:	VT-d PASID table entry attributes
> + * @pat:	Page attribute table data to compute effective memory type
> + * @emt:	Extended memory type
> + *
> + * Only guest vIOMMU selectable and effective options are passed down to
> + * the host IOMMU.
> + */
> +struct gpasid_bind_data_vtd {
> +#define IOMMU_SVA_VTD_GPASID_SRE	(1 << 0) /* supervisor request */
> +#define IOMMU_SVA_VTD_GPASID_EAFE	(1 << 1) /* extended access enable */
> +#define IOMMU_SVA_VTD_GPASID_PCD	(1 << 2) /* page-level cache disable */
> +#define IOMMU_SVA_VTD_GPASID_PWT	(1 << 3) /* page-level write through */
> +#define IOMMU_SVA_VTD_GPASID_EMTE	(1 << 4) /* extended mem type enable */
> +#define IOMMU_SVA_VTD_GPASID_CD		(1 << 5) /* PASID-level cache disable */
> +	__u64 flags;
> +	__u32 pat;
> +	__u32 emt;
> +};
> +
> +/**
> + * struct gpasid_bind_data - Information about device and guest PASID binding
> + * @version:	Version of this data structure
> + * @format:	PASID table entry format
> + * @flags:	Additional information on guest bind request
> + * @gpgd:	Guest page directory base of the guest mm to bind
> + * @hpasid:	Process address space ID used for the guest mm in host IOMMU
> + * @gpasid:	Process address space ID used for the guest mm in guest IOMMU
> + * @addr_width:	Guest virtual address width

+ "in bits"

> + * @vtd:	Intel VT-d specific data
> + *
> + * Guest to host PASID mapping can be an identity or non-identity, where guest
> + * has its own PASID space. For non-identify mapping, guest to host PASID lookup
> + * is needed when VM programs guest PASID into an assigned device. VMM may
> + * trap such PASID programming then request host IOMMU driver to convert guest
> + * PASID to host PASID based on this bind data.
> + */
> +struct gpasid_bind_data {
> +#define IOMMU_GPASID_BIND_VERSION_1	1
> +	__u32 version;
> +#define IOMMU_PASID_FORMAT_INTEL_VTD	1
> +	__u32 format;
> +#define IOMMU_SVA_GPASID_VAL	(1 << 0) /* guest PASID valid */
> +	__u64 flags;
> +	__u64 gpgd;
> +	__u64 hpasid;
> +	__u64 gpasid;
> +	__u32 addr_width;

We could use a __u8 for addr_width

Thanks,
Jean

> +	__u8  padding[4];
> +	/* Vendor specific data */
> +	union {
> +		struct gpasid_bind_data_vtd vtd;
> +	};
> +};
> +
>  #endif /* _UAPI_IOMMU_H */
> 


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH v4 09/22] iommu: Introduce cache_invalidate API
  2019-06-09 13:44 ` [PATCH v4 09/22] iommu: Introduce cache_invalidate API Jacob Pan
@ 2019-06-18 15:41   ` Jonathan Cameron
  0 siblings, 0 replies; 64+ messages in thread
From: Jonathan Cameron @ 2019-06-18 15:41 UTC (permalink / raw)
  To: Jacob Pan
  Cc: iommu, LKML, Joerg Roedel, David Woodhouse, Eric Auger,
	Alex Williamson, Jean-Philippe Brucker, Tian, Kevin, Raj Ashok,
	Andriy Shevchenko

On Sun, 9 Jun 2019 06:44:09 -0700
Jacob Pan <jacob.jun.pan@linux.intel.com> wrote:

> From: Liu Yi L <yi.l.liu@intel.com>
> 
> In any virtualization use case, when the first translation stage
> is "owned" by the guest OS, the host IOMMU driver has no knowledge
> of caching structure updates unless the guest invalidation activities
> are trapped by the virtualizer and passed down to the host.
> 
> Since the invalidation data are obtained from user space and will be
> written into physical IOMMU, we must allow security check at various
> layers. Therefore, generic invalidation data format are proposed here,
> model specific IOMMU drivers need to convert them into their own format.
> 
> Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
> Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
> Signed-off-by: Ashok Raj <ashok.raj@intel.com>
> Signed-off-by: Eric Auger <eric.auger@redhat.com>
> Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
Some comment ordering nitpicks.  Nothing important.

Jonathan

> ---
>  drivers/iommu/iommu.c      |  10 +++++
>  include/linux/iommu.h      |  14 ++++++
>  include/uapi/linux/iommu.h | 110 +++++++++++++++++++++++++++++++++++++++++++++
>  3 files changed, 134 insertions(+)
> 
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index 4496ccd..1758b57 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -1638,6 +1638,16 @@ void iommu_detach_pasid_table(struct iommu_domain *domain)
>  }
>  EXPORT_SYMBOL_GPL(iommu_detach_pasid_table);
>  
> +int iommu_cache_invalidate(struct iommu_domain *domain, struct device *dev,
> +			   struct iommu_cache_invalidate_info *inv_info)
> +{
> +	if (unlikely(!domain->ops->cache_invalidate))
> +		return -ENODEV;
> +
> +	return domain->ops->cache_invalidate(domain, dev, inv_info);
> +}
> +EXPORT_SYMBOL_GPL(iommu_cache_invalidate);
> +
>  static void __iommu_detach_device(struct iommu_domain *domain,
>  				  struct device *dev)
>  {
> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index d3edb10..7a37336 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -266,6 +266,7 @@ struct page_response_msg {
>   * @page_response: handle page request response
>   * @attach_pasid_table: attach a pasid table
>   * @detach_pasid_table: detach the pasid table
> + * @cache_invalidate: invalidate translation caches
>   * @pgsize_bitmap: bitmap of all possible supported page sizes
>   */
>  struct iommu_ops {
> @@ -330,6 +331,8 @@ struct iommu_ops {
>  	void (*detach_pasid_table)(struct iommu_domain *domain);
>  
>  	int (*page_response)(struct device *dev, struct page_response_msg *msg);
> +	int (*cache_invalidate)(struct iommu_domain *domain, struct device *dev,
> +				struct iommu_cache_invalidate_info *inv_info);
>  
>  	unsigned long pgsize_bitmap;
>  };
> @@ -442,6 +445,9 @@ extern void iommu_detach_device(struct iommu_domain *domain,
>  extern int iommu_attach_pasid_table(struct iommu_domain *domain,
>  				    struct iommu_pasid_table_config *cfg);
>  extern void iommu_detach_pasid_table(struct iommu_domain *domain);
> +extern int iommu_cache_invalidate(struct iommu_domain *domain,
> +				  struct device *dev,
> +				  struct iommu_cache_invalidate_info *inv_info);
>  extern struct iommu_domain *iommu_get_domain_for_dev(struct device *dev);
>  extern struct iommu_domain *iommu_get_dma_domain(struct device *dev);
>  extern int iommu_map(struct iommu_domain *domain, unsigned long iova,
> @@ -986,6 +992,14 @@ static inline int iommu_sva_get_pasid(struct iommu_sva *handle)
>  static inline
>  void iommu_detach_pasid_table(struct iommu_domain *domain) {}
>  
> +static inline int
> +iommu_cache_invalidate(struct iommu_domain *domain,
> +		       struct device *dev,
> +		       struct iommu_cache_invalidate_info *inv_info)
> +{
> +	return -ENODEV;
> +}
> +
>  #endif /* CONFIG_IOMMU_API */
>  
>  #ifdef CONFIG_IOMMU_DEBUGFS
> diff --git a/include/uapi/linux/iommu.h b/include/uapi/linux/iommu.h
> index 3976767..ca4b753 100644
> --- a/include/uapi/linux/iommu.h
> +++ b/include/uapi/linux/iommu.h
> @@ -167,4 +167,114 @@ struct iommu_pasid_table_config {
>  	};
>  };
>  
> +/* defines the granularity of the invalidation */
> +enum iommu_inv_granularity {
> +	IOMMU_INV_GRANU_DOMAIN,	/* domain-selective invalidation */
> +	IOMMU_INV_GRANU_PASID,	/* PASID-selective invalidation */
> +	IOMMU_INV_GRANU_ADDR,	/* page-selective invalidation */
> +	IOMMU_INV_GRANU_NR,	/* number of invalidation granularities */
> +};
> +
> +/**
> + * struct iommu_inv_addr_info - Address Selective Invalidation Structure
> + *
> + * @flags: indicates the granularity of the address-selective invalidation
> + * - If the PASID bit is set, the @pasid field is populated and the invalidation
> + *   relates to cache entries tagged with this PASID and matching the address
> + *   range.
> + * - If ARCHID bit is set, @archid is populated and the invalidation relates
> + *   to cache entries tagged with this architecture specific ID and matching
> + *   the address range.
> + * - Both PASID and ARCHID can be set as they may tag different caches.
> + * - If neither PASID or ARCHID is set, global addr invalidation applies.
> + * - The LEAF flag indicates whether only the leaf PTE caching needs to be
> + *   invalidated and other paging structure caches can be preserved.
> + * @pasid: process address space ID
> + * @archid: architecture-specific ID

Parameter ordering should match between docs and structure.
Might make more sense in some ways to not do so but the kernel-doc
guid states
"The kernel-doc data structure comments describe each member of the structure,
in order, with the @member: descriptions. "

> + * @addr: first stage/level input address
> + * @granule_size: page/block size of the mapping in bytes
> + * @nb_granules: number of contiguous granules to be invalidated
> + */
> +struct iommu_inv_addr_info {
> +#define IOMMU_INV_ADDR_FLAGS_PASID	(1 << 0)
> +#define IOMMU_INV_ADDR_FLAGS_ARCHID	(1 << 1)
> +#define IOMMU_INV_ADDR_FLAGS_LEAF	(1 << 2)
> +	__u32	flags;
> +	__u32	archid;
> +	__u64	pasid;
> +	__u64	addr;
> +	__u64	granule_size;
> +	__u64	nb_granules;
> +};
> +
> +/**
> + * struct iommu_inv_pasid_info - PASID Selective Invalidation Structure
> + *
> + * @flags: indicates the granularity of the PASID-selective invalidation
> + * - If the PASID bit is set, the @pasid field is populated and the invalidation
> + *   relates to cache entries tagged with this PASID and matching the address
> + *   range.
> + * - If the ARCHID bit is set, the @archid is populated and the invalidation
> + *   relates to cache entries tagged with this architecture specific ID and
> + *   matching the address range.
> + * - Both PASID and ARCHID can be set as they may tag different caches.
> + * - At least one of PASID or ARCHID must be set.
> + * @pasid: process address space ID
> + * @archid: architecture-specific ID
Ordering of parameters is different from below.
> + */
> +struct iommu_inv_pasid_info {
> +#define IOMMU_INV_PASID_FLAGS_PASID	(1 << 0)
> +#define IOMMU_INV_PASID_FLAGS_ARCHID	(1 << 1)
> +	__u32	flags;
> +	__u32	archid;
> +	__u64	pasid;
> +};
> +
> +/**
> + * struct iommu_cache_invalidate_info - First level/stage invalidation
> + *     information
> + * @version: API version of this structure
> + * @cache: bitfield that allows to select which caches to invalidate
> + * @granularity: defines the lowest granularity used for the invalidation:
> + *     domain > PASID > addr
> + * @padding: reserved for future use (should be zero)
> + * @pasid_info: invalidation data when @granularity is %IOMMU_INV_GRANU_PASID
> + * @addr_info: invalidation data when @granularity is %IOMMU_INV_GRANU_ADDR
> + *
> + * Not all the combinations of cache/granularity are valid:
> + *
> + * +--------------+---------------+---------------+---------------+
> + * | type /       |   DEV_IOTLB   |     IOTLB     |      PASID    |
> + * | granularity  |               |               |      cache    |
> + * +==============+===============+===============+===============+
> + * | DOMAIN       |       N/A     |       Y       |       Y       |
> + * +--------------+---------------+---------------+---------------+
> + * | PASID        |       Y       |       Y       |       Y       |
> + * +--------------+---------------+---------------+---------------+
> + * | ADDR         |       Y       |       Y       |       N/A     |
> + * +--------------+---------------+---------------+---------------+
> + *
> + * Invalidations by %IOMMU_INV_GRANU_DOMAIN don't take any argument other than
> + * @version and @cache.
> + *
> + * If multiple cache types are invalidated simultaneously, they all
> + * must support the used granularity.
> + */
> +struct iommu_cache_invalidate_info {
> +#define IOMMU_CACHE_INVALIDATE_INFO_VERSION_1 1
> +	__u32	version;
> +/* IOMMU paging structure cache */
> +#define IOMMU_CACHE_INV_TYPE_IOTLB	(1 << 0) /* IOMMU IOTLB */
> +#define IOMMU_CACHE_INV_TYPE_DEV_IOTLB	(1 << 1) /* Device IOTLB */
> +#define IOMMU_CACHE_INV_TYPE_PASID	(1 << 2) /* PASID cache */
> +#define IOMMU_CACHE_INV_TYPE_NR		(3)
> +	__u8	cache;
> +	__u8	granularity;
> +	__u8	padding[2];
> +	union {
> +		struct iommu_inv_pasid_info pasid_info;
> +		struct iommu_inv_addr_info addr_info;
> +	};
> +};
> +
>  #endif /* _UAPI_IOMMU_H */



^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH v4 08/22] iommu: Introduce attach/detach_pasid_table API
  2019-06-09 13:44 ` [PATCH v4 08/22] iommu: Introduce attach/detach_pasid_table API Jacob Pan
@ 2019-06-18 15:41   ` Jonathan Cameron
  2019-06-24 15:06     ` Auger Eric
  0 siblings, 1 reply; 64+ messages in thread
From: Jonathan Cameron @ 2019-06-18 15:41 UTC (permalink / raw)
  To: Jacob Pan
  Cc: iommu, LKML, Joerg Roedel, David Woodhouse, Eric Auger,
	Alex Williamson, Jean-Philippe Brucker, Yi L, Tian, Kevin,
	Raj Ashok, Liu, Andriy Shevchenko

On Sun, 9 Jun 2019 06:44:08 -0700
Jacob Pan <jacob.jun.pan@linux.intel.com> wrote:

> In the virtualization use case, when a guest is assigned
> a PCI host device, protected by a virtual IOMMU on the guest,
> the physical IOMMU must be programmed to be consistent with
> the guest mappings. If the physical IOMMU supports two
> translation stages, it makes sense to program guest mappings
> onto the first stage/level (ARM/Intel terminology) while the host
> owns stage/level 2.
> 
> In that case, it is mandatory to trap guest configuration
> settings and pass them to the physical IOMMU driver.
> 
> This patch adds a new API to the iommu subsystem that allows
> setting/unsetting the PASID table information.
> 
> A generic iommu_pasid_table_config struct is introduced in
> a new iommu.h uapi header. This is going to be used by the VFIO
> user API.

Another case where, strictly speaking, stuff is introduced that this series
doesn't use.  I don't know what the plans are to merge the various
related series, though, so this might make sense in general. Right now
it just bloats this series a bit.
> 
> Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
> Signed-off-by: Liu, Yi L <yi.l.liu@linux.intel.com>
> Signed-off-by: Ashok Raj <ashok.raj@intel.com>
> Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
> Signed-off-by: Eric Auger <eric.auger@redhat.com>
> Reviewed-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
> ---
>  drivers/iommu/iommu.c      | 19 +++++++++++++++++
>  include/linux/iommu.h      | 18 ++++++++++++++++
>  include/uapi/linux/iommu.h | 52 ++++++++++++++++++++++++++++++++++++++++++++++
>  3 files changed, 89 insertions(+)
> 
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index 166adb8..4496ccd 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -1619,6 +1619,25 @@ int iommu_page_response(struct device *dev,
>  }
>  EXPORT_SYMBOL_GPL(iommu_page_response);
>  
> +int iommu_attach_pasid_table(struct iommu_domain *domain,
> +			     struct iommu_pasid_table_config *cfg)
> +{
> +	if (unlikely(!domain->ops->attach_pasid_table))
> +		return -ENODEV;
> +
> +	return domain->ops->attach_pasid_table(domain, cfg);
> +}
> +EXPORT_SYMBOL_GPL(iommu_attach_pasid_table);
> +
> +void iommu_detach_pasid_table(struct iommu_domain *domain)
> +{
> +	if (unlikely(!domain->ops->detach_pasid_table))
> +		return;
> +
> +	domain->ops->detach_pasid_table(domain);
> +}
> +EXPORT_SYMBOL_GPL(iommu_detach_pasid_table);
> +
>  static void __iommu_detach_device(struct iommu_domain *domain,
>  				  struct device *dev)
>  {
> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index 950347b..d3edb10 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -264,6 +264,8 @@ struct page_response_msg {
>   * @sva_unbind: Unbind process address space from device
>   * @sva_get_pasid: Get PASID associated to a SVA handle
>   * @page_response: handle page request response
> + * @attach_pasid_table: attach a pasid table
> + * @detach_pasid_table: detach the pasid table
>   * @pgsize_bitmap: bitmap of all possible supported page sizes
>   */
>  struct iommu_ops {
> @@ -323,6 +325,9 @@ struct iommu_ops {
>  				      void *drvdata);
>  	void (*sva_unbind)(struct iommu_sva *handle);
>  	int (*sva_get_pasid)(struct iommu_sva *handle);
> +	int (*attach_pasid_table)(struct iommu_domain *domain,
> +				  struct iommu_pasid_table_config *cfg);
> +	void (*detach_pasid_table)(struct iommu_domain *domain);
>  
>  	int (*page_response)(struct device *dev, struct page_response_msg *msg);
>  
> @@ -434,6 +439,9 @@ extern int iommu_attach_device(struct iommu_domain *domain,
>  			       struct device *dev);
>  extern void iommu_detach_device(struct iommu_domain *domain,
>  				struct device *dev);
> +extern int iommu_attach_pasid_table(struct iommu_domain *domain,
> +				    struct iommu_pasid_table_config *cfg);
> +extern void iommu_detach_pasid_table(struct iommu_domain *domain);
>  extern struct iommu_domain *iommu_get_domain_for_dev(struct device *dev);
>  extern struct iommu_domain *iommu_get_dma_domain(struct device *dev);
>  extern int iommu_map(struct iommu_domain *domain, unsigned long iova,
> @@ -947,6 +955,13 @@ iommu_aux_get_pasid(struct iommu_domain *domain, struct device *dev)
>  	return -ENODEV;
>  }
>  
> +static inline
> +int iommu_attach_pasid_table(struct iommu_domain *domain,
> +			     struct iommu_pasid_table_config *cfg)
> +{
> +	return -ENODEV;
> +}
> +
>  static inline struct iommu_sva *
>  iommu_sva_bind_device(struct device *dev, struct mm_struct *mm, void *drvdata)
>  {
> @@ -968,6 +983,9 @@ static inline int iommu_sva_get_pasid(struct iommu_sva *handle)
>  	return IOMMU_PASID_INVALID;
>  }
>  
> +static inline
> +void iommu_detach_pasid_table(struct iommu_domain *domain) {}
> +
>  #endif /* CONFIG_IOMMU_API */
>  
>  #ifdef CONFIG_IOMMU_DEBUGFS
> diff --git a/include/uapi/linux/iommu.h b/include/uapi/linux/iommu.h
> index aaa3b6a..3976767 100644
> --- a/include/uapi/linux/iommu.h
> +++ b/include/uapi/linux/iommu.h
> @@ -115,4 +115,56 @@ struct iommu_fault {
>  		struct iommu_fault_page_request prm;
>  	};
>  };
> +
> +/**
> + * struct iommu_pasid_smmuv3 - ARM SMMUv3 Stream Table Entry stage 1 related
> + *     information
> + * @version: API version of this structure
> + * @s1fmt: STE s1fmt (format of the CD table: single CD, linear table
> + *         or 2-level table)
> + * @s1dss: STE s1dss (specifies the behavior when @pasid_bits != 0
> + *         and no PASID is passed along with the incoming transaction)
> + * @padding: reserved for future use (should be zero)
> + *
> + * The PASID table is referred to as the Context Descriptor (CD) table on ARM
> + * SMMUv3. Please refer to the ARM SMMU 3.x spec (ARM IHI 0070A) for full
> + * details.
> + */
> +struct iommu_pasid_smmuv3 {
> +#define PASID_TABLE_SMMUV3_CFG_VERSION_1 1
> +	__u32	version;
> +	__u8	s1fmt;
> +	__u8	s1dss;
> +	__u8	padding[2];
> +};
> +
> +/**
> + * struct iommu_pasid_table_config - PASID table data used to bind guest PASID
> + *     table to the host IOMMU
> + * @version: API version to prepare for future extensions
> + * @format: format of the PASID table
> + * @base_ptr: guest physical address of the PASID table
> + * @pasid_bits: number of PASID bits used in the PASID table
> + * @config: indicates whether the guest translation stage must
> + *          be translated, bypassed or aborted.
> + * @padding: reserved for future use (should be zero)
> + * @smmuv3: table information when @format is %IOMMU_PASID_FORMAT_SMMUV3
> + */
> +struct iommu_pasid_table_config {
> +#define PASID_TABLE_CFG_VERSION_1 1
> +	__u32	version;
> +#define IOMMU_PASID_FORMAT_SMMUV3	1
> +	__u32	format;
> +	__u64	base_ptr;
> +	__u8	pasid_bits;
> +#define IOMMU_PASID_CONFIG_TRANSLATE	1
> +#define IOMMU_PASID_CONFIG_BYPASS	2
> +#define IOMMU_PASID_CONFIG_ABORT	3
> +	__u8	config;
> +	__u8    padding[6];
> +	union {
> +		struct iommu_pasid_smmuv3 smmuv3;
> +	};
> +};
> +
>  #endif /* _UAPI_IOMMU_H */



^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH v4 03/22] iommu: Introduce device fault report API
  2019-06-09 13:44 ` [PATCH v4 03/22] iommu: Introduce device fault report API Jacob Pan
@ 2019-06-18 15:41   ` Jonathan Cameron
  0 siblings, 0 replies; 64+ messages in thread
From: Jonathan Cameron @ 2019-06-18 15:41 UTC (permalink / raw)
  To: Jacob Pan
  Cc: iommu, LKML, Joerg Roedel, David Woodhouse, Eric Auger,
	Alex Williamson, Jean-Philippe Brucker, Tian, Kevin, Raj Ashok,
	Andriy Shevchenko

On Sun, 9 Jun 2019 06:44:03 -0700
Jacob Pan <jacob.jun.pan@linux.intel.com> wrote:

> Traditionally, device-specific faults are detected and handled within
> their own device drivers. When an IOMMU is enabled, faults such as those
> on DMA transactions are detected by the IOMMU. There is no generic
> mechanism to report faults back to the in-kernel device
> driver or to the guest OS in the case of assigned devices.
> 
> This patch introduces a registration API for device-specific fault
> handlers. This differs from the existing iommu_set_fault_handler/
> report_iommu_fault infrastructure in several ways:
> - it allows reporting more sophisticated fault events (both
>   unrecoverable faults and page request faults) due to the nature
>   of the iommu_fault struct
> - it is device specific rather than domain specific.
> 
> The current iommu_report_device_fault() implementation only handles
> the "fire and forget" unrecoverable fault case. Handling of page
> request faults or stalled faults will come later.
> 
> Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
> Signed-off-by: Ashok Raj <ashok.raj@intel.com>
> Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
> Signed-off-by: Eric Auger <eric.auger@redhat.com>

A few nitpicks and minor suggestions inline.
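
For what it's worth, the API reads naturally from a consumer driver's
point of view.  Roughly (a sketch only; my_handler and my_data are
made up for illustration):

	static int my_handler(struct iommu_fault_event *evt, void *data)
	{
		/* inspect evt->fault, queue work, etc. */
		return 0;
	}

	ret = iommu_register_device_fault_handler(dev, my_handler, my_data);
	...
	iommu_unregister_device_fault_handler(dev);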

> ---
>  drivers/iommu/iommu.c | 127 +++++++++++++++++++++++++++++++++++++++++++++++++-
>  include/linux/iommu.h |  33 ++++++++++++-
>  2 files changed, 157 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index 67ee662..7955184 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -644,6 +644,13 @@ int iommu_group_add_device(struct iommu_group *group, struct device *dev)
>  		goto err_free_name;
>  	}
>  
> +	dev->iommu_param = kzalloc(sizeof(*dev->iommu_param), GFP_KERNEL);
> +	if (!dev->iommu_param) {
> +		ret = -ENOMEM;
> +		goto err_free_name;
> +	}
> +	mutex_init(&dev->iommu_param->lock);
> +
>  	kobject_get(group->devices_kobj);
>  
>  	dev->iommu_group = group;
> @@ -674,6 +681,7 @@ int iommu_group_add_device(struct iommu_group *group, struct device *dev)
>  	mutex_unlock(&group->mutex);
>  	dev->iommu_group = NULL;
>  	kobject_put(group->devices_kobj);
> +	kfree(dev->iommu_param);
>  err_free_name:
>  	kfree(device->name);
>  err_remove_link:
> @@ -720,7 +728,7 @@ void iommu_group_remove_device(struct device *dev)
>  	sysfs_remove_link(&dev->kobj, "iommu_group");
>  
>  	trace_remove_device_from_group(group->id, dev);
> -
> +	kfree(dev->iommu_param);
>  	kfree(device->name);
>  	kfree(device);
>  	dev->iommu_group = NULL;
> @@ -855,6 +863,123 @@ int iommu_group_unregister_notifier(struct iommu_group *group,
>  EXPORT_SYMBOL_GPL(iommu_group_unregister_notifier);
>  
>  /**
> + * iommu_register_device_fault_handler() - Register a device fault handler
> + * @dev: the device
> + * @handler: the fault handler
> + * @data: private data passed as argument to the handler
> + *
> + * When an IOMMU fault event is received, this handler gets called with the
> + * fault event and data as argument.
> + *
> + * Return 0 if the fault handler was installed successfully, or an error.
> + */
> +int iommu_register_device_fault_handler(struct device *dev,
> +					iommu_dev_fault_handler_t handler,
> +					void *data)
> +{
> +	struct iommu_param *param = dev->iommu_param;
> +	int ret = 0;
> +
> +	/*
> +	 * Device iommu_param should have been allocated when device is
> +	 * added to its iommu_group.
> +	 */
> +	if (!param)
> +		return -EINVAL;
> +
> +	mutex_lock(&param->lock);
> +	/* Only allow one fault handler registered for each device */
> +	if (param->fault_param) {
> +		ret = -EBUSY;
> +		goto done_unlock;
> +	}
> +
> +	get_device(dev);
> +	param->fault_param =
> +		kzalloc(sizeof(struct iommu_fault_param), GFP_KERNEL);
> +	if (!param->fault_param) {
> +		put_device(dev);
> +		ret = -ENOMEM;
> +		goto done_unlock;
> +	}
> +	param->fault_param->handler = handler;
> +	param->fault_param->data = data;
> +
> +done_unlock:
> +	mutex_unlock(&param->lock);
> +
> +	return ret;
> +}
> +EXPORT_SYMBOL_GPL(iommu_register_device_fault_handler);
> +
> +/**
> + * iommu_unregister_device_fault_handler() - Unregister the device fault handler
> + * @dev: the device
> + *
> + * Remove the device fault handler installed with
> + * iommu_register_device_fault_handler().
> + *
> + * Return 0 on success, or an error.
> + */
> +int iommu_unregister_device_fault_handler(struct device *dev)
> +{
> +	struct iommu_param *param = dev->iommu_param;
> +	int ret = 0;
> +
> +	if (!param)
> +		return -EINVAL;
> +
> +	mutex_lock(&param->lock);
> +
> +	if (!param->fault_param)
> +		goto unlock;
> +
> +	kfree(param->fault_param);
> +	param->fault_param = NULL;
> +	put_device(dev);
> +unlock:
> +	mutex_unlock(&param->lock);
> +
> +	return ret;
> +}
> +EXPORT_SYMBOL_GPL(iommu_unregister_device_fault_handler);
> +

One blank line is probably enough (my personal favourite
stylistic nitpick ;)

> +
> +/**
> + * iommu_report_device_fault() - Report fault event to device
> + * @dev: the device
> + * @evt: fault event data
> + *
> + * Called by IOMMU drivers when a fault is detected, typically in a threaded IRQ
> + * handler.

What constraints does that put on it?  We probably don't care about the
fact that it is in a threaded IRQ as such, but rather about something
that is true as a result of that?
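
If the key point is simply that the handler is allowed to sleep (we do
take a mutex on this path), perhaps say that directly, e.g. something
like:

 * Called by IOMMU drivers when a fault is detected, typically from a
 * sleepable context such as a threaded IRQ handler.  The handler
 * itself may sleep.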

> + *
> + * Return 0 on success, or an error.
> + */
> +int iommu_report_device_fault(struct device *dev, struct iommu_fault_event *evt)
> +{
> +	struct iommu_param *param = dev->iommu_param;
> +	struct iommu_fault_param *fparam;
> +	int ret = 0;
> +
> +	/* iommu_param is allocated when device is added to group */
> +	if (!param || !evt)
> +		return -EINVAL;
> +
> +	/* we only report device fault if there is a handler registered */
> +	mutex_lock(&param->lock);
> +	fparam = param->fault_param;
> +	if (!fparam || !fparam->handler) {
> +		ret = -EINVAL;
> +		goto done_unlock;
> +	}
> +	ret = fparam->handler(evt, fparam->data);
> +done_unlock:
> +	mutex_unlock(&param->lock);
> +	return ret;
> +}
> +EXPORT_SYMBOL_GPL(iommu_report_device_fault);
> +
> +/**
>   * iommu_group_id - Return ID for a group
>   * @group: the group to ID
>   *
> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index 7890a92..b87b74c 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -322,9 +322,9 @@ struct iommu_fault_event {
>  
>  /**
>   * struct iommu_fault_param - per-device IOMMU fault data
> - * @dev_fault_handler: Callback function to handle IOMMU faults at device level
> - * @data: handler private data
>   *
> + * @handler: Callback function to handle IOMMU faults at device level

This should be in the previous patch, not this one.

> + * @data: handler private data
>   */
>  struct iommu_fault_param {
>  	iommu_dev_fault_handler_t handler;
> @@ -341,6 +341,7 @@ struct iommu_fault_param {
>   *	struct iommu_fwspec	*iommu_fwspec;
>   */
>  struct iommu_param {
> +	struct mutex lock;

The structure has kernel-doc, so this should be added to that.

>  	struct iommu_fault_param *fault_param;
>  };
>  
> @@ -433,6 +434,15 @@ extern int iommu_group_register_notifier(struct iommu_group *group,
>  					 struct notifier_block *nb);
>  extern int iommu_group_unregister_notifier(struct iommu_group *group,
>  					   struct notifier_block *nb);
> +extern int iommu_register_device_fault_handler(struct device *dev,
> +					iommu_dev_fault_handler_t handler,
> +					void *data);
> +
> +extern int iommu_unregister_device_fault_handler(struct device *dev);
> +
> +extern int iommu_report_device_fault(struct device *dev,
> +				     struct iommu_fault_event *evt);
> +
>  extern int iommu_group_id(struct iommu_group *group);
>  extern struct iommu_group *iommu_group_get_for_dev(struct device *dev);
>  extern struct iommu_domain *iommu_group_default_domain(struct iommu_group *);
> @@ -741,6 +751,25 @@ static inline int iommu_group_unregister_notifier(struct iommu_group *group,
>  	return 0;
>  }
>  
> +static inline
> +int iommu_register_device_fault_handler(struct device *dev,
> +					iommu_dev_fault_handler_t handler,
> +					void *data)
> +{
> +	return -ENODEV;
> +}
> +
> +static inline int iommu_unregister_device_fault_handler(struct device *dev)
> +{
> +	return 0;
> +}
> +
> +static inline
> +int iommu_report_device_fault(struct device *dev, struct iommu_fault_event *evt)
> +{
> +	return -ENODEV;
> +}
> +
>  static inline int iommu_group_id(struct iommu_group *group)
>  {
>  	return -ENODEV;



^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH v4 02/22] iommu: Introduce device fault data
  2019-06-09 13:44 ` [PATCH v4 02/22] iommu: Introduce device fault data Jacob Pan
@ 2019-06-18 15:42   ` Jonathan Cameron
  0 siblings, 0 replies; 64+ messages in thread
From: Jonathan Cameron @ 2019-06-18 15:42 UTC (permalink / raw)
  To: Jacob Pan
  Cc: iommu, LKML, Joerg Roedel, David Woodhouse, Eric Auger,
	Alex Williamson, Jean-Philippe Brucker, Yi L, Tian, Kevin,
	Raj Ashok, Liu, Andriy Shevchenko

On Sun, 9 Jun 2019 06:44:02 -0700
Jacob Pan <jacob.jun.pan@linux.intel.com> wrote:

> Device faults detected by the IOMMU can be reported outside the IOMMU
> subsystem for further processing. This patch introduces
> a generic device fault data structure.
> 
> The fault can be either an unrecoverable fault or a page request,
> also referred to as a recoverable fault.
> 
> We only care about non-internal faults, i.e. those that are likely to
> be reported to an external subsystem.
> 
> Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
> Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
> Signed-off-by: Liu, Yi L <yi.l.liu@linux.intel.com>
> Signed-off-by: Ashok Raj <ashok.raj@intel.com>
> Signed-off-by: Eric Auger <eric.auger@redhat.com>

A few trivial nitpicks in here.

Otherwise it looks straightforward and sensible to me.

Jonathan
> ---
>  include/linux/iommu.h      |  44 +++++++++++++++++
>  include/uapi/linux/iommu.h | 118 +++++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 162 insertions(+)
>  create mode 100644 include/uapi/linux/iommu.h
> 
> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index a815cf6..7890a92 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -25,6 +25,7 @@
>  #include <linux/errno.h>
>  #include <linux/err.h>
>  #include <linux/of.h>
> +#include <uapi/linux/iommu.h>
>  
>  #define IOMMU_READ	(1 << 0)
>  #define IOMMU_WRITE	(1 << 1)
> @@ -49,6 +50,7 @@ struct device;
>  struct iommu_domain;
>  struct notifier_block;
>  struct iommu_sva;
> +struct iommu_fault_event;
>  
>  /* iommu fault flags */
>  #define IOMMU_FAULT_READ	0x0
> @@ -58,6 +60,7 @@ typedef int (*iommu_fault_handler_t)(struct iommu_domain *,
>  			struct device *, unsigned long, int, void *);
>  typedef int (*iommu_mm_exit_handler_t)(struct device *dev, struct iommu_sva *,
>  				       void *);
> +typedef int (*iommu_dev_fault_handler_t)(struct iommu_fault_event *, void *);
>  
>  struct iommu_domain_geometry {
>  	dma_addr_t aperture_start; /* First address that can be mapped    */
> @@ -301,6 +304,46 @@ struct iommu_device {
>  	struct device *dev;
>  };
>  
> +/**
> + * struct iommu_fault_event - Generic fault event
> + *
> + * Can represent recoverable faults such as a page requests or
> + * unrecoverable faults such as DMA or IRQ remapping faults.
> + *
> + * @fault: fault descriptor
> + * @iommu_private: used by the IOMMU driver for storing fault-specific
> + *                 data. Users should not modify this field before
> + *                 sending the fault response.
> + */
> +struct iommu_fault_event {
> +	struct iommu_fault fault;
> +	u64 iommu_private;
> +};
> +
> +/**
> + * struct iommu_fault_param - per-device IOMMU fault data
> + * @dev_fault_handler: Callback function to handle IOMMU faults at device level
> + * @data: handler private data
> + *

No need for this blank line.  Seems inconsistent with other docs in here.

Also, there is a docs fixup in patch 3 which should be pulled back to here.

> + */
> +struct iommu_fault_param {
> +	iommu_dev_fault_handler_t handler;
> +	void *data;
> +};
> +
> +/**
> + * struct iommu_param - collection of per-device IOMMU data
> + *
> + * @fault_param: IOMMU detected device fault reporting data
> + *
> + * TODO: migrate other per device data pointers under iommu_dev_data, e.g.
> + *	struct iommu_group	*iommu_group;
> + *	struct iommu_fwspec	*iommu_fwspec;
> + */
> +struct iommu_param {
> +	struct iommu_fault_param *fault_param;
Is it actually worth having the indirection of a pointer here, as opposed
to just embedding the actual structure?  The NULL value is used in places,
but the handler being NULL could do the same job, I think...

It reduces the code needed in patch 3 a bit.

It gets a bit bigger in patch 4, but is still only about 16 bytes.
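
i.e. something like (sketch only):

	struct iommu_param {
		struct iommu_fault_param fault_param;
	};

with fault_param.handler == NULL standing in for the current
"no fault_param allocated" checks.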

> +};
> +
>  int  iommu_device_register(struct iommu_device *iommu);
>  void iommu_device_unregister(struct iommu_device *iommu);
>  int  iommu_device_sysfs_add(struct iommu_device *iommu,
> @@ -504,6 +547,7 @@ struct iommu_ops {};
>  struct iommu_group {};
>  struct iommu_fwspec {};
>  struct iommu_device {};
> +struct iommu_fault_param {};
>  
>  static inline bool iommu_present(struct bus_type *bus)
>  {
> diff --git a/include/uapi/linux/iommu.h b/include/uapi/linux/iommu.h
> new file mode 100644
> index 0000000..aaa3b6a
> --- /dev/null
> +++ b/include/uapi/linux/iommu.h
> @@ -0,0 +1,118 @@
> +/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
> +/*
> + * IOMMU user API definitions
> + */
> +
> +#ifndef _UAPI_IOMMU_H
> +#define _UAPI_IOMMU_H
> +
> +#include <linux/types.h>
> +
> +#define IOMMU_FAULT_PERM_READ	(1 << 0) /* read */
> +#define IOMMU_FAULT_PERM_WRITE	(1 << 1) /* write */
> +#define IOMMU_FAULT_PERM_EXEC	(1 << 2) /* exec */
> +#define IOMMU_FAULT_PERM_PRIV	(1 << 3) /* privileged */
> +
> +/* Generic fault types; can be expanded, e.g. with IRQ remapping faults */
> +enum iommu_fault_type {
> +	IOMMU_FAULT_DMA_UNRECOV = 1,	/* unrecoverable fault */
> +	IOMMU_FAULT_PAGE_REQ,		/* page request fault */
> +};
> +
> +enum iommu_fault_reason {
> +	IOMMU_FAULT_REASON_UNKNOWN = 0,
> +
> +	/* Could not access the PASID table (fetch caused external abort) */
> +	IOMMU_FAULT_REASON_PASID_FETCH,
> +
> +	/* PASID entry is invalid or has configuration errors */
> +	IOMMU_FAULT_REASON_BAD_PASID_ENTRY,
> +
> +	/*
> +	 * PASID is out of range (e.g. exceeds the maximum PASID
> +	 * supported by the IOMMU) or disabled.
> +	 */
> +	IOMMU_FAULT_REASON_PASID_INVALID,
> +
> +	/*
> +	 * An external abort occurred fetching (or updating) a translation
> +	 * table descriptor
> +	 */
> +	IOMMU_FAULT_REASON_WALK_EABT,
> +
> +	/*
> +	 * Could not access the page table entry (Bad address),
> +	 * actual translation fault
> +	 */
> +	IOMMU_FAULT_REASON_PTE_FETCH,
> +
> +	/* Protection flag check failed */
> +	IOMMU_FAULT_REASON_PERMISSION,
> +
> +	/* access flag check failed */

Nitpick: Inconsistent capitalization of first word in comment :)

> +	IOMMU_FAULT_REASON_ACCESS,
> +
> +	/* Output address of a translation stage caused Address Size fault */
> +	IOMMU_FAULT_REASON_OOR_ADDRESS,
> +};
> +
> +/**
> + * struct iommu_fault_unrecoverable - Unrecoverable fault data
> + * @reason: reason of the fault, from &enum iommu_fault_reason
> + * @flags: parameters of this fault (IOMMU_FAULT_UNRECOV_* values)
> + * @pasid: Process Address Space ID
> + * @perm: Requested permission access used by the incoming transaction
> + *        (IOMMU_FAULT_PERM_* values)
> + * @addr: offending page address
> + * @fetch_addr: address that caused a fetch abort, if any
> + */
> +struct iommu_fault_unrecoverable {
> +	__u32	reason;
> +#define IOMMU_FAULT_UNRECOV_PASID_VALID		(1 << 0)
> +#define IOMMU_FAULT_UNRECOV_ADDR_VALID		(1 << 1)
> +#define IOMMU_FAULT_UNRECOV_FETCH_ADDR_VALID	(1 << 2)
> +	__u32	flags;
> +	__u32	pasid;
> +	__u32	perm;
> +	__u64	addr;
> +	__u64	fetch_addr;
> +};
> +
> +/**
> + * struct iommu_fault_page_request - Page Request data
> + * @flags: encodes whether the corresponding fields are valid and whether this
> + *         is the last page in group (IOMMU_FAULT_PAGE_REQUEST_* values)
> + * @pasid: Process Address Space ID
> + * @grpid: Page Request Group Index
> + * @perm: requested page permissions (IOMMU_FAULT_PERM_* values)
> + * @addr: page address
> + * @private_data: device-specific private information
> + */

Slightly surprised to see this given the cover letter says no PRS.
I guess it is part of the original patch rather than intended
for this series.  Not a problem as far as I'm concerned.

> +struct iommu_fault_page_request {
> +#define IOMMU_FAULT_PAGE_REQUEST_PASID_VALID	(1 << 0)
> +#define IOMMU_FAULT_PAGE_REQUEST_LAST_PAGE	(1 << 1)
> +#define IOMMU_FAULT_PAGE_REQUEST_PRIV_DATA	(1 << 2)
> +	__u32	flags;
> +	__u32	pasid;
> +	__u32	grpid;
> +	__u32	perm;
> +	__u64	addr;
> +	__u64	private_data[2];
> +};
> +
> +/**
> + * struct iommu_fault - Generic fault data
> + * @type: fault type from &enum iommu_fault_type
> + * @padding: reserved for future use (should be zero)
> + * @event: Fault event, when @type is %IOMMU_FAULT_DMA_UNRECOV
> + * @prm: Page Request message, when @type is %IOMMU_FAULT_PAGE_REQ
> + */
> +struct iommu_fault {
> +	__u32	type;
> +	__u32	padding;
> +	union {
> +		struct iommu_fault_unrecoverable event;
> +		struct iommu_fault_page_request prm;
> +	};
> +};
> +#endif /* _UAPI_IOMMU_H */



^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH v4 04/22] iommu: Add recoverable fault reporting
  2019-06-09 13:44 ` [PATCH v4 04/22] iommu: Add recoverable fault reporting Jacob Pan
@ 2019-06-18 15:44   ` Jonathan Cameron
  0 siblings, 0 replies; 64+ messages in thread
From: Jonathan Cameron @ 2019-06-18 15:44 UTC (permalink / raw)
  To: Jacob Pan
  Cc: iommu, LKML, Joerg Roedel, David Woodhouse, Eric Auger,
	Alex Williamson, Jean-Philippe Brucker, Tian, Kevin, Raj Ashok,
	Andriy Shevchenko

On Sun, 9 Jun 2019 06:44:04 -0700
Jacob Pan <jacob.jun.pan@linux.intel.com> wrote:

> From: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
> 
> Some IOMMU hardware features, for example PCI's PRI and Arm SMMU's Stall,
> enable recoverable I/O page faults. Allow IOMMU drivers to report PRI Page
> Requests and Stall events through the new fault reporting API. The
> consumer of the fault can be either an I/O page fault handler in the host,
> or a guest OS.
> 
> Once handled, the fault must be completed by sending a page response back
> to the IOMMU. Add an iommu_page_response() function to complete a page
> fault.
> 
> Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
> Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
One totally trivial comment on docs ordering in here.  Otherwise good.
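
For reference, the completion flow from a handler's point of view seems
sensible; roughly (a sketch only, assuming the device pointer is stashed
as the handler's private data):

	static int my_handler(struct iommu_fault_event *evt, void *data)
	{
		struct device *dev = data;
		struct page_response_msg msg = {
			.addr		= evt->fault.prm.addr,
			.pasid		= evt->fault.prm.pasid,
			.pasid_present	= 1,
			.grpid		= evt->fault.prm.grpid,
			.resp_code	= IOMMU_PAGE_RESP_SUCCESS,
		};

		/* ... resolve the fault, then ... */
		return iommu_page_response(dev, &msg);
	}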

Jonathan
> ---
>  drivers/iommu/iommu.c | 77 ++++++++++++++++++++++++++++++++++++++++++++++++++-
>  include/linux/iommu.h | 51 ++++++++++++++++++++++++++++++++++
>  2 files changed, 127 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index 7955184..13b301c 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -869,7 +869,14 @@ EXPORT_SYMBOL_GPL(iommu_group_unregister_notifier);
>   * @data: private data passed as argument to the handler
>   *
>   * When an IOMMU fault event is received, this handler gets called with the
> - * fault event and data as argument.
> + * fault event and data as argument. The handler should return 0 on success. If
> + * the fault is recoverable (IOMMU_FAULT_PAGE_REQ), the handler should also
> + * complete the fault by calling iommu_page_response() with one of the following
> + * response code:
> + * - IOMMU_PAGE_RESP_SUCCESS: retry the translation
> + * - IOMMU_PAGE_RESP_INVALID: terminate the fault
> + * - IOMMU_PAGE_RESP_FAILURE: terminate the fault and stop reporting
> + *   page faults if possible.
>   *
>   * Return 0 if the fault handler was installed successfully, or an error.
>   */
> @@ -904,6 +911,8 @@ int iommu_register_device_fault_handler(struct device *dev,
>  	}
>  	param->fault_param->handler = handler;
>  	param->fault_param->data = data;
> +	mutex_init(&param->fault_param->lock);
> +	INIT_LIST_HEAD(&param->fault_param->faults);
>  
>  done_unlock:
>  	mutex_unlock(&param->lock);
> @@ -934,6 +943,12 @@ int iommu_unregister_device_fault_handler(struct device *dev)
>  	if (!param->fault_param)
>  		goto unlock;
>  
> +	/* we cannot unregister handler if there are pending faults */
> +	if (!list_empty(&param->fault_param->faults)) {
> +		ret = -EBUSY;
> +		goto unlock;
> +	}
> +
>  	kfree(param->fault_param);
>  	param->fault_param = NULL;
>  	put_device(dev);
> @@ -958,6 +973,7 @@ EXPORT_SYMBOL_GPL(iommu_unregister_device_fault_handler);
>  int iommu_report_device_fault(struct device *dev, struct iommu_fault_event *evt)
>  {
>  	struct iommu_param *param = dev->iommu_param;
> +	struct iommu_fault_event *evt_pending;
>  	struct iommu_fault_param *fparam;
>  	int ret = 0;
>  
> @@ -972,6 +988,20 @@ int iommu_report_device_fault(struct device *dev, struct iommu_fault_event *evt)
>  		ret = -EINVAL;
>  		goto done_unlock;
>  	}
> +
> +	if (evt->fault.type == IOMMU_FAULT_PAGE_REQ &&
> +	    (evt->fault.prm.flags & IOMMU_FAULT_PAGE_REQUEST_LAST_PAGE)) {
> +		evt_pending = kmemdup(evt, sizeof(struct iommu_fault_event),
> +				      GFP_KERNEL);
> +		if (!evt_pending) {
> +			ret = -ENOMEM;
> +			goto done_unlock;
> +		}
> +		mutex_lock(&fparam->lock);
> +		list_add_tail(&evt_pending->list, &fparam->faults);
> +		mutex_unlock(&fparam->lock);
> +	}
> +
>  	ret = fparam->handler(evt, fparam->data);
>  done_unlock:
>  	mutex_unlock(&param->lock);
> @@ -1513,6 +1543,51 @@ int iommu_attach_device(struct iommu_domain *domain, struct device *dev)
>  }
>  EXPORT_SYMBOL_GPL(iommu_attach_device);
>  
> +int iommu_page_response(struct device *dev,
> +			struct page_response_msg *msg)
> +{
> +	struct iommu_param *param = dev->iommu_param;
> +	int ret = -EINVAL;
> +	struct iommu_fault_event *evt;
> +	struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
> +
> +	if (!domain || !domain->ops->page_response)
> +		return -ENODEV;
> +
> +	/*
> +	 * Device iommu_param should have been allocated when device is
> +	 * added to its iommu_group.
> +	 */
> +	if (!param || !param->fault_param)
> +		return -EINVAL;
> +
> +	/* Only send response if there is a fault report pending */
> +	mutex_lock(&param->fault_param->lock);
> +	if (list_empty(&param->fault_param->faults)) {
> +		pr_warn("no pending PRQ, drop response\n");
> +		goto done_unlock;
> +	}
> +	/*
> +	 * Check if we have a matching page request pending to respond,
> +	 * otherwise return -EINVAL
> +	 */
> +	list_for_each_entry(evt, &param->fault_param->faults, list) {
> +		if (evt->fault.prm.pasid == msg->pasid &&
> +		    evt->fault.prm.grpid == msg->grpid) {
> +			msg->iommu_data = evt->iommu_private;
> +			ret = domain->ops->page_response(dev, msg);
> +			list_del(&evt->list);
> +			kfree(evt);
> +			break;
> +		}
> +	}
> +
> +done_unlock:
> +	mutex_unlock(&param->fault_param->lock);
> +	return ret;
> +}
> +EXPORT_SYMBOL_GPL(iommu_page_response);
> +
>  static void __iommu_detach_device(struct iommu_domain *domain,
>  				  struct device *dev)
>  {
> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index b87b74c..950347b 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -192,6 +192,42 @@ struct iommu_sva_ops {
>  #ifdef CONFIG_IOMMU_API
>  
>  /**
> + * enum page_response_code - Return status of fault handlers, telling the IOMMU
> + * driver how to proceed with the fault.
> + *
> + * @IOMMU_PAGE_RESP_SUCCESS: Fault has been handled and the page tables
> + *	populated, retry the access. This is "Success" in PCI PRI.
> + * @IOMMU_PAGE_RESP_FAILURE: General error. Drop all subsequent faults from
> + *	this device if possible. This is "Response Failure" in PCI PRI.
> + * @IOMMU_PAGE_RESP_INVALID: Could not handle this fault, don't retry the
> + *	access. This is "Invalid Request" in PCI PRI.
> + */
> +enum page_response_code {
> +	IOMMU_PAGE_RESP_SUCCESS = 0,
> +	IOMMU_PAGE_RESP_INVALID,
> +	IOMMU_PAGE_RESP_FAILURE,
> +};
> +
> +/**
> + * struct page_response_msg - Generic page response information based on PCI ATS
> + *                            and PASID spec
> + * @addr: servicing page address
> + * @pasid: contains process address space ID
> + * @pasid_present: the @pasid field is valid
> + * @resp_code: response code
> + * @grpid: page request group index
> + * @iommu_data: data private to the IOMMU
> + */
> +struct page_response_msg {
> +	u64 addr;
> +	u32 pasid;
> +	u32 pasid_present:1;
> +	enum page_response_code resp_code;
> +	u32 grpid;
> +	u64 iommu_data;
> +};
> +
> +/**
>   * struct iommu_ops - iommu ops and capabilities
>   * @capable: check capability
>   * @domain_alloc: allocate iommu domain
> @@ -227,6 +263,7 @@ struct iommu_sva_ops {
>   * @sva_bind: Bind process address space to device
>   * @sva_unbind: Unbind process address space from device
>   * @sva_get_pasid: Get PASID associated to a SVA handle
> + * @page_response: handle page request response
>   * @pgsize_bitmap: bitmap of all possible supported page sizes
>   */
>  struct iommu_ops {
> @@ -287,6 +324,8 @@ struct iommu_ops {
>  	void (*sva_unbind)(struct iommu_sva *handle);
>  	int (*sva_get_pasid)(struct iommu_sva *handle);
>  
> +	int (*page_response)(struct device *dev, struct page_response_msg *msg);
> +
>  	unsigned long pgsize_bitmap;
>  };
>  
> @@ -311,11 +350,13 @@ struct iommu_device {
>   * unrecoverable faults such as DMA or IRQ remapping faults.
>   *
>   * @fault: fault descriptor
> + * @list: pending fault event list, used for tracking responses

Ordering of docs vs structure is wrong.

>   * @iommu_private: used by the IOMMU driver for storing fault-specific
>   *                 data. Users should not modify this field before
>   *                 sending the fault response.
>   */
>  struct iommu_fault_event {
> +	struct list_head list;
>  	struct iommu_fault fault;
>  	u64 iommu_private;
>  };
> @@ -325,10 +366,14 @@ struct iommu_fault_event {
>   *
>   * @handler: Callback function to handle IOMMU faults at device level
>   * @data: handler private data
> + * @faults: holds the pending faults which need a response, e.g. a page response.
> + * @lock: protect pending faults list
>   */
>  struct iommu_fault_param {
>  	iommu_dev_fault_handler_t handler;
>  	void *data;
> +	struct list_head faults;
> +	struct mutex lock;
>  };
>  
>  /**
> @@ -443,6 +488,7 @@ extern int iommu_unregister_device_fault_handler(struct device *dev);
>  extern int iommu_report_device_fault(struct device *dev,
>  				     struct iommu_fault_event *evt);
>  
> +extern int iommu_page_response(struct device *dev, struct page_response_msg *msg);
>  extern int iommu_group_id(struct iommu_group *group);
>  extern struct iommu_group *iommu_group_get_for_dev(struct device *dev);
>  extern struct iommu_domain *iommu_group_default_domain(struct iommu_group *);
> @@ -770,6 +816,11 @@ int iommu_report_device_fault(struct device *dev, struct iommu_fault_event *evt)
>  	return -ENODEV;
>  }
>  
> +static inline int iommu_page_response(struct device *dev, struct page_response_msg *msg)
> +{
> +	return -ENODEV;
> +}
> +
>  static inline int iommu_group_id(struct iommu_group *group)
>  {
>  	return -ENODEV;



^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH v4 15/22] iommu/vt-d: Replace Intel specific PASID allocator with IOASID
  2019-06-09 13:44 ` [PATCH v4 15/22] iommu/vt-d: Replace Intel specific PASID allocator with IOASID Jacob Pan
@ 2019-06-18 15:57   ` Jonathan Cameron
  2019-06-24 21:36     ` Jacob Pan
  2019-06-27  1:53   ` Lu Baolu
  1 sibling, 1 reply; 64+ messages in thread
From: Jonathan Cameron @ 2019-06-18 15:57 UTC (permalink / raw)
  To: Jacob Pan
  Cc: iommu, LKML, Joerg Roedel, David Woodhouse, Eric Auger,
	Alex Williamson, Jean-Philippe Brucker, Tian, Kevin, Raj Ashok,
	Andriy Shevchenko

On Sun, 9 Jun 2019 06:44:15 -0700
Jacob Pan <jacob.jun.pan@linux.intel.com> wrote:

> Make use of the generic IOASID code to manage PASID allocation,
> freeing, and lookup, replacing the Intel-specific code.
> 
> Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Hi Jacob,

One question inline.

Jonathan

> ---
>  drivers/iommu/intel-iommu.c | 11 +++++------
>  drivers/iommu/intel-pasid.c | 36 ------------------------------------
>  drivers/iommu/intel-svm.c   | 37 +++++++++++++++++++++----------------
>  3 files changed, 26 insertions(+), 58 deletions(-)
> 
> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
> index 5b84994..39b63fe 100644
> --- a/drivers/iommu/intel-iommu.c
> +++ b/drivers/iommu/intel-iommu.c
> @@ -5167,7 +5167,7 @@ static void auxiliary_unlink_device(struct dmar_domain *domain,
>  	domain->auxd_refcnt--;
>  
>  	if (!domain->auxd_refcnt && domain->default_pasid > 0)
> -		intel_pasid_free_id(domain->default_pasid);
> +		ioasid_free(domain->default_pasid);
>  }
>  
>  static int aux_domain_add_dev(struct dmar_domain *domain,
> @@ -5185,10 +5185,9 @@ static int aux_domain_add_dev(struct dmar_domain *domain,
>  	if (domain->default_pasid <= 0) {
>  		int pasid;
>  
> -		pasid = intel_pasid_alloc_id(domain, PASID_MIN,
> -					     pci_max_pasids(to_pci_dev(dev)),
> -					     GFP_KERNEL);
> -		if (pasid <= 0) {
> +		pasid = ioasid_alloc(NULL, PASID_MIN, pci_max_pasids(to_pci_dev(dev)) - 1,
> +				domain);

Is there any point in passing the domain in as the private pointer here?
I can't immediately see anywhere it is read back.

It is also rather confusing, as the same driver stashes two different
types of data in the same xarray.
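
If nothing relies on looking the domain back up, maybe just:

	pasid = ioasid_alloc(NULL, PASID_MIN,
			     pci_max_pasids(to_pci_dev(dev)) - 1, NULL);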

> +		if (pasid == INVALID_IOASID) {
>  			pr_err("Can't allocate default pasid\n");
>  			return -ENODEV;
>  		}
> @@ -5224,7 +5223,7 @@ static int aux_domain_add_dev(struct dmar_domain *domain,
>  	spin_unlock(&iommu->lock);
>  	spin_unlock_irqrestore(&device_domain_lock, flags);
>  	if (!domain->auxd_refcnt && domain->default_pasid > 0)
> -		intel_pasid_free_id(domain->default_pasid);
> +		ioasid_free(domain->default_pasid);
>  
>  	return ret;
>  }
> diff --git a/drivers/iommu/intel-pasid.c b/drivers/iommu/intel-pasid.c
> index 69fddd3..1e25539 100644
> --- a/drivers/iommu/intel-pasid.c
> +++ b/drivers/iommu/intel-pasid.c
> @@ -26,42 +26,6 @@
>   */
>  static DEFINE_SPINLOCK(pasid_lock);
>  u32 intel_pasid_max_id = PASID_MAX;
> -static DEFINE_IDR(pasid_idr);
> -
> -int intel_pasid_alloc_id(void *ptr, int start, int end, gfp_t gfp)
> -{
> -	int ret, min, max;
> -
> -	min = max_t(int, start, PASID_MIN);
> -	max = min_t(int, end, intel_pasid_max_id);
> -
> -	WARN_ON(in_interrupt());
> -	idr_preload(gfp);
> -	spin_lock(&pasid_lock);
> -	ret = idr_alloc(&pasid_idr, ptr, min, max, GFP_ATOMIC);
> -	spin_unlock(&pasid_lock);
> -	idr_preload_end();
> -
> -	return ret;
> -}
> -
> -void intel_pasid_free_id(int pasid)
> -{
> -	spin_lock(&pasid_lock);
> -	idr_remove(&pasid_idr, pasid);
> -	spin_unlock(&pasid_lock);
> -}
> -
> -void *intel_pasid_lookup_id(int pasid)
> -{
> -	void *p;
> -
> -	spin_lock(&pasid_lock);
> -	p = idr_find(&pasid_idr, pasid);
> -	spin_unlock(&pasid_lock);
> -
> -	return p;
> -}
>  
>  int vcmd_alloc_pasid(struct intel_iommu *iommu, unsigned int *pasid)
>  {
> diff --git a/drivers/iommu/intel-svm.c b/drivers/iommu/intel-svm.c
> index 8f87304..9cbcc1f 100644
> --- a/drivers/iommu/intel-svm.c
> +++ b/drivers/iommu/intel-svm.c
> @@ -25,6 +25,7 @@
>  #include <linux/dmar.h>
>  #include <linux/interrupt.h>
>  #include <linux/mm_types.h>
> +#include <linux/ioasid.h>
>  #include <asm/page.h>
>  
>  #include "intel-pasid.h"
> @@ -332,16 +333,15 @@ int intel_svm_bind_mm(struct device *dev, int *pasid, int flags, struct svm_dev_
>  		if (pasid_max > intel_pasid_max_id)
>  			pasid_max = intel_pasid_max_id;
>  
> -		/* Do not use PASID 0 in caching mode (virtualised IOMMU) */
> -		ret = intel_pasid_alloc_id(svm,
> -					   !!cap_caching_mode(iommu->cap),
> -					   pasid_max - 1, GFP_KERNEL);
> -		if (ret < 0) {
> +		/* Do not use PASID 0, reserved for RID to PASID */
> +		svm->pasid = ioasid_alloc(NULL, PASID_MIN,
> +					pasid_max - 1, svm);
> +		if (svm->pasid == INVALID_IOASID) {
>  			kfree(svm);
>  			kfree(sdev);
> +			ret = -ENOSPC;
>  			goto out;
>  		}
> -		svm->pasid = ret;
>  		svm->notifier.ops = &intel_mmuops;
>  		svm->mm = mm;
>  		svm->flags = flags;
> @@ -351,7 +351,7 @@ int intel_svm_bind_mm(struct device *dev, int *pasid, int flags, struct svm_dev_
>  		if (mm) {
>  			ret = mmu_notifier_register(&svm->notifier, mm);
>  			if (ret) {
> -				intel_pasid_free_id(svm->pasid);
> +				ioasid_free(svm->pasid);
>  				kfree(svm);
>  				kfree(sdev);
>  				goto out;
> @@ -367,7 +367,7 @@ int intel_svm_bind_mm(struct device *dev, int *pasid, int flags, struct svm_dev_
>  		if (ret) {
>  			if (mm)
>  				mmu_notifier_unregister(&svm->notifier, mm);
> -			intel_pasid_free_id(svm->pasid);
> +			ioasid_free(svm->pasid);
>  			kfree(svm);
>  			kfree(sdev);
>  			goto out;
> @@ -400,7 +400,12 @@ int intel_svm_unbind_mm(struct device *dev, int pasid)
>  	if (!iommu)
>  		goto out;
>  
> -	svm = intel_pasid_lookup_id(pasid);
> +	svm = ioasid_find(NULL, pasid, NULL);
> +	if (IS_ERR(svm)) {
> +		ret = PTR_ERR(svm);
> +		goto out;
> +	}
> +
>  	if (!svm)
>  		goto out;
>  
> @@ -422,7 +427,7 @@ int intel_svm_unbind_mm(struct device *dev, int pasid)
>  				kfree_rcu(sdev, rcu);
>  
>  				if (list_empty(&svm->devs)) {
> -					intel_pasid_free_id(svm->pasid);
> +					ioasid_free(svm->pasid);
>  					if (svm->mm)
>  						mmu_notifier_unregister(&svm->notifier, svm->mm);
>  
> @@ -457,10 +462,11 @@ int intel_svm_is_pasid_valid(struct device *dev, int pasid)
>  	if (!iommu)
>  		goto out;
>  
> -	svm = intel_pasid_lookup_id(pasid);
> -	if (!svm)
> +	svm = ioasid_find(NULL, pasid, NULL);
> +	if (IS_ERR(svm)) {
> +		ret = PTR_ERR(svm);
>  		goto out;
> -
> +	}
>  	/* init_mm is used in this case */
>  	if (!svm->mm)
>  		ret = 1;
> @@ -567,13 +573,12 @@ static irqreturn_t prq_event_thread(int irq, void *d)
>  
>  		if (!svm || svm->pasid != req->pasid) {
>  			rcu_read_lock();
> -			svm = intel_pasid_lookup_id(req->pasid);
> +			svm = ioasid_find(NULL, req->pasid, NULL);
>  			/* It *can't* go away, because the driver is not permitted
>  			 * to unbind the mm while any page faults are outstanding.
>  			 * So we only need RCU to protect the internal idr code. */
>  			rcu_read_unlock();
> -
> -			if (!svm) {
> +			if (IS_ERR(svm) || !svm) {
>  				pr_err("%s: Page request for invalid PASID %d: %08llx %08llx\n",
>  				       iommu->name, req->pasid, ((unsigned long long *)req)[0],
>  				       ((unsigned long long *)req)[1]);



^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH v4 17/22] iommu/vt-d: Avoid duplicated code for PASID setup
  2019-06-09 13:44 ` [PATCH v4 17/22] iommu/vt-d: Avoid duplicated code for PASID setup Jacob Pan
@ 2019-06-18 16:03   ` Jonathan Cameron
  2019-06-24 23:44     ` Jacob Pan
  2019-07-16  9:52   ` Auger Eric
  1 sibling, 1 reply; 64+ messages in thread
From: Jonathan Cameron @ 2019-06-18 16:03 UTC (permalink / raw)
  To: Jacob Pan
  Cc: iommu, LKML, Joerg Roedel, David Woodhouse, Eric Auger,
	Alex Williamson, Jean-Philippe Brucker, Tian, Kevin, Raj Ashok,
	Andriy Shevchenko

On Sun, 9 Jun 2019 06:44:17 -0700
Jacob Pan <jacob.jun.pan@linux.intel.com> wrote:

> After each PASID entry setup, the related translation caches must be
> flushed. We can combine the duplicated code into one function, which
> is less error-prone.
> 
> Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Formatting nitpick below ;)

Otherwise it's cut and paste
> ---
>  drivers/iommu/intel-pasid.c | 48 +++++++++++++++++----------------------------
>  1 file changed, 18 insertions(+), 30 deletions(-)
> 
> diff --git a/drivers/iommu/intel-pasid.c b/drivers/iommu/intel-pasid.c
> index 1e25539..1ff2ecc 100644
> --- a/drivers/iommu/intel-pasid.c
> +++ b/drivers/iommu/intel-pasid.c
> @@ -522,6 +522,21 @@ void intel_pasid_tear_down_entry(struct intel_iommu *iommu,
>  		devtlb_invalidation_with_pasid(iommu, dev, pasid);
>  }
>  
> +static inline void pasid_flush_caches(struct intel_iommu *iommu,
> +				struct pasid_entry *pte,
> +				int pasid, u16 did)
> +{
> +	if (!ecap_coherent(iommu->ecap))
> +		clflush_cache_range(pte, sizeof(*pte));
> +
> +	if (cap_caching_mode(iommu->cap)) {
> +		pasid_cache_invalidation_with_pasid(iommu, did, pasid);
> +		iotlb_invalidation_with_pasid(iommu, did, pasid);
> +	} else
> +		iommu_flush_write_buffer(iommu);

I have some vague recollection that kernel style says to use braces around
single-line blocks if other blocks in the same if/else chain have multiple
lines.

I checked, and this case is specifically called out:

https://www.kernel.org/doc/html/v5.1/process/coding-style.html
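
So here that would be:

	if (cap_caching_mode(iommu->cap)) {
		pasid_cache_invalidation_with_pasid(iommu, did, pasid);
		iotlb_invalidation_with_pasid(iommu, did, pasid);
	} else {
		iommu_flush_write_buffer(iommu);
	}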
> +
This blank line doesn't add anything either ;)
> +}
> +
>  /*
>   * Set up the scalable mode pasid table entry for first only
>   * translation type.
> @@ -567,16 +582,7 @@ int intel_pasid_setup_first_level(struct intel_iommu *iommu,
>  	/* Setup Present and PASID Granular Transfer Type: */
>  	pasid_set_translation_type(pte, 1);
>  	pasid_set_present(pte);
> -
> -	if (!ecap_coherent(iommu->ecap))
> -		clflush_cache_range(pte, sizeof(*pte));
> -
> -	if (cap_caching_mode(iommu->cap)) {
> -		pasid_cache_invalidation_with_pasid(iommu, did, pasid);
> -		iotlb_invalidation_with_pasid(iommu, did, pasid);
> -	} else {
> -		iommu_flush_write_buffer(iommu);
> -	}
> +	pasid_flush_caches(iommu, pte, pasid, did);
>  
>  	return 0;
>  }
> @@ -640,16 +646,7 @@ int intel_pasid_setup_second_level(struct intel_iommu *iommu,
>  	 */
>  	pasid_set_sre(pte);
>  	pasid_set_present(pte);
> -
> -	if (!ecap_coherent(iommu->ecap))
> -		clflush_cache_range(pte, sizeof(*pte));
> -
> -	if (cap_caching_mode(iommu->cap)) {
> -		pasid_cache_invalidation_with_pasid(iommu, did, pasid);
> -		iotlb_invalidation_with_pasid(iommu, did, pasid);
> -	} else {
> -		iommu_flush_write_buffer(iommu);
> -	}
> +	pasid_flush_caches(iommu, pte, pasid, did);
>  
>  	return 0;
>  }
> @@ -683,16 +680,7 @@ int intel_pasid_setup_pass_through(struct intel_iommu *iommu,
>  	 */
>  	pasid_set_sre(pte);
>  	pasid_set_present(pte);
> -
> -	if (!ecap_coherent(iommu->ecap))
> -		clflush_cache_range(pte, sizeof(*pte));
> -
> -	if (cap_caching_mode(iommu->cap)) {
> -		pasid_cache_invalidation_with_pasid(iommu, did, pasid);
> -		iotlb_invalidation_with_pasid(iommu, did, pasid);
> -	} else {
> -		iommu_flush_write_buffer(iommu);
> -	}
> +	pasid_flush_caches(iommu, pte, pasid, did);
>  
>  	return 0;
>  }



^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH v4 19/22] iommu/vt-d: Clean up for SVM device list
  2019-06-09 13:44 ` [PATCH v4 19/22] iommu/vt-d: Clean up for SVM device list Jacob Pan
@ 2019-06-18 16:42   ` Jonathan Cameron
  2019-06-24 23:59     ` Jacob Pan
  0 siblings, 1 reply; 64+ messages in thread
From: Jonathan Cameron @ 2019-06-18 16:42 UTC (permalink / raw)
  To: Jacob Pan
  Cc: iommu, LKML, Joerg Roedel, David Woodhouse, Eric Auger,
	Alex Williamson, Jean-Philippe Brucker, Tian, Kevin, Raj Ashok,
	Andriy Shevchenko

On Sun, 9 Jun 2019 06:44:19 -0700
Jacob Pan <jacob.jun.pan@linux.intel.com> wrote:

> Use the combined macro for_each_svm_dev() to simplify SVM device iteration.
> 
> Suggested-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
> Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
> Reviewed-by: Eric Auger <eric.auger@redhat.com>
> ---
>  drivers/iommu/intel-svm.c | 79 +++++++++++++++++++++++------------------------
>  1 file changed, 39 insertions(+), 40 deletions(-)
> 
> diff --git a/drivers/iommu/intel-svm.c b/drivers/iommu/intel-svm.c
> index 9cbcc1f..66d98e1 100644
> --- a/drivers/iommu/intel-svm.c
> +++ b/drivers/iommu/intel-svm.c
> @@ -225,6 +225,9 @@ static const struct mmu_notifier_ops intel_mmuops = {
>  
>  static DEFINE_MUTEX(pasid_mutex);
>  static LIST_HEAD(global_svm_list);
> +#define for_each_svm_dev() \
> +	list_for_each_entry(sdev, &svm->devs, list)	\
> +	if (dev == sdev->dev)				\

Could we make this macro less opaque and have it take the svm and dev as
arguments?
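
e.g. something like (untested):

#define for_each_svm_dev(sdev, svm, d)			\
	list_for_each_entry((sdev), &(svm)->devs, list)	\
		if ((d) != (sdev)->dev) {} else

which keeps the iteration compact at the call sites while making the
inputs explicit.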

>  
>  int intel_svm_bind_mm(struct device *dev, int *pasid, int flags, struct svm_dev_ops *ops)
>  {
> @@ -271,15 +274,13 @@ int intel_svm_bind_mm(struct device *dev, int *pasid, int flags, struct svm_dev_
>  				goto out;
>  			}
>  
> -			list_for_each_entry(sdev, &svm->devs, list) {
> -				if (dev == sdev->dev) {
> -					if (sdev->ops != ops) {
> -						ret = -EBUSY;
> -						goto out;
> -					}
> -					sdev->users++;
> -					goto success;
> +			for_each_svm_dev() {
> +				if (sdev->ops != ops) {
> +					ret = -EBUSY;
> +					goto out;
>  				}
> +				sdev->users++;
> +				goto success;
>  			}
>  
>  			break;
> @@ -409,40 +410,38 @@ int intel_svm_unbind_mm(struct device *dev, int pasid)
>  	if (!svm)
>  		goto out;
>  
> -	list_for_each_entry(sdev, &svm->devs, list) {
> -		if (dev == sdev->dev) {
> -			ret = 0;
> -			sdev->users--;
> -			if (!sdev->users) {
> -				list_del_rcu(&sdev->list);
> -				/* Flush the PASID cache and IOTLB for this device.
> -				 * Note that we do depend on the hardware *not* using
> -				 * the PASID any more. Just as we depend on other
> -				 * devices never using PASIDs that they have no right
> -				 * to use. We have a *shared* PASID table, because it's
> -				 * large and has to be physically contiguous. So it's
> -				 * hard to be as defensive as we might like. */
> -				intel_pasid_tear_down_entry(iommu, dev, svm->pasid);
> -				intel_flush_svm_range_dev(svm, sdev, 0, -1, 0, !svm->mm);
> -				kfree_rcu(sdev, rcu);
> -
> -				if (list_empty(&svm->devs)) {
> -					ioasid_free(svm->pasid);
> -					if (svm->mm)
> -						mmu_notifier_unregister(&svm->notifier, svm->mm);
> -
> -					list_del(&svm->list);
> -
> -					/* We mandate that no page faults may be outstanding
> -					 * for the PASID when intel_svm_unbind_mm() is called.
> -					 * If that is not obeyed, subtle errors will happen.
> -					 * Let's make them less subtle... */
> -					memset(svm, 0x6b, sizeof(*svm));
> -					kfree(svm);
> -				}
> +	for_each_svm_dev() {
> +		ret = 0;
> +		sdev->users--;
> +		if (!sdev->users) {
> +			list_del_rcu(&sdev->list);
> +			/* Flush the PASID cache and IOTLB for this device.
> +			 * Note that we do depend on the hardware *not* using
> +			 * the PASID any more. Just as we depend on other
> +			 * devices never using PASIDs that they have no right
> +			 * to use. We have a *shared* PASID table, because it's
> +			 * large and has to be physically contiguous. So it's
> +			 * hard to be as defensive as we might like. */
> +			intel_pasid_tear_down_entry(iommu, dev, svm->pasid);
> +			intel_flush_svm_range_dev(svm, sdev, 0, -1, 0, !svm->mm);
> +			kfree_rcu(sdev, rcu);
> +
> +			if (list_empty(&svm->devs)) {
> +				ioasid_free(svm->pasid);
> +				if (svm->mm)
> +					mmu_notifier_unregister(&svm->notifier, svm->mm);
> +
> +				list_del(&svm->list);
> +
> +				/* We mandate that no page faults may be outstanding
> +				 * for the PASID when intel_svm_unbind_mm() is called.
> +				 * If that is not obeyed, subtle errors will happen.
> +				 * Let's make them less subtle... */
> +				memset(svm, 0x6b, sizeof(*svm));
> +				kfree(svm);
>  			}
> -			break;
>  		}
> +		break;
>  	}
>   out:
>  	mutex_unlock(&pasid_mutex);



^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH v4 20/22] iommu/vt-d: Add bind guest PASID support
  2019-06-09 13:44 ` [PATCH v4 20/22] iommu/vt-d: Add bind guest PASID support Jacob Pan
@ 2019-06-18 16:44   ` Jonathan Cameron
  2019-06-24 22:41     ` Jacob Pan
  2019-06-27  2:50   ` Lu Baolu
  2019-07-16 16:45   ` Auger Eric
  2 siblings, 1 reply; 64+ messages in thread
From: Jonathan Cameron @ 2019-06-18 16:44 UTC (permalink / raw)
  To: Jacob Pan
  Cc: iommu, LKML, Joerg Roedel, David Woodhouse, Eric Auger,
	Alex Williamson, Jean-Philippe Brucker, Yi L, Tian, Kevin,
	Raj Ashok, Liu, Andriy Shevchenko

On Sun, 9 Jun 2019 06:44:20 -0700
Jacob Pan <jacob.jun.pan@linux.intel.com> wrote:

> When supporting guest SVA with an emulated IOMMU, the guest PASID
> table is shadowed in the VMM. Updates to the guest vIOMMU PASID table
> result in PASID cache flushes, which are passed down to
> the host as bind guest PASID calls.
> 
> The SL page tables are harvested from the device's
> default domain (request w/o PASID), or from the aux domain in the
> case of a mediated device.
> 
>     .-------------.  .---------------------------.
>     |   vIOMMU    |  | Guest process CR3, FL only|
>     |             |  '---------------------------'
>     .----------------/
>     | PASID Entry |--- PASID cache flush -
>     '-------------'                       |
>     |             |                       V
>     |             |                CR3 in GPA
>     '-------------'
> Guest
> ------| Shadow |--------------------------|--------
>       v        v                          v
> Host
>     .-------------.  .----------------------.
>     |   pIOMMU    |  | Bind FL for GVA-GPA  |
>     |             |  '----------------------'
>     .----------------/  |
>     | PASID Entry |     V (Nested xlate)
>     '----------------\.------------------------------.
>     |             |   |SL for GPA-HPA, default domain|
>     |             |   '------------------------------'
>     '-------------'
> Where:
>  - FL = First level/stage one page tables
>  - SL = Second level/stage two page tables
> 
> Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
> Signed-off-by: Liu, Yi L <yi.l.liu@linux.intel.com>


A few trivial bits inline.  As far as I can tell this looks good, but I'm
not that familiar with the hardware.

Jonathan

> ---
>  drivers/iommu/intel-iommu.c |   4 +
>  drivers/iommu/intel-svm.c   | 187 ++++++++++++++++++++++++++++++++++++++++++++
>  include/linux/intel-iommu.h |  13 ++-
>  include/linux/intel-svm.h   |  17 ++++
>  4 files changed, 219 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
> index 7cfa0eb..3b4d712 100644
> --- a/drivers/iommu/intel-iommu.c
> +++ b/drivers/iommu/intel-iommu.c
> @@ -5782,6 +5782,10 @@ const struct iommu_ops intel_iommu_ops = {
>  	.dev_enable_feat	= intel_iommu_dev_enable_feat,
>  	.dev_disable_feat	= intel_iommu_dev_disable_feat,
>  	.pgsize_bitmap		= INTEL_IOMMU_PGSIZES,
> +#ifdef CONFIG_INTEL_IOMMU_SVM
> +	.sva_bind_gpasid	= intel_svm_bind_gpasid,
> +	.sva_unbind_gpasid	= intel_svm_unbind_gpasid,
> +#endif
>  };
>  
>  static void quirk_iommu_g4x_gfx(struct pci_dev *dev)
> diff --git a/drivers/iommu/intel-svm.c b/drivers/iommu/intel-svm.c
> index 66d98e1..f06a82f 100644
> --- a/drivers/iommu/intel-svm.c
> +++ b/drivers/iommu/intel-svm.c
> @@ -229,6 +229,193 @@ static LIST_HEAD(global_svm_list);
>  	list_for_each_entry(sdev, &svm->devs, list)	\
>  	if (dev == sdev->dev)				\
>  
> +int intel_svm_bind_gpasid(struct iommu_domain *domain,
> +			struct device *dev,
> +			struct gpasid_bind_data *data)
> +{
> +	struct intel_iommu *iommu = intel_svm_device_to_iommu(dev);
> +	struct intel_svm_dev *sdev;
> +	struct intel_svm *svm = NULL;
I think this is set in all the paths that use it.

> +	struct dmar_domain *ddomain;
> +	int ret = 0;
> +
> +	if (WARN_ON(!iommu) || !data)
> +		return -EINVAL;
> +
> +	if (data->version != IOMMU_GPASID_BIND_VERSION_1 ||
> +		data->format != IOMMU_PASID_FORMAT_INTEL_VTD)
> +		return -EINVAL;
> +
> +	if (dev_is_pci(dev)) {
> +		/* VT-d supports devices with full 20 bit PASIDs only */
> +		if (pci_max_pasids(to_pci_dev(dev)) != PASID_MAX)
> +			return -EINVAL;
> +	}
> +
> +	/*
> +	 * We only check the host PASID range; we have no knowledge to check
> +	 * the guest PASID range, nor do we use the guest PASID.
> +	 */
> +	if (data->hpasid <= 0 || data->hpasid >= PASID_MAX)
> +		return -EINVAL;
> +
> +	ddomain = to_dmar_domain(domain);
> +	/* REVISIT:
> +	 * Sanity check adddress width and paging mode support
> +	 * width matching in two dimensions:
> +	 * 1. paging mode CPU <= IOMMU
> +	 * 2. address width Guest <= Host.
> +	 */
> +	mutex_lock(&pasid_mutex);
> +	svm = ioasid_find(NULL, data->hpasid, NULL);
> +	if (IS_ERR(svm)) {
> +		ret = PTR_ERR(svm);
> +		goto out;
> +	}
> +	if (svm) {
> +		/*
> +		 * If we found svm for the PASID, there must be at
> +		 * least one device bond, otherwise svm should be freed.
> +		 */
> +		BUG_ON(list_empty(&svm->devs));
> +
> +		for_each_svm_dev() {
> +			/* In case of multiple sub-devices of the same pdev assigned, we should
> +			 * allow multiple bind calls with the same PASID and pdev.
> +			 */
> +			sdev->users++;
> +			goto out;
> +		}
> +	} else {
> +		/* We come here when PASID has never been bond to a device. */
> +		svm = kzalloc(sizeof(*svm), GFP_KERNEL);
> +		if (!svm) {
> +			ret = -ENOMEM;
> +			goto out;
> +		}
> +		/* REVISIT: upper layer/VFIO can track host process that bind the PASID.
> +		 * ioasid_set = mm might be sufficient for vfio to check pasid VMM
> +		 * ownership.
> +		 */
> +		svm->mm = get_task_mm(current);
> +		svm->pasid = data->hpasid;
> +		if (data->flags & IOMMU_SVA_GPASID_VAL) {
> +			svm->gpasid = data->gpasid;
> +			svm->flags &= SVM_FLAG_GUEST_PASID;
> +		}
> +		refcount_set(&svm->refs, 0);
> +		ioasid_set_data(data->hpasid, svm);
> +		INIT_LIST_HEAD_RCU(&svm->devs);
> +		INIT_LIST_HEAD(&svm->list);
> +
> +		mmput(svm->mm);
> +	}
> +	sdev = kzalloc(sizeof(*sdev), GFP_KERNEL);
> +	if (!sdev) {
> +		ret = -ENOMEM;
> +		goto out;
> +	}
> +	sdev->dev = dev;
> +	sdev->users = 1;
> +
> +	/* Set up device context entry for PASID if not enabled already */
> +	ret = intel_iommu_enable_pasid(iommu, sdev->dev);
> +	if (ret) {
> +		dev_err(dev, "Failed to enable PASID capability\n");
> +		kfree(sdev);
> +		goto out;
> +	}
> +
> +	/*
> +	 * For guest bind, we need to set up PASID table entry as follows:
> +	 * - FLPM matches guest paging mode
> +	 * - turn on nested mode
> +	 * - SL guest address width matching
> +	 */
> +	ret = intel_pasid_setup_nested(iommu,
> +				dev,
> +				(pgd_t *)data->gpgd,
> +				data->hpasid,
> +				data->flags,
> +				ddomain,
> +				data->addr_width);
> +	if (ret) {
> +		dev_err(dev, "Failed to set up PASID %llu in nested mode, Err %d\n",
> +			data->hpasid, ret);
> +		kfree(sdev);
> +		goto out;
> +	}
> +	svm->flags |= SVM_FLAG_GUEST_MODE;
> +
> +	init_rcu_head(&sdev->rcu);
> +	refcount_inc(&svm->refs);
> +	list_add_rcu(&sdev->list, &svm->devs);
> + out:
> +	mutex_unlock(&pasid_mutex);
> +	return ret;
> +}
> +
> +int intel_svm_unbind_gpasid(struct device *dev, int pasid)
> +{
> +	struct intel_svm_dev *sdev;
> +	struct intel_iommu *iommu;
> +	struct intel_svm *svm;
> +	int ret = -EINVAL;
> +
> +	mutex_lock(&pasid_mutex);
> +	iommu = intel_svm_device_to_iommu(dev);
> +	if (!iommu)
> +		goto out;
> +
> +	svm = ioasid_find(NULL, pasid, NULL);
> +	if (IS_ERR(svm)) {
> +		ret = PTR_ERR(svm);
> +		goto out;
> +	}
> +
> +	if (!svm)
> +		goto out;
> +
> +	for_each_svm_dev() {
> +		ret = 0;
> +		sdev->users--;
> +		if (!sdev->users) {
> +			list_del_rcu(&sdev->list);
> +			intel_pasid_tear_down_entry(iommu, dev, svm->pasid);
> +			/* TODO: Drain in flight PRQ for the PASID since it
> +			 * may get reused soon, we don't want to
> +			 * confuse with its previous live.
life?

> +			 * intel_svm_drain_prq(dev, pasid);
> +			 */
> +			kfree_rcu(sdev, rcu);
> +
> +			if (list_empty(&svm->devs)) {
> +				list_del(&svm->list);
> +				kfree(svm);
> +				/*
> +				 * We do not free PASID here until explicit call
> +				 * from VFIO to free. The PASID life cycle
> +				 * management is largely tied to VFIO management
> +				 * of assigned device life cycles. In case of
> +				 * guest exit without a explicit free PASID call,
> +				 * the responsibility lies in VFIO layer to free
> +				 * the PASIDs allocated for the guest.
> +				 * For security reasons, VFIO has to track the
> +				 * PASID ownership per guest anyway to ensure
> +				 * that PASID allocated by one guest cannot be
> +				 * used by another.
> +				 */
> +				ioasid_set_data(pasid, NULL);
> +			}
> +		}
> +		break;
> +	}
> + out:
> +	mutex_unlock(&pasid_mutex);
> +
> +	return ret;
> +}
> +
>  int intel_svm_bind_mm(struct device *dev, int *pasid, int flags, struct svm_dev_ops *ops)
>  {
>  	struct intel_iommu *iommu = intel_svm_device_to_iommu(dev);
> diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
> index b75f17d..94d3a9a 100644
> --- a/include/linux/intel-iommu.h
> +++ b/include/linux/intel-iommu.h
> @@ -677,7 +677,9 @@ int intel_iommu_enable_pasid(struct intel_iommu *iommu, struct device *dev);
>  int intel_svm_init(struct intel_iommu *iommu);
>  extern int intel_svm_enable_prq(struct intel_iommu *iommu);
>  extern int intel_svm_finish_prq(struct intel_iommu *iommu);
> -
> +extern int intel_svm_bind_gpasid(struct iommu_domain *domain,
> +		struct device *dev, struct gpasid_bind_data *data);
> +extern int intel_svm_unbind_gpasid(struct device *dev, int pasid);
>  struct svm_dev_ops;
>  
>  struct intel_svm_dev {
> @@ -693,12 +695,19 @@ struct intel_svm_dev {
>  
>  struct intel_svm {
>  	struct mmu_notifier notifier;
> -	struct mm_struct *mm;
> +	union {
> +		struct mm_struct *mm;
> +		u64 gcr3;
> +	};
>  	struct intel_iommu *iommu;
>  	int flags;
>  	int pasid;
> +	int gpasid; /* Guest PASID in case of vSVA bind with non-identity host
> +		     * to guest PASID mapping.
> +		     */
>  	struct list_head devs;
>  	struct list_head list;
> +	refcount_t refs; /* Number of devices sharing this PASID */
>  };
>  
>  extern struct intel_iommu *intel_svm_device_to_iommu(struct device *dev);
> diff --git a/include/linux/intel-svm.h b/include/linux/intel-svm.h
> index e3f7631..577d5df 100644
> --- a/include/linux/intel-svm.h
> +++ b/include/linux/intel-svm.h
> @@ -52,6 +52,23 @@ struct svm_dev_ops {
>   * do such IOTLB flushes automatically.
>   */
>  #define SVM_FLAG_SUPERVISOR_MODE	(1<<1)
> +/*
> + * The SVM_FLAG_GUEST_MODE flag is used when a guest process bind to a device.
> + * In this case the mm_struct is in the guest kernel or userspace, its life
> + * cycle is managed by VMM and VFIO layer. For IOMMU driver, this API provides
> + * means to bind/unbind guest CR3 with PASIDs allocated for a device.
> + */
> +#define SVM_FLAG_GUEST_MODE	(1<<2)
> +/*
> + * The SVM_FLAG_GUEST_PASID flag is used when a guest has its own PASID space,
> + * which requires guest and host PASID translation at both directions. We keep
> + * track of guest PASID in order to provide lookup service to device drivers.
> + * One such example is a physical function (PF) driver that supports mediated
> + * device (mdev) assignment. Guest programming of mdev configuration space can
> + * only be done with guest PASID, therefore PF driver needs to find the matching
> + * host PASID to program the real hardware.
> + */
> +#define SVM_FLAG_GUEST_PASID	(1<<3)
>  
>  #ifdef CONFIG_INTEL_IOMMU_SVM
>  



^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH v4 12/22] iommu: Add I/O ASID allocator
  2019-06-09 13:44 ` [PATCH v4 12/22] iommu: Add I/O ASID allocator Jacob Pan
@ 2019-06-18 16:50   ` Jonathan Cameron
  2019-06-25 18:55     ` Jacob Pan
  0 siblings, 1 reply; 64+ messages in thread
From: Jonathan Cameron @ 2019-06-18 16:50 UTC (permalink / raw)
  To: Jacob Pan
  Cc: iommu, LKML, Joerg Roedel, David Woodhouse, Eric Auger,
	Alex Williamson, Jean-Philippe Brucker, Tian, Kevin, Raj Ashok,
	Andriy Shevchenko

On Sun, 9 Jun 2019 06:44:12 -0700
Jacob Pan <jacob.jun.pan@linux.intel.com> wrote:

> From: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
> 
> Some devices might support multiple DMA address spaces, in particular
> those that have the PCI PASID feature. PASID (Process Address Space ID)
> allows to share process address spaces with devices (SVA), partition a
> device into VM-assignable entities (VFIO mdev) or simply provide
> multiple DMA address space to kernel drivers. Add a global PASID
> allocator usable by different drivers at the same time. Name it I/O ASID
> to avoid confusion with ASIDs allocated by arch code, which are usually
> a separate ID space.
> 
> The IOASID space is global. Each device can have its own PASID space,
> but by convention the IOMMU ended up having a global PASID space, so
> that with SVA, each mm_struct is associated to a single PASID.
> 
> The allocator is primarily used by IOMMU subsystem but in rare occasions
> drivers would like to allocate PASIDs for devices that aren't managed by
> an IOMMU, using the same ID space as IOMMU.
> 
> There are two types of allocators:
> 1. default allocator - Always available, uses an XArray to track
> 2. custom allocators - Can be registered at runtime, take precedence
> over the default allocator.
> 
> Custom allocators have these attributes:
> - provides platform specific alloc()/free() functions with private data.
> - allocation results lookup are not provided by the allocator, lookup
>   request must be done by the IOASID framework by its own XArray.
> - allocators can be unregistered at runtime, either fallback to the next
>   custom allocator or to the default allocator.

What is the use case for having a 'stack' of custom allocators?

> - custom allocators can share the same set of alloc()/free() helpers, in
>   this case they also share the same IOASID space, thus the same XArray.
> - switching between allocators requires all outstanding IOASIDs to be
>   freed unless the two allocators share the same alloc()/free() helpers.
In general this approach has a lot of features where the justification is
missing from this particular patch.  It may be useful to add some
more background to this intro?

> 
> Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
> Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
> Link: https://lkml.org/lkml/2019/4/26/462

Various comments inline.  Given the several cups of coffee this took to
review I may well have misunderstood everything ;)

Jonathan
> ---
>  drivers/iommu/Kconfig  |   8 +
>  drivers/iommu/Makefile |   1 +
>  drivers/iommu/ioasid.c | 427 +++++++++++++++++++++++++++++++++++++++++++++++++
>  include/linux/ioasid.h |  74 +++++++++
>  4 files changed, 510 insertions(+)
>  create mode 100644 drivers/iommu/ioasid.c
>  create mode 100644 include/linux/ioasid.h
> 
> diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
> index 83664db..c40c4b5 100644
> --- a/drivers/iommu/Kconfig
> +++ b/drivers/iommu/Kconfig
> @@ -3,6 +3,13 @@
>  config IOMMU_IOVA
>  	tristate
>  
> +# The IOASID allocator may also be used by non-IOMMU_API users
> +config IOASID
> +	tristate
> +	help
> +	  Enable the I/O Address Space ID allocator. A single ID space shared
> +	  between different users.
> +
>  # IOMMU_API always gets selected by whoever wants it.
>  config IOMMU_API
>  	bool
> @@ -207,6 +214,7 @@ config INTEL_IOMMU_SVM
>  	depends on INTEL_IOMMU && X86
>  	select PCI_PASID
>  	select MMU_NOTIFIER
> +	select IOASID
>  	help
>  	  Shared Virtual Memory (SVM) provides a facility for devices
>  	  to access DMA resources through process address space by
> diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
> index 8c71a15..0efac6f 100644
> --- a/drivers/iommu/Makefile
> +++ b/drivers/iommu/Makefile
> @@ -7,6 +7,7 @@ obj-$(CONFIG_IOMMU_DMA) += dma-iommu.o
>  obj-$(CONFIG_IOMMU_IO_PGTABLE) += io-pgtable.o
>  obj-$(CONFIG_IOMMU_IO_PGTABLE_ARMV7S) += io-pgtable-arm-v7s.o
>  obj-$(CONFIG_IOMMU_IO_PGTABLE_LPAE) += io-pgtable-arm.o
> +obj-$(CONFIG_IOASID) += ioasid.o
>  obj-$(CONFIG_IOMMU_IOVA) += iova.o
>  obj-$(CONFIG_OF_IOMMU)	+= of_iommu.o
>  obj-$(CONFIG_MSM_IOMMU) += msm_iommu.o
> diff --git a/drivers/iommu/ioasid.c b/drivers/iommu/ioasid.c
> new file mode 100644
> index 0000000..0919b70
> --- /dev/null
> +++ b/drivers/iommu/ioasid.c
> @@ -0,0 +1,427 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * I/O Address Space ID allocator. There is one global IOASID space, split into
> + * subsets. Users create a subset with DECLARE_IOASID_SET, then allocate and
> + * free IOASIDs with ioasid_alloc and ioasid_free.
> + */
> +#define pr_fmt(fmt)	KBUILD_MODNAME ": " fmt
> +
> +#include <linux/xarray.h>
> +#include <linux/ioasid.h>
> +#include <linux/slab.h>
> +#include <linux/module.h>
> +#include <linux/spinlock.h>
> +
> +struct ioasid_data {
> +	ioasid_t id;
> +	struct ioasid_set *set;
> +	void *private;
> +	struct rcu_head rcu;
> +};
> +
> +/*
> + * struct ioasid_allocator_data - Internal data structure to hold information
> + * about an allocator. There are two types of allocators:
> + *
> + * - Default allocator always has its own XArray to track the IOASIDs allocated.
> + * - Custom allocators may share allocation helpers with different private data.
> + *   Custom allocators share the same helper functions also share the same
> + *   XArray.
> + * Rules:
> + * 1. Default allocator is always available, not dynamically registered. This is
> + *    to prevent race conditions with early boot code that want to register
> + *    custom allocators or allocate IOASIDs.
> + * 2. Custom allocators take precedence over the default allocator.
> + * 3. When all custom allocators sharing the same helper functions are
> + *    unregistered (e.g. due to hotplug), all outstanding IOASIDs must be
> + *    freed.
> + * 4. When switching between custom allocators sharing the same helper
> + *    functions, outstanding IOASIDs are preserved.
> + * 5. When switching between custom allocator and default allocator, all IOASIDs
> + *    must be freed to ensure unadulterated space for the new allocator.
> + *
> + * @ops:	allocator helper functions and its data
> + * @list:	registered custom allocators
> + * @slist:	allocators share the same ops but different data
> + * @flags:	attributes of the allocator
> + * @users	number of allocators sharing the same ops and XArray
> + * @xa		xarray holds the IOASID space
Doc ordering should match the struct.
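i.e. reordered to match the members (and adding the missing colons):

 * @ops:	allocator helper functions and its data
 * @list:	registered custom allocators
 * @slist:	allocators that share the same ops but different data
 * @flags:	attributes of the allocator
 * @xa:		xarray holding the IOASID space
 * @users:	number of allocators sharing the same ops and XArray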

> + */
> +struct ioasid_allocator_data {
> +	struct ioasid_allocator_ops *ops;
> +	struct list_head list;
> +	struct list_head slist;
> +#define IOASID_ALLOCATOR_CUSTOM BIT(0) /* Needs framework to track results */
> +	unsigned long flags;
> +	struct xarray xa;
> +	refcount_t users;
> +};
> +
> +static DEFINE_MUTEX(ioasid_allocator_lock);
> +static LIST_HEAD(allocators_list);
> +static ioasid_t default_alloc(ioasid_t min, ioasid_t max, void *opaque);
> +static void default_free(ioasid_t ioasid, void *opaque);
> +
> +static struct ioasid_allocator_ops default_ops = {
> +	.alloc = default_alloc,
> +	.free = default_free

Pure churn prevention, but it doesn't seem implausible that we'll end up with
more ops at some point, so add the commas on the last elements?
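i.e.:

	static struct ioasid_allocator_ops default_ops = {
		.alloc = default_alloc,
		.free = default_free,
	};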

> +};
> +
> +static struct ioasid_allocator_data default_allocator = {
> +	.ops = &default_ops,
> +	.flags = 0,
> +	.xa = XARRAY_INIT(ioasid_xa, XA_FLAGS_ALLOC)
> +};
> +
> +static struct ioasid_allocator_data *active_allocator = &default_allocator;
> +
> +static ioasid_t default_alloc(ioasid_t min, ioasid_t max, void *opaque)
> +{
Why not reorder to avoid having the forward defs above?  It's only moving things
a few lines in this case so I'm not seeing a strong readability argument.

> +	ioasid_t id;
> +
> +	if (xa_alloc(&default_allocator.xa, &id, opaque, XA_LIMIT(min, max), GFP_KERNEL)) {
> +		pr_err("Failed to alloc ioasid from %d to %d\n", min, max);

It seems xa_alloc can return a few different error values. Perhaps worth printing
out what we got, as it might help in debugging?
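Something like (a sketch; xa_alloc() returns a negative errno on failure):

	int ret = xa_alloc(&default_allocator.xa, &id, opaque,
			   XA_LIMIT(min, max), GFP_KERNEL);

	if (ret) {
		pr_err("Failed to alloc ioasid from %d to %d (%d)\n",
		       min, max, ret);
		return INVALID_IOASID;
	}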

> +		return INVALID_IOASID;
> +	}
> +
> +	return id;
> +}
> +
> +void default_free(ioasid_t ioasid, void *opaque)
> +{
> +	struct ioasid_data *ioasid_data;
> +
> +	ioasid_data = xa_erase(&default_allocator.xa, ioasid);
> +	kfree_rcu(ioasid_data, rcu);
> +}
> +
> +/* Allocate and initialize a new custom allocator with its helper functions */
> +static inline struct ioasid_allocator_data *ioasid_alloc_allocator(struct ioasid_allocator_ops *ops)

Why inline?

> +{
> +	struct ioasid_allocator_data *ia_data;
> +
> +	ia_data = kzalloc(sizeof(*ia_data), GFP_KERNEL);
> +	if (!ia_data)
> +		return NULL;
> +
> +	xa_init_flags(&ia_data->xa, XA_FLAGS_ALLOC);
> +	INIT_LIST_HEAD(&ia_data->slist);
> +	ia_data->flags |= IOASID_ALLOCATOR_CUSTOM;
> +	ia_data->ops = ops;
> +
> +	/* For tracking custom allocators that share the same ops */
> +	list_add_tail(&ops->list, &ia_data->slist);
> +	refcount_set(&ia_data->users, 1);
> +
> +	return ia_data;
> +}
> +
> +static inline bool use_same_ops(struct ioasid_allocator_ops *a, struct ioasid_allocator_ops *b)
> +{
> +	return (a->free == b->free) && (a->alloc == b->alloc);
> +}
> +
> +/**
> + * ioasid_register_allocator - register a custom allocator
> + * @ops: the custom allocator ops to be registered
> + *
> + * Custom allocators take precedence over the default xarray based allocator.
> + * Private data associated with the IOASID allocated by the custom allocators
> + * are managed by IOASID framework similar to data stored in xa by default
> + * allocator.
> + *
> + * There can be multiple allocators registered but only one is active. In case
> + * of runtime removal of a custom allocator, the next one is activated based
> + * on the registration ordering.
> + *
> + * Multiple allocators can share the same alloc() function, in this case the
> + * IOASID space is shared.
> + *
Nitpick: This extra blank line seems inconsistent.

> + */
> +int ioasid_register_allocator(struct ioasid_allocator_ops *ops)
> +{
> +	struct ioasid_allocator_data *ia_data;
> +	struct ioasid_allocator_data *pallocator;
> +	int ret = 0;
> +
> +	mutex_lock(&ioasid_allocator_lock);
> +
> +	ia_data = ioasid_alloc_allocator(ops);
> +	if (!ia_data) {
> +		ret = -ENOMEM;
> +		goto out_unlock;
> +	}
> +
> +	/*
> +	 * No particular preference, we activate the first one and keep
> +	 * the later registered allocators in a list in case the first one gets
> +	 * removed due to hotplug.
> +	 */
> +	if (list_empty(&allocators_list)) {
> +		WARN_ON(active_allocator != &default_allocator);
> +		/* Use this new allocator if default is not active */
> +		if (xa_empty(&active_allocator->xa)) {
> +			active_allocator = ia_data;
> +			list_add_tail(&ia_data->list, &allocators_list);
> +			goto out_unlock;
> +		}
> +		pr_warn("Default allocator active with outstanding IOASID\n");
> +		ret = -EAGAIN;
> +		goto out_free;
> +	}
> +
> +	/* Check if the allocator is already registered */
> +	list_for_each_entry(pallocator, &allocators_list, list) {
> +		if (pallocator->ops == ops) {
> +			pr_err("IOASID allocator already registered\n");
> +			ret = -EEXIST;
> +			goto out_free;
> +		} else if (use_same_ops(pallocator->ops, ops)) {
> +			/*
> +			 * If the new allocator shares the same ops,
> +			 * then they will share the same IOASID space.
> +			 * We should put them under the same xarray.
> +			 */
> +			list_add_tail(&ops->list, &pallocator->slist);
> +			refcount_inc(&pallocator->users);
> +			pr_info("New IOASID allocator ops shared %u times\n",
> +				refcount_read(&pallocator->users));
> +			goto out_free;
> +		}
> +	}
> +	list_add_tail(&ia_data->list, &allocators_list);
> +
> +	goto out_unlock;
I commented on this in Jean-Philippe's set as well I think.
Cleaner to just have a separate unlock copy here and return.
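i.e.:

	list_add_tail(&ia_data->list, &allocators_list);
	mutex_unlock(&ioasid_allocator_lock);

	return 0;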

> +
> +out_free:
> +	kfree(ia_data);
> +out_unlock:
> +	mutex_unlock(&ioasid_allocator_lock);
> +	return ret;
> +}
> +EXPORT_SYMBOL_GPL(ioasid_register_allocator);
> +
> +static inline bool has_shared_ops(struct ioasid_allocator_data *allocator)
> +{
> +	return refcount_read(&allocator->users) > 1;
> +}
> +
> +/**
> + * ioasid_unregister_allocator - Remove a custom IOASID allocator ops
> + * @ops: the custom allocator to be removed
> + *
> + * Remove an allocator from the list, activate the next allocator in
> + * the order it was registered. Or revert to default allocator if all
> + * custom allocators are unregistered without outstanding IOASIDs.
> + */
> +void ioasid_unregister_allocator(struct ioasid_allocator_ops *ops)
> +{
> +	struct ioasid_allocator_data *pallocator;
> +	struct ioasid_allocator_ops *sops;
> +
> +	mutex_lock(&ioasid_allocator_lock);
> +	if (list_empty(&allocators_list)) {
> +		pr_warn("No custom IOASID allocators active!\n");
> +		goto exit_unlock;
> +	}
> +
> +	list_for_each_entry(pallocator, &allocators_list, list) {
> +		if (use_same_ops(pallocator->ops, ops)) {
Could reduce the depth with:

		if (!use_same_ops(pallocator->ops, ops))
			continue;

Trade-off between having the unusual flow this results in and the
code going off the side of your screen (if you use a very small
screen ;)

> +			if (refcount_read(&pallocator->users) == 1) {
> +				/* No shared helper functions */
> +				list_del(&pallocator->list);
> +				/*
> +				 * All IOASIDs should have been freed before
> +				 * the last allocator that shares the same ops
> +				 * is unregistered.
> +				 */
> +				WARN_ON(!xa_empty(&pallocator->xa));
> +				kfree(pallocator);
> +				if (list_empty(&allocators_list)) {
> +					pr_info("No custom IOASID allocators, switch to default.\n");
> +					active_allocator = &default_allocator;
> +				} else if (pallocator == active_allocator) {
> +					active_allocator = list_entry(&allocators_list, struct ioasid_allocator_data, list);

What is the intent here?  You seem to be calling the list entry dereference on
the list head itself.  That will get you something fairly random on the stack 
I think.

list_first_entry maybe?
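i.e. something like (assuming the intent is the first remaining allocator):

	active_allocator = list_first_entry(&allocators_list,
					    struct ioasid_allocator_data,
					    list);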


> +					pr_info("IOASID allocator changed");
> +				}
> +				break;
> +			}
> +			/*
> +			 * Find the matching shared ops to delete,
> +			 * but keep outstanding IOASIDs
> +			 */
> +			list_for_each_entry(sops, &pallocator->slist, list) {
> +				if (sops == ops) {
> +					list_del(&ops->list);
> +					if (refcount_dec_and_test(&pallocator->users))
> +						pr_err("no shared ops\n");
> +					break;
> +				}
> +			}
> +			break;
> +		}
> +	}
> +
> +exit_unlock:
> +	mutex_unlock(&ioasid_allocator_lock);
> +}
> +EXPORT_SYMBOL_GPL(ioasid_unregister_allocator);
> +
> +/**
> + * ioasid_set_data - Set private data for an allocated ioasid
> + * @ioasid: the ID to set data
> + * @data:   the private data
> + *
> + * For IOASID that is already allocated, private data can be set
> + * via this API. Future lookup can be done via ioasid_find.
> + */
> +int ioasid_set_data(ioasid_t ioasid, void *data)
> +{
> +	struct ioasid_data *ioasid_data;
> +	int ret = 0;
> +
> +	mutex_lock(&ioasid_allocator_lock);
> +	ioasid_data = xa_load(&active_allocator->xa, ioasid);
> +	if (ioasid_data)
> +		ioasid_data->private = data;
> +	else
> +		ret = -ENOENT;
> +	mutex_unlock(&ioasid_allocator_lock);
> +
> +	/* getter may use the private data */
> +	synchronize_rcu();
> +
> +	return ret;
> +}
> +EXPORT_SYMBOL_GPL(ioasid_set_data);
> +
> +/**
> + * ioasid_alloc - Allocate an IOASID
> + * @set: the IOASID set
> + * @min: the minimum ID (inclusive)
> + * @max: the maximum ID (inclusive)
> + * @private: data private to the caller
> + *
> + * Allocate an ID between @min and @max. Return the allocated ID on success,
> + * or INVALID_IOASID on failure. The @private pointer is stored internally
> + * and can be retrieved with ioasid_find().
> + */
> +ioasid_t ioasid_alloc(struct ioasid_set *set, ioasid_t min, ioasid_t max,
> +		      void *private)
> +{
> +	ioasid_t id = INVALID_IOASID;
> +	struct ioasid_data *data;
> +
> +	data = kzalloc(sizeof(*data), GFP_KERNEL);
> +	if (!data)
> +		return INVALID_IOASID;
> +
> +	data->set = set;
> +	data->private = private;
> +
> +	mutex_lock(&ioasid_allocator_lock);
> +
> +	id = active_allocator->ops->alloc(min, max, data);
> +	if (id == INVALID_IOASID) {
> +		pr_err("Failed ASID allocation %lu\n", active_allocator->flags);
> +			mutex_unlock(&ioasid_allocator_lock);
> +			goto exit_free;
> +	}
> +	if (active_allocator->flags & IOASID_ALLOCATOR_CUSTOM) {
> +		/* Custom allocator needs framework to store and track allocation results */
> +		min = id;
> +		max = id + 1;

Looking at xa_limit definition these are inclusive limits so why do we let it take
either value?
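If the custom allocator already picked the ID, storing at exactly that slot
would express the intent directly (a sketch, untested):

	/* both XA_LIMIT bounds are inclusive; store the data at 'id' only */
	if (xa_alloc(&active_allocator->xa, &id, data, XA_LIMIT(id, id),
		     GFP_KERNEL)) {
		pr_err("Failed to track ioasid %d\n", id);
		active_allocator->ops->free(id, NULL);
		goto exit_free;
	}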

> +
> +		if (xa_alloc(&active_allocator->xa, &id, data, XA_LIMIT(min, max), GFP_KERNEL)) {
> +			pr_err("Failed to alloc ioasid from %d to %d\n", min, max);
> +			active_allocator->ops->free(id, NULL);
> +			goto exit_free;
> +		}
> +	}
> +	data->id = id;
> +
> +	mutex_unlock(&ioasid_allocator_lock);
> +
> +exit_free:
As with the version of the patch without custom allocator, feels like we can simplify this
section as we can't get here with id == INVALID_IOASID unless we went to the exit_free.

If we did take the exit_free it is always INVALID_IOASID (I think, but haven't checked...)

So just have a return from the good path and always run
this block in the bad one.
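Something like (a rough sketch, assuming the early mutex_unlock() before the
first goto is dropped so the lock is always released on the error path):

	data->id = id;
	mutex_unlock(&ioasid_allocator_lock);

	return id;

exit_free:
	mutex_unlock(&ioasid_allocator_lock);
	kfree(data);

	return INVALID_IOASID;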
 
> +	if (id == INVALID_IOASID) {
> +		kfree(data);
> +		return INVALID_IOASID;
> +	}
> +	return id;
> +}
> +EXPORT_SYMBOL_GPL(ioasid_alloc);
> +
> +/**
> + * ioasid_free - Free an IOASID
> + * @ioasid: the ID to remove
> + */
> +void ioasid_free(ioasid_t ioasid)
> +{
> +	struct ioasid_data *ioasid_data;
> +
> +	mutex_lock(&ioasid_allocator_lock);
> +
> +	ioasid_data = xa_load(&active_allocator->xa, ioasid);
> +	if (!ioasid_data) {
> +		pr_err("Trying to free unknown IOASID %u\n", ioasid);
> +		goto exit_unlock;
> +	}
> +
> +	active_allocator->ops->free(ioasid, active_allocator->ops->pdata);
> +	/* Custom allocator needs additional steps to free the xa */
xa element perhaps?

> +	if (active_allocator->flags & IOASID_ALLOCATOR_CUSTOM) {
> +		ioasid_data = xa_erase(&active_allocator->xa, ioasid);
> +		kfree_rcu(ioasid_data, rcu);
> +	}
> +
> +exit_unlock:
> +	mutex_unlock(&ioasid_allocator_lock);
> +}
> +EXPORT_SYMBOL_GPL(ioasid_free);
> +
> +/**
> + * ioasid_find - Find IOASID data
> + * @set: the IOASID set
> + * @ioasid: the IOASID to find
> + * @getter: function to call on the found object
> + *
> + * The optional getter function allows to take a reference to the found object
> + * under the rcu lock. The function can also check if the object is still valid:
> + * if @getter returns false, then the object is invalid and NULL is returned.
> + *
> + * If the IOASID has been allocated for this set, return the private pointer
> + * passed to ioasid_alloc. Private data can be NULL if not set. Return an error
> + * if the IOASID is not found or does not belong to the set.
> + */
> +void *ioasid_find(struct ioasid_set *set, ioasid_t ioasid,
> +		  bool (*getter)(void *))
> +{
> +	void *priv = NULL;

Always set so no need to initialize.

> +	struct ioasid_data *ioasid_data;
> +
> +	rcu_read_lock();
> +	ioasid_data = xa_load(&active_allocator->xa, ioasid);
> +	if (!ioasid_data) {
> +		priv = ERR_PTR(-ENOENT);
> +		goto unlock;
> +	}
> +	if (set && ioasid_data->set != set) {
> +		/* data found but does not belong to the set */
> +		priv = ERR_PTR(-EACCES);
> +		goto unlock;
> +	}
> +	/* Now IOASID and its set is verified, we can return the private data */
> +	priv = ioasid_data->private;
> +	if (getter && !getter(priv))
> +		priv = NULL;
> +unlock:
> +	rcu_read_unlock();
> +
> +	return priv;
> +}
> +EXPORT_SYMBOL_GPL(ioasid_find);
> +
> +MODULE_AUTHOR("Jean-Philippe Brucker <jean-philippe.brucker@arm.com>");
> +MODULE_AUTHOR("Jacob Pan <jacob.jun.pan@linux.intel.com>");
> +MODULE_DESCRIPTION("IO Address Space ID (IOASID) allocator");
> +MODULE_LICENSE("GPL");
> diff --git a/include/linux/ioasid.h b/include/linux/ioasid.h
> new file mode 100644
> index 0000000..8c8d1c5
> --- /dev/null
> +++ b/include/linux/ioasid.h
> @@ -0,0 +1,74 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef __LINUX_IOASID_H
> +#define __LINUX_IOASID_H
> +
> +#define INVALID_IOASID ((ioasid_t)-1)
> +typedef unsigned int ioasid_t;
> +typedef ioasid_t (*ioasid_alloc_fn_t)(ioasid_t min, ioasid_t max, void *data);
> +typedef void (*ioasid_free_fn_t)(ioasid_t ioasid, void *data);
> +
> +struct ioasid_set {
> +	int dummy;
> +};
> +
> +/**
> + * struct ioasid_allocator_ops - IOASID allocator helper functions and data
> + *
> + * @alloc:	helper function to allocate IOASID
> + * @free:	helper function to free IOASID
> + * @list:	for tracking ops that share helper functions but not data
> + * @pdata:	data belong to the allocator, provided when calling alloc()
and free()

Seems odd to call out one and not the other.
I'm not actually sure it's true though as I can't see pdata being provided
to the alloc call.

> + */
> +struct ioasid_allocator_ops {
> +	ioasid_alloc_fn_t alloc;
> +	ioasid_free_fn_t free;
> +	struct list_head list;
> +	void *pdata;
> +};
> +
> +#define DECLARE_IOASID_SET(name) struct ioasid_set name = { 0 }

I was a little curious about how this would actually be used.
Even more curious: it isn't used anywhere that I can find.
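Presumably the intended usage is something like this (hypothetical, the set
name is made up; nothing in this series declares one):

	DECLARE_IOASID_SET(my_ioasid_set);

	ioasid_t pasid = ioasid_alloc(&my_ioasid_set, 1, (1 << 20) - 1,
				      private);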

> +
> +#if IS_ENABLED(CONFIG_IOASID)
> +ioasid_t ioasid_alloc(struct ioasid_set *set, ioasid_t min, ioasid_t max,
> +		      void *private);
> +void ioasid_free(ioasid_t ioasid);
> +
> +void *ioasid_find(struct ioasid_set *set, ioasid_t ioasid,
> +			bool (*getter)(void *));
> +int ioasid_register_allocator(struct ioasid_allocator_ops *allocator);
> +void ioasid_unregister_allocator(struct ioasid_allocator_ops *allocator);
> +int ioasid_set_data(ioasid_t ioasid, void *data);
> +
> +#else /* !CONFIG_IOASID */
> +static inline ioasid_t ioasid_alloc(struct ioasid_set *set, ioasid_t min,
> +				    ioasid_t max, void *private)
> +{
> +	return INVALID_IOASID;
> +}
> +
> +static inline void ioasid_free(ioasid_t ioasid)
> +{
> +}
> +
> +static inline void *ioasid_find(struct ioasid_set *set, ioasid_t ioasid,
> +				bool (*getter)(void *))
> +{
> +	return NULL;
> +}
> +
> +static inline int ioasid_register_allocator(struct ioasid_allocator_ops *allocator)
> +{
> +	return -ENODEV;
> +}
> +
> +static inline void ioasid_unregister_allocator(struct ioasid_allocator_ops *allocator)
> +{
> +}
> +
> +static inline int ioasid_set_data(ioasid_t ioasid, void *data)
> +{
> +	return -ENODEV;
> +}
> +
> +#endif /* CONFIG_IOASID */
> +#endif /* __LINUX_IOASID_H */



^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH v4 08/22] iommu: Introduce attach/detach_pasid_table API
  2019-06-18 15:41   ` Jonathan Cameron
@ 2019-06-24 15:06     ` Auger Eric
  2019-06-24 15:23       ` Jean-Philippe Brucker
  0 siblings, 1 reply; 64+ messages in thread
From: Auger Eric @ 2019-06-24 15:06 UTC (permalink / raw)
  To: Jonathan Cameron, Jacob Pan
  Cc: iommu, LKML, Joerg Roedel, David Woodhouse, Alex Williamson,
	Jean-Philippe Brucker, Yi L, Tian, Kevin, Raj Ashok, Liu,
	Andriy Shevchenko

Hi,

On 6/18/19 5:41 PM, Jonathan Cameron wrote:
> On Sun, 9 Jun 2019 06:44:08 -0700
> Jacob Pan <jacob.jun.pan@linux.intel.com> wrote:
> 
>> In virtualization use case, when a guest is assigned
>> a PCI host device, protected by a virtual IOMMU on the guest,
>> the physical IOMMU must be programmed to be consistent with
>> the guest mappings. If the physical IOMMU supports two
>> translation stages it makes sense to program guest mappings
>> onto the first stage/level (ARM/Intel terminology) while the host
>> owns the stage/level 2.
>>
>> In that case, it is mandated to trap on guest configuration
>> settings and pass those to the physical iommu driver.
>>
>> This patch adds a new API to the iommu subsystem that allows
>> to set/unset the pasid table information.
>>
>> A generic iommu_pasid_table_config struct is introduced in
>> a new iommu.h uapi header. This is going to be used by the VFIO
>> user API.
> 
> Another case where strictly speaking stuff is introduced that this series
> doesn't use.  I don't know what the plans are to merge the various
> related series though so this might make sense in general. Right now
> it just bloats this series a bit..

I am now the only user of this API in
[PATCH v8 00/29] SMMUv3 Nested Stage Setup
(https://patchwork.kernel.org/cover/10961733/).

This can live in this series, or, Jean-Philippe, do you intend to keep on
maintaining it in your api branch?

Thanks

Eric
>>
>> Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
>> Signed-off-by: Liu, Yi L <yi.l.liu@linux.intel.com>
>> Signed-off-by: Ashok Raj <ashok.raj@intel.com>
>> Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
>> Signed-off-by: Eric Auger <eric.auger@redhat.com>
>> Reviewed-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
>> ---
>>  drivers/iommu/iommu.c      | 19 +++++++++++++++++
>>  include/linux/iommu.h      | 18 ++++++++++++++++
>>  include/uapi/linux/iommu.h | 52 ++++++++++++++++++++++++++++++++++++++++++++++
>>  3 files changed, 89 insertions(+)
>>
>> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
>> index 166adb8..4496ccd 100644
>> --- a/drivers/iommu/iommu.c
>> +++ b/drivers/iommu/iommu.c
>> @@ -1619,6 +1619,25 @@ int iommu_page_response(struct device *dev,
>>  }
>>  EXPORT_SYMBOL_GPL(iommu_page_response);
>>  
>> +int iommu_attach_pasid_table(struct iommu_domain *domain,
>> +			     struct iommu_pasid_table_config *cfg)
>> +{
>> +	if (unlikely(!domain->ops->attach_pasid_table))
>> +		return -ENODEV;
>> +
>> +	return domain->ops->attach_pasid_table(domain, cfg);
>> +}
>> +EXPORT_SYMBOL_GPL(iommu_attach_pasid_table);
>> +
>> +void iommu_detach_pasid_table(struct iommu_domain *domain)
>> +{
>> +	if (unlikely(!domain->ops->detach_pasid_table))
>> +		return;
>> +
>> +	domain->ops->detach_pasid_table(domain);
>> +}
>> +EXPORT_SYMBOL_GPL(iommu_detach_pasid_table);
>> +
>>  static void __iommu_detach_device(struct iommu_domain *domain,
>>  				  struct device *dev)
>>  {
>> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
>> index 950347b..d3edb10 100644
>> --- a/include/linux/iommu.h
>> +++ b/include/linux/iommu.h
>> @@ -264,6 +264,8 @@ struct page_response_msg {
>>   * @sva_unbind: Unbind process address space from device
>>   * @sva_get_pasid: Get PASID associated to a SVA handle
>>   * @page_response: handle page request response
>> + * @attach_pasid_table: attach a pasid table
>> + * @detach_pasid_table: detach the pasid table
>>   * @pgsize_bitmap: bitmap of all possible supported page sizes
>>   */
>>  struct iommu_ops {
>> @@ -323,6 +325,9 @@ struct iommu_ops {
>>  				      void *drvdata);
>>  	void (*sva_unbind)(struct iommu_sva *handle);
>>  	int (*sva_get_pasid)(struct iommu_sva *handle);
>> +	int (*attach_pasid_table)(struct iommu_domain *domain,
>> +				  struct iommu_pasid_table_config *cfg);
>> +	void (*detach_pasid_table)(struct iommu_domain *domain);
>>  
>>  	int (*page_response)(struct device *dev, struct page_response_msg *msg);
>>  
>> @@ -434,6 +439,9 @@ extern int iommu_attach_device(struct iommu_domain *domain,
>>  			       struct device *dev);
>>  extern void iommu_detach_device(struct iommu_domain *domain,
>>  				struct device *dev);
>> +extern int iommu_attach_pasid_table(struct iommu_domain *domain,
>> +				    struct iommu_pasid_table_config *cfg);
>> +extern void iommu_detach_pasid_table(struct iommu_domain *domain);
>>  extern struct iommu_domain *iommu_get_domain_for_dev(struct device *dev);
>>  extern struct iommu_domain *iommu_get_dma_domain(struct device *dev);
>>  extern int iommu_map(struct iommu_domain *domain, unsigned long iova,
>> @@ -947,6 +955,13 @@ iommu_aux_get_pasid(struct iommu_domain *domain, struct device *dev)
>>  	return -ENODEV;
>>  }
>>  
>> +static inline
>> +int iommu_attach_pasid_table(struct iommu_domain *domain,
>> +			     struct iommu_pasid_table_config *cfg)
>> +{
>> +	return -ENODEV;
>> +}
>> +
>>  static inline struct iommu_sva *
>>  iommu_sva_bind_device(struct device *dev, struct mm_struct *mm, void *drvdata)
>>  {
>> @@ -968,6 +983,9 @@ static inline int iommu_sva_get_pasid(struct iommu_sva *handle)
>>  	return IOMMU_PASID_INVALID;
>>  }
>>  
>> +static inline
>> +void iommu_detach_pasid_table(struct iommu_domain *domain) {}
>> +
>>  #endif /* CONFIG_IOMMU_API */
>>  
>>  #ifdef CONFIG_IOMMU_DEBUGFS
>> diff --git a/include/uapi/linux/iommu.h b/include/uapi/linux/iommu.h
>> index aaa3b6a..3976767 100644
>> --- a/include/uapi/linux/iommu.h
>> +++ b/include/uapi/linux/iommu.h
>> @@ -115,4 +115,56 @@ struct iommu_fault {
>>  		struct iommu_fault_page_request prm;
>>  	};
>>  };
>> +
>> +/**
>> + * struct iommu_pasid_smmuv3 - ARM SMMUv3 Stream Table Entry stage 1 related
>> + *     information
>> + * @version: API version of this structure
>> + * @s1fmt: STE s1fmt (format of the CD table: single CD, linear table
>> + *         or 2-level table)
>> + * @s1dss: STE s1dss (specifies the behavior when @pasid_bits != 0
>> + *         and no PASID is passed along with the incoming transaction)
>> + * @padding: reserved for future use (should be zero)
>> + *
>> + * The PASID table is referred to as the Context Descriptor (CD) table on ARM
>> + * SMMUv3. Please refer to the ARM SMMU 3.x spec (ARM IHI 0070A) for full
>> + * details.
>> + */
>> +struct iommu_pasid_smmuv3 {
>> +#define PASID_TABLE_SMMUV3_CFG_VERSION_1 1
>> +	__u32	version;
>> +	__u8	s1fmt;
>> +	__u8	s1dss;
>> +	__u8	padding[2];
>> +};
>> +
>> +/**
>> + * struct iommu_pasid_table_config - PASID table data used to bind guest PASID
>> + *     table to the host IOMMU
>> + * @version: API version to prepare for future extensions
>> + * @format: format of the PASID table
>> + * @base_ptr: guest physical address of the PASID table
>> + * @pasid_bits: number of PASID bits used in the PASID table
>> + * @config: indicates whether the guest translation stage must
>> + *          be translated, bypassed or aborted.
>> + * @padding: reserved for future use (should be zero)
>> + * @smmuv3: table information when @format is %IOMMU_PASID_FORMAT_SMMUV3
>> + */
>> +struct iommu_pasid_table_config {
>> +#define PASID_TABLE_CFG_VERSION_1 1
>> +	__u32	version;
>> +#define IOMMU_PASID_FORMAT_SMMUV3	1
>> +	__u32	format;
>> +	__u64	base_ptr;
>> +	__u8	pasid_bits;
>> +#define IOMMU_PASID_CONFIG_TRANSLATE	1
>> +#define IOMMU_PASID_CONFIG_BYPASS	2
>> +#define IOMMU_PASID_CONFIG_ABORT	3
>> +	__u8	config;
>> +	__u8    padding[6];
>> +	union {
>> +		struct iommu_pasid_smmuv3 smmuv3;
>> +	};
>> +};
>> +
>>  #endif /* _UAPI_IOMMU_H */
> 
> 

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH v4 08/22] iommu: Introduce attach/detach_pasid_table API
  2019-06-24 15:06     ` Auger Eric
@ 2019-06-24 15:23       ` Jean-Philippe Brucker
  0 siblings, 0 replies; 64+ messages in thread
From: Jean-Philippe Brucker @ 2019-06-24 15:23 UTC (permalink / raw)
  To: Auger Eric, Jonathan Cameron, Jacob Pan
  Cc: Yi L, Tian, Kevin, Raj Ashok, iommu, LKML, Liu, Alex Williamson,
	Andriy Shevchenko, David Woodhouse

On 24/06/2019 16:06, Auger Eric wrote:
> Hi,
> 
> On 6/18/19 5:41 PM, Jonathan Cameron wrote:
>> On Sun, 9 Jun 2019 06:44:08 -0700
>> Jacob Pan <jacob.jun.pan@linux.intel.com> wrote:
>>
>>> In virtualization use case, when a guest is assigned
>>> a PCI host device, protected by a virtual IOMMU on the guest,
>>> the physical IOMMU must be programmed to be consistent with
>>> the guest mappings. If the physical IOMMU supports two
>>> translation stages it makes sense to program guest mappings
>>> onto the first stage/level (ARM/Intel terminology) while the host
>>> owns the stage/level 2.
>>>
>>> In that case, it is mandated to trap on guest configuration
>>> settings and pass those to the physical iommu driver.
>>>
>>> This patch adds a new API to the iommu subsystem that allows
>>> to set/unset the pasid table information.
>>>
>>> A generic iommu_pasid_table_config struct is introduced in
>>> a new iommu.h uapi header. This is going to be used by the VFIO
>>> user API.
>>
>> Another case where strictly speaking stuff is introduced that this series
>> doesn't use.  I don't know what the plans are to merge the various
>> related series though so this might make sense in general. Right now
>> it just bloats this series a bit..
> 
> I am now the only user of this API in
> [PATCH v8 00/29] SMMUv3 Nested Stage Setup
> (https://patchwork.kernel.org/cover/10961733/).
> 
> This can live in this series or Jean-Philippe do you intend to keep on
> maintaining it in your api branch?

No, I think it makes sense to keep it within your series

Thanks,
Jean

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH v4 15/22] iommu/vt-d: Replace Intel specific PASID allocator with IOASID
  2019-06-18 15:57   ` Jonathan Cameron
@ 2019-06-24 21:36     ` Jacob Pan
  0 siblings, 0 replies; 64+ messages in thread
From: Jacob Pan @ 2019-06-24 21:36 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: iommu, LKML, Joerg Roedel, David Woodhouse, Eric Auger,
	Alex Williamson, Jean-Philippe Brucker, Tian, Kevin, Raj Ashok,
	Andriy Shevchenko, jacob.jun.pan

On Tue, 18 Jun 2019 16:57:09 +0100
Jonathan Cameron <jonathan.cameron@huawei.com> wrote:

> On Sun, 9 Jun 2019 06:44:15 -0700
> Jacob Pan <jacob.jun.pan@linux.intel.com> wrote:
> 
> > Make use of generic IOASID code to manage PASID allocation,
> > free, and lookup. Replace Intel specific code.
> > 
> > Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>  
> Hi Jacob,
> 
> One question inline.
> 
Hi Jonathan,

Sorry, I got distracted and missed your review. My answer
below.

Thanks!

Jacob

> Jonathan
> 
> > ---
> >  drivers/iommu/intel-iommu.c | 11 +++++------
> >  drivers/iommu/intel-pasid.c | 36 ------------------------------------
> >  drivers/iommu/intel-svm.c   | 37 +++++++++++++++++++++----------------
> >  3 files changed, 26 insertions(+), 58 deletions(-)
> > 
> > diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
> > index 5b84994..39b63fe 100644
> > --- a/drivers/iommu/intel-iommu.c
> > +++ b/drivers/iommu/intel-iommu.c
> > @@ -5167,7 +5167,7 @@ static void auxiliary_unlink_device(struct dmar_domain *domain,
> >  	domain->auxd_refcnt--;
> >  
> >  	if (!domain->auxd_refcnt && domain->default_pasid > 0)
> > -		intel_pasid_free_id(domain->default_pasid);
> > +		ioasid_free(domain->default_pasid);
> >  }
> >  
> >  static int aux_domain_add_dev(struct dmar_domain *domain,
> > @@ -5185,10 +5185,9 @@ static int aux_domain_add_dev(struct
> > dmar_domain *domain, if (domain->default_pasid <= 0) {
> >  		int pasid;
> >  
> > -		pasid = intel_pasid_alloc_id(domain, PASID_MIN,
> > -					     pci_max_pasids(to_pci_dev(dev)),
> > -					     GFP_KERNEL);
> > -		if (pasid <= 0) {
> > +		pasid = ioasid_alloc(NULL, PASID_MIN, pci_max_pasids(to_pci_dev(dev)) - 1,
> > +				domain);
> 
> Is there any point in passing the domain in as the private pointer
> here? I can't immediately see anywhere it is read back?
> 
Good point, I agree. There is no need to pass the domain; an aux domain
and its default_pasid have a 1:1 mapping.
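So the call could simply pass NULL for the private data (a sketch):

	pasid = ioasid_alloc(NULL, PASID_MIN,
			     pci_max_pasids(to_pci_dev(dev)) - 1, NULL);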

> It is also rather confusing as the same driver stashes two different
> types of data in the same xarray.

On VT-d, there should be only one type of data for all PASID users that
need private data, whether it is native SVA or guest SVA. Only an
internal flag will tell them apart.

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH v4 11/22] iommu: Introduce guest PASID bind function
  2019-06-18 15:36   ` Jean-Philippe Brucker
@ 2019-06-24 22:24     ` Jacob Pan
  0 siblings, 0 replies; 64+ messages in thread
From: Jacob Pan @ 2019-06-24 22:24 UTC (permalink / raw)
  To: Jean-Philippe Brucker
  Cc: iommu, LKML, Joerg Roedel, David Woodhouse, Eric Auger,
	Alex Williamson, Yi Liu, Tian, Kevin, Raj Ashok,
	Christoph Hellwig, Lu Baolu, Andriy Shevchenko, jacob.jun.pan

On Tue, 18 Jun 2019 16:36:33 +0100
Jean-Philippe Brucker <jean-philippe.brucker@arm.com> wrote:

> On 09/06/2019 14:44, Jacob Pan wrote:
> > Guest shared virtual address (SVA) may require host to shadow guest
> > PASID tables. Guest PASID can also be allocated from the host via
> > enlightened interfaces. In this case, guest needs to bind the guest
> > mm, i.e. cr3 in guest physical address to the actual PASID table in
> > the host IOMMU. Nesting will be turned on such that guest virtual
> > address can go through a two level translation:
> > - 1st level translates GVA to GPA
> > - 2nd level translates GPA to HPA
> > This patch introduces APIs to bind guest PASID data to the assigned
> > device entry in the physical IOMMU. See the diagram below for usage
> > explaination.  
> 
> explanation
> 
will fix, thanks
> > 
> >     .-------------.  .---------------------------.
> >     |   vIOMMU    |  | Guest process mm, FL only |
> >     |             |  '---------------------------'
> >     .----------------/
> >     | PASID Entry |--- PASID cache flush -
> >     '-------------'                       |
> >     |             |                       V
> >     |             |                      GP
> >     '-------------'
> > Guest
> > ------| Shadow |----------------------- GP->HP* ---------
> >       v        v                          |
> > Host                                      v
> >     .-------------.  .----------------------.
> >     |   pIOMMU    |  | Bind FL for GVA-GPA  |
> >     |             |  '----------------------'
> >     .----------------/  |
> >     | PASID Entry |     V (Nested xlate)
> >     '----------------\.---------------------.
> >     |             |   |Set SL to GPA-HPA    |
> >     |             |   '---------------------'
> >     '-------------'
> > 
> > Where:
> >  - FL = First level/stage one page tables
> >  - SL = Second level/stage two page tables
> >  - GP = Guest PASID
> >  - HP = Host PASID
> > * Conversion needed if non-identity GP-HP mapping option is chosen.
> > 
> > Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
> > Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
> > ---
> >  drivers/iommu/iommu.c      | 20 ++++++++++++++++
> >  include/linux/iommu.h      | 21 +++++++++++++++++
> >  include/uapi/linux/iommu.h | 58 ++++++++++++++++++++++++++++++++++++++++++++++
> >  3 files changed, 99 insertions(+)
> > 
> > diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> > index 1758b57..d0416f60 100644
> > --- a/drivers/iommu/iommu.c
> > +++ b/drivers/iommu/iommu.c
> > @@ -1648,6 +1648,26 @@ int iommu_cache_invalidate(struct iommu_domain *domain, struct device *dev,
> >  }
> >  EXPORT_SYMBOL_GPL(iommu_cache_invalidate);
> >  
> > +int iommu_sva_bind_gpasid(struct iommu_domain *domain,
> > +			struct device *dev, struct gpasid_bind_data *data)
> 
> I'm curious about the VFIO side of this. Is the ioctl on the device or
> on the container fd? For bind_pasid_table, it's on the container and
> we only pass the iommu_domain to the IOMMU driver, not the device
> (since devices in a domain share the same PASID table).
> 
The VFIO side of gpasid bind is on the container fd (Yi can confirm :)).
We have per-device PASID tables regardless of domain sharing, which can
provide more protection within the guest.
Second level page tables are harvested from the domain for nested
translation.
> > +{
> > +	if (unlikely(!domain->ops->sva_bind_gpasid))
> > +		return -ENODEV;
> > +
> > +	return domain->ops->sva_bind_gpasid(domain, dev, data);
> > +}
> > +EXPORT_SYMBOL_GPL(iommu_sva_bind_gpasid);
> > +
> > +int iommu_sva_unbind_gpasid(struct iommu_domain *domain, struct device *dev,
> > +			ioasid_t pasid)
> > +{
> > +	if (unlikely(!domain->ops->sva_unbind_gpasid))
> > +		return -ENODEV;
> > +
> > +	return domain->ops->sva_unbind_gpasid(dev, pasid);
> > +}
> > +EXPORT_SYMBOL_GPL(iommu_sva_unbind_gpasid);
> > +
> >  static void __iommu_detach_device(struct iommu_domain *domain,
> >  				  struct device *dev)
> >  {
> > diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> > index 8d766a8..560c8c8 100644
> > --- a/include/linux/iommu.h
> > +++ b/include/linux/iommu.h
> > @@ -25,6 +25,7 @@
> >  #include <linux/errno.h>
> >  #include <linux/err.h>
> >  #include <linux/of.h>
> > +#include <linux/ioasid.h>
> >  #include <uapi/linux/iommu.h>
> >  
> >  #define IOMMU_READ	(1 << 0)
> > @@ -267,6 +268,8 @@ struct page_response_msg {
> >   * @detach_pasid_table: detach the pasid table
> >   * @cache_invalidate: invalidate translation caches
> >   * @pgsize_bitmap: bitmap of all possible supported page sizes
> > + * @sva_bind_gpasid: bind guest pasid and mm
> > + * @sva_unbind_gpasid: unbind guest pasid and mm
> >   */
> >  struct iommu_ops {
> >  	bool (*capable)(enum iommu_cap);
> > @@ -332,6 +335,10 @@ struct iommu_ops {
> >  	int (*page_response)(struct device *dev, struct page_response_msg *msg);
> >  	int (*cache_invalidate)(struct iommu_domain *domain, struct device *dev,
> >  				struct iommu_cache_invalidate_info *inv_info);
> > +	int (*sva_bind_gpasid)(struct iommu_domain *domain,
> > +			struct device *dev, struct gpasid_bind_data *data);
> > +
> > +	int (*sva_unbind_gpasid)(struct device *dev, int pasid);
> >  
> >  	unsigned long pgsize_bitmap;
> >  };
> > @@ -447,6 +454,10 @@ extern void iommu_detach_pasid_table(struct iommu_domain *domain);
> >  extern int iommu_cache_invalidate(struct iommu_domain *domain, struct device *dev,
> >  				  struct iommu_cache_invalidate_info *inv_info);
> > +extern int iommu_sva_bind_gpasid(struct iommu_domain *domain,
> > +		struct device *dev, struct gpasid_bind_data *data);
> > +extern int iommu_sva_unbind_gpasid(struct iommu_domain *domain,
> > +				struct device *dev, ioasid_t pasid);
> >  extern struct iommu_domain *iommu_get_domain_for_dev(struct device *dev);
> >  extern struct iommu_domain *iommu_get_dma_domain(struct device *dev);
> >  extern int iommu_map(struct iommu_domain *domain, unsigned long iova,
> > @@ -998,6 +1009,16 @@ iommu_cache_invalidate(struct iommu_domain *domain,
> >  {
> >  	return -ENODEV;
> >  }
> > +static inline int iommu_sva_bind_gpasid(struct iommu_domain *domain,
> > +				struct device *dev, struct gpasid_bind_data *data)
> > +{
> > +	return -ENODEV;
> > +}
> > +
> > +static inline int sva_unbind_gpasid(struct device *dev, int
> > pasid)  
> 
> The prototype above also has a domain argument
> 
Right, I missed the function name and argument.
> > +{
> > +	return -ENODEV;
> > +}
> >  
> >  #endif /* CONFIG_IOMMU_API */
> >  
> > diff --git a/include/uapi/linux/iommu.h b/include/uapi/linux/iommu.h
> > index ca4b753..a9cdc63 100644
> > --- a/include/uapi/linux/iommu.h
> > +++ b/include/uapi/linux/iommu.h
> > @@ -277,4 +277,62 @@ struct iommu_cache_invalidate_info {
> >  	};
> >  };
> >  
> > +/**
> > + * struct gpasid_bind_data_vtd - Intel VT-d specific data on device and guest
> > + * SVA binding.
> > + *
> > + * @flags:	VT-d PASID table entry attributes
> > + * @pat:	Page attribute table data to compute effective memory type
> > + * @emt:	Extended memory type
> > + *
> > + * Only guest vIOMMU selectable and effective options are passed down to
> > + * the host IOMMU.
> > + */
> > +struct gpasid_bind_data_vtd {
> > +#define IOMMU_SVA_VTD_GPASID_SRE	(1 << 0) /* supervisor request */
> > +#define IOMMU_SVA_VTD_GPASID_EAFE	(1 << 1) /* extended access enable */
> > +#define IOMMU_SVA_VTD_GPASID_PCD	(1 << 2) /* page-level cache disable */
> > +#define IOMMU_SVA_VTD_GPASID_PWT	(1 << 3) /* page-level write through */
> > +#define IOMMU_SVA_VTD_GPASID_EMTE	(1 << 4) /* extended mem type enable */
> > +#define IOMMU_SVA_VTD_GPASID_CD		(1 << 5) /* PASID-level cache disable */
> > +	__u64 flags;
> > +	__u32 pat;
> > +	__u32 emt;
> > +};
> > +
> > +/**
> > + * struct gpasid_bind_data - Information about device and guest PASID binding
> > + * @version:	Version of this data structure
> > + * @format:	PASID table entry format
> > + * @flags:	Additional information on guest bind request
> > + * @gpgd:	Guest page directory base of the guest mm to bind
> > + * @hpasid:	Process address space ID used for the guest mm in host IOMMU
> > + * @gpasid:	Process address space ID used for the guest mm in guest IOMMU
> > + * @addr_width:	Guest virtual address width
> 
> + "in bits"
> 
yes, precisely.
> > + * @vtd:	Intel VT-d specific data
> > + *
> > + * Guest to host PASID mapping can be an identity or non-identity, where guest
> > + * has its own PASID space. For non-identify mapping, guest to host PASID lookup
> > + * is needed when VM programs guest PASID into an assigned device. VMM may
> > + * trap such PASID programming then request host IOMMU driver to convert guest
> > + * PASID to host PASID based on this bind data.
> > + */
> > +struct gpasid_bind_data {
> > +#define IOMMU_GPASID_BIND_VERSION_1	1
> > +	__u32 version;
> > +#define IOMMU_PASID_FORMAT_INTEL_VTD	1
> > +	__u32 format;
> > +#define IOMMU_SVA_GPASID_VAL	(1 << 0) /* guest PASID valid */
> > +	__u64 flags;
> > +	__u64 gpgd;
> > +	__u64 hpasid;
> > +	__u64 gpasid;
> > +	__u32 addr_width;  
> 
> We could use a __u8 for addr_width
> 
true
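i.e. something like (a sketch; padding grown to keep the struct size and
alignment unchanged):

	__u8  addr_width;	/* guest virtual address width in bits */
	__u8  padding[7];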

> Thanks,
> Jean
> 
> > +	__u8  padding[4];
> > +	/* Vendor specific data */
> > +	union {
> > +		struct gpasid_bind_data_vtd vtd;
> > +	};
> > +};
> > +
> >  #endif /* _UAPI_IOMMU_H */
> >   
> 

[Jacob Pan]

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH v4 10/22] iommu: Fix compile error without IOMMU_API
  2019-06-18 14:10   ` Jonathan Cameron
@ 2019-06-24 22:28     ` Jacob Pan
  0 siblings, 0 replies; 64+ messages in thread
From: Jacob Pan @ 2019-06-24 22:28 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: iommu, LKML, Joerg Roedel, David Woodhouse, Eric Auger,
	Alex Williamson, Jean-Philippe Brucker, Tian, Kevin, Raj Ashok,
	Andriy Shevchenko, jacob.jun.pan

On Tue, 18 Jun 2019 15:10:31 +0100
Jonathan Cameron <jonathan.cameron@huawei.com> wrote:

> On Sun, 9 Jun 2019 06:44:10 -0700
> Jacob Pan <jacob.jun.pan@linux.intel.com> wrote:
> 
> > struct page_response_msg needs to be defined outside
> > CONFIG_IOMMU_API.  
> 
> What was the error? 
> 
struct page_response_msg can be undefined if it is under CONFIG_IOMMU_API.
This was resolved after Jean moved it to uapi. Sorry, I could have made
it more clear: this is not a fix for a mainline kernel bug, just a fix for
a previous patch in Jean's API tree.
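For reference, the shape of the fix is roughly (a sketch, field contents
elided; the stub path only needs the type visible outside the #ifdef):

	/* include/linux/iommu.h: declaration hoisted above the
	 * #ifdef CONFIG_IOMMU_API block so both branches can use it.
	 */
	struct page_response_msg;

	#ifdef CONFIG_IOMMU_API
	int iommu_page_response(struct device *dev, struct page_response_msg *msg);
	#else
	static inline int iommu_page_response(struct device *dev,
					      struct page_response_msg *msg)
	{
		return -ENODEV;
	}
	#endif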
> If this is a fix for an earlier patch in this series role it in there
> (or put it before it). If more general we should add a fixes tag.
> 
> Jonathan
>  [...]  
> 
> 

[Jacob Pan]

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH v4 20/22] iommu/vt-d: Add bind guest PASID support
  2019-06-18 16:44   ` Jonathan Cameron
@ 2019-06-24 22:41     ` Jacob Pan
  0 siblings, 0 replies; 64+ messages in thread
From: Jacob Pan @ 2019-06-24 22:41 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: iommu, LKML, Joerg Roedel, David Woodhouse, Eric Auger,
	Alex Williamson, Jean-Philippe Brucker, Yi L, Tian, Kevin,
	Raj Ashok, Liu, Andriy Shevchenko, jacob.jun.pan

On Tue, 18 Jun 2019 17:44:49 +0100
Jonathan Cameron <jonathan.cameron@huawei.com> wrote:

> On Sun, 9 Jun 2019 06:44:20 -0700
> Jacob Pan <jacob.jun.pan@linux.intel.com> wrote:
> 
> > When supporting guest SVA with emulated IOMMU, the guest PASID
> > table is shadowed in VMM. Updates to guest vIOMMU PASID table
> > will result in PASID cache flush which will be passed down to
> > the host as bind guest PASID calls.
> > 
> > For the SL page tables, it will be harvested from device's
> > default domain (request w/o PASID), or aux domain in case of
> > mediated device.
> > 
> >     .-------------.  .---------------------------.
> >     |   vIOMMU    |  | Guest process CR3, FL only|
> >     |             |  '---------------------------'
> >     .----------------/
> >     | PASID Entry |--- PASID cache flush -
> >     '-------------'                       |
> >     |             |                       V
> >     |             |                CR3 in GPA
> >     '-------------'
> > Guest
> > ------| Shadow |--------------------------|--------
> >       v        v                          v
> > Host
> >     .-------------.  .----------------------.
> >     |   pIOMMU    |  | Bind FL for GVA-GPA  |
> >     |             |  '----------------------'
> >     .----------------/  |
> >     | PASID Entry |     V (Nested xlate)
> >     '----------------\.------------------------------.
> >     |             |   |SL for GPA-HPA, default domain|
> >     |             |   '------------------------------'
> >     '-------------'
> > Where:
> >  - FL = First level/stage one page tables
> >  - SL = Second level/stage two page tables
> > 
> > Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
> > Signed-off-by: Liu, Yi L <yi.l.liu@linux.intel.com>  
> 
> 
> A few trivial bits inline.  As far as I can tell looks good but I'm
> not that familiar with the hardware.
> 
Thank you so much for the review.

> Jonathan
> 
> > ---
> >  drivers/iommu/intel-iommu.c |   4 +
> >  drivers/iommu/intel-svm.c   | 187 ++++++++++++++++++++++++++++++++++++++++++++
> >  include/linux/intel-iommu.h |  13 ++-
> >  include/linux/intel-svm.h   |  17 ++++
> >  4 files changed, 219 insertions(+), 2 deletions(-)
> > 
> > diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
> > index 7cfa0eb..3b4d712 100644
> > --- a/drivers/iommu/intel-iommu.c
> > +++ b/drivers/iommu/intel-iommu.c
> > @@ -5782,6 +5782,10 @@ const struct iommu_ops intel_iommu_ops = {
> >  	.dev_enable_feat	= intel_iommu_dev_enable_feat,
> >  	.dev_disable_feat	= intel_iommu_dev_disable_feat,
> >  	.pgsize_bitmap		= INTEL_IOMMU_PGSIZES,
> > +#ifdef CONFIG_INTEL_IOMMU_SVM
> > +	.sva_bind_gpasid	= intel_svm_bind_gpasid,
> > +	.sva_unbind_gpasid	= intel_svm_unbind_gpasid,
> > +#endif
> >  };
> >  
> >  static void quirk_iommu_g4x_gfx(struct pci_dev *dev)
> > diff --git a/drivers/iommu/intel-svm.c b/drivers/iommu/intel-svm.c
> > index 66d98e1..f06a82f 100644
> > --- a/drivers/iommu/intel-svm.c
> > +++ b/drivers/iommu/intel-svm.c
> > @@ -229,6 +229,193 @@ static LIST_HEAD(global_svm_list);
> >  	list_for_each_entry(sdev, &svm->devs, list)	\
> >  	if (dev == sdev->dev)				\
> >  
> > +int intel_svm_bind_gpasid(struct iommu_domain *domain,
> > +			struct device *dev,
> > +			struct gpasid_bind_data *data)
> > +{
> > +	struct intel_iommu *iommu = intel_svm_device_to_iommu(dev);
> > +	struct intel_svm_dev *sdev;
> > +	struct intel_svm *svm = NULL;  
> I think this is set in all the paths that use it.
> 
indeed. will remove.
> > +	struct dmar_domain *ddomain;
> > +	int ret = 0;
> > +
> > +	if (WARN_ON(!iommu) || !data)
> > +		return -EINVAL;
> > +
> > +	if (data->version != IOMMU_GPASID_BIND_VERSION_1 ||
> > +		data->format != IOMMU_PASID_FORMAT_INTEL_VTD)
> > +		return -EINVAL;
> > +
> > +	if (dev_is_pci(dev)) {
> > +		/* VT-d supports devices with full 20 bit PASIDs
> > only */
> > +		if (pci_max_pasids(to_pci_dev(dev)) != PASID_MAX)
> > +			return -EINVAL;
> > +	}
> > +
> > +	/*
> > +	 * We only check host PASID range, we have no knowledge to
> > check
> > +	 * guest PASID range nor do we use the guest PASID.
> > +	 */
> > +	if (data->hpasid <= 0 || data->hpasid >= PASID_MAX)
> > +		return -EINVAL;
> > +
> > +	ddomain = to_dmar_domain(domain);
> > +	/* REVISIT:
> > +	 * Sanity check adddress width and paging mode support
> > +	 * width matching in two dimensions:
> > +	 * 1. paging mode CPU <= IOMMU
> > +	 * 2. address width Guest <= Host.
> > +	 */
> > +	mutex_lock(&pasid_mutex);
> > +	svm = ioasid_find(NULL, data->hpasid, NULL);
> > +	if (IS_ERR(svm)) {
> > +		ret = PTR_ERR(svm);
> > +		goto out;
> > +	}
> > +	if (svm) {
> > +		/*
> > +		 * If we found svm for the PASID, there must be at
> > +		 * least one device bond, otherwise svm should be
> > freed.
> > +		 */
> > +		BUG_ON(list_empty(&svm->devs));
> > +
> > +		for_each_svm_dev() {
> > +			/* In case of multiple sub-devices of the
> > same pdev assigned, we should
> > +			 * allow multiple bind calls with the same
> > PASID and pdev.
> > +			 */
> > +			sdev->users++;
> > +			goto out;
> > +		}
> > +	} else {
> > +		/* We come here when PASID has never been bond to
> > a device. */
> > +		svm = kzalloc(sizeof(*svm), GFP_KERNEL);
> > +		if (!svm) {
> > +			ret = -ENOMEM;
> > +			goto out;
> > +		}
> > +		/* REVISIT: upper layer/VFIO can track host
> > process that bind the PASID.
> > +		 * ioasid_set = mm might be sufficient for vfio to
> > check pasid VMM
> > +		 * ownership.
> > +		 */
> > +		svm->mm = get_task_mm(current);
> > +		svm->pasid = data->hpasid;
> > +		if (data->flags & IOMMU_SVA_GPASID_VAL) {
> > +			svm->gpasid = data->gpasid;
> > +			svm->flags &= SVM_FLAG_GUEST_PASID;
> > +		}
> > +		refcount_set(&svm->refs, 0);
> > +		ioasid_set_data(data->hpasid, svm);
> > +		INIT_LIST_HEAD_RCU(&svm->devs);
> > +		INIT_LIST_HEAD(&svm->list);
> > +
> > +		mmput(svm->mm);
> > +	}
> > +	sdev = kzalloc(sizeof(*sdev), GFP_KERNEL);
> > +	if (!sdev) {
> > +		ret = -ENOMEM;
> > +		goto out;
> > +	}
> > +	sdev->dev = dev;
> > +	sdev->users = 1;
> > +
> > +	/* Set up device context entry for PASID if not enabled
> > already */
> > +	ret = intel_iommu_enable_pasid(iommu, sdev->dev);
> > +	if (ret) {
> > +		dev_err(dev, "Failed to enable PASID
> > capability\n");
> > +		kfree(sdev);
> > +		goto out;
> > +	}
> > +
> > +	/*
> > +	 * For guest bind, we need to set up PASID table entry as
> > follows:
> > +	 * - FLPM matches guest paging mode
> > +	 * - turn on nested mode
> > +	 * - SL guest address width matching
> > +	 */
> > +	ret = intel_pasid_setup_nested(iommu,
> > +				dev,
> > +				(pgd_t *)data->gpgd,
> > +				data->hpasid,
> > +				data->flags,
> > +				ddomain,
> > +				data->addr_width);
> > +	if (ret) {
> > +		dev_err(dev, "Failed to set up PASID %llu in
> > nested mode, Err %d\n",
> > +			data->hpasid, ret);
> > +		kfree(sdev);
> > +		goto out;
> > +	}
> > +	svm->flags |= SVM_FLAG_GUEST_MODE;
> > +
> > +	init_rcu_head(&sdev->rcu);
> > +	refcount_inc(&svm->refs);
> > +	list_add_rcu(&sdev->list, &svm->devs);
> > + out:
> > +	mutex_unlock(&pasid_mutex);
> > +	return ret;
> > +}
> > +
> > +int intel_svm_unbind_gpasid(struct device *dev, int pasid)
> > +{
> > +	struct intel_svm_dev *sdev;
> > +	struct intel_iommu *iommu;
> > +	struct intel_svm *svm;
> > +	int ret = -EINVAL;
> > +
> > +	mutex_lock(&pasid_mutex);
> > +	iommu = intel_svm_device_to_iommu(dev);
> > +	if (!iommu)
> > +		goto out;
> > +
> > +	svm = ioasid_find(NULL, pasid, NULL);
> > +	if (IS_ERR(svm)) {
> > +		ret = PTR_ERR(svm);
> > +		goto out;
> > +	}
> > +
> > +	if (!svm)
> > +		goto out;
> > +
> > +	for_each_svm_dev() {
> > +		ret = 0;
> > +		sdev->users--;
> > +		if (!sdev->users) {
> > +			list_del_rcu(&sdev->list);
> > +			intel_pasid_tear_down_entry(iommu, dev,
> > svm->pasid);
> > +			/* TODO: Drain in flight PRQ for the PASID
> > since it
> > +			 * may get reused soon, we don't want to
> > +			 * confuse with its previous live.  
> life?
> 
right. thanks.
> > +			 * intel_svm_drain_prq(dev, pasid);
> > +			 */
> > +			kfree_rcu(sdev, rcu);
> > +
> > +			if (list_empty(&svm->devs)) {
> > +				list_del(&svm->list);
> > +				kfree(svm);
> > +				/*
> > +				 * We do not free PASID here until
> > explicit call
> > +				 * from VFIO to free. The PASID
> > life cycle
> > +				 * management is largely tied to
> > VFIO management
> > +				 * of assigned device life cycles.
> > In case of
> > +				 * guest exit without a explicit
> > free PASID call,
> > +				 * the responsibility lies in VFIO
> > layer to free
> > +				 * the PASIDs allocated for the
> > guest.
> > +				 * For security reasons, VFIO has
> > to track the
> > +				 * PASID ownership per guest
> > anyway to ensure
> > +				 * that PASID allocated by one
> > guest cannot be
> > +				 * used by another.
> > +				 */
> > +				ioasid_set_data(pasid, NULL);
> > +			}
> > +		}
> > +		break;
> > +	}
> > + out:
> > +	mutex_unlock(&pasid_mutex);
> > +
> > +	return ret;
> > +}
> > +
> >  int intel_svm_bind_mm(struct device *dev, int *pasid, int flags, struct svm_dev_ops *ops)
> >  {
> >  	struct intel_iommu *iommu = intel_svm_device_to_iommu(dev);
> > diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
> > index b75f17d..94d3a9a 100644
> > --- a/include/linux/intel-iommu.h
> > +++ b/include/linux/intel-iommu.h
> > @@ -677,7 +677,9 @@ int intel_iommu_enable_pasid(struct intel_iommu *iommu, struct device *dev);
> >  int intel_svm_init(struct intel_iommu *iommu);
> >  extern int intel_svm_enable_prq(struct intel_iommu *iommu);
> >  extern int intel_svm_finish_prq(struct intel_iommu *iommu);
> > -
> > +extern int intel_svm_bind_gpasid(struct iommu_domain *domain,
> > +		struct device *dev, struct gpasid_bind_data *data);
> > +extern int intel_svm_unbind_gpasid(struct device *dev, int pasid);
> >  struct svm_dev_ops;
> >  
> >  struct intel_svm_dev {
> > @@ -693,12 +695,19 @@ struct intel_svm_dev {
> >  
> >  struct intel_svm {
> >  	struct mmu_notifier notifier;
> > -	struct mm_struct *mm;
> > +	union {
> > +		struct mm_struct *mm;
> > +		u64 gcr3;
> > +	};
> >  	struct intel_iommu *iommu;
> >  	int flags;
> >  	int pasid;
> > +	int gpasid; /* Guest PASID in case of vSVA bind with
> > non-identity host
> > +		     * to guest PASID mapping.
> > +		     */
> >  	struct list_head devs;
> >  	struct list_head list;
> > +	refcount_t refs; /* Number of devices sharing this PASID */
> >  };
> >  
> >  extern struct intel_iommu *intel_svm_device_to_iommu(struct device *dev);
> > diff --git a/include/linux/intel-svm.h b/include/linux/intel-svm.h
> > index e3f7631..577d5df 100644
> > --- a/include/linux/intel-svm.h
> > +++ b/include/linux/intel-svm.h
> > @@ -52,6 +52,23 @@ struct svm_dev_ops {
> >   * do such IOTLB flushes automatically.
> >   */
> >  #define SVM_FLAG_SUPERVISOR_MODE	(1<<1)
> > +/*
> > + * The SVM_FLAG_GUEST_MODE flag is used when a guest process bind
> > to a device.
> > + * In this case the mm_struct is in the guest kernel or userspace,
> > its life
> > + * cycle is managed by VMM and VFIO layer. For IOMMU driver, this
> > API provides
> > + * means to bind/unbind guest CR3 with PASIDs allocated for a
> > device.
> > + */
> > +#define SVM_FLAG_GUEST_MODE	(1<<2)
> > +/*
> > + * The SVM_FLAG_GUEST_PASID flag is used when a guest has its own
> > PASID space,
> > + * which requires guest and host PASID translation at both
> > directions. We keep
> > + * track of guest PASID in order to provide lookup service to
> > device drivers.
> > + * One such example is a physical function (PF) driver that
> > supports mediated
> > + * device (mdev) assignment. Guest programming of mdev
> > configuration space can
> > + * only be done with guest PASID, therefore PF driver needs to
> > find the matching
> > + * host PASID to program the real hardware.
> > + */
> > +#define SVM_FLAG_GUEST_PASID	(1<<3)
> >  
> >  #ifdef CONFIG_INTEL_IOMMU_SVM
> >    
> 
> 

[Jacob Pan]

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH v4 17/22] iommu/vt-d: Avoid duplicated code for PASID setup
  2019-06-18 16:03   ` Jonathan Cameron
@ 2019-06-24 23:44     ` Jacob Pan
  0 siblings, 0 replies; 64+ messages in thread
From: Jacob Pan @ 2019-06-24 23:44 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: iommu, LKML, Joerg Roedel, David Woodhouse, Eric Auger,
	Alex Williamson, Jean-Philippe Brucker, Tian, Kevin, Raj Ashok,
	Andriy Shevchenko, jacob.jun.pan

On Tue, 18 Jun 2019 17:03:35 +0100
Jonathan Cameron <jonathan.cameron@huawei.com> wrote:

> On Sun, 9 Jun 2019 06:44:17 -0700
> Jacob Pan <jacob.jun.pan@linux.intel.com> wrote:
> 
> > After each setup for PASID entry, related translation caches must
> > be flushed. We can combine duplicated code into one function which
> > is less error prone.
> > 
> > Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>  
> Formatting nitpick below ;)
> 
> Otherwise it's cut and paste
> > ---
> >  drivers/iommu/intel-pasid.c | 48 +++++++++++++++++----------------------------
> >  1 file changed, 18 insertions(+), 30 deletions(-)
> > 
> > diff --git a/drivers/iommu/intel-pasid.c b/drivers/iommu/intel-pasid.c
> > index 1e25539..1ff2ecc 100644
> > --- a/drivers/iommu/intel-pasid.c
> > +++ b/drivers/iommu/intel-pasid.c
> > @@ -522,6 +522,21 @@ void intel_pasid_tear_down_entry(struct intel_iommu *iommu,
> >  	devtlb_invalidation_with_pasid(iommu, dev, pasid);
> >  }
> >  
> > +static inline void pasid_flush_caches(struct intel_iommu *iommu,
> > +				struct pasid_entry *pte,
> > +				int pasid, u16 did)
> > +{
> > +	if (!ecap_coherent(iommu->ecap))
> > +		clflush_cache_range(pte, sizeof(*pte));
> > +
> > +	if (cap_caching_mode(iommu->cap)) {
> > +		pasid_cache_invalidation_with_pasid(iommu, did,
> > pasid);
> > +		iotlb_invalidation_with_pasid(iommu, did, pasid);
> > +	} else
> > +		iommu_flush_write_buffer(iommu);  
> 
> I have some vague recollection kernel style says use brackets around
> single lines if other blocks in an if / else stack have multiple
> lines..
> 
> I checked, this case is specifically called out
> 
> https://www.kernel.org/doc/html/v5.1/process/coding-style.html
> > +  
> This blank line doesn't add anything either ;)
indeed. I will add the braces and remove the blank line.

Thanks for looking it up.
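
For the record, the helper then reads (same code, just with braces on
both branches and the blank line dropped):

	if (cap_caching_mode(iommu->cap)) {
		pasid_cache_invalidation_with_pasid(iommu, did, pasid);
		iotlb_invalidation_with_pasid(iommu, did, pasid);
	} else {
		iommu_flush_write_buffer(iommu);
	}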
> > +}
> > +
> >  /*
> >   * Set up the scalable mode pasid table entry for first only
> >   * translation type.
> > @@ -567,16 +582,7 @@ int intel_pasid_setup_first_level(struct intel_iommu *iommu,
> >  	/* Setup Present and PASID Granular Transfer Type: */
> >  	pasid_set_translation_type(pte, 1);
> >  	pasid_set_present(pte);
> > -
> > -	if (!ecap_coherent(iommu->ecap))
> > -		clflush_cache_range(pte, sizeof(*pte));
> > -
> > -	if (cap_caching_mode(iommu->cap)) {
> > -		pasid_cache_invalidation_with_pasid(iommu, did,
> > pasid);
> > -		iotlb_invalidation_with_pasid(iommu, did, pasid);
> > -	} else {
> > -		iommu_flush_write_buffer(iommu);
> > -	}
> > +	pasid_flush_caches(iommu, pte, pasid, did);
> >  
> >  	return 0;
> >  }
> > @@ -640,16 +646,7 @@ int intel_pasid_setup_second_level(struct intel_iommu *iommu,
> >  	 */
> >  	pasid_set_sre(pte);
> >  	pasid_set_present(pte);
> > -
> > -	if (!ecap_coherent(iommu->ecap))
> > -		clflush_cache_range(pte, sizeof(*pte));
> > -
> > -	if (cap_caching_mode(iommu->cap)) {
> > -		pasid_cache_invalidation_with_pasid(iommu, did,
> > pasid);
> > -		iotlb_invalidation_with_pasid(iommu, did, pasid);
> > -	} else {
> > -		iommu_flush_write_buffer(iommu);
> > -	}
> > +	pasid_flush_caches(iommu, pte, pasid, did);
> >  
> >  	return 0;
> >  }
> > @@ -683,16 +680,7 @@ int intel_pasid_setup_pass_through(struct intel_iommu *iommu,
> >  	 */
> >  	pasid_set_sre(pte);
> >  	pasid_set_present(pte);
> > -
> > -	if (!ecap_coherent(iommu->ecap))
> > -		clflush_cache_range(pte, sizeof(*pte));
> > -
> > -	if (cap_caching_mode(iommu->cap)) {
> > -		pasid_cache_invalidation_with_pasid(iommu, did,
> > pasid);
> > -		iotlb_invalidation_with_pasid(iommu, did, pasid);
> > -	} else {
> > -		iommu_flush_write_buffer(iommu);
> > -	}
> > +	pasid_flush_caches(iommu, pte, pasid, did);
> >  
> >  	return 0;
> >  }  
> 
> 

[Jacob Pan]

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH v4 19/22] iommu/vt-d: Clean up for SVM device list
  2019-06-18 16:42   ` Jonathan Cameron
@ 2019-06-24 23:59     ` Jacob Pan
  0 siblings, 0 replies; 64+ messages in thread
From: Jacob Pan @ 2019-06-24 23:59 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: iommu, LKML, Joerg Roedel, David Woodhouse, Eric Auger,
	Alex Williamson, Jean-Philippe Brucker, Tian, Kevin, Raj Ashok,
	Andriy Shevchenko, jacob.jun.pan

On Tue, 18 Jun 2019 17:42:37 +0100
Jonathan Cameron <jonathan.cameron@huawei.com> wrote:

> On Sun, 9 Jun 2019 06:44:19 -0700
> Jacob Pan <jacob.jun.pan@linux.intel.com> wrote:
> 
> > Use combined macro for_each_svm_dev() to simplify SVM device
> > iteration.
> > 
> > Suggested-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
> > Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
> > Reviewed-by: Eric Auger <eric.auger@redhat.com>
> > ---
> >  drivers/iommu/intel-svm.c | 79 +++++++++++++++++++++++------------------------
> >  1 file changed, 39 insertions(+), 40 deletions(-)
> > 
> > diff --git a/drivers/iommu/intel-svm.c b/drivers/iommu/intel-svm.c
> > index 9cbcc1f..66d98e1 100644
> > --- a/drivers/iommu/intel-svm.c
> > +++ b/drivers/iommu/intel-svm.c
> > @@ -225,6 +225,9 @@ static const struct mmu_notifier_ops intel_mmuops = {
> >  
> >  static DEFINE_MUTEX(pasid_mutex);
> >  static LIST_HEAD(global_svm_list);
> > +#define for_each_svm_dev() \
> > +	list_for_each_entry(sdev, &svm->devs, list)	\
> > +	if (dev == sdev->dev)				\  
> 
> Could we make this macro less opaque and have it take the svm and dev
> as arguments?
> 
sounds good, it makes the code more readable.
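
Something like this, perhaps (untested sketch; the empty-then-else
form avoids the dangling-else hazard of the bare if):

#define for_each_svm_dev(sdev, svm, dev)			\
	list_for_each_entry((sdev), &(svm)->devs, list)		\
		if ((dev) != (sdev)->dev) {} else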

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH v4 12/22] iommu: Add I/O ASID allocator
  2019-06-18 16:50   ` Jonathan Cameron
@ 2019-06-25 18:55     ` Jacob Pan
  0 siblings, 0 replies; 64+ messages in thread
From: Jacob Pan @ 2019-06-25 18:55 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: iommu, LKML, Joerg Roedel, David Woodhouse, Eric Auger,
	Alex Williamson, Jean-Philippe Brucker, Tian, Kevin, Raj Ashok,
	Andriy Shevchenko, jacob.jun.pan

Hi Jonathan,
Thanks for the review; comments are inline below. I saw Jean has
already picked up changes based on your review in his sva/api tree, so
this is just some additions.

On Tue, 18 Jun 2019 17:50:59 +0100
Jonathan Cameron <jonathan.cameron@huawei.com> wrote:

> On Sun, 9 Jun 2019 06:44:12 -0700
> Jacob Pan <jacob.jun.pan@linux.intel.com> wrote:
> 
> > From: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
> > 
> > Some devices might support multiple DMA address spaces, in
> > particular those that have the PCI PASID feature. PASID (Process
> > Address Space ID) allows to share process address spaces with
> > devices (SVA), partition a device into VM-assignable entities (VFIO
> > mdev) or simply provide multiple DMA address space to kernel
> > drivers. Add a global PASID allocator usable by different drivers
> > at the same time. Name it I/O ASID to avoid confusion with ASIDs
> > allocated by arch code, which are usually a separate ID space.
> > 
> > The IOASID space is global. Each device can have its own PASID
> > space, but by convention the IOMMU ended up having a global PASID
> > space, so that with SVA, each mm_struct is associated to a single
> > PASID.
> > 
> > The allocator is primarily used by IOMMU subsystem but in rare
> > occasions drivers would like to allocate PASIDs for devices that
> > aren't managed by an IOMMU, using the same ID space as IOMMU.
> > 
> > There are two types of allocators:
> > 1. default allocator - Always available, uses an XArray to track
> > 2. custom allocators - Can be registered at runtime, take precedence
> > over the default allocator.
> > 
> > Custom allocators have these attributes:
> > - provides platform specific alloc()/free() functions with private
> > data.
> > - allocation results lookup are not provided by the allocator,
> > lookup request must be done by the IOASID framework by its own
> > XArray.
> > - allocators can be unregistered at runtime, either fallback to the
> > next custom allocator or to the default allocator.  
> 
> What is the usecase for having a 'stack' of custom allocators?
> 
Mainly for hotplug. If a vIOMMU that provides the custom allocation
service to a guest gets hot-removed, we can fall back to another custom
allocator from the stack.
> > - custom allocators can share the same set of alloc()/free()
> > helpers, in this case they also share the same IOASID space, thus
> > the same XArray.
> > - switching between allocators requires all outstanding IOASIDs to
> > be freed unless the two allocators share the same alloc()/free()
> > helpers.  
> In general this approach has a lot of features where the
> justification is missing from this particular patch.  It may be
> useful to add some more background to this intro?
> 
Good point. Here are the use cases as complexity grows.
1. Native (host kernel) PASID/IOASID allocation. This only uses the
default allocator; users are the IOMMU driver or any device drivers.
2. Guest with a single vIOMMU. The vIOMMU driver registers a custom
allocator which eventually calls into the host to allocate a
system-wide PASID.
3. Guest with two or more vIOMMUs sharing the same backend for
allocating system-wide PASIDs. The IOASID code detects the sharing
based on the alloc/free ops.
4. Guest with two or more vIOMMUs, each allocating guest-wide PASIDs.
5. In case of hot-plug of a vIOMMU, the active custom allocator is
chosen from a list established during registration.

> > 
> > Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
> > Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
> > Link: https://lkml.org/lkml/2019/4/26/462  
> 
> Various comments inline.  Given the several cups of coffee this took
> to review I may well have misunderstood everything ;)
> 
> Jonathan
> > ---
> >  drivers/iommu/Kconfig  |   8 +
> >  drivers/iommu/Makefile |   1 +
> >  drivers/iommu/ioasid.c | 427 +++++++++++++++++++++++++++++++++++++++++++++++++
> >  include/linux/ioasid.h |  74 +++++++++
> >  4 files changed, 510 insertions(+)
> >  create mode 100644 drivers/iommu/ioasid.c
> >  create mode 100644 include/linux/ioasid.h
> > 
> > diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
> > index 83664db..c40c4b5 100644
> > --- a/drivers/iommu/Kconfig
> > +++ b/drivers/iommu/Kconfig
> > @@ -3,6 +3,13 @@
> >  config IOMMU_IOVA
> >  	tristate
> >  
> > +# The IOASID allocator may also be used by non-IOMMU_API users
> > +config IOASID
> > +	tristate
> > +	help
> > +	  Enable the I/O Address Space ID allocator. A single ID
> > space shared
> > +	  between different users.
> > +
> >  # IOMMU_API always gets selected by whoever wants it.
> >  config IOMMU_API
> >  	bool
> > @@ -207,6 +214,7 @@ config INTEL_IOMMU_SVM
> >  	depends on INTEL_IOMMU && X86
> >  	select PCI_PASID
> >  	select MMU_NOTIFIER
> > +	select IOASID
> >  	help
> >  	  Shared Virtual Memory (SVM) provides a facility for
> > devices to access DMA resources through process address space by
> > diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
> > index 8c71a15..0efac6f 100644
> > --- a/drivers/iommu/Makefile
> > +++ b/drivers/iommu/Makefile
> > @@ -7,6 +7,7 @@ obj-$(CONFIG_IOMMU_DMA) += dma-iommu.o
> >  obj-$(CONFIG_IOMMU_IO_PGTABLE) += io-pgtable.o
> >  obj-$(CONFIG_IOMMU_IO_PGTABLE_ARMV7S) += io-pgtable-arm-v7s.o
> >  obj-$(CONFIG_IOMMU_IO_PGTABLE_LPAE) += io-pgtable-arm.o
> > +obj-$(CONFIG_IOASID) += ioasid.o
> >  obj-$(CONFIG_IOMMU_IOVA) += iova.o
> >  obj-$(CONFIG_OF_IOMMU)	+= of_iommu.o
> >  obj-$(CONFIG_MSM_IOMMU) += msm_iommu.o
> > diff --git a/drivers/iommu/ioasid.c b/drivers/iommu/ioasid.c
> > new file mode 100644
> > index 0000000..0919b70
> > --- /dev/null
> > +++ b/drivers/iommu/ioasid.c
> > @@ -0,0 +1,427 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +/*
> > + * I/O Address Space ID allocator. There is one global IOASID
> > space, split into
> > + * subsets. Users create a subset with DECLARE_IOASID_SET, then
> > allocate and
> > + * free IOASIDs with ioasid_alloc and ioasid_free.
> > + */
> > +#define pr_fmt(fmt)	KBUILD_MODNAME ": " fmt
> > +
> > +#include <linux/xarray.h>
> > +#include <linux/ioasid.h>
> > +#include <linux/slab.h>
> > +#include <linux/module.h>
> > +#include <linux/spinlock.h>
> > +
> > +struct ioasid_data {
> > +	ioasid_t id;
> > +	struct ioasid_set *set;
> > +	void *private;
> > +	struct rcu_head rcu;
> > +};
> > +
> > +/*
> > + * struct ioasid_allocator_data - Internal data structure to hold
> > information
> > + * about an allocator. There are two types of allocators:
> > + *
> > + * - Default allocator always has its own XArray to track the
> > IOASIDs allocated.
> > + * - Custom allocators may share allocation helpers with different
> > private data.
> > + *   Custom allocators share the same helper functions also share
> > the same
> > + *   XArray.
> > + * Rules:
> > + * 1. Default allocator is always available, not dynamically
> > registered. This is
> > + *    to prevent race conditions with early boot code that want to
> > register
> > + *    custom allocators or allocate IOASIDs.
> > + * 2. Custom allocators take precedence over the default allocator.
> > + * 3. When all custom allocators sharing the same helper functions
> > are
> > + *    unregistered (e.g. due to hotplug), all outstanding IOASIDs
> > must be
> > + *    freed.
> > + * 4. When switching between custom allocators sharing the same
> > helper
> > + *    functions, outstanding IOASIDs are preserved.
> > + * 5. When switching between custom allocator and default
> > allocator, all IOASIDs
> > + *    must be freed to ensure unadulterated space for the new
> > allocator.
> > + *
> > + * @ops:	allocator helper functions and its data
> > + * @list:	registered custom allocators
> > + * @slist:	allocators share the same ops but different data
> > + * @flags:	attributes of the allocator
> > + * @users	number of allocators sharing the same ops and
> > XArray
> > + * @xa		xarray holds the IOASID space  
> Doc ordering should match the struct.
> 
Fixed in Jean's tree.
> > + */
> > +struct ioasid_allocator_data {
> > +	struct ioasid_allocator_ops *ops;
> > +	struct list_head list;
> > +	struct list_head slist;
> > +#define IOASID_ALLOCATOR_CUSTOM BIT(0) /* Needs framework to track
> > results */
> > +	unsigned long flags;
> > +	struct xarray xa;
> > +	refcount_t users;
> > +};
> > +
> > +static DEFINE_MUTEX(ioasid_allocator_lock);
> > +static LIST_HEAD(allocators_list);
> > +static ioasid_t default_alloc(ioasid_t min, ioasid_t max, void *opaque);
> > +static void default_free(ioasid_t ioasid, void *opaque);
> > +
> > +static struct ioasid_allocator_ops default_ops = {
> > +	.alloc = default_alloc,
> > +	.free = default_free  
> 
> Pure churn prevention, but it doesn't seem implausible that we'll end
> up with more ops at some point, so add the commas on the last elements?
> 
Sounds good, avoid a new line. Fixed in Jean's tree.
> > +};
> > +
> > +static struct ioasid_allocator_data default_allocator = {
> > +	.ops = &default_ops,
> > +	.flags = 0,
> > +	.xa = XARRAY_INIT(ioasid_xa, XA_FLAGS_ALLOC)
> > +};
> > +
> > +static struct ioasid_allocator_data *active_allocator = &default_allocator;
> > +
> > +static ioasid_t default_alloc(ioasid_t min, ioasid_t max, void *opaque)
> > +{  
> Why not reorder to avoid having the forward defs above?  It's only
> moving things a few lines in this case so I'm not seeing a strong
> readability argument.
> 
There is a circular dependency. default_alloc() depends on
default_allocator which in turn depends on default_ops.
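
Spelled out with the same definitions as in the patch (annotated):

static ioasid_t default_alloc(ioasid_t min, ioasid_t max, void *opaque);
static void default_free(ioasid_t ioasid, void *opaque);

static struct ioasid_allocator_ops default_ops = {
	.alloc = default_alloc,	/* needs the functions declared */
	.free = default_free,
};

static struct ioasid_allocator_data default_allocator = {
	.ops = &default_ops,	/* needs default_ops defined */
	.xa = XARRAY_INIT(ioasid_xa, XA_FLAGS_ALLOC),
};

default_alloc()'s body then references default_allocator.xa, closing
the cycle, so at least one set of forward declarations is unavoidable.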
> > +	ioasid_t id;
> > +
> > +	if (xa_alloc(&default_allocator.xa, &id, opaque,
> > XA_LIMIT(min, max), GFP_KERNEL)) {
> > +		pr_err("Failed to alloc ioasid from %d to %d\n",
> > min, max);  
> 
> It seems xa_alloc can return a few different error values. Perhaps
> worth printing out what we got, as it might help in debugging?
> 
yes, will do.
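
E.g. something along these lines (sketch):

	int ret;

	ret = xa_alloc(&default_allocator.xa, &id, opaque,
		       XA_LIMIT(min, max), GFP_KERNEL);
	if (ret) {
		pr_err("Failed to alloc ioasid from %d to %d, err %d\n",
		       min, max, ret);
		return INVALID_IOASID;
	}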
> > +		return INVALID_IOASID;
> > +	}
> > +
> > +	return id;
> > +}
> > +
> > +void default_free(ioasid_t ioasid, void *opaque)
> > +{
> > +	struct ioasid_data *ioasid_data;
> > +
> > +	ioasid_data = xa_erase(&default_allocator.xa, ioasid);
> > +	kfree_rcu(ioasid_data, rcu);
> > +}
> > +
> > +/* Allocate and initialize a new custom allocator with its helper
> > functions */ +static inline struct ioasid_allocator_data
> > *ioasid_alloc_allocator(struct ioasid_allocator_ops *ops)  
> 
> Why inline?
> 
Good catch, fixed by Jean in his tree.
> > +{
> > +	struct ioasid_allocator_data *ia_data;
> > +
> > +	ia_data = kzalloc(sizeof(*ia_data), GFP_KERNEL);
> > +	if (!ia_data)
> > +		return NULL;
> > +
> > +	xa_init_flags(&ia_data->xa, XA_FLAGS_ALLOC);
> > +	INIT_LIST_HEAD(&ia_data->slist);
> > +	ia_data->flags |= IOASID_ALLOCATOR_CUSTOM;
> > +	ia_data->ops = ops;
> > +
> > +	/* For tracking custom allocators that share the same ops
> > */
> > +	list_add_tail(&ops->list, &ia_data->slist);
> > +	refcount_set(&ia_data->users, 1);
> > +
> > +	return ia_data;
> > +}
> > +
> > +static inline bool use_same_ops(struct ioasid_allocator_ops *a,
> > struct ioasid_allocator_ops *b) +{
> > +	return (a->free == b->free) && (a->alloc == b->alloc);
> > +}
> > +
> > +/**
> > + * ioasid_register_allocator - register a custom allocator
> > + * @ops: the custom allocator ops to be registered
> > + *
> > + * Custom allocators take precedence over the default xarray based
> > allocator.
> > + * Private data associated with the IOASID allocated by the custom
> > allocators
> > + * are managed by IOASID framework similar to data stored in xa by
> > default
> > + * allocator.
> > + *
> > + * There can be multiple allocators registered but only one is
> > active. In case
> > + * of runtime removal of a custom allocator, the next one is
> > activated based
> > + * on the registration ordering.
> > + *
> > + * Multiple allocators can share the same alloc() function, in
> > this case the
> > + * IOASID space is shared.
> > + *  
> Nitpick: This extra blank line seems inconsistent.
> 
fixed by Jean in his tree.
> > + */
> > +int ioasid_register_allocator(struct ioasid_allocator_ops *ops)
> > +{
> > +	struct ioasid_allocator_data *ia_data;
> > +	struct ioasid_allocator_data *pallocator;
> > +	int ret = 0;
> > +
> > +	mutex_lock(&ioasid_allocator_lock);
> > +
> > +	ia_data = ioasid_alloc_allocator(ops);
> > +	if (!ia_data) {
> > +		ret = -ENOMEM;
> > +		goto out_unlock;
> > +	}
> > +
> > +	/*
> > +	 * No particular preference, we activate the first one and
> > keep
> > +	 * the later registered allocators in a list in case the
> > first one gets
> > +	 * removed due to hotplug.
> > +	 */
> > +	if (list_empty(&allocators_list)) {
> > +		WARN_ON(active_allocator != &default_allocator);
> > +		/* Use this new allocator if default is not active
> > */
> > +		if (xa_empty(&active_allocator->xa)) {
> > +			active_allocator = ia_data;
> > +			list_add_tail(&ia_data->list,
> > &allocators_list);
> > +			goto out_unlock;
> > +		}
> > +		pr_warn("Default allocator active with outstanding
> > IOASID\n");
> > +		ret = -EAGAIN;
> > +		goto out_free;
> > +	}
> > +
> > +	/* Check if the allocator is already registered */
> > +	list_for_each_entry(pallocator, &allocators_list, list) {
> > +		if (pallocator->ops == ops) {
> > +			pr_err("IOASID allocator already
> > registered\n");
> > +			ret = -EEXIST;
> > +			goto out_free;
> > +		} else if (use_same_ops(pallocator->ops, ops)) {
> > +			/*
> > +			 * If the new allocator shares the same
> > ops,
> > +			 * then they will share the same IOASID
> > space.
> > +			 * We should put them under the same
> > xarray.
> > +			 */
> > +			list_add_tail(&ops->list,
> > &pallocator->slist);
> > +			refcount_inc(&pallocator->users);
> > +			pr_info("New IOASID allocator ops shared
> > %u times\n",
> > +				refcount_read(&pallocator->users));
> > +			goto out_free;
> > +		}
> > +	}
> > +	list_add_tail(&ia_data->list, &allocators_list);
> > +
> > +	goto out_unlock;  
> I commented on this in Jean-Phillipe's set as well I think.
> Cleaner to just have a separate unlock copy here and return.
> 
fixed by Jean in his tree.
> > +
> > +out_free:
> > +	kfree(ia_data);
> > +out_unlock:
> > +	mutex_unlock(&ioasid_allocator_lock);
> > +	return ret;
> > +}
> > +EXPORT_SYMBOL_GPL(ioasid_register_allocator);
> > +
> > +static inline bool has_shared_ops(struct ioasid_allocator_data
> > *allocator) +{
> > +	return refcount_read(&allocator->users) > 1;
> > +}
> > +
> > +/**
> > + * ioasid_unregister_allocator - Remove a custom IOASID allocator
> > ops
> > + * @ops: the custom allocator to be removed
> > + *
> > + * Remove an allocator from the list, activate the next allocator
> > in
> > + * the order it was registered. Or revert to default allocator if
> > all
> > + * custom allocators are unregistered without outstanding IOASIDs.
> > + */
> > +void ioasid_unregister_allocator(struct ioasid_allocator_ops *ops)
> > +{
> > +	struct ioasid_allocator_data *pallocator;
> > +	struct ioasid_allocator_ops *sops;
> > +
> > +	mutex_lock(&ioasid_allocator_lock);
> > +	if (list_empty(&allocators_list)) {
> > +		pr_warn("No custom IOASID allocators active!\n");
> > +		goto exit_unlock;
> > +	}
> > +
> > +	list_for_each_entry(pallocator, &allocators_list, list) {
> > +		if (use_same_ops(pallocator->ops, ops)) {  
> Could reduce the depth with.
> 
> 		if (!use_same_ops(pallocator->ops, ops))
> 			continue;
> 
> Trade-off between having the unusual flow that will result and the
> code going off the side of your screen (if you use a very small
> screen ;)
> 
fixed by Jean in his tree.
> > +			if (refcount_read(&pallocator->users) ==
> > 1) {
> > +				/* No shared helper functions */
> > +				list_del(&pallocator->list);
> > +				/*
> > +				 * All IOASIDs should have been
> > freed before
> > +				 * the last allocator that shares
> > the same ops
> > +				 * is unregistered.
> > +				 */
> > +
> > WARN_ON(!xa_empty(&pallocator->xa));
> > +				kfree(pallocator);
> > +				if (list_empty(&allocators_list)) {
> > +					pr_info("No custom IOASID
> > allocators, switch to default.\n");
> > +					active_allocator =
> > &default_allocator;
> > +				} else if (pallocator ==
> > active_allocator) {
> > +					active_allocator =
> > list_entry(&allocators_list, struct ioasid_allocator_data, list);  
> 
> What is the intent here?  You seem to be calling the list entry
> dereference on the list head itself.  That will get you something
> fairly random on the stack I think.
> 
> list_first_entry maybe?
> 
The intent here is to pick a new active_allocator if the one being
unregistered happens to be the active allocator. It should be
list_first_entry. Thanks.
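
i.e.:

	active_allocator = list_first_entry(&allocators_list,
					    struct ioasid_allocator_data,
					    list);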
> 
> > +					pr_info("IOASID allocator
> > changed");
> > +				}
> > +				break;
> > +			}
> > +			/*
> > +			 * Find the matching shared ops to delete,
> > +			 * but keep outstanding IOASIDs
> > +			 */
> > +			list_for_each_entry(sops,
> > &pallocator->slist, list) {
> > +				if (sops == ops) {
> > +					list_del(&ops->list);
> > +					if
> > (refcount_dec_and_test(&pallocator->users))
> > +						pr_err("no shared
> > ops\n");
> > +					break;
> > +				}
> > +			}
> > +			break;
> > +		}
> > +	}
> > +
> > +exit_unlock:
> > +	mutex_unlock(&ioasid_allocator_lock);
> > +}
> > +EXPORT_SYMBOL_GPL(ioasid_unregister_allocator);
> > +
> > +/**
> > + * ioasid_set_data - Set private data for an allocated ioasid
> > + * @ioasid: the ID to set data
> > + * @data:   the private data
> > + *
> > + * For IOASID that is already allocated, private data can be set
> > + * via this API. Future lookup can be done via ioasid_find.
> > + */
> > +int ioasid_set_data(ioasid_t ioasid, void *data)
> > +{
> > +	struct ioasid_data *ioasid_data;
> > +	int ret = 0;
> > +
> > +	mutex_lock(&ioasid_allocator_lock);
> > +	ioasid_data = xa_load(&active_allocator->xa, ioasid);
> > +	if (ioasid_data)
> > +		ioasid_data->private = data;
> > +	else
> > +		ret = -ENOENT;
> > +	mutex_unlock(&ioasid_allocator_lock);
> > +
> > +	/* getter may use the private data */
> > +	synchronize_rcu();
> > +
> > +	return ret;
> > +}
> > +EXPORT_SYMBOL_GPL(ioasid_set_data);
> > +
> > +/**
> > + * ioasid_alloc - Allocate an IOASID
> > + * @set: the IOASID set
> > + * @min: the minimum ID (inclusive)
> > + * @max: the maximum ID (inclusive)
> > + * @private: data private to the caller
> > + *
> > + * Allocate an ID between @min and @max. Return the allocated ID
> > on success,
> > + * or INVALID_IOASID on failure. The @private pointer is stored
> > internally
> > + * and can be retrieved with ioasid_find().
> > + */
> > +ioasid_t ioasid_alloc(struct ioasid_set *set, ioasid_t min,
> > ioasid_t max,
> > +		      void *private)
> > +{
> > +	ioasid_t id = INVALID_IOASID;
> > +	struct ioasid_data *data;
> > +
> > +	data = kzalloc(sizeof(*data), GFP_KERNEL);
> > +	if (!data)
> > +		return INVALID_IOASID;
> > +
> > +	data->set = set;
> > +	data->private = private;
> > +
> > +	mutex_lock(&ioasid_allocator_lock);
> > +
> > +	id = active_allocator->ops->alloc(min, max, data);
> > +	if (id == INVALID_IOASID) {
> > +		pr_err("Failed ASID allocation %lu\n",
> > active_allocator->flags);
> > +			mutex_unlock(&ioasid_allocator_lock);
> > +			goto exit_free;
> > +	}
> > +	if (active_allocator->flags & IOASID_ALLOCATOR_CUSTOM) {
> > +		/* Custom allocator needs framework to store and
> > track allocation results */
> > +		min = id;
> > +		max = id + 1;  
> 
> Looking at xa_limit definition these are inclusive limits so why do
> we let it take either value?
> 
You are right, it was IDR-based, which was exclusive. Jean has already
fixed it.
> > +
> > +		if (xa_alloc(&active_allocator->xa, &id, data,
> > XA_LIMIT(min, max), GFP_KERNEL)) {
> > +			pr_err("Failed to alloc ioasid from %d to
> > %d\n", min, max);
> > +			active_allocator->ops->free(id, NULL);
> > +			goto exit_free;
> > +		}
> > +	}
> > +	data->id = id;
> > +
> > +	mutex_unlock(&ioasid_allocator_lock);
> > +
> > +exit_free:  
> As with the version of the patch without custom allocator, feels like
> we can simplify this section as we can't get here with id ==
> INVALID_IOASID unless we went to the exit_free.
> 
> If we did take the exit_free it is always INVALID_IOASID (I think,
> but haven't checked...)
> 
> So just have a return from the good path and always run
> this block in the bad one.
>  
sounds good, fixed in Jean's tree.
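
Roughly, the tail of ioasid_alloc() then ends up shaped like this
(sketch only, not necessarily the exact code in Jean's tree):

	id = active_allocator->ops->alloc(min, max, data);
	if (id == INVALID_IOASID) {
		pr_err("Failed ASID allocation %lu\n", active_allocator->flags);
		goto exit_free;
	}
	/* custom-allocator bookkeeping unchanged */
	data->id = id;
	mutex_unlock(&ioasid_allocator_lock);
	return id;

exit_free:
	mutex_unlock(&ioasid_allocator_lock);
	kfree(data);
	return INVALID_IOASID;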
> > +	if (id == INVALID_IOASID) {
> > +		kfree(data);
> > +		return INVALID_IOASID;
> > +	}
> > +	return id;
> > +}
> > +EXPORT_SYMBOL_GPL(ioasid_alloc);
> > +
> > +/**
> > + * ioasid_free - Free an IOASID
> > + * @ioasid: the ID to remove
> > + */
> > +void ioasid_free(ioasid_t ioasid)
> > +{
> > +	struct ioasid_data *ioasid_data;
> > +
> > +	mutex_lock(&ioasid_allocator_lock);
> > +
> > +	ioasid_data = xa_load(&active_allocator->xa, ioasid);
> > +	if (!ioasid_data) {
> > +		pr_err("Trying to free unknown IOASID %u\n",
> > ioasid);
> > +		goto exit_unlock;
> > +	}
> > +
> > +	active_allocator->ops->free(ioasid,
> > active_allocator->ops->pdata);
> > +	/* Custom allocator needs additional steps to free the xa
> > */  
> xa element perhaps?
> 
> > +	if (active_allocator->flags & IOASID_ALLOCATOR_CUSTOM) {
> > +		ioasid_data = xa_erase(&active_allocator->xa,
> > ioasid);
> > +		kfree_rcu(ioasid_data, rcu);
> > +	}
> > +
> > +exit_unlock:
> > +	mutex_unlock(&ioasid_allocator_lock);
> > +}
> > +EXPORT_SYMBOL_GPL(ioasid_free);
> > +
> > +/**
> > + * ioasid_find - Find IOASID data
> > + * @set: the IOASID set
> > + * @ioasid: the IOASID to find
> > + * @getter: function to call on the found object
> > + *
> > + * The optional getter function allows to take a reference to the
> > found object
> > + * under the rcu lock. The function can also check if the object
> > is still valid:
> > + * if @getter returns false, then the object is invalid and NULL
> > is returned.
> > + *
> > + * If the IOASID has been allocated for this set, return the
> > private pointer
> > + * passed to ioasid_alloc. Private data can be NULL if not set.
> > Return an error
> > + * if the IOASID is not found or does not belong to the set.
> > + */
> > +void *ioasid_find(struct ioasid_set *set, ioasid_t ioasid,
> > +		  bool (*getter)(void *))
> > +{
> > +	void *priv = NULL;  
> 
> Always set so no need to initialize.
> 
right
> > +	struct ioasid_data *ioasid_data;
> > +
> > +	rcu_read_lock();
> > +	ioasid_data = xa_load(&active_allocator->xa, ioasid);
> > +	if (!ioasid_data) {
> > +		priv = ERR_PTR(-ENOENT);
> > +		goto unlock;
> > +	}
> > +	if (set && ioasid_data->set != set) {
> > +		/* data found but does not belong to the set */
> > +		priv = ERR_PTR(-EACCES);
> > +		goto unlock;
> > +	}
> > +	/* Now IOASID and its set is verified, we can return the
> > private data */
> > +	priv = ioasid_data->private;
> > +	if (getter && !getter(priv))
> > +		priv = NULL;
> > +unlock:
> > +	rcu_read_unlock();
> > +
> > +	return priv;
> > +}
> > +EXPORT_SYMBOL_GPL(ioasid_find);
> > +
> > +MODULE_AUTHOR("Jean-Philippe Brucker <jean-philippe.brucker@arm.com>");
> > +MODULE_AUTHOR("Jacob Pan <jacob.jun.pan@linux.intel.com>");
> > +MODULE_DESCRIPTION("IO Address Space ID (IOASID) allocator");
> > +MODULE_LICENSE("GPL");
> > diff --git a/include/linux/ioasid.h b/include/linux/ioasid.h
> > new file mode 100644
> > index 0000000..8c8d1c5
> > --- /dev/null
> > +++ b/include/linux/ioasid.h
> > @@ -0,0 +1,74 @@
> > +/* SPDX-License-Identifier: GPL-2.0 */
> > +#ifndef __LINUX_IOASID_H
> > +#define __LINUX_IOASID_H
> > +
> > +#define INVALID_IOASID ((ioasid_t)-1)
> > +typedef unsigned int ioasid_t;
> > +typedef ioasid_t (*ioasid_alloc_fn_t)(ioasid_t min, ioasid_t max, void *data);
> > +typedef void (*ioasid_free_fn_t)(ioasid_t ioasid, void *data);
> > +
> > +struct ioasid_set {
> > +	int dummy;
> > +};
> > +
> > +/**
> > + * struct ioasid_allocator_ops - IOASID allocator helper functions
> > and data
> > + *
> > + * @alloc:	helper function to allocate IOASID
> > + * @free:	helper function to free IOASID
> > + * @list:	for tracking ops that share helper functions but
> > not data
> > + * @pdata:	data belong to the allocator, provided when
> > calling alloc()  
> and free()
> 
> Seems odd to call out one and not the other.
> I'm not actually sure it's true though as I can't see pdata being
> provided to the alloc call.
> 
You are right, the allocator data should be provided to both free and
alloc. There is a bug in my code: the alloc call should have been made
with the allocator's private data instead of the IOASID private data.
I will fix it on top of Jean's version.
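
i.e. in ioasid_alloc(), something like (sketch):

	/* pass the allocator's own private data, not the IOASID's */
	id = active_allocator->ops->alloc(min, max,
					  active_allocator->ops->pdata);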

Thank you so much,

> > + */
> > +struct ioasid_allocator_ops {
> > +	ioasid_alloc_fn_t alloc;
> > +	ioasid_free_fn_t free;
> > +	struct list_head list;
> > +	void *pdata;
> > +};
> > +
> > +#define DECLARE_IOASID_SET(name) struct ioasid_set name = { 0 }  
> 
> I was a little curious about how this would actually be used.
> Even more curious, it isn't used anywhere that I can find.
> 
It can be used for data typing or ownership permission checks.
For VT-d, we use the mm_struct as the set when allocating PASIDs, to
ensure that only an IOASID allocated by the same process can be
operated on, e.g. freed or bound.
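
A hypothetical sketch of that usage (the set and variable names are
made up for illustration; with this version of the API the set is just
an opaque tag):

	DECLARE_IOASID_SET(vtd_pasid_set);

	pasid = ioasid_alloc(&vtd_pasid_set, PASID_MIN, pasid_max - 1, svm);

	/* lookup against a different set would fail with -EACCES */
	svm = ioasid_find(&vtd_pasid_set, pasid, NULL);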
> > +
> > +#if IS_ENABLED(CONFIG_IOASID)
> > +ioasid_t ioasid_alloc(struct ioasid_set *set, ioasid_t min,
> > ioasid_t max,
> > +		      void *private);
> > +void ioasid_free(ioasid_t ioasid);
> > +
> > +void *ioasid_find(struct ioasid_set *set, ioasid_t ioasid,
> > +			bool (*getter)(void *));
> > +int ioasid_register_allocator(struct ioasid_allocator_ops
> > *allocator); +void ioasid_unregister_allocator(struct
> > ioasid_allocator_ops *allocator); +int ioasid_set_data(ioasid_t
> > ioasid, void *data); +
> > +#else /* !CONFIG_IOASID */
> > +static inline ioasid_t ioasid_alloc(struct ioasid_set *set,
> > ioasid_t min,
> > +				    ioasid_t max, void *private)
> > +{
> > +	return INVALID_IOASID;
> > +}
> > +
> > +static inline void ioasid_free(ioasid_t ioasid)
> > +{
> > +}
> > +
> > +static inline void *ioasid_find(struct ioasid_set *set, ioasid_t
> > ioasid,
> > +				bool (*getter)(void *))
> > +{
> > +	return NULL;
> > +}
> > +
> > +static inline int ioasid_register_allocator(struct
> > ioasid_allocator_ops *allocator) +{
> > +	return -ENODEV;
> > +}
> > +
> > +static inline void ioasid_unregister_allocator(struct
> > ioasid_allocator_ops *allocator) +{
> > +}
> > +
> > +static inline int ioasid_set_data(ioasid_t ioasid, void *data)
> > +{
> > +	return -ENODEV;
> > +}
> > +
> > +#endif /* CONFIG_IOASID */
> > +#endif /* __LINUX_IOASID_H */  
> 
> 

[Jacob Pan]

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH v4 15/22] iommu/vt-d: Replace Intel specific PASID allocator with IOASID
  2019-06-09 13:44 ` [PATCH v4 15/22] iommu/vt-d: Replace Intel specific PASID allocator with IOASID Jacob Pan
  2019-06-18 15:57   ` Jonathan Cameron
@ 2019-06-27  1:53   ` Lu Baolu
  2019-06-27 15:40     ` Jacob Pan
  1 sibling, 1 reply; 64+ messages in thread
From: Lu Baolu @ 2019-06-27  1:53 UTC (permalink / raw)
  To: Jacob Pan, iommu, LKML, Joerg Roedel, David Woodhouse,
	Eric Auger, Alex Williamson, Jean-Philippe Brucker
  Cc: baolu.lu, Yi Liu, Tian, Kevin, Raj Ashok, Christoph Hellwig,
	Andriy Shevchenko

Hi Jacob,

On 6/9/19 9:44 PM, Jacob Pan wrote:
> Make use of generic IOASID code to manage PASID allocation,
> free, and lookup. Replace Intel specific code.
> 
> Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
> ---
>   drivers/iommu/intel-iommu.c | 11 +++++------
>   drivers/iommu/intel-pasid.c | 36 ------------------------------------
>   drivers/iommu/intel-svm.c   | 37 +++++++++++++++++++++----------------
>   3 files changed, 26 insertions(+), 58 deletions(-)
> 
> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
> index 5b84994..39b63fe 100644
> --- a/drivers/iommu/intel-iommu.c
> +++ b/drivers/iommu/intel-iommu.c
> @@ -5167,7 +5167,7 @@ static void auxiliary_unlink_device(struct dmar_domain *domain,
>   	domain->auxd_refcnt--;
>   
>   	if (!domain->auxd_refcnt && domain->default_pasid > 0)
> -		intel_pasid_free_id(domain->default_pasid);
> +		ioasid_free(domain->default_pasid);
>   }
>   
>   static int aux_domain_add_dev(struct dmar_domain *domain,
> @@ -5185,10 +5185,9 @@ static int aux_domain_add_dev(struct dmar_domain *domain,
>   	if (domain->default_pasid <= 0) {
>   		int pasid;
>   
> -		pasid = intel_pasid_alloc_id(domain, PASID_MIN,
> -					     pci_max_pasids(to_pci_dev(dev)),
> -					     GFP_KERNEL);
> -		if (pasid <= 0) {
> +		pasid = ioasid_alloc(NULL, PASID_MIN, pci_max_pasids(to_pci_dev(dev)) - 1,
> +				domain);
> +		if (pasid == INVALID_IOASID) {
>   			pr_err("Can't allocate default pasid\n");
>   			return -ENODEV;
>   		}
> @@ -5224,7 +5223,7 @@ static int aux_domain_add_dev(struct dmar_domain *domain,
>   	spin_unlock(&iommu->lock);
>   	spin_unlock_irqrestore(&device_domain_lock, flags);
>   	if (!domain->auxd_refcnt && domain->default_pasid > 0)
> -		intel_pasid_free_id(domain->default_pasid);
> +		ioasid_free(domain->default_pasid);
>   
>   	return ret;
>   }
> diff --git a/drivers/iommu/intel-pasid.c b/drivers/iommu/intel-pasid.c
> index 69fddd3..1e25539 100644
> --- a/drivers/iommu/intel-pasid.c
> +++ b/drivers/iommu/intel-pasid.c
> @@ -26,42 +26,6 @@
>    */
>   static DEFINE_SPINLOCK(pasid_lock);
>   u32 intel_pasid_max_id = PASID_MAX;
> -static DEFINE_IDR(pasid_idr);
> -
> -int intel_pasid_alloc_id(void *ptr, int start, int end, gfp_t gfp)
> -{
> -	int ret, min, max;
> -
> -	min = max_t(int, start, PASID_MIN);
> -	max = min_t(int, end, intel_pasid_max_id);
> -
> -	WARN_ON(in_interrupt());
> -	idr_preload(gfp);
> -	spin_lock(&pasid_lock);
> -	ret = idr_alloc(&pasid_idr, ptr, min, max, GFP_ATOMIC);
> -	spin_unlock(&pasid_lock);
> -	idr_preload_end();
> -
> -	return ret;
> -}
> -
> -void intel_pasid_free_id(int pasid)
> -{
> -	spin_lock(&pasid_lock);
> -	idr_remove(&pasid_idr, pasid);
> -	spin_unlock(&pasid_lock);
> -}
> -
> -void *intel_pasid_lookup_id(int pasid)
> -{
> -	void *p;
> -
> -	spin_lock(&pasid_lock);
> -	p = idr_find(&pasid_idr, pasid);
> -	spin_unlock(&pasid_lock);
> -
> -	return p;
> -}
>   
>   int vcmd_alloc_pasid(struct intel_iommu *iommu, unsigned int *pasid)
>   {
> diff --git a/drivers/iommu/intel-svm.c b/drivers/iommu/intel-svm.c
> index 8f87304..9cbcc1f 100644
> --- a/drivers/iommu/intel-svm.c
> +++ b/drivers/iommu/intel-svm.c
> @@ -25,6 +25,7 @@
>   #include <linux/dmar.h>
>   #include <linux/interrupt.h>
>   #include <linux/mm_types.h>
> +#include <linux/ioasid.h>
>   #include <asm/page.h>
>   
>   #include "intel-pasid.h"
> @@ -332,16 +333,15 @@ int intel_svm_bind_mm(struct device *dev, int *pasid, int flags, struct svm_dev_
>   		if (pasid_max > intel_pasid_max_id)
>   			pasid_max = intel_pasid_max_id;
>   
> -		/* Do not use PASID 0 in caching mode (virtualised IOMMU) */
> -		ret = intel_pasid_alloc_id(svm,
> -					   !!cap_caching_mode(iommu->cap),
> -					   pasid_max - 1, GFP_KERNEL);
> -		if (ret < 0) {
> +		/* Do not use PASID 0, reserved for RID to PASID */
> +		svm->pasid = ioasid_alloc(NULL, PASID_MIN,
> +					pasid_max - 1, svm);
> +		if (svm->pasid == INVALID_IOASID) {
>   			kfree(svm);
>   			kfree(sdev);
> +			ret = ENOSPC;
>   			goto out;
>   		}
> -		svm->pasid = ret;
>   		svm->notifier.ops = &intel_mmuops;
>   		svm->mm = mm;
>   		svm->flags = flags;
> @@ -351,7 +351,7 @@ int intel_svm_bind_mm(struct device *dev, int *pasid, int flags, struct svm_dev_
>   		if (mm) {
>   			ret = mmu_notifier_register(&svm->notifier, mm);
>   			if (ret) {
> -				intel_pasid_free_id(svm->pasid);
> +				ioasid_free(svm->pasid);
>   				kfree(svm);
>   				kfree(sdev);
>   				goto out;
> @@ -367,7 +367,7 @@ int intel_svm_bind_mm(struct device *dev, int *pasid, int flags, struct svm_dev_
>   		if (ret) {
>   			if (mm)
>   				mmu_notifier_unregister(&svm->notifier, mm);
> -			intel_pasid_free_id(svm->pasid);
> +			ioasid_free(svm->pasid);
>   			kfree(svm);
>   			kfree(sdev);
>   			goto out;
> @@ -400,7 +400,12 @@ int intel_svm_unbind_mm(struct device *dev, int pasid)
>   	if (!iommu)
>   		goto out;
>   
> -	svm = intel_pasid_lookup_id(pasid);
> +	svm = ioasid_find(NULL, pasid, NULL);
> +	if (IS_ERR(svm)) {
> +		ret = PTR_ERR(svm);
> +		goto out;
> +	}
> +
>   	if (!svm)
>   		goto out;
>   

How about using IS_ERR_OR_NULL() here?
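
i.e. roughly:

	svm = ioasid_find(NULL, pasid, NULL);
	if (IS_ERR_OR_NULL(svm)) {
		ret = svm ? PTR_ERR(svm) : -EINVAL;
		goto out;
	}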

> @@ -422,7 +427,7 @@ int intel_svm_unbind_mm(struct device *dev, int pasid)
>   				kfree_rcu(sdev, rcu);
>   
>   				if (list_empty(&svm->devs)) {
> -					intel_pasid_free_id(svm->pasid);
> +					ioasid_free(svm->pasid);
>   					if (svm->mm)
>   						mmu_notifier_unregister(&svm->notifier, svm->mm);
>   
> @@ -457,10 +462,11 @@ int intel_svm_is_pasid_valid(struct device *dev, int pasid)
>   	if (!iommu)
>   		goto out;
>   
> -	svm = intel_pasid_lookup_id(pasid);
> -	if (!svm)
> +	svm = ioasid_find(NULL, pasid, NULL);
> +	if (IS_ERR(svm)) {

Same here.

> +		ret = PTR_ERR(svm);
>   		goto out;
> -
> +	}
>   	/* init_mm is used in this case */
>   	if (!svm->mm)
>   		ret = 1;
> @@ -567,13 +573,12 @@ static irqreturn_t prq_event_thread(int irq, void *d)
>   
>   		if (!svm || svm->pasid != req->pasid) {
>   			rcu_read_lock();
> -			svm = intel_pasid_lookup_id(req->pasid);
> +			svm = ioasid_find(NULL, req->pasid, NULL);
>   			/* It *can't* go away, because the driver is not permitted
>   			 * to unbind the mm while any page faults are outstanding.
>   			 * So we only need RCU to protect the internal idr code. */
>   			rcu_read_unlock();
> -
> -			if (!svm) {
> +			if (IS_ERR(svm) || !svm) {

Ditto.

>   				pr_err("%s: Page request for invalid PASID %d: %08llx %08llx\n",
>   				       iommu->name, req->pasid, ((unsigned long long *)req)[0],
>   				       ((unsigned long long *)req)[1]);
> 

Best regards,
Baolu

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH v4 20/22] iommu/vt-d: Add bind guest PASID support
  2019-06-09 13:44 ` [PATCH v4 20/22] iommu/vt-d: Add bind guest PASID support Jacob Pan
  2019-06-18 16:44   ` Jonathan Cameron
@ 2019-06-27  2:50   ` Lu Baolu
  2019-06-27 20:22     ` Jacob Pan
  2019-07-16 16:45   ` Auger Eric
  2 siblings, 1 reply; 64+ messages in thread
From: Lu Baolu @ 2019-06-27  2:50 UTC (permalink / raw)
  To: Jacob Pan, iommu, LKML, Joerg Roedel, David Woodhouse,
	Eric Auger, Alex Williamson, Jean-Philippe Brucker
  Cc: baolu.lu, Yi Liu, Tian, Kevin, Raj Ashok, Christoph Hellwig,
	Andriy Shevchenko

Hi Jacob and Yi,

On 6/9/19 9:44 PM, Jacob Pan wrote:
> When supporting guest SVA with emulated IOMMU, the guest PASID
> table is shadowed in VMM. Updates to guest vIOMMU PASID table
> will result in PASID cache flush which will be passed down to
> the host as bind guest PASID calls.
> 
> For the SL page tables, it will be harvested from device's
> default domain (request w/o PASID), or aux domain in case of
> mediated device.
> 
>      .-------------.  .---------------------------.
>      |   vIOMMU    |  | Guest process CR3, FL only|
>      |             |  '---------------------------'
>      .----------------/
>      | PASID Entry |--- PASID cache flush -
>      '-------------'                       |
>      |             |                       V
>      |             |                CR3 in GPA
>      '-------------'
> Guest
> ------| Shadow |--------------------------|--------
>        v        v                          v
> Host
>      .-------------.  .----------------------.
>      |   pIOMMU    |  | Bind FL for GVA-GPA  |
>      |             |  '----------------------'
>      .----------------/  |
>      | PASID Entry |     V (Nested xlate)
>      '----------------\.------------------------------.
>      |             |   |SL for GPA-HPA, default domain|
>      |             |   '------------------------------'
>      '-------------'
> Where:
>   - FL = First level/stage one page tables
>   - SL = Second level/stage two page tables
> 
> Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
> Signed-off-by: Liu, Yi L <yi.l.liu@linux.intel.com>
> ---
>   drivers/iommu/intel-iommu.c |   4 +
>   drivers/iommu/intel-svm.c   | 187 ++++++++++++++++++++++++++++++++++++++++++++
>   include/linux/intel-iommu.h |  13 ++-
>   include/linux/intel-svm.h   |  17 ++++
>   4 files changed, 219 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
> index 7cfa0eb..3b4d712 100644
> --- a/drivers/iommu/intel-iommu.c
> +++ b/drivers/iommu/intel-iommu.c
> @@ -5782,6 +5782,10 @@ const struct iommu_ops intel_iommu_ops = {
>   	.dev_enable_feat	= intel_iommu_dev_enable_feat,
>   	.dev_disable_feat	= intel_iommu_dev_disable_feat,
>   	.pgsize_bitmap		= INTEL_IOMMU_PGSIZES,
> +#ifdef CONFIG_INTEL_IOMMU_SVM
> +	.sva_bind_gpasid	= intel_svm_bind_gpasid,
> +	.sva_unbind_gpasid	= intel_svm_unbind_gpasid,
> +#endif
>   };
>   
>   static void quirk_iommu_g4x_gfx(struct pci_dev *dev)
> diff --git a/drivers/iommu/intel-svm.c b/drivers/iommu/intel-svm.c
> index 66d98e1..f06a82f 100644
> --- a/drivers/iommu/intel-svm.c
> +++ b/drivers/iommu/intel-svm.c
> @@ -229,6 +229,193 @@ static LIST_HEAD(global_svm_list);
>   	list_for_each_entry(sdev, &svm->devs, list)	\
>   	if (dev == sdev->dev)				\
>   
> +int intel_svm_bind_gpasid(struct iommu_domain *domain,
> +			struct device *dev,
> +			struct gpasid_bind_data *data)
> +{
> +	struct intel_iommu *iommu = intel_svm_device_to_iommu(dev);
> +	struct intel_svm_dev *sdev;
> +	struct intel_svm *svm = NULL;
> +	struct dmar_domain *ddomain;
> +	int ret = 0;
> +
> +	if (WARN_ON(!iommu) || !data)
> +		return -EINVAL;
> +
> +	if (data->version != IOMMU_GPASID_BIND_VERSION_1 ||
> +		data->format != IOMMU_PASID_FORMAT_INTEL_VTD)
> +		return -EINVAL;
> +
> +	if (dev_is_pci(dev)) {
> +		/* VT-d supports devices with full 20 bit PASIDs only */
> +		if (pci_max_pasids(to_pci_dev(dev)) != PASID_MAX)
> +			return -EINVAL;
> +	}
> +
> +	/*
> +	 * We only check host PASID range, we have no knowledge to check
> +	 * guest PASID range nor do we use the guest PASID.
> +	 */
> +	if (data->hpasid <= 0 || data->hpasid >= PASID_MAX)
> +		return -EINVAL;
> +
> +	ddomain = to_dmar_domain(domain);
> +	/* REVISIT:
> +	 * Sanity check adddress width and paging mode support
> +	 * width matching in two dimensions:
> +	 * 1. paging mode CPU <= IOMMU
> +	 * 2. address width Guest <= Host.
> +	 */
> +	mutex_lock(&pasid_mutex);
> +	svm = ioasid_find(NULL, data->hpasid, NULL);
> +	if (IS_ERR(svm)) {
> +		ret = PTR_ERR(svm);
> +		goto out;
> +	}
> +	if (svm) {
> +		/*
> +		 * If we found svm for the PASID, there must be at
> +		 * least one device bond, otherwise svm should be freed.
> +		 */
> +		BUG_ON(list_empty(&svm->devs));
> +
> +		for_each_svm_dev() {
> +			/* In case of multiple sub-devices of the same pdev assigned, we should
> +			 * allow multiple bind calls with the same PASID and pdev.
> +			 */
> +			sdev->users++;
> +			goto out;
> +		}
> +	} else {
> +		/* We come here when PASID has never been bond to a device. */
> +		svm = kzalloc(sizeof(*svm), GFP_KERNEL);
> +		if (!svm) {
> +			ret = -ENOMEM;
> +			goto out;
> +		}
> +		/* REVISIT: upper layer/VFIO can track host process that bind the PASID.
> +		 * ioasid_set = mm might be sufficient for vfio to check pasid VMM
> +		 * ownership.
> +		 */
> +		svm->mm = get_task_mm(current);
> +		svm->pasid = data->hpasid;
> +		if (data->flags & IOMMU_SVA_GPASID_VAL) {
> +			svm->gpasid = data->gpasid;
> +			svm->flags &= SVM_FLAG_GUEST_PASID;

I am guessing that you want to set this flag bit, so it should be

			svm->flags |= SVM_FLAG_GUEST_PASID;

> +		}
> +		refcount_set(&svm->refs, 0);
> +		ioasid_set_data(data->hpasid, svm);
> +		INIT_LIST_HEAD_RCU(&svm->devs);
> +		INIT_LIST_HEAD(&svm->list);
> +
> +		mmput(svm->mm);
> +	}
> +	sdev = kzalloc(sizeof(*sdev), GFP_KERNEL);
> +	if (!sdev) {
> +		ret = -ENOMEM;
> +		goto out;

I think you need to clean up svm if its device list is empty here, as
you said above:

  +	if (svm) {
  +		/*
  +		 * If we found svm for the PASID, there must be at
  +		 * least one device bond, otherwise svm should be freed.
  +		 */
  +		BUG_ON(list_empty(&svm->devs));
  +
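
One possible shape of that cleanup (just a sketch):

	sdev = kzalloc(sizeof(*sdev), GFP_KERNEL);
	if (!sdev) {
		/* undo the svm created above if nothing was ever bound */
		if (list_empty(&svm->devs)) {
			ioasid_set_data(data->hpasid, NULL);
			kfree(svm);
		}
		ret = -ENOMEM;
		goto out;
	}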

> +	}
> +	sdev->dev = dev;
> +	sdev->users = 1;
> +
> +	/* Set up device context entry for PASID if not enabled already */
> +	ret = intel_iommu_enable_pasid(iommu, sdev->dev);
> +	if (ret) {
> +		dev_err(dev, "Failed to enable PASID capability\n");
> +		kfree(sdev);
> +		goto out;
> +	}
> +
> +	/*
> +	 * For guest bind, we need to set up PASID table entry as follows:
> +	 * - FLPM matches guest paging mode
> +	 * - turn on nested mode
> +	 * - SL guest address width matching
> +	 */
> +	ret = intel_pasid_setup_nested(iommu,
> +				dev,
> +				(pgd_t *)data->gpgd,
> +				data->hpasid,
> +				data->flags,
> +				ddomain,
> +				data->addr_width);
> +	if (ret) {
> +		dev_err(dev, "Failed to set up PASID %llu in nested mode, Err %d\n",
> +			data->hpasid, ret);
> +		kfree(sdev);
> +		goto out;
> +	}
> +	svm->flags |= SVM_FLAG_GUEST_MODE;
> +
> +	init_rcu_head(&sdev->rcu);
> +	refcount_inc(&svm->refs);
> +	list_add_rcu(&sdev->list, &svm->devs);
> + out:
> +	mutex_unlock(&pasid_mutex);
> +	return ret;
> +}
> +
> +int intel_svm_unbind_gpasid(struct device *dev, int pasid)
> +{
> +	struct intel_svm_dev *sdev;
> +	struct intel_iommu *iommu;
> +	struct intel_svm *svm;
> +	int ret = -EINVAL;
> +
> +	mutex_lock(&pasid_mutex);
> +	iommu = intel_svm_device_to_iommu(dev);
> +	if (!iommu)
> +		goto out;
> +
> +	svm = ioasid_find(NULL, pasid, NULL);
> +	if (IS_ERR(svm)) {
> +		ret = PTR_ERR(svm);
> +		goto out;
> +	}
> +
> +	if (!svm)
> +		goto out;
> +
> +	for_each_svm_dev() {
> +		ret = 0;
> +		sdev->users--;
> +		if (!sdev->users) {
> +			list_del_rcu(&sdev->list);
> +			intel_pasid_tear_down_entry(iommu, dev, svm->pasid);
> +			/* TODO: Drain in flight PRQ for the PASID since it
> +			 * may get reused soon, we don't want to
> +			 * confuse with its previous life.
> +			 * intel_svm_drain_prq(dev, pasid);
> +			 */
> +			kfree_rcu(sdev, rcu);
> +
> +			if (list_empty(&svm->devs)) {
> +				list_del(&svm->list);
> +				kfree(svm);
> +				/*
> +				 * We do not free PASID here until explicit call
> +				 * from VFIO to free. The PASID life cycle
> +				 * management is largely tied to VFIO management
> +				 * of assigned device life cycles. In case of
> +				 * guest exit without an explicit free PASID call,
> +				 * the responsibility lies in VFIO layer to free
> +				 * the PASIDs allocated for the guest.
> +				 * For security reasons, VFIO has to track the
> +				 * PASID ownership per guest anyway to ensure
> +				 * that PASID allocated by one guest cannot be
> +				 * used by another.
> +				 */
> +				ioasid_set_data(pasid, NULL);
> +			}
> +		}
> +		break;
> +	}
> + out:
> +	mutex_unlock(&pasid_mutex);
> +
> +	return ret;
> +}
> +
>   int intel_svm_bind_mm(struct device *dev, int *pasid, int flags, struct svm_dev_ops *ops)
>   {
>   	struct intel_iommu *iommu = intel_svm_device_to_iommu(dev);
> diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
> index b75f17d..94d3a9a 100644
> --- a/include/linux/intel-iommu.h
> +++ b/include/linux/intel-iommu.h
> @@ -677,7 +677,9 @@ int intel_iommu_enable_pasid(struct intel_iommu *iommu, struct device *dev);
>   int intel_svm_init(struct intel_iommu *iommu);
>   extern int intel_svm_enable_prq(struct intel_iommu *iommu);
>   extern int intel_svm_finish_prq(struct intel_iommu *iommu);
> -
> +extern int intel_svm_bind_gpasid(struct iommu_domain *domain,
> +		struct device *dev, struct gpasid_bind_data *data);
> +extern int intel_svm_unbind_gpasid(struct device *dev, int pasid);
>   struct svm_dev_ops;
>   
>   struct intel_svm_dev {
> @@ -693,12 +695,19 @@ struct intel_svm_dev {
>   
>   struct intel_svm {
>   	struct mmu_notifier notifier;
> -	struct mm_struct *mm;
> +	union {
> +		struct mm_struct *mm;
> +		u64 gcr3;

I didn't see gcr3 being used anywhere? Anything I missed?

> +	};
>   	struct intel_iommu *iommu;
>   	int flags;
>   	int pasid;
> +	int gpasid; /* Guest PASID in case of vSVA bind with non-identity host
> +		     * to guest PASID mapping.
> +		     */
>   	struct list_head devs;
>   	struct list_head list;
> +	refcount_t refs; /* Number of devices sharing this PASID */
>   };
>   
>   extern struct intel_iommu *intel_svm_device_to_iommu(struct device *dev);
> diff --git a/include/linux/intel-svm.h b/include/linux/intel-svm.h
> index e3f7631..577d5df 100644
> --- a/include/linux/intel-svm.h
> +++ b/include/linux/intel-svm.h
> @@ -52,6 +52,23 @@ struct svm_dev_ops {
>    * do such IOTLB flushes automatically.
>    */
>   #define SVM_FLAG_SUPERVISOR_MODE	(1<<1)
> +/*
> + * The SVM_FLAG_GUEST_MODE flag is used when a guest process binds to a device.
> + * In this case the mm_struct is in the guest kernel or userspace, its life
> + * cycle is managed by VMM and VFIO layer. For IOMMU driver, this API provides
> + * means to bind/unbind guest CR3 with PASIDs allocated for a device.
> + */
> +#define SVM_FLAG_GUEST_MODE	(1<<2)
> +/*
> + * The SVM_FLAG_GUEST_PASID flag is used when a guest has its own PASID space,
> + * which requires guest and host PASID translation at both directions. We keep
> + * track of guest PASID in order to provide lookup service to device drivers.
> + * One such example is a physical function (PF) driver that supports mediated
> + * device (mdev) assignment. Guest programming of mdev configuration space can
> + * only be done with guest PASID, therefore PF driver needs to find the matching
> + * host PASID to program the real hardware.
> + */
> +#define SVM_FLAG_GUEST_PASID	(1<<3)
>   
>   #ifdef CONFIG_INTEL_IOMMU_SVM
>   
> 

Best regards,
Baolu

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH v4 15/22] iommu/vt-d: Replace Intel specific PASID allocator with IOASID
  2019-06-27  1:53   ` Lu Baolu
@ 2019-06-27 15:40     ` Jacob Pan
  0 siblings, 0 replies; 64+ messages in thread
From: Jacob Pan @ 2019-06-27 15:40 UTC (permalink / raw)
  To: Lu Baolu
  Cc: iommu, LKML, Joerg Roedel, David Woodhouse, Eric Auger,
	Alex Williamson, Jean-Philippe Brucker, Yi Liu, Tian, Kevin,
	Raj Ashok, Christoph Hellwig, Andriy Shevchenko, jacob.jun.pan

On Thu, 27 Jun 2019 09:53:11 +0800
Lu Baolu <baolu.lu@linux.intel.com> wrote:

> Hi Jacob,
> 
> On 6/9/19 9:44 PM, Jacob Pan wrote:
> > Make use of generic IOASID code to manage PASID allocation,
> > free, and lookup. Replace Intel specific code.
> > 
> > Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
> > ---
> >   drivers/iommu/intel-iommu.c | 11 +++++------
> >   drivers/iommu/intel-pasid.c | 36 ------------------------------------
> >   drivers/iommu/intel-svm.c   | 37 +++++++++++++++++++++----------------
> >   3 files changed, 26 insertions(+), 58 deletions(-)
> > 
> > diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
> > index 5b84994..39b63fe 100644
> > --- a/drivers/iommu/intel-iommu.c
> > +++ b/drivers/iommu/intel-iommu.c
> > @@ -5167,7 +5167,7 @@ static void auxiliary_unlink_device(struct dmar_domain *domain,
> >   	domain->auxd_refcnt--;
> >   
> >   	if (!domain->auxd_refcnt && domain->default_pasid > 0)
> > -		intel_pasid_free_id(domain->default_pasid);
> > +		ioasid_free(domain->default_pasid);
> >   }
> >   
> >   static int aux_domain_add_dev(struct dmar_domain *domain,
> > @@ -5185,10 +5185,9 @@ static int aux_domain_add_dev(struct dmar_domain *domain,
> >   	if (domain->default_pasid <= 0) {
> >   		int pasid;
> >   
> > -		pasid = intel_pasid_alloc_id(domain, PASID_MIN,
> > -					     pci_max_pasids(to_pci_dev(dev)),
> > -					     GFP_KERNEL);
> > -		if (pasid <= 0) {
> > +		pasid = ioasid_alloc(NULL, PASID_MIN,
> > +				pci_max_pasids(to_pci_dev(dev)) - 1,
> > +				domain);
> > +		if (pasid == INVALID_IOASID) {
> >   			pr_err("Can't allocate default pasid\n");
> >   			return -ENODEV;
> >   		}
> > @@ -5224,7 +5223,7 @@ static int aux_domain_add_dev(struct dmar_domain *domain,
> >   	spin_unlock(&iommu->lock);
> >   	spin_unlock_irqrestore(&device_domain_lock, flags);
> >   	if (!domain->auxd_refcnt && domain->default_pasid > 0)
> > -		intel_pasid_free_id(domain->default_pasid);
> > +		ioasid_free(domain->default_pasid);
> >   
> >   	return ret;
> >   }
> > diff --git a/drivers/iommu/intel-pasid.c b/drivers/iommu/intel-pasid.c
> > index 69fddd3..1e25539 100644
> > --- a/drivers/iommu/intel-pasid.c
> > +++ b/drivers/iommu/intel-pasid.c
> > @@ -26,42 +26,6 @@
> >    */
> >   static DEFINE_SPINLOCK(pasid_lock);
> >   u32 intel_pasid_max_id = PASID_MAX;
> > -static DEFINE_IDR(pasid_idr);
> > -
> > -int intel_pasid_alloc_id(void *ptr, int start, int end, gfp_t gfp)
> > -{
> > -	int ret, min, max;
> > -
> > -	min = max_t(int, start, PASID_MIN);
> > -	max = min_t(int, end, intel_pasid_max_id);
> > -
> > -	WARN_ON(in_interrupt());
> > -	idr_preload(gfp);
> > -	spin_lock(&pasid_lock);
> > -	ret = idr_alloc(&pasid_idr, ptr, min, max, GFP_ATOMIC);
> > -	spin_unlock(&pasid_lock);
> > -	idr_preload_end();
> > -
> > -	return ret;
> > -}
> > -
> > -void intel_pasid_free_id(int pasid)
> > -{
> > -	spin_lock(&pasid_lock);
> > -	idr_remove(&pasid_idr, pasid);
> > -	spin_unlock(&pasid_lock);
> > -}
> > -
> > -void *intel_pasid_lookup_id(int pasid)
> > -{
> > -	void *p;
> > -
> > -	spin_lock(&pasid_lock);
> > -	p = idr_find(&pasid_idr, pasid);
> > -	spin_unlock(&pasid_lock);
> > -
> > -	return p;
> > -}
> >   
> >   int vcmd_alloc_pasid(struct intel_iommu *iommu, unsigned int *pasid)
> >   {
> > diff --git a/drivers/iommu/intel-svm.c b/drivers/iommu/intel-svm.c
> > index 8f87304..9cbcc1f 100644
> > --- a/drivers/iommu/intel-svm.c
> > +++ b/drivers/iommu/intel-svm.c
> > @@ -25,6 +25,7 @@
> >   #include <linux/dmar.h>
> >   #include <linux/interrupt.h>
> >   #include <linux/mm_types.h>
> > +#include <linux/ioasid.h>
> >   #include <asm/page.h>
> >   
> >   #include "intel-pasid.h"
> > @@ -332,16 +333,15 @@ int intel_svm_bind_mm(struct device *dev, int *pasid, int flags, struct svm_dev_
> >   		if (pasid_max > intel_pasid_max_id)
> >   			pasid_max = intel_pasid_max_id;
> >   
> > -		/* Do not use PASID 0 in caching mode (virtualised IOMMU) */
> > -		ret = intel_pasid_alloc_id(svm,
> > -					   !!cap_caching_mode(iommu->cap),
> > -					   pasid_max - 1, GFP_KERNEL);
> > -		if (ret < 0) {
> > +		/* Do not use PASID 0, reserved for RID to PASID */
> > +		svm->pasid = ioasid_alloc(NULL, PASID_MIN,
> > +					pasid_max - 1, svm);
> > +		if (svm->pasid == INVALID_IOASID) {
> >   			kfree(svm);
> >   			kfree(sdev);
> > +			ret = -ENOSPC;
> >   			goto out;
> >   		}
> > -		svm->pasid = ret;
> >   		svm->notifier.ops = &intel_mmuops;
> >   		svm->mm = mm;
> >   		svm->flags = flags;
> > @@ -351,7 +351,7 @@ int intel_svm_bind_mm(struct device *dev, int *pasid, int flags, struct svm_dev_
> >   		if (mm) {
> >   			ret = mmu_notifier_register(&svm->notifier, mm);
> >   			if (ret) {
> > -				intel_pasid_free_id(svm->pasid);
> > +				ioasid_free(svm->pasid);
> >   				kfree(svm);
> >   				kfree(sdev);
> >   				goto out;
> > @@ -367,7 +367,7 @@ int intel_svm_bind_mm(struct device *dev, int *pasid, int flags, struct svm_dev_
> >   		if (ret) {
> >   			if (mm)
> >   				mmu_notifier_unregister(&svm->notifier, mm);
> > -			intel_pasid_free_id(svm->pasid);
> > +			ioasid_free(svm->pasid);
> >   			kfree(svm);
> >   			kfree(sdev);
> >   			goto out;
> > @@ -400,7 +400,12 @@ int intel_svm_unbind_mm(struct device *dev, int pasid)
> >   	if (!iommu)
> >   		goto out;
> >   
> > -	svm = intel_pasid_lookup_id(pasid);
> > +	svm = ioasid_find(NULL, pasid, NULL);
> > +	if (IS_ERR(svm)) {
> > +		ret = PTR_ERR(svm);
> > +		goto out;
> > +	}
> > +
> >   	if (!svm)
> >   		goto out;
> >     
> 
> How about using IS_ERR_OR_NULL() here?
> 
makes sense, same below. thanks!
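
Something like this, I suppose (untested sketch):

	svm = ioasid_find(NULL, pasid, NULL);
	if (IS_ERR_OR_NULL(svm)) {
		ret = IS_ERR(svm) ? PTR_ERR(svm) : -EINVAL;
		goto out;
	}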
>  [...]  
> 
> Same here.
> 
>  [...]  
> 
> Ditto.
> 
>  [...]  
> 
> Best regards,
> Baolu

[Jacob Pan]

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH v4 20/22] iommu/vt-d: Add bind guest PASID support
  2019-06-27  2:50   ` Lu Baolu
@ 2019-06-27 20:22     ` Jacob Pan
  2019-07-05  2:21       ` Lu Baolu
  0 siblings, 1 reply; 64+ messages in thread
From: Jacob Pan @ 2019-06-27 20:22 UTC (permalink / raw)
  To: Lu Baolu
  Cc: iommu, LKML, Joerg Roedel, David Woodhouse, Eric Auger,
	Alex Williamson, Jean-Philippe Brucker, Yi Liu, Tian, Kevin,
	Raj Ashok, Christoph Hellwig, Andriy Shevchenko, jacob.jun.pan

On Thu, 27 Jun 2019 10:50:11 +0800
Lu Baolu <baolu.lu@linux.intel.com> wrote:

> Hi Jacob and Yi,
> 
> On 6/9/19 9:44 PM, Jacob Pan wrote:
> > When supporting guest SVA with emulated IOMMU, the guest PASID
> > table is shadowed in VMM. Updates to guest vIOMMU PASID table
> > will result in PASID cache flush which will be passed down to
> > the host as bind guest PASID calls.
> > 
> > For the SL page tables, it will be harvested from device's
> > default domain (request w/o PASID), or aux domain in case of
> > mediated device.
> > 
> >      .-------------.  .---------------------------.
> >      |   vIOMMU    |  | Guest process CR3, FL only|
> >      |             |  '---------------------------'
> >      .----------------/
> >      | PASID Entry |--- PASID cache flush -
> >      '-------------'                       |
> >      |             |                       V
> >      |             |                CR3 in GPA
> >      '-------------'
> > Guest
> > ------| Shadow |--------------------------|--------
> >        v        v                          v
> > Host
> >      .-------------.  .----------------------.
> >      |   pIOMMU    |  | Bind FL for GVA-GPA  |
> >      |             |  '----------------------'
> >      .----------------/  |
> >      | PASID Entry |     V (Nested xlate)
> >      '----------------\.------------------------------.
> >      |             |   |SL for GPA-HPA, default domain|
> >      |             |   '------------------------------'
> >      '-------------'
> > Where:
> >   - FL = First level/stage one page tables
> >   - SL = Second level/stage two page tables
> > 
> > Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
> > Signed-off-by: Liu, Yi L <yi.l.liu@linux.intel.com>
> > ---
> >   drivers/iommu/intel-iommu.c |   4 +
> >   drivers/iommu/intel-svm.c   | 187 ++++++++++++++++++++++++++++++++++++++++++++
> >   include/linux/intel-iommu.h |  13 ++-
> >   include/linux/intel-svm.h   |  17 ++++
> >   4 files changed, 219 insertions(+), 2 deletions(-)
> > 
> > diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
> > index 7cfa0eb..3b4d712 100644
> > --- a/drivers/iommu/intel-iommu.c
> > +++ b/drivers/iommu/intel-iommu.c
> > @@ -5782,6 +5782,10 @@ const struct iommu_ops intel_iommu_ops = {
> >   	.dev_enable_feat	= intel_iommu_dev_enable_feat,
> >   	.dev_disable_feat	= intel_iommu_dev_disable_feat,
> >   	.pgsize_bitmap		= INTEL_IOMMU_PGSIZES,
> > +#ifdef CONFIG_INTEL_IOMMU_SVM
> > +	.sva_bind_gpasid	= intel_svm_bind_gpasid,
> > +	.sva_unbind_gpasid	= intel_svm_unbind_gpasid,
> > +#endif
> >   };
> >   
> >   static void quirk_iommu_g4x_gfx(struct pci_dev *dev)
> > diff --git a/drivers/iommu/intel-svm.c b/drivers/iommu/intel-svm.c
> > index 66d98e1..f06a82f 100644
> > --- a/drivers/iommu/intel-svm.c
> > +++ b/drivers/iommu/intel-svm.c
> > @@ -229,6 +229,193 @@ static LIST_HEAD(global_svm_list);
> >   	list_for_each_entry(sdev, &svm->devs, list)	\
> >   	if (dev == sdev->dev)				\
> >   
> > +int intel_svm_bind_gpasid(struct iommu_domain *domain,
> > +			struct device *dev,
> > +			struct gpasid_bind_data *data)
> > +{
> > +	struct intel_iommu *iommu = intel_svm_device_to_iommu(dev);
> > +	struct intel_svm_dev *sdev;
> > +	struct intel_svm *svm = NULL;
> > +	struct dmar_domain *ddomain;
> > +	int ret = 0;
> > +
> > +	if (WARN_ON(!iommu) || !data)
> > +		return -EINVAL;
> > +
> > +	if (data->version != IOMMU_GPASID_BIND_VERSION_1 ||
> > +		data->format != IOMMU_PASID_FORMAT_INTEL_VTD)
> > +		return -EINVAL;
> > +
> > +	if (dev_is_pci(dev)) {
> > +		/* VT-d supports devices with full 20 bit PASIDs only */
> > +		if (pci_max_pasids(to_pci_dev(dev)) != PASID_MAX)
> > +			return -EINVAL;
> > +	}
> > +
> > +	/*
> > +	 * We only check host PASID range, we have no knowledge to check
> > +	 * guest PASID range nor do we use the guest PASID.
> > +	 */
> > +	if (data->hpasid <= 0 || data->hpasid >= PASID_MAX)
> > +		return -EINVAL;
> > +
> > +	ddomain = to_dmar_domain(domain);
> > +	/* REVISIT:
> > +	 * Sanity check address width and paging mode support
> > +	 * width matching in two dimensions:
> > +	 * 1. paging mode CPU <= IOMMU
> > +	 * 2. address width Guest <= Host.
> > +	 */
> > +	mutex_lock(&pasid_mutex);
> > +	svm = ioasid_find(NULL, data->hpasid, NULL);
> > +	if (IS_ERR(svm)) {
> > +		ret = PTR_ERR(svm);
> > +		goto out;
> > +	}
> > +	if (svm) {
> > +		/*
> > +		 * If we found svm for the PASID, there must be at
> > +		 * least one device bound, otherwise svm should be freed.
> > +		 */
> > +		BUG_ON(list_empty(&svm->devs));
> > +
> > +		for_each_svm_dev() {
> > +			/* In case of multiple sub-devices of the same pdev assigned, we should
> > +			 * allow multiple bind calls with the same PASID and pdev.
> > +			 */
> > +			sdev->users++;
> > +			goto out;
> > +		}
> > +	} else {
> > +		/* We come here when PASID has never been bound to a device. */
> > +		svm = kzalloc(sizeof(*svm), GFP_KERNEL);
> > +		if (!svm) {
> > +			ret = -ENOMEM;
> > +			goto out;
> > +		}
> > +		/* REVISIT: upper layer/VFIO can track the host process that binds the PASID.
> > +		 * ioasid_set = mm might be sufficient for vfio to check pasid VMM
> > +		 * ownership.
> > +		 */
> > +		svm->mm = get_task_mm(current);
> > +		svm->pasid = data->hpasid;
> > +		if (data->flags & IOMMU_SVA_GPASID_VAL) {
> > +			svm->gpasid = data->gpasid;
> > +			svm->flags &= SVM_FLAG_GUEST_PASID;  
> 
> I am guessing that you want to set this flag bit, so it should be
> 
> 			svm->flags |= SVM_FLAG_GUEST_PASID;
> 
you are right, this will be used when we have non-identity G-H pasid
mapping. thanks.
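
For reference, the PF driver side lookup would then be something along
these lines (illustration only, not part of this series; a real
implementation would wrap the reverse lookup in a proper helper rather
than touch global_svm_list directly):

	/* Translate the PASID the guest programmed into the matching
	 * host PASID before programming the real hardware.
	 */
	list_for_each_entry(svm, &global_svm_list, list) {
		if ((svm->flags & SVM_FLAG_GUEST_PASID) &&
		    svm->gpasid == gpasid)
			return svm->pasid;
	}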
> > +		}
> > +		refcount_set(&svm->refs, 0);
> > +		ioasid_set_data(data->hpasid, svm);
> > +		INIT_LIST_HEAD_RCU(&svm->devs);
> > +		INIT_LIST_HEAD(&svm->list);
> > +
> > +		mmput(svm->mm);
> > +	}
> > +	sdev = kzalloc(sizeof(*sdev), GFP_KERNEL);
> > +	if (!sdev) {
> > +		ret = -ENOMEM;
> > +		goto out;  
> 
> I think you need to clean up svm if its device list is empty here, as
> you said above:
> 
No, we come here only if the device list is not empty and the new
device to bind is different from any existing device in the list. If we
cannot allocate memory for the new device, we should not free the
existing SVM, right?

>   +	if (svm) {
>   +		/*
>   +		 * If we found svm for the PASID, there must be at
>   +		 * least one device bound, otherwise svm should be freed.
>   +		 */
>   +		BUG_ON(list_empty(&svm->devs));
>   +
> 
> > +	}
> > +	sdev->dev = dev;
> > +	sdev->users = 1;
> > +
> > +	/* Set up device context entry for PASID if not enabled already */
> > +	ret = intel_iommu_enable_pasid(iommu, sdev->dev);
> > +	if (ret) {
> > +		dev_err(dev, "Failed to enable PASID
> > capability\n");
> > +		kfree(sdev);
> > +		goto out;
> > +	}
> > +
> > +	/*
> > +	 * For guest bind, we need to set up PASID table entry as follows:
> > +	 * - FLPM matches guest paging mode
> > +	 * - turn on nested mode
> > +	 * - SL guest address width matching
> > +	 */
> > +	ret = intel_pasid_setup_nested(iommu,
> > +				dev,
> > +				(pgd_t *)data->gpgd,
> > +				data->hpasid,
> > +				data->flags,
> > +				ddomain,
> > +				data->addr_width);
> > +	if (ret) {
> > +		dev_err(dev, "Failed to set up PASID %llu in
> > nested mode, Err %d\n",
> > +			data->hpasid, ret);
> > +		kfree(sdev);
> > +		goto out;
> > +	}
> > +	svm->flags |= SVM_FLAG_GUEST_MODE;
> > +
> > +	init_rcu_head(&sdev->rcu);
> > +	refcount_inc(&svm->refs);
> > +	list_add_rcu(&sdev->list, &svm->devs);
> > + out:
> > +	mutex_unlock(&pasid_mutex);
> > +	return ret;
> > +}
> > +
> > +int intel_svm_unbind_gpasid(struct device *dev, int pasid)
> > +{
> > +	struct intel_svm_dev *sdev;
> > +	struct intel_iommu *iommu;
> > +	struct intel_svm *svm;
> > +	int ret = -EINVAL;
> > +
> > +	mutex_lock(&pasid_mutex);
> > +	iommu = intel_svm_device_to_iommu(dev);
> > +	if (!iommu)
> > +		goto out;
> > +
> > +	svm = ioasid_find(NULL, pasid, NULL);
> > +	if (IS_ERR(svm)) {
> > +		ret = PTR_ERR(svm);
> > +		goto out;
> > +	}
> > +
> > +	if (!svm)
> > +		goto out;
> > +
> > +	for_each_svm_dev() {
> > +		ret = 0;
> > +		sdev->users--;
> > +		if (!sdev->users) {
> > +			list_del_rcu(&sdev->list);
> > +			intel_pasid_tear_down_entry(iommu, dev, svm->pasid);
> > +			/* TODO: Drain in flight PRQ for the PASID since it
> > +			 * may get reused soon, we don't want to
> > +			 * confuse with its previous life.
> > +			 * intel_svm_drain_prq(dev, pasid);
> > +			 */
> > +			kfree_rcu(sdev, rcu);
> > +
> > +			if (list_empty(&svm->devs)) {
> > +				list_del(&svm->list);
> > +				kfree(svm);
> > +				/*
> > +				 * We do not free PASID here until explicit call
> > +				 * from VFIO to free. The PASID life cycle
> > +				 * management is largely tied to VFIO management
> > +				 * of assigned device life cycles. In case of
> > +				 * guest exit without an explicit free PASID call,
> > +				 * the responsibility lies in VFIO layer to free
> > +				 * the PASIDs allocated for the guest.
> > +				 * For security reasons, VFIO has to track the
> > +				 * PASID ownership per guest anyway to ensure
> > +				 * that PASID allocated by one guest cannot be
> > +				 * used by another.
> > +				 */
> > +				ioasid_set_data(pasid, NULL);
> > +			}
> > +		}
> > +		break;
> > +	}
> > + out:
> > +	mutex_unlock(&pasid_mutex);
> > +
> > +	return ret;
> > +}
> > +
> >   int intel_svm_bind_mm(struct device *dev, int *pasid, int flags, struct svm_dev_ops *ops)
> >   {
> >   	struct intel_iommu *iommu = intel_svm_device_to_iommu(dev);
> > diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
> > index b75f17d..94d3a9a 100644
> > --- a/include/linux/intel-iommu.h
> > +++ b/include/linux/intel-iommu.h
> > @@ -677,7 +677,9 @@ int intel_iommu_enable_pasid(struct intel_iommu *iommu, struct device *dev);
> >   int intel_svm_init(struct intel_iommu *iommu);
> >   extern int intel_svm_enable_prq(struct intel_iommu *iommu);
> >   extern int intel_svm_finish_prq(struct intel_iommu *iommu);
> > -
> > +extern int intel_svm_bind_gpasid(struct iommu_domain *domain,
> > +		struct device *dev, struct gpasid_bind_data *data);
> > +extern int intel_svm_unbind_gpasid(struct device *dev, int pasid);
> >   struct svm_dev_ops;
> >   
> >   struct intel_svm_dev {
> > @@ -693,12 +695,19 @@ struct intel_svm_dev {
> >   
> >   struct intel_svm {
> >   	struct mmu_notifier notifier;
> > -	struct mm_struct *mm;
> > +	union {
> > +		struct mm_struct *mm;
> > +		u64 gcr3;  
> 
> I didn't see gcr3 being used anywhere? Anything I missed?
> 
You are right, not used for now. We directly used guest PGD from passed
in parameters to program first level pointer. I guess we don't need to
store it. I will remove it. Thanks.

> > +	};
> >   	struct intel_iommu *iommu;
> >   	int flags;
> >   	int pasid;
> > +	int gpasid; /* Guest PASID in case of vSVA bind with non-identity host
> > +		     * to guest PASID mapping.
> > +		     */
> >   	struct list_head devs;
> >   	struct list_head list;
> > +	refcount_t refs; /* Number of devices sharing this PASID */
> >   };
> >   
> >   extern struct intel_iommu *intel_svm_device_to_iommu(struct device *dev);
> > diff --git a/include/linux/intel-svm.h b/include/linux/intel-svm.h
> > index e3f7631..577d5df 100644
> > --- a/include/linux/intel-svm.h
> > +++ b/include/linux/intel-svm.h
> > @@ -52,6 +52,23 @@ struct svm_dev_ops {
> >    * do such IOTLB flushes automatically.
> >    */
> >   #define SVM_FLAG_SUPERVISOR_MODE	(1<<1)
> > +/*
> > + * The SVM_FLAG_GUEST_MODE flag is used when a guest process bind
> > to a device.
> > + * In this case the mm_struct is in the guest kernel or userspace,
> > its life
> > + * cycle is managed by VMM and VFIO layer. For IOMMU driver, this
> > API provides
> > + * means to bind/unbind guest CR3 with PASIDs allocated for a
> > device.
> > + */
> > +#define SVM_FLAG_GUEST_MODE	(1<<2)
> > +/*
> > + * The SVM_FLAG_GUEST_PASID flag is used when a guest has its own
> > PASID space,
> > + * which requires guest and host PASID translation at both
> > directions. We keep
> > + * track of guest PASID in order to provide lookup service to
> > device drivers.
> > + * One such example is a physical function (PF) driver that
> > supports mediated
> > + * device (mdev) assignment. Guest programming of mdev
> > configuration space can
> > + * only be done with guest PASID, therefore PF driver needs to
> > find the matching
> > + * host PASID to program the real hardware.
> > + */
> > +#define SVM_FLAG_GUEST_PASID	(1<<3)
> >   
> >   #ifdef CONFIG_INTEL_IOMMU_SVM
> >   
> >   
> 
> Best regards,
> Baolu

Thank you!
[Jacob Pan]

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH v4 20/22] iommu/vt-d: Add bind guest PASID support
  2019-06-27 20:22     ` Jacob Pan
@ 2019-07-05  2:21       ` Lu Baolu
  2019-08-14 17:20         ` Jacob Pan
  0 siblings, 1 reply; 64+ messages in thread
From: Lu Baolu @ 2019-07-05  2:21 UTC (permalink / raw)
  To: Jacob Pan
  Cc: baolu.lu, iommu, LKML, Joerg Roedel, David Woodhouse, Eric Auger,
	Alex Williamson, Jean-Philippe Brucker, Yi Liu, Tian, Kevin,
	Raj Ashok, Christoph Hellwig, Andriy Shevchenko

Hi Jacob,

On 6/28/19 4:22 AM, Jacob Pan wrote:
>>> +		}
>>> +		refcount_set(&svm->refs, 0);
>>> +		ioasid_set_data(data->hpasid, svm);
>>> +		INIT_LIST_HEAD_RCU(&svm->devs);
>>> +		INIT_LIST_HEAD(&svm->list);
>>> +
>>> +		mmput(svm->mm);
>>> +	}
>>> +	sdev = kzalloc(sizeof(*sdev), GFP_KERNEL);
>>> +	if (!sdev) {
>>> +		ret = -ENOMEM;
>>> +		goto out;
>> I think you need to clean up svm if its device list is empty here, as
>> you said above:
>>
> No, we come here only if the device list is not empty and the new
> device to bind is different from any existing device in the list. If we
> cannot allocate memory for the new device, we should not free the
> existing SVM, right?
> 

I'm sorry, but the code doesn't show this. We come here even when an
svm data structure was newly created with an empty device list. I post
the code below to ensure that we are reading the same piece of code.

         mutex_lock(&pasid_mutex);
         svm = ioasid_find(NULL, data->hpasid, NULL);
         if (IS_ERR(svm)) {
                 ret = PTR_ERR(svm);
                 goto out;
         }
         if (svm) {
                 /*
                  * If we found svm for the PASID, there must be at
                  * least one device bound, otherwise svm should be freed.
                  */
                 BUG_ON(list_empty(&svm->devs));

                 for_each_svm_dev() {
                         /* In case of multiple sub-devices of the same
                          * pdev assigned, we should allow multiple bind
                          * calls with the same PASID and pdev.
                          */
                         sdev->users++;
                         goto out;
                 }
         } else {
                 /* We come here when PASID has never been bound to a device. */
                 svm = kzalloc(sizeof(*svm), GFP_KERNEL);
                 if (!svm) {
                         ret = -ENOMEM;
                         goto out;
                 }
                 /* REVISIT: upper layer/VFIO can track the host process that
                  * binds the PASID. ioasid_set = mm might be sufficient for
                  * vfio to check pasid VMM ownership.
                  */
                 svm->mm = get_task_mm(current);
                 svm->pasid = data->hpasid;
                 if (data->flags & IOMMU_SVA_GPASID_VAL) {
                         svm->gpasid = data->gpasid;
                         svm->flags &= SVM_FLAG_GUEST_PASID;
                 }
                 refcount_set(&svm->refs, 0);
                 ioasid_set_data(data->hpasid, svm);
                 INIT_LIST_HEAD_RCU(&svm->devs);
                 INIT_LIST_HEAD(&svm->list);

                 mmput(svm->mm);
         }
         sdev = kzalloc(sizeof(*sdev), GFP_KERNEL);
         if (!sdev) {
                 ret = -ENOMEM;
                 goto out;
         }
         sdev->dev = dev;
         sdev->users = 1;
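
If we agree the svm must never be left with an empty device list,
something along these lines in the error path would keep the invariant
(a rough sketch, not tested):

         sdev = kzalloc(sizeof(*sdev), GFP_KERNEL);
         if (!sdev) {
                 ret = -ENOMEM;
                 /* Undo the svm allocation if no device was bound yet */
                 if (list_empty(&svm->devs)) {
                         ioasid_set_data(data->hpasid, NULL);
                         kfree(svm);
                 }
                 goto out;
         }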

Best regards,
Baolu

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH v4 13/22] iommu/vt-d: Enlightened PASID allocation
  2019-06-09 13:44 ` [PATCH v4 13/22] iommu/vt-d: Enlightened PASID allocation Jacob Pan
@ 2019-07-16  9:29   ` Auger Eric
  2019-08-13 16:57     ` Jacob Pan
  0 siblings, 1 reply; 64+ messages in thread
From: Auger Eric @ 2019-07-16  9:29 UTC (permalink / raw)
  To: Jacob Pan, iommu, LKML, Joerg Roedel, David Woodhouse,
	Alex Williamson, Jean-Philippe Brucker
  Cc: Yi Liu, Tian, Kevin, Raj Ashok, Christoph Hellwig, Lu Baolu,
	Andriy Shevchenko

Hi Jacob,
On 6/9/19 3:44 PM, Jacob Pan wrote:
> From: Lu Baolu <baolu.lu@linux.intel.com>
> 
> If Intel IOMMU runs in caching mode, a.k.a. virtual IOMMU, the
> IOMMU driver should rely on the emulation software to allocate
> and free PASID IDs. The Intel vt-d spec revision 3.0 defines a
> register set to support this. This includes a capability register,
> a virtual command register and a virtual response register. Refer
> to section 10.4.42, 10.4.43, 10.4.44 for more information.
> 
> This patch adds the enlightened PASID allocation/free interfaces
> via the virtual command register.
>
> Cc: Ashok Raj <ashok.raj@intel.com>
> Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
> Cc: Kevin Tian <kevin.tian@intel.com>
> Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
> Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
> ---
>  drivers/iommu/intel-pasid.c | 76 +++++++++++++++++++++++++++++++++++++++++++++
>  drivers/iommu/intel-pasid.h | 13 +++++++-
>  include/linux/intel-iommu.h |  2 ++
>  3 files changed, 90 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/iommu/intel-pasid.c b/drivers/iommu/intel-pasid.c
> index 2fefeaf..69fddd3 100644
> --- a/drivers/iommu/intel-pasid.c
> +++ b/drivers/iommu/intel-pasid.c
> @@ -63,6 +63,82 @@ void *intel_pasid_lookup_id(int pasid)
>  	return p;
>  }
>  
> +int vcmd_alloc_pasid(struct intel_iommu *iommu, unsigned int *pasid)
> +{
> +	u64 res;
> +	u64 cap;
> +	u8 status_code;
> +	unsigned long flags;
> +	int ret = 0;
> +
> +	if (!ecap_vcs(iommu->ecap)) {
> +		pr_warn("IOMMU: %s: Hardware doesn't support virtual command\n",
> +			iommu->name);
> +		return -ENODEV;
> +	}
> +
> +	cap = dmar_readq(iommu->reg + DMAR_VCCAP_REG);
> +	if (!(cap & DMA_VCS_PAS)) {
> +		pr_warn("IOMMU: %s: Emulation software doesn't support PASID allocation\n",
> +			iommu->name);
> +		return -ENODEV;
> +	}
> +
> +	raw_spin_lock_irqsave(&iommu->register_lock, flags);
> +	dmar_writeq(iommu->reg + DMAR_VCMD_REG, VCMD_CMD_ALLOC);
> +	IOMMU_WAIT_OP(iommu, DMAR_VCRSP_REG, dmar_readq,
> +		      !(res & VCMD_VRSP_IP), res);
> +	raw_spin_unlock_irqrestore(&iommu->register_lock, flags);
> +
> +	status_code = VCMD_VRSP_SC(res);
> +	switch (status_code) {
> +	case VCMD_VRSP_SC_SUCCESS:
> +		*pasid = VCMD_VRSP_RESULT(res);
> +		break;
> +	case VCMD_VRSP_SC_NO_PASID_AVAIL:
> +		pr_info("IOMMU: %s: No PASID available\n", iommu->name);
> +		ret = -ENOMEM;
> +		break;
> +	default:
> +		ret = -ENODEV;
> +		pr_warn("IOMMU: %s: Unkonwn error code %d\n",
s/Unkonwn/unknown/, or better s/unknown/unexpected/
> +			iommu->name, status_code);
> +	}
> +
> +	return ret;
> +}
> +
> +void vcmd_free_pasid(struct intel_iommu *iommu, unsigned int pasid)
> +{
> +	u64 res;
> +	u8 status_code;
> +	unsigned long flags;
> +
> +	if (!ecap_vcs(iommu->ecap)) {
> +		pr_warn("IOMMU: %s: Hardware doesn't support virtual command\n",
> +			iommu->name);
> +		return;
> +	}
Logically shouldn't you also check DMAR_VCCAP_REG as well?
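i.e. something like this at the top of vcmd_free_pasid(), mirroring the
alloc path (sketch):

	u64 cap;

	cap = dmar_readq(iommu->reg + DMAR_VCCAP_REG);
	if (!(cap & DMA_VCS_PAS))
		return;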
> +
> +	raw_spin_lock_irqsave(&iommu->register_lock, flags);
> +	dmar_writeq(iommu->reg + DMAR_VCMD_REG, (pasid << 8) | VCMD_CMD_FREE);
> +	IOMMU_WAIT_OP(iommu, DMAR_VCRSP_REG, dmar_readq,
> +		      !(res & VCMD_VRSP_IP), res);
> +	raw_spin_unlock_irqrestore(&iommu->register_lock, flags);
> +
> +	status_code = VCMD_VRSP_SC(res);
> +	switch (status_code) {
> +	case VCMD_VRSP_SC_SUCCESS:
> +		break;
> +	case VCMD_VRSP_SC_INVALID_PASID:
> +		pr_info("IOMMU: %s: Invalid PASID\n", iommu->name);
> +		break;
> +	default:
> +		pr_warn("IOMMU: %s: Unkonwn error code %d\n",
> +			iommu->name, status_code);
s/Unkonwn/Unexpected
> +	}
> +}
> +
>  /*
>   * Per device pasid table management:
>   */
> diff --git a/drivers/iommu/intel-pasid.h b/drivers/iommu/intel-pasid.h
> index 23537b3..4b26ab5 100644
> --- a/drivers/iommu/intel-pasid.h
> +++ b/drivers/iommu/intel-pasid.h
> @@ -19,6 +19,16 @@
>  #define PASID_PDE_SHIFT			6
>  #define MAX_NR_PASID_BITS		20
>  
> +/* Virtual command interface for enlightened pasid management. */
> +#define VCMD_CMD_ALLOC			0x1
> +#define VCMD_CMD_FREE			0x2
> +#define VCMD_VRSP_IP			0x1
> +#define VCMD_VRSP_SC(e)			(((e) >> 1) & 0x3)
> +#define VCMD_VRSP_SC_SUCCESS		0
> +#define VCMD_VRSP_SC_NO_PASID_AVAIL	1
> +#define VCMD_VRSP_SC_INVALID_PASID	1
> +#define VCMD_VRSP_RESULT(e)		(((e) >> 8) & 0xfffff)
> +
>  /*
>   * Domain ID reserved for pasid entries programmed for first-level
>   * only and pass-through transfer modes.
> @@ -69,5 +79,6 @@ int intel_pasid_setup_pass_through(struct intel_iommu *iommu,
>  				   struct device *dev, int pasid);
>  void intel_pasid_tear_down_entry(struct intel_iommu *iommu,
>  				 struct device *dev, int pasid);
> -
> +int vcmd_alloc_pasid(struct intel_iommu *iommu, unsigned int *pasid);
> +void vcmd_free_pasid(struct intel_iommu *iommu, unsigned int pasid);
>  #endif /* __INTEL_PASID_H */
> diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
> index 6925a18..bff907b 100644
> --- a/include/linux/intel-iommu.h
> +++ b/include/linux/intel-iommu.h
> @@ -173,6 +173,7 @@
>  #define ecap_smpwc(e)		(((e) >> 48) & 0x1)
>  #define ecap_flts(e)		(((e) >> 47) & 0x1)
>  #define ecap_slts(e)		(((e) >> 46) & 0x1)
> +#define ecap_vcs(e)		(((e) >> 44) & 0x1)
>  #define ecap_smts(e)		(((e) >> 43) & 0x1)
>  #define ecap_dit(e)		((e >> 41) & 0x1)
>  #define ecap_pasid(e)		((e >> 40) & 0x1)
> @@ -289,6 +290,7 @@
>  
>  /* PRS_REG */
>  #define DMA_PRS_PPR	((u32)1)
> +#define DMA_VCS_PAS	((u64)1)
>  
>  #define IOMMU_WAIT_OP(iommu, offset, op, cond, sts)			\
>  do {									\
> 
Otherwise

Reviewed-by: Eric Auger <eric.auger@redhat.com>

Thanks

Eric


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH v4 14/22] iommu/vt-d: Add custom allocator for IOASID
  2019-06-09 13:44 ` [PATCH v4 14/22] iommu/vt-d: Add custom allocator for IOASID Jacob Pan
@ 2019-07-16  9:30   ` Auger Eric
  2019-08-05 20:02     ` Jacob Pan
  0 siblings, 1 reply; 64+ messages in thread
From: Auger Eric @ 2019-07-16  9:30 UTC (permalink / raw)
  To: Jacob Pan, iommu, LKML, Joerg Roedel, David Woodhouse,
	Alex Williamson, Jean-Philippe Brucker
  Cc: Yi Liu, Tian, Kevin, Raj Ashok, Christoph Hellwig, Lu Baolu,
	Andriy Shevchenko

Hi Jacob,

On 6/9/19 3:44 PM, Jacob Pan wrote:
> When VT-d driver runs in the guest, PASID allocation must be
> performed via virtual command interface. This patch registers a
> custom IOASID allocator which takes precedence over the default
> XArray based allocator. The resulting IOASID allocation will always
> come from the host. This ensures that PASID namespace is system-
> wide.
> 
> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
> Signed-off-by: Liu, Yi L <yi.l.liu@intel.com>
> Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
> ---
>  drivers/iommu/Kconfig       |  1 +
>  drivers/iommu/intel-iommu.c | 60 +++++++++++++++++++++++++++++++++++++++++++++
>  include/linux/intel-iommu.h |  2 ++
>  3 files changed, 63 insertions(+)
> 
> diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
> index c40c4b5..5d1bc2a 100644
> --- a/drivers/iommu/Kconfig
> +++ b/drivers/iommu/Kconfig
> @@ -213,6 +213,7 @@ config INTEL_IOMMU_SVM
>  	bool "Support for Shared Virtual Memory with Intel IOMMU"
>  	depends on INTEL_IOMMU && X86
>  	select PCI_PASID
> +	select IOASID
>  	select MMU_NOTIFIER
>  	select IOASID
>  	help
> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
> index 09b8ff0..5b84994 100644
> --- a/drivers/iommu/intel-iommu.c
> +++ b/drivers/iommu/intel-iommu.c
> @@ -1711,6 +1711,8 @@ static void free_dmar_iommu(struct intel_iommu *iommu)
>  		if (ecap_prs(iommu->ecap))
>  			intel_svm_finish_prq(iommu);
>  	}
> +	ioasid_unregister_allocator(&iommu->pasid_allocator);
> +
>  #endif
>  }
>  
> @@ -4820,6 +4822,46 @@ static int __init platform_optin_force_iommu(void)
>  	return 1;
>  }
>  
> +#ifdef CONFIG_INTEL_IOMMU_SVM
> +static ioasid_t intel_ioasid_alloc(ioasid_t min, ioasid_t max, void *data)
> +{
> +	struct intel_iommu *iommu = data;
> +	ioasid_t ioasid;
> +
> +	/*
> +	 * VT-d virtual command interface always uses the full 20 bit
> +	 * PASID range. Host can partition guest PASID range based on
> +	 * policies but it is out of guest's control.
> +	 */
> +	if (min < PASID_MIN || max > PASID_MAX)
> +		return -EINVAL;
ioasid_alloc() does not handle that error value, use INVALID_IOASID?
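i.e. (sketch):

	if (min < PASID_MIN || max > PASID_MAX)
		return INVALID_IOASID;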
> +
> +	if (vcmd_alloc_pasid(iommu, &ioasid))
> +		return INVALID_IOASID;
> +
> +	return ioasid;
> +}
> +
> +static void intel_ioasid_free(ioasid_t ioasid, void *data)
> +{
> +	struct iommu_pasid_alloc_info *svm;
> +	struct intel_iommu *iommu = data;
> +
> +	if (!iommu)
> +		return;
> +	/*
> +	 * Sanity check the ioasid owner is done at upper layer, e.g. VFIO
> +	 * We can only free the PASID when all the devices are unbond.
> +	 */
> +	svm = ioasid_find(NULL, ioasid, NULL);
> +	if (!svm) {
> +		pr_warn("Freeing unbond IOASID %d\n", ioasid);
> +		return;
> +	}
> +	vcmd_free_pasid(iommu, ioasid);
> +}
> +#endif
> +
>  int __init intel_iommu_init(void)
>  {
>  	int ret = -ENODEV;
> @@ -4924,6 +4966,24 @@ int __init intel_iommu_init(void)
>  				       "%s", iommu->name);
>  		iommu_device_set_ops(&iommu->iommu, &intel_iommu_ops);
>  		iommu_device_register(&iommu->iommu);
> +#ifdef CONFIG_INTEL_IOMMU_SVM
> +		if (cap_caching_mode(iommu->cap) && sm_supported(iommu)) {
> +			/*
> +			 * Register a custom ASID allocator if we are running
> +			 * in a guest, the purpose is to have a system wide PASID
> +			 * namespace among all PASID users.
> +			 * There can be multiple vIOMMUs in each guest but only
> +			 * one allocator is active. All vIOMMU allocators will
> +			 * eventually be calling the same host allocator.
> +			 */
> +			iommu->pasid_allocator.alloc = intel_ioasid_alloc;
> +			iommu->pasid_allocator.free = intel_ioasid_free;
> +			iommu->pasid_allocator.pdata = (void *)iommu;
> +			ret = ioasid_register_allocator(&iommu->pasid_allocator);
> +			if (ret)
> +				pr_warn("Custom PASID allocator registeration failed\n");
what if it fails, don't you want a tear down path?
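e.g. disable scalable mode on this IOMMU so the guest does not end up
with PASIDs the host knows nothing about; a sketch of what I have in
mind (there may be better options):

			if (ret) {
				pr_warn("Custom PASID allocator failed, scalable mode disabled\n");
				intel_iommu_sm = 0;
			}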

Thanks

Eric
> +		}
> +#endif
>  	}
>  
>  	bus_set_iommu(&pci_bus_type, &intel_iommu_ops);
> diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
> index bff907b..8605c74 100644
> --- a/include/linux/intel-iommu.h
> +++ b/include/linux/intel-iommu.h
> @@ -31,6 +31,7 @@
>  #include <linux/iommu.h>
>  #include <linux/io-64-nonatomic-lo-hi.h>
>  #include <linux/dmar.h>
> +#include <linux/ioasid.h>
>  
>  #include <asm/cacheflush.h>
>  #include <asm/iommu.h>
> @@ -549,6 +550,7 @@ struct intel_iommu {
>  #ifdef CONFIG_INTEL_IOMMU_SVM
>  	struct page_req_dsc *prq;
>  	unsigned char prq_name[16];    /* Name for PRQ interrupt */
> +	struct ioasid_allocator_ops pasid_allocator; /* Custom allocator for PASIDs */
>  #endif
>  	struct q_inval  *qi;            /* Queued invalidation info */
>  	u32 *iommu_state; /* Store iommu states between suspend and resume.*/
> 

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH v4 16/22] iommu/vt-d: Move domain helper to header
  2019-06-09 13:44 ` [PATCH v4 16/22] iommu/vt-d: Move domain helper to header Jacob Pan
@ 2019-07-16  9:33   ` Auger Eric
  0 siblings, 0 replies; 64+ messages in thread
From: Auger Eric @ 2019-07-16  9:33 UTC (permalink / raw)
  To: Jacob Pan, iommu, LKML, Joerg Roedel, David Woodhouse,
	Alex Williamson, Jean-Philippe Brucker
  Cc: Yi Liu, Tian, Kevin, Raj Ashok, Christoph Hellwig, Lu Baolu,
	Andriy Shevchenko

Hi Jacob,
On 6/9/19 3:44 PM, Jacob Pan wrote:
> > Move the domain helper to the header so it can be used by SVA code.
> 
> Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
> ---
>  drivers/iommu/intel-iommu.c | 6 ------
>  include/linux/intel-iommu.h | 6 ++++++
>  2 files changed, 6 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
> index 39b63fe..7cfa0eb 100644
> --- a/drivers/iommu/intel-iommu.c
> +++ b/drivers/iommu/intel-iommu.c
> @@ -427,12 +427,6 @@ static void init_translation_status(struct intel_iommu *iommu)
>  		iommu->flags |= VTD_FLAG_TRANS_PRE_ENABLED;
>  }
>  
> -/* Convert generic 'struct iommu_domain to private struct dmar_domain */
> -static struct dmar_domain *to_dmar_domain(struct iommu_domain *dom)
> -{
> -	return container_of(dom, struct dmar_domain, domain);
> -}
> -
>  static int __init intel_iommu_setup(char *str)
>  {
>  	if (!str)
> diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
> index 8605c74..b75f17d 100644
> --- a/include/linux/intel-iommu.h
> +++ b/include/linux/intel-iommu.h
> @@ -597,6 +597,12 @@ static inline void __iommu_flush_cache(
>  		clflush_cache_range(addr, size);
>  }
>  
> +/* Convert generic 'struct iommu_domain to private struct dmar_domain */
fix the single '?
> +static inline struct dmar_domain *to_dmar_domain(struct iommu_domain *dom)
> +{
> +	return container_of(dom, struct dmar_domain, domain);
> +}
> +
>  /*
>   * 0: readable
>   * 1: writable
> 
Reviewed-by: Eric Auger <eric.auger@redhat.com>

Thanks

Eric


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH v4 17/22] iommu/vt-d: Avoid duplicated code for PASID setup
  2019-06-09 13:44 ` [PATCH v4 17/22] iommu/vt-d: Avoid duplicated code for PASID setup Jacob Pan
  2019-06-18 16:03   ` Jonathan Cameron
@ 2019-07-16  9:52   ` Auger Eric
  1 sibling, 0 replies; 64+ messages in thread
From: Auger Eric @ 2019-07-16  9:52 UTC (permalink / raw)
  To: Jacob Pan, iommu, LKML, Joerg Roedel, David Woodhouse,
	Alex Williamson, Jean-Philippe Brucker
  Cc: Yi Liu, Tian, Kevin, Raj Ashok, Christoph Hellwig, Lu Baolu,
	Andriy Shevchenko

Hi Jacob,

On 6/9/19 3:44 PM, Jacob Pan wrote:
> After each setup for PASID entry, related translation caches must be flushed.
> We can combine duplicated code into one function which is less error prone.
> 
> Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
> ---
>  drivers/iommu/intel-pasid.c | 48 +++++++++++++++++----------------------------
>  1 file changed, 18 insertions(+), 30 deletions(-)
> 
> diff --git a/drivers/iommu/intel-pasid.c b/drivers/iommu/intel-pasid.c
> index 1e25539..1ff2ecc 100644
> --- a/drivers/iommu/intel-pasid.c
> +++ b/drivers/iommu/intel-pasid.c
> @@ -522,6 +522,21 @@ void intel_pasid_tear_down_entry(struct intel_iommu *iommu,
>  		devtlb_invalidation_with_pasid(iommu, dev, pasid);
>  }
>  
> +static inline void pasid_flush_caches(struct intel_iommu *iommu,
> +				struct pasid_entry *pte,
> +				int pasid, u16 did)
> +{
> +	if (!ecap_coherent(iommu->ecap))
> +		clflush_cache_range(pte, sizeof(*pte));
> +
> +	if (cap_caching_mode(iommu->cap)) {
> +		pasid_cache_invalidation_with_pasid(iommu, did, pasid);
> +		iotlb_invalidation_with_pasid(iommu, did, pasid);
> +	} else
> +		iommu_flush_write_buffer(iommu);
Besides the style issue reported by Jonathan and the fact this function
may not deserve the inline (chapt 15 of
https://www.kernel.org/doc/html/v5.1/process/coding-style.html),

Reviewed-by: Eric Auger <eric.auger@redhat.com>

Thanks

Eric
> +
> +}
> +
>  /*
>   * Set up the scalable mode pasid table entry for first only
>   * translation type.
> @@ -567,16 +582,7 @@ int intel_pasid_setup_first_level(struct intel_iommu *iommu,
>  	/* Setup Present and PASID Granular Transfer Type: */
>  	pasid_set_translation_type(pte, 1);
>  	pasid_set_present(pte);
> -
> -	if (!ecap_coherent(iommu->ecap))
> -		clflush_cache_range(pte, sizeof(*pte));
> -
> -	if (cap_caching_mode(iommu->cap)) {
> -		pasid_cache_invalidation_with_pasid(iommu, did, pasid);
> -		iotlb_invalidation_with_pasid(iommu, did, pasid);
> -	} else {
> -		iommu_flush_write_buffer(iommu);
> -	}
> +	pasid_flush_caches(iommu, pte, pasid, did);
>  
>  	return 0;
>  }
> @@ -640,16 +646,7 @@ int intel_pasid_setup_second_level(struct intel_iommu *iommu,
>  	 */
>  	pasid_set_sre(pte);
>  	pasid_set_present(pte);
> -
> -	if (!ecap_coherent(iommu->ecap))
> -		clflush_cache_range(pte, sizeof(*pte));
> -
> -	if (cap_caching_mode(iommu->cap)) {
> -		pasid_cache_invalidation_with_pasid(iommu, did, pasid);
> -		iotlb_invalidation_with_pasid(iommu, did, pasid);
> -	} else {
> -		iommu_flush_write_buffer(iommu);
> -	}
> +	pasid_flush_caches(iommu, pte, pasid, did);
>  
>  	return 0;
>  }
> @@ -683,16 +680,7 @@ int intel_pasid_setup_pass_through(struct intel_iommu *iommu,
>  	 */
>  	pasid_set_sre(pte);
>  	pasid_set_present(pte);
> -
> -	if (!ecap_coherent(iommu->ecap))
> -		clflush_cache_range(pte, sizeof(*pte));
> -
> -	if (cap_caching_mode(iommu->cap)) {
> -		pasid_cache_invalidation_with_pasid(iommu, did, pasid);
> -		iotlb_invalidation_with_pasid(iommu, did, pasid);
> -	} else {
> -		iommu_flush_write_buffer(iommu);
> -	}
> +	pasid_flush_caches(iommu, pte, pasid, did);
>  
>  	return 0;
>  }
> 

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH v4 11/22] iommu: Introduce guest PASID bind function
  2019-06-09 13:44 ` [PATCH v4 11/22] iommu: Introduce guest PASID bind function Jacob Pan
  2019-06-18 15:36   ` Jean-Philippe Brucker
@ 2019-07-16 16:44   ` Auger Eric
  2019-08-05 21:02     ` Jacob Pan
  2019-08-05 23:13     ` Jacob Pan
  1 sibling, 2 replies; 64+ messages in thread
From: Auger Eric @ 2019-07-16 16:44 UTC (permalink / raw)
  To: Jacob Pan, iommu, LKML, Joerg Roedel, David Woodhouse,
	Alex Williamson, Jean-Philippe Brucker
  Cc: Yi Liu, Tian, Kevin, Raj Ashok, Christoph Hellwig, Lu Baolu,
	Andriy Shevchenko

Hi Jacob,

On 6/9/19 3:44 PM, Jacob Pan wrote:
> Guest shared virtual address (SVA) may require host to shadow guest
> PASID tables. Guest PASID can also be allocated from the host via
> enlightened interfaces. In this case, guest needs to bind the guest
> mm, i.e. cr3 in guest physical address to the actual PASID table in
> the host IOMMU. Nesting will be turned on such that guest virtual
> address can go through a two level translation:
> - 1st level translates GVA to GPA
> - 2nd level translates GPA to HPA
> This patch introduces APIs to bind guest PASID data to the assigned
> device entry in the physical IOMMU. See the diagram below for usage
> > explanation.
> 
>     .-------------.  .---------------------------.
>     |   vIOMMU    |  | Guest process mm, FL only |
>     |             |  '---------------------------'
>     .----------------/
>     | PASID Entry |--- PASID cache flush -
>     '-------------'                       |
>     |             |                       V
>     |             |                      GP
>     '-------------'
> Guest
> ------| Shadow |----------------------- GP->HP* ---------
>       v        v                          |
> Host                                      v
>     .-------------.  .----------------------.
>     |   pIOMMU    |  | Bind FL for GVA-GPA  |
>     |             |  '----------------------'
>     .----------------/  |
>     | PASID Entry |     V (Nested xlate)
>     '----------------\.---------------------.
>     |             |   |Set SL to GPA-HPA    |
>     |             |   '---------------------'
>     '-------------'
> 
> Where:
>  - FL = First level/stage one page tables
>  - SL = Second level/stage two page tables
>  - GP = Guest PASID
>  - HP = Host PASID
> * Conversion needed if non-identity GP-HP mapping option is chosen.
> 
> Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
> Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
> ---
>  drivers/iommu/iommu.c      | 20 ++++++++++++++++
>  include/linux/iommu.h      | 21 +++++++++++++++++
>  include/uapi/linux/iommu.h | 58 ++++++++++++++++++++++++++++++++++++++++++++++
>  3 files changed, 99 insertions(+)
> 
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index 1758b57..d0416f60 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -1648,6 +1648,26 @@ int iommu_cache_invalidate(struct iommu_domain *domain, struct device *dev,
>  }
>  EXPORT_SYMBOL_GPL(iommu_cache_invalidate);
>  
> +int iommu_sva_bind_gpasid(struct iommu_domain *domain,
> +			struct device *dev, struct gpasid_bind_data *data)
> +{
> +	if (unlikely(!domain->ops->sva_bind_gpasid))
> +		return -ENODEV;
> +
> +	return domain->ops->sva_bind_gpasid(domain, dev, data);
> +}
> +EXPORT_SYMBOL_GPL(iommu_sva_bind_gpasid);
> +
> +int iommu_sva_unbind_gpasid(struct iommu_domain *domain, struct device *dev,
> +			ioasid_t pasid)
> +{
> +	if (unlikely(!domain->ops->sva_unbind_gpasid))
> +		return -ENODEV;
> +
> +	return domain->ops->sva_unbind_gpasid(dev, pasid);
> +}
> +EXPORT_SYMBOL_GPL(iommu_sva_unbind_gpasid);
> +
>  static void __iommu_detach_device(struct iommu_domain *domain,
>  				  struct device *dev)
>  {
> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index 8d766a8..560c8c8 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -25,6 +25,7 @@
>  #include <linux/errno.h>
>  #include <linux/err.h>
>  #include <linux/of.h>
> +#include <linux/ioasid.h>
>  #include <uapi/linux/iommu.h>
>  
>  #define IOMMU_READ	(1 << 0)
> @@ -267,6 +268,8 @@ struct page_response_msg {
>   * @detach_pasid_table: detach the pasid table
>   * @cache_invalidate: invalidate translation caches
>   * @pgsize_bitmap: bitmap of all possible supported page sizes
> + * @sva_bind_gpasid: bind guest pasid and mm
> + * @sva_unbind_gpasid: unbind guest pasid and mm
comment vs field ordering as pointed out by Jonathan on other patches
>   */
>  struct iommu_ops {
>  	bool (*capable)(enum iommu_cap);
> @@ -332,6 +335,10 @@ struct iommu_ops {
>  	int (*page_response)(struct device *dev, struct page_response_msg *msg);
>  	int (*cache_invalidate)(struct iommu_domain *domain, struct device *dev,
>  				struct iommu_cache_invalidate_info *inv_info);
> +	int (*sva_bind_gpasid)(struct iommu_domain *domain,
> +			struct device *dev, struct gpasid_bind_data *data);
> +
> +	int (*sva_unbind_gpasid)(struct device *dev, int pasid);
>  
>  	unsigned long pgsize_bitmap;
>  };
> @@ -447,6 +454,10 @@ extern void iommu_detach_pasid_table(struct iommu_domain *domain);
>  extern int iommu_cache_invalidate(struct iommu_domain *domain,
>  				  struct device *dev,
>  				  struct iommu_cache_invalidate_info *inv_info);
> +extern int iommu_sva_bind_gpasid(struct iommu_domain *domain,
> +		struct device *dev, struct gpasid_bind_data *data);
> +extern int iommu_sva_unbind_gpasid(struct iommu_domain *domain,
> +				struct device *dev, ioasid_t pasid);
>  extern struct iommu_domain *iommu_get_domain_for_dev(struct device *dev);
>  extern struct iommu_domain *iommu_get_dma_domain(struct device *dev);
>  extern int iommu_map(struct iommu_domain *domain, unsigned long iova,
> @@ -998,6 +1009,16 @@ iommu_cache_invalidate(struct iommu_domain *domain,
>  {
>  	return -ENODEV;
>  }
> +static inline int iommu_sva_bind_gpasid(struct iommu_domain *domain,
> +				struct device *dev, struct gpasid_bind_data *data)
> +{
> +	return -ENODEV;
> +}
> +
> +static inline int sva_unbind_gpasid(struct device *dev, int pasid)
> +{
> +	return -ENODEV;
> +}
>  
>  #endif /* CONFIG_IOMMU_API */
>  
> diff --git a/include/uapi/linux/iommu.h b/include/uapi/linux/iommu.h
> index ca4b753..a9cdc63 100644
> --- a/include/uapi/linux/iommu.h
> +++ b/include/uapi/linux/iommu.h
> @@ -277,4 +277,62 @@ struct iommu_cache_invalidate_info {
>  	};
>  };
>  
> +/**
> + * struct gpasid_bind_data_vtd - Intel VT-d specific data on device and guest
> + * SVA binding.
> + *
> + * @flags:	VT-d PASID table entry attributes
> + * @pat:	Page attribute table data to compute effective memory type
> + * @emt:	Extended memory type
> + *
> + * Only guest vIOMMU selectable and effective options are passed down to
> + * the host IOMMU.
> + */
> +struct gpasid_bind_data_vtd {
> +#define IOMMU_SVA_VTD_GPASID_SRE	(1 << 0) /* supervisor request */
> +#define IOMMU_SVA_VTD_GPASID_EAFE	(1 << 1) /* extended access enable */
> +#define IOMMU_SVA_VTD_GPASID_PCD	(1 << 2) /* page-level cache disable */
> +#define IOMMU_SVA_VTD_GPASID_PWT	(1 << 3) /* page-level write through */
> +#define IOMMU_SVA_VTD_GPASID_EMTE	(1 << 4) /* extended mem type enable */
> +#define IOMMU_SVA_VTD_GPASID_CD		(1 << 5) /* PASID-level cache disable */
> +	__u64 flags;
> +	__u32 pat;
> +	__u32 emt;
> +};
> +
> +/**
> + * struct gpasid_bind_data - Information about device and guest PASID binding
> + * @version:	Version of this data structure
> + * @format:	PASID table entry format
> + * @flags:	Additional information on guest bind request
> + * @gpgd:	Guest page directory base of the guest mm to bind
> + * @hpasid:	Process address space ID used for the guest mm in host IOMMU
> + * @gpasid:	Process address space ID used for the guest mm in guest IOMMU
> > + * @addr_width:	Guest virtual address width
> > + * @vtd:	Intel VT-d specific data
> + *
> + * Guest to host PASID mapping can be an identity or non-identity, where guest
> > + * has its own PASID space. For non-identity mapping, guest to host PASID lookup
> + * is needed when VM programs guest PASID into an assigned device. VMM may
> + * trap such PASID programming then request host IOMMU driver to convert guest
> + * PASID to host PASID based on this bind data.
Sorry, I don't get when you would have a separate PASID space, as I
understand you eventually allocate the guest PASID with the host
allocator (through paravirt)?
> + */
> +struct gpasid_bind_data {
other structs in iommu.h are prefixed with iommu_?
> +#define IOMMU_GPASID_BIND_VERSION_1	1
> +	__u32 version;
> +#define IOMMU_PASID_FORMAT_INTEL_VTD	1
> +	__u32 format;
> +#define IOMMU_SVA_GPASID_VAL	(1 << 0) /* guest PASID valid */
> +	__u64 flags;
> +	__u64 gpgd;
> > +	__u64 hpasid;
> > +	__u64 gpasid;
> +	__u32 addr_width;
> +	__u8  padding[4];
> +	/* Vendor specific data */
> +	union {
> +		struct gpasid_bind_data_vtd vtd;
> +	};
vtd is not used in the series if I am not wrong
> +};
> +
>  #endif /* _UAPI_IOMMU_H */
> 
Thanks

Eric

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH v4 20/22] iommu/vt-d: Add bind guest PASID support
  2019-06-09 13:44 ` [PATCH v4 20/22] iommu/vt-d: Add bind guest PASID support Jacob Pan
  2019-06-18 16:44   ` Jonathan Cameron
  2019-06-27  2:50   ` Lu Baolu
@ 2019-07-16 16:45   ` Auger Eric
  2019-07-16 17:04     ` Raj, Ashok
  2 siblings, 1 reply; 64+ messages in thread
From: Auger Eric @ 2019-07-16 16:45 UTC (permalink / raw)
  To: Jacob Pan, iommu, LKML, Joerg Roedel, David Woodhouse,
	Alex Williamson, Jean-Philippe Brucker
  Cc: Yi Liu, Tian, Kevin, Raj Ashok, Christoph Hellwig, Lu Baolu,
	Andriy Shevchenko, Yi L

Hi Jacob,

On 6/9/19 3:44 PM, Jacob Pan wrote:
> When supporting guest SVA with emulated IOMMU, the guest PASID
> table is shadowed in VMM. Updates to guest vIOMMU PASID table
> will result in PASID cache flush which will be passed down to
> the host as bind guest PASID calls.
> 
> For the SL page tables, it will be harvested from device's
> default domain (request w/o PASID), or aux domain in case of
> mediated device.
> 
>     .-------------.  .---------------------------.
>     |   vIOMMU    |  | Guest process CR3, FL only|
>     |             |  '---------------------------'
>     .----------------/
>     | PASID Entry |--- PASID cache flush -
>     '-------------'                       |
>     |             |                       V
>     |             |                CR3 in GPA
>     '-------------'
> Guest
> ------| Shadow |--------------------------|--------
>       v        v                          v
> Host
>     .-------------.  .----------------------.
>     |   pIOMMU    |  | Bind FL for GVA-GPA  |
>     |             |  '----------------------'
>     .----------------/  |
>     | PASID Entry |     V (Nested xlate)
>     '----------------\.------------------------------.
>     |             |   |SL for GPA-HPA, default domain|
>     |             |   '------------------------------'
>     '-------------'
> Where:
>  - FL = First level/stage one page tables
>  - SL = Second level/stage two page tables
> 
> Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
> Signed-off-by: Liu, Yi L <yi.l.liu@linux.intel.com>
> ---
>  drivers/iommu/intel-iommu.c |   4 +
>  drivers/iommu/intel-svm.c   | 187 ++++++++++++++++++++++++++++++++++++++++++++
>  include/linux/intel-iommu.h |  13 ++-
>  include/linux/intel-svm.h   |  17 ++++
>  4 files changed, 219 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
> index 7cfa0eb..3b4d712 100644
> --- a/drivers/iommu/intel-iommu.c
> +++ b/drivers/iommu/intel-iommu.c
> @@ -5782,6 +5782,10 @@ const struct iommu_ops intel_iommu_ops = {
>  	.dev_enable_feat	= intel_iommu_dev_enable_feat,
>  	.dev_disable_feat	= intel_iommu_dev_disable_feat,
>  	.pgsize_bitmap		= INTEL_IOMMU_PGSIZES,
> +#ifdef CONFIG_INTEL_IOMMU_SVM
> +	.sva_bind_gpasid	= intel_svm_bind_gpasid,
> +	.sva_unbind_gpasid	= intel_svm_unbind_gpasid,
> +#endif
>  };
>  
>  static void quirk_iommu_g4x_gfx(struct pci_dev *dev)
> diff --git a/drivers/iommu/intel-svm.c b/drivers/iommu/intel-svm.c
> index 66d98e1..f06a82f 100644
> --- a/drivers/iommu/intel-svm.c
> +++ b/drivers/iommu/intel-svm.c
> @@ -229,6 +229,193 @@ static LIST_HEAD(global_svm_list);
>  	list_for_each_entry(sdev, &svm->devs, list)	\
>  	if (dev == sdev->dev)				\
>  
> +int intel_svm_bind_gpasid(struct iommu_domain *domain,
> +			struct device *dev,
> +			struct gpasid_bind_data *data)
> +{
> +	struct intel_iommu *iommu = intel_svm_device_to_iommu(dev);
> +	struct intel_svm_dev *sdev;
> +	struct intel_svm *svm = NULL;
not requested
> +	struct dmar_domain *ddomain;
> +	int ret = 0;
> +
> +	if (WARN_ON(!iommu) || !data)
> +		return -EINVAL;
> +
> +	if (data->version != IOMMU_GPASID_BIND_VERSION_1 ||
> +		data->format != IOMMU_PASID_FORMAT_INTEL_VTD)
> +		return -EINVAL;
> +
> +	if (dev_is_pci(dev)) {
> +		/* VT-d supports devices with full 20 bit PASIDs only */
> +		if (pci_max_pasids(to_pci_dev(dev)) != PASID_MAX)
> +			return -EINVAL;
> +	}
> +
> +	/*
> +	 * We only check host PASID range, we have no knowledge to check
> +	 * guest PASID range nor do we use the guest PASID.
guest pasid is set below in svm->gpasid.
So I am confused, do you handle gpasid or not? If you don't use gpasid
at the moment, then you may return -EINVAL if data->flags &
IOMMU_SVA_GPASID_VAL.
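e.g. (just a sketch of the check I mean, assuming gpasid really is
unused for now):

	if (data->flags & IOMMU_SVA_GPASID_VAL)
		return -EINVAL;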

I confess I don't really understand gpasid, as I thought you use
enlightened PASID allocation to use a system-wide PASID allocator on
the host, so in which case would guest and host PASIDs differ?
> +	 */
> +	if (data->hpasid <= 0 || data->hpasid >= PASID_MAX)
> +		return -EINVAL;
> +
> +	ddomain = to_dmar_domain(domain);
> +	/* REVISIT:
> +	 * Sanity check adddress width and paging mode support
> +	 * width matching in two dimensions:
> +	 * 1. paging mode CPU <= IOMMU
> +	 * 2. address width Guest <= Host.
> +	 */
> +	mutex_lock(&pasid_mutex);
> +	svm = ioasid_find(NULL, data->hpasid, NULL);
> +	if (IS_ERR(svm)) {
> +		ret = PTR_ERR(svm);
> +		goto out;
> +	}
> +	if (svm) {
> +		/*
> +		 * If we found svm for the PASID, there must be at
> +		 * least one device bond, otherwise svm should be freed.
> +		 */
> +		BUG_ON(list_empty(&svm->devs));
> +
> +		for_each_svm_dev() {
> +			/* In case of multiple sub-devices of the same pdev assigned, we should
> +			 * allow multiple bind calls with the same PASID and pdev.
> +			 */
> +			sdev->users++;
> +			goto out;
> +		}
> +	} else {
> +		/* We come here when PASID has never been bond to a device. */
> +		svm = kzalloc(sizeof(*svm), GFP_KERNEL);
> +		if (!svm) {
> +			ret = -ENOMEM;
> +			goto out;
> +		}
> +		/* REVISIT: upper layer/VFIO can track host process that bind the PASID.
> +		 * ioasid_set = mm might be sufficient for vfio to check pasid VMM
> +		 * ownership.
> +		 */
> +		svm->mm = get_task_mm(current);
> +		svm->pasid = data->hpasid;
> +		if (data->flags & IOMMU_SVA_GPASID_VAL) {
> +			svm->gpasid = data->gpasid;
> +			svm->flags &= SVM_FLAG_GUEST_PASID;
I guess you mean |= ?
> +		}
> +		refcount_set(&svm->refs, 0);
> +		ioasid_set_data(data->hpasid, svm);
> +		INIT_LIST_HEAD_RCU(&svm->devs);
> +		INIT_LIST_HEAD(&svm->list);
> +
> +		mmput(svm->mm);
> +	}
> +	sdev = kzalloc(sizeof(*sdev), GFP_KERNEL);
> +	if (!sdev) {
> +		ret = -ENOMEM;
> +		goto out;
from now on, if any error occurs below and if you've just created the
svm object, no sdev will be attached to the svm object => opposed to
comment above and likely to hit BUG_ON() on next call.
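A sketch of the kind of unwind I would expect on those error paths
(assuming the svm object was just allocated above):

	if (list_empty(&svm->devs)) {
		ioasid_set_data(data->hpasid, NULL);
		kfree(svm);
	}
	goto out;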
> +	}
> +	sdev->dev = dev;
> +	sdev->users = 1;
> +
> +	/* Set up device context entry for PASID if not enabled already */
> +	ret = intel_iommu_enable_pasid(iommu, sdev->dev);
> +	if (ret) {
> +		dev_err(dev, "Failed to enable PASID capability\n");
> +		kfree(sdev);
> +		goto out;
> +	}
> +
> +	/*
> +	 * For guest bind, we need to set up PASID table entry as follows:
> +	 * - FLPM matches guest paging mode
> +	 * - turn on nested mode
> +	 * - SL guest address width matching
> +	 */
> +	ret = intel_pasid_setup_nested(iommu,
> +				dev,
> +				(pgd_t *)data->gpgd,
> +				data->hpasid,
> +				data->flags,
> +				ddomain,
> +				data->addr_width);
So at the moment you don't seem to use the data->vtd field and pass it
to intel_pasid_setup_nested(). Don't you need to perform some sanity
checks on some fields against the extended capability reg?


wouldn't it be simpler to pass a struct gpasid_bind_data pointer to
intel_pasid_setup_nested.
> +	if (ret) {
> +		dev_err(dev, "Failed to set up PASID %llu in nested mode, Err %d\n",
> +			data->hpasid, ret);
> +		kfree(sdev);
> +		goto out;
> +	}
> +	svm->flags |= SVM_FLAG_GUEST_MODE;
> +
> +	init_rcu_head(&sdev->rcu);
> +	refcount_inc(&svm->refs);
> +	list_add_rcu(&sdev->list, &svm->devs);
> + out:
> +	mutex_unlock(&pasid_mutex);
> +	return ret;
> +}
> +
> +int intel_svm_unbind_gpasid(struct device *dev, int pasid)
> +{
> +	struct intel_svm_dev *sdev;
> +	struct intel_iommu *iommu;
> +	struct intel_svm *svm;
> +	int ret = -EINVAL;
> +
> +	mutex_lock(&pasid_mutex);
> +	iommu = intel_svm_device_to_iommu(dev);
> +	if (!iommu)
> +		goto out;
> +
> +	svm = ioasid_find(NULL, pasid, NULL);
> +	if (IS_ERR(svm)) {
> +		ret = PTR_ERR(svm);
> +		goto out;
> +	}
> +
> +	if (!svm)
> +		goto out;
IS_ERR_OR_NULL as advised by Lu in another patch
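i.e. something like:

	svm = ioasid_find(NULL, pasid, NULL);
	if (IS_ERR_OR_NULL(svm)) {
		ret = IS_ERR(svm) ? PTR_ERR(svm) : -EINVAL;
		goto out;
	}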
> +
> +	for_each_svm_dev() {
> +		ret = 0;
> +		sdev->users--;
> +		if (!sdev->users) {
> +			list_del_rcu(&sdev->list);
> +			intel_pasid_tear_down_entry(iommu, dev, svm->pasid);
> +			/* TODO: Drain in flight PRQ for the PASID since it
> +			 * may get reused soon, we don't want to
> +			 * confuse with its previous live.
life
> +			 * intel_svm_drain_prq(dev, pasid);
> +			 */
> +			kfree_rcu(sdev, rcu);
> +
> +			if (list_empty(&svm->devs)) {
> +				list_del(&svm->list);
> +				kfree(svm);
> +				/*
> +				 * We do not free PASID here until explicit call
> +				 * from VFIO to free. The PASID life cycle
> +				 * management is largely tied to VFIO management
> +				 * of assigned device life cycles. In case of
> +				 * guest exit without a explicit free PASID call,
> +				 * the responsibility lies in VFIO layer to free
> +				 * the PASIDs allocated for the guest.
> +				 * For security reasons, VFIO has to track the
> +				 * PASID ownership per guest anyway to ensure
> +				 * that PASID allocated by one guest cannot be
> +				 * used by another.
> +				 */
> +				ioasid_set_data(pasid, NULL);
> +			}
> +		}
> +		break;
> +	}
> + out:
> +	mutex_unlock(&pasid_mutex);
> +
> +	return ret;
> +}
> +
>  int intel_svm_bind_mm(struct device *dev, int *pasid, int flags, struct svm_dev_ops *ops)
>  {
>  	struct intel_iommu *iommu = intel_svm_device_to_iommu(dev);
> diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
> index b75f17d..94d3a9a 100644
> --- a/include/linux/intel-iommu.h
> +++ b/include/linux/intel-iommu.h
> @@ -677,7 +677,9 @@ int intel_iommu_enable_pasid(struct intel_iommu *iommu, struct device *dev);
>  int intel_svm_init(struct intel_iommu *iommu);
>  extern int intel_svm_enable_prq(struct intel_iommu *iommu);
>  extern int intel_svm_finish_prq(struct intel_iommu *iommu);
> -
> +extern int intel_svm_bind_gpasid(struct iommu_domain *domain,
> +		struct device *dev, struct gpasid_bind_data *data);
> +extern int intel_svm_unbind_gpasid(struct device *dev, int pasid);
>  struct svm_dev_ops;
>  
>  struct intel_svm_dev {
> @@ -693,12 +695,19 @@ struct intel_svm_dev {
>  
>  struct intel_svm {
>  	struct mmu_notifier notifier;
> -	struct mm_struct *mm;
> +	union {
> +		struct mm_struct *mm;
> +		u64 gcr3;
> +	};
not related to that patch?
>  	struct intel_iommu *iommu;
>  	int flags;
>  	int pasid;
> +	int gpasid; /* Guest PASID in case of vSVA bind with non-identity host
> +		     * to guest PASID mapping.
> +		     */
>  	struct list_head devs;
>  	struct list_head list;
> +	refcount_t refs; /* Number of devices sharing this PASID */
>  };
>  
>  extern struct intel_iommu *intel_svm_device_to_iommu(struct device *dev);
> diff --git a/include/linux/intel-svm.h b/include/linux/intel-svm.h
> index e3f7631..577d5df 100644
> --- a/include/linux/intel-svm.h
> +++ b/include/linux/intel-svm.h
> @@ -52,6 +52,23 @@ struct svm_dev_ops {
>   * do such IOTLB flushes automatically.
>   */
>  #define SVM_FLAG_SUPERVISOR_MODE	(1<<1)
> +/*
> + * The SVM_FLAG_GUEST_MODE flag is used when a guest process bind to a device.
> + * In this case the mm_struct is in the guest kernel or userspace, its life
> + * cycle is managed by VMM and VFIO layer. For IOMMU driver, this API provides
> + * means to bind/unbind guest CR3 with PASIDs allocated for a device.
> + */
> +#define SVM_FLAG_GUEST_MODE	(1<<2)
> +/*
> + * The SVM_FLAG_GUEST_PASID flag is used when a guest has its own PASID space,
> + * which requires guest and host PASID translation at both directions. We keep
> + * track of guest PASID in order to provide lookup service to device drivers.
> + * One such example is a physical function (PF) driver that supports mediated
> + * device (mdev) assignment. Guest programming of mdev configuration space can
> + * only be done with guest PASID, therefore PF driver needs to find the matching
> + * host PASID to program the real hardware.
> + */
> +#define SVM_FLAG_GUEST_PASID	(1<<3)
>  
>  #ifdef CONFIG_INTEL_IOMMU_SVM
>  
> 

Thanks

Eric

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH v4 20/22] iommu/vt-d: Add bind guest PASID support
  2019-07-16 16:45   ` Auger Eric
@ 2019-07-16 17:04     ` Raj, Ashok
  2019-07-18  7:47       ` Auger Eric
  0 siblings, 1 reply; 64+ messages in thread
From: Raj, Ashok @ 2019-07-16 17:04 UTC (permalink / raw)
  To: Auger Eric
  Cc: Jacob Pan, iommu, LKML, Joerg Roedel, David Woodhouse,
	Alex Williamson, Jean-Philippe Brucker, Yi Liu, Tian, Kevin,
	Christoph Hellwig, Lu Baolu, Andriy Shevchenko, Yi L, Ashok Raj

Hi Eric

Jacob is on sabbatical, so I'll give it my best shot :-)

Yi/Kevin can jump in...

On Tue, Jul 16, 2019 at 06:45:51PM +0200, Auger Eric wrote:
> Hi Jacob,
> 
> On 6/9/19 3:44 PM, Jacob Pan wrote:
> > When supporting guest SVA with emulated IOMMU, the guest PASID
> > table is shadowed in VMM. Updates to guest vIOMMU PASID table
> > will result in PASID cache flush which will be passed down to
> > the host as bind guest PASID calls.
> > 
> > For the SL page tables, it will be harvested from device's
> > default domain (request w/o PASID), or aux domain in case of
> > mediated device.
> > 
> >     .-------------.  .---------------------------.
> >     |   vIOMMU    |  | Guest process CR3, FL only|
> >     |             |  '---------------------------'
> >     .----------------/
> >     | PASID Entry |--- PASID cache flush -
> >     '-------------'                       |
> >     |             |                       V
> >     |             |                CR3 in GPA
> >     '-------------'
> > Guest
> > ------| Shadow |--------------------------|--------
> >       v        v                          v
> > Host
> >     .-------------.  .----------------------.
> >     |   pIOMMU    |  | Bind FL for GVA-GPA  |
> >     |             |  '----------------------'
> >     .----------------/  |
> >     | PASID Entry |     V (Nested xlate)
> >     '----------------\.------------------------------.
> >     |             |   |SL for GPA-HPA, default domain|
> >     |             |   '------------------------------'
> >     '-------------'
> > Where:
> >  - FL = First level/stage one page tables
> >  - SL = Second level/stage two page tables
> > 
> > Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
> > Signed-off-by: Liu, Yi L <yi.l.liu@linux.intel.com>
> > ---
> >  drivers/iommu/intel-iommu.c |   4 +
> >  drivers/iommu/intel-svm.c   | 187 ++++++++++++++++++++++++++++++++++++++++++++
> >  include/linux/intel-iommu.h |  13 ++-
> >  include/linux/intel-svm.h   |  17 ++++
> >  4 files changed, 219 insertions(+), 2 deletions(-)
> > 
> > diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
> > index 7cfa0eb..3b4d712 100644
> > --- a/drivers/iommu/intel-iommu.c
> > +++ b/drivers/iommu/intel-iommu.c
> > @@ -5782,6 +5782,10 @@ const struct iommu_ops intel_iommu_ops = {
> >  	.dev_enable_feat	= intel_iommu_dev_enable_feat,
> >  	.dev_disable_feat	= intel_iommu_dev_disable_feat,
> >  	.pgsize_bitmap		= INTEL_IOMMU_PGSIZES,
> > +#ifdef CONFIG_INTEL_IOMMU_SVM
> > +	.sva_bind_gpasid	= intel_svm_bind_gpasid,
> > +	.sva_unbind_gpasid	= intel_svm_unbind_gpasid,
> > +#endif
> >  };
> >  
> >  static void quirk_iommu_g4x_gfx(struct pci_dev *dev)
> > diff --git a/drivers/iommu/intel-svm.c b/drivers/iommu/intel-svm.c
> > index 66d98e1..f06a82f 100644
> > --- a/drivers/iommu/intel-svm.c
> > +++ b/drivers/iommu/intel-svm.c
> > @@ -229,6 +229,193 @@ static LIST_HEAD(global_svm_list);
> >  	list_for_each_entry(sdev, &svm->devs, list)	\
> >  	if (dev == sdev->dev)				\
> >  
> > +int intel_svm_bind_gpasid(struct iommu_domain *domain,
> > +			struct device *dev,
> > +			struct gpasid_bind_data *data)
> > +{
> > +	struct intel_iommu *iommu = intel_svm_device_to_iommu(dev);
> > +	struct intel_svm_dev *sdev;
> > +	struct intel_svm *svm = NULL;
> not requested
> > +	struct dmar_domain *ddomain;
> > +	int ret = 0;
> > +
> > +	if (WARN_ON(!iommu) || !data)
> > +		return -EINVAL;
> > +
> > +	if (data->version != IOMMU_GPASID_BIND_VERSION_1 ||
> > +		data->format != IOMMU_PASID_FORMAT_INTEL_VTD)
> > +		return -EINVAL;
> > +
> > +	if (dev_is_pci(dev)) {
> > +		/* VT-d supports devices with full 20 bit PASIDs only */
> > +		if (pci_max_pasids(to_pci_dev(dev)) != PASID_MAX)
> > +			return -EINVAL;
> > +	}
> > +
> > +	/*
> > +	 * We only check host PASID range, we have no knowledge to check
> > +	 * guest PASID range nor do we use the guest PASID.
> guest pasid is set below in svm->gpasid.
> So I am confused, do you handle gpasid or not? If you don't use gpasid
> at the moment, then you may return -EINVAL if data->flags &
> IOMMU_SVA_GPASID_VAL.
> 
> I confess I don't really understand gpasid, as I thought you use
> enlightened PASID allocation to use a system-wide PASID allocator on
> the host, so in which case would guest and host PASIDs differ?

Correct, from within the guest, we only use the enlightened allocator.

Guest PASIDs can be managed via QEMU, which will also associate
guest PASIDs with the appropriate host PASIDs, sort of like how
guest BDFs and host BDFs are managed for page-requests etc.
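Conceptually the VMM keeps a per-VM association, something like the
below (purely illustrative; the real bookkeeping lives in QEMU/VFIO):

	struct gpasid_map_entry {
		u32 gpasid;	/* PASID programmed by the guest */
		u32 hpasid;	/* PASID allocated from the host */
	};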


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH v4 20/22] iommu/vt-d: Add bind guest PASID support
  2019-07-16 17:04     ` Raj, Ashok
@ 2019-07-18  7:47       ` Auger Eric
  0 siblings, 0 replies; 64+ messages in thread
From: Auger Eric @ 2019-07-18  7:47 UTC (permalink / raw)
  To: Raj, Ashok
  Cc: Jacob Pan, iommu, LKML, Joerg Roedel, David Woodhouse,
	Alex Williamson, Jean-Philippe Brucker, Yi Liu, Tian, Kevin,
	Christoph Hellwig, Lu Baolu, Andriy Shevchenko, Yi L

Hi Ashok,

On 7/16/19 7:04 PM, Raj, Ashok wrote:
> Hi Eric
> 
> Jacob is on sabbatical, so I'll give it my best shot :-)
> 
> Yi/Kevin can jump in...
> 
> On Tue, Jul 16, 2019 at 06:45:51PM +0200, Auger Eric wrote:
>> Hi Jacob,
>>
>> On 6/9/19 3:44 PM, Jacob Pan wrote:
>>> When supporting guest SVA with emulated IOMMU, the guest PASID
>>> table is shadowed in VMM. Updates to guest vIOMMU PASID table
>>> will result in PASID cache flush which will be passed down to
>>> the host as bind guest PASID calls.
>>>
>>> For the SL page tables, it will be harvested from device's
>>> default domain (request w/o PASID), or aux domain in case of
>>> mediated device.
>>>
>>>     .-------------.  .---------------------------.
>>>     |   vIOMMU    |  | Guest process CR3, FL only|
>>>     |             |  '---------------------------'
>>>     .----------------/
>>>     | PASID Entry |--- PASID cache flush -
>>>     '-------------'                       |
>>>     |             |                       V
>>>     |             |                CR3 in GPA
>>>     '-------------'
>>> Guest
>>> ------| Shadow |--------------------------|--------
>>>       v        v                          v
>>> Host
>>>     .-------------.  .----------------------.
>>>     |   pIOMMU    |  | Bind FL for GVA-GPA  |
>>>     |             |  '----------------------'
>>>     .----------------/  |
>>>     | PASID Entry |     V (Nested xlate)
>>>     '----------------\.------------------------------.
>>>     |             |   |SL for GPA-HPA, default domain|
>>>     |             |   '------------------------------'
>>>     '-------------'
>>> Where:
>>>  - FL = First level/stage one page tables
>>>  - SL = Second level/stage two page tables
>>>
>>> Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
>>> Signed-off-by: Liu, Yi L <yi.l.liu@linux.intel.com>
>>> ---
>>>  drivers/iommu/intel-iommu.c |   4 +
>>>  drivers/iommu/intel-svm.c   | 187 ++++++++++++++++++++++++++++++++++++++++++++
>>>  include/linux/intel-iommu.h |  13 ++-
>>>  include/linux/intel-svm.h   |  17 ++++
>>>  4 files changed, 219 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
>>> index 7cfa0eb..3b4d712 100644
>>> --- a/drivers/iommu/intel-iommu.c
>>> +++ b/drivers/iommu/intel-iommu.c
>>> @@ -5782,6 +5782,10 @@ const struct iommu_ops intel_iommu_ops = {
>>>  	.dev_enable_feat	= intel_iommu_dev_enable_feat,
>>>  	.dev_disable_feat	= intel_iommu_dev_disable_feat,
>>>  	.pgsize_bitmap		= INTEL_IOMMU_PGSIZES,
>>> +#ifdef CONFIG_INTEL_IOMMU_SVM
>>> +	.sva_bind_gpasid	= intel_svm_bind_gpasid,
>>> +	.sva_unbind_gpasid	= intel_svm_unbind_gpasid,
>>> +#endif
>>>  };
>>>  
>>>  static void quirk_iommu_g4x_gfx(struct pci_dev *dev)
>>> diff --git a/drivers/iommu/intel-svm.c b/drivers/iommu/intel-svm.c
>>> index 66d98e1..f06a82f 100644
>>> --- a/drivers/iommu/intel-svm.c
>>> +++ b/drivers/iommu/intel-svm.c
>>> @@ -229,6 +229,193 @@ static LIST_HEAD(global_svm_list);
>>>  	list_for_each_entry(sdev, &svm->devs, list)	\
>>>  	if (dev == sdev->dev)				\
>>>  
>>> +int intel_svm_bind_gpasid(struct iommu_domain *domain,
>>> +			struct device *dev,
>>> +			struct gpasid_bind_data *data)
>>> +{
>>> +	struct intel_iommu *iommu = intel_svm_device_to_iommu(dev);
>>> +	struct intel_svm_dev *sdev;
>>> +	struct intel_svm *svm = NULL;
>> not requested
>>> +	struct dmar_domain *ddomain;
>>> +	int ret = 0;
>>> +
>>> +	if (WARN_ON(!iommu) || !data)
>>> +		return -EINVAL;
>>> +
>>> +	if (data->version != IOMMU_GPASID_BIND_VERSION_1 ||
>>> +		data->format != IOMMU_PASID_FORMAT_INTEL_VTD)
>>> +		return -EINVAL;
>>> +
>>> +	if (dev_is_pci(dev)) {
>>> +		/* VT-d supports devices with full 20 bit PASIDs only */
>>> +		if (pci_max_pasids(to_pci_dev(dev)) != PASID_MAX)
>>> +			return -EINVAL;
>>> +	}
>>> +
>>> +	/*
>>> +	 * We only check host PASID range, we have no knowledge to check
>>> +	 * guest PASID range nor do we use the guest PASID.
>> guest pasid is set below in svm->gpasid.
>> So I am confused, do you handle gpasid or not? If you don't use gpasid
>> at the moment, then you may return -EINVAL if data->flags &
>> IOMMU_SVA_GPASID_VAL.
>>
>> I confess I don't really understand gpasid, as I thought you use
>> enlightened PASID allocation to use a system-wide PASID allocator on
>> the host, so in which case would guest and host PASIDs differ?
> 
> Correct, from within the guest, we only use the enlightened allocator.
> 
> Guest PASIDs can be managed via QEMU, which will also associate
> guest PASIDs with the appropriate host PASIDs, sort of like how
> guest BDFs and host BDFs are managed for page-requests etc.

OK I will look at the QEMU side then.

Thanks

Eric
> 

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH v4 21/22] iommu/vt-d: Support flushing more translation cache types
  2019-06-09 13:44 ` [PATCH v4 21/22] iommu/vt-d: Support flushing more translation cache types Jacob Pan
@ 2019-07-18  8:35   ` Auger Eric
  2019-08-14 20:17     ` Jacob Pan
  0 siblings, 1 reply; 64+ messages in thread
From: Auger Eric @ 2019-07-18  8:35 UTC (permalink / raw)
  To: Jacob Pan, iommu, LKML, Joerg Roedel, David Woodhouse,
	Alex Williamson, Jean-Philippe Brucker
  Cc: Yi Liu, Tian, Kevin, Raj Ashok, Christoph Hellwig, Lu Baolu,
	Andriy Shevchenko

Hi Jacob,

On 6/9/19 3:44 PM, Jacob Pan wrote:
> When Shared Virtual Memory is exposed to a guest via vIOMMU, scalable
> IOTLB invalidation may be passed down from outside IOMMU subsystems.
> This patch adds invalidation functions that can be used for additional
> translation cache types.
> 
> Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
> ---
>  drivers/iommu/dmar.c        | 50 +++++++++++++++++++++++++++++++++++++++++++++
>  include/linux/intel-iommu.h | 21 +++++++++++++++----
>  2 files changed, 67 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/iommu/dmar.c b/drivers/iommu/dmar.c
> index 6d969a1..0cda6fb 100644
> --- a/drivers/iommu/dmar.c
> +++ b/drivers/iommu/dmar.c
> @@ -1357,6 +1357,21 @@ void qi_flush_iotlb(struct intel_iommu *iommu, u16 did, u64 addr,
>  	qi_submit_sync(&desc, iommu);
>  }
>  
> +/* PASID-based IOTLB Invalidate */
> +void qi_flush_piotlb(struct intel_iommu *iommu, u16 did, u64 addr, u32 pasid,
> +		unsigned int size_order, u64 granu, int ih)
> +{
> +	struct qi_desc desc;
nit: you could also init to {};
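i.e.:

	struct qi_desc desc = {};

which would also let the explicit qw2/qw3 zeroing below go away.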
> +
> +	desc.qw0 = QI_EIOTLB_PASID(pasid) | QI_EIOTLB_DID(did) |
> +		QI_EIOTLB_GRAN(granu) | QI_EIOTLB_TYPE;
> +	desc.qw1 = QI_EIOTLB_ADDR(addr) | QI_EIOTLB_IH(ih) |
> +		QI_EIOTLB_AM(size_order);
> +	desc.qw2 = 0;
> +	desc.qw3 = 0;
> +	qi_submit_sync(&desc, iommu);
> +}
> +
>  void qi_flush_dev_iotlb(struct intel_iommu *iommu, u16 sid, u16 pfsid,
>  			u16 qdep, u64 addr, unsigned mask)
>  {
> @@ -1380,6 +1395,41 @@ void qi_flush_dev_iotlb(struct intel_iommu *iommu, u16 sid, u16 pfsid,
>  	qi_submit_sync(&desc, iommu);
>  }
>  
> +/* PASID-based device IOTLB Invalidate */
> +void qi_flush_dev_piotlb(struct intel_iommu *iommu, u16 sid, u16 pfsid,
> +		u32 pasid,  u16 qdep, u64 addr, unsigned size, u64 granu)
s/size/size_order
> +{
> +	struct qi_desc desc;
> +
> +	desc.qw0 = QI_DEV_EIOTLB_PASID(pasid) | QI_DEV_EIOTLB_SID(sid) |
> +		QI_DEV_EIOTLB_QDEP(qdep) | QI_DEIOTLB_TYPE |
> +		QI_DEV_IOTLB_PFSID(pfsid);
maybe add a comment as a reminder that the MIP hint is set to 0 as of now.
> +	desc.qw1 = QI_DEV_EIOTLB_GLOB(granu);
> +
> +	/* If S bit is 0, we only flush a single page. If S bit is set,
> +	 * The least significant zero bit indicates the size. VT-d spec
> +	 * 6.5.2.6
> +	 */
> +	if (!size)
> +		desc.qw0 |= QI_DEV_EIOTLB_ADDR(addr) & ~QI_DEV_EIOTLB_SIZE;
this is qw1.
> +	else {
> +		unsigned long mask = 1UL << (VTD_PAGE_SHIFT + size);
don't you miss a "- 1 " here?
> +
> +		desc.qw1 |= QI_DEV_EIOTLB_ADDR(addr & ~mask) | QI_DEV_EIOTLB_SIZE;
desc.qw1 |= addr & ~mask | QI_DEV_EIOTLB_SIZE;
ie. I don't think QI_DEV_EIOTLB_ADDR is useful here?


So won't the following lines do the job?

unsigned long mask = (1UL << (VTD_PAGE_SHIFT + size)) - 1;
desc.qw1 |= addr & ~mask;
if (size)
    desc.qw1 |= QI_DEV_EIOTLB_SIZE;
> +	}
> +	qi_submit_sync(&desc, iommu);
> +}
> +
> +void qi_flush_pasid_cache(struct intel_iommu *iommu, u16 did, u64 granu, int pasid)
> +{
> +	struct qi_desc desc;
> +
> +	desc.qw0 = QI_PC_TYPE | QI_PC_DID(did) | QI_PC_GRAN(granu) | QI_PC_PASID(pasid);
nit: reorder the fields according to the spec, easier to check if any is
missing.
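i.e. in ascending bit-position order, something like:

	desc.qw0 = QI_PC_TYPE | QI_PC_GRAN(granu) | QI_PC_DID(did) |
		   QI_PC_PASID(pasid);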
> +	desc.qw1 = 0;
> +	desc.qw2 = 0;
> +	desc.qw3 = 0;
> +	qi_submit_sync(&desc, iommu);
> +}
>  /*
>   * Disable Queued Invalidation interface.
>   */
> diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
> index 94d3a9a..1cdb35b 100644
> --- a/include/linux/intel-iommu.h
> +++ b/include/linux/intel-iommu.h
> @@ -339,7 +339,7 @@ enum {
>  #define QI_IOTLB_GRAN(gran) 	(((u64)gran) >> (DMA_TLB_FLUSH_GRANU_OFFSET-4))
>  #define QI_IOTLB_ADDR(addr)	(((u64)addr) & VTD_PAGE_MASK)
>  #define QI_IOTLB_IH(ih)		(((u64)ih) << 6)
> -#define QI_IOTLB_AM(am)		(((u8)am))
> +#define QI_IOTLB_AM(am)		(((u8)am) & 0x3f)
>  
>  #define QI_CC_FM(fm)		(((u64)fm) << 48)
>  #define QI_CC_SID(sid)		(((u64)sid) << 32)
> @@ -357,17 +357,22 @@ enum {
>  #define QI_PC_DID(did)		(((u64)did) << 16)
>  #define QI_PC_GRAN(gran)	(((u64)gran) << 4)
>  
> -#define QI_PC_ALL_PASIDS	(QI_PC_TYPE | QI_PC_GRAN(0))
> -#define QI_PC_PASID_SEL		(QI_PC_TYPE | QI_PC_GRAN(1))
> +/* PASID cache invalidation granu */
> +#define QI_PC_ALL_PASIDS	0
> +#define QI_PC_PASID_SEL		1
>  
>  #define QI_EIOTLB_ADDR(addr)	((u64)(addr) & VTD_PAGE_MASK)
>  #define QI_EIOTLB_GL(gl)	(((u64)gl) << 7)
>  #define QI_EIOTLB_IH(ih)	(((u64)ih) << 6)
> -#define QI_EIOTLB_AM(am)	(((u64)am))
> +#define QI_EIOTLB_AM(am)	(((u64)am) & 0x3f)
>  #define QI_EIOTLB_PASID(pasid) 	(((u64)pasid) << 32)
>  #define QI_EIOTLB_DID(did)	(((u64)did) << 16)
>  #define QI_EIOTLB_GRAN(gran) 	(((u64)gran) << 4)
>  
> +/* QI Dev-IOTLB inv granu */
> +#define QI_DEV_IOTLB_GRAN_ALL		1
> +#define QI_DEV_IOTLB_GRAN_PASID_SEL	0
> +
>  #define QI_DEV_EIOTLB_ADDR(a)	((u64)(a) & VTD_PAGE_MASK)
>  #define QI_DEV_EIOTLB_SIZE	(((u64)1) << 11)
>  #define QI_DEV_EIOTLB_GLOB(g)	((u64)g)
> @@ -658,8 +663,16 @@ extern void qi_flush_context(struct intel_iommu *iommu, u16 did, u16 sid,
>  			     u8 fm, u64 type);
>  extern void qi_flush_iotlb(struct intel_iommu *iommu, u16 did, u64 addr,
>  			  unsigned int size_order, u64 type);
> +extern void qi_flush_piotlb(struct intel_iommu *iommu, u16 did, u64 addr,
> +			u32 pasid, unsigned int size_order, u64 type, int ih);
>  extern void qi_flush_dev_iotlb(struct intel_iommu *iommu, u16 sid, u16 pfsid,
>  			u16 qdep, u64 addr, unsigned mask);
> +
> +extern void qi_flush_dev_piotlb(struct intel_iommu *iommu, u16 sid, u16 pfsid,
> +			u32 pasid, u16 qdep, u64 addr, unsigned size, u64 granu);
s/size/size_order
> +
> +extern void qi_flush_pasid_cache(struct intel_iommu *iommu, u16 did, u64 granu, int pasid);
> +
>  extern int qi_submit_sync(struct qi_desc *desc, struct intel_iommu *iommu);
>  
>  extern int dmar_ir_support(void);
> 

Thanks

Eric

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH v4 14/22] iommu/vt-d: Add custom allocator for IOASID
  2019-07-16  9:30   ` Auger Eric
@ 2019-08-05 20:02     ` Jacob Pan
  0 siblings, 0 replies; 64+ messages in thread
From: Jacob Pan @ 2019-08-05 20:02 UTC (permalink / raw)
  To: Auger Eric
  Cc: iommu, LKML, Joerg Roedel, David Woodhouse, Alex Williamson,
	Jean-Philippe Brucker, Yi Liu, Tian, Kevin, Raj Ashok,
	Christoph Hellwig, Lu Baolu, Andriy Shevchenko, jacob.jun.pan

On Tue, 16 Jul 2019 11:30:14 +0200
Auger Eric <eric.auger@redhat.com> wrote:

> Hi Jacob,
> 
> On 6/9/19 3:44 PM, Jacob Pan wrote:
> > When VT-d driver runs in the guest, PASID allocation must be
> > performed via virtual command interface. This patch registers a
> > custom IOASID allocator which takes precedence over the default
> > XArray based allocator. The resulting IOASID allocation will always
> > come from the host. This ensures that PASID namespace is system-
> > wide.
> > 
> > Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
> > Signed-off-by: Liu, Yi L <yi.l.liu@intel.com>
> > Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
> > ---
> >  drivers/iommu/Kconfig       |  1 +
> >  drivers/iommu/intel-iommu.c | 60 +++++++++++++++++++++++++++++++++++++++++++++
> >  include/linux/intel-iommu.h |  2 ++
> >  3 files changed, 63 insertions(+)
> > 
> > diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
> > index c40c4b5..5d1bc2a 100644
> > --- a/drivers/iommu/Kconfig
> > +++ b/drivers/iommu/Kconfig
> > @@ -213,6 +213,7 @@ config INTEL_IOMMU_SVM
> >  	bool "Support for Shared Virtual Memory with Intel IOMMU"
> >  	depends on INTEL_IOMMU && X86
> >  	select PCI_PASID
> > +	select IOASID
> >  	select MMU_NOTIFIER
> >  	select IOASID
> >  	help
> > diff --git a/drivers/iommu/intel-iommu.c
> > b/drivers/iommu/intel-iommu.c index 09b8ff0..5b84994 100644
> > --- a/drivers/iommu/intel-iommu.c
> > +++ b/drivers/iommu/intel-iommu.c
> > @@ -1711,6 +1711,8 @@ static void free_dmar_iommu(struct
> > intel_iommu *iommu) if (ecap_prs(iommu->ecap))
> >  			intel_svm_finish_prq(iommu);
> >  	}
> > +	ioasid_unregister_allocator(&iommu->pasid_allocator);
> > +
> >  #endif
> >  }
> >  
> > @@ -4820,6 +4822,46 @@ static int __init
> > platform_optin_force_iommu(void) return 1;
> >  }
> >  
> > +#ifdef CONFIG_INTEL_IOMMU_SVM
> > +static ioasid_t intel_ioasid_alloc(ioasid_t min, ioasid_t max,
> > void *data) +{
> > +	struct intel_iommu *iommu = data;
> > +	ioasid_t ioasid;
> > +
> > +	/*
> > +	 * VT-d virtual command interface always uses the full 20
> > bit
> > +	 * PASID range. Host can partition guest PASID range based
> > on
> > +	 * policies but it is out of guest's control.
> > +	 */
> > +	if (min < PASID_MIN || max > PASID_MAX)
> > +		return -EINVAL;  
> ioasid_alloc() does not handle that error value, use INVALID_IOASID?
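> e.g. something like (just a sketch):
>
> 	if (min < PASID_MIN || max > PASID_MAX)
> 		return INVALID_IOASID;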
> > +
> > +	if (vcmd_alloc_pasid(iommu, &ioasid))
> > +		return INVALID_IOASID;
> > +
> > +	return ioasid;
> > +}
> > +
> > +static void intel_ioasid_free(ioasid_t ioasid, void *data)
> > +{
> > +	struct iommu_pasid_alloc_info *svm;
> > +	struct intel_iommu *iommu = data;
> > +
> > +	if (!iommu)
> > +		return;
> > +	/*
> > +	 * Sanity check the ioasid owner is done at upper layer,
> > e.g. VFIO
> > +	 * We can only free the PASID when all the devices are
> > unbond.
> > +	 */
> > +	svm = ioasid_find(NULL, ioasid, NULL);
> > +	if (!svm) {
> > +		pr_warn("Freeing unbond IOASID %d\n", ioasid);
> > +		return;
> > +	}
> > +	vcmd_free_pasid(iommu, ioasid);
> > +}
> > +#endif
> > +
> >  int __init intel_iommu_init(void)
> >  {
> >  	int ret = -ENODEV;
> > @@ -4924,6 +4966,24 @@ int __init intel_iommu_init(void)
> >  				       "%s", iommu->name);
> >  		iommu_device_set_ops(&iommu->iommu,
> > &intel_iommu_ops); iommu_device_register(&iommu->iommu);
> > +#ifdef CONFIG_INTEL_IOMMU_SVM
> > +		if (cap_caching_mode(iommu->cap) &&
> > sm_supported(iommu)) {
> > +			/*
> > +			 * Register a custom ASID allocator if we
> > are running
> > +			 * in a guest, the purpose is to have a
> > system wide PASID
> > +			 * namespace among all PASID users.
> > +			 * There can be multiple vIOMMUs in each
> > guest but only
> > +			 * one allocator is active. All vIOMMU
> > allocators will
> > +			 * eventually be calling the same host
> > allocator.
> > +			 */
> > +			iommu->pasid_allocator.alloc =
> > intel_ioasid_alloc;
> > +			iommu->pasid_allocator.free =
> > intel_ioasid_free;
> > +			iommu->pasid_allocator.pdata = (void
> > *)iommu;
> > +			ret =
> > ioasid_register_allocator(&iommu->pasid_allocator);
> > +			if (ret)
> > +				pr_warn("Custom PASID allocator
> > registeration failed\n");  
> what if it fails, don't you want a tear down path?
> 

Good point, we need to disable PASID usage, i.e. disable scalable mode
if there is no virtual command based PASID allocator in the guest.
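Roughly something like the below (just a sketch, assuming clearing
intel_iommu_sm is enough to fall back):

	ret = ioasid_register_allocator(&iommu->pasid_allocator);
	if (ret) {
		pr_warn("Custom PASID allocator failed, scalable mode disabled\n");
		intel_iommu_sm = 0;
	}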

Sorry for the late reply.

> Thanks
> 
> Eric
>  [...]  

[Jacob Pan]

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH v4 11/22] iommu: Introduce guest PASID bind function
  2019-07-16 16:44   ` Auger Eric
@ 2019-08-05 21:02     ` Jacob Pan
  2019-08-05 23:13     ` Jacob Pan
  1 sibling, 0 replies; 64+ messages in thread
From: Jacob Pan @ 2019-08-05 21:02 UTC (permalink / raw)
  To: Auger Eric
  Cc: iommu, LKML, Joerg Roedel, David Woodhouse, Alex Williamson,
	Jean-Philippe Brucker, Yi Liu, Tian, Kevin, Raj Ashok,
	Christoph Hellwig, Lu Baolu, Andriy Shevchenko, jacob.jun.pan

On Tue, 16 Jul 2019 18:44:56 +0200
Auger Eric <eric.auger@redhat.com> wrote:

> Hi Jacob,
> 
> On 6/9/19 3:44 PM, Jacob Pan wrote:
> > Guest shared virtual address (SVA) may require host to shadow guest
> > PASID tables. Guest PASID can also be allocated from the host via
> > enlightened interfaces. In this case, guest needs to bind the guest
> > mm, i.e. cr3 in guest physical address to the actual PASID table in
> > the host IOMMU. Nesting will be turned on such that guest virtual
> > address can go through a two level translation:
> > - 1st level translates GVA to GPA
> > - 2nd level translates GPA to HPA
> > This patch introduces APIs to bind guest PASID data to the assigned
> > device entry in the physical IOMMU. See the diagram below for usage
> > explaination.
> > 
> >     .-------------.  .---------------------------.
> >     |   vIOMMU    |  | Guest process mm, FL only |
> >     |             |  '---------------------------'
> >     .----------------/
> >     | PASID Entry |--- PASID cache flush -
> >     '-------------'                       |
> >     |             |                       V
> >     |             |                      GP
> >     '-------------'
> > Guest
> > ------| Shadow |----------------------- GP->HP* ---------
> >       v        v                          |
> > Host                                      v
> >     .-------------.  .----------------------.
> >     |   pIOMMU    |  | Bind FL for GVA-GPA  |
> >     |             |  '----------------------'
> >     .----------------/  |
> >     | PASID Entry |     V (Nested xlate)
> >     '----------------\.---------------------.
> >     |             |   |Set SL to GPA-HPA    |
> >     |             |   '---------------------'
> >     '-------------'
> > 
> > Where:
> >  - FL = First level/stage one page tables
> >  - SL = Second level/stage two page tables
> >  - GP = Guest PASID
> >  - HP = Host PASID
> > * Conversion needed if non-identity GP-HP mapping option is chosen.
> > 
> > Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
> > Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
> > ---
> >  drivers/iommu/iommu.c      | 20 ++++++++++++++++
> >  include/linux/iommu.h      | 21 +++++++++++++++++
> >  include/uapi/linux/iommu.h | 58 ++++++++++++++++++++++++++++++++++++++++++++++
> >  3 files changed, 99 insertions(+)
> > 
> > diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> > index 1758b57..d0416f60 100644
> > --- a/drivers/iommu/iommu.c
> > +++ b/drivers/iommu/iommu.c
> > @@ -1648,6 +1648,26 @@ int iommu_cache_invalidate(struct
> > iommu_domain *domain, struct device *dev, }
> >  EXPORT_SYMBOL_GPL(iommu_cache_invalidate);
> >  
> > +int iommu_sva_bind_gpasid(struct iommu_domain *domain,
> > +			struct device *dev, struct
> > gpasid_bind_data *data) +{
> > +	if (unlikely(!domain->ops->sva_bind_gpasid))
> > +		return -ENODEV;
> > +
> > +	return domain->ops->sva_bind_gpasid(domain, dev, data);
> > +}
> > +EXPORT_SYMBOL_GPL(iommu_sva_bind_gpasid);
> > +
> > +int iommu_sva_unbind_gpasid(struct iommu_domain *domain, struct
> > device *dev,
> > +			ioasid_t pasid)
> > +{
> > +	if (unlikely(!domain->ops->sva_unbind_gpasid))
> > +		return -ENODEV;
> > +
> > +	return domain->ops->sva_unbind_gpasid(dev, pasid);
> > +}
> > +EXPORT_SYMBOL_GPL(iommu_sva_unbind_gpasid);
> > +
> >  static void __iommu_detach_device(struct iommu_domain *domain,
> >  				  struct device *dev)
> >  {
> > diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> > index 8d766a8..560c8c8 100644
> > --- a/include/linux/iommu.h
> > +++ b/include/linux/iommu.h
> > @@ -25,6 +25,7 @@
> >  #include <linux/errno.h>
> >  #include <linux/err.h>
> >  #include <linux/of.h>
> > +#include <linux/ioasid.h>
> >  #include <uapi/linux/iommu.h>
> >  
> >  #define IOMMU_READ	(1 << 0)
> > @@ -267,6 +268,8 @@ struct page_response_msg {
> >   * @detach_pasid_table: detach the pasid table
> >   * @cache_invalidate: invalidate translation caches
> >   * @pgsize_bitmap: bitmap of all possible supported page sizes
> > + * @sva_bind_gpasid: bind guest pasid and mm
> > + * @sva_unbind_gpasid: unbind guest pasid and mm  
> comment vs field ordering as pointed out by Jonathan on other patches
> >   */
> >  struct iommu_ops {
> >  	bool (*capable)(enum iommu_cap);
> > @@ -332,6 +335,10 @@ struct iommu_ops {
> >  	int (*page_response)(struct device *dev, struct
> > page_response_msg *msg); int (*cache_invalidate)(struct
> > iommu_domain *domain, struct device *dev, struct
> > iommu_cache_invalidate_info *inv_info);
> > +	int (*sva_bind_gpasid)(struct iommu_domain *domain,
> > +			struct device *dev, struct
> > gpasid_bind_data *data); +
> > +	int (*sva_unbind_gpasid)(struct device *dev, int pasid);
> >  
> >  	unsigned long pgsize_bitmap;
> >  };
> > @@ -447,6 +454,10 @@ extern void iommu_detach_pasid_table(struct
> > iommu_domain *domain); extern int iommu_cache_invalidate(struct
> > iommu_domain *domain, struct device *dev,
> >  				  struct
> > iommu_cache_invalidate_info *inv_info); +extern int
> > iommu_sva_bind_gpasid(struct iommu_domain *domain,
> > +		struct device *dev, struct gpasid_bind_data *data);
> > +extern int iommu_sva_unbind_gpasid(struct iommu_domain *domain,
> > +				struct device *dev, ioasid_t
> > pasid); extern struct iommu_domain *iommu_get_domain_for_dev(struct
> > device *dev); extern struct iommu_domain
> > *iommu_get_dma_domain(struct device *dev); extern int
> > iommu_map(struct iommu_domain *domain, unsigned long iova, @@
> > -998,6 +1009,16 @@ iommu_cache_invalidate(struct iommu_domain
> > *domain, { return -ENODEV;
> >  }
> > +static inline int iommu_sva_bind_gpasid(struct iommu_domain
> > *domain,
> > +				struct device *dev, struct
> > gpasid_bind_data *data) +{
> > +	return -ENODEV;
> > +}
> > +
> > +static inline int sva_unbind_gpasid(struct device *dev, int pasid)
> > +{
> > +	return -ENODEV;
> > +}
> >  
> >  #endif /* CONFIG_IOMMU_API */
> >  
> > diff --git a/include/uapi/linux/iommu.h b/include/uapi/linux/iommu.h
> > index ca4b753..a9cdc63 100644
> > --- a/include/uapi/linux/iommu.h
> > +++ b/include/uapi/linux/iommu.h
> > @@ -277,4 +277,62 @@ struct iommu_cache_invalidate_info {
> >  	};
> >  };
> >  
> > +/**
> > + * struct gpasid_bind_data_vtd - Intel VT-d specific data on
> > device and guest
> > + * SVA binding.
> > + *
> > + * @flags:	VT-d PASID table entry attributes
> > + * @pat:	Page attribute table data to compute effective
> > memory type
> > + * @emt:	Extended memory type
> > + *
> > + * Only guest vIOMMU selectable and effective options are passed
> > down to
> > + * the host IOMMU.
> > + */
> > +struct gpasid_bind_data_vtd {
> > +#define IOMMU_SVA_VTD_GPASID_SRE	(1 << 0) /* supervisor
> > request */ +#define IOMMU_SVA_VTD_GPASID_EAFE	(1 << 1) /*
> > extended access enable */ +#define IOMMU_SVA_VTD_GPASID_PCD
> > (1 << 2) /* page-level cache disable */ +#define
> > IOMMU_SVA_VTD_GPASID_PWT	(1 << 3) /* page-level write
> > through */ +#define IOMMU_SVA_VTD_GPASID_EMTE	(1 << 4) /*
> > extended mem type enable */ +#define
> > IOMMU_SVA_VTD_GPASID_CD		(1 << 5) /* PASID-level
> > cache disable */
> > +	__u64 flags;
> > +	__u32 pat;
> > +	__u32 emt;
> > +};
> > +
> > +/**
> > + * struct gpasid_bind_data - Information about device and guest
> > PASID binding
> > + * @version:	Version of this data structure
> > + * @format:	PASID table entry format
> > + * @flags:	Additional information on guest bind request
> > + * @gpgd:	Guest page directory base of the guest mm to bind
> > + * @hpasid:	Process address space ID used for the guest mm
> > in host IOMMU
> > + * @gpasid:	Process address space ID used for the guest mm
> > in guest IOMMU
> > + * @addr_width:	Guest virtual address width> + *
> > @vtd:	Intel VT-d specific data
> > + *
> > + * Guest to host PASID mapping can be an identity or non-identity,
> > where guest
> > + * has its own PASID space. For non-identify mapping, guest to
> > host PASID lookup
> > + * is needed when VM programs guest PASID into an assigned device.
> > VMM may
> > + * trap such PASID programming then request host IOMMU driver to
> > convert guest
> > + * PASID to host PASID based on this bind data.  
> Sorry I don't get when you have separate PASID space as I understand
> you eventually allocate the guest PASID with the host allocator
> (through paravirt)
For the current version it is true that we eventually allocate the
PASID from the host. But in the future we also want to support
non-identical host-guest PASID mappings, so we want to leave space in
the API to support such a case.
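For example, with a non-identity mapping the caller would fill in both
PASIDs (illustrative only, where "data" is the struct gpasid_bind_data
passed to bind):

	data.hpasid = hpasid;		/* allocated from the host pool */
	data.flags |= IOMMU_SVA_GPASID_VAL;
	data.gpasid = gpasid;		/* whatever the guest programmed */

whereas with an identity scheme only hpasid would be needed.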

> > + */
> > +struct gpasid_bind_data {  
> other structs in iommu.h are prefixed with iommu_?
> > +#define IOMMU_GPASID_BIND_VERSION_1	1
> > +	__u32 version;
> > +#define IOMMU_PASID_FORMAT_INTEL_VTD	1
> > +	__u32 format;
> > +#define IOMMU_SVA_GPASID_VAL	(1 << 0) /* guest PASID valid
> > */
> > +	__u64 flags;
> > +	__u64 gpgd;
> > +	__u64 hpasid;
> > +	__u64 gpasid;
> > +	__u32 addr_width;
> > +	__u8  padding[4];
> > +	/* Vendor specific data */
> > +	union {
> > +		struct gpasid_bind_data_vtd vtd;
> > +	};  
> vtd is not used in the series if I am not wrong
You are right. We had some internal discussion on which PASID entry
fields are actively used in the native case, and whether they need to
be passed down or always set by the driver. It will be used in v5.
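Roughly, the idea for v5 is that the PASID entry setup path consumes it,
along these lines (hypothetical sketch; pasid_set_sre()/pasid_set_eafe()
stand in for the PASID entry helpers):

	if (data->vtd.flags & IOMMU_SVA_VTD_GPASID_SRE)
		pasid_set_sre(pte);
	if (data->vtd.flags & IOMMU_SVA_VTD_GPASID_EAFE)
		pasid_set_eafe(pte);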

Thanks,

> > +};
> > +
> >  #endif /* _UAPI_IOMMU_H */
> >   
> Thanks
> 
> Eric

[Jacob Pan]

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH v4 11/22] iommu: Introduce guest PASID bind function
  2019-07-16 16:44   ` Auger Eric
  2019-08-05 21:02     ` Jacob Pan
@ 2019-08-05 23:13     ` Jacob Pan
  1 sibling, 0 replies; 64+ messages in thread
From: Jacob Pan @ 2019-08-05 23:13 UTC (permalink / raw)
  To: Auger Eric
  Cc: iommu, LKML, Joerg Roedel, David Woodhouse, Alex Williamson,
	Jean-Philippe Brucker, Yi Liu, Tian, Kevin, Raj Ashok,
	Christoph Hellwig, Lu Baolu, Andriy Shevchenko, jacob.jun.pan

On Tue, 16 Jul 2019 18:44:56 +0200
Auger Eric <eric.auger@redhat.com> wrote:

> > +struct gpasid_bind_data {  
> other structs in iommu.h are prefixed with iommu_?
Good point, will add iommu_ prefix.

Thanks,

Jacob

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH v4 13/22] iommu/vt-d: Enlightened PASID allocation
  2019-07-16  9:29   ` Auger Eric
@ 2019-08-13 16:57     ` Jacob Pan
  0 siblings, 0 replies; 64+ messages in thread
From: Jacob Pan @ 2019-08-13 16:57 UTC (permalink / raw)
  To: Auger Eric
  Cc: iommu, LKML, Joerg Roedel, David Woodhouse, Alex Williamson,
	Jean-Philippe Brucker, Yi Liu, Tian, Kevin, Raj Ashok,
	Christoph Hellwig, Lu Baolu, Andriy Shevchenko, jacob.jun.pan

Hi Eric,

Apologies for the delayed response below,

On Tue, 16 Jul 2019 11:29:30 +0200
Auger Eric <eric.auger@redhat.com> wrote:

> Hi Jacob,
> On 6/9/19 3:44 PM, Jacob Pan wrote:
> > From: Lu Baolu <baolu.lu@linux.intel.com>
> > 
> > If Intel IOMMU runs in caching mode, a.k.a. virtual IOMMU, the
> > IOMMU driver should rely on the emulation software to allocate
> > and free PASID IDs. The Intel vt-d spec revision 3.0 defines a
> > register set to support this. This includes a capability register,
> > a virtual command register and a virtual response register. Refer
> > to section 10.4.42, 10.4.43, 10.4.44 for more information.
> > 
> > This patch adds the enlightened PASID allocation/free interfaces
> > via the virtual command register.
> >
> > Cc: Ashok Raj <ashok.raj@intel.com>
> > Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
> > Cc: Kevin Tian <kevin.tian@intel.com>
> > Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
> > Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
> > Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
> > ---
> >  drivers/iommu/intel-pasid.c | 76 +++++++++++++++++++++++++++++++++++++++++++++
> >  drivers/iommu/intel-pasid.h | 13 +++++++-
> >  include/linux/intel-iommu.h |  2 ++
> >  3 files changed, 90 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/iommu/intel-pasid.c
> > b/drivers/iommu/intel-pasid.c index 2fefeaf..69fddd3 100644
> > --- a/drivers/iommu/intel-pasid.c
> > +++ b/drivers/iommu/intel-pasid.c
> > @@ -63,6 +63,82 @@ void *intel_pasid_lookup_id(int pasid)
> >  	return p;
> >  }
> >  
> > +int vcmd_alloc_pasid(struct intel_iommu *iommu, unsigned int
> > *pasid) +{
> > +	u64 res;
> > +	u64 cap;
> > +	u8 status_code;
> > +	unsigned long flags;
> > +	int ret = 0;
> > +
> > +	if (!ecap_vcs(iommu->ecap)) {
> > +		pr_warn("IOMMU: %s: Hardware doesn't support
> > virtual command\n",
> > +			iommu->name);
> > +		return -ENODEV;
> > +	}
> > +
> > +	cap = dmar_readq(iommu->reg + DMAR_VCCAP_REG);
> > +	if (!(cap & DMA_VCS_PAS)) {
> > +		pr_warn("IOMMU: %s: Emulation software doesn't
> > support PASID allocation\n",
> > +			iommu->name);
> > +		return -ENODEV;
> > +	}
> > +
> > +	raw_spin_lock_irqsave(&iommu->register_lock, flags);
> > +	dmar_writeq(iommu->reg + DMAR_VCMD_REG, VCMD_CMD_ALLOC);
> > +	IOMMU_WAIT_OP(iommu, DMAR_VCRSP_REG, dmar_readq,
> > +		      !(res & VCMD_VRSP_IP), res);
> > +	raw_spin_unlock_irqrestore(&iommu->register_lock, flags);
> > +
> > +	status_code = VCMD_VRSP_SC(res);
> > +	switch (status_code) {
> > +	case VCMD_VRSP_SC_SUCCESS:
> > +		*pasid = VCMD_VRSP_RESULT(res);
> > +		break;
> > +	case VCMD_VRSP_SC_NO_PASID_AVAIL:
> > +		pr_info("IOMMU: %s: No PASID available\n",
> > iommu->name);
> > +		ret = -ENOMEM;
> > +		break;
> > +	default:
> > +		ret = -ENODEV;
> > +		pr_warn("IOMMU: %s: Unkonwn error code %d\n",  
> unknown
> s/unknown/unexpected
sounds good.
> > +			iommu->name, status_code);
> > +	}
> > +
> > +	return ret;
> > +}
> > +
> > +void vcmd_free_pasid(struct intel_iommu *iommu, unsigned int pasid)
> > +{
> > +	u64 res;
> > +	u8 status_code;
> > +	unsigned long flags;
> > +
> > +	if (!ecap_vcs(iommu->ecap)) {
> > +		pr_warn("IOMMU: %s: Hardware doesn't support
> > virtual command\n",
> > +			iommu->name);
> > +		return;
> > +	}  
> Logically shouldn't you also check DMAR_VCCAP_REG as well?
Good point, we may have more than just PASID allocation virtual
commands.
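i.e. vcmd_free_pasid() should mirror the alloc path, something like:

	cap = dmar_readq(iommu->reg + DMAR_VCCAP_REG);
	if (!(cap & DMA_VCS_PAS)) {
		pr_warn("IOMMU: %s: Emulation software doesn't support PASID allocation\n",
			iommu->name);
		return;
	}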
> > +
> > +	raw_spin_lock_irqsave(&iommu->register_lock, flags);
> > +	dmar_writeq(iommu->reg + DMAR_VCMD_REG, (pasid << 8) |
> > VCMD_CMD_FREE);
> > +	IOMMU_WAIT_OP(iommu, DMAR_VCRSP_REG, dmar_readq,
> > +		      !(res & VCMD_VRSP_IP), res);
> > +	raw_spin_unlock_irqrestore(&iommu->register_lock, flags);
> > +
> > +	status_code = VCMD_VRSP_SC(res);
> > +	switch (status_code) {
> > +	case VCMD_VRSP_SC_SUCCESS:
> > +		break;
> > +	case VCMD_VRSP_SC_INVALID_PASID:
> > +		pr_info("IOMMU: %s: Invalid PASID\n", iommu->name);
> > +		break;
> > +	default:
> > +		pr_warn("IOMMU: %s: Unkonwn error code %d\n",
> > +			iommu->name, status_code);  
> s/Unkonwn/Unexpected
will fix. 
> > +	}
> > +}
> > +
> >  /*
> >   * Per device pasid table management:
> >   */
> > diff --git a/drivers/iommu/intel-pasid.h
> > b/drivers/iommu/intel-pasid.h index 23537b3..4b26ab5 100644
> > --- a/drivers/iommu/intel-pasid.h
> > +++ b/drivers/iommu/intel-pasid.h
> > @@ -19,6 +19,16 @@
> >  #define PASID_PDE_SHIFT			6
> >  #define MAX_NR_PASID_BITS		20
> >  
> > +/* Virtual command interface for enlightened pasid management. */
> > +#define VCMD_CMD_ALLOC			0x1
> > +#define VCMD_CMD_FREE			0x2
> > +#define VCMD_VRSP_IP			0x1
> > +#define VCMD_VRSP_SC(e)			(((e) >> 1) & 0x3)
> > +#define VCMD_VRSP_SC_SUCCESS		0
> > +#define VCMD_VRSP_SC_NO_PASID_AVAIL	1
> > +#define VCMD_VRSP_SC_INVALID_PASID	1
> > +#define VCMD_VRSP_RESULT(e)		(((e) >> 8) & 0xfffff)
> > +
> >  /*
> >   * Domain ID reserved for pasid entries programmed for first-level
> >   * only and pass-through transfer modes.
> > @@ -69,5 +79,6 @@ int intel_pasid_setup_pass_through(struct
> > intel_iommu *iommu, struct device *dev, int pasid);
> >  void intel_pasid_tear_down_entry(struct intel_iommu *iommu,
> >  				 struct device *dev, int pasid);
> > -
> > +int vcmd_alloc_pasid(struct intel_iommu *iommu, unsigned int
> > *pasid); +void vcmd_free_pasid(struct intel_iommu *iommu, unsigned
> > int pasid); #endif /* __INTEL_PASID_H */
> > diff --git a/include/linux/intel-iommu.h
> > b/include/linux/intel-iommu.h index 6925a18..bff907b 100644
> > --- a/include/linux/intel-iommu.h
> > +++ b/include/linux/intel-iommu.h
> > @@ -173,6 +173,7 @@
> >  #define ecap_smpwc(e)		(((e) >> 48) & 0x1)
> >  #define ecap_flts(e)		(((e) >> 47) & 0x1)
> >  #define ecap_slts(e)		(((e) >> 46) & 0x1)
> > +#define ecap_vcs(e)		(((e) >> 44) & 0x1)
> >  #define ecap_smts(e)		(((e) >> 43) & 0x1)
> >  #define ecap_dit(e)		((e >> 41) & 0x1)
> >  #define ecap_pasid(e)		((e >> 40) & 0x1)
> > @@ -289,6 +290,7 @@
> >  
> >  /* PRS_REG */
> >  #define DMA_PRS_PPR	((u32)1)
> > +#define DMA_VCS_PAS	((u64)1)
> >  
> >  #define IOMMU_WAIT_OP(iommu, offset, op, cond, sts)		\
> >  do {								\
> Otherwise
> 
> Reviewed-by: Eric Auger <eric.auger@redhat.com>
> 
> Thanks
> 
> Eric
> 

[Jacob Pan]

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH v4 20/22] iommu/vt-d: Add bind guest PASID support
  2019-07-05  2:21       ` Lu Baolu
@ 2019-08-14 17:20         ` Jacob Pan
  0 siblings, 0 replies; 64+ messages in thread
From: Jacob Pan @ 2019-08-14 17:20 UTC (permalink / raw)
  To: Lu Baolu
  Cc: iommu, LKML, Joerg Roedel, David Woodhouse, Eric Auger,
	Alex Williamson, Jean-Philippe Brucker, Yi Liu, Tian, Kevin,
	Raj Ashok, Christoph Hellwig, Andriy Shevchenko, jacob.jun.pan

On Fri, 5 Jul 2019 10:21:27 +0800
Lu Baolu <baolu.lu@linux.intel.com> wrote:

> Hi Jacob,
> 
> On 6/28/19 4:22 AM, Jacob Pan wrote:
> >>> +		}
> >>> +		refcount_set(&svm->refs, 0);
> >>> +		ioasid_set_data(data->hpasid, svm);
> >>> +		INIT_LIST_HEAD_RCU(&svm->devs);
> >>> +		INIT_LIST_HEAD(&svm->list);
> >>> +
> >>> +		mmput(svm->mm);
> >>> +	}
> >>> +	sdev = kzalloc(sizeof(*sdev), GFP_KERNEL);
> >>> +	if (!sdev) {
> >>> +		ret = -ENOMEM;
> >>> +		goto out;  
> >> I think you need to clean up svm if its device list is empty here,
> >> as you said above:
> >>  
> > No, we come here only if the device list is not empty and the new
> > device to bind is different from any existing device in the list.
> > If we cannot allocate memory for the new device, we should not free
> > the existing SVM, right?
> >   
> 
> I'm sorry, but the code doesn't show this. We come here even when an
> svm data structure was newly created with an empty device list. I
> posted the code below to ensure that we are reading the same piece of
> code.
> 
Sorry for the delay. You are right, I need to clean up svm if device
list is empty.

Thanks!
>          mutex_lock(&pasid_mutex);
>          svm = ioasid_find(NULL, data->hpasid, NULL);
>          if (IS_ERR(svm)) {
>                  ret = PTR_ERR(svm);
>                  goto out;
>          }
>          if (svm) {
>                  /*
>                   * If we found svm for the PASID, there must be at
>                   * least one device bond, otherwise svm should be
> freed. */
>                  BUG_ON(list_empty(&svm->devs));
> 
>                  for_each_svm_dev() {
>                          /* In case of multiple sub-devices of the
> same pdev assigned, we should
>                           * allow multiple bind calls with the same 
> PASID and pdev.
>                           */
>                          sdev->users++;
>                          goto out;
>                  }
>          } else {
>                  /* We come here when PASID has never been bond to a 
> device. */
>                  svm = kzalloc(sizeof(*svm), GFP_KERNEL);
>                  if (!svm) {
>                          ret = -ENOMEM;
>                          goto out;
>                  }
>                  /* REVISIT: upper layer/VFIO can track host process 
> that bind the PASID.
>                   * ioasid_set = mm might be sufficient for vfio to 
> check pasid VMM
>                   * ownership.
>                   */
>                  svm->mm = get_task_mm(current);
>                  svm->pasid = data->hpasid;
>                  if (data->flags & IOMMU_SVA_GPASID_VAL) {
>                          svm->gpasid = data->gpasid;
>                          svm->flags &= SVM_FLAG_GUEST_PASID;
>                  }
>                  refcount_set(&svm->refs, 0);
>                  ioasid_set_data(data->hpasid, svm);
>                  INIT_LIST_HEAD_RCU(&svm->devs);
>                  INIT_LIST_HEAD(&svm->list);
> 
>                  mmput(svm->mm);
>          }
>          sdev = kzalloc(sizeof(*sdev), GFP_KERNEL);
>          if (!sdev) {
>                  ret = -ENOMEM;
>                  goto out;
>          }
>          sdev->dev = dev;
>          sdev->users = 1;
> 
> Best regards,
> Baolu

[Jacob Pan]

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH v4 21/22] iommu/vt-d: Support flushing more translation cache types
  2019-07-18  8:35   ` Auger Eric
@ 2019-08-14 20:17     ` Jacob Pan
  0 siblings, 0 replies; 64+ messages in thread
From: Jacob Pan @ 2019-08-14 20:17 UTC (permalink / raw)
  To: Auger Eric
  Cc: iommu, LKML, Joerg Roedel, David Woodhouse, Alex Williamson,
	Jean-Philippe Brucker, Yi Liu, Tian, Kevin, Raj Ashok,
	Christoph Hellwig, Lu Baolu, Andriy Shevchenko, jacob.jun.pan

On Thu, 18 Jul 2019 10:35:37 +0200
Auger Eric <eric.auger@redhat.com> wrote:

> Hi Jacob,
> 
> On 6/9/19 3:44 PM, Jacob Pan wrote:
> > When Shared Virtual Memory is exposed to a guest via vIOMMU,
> > scalable IOTLB invalidation may be passed down from outside IOMMU
> > subsystems. This patch adds invalidation functions that can be used
> > for additional translation cache types.
> > 
> > Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
> > ---
> >  drivers/iommu/dmar.c        | 50 +++++++++++++++++++++++++++++++++++++++++++++
> >  include/linux/intel-iommu.h | 21 +++++++++++++++----
> >  2 files changed, 67 insertions(+), 4 deletions(-)
> > 
> > diff --git a/drivers/iommu/dmar.c b/drivers/iommu/dmar.c
> > index 6d969a1..0cda6fb 100644
> > --- a/drivers/iommu/dmar.c
> > +++ b/drivers/iommu/dmar.c
> > @@ -1357,6 +1357,21 @@ void qi_flush_iotlb(struct intel_iommu
> > *iommu, u16 did, u64 addr, qi_submit_sync(&desc, iommu);
> >  }
> >  
> > +/* PASID-based IOTLB Invalidate */
> > +void qi_flush_piotlb(struct intel_iommu *iommu, u16 did, u64 addr,
> > u32 pasid,
> > +		unsigned int size_order, u64 granu, int ih)
> > +{
> > +	struct qi_desc desc;  
> nit: you could also init to {};
> > +
> > +	desc.qw0 = QI_EIOTLB_PASID(pasid) | QI_EIOTLB_DID(did) |
> > +		QI_EIOTLB_GRAN(granu) | QI_EIOTLB_TYPE;
> > +	desc.qw1 = QI_EIOTLB_ADDR(addr) | QI_EIOTLB_IH(ih) |
> > +		QI_EIOTLB_AM(size_order);
> > +	desc.qw2 = 0;
> > +	desc.qw3 = 0;
> > +	qi_submit_sync(&desc, iommu);
> > +}
> > +
> >  void qi_flush_dev_iotlb(struct intel_iommu *iommu, u16 sid, u16
> > pfsid, u16 qdep, u64 addr, unsigned mask)
> >  {
> > @@ -1380,6 +1395,41 @@ void qi_flush_dev_iotlb(struct intel_iommu
> > *iommu, u16 sid, u16 pfsid, qi_submit_sync(&desc, iommu);
> >  }
> >  
> > +/* PASID-based device IOTLB Invalidate */
> > +void qi_flush_dev_piotlb(struct intel_iommu *iommu, u16 sid, u16
> > pfsid,
> > +		u32 pasid,  u16 qdep, u64 addr, unsigned size, u64
> > granu)  
> s/size/size_order
> > +{
> > +	struct qi_desc desc;
> > +
> > +	desc.qw0 = QI_DEV_EIOTLB_PASID(pasid) |
> > QI_DEV_EIOTLB_SID(sid) |
> > +		QI_DEV_EIOTLB_QDEP(qdep) | QI_DEIOTLB_TYPE |
> > +		QI_DEV_IOTLB_PFSID(pfsid);  
> maybe add a comment as a reminder that the MIP hint is set to 0 as of now.
> > +	desc.qw1 = QI_DEV_EIOTLB_GLOB(granu);
> > +
> > +	/* If S bit is 0, we only flush a single page. If S bit is
> > set,
> > +	 * The least significant zero bit indicates the size. VT-d
> > spec
> > +	 * 6.5.2.6
> > +	 */
> > +	if (!size)
> > +		desc.qw0 |= QI_DEV_EIOTLB_ADDR(addr) &
> > ~QI_DEV_EIOTLB_SIZE;  
> this is qw1.
my mistake.
> > +	else {
> > +		unsigned long mask = 1UL << (VTD_PAGE_SHIFT +
> > size);  
> don't you miss a "- 1 " here?
you are right
> > +
> > +		desc.qw1 |= QI_DEV_EIOTLB_ADDR(addr & ~mask) |
> > QI_DEV_EIOTLB_SIZE;  
> desc.qw1 |= addr & ~mask | QI_DEV_EIOTLB_SIZE;
> ie. I don't think QI_DEV_EIOTLB_ADDR is useful here?
> 
> 
> So won't the following lines do the job?
> 
> unsigned long mask = (1UL << (VTD_PAGE_SHIFT + size)) - 1;
> desc.qw1 |= addr & ~mask;
> if (size)
>     desc.qw1 |= QI_DEV_EIOTLB_SIZE;
That would work too, and it's simpler. Thanks for the suggestion, will
change.
> > +	}
> > +	qi_submit_sync(&desc, iommu);
> > +}
> > +
> > +void qi_flush_pasid_cache(struct intel_iommu *iommu, u16 did, u64
> > granu, int pasid) +{
> > +	struct qi_desc desc;
> > +
> > +	desc.qw0 = QI_PC_TYPE | QI_PC_DID(did) | QI_PC_GRAN(granu)
> > | QI_PC_PASID(pasid);  
> nit: reorder the fields according to the spec, easier to check if any
> is missing.
sounds good.

> > +	desc.qw1 = 0;
> > +	desc.qw2 = 0;
> > +	desc.qw3 = 0;
> > +	qi_submit_sync(&desc, iommu);
> > +}
> >  /*
> >   * Disable Queued Invalidation interface.
> >   */
> > diff --git a/include/linux/intel-iommu.h
> > b/include/linux/intel-iommu.h index 94d3a9a..1cdb35b 100644
> > --- a/include/linux/intel-iommu.h
> > +++ b/include/linux/intel-iommu.h
> > @@ -339,7 +339,7 @@ enum {
> >  #define QI_IOTLB_GRAN(gran) 	(((u64)gran) >>
> > (DMA_TLB_FLUSH_GRANU_OFFSET-4)) #define QI_IOTLB_ADDR(addr)
> > (((u64)addr) & VTD_PAGE_MASK) #define
> > QI_IOTLB_IH(ih)		(((u64)ih) << 6) -#define
> > QI_IOTLB_AM(am)		(((u8)am)) +#define
> > QI_IOTLB_AM(am)		(((u8)am) & 0x3f) 
> >  #define QI_CC_FM(fm)		(((u64)fm) << 48)
> >  #define QI_CC_SID(sid)		(((u64)sid) << 32)
> > @@ -357,17 +357,22 @@ enum {
> >  #define QI_PC_DID(did)		(((u64)did) << 16)
> >  #define QI_PC_GRAN(gran)	(((u64)gran) << 4)
> >  
> > -#define QI_PC_ALL_PASIDS	(QI_PC_TYPE | QI_PC_GRAN(0))
> > -#define QI_PC_PASID_SEL		(QI_PC_TYPE | QI_PC_GRAN(1))
> > +/* PASID cache invalidation granu */
> > +#define QI_PC_ALL_PASIDS	0
> > +#define QI_PC_PASID_SEL		1
> >  
> >  #define QI_EIOTLB_ADDR(addr)	((u64)(addr) & VTD_PAGE_MASK)
> >  #define QI_EIOTLB_GL(gl)	(((u64)gl) << 7)
> >  #define QI_EIOTLB_IH(ih)	(((u64)ih) << 6)
> > -#define QI_EIOTLB_AM(am)	(((u64)am))
> > +#define QI_EIOTLB_AM(am)	(((u64)am) & 0x3f)
> >  #define QI_EIOTLB_PASID(pasid) 	(((u64)pasid) << 32)
> >  #define QI_EIOTLB_DID(did)	(((u64)did) << 16)
> >  #define QI_EIOTLB_GRAN(gran) 	(((u64)gran) << 4)
> >  
> > +/* QI Dev-IOTLB inv granu */
> > +#define QI_DEV_IOTLB_GRAN_ALL		1
> > +#define QI_DEV_IOTLB_GRAN_PASID_SEL	0
> > +
> >  #define QI_DEV_EIOTLB_ADDR(a)	((u64)(a) & VTD_PAGE_MASK)
> >  #define QI_DEV_EIOTLB_SIZE	(((u64)1) << 11)
> >  #define QI_DEV_EIOTLB_GLOB(g)	((u64)g)
> > @@ -658,8 +663,16 @@ extern void qi_flush_context(struct intel_iommu *iommu, u16 did, u16 sid,
> >  			     u8 fm, u64 type);
> >  extern void qi_flush_iotlb(struct intel_iommu *iommu, u16 did, u64 addr,
> >  			  unsigned int size_order, u64 type);
> > +extern void qi_flush_piotlb(struct intel_iommu *iommu, u16 did, u64 addr,
> > +			u32 pasid, unsigned int size_order, u64 type, int ih);
> >  extern void qi_flush_dev_iotlb(struct intel_iommu *iommu, u16 sid, u16 pfsid,
> >  			u16 qdep, u64 addr, unsigned mask);
> > +
> > +extern void qi_flush_dev_piotlb(struct intel_iommu *iommu, u16 sid, u16 pfsid,
> > +			u32 pasid, u16 qdep, u64 addr, unsigned size, u64 granu);
> s/size/size_order
> > +
> > +extern void qi_flush_pasid_cache(struct intel_iommu *iommu, u16 did, u64 granu, int pasid);
> > +
> >  extern int qi_submit_sync(struct qi_desc *desc, struct intel_iommu *iommu);
> >  extern int dmar_ir_support(void);
> >   
> 
> Thanks
> 
> Eric

[Jacob Pan]


Thread overview: 64+ messages
2019-06-09 13:44 [PATCH v4 00/22] Shared virtual address IOMMU and VT-d support Jacob Pan
2019-06-09 13:44 ` [PATCH v4 01/22] driver core: Add per device iommu param Jacob Pan
2019-06-09 13:44 ` [PATCH v4 02/22] iommu: Introduce device fault data Jacob Pan
2019-06-18 15:42   ` Jonathan Cameron
2019-06-09 13:44 ` [PATCH v4 03/22] iommu: Introduce device fault report API Jacob Pan
2019-06-18 15:41   ` Jonathan Cameron
2019-06-09 13:44 ` [PATCH v4 04/22] iommu: Add recoverable fault reporting Jacob Pan
2019-06-18 15:44   ` Jonathan Cameron
2019-06-09 13:44 ` [PATCH v4 05/22] iommu: Add a timeout parameter for PRQ response Jacob Pan
2019-06-09 13:44 ` [PATCH v4 06/22] trace/iommu: Add sva trace events Jacob Pan
2019-06-09 13:44 ` [PATCH v4 07/22] iommu: Use device fault trace event Jacob Pan
2019-06-09 13:44 ` [PATCH v4 08/22] iommu: Introduce attach/detach_pasid_table API Jacob Pan
2019-06-18 15:41   ` Jonathan Cameron
2019-06-24 15:06     ` Auger Eric
2019-06-24 15:23       ` Jean-Philippe Brucker
2019-06-09 13:44 ` [PATCH v4 09/22] iommu: Introduce cache_invalidate API Jacob Pan
2019-06-18 15:41   ` Jonathan Cameron
2019-06-09 13:44 ` [PATCH v4 10/22] iommu: Fix compile error without IOMMU_API Jacob Pan
2019-06-18 14:10   ` Jonathan Cameron
2019-06-24 22:28     ` Jacob Pan
2019-06-09 13:44 ` [PATCH v4 11/22] iommu: Introduce guest PASID bind function Jacob Pan
2019-06-18 15:36   ` Jean-Philippe Brucker
2019-06-24 22:24     ` Jacob Pan
2019-07-16 16:44   ` Auger Eric
2019-08-05 21:02     ` Jacob Pan
2019-08-05 23:13     ` Jacob Pan
2019-06-09 13:44 ` [PATCH v4 12/22] iommu: Add I/O ASID allocator Jacob Pan
2019-06-18 16:50   ` Jonathan Cameron
2019-06-25 18:55     ` Jacob Pan
2019-06-09 13:44 ` [PATCH v4 13/22] iommu/vt-d: Enlightened PASID allocation Jacob Pan
2019-07-16  9:29   ` Auger Eric
2019-08-13 16:57     ` Jacob Pan
2019-06-09 13:44 ` [PATCH v4 14/22] iommu/vt-d: Add custom allocator for IOASID Jacob Pan
2019-07-16  9:30   ` Auger Eric
2019-08-05 20:02     ` Jacob Pan
2019-06-09 13:44 ` [PATCH v4 15/22] iommu/vt-d: Replace Intel specific PASID allocator with IOASID Jacob Pan
2019-06-18 15:57   ` Jonathan Cameron
2019-06-24 21:36     ` Jacob Pan
2019-06-27  1:53   ` Lu Baolu
2019-06-27 15:40     ` Jacob Pan
2019-06-09 13:44 ` [PATCH v4 16/22] iommu/vt-d: Move domain helper to header Jacob Pan
2019-07-16  9:33   ` Auger Eric
2019-06-09 13:44 ` [PATCH v4 17/22] iommu/vt-d: Avoid duplicated code for PASID setup Jacob Pan
2019-06-18 16:03   ` Jonathan Cameron
2019-06-24 23:44     ` Jacob Pan
2019-07-16  9:52   ` Auger Eric
2019-06-09 13:44 ` [PATCH v4 18/22] iommu/vt-d: Add nested translation helper function Jacob Pan
2019-06-09 13:44 ` [PATCH v4 19/22] iommu/vt-d: Clean up for SVM device list Jacob Pan
2019-06-18 16:42   ` Jonathan Cameron
2019-06-24 23:59     ` Jacob Pan
2019-06-09 13:44 ` [PATCH v4 20/22] iommu/vt-d: Add bind guest PASID support Jacob Pan
2019-06-18 16:44   ` Jonathan Cameron
2019-06-24 22:41     ` Jacob Pan
2019-06-27  2:50   ` Lu Baolu
2019-06-27 20:22     ` Jacob Pan
2019-07-05  2:21       ` Lu Baolu
2019-08-14 17:20         ` Jacob Pan
2019-07-16 16:45   ` Auger Eric
2019-07-16 17:04     ` Raj, Ashok
2019-07-18  7:47       ` Auger Eric
2019-06-09 13:44 ` [PATCH v4 21/22] iommu/vt-d: Support flushing more translation cache types Jacob Pan
2019-07-18  8:35   ` Auger Eric
2019-08-14 20:17     ` Jacob Pan
2019-06-09 13:44 ` [PATCH v4 22/22] iommu/vt-d: Add svm/sva invalidate function Jacob Pan
