* [PATCH v7 00/12] iommu: Prepare to deliver page faults to user space
@ 2023-11-15  3:02 Lu Baolu
  2023-11-15  3:02 ` [PATCH v7 01/12] iommu: Move iommu fault data to linux/iommu.h Lu Baolu
                   ` (12 more replies)
  0 siblings, 13 replies; 49+ messages in thread
From: Lu Baolu @ 2023-11-15  3:02 UTC (permalink / raw)
  To: Joerg Roedel, Will Deacon, Robin Murphy, Jason Gunthorpe,
	Kevin Tian, Jean-Philippe Brucker, Nicolin Chen
  Cc: Yi Liu, Jacob Pan, Yan Zhao, iommu, kvm, linux-kernel, Lu Baolu

When a user-managed page table is attached to an IOMMU, it is necessary
to deliver IO page faults to user space so that they can be handled
appropriately. One use case for this is nested translation, which is
currently under discussion on the mailing list.
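
As a rough sketch of the intended round trip (illustrative only; the
user-space delivery interface itself is the subject of the IOMMUFD work
referenced below in [1]), a recoverable fault arrives as a struct
iommu_fault of type IOMMU_FAULT_PAGE_REQ and is answered with an
enum iommu_page_response_code:

  #include <linux/iommu.h>

  /*
   * Illustrative only, not part of this series. deliver_to_user() is a
   * hypothetical stand-in for the IOMMUFD delivery path; the handler
   * has the same shape as the in-kernel SVA iopf handler.
   */
  static int deliver_to_user(struct iommu_fault_page_request *prm,
                             void *data);

  static enum iommu_page_response_code
  demo_handle_iopf(struct iommu_fault *fault, void *data)
  {
          if (fault->type != IOMMU_FAULT_PAGE_REQ)
                  return IOMMU_PAGE_RESP_INVALID;

          /* hand prm.{pasid, grpid, addr, perm} to user space */
          if (deliver_to_user(&fault->prm, data))
                  return IOMMU_PAGE_RESP_FAILURE;

          return IOMMU_PAGE_RESP_SUCCESS;
  }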

I have posted an RFC series [1] that describes the implementation of
delivering page faults to user space through IOMMUFD. That posting
received several comments on the IOMMU refactoring, which this series
tries to address.

The major refactoring includes:

- [PATCH 01 ~ 04] Move include/uapi/linux/iommu.h to
  include/linux/iommu.h. Remove the unrecoverable fault data definition.
- [PATCH 05 ~ 06] Remove iommu_[un]register_device_fault_handler().
- [PATCH 07 ~ 10] Separate SVA and IOPF. Make IOPF a generic page fault
  handling framework.
- [PATCH 11 ~ 12] Improve iopf framework for iommufd use.

This is also available on GitHub [2].

[1] https://lore.kernel.org/linux-iommu/20230530053724.232765-1-baolu.lu@linux.intel.com/
[2] https://github.com/LuBaolu/intel-iommu/commits/preparatory-io-pgfault-delivery-v7

Change log:
v7:
 - Rebase to v6.7-rc1.
 - Export iopf_group_response() for global use.
 - Release lock when calling iopf handler.
 - The whole series has been verified to work for the SVA case on Intel
   platforms by Yan Zhao. Add her Tested-by to the affected patches.

v6: https://lore.kernel.org/linux-iommu/20230928042734.16134-1-baolu.lu@linux.intel.com/
 - [PATCH 09/12] Check IS_ERR() against the iommu domain. [Jingqi/Jason]
 - [PATCH 12/12] Rename the comments and name of iopf_queue_flush_dev(),
   no functionality changes. [Kevin]
 - All patches rebased on the latest iommu/core branch.

v5: https://lore.kernel.org/linux-iommu/20230914085638.17307-1-baolu.lu@linux.intel.com/
 - Consolidate per-device fault data management. (New patch 11)
 - Improve iopf_queue_flush_dev(). (New patch 12)

v4: https://lore.kernel.org/linux-iommu/20230825023026.132919-1-baolu.lu@linux.intel.com/
 - Merge iommu_fault_event and iopf_fault. They are duplicates.
 - Move iommu_report_device_fault() and iommu_page_response() to
   io-pgfault.c.
 - Move iommu_sva_domain_alloc() to iommu-sva.c.
 - Add group->domain and use it directly in sva fault handler.
 - Misc code refactoring and refining.

v3: https://lore.kernel.org/linux-iommu/20230817234047.195194-1-baolu.lu@linux.intel.com/
 - Convert the fault data structures from uAPI to kAPI.
 - Merge iopf_device_param into iommu_fault_param.
 - Add debugging on domain lifetime for iopf.
 - Remove patch "iommu: Change the return value of dev_iommu_get()".
 - Remove patch "iommu: Add helper to set iopf handler for domain".
 - Misc code refactoring and refining.

v2: https://lore.kernel.org/linux-iommu/20230727054837.147050-1-baolu.lu@linux.intel.com/
 - Remove unrecoverable fault data definition as suggested by Kevin.
 - Drop the per-device fault cookie code, considering that it doesn't
   make much sense for SVA.
 - Make the IOMMU page fault handling framework generic so that it is
   available for use cases other than SVA.

v1: https://lore.kernel.org/linux-iommu/20230711010642.19707-1-baolu.lu@linux.intel.com/

Lu Baolu (12):
  iommu: Move iommu fault data to linux/iommu.h
  iommu/arm-smmu-v3: Remove unrecoverable faults reporting
  iommu: Remove unrecoverable fault data
  iommu: Cleanup iopf data structure definitions
  iommu: Merge iopf_device_param into iommu_fault_param
  iommu: Remove iommu_[un]register_device_fault_handler()
  iommu: Merge iommu_fault_event and iopf_fault
  iommu: Prepare for separating SVA and IOPF
  iommu: Make iommu_queue_iopf() more generic
  iommu: Separate SVA and IOPF
  iommu: Consolidate per-device fault data management
  iommu: Improve iopf_queue_flush_dev()

 include/linux/iommu.h                         | 266 +++++++---
 drivers/iommu/intel/iommu.h                   |   2 +-
 drivers/iommu/iommu-sva.h                     |  71 ---
 include/uapi/linux/iommu.h                    | 161 ------
 .../iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c   |  14 +-
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c   |  51 +-
 drivers/iommu/intel/iommu.c                   |  25 +-
 drivers/iommu/intel/svm.c                     |   8 +-
 drivers/iommu/io-pgfault.c                    | 469 ++++++++++++------
 drivers/iommu/iommu-sva.c                     |  66 ++-
 drivers/iommu/iommu.c                         | 232 ---------
 MAINTAINERS                                   |   1 -
 drivers/iommu/Kconfig                         |   4 +
 drivers/iommu/Makefile                        |   3 +-
 drivers/iommu/intel/Kconfig                   |   1 +
 15 files changed, 601 insertions(+), 773 deletions(-)
 delete mode 100644 drivers/iommu/iommu-sva.h
 delete mode 100644 include/uapi/linux/iommu.h

-- 
2.34.1


* [PATCH v7 01/12] iommu: Move iommu fault data to linux/iommu.h
  2023-11-15  3:02 [PATCH v7 00/12] iommu: Prepare to deliver page faults to user space Lu Baolu
@ 2023-11-15  3:02 ` Lu Baolu
  2023-12-04 10:52   ` Yi Liu
  2023-11-15  3:02 ` [PATCH v7 02/12] iommu/arm-smmu-v3: Remove unrecoverable faults reporting Lu Baolu
                   ` (11 subsequent siblings)
  12 siblings, 1 reply; 49+ messages in thread
From: Lu Baolu @ 2023-11-15  3:02 UTC (permalink / raw)
  To: Joerg Roedel, Will Deacon, Robin Murphy, Jason Gunthorpe,
	Kevin Tian, Jean-Philippe Brucker, Nicolin Chen
  Cc: Yi Liu, Jacob Pan, Yan Zhao, iommu, kvm, linux-kernel, Lu Baolu,
	Jason Gunthorpe

The iommu fault data is currently defined in uapi/linux/iommu.h, but is
only used inside the iommu subsystem. Move it to linux/iommu.h, where it
will be more accessible to kernel drivers.

With this done, uapi/linux/iommu.h becomes empty and can be removed from
the tree.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Tested-by: Yan Zhao <yan.y.zhao@intel.com>
---
 include/linux/iommu.h      | 152 +++++++++++++++++++++++++++++++++-
 include/uapi/linux/iommu.h | 161 -------------------------------------
 MAINTAINERS                |   1 -
 3 files changed, 151 insertions(+), 163 deletions(-)
 delete mode 100644 include/uapi/linux/iommu.h

diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index ec289c1016f5..c2e2225184cf 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -14,7 +14,6 @@
 #include <linux/err.h>
 #include <linux/of.h>
 #include <linux/iova_bitmap.h>
-#include <uapi/linux/iommu.h>
 
 #define IOMMU_READ	(1 << 0)
 #define IOMMU_WRITE	(1 << 1)
@@ -44,6 +43,157 @@ struct iommu_sva;
 struct iommu_fault_event;
 struct iommu_dma_cookie;
 
+#define IOMMU_FAULT_PERM_READ	(1 << 0) /* read */
+#define IOMMU_FAULT_PERM_WRITE	(1 << 1) /* write */
+#define IOMMU_FAULT_PERM_EXEC	(1 << 2) /* exec */
+#define IOMMU_FAULT_PERM_PRIV	(1 << 3) /* privileged */
+
+/* Generic fault types, can be expanded IRQ remapping fault */
+enum iommu_fault_type {
+	IOMMU_FAULT_DMA_UNRECOV = 1,	/* unrecoverable fault */
+	IOMMU_FAULT_PAGE_REQ,		/* page request fault */
+};
+
+enum iommu_fault_reason {
+	IOMMU_FAULT_REASON_UNKNOWN = 0,
+
+	/* Could not access the PASID table (fetch caused external abort) */
+	IOMMU_FAULT_REASON_PASID_FETCH,
+
+	/* PASID entry is invalid or has configuration errors */
+	IOMMU_FAULT_REASON_BAD_PASID_ENTRY,
+
+	/*
+	 * PASID is out of range (e.g. exceeds the maximum PASID
+	 * supported by the IOMMU) or disabled.
+	 */
+	IOMMU_FAULT_REASON_PASID_INVALID,
+
+	/*
+	 * An external abort occurred fetching (or updating) a translation
+	 * table descriptor
+	 */
+	IOMMU_FAULT_REASON_WALK_EABT,
+
+	/*
+	 * Could not access the page table entry (Bad address),
+	 * actual translation fault
+	 */
+	IOMMU_FAULT_REASON_PTE_FETCH,
+
+	/* Protection flag check failed */
+	IOMMU_FAULT_REASON_PERMISSION,
+
+	/* access flag check failed */
+	IOMMU_FAULT_REASON_ACCESS,
+
+	/* Output address of a translation stage caused Address Size fault */
+	IOMMU_FAULT_REASON_OOR_ADDRESS,
+};
+
+/**
+ * struct iommu_fault_unrecoverable - Unrecoverable fault data
+ * @reason: reason of the fault, from &enum iommu_fault_reason
+ * @flags: parameters of this fault (IOMMU_FAULT_UNRECOV_* values)
+ * @pasid: Process Address Space ID
+ * @perm: requested permission access using by the incoming transaction
+ *        (IOMMU_FAULT_PERM_* values)
+ * @addr: offending page address
+ * @fetch_addr: address that caused a fetch abort, if any
+ */
+struct iommu_fault_unrecoverable {
+	__u32	reason;
+#define IOMMU_FAULT_UNRECOV_PASID_VALID		(1 << 0)
+#define IOMMU_FAULT_UNRECOV_ADDR_VALID		(1 << 1)
+#define IOMMU_FAULT_UNRECOV_FETCH_ADDR_VALID	(1 << 2)
+	__u32	flags;
+	__u32	pasid;
+	__u32	perm;
+	__u64	addr;
+	__u64	fetch_addr;
+};
+
+/**
+ * struct iommu_fault_page_request - Page Request data
+ * @flags: encodes whether the corresponding fields are valid and whether this
+ *         is the last page in group (IOMMU_FAULT_PAGE_REQUEST_* values).
+ *         When IOMMU_FAULT_PAGE_RESPONSE_NEEDS_PASID is set, the page response
+ *         must have the same PASID value as the page request. When it is clear,
+ *         the page response should not have a PASID.
+ * @pasid: Process Address Space ID
+ * @grpid: Page Request Group Index
+ * @perm: requested page permissions (IOMMU_FAULT_PERM_* values)
+ * @addr: page address
+ * @private_data: device-specific private information
+ */
+struct iommu_fault_page_request {
+#define IOMMU_FAULT_PAGE_REQUEST_PASID_VALID	(1 << 0)
+#define IOMMU_FAULT_PAGE_REQUEST_LAST_PAGE	(1 << 1)
+#define IOMMU_FAULT_PAGE_REQUEST_PRIV_DATA	(1 << 2)
+#define IOMMU_FAULT_PAGE_RESPONSE_NEEDS_PASID	(1 << 3)
+	__u32	flags;
+	__u32	pasid;
+	__u32	grpid;
+	__u32	perm;
+	__u64	addr;
+	__u64	private_data[2];
+};
+
+/**
+ * struct iommu_fault - Generic fault data
+ * @type: fault type from &enum iommu_fault_type
+ * @padding: reserved for future use (should be zero)
+ * @event: fault event, when @type is %IOMMU_FAULT_DMA_UNRECOV
+ * @prm: Page Request message, when @type is %IOMMU_FAULT_PAGE_REQ
+ * @padding2: sets the fault size to allow for future extensions
+ */
+struct iommu_fault {
+	__u32	type;
+	__u32	padding;
+	union {
+		struct iommu_fault_unrecoverable event;
+		struct iommu_fault_page_request prm;
+		__u8 padding2[56];
+	};
+};
+
+/**
+ * enum iommu_page_response_code - Return status of fault handlers
+ * @IOMMU_PAGE_RESP_SUCCESS: Fault has been handled and the page tables
+ *	populated, retry the access. This is "Success" in PCI PRI.
+ * @IOMMU_PAGE_RESP_FAILURE: General error. Drop all subsequent faults from
+ *	this device if possible. This is "Response Failure" in PCI PRI.
+ * @IOMMU_PAGE_RESP_INVALID: Could not handle this fault, don't retry the
+ *	access. This is "Invalid Request" in PCI PRI.
+ */
+enum iommu_page_response_code {
+	IOMMU_PAGE_RESP_SUCCESS = 0,
+	IOMMU_PAGE_RESP_INVALID,
+	IOMMU_PAGE_RESP_FAILURE,
+};
+
+/**
+ * struct iommu_page_response - Generic page response information
+ * @argsz: User filled size of this data
+ * @version: API version of this structure
+ * @flags: encodes whether the corresponding fields are valid
+ *         (IOMMU_FAULT_PAGE_RESPONSE_* values)
+ * @pasid: Process Address Space ID
+ * @grpid: Page Request Group Index
+ * @code: response code from &enum iommu_page_response_code
+ */
+struct iommu_page_response {
+	__u32	argsz;
+#define IOMMU_PAGE_RESP_VERSION_1	1
+	__u32	version;
+#define IOMMU_PAGE_RESP_PASID_VALID	(1 << 0)
+	__u32	flags;
+	__u32	pasid;
+	__u32	grpid;
+	__u32	code;
+};
+
+
 /* iommu fault flags */
 #define IOMMU_FAULT_READ	0x0
 #define IOMMU_FAULT_WRITE	0x1
diff --git a/include/uapi/linux/iommu.h b/include/uapi/linux/iommu.h
deleted file mode 100644
index 65d8b0234f69..000000000000
--- a/include/uapi/linux/iommu.h
+++ /dev/null
@@ -1,161 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
-/*
- * IOMMU user API definitions
- */
-
-#ifndef _UAPI_IOMMU_H
-#define _UAPI_IOMMU_H
-
-#include <linux/types.h>
-
-#define IOMMU_FAULT_PERM_READ	(1 << 0) /* read */
-#define IOMMU_FAULT_PERM_WRITE	(1 << 1) /* write */
-#define IOMMU_FAULT_PERM_EXEC	(1 << 2) /* exec */
-#define IOMMU_FAULT_PERM_PRIV	(1 << 3) /* privileged */
-
-/* Generic fault types, can be expanded IRQ remapping fault */
-enum iommu_fault_type {
-	IOMMU_FAULT_DMA_UNRECOV = 1,	/* unrecoverable fault */
-	IOMMU_FAULT_PAGE_REQ,		/* page request fault */
-};
-
-enum iommu_fault_reason {
-	IOMMU_FAULT_REASON_UNKNOWN = 0,
-
-	/* Could not access the PASID table (fetch caused external abort) */
-	IOMMU_FAULT_REASON_PASID_FETCH,
-
-	/* PASID entry is invalid or has configuration errors */
-	IOMMU_FAULT_REASON_BAD_PASID_ENTRY,
-
-	/*
-	 * PASID is out of range (e.g. exceeds the maximum PASID
-	 * supported by the IOMMU) or disabled.
-	 */
-	IOMMU_FAULT_REASON_PASID_INVALID,
-
-	/*
-	 * An external abort occurred fetching (or updating) a translation
-	 * table descriptor
-	 */
-	IOMMU_FAULT_REASON_WALK_EABT,
-
-	/*
-	 * Could not access the page table entry (Bad address),
-	 * actual translation fault
-	 */
-	IOMMU_FAULT_REASON_PTE_FETCH,
-
-	/* Protection flag check failed */
-	IOMMU_FAULT_REASON_PERMISSION,
-
-	/* access flag check failed */
-	IOMMU_FAULT_REASON_ACCESS,
-
-	/* Output address of a translation stage caused Address Size fault */
-	IOMMU_FAULT_REASON_OOR_ADDRESS,
-};
-
-/**
- * struct iommu_fault_unrecoverable - Unrecoverable fault data
- * @reason: reason of the fault, from &enum iommu_fault_reason
- * @flags: parameters of this fault (IOMMU_FAULT_UNRECOV_* values)
- * @pasid: Process Address Space ID
- * @perm: requested permission access using by the incoming transaction
- *        (IOMMU_FAULT_PERM_* values)
- * @addr: offending page address
- * @fetch_addr: address that caused a fetch abort, if any
- */
-struct iommu_fault_unrecoverable {
-	__u32	reason;
-#define IOMMU_FAULT_UNRECOV_PASID_VALID		(1 << 0)
-#define IOMMU_FAULT_UNRECOV_ADDR_VALID		(1 << 1)
-#define IOMMU_FAULT_UNRECOV_FETCH_ADDR_VALID	(1 << 2)
-	__u32	flags;
-	__u32	pasid;
-	__u32	perm;
-	__u64	addr;
-	__u64	fetch_addr;
-};
-
-/**
- * struct iommu_fault_page_request - Page Request data
- * @flags: encodes whether the corresponding fields are valid and whether this
- *         is the last page in group (IOMMU_FAULT_PAGE_REQUEST_* values).
- *         When IOMMU_FAULT_PAGE_RESPONSE_NEEDS_PASID is set, the page response
- *         must have the same PASID value as the page request. When it is clear,
- *         the page response should not have a PASID.
- * @pasid: Process Address Space ID
- * @grpid: Page Request Group Index
- * @perm: requested page permissions (IOMMU_FAULT_PERM_* values)
- * @addr: page address
- * @private_data: device-specific private information
- */
-struct iommu_fault_page_request {
-#define IOMMU_FAULT_PAGE_REQUEST_PASID_VALID	(1 << 0)
-#define IOMMU_FAULT_PAGE_REQUEST_LAST_PAGE	(1 << 1)
-#define IOMMU_FAULT_PAGE_REQUEST_PRIV_DATA	(1 << 2)
-#define IOMMU_FAULT_PAGE_RESPONSE_NEEDS_PASID	(1 << 3)
-	__u32	flags;
-	__u32	pasid;
-	__u32	grpid;
-	__u32	perm;
-	__u64	addr;
-	__u64	private_data[2];
-};
-
-/**
- * struct iommu_fault - Generic fault data
- * @type: fault type from &enum iommu_fault_type
- * @padding: reserved for future use (should be zero)
- * @event: fault event, when @type is %IOMMU_FAULT_DMA_UNRECOV
- * @prm: Page Request message, when @type is %IOMMU_FAULT_PAGE_REQ
- * @padding2: sets the fault size to allow for future extensions
- */
-struct iommu_fault {
-	__u32	type;
-	__u32	padding;
-	union {
-		struct iommu_fault_unrecoverable event;
-		struct iommu_fault_page_request prm;
-		__u8 padding2[56];
-	};
-};
-
-/**
- * enum iommu_page_response_code - Return status of fault handlers
- * @IOMMU_PAGE_RESP_SUCCESS: Fault has been handled and the page tables
- *	populated, retry the access. This is "Success" in PCI PRI.
- * @IOMMU_PAGE_RESP_FAILURE: General error. Drop all subsequent faults from
- *	this device if possible. This is "Response Failure" in PCI PRI.
- * @IOMMU_PAGE_RESP_INVALID: Could not handle this fault, don't retry the
- *	access. This is "Invalid Request" in PCI PRI.
- */
-enum iommu_page_response_code {
-	IOMMU_PAGE_RESP_SUCCESS = 0,
-	IOMMU_PAGE_RESP_INVALID,
-	IOMMU_PAGE_RESP_FAILURE,
-};
-
-/**
- * struct iommu_page_response - Generic page response information
- * @argsz: User filled size of this data
- * @version: API version of this structure
- * @flags: encodes whether the corresponding fields are valid
- *         (IOMMU_FAULT_PAGE_RESPONSE_* values)
- * @pasid: Process Address Space ID
- * @grpid: Page Request Group Index
- * @code: response code from &enum iommu_page_response_code
- */
-struct iommu_page_response {
-	__u32	argsz;
-#define IOMMU_PAGE_RESP_VERSION_1	1
-	__u32	version;
-#define IOMMU_PAGE_RESP_PASID_VALID	(1 << 0)
-	__u32	flags;
-	__u32	pasid;
-	__u32	grpid;
-	__u32	code;
-};
-
-#endif /* _UAPI_IOMMU_H */
diff --git a/MAINTAINERS b/MAINTAINERS
index 97f51d5ec1cf..bfd97aaeb01d 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -11129,7 +11129,6 @@ F:	drivers/iommu/
 F:	include/linux/iommu.h
 F:	include/linux/iova.h
 F:	include/linux/of_iommu.h
-F:	include/uapi/linux/iommu.h
 
 IOMMUFD
 M:	Jason Gunthorpe <jgg@nvidia.com>
-- 
2.34.1


* [PATCH v7 02/12] iommu/arm-smmu-v3: Remove unrecoverable faults reporting
  2023-11-15  3:02 [PATCH v7 00/12] iommu: Prepare to deliver page faults to user space Lu Baolu
  2023-11-15  3:02 ` [PATCH v7 01/12] iommu: Move iommu fault data to linux/iommu.h Lu Baolu
@ 2023-11-15  3:02 ` Lu Baolu
  2023-12-01 15:42   ` Jason Gunthorpe
  2023-12-04 10:54   ` Yi Liu
  2023-11-15  3:02 ` [PATCH v7 03/12] iommu: Remove unrecoverable fault data Lu Baolu
                   ` (10 subsequent siblings)
  12 siblings, 2 replies; 49+ messages in thread
From: Lu Baolu @ 2023-11-15  3:02 UTC (permalink / raw)
  To: Joerg Roedel, Will Deacon, Robin Murphy, Jason Gunthorpe,
	Kevin Tian, Jean-Philippe Brucker, Nicolin Chen
  Cc: Yi Liu, Jacob Pan, Yan Zhao, iommu, kvm, linux-kernel, Lu Baolu

No device driver registers a fault handler to handle the reported
unrecoverable faults. Remove the reporting to avoid dead code. As a
result, only stalled events (recoverable page requests) are reported;
events without the stall bit set are now rejected with -EOPNOTSUPP.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 46 ++++++---------------
 1 file changed, 13 insertions(+), 33 deletions(-)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 7445454c2af2..505400538a2e 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -1463,7 +1463,6 @@ arm_smmu_find_master(struct arm_smmu_device *smmu, u32 sid)
 static int arm_smmu_handle_evt(struct arm_smmu_device *smmu, u64 *evt)
 {
 	int ret;
-	u32 reason;
 	u32 perm = 0;
 	struct arm_smmu_master *master;
 	bool ssid_valid = evt[0] & EVTQ_0_SSV;
@@ -1473,16 +1472,9 @@ static int arm_smmu_handle_evt(struct arm_smmu_device *smmu, u64 *evt)
 
 	switch (FIELD_GET(EVTQ_0_ID, evt[0])) {
 	case EVT_ID_TRANSLATION_FAULT:
-		reason = IOMMU_FAULT_REASON_PTE_FETCH;
-		break;
 	case EVT_ID_ADDR_SIZE_FAULT:
-		reason = IOMMU_FAULT_REASON_OOR_ADDRESS;
-		break;
 	case EVT_ID_ACCESS_FAULT:
-		reason = IOMMU_FAULT_REASON_ACCESS;
-		break;
 	case EVT_ID_PERMISSION_FAULT:
-		reason = IOMMU_FAULT_REASON_PERMISSION;
 		break;
 	default:
 		return -EOPNOTSUPP;
@@ -1492,6 +1484,9 @@ static int arm_smmu_handle_evt(struct arm_smmu_device *smmu, u64 *evt)
 	if (evt[1] & EVTQ_1_S2)
 		return -EFAULT;
 
+	if (!(evt[1] & EVTQ_1_STALL))
+		return -EOPNOTSUPP;
+
 	if (evt[1] & EVTQ_1_RnW)
 		perm |= IOMMU_FAULT_PERM_READ;
 	else
@@ -1503,32 +1498,17 @@ static int arm_smmu_handle_evt(struct arm_smmu_device *smmu, u64 *evt)
 	if (evt[1] & EVTQ_1_PnU)
 		perm |= IOMMU_FAULT_PERM_PRIV;
 
-	if (evt[1] & EVTQ_1_STALL) {
-		flt->type = IOMMU_FAULT_PAGE_REQ;
-		flt->prm = (struct iommu_fault_page_request) {
-			.flags = IOMMU_FAULT_PAGE_REQUEST_LAST_PAGE,
-			.grpid = FIELD_GET(EVTQ_1_STAG, evt[1]),
-			.perm = perm,
-			.addr = FIELD_GET(EVTQ_2_ADDR, evt[2]),
-		};
+	flt->type = IOMMU_FAULT_PAGE_REQ;
+	flt->prm = (struct iommu_fault_page_request) {
+		.flags = IOMMU_FAULT_PAGE_REQUEST_LAST_PAGE,
+		.grpid = FIELD_GET(EVTQ_1_STAG, evt[1]),
+		.perm = perm,
+		.addr = FIELD_GET(EVTQ_2_ADDR, evt[2]),
+	};
 
-		if (ssid_valid) {
-			flt->prm.flags |= IOMMU_FAULT_PAGE_REQUEST_PASID_VALID;
-			flt->prm.pasid = FIELD_GET(EVTQ_0_SSID, evt[0]);
-		}
-	} else {
-		flt->type = IOMMU_FAULT_DMA_UNRECOV;
-		flt->event = (struct iommu_fault_unrecoverable) {
-			.reason = reason,
-			.flags = IOMMU_FAULT_UNRECOV_ADDR_VALID,
-			.perm = perm,
-			.addr = FIELD_GET(EVTQ_2_ADDR, evt[2]),
-		};
-
-		if (ssid_valid) {
-			flt->event.flags |= IOMMU_FAULT_UNRECOV_PASID_VALID;
-			flt->event.pasid = FIELD_GET(EVTQ_0_SSID, evt[0]);
-		}
+	if (ssid_valid) {
+		flt->prm.flags |= IOMMU_FAULT_PAGE_REQUEST_PASID_VALID;
+		flt->prm.pasid = FIELD_GET(EVTQ_0_SSID, evt[0]);
 	}
 
 	mutex_lock(&smmu->streams_mutex);
-- 
2.34.1


* [PATCH v7 03/12] iommu: Remove unrecoverable fault data
  2023-11-15  3:02 [PATCH v7 00/12] iommu: Prepare to deliver page faults to user space Lu Baolu
  2023-11-15  3:02 ` [PATCH v7 01/12] iommu: Move iommu fault data to linux/iommu.h Lu Baolu
  2023-11-15  3:02 ` [PATCH v7 02/12] iommu/arm-smmu-v3: Remove unrecoverable faults reporting Lu Baolu
@ 2023-11-15  3:02 ` Lu Baolu
  2023-12-04 10:58   ` Yi Liu
  2023-11-15  3:02 ` [PATCH v7 04/12] iommu: Cleanup iopf data structure definitions Lu Baolu
                   ` (9 subsequent siblings)
  12 siblings, 1 reply; 49+ messages in thread
From: Lu Baolu @ 2023-11-15  3:02 UTC (permalink / raw)
  To: Joerg Roedel, Will Deacon, Robin Murphy, Jason Gunthorpe,
	Kevin Tian, Jean-Philippe Brucker, Nicolin Chen
  Cc: Yi Liu, Jacob Pan, Yan Zhao, iommu, kvm, linux-kernel, Lu Baolu,
	Jason Gunthorpe

The unrecoverable fault data is not used anywhere. Remove it to avoid
dead code.

Suggested-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
---
 include/linux/iommu.h | 70 +------------------------------------------
 1 file changed, 1 insertion(+), 69 deletions(-)

diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index c2e2225184cf..81eee1afec72 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -50,69 +50,9 @@ struct iommu_dma_cookie;
 
 /* Generic fault types, can be expanded IRQ remapping fault */
 enum iommu_fault_type {
-	IOMMU_FAULT_DMA_UNRECOV = 1,	/* unrecoverable fault */
 	IOMMU_FAULT_PAGE_REQ,		/* page request fault */
 };
 
-enum iommu_fault_reason {
-	IOMMU_FAULT_REASON_UNKNOWN = 0,
-
-	/* Could not access the PASID table (fetch caused external abort) */
-	IOMMU_FAULT_REASON_PASID_FETCH,
-
-	/* PASID entry is invalid or has configuration errors */
-	IOMMU_FAULT_REASON_BAD_PASID_ENTRY,
-
-	/*
-	 * PASID is out of range (e.g. exceeds the maximum PASID
-	 * supported by the IOMMU) or disabled.
-	 */
-	IOMMU_FAULT_REASON_PASID_INVALID,
-
-	/*
-	 * An external abort occurred fetching (or updating) a translation
-	 * table descriptor
-	 */
-	IOMMU_FAULT_REASON_WALK_EABT,
-
-	/*
-	 * Could not access the page table entry (Bad address),
-	 * actual translation fault
-	 */
-	IOMMU_FAULT_REASON_PTE_FETCH,
-
-	/* Protection flag check failed */
-	IOMMU_FAULT_REASON_PERMISSION,
-
-	/* access flag check failed */
-	IOMMU_FAULT_REASON_ACCESS,
-
-	/* Output address of a translation stage caused Address Size fault */
-	IOMMU_FAULT_REASON_OOR_ADDRESS,
-};
-
-/**
- * struct iommu_fault_unrecoverable - Unrecoverable fault data
- * @reason: reason of the fault, from &enum iommu_fault_reason
- * @flags: parameters of this fault (IOMMU_FAULT_UNRECOV_* values)
- * @pasid: Process Address Space ID
- * @perm: requested permission access using by the incoming transaction
- *        (IOMMU_FAULT_PERM_* values)
- * @addr: offending page address
- * @fetch_addr: address that caused a fetch abort, if any
- */
-struct iommu_fault_unrecoverable {
-	__u32	reason;
-#define IOMMU_FAULT_UNRECOV_PASID_VALID		(1 << 0)
-#define IOMMU_FAULT_UNRECOV_ADDR_VALID		(1 << 1)
-#define IOMMU_FAULT_UNRECOV_FETCH_ADDR_VALID	(1 << 2)
-	__u32	flags;
-	__u32	pasid;
-	__u32	perm;
-	__u64	addr;
-	__u64	fetch_addr;
-};
-
 /**
  * struct iommu_fault_page_request - Page Request data
  * @flags: encodes whether the corresponding fields are valid and whether this
@@ -142,19 +82,11 @@ struct iommu_fault_page_request {
 /**
  * struct iommu_fault - Generic fault data
  * @type: fault type from &enum iommu_fault_type
- * @padding: reserved for future use (should be zero)
- * @event: fault event, when @type is %IOMMU_FAULT_DMA_UNRECOV
  * @prm: Page Request message, when @type is %IOMMU_FAULT_PAGE_REQ
- * @padding2: sets the fault size to allow for future extensions
  */
 struct iommu_fault {
 	__u32	type;
-	__u32	padding;
-	union {
-		struct iommu_fault_unrecoverable event;
-		struct iommu_fault_page_request prm;
-		__u8 padding2[56];
-	};
+	struct iommu_fault_page_request prm;
 };
 
 /**
-- 
2.34.1


* [PATCH v7 04/12] iommu: Cleanup iopf data structure definitions
  2023-11-15  3:02 [PATCH v7 00/12] iommu: Prepare to deliver page faults to user space Lu Baolu
                   ` (2 preceding siblings ...)
  2023-11-15  3:02 ` [PATCH v7 03/12] iommu: Remove unrecoverable fault data Lu Baolu
@ 2023-11-15  3:02 ` Lu Baolu
  2023-12-04 11:03   ` Yi Liu
  2023-11-15  3:02 ` [PATCH v7 05/12] iommu: Merge iopf_device_param into iommu_fault_param Lu Baolu
                   ` (8 subsequent siblings)
  12 siblings, 1 reply; 49+ messages in thread
From: Lu Baolu @ 2023-11-15  3:02 UTC (permalink / raw)
  To: Joerg Roedel, Will Deacon, Robin Murphy, Jason Gunthorpe,
	Kevin Tian, Jean-Philippe Brucker, Nicolin Chen
  Cc: Yi Liu, Jacob Pan, Yan Zhao, iommu, kvm, linux-kernel, Lu Baolu,
	Jason Gunthorpe

struct iommu_fault_page_request and struct iommu_page_response are no
longer part of the uAPI. Convert them to kernel-internal data
structures.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Tested-by: Yan Zhao <yan.y.zhao@intel.com>
---
 include/linux/iommu.h      | 27 +++++++++++----------------
 drivers/iommu/io-pgfault.c |  1 -
 drivers/iommu/iommu.c      |  4 ----
 3 files changed, 11 insertions(+), 21 deletions(-)

diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 81eee1afec72..79775859af42 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -71,12 +71,12 @@ struct iommu_fault_page_request {
 #define IOMMU_FAULT_PAGE_REQUEST_LAST_PAGE	(1 << 1)
 #define IOMMU_FAULT_PAGE_REQUEST_PRIV_DATA	(1 << 2)
 #define IOMMU_FAULT_PAGE_RESPONSE_NEEDS_PASID	(1 << 3)
-	__u32	flags;
-	__u32	pasid;
-	__u32	grpid;
-	__u32	perm;
-	__u64	addr;
-	__u64	private_data[2];
+	u32	flags;
+	u32	pasid;
+	u32	grpid;
+	u32	perm;
+	u64	addr;
+	u64	private_data[2];
 };
 
 /**
@@ -85,7 +85,7 @@ struct iommu_fault_page_request {
  * @prm: Page Request message, when @type is %IOMMU_FAULT_PAGE_REQ
  */
 struct iommu_fault {
-	__u32	type;
+	u32 type;
 	struct iommu_fault_page_request prm;
 };
 
@@ -106,8 +106,6 @@ enum iommu_page_response_code {
 
 /**
  * struct iommu_page_response - Generic page response information
- * @argsz: User filled size of this data
- * @version: API version of this structure
  * @flags: encodes whether the corresponding fields are valid
  *         (IOMMU_FAULT_PAGE_RESPONSE_* values)
  * @pasid: Process Address Space ID
@@ -115,14 +113,11 @@ enum iommu_page_response_code {
  * @code: response code from &enum iommu_page_response_code
  */
 struct iommu_page_response {
-	__u32	argsz;
-#define IOMMU_PAGE_RESP_VERSION_1	1
-	__u32	version;
 #define IOMMU_PAGE_RESP_PASID_VALID	(1 << 0)
-	__u32	flags;
-	__u32	pasid;
-	__u32	grpid;
-	__u32	code;
+	u32	flags;
+	u32	pasid;
+	u32	grpid;
+	u32	code;
 };
 
 
diff --git a/drivers/iommu/io-pgfault.c b/drivers/iommu/io-pgfault.c
index e5b8b9110c13..24b5545352ae 100644
--- a/drivers/iommu/io-pgfault.c
+++ b/drivers/iommu/io-pgfault.c
@@ -56,7 +56,6 @@ static int iopf_complete_group(struct device *dev, struct iopf_fault *iopf,
 			       enum iommu_page_response_code status)
 {
 	struct iommu_page_response resp = {
-		.version		= IOMMU_PAGE_RESP_VERSION_1,
 		.pasid			= iopf->fault.prm.pasid,
 		.grpid			= iopf->fault.prm.grpid,
 		.code			= status,
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index f17a1113f3d6..f24513e2b025 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -1465,10 +1465,6 @@ int iommu_page_response(struct device *dev,
 	if (!param || !param->fault_param)
 		return -EINVAL;
 
-	if (msg->version != IOMMU_PAGE_RESP_VERSION_1 ||
-	    msg->flags & ~IOMMU_PAGE_RESP_PASID_VALID)
-		return -EINVAL;
-
 	/* Only send response if there is a fault report pending */
 	mutex_lock(&param->fault_param->lock);
 	if (list_empty(&param->fault_param->faults)) {
-- 
2.34.1


* [PATCH v7 05/12] iommu: Merge iopf_device_param into iommu_fault_param
  2023-11-15  3:02 [PATCH v7 00/12] iommu: Prepare to deliver page faults to user space Lu Baolu
                   ` (3 preceding siblings ...)
  2023-11-15  3:02 ` [PATCH v7 04/12] iommu: Cleanup iopf data structure definitions Lu Baolu
@ 2023-11-15  3:02 ` Lu Baolu
  2023-12-04 12:32   ` Yi Liu
  2023-11-15  3:02 ` [PATCH v7 06/12] iommu: Remove iommu_[un]register_device_fault_handler() Lu Baolu
                   ` (7 subsequent siblings)
  12 siblings, 1 reply; 49+ messages in thread
From: Lu Baolu @ 2023-11-15  3:02 UTC (permalink / raw)
  To: Joerg Roedel, Will Deacon, Robin Murphy, Jason Gunthorpe,
	Kevin Tian, Jean-Philippe Brucker, Nicolin Chen
  Cc: Yi Liu, Jacob Pan, Yan Zhao, iommu, kvm, linux-kernel, Lu Baolu,
	Jason Gunthorpe

The struct dev_iommu contains two pointers, fault_param and iopf_param.
The fault_param pointer points to a data structure that is used to store
pending faults that are awaiting responses. The iopf_param pointer points
to a data structure that is used to store partial faults that are part of
a Page Request Group.

The fault_param and iopf_param pointers are essentially duplicates,
which wastes memory. Merge iopf_device_param into iommu_fault_param to
consolidate the code and save memory. The consolidated structure is
allocated on demand when the device driver enables IOPF on the device,
and freed after IOPF is disabled.
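
For reference, a condensed before/after view of the per-device fault
data as it looks after this patch (abbreviated from the diff below):

  /* Before: two separately allocated per-device structures */
  struct iommu_fault_param {              /* pending faults */
          iommu_dev_fault_handler_t handler;
          void *data;
          struct list_head faults;
          struct mutex lock;
  };

  struct iopf_device_param {              /* partial fault groups */
          struct device *dev;
          struct iopf_queue *queue;
          struct list_head queue_list;
          struct list_head partial;
  };

  /* After: one structure, allocated when IOPF is enabled */
  struct iommu_fault_param {
          iommu_dev_fault_handler_t handler;
          void *data;
          struct mutex lock;
          struct device *dev;
          struct iopf_queue *queue;
          struct list_head queue_list;
          struct list_head partial;
          struct list_head faults;
  };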

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Tested-by: Yan Zhao <yan.y.zhao@intel.com>
---
 include/linux/iommu.h      |  18 ++++--
 drivers/iommu/io-pgfault.c | 113 ++++++++++++++++++-------------------
 drivers/iommu/iommu.c      |  34 ++---------
 3 files changed, 75 insertions(+), 90 deletions(-)

diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 79775859af42..108ab50da1ad 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -42,6 +42,7 @@ struct notifier_block;
 struct iommu_sva;
 struct iommu_fault_event;
 struct iommu_dma_cookie;
+struct iopf_queue;
 
 #define IOMMU_FAULT_PERM_READ	(1 << 0) /* read */
 #define IOMMU_FAULT_PERM_WRITE	(1 << 1) /* write */
@@ -590,21 +591,31 @@ struct iommu_fault_event {
  * struct iommu_fault_param - per-device IOMMU fault data
  * @handler: Callback function to handle IOMMU faults at device level
  * @data: handler private data
- * @faults: holds the pending faults which needs response
  * @lock: protect pending faults list
+ * @dev: the device that owns this param
+ * @queue: IOPF queue
+ * @queue_list: index into queue->devices
+ * @partial: faults that are part of a Page Request Group for which the last
+ *           request hasn't been submitted yet.
+ * @faults: holds the pending faults which needs response
  */
 struct iommu_fault_param {
 	iommu_dev_fault_handler_t handler;
 	void *data;
+	struct mutex lock;
+
+	struct device *dev;
+	struct iopf_queue *queue;
+	struct list_head queue_list;
+
+	struct list_head partial;
 	struct list_head faults;
-	struct mutex lock;
 };
 
 /**
  * struct dev_iommu - Collection of per-device IOMMU data
  *
  * @fault_param: IOMMU detected device fault reporting data
- * @iopf_param:	 I/O Page Fault queue and data
  * @fwspec:	 IOMMU fwspec data
  * @iommu_dev:	 IOMMU device this device is linked to
  * @priv:	 IOMMU Driver private data
@@ -620,7 +631,6 @@ struct iommu_fault_param {
 struct dev_iommu {
 	struct mutex lock;
 	struct iommu_fault_param	*fault_param;
-	struct iopf_device_param	*iopf_param;
 	struct iommu_fwspec		*fwspec;
 	struct iommu_device		*iommu_dev;
 	void				*priv;
diff --git a/drivers/iommu/io-pgfault.c b/drivers/iommu/io-pgfault.c
index 24b5545352ae..b1cf28055525 100644
--- a/drivers/iommu/io-pgfault.c
+++ b/drivers/iommu/io-pgfault.c
@@ -25,21 +25,6 @@ struct iopf_queue {
 	struct mutex			lock;
 };
 
-/**
- * struct iopf_device_param - IO Page Fault data attached to a device
- * @dev: the device that owns this param
- * @queue: IOPF queue
- * @queue_list: index into queue->devices
- * @partial: faults that are part of a Page Request Group for which the last
- *           request hasn't been submitted yet.
- */
-struct iopf_device_param {
-	struct device			*dev;
-	struct iopf_queue		*queue;
-	struct list_head		queue_list;
-	struct list_head		partial;
-};
-
 struct iopf_fault {
 	struct iommu_fault		fault;
 	struct list_head		list;
@@ -144,7 +129,7 @@ int iommu_queue_iopf(struct iommu_fault *fault, void *cookie)
 	int ret;
 	struct iopf_group *group;
 	struct iopf_fault *iopf, *next;
-	struct iopf_device_param *iopf_param;
+	struct iommu_fault_param *iopf_param;
 
 	struct device *dev = cookie;
 	struct dev_iommu *param = dev->iommu;
@@ -159,7 +144,7 @@ int iommu_queue_iopf(struct iommu_fault *fault, void *cookie)
 	 * As long as we're holding param->lock, the queue can't be unlinked
 	 * from the device and therefore cannot disappear.
 	 */
-	iopf_param = param->iopf_param;
+	iopf_param = param->fault_param;
 	if (!iopf_param)
 		return -ENODEV;
 
@@ -229,14 +214,14 @@ EXPORT_SYMBOL_GPL(iommu_queue_iopf);
 int iopf_queue_flush_dev(struct device *dev)
 {
 	int ret = 0;
-	struct iopf_device_param *iopf_param;
+	struct iommu_fault_param *iopf_param;
 	struct dev_iommu *param = dev->iommu;
 
 	if (!param)
 		return -ENODEV;
 
 	mutex_lock(&param->lock);
-	iopf_param = param->iopf_param;
+	iopf_param = param->fault_param;
 	if (iopf_param)
 		flush_workqueue(iopf_param->queue->wq);
 	else
@@ -260,7 +245,7 @@ EXPORT_SYMBOL_GPL(iopf_queue_flush_dev);
 int iopf_queue_discard_partial(struct iopf_queue *queue)
 {
 	struct iopf_fault *iopf, *next;
-	struct iopf_device_param *iopf_param;
+	struct iommu_fault_param *iopf_param;
 
 	if (!queue)
 		return -EINVAL;
@@ -287,34 +272,38 @@ EXPORT_SYMBOL_GPL(iopf_queue_discard_partial);
  */
 int iopf_queue_add_device(struct iopf_queue *queue, struct device *dev)
 {
-	int ret = -EBUSY;
-	struct iopf_device_param *iopf_param;
+	int ret = 0;
 	struct dev_iommu *param = dev->iommu;
-
-	if (!param)
-		return -ENODEV;
-
-	iopf_param = kzalloc(sizeof(*iopf_param), GFP_KERNEL);
-	if (!iopf_param)
-		return -ENOMEM;
-
-	INIT_LIST_HEAD(&iopf_param->partial);
-	iopf_param->queue = queue;
-	iopf_param->dev = dev;
+	struct iommu_fault_param *fault_param;
 
 	mutex_lock(&queue->lock);
 	mutex_lock(&param->lock);
-	if (!param->iopf_param) {
-		list_add(&iopf_param->queue_list, &queue->devices);
-		param->iopf_param = iopf_param;
-		ret = 0;
+	if (param->fault_param) {
+		ret = -EBUSY;
+		goto done_unlock;
 	}
+
+	get_device(dev);
+	fault_param = kzalloc(sizeof(*fault_param), GFP_KERNEL);
+	if (!fault_param) {
+		put_device(dev);
+		ret = -ENOMEM;
+		goto done_unlock;
+	}
+
+	mutex_init(&fault_param->lock);
+	INIT_LIST_HEAD(&fault_param->faults);
+	INIT_LIST_HEAD(&fault_param->partial);
+	fault_param->dev = dev;
+	list_add(&fault_param->queue_list, &queue->devices);
+	fault_param->queue = queue;
+
+	param->fault_param = fault_param;
+
+done_unlock:
 	mutex_unlock(&param->lock);
 	mutex_unlock(&queue->lock);
 
-	if (ret)
-		kfree(iopf_param);
-
 	return ret;
 }
 EXPORT_SYMBOL_GPL(iopf_queue_add_device);
@@ -330,34 +319,42 @@ EXPORT_SYMBOL_GPL(iopf_queue_add_device);
  */
 int iopf_queue_remove_device(struct iopf_queue *queue, struct device *dev)
 {
-	int ret = -EINVAL;
+	int ret = 0;
 	struct iopf_fault *iopf, *next;
-	struct iopf_device_param *iopf_param;
 	struct dev_iommu *param = dev->iommu;
-
-	if (!param || !queue)
-		return -EINVAL;
+	struct iommu_fault_param *fault_param = param->fault_param;
 
 	mutex_lock(&queue->lock);
 	mutex_lock(&param->lock);
-	iopf_param = param->iopf_param;
-	if (iopf_param && iopf_param->queue == queue) {
-		list_del(&iopf_param->queue_list);
-		param->iopf_param = NULL;
-		ret = 0;
+	if (!fault_param) {
+		ret = -ENODEV;
+		goto unlock;
 	}
-	mutex_unlock(&param->lock);
-	mutex_unlock(&queue->lock);
-	if (ret)
-		return ret;
+
+	if (fault_param->queue != queue) {
+		ret = -EINVAL;
+		goto unlock;
+	}
+
+	if (!list_empty(&fault_param->faults)) {
+		ret = -EBUSY;
+		goto unlock;
+	}
+
+	list_del(&fault_param->queue_list);
 
 	/* Just in case some faults are still stuck */
-	list_for_each_entry_safe(iopf, next, &iopf_param->partial, list)
+	list_for_each_entry_safe(iopf, next, &fault_param->partial, list)
 		kfree(iopf);
 
-	kfree(iopf_param);
+	param->fault_param = NULL;
+	kfree(fault_param);
+	put_device(dev);
+unlock:
+	mutex_unlock(&param->lock);
+	mutex_unlock(&queue->lock);
 
-	return 0;
+	return ret;
 }
 EXPORT_SYMBOL_GPL(iopf_queue_remove_device);
 
@@ -403,7 +400,7 @@ EXPORT_SYMBOL_GPL(iopf_queue_alloc);
  */
 void iopf_queue_free(struct iopf_queue *queue)
 {
-	struct iopf_device_param *iopf_param, *next;
+	struct iommu_fault_param *iopf_param, *next;
 
 	if (!queue)
 		return;
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index f24513e2b025..9c9eacfa6761 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -1326,27 +1326,18 @@ int iommu_register_device_fault_handler(struct device *dev,
 	struct dev_iommu *param = dev->iommu;
 	int ret = 0;
 
-	if (!param)
+	if (!param || !param->fault_param)
 		return -EINVAL;
 
 	mutex_lock(&param->lock);
 	/* Only allow one fault handler registered for each device */
-	if (param->fault_param) {
+	if (param->fault_param->handler) {
 		ret = -EBUSY;
 		goto done_unlock;
 	}
 
-	get_device(dev);
-	param->fault_param = kzalloc(sizeof(*param->fault_param), GFP_KERNEL);
-	if (!param->fault_param) {
-		put_device(dev);
-		ret = -ENOMEM;
-		goto done_unlock;
-	}
 	param->fault_param->handler = handler;
 	param->fault_param->data = data;
-	mutex_init(&param->fault_param->lock);
-	INIT_LIST_HEAD(&param->fault_param->faults);
 
 done_unlock:
 	mutex_unlock(&param->lock);
@@ -1367,29 +1358,16 @@ EXPORT_SYMBOL_GPL(iommu_register_device_fault_handler);
 int iommu_unregister_device_fault_handler(struct device *dev)
 {
 	struct dev_iommu *param = dev->iommu;
-	int ret = 0;
 
-	if (!param)
+	if (!param || !param->fault_param)
 		return -EINVAL;
 
 	mutex_lock(&param->lock);
-
-	if (!param->fault_param)
-		goto unlock;
-
-	/* we cannot unregister handler if there are pending faults */
-	if (!list_empty(&param->fault_param->faults)) {
-		ret = -EBUSY;
-		goto unlock;
-	}
-
-	kfree(param->fault_param);
-	param->fault_param = NULL;
-	put_device(dev);
-unlock:
+	param->fault_param->handler = NULL;
+	param->fault_param->data = NULL;
 	mutex_unlock(&param->lock);
 
-	return ret;
+	return 0;
 }
 EXPORT_SYMBOL_GPL(iommu_unregister_device_fault_handler);
 
-- 
2.34.1


* [PATCH v7 06/12] iommu: Remove iommu_[un]register_device_fault_handler()
  2023-11-15  3:02 [PATCH v7 00/12] iommu: Prepare to deliver page faults to user space Lu Baolu
                   ` (4 preceding siblings ...)
  2023-11-15  3:02 ` [PATCH v7 05/12] iommu: Merge iopf_device_param into iommu_fault_param Lu Baolu
@ 2023-11-15  3:02 ` Lu Baolu
  2023-12-04 12:36   ` Yi Liu
  2023-11-15  3:02 ` [PATCH v7 07/12] iommu: Merge iommu_fault_event and iopf_fault Lu Baolu
                   ` (6 subsequent siblings)
  12 siblings, 1 reply; 49+ messages in thread
From: Lu Baolu @ 2023-11-15  3:02 UTC (permalink / raw)
  To: Joerg Roedel, Will Deacon, Robin Murphy, Jason Gunthorpe,
	Kevin Tian, Jean-Philippe Brucker, Nicolin Chen
  Cc: Yi Liu, Jacob Pan, Yan Zhao, iommu, kvm, linux-kernel, Lu Baolu,
	Jason Gunthorpe

The individual iommu driver reports the iommu page faults by calling
iommu_report_device_fault(), where a pre-registered device fault handler
is called to route the fault to another fault handler installed on the
corresponding iommu domain.

The pre-registered device fault handler is static and never changes,
since fault handling will eventually be per iommu domain. Replace the
call to the device fault handler with a direct call to
iommu_queue_iopf().

After this replacement, the registering and unregistering fault handler
interfaces are not needed anywhere. Remove the interfaces and the related
data structures to avoid dead code.

Convert the cookie parameter of iommu_queue_iopf() into the device
pointer that is actually passed.
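
The net effect on the report path, condensed from the diff below:

  /* Before: indirection through the per-device registered handler */
  ret = fparam->handler(&evt->fault, fparam->data);

  /* After: queue the fault to the generic IOPF machinery directly */
  ret = iommu_queue_iopf(&evt->fault, dev);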

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Tested-by: Yan Zhao <yan.y.zhao@intel.com>
---
 include/linux/iommu.h                         | 23 ------
 drivers/iommu/iommu-sva.h                     |  4 +-
 .../iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c   | 13 +---
 drivers/iommu/intel/iommu.c                   | 24 ++----
 drivers/iommu/io-pgfault.c                    |  6 +-
 drivers/iommu/iommu.c                         | 76 +------------------
 6 files changed, 13 insertions(+), 133 deletions(-)

diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 108ab50da1ad..a45d92cc31ec 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -128,7 +128,6 @@ struct iommu_page_response {
 
 typedef int (*iommu_fault_handler_t)(struct iommu_domain *,
 			struct device *, unsigned long, int, void *);
-typedef int (*iommu_dev_fault_handler_t)(struct iommu_fault *, void *);
 
 struct iommu_domain_geometry {
 	dma_addr_t aperture_start; /* First address that can be mapped    */
@@ -589,8 +588,6 @@ struct iommu_fault_event {
 
 /**
  * struct iommu_fault_param - per-device IOMMU fault data
- * @handler: Callback function to handle IOMMU faults at device level
- * @data: handler private data
  * @lock: protect pending faults list
  * @dev: the device that owns this param
  * @queue: IOPF queue
@@ -600,8 +597,6 @@ struct iommu_fault_event {
  * @faults: holds the pending faults which needs response
  */
 struct iommu_fault_param {
-	iommu_dev_fault_handler_t handler;
-	void *data;
 	struct mutex lock;
 
 	struct device *dev;
@@ -724,11 +719,6 @@ extern int iommu_group_for_each_dev(struct iommu_group *group, void *data,
 extern struct iommu_group *iommu_group_get(struct device *dev);
 extern struct iommu_group *iommu_group_ref_get(struct iommu_group *group);
 extern void iommu_group_put(struct iommu_group *group);
-extern int iommu_register_device_fault_handler(struct device *dev,
-					iommu_dev_fault_handler_t handler,
-					void *data);
-
-extern int iommu_unregister_device_fault_handler(struct device *dev);
 
 extern int iommu_report_device_fault(struct device *dev,
 				     struct iommu_fault_event *evt);
@@ -1137,19 +1127,6 @@ static inline void iommu_group_put(struct iommu_group *group)
 {
 }
 
-static inline
-int iommu_register_device_fault_handler(struct device *dev,
-					iommu_dev_fault_handler_t handler,
-					void *data)
-{
-	return -ENODEV;
-}
-
-static inline int iommu_unregister_device_fault_handler(struct device *dev)
-{
-	return 0;
-}
-
 static inline
 int iommu_report_device_fault(struct device *dev, struct iommu_fault_event *evt)
 {
diff --git a/drivers/iommu/iommu-sva.h b/drivers/iommu/iommu-sva.h
index 54946b5a7caf..de7819c796ce 100644
--- a/drivers/iommu/iommu-sva.h
+++ b/drivers/iommu/iommu-sva.h
@@ -13,7 +13,7 @@ struct iommu_fault;
 struct iopf_queue;
 
 #ifdef CONFIG_IOMMU_SVA
-int iommu_queue_iopf(struct iommu_fault *fault, void *cookie);
+int iommu_queue_iopf(struct iommu_fault *fault, struct device *dev);
 
 int iopf_queue_add_device(struct iopf_queue *queue, struct device *dev);
 int iopf_queue_remove_device(struct iopf_queue *queue,
@@ -26,7 +26,7 @@ enum iommu_page_response_code
 iommu_sva_handle_iopf(struct iommu_fault *fault, void *data);
 
 #else /* CONFIG_IOMMU_SVA */
-static inline int iommu_queue_iopf(struct iommu_fault *fault, void *cookie)
+static inline int iommu_queue_iopf(struct iommu_fault *fault, struct device *dev)
 {
 	return -ENODEV;
 }
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
index 353248ab18e7..84c9554144cb 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
@@ -480,7 +480,6 @@ bool arm_smmu_master_sva_enabled(struct arm_smmu_master *master)
 
 static int arm_smmu_master_sva_enable_iopf(struct arm_smmu_master *master)
 {
-	int ret;
 	struct device *dev = master->dev;
 
 	/*
@@ -493,16 +492,7 @@ static int arm_smmu_master_sva_enable_iopf(struct arm_smmu_master *master)
 	if (!master->iopf_enabled)
 		return -EINVAL;
 
-	ret = iopf_queue_add_device(master->smmu->evtq.iopf, dev);
-	if (ret)
-		return ret;
-
-	ret = iommu_register_device_fault_handler(dev, iommu_queue_iopf, dev);
-	if (ret) {
-		iopf_queue_remove_device(master->smmu->evtq.iopf, dev);
-		return ret;
-	}
-	return 0;
+	return iopf_queue_add_device(master->smmu->evtq.iopf, dev);
 }
 
 static void arm_smmu_master_sva_disable_iopf(struct arm_smmu_master *master)
@@ -512,7 +502,6 @@ static void arm_smmu_master_sva_disable_iopf(struct arm_smmu_master *master)
 	if (!master->iopf_enabled)
 		return;
 
-	iommu_unregister_device_fault_handler(dev);
 	iopf_queue_remove_device(master->smmu->evtq.iopf, dev);
 }
 
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 3531b956556c..cbe65827730d 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -4616,23 +4616,15 @@ static int intel_iommu_enable_iopf(struct device *dev)
 	if (ret)
 		return ret;
 
-	ret = iommu_register_device_fault_handler(dev, iommu_queue_iopf, dev);
-	if (ret)
-		goto iopf_remove_device;
-
 	ret = pci_enable_pri(pdev, PRQ_DEPTH);
-	if (ret)
-		goto iopf_unregister_handler;
+	if (ret) {
+		iopf_queue_remove_device(iommu->iopf_queue, dev);
+		return ret;
+	}
+
 	info->pri_enabled = 1;
 
 	return 0;
-
-iopf_unregister_handler:
-	iommu_unregister_device_fault_handler(dev);
-iopf_remove_device:
-	iopf_queue_remove_device(iommu->iopf_queue, dev);
-
-	return ret;
 }
 
 static int intel_iommu_disable_iopf(struct device *dev)
@@ -4655,11 +4647,9 @@ static int intel_iommu_disable_iopf(struct device *dev)
 	info->pri_enabled = 0;
 
 	/*
-	 * With PRI disabled and outstanding PRQs drained, unregistering
-	 * fault handler and removing device from iopf queue should never
-	 * fail.
+	 * With PRI disabled and outstanding PRQs drained, removing device
+	 * from iopf queue should never fail.
 	 */
-	WARN_ON(iommu_unregister_device_fault_handler(dev));
 	WARN_ON(iopf_queue_remove_device(iommu->iopf_queue, dev));
 
 	return 0;
diff --git a/drivers/iommu/io-pgfault.c b/drivers/iommu/io-pgfault.c
index b1cf28055525..31832aeacdba 100644
--- a/drivers/iommu/io-pgfault.c
+++ b/drivers/iommu/io-pgfault.c
@@ -87,7 +87,7 @@ static void iopf_handler(struct work_struct *work)
 /**
  * iommu_queue_iopf - IO Page Fault handler
  * @fault: fault event
- * @cookie: struct device, passed to iommu_register_device_fault_handler.
+ * @dev: struct device.
  *
  * Add a fault to the device workqueue, to be handled by mm.
  *
@@ -124,14 +124,12 @@ static void iopf_handler(struct work_struct *work)
  *
  * Return: 0 on success and <0 on error.
  */
-int iommu_queue_iopf(struct iommu_fault *fault, void *cookie)
+int iommu_queue_iopf(struct iommu_fault *fault, struct device *dev)
 {
 	int ret;
 	struct iopf_group *group;
 	struct iopf_fault *iopf, *next;
 	struct iommu_fault_param *iopf_param;
-
-	struct device *dev = cookie;
 	struct dev_iommu *param = dev->iommu;
 
 	lockdep_assert_held(&param->lock);
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 9c9eacfa6761..0c6700b6659a 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -1301,76 +1301,6 @@ void iommu_group_put(struct iommu_group *group)
 }
 EXPORT_SYMBOL_GPL(iommu_group_put);
 
-/**
- * iommu_register_device_fault_handler() - Register a device fault handler
- * @dev: the device
- * @handler: the fault handler
- * @data: private data passed as argument to the handler
- *
- * When an IOMMU fault event is received, this handler gets called with the
- * fault event and data as argument. The handler should return 0 on success. If
- * the fault is recoverable (IOMMU_FAULT_PAGE_REQ), the consumer should also
- * complete the fault by calling iommu_page_response() with one of the following
- * response code:
- * - IOMMU_PAGE_RESP_SUCCESS: retry the translation
- * - IOMMU_PAGE_RESP_INVALID: terminate the fault
- * - IOMMU_PAGE_RESP_FAILURE: terminate the fault and stop reporting
- *   page faults if possible.
- *
- * Return 0 if the fault handler was installed successfully, or an error.
- */
-int iommu_register_device_fault_handler(struct device *dev,
-					iommu_dev_fault_handler_t handler,
-					void *data)
-{
-	struct dev_iommu *param = dev->iommu;
-	int ret = 0;
-
-	if (!param || !param->fault_param)
-		return -EINVAL;
-
-	mutex_lock(&param->lock);
-	/* Only allow one fault handler registered for each device */
-	if (param->fault_param->handler) {
-		ret = -EBUSY;
-		goto done_unlock;
-	}
-
-	param->fault_param->handler = handler;
-	param->fault_param->data = data;
-
-done_unlock:
-	mutex_unlock(&param->lock);
-
-	return ret;
-}
-EXPORT_SYMBOL_GPL(iommu_register_device_fault_handler);
-
-/**
- * iommu_unregister_device_fault_handler() - Unregister the device fault handler
- * @dev: the device
- *
- * Remove the device fault handler installed with
- * iommu_register_device_fault_handler().
- *
- * Return 0 on success, or an error.
- */
-int iommu_unregister_device_fault_handler(struct device *dev)
-{
-	struct dev_iommu *param = dev->iommu;
-
-	if (!param || !param->fault_param)
-		return -EINVAL;
-
-	mutex_lock(&param->lock);
-	param->fault_param->handler = NULL;
-	param->fault_param->data = NULL;
-	mutex_unlock(&param->lock);
-
-	return 0;
-}
-EXPORT_SYMBOL_GPL(iommu_unregister_device_fault_handler);
-
 /**
  * iommu_report_device_fault() - Report fault event to device driver
  * @dev: the device
@@ -1395,10 +1325,6 @@ int iommu_report_device_fault(struct device *dev, struct iommu_fault_event *evt)
 	/* we only report device fault if there is a handler registered */
 	mutex_lock(&param->lock);
 	fparam = param->fault_param;
-	if (!fparam || !fparam->handler) {
-		ret = -EINVAL;
-		goto done_unlock;
-	}
 
 	if (evt->fault.type == IOMMU_FAULT_PAGE_REQ &&
 	    (evt->fault.prm.flags & IOMMU_FAULT_PAGE_REQUEST_LAST_PAGE)) {
@@ -1413,7 +1339,7 @@ int iommu_report_device_fault(struct device *dev, struct iommu_fault_event *evt)
 		mutex_unlock(&fparam->lock);
 	}
 
-	ret = fparam->handler(&evt->fault, fparam->data);
+	ret = iommu_queue_iopf(&evt->fault, dev);
 	if (ret && evt_pending) {
 		mutex_lock(&fparam->lock);
 		list_del(&evt_pending->list);
-- 
2.34.1


* [PATCH v7 07/12] iommu: Merge iommu_fault_event and iopf_fault
  2023-11-15  3:02 [PATCH v7 00/12] iommu: Prepare to deliver page faults to user space Lu Baolu
                   ` (5 preceding siblings ...)
  2023-11-15  3:02 ` [PATCH v7 06/12] iommu: Remove iommu_[un]register_device_fault_handler() Lu Baolu
@ 2023-11-15  3:02 ` Lu Baolu
  2023-12-01 19:09   ` Jason Gunthorpe
  2023-12-04 12:40   ` Yi Liu
  2023-11-15  3:02 ` [PATCH v7 08/12] iommu: Prepare for separating SVA and IOPF Lu Baolu
                   ` (5 subsequent siblings)
  12 siblings, 2 replies; 49+ messages in thread
From: Lu Baolu @ 2023-11-15  3:02 UTC (permalink / raw)
  To: Joerg Roedel, Will Deacon, Robin Murphy, Jason Gunthorpe,
	Kevin Tian, Jean-Philippe Brucker, Nicolin Chen
  Cc: Yi Liu, Jacob Pan, Yan Zhao, iommu, kvm, linux-kernel, Lu Baolu

The iommu_fault_event and iopf_fault data structures store the same
information about an iopf fault. They are also used in the same way.
Merge these two data structures into a single one to make the code
more concise and easier to maintain.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Tested-by: Yan Zhao <yan.y.zhao@intel.com>
---
 include/linux/iommu.h                       | 27 ++++++---------------
 drivers/iommu/intel/iommu.h                 |  2 +-
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c |  4 +--
 drivers/iommu/intel/svm.c                   |  5 ++--
 drivers/iommu/io-pgfault.c                  |  5 ----
 drivers/iommu/iommu.c                       |  8 +++---
 6 files changed, 17 insertions(+), 34 deletions(-)

diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index a45d92cc31ec..42b62bc8737a 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -40,7 +40,6 @@ struct iommu_domain_ops;
 struct iommu_dirty_ops;
 struct notifier_block;
 struct iommu_sva;
-struct iommu_fault_event;
 struct iommu_dma_cookie;
 struct iopf_queue;
 
@@ -121,6 +120,11 @@ struct iommu_page_response {
 	u32	code;
 };
 
+struct iopf_fault {
+	struct iommu_fault fault;
+	/* node for pending lists */
+	struct list_head list;
+};
 
 /* iommu fault flags */
 #define IOMMU_FAULT_READ	0x0
@@ -480,7 +484,7 @@ struct iommu_ops {
 	int (*dev_disable_feat)(struct device *dev, enum iommu_dev_features f);
 
 	int (*page_response)(struct device *dev,
-			     struct iommu_fault_event *evt,
+			     struct iopf_fault *evt,
 			     struct iommu_page_response *msg);
 
 	int (*def_domain_type)(struct device *dev);
@@ -572,20 +576,6 @@ struct iommu_device {
 	u32 max_pasids;
 };
 
-/**
- * struct iommu_fault_event - Generic fault event
- *
- * Can represent recoverable faults such as a page requests or
- * unrecoverable faults such as DMA or IRQ remapping faults.
- *
- * @fault: fault descriptor
- * @list: pending fault event list, used for tracking responses
- */
-struct iommu_fault_event {
-	struct iommu_fault fault;
-	struct list_head list;
-};
-
 /**
  * struct iommu_fault_param - per-device IOMMU fault data
  * @lock: protect pending faults list
@@ -720,8 +710,7 @@ extern struct iommu_group *iommu_group_get(struct device *dev);
 extern struct iommu_group *iommu_group_ref_get(struct iommu_group *group);
 extern void iommu_group_put(struct iommu_group *group);
 
-extern int iommu_report_device_fault(struct device *dev,
-				     struct iommu_fault_event *evt);
+extern int iommu_report_device_fault(struct device *dev, struct iopf_fault *evt);
 extern int iommu_page_response(struct device *dev,
 			       struct iommu_page_response *msg);
 
@@ -1128,7 +1117,7 @@ static inline void iommu_group_put(struct iommu_group *group)
 }
 
 static inline
-int iommu_report_device_fault(struct device *dev, struct iommu_fault_event *evt)
+int iommu_report_device_fault(struct device *dev, struct iopf_fault *evt)
 {
 	return -ENODEV;
 }
diff --git a/drivers/iommu/intel/iommu.h b/drivers/iommu/intel/iommu.h
index 65d37a138c75..a1ddd5132aae 100644
--- a/drivers/iommu/intel/iommu.h
+++ b/drivers/iommu/intel/iommu.h
@@ -905,7 +905,7 @@ struct iommu_domain *intel_nested_domain_alloc(struct iommu_domain *parent,
 void intel_svm_check(struct intel_iommu *iommu);
 int intel_svm_enable_prq(struct intel_iommu *iommu);
 int intel_svm_finish_prq(struct intel_iommu *iommu);
-int intel_svm_page_response(struct device *dev, struct iommu_fault_event *evt,
+int intel_svm_page_response(struct device *dev, struct iopf_fault *evt,
 			    struct iommu_page_response *msg);
 struct iommu_domain *intel_svm_domain_alloc(void);
 void intel_svm_remove_dev_pasid(struct device *dev, ioasid_t pasid);
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 505400538a2e..46780793b743 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -922,7 +922,7 @@ static int arm_smmu_cmdq_batch_submit(struct arm_smmu_device *smmu,
 }
 
 static int arm_smmu_page_response(struct device *dev,
-				  struct iommu_fault_event *unused,
+				  struct iopf_fault *unused,
 				  struct iommu_page_response *resp)
 {
 	struct arm_smmu_cmdq_ent cmd = {0};
@@ -1467,7 +1467,7 @@ static int arm_smmu_handle_evt(struct arm_smmu_device *smmu, u64 *evt)
 	struct arm_smmu_master *master;
 	bool ssid_valid = evt[0] & EVTQ_0_SSV;
 	u32 sid = FIELD_GET(EVTQ_0_SID, evt[0]);
-	struct iommu_fault_event fault_evt = { };
+	struct iopf_fault fault_evt = { };
 	struct iommu_fault *flt = &fault_evt.fault;
 
 	switch (FIELD_GET(EVTQ_0_ID, evt[0])) {
diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c
index 50a481c895b8..9de349ea215c 100644
--- a/drivers/iommu/intel/svm.c
+++ b/drivers/iommu/intel/svm.c
@@ -543,13 +543,12 @@ static int prq_to_iommu_prot(struct page_req_dsc *req)
 static int intel_svm_prq_report(struct intel_iommu *iommu, struct device *dev,
 				struct page_req_dsc *desc)
 {
-	struct iommu_fault_event event;
+	struct iopf_fault event = { };
 
 	if (!dev || !dev_is_pci(dev))
 		return -ENODEV;
 
 	/* Fill in event data for device specific processing */
-	memset(&event, 0, sizeof(struct iommu_fault_event));
 	event.fault.type = IOMMU_FAULT_PAGE_REQ;
 	event.fault.prm.addr = (u64)desc->addr << VTD_PAGE_SHIFT;
 	event.fault.prm.pasid = desc->pasid;
@@ -721,7 +720,7 @@ static irqreturn_t prq_event_thread(int irq, void *d)
 }
 
 int intel_svm_page_response(struct device *dev,
-			    struct iommu_fault_event *evt,
+			    struct iopf_fault *evt,
 			    struct iommu_page_response *msg)
 {
 	struct iommu_fault_page_request *prm;
diff --git a/drivers/iommu/io-pgfault.c b/drivers/iommu/io-pgfault.c
index 31832aeacdba..c45977bb7da3 100644
--- a/drivers/iommu/io-pgfault.c
+++ b/drivers/iommu/io-pgfault.c
@@ -25,11 +25,6 @@ struct iopf_queue {
 	struct mutex			lock;
 };
 
-struct iopf_fault {
-	struct iommu_fault		fault;
-	struct list_head		list;
-};
-
 struct iopf_group {
 	struct iopf_fault		last_fault;
 	struct list_head		faults;
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 0c6700b6659a..36b597bb8a09 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -1312,10 +1312,10 @@ EXPORT_SYMBOL_GPL(iommu_group_put);
  *
  * Return 0 on success, or an error.
  */
-int iommu_report_device_fault(struct device *dev, struct iommu_fault_event *evt)
+int iommu_report_device_fault(struct device *dev, struct iopf_fault *evt)
 {
 	struct dev_iommu *param = dev->iommu;
-	struct iommu_fault_event *evt_pending = NULL;
+	struct iopf_fault *evt_pending = NULL;
 	struct iommu_fault_param *fparam;
 	int ret = 0;
 
@@ -1328,7 +1328,7 @@ int iommu_report_device_fault(struct device *dev, struct iommu_fault_event *evt)
 
 	if (evt->fault.type == IOMMU_FAULT_PAGE_REQ &&
 	    (evt->fault.prm.flags & IOMMU_FAULT_PAGE_REQUEST_LAST_PAGE)) {
-		evt_pending = kmemdup(evt, sizeof(struct iommu_fault_event),
+		evt_pending = kmemdup(evt, sizeof(struct iopf_fault),
 				      GFP_KERNEL);
 		if (!evt_pending) {
 			ret = -ENOMEM;
@@ -1357,7 +1357,7 @@ int iommu_page_response(struct device *dev,
 {
 	bool needs_pasid;
 	int ret = -EINVAL;
-	struct iommu_fault_event *evt;
+	struct iopf_fault *evt;
 	struct iommu_fault_page_request *prm;
 	struct dev_iommu *param = dev->iommu;
 	const struct iommu_ops *ops = dev_iommu_ops(dev);
-- 
2.34.1



* [PATCH v7 08/12] iommu: Prepare for separating SVA and IOPF
  2023-11-15  3:02 [PATCH v7 00/12] iommu: Prepare to deliver page faults to user space Lu Baolu
                   ` (6 preceding siblings ...)
  2023-11-15  3:02 ` [PATCH v7 07/12] iommu: Merge iommu_fault_event and iopf_fault Lu Baolu
@ 2023-11-15  3:02 ` Lu Baolu
  2023-12-05  7:10   ` Yi Liu
  2023-11-15  3:02 ` [PATCH v7 09/12] iommu: Make iommu_queue_iopf() more generic Lu Baolu
                   ` (4 subsequent siblings)
  12 siblings, 1 reply; 49+ messages in thread
From: Lu Baolu @ 2023-11-15  3:02 UTC (permalink / raw)
  To: Joerg Roedel, Will Deacon, Robin Murphy, Jason Gunthorpe,
	Kevin Tian, Jean-Philippe Brucker, Nicolin Chen
  Cc: Yi Liu, Jacob Pan, Yan Zhao, iommu, kvm, linux-kernel, Lu Baolu,
	Jason Gunthorpe

Move the iopf_group data structure to iommu.h so that it can represent
the minimal set of faults that a domain's page fault handler should
handle.

Add a new function, iopf_free_group(), to free a fault group after all
faults in the group are handled. This function will be made global so
that it can be called from other files, such as iommu-sva.c.

Move the iopf_queue data structure to iommu.h so that fault handling
work can be queued on its workqueue from outside this file.

This simplifies the subsequent patches.
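
Once iopf_free_group() is made global, a fault handler outside this
file is expected to consume a group roughly as follows (a sketch;
example_handle_one() stands in for the domain-specific handling):

static void example_consume_group(struct iopf_group *group)
{
	struct iopf_fault *iopf;

	list_for_each_entry(iopf, &group->faults, list)
		example_handle_one(&iopf->fault);	/* placeholder */

	/* Frees the partial faults in the group and the group itself. */
	iopf_free_group(group);
}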

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Tested-by: Yan Zhao <yan.y.zhao@intel.com>
---
 include/linux/iommu.h      | 20 +++++++++++++++++++-
 drivers/iommu/io-pgfault.c | 37 +++++++++++++------------------------
 2 files changed, 32 insertions(+), 25 deletions(-)

diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 42b62bc8737a..0d3c5a56b078 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -41,7 +41,6 @@ struct iommu_dirty_ops;
 struct notifier_block;
 struct iommu_sva;
 struct iommu_dma_cookie;
-struct iopf_queue;
 
 #define IOMMU_FAULT_PERM_READ	(1 << 0) /* read */
 #define IOMMU_FAULT_PERM_WRITE	(1 << 1) /* write */
@@ -126,6 +125,25 @@ struct iopf_fault {
 	struct list_head list;
 };
 
+struct iopf_group {
+	struct iopf_fault last_fault;
+	struct list_head faults;
+	struct work_struct work;
+	struct device *dev;
+};
+
+/**
+ * struct iopf_queue - IO Page Fault queue
+ * @wq: the fault workqueue
+ * @devices: devices attached to this queue
+ * @lock: protects the device list
+ */
+struct iopf_queue {
+	struct workqueue_struct *wq;
+	struct list_head devices;
+	struct mutex lock;
+};
+
 /* iommu fault flags */
 #define IOMMU_FAULT_READ	0x0
 #define IOMMU_FAULT_WRITE	0x1
diff --git a/drivers/iommu/io-pgfault.c b/drivers/iommu/io-pgfault.c
index c45977bb7da3..09e05f483b4f 100644
--- a/drivers/iommu/io-pgfault.c
+++ b/drivers/iommu/io-pgfault.c
@@ -13,24 +13,17 @@
 
 #include "iommu-sva.h"
 
-/**
- * struct iopf_queue - IO Page Fault queue
- * @wq: the fault workqueue
- * @devices: devices attached to this queue
- * @lock: protects the device list
- */
-struct iopf_queue {
-	struct workqueue_struct		*wq;
-	struct list_head		devices;
-	struct mutex			lock;
-};
+static void iopf_free_group(struct iopf_group *group)
+{
+	struct iopf_fault *iopf, *next;
 
-struct iopf_group {
-	struct iopf_fault		last_fault;
-	struct list_head		faults;
-	struct work_struct		work;
-	struct device			*dev;
-};
+	list_for_each_entry_safe(iopf, next, &group->faults, list) {
+		if (!(iopf->fault.prm.flags & IOMMU_FAULT_PAGE_REQUEST_LAST_PAGE))
+			kfree(iopf);
+	}
+
+	kfree(group);
+}
 
 static int iopf_complete_group(struct device *dev, struct iopf_fault *iopf,
 			       enum iommu_page_response_code status)
@@ -50,9 +43,9 @@ static int iopf_complete_group(struct device *dev, struct iopf_fault *iopf,
 
 static void iopf_handler(struct work_struct *work)
 {
+	struct iopf_fault *iopf;
 	struct iopf_group *group;
 	struct iommu_domain *domain;
-	struct iopf_fault *iopf, *next;
 	enum iommu_page_response_code status = IOMMU_PAGE_RESP_SUCCESS;
 
 	group = container_of(work, struct iopf_group, work);
@@ -61,7 +54,7 @@ static void iopf_handler(struct work_struct *work)
 	if (!domain || !domain->iopf_handler)
 		status = IOMMU_PAGE_RESP_INVALID;
 
-	list_for_each_entry_safe(iopf, next, &group->faults, list) {
+	list_for_each_entry(iopf, &group->faults, list) {
 		/*
 		 * For the moment, errors are sticky: don't handle subsequent
 		 * faults in the group if there is an error.
@@ -69,14 +62,10 @@ static void iopf_handler(struct work_struct *work)
 		if (status == IOMMU_PAGE_RESP_SUCCESS)
 			status = domain->iopf_handler(&iopf->fault,
 						      domain->fault_data);
-
-		if (!(iopf->fault.prm.flags &
-		      IOMMU_FAULT_PAGE_REQUEST_LAST_PAGE))
-			kfree(iopf);
 	}
 
 	iopf_complete_group(group->dev, &group->last_fault, status);
-	kfree(group);
+	iopf_free_group(group);
 }
 
 /**
-- 
2.34.1



* [PATCH v7 09/12] iommu: Make iommu_queue_iopf() more generic
  2023-11-15  3:02 [PATCH v7 00/12] iommu: Prepare to deliver page faults to user space Lu Baolu
                   ` (7 preceding siblings ...)
  2023-11-15  3:02 ` [PATCH v7 08/12] iommu: Prepare for separating SVA and IOPF Lu Baolu
@ 2023-11-15  3:02 ` Lu Baolu
  2023-12-01 19:14   ` Jason Gunthorpe
  2023-12-05  7:13   ` Yi Liu
  2023-11-15  3:02 ` [PATCH v7 10/12] iommu: Separate SVA and IOPF Lu Baolu
                   ` (3 subsequent siblings)
  12 siblings, 2 replies; 49+ messages in thread
From: Lu Baolu @ 2023-11-15  3:02 UTC (permalink / raw)
  To: Joerg Roedel, Will Deacon, Robin Murphy, Jason Gunthorpe,
	Kevin Tian, Jean-Philippe Brucker, Nicolin Chen
  Cc: Yi Liu, Jacob Pan, Yan Zhao, iommu, kvm, linux-kernel, Lu Baolu

Make iommu_queue_iopf() more generic by making the iopf_group a minimal
set of iopf's that a domain's iopf handler should handle and respond
to. Add a domain pointer to struct iopf_group so that the handler can
retrieve and use it directly.

Change iommu_queue_iopf() to forward groups of iopf's to the domain's
iopf handler. This is also a necessary step toward decoupling the sva
iopf handling code from this interface.
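
A minimal sketch of a consumer-side handler under the new interface
(example_iopf_handler() is hypothetical; group ownership follows the
rules in this patch):

static int example_iopf_handler(struct iopf_group *group)
{
	/* group->dev and group->domain are filled in by iommu_queue_iopf(). */
	dev_dbg(group->dev, "iopf group for pasid %u\n",
		group->last_fault.fault.prm.pasid);

	/*
	 * Take ownership of the group: respond to it and free it later,
	 * e.g. from a workqueue. A non-zero return tells
	 * iommu_queue_iopf() to free the group itself.
	 */
	return 0;
}

/* Installed on the domain: domain->iopf_handler = example_iopf_handler; */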

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Tested-by: Yan Zhao <yan.y.zhao@intel.com>
---
 include/linux/iommu.h      |  4 +--
 drivers/iommu/iommu-sva.h  |  6 ++---
 drivers/iommu/io-pgfault.c | 55 +++++++++++++++++++++++++++++---------
 drivers/iommu/iommu-sva.c  |  3 +--
 4 files changed, 48 insertions(+), 20 deletions(-)

diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 0d3c5a56b078..96f6f210093e 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -130,6 +130,7 @@ struct iopf_group {
 	struct list_head faults;
 	struct work_struct work;
 	struct device *dev;
+	struct iommu_domain *domain;
 };
 
 /**
@@ -209,8 +210,7 @@ struct iommu_domain {
 	unsigned long pgsize_bitmap;	/* Bitmap of page sizes in use */
 	struct iommu_domain_geometry geometry;
 	struct iommu_dma_cookie *iova_cookie;
-	enum iommu_page_response_code (*iopf_handler)(struct iommu_fault *fault,
-						      void *data);
+	int (*iopf_handler)(struct iopf_group *group);
 	void *fault_data;
 	union {
 		struct {
diff --git a/drivers/iommu/iommu-sva.h b/drivers/iommu/iommu-sva.h
index de7819c796ce..27c8da115b41 100644
--- a/drivers/iommu/iommu-sva.h
+++ b/drivers/iommu/iommu-sva.h
@@ -22,8 +22,7 @@ int iopf_queue_flush_dev(struct device *dev);
 struct iopf_queue *iopf_queue_alloc(const char *name);
 void iopf_queue_free(struct iopf_queue *queue);
 int iopf_queue_discard_partial(struct iopf_queue *queue);
-enum iommu_page_response_code
-iommu_sva_handle_iopf(struct iommu_fault *fault, void *data);
+int iommu_sva_handle_iopf(struct iopf_group *group);
 
 #else /* CONFIG_IOMMU_SVA */
 static inline int iommu_queue_iopf(struct iommu_fault *fault, struct device *dev)
@@ -62,8 +61,7 @@ static inline int iopf_queue_discard_partial(struct iopf_queue *queue)
 	return -ENODEV;
 }
 
-static inline enum iommu_page_response_code
-iommu_sva_handle_iopf(struct iommu_fault *fault, void *data)
+static inline int iommu_sva_handle_iopf(struct iopf_group *group)
 {
 	return IOMMU_PAGE_RESP_INVALID;
 }
diff --git a/drivers/iommu/io-pgfault.c b/drivers/iommu/io-pgfault.c
index 09e05f483b4f..544653de0d45 100644
--- a/drivers/iommu/io-pgfault.c
+++ b/drivers/iommu/io-pgfault.c
@@ -13,6 +13,9 @@
 
 #include "iommu-sva.h"
 
+enum iommu_page_response_code
+iommu_sva_handle_mm(struct iommu_fault *fault, struct mm_struct *mm);
+
 static void iopf_free_group(struct iopf_group *group)
 {
 	struct iopf_fault *iopf, *next;
@@ -45,23 +48,18 @@ static void iopf_handler(struct work_struct *work)
 {
 	struct iopf_fault *iopf;
 	struct iopf_group *group;
-	struct iommu_domain *domain;
 	enum iommu_page_response_code status = IOMMU_PAGE_RESP_SUCCESS;
 
 	group = container_of(work, struct iopf_group, work);
-	domain = iommu_get_domain_for_dev_pasid(group->dev,
-				group->last_fault.fault.prm.pasid, 0);
-	if (!domain || !domain->iopf_handler)
-		status = IOMMU_PAGE_RESP_INVALID;
-
 	list_for_each_entry(iopf, &group->faults, list) {
 		/*
 		 * For the moment, errors are sticky: don't handle subsequent
 		 * faults in the group if there is an error.
 		 */
-		if (status == IOMMU_PAGE_RESP_SUCCESS)
-			status = domain->iopf_handler(&iopf->fault,
-						      domain->fault_data);
+		if (status != IOMMU_PAGE_RESP_SUCCESS)
+			break;
+
+		status = iommu_sva_handle_mm(&iopf->fault, group->domain->mm);
 	}
 
 	iopf_complete_group(group->dev, &group->last_fault, status);
@@ -113,6 +111,7 @@ int iommu_queue_iopf(struct iommu_fault *fault, struct device *dev)
 	int ret;
 	struct iopf_group *group;
 	struct iopf_fault *iopf, *next;
+	struct iommu_domain *domain = NULL;
 	struct iommu_fault_param *iopf_param;
 	struct dev_iommu *param = dev->iommu;
 
@@ -143,6 +142,23 @@ int iommu_queue_iopf(struct iommu_fault *fault, struct device *dev)
 		return 0;
 	}
 
+	if (fault->prm.flags & IOMMU_FAULT_PAGE_REQUEST_PASID_VALID) {
+		domain = iommu_get_domain_for_dev_pasid(dev, fault->prm.pasid, 0);
+		if (IS_ERR(domain))
+			domain = NULL;
+	}
+
+	if (!domain)
+		domain = iommu_get_domain_for_dev(dev);
+
+	if (!domain || !domain->iopf_handler) {
+		dev_warn_ratelimited(dev,
+			"iopf (pasid %d) without domain attached or handler installed\n",
+			 fault->prm.pasid);
+		ret = -ENODEV;
+		goto cleanup_partial;
+	}
+
 	group = kzalloc(sizeof(*group), GFP_KERNEL);
 	if (!group) {
 		/*
@@ -157,8 +173,8 @@ int iommu_queue_iopf(struct iommu_fault *fault, struct device *dev)
 	group->dev = dev;
 	group->last_fault.fault = *fault;
 	INIT_LIST_HEAD(&group->faults);
+	group->domain = domain;
 	list_add(&group->last_fault.list, &group->faults);
-	INIT_WORK(&group->work, iopf_handler);
 
 	/* See if we have partial faults for this group */
 	list_for_each_entry_safe(iopf, next, &iopf_param->partial, list) {
@@ -167,9 +183,13 @@ int iommu_queue_iopf(struct iommu_fault *fault, struct device *dev)
 			list_move(&iopf->list, &group->faults);
 	}
 
-	queue_work(iopf_param->queue->wq, &group->work);
-	return 0;
+	mutex_unlock(&iopf_param->lock);
+	ret = domain->iopf_handler(group);
+	mutex_lock(&iopf_param->lock);
+	if (ret)
+		iopf_free_group(group);
 
+	return ret;
 cleanup_partial:
 	list_for_each_entry_safe(iopf, next, &iopf_param->partial, list) {
 		if (iopf->fault.prm.grpid == fault->prm.grpid) {
@@ -181,6 +201,17 @@ int iommu_queue_iopf(struct iommu_fault *fault, struct device *dev)
 }
 EXPORT_SYMBOL_GPL(iommu_queue_iopf);
 
+int iommu_sva_handle_iopf(struct iopf_group *group)
+{
+	struct iommu_fault_param *fault_param = group->dev->iommu->fault_param;
+
+	INIT_WORK(&group->work, iopf_handler);
+	if (!queue_work(fault_param->queue->wq, &group->work))
+		return -EBUSY;
+
+	return 0;
+}
+
 /**
  * iopf_queue_flush_dev - Ensure that all queued faults have been processed
  * @dev: the endpoint whose faults need to be flushed.
diff --git a/drivers/iommu/iommu-sva.c b/drivers/iommu/iommu-sva.c
index b78671a8a914..ba0d5b7e106a 100644
--- a/drivers/iommu/iommu-sva.c
+++ b/drivers/iommu/iommu-sva.c
@@ -149,11 +149,10 @@ EXPORT_SYMBOL_GPL(iommu_sva_get_pasid);
  * I/O page fault handler for SVA
  */
 enum iommu_page_response_code
-iommu_sva_handle_iopf(struct iommu_fault *fault, void *data)
+iommu_sva_handle_mm(struct iommu_fault *fault, struct mm_struct *mm)
 {
 	vm_fault_t ret;
 	struct vm_area_struct *vma;
-	struct mm_struct *mm = data;
 	unsigned int access_flags = 0;
 	unsigned int fault_flags = FAULT_FLAG_REMOTE;
 	struct iommu_fault_page_request *prm = &fault->prm;
-- 
2.34.1



* [PATCH v7 10/12] iommu: Separate SVA and IOPF
  2023-11-15  3:02 [PATCH v7 00/12] iommu: Prepare to deliver page faults to user space Lu Baolu
                   ` (8 preceding siblings ...)
  2023-11-15  3:02 ` [PATCH v7 09/12] iommu: Make iommu_queue_iopf() more generic Lu Baolu
@ 2023-11-15  3:02 ` Lu Baolu
  2023-11-15  3:02 ` [PATCH v7 11/12] iommu: Consolidate per-device fault data management Lu Baolu
                   ` (2 subsequent siblings)
  12 siblings, 0 replies; 49+ messages in thread
From: Lu Baolu @ 2023-11-15  3:02 UTC (permalink / raw)
  To: Joerg Roedel, Will Deacon, Robin Murphy, Jason Gunthorpe,
	Kevin Tian, Jean-Philippe Brucker, Nicolin Chen
  Cc: Yi Liu, Jacob Pan, Yan Zhao, iommu, kvm, linux-kernel, Lu Baolu,
	Jason Gunthorpe

Add CONFIG_IOMMU_IOPF for the page fault handling framework and select
it from its real consumers. Move the iopf function declarations from
iommu-sva.h to iommu.h and remove iommu-sva.h, as it is now empty.

Consolidate all SVA-related code into iommu-sva.c:
- Move iommu_sva_domain_alloc() from iommu.c to iommu-sva.c.
- Move the sva iopf handling code from io-pgfault.c to iommu-sva.c.

Consolidate iommu_report_device_fault() and iommu_page_response() into
io-pgfault.c.

Export iopf_free_group() and iopf_group_response() for iopf handlers
implemented in modules. Some functions are given more meaningful names.
No other intentional functionality changes.
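
With the two symbols exported, a modular iopf handler can complete a
group on its own. A minimal sketch modeled on the reworked SVA code
below (example_wq is an assumed workqueue owned by the module):

static void example_work_fn(struct work_struct *work)
{
	struct iopf_group *group = container_of(work, struct iopf_group, work);

	/* Resolve the faults here, then respond for the whole group... */
	iopf_group_response(group, IOMMU_PAGE_RESP_SUCCESS);
	/* ...and release it. */
	iopf_free_group(group);
}

static int example_domain_iopf_handler(struct iopf_group *group)
{
	INIT_WORK(&group->work, example_work_fn);
	if (!queue_work(example_wq, &group->work))
		return -EBUSY;

	return 0;
}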

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Tested-by: Yan Zhao <yan.y.zhao@intel.com>
---
 include/linux/iommu.h                         |  98 ++++++---
 drivers/iommu/iommu-sva.h                     |  69 -------
 .../iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c   |   1 -
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c   |   1 -
 drivers/iommu/intel/iommu.c                   |   1 -
 drivers/iommu/intel/svm.c                     |   1 -
 drivers/iommu/io-pgfault.c                    | 188 +++++++++++++-----
 drivers/iommu/iommu-sva.c                     |  63 +++++-
 drivers/iommu/iommu.c                         | 132 ------------
 drivers/iommu/Kconfig                         |   4 +
 drivers/iommu/Makefile                        |   3 +-
 drivers/iommu/intel/Kconfig                   |   1 +
 12 files changed, 274 insertions(+), 288 deletions(-)
 delete mode 100644 drivers/iommu/iommu-sva.h

diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 96f6f210093e..d19031c1b0e6 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -728,10 +728,6 @@ extern struct iommu_group *iommu_group_get(struct device *dev);
 extern struct iommu_group *iommu_group_ref_get(struct iommu_group *group);
 extern void iommu_group_put(struct iommu_group *group);
 
-extern int iommu_report_device_fault(struct device *dev, struct iopf_fault *evt);
-extern int iommu_page_response(struct device *dev,
-			       struct iommu_page_response *msg);
-
 extern int iommu_group_id(struct iommu_group *group);
 extern struct iommu_domain *iommu_group_default_domain(struct iommu_group *);
 
@@ -944,8 +940,6 @@ bool iommu_group_dma_owner_claimed(struct iommu_group *group);
 int iommu_device_claim_dma_owner(struct device *dev, void *owner);
 void iommu_device_release_dma_owner(struct device *dev);
 
-struct iommu_domain *iommu_sva_domain_alloc(struct device *dev,
-					    struct mm_struct *mm);
 int iommu_attach_device_pasid(struct iommu_domain *domain,
 			      struct device *dev, ioasid_t pasid);
 void iommu_detach_device_pasid(struct iommu_domain *domain,
@@ -1134,18 +1128,6 @@ static inline void iommu_group_put(struct iommu_group *group)
 {
 }
 
-static inline
-int iommu_report_device_fault(struct device *dev, struct iopf_fault *evt)
-{
-	return -ENODEV;
-}
-
-static inline int iommu_page_response(struct device *dev,
-				      struct iommu_page_response *msg)
-{
-	return -ENODEV;
-}
-
 static inline int iommu_group_id(struct iommu_group *group)
 {
 	return -ENODEV;
@@ -1294,12 +1276,6 @@ static inline int iommu_device_claim_dma_owner(struct device *dev, void *owner)
 	return -ENODEV;
 }
 
-static inline struct iommu_domain *
-iommu_sva_domain_alloc(struct device *dev, struct mm_struct *mm)
-{
-	return NULL;
-}
-
 static inline int iommu_attach_device_pasid(struct iommu_domain *domain,
 					    struct device *dev, ioasid_t pasid)
 {
@@ -1421,6 +1397,8 @@ struct iommu_sva *iommu_sva_bind_device(struct device *dev,
 					struct mm_struct *mm);
 void iommu_sva_unbind_device(struct iommu_sva *handle);
 u32 iommu_sva_get_pasid(struct iommu_sva *handle);
+struct iommu_domain *iommu_sva_domain_alloc(struct device *dev,
+					    struct mm_struct *mm);
 #else
 static inline struct iommu_sva *
 iommu_sva_bind_device(struct device *dev, struct mm_struct *mm)
@@ -1439,6 +1417,78 @@ static inline u32 iommu_sva_get_pasid(struct iommu_sva *handle)
 static inline void mm_pasid_init(struct mm_struct *mm) {}
 static inline bool mm_valid_pasid(struct mm_struct *mm) { return false; }
 static inline void mm_pasid_drop(struct mm_struct *mm) {}
+
+static inline struct iommu_domain *
+iommu_sva_domain_alloc(struct device *dev, struct mm_struct *mm)
+{
+	return NULL;
+}
 #endif /* CONFIG_IOMMU_SVA */
 
+#ifdef CONFIG_IOMMU_IOPF
+int iopf_queue_add_device(struct iopf_queue *queue, struct device *dev);
+int iopf_queue_remove_device(struct iopf_queue *queue, struct device *dev);
+int iopf_queue_flush_dev(struct device *dev);
+struct iopf_queue *iopf_queue_alloc(const char *name);
+void iopf_queue_free(struct iopf_queue *queue);
+int iopf_queue_discard_partial(struct iopf_queue *queue);
+void iopf_free_group(struct iopf_group *group);
+int iommu_report_device_fault(struct device *dev, struct iopf_fault *evt);
+int iommu_page_response(struct device *dev, struct iommu_page_response *msg);
+int iopf_group_response(struct iopf_group *group,
+			enum iommu_page_response_code status);
+#else
+static inline int
+iopf_queue_add_device(struct iopf_queue *queue, struct device *dev)
+{
+	return -ENODEV;
+}
+
+static inline int
+iopf_queue_remove_device(struct iopf_queue *queue, struct device *dev)
+{
+	return -ENODEV;
+}
+
+static inline int iopf_queue_flush_dev(struct device *dev)
+{
+	return -ENODEV;
+}
+
+static inline struct iopf_queue *iopf_queue_alloc(const char *name)
+{
+	return NULL;
+}
+
+static inline void iopf_queue_free(struct iopf_queue *queue)
+{
+}
+
+static inline int iopf_queue_discard_partial(struct iopf_queue *queue)
+{
+	return -ENODEV;
+}
+
+static inline void iopf_free_group(struct iopf_group *group)
+{
+}
+
+static inline int
+iommu_report_device_fault(struct device *dev, struct iopf_fault *evt)
+{
+	return -ENODEV;
+}
+
+static inline int
+iommu_page_response(struct device *dev, struct iommu_page_response *msg)
+{
+	return -ENODEV;
+}
+
+static inline int iopf_group_response(struct iopf_group *group,
+				      enum iommu_page_response_code status)
+{
+	return -ENODEV;
+}
+#endif /* CONFIG_IOMMU_IOPF */
 #endif /* __LINUX_IOMMU_H */
diff --git a/drivers/iommu/iommu-sva.h b/drivers/iommu/iommu-sva.h
deleted file mode 100644
index 27c8da115b41..000000000000
--- a/drivers/iommu/iommu-sva.h
+++ /dev/null
@@ -1,69 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 */
-/*
- * SVA library for IOMMU drivers
- */
-#ifndef _IOMMU_SVA_H
-#define _IOMMU_SVA_H
-
-#include <linux/mm_types.h>
-
-/* I/O Page fault */
-struct device;
-struct iommu_fault;
-struct iopf_queue;
-
-#ifdef CONFIG_IOMMU_SVA
-int iommu_queue_iopf(struct iommu_fault *fault, struct device *dev);
-
-int iopf_queue_add_device(struct iopf_queue *queue, struct device *dev);
-int iopf_queue_remove_device(struct iopf_queue *queue,
-			     struct device *dev);
-int iopf_queue_flush_dev(struct device *dev);
-struct iopf_queue *iopf_queue_alloc(const char *name);
-void iopf_queue_free(struct iopf_queue *queue);
-int iopf_queue_discard_partial(struct iopf_queue *queue);
-int iommu_sva_handle_iopf(struct iopf_group *group);
-
-#else /* CONFIG_IOMMU_SVA */
-static inline int iommu_queue_iopf(struct iommu_fault *fault, struct device *dev)
-{
-	return -ENODEV;
-}
-
-static inline int iopf_queue_add_device(struct iopf_queue *queue,
-					struct device *dev)
-{
-	return -ENODEV;
-}
-
-static inline int iopf_queue_remove_device(struct iopf_queue *queue,
-					   struct device *dev)
-{
-	return -ENODEV;
-}
-
-static inline int iopf_queue_flush_dev(struct device *dev)
-{
-	return -ENODEV;
-}
-
-static inline struct iopf_queue *iopf_queue_alloc(const char *name)
-{
-	return NULL;
-}
-
-static inline void iopf_queue_free(struct iopf_queue *queue)
-{
-}
-
-static inline int iopf_queue_discard_partial(struct iopf_queue *queue)
-{
-	return -ENODEV;
-}
-
-static inline int iommu_sva_handle_iopf(struct iopf_group *group)
-{
-	return IOMMU_PAGE_RESP_INVALID;
-}
-#endif /* CONFIG_IOMMU_SVA */
-#endif /* _IOMMU_SVA_H */
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
index 84c9554144cb..c8bdbb9ec8de 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
@@ -10,7 +10,6 @@
 #include <linux/slab.h>
 
 #include "arm-smmu-v3.h"
-#include "../../iommu-sva.h"
 #include "../../io-pgtable-arm.h"
 
 struct arm_smmu_mmu_notifier {
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 46780793b743..0c15161bc005 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -29,7 +29,6 @@
 
 #include "arm-smmu-v3.h"
 #include "../../dma-iommu.h"
-#include "../../iommu-sva.h"
 
 static bool disable_bypass = true;
 module_param(disable_bypass, bool, 0444);
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index cbe65827730d..05490fe21b16 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -27,7 +27,6 @@
 #include "iommu.h"
 #include "../dma-iommu.h"
 #include "../irq_remapping.h"
-#include "../iommu-sva.h"
 #include "pasid.h"
 #include "cap_audit.h"
 #include "perfmon.h"
diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c
index 9de349ea215c..780c5bd73ec2 100644
--- a/drivers/iommu/intel/svm.c
+++ b/drivers/iommu/intel/svm.c
@@ -22,7 +22,6 @@
 #include "iommu.h"
 #include "pasid.h"
 #include "perf.h"
-#include "../iommu-sva.h"
 #include "trace.h"
 
 static irqreturn_t prq_event_thread(int irq, void *d);
diff --git a/drivers/iommu/io-pgfault.c b/drivers/iommu/io-pgfault.c
index 544653de0d45..3c119bfa1d4a 100644
--- a/drivers/iommu/io-pgfault.c
+++ b/drivers/iommu/io-pgfault.c
@@ -11,12 +11,9 @@
 #include <linux/slab.h>
 #include <linux/workqueue.h>
 
-#include "iommu-sva.h"
+#include "iommu-priv.h"
 
-enum iommu_page_response_code
-iommu_sva_handle_mm(struct iommu_fault *fault, struct mm_struct *mm);
-
-static void iopf_free_group(struct iopf_group *group)
+void iopf_free_group(struct iopf_group *group)
 {
 	struct iopf_fault *iopf, *next;
 
@@ -27,47 +24,10 @@ static void iopf_free_group(struct iopf_group *group)
 
 	kfree(group);
 }
-
-static int iopf_complete_group(struct device *dev, struct iopf_fault *iopf,
-			       enum iommu_page_response_code status)
-{
-	struct iommu_page_response resp = {
-		.pasid			= iopf->fault.prm.pasid,
-		.grpid			= iopf->fault.prm.grpid,
-		.code			= status,
-	};
-
-	if ((iopf->fault.prm.flags & IOMMU_FAULT_PAGE_REQUEST_PASID_VALID) &&
-	    (iopf->fault.prm.flags & IOMMU_FAULT_PAGE_RESPONSE_NEEDS_PASID))
-		resp.flags = IOMMU_PAGE_RESP_PASID_VALID;
-
-	return iommu_page_response(dev, &resp);
-}
-
-static void iopf_handler(struct work_struct *work)
-{
-	struct iopf_fault *iopf;
-	struct iopf_group *group;
-	enum iommu_page_response_code status = IOMMU_PAGE_RESP_SUCCESS;
-
-	group = container_of(work, struct iopf_group, work);
-	list_for_each_entry(iopf, &group->faults, list) {
-		/*
-		 * For the moment, errors are sticky: don't handle subsequent
-		 * faults in the group if there is an error.
-		 */
-		if (status != IOMMU_PAGE_RESP_SUCCESS)
-			break;
-
-		status = iommu_sva_handle_mm(&iopf->fault, group->domain->mm);
-	}
-
-	iopf_complete_group(group->dev, &group->last_fault, status);
-	iopf_free_group(group);
-}
+EXPORT_SYMBOL_GPL(iopf_free_group);
 
 /**
- * iommu_queue_iopf - IO Page Fault handler
+ * iommu_handle_iopf - IO Page Fault handler
  * @fault: fault event
  * @dev: struct device.
  *
@@ -106,7 +66,7 @@ static void iopf_handler(struct work_struct *work)
  *
  * Return: 0 on success and <0 on error.
  */
-int iommu_queue_iopf(struct iommu_fault *fault, struct device *dev)
+static int iommu_handle_iopf(struct iommu_fault *fault, struct device *dev)
 {
 	int ret;
 	struct iopf_group *group;
@@ -199,18 +159,117 @@ int iommu_queue_iopf(struct iommu_fault *fault, struct device *dev)
 	}
 	return ret;
 }
-EXPORT_SYMBOL_GPL(iommu_queue_iopf);
 
-int iommu_sva_handle_iopf(struct iopf_group *group)
+/**
+ * iommu_report_device_fault() - Report fault event to device driver
+ * @dev: the device
+ * @evt: fault event data
+ *
+ * Called by IOMMU drivers when a fault is detected, typically in a threaded IRQ
+ * handler. When this function fails and the fault is recoverable, it is the
+ * caller's responsibility to complete the fault.
+ *
+ * Return 0 on success, or an error.
+ */
+int iommu_report_device_fault(struct device *dev, struct iopf_fault *evt)
 {
-	struct iommu_fault_param *fault_param = group->dev->iommu->fault_param;
+	struct dev_iommu *param = dev->iommu;
+	struct iopf_fault *evt_pending = NULL;
+	struct iommu_fault_param *fparam;
+	int ret = 0;
 
-	INIT_WORK(&group->work, iopf_handler);
-	if (!queue_work(fault_param->queue->wq, &group->work))
-		return -EBUSY;
+	if (!param || !evt)
+		return -EINVAL;
 
-	return 0;
+	/* we only report device fault if there is a handler registered */
+	mutex_lock(&param->lock);
+	fparam = param->fault_param;
+
+	if (evt->fault.type == IOMMU_FAULT_PAGE_REQ &&
+	    (evt->fault.prm.flags & IOMMU_FAULT_PAGE_REQUEST_LAST_PAGE)) {
+		evt_pending = kmemdup(evt, sizeof(struct iopf_fault),
+				      GFP_KERNEL);
+		if (!evt_pending) {
+			ret = -ENOMEM;
+			goto done_unlock;
+		}
+		mutex_lock(&fparam->lock);
+		list_add_tail(&evt_pending->list, &fparam->faults);
+		mutex_unlock(&fparam->lock);
+	}
+
+	ret = iommu_handle_iopf(&evt->fault, dev);
+	if (ret && evt_pending) {
+		mutex_lock(&fparam->lock);
+		list_del(&evt_pending->list);
+		mutex_unlock(&fparam->lock);
+		kfree(evt_pending);
+	}
+done_unlock:
+	mutex_unlock(&param->lock);
+	return ret;
+}
+EXPORT_SYMBOL_GPL(iommu_report_device_fault);
+
+int iommu_page_response(struct device *dev,
+			struct iommu_page_response *msg)
+{
+	bool needs_pasid;
+	int ret = -EINVAL;
+	struct iopf_fault *evt;
+	struct iommu_fault_page_request *prm;
+	struct dev_iommu *param = dev->iommu;
+	const struct iommu_ops *ops = dev_iommu_ops(dev);
+	bool has_pasid = msg->flags & IOMMU_PAGE_RESP_PASID_VALID;
+
+	if (!ops->page_response)
+		return -ENODEV;
+
+	if (!param || !param->fault_param)
+		return -EINVAL;
+
+	/* Only send response if there is a fault report pending */
+	mutex_lock(&param->fault_param->lock);
+	if (list_empty(&param->fault_param->faults)) {
+		dev_warn_ratelimited(dev, "no pending PRQ, drop response\n");
+		goto done_unlock;
+	}
+	/*
+	 * Check if we have a matching page request pending to respond,
+	 * otherwise return -EINVAL
+	 */
+	list_for_each_entry(evt, &param->fault_param->faults, list) {
+		prm = &evt->fault.prm;
+		if (prm->grpid != msg->grpid)
+			continue;
+
+		/*
+		 * If the PASID is required, the corresponding request is
+		 * matched using the group ID, the PASID valid bit and the PASID
+		 * value. Otherwise only the group ID matches request and
+		 * response.
+		 */
+		needs_pasid = prm->flags & IOMMU_FAULT_PAGE_RESPONSE_NEEDS_PASID;
+		if (needs_pasid && (!has_pasid || msg->pasid != prm->pasid))
+			continue;
+
+		if (!needs_pasid && has_pasid) {
+			/* No big deal, just clear it. */
+			msg->flags &= ~IOMMU_PAGE_RESP_PASID_VALID;
+			msg->pasid = 0;
+		}
+
+		ret = ops->page_response(dev, evt, msg);
+		list_del(&evt->list);
+		kfree(evt);
+		break;
+	}
+
+done_unlock:
+	mutex_unlock(&param->fault_param->lock);
+	return ret;
 }
+EXPORT_SYMBOL_GPL(iommu_page_response);
 
 /**
  * iopf_queue_flush_dev - Ensure that all queued faults have been processed
@@ -245,6 +304,31 @@ int iopf_queue_flush_dev(struct device *dev)
 }
 EXPORT_SYMBOL_GPL(iopf_queue_flush_dev);
 
+/**
+ * iopf_group_response - Respond a group of page faults
+ * @group: the group of faults with the same group id
+ * @status: the response code
+ *
+ * Return 0 on success and <0 on error.
+ */
+int iopf_group_response(struct iopf_group *group,
+			enum iommu_page_response_code status)
+{
+	struct iopf_fault *iopf = &group->last_fault;
+	struct iommu_page_response resp = {
+		.pasid = iopf->fault.prm.pasid,
+		.grpid = iopf->fault.prm.grpid,
+		.code = status,
+	};
+
+	if ((iopf->fault.prm.flags & IOMMU_FAULT_PAGE_REQUEST_PASID_VALID) &&
+	    (iopf->fault.prm.flags & IOMMU_FAULT_PAGE_RESPONSE_NEEDS_PASID))
+		resp.flags = IOMMU_PAGE_RESP_PASID_VALID;
+
+	return iommu_page_response(group->dev, &resp);
+}
+EXPORT_SYMBOL_GPL(iopf_group_response);
+
 /**
  * iopf_queue_discard_partial - Remove all pending partial fault
  * @queue: the queue whose partial faults need to be discarded
diff --git a/drivers/iommu/iommu-sva.c b/drivers/iommu/iommu-sva.c
index ba0d5b7e106a..f3fbafb50159 100644
--- a/drivers/iommu/iommu-sva.c
+++ b/drivers/iommu/iommu-sva.c
@@ -7,7 +7,7 @@
 #include <linux/sched/mm.h>
 #include <linux/iommu.h>
 
-#include "iommu-sva.h"
+#include "iommu-priv.h"
 
 static DEFINE_MUTEX(iommu_sva_lock);
 
@@ -145,10 +145,18 @@ u32 iommu_sva_get_pasid(struct iommu_sva *handle)
 }
 EXPORT_SYMBOL_GPL(iommu_sva_get_pasid);
 
+void mm_pasid_drop(struct mm_struct *mm)
+{
+	if (likely(!mm_valid_pasid(mm)))
+		return;
+
+	iommu_free_global_pasid(mm->pasid);
+}
+
 /*
  * I/O page fault handler for SVA
  */
-enum iommu_page_response_code
+static enum iommu_page_response_code
 iommu_sva_handle_mm(struct iommu_fault *fault, struct mm_struct *mm)
 {
 	vm_fault_t ret;
@@ -202,10 +210,53 @@ iommu_sva_handle_mm(struct iommu_fault *fault, struct mm_struct *mm)
 	return status;
 }
 
-void mm_pasid_drop(struct mm_struct *mm)
+static void iommu_sva_handle_iopf(struct work_struct *work)
 {
-	if (likely(!mm_valid_pasid(mm)))
-		return;
+	struct iopf_fault *iopf;
+	struct iopf_group *group;
+	enum iommu_page_response_code status = IOMMU_PAGE_RESP_SUCCESS;
 
-	iommu_free_global_pasid(mm->pasid);
+	group = container_of(work, struct iopf_group, work);
+	list_for_each_entry(iopf, &group->faults, list) {
+		/*
+		 * For the moment, errors are sticky: don't handle subsequent
+		 * faults in the group if there is an error.
+		 */
+		if (status != IOMMU_PAGE_RESP_SUCCESS)
+			break;
+
+		status = iommu_sva_handle_mm(&iopf->fault, group->domain->mm);
+	}
+
+	iopf_group_response(group, status);
+	iopf_free_group(group);
+}
+
+static int iommu_sva_iopf_handler(struct iopf_group *group)
+{
+	struct iommu_fault_param *fault_param = group->dev->iommu->fault_param;
+
+	INIT_WORK(&group->work, iommu_sva_handle_iopf);
+	if (!queue_work(fault_param->queue->wq, &group->work))
+		return -EBUSY;
+
+	return 0;
+}
+
+struct iommu_domain *iommu_sva_domain_alloc(struct device *dev,
+					    struct mm_struct *mm)
+{
+	const struct iommu_ops *ops = dev_iommu_ops(dev);
+	struct iommu_domain *domain;
+
+	domain = ops->domain_alloc(IOMMU_DOMAIN_SVA);
+	if (!domain)
+		return NULL;
+
+	domain->type = IOMMU_DOMAIN_SVA;
+	mmgrab(mm);
+	domain->mm = mm;
+	domain->iopf_handler = iommu_sva_iopf_handler;
+
+	return domain;
 }
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 36b597bb8a09..47ee2b891982 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -36,8 +36,6 @@
 #include "dma-iommu.h"
 #include "iommu-priv.h"
 
-#include "iommu-sva.h"
-
 static struct kset *iommu_group_kset;
 static DEFINE_IDA(iommu_group_ida);
 static DEFINE_IDA(iommu_global_pasid_ida);
@@ -1301,117 +1299,6 @@ void iommu_group_put(struct iommu_group *group)
 }
 EXPORT_SYMBOL_GPL(iommu_group_put);
 
-/**
- * iommu_report_device_fault() - Report fault event to device driver
- * @dev: the device
- * @evt: fault event data
- *
- * Called by IOMMU drivers when a fault is detected, typically in a threaded IRQ
- * handler. When this function fails and the fault is recoverable, it is the
- * caller's responsibility to complete the fault.
- *
- * Return 0 on success, or an error.
- */
-int iommu_report_device_fault(struct device *dev, struct iopf_fault *evt)
-{
-	struct dev_iommu *param = dev->iommu;
-	struct iopf_fault *evt_pending = NULL;
-	struct iommu_fault_param *fparam;
-	int ret = 0;
-
-	if (!param || !evt)
-		return -EINVAL;
-
-	/* we only report device fault if there is a handler registered */
-	mutex_lock(&param->lock);
-	fparam = param->fault_param;
-
-	if (evt->fault.type == IOMMU_FAULT_PAGE_REQ &&
-	    (evt->fault.prm.flags & IOMMU_FAULT_PAGE_REQUEST_LAST_PAGE)) {
-		evt_pending = kmemdup(evt, sizeof(struct iopf_fault),
-				      GFP_KERNEL);
-		if (!evt_pending) {
-			ret = -ENOMEM;
-			goto done_unlock;
-		}
-		mutex_lock(&fparam->lock);
-		list_add_tail(&evt_pending->list, &fparam->faults);
-		mutex_unlock(&fparam->lock);
-	}
-
-	ret = iommu_queue_iopf(&evt->fault, dev);
-	if (ret && evt_pending) {
-		mutex_lock(&fparam->lock);
-		list_del(&evt_pending->list);
-		mutex_unlock(&fparam->lock);
-		kfree(evt_pending);
-	}
-done_unlock:
-	mutex_unlock(&param->lock);
-	return ret;
-}
-EXPORT_SYMBOL_GPL(iommu_report_device_fault);
-
-int iommu_page_response(struct device *dev,
-			struct iommu_page_response *msg)
-{
-	bool needs_pasid;
-	int ret = -EINVAL;
-	struct iopf_fault *evt;
-	struct iommu_fault_page_request *prm;
-	struct dev_iommu *param = dev->iommu;
-	const struct iommu_ops *ops = dev_iommu_ops(dev);
-	bool has_pasid = msg->flags & IOMMU_PAGE_RESP_PASID_VALID;
-
-	if (!ops->page_response)
-		return -ENODEV;
-
-	if (!param || !param->fault_param)
-		return -EINVAL;
-
-	/* Only send response if there is a fault report pending */
-	mutex_lock(&param->fault_param->lock);
-	if (list_empty(&param->fault_param->faults)) {
-		dev_warn_ratelimited(dev, "no pending PRQ, drop response\n");
-		goto done_unlock;
-	}
-	/*
-	 * Check if we have a matching page request pending to respond,
-	 * otherwise return -EINVAL
-	 */
-	list_for_each_entry(evt, &param->fault_param->faults, list) {
-		prm = &evt->fault.prm;
-		if (prm->grpid != msg->grpid)
-			continue;
-
-		/*
-		 * If the PASID is required, the corresponding request is
-		 * matched using the group ID, the PASID valid bit and the PASID
-		 * value. Otherwise only the group ID matches request and
-		 * response.
-		 */
-		needs_pasid = prm->flags & IOMMU_FAULT_PAGE_RESPONSE_NEEDS_PASID;
-		if (needs_pasid && (!has_pasid || msg->pasid != prm->pasid))
-			continue;
-
-		if (!needs_pasid && has_pasid) {
-			/* No big deal, just clear it. */
-			msg->flags &= ~IOMMU_PAGE_RESP_PASID_VALID;
-			msg->pasid = 0;
-		}
-
-		ret = ops->page_response(dev, evt, msg);
-		list_del(&evt->list);
-		kfree(evt);
-		break;
-	}
-
-done_unlock:
-	mutex_unlock(&param->fault_param->lock);
-	return ret;
-}
-EXPORT_SYMBOL_GPL(iommu_page_response);
-
 /**
  * iommu_group_id - Return ID for a group
  * @group: the group to ID
@@ -3437,25 +3324,6 @@ struct iommu_domain *iommu_get_domain_for_dev_pasid(struct device *dev,
 }
 EXPORT_SYMBOL_GPL(iommu_get_domain_for_dev_pasid);
 
-struct iommu_domain *iommu_sva_domain_alloc(struct device *dev,
-					    struct mm_struct *mm)
-{
-	const struct iommu_ops *ops = dev_iommu_ops(dev);
-	struct iommu_domain *domain;
-
-	domain = ops->domain_alloc(IOMMU_DOMAIN_SVA);
-	if (!domain)
-		return NULL;
-
-	domain->type = IOMMU_DOMAIN_SVA;
-	mmgrab(mm);
-	domain->mm = mm;
-	domain->iopf_handler = iommu_sva_handle_iopf;
-	domain->fault_data = mm;
-
-	return domain;
-}
-
 ioasid_t iommu_alloc_global_pasid(struct device *dev)
 {
 	int ret;
diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index 7673bb82945b..4d6291664ca6 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -162,6 +162,9 @@ config IOMMU_DMA
 config IOMMU_SVA
 	bool
 
+config IOMMU_IOPF
+	bool
+
 config FSL_PAMU
 	bool "Freescale IOMMU support"
 	depends on PCI
@@ -397,6 +400,7 @@ config ARM_SMMU_V3_SVA
 	bool "Shared Virtual Addressing support for the ARM SMMUv3"
 	depends on ARM_SMMU_V3
 	select IOMMU_SVA
+	select IOMMU_IOPF
 	select MMU_NOTIFIER
 	help
 	  Support for sharing process address spaces with devices using the
diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
index 95ad9dbfbda0..542760d963ec 100644
--- a/drivers/iommu/Makefile
+++ b/drivers/iommu/Makefile
@@ -26,6 +26,7 @@ obj-$(CONFIG_FSL_PAMU) += fsl_pamu.o fsl_pamu_domain.o
 obj-$(CONFIG_S390_IOMMU) += s390-iommu.o
 obj-$(CONFIG_HYPERV_IOMMU) += hyperv-iommu.o
 obj-$(CONFIG_VIRTIO_IOMMU) += virtio-iommu.o
-obj-$(CONFIG_IOMMU_SVA) += iommu-sva.o io-pgfault.o
+obj-$(CONFIG_IOMMU_SVA) += iommu-sva.o
+obj-$(CONFIG_IOMMU_IOPF) += io-pgfault.o
 obj-$(CONFIG_SPRD_IOMMU) += sprd-iommu.o
 obj-$(CONFIG_APPLE_DART) += apple-dart.o
diff --git a/drivers/iommu/intel/Kconfig b/drivers/iommu/intel/Kconfig
index 012cd2541a68..a4a125666293 100644
--- a/drivers/iommu/intel/Kconfig
+++ b/drivers/iommu/intel/Kconfig
@@ -51,6 +51,7 @@ config INTEL_IOMMU_SVM
 	depends on X86_64
 	select MMU_NOTIFIER
 	select IOMMU_SVA
+	select IOMMU_IOPF
 	help
 	  Shared Virtual Memory (SVM) provides a facility for devices
 	  to access DMA resources through process address space by
-- 
2.34.1



* [PATCH v7 11/12] iommu: Consolidate per-device fault data management
  2023-11-15  3:02 [PATCH v7 00/12] iommu: Prepare to deliver page faults to user space Lu Baolu
                   ` (9 preceding siblings ...)
  2023-11-15  3:02 ` [PATCH v7 10/12] iommu: Separate SVA and IOPF Lu Baolu
@ 2023-11-15  3:02 ` Lu Baolu
  2023-12-01 19:46   ` Jason Gunthorpe
  2023-11-15  3:02 ` [PATCH v7 12/12] iommu: Improve iopf_queue_flush_dev() Lu Baolu
  2023-11-24  6:30 ` [PATCH v7 00/12] iommu: Prepare to deliver page faults to user space liulongfang
  12 siblings, 1 reply; 49+ messages in thread
From: Lu Baolu @ 2023-11-15  3:02 UTC (permalink / raw)
  To: Joerg Roedel, Will Deacon, Robin Murphy, Jason Gunthorpe,
	Kevin Tian, Jean-Philippe Brucker, Nicolin Chen
  Cc: Yi Liu, Jacob Pan, Yan Zhao, iommu, kvm, linux-kernel, Lu Baolu

The per-device fault data structure stores information about faults
that occur on a device. It is allocated when IOPF is enabled on the
device and freed when IOPF is disabled. The data is used in the paths
of iopf reporting, handling, responding, and draining.

The fault data is protected by two locks:

- dev->iommu->lock: protects the allocation and freeing of the fault
  data.
- dev->iommu->fault_param->lock: protects the fault data itself.

To make the code simpler and easier to maintain, consolidate the lock
mechanism into two helper functions, as sketched after this
description.

The dev->iommu->fault_param->lock is also taken in
iopf_queue_discard_partial() to improve code readability. This does not
fix any existing issue, as iopf_queue_discard_partial() is only used in
the VT-d driver's prq_event_thread(), a single-threaded path that
reports the IOPFs.
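
The resulting pattern in the reporting, responding, and draining paths
is a simple get/use/put sequence (a sketch of the helpers added below;
the surrounding function is illustrative only):

static int example_walk_faults(struct device *dev)
{
	struct iommu_fault_param *fault_param = iopf_get_dev_fault_param(dev);

	if (!fault_param)
		return -ENODEV;

	mutex_lock(&fault_param->lock);
	/* ... walk fault_param->faults or fault_param->partial ... */
	mutex_unlock(&fault_param->lock);

	iopf_put_dev_fault_param(fault_param);
	return 0;
}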

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Tested-by: Yan Zhao <yan.y.zhao@intel.com>
---
 include/linux/iommu.h      |   3 +
 drivers/iommu/io-pgfault.c | 122 +++++++++++++++++++++++--------------
 2 files changed, 79 insertions(+), 46 deletions(-)

diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index d19031c1b0e6..c17d5979d70d 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -597,6 +597,8 @@ struct iommu_device {
 /**
  * struct iommu_fault_param - per-device IOMMU fault data
  * @lock: protect pending faults list
+ * @users: user counter to manage the lifetime of the data, this field
+ *         is protected by dev->iommu->lock.
  * @dev: the device that owns this param
  * @queue: IOPF queue
  * @queue_list: index into queue->devices
@@ -606,6 +608,7 @@ struct iommu_device {
  */
 struct iommu_fault_param {
 	struct mutex lock;
+	int users;
 
 	struct device *dev;
 	struct iopf_queue *queue;
diff --git a/drivers/iommu/io-pgfault.c b/drivers/iommu/io-pgfault.c
index 3c119bfa1d4a..b80574323cbc 100644
--- a/drivers/iommu/io-pgfault.c
+++ b/drivers/iommu/io-pgfault.c
@@ -26,6 +26,49 @@ void iopf_free_group(struct iopf_group *group)
 }
 EXPORT_SYMBOL_GPL(iopf_free_group);
 
+/*
+ * Return the fault parameter of a device if it exists. Otherwise, return NULL.
+ * On a successful return, the caller takes a reference of this parameter and
+ * should put it after use by calling iopf_put_dev_fault_param().
+ */
+static struct iommu_fault_param *iopf_get_dev_fault_param(struct device *dev)
+{
+	struct dev_iommu *param = dev->iommu;
+	struct iommu_fault_param *fault_param;
+
+	if (!param)
+		return NULL;
+
+	mutex_lock(&param->lock);
+	fault_param = param->fault_param;
+	if (fault_param)
+		fault_param->users++;
+	mutex_unlock(&param->lock);
+
+	return fault_param;
+}
+
+/* Caller must hold a reference of the fault parameter. */
+static void iopf_put_dev_fault_param(struct iommu_fault_param *fault_param)
+{
+	struct device *dev = fault_param->dev;
+	struct dev_iommu *param = dev->iommu;
+
+	mutex_lock(&param->lock);
+	if (WARN_ON(fault_param->users <= 0 ||
+		    fault_param != param->fault_param)) {
+		mutex_unlock(&param->lock);
+		return;
+	}
+
+	if (--fault_param->users == 0) {
+		param->fault_param = NULL;
+		kfree(fault_param);
+		put_device(dev);
+	}
+	mutex_unlock(&param->lock);
+}
+
 /**
  * iommu_handle_iopf - IO Page Fault handler
  * @fault: fault event
@@ -72,23 +115,14 @@ static int iommu_handle_iopf(struct iommu_fault *fault, struct device *dev)
 	struct iopf_group *group;
 	struct iopf_fault *iopf, *next;
 	struct iommu_domain *domain = NULL;
-	struct iommu_fault_param *iopf_param;
-	struct dev_iommu *param = dev->iommu;
+	struct iommu_fault_param *iopf_param = dev->iommu->fault_param;
 
-	lockdep_assert_held(&param->lock);
+	lockdep_assert_held(&iopf_param->lock);
 
 	if (fault->type != IOMMU_FAULT_PAGE_REQ)
 		/* Not a recoverable page fault */
 		return -EOPNOTSUPP;
 
-	/*
-	 * As long as we're holding param->lock, the queue can't be unlinked
-	 * from the device and therefore cannot disappear.
-	 */
-	iopf_param = param->fault_param;
-	if (!iopf_param)
-		return -ENODEV;
-
 	if (!(fault->prm.flags & IOMMU_FAULT_PAGE_REQUEST_LAST_PAGE)) {
 		iopf = kzalloc(sizeof(*iopf), GFP_KERNEL);
 		if (!iopf)
@@ -173,18 +207,15 @@ static int iommu_handle_iopf(struct iommu_fault *fault, struct device *dev)
  */
 int iommu_report_device_fault(struct device *dev, struct iopf_fault *evt)
 {
-	struct dev_iommu *param = dev->iommu;
+	struct iommu_fault_param *fault_param;
 	struct iopf_fault *evt_pending = NULL;
-	struct iommu_fault_param *fparam;
 	int ret = 0;
 
-	if (!param || !evt)
+	fault_param = iopf_get_dev_fault_param(dev);
+	if (!fault_param)
 		return -EINVAL;
 
-	/* we only report device fault if there is a handler registered */
-	mutex_lock(&param->lock);
-	fparam = param->fault_param;
-
+	mutex_lock(&fault_param->lock);
 	if (evt->fault.type == IOMMU_FAULT_PAGE_REQ &&
 	    (evt->fault.prm.flags & IOMMU_FAULT_PAGE_REQUEST_LAST_PAGE)) {
 		evt_pending = kmemdup(evt, sizeof(struct iopf_fault),
@@ -193,20 +224,18 @@ int iommu_report_device_fault(struct device *dev, struct iopf_fault *evt)
 			ret = -ENOMEM;
 			goto done_unlock;
 		}
-		mutex_lock(&fparam->lock);
-		list_add_tail(&evt_pending->list, &fparam->faults);
-		mutex_unlock(&fparam->lock);
+		list_add_tail(&evt_pending->list, &fault_param->faults);
 	}
 
 	ret = iommu_handle_iopf(&evt->fault, dev);
 	if (ret && evt_pending) {
-		mutex_lock(&fparam->lock);
 		list_del(&evt_pending->list);
-		mutex_unlock(&fparam->lock);
 		kfree(evt_pending);
 	}
 done_unlock:
-	mutex_unlock(&param->lock);
+	mutex_unlock(&fault_param->lock);
+	iopf_put_dev_fault_param(fault_param);
+
 	return ret;
 }
 EXPORT_SYMBOL_GPL(iommu_report_device_fault);
@@ -218,19 +247,20 @@ int iommu_page_response(struct device *dev,
 	int ret = -EINVAL;
 	struct iopf_fault *evt;
 	struct iommu_fault_page_request *prm;
-	struct dev_iommu *param = dev->iommu;
+	struct iommu_fault_param *fault_param;
 	const struct iommu_ops *ops = dev_iommu_ops(dev);
 	bool has_pasid = msg->flags & IOMMU_PAGE_RESP_PASID_VALID;
 
 	if (!ops->page_response)
 		return -ENODEV;
 
-	if (!param || !param->fault_param)
-		return -EINVAL;
+	fault_param = iopf_get_dev_fault_param(dev);
+	if (!fault_param)
+		return -ENODEV;
 
 	/* Only send response if there is a fault report pending */
-	mutex_lock(&param->fault_param->lock);
-	if (list_empty(&param->fault_param->faults)) {
+	mutex_lock(&fault_param->lock);
+	if (list_empty(&fault_param->faults)) {
 		dev_warn_ratelimited(dev, "no pending PRQ, drop response\n");
 		goto done_unlock;
 	}
@@ -238,7 +268,7 @@ int iommu_page_response(struct device *dev,
 	 * Check if we have a matching page request pending to respond,
 	 * otherwise return -EINVAL
 	 */
-	list_for_each_entry(evt, &param->fault_param->faults, list) {
+	list_for_each_entry(evt, &fault_param->faults, list) {
 		prm = &evt->fault.prm;
 		if (prm->grpid != msg->grpid)
 			continue;
@@ -266,7 +296,9 @@ int iommu_page_response(struct device *dev,
 	}
 
 done_unlock:
-	mutex_unlock(&param->fault_param->lock);
+	mutex_unlock(&fault_param->lock);
+	iopf_put_dev_fault_param(fault_param);
+
 	return ret;
 }
 EXPORT_SYMBOL_GPL(iommu_page_response);
@@ -285,22 +317,15 @@ EXPORT_SYMBOL_GPL(iommu_page_response);
  */
 int iopf_queue_flush_dev(struct device *dev)
 {
-	int ret = 0;
-	struct iommu_fault_param *iopf_param;
-	struct dev_iommu *param = dev->iommu;
+	struct iommu_fault_param *iopf_param = iopf_get_dev_fault_param(dev);
 
-	if (!param)
+	if (!iopf_param)
 		return -ENODEV;
 
-	mutex_lock(&param->lock);
-	iopf_param = param->fault_param;
-	if (iopf_param)
-		flush_workqueue(iopf_param->queue->wq);
-	else
-		ret = -ENODEV;
-	mutex_unlock(&param->lock);
+	flush_workqueue(iopf_param->queue->wq);
+	iopf_put_dev_fault_param(iopf_param);
 
-	return ret;
+	return 0;
 }
 EXPORT_SYMBOL_GPL(iopf_queue_flush_dev);
 
@@ -349,11 +374,13 @@ int iopf_queue_discard_partial(struct iopf_queue *queue)
 
 	mutex_lock(&queue->lock);
 	list_for_each_entry(iopf_param, &queue->devices, queue_list) {
+		mutex_lock(&iopf_param->lock);
 		list_for_each_entry_safe(iopf, next, &iopf_param->partial,
 					 list) {
 			list_del(&iopf->list);
 			kfree(iopf);
 		}
+		mutex_unlock(&iopf_param->lock);
 	}
 	mutex_unlock(&queue->lock);
 	return 0;
@@ -392,6 +419,7 @@ int iopf_queue_add_device(struct iopf_queue *queue, struct device *dev)
 	INIT_LIST_HEAD(&fault_param->faults);
 	INIT_LIST_HEAD(&fault_param->partial);
 	fault_param->dev = dev;
+	fault_param->users = 1;
 	list_add(&fault_param->queue_list, &queue->devices);
 	fault_param->queue = queue;
 
@@ -444,9 +472,11 @@ int iopf_queue_remove_device(struct iopf_queue *queue, struct device *dev)
 	list_for_each_entry_safe(iopf, next, &fault_param->partial, list)
 		kfree(iopf);
 
-	param->fault_param = NULL;
-	kfree(fault_param);
-	put_device(dev);
+	if (--fault_param->users == 0) {
+		param->fault_param = NULL;
+		kfree(fault_param);
+		put_device(dev);
+	}
 unlock:
 	mutex_unlock(&param->lock);
 	mutex_unlock(&queue->lock);
-- 
2.34.1



* [PATCH v7 12/12] iommu: Improve iopf_queue_flush_dev()
  2023-11-15  3:02 [PATCH v7 00/12] iommu: Prepare to deliver page faults to user space Lu Baolu
                   ` (10 preceding siblings ...)
  2023-11-15  3:02 ` [PATCH v7 11/12] iommu: Consolidate per-device fault data management Lu Baolu
@ 2023-11-15  3:02 ` Lu Baolu
  2023-12-01 20:35   ` Jason Gunthorpe
  2023-11-24  6:30 ` [PATCH v7 00/12] iommu: Prepare to deliver page faults to user space liulongfang
  12 siblings, 1 reply; 49+ messages in thread
From: Lu Baolu @ 2023-11-15  3:02 UTC (permalink / raw)
  To: Joerg Roedel, Will Deacon, Robin Murphy, Jason Gunthorpe,
	Kevin Tian, Jean-Philippe Brucker, Nicolin Chen
  Cc: Yi Liu, Jacob Pan, Yan Zhao, iommu, kvm, linux-kernel, Lu Baolu

iopf_queue_flush_dev() is called by the iommu driver before releasing
a PASID. It ensures that all pending faults for this PASID have been
handled or cancelled and won't hit the address space that reuses this
PASID. The driver must make sure that no new fault is added to the
queue.

The SMMUv3 driver doesn't use it because it only implements the
Arm-specific stall fault model, where DMA transactions are held in the
SMMU while waiting for the OS to handle iopf's. Since a device driver
must complete all DMA transactions before detaching the domain, there
are no pending iopf's with the stall model. PRI support will require
adding a call to iopf_queue_flush_dev() after flushing the hardware
page fault queue.

The current implementation of iopf_queue_flush_dev() is a simplified
version that is only suitable for the SVA case, in which iopf
processing is implemented entirely within the iommu subsystem.

Improve this interface so that it also works for iopf handling outside
of the iommu core. Rename the function with a more meaningful name.
Remove a warning message in iommu_page_response(), since the iopf queue
might be flushed before possible pending responses arrive.
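
The expected call order in an iommu driver, as the VT-d hunk below
illustrates, is roughly (a sketch; example_drain_hw_prq() stands in
for the driver's low-level queue flush):

static void example_release_pasid(struct device *dev, ioasid_t pasid)
{
	/* 1. Make sure the hardware adds no new faults for this PASID. */
	example_drain_hw_prq(dev, pasid);	/* placeholder */

	/* 2. Flush the workqueue and respond to anything still queued. */
	iopf_queue_discard_dev_pasid(dev, pasid);
}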

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Tested-by: Yan Zhao <yan.y.zhao@intel.com>
---
 include/linux/iommu.h      |  4 +--
 drivers/iommu/intel/svm.c  |  2 +-
 drivers/iommu/io-pgfault.c | 60 ++++++++++++++++++++++++++++++--------
 3 files changed, 51 insertions(+), 15 deletions(-)

diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index c17d5979d70d..cd3cdeb69f49 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -1431,7 +1431,7 @@ iommu_sva_domain_alloc(struct device *dev, struct mm_struct *mm)
 #ifdef CONFIG_IOMMU_IOPF
 int iopf_queue_add_device(struct iopf_queue *queue, struct device *dev);
 int iopf_queue_remove_device(struct iopf_queue *queue, struct device *dev);
-int iopf_queue_flush_dev(struct device *dev);
+int iopf_queue_discard_dev_pasid(struct device *dev, ioasid_t pasid);
 struct iopf_queue *iopf_queue_alloc(const char *name);
 void iopf_queue_free(struct iopf_queue *queue);
 int iopf_queue_discard_partial(struct iopf_queue *queue);
@@ -1453,7 +1453,7 @@ iopf_queue_remove_device(struct iopf_queue *queue, struct device *dev)
 	return -ENODEV;
 }
 
-static inline int iopf_queue_flush_dev(struct device *dev)
+static inline int iopf_queue_discard_dev_pasid(struct device *dev, ioasid_t pasid)
 {
 	return -ENODEV;
 }
diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c
index 780c5bd73ec2..659de9c16024 100644
--- a/drivers/iommu/intel/svm.c
+++ b/drivers/iommu/intel/svm.c
@@ -495,7 +495,7 @@ void intel_drain_pasid_prq(struct device *dev, u32 pasid)
 		goto prq_retry;
 	}
 
-	iopf_queue_flush_dev(dev);
+	iopf_queue_discard_dev_pasid(dev, pasid);
 
 	/*
 	 * Perform steps described in VT-d spec CH7.10 to drain page
diff --git a/drivers/iommu/io-pgfault.c b/drivers/iommu/io-pgfault.c
index b80574323cbc..b288c73f2b22 100644
--- a/drivers/iommu/io-pgfault.c
+++ b/drivers/iommu/io-pgfault.c
@@ -260,10 +260,9 @@ int iommu_page_response(struct device *dev,
 
 	/* Only send response if there is a fault report pending */
 	mutex_lock(&fault_param->lock);
-	if (list_empty(&fault_param->faults)) {
-		dev_warn_ratelimited(dev, "no pending PRQ, drop response\n");
+	if (list_empty(&fault_param->faults))
 		goto done_unlock;
-	}
+
 	/*
 	 * Check if we have a matching page request pending to respond,
 	 * otherwise return -EINVAL
@@ -304,30 +303,67 @@ int iommu_page_response(struct device *dev,
 EXPORT_SYMBOL_GPL(iommu_page_response);
 
 /**
- * iopf_queue_flush_dev - Ensure that all queued faults have been processed
- * @dev: the endpoint whose faults need to be flushed.
+ * iopf_queue_discard_dev_pasid - Discard all pending faults for a PASID
+ * @dev: the endpoint whose faults need to be discarded.
+ * @pasid: the PASID of the endpoint.
  *
  * The IOMMU driver calls this before releasing a PASID, to ensure that all
- * pending faults for this PASID have been handled, and won't hit the address
- * space of the next process that uses this PASID. The driver must make sure
- * that no new fault is added to the queue. In particular it must flush its
- * low-level queue before calling this function.
+ * pending faults for this PASID have been handled or dropped, and won't hit
+ * the address space of the next process that uses this PASID. The driver
+ * must make sure that no new fault is added to the queue. In particular it
+ * must flush its low-level queue before calling this function.
  *
  * Return: 0 on success and <0 on error.
  */
-int iopf_queue_flush_dev(struct device *dev)
+int iopf_queue_discard_dev_pasid(struct device *dev, ioasid_t pasid)
 {
 	struct iommu_fault_param *iopf_param = iopf_get_dev_fault_param(dev);
+	const struct iommu_ops *ops = dev_iommu_ops(dev);
+	struct iommu_page_response resp;
+	struct iopf_fault *iopf, *next;
+	int ret = 0;
 
 	if (!iopf_param)
 		return -ENODEV;
 
 	flush_workqueue(iopf_param->queue->wq);
+
+	mutex_lock(&iopf_param->lock);
+	list_for_each_entry_safe(iopf, next, &iopf_param->partial, list) {
+		if (!(iopf->fault.prm.flags & IOMMU_FAULT_PAGE_REQUEST_PASID_VALID) ||
+		    iopf->fault.prm.pasid != pasid)
+			break;
+
+		list_del(&iopf->list);
+		kfree(iopf);
+	}
+
+	list_for_each_entry_safe(iopf, next, &iopf_param->faults, list) {
+		if (!(iopf->fault.prm.flags & IOMMU_FAULT_PAGE_REQUEST_PASID_VALID) ||
+		    iopf->fault.prm.pasid != pasid)
+			continue;
+
+		memset(&resp, 0, sizeof(struct iommu_page_response));
+		resp.pasid = iopf->fault.prm.pasid;
+		resp.grpid = iopf->fault.prm.grpid;
+		resp.code = IOMMU_PAGE_RESP_INVALID;
+
+		if (iopf->fault.prm.flags & IOMMU_FAULT_PAGE_RESPONSE_NEEDS_PASID)
+			resp.flags = IOMMU_PAGE_RESP_PASID_VALID;
+
+		ret = ops->page_response(dev, iopf, &resp);
+		if (ret)
+			break;
+
+		list_del(&iopf->list);
+		kfree(iopf);
+	}
+	mutex_unlock(&iopf_param->lock);
 	iopf_put_dev_fault_param(iopf_param);
 
-	return 0;
+	return ret;
 }
-EXPORT_SYMBOL_GPL(iopf_queue_flush_dev);
+EXPORT_SYMBOL_GPL(iopf_queue_discard_dev_pasid);
 
 /**
  * iopf_group_response - Respond a group of page faults
-- 
2.34.1


* Re: [PATCH v7 00/12] iommu: Prepare to deliver page faults to user space
  2023-11-15  3:02 [PATCH v7 00/12] iommu: Prepare to deliver page faults to user space Lu Baolu
                   ` (11 preceding siblings ...)
  2023-11-15  3:02 ` [PATCH v7 12/12] iommu: Improve iopf_queue_flush_dev() Lu Baolu
@ 2023-11-24  6:30 ` liulongfang
  2023-11-24 12:01   ` Baolu Lu
  12 siblings, 1 reply; 49+ messages in thread
From: liulongfang @ 2023-11-24  6:30 UTC (permalink / raw)
  To: Lu Baolu, Joerg Roedel, Will Deacon, Robin Murphy,
	Jason Gunthorpe, Kevin Tian, Jean-Philippe Brucker, Nicolin Chen
  Cc: Yi Liu, Jacob Pan, Yan Zhao, iommu, kvm, linux-kernel

On 2023/11/15 11:02, Lu Baolu wrote:
> [...]

Tested-by: Longfang Liu <liulongfang@huawei.com>

The Arm SVA mode based on the HiSilicon crypto accelerator completed the
functional and performance tests of page fault scenarios.
1. The IOMMU page fault handling works correctly.
2. Performance tests on a 128-core Arm platform show that performance is
reduced:

Threads  Performance
8         -0.77%
16        -1.1%
32        -0.31%
64        -0.49%
128       -0.72%
256       -1.7%
384       -4.94%
512       NA (iopf timeout)

Finally, continuing to increase the number of threads causes the IOMMU's
page fault processing to time out (more than 4.2 seconds). This problem
occurs both in the baseline version (kernel 6.7-rc1) and in the patched
version.

Thanks.
Longfang.

* Re: [PATCH v7 00/12] iommu: Prepare to deliver page faults to user space
  2023-11-24  6:30 ` [PATCH v7 00/12] iommu: Prepare to deliver page faults to user space liulongfang
@ 2023-11-24 12:01   ` Baolu Lu
  2023-11-25  4:05     ` liulongfang
  0 siblings, 1 reply; 49+ messages in thread
From: Baolu Lu @ 2023-11-24 12:01 UTC (permalink / raw)
  To: liulongfang, Joerg Roedel, Will Deacon, Robin Murphy,
	Jason Gunthorpe, Kevin Tian, Jean-Philippe Brucker, Nicolin Chen
  Cc: baolu.lu, Yi Liu, Jacob Pan, Yan Zhao, iommu, kvm, linux-kernel

On 2023/11/24 14:30, liulongfang wrote:
> On 2023/11/15 11:02, Lu Baolu wrote:
>> [...]
> 
> Tested-by: Longfang Liu <liulongfang@huawei.com>

Thank you for the testing.

> 
> The Arm SVA mode based on the HiSilicon crypto accelerator completed the
> functional and performance tests of page fault scenarios.
> 1. The IOMMU page fault handling works correctly.
> 2. Performance tests on a 128-core Arm platform show that performance is
> reduced:
> 
> Threads  Performance
> 8         -0.77%
> 16        -1.1%
> 32        -0.31%
> 64        -0.49%
> 128       -0.72%
> 256       -1.7%
> 384       -4.94%
> 512       NA (iopf timeout)
> 
> Finally, continuing to increase the number of threads causes the IOMMU's
> page fault processing to time out (more than 4.2 seconds). This problem
> occurs both in the baseline version (kernel 6.7-rc1) and in the patched
> version.

Probably you can check whether commit 6bbd42e2df8f ("mmu_notifiers: call
invalidate_range() when invalidating TLBs") matters.

It was discussed in this thread.

https://lore.kernel.org/linux-iommu/20231117090933.75267-1-baolu.lu@linux.intel.com/

Best regards,
baolu

* Re: [PATCH v7 00/12] iommu: Prepare to deliver page faults to user space
  2023-11-24 12:01   ` Baolu Lu
@ 2023-11-25  4:05     ` liulongfang
  0 siblings, 0 replies; 49+ messages in thread
From: liulongfang @ 2023-11-25  4:05 UTC (permalink / raw)
  To: Baolu Lu, Joerg Roedel, Will Deacon, Robin Murphy,
	Jason Gunthorpe, Kevin Tian, Jean-Philippe Brucker, Nicolin Chen
  Cc: Yi Liu, Jacob Pan, Yan Zhao, iommu, kvm, linux-kernel

On 2023/11/24 20:01, Baolu Lu wrote:
> On 2023/11/24 14:30, liulongfang wrote:
>> On 2023/11/15 11:02, Lu Baolu wrote:
>>> [...]
>>
>> Tested-by: Longfang Liu <liulongfang@huawei.com>
> 
> Thank you for the testing.
> 
>>
>> The Arm SVA mode based on the HiSilicon crypto accelerator completed the
>> functional and performance tests of page fault scenarios.
>> 1. The IOMMU page fault handling works correctly.
>> 2. Performance tests on a 128-core Arm platform show that performance is
>> reduced:
>>
>> Threads  Performance
>> 8         -0.77%
>> 16        -1.1%
>> 32        -0.31%
>> 64        -0.49%
>> 128       -0.72%
>> 256       -1.7%
>> 384       -4.94%
>> 512       NA (iopf timeout)
>>
>> Finally, continuing to increase the number of threads causes the IOMMU's
>> page fault processing to time out (more than 4.2 seconds). This problem
>> occurs both in the baseline version (kernel 6.7-rc1) and in the patched
>> version.
> 
> Probably you can check whether commit 6bbd42e2df8f ("mmu_notifiers: call
> invalidate_range() when invalidating TLBs") matters.
> 
> It was discussed in this thread.
> 
> https://lore.kernel.org/linux-iommu/20231117090933.75267-1-baolu.lu@linux.intel.com/
>

Thanks for your reminder. But the reason for the iopf timeout in this test
scenario is different from the one pointed out in your patch.

Our analysis found that the iopf timeouts are related to the NUMA balancing
function. The CMWQ-based iopf handling in the iommu subsystem currently
uses a large number of kernel threads. Page fault processing for NUMA
balancing competes with the iommu's page fault processing for CPU time.
This leads to longer page fault processing times and triggers repeated page
faults in the I/O task, which produces an unpredictable and huge number of
page fault events, eventually leaving the entire system unable to respond
to page faults in a timely manner.

Thanks.
Longfang.
> Best regards,
> baolu
> 
> .
> 

* Re: [PATCH v7 02/12] iommu/arm-smmu-v3: Remove unrecoverable faults reporting
  2023-11-15  3:02 ` [PATCH v7 02/12] iommu/arm-smmu-v3: Remove unrecoverable faults reporting Lu Baolu
@ 2023-12-01 15:42   ` Jason Gunthorpe
  2023-12-04 10:54   ` Yi Liu
  1 sibling, 0 replies; 49+ messages in thread
From: Jason Gunthorpe @ 2023-12-01 15:42 UTC (permalink / raw)
  To: Lu Baolu
  Cc: Joerg Roedel, Will Deacon, Robin Murphy, Kevin Tian,
	Jean-Philippe Brucker, Nicolin Chen, Yi Liu, Jacob Pan, Yan Zhao,
	iommu, kvm, linux-kernel

On Wed, Nov 15, 2023 at 11:02:16AM +0800, Lu Baolu wrote:
> No device driver registers a fault handler to handle the reported
> unrecoverable faults. Remove it to avoid dead code.
> 
> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
> Reviewed-by: Kevin Tian <kevin.tian@intel.com>
> ---
>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 46 ++++++---------------
>  1 file changed, 13 insertions(+), 33 deletions(-)

Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>

If we do bring this back it will be in some form where the opaque
driver event information is delivered to userspace to forward to the
VM.

Jason

* Re: [PATCH v7 07/12] iommu: Merge iommu_fault_event and iopf_fault
  2023-11-15  3:02 ` [PATCH v7 07/12] iommu: Merge iommu_fault_event and iopf_fault Lu Baolu
@ 2023-12-01 19:09   ` Jason Gunthorpe
  2023-12-04 12:40   ` Yi Liu
  1 sibling, 0 replies; 49+ messages in thread
From: Jason Gunthorpe @ 2023-12-01 19:09 UTC (permalink / raw)
  To: Lu Baolu
  Cc: Joerg Roedel, Will Deacon, Robin Murphy, Kevin Tian,
	Jean-Philippe Brucker, Nicolin Chen, Yi Liu, Jacob Pan, Yan Zhao,
	iommu, kvm, linux-kernel

On Wed, Nov 15, 2023 at 11:02:21AM +0800, Lu Baolu wrote:
> The iommu_fault_event and iopf_fault data structures store the same
> information about an iopf fault. They are also used in the same way.
> Merge these two data structures into a single one to make the code
> more concise and easier to maintain.
> 
> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
> Reviewed-by: Kevin Tian <kevin.tian@intel.com>
> Tested-by: Yan Zhao <yan.y.zhao@intel.com>
> ---
>  include/linux/iommu.h                       | 27 ++++++---------------
>  drivers/iommu/intel/iommu.h                 |  2 +-
>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c |  4 +--
>  drivers/iommu/intel/svm.c                   |  5 ++--
>  drivers/iommu/io-pgfault.c                  |  5 ----
>  drivers/iommu/iommu.c                       |  8 +++---
>  6 files changed, 17 insertions(+), 34 deletions(-)

Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>

Jason

* Re: [PATCH v7 09/12] iommu: Make iommu_queue_iopf() more generic
  2023-11-15  3:02 ` [PATCH v7 09/12] iommu: Make iommu_queue_iopf() more generic Lu Baolu
@ 2023-12-01 19:14   ` Jason Gunthorpe
  2023-12-05  7:13   ` Yi Liu
  1 sibling, 0 replies; 49+ messages in thread
From: Jason Gunthorpe @ 2023-12-01 19:14 UTC (permalink / raw)
  To: Lu Baolu
  Cc: Joerg Roedel, Will Deacon, Robin Murphy, Kevin Tian,
	Jean-Philippe Brucker, Nicolin Chen, Yi Liu, Jacob Pan, Yan Zhao,
	iommu, kvm, linux-kernel

On Wed, Nov 15, 2023 at 11:02:23AM +0800, Lu Baolu wrote:
> Make iommu_queue_iopf() more generic by making the iopf_group a minimal
> set of iopf's that the iopf handler of a domain should handle and respond
> to. Add a domain parameter to struct iopf_group so that the handler can
> retrieve and use it directly.
> 
> Change iommu_queue_iopf() to forward groups of iopf's to the domain's
> iopf handler. This is also a necessary step to decouple the sva iopf
> handling code from this interface.
> 
> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
> Reviewed-by: Kevin Tian <kevin.tian@intel.com>
> Tested-by: Yan Zhao <yan.y.zhao@intel.com>
> ---
>  include/linux/iommu.h      |  4 +--
>  drivers/iommu/iommu-sva.h  |  6 ++---
>  drivers/iommu/io-pgfault.c | 55 +++++++++++++++++++++++++++++---------
>  drivers/iommu/iommu-sva.c  |  3 +--
>  4 files changed, 48 insertions(+), 20 deletions(-)

Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>

Jason

* Re: [PATCH v7 11/12] iommu: Consolidate per-device fault data management
  2023-11-15  3:02 ` [PATCH v7 11/12] iommu: Consolidate per-device fault data management Lu Baolu
@ 2023-12-01 19:46   ` Jason Gunthorpe
  2023-12-04  0:58     ` Baolu Lu
  0 siblings, 1 reply; 49+ messages in thread
From: Jason Gunthorpe @ 2023-12-01 19:46 UTC (permalink / raw)
  To: Lu Baolu
  Cc: Joerg Roedel, Will Deacon, Robin Murphy, Kevin Tian,
	Jean-Philippe Brucker, Nicolin Chen, Yi Liu, Jacob Pan, Yan Zhao,
	iommu, kvm, linux-kernel

On Wed, Nov 15, 2023 at 11:02:25AM +0800, Lu Baolu wrote:

> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index d19031c1b0e6..c17d5979d70d 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -597,6 +597,8 @@ struct iommu_device {
>  /**
>   * struct iommu_fault_param - per-device IOMMU fault data
>   * @lock: protect pending faults list
> + * @users: user counter to manage the lifetime of the data, this field
> + *         is protected by dev->iommu->lock.
>   * @dev: the device that owns this param
>   * @queue: IOPF queue
>   * @queue_list: index into queue->devices
> @@ -606,6 +608,7 @@ struct iommu_device {
>   */
>  struct iommu_fault_param {
>  	struct mutex lock;
> +	int users;

Use refcount_t for the debugging features

>  	struct device *dev;
>  	struct iopf_queue *queue;

But why do we need this to be refcounted? iopf_queue_remove_device()
is always called before we get to release? This struct isn't very big
so I'd just leave it allocated and free it during release?

> @@ -72,23 +115,14 @@ static int iommu_handle_iopf(struct iommu_fault *fault, struct device *dev)
>  	struct iopf_group *group;
>  	struct iopf_fault *iopf, *next;
>  	struct iommu_domain *domain = NULL;
> -	struct iommu_fault_param *iopf_param;
> -	struct dev_iommu *param = dev->iommu;
> +	struct iommu_fault_param *iopf_param = dev->iommu->fault_param;
>  
> -	lockdep_assert_held(&param->lock);
> +	lockdep_assert_held(&iopf_param->lock);

This patch seems like it is doing a few things, can the locking
changes be kept in their own patch?

Jason

* Re: [PATCH v7 12/12] iommu: Improve iopf_queue_flush_dev()
  2023-11-15  3:02 ` [PATCH v7 12/12] iommu: Improve iopf_queue_flush_dev() Lu Baolu
@ 2023-12-01 20:35   ` Jason Gunthorpe
  2023-12-03  8:53     ` Baolu Lu
  2023-12-04  3:46     ` Baolu Lu
  0 siblings, 2 replies; 49+ messages in thread
From: Jason Gunthorpe @ 2023-12-01 20:35 UTC (permalink / raw)
  To: Lu Baolu
  Cc: Joerg Roedel, Will Deacon, Robin Murphy, Kevin Tian,
	Jean-Philippe Brucker, Nicolin Chen, Yi Liu, Jacob Pan, Yan Zhao,
	iommu, kvm, linux-kernel

On Wed, Nov 15, 2023 at 11:02:26AM +0800, Lu Baolu wrote:
> The iopf_queue_flush_dev() is called by the iommu driver before releasing
> a PASID. It ensures that all pending faults for this PASID have been
> handled or cancelled, and won't hit the address space that reuses this
> PASID. The driver must make sure that no new fault is added to the queue.

This needs more explanation, why should anyone care?

More importantly, why is *discarding* the right thing to do?
Especially why would we discard a partial page request group?

After we change a translation we may have PRI requests in a
queue. They need to be acknowledged, not discarded. The DMA in the
device should be restarted and the device should observe the new
translation - if it is blocking then it should take a DMA error.

More broadly, we should just let things run their normal course. The
domain to deliver the fault to should be determined very early. If we
get a fault and there is no fault domain currently assigned then just
restart it.

The main reason to fence would be to allow the domain to become freed
as the faults should be holding pointers to it. But I feel there are
simpler options for that then this..

> The SMMUv3 driver doesn't use it because it only implements the
> Arm-specific stall fault model where DMA transactions are held in the SMMU
> while waiting for the OS to handle iopf's. Since a device driver must
> complete all DMA transactions before detaching the domain, there are no
> pending iopf's with the stall model. PRI support requires adding a call to
> iopf_queue_flush_dev() after flushing the hardware page fault queue.

This explanation doesn't make much sense, from a device driver
perspective both PRI and stall cause the device to not complete DMAs.

The difference between stall and PRI is fairly small, stall causes an
internal bus to lock up while PRI does not.

> -int iopf_queue_flush_dev(struct device *dev)
> +int iopf_queue_discard_dev_pasid(struct device *dev, ioasid_t pasid)
>  {
>  	struct iommu_fault_param *iopf_param = iopf_get_dev_fault_param(dev);
> +	const struct iommu_ops *ops = dev_iommu_ops(dev);
> +	struct iommu_page_response resp;
> +	struct iopf_fault *iopf, *next;
> +	int ret = 0;
>  
>  	if (!iopf_param)
>  		return -ENODEV;
>  
>  	flush_workqueue(iopf_param->queue->wq);
> +

A naked flush_workqueue like this is really suspicious, it needs a
comment explaining why the queue can't get more work queued at this
point. 

I suppose the driver is expected to stop calling
iommu_report_device_fault() before calling this function, but that
doesn't seem like it is going to be possible. Drivers should be
implementing atomic replace for the PASID updates and in that case
there is no moment when it can say the HW will stop generating PRI.

I'm looking at this code after these patches are applied and it still
seems quite bonkers to me :(

Why do we allocate two copies of the memory on all fault paths?

Why do we have fault->type still that only has one value?

What is serializing iommu_get_domain_for_dev_pasid() in the fault
path? It looks sort of like the plan is to use iopf_param->lock and
ensure domain removal grabs that lock at least after the xarray is
changed - but does that actually happen?

I would suggest, broadly, a flow for iommu_report_device_fault() sort
of:

1) Allocate memory for the evt. Every path except errors needs this,
   so just do it
2) iopf_get_dev_fault_param() should not have locks in it! This is
   fast path now. Use a refcount, atomic compare exchange to allocate,
   and RCU free.
3) Everything runs under the fault_param->lock
4) Check if !IOMMU_FAULT_PAGE_REQUEST_LAST_PAGE, set it aside and then
   exit! This logic is really tortured and confusing
5) Allocate memory and assemble the group
6) Obtain the domain for this group and incr a per-domain counter that a
   fault is pending on that domain
7) Put the *group* into the WQ. Put the *group* on a list in fault_param
   instead of the individual faults
8) Don't linear search a linked list in iommu_page_response()! Pass
   the group in that we got from the WQ that we *know* is still
   active. Ack that passed group.

When freeing a domain wait for the per-domain counter to go to
zero. This ensures that the WQ is flushed out and all the outside
domain references are gone.
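
A minimal sketch of points 6-7 and the domain-free wait; the example_*
names and the added fields are hypothetical, not existing kernel API:

/* Illustrative per-domain pending-fault counter (point 6). */
struct example_domain {
	struct iommu_domain domain;
	atomic_t iopf_pending;
	wait_queue_head_t iopf_drain_wq;
};

static void example_queue_group(struct example_domain *edomain,
				struct workqueue_struct *wq,
				struct iopf_group *group)
{
	atomic_inc(&edomain->iopf_pending);	/* point 6 */
	queue_work(wq, &group->work);		/* point 7 */
}

/* Called by the work handler once the group has been responded to. */
static void example_group_done(struct example_domain *edomain)
{
	if (atomic_dec_and_test(&edomain->iopf_pending))
		wake_up(&edomain->iopf_drain_wq);
}

/* Domain free waits until no outside references remain. */
static void example_domain_free_wait(struct example_domain *edomain)
{
	wait_event(edomain->iopf_drain_wq,
		   !atomic_read(&edomain->iopf_pending));
}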

When wanting to turn off PRI make sure a non-PRI domain is
attached to everything. Fence against the HW's event queue. No new
iommu_report_device_fault() is possible.

Lock the fault_param->lock and go through every pending group and
respond it. Mark the group memory as invalid so iommu_page_response()
NOP's it. Unlock, fence the HW against queued responses, and turn off
PRI.

An *optimization* would be to lightly flush the domain when changing
the translation. Lock the fault_param->lock and look for groups in the
list with old_domain.  Do the same as for PRI-off: respond to the
group, mark it as NOP. The WQ may still be chewing on something so the
domain free still has to check and wait.

Did I get it right??

Jason
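
A rough sketch of point 2 above (lock-free lookup of the per-device fault
parameter with a refcount plus RCU); the example_* names are hypothetical,
and it assumes dev->iommu->fault_param is __rcu annotated and that struct
iommu_fault_param gains 'refcount_t users' and 'struct rcu_head rcu'
members:

static struct iommu_fault_param *example_get_fault_param(struct device *dev)
{
	struct iommu_fault_param *fault_param;

	rcu_read_lock();
	fault_param = rcu_dereference(dev->iommu->fault_param);
	if (fault_param && !refcount_inc_not_zero(&fault_param->users))
		fault_param = NULL;	/* being torn down, treat as absent */
	rcu_read_unlock();

	return fault_param;
}

static void example_put_fault_param(struct iommu_fault_param *fault_param)
{
	if (refcount_dec_and_test(&fault_param->users))
		kfree_rcu(fault_param, rcu);
}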

* Re: [PATCH v7 12/12] iommu: Improve iopf_queue_flush_dev()
  2023-12-01 20:35   ` Jason Gunthorpe
@ 2023-12-03  8:53     ` Baolu Lu
  2023-12-03 14:14       ` Jason Gunthorpe
  2023-12-04  3:46     ` Baolu Lu
  1 sibling, 1 reply; 49+ messages in thread
From: Baolu Lu @ 2023-12-03  8:53 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: baolu.lu, Joerg Roedel, Will Deacon, Robin Murphy, Kevin Tian,
	Jean-Philippe Brucker, Nicolin Chen, Yi Liu, Jacob Pan, Yan Zhao,
	iommu, kvm, linux-kernel

On 12/2/23 4:35 AM, Jason Gunthorpe wrote:
> On Wed, Nov 15, 2023 at 11:02:26AM +0800, Lu Baolu wrote:
>> The iopf_queue_flush_dev() is called by the iommu driver before releasing
>> a PASID. It ensures that all pending faults for this PASID have been
>> handled or cancelled, and won't hit the address space that reuses this
>> PASID. The driver must make sure that no new fault is added to the queue.
> This needs more explanation, why should anyone care?
> 
> More importantly, why is *discarding* the right thing to do?
> Especially why would we discard a partial page request group?
> 
> After we change a translation we may have PRI requests in a
> queue. They need to be acknowledged, not discarded. The DMA in the
> device should be restarted and the device should observe the new
> translation - if it is blocking then it should take a DMA error.
> 
> More broadly, we should just let things run their normal course. The
> domain to deliver the fault to should be determined very early. If we
> get a fault and there is no fault domain currently assigned then just
> restart it.
> 
> The main reason to fence would be to allow the domain to become freed
> as the faults should be holding pointers to it. But I feel there are
> simpler options for that then this..

In the iommu_detach_device_pasid() path, the domain is about to be
removed from the PASID of the device. The IOMMU driver performs the
following steps sequentially:

1. Clears the pasid translation entry. Thus, all subsequent DMA
    transactions (translation requests, translated requests or page
    requests) targeting the iommu domain will be blocked.

2. Waits until all pending page requests for the device's PASID have
    been reported to upper layers via the iommu_report_device_fault().
    However, this does not guarantee that all page requests have been
    responded to.

3. Frees all partial page requests for this PASID since a page request
    response is only needed for a complete request group. There's no
    action required for the page requests which are not the last of a
    request group.

4. Iterates through the list of pending page requests and identifies
    those originating from the device's PASID. For each identified
    request, the driver responds to the hardware with the
    IOMMU_PAGE_RESP_INVALID code, indicating that the request cannot be
    handled and retries should not be attempted. This response code
    corresponds to the "Invalid Request" status defined in the PCI PRI
    specification.

5. Follows the IOMMU hardware requirements (for example, VT-d spec,
    section 7.10, Software Steps to Drain Page Requests & Responses) to
    drain in-flight page requests and page group responses between the
    remapping hardware queues and the endpoint device.

With the above steps done in iommu_detach_device_pasid(), the PASID can be
reused for any other address space.

The iopf_queue_discard_dev_pasid() helper does steps 3 and 4; a compact
sketch of the whole flow follows.
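
Apart from iopf_queue_discard_dev_pasid(), every name in the sketch below
is a hypothetical stand-in for driver-specific code, not an existing API:

/*
 * Hypothetical driver-side PASID detach path; the numbered comments
 * map to the steps above.
 */
static void example_remove_dev_pasid(struct device *dev, ioasid_t pasid)
{
	/* 1. Block the PASID; no new page requests can be queued. */
	example_clear_pasid_entry(dev, pasid);

	/*
	 * 2. Drain the hardware page request queue so that all pending
	 *    requests have been reported via iommu_report_device_fault().
	 */
	example_drain_hw_prq(dev, pasid);

	/*
	 * 3 + 4. Free the partials and respond to the pending requests
	 *        with IOMMU_PAGE_RESP_INVALID.
	 */
	iopf_queue_discard_dev_pasid(dev, pasid);

	/*
	 * 5. Drain in-flight page requests and responses between the
	 *    remapping hardware and the endpoint (e.g. VT-d spec,
	 *    section 7.10).
	 */
	example_drain_inflight_prq(dev, pasid);
}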

> 
>> The SMMUv3 driver doesn't use it because it only implements the
>> Arm-specific stall fault model where DMA transactions are held in the SMMU
>> while waiting for the OS to handle iopf's. Since a device driver must
>> complete all DMA transactions before detaching the domain, there are no
>> pending iopf's with the stall model. PRI support requires adding a call to
>> iopf_queue_flush_dev() after flushing the hardware page fault queue.
> This explanation doesn't make much sense, from a device driver
> perspective both PRI and stall cause the device to not complete DMAs.
> 
> The difference between stall and PRI is fairly small, stall causes an
> internal bus to lock up while PRI does not.
> 
>> -int iopf_queue_flush_dev(struct device *dev)
>> +int iopf_queue_discard_dev_pasid(struct device *dev, ioasid_t pasid)
>>   {
>>   	struct iommu_fault_param *iopf_param = iopf_get_dev_fault_param(dev);
>> +	const struct iommu_ops *ops = dev_iommu_ops(dev);
>> +	struct iommu_page_response resp;
>> +	struct iopf_fault *iopf, *next;
>> +	int ret = 0;
>>   
>>   	if (!iopf_param)
>>   		return -ENODEV;
>>   
>>   	flush_workqueue(iopf_param->queue->wq);
>> +
> A naked flush_workqueue like this is really suspicious, it needs a
> comment explaining why the queue can't get more work queued at this
> point.
> 
> I suppose the driver is expected to stop calling
> iommu_report_device_fault() before calling this function, but that
> doesn't seem like it is going to be possible. Drivers should be
> implementing atomic replace for the PASID updates and in that case
> there is no moment when it can say the HW will stop generating PRI.

Atomic domain replacement for a PASID is not currently implemented in
the core or driver. Even if atomic replacement were to be implemented,
it would be necessary to ensure that all translation requests,
translated requests, page requests and responses for the old domain are
drained before switching to the new domain. I am not sure whether the
existing iommu hardware architecture supports this functionality.

Best regards,
baolu

* Re: [PATCH v7 12/12] iommu: Improve iopf_queue_flush_dev()
  2023-12-03  8:53     ` Baolu Lu
@ 2023-12-03 14:14       ` Jason Gunthorpe
  2023-12-04  1:32         ` Baolu Lu
  0 siblings, 1 reply; 49+ messages in thread
From: Jason Gunthorpe @ 2023-12-03 14:14 UTC (permalink / raw)
  To: Baolu Lu
  Cc: Joerg Roedel, Will Deacon, Robin Murphy, Kevin Tian,
	Jean-Philippe Brucker, Nicolin Chen, Yi Liu, Jacob Pan, Yan Zhao,
	iommu, kvm, linux-kernel

On Sun, Dec 03, 2023 at 04:53:08PM +0800, Baolu Lu wrote:
> On 12/2/23 4:35 AM, Jason Gunthorpe wrote:
> > On Wed, Nov 15, 2023 at 11:02:26AM +0800, Lu Baolu wrote:
> > > The iopf_queue_flush_dev() is called by the iommu driver before releasing
> > > a PASID. It ensures that all pending faults for this PASID have been
> > > handled or cancelled, and won't hit the address space that reuses this
> > > PASID. The driver must make sure that no new fault is added to the queue.
> > This needs more explanation, why should anyone care?
> > 
> > More importantly, why is *discarding* the right thing to do?
> > Especially why would we discard a partial page request group?
> > 
> > After we change a translation we may have PRI requests in a
> > queue. They need to be acknowledged, not discarded. The DMA in the
> > device should be restarted and the device should observe the new
> > translation - if it is blocking then it should take a DMA error.
> > 
> > More broadly, we should just let things run their normal course. The
> > domain to deliver the fault to should be determined very early. If we
> > get a fault and there is no fault domain currently assigned then just
> > restart it.
> > 
> > The main reason to fence would be to allow the domain to become freed
> > as the faults should be holding pointers to it. But I feel there are
> > simpler options for that then this..
> 
> In the iommu_detach_device_pasid() path, the domain is about to be
> removed from the PASID of the device. The IOMMU driver performs the
> following steps sequentially:

I know that is what it does, but it doesn't explain at all why.

> 1. Clears the pasid translation entry. Thus, all subsequent DMA
>    transactions (translation requests, translated requests or page
>    requests) targeting the iommu domain will be blocked.
> 
> 2. Waits until all pending page requests for the device's PASID have
>    been reported to upper layers via the iommu_report_device_fault().
>    However, this does not guarantee that all page requests have been
>    responded to.
>
> 3. Frees all partial page requests for this PASID since a page request
>    response is only needed for a complete request group. There's no
>    action required for the page requests which are not the last of a
>    request group.

But we expect the last to come eventually since everything should be
grouped properly, so why bother doing this?

Indeed, if step 2 worked, how is it even possible to have partials?
 
> 5. Follows the IOMMU hardware requirements (for example, VT-d spec,
>    section 7.10, Software Steps to Drain Page Requests & Responses) to
>    drain in-flight page requests and page group responses between the
>    remapping hardware queues and the endpoint device.
> 
> With the above steps done in iommu_detach_device_pasid(), the PASID can be
> reused for any other address space.

As I said, that isn't even required. There is no issue with leaking
PRI's across attachments.


> > I suppose the driver is expected to stop calling
> > iommu_report_device_fault() before calling this function, but that
> > doesn't seem like it is going to be possible. Drivers should be
> > implementing atomic replace for the PASID updates and in that case
> > there is no moment when it can say the HW will stop generating PRI.
> 
> Atomic domain replacement for a PASID is not currently implemented in
> the core or driver. 

It is, the driver should implement set_dev_pasid in such a way that
repeated calls do replacements, ideally atomically. This is what ARM
SMMUv3 does after my changes.

> Even if atomic replacement were to be implemented,
> it would be necessary to ensure that all translation requests,
> translated requests, page requests and responses for the old domain are
> drained before switching to the new domain. 

Again, no it isn't required.

Requests simply have to continue to be acked, it doesn't matter if
they are acked against the wrong domain because the device will simply
re-issue them.

Jason

* Re: [PATCH v7 11/12] iommu: Consolidate per-device fault data management
  2023-12-01 19:46   ` Jason Gunthorpe
@ 2023-12-04  0:58     ` Baolu Lu
  0 siblings, 0 replies; 49+ messages in thread
From: Baolu Lu @ 2023-12-04  0:58 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: baolu.lu, Joerg Roedel, Will Deacon, Robin Murphy, Kevin Tian,
	Jean-Philippe Brucker, Nicolin Chen, Yi Liu, Jacob Pan, Yan Zhao,
	iommu, kvm, linux-kernel

On 12/2/23 3:46 AM, Jason Gunthorpe wrote:
> On Wed, Nov 15, 2023 at 11:02:25AM +0800, Lu Baolu wrote:
> 
>> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
>> index d19031c1b0e6..c17d5979d70d 100644
>> --- a/include/linux/iommu.h
>> +++ b/include/linux/iommu.h
>> @@ -597,6 +597,8 @@ struct iommu_device {
>>   /**
>>    * struct iommu_fault_param - per-device IOMMU fault data
>>    * @lock: protect pending faults list
>> + * @users: user counter to manage the lifetime of the data, this field
>> + *         is protected by dev->iommu->lock.
>>    * @dev: the device that owns this param
>>    * @queue: IOPF queue
>>    * @queue_list: index into queue->devices
>> @@ -606,6 +608,7 @@ struct iommu_device {
>>    */
>>   struct iommu_fault_param {
>>   	struct mutex lock;
>> +	int users;
> 
> Use refcount_t for the debugging features

Yes.

> 
>>   	struct device *dev;
>>   	struct iopf_queue *queue;
> 
> But why do we need this to be refcounted? iopf_queue_remove_device()
> is always called before we get to release? This struct isn't very big
> so I'd just leave it allocated and free it during release?

iopf_queue_remove_device() should always be called before device
release.

The reference counter is implemented to synchronize access to the fault
parameter among different paths. For example, iopf_queue_remove_device()
removes the parameter, while iommu_report_device_fault() and
iommu_page_response() need to reference it. These three paths may run in
different threads.

> 
>> @@ -72,23 +115,14 @@ static int iommu_handle_iopf(struct iommu_fault *fault, struct device *dev)
>>   	struct iopf_group *group;
>>   	struct iopf_fault *iopf, *next;
>>   	struct iommu_domain *domain = NULL;
>> -	struct iommu_fault_param *iopf_param;
>> -	struct dev_iommu *param = dev->iommu;
>> +	struct iommu_fault_param *iopf_param = dev->iommu->fault_param;
>>   
>> -	lockdep_assert_held(&param->lock);
>> +	lockdep_assert_held(&iopf_param->lock);
> 
> This patch seems like it is doing a few things, can the locking
> changes be kept in their own patch?

Yes. Let me try to.

Best regards,
baolu

* Re: [PATCH v7 12/12] iommu: Improve iopf_queue_flush_dev()
  2023-12-03 14:14       ` Jason Gunthorpe
@ 2023-12-04  1:32         ` Baolu Lu
  2023-12-04  5:37           ` Tian, Kevin
  2023-12-04 13:12           ` Jason Gunthorpe
  0 siblings, 2 replies; 49+ messages in thread
From: Baolu Lu @ 2023-12-04  1:32 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: baolu.lu, Joerg Roedel, Will Deacon, Robin Murphy, Kevin Tian,
	Jean-Philippe Brucker, Nicolin Chen, Yi Liu, Jacob Pan, Yan Zhao,
	iommu, kvm, linux-kernel

On 12/3/23 10:14 PM, Jason Gunthorpe wrote:
> On Sun, Dec 03, 2023 at 04:53:08PM +0800, Baolu Lu wrote:
>> On 12/2/23 4:35 AM, Jason Gunthorpe wrote:
>>> On Wed, Nov 15, 2023 at 11:02:26AM +0800, Lu Baolu wrote:
>>>> The iopf_queue_flush_dev() is called by the iommu driver before releasing
>>>> a PASID. It ensures that all pending faults for this PASID have been
>>>> handled or cancelled, and won't hit the address space that reuses this
>>>> PASID. The driver must make sure that no new fault is added to the queue.
>>> This needs more explanation, why should anyone care?
>>>
>>> More importantly, why is *discarding* the right thing to do?
>>> Especially why would we discard a partial page request group?
>>>
>>> After we change a translation we may have PRI requests in a
>>> queue. They need to be acknowledged, not discarded. The DMA in the
>>> device should be restarted and the device should observe the new
>>> translation - if it is blocking then it should take a DMA error.
>>>
>>> More broadly, we should just let things run their normal course. The
>>> domain to deliver the fault to should be determined very early. If we
>>> get a fault and there is no fault domain currently assigned then just
>>> restart it.
>>>
>>> The main reason to fence would be to allow the domain to become freed
>>> as the faults should be holding pointers to it. But I feel there are
>>> simpler options for that then this..
>>
>> In the iommu_detach_device_pasid() path, the domain is about to be
>> removed from the PASID of the device. The IOMMU driver performs the
>> following steps sequentially:
> 
> I know that is what it does, but it doesn't explain at all why.
> 
>> 1. Clears the pasid translation entry. Thus, all subsequent DMA
>>     transactions (translation requests, translated requests or page
>>     requests) targeting the iommu domain will be blocked.
>>
>> 2. Waits until all pending page requests for the device's PASID have
>>     been reported to upper layers via the iommu_report_device_fault().
>>     However, this does not guarantee that all page requests have been
>>     responded to.
>>
>> 3. Frees all partial page requests for this PASID since a page request
>>     response is only needed for a complete request group. There's no
>>     action required for the page requests which are not the last of a
>>     request group.
> 
> But we expect the last to come eventually since everything should be
> grouped properly, so why bother doing this?
> 
> Indeed, if step 2 worked, how is it even possible to have partials?

Step 1 clears the PASID table entry, hence all subsequent page requests
are blocked (the hardware auto-responds to them without putting them in
the queue).

It is possible that a portion of a page fault group has already been
queued for processing, while the last request of the group is blocked by
the hardware because the PASID entry is in the blocking state.

In reality, this may be a no-op, as I haven't seen any real-world
implementations of multi-request fault groups on Intel platforms.

>   
>> 5. Follows the IOMMU hardware requirements (for example, VT-d spec,
>>     section 7.10, Software Steps to Drain Page Requests & Responses) to
>>     drain in-flight page requests and page group responses between the
>>     remapping hardware queues and the endpoint device.
>>
>> With the above steps done in iommu_detach_device_pasid(), the PASID can be
>>     reused for any other address space.
> 
> As I said, that isn't even required. There is no issue with leaking
> PRI's across attachments.
> 
> 
>>> I suppose the driver is expected to stop calling
>>> iommu_report_device_fault() before calling this function, but that
>>> doesn't seem like it is going to be possible. Drivers should be
>>> implementing atomic replace for the PASID updates and in that case
>>> there is no moment when it can say the HW will stop generating PRI.
>>
>> Atomic domain replacement for a PASID is not currently implemented in
>> the core or driver.
> 
> It is, the driver should implement set_dev_pasid in such a way that
> repeated calls do replacements, ideally atomically. This is what ARM
> SMMUv3 does after my changes.
> 
>> Even if atomic replacement were to be implemented,
>> it would be necessary to ensure that all translation requests,
>> translated requests, page requests and responses for the old domain are
>> drained before switching to the new domain.
> 
> Again, no it isn't required.
> 
> Requests simply have to continue to be acked, it doesn't matter if
> they are acked against the wrong domain because the device will simply
> re-issue them.

Ah! I'm starting to get your point now.

Even if a page fault response is postponed into a new address space, which
could be another process's address space or the hardware blocking state,
the hardware just retries.

As long as we flush all caches (IOTLB and device TLB) during switching,
the mappings of the old domain won't leak. So it's safe to leave the page
requests there.

Do I understand you correctly?

Best regards,
baolu

* Re: [PATCH v7 12/12] iommu: Improve iopf_queue_flush_dev()
  2023-12-01 20:35   ` Jason Gunthorpe
  2023-12-03  8:53     ` Baolu Lu
@ 2023-12-04  3:46     ` Baolu Lu
  2023-12-04 13:27       ` Jason Gunthorpe
  1 sibling, 1 reply; 49+ messages in thread
From: Baolu Lu @ 2023-12-04  3:46 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: baolu.lu, Joerg Roedel, Will Deacon, Robin Murphy, Kevin Tian,
	Jean-Philippe Brucker, Nicolin Chen, Yi Liu, Jacob Pan, Yan Zhao,
	iommu, kvm, linux-kernel

On 12/2/23 4:35 AM, Jason Gunthorpe wrote:
> I'm looking at this code after these patches are applied and it still
> seems quite bonkers to me 🙁
> 
> Why do we allocate two copies of the memory on all fault paths?
> 
> Why do we have fault->type still that only has one value?
> 
> What is serializing iommu_get_domain_for_dev_pasid() in the fault
> path? It looks sort of like the plan is to use iopf_param->lock and
> ensure domain removal grabs that lock at least after the xarray is
> changed - but does that actually happen?
> 
> I would suggest, broadly, a flow for iommu_report_device_fault() sort
> of:
> 
> 1) Allocate memory for the evt. Every path except errors needs this,
>     so just do it
> 2) iopf_get_dev_fault_param() should not have locks in it! This is
>     fast path now. Use a refcount, atomic compare exchange to allocate,
>     and RCU free.
> 3) Everything runs under the fault_param->lock
> 4) Check if !IOMMU_FAULT_PAGE_REQUEST_LAST_PAGE, set it aside and then
>     exit! This logic is really tortured and confusing
> 5) Allocate memory and assemble the group
> 6) Obtain the domain for this group and incr a per-domain counter that a
>     fault is pending on that domain
> 7) Put the *group* into the WQ. Put the *group* on a list in fault_param
>     instead of the individual faults
> 8) Don't linear search a linked list in iommu_page_response()! Pass
>     the group in that we got from the WQ that we *know* is still
>     active. Ack that passed group.
> 
> When freeing a domain wait for the per-domain counter to go to
> zero. This ensures that the WQ is flushed out and all the outside
> domain references are gone.
> 
> When wanting to turn off PRI make sure a non-PRI domain is
> attached to everything. Fence against the HW's event queue. No new
> iommu_report_device_fault() is possible.
> 
> Lock the fault_param->lock and go through every pending group and
> respond it. Mark the group memory as invalid so iommu_page_response()
> NOP's it. Unlock, fence the HW against queued responses, and turn off
> PRI.
> 
> An *optimization* would be to lightly flush the domain when changing
> the translation. Lock the fault_param->lock and look for groups in the
> list with old_domain.  Do the same as for PRI-off: respond to the
> group, mark it as NOP. The WQ may still be chewing on something so the
> domain free still has to check and wait.

Very appreciated for all the ideas. I looked through the items and felt
that all these are good optimizations.

I am wondering whether we can take patches 1/12 ~ 10/12 of this series
as a first step, a refactoring effort to support delivering iopf to
user space. I will follow up with one or more series to add the
optimizations.

Does this work for you? Or do you want to take any of the above as a
requirement for the iommufd use case?

Best regards,
baolu

^ permalink raw reply	[flat|nested] 49+ messages in thread

* RE: [PATCH v7 12/12] iommu: Improve iopf_queue_flush_dev()
  2023-12-04  1:32         ` Baolu Lu
@ 2023-12-04  5:37           ` Tian, Kevin
  2023-12-04 13:25             ` Jason Gunthorpe
  2023-12-04 13:12           ` Jason Gunthorpe
  1 sibling, 1 reply; 49+ messages in thread
From: Tian, Kevin @ 2023-12-04  5:37 UTC (permalink / raw)
  To: Baolu Lu, Jason Gunthorpe
  Cc: Joerg Roedel, Will Deacon, Robin Murphy, Jean-Philippe Brucker,
	Nicolin Chen, Liu, Yi L, Jacob Pan, Zhao, Yan Y, iommu, kvm,
	linux-kernel

> From: Baolu Lu <baolu.lu@linux.intel.com>
> Sent: Monday, December 4, 2023 9:33 AM
> 
> On 12/3/23 10:14 PM, Jason Gunthorpe wrote:
> > On Sun, Dec 03, 2023 at 04:53:08PM +0800, Baolu Lu wrote:
> >> Even if atomic replacement were to be implemented,
> >> it would be necessary to ensure that all translation requests,
> >> translated requests, page requests and responses for the old domain are
> >> drained before switching to the new domain.
> >
> > Again, no it isn't required.
> >
> > Requests simply have to continue to be acked, it doesn't matter if
> > they are acked against the wrong domain because the device will simply
> > re-issue them.
> 
> Ah! I'm starting to get your point now.
> 
> Even if a page fault response is postponed until the device has been
> attached to a new domain, which could be another address space or the
> hardware blocking state, the hardware just retries.

If blocking, then the device shouldn't retry.

Btw, if a stale request targets a virtual address that is outside of
the valid VMAs of the new address space, a visible side effect will be
incurred in handle_mm_fault() on the new space. Is that desired?

Or, if a pending response carrying an error code (Invalid Request)
from the old address space is received by the device after the new
address space has already been activated, the hardware will report an
error even though there might be a valid mapping in the new space.

> 
> As long as we flush all caches (IOTLB and device TLB) during the
> switch, the mappings of the old domain won't leak. So it's safe to
> keep the page requests there.
> 

I don't think atomic replace is the main usage for this draining
requirement. Instead, I'm more interested in the basic, popular usage:
attach-detach-attach. I am not convinced that no draining is required
between the iommu and the device to avoid interference between
activities from the old and new address spaces.
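
To make the draining question concrete, here is a standalone pthread
sketch of the kind of fence being discussed: a per-domain pending-fault
counter that the detach path waits on, along the lines of Jason's
per-domain counter suggestion. The names are hypothetical; this is a
model, not the kernel implementation.

#include <pthread.h>
#include <stdio.h>

struct domain {
        pthread_mutex_t lock;
        pthread_cond_t drained;
        unsigned int pending_faults;    /* reported but not yet responded */
};

static void fault_begin(struct domain *d)       /* report path */
{
        pthread_mutex_lock(&d->lock);
        d->pending_faults++;
        pthread_mutex_unlock(&d->lock);
}

static void fault_end(struct domain *d)         /* response path */
{
        pthread_mutex_lock(&d->lock);
        if (--d->pending_faults == 0)
                pthread_cond_broadcast(&d->drained);
        pthread_mutex_unlock(&d->lock);
}

/*
 * Detach path: once this returns, no response for the old address
 * space can be delivered, so the new space never sees stale activity.
 */
static void domain_drain(struct domain *d)
{
        pthread_mutex_lock(&d->lock);
        while (d->pending_faults)
                pthread_cond_wait(&d->drained, &d->lock);
        pthread_mutex_unlock(&d->lock);
}

int main(void)
{
        static struct domain d = {
                .lock = PTHREAD_MUTEX_INITIALIZER,
                .drained = PTHREAD_COND_INITIALIZER,
        };

        fault_begin(&d);        /* a fault arrives against the old domain */
        fault_end(&d);          /* ... and is responded to */
        domain_drain(&d);       /* detach may now proceed safely */
        printf("drained; detach can complete\n");
        return 0;
}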

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v7 01/12] iommu: Move iommu fault data to linux/iommu.h
  2023-11-15  3:02 ` [PATCH v7 01/12] iommu: Move iommu fault data to linux/iommu.h Lu Baolu
@ 2023-12-04 10:52   ` Yi Liu
  0 siblings, 0 replies; 49+ messages in thread
From: Yi Liu @ 2023-12-04 10:52 UTC (permalink / raw)
  To: Lu Baolu, Joerg Roedel, Will Deacon, Robin Murphy,
	Jason Gunthorpe, Kevin Tian, Jean-Philippe Brucker, Nicolin Chen
  Cc: Jacob Pan, Yan Zhao, iommu, kvm, linux-kernel, Jason Gunthorpe

On 2023/11/15 11:02, Lu Baolu wrote:
> The iommu fault data is currently defined in uapi/linux/iommu.h, but is
> only used inside the iommu subsystem. Move it to linux/iommu.h, where it
> will be more accessible to kernel drivers.
> 
> With this done, uapi/linux/iommu.h becomes empty and can be removed from
> the tree.

It was supposed to be a uAPI, but now the counterpart is going to be
defined in iommufd.h. :)

Reviewed-by: Yi Liu <yi.l.liu@intel.com>

> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
> Reviewed-by: Kevin Tian <kevin.tian@intel.com>
> Tested-by: Yan Zhao <yan.y.zhao@intel.com>
> ---
>   include/linux/iommu.h      | 152 +++++++++++++++++++++++++++++++++-
>   include/uapi/linux/iommu.h | 161 -------------------------------------
>   MAINTAINERS                |   1 -
>   3 files changed, 151 insertions(+), 163 deletions(-)
>   delete mode 100644 include/uapi/linux/iommu.h
> 
> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index ec289c1016f5..c2e2225184cf 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -14,7 +14,6 @@
>   #include <linux/err.h>
>   #include <linux/of.h>
>   #include <linux/iova_bitmap.h>
> -#include <uapi/linux/iommu.h>
>   
>   #define IOMMU_READ	(1 << 0)
>   #define IOMMU_WRITE	(1 << 1)
> @@ -44,6 +43,157 @@ struct iommu_sva;
>   struct iommu_fault_event;
>   struct iommu_dma_cookie;
>   
> +#define IOMMU_FAULT_PERM_READ	(1 << 0) /* read */
> +#define IOMMU_FAULT_PERM_WRITE	(1 << 1) /* write */
> +#define IOMMU_FAULT_PERM_EXEC	(1 << 2) /* exec */
> +#define IOMMU_FAULT_PERM_PRIV	(1 << 3) /* privileged */
> +
> +/* Generic fault types, can be expanded IRQ remapping fault */
> +enum iommu_fault_type {
> +	IOMMU_FAULT_DMA_UNRECOV = 1,	/* unrecoverable fault */
> +	IOMMU_FAULT_PAGE_REQ,		/* page request fault */
> +};
> +
> +enum iommu_fault_reason {
> +	IOMMU_FAULT_REASON_UNKNOWN = 0,
> +
> +	/* Could not access the PASID table (fetch caused external abort) */
> +	IOMMU_FAULT_REASON_PASID_FETCH,
> +
> +	/* PASID entry is invalid or has configuration errors */
> +	IOMMU_FAULT_REASON_BAD_PASID_ENTRY,
> +
> +	/*
> +	 * PASID is out of range (e.g. exceeds the maximum PASID
> +	 * supported by the IOMMU) or disabled.
> +	 */
> +	IOMMU_FAULT_REASON_PASID_INVALID,
> +
> +	/*
> +	 * An external abort occurred fetching (or updating) a translation
> +	 * table descriptor
> +	 */
> +	IOMMU_FAULT_REASON_WALK_EABT,
> +
> +	/*
> +	 * Could not access the page table entry (Bad address),
> +	 * actual translation fault
> +	 */
> +	IOMMU_FAULT_REASON_PTE_FETCH,
> +
> +	/* Protection flag check failed */
> +	IOMMU_FAULT_REASON_PERMISSION,
> +
> +	/* access flag check failed */
> +	IOMMU_FAULT_REASON_ACCESS,
> +
> +	/* Output address of a translation stage caused Address Size fault */
> +	IOMMU_FAULT_REASON_OOR_ADDRESS,
> +};
> +
> +/**
> + * struct iommu_fault_unrecoverable - Unrecoverable fault data
> + * @reason: reason of the fault, from &enum iommu_fault_reason
> + * @flags: parameters of this fault (IOMMU_FAULT_UNRECOV_* values)
> + * @pasid: Process Address Space ID
> + * @perm: requested permission access using by the incoming transaction
> + *        (IOMMU_FAULT_PERM_* values)
> + * @addr: offending page address
> + * @fetch_addr: address that caused a fetch abort, if any
> + */
> +struct iommu_fault_unrecoverable {
> +	__u32	reason;
> +#define IOMMU_FAULT_UNRECOV_PASID_VALID		(1 << 0)
> +#define IOMMU_FAULT_UNRECOV_ADDR_VALID		(1 << 1)
> +#define IOMMU_FAULT_UNRECOV_FETCH_ADDR_VALID	(1 << 2)
> +	__u32	flags;
> +	__u32	pasid;
> +	__u32	perm;
> +	__u64	addr;
> +	__u64	fetch_addr;
> +};
> +
> +/**
> + * struct iommu_fault_page_request - Page Request data
> + * @flags: encodes whether the corresponding fields are valid and whether this
> + *         is the last page in group (IOMMU_FAULT_PAGE_REQUEST_* values).
> + *         When IOMMU_FAULT_PAGE_RESPONSE_NEEDS_PASID is set, the page response
> + *         must have the same PASID value as the page request. When it is clear,
> + *         the page response should not have a PASID.
> + * @pasid: Process Address Space ID
> + * @grpid: Page Request Group Index
> + * @perm: requested page permissions (IOMMU_FAULT_PERM_* values)
> + * @addr: page address
> + * @private_data: device-specific private information
> + */
> +struct iommu_fault_page_request {
> +#define IOMMU_FAULT_PAGE_REQUEST_PASID_VALID	(1 << 0)
> +#define IOMMU_FAULT_PAGE_REQUEST_LAST_PAGE	(1 << 1)
> +#define IOMMU_FAULT_PAGE_REQUEST_PRIV_DATA	(1 << 2)
> +#define IOMMU_FAULT_PAGE_RESPONSE_NEEDS_PASID	(1 << 3)
> +	__u32	flags;
> +	__u32	pasid;
> +	__u32	grpid;
> +	__u32	perm;
> +	__u64	addr;
> +	__u64	private_data[2];
> +};
> +
> +/**
> + * struct iommu_fault - Generic fault data
> + * @type: fault type from &enum iommu_fault_type
> + * @padding: reserved for future use (should be zero)
> + * @event: fault event, when @type is %IOMMU_FAULT_DMA_UNRECOV
> + * @prm: Page Request message, when @type is %IOMMU_FAULT_PAGE_REQ
> + * @padding2: sets the fault size to allow for future extensions
> + */
> +struct iommu_fault {
> +	__u32	type;
> +	__u32	padding;
> +	union {
> +		struct iommu_fault_unrecoverable event;
> +		struct iommu_fault_page_request prm;
> +		__u8 padding2[56];
> +	};
> +};
> +
> +/**
> + * enum iommu_page_response_code - Return status of fault handlers
> + * @IOMMU_PAGE_RESP_SUCCESS: Fault has been handled and the page tables
> + *	populated, retry the access. This is "Success" in PCI PRI.
> + * @IOMMU_PAGE_RESP_FAILURE: General error. Drop all subsequent faults from
> + *	this device if possible. This is "Response Failure" in PCI PRI.
> + * @IOMMU_PAGE_RESP_INVALID: Could not handle this fault, don't retry the
> + *	access. This is "Invalid Request" in PCI PRI.
> + */
> +enum iommu_page_response_code {
> +	IOMMU_PAGE_RESP_SUCCESS = 0,
> +	IOMMU_PAGE_RESP_INVALID,
> +	IOMMU_PAGE_RESP_FAILURE,
> +};
> +
> +/**
> + * struct iommu_page_response - Generic page response information
> + * @argsz: User filled size of this data
> + * @version: API version of this structure
> + * @flags: encodes whether the corresponding fields are valid
> + *         (IOMMU_FAULT_PAGE_RESPONSE_* values)
> + * @pasid: Process Address Space ID
> + * @grpid: Page Request Group Index
> + * @code: response code from &enum iommu_page_response_code
> + */
> +struct iommu_page_response {
> +	__u32	argsz;
> +#define IOMMU_PAGE_RESP_VERSION_1	1
> +	__u32	version;
> +#define IOMMU_PAGE_RESP_PASID_VALID	(1 << 0)
> +	__u32	flags;
> +	__u32	pasid;
> +	__u32	grpid;
> +	__u32	code;
> +};
> +
> +
>   /* iommu fault flags */
>   #define IOMMU_FAULT_READ	0x0
>   #define IOMMU_FAULT_WRITE	0x1
> diff --git a/include/uapi/linux/iommu.h b/include/uapi/linux/iommu.h
> deleted file mode 100644
> index 65d8b0234f69..000000000000
> --- a/include/uapi/linux/iommu.h
> +++ /dev/null
> @@ -1,161 +0,0 @@
> -/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
> -/*
> - * IOMMU user API definitions
> - */
> -
> -#ifndef _UAPI_IOMMU_H
> -#define _UAPI_IOMMU_H
> -
> -#include <linux/types.h>
> -
> -#define IOMMU_FAULT_PERM_READ	(1 << 0) /* read */
> -#define IOMMU_FAULT_PERM_WRITE	(1 << 1) /* write */
> -#define IOMMU_FAULT_PERM_EXEC	(1 << 2) /* exec */
> -#define IOMMU_FAULT_PERM_PRIV	(1 << 3) /* privileged */
> -
> -/* Generic fault types, can be expanded IRQ remapping fault */
> -enum iommu_fault_type {
> -	IOMMU_FAULT_DMA_UNRECOV = 1,	/* unrecoverable fault */
> -	IOMMU_FAULT_PAGE_REQ,		/* page request fault */
> -};
> -
> -enum iommu_fault_reason {
> -	IOMMU_FAULT_REASON_UNKNOWN = 0,
> -
> -	/* Could not access the PASID table (fetch caused external abort) */
> -	IOMMU_FAULT_REASON_PASID_FETCH,
> -
> -	/* PASID entry is invalid or has configuration errors */
> -	IOMMU_FAULT_REASON_BAD_PASID_ENTRY,
> -
> -	/*
> -	 * PASID is out of range (e.g. exceeds the maximum PASID
> -	 * supported by the IOMMU) or disabled.
> -	 */
> -	IOMMU_FAULT_REASON_PASID_INVALID,
> -
> -	/*
> -	 * An external abort occurred fetching (or updating) a translation
> -	 * table descriptor
> -	 */
> -	IOMMU_FAULT_REASON_WALK_EABT,
> -
> -	/*
> -	 * Could not access the page table entry (Bad address),
> -	 * actual translation fault
> -	 */
> -	IOMMU_FAULT_REASON_PTE_FETCH,
> -
> -	/* Protection flag check failed */
> -	IOMMU_FAULT_REASON_PERMISSION,
> -
> -	/* access flag check failed */
> -	IOMMU_FAULT_REASON_ACCESS,
> -
> -	/* Output address of a translation stage caused Address Size fault */
> -	IOMMU_FAULT_REASON_OOR_ADDRESS,
> -};
> -
> -/**
> - * struct iommu_fault_unrecoverable - Unrecoverable fault data
> - * @reason: reason of the fault, from &enum iommu_fault_reason
> - * @flags: parameters of this fault (IOMMU_FAULT_UNRECOV_* values)
> - * @pasid: Process Address Space ID
> - * @perm: requested permission access using by the incoming transaction
> - *        (IOMMU_FAULT_PERM_* values)
> - * @addr: offending page address
> - * @fetch_addr: address that caused a fetch abort, if any
> - */
> -struct iommu_fault_unrecoverable {
> -	__u32	reason;
> -#define IOMMU_FAULT_UNRECOV_PASID_VALID		(1 << 0)
> -#define IOMMU_FAULT_UNRECOV_ADDR_VALID		(1 << 1)
> -#define IOMMU_FAULT_UNRECOV_FETCH_ADDR_VALID	(1 << 2)
> -	__u32	flags;
> -	__u32	pasid;
> -	__u32	perm;
> -	__u64	addr;
> -	__u64	fetch_addr;
> -};
> -
> -/**
> - * struct iommu_fault_page_request - Page Request data
> - * @flags: encodes whether the corresponding fields are valid and whether this
> - *         is the last page in group (IOMMU_FAULT_PAGE_REQUEST_* values).
> - *         When IOMMU_FAULT_PAGE_RESPONSE_NEEDS_PASID is set, the page response
> - *         must have the same PASID value as the page request. When it is clear,
> - *         the page response should not have a PASID.
> - * @pasid: Process Address Space ID
> - * @grpid: Page Request Group Index
> - * @perm: requested page permissions (IOMMU_FAULT_PERM_* values)
> - * @addr: page address
> - * @private_data: device-specific private information
> - */
> -struct iommu_fault_page_request {
> -#define IOMMU_FAULT_PAGE_REQUEST_PASID_VALID	(1 << 0)
> -#define IOMMU_FAULT_PAGE_REQUEST_LAST_PAGE	(1 << 1)
> -#define IOMMU_FAULT_PAGE_REQUEST_PRIV_DATA	(1 << 2)
> -#define IOMMU_FAULT_PAGE_RESPONSE_NEEDS_PASID	(1 << 3)
> -	__u32	flags;
> -	__u32	pasid;
> -	__u32	grpid;
> -	__u32	perm;
> -	__u64	addr;
> -	__u64	private_data[2];
> -};
> -
> -/**
> - * struct iommu_fault - Generic fault data
> - * @type: fault type from &enum iommu_fault_type
> - * @padding: reserved for future use (should be zero)
> - * @event: fault event, when @type is %IOMMU_FAULT_DMA_UNRECOV
> - * @prm: Page Request message, when @type is %IOMMU_FAULT_PAGE_REQ
> - * @padding2: sets the fault size to allow for future extensions
> - */
> -struct iommu_fault {
> -	__u32	type;
> -	__u32	padding;
> -	union {
> -		struct iommu_fault_unrecoverable event;
> -		struct iommu_fault_page_request prm;
> -		__u8 padding2[56];
> -	};
> -};
> -
> -/**
> - * enum iommu_page_response_code - Return status of fault handlers
> - * @IOMMU_PAGE_RESP_SUCCESS: Fault has been handled and the page tables
> - *	populated, retry the access. This is "Success" in PCI PRI.
> - * @IOMMU_PAGE_RESP_FAILURE: General error. Drop all subsequent faults from
> - *	this device if possible. This is "Response Failure" in PCI PRI.
> - * @IOMMU_PAGE_RESP_INVALID: Could not handle this fault, don't retry the
> - *	access. This is "Invalid Request" in PCI PRI.
> - */
> -enum iommu_page_response_code {
> -	IOMMU_PAGE_RESP_SUCCESS = 0,
> -	IOMMU_PAGE_RESP_INVALID,
> -	IOMMU_PAGE_RESP_FAILURE,
> -};
> -
> -/**
> - * struct iommu_page_response - Generic page response information
> - * @argsz: User filled size of this data
> - * @version: API version of this structure
> - * @flags: encodes whether the corresponding fields are valid
> - *         (IOMMU_FAULT_PAGE_RESPONSE_* values)
> - * @pasid: Process Address Space ID
> - * @grpid: Page Request Group Index
> - * @code: response code from &enum iommu_page_response_code
> - */
> -struct iommu_page_response {
> -	__u32	argsz;
> -#define IOMMU_PAGE_RESP_VERSION_1	1
> -	__u32	version;
> -#define IOMMU_PAGE_RESP_PASID_VALID	(1 << 0)
> -	__u32	flags;
> -	__u32	pasid;
> -	__u32	grpid;
> -	__u32	code;
> -};
> -
> -#endif /* _UAPI_IOMMU_H */
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 97f51d5ec1cf..bfd97aaeb01d 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -11129,7 +11129,6 @@ F:	drivers/iommu/
>   F:	include/linux/iommu.h
>   F:	include/linux/iova.h
>   F:	include/linux/of_iommu.h
> -F:	include/uapi/linux/iommu.h
>   
>   IOMMUFD
>   M:	Jason Gunthorpe <jgg@nvidia.com>

-- 
Regards,
Yi Liu

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v7 02/12] iommu/arm-smmu-v3: Remove unrecoverable faults reporting
  2023-11-15  3:02 ` [PATCH v7 02/12] iommu/arm-smmu-v3: Remove unrecoverable faults reporting Lu Baolu
  2023-12-01 15:42   ` Jason Gunthorpe
@ 2023-12-04 10:54   ` Yi Liu
  2023-12-05 11:48     ` Baolu Lu
  1 sibling, 1 reply; 49+ messages in thread
From: Yi Liu @ 2023-12-04 10:54 UTC (permalink / raw)
  To: Lu Baolu, Joerg Roedel, Will Deacon, Robin Murphy,
	Jason Gunthorpe, Kevin Tian, Jean-Philippe Brucker, Nicolin Chen
  Cc: Jacob Pan, Yan Zhao, iommu, kvm, linux-kernel



On 2023/11/15 11:02, Lu Baolu wrote:
> No device driver registers a fault handler to handle the reported
> unrecoverable faults. Remove it to avoid dead code.

I noticed that only the ARM code is removed. So the Intel iommu driver
does not have code that tries to report unrecoverable faults?

> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
> Reviewed-by: Kevin Tian <kevin.tian@intel.com>
> ---
>   drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 46 ++++++---------------
>   1 file changed, 13 insertions(+), 33 deletions(-)
> 
> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> index 7445454c2af2..505400538a2e 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> @@ -1463,7 +1463,6 @@ arm_smmu_find_master(struct arm_smmu_device *smmu, u32 sid)
>   static int arm_smmu_handle_evt(struct arm_smmu_device *smmu, u64 *evt)
>   {
>   	int ret;
> -	u32 reason;
>   	u32 perm = 0;
>   	struct arm_smmu_master *master;
>   	bool ssid_valid = evt[0] & EVTQ_0_SSV;
> @@ -1473,16 +1472,9 @@ static int arm_smmu_handle_evt(struct arm_smmu_device *smmu, u64 *evt)
>   
>   	switch (FIELD_GET(EVTQ_0_ID, evt[0])) {
>   	case EVT_ID_TRANSLATION_FAULT:
> -		reason = IOMMU_FAULT_REASON_PTE_FETCH;
> -		break;
>   	case EVT_ID_ADDR_SIZE_FAULT:
> -		reason = IOMMU_FAULT_REASON_OOR_ADDRESS;
> -		break;
>   	case EVT_ID_ACCESS_FAULT:
> -		reason = IOMMU_FAULT_REASON_ACCESS;
> -		break;
>   	case EVT_ID_PERMISSION_FAULT:
> -		reason = IOMMU_FAULT_REASON_PERMISSION;
>   		break;
>   	default:
>   		return -EOPNOTSUPP;
> @@ -1492,6 +1484,9 @@ static int arm_smmu_handle_evt(struct arm_smmu_device *smmu, u64 *evt)
>   	if (evt[1] & EVTQ_1_S2)
>   		return -EFAULT;
>   
> +	if (!(evt[1] & EVTQ_1_STALL))
> +		return -EOPNOTSUPP;
> +
>   	if (evt[1] & EVTQ_1_RnW)
>   		perm |= IOMMU_FAULT_PERM_READ;
>   	else
> @@ -1503,32 +1498,17 @@ static int arm_smmu_handle_evt(struct arm_smmu_device *smmu, u64 *evt)
>   	if (evt[1] & EVTQ_1_PnU)
>   		perm |= IOMMU_FAULT_PERM_PRIV;
>   
> -	if (evt[1] & EVTQ_1_STALL) {
> -		flt->type = IOMMU_FAULT_PAGE_REQ;
> -		flt->prm = (struct iommu_fault_page_request) {
> -			.flags = IOMMU_FAULT_PAGE_REQUEST_LAST_PAGE,
> -			.grpid = FIELD_GET(EVTQ_1_STAG, evt[1]),
> -			.perm = perm,
> -			.addr = FIELD_GET(EVTQ_2_ADDR, evt[2]),
> -		};
> +	flt->type = IOMMU_FAULT_PAGE_REQ;
> +	flt->prm = (struct iommu_fault_page_request) {
> +		.flags = IOMMU_FAULT_PAGE_REQUEST_LAST_PAGE,
> +		.grpid = FIELD_GET(EVTQ_1_STAG, evt[1]),
> +		.perm = perm,
> +		.addr = FIELD_GET(EVTQ_2_ADDR, evt[2]),
> +	};
>   
> -		if (ssid_valid) {
> -			flt->prm.flags |= IOMMU_FAULT_PAGE_REQUEST_PASID_VALID;
> -			flt->prm.pasid = FIELD_GET(EVTQ_0_SSID, evt[0]);
> -		}
> -	} else {
> -		flt->type = IOMMU_FAULT_DMA_UNRECOV;
> -		flt->event = (struct iommu_fault_unrecoverable) {
> -			.reason = reason,
> -			.flags = IOMMU_FAULT_UNRECOV_ADDR_VALID,
> -			.perm = perm,
> -			.addr = FIELD_GET(EVTQ_2_ADDR, evt[2]),
> -		};
> -
> -		if (ssid_valid) {
> -			flt->event.flags |= IOMMU_FAULT_UNRECOV_PASID_VALID;
> -			flt->event.pasid = FIELD_GET(EVTQ_0_SSID, evt[0]);
> -		}
> +	if (ssid_valid) {
> +		flt->prm.flags |= IOMMU_FAULT_PAGE_REQUEST_PASID_VALID;
> +		flt->prm.pasid = FIELD_GET(EVTQ_0_SSID, evt[0]);
>   	}
>   
>   	mutex_lock(&smmu->streams_mutex);

-- 
Regards,
Yi Liu

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v7 03/12] iommu: Remove unrecoverable fault data
  2023-11-15  3:02 ` [PATCH v7 03/12] iommu: Remove unrecoverable fault data Lu Baolu
@ 2023-12-04 10:58   ` Yi Liu
  2023-12-05 11:55     ` Baolu Lu
  0 siblings, 1 reply; 49+ messages in thread
From: Yi Liu @ 2023-12-04 10:58 UTC (permalink / raw)
  To: Lu Baolu, Joerg Roedel, Will Deacon, Robin Murphy,
	Jason Gunthorpe, Kevin Tian, Jean-Philippe Brucker, Nicolin Chen
  Cc: Jacob Pan, Yan Zhao, iommu, kvm, linux-kernel, Jason Gunthorpe

On 2023/11/15 11:02, Lu Baolu wrote:
> The unrecoverable fault data is not used anywhere. Remove it to avoid
> dead code.
> 
> Suggested-by: Kevin Tian <kevin.tian@intel.com>
> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
> Reviewed-by: Kevin Tian <kevin.tian@intel.com>
> ---
>   include/linux/iommu.h | 70 +------------------------------------------
>   1 file changed, 1 insertion(+), 69 deletions(-)
> 
> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index c2e2225184cf..81eee1afec72 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -50,69 +50,9 @@ struct iommu_dma_cookie;
>   
>   /* Generic fault types, can be expanded IRQ remapping fault */
>   enum iommu_fault_type {
> -	IOMMU_FAULT_DMA_UNRECOV = 1,	/* unrecoverable fault */
>   	IOMMU_FAULT_PAGE_REQ,		/* page request fault */

A nit: do you know why this enum started from 1? Should it still
start from 1 after deleting UNRECOV?
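
One plausible reason, sketched below as an illustration only (not the
maintainer's answer): with the enum starting at 1, a zero-initialized
fault structure can never alias a valid fault type, so type == 0 is
detectable as "not populated".

#include <stdio.h>
#include <string.h>

enum iommu_fault_type {
        IOMMU_FAULT_PAGE_REQ = 1,       /* 0 stays "no fault recorded" */
};

struct iommu_fault {
        unsigned int type;
};

int main(void)
{
        struct iommu_fault flt;

        memset(&flt, 0, sizeof(flt));   /* e.g. a zeroed event buffer */
        if (flt.type != IOMMU_FAULT_PAGE_REQ)
                printf("fault not populated yet\n");
        return 0;
}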

>   };
>   
> -enum iommu_fault_reason {
> -	IOMMU_FAULT_REASON_UNKNOWN = 0,
> -
> -	/* Could not access the PASID table (fetch caused external abort) */
> -	IOMMU_FAULT_REASON_PASID_FETCH,
> -
> -	/* PASID entry is invalid or has configuration errors */
> -	IOMMU_FAULT_REASON_BAD_PASID_ENTRY,
> -
> -	/*
> -	 * PASID is out of range (e.g. exceeds the maximum PASID
> -	 * supported by the IOMMU) or disabled.
> -	 */
> -	IOMMU_FAULT_REASON_PASID_INVALID,
> -
> -	/*
> -	 * An external abort occurred fetching (or updating) a translation
> -	 * table descriptor
> -	 */
> -	IOMMU_FAULT_REASON_WALK_EABT,
> -
> -	/*
> -	 * Could not access the page table entry (Bad address),
> -	 * actual translation fault
> -	 */
> -	IOMMU_FAULT_REASON_PTE_FETCH,
> -
> -	/* Protection flag check failed */
> -	IOMMU_FAULT_REASON_PERMISSION,
> -
> -	/* access flag check failed */
> -	IOMMU_FAULT_REASON_ACCESS,
> -
> -	/* Output address of a translation stage caused Address Size fault */
> -	IOMMU_FAULT_REASON_OOR_ADDRESS,
> -};
> -
> -/**
> - * struct iommu_fault_unrecoverable - Unrecoverable fault data
> - * @reason: reason of the fault, from &enum iommu_fault_reason
> - * @flags: parameters of this fault (IOMMU_FAULT_UNRECOV_* values)
> - * @pasid: Process Address Space ID
> - * @perm: requested permission access using by the incoming transaction
> - *        (IOMMU_FAULT_PERM_* values)
> - * @addr: offending page address
> - * @fetch_addr: address that caused a fetch abort, if any
> - */
> -struct iommu_fault_unrecoverable {
> -	__u32	reason;
> -#define IOMMU_FAULT_UNRECOV_PASID_VALID		(1 << 0)
> -#define IOMMU_FAULT_UNRECOV_ADDR_VALID		(1 << 1)
> -#define IOMMU_FAULT_UNRECOV_FETCH_ADDR_VALID	(1 << 2)
> -	__u32	flags;
> -	__u32	pasid;
> -	__u32	perm;
> -	__u64	addr;
> -	__u64	fetch_addr;
> -};
> -
>   /**
>    * struct iommu_fault_page_request - Page Request data
>    * @flags: encodes whether the corresponding fields are valid and whether this
> @@ -142,19 +82,11 @@ struct iommu_fault_page_request {
>   /**
>    * struct iommu_fault - Generic fault data
>    * @type: fault type from &enum iommu_fault_type
> - * @padding: reserved for future use (should be zero)
> - * @event: fault event, when @type is %IOMMU_FAULT_DMA_UNRECOV
>    * @prm: Page Request message, when @type is %IOMMU_FAULT_PAGE_REQ
> - * @padding2: sets the fault size to allow for future extensions
>    */
>   struct iommu_fault {
>   	__u32	type;
> -	__u32	padding;
> -	union {
> -		struct iommu_fault_unrecoverable event;
> -		struct iommu_fault_page_request prm;
> -		__u8 padding2[56];
> -	};
> +	struct iommu_fault_page_request prm;
>   };
>   
>   /**

-- 
Regards,
Yi Liu

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v7 04/12] iommu: Cleanup iopf data structure definitions
  2023-11-15  3:02 ` [PATCH v7 04/12] iommu: Cleanup iopf data structure definitions Lu Baolu
@ 2023-12-04 11:03   ` Yi Liu
  0 siblings, 0 replies; 49+ messages in thread
From: Yi Liu @ 2023-12-04 11:03 UTC (permalink / raw)
  To: Lu Baolu, Joerg Roedel, Will Deacon, Robin Murphy,
	Jason Gunthorpe, Kevin Tian, Jean-Philippe Brucker, Nicolin Chen
  Cc: Jacob Pan, Yan Zhao, iommu, kvm, linux-kernel, Jason Gunthorpe

On 2023/11/15 11:02, Lu Baolu wrote:
> struct iommu_fault_page_request and struct iommu_page_response are not
> part of uAPI anymore. Convert them to data structures for kAPI.
> 
> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
> Reviewed-by: Kevin Tian <kevin.tian@intel.com>
> Tested-by: Yan Zhao <yan.y.zhao@intel.com>
> ---
>   include/linux/iommu.h      | 27 +++++++++++----------------
>   drivers/iommu/io-pgfault.c |  1 -
>   drivers/iommu/iommu.c      |  4 ----
>   3 files changed, 11 insertions(+), 21 deletions(-)

Reviewed-by: Yi Liu <yi.l.liu@intel.com>

> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index 81eee1afec72..79775859af42 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -71,12 +71,12 @@ struct iommu_fault_page_request {
>   #define IOMMU_FAULT_PAGE_REQUEST_LAST_PAGE	(1 << 1)
>   #define IOMMU_FAULT_PAGE_REQUEST_PRIV_DATA	(1 << 2)
>   #define IOMMU_FAULT_PAGE_RESPONSE_NEEDS_PASID	(1 << 3)
> -	__u32	flags;
> -	__u32	pasid;
> -	__u32	grpid;
> -	__u32	perm;
> -	__u64	addr;
> -	__u64	private_data[2];
> +	u32	flags;
> +	u32	pasid;
> +	u32	grpid;
> +	u32	perm;
> +	u64	addr;
> +	u64	private_data[2];
>   };
>   
>   /**
> @@ -85,7 +85,7 @@ struct iommu_fault_page_request {
>    * @prm: Page Request message, when @type is %IOMMU_FAULT_PAGE_REQ
>    */
>   struct iommu_fault {
> -	__u32	type;
> +	u32 type;
>   	struct iommu_fault_page_request prm;
>   };
>   
> @@ -106,8 +106,6 @@ enum iommu_page_response_code {
>   
>   /**
>    * struct iommu_page_response - Generic page response information
> - * @argsz: User filled size of this data
> - * @version: API version of this structure
>    * @flags: encodes whether the corresponding fields are valid
>    *         (IOMMU_FAULT_PAGE_RESPONSE_* values)
>    * @pasid: Process Address Space ID
> @@ -115,14 +113,11 @@ enum iommu_page_response_code {
>    * @code: response code from &enum iommu_page_response_code
>    */
>   struct iommu_page_response {
> -	__u32	argsz;
> -#define IOMMU_PAGE_RESP_VERSION_1	1
> -	__u32	version;
>   #define IOMMU_PAGE_RESP_PASID_VALID	(1 << 0)
> -	__u32	flags;
> -	__u32	pasid;
> -	__u32	grpid;
> -	__u32	code;
> +	u32	flags;
> +	u32	pasid;
> +	u32	grpid;
> +	u32	code;
>   };
>   
>   
> diff --git a/drivers/iommu/io-pgfault.c b/drivers/iommu/io-pgfault.c
> index e5b8b9110c13..24b5545352ae 100644
> --- a/drivers/iommu/io-pgfault.c
> +++ b/drivers/iommu/io-pgfault.c
> @@ -56,7 +56,6 @@ static int iopf_complete_group(struct device *dev, struct iopf_fault *iopf,
>   			       enum iommu_page_response_code status)
>   {
>   	struct iommu_page_response resp = {
> -		.version		= IOMMU_PAGE_RESP_VERSION_1,
>   		.pasid			= iopf->fault.prm.pasid,
>   		.grpid			= iopf->fault.prm.grpid,
>   		.code			= status,
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index f17a1113f3d6..f24513e2b025 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -1465,10 +1465,6 @@ int iommu_page_response(struct device *dev,
>   	if (!param || !param->fault_param)
>   		return -EINVAL;
>   
> -	if (msg->version != IOMMU_PAGE_RESP_VERSION_1 ||
> -	    msg->flags & ~IOMMU_PAGE_RESP_PASID_VALID)
> -		return -EINVAL;
> -
>   	/* Only send response if there is a fault report pending */
>   	mutex_lock(&param->fault_param->lock);
>   	if (list_empty(&param->fault_param->faults)) {

-- 
Regards,
Yi Liu

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v7 05/12] iommu: Merge iopf_device_param into iommu_fault_param
  2023-11-15  3:02 ` [PATCH v7 05/12] iommu: Merge iopf_device_param into iommu_fault_param Lu Baolu
@ 2023-12-04 12:32   ` Yi Liu
  2023-12-05 12:01     ` Baolu Lu
  0 siblings, 1 reply; 49+ messages in thread
From: Yi Liu @ 2023-12-04 12:32 UTC (permalink / raw)
  To: Lu Baolu, Joerg Roedel, Will Deacon, Robin Murphy,
	Jason Gunthorpe, Kevin Tian, Jean-Philippe Brucker, Nicolin Chen
  Cc: Jacob Pan, Yan Zhao, iommu, kvm, linux-kernel, Jason Gunthorpe

On 2023/11/15 11:02, Lu Baolu wrote:
> The struct dev_iommu contains two pointers, fault_param and iopf_param.
> The fault_param pointer points to a data structure that is used to store
> pending faults that are awaiting responses. The iopf_param pointer points
> to a data structure that is used to store partial faults that are part of
> a Page Request Group.
> 
> The fault_param and iopf_param pointers are essentially duplicates. This
> causes memory waste. Merge the iopf_device_param pointer into the
> iommu_fault_param pointer to consolidate the code and save memory. The
> consolidated pointer is allocated on demand when the device driver
> enables iopf on the device, and is freed after iopf is disabled.
> 
> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
> Reviewed-by: Kevin Tian <kevin.tian@intel.com>
> Tested-by: Yan Zhao <yan.y.zhao@intel.com>
> ---
>   include/linux/iommu.h      |  18 ++++--
>   drivers/iommu/io-pgfault.c | 113 ++++++++++++++++++-------------------
>   drivers/iommu/iommu.c      |  34 ++---------
>   3 files changed, 75 insertions(+), 90 deletions(-)
> 
> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index 79775859af42..108ab50da1ad 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -42,6 +42,7 @@ struct notifier_block;
>   struct iommu_sva;
>   struct iommu_fault_event;
>   struct iommu_dma_cookie;
> +struct iopf_queue;
>   
>   #define IOMMU_FAULT_PERM_READ	(1 << 0) /* read */
>   #define IOMMU_FAULT_PERM_WRITE	(1 << 1) /* write */
> @@ -590,21 +591,31 @@ struct iommu_fault_event {
>    * struct iommu_fault_param - per-device IOMMU fault data
>    * @handler: Callback function to handle IOMMU faults at device level
>    * @data: handler private data
> - * @faults: holds the pending faults which needs response
>    * @lock: protect pending faults list
> + * @dev: the device that owns this param
> + * @queue: IOPF queue
> + * @queue_list: index into queue->devices
> + * @partial: faults that are part of a Page Request Group for which the last
> + *           request hasn't been submitted yet.
> + * @faults: holds the pending faults which needs response

Since you already moved this line, maybe fix this typo as well:
s/needs/need/

>    */
>   struct iommu_fault_param {
>   	iommu_dev_fault_handler_t handler;
>   	void *data;
> +	struct mutex lock;

Can you share why this line is moved up? It results in a line move as
well in the kdoc above.

> +
> +	struct device *dev;
> +	struct iopf_queue *queue;
> +	struct list_head queue_list;
> +
> +	struct list_head partial;
>   	struct list_head faults;
> -	struct mutex lock;
>   };
>   
>   /**
>    * struct dev_iommu - Collection of per-device IOMMU data
>    *
>    * @fault_param: IOMMU detected device fault reporting data
> - * @iopf_param:	 I/O Page Fault queue and data
>    * @fwspec:	 IOMMU fwspec data
>    * @iommu_dev:	 IOMMU device this device is linked to
>    * @priv:	 IOMMU Driver private data
> @@ -620,7 +631,6 @@ struct iommu_fault_param {
>   struct dev_iommu {
>   	struct mutex lock;
>   	struct iommu_fault_param	*fault_param;
> -	struct iopf_device_param	*iopf_param;
>   	struct iommu_fwspec		*fwspec;
>   	struct iommu_device		*iommu_dev;
>   	void				*priv;
> diff --git a/drivers/iommu/io-pgfault.c b/drivers/iommu/io-pgfault.c
> index 24b5545352ae..b1cf28055525 100644
> --- a/drivers/iommu/io-pgfault.c
> +++ b/drivers/iommu/io-pgfault.c
> @@ -25,21 +25,6 @@ struct iopf_queue {
>   	struct mutex			lock;
>   };
>   
> -/**
> - * struct iopf_device_param - IO Page Fault data attached to a device
> - * @dev: the device that owns this param
> - * @queue: IOPF queue
> - * @queue_list: index into queue->devices
> - * @partial: faults that are part of a Page Request Group for which the last
> - *           request hasn't been submitted yet.
> - */
> -struct iopf_device_param {
> -	struct device			*dev;
> -	struct iopf_queue		*queue;
> -	struct list_head		queue_list;
> -	struct list_head		partial;
> -};
> -
>   struct iopf_fault {
>   	struct iommu_fault		fault;
>   	struct list_head		list;
> @@ -144,7 +129,7 @@ int iommu_queue_iopf(struct iommu_fault *fault, void *cookie)
>   	int ret;
>   	struct iopf_group *group;
>   	struct iopf_fault *iopf, *next;
> -	struct iopf_device_param *iopf_param;
> +	struct iommu_fault_param *iopf_param;
>   
>   	struct device *dev = cookie;
>   	struct dev_iommu *param = dev->iommu;
> @@ -159,7 +144,7 @@ int iommu_queue_iopf(struct iommu_fault *fault, void *cookie)
>   	 * As long as we're holding param->lock, the queue can't be unlinked
>   	 * from the device and therefore cannot disappear.
>   	 */
> -	iopf_param = param->iopf_param;
> +	iopf_param = param->fault_param;
>   	if (!iopf_param)
>   		return -ENODEV;
>   
> @@ -229,14 +214,14 @@ EXPORT_SYMBOL_GPL(iommu_queue_iopf);
>   int iopf_queue_flush_dev(struct device *dev)
>   {
>   	int ret = 0;
> -	struct iopf_device_param *iopf_param;
> +	struct iommu_fault_param *iopf_param;
>   	struct dev_iommu *param = dev->iommu;
>   
>   	if (!param)
>   		return -ENODEV;
>   
>   	mutex_lock(&param->lock);
> -	iopf_param = param->iopf_param;
> +	iopf_param = param->fault_param;
>   	if (iopf_param)
>   		flush_workqueue(iopf_param->queue->wq);
>   	else
> @@ -260,7 +245,7 @@ EXPORT_SYMBOL_GPL(iopf_queue_flush_dev);
>   int iopf_queue_discard_partial(struct iopf_queue *queue)
>   {
>   	struct iopf_fault *iopf, *next;
> -	struct iopf_device_param *iopf_param;
> +	struct iommu_fault_param *iopf_param;
>   
>   	if (!queue)
>   		return -EINVAL;
> @@ -287,34 +272,38 @@ EXPORT_SYMBOL_GPL(iopf_queue_discard_partial);
>    */
>   int iopf_queue_add_device(struct iopf_queue *queue, struct device *dev)
>   {
> -	int ret = -EBUSY;
> -	struct iopf_device_param *iopf_param;
> +	int ret = 0;
>   	struct dev_iommu *param = dev->iommu;
> -
> -	if (!param)
> -		return -ENODEV;
> -
> -	iopf_param = kzalloc(sizeof(*iopf_param), GFP_KERNEL);
> -	if (!iopf_param)
> -		return -ENOMEM;
> -
> -	INIT_LIST_HEAD(&iopf_param->partial);
> -	iopf_param->queue = queue;
> -	iopf_param->dev = dev;
> +	struct iommu_fault_param *fault_param;
>   
>   	mutex_lock(&queue->lock);
>   	mutex_lock(&param->lock);
> -	if (!param->iopf_param) {
> -		list_add(&iopf_param->queue_list, &queue->devices);
> -		param->iopf_param = iopf_param;
> -		ret = 0;
> +	if (param->fault_param) {
> +		ret = -EBUSY;
> +		goto done_unlock;
>   	}
> +
> +	get_device(dev);

Noticed the old code has this get as well. :) But I still want to ask
whether it is really needed.

> +	fault_param = kzalloc(sizeof(*fault_param), GFP_KERNEL);
> +	if (!fault_param) {
> +		put_device(dev);
> +		ret = -ENOMEM;
> +		goto done_unlock;
> +	}
> +
> +	mutex_init(&fault_param->lock);
> +	INIT_LIST_HEAD(&fault_param->faults);
> +	INIT_LIST_HEAD(&fault_param->partial);
> +	fault_param->dev = dev;
> +	list_add(&fault_param->queue_list, &queue->devices);
> +	fault_param->queue = queue;
> +
> +	param->fault_param = fault_param;
> +
> +done_unlock:
>   	mutex_unlock(&param->lock);
>   	mutex_unlock(&queue->lock);
>   
> -	if (ret)
> -		kfree(iopf_param);
> -
>   	return ret;
>   }
>   EXPORT_SYMBOL_GPL(iopf_queue_add_device);
> @@ -330,34 +319,42 @@ EXPORT_SYMBOL_GPL(iopf_queue_add_device);
>    */
>   int iopf_queue_remove_device(struct iopf_queue *queue, struct device *dev)
>   {
> -	int ret = -EINVAL;
> +	int ret = 0;
>   	struct iopf_fault *iopf, *next;
> -	struct iopf_device_param *iopf_param;
>   	struct dev_iommu *param = dev->iommu;
> -
> -	if (!param || !queue)
> -		return -EINVAL;
> +	struct iommu_fault_param *fault_param = param->fault_param;
>   
>   	mutex_lock(&queue->lock);
>   	mutex_lock(&param->lock);
> -	iopf_param = param->iopf_param;
> -	if (iopf_param && iopf_param->queue == queue) {
> -		list_del(&iopf_param->queue_list);
> -		param->iopf_param = NULL;
> -		ret = 0;
> +	if (!fault_param) {
> +		ret = -ENODEV;
> +		goto unlock;
>   	}
> -	mutex_unlock(&param->lock);
> -	mutex_unlock(&queue->lock);
> -	if (ret)
> -		return ret;
> +
> +	if (fault_param->queue != queue) {
> +		ret = -EINVAL;
> +		goto unlock;
> +	}
> +
> +	if (!list_empty(&fault_param->faults)) {
> +		ret = -EBUSY;
> +		goto unlock;
> +	}
> +
> +	list_del(&fault_param->queue_list);
>   
>   	/* Just in case some faults are still stuck */
> -	list_for_each_entry_safe(iopf, next, &iopf_param->partial, list)
> +	list_for_each_entry_safe(iopf, next, &fault_param->partial, list)
>   		kfree(iopf);
>   
> -	kfree(iopf_param);
> +	param->fault_param = NULL;
> +	kfree(fault_param);
> +	put_device(dev);
> +unlock:
> +	mutex_unlock(&param->lock);
> +	mutex_unlock(&queue->lock);
>   
> -	return 0;
> +	return ret;
>   }
>   EXPORT_SYMBOL_GPL(iopf_queue_remove_device);
>   
> @@ -403,7 +400,7 @@ EXPORT_SYMBOL_GPL(iopf_queue_alloc);
>    */
>   void iopf_queue_free(struct iopf_queue *queue)
>   {
> -	struct iopf_device_param *iopf_param, *next;
> +	struct iommu_fault_param *iopf_param, *next;
>   
>   	if (!queue)
>   		return;
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index f24513e2b025..9c9eacfa6761 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -1326,27 +1326,18 @@ int iommu_register_device_fault_handler(struct device *dev,
>   	struct dev_iommu *param = dev->iommu;
>   	int ret = 0;
>   
> -	if (!param)
> +	if (!param || !param->fault_param)
>   		return -EINVAL;
>   
>   	mutex_lock(&param->lock);
>   	/* Only allow one fault handler registered for each device */
> -	if (param->fault_param) {
> +	if (param->fault_param->handler) {
>   		ret = -EBUSY;
>   		goto done_unlock;
>   	}
>   
> -	get_device(dev);
> -	param->fault_param = kzalloc(sizeof(*param->fault_param), GFP_KERNEL);
> -	if (!param->fault_param) {
> -		put_device(dev);
> -		ret = -ENOMEM;
> -		goto done_unlock;
> -	}
>   	param->fault_param->handler = handler;
>   	param->fault_param->data = data;
> -	mutex_init(&param->fault_param->lock);
> -	INIT_LIST_HEAD(&param->fault_param->faults);
>   
>   done_unlock:
>   	mutex_unlock(&param->lock);
> @@ -1367,29 +1358,16 @@ EXPORT_SYMBOL_GPL(iommu_register_device_fault_handler);
>   int iommu_unregister_device_fault_handler(struct device *dev)
>   {
>   	struct dev_iommu *param = dev->iommu;
> -	int ret = 0;
>   
> -	if (!param)
> +	if (!param || !param->fault_param)
>   		return -EINVAL;
>   
>   	mutex_lock(&param->lock);
> -
> -	if (!param->fault_param)
> -		goto unlock;
> -
> -	/* we cannot unregister handler if there are pending faults */
> -	if (!list_empty(&param->fault_param->faults)) {
> -		ret = -EBUSY;
> -		goto unlock;
> -	}
> -
> -	kfree(param->fault_param);
> -	param->fault_param = NULL;
> -	put_device(dev);
> -unlock:
> +	param->fault_param->handler = NULL;
> +	param->fault_param->data = NULL;
>   	mutex_unlock(&param->lock);
>   
> -	return ret;
> +	return 0;
>   }
>   EXPORT_SYMBOL_GPL(iommu_unregister_device_fault_handler);
>   

-- 
Regards,
Yi Liu

^ permalink raw reply	[flat|nested] 49+ messages in thread
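
As an aside, the on-demand lifecycle that the patch above introduces
can be modeled standalone as below: one consolidated per-device
structure, allocated when the device is added to an iopf queue and
freed at removal, with removal refused while faults are pending. The
names mirror the quoted diff, but the code is a simplified sketch, not
the kernel implementation.

#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

struct fault_param {
        int pending_faults;             /* stand-in for the faults list */
};

struct device {
        struct fault_param *fault_param;        /* NULL until iopf enabled */
};

static int iopf_queue_add_device(struct device *dev)
{
        if (dev->fault_param)
                return -EBUSY;          /* already enabled on this device */
        dev->fault_param = calloc(1, sizeof(*dev->fault_param));
        return dev->fault_param ? 0 : -ENOMEM;
}

static int iopf_queue_remove_device(struct device *dev)
{
        if (!dev->fault_param)
                return -ENODEV;         /* iopf was never enabled */
        if (dev->fault_param->pending_faults)
                return -EBUSY;          /* responses still outstanding */
        free(dev->fault_param);
        dev->fault_param = NULL;
        return 0;
}

int main(void)
{
        struct device dev = { NULL };

        printf("add: %d\n", iopf_queue_add_device(&dev));       /* 0 */
        printf("add again: %d\n", iopf_queue_add_device(&dev)); /* -EBUSY */
        printf("remove: %d\n", iopf_queue_remove_device(&dev)); /* 0 */
        return 0;
}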

* Re: [PATCH v7 06/12] iommu: Remove iommu_[un]register_device_fault_handler()
  2023-11-15  3:02 ` [PATCH v7 06/12] iommu: Remove iommu_[un]register_device_fault_handler() Lu Baolu
@ 2023-12-04 12:36   ` Yi Liu
  2023-12-05 12:09     ` Baolu Lu
  0 siblings, 1 reply; 49+ messages in thread
From: Yi Liu @ 2023-12-04 12:36 UTC (permalink / raw)
  To: Lu Baolu, Joerg Roedel, Will Deacon, Robin Murphy,
	Jason Gunthorpe, Kevin Tian, Jean-Philippe Brucker, Nicolin Chen
  Cc: Jacob Pan, Yan Zhao, iommu, kvm, linux-kernel, Jason Gunthorpe

On 2023/11/15 11:02, Lu Baolu wrote:
> The individual iommu driver reports the iommu page faults by calling
> iommu_report_device_fault(), where a pre-registered device fault handler
> is called to route the fault to another fault handler installed on the
> corresponding iommu domain.
> 
> The pre-registered device fault handler is static and won't become
> dynamic, as the fault handler is eventually per iommu domain. Replace
> the call to the device fault handler with iommu_queue_iopf().
> 
> After this replacement, the registering and unregistering fault handler
> interfaces are not needed anywhere. Remove the interfaces and the related
> data structures to avoid dead code.
> 
> Convert the cookie parameter of iommu_queue_iopf() into a device
> pointer, which is what is really passed.
> 
> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
> Reviewed-by: Kevin Tian <kevin.tian@intel.com>
> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
> Tested-by: Yan Zhao <yan.y.zhao@intel.com>
> ---
>   include/linux/iommu.h                         | 23 ------
>   drivers/iommu/iommu-sva.h                     |  4 +-
>   .../iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c   | 13 +---
>   drivers/iommu/intel/iommu.c                   | 24 ++----
>   drivers/iommu/io-pgfault.c                    |  6 +-
>   drivers/iommu/iommu.c                         | 76 +------------------
>   6 files changed, 13 insertions(+), 133 deletions(-)
> 
> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index 108ab50da1ad..a45d92cc31ec 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -128,7 +128,6 @@ struct iommu_page_response {
>   
>   typedef int (*iommu_fault_handler_t)(struct iommu_domain *,
>   			struct device *, unsigned long, int, void *);
> -typedef int (*iommu_dev_fault_handler_t)(struct iommu_fault *, void *);
>   
>   struct iommu_domain_geometry {
>   	dma_addr_t aperture_start; /* First address that can be mapped    */
> @@ -589,8 +588,6 @@ struct iommu_fault_event {
>   
>   /**
>    * struct iommu_fault_param - per-device IOMMU fault data
> - * @handler: Callback function to handle IOMMU faults at device level
> - * @data: handler private data
>    * @lock: protect pending faults list
>    * @dev: the device that owns this param
>    * @queue: IOPF queue
> @@ -600,8 +597,6 @@ struct iommu_fault_event {
>    * @faults: holds the pending faults which needs response
>    */
>   struct iommu_fault_param {
> -	iommu_dev_fault_handler_t handler;
> -	void *data;
>   	struct mutex lock;
>   
>   	struct device *dev;
> @@ -724,11 +719,6 @@ extern int iommu_group_for_each_dev(struct iommu_group *group, void *data,
>   extern struct iommu_group *iommu_group_get(struct device *dev);
>   extern struct iommu_group *iommu_group_ref_get(struct iommu_group *group);
>   extern void iommu_group_put(struct iommu_group *group);
> -extern int iommu_register_device_fault_handler(struct device *dev,
> -					iommu_dev_fault_handler_t handler,
> -					void *data);
> -
> -extern int iommu_unregister_device_fault_handler(struct device *dev);
>   
>   extern int iommu_report_device_fault(struct device *dev,
>   				     struct iommu_fault_event *evt);
> @@ -1137,19 +1127,6 @@ static inline void iommu_group_put(struct iommu_group *group)
>   {
>   }
>   
> -static inline
> -int iommu_register_device_fault_handler(struct device *dev,
> -					iommu_dev_fault_handler_t handler,
> -					void *data)
> -{
> -	return -ENODEV;
> -}
> -
> -static inline int iommu_unregister_device_fault_handler(struct device *dev)
> -{
> -	return 0;
> -}
> -
>   static inline
>   int iommu_report_device_fault(struct device *dev, struct iommu_fault_event *evt)
>   {
> diff --git a/drivers/iommu/iommu-sva.h b/drivers/iommu/iommu-sva.h
> index 54946b5a7caf..de7819c796ce 100644
> --- a/drivers/iommu/iommu-sva.h
> +++ b/drivers/iommu/iommu-sva.h
> @@ -13,7 +13,7 @@ struct iommu_fault;
>   struct iopf_queue;
>   
>   #ifdef CONFIG_IOMMU_SVA
> -int iommu_queue_iopf(struct iommu_fault *fault, void *cookie);
> +int iommu_queue_iopf(struct iommu_fault *fault, struct device *dev);
>   
>   int iopf_queue_add_device(struct iopf_queue *queue, struct device *dev);
>   int iopf_queue_remove_device(struct iopf_queue *queue,
> @@ -26,7 +26,7 @@ enum iommu_page_response_code
>   iommu_sva_handle_iopf(struct iommu_fault *fault, void *data);
>   
>   #else /* CONFIG_IOMMU_SVA */
> -static inline int iommu_queue_iopf(struct iommu_fault *fault, void *cookie)
> +static inline int iommu_queue_iopf(struct iommu_fault *fault, struct device *dev)
>   {
>   	return -ENODEV;
>   }
> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
> index 353248ab18e7..84c9554144cb 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
> @@ -480,7 +480,6 @@ bool arm_smmu_master_sva_enabled(struct arm_smmu_master *master)
>   
>   static int arm_smmu_master_sva_enable_iopf(struct arm_smmu_master *master)
>   {
> -	int ret;
>   	struct device *dev = master->dev;
>   
>   	/*
> @@ -493,16 +492,7 @@ static int arm_smmu_master_sva_enable_iopf(struct arm_smmu_master *master)
>   	if (!master->iopf_enabled)
>   		return -EINVAL;
>   
> -	ret = iopf_queue_add_device(master->smmu->evtq.iopf, dev);
> -	if (ret)
> -		return ret;
> -
> -	ret = iommu_register_device_fault_handler(dev, iommu_queue_iopf, dev);
> -	if (ret) {
> -		iopf_queue_remove_device(master->smmu->evtq.iopf, dev);
> -		return ret;
> -	}
> -	return 0;
> +	return iopf_queue_add_device(master->smmu->evtq.iopf, dev);
>   }
>   
>   static void arm_smmu_master_sva_disable_iopf(struct arm_smmu_master *master)
> @@ -512,7 +502,6 @@ static void arm_smmu_master_sva_disable_iopf(struct arm_smmu_master *master)
>   	if (!master->iopf_enabled)
>   		return;
>   
> -	iommu_unregister_device_fault_handler(dev);
>   	iopf_queue_remove_device(master->smmu->evtq.iopf, dev);
>   }
>   
> diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
> index 3531b956556c..cbe65827730d 100644
> --- a/drivers/iommu/intel/iommu.c
> +++ b/drivers/iommu/intel/iommu.c
> @@ -4616,23 +4616,15 @@ static int intel_iommu_enable_iopf(struct device *dev)
>   	if (ret)
>   		return ret;
>   
> -	ret = iommu_register_device_fault_handler(dev, iommu_queue_iopf, dev);
> -	if (ret)
> -		goto iopf_remove_device;
> -
>   	ret = pci_enable_pri(pdev, PRQ_DEPTH);
> -	if (ret)
> -		goto iopf_unregister_handler;
> +	if (ret) {
> +		iopf_queue_remove_device(iommu->iopf_queue, dev);
> +		return ret;
> +	}
> +
>   	info->pri_enabled = 1;
>   
>   	return 0;
> -
> -iopf_unregister_handler:
> -	iommu_unregister_device_fault_handler(dev);
> -iopf_remove_device:
> -	iopf_queue_remove_device(iommu->iopf_queue, dev);
> -
> -	return ret;
>   }
>   
>   static int intel_iommu_disable_iopf(struct device *dev)
> @@ -4655,11 +4647,9 @@ static int intel_iommu_disable_iopf(struct device *dev)
>   	info->pri_enabled = 0;
>   
>   	/*
> -	 * With PRI disabled and outstanding PRQs drained, unregistering
> -	 * fault handler and removing device from iopf queue should never
> -	 * fail.
> +	 * With PRI disabled and outstanding PRQs drained, removing device
> +	 * from iopf queue should never fail.
>   	 */
> -	WARN_ON(iommu_unregister_device_fault_handler(dev));
>   	WARN_ON(iopf_queue_remove_device(iommu->iopf_queue, dev));
>   
>   	return 0;
> diff --git a/drivers/iommu/io-pgfault.c b/drivers/iommu/io-pgfault.c
> index b1cf28055525..31832aeacdba 100644
> --- a/drivers/iommu/io-pgfault.c
> +++ b/drivers/iommu/io-pgfault.c
> @@ -87,7 +87,7 @@ static void iopf_handler(struct work_struct *work)
>   /**
>    * iommu_queue_iopf - IO Page Fault handler
>    * @fault: fault event
> - * @cookie: struct device, passed to iommu_register_device_fault_handler.
> + * @dev: struct device.
>    *
>    * Add a fault to the device workqueue, to be handled by mm.
>    *
> @@ -124,14 +124,12 @@ static void iopf_handler(struct work_struct *work)
>    *
>    * Return: 0 on success and <0 on error.
>    */
> -int iommu_queue_iopf(struct iommu_fault *fault, void *cookie)
> +int iommu_queue_iopf(struct iommu_fault *fault, struct device *dev)
>   {
>   	int ret;
>   	struct iopf_group *group;
>   	struct iopf_fault *iopf, *next;
>   	struct iommu_fault_param *iopf_param;
> -
> -	struct device *dev = cookie;
>   	struct dev_iommu *param = dev->iommu;
>   
>   	lockdep_assert_held(&param->lock);
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index 9c9eacfa6761..0c6700b6659a 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -1301,76 +1301,6 @@ void iommu_group_put(struct iommu_group *group)
>   }
>   EXPORT_SYMBOL_GPL(iommu_group_put);
>   
> -/**
> - * iommu_register_device_fault_handler() - Register a device fault handler
> - * @dev: the device
> - * @handler: the fault handler
> - * @data: private data passed as argument to the handler
> - *
> - * When an IOMMU fault event is received, this handler gets called with the
> - * fault event and data as argument. The handler should return 0 on success. If
> - * the fault is recoverable (IOMMU_FAULT_PAGE_REQ), the consumer should also
> - * complete the fault by calling iommu_page_response() with one of the following
> - * response code:
> - * - IOMMU_PAGE_RESP_SUCCESS: retry the translation
> - * - IOMMU_PAGE_RESP_INVALID: terminate the fault
> - * - IOMMU_PAGE_RESP_FAILURE: terminate the fault and stop reporting
> - *   page faults if possible.
> - *
> - * Return 0 if the fault handler was installed successfully, or an error.
> - */
> -int iommu_register_device_fault_handler(struct device *dev,
> -					iommu_dev_fault_handler_t handler,
> -					void *data)
> -{
> -	struct dev_iommu *param = dev->iommu;
> -	int ret = 0;
> -
> -	if (!param || !param->fault_param)
> -		return -EINVAL;
> -
> -	mutex_lock(&param->lock);
> -	/* Only allow one fault handler registered for each device */
> -	if (param->fault_param->handler) {
> -		ret = -EBUSY;
> -		goto done_unlock;
> -	}
> -
> -	param->fault_param->handler = handler;
> -	param->fault_param->data = data;
> -
> -done_unlock:
> -	mutex_unlock(&param->lock);
> -
> -	return ret;
> -}
> -EXPORT_SYMBOL_GPL(iommu_register_device_fault_handler);
> -
> -/**
> - * iommu_unregister_device_fault_handler() - Unregister the device fault handler
> - * @dev: the device
> - *
> - * Remove the device fault handler installed with
> - * iommu_register_device_fault_handler().
> - *
> - * Return 0 on success, or an error.
> - */
> -int iommu_unregister_device_fault_handler(struct device *dev)
> -{
> -	struct dev_iommu *param = dev->iommu;
> -
> -	if (!param || !param->fault_param)
> -		return -EINVAL;
> -
> -	mutex_lock(&param->lock);
> -	param->fault_param->handler = NULL;
> -	param->fault_param->data = NULL;
> -	mutex_unlock(&param->lock);
> -
> -	return 0;
> -}
> -EXPORT_SYMBOL_GPL(iommu_unregister_device_fault_handler);
> -
>   /**
>    * iommu_report_device_fault() - Report fault event to device driver
>    * @dev: the device
> @@ -1395,10 +1325,6 @@ int iommu_report_device_fault(struct device *dev, struct iommu_fault_event *evt)
>   	/* we only report device fault if there is a handler registered */
>   	mutex_lock(&param->lock);
>   	fparam = param->fault_param;
> -	if (!fparam || !fparam->handler) {

Should it still check fparam?

> -		ret = -EINVAL;
> -		goto done_unlock;
> -	}
>   
>   	if (evt->fault.type == IOMMU_FAULT_PAGE_REQ &&
>   	    (evt->fault.prm.flags & IOMMU_FAULT_PAGE_REQUEST_LAST_PAGE)) {
> @@ -1413,7 +1339,7 @@ int iommu_report_device_fault(struct device *dev, struct iommu_fault_event *evt)
>   		mutex_unlock(&fparam->lock);
>   	}
>   
> -	ret = fparam->handler(&evt->fault, fparam->data);
> +	ret = iommu_queue_iopf(&evt->fault, dev);
>   	if (ret && evt_pending) {
>   		mutex_lock(&fparam->lock);
>   		list_del(&evt_pending->list);

-- 
Regards,
Yi Liu

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v7 07/12] iommu: Merge iommu_fault_event and iopf_fault
  2023-11-15  3:02 ` [PATCH v7 07/12] iommu: Merge iommu_fault_event and iopf_fault Lu Baolu
  2023-12-01 19:09   ` Jason Gunthorpe
@ 2023-12-04 12:40   ` Yi Liu
  1 sibling, 0 replies; 49+ messages in thread
From: Yi Liu @ 2023-12-04 12:40 UTC (permalink / raw)
  To: Lu Baolu, Joerg Roedel, Will Deacon, Robin Murphy,
	Jason Gunthorpe, Kevin Tian, Jean-Philippe Brucker, Nicolin Chen
  Cc: Jacob Pan, Yan Zhao, iommu, kvm, linux-kernel

On 2023/11/15 11:02, Lu Baolu wrote:
> The iommu_fault_event and iopf_fault data structures store the same
> information about an iopf fault. They are also used in the same way.
> Merge these two data structures into a single one to make the code
> more concise and easier to maintain.
> 
> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
> Reviewed-by: Kevin Tian <kevin.tian@intel.com>
> Tested-by: Yan Zhao <yan.y.zhao@intel.com>
> ---
>   include/linux/iommu.h                       | 27 ++++++---------------
>   drivers/iommu/intel/iommu.h                 |  2 +-
>   drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c |  4 +--
>   drivers/iommu/intel/svm.c                   |  5 ++--
>   drivers/iommu/io-pgfault.c                  |  5 ----
>   drivers/iommu/iommu.c                       |  8 +++---
>   6 files changed, 17 insertions(+), 34 deletions(-)

Reviewed-by: Yi Liu <yi.l.liu@intel.com>

> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index a45d92cc31ec..42b62bc8737a 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -40,7 +40,6 @@ struct iommu_domain_ops;
>   struct iommu_dirty_ops;
>   struct notifier_block;
>   struct iommu_sva;
> -struct iommu_fault_event;
>   struct iommu_dma_cookie;
>   struct iopf_queue;
>   
> @@ -121,6 +120,11 @@ struct iommu_page_response {
>   	u32	code;
>   };
>   
> +struct iopf_fault {
> +	struct iommu_fault fault;
> +	/* node for pending lists */
> +	struct list_head list;
> +};
>   
>   /* iommu fault flags */
>   #define IOMMU_FAULT_READ	0x0
> @@ -480,7 +484,7 @@ struct iommu_ops {
>   	int (*dev_disable_feat)(struct device *dev, enum iommu_dev_features f);
>   
>   	int (*page_response)(struct device *dev,
> -			     struct iommu_fault_event *evt,
> +			     struct iopf_fault *evt,
>   			     struct iommu_page_response *msg);
>   
>   	int (*def_domain_type)(struct device *dev);
> @@ -572,20 +576,6 @@ struct iommu_device {
>   	u32 max_pasids;
>   };
>   
> -/**
> - * struct iommu_fault_event - Generic fault event
> - *
> - * Can represent recoverable faults such as a page requests or
> - * unrecoverable faults such as DMA or IRQ remapping faults.
> - *
> - * @fault: fault descriptor
> - * @list: pending fault event list, used for tracking responses
> - */
> -struct iommu_fault_event {
> -	struct iommu_fault fault;
> -	struct list_head list;
> -};
> -
>   /**
>    * struct iommu_fault_param - per-device IOMMU fault data
>    * @lock: protect pending faults list
> @@ -720,8 +710,7 @@ extern struct iommu_group *iommu_group_get(struct device *dev);
>   extern struct iommu_group *iommu_group_ref_get(struct iommu_group *group);
>   extern void iommu_group_put(struct iommu_group *group);
>   
> -extern int iommu_report_device_fault(struct device *dev,
> -				     struct iommu_fault_event *evt);
> +extern int iommu_report_device_fault(struct device *dev, struct iopf_fault *evt);
>   extern int iommu_page_response(struct device *dev,
>   			       struct iommu_page_response *msg);
>   
> @@ -1128,7 +1117,7 @@ static inline void iommu_group_put(struct iommu_group *group)
>   }
>   
>   static inline
> -int iommu_report_device_fault(struct device *dev, struct iommu_fault_event *evt)
> +int iommu_report_device_fault(struct device *dev, struct iopf_fault *evt)
>   {
>   	return -ENODEV;
>   }
> diff --git a/drivers/iommu/intel/iommu.h b/drivers/iommu/intel/iommu.h
> index 65d37a138c75..a1ddd5132aae 100644
> --- a/drivers/iommu/intel/iommu.h
> +++ b/drivers/iommu/intel/iommu.h
> @@ -905,7 +905,7 @@ struct iommu_domain *intel_nested_domain_alloc(struct iommu_domain *parent,
>   void intel_svm_check(struct intel_iommu *iommu);
>   int intel_svm_enable_prq(struct intel_iommu *iommu);
>   int intel_svm_finish_prq(struct intel_iommu *iommu);
> -int intel_svm_page_response(struct device *dev, struct iommu_fault_event *evt,
> +int intel_svm_page_response(struct device *dev, struct iopf_fault *evt,
>   			    struct iommu_page_response *msg);
>   struct iommu_domain *intel_svm_domain_alloc(void);
>   void intel_svm_remove_dev_pasid(struct device *dev, ioasid_t pasid);
> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> index 505400538a2e..46780793b743 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> @@ -922,7 +922,7 @@ static int arm_smmu_cmdq_batch_submit(struct arm_smmu_device *smmu,
>   }
>   
>   static int arm_smmu_page_response(struct device *dev,
> -				  struct iommu_fault_event *unused,
> +				  struct iopf_fault *unused,
>   				  struct iommu_page_response *resp)
>   {
>   	struct arm_smmu_cmdq_ent cmd = {0};
> @@ -1467,7 +1467,7 @@ static int arm_smmu_handle_evt(struct arm_smmu_device *smmu, u64 *evt)
>   	struct arm_smmu_master *master;
>   	bool ssid_valid = evt[0] & EVTQ_0_SSV;
>   	u32 sid = FIELD_GET(EVTQ_0_SID, evt[0]);
> -	struct iommu_fault_event fault_evt = { };
> +	struct iopf_fault fault_evt = { };
>   	struct iommu_fault *flt = &fault_evt.fault;
>   
>   	switch (FIELD_GET(EVTQ_0_ID, evt[0])) {
> diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c
> index 50a481c895b8..9de349ea215c 100644
> --- a/drivers/iommu/intel/svm.c
> +++ b/drivers/iommu/intel/svm.c
> @@ -543,13 +543,12 @@ static int prq_to_iommu_prot(struct page_req_dsc *req)
>   static int intel_svm_prq_report(struct intel_iommu *iommu, struct device *dev,
>   				struct page_req_dsc *desc)
>   {
> -	struct iommu_fault_event event;
> +	struct iopf_fault event = { };
>   
>   	if (!dev || !dev_is_pci(dev))
>   		return -ENODEV;
>   
>   	/* Fill in event data for device specific processing */
> -	memset(&event, 0, sizeof(struct iommu_fault_event));
>   	event.fault.type = IOMMU_FAULT_PAGE_REQ;
>   	event.fault.prm.addr = (u64)desc->addr << VTD_PAGE_SHIFT;
>   	event.fault.prm.pasid = desc->pasid;
> @@ -721,7 +720,7 @@ static irqreturn_t prq_event_thread(int irq, void *d)
>   }
>   
>   int intel_svm_page_response(struct device *dev,
> -			    struct iommu_fault_event *evt,
> +			    struct iopf_fault *evt,
>   			    struct iommu_page_response *msg)
>   {
>   	struct iommu_fault_page_request *prm;
> diff --git a/drivers/iommu/io-pgfault.c b/drivers/iommu/io-pgfault.c
> index 31832aeacdba..c45977bb7da3 100644
> --- a/drivers/iommu/io-pgfault.c
> +++ b/drivers/iommu/io-pgfault.c
> @@ -25,11 +25,6 @@ struct iopf_queue {
>   	struct mutex			lock;
>   };
>   
> -struct iopf_fault {
> -	struct iommu_fault		fault;
> -	struct list_head		list;
> -};
> -
>   struct iopf_group {
>   	struct iopf_fault		last_fault;
>   	struct list_head		faults;
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index 0c6700b6659a..36b597bb8a09 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -1312,10 +1312,10 @@ EXPORT_SYMBOL_GPL(iommu_group_put);
>    *
>    * Return 0 on success, or an error.
>    */
> -int iommu_report_device_fault(struct device *dev, struct iommu_fault_event *evt)
> +int iommu_report_device_fault(struct device *dev, struct iopf_fault *evt)
>   {
>   	struct dev_iommu *param = dev->iommu;
> -	struct iommu_fault_event *evt_pending = NULL;
> +	struct iopf_fault *evt_pending = NULL;
>   	struct iommu_fault_param *fparam;
>   	int ret = 0;
>   
> @@ -1328,7 +1328,7 @@ int iommu_report_device_fault(struct device *dev, struct iommu_fault_event *evt)
>   
>   	if (evt->fault.type == IOMMU_FAULT_PAGE_REQ &&
>   	    (evt->fault.prm.flags & IOMMU_FAULT_PAGE_REQUEST_LAST_PAGE)) {
> -		evt_pending = kmemdup(evt, sizeof(struct iommu_fault_event),
> +		evt_pending = kmemdup(evt, sizeof(struct iopf_fault),
>   				      GFP_KERNEL);
>   		if (!evt_pending) {
>   			ret = -ENOMEM;
> @@ -1357,7 +1357,7 @@ int iommu_page_response(struct device *dev,
>   {
>   	bool needs_pasid;
>   	int ret = -EINVAL;
> -	struct iommu_fault_event *evt;
> +	struct iopf_fault *evt;
>   	struct iommu_fault_page_request *prm;
>   	struct dev_iommu *param = dev->iommu;
>   	const struct iommu_ops *ops = dev_iommu_ops(dev);

-- 
Regards,
Yi Liu

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v7 12/12] iommu: Improve iopf_queue_flush_dev()
  2023-12-04  1:32         ` Baolu Lu
  2023-12-04  5:37           ` Tian, Kevin
@ 2023-12-04 13:12           ` Jason Gunthorpe
  1 sibling, 0 replies; 49+ messages in thread
From: Jason Gunthorpe @ 2023-12-04 13:12 UTC (permalink / raw)
  To: Baolu Lu
  Cc: Joerg Roedel, Will Deacon, Robin Murphy, Kevin Tian,
	Jean-Philippe Brucker, Nicolin Chen, Yi Liu, Jacob Pan, Yan Zhao,
	iommu, kvm, linux-kernel

On Mon, Dec 04, 2023 at 09:32:37AM +0800, Baolu Lu wrote:

> > 
> > I know that is why it does, but it doesn't explain at all why.
> > 
> > > 1. Clears the pasid translation entry. Thus, all subsequent DMA
> > >     transactions (translation requests, translated requests or page
> > >     requests) targeting the iommu domain will be blocked.
> > > 
> > > 2. Waits until all pending page requests for the device's PASID have
> > >     been reported to upper layers via iommu_report_device_fault().
> > >     However, this does not guarantee that all page requests have been
> > >     responded to.
> > > 
> > > 3. Free all partial page requests for this pasid since the page request
> > >     response is only needed for a complete request group. There's no
> > >     action required for the page requests which are not the last of a
> > >     request group.
> > 
> > But we expect the last to come eventually since everything should be
> > grouped properly, so why bother doing this?
> > 
> > Indeed if 2 worked, how is this even possible to have partials?
> 
> Step 1 clears the pasid table entry, hence all subsequent page requests
> are blocked (the hardware auto-responds to the request but does not put
> it in the queue).

OK, that part makes sense, but it should be clearly documented that this
is why this stuff is going on with the partial list.

"We have to clear the parial list as the new domain may not generate a
SW visible LAST. If it does generate a SW visible last then we simply
incompletely fault it and restart the device which will fix things on
retry"

> > Requests simply have to continue to be acked, it doesn't matter if
> > they are acked against the wrong domain because the device will simply
> > re-issue them..
> 
> Ah! I start to get your point now.
> 
> Even if a page fault response is postponed to a new address space, which
> could be another address space or a hardware blocking state, the
> hardware just retries.
> 
> As long as we flush all caches (IOTLB and device TLB) during switching,
> the mappings of the old domain won't leak. So it's safe to keep page
> requests there.
> 
> Do I get you correctly?

Yes

It seems much simpler to me than trying to make this synchronous and
it is compatible with hitless replace of a PASID.

The lifetime and locking rules are also far more understandable

Jason

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v7 12/12] iommu: Improve iopf_queue_flush_dev()
  2023-12-04  5:37           ` Tian, Kevin
@ 2023-12-04 13:25             ` Jason Gunthorpe
  2023-12-05  1:32               ` Tian, Kevin
  0 siblings, 1 reply; 49+ messages in thread
From: Jason Gunthorpe @ 2023-12-04 13:25 UTC (permalink / raw)
  To: Tian, Kevin
  Cc: Baolu Lu, Joerg Roedel, Will Deacon, Robin Murphy,
	Jean-Philippe Brucker, Nicolin Chen, Liu, Yi L, Jacob Pan, Zhao,
	Yan Y, iommu, kvm, linux-kernel

On Mon, Dec 04, 2023 at 05:37:13AM +0000, Tian, Kevin wrote:
> > From: Baolu Lu <baolu.lu@linux.intel.com>
> > Sent: Monday, December 4, 2023 9:33 AM
> > 
> > On 12/3/23 10:14 PM, Jason Gunthorpe wrote:
> > > On Sun, Dec 03, 2023 at 04:53:08PM +0800, Baolu Lu wrote:
> > >> Even if atomic replacement were to be implemented,
> > >> it would be necessary to ensure that all translation requests,
> > >> translated requests, page requests and responses for the old domain are
> > >> drained before switching to the new domain.
> > >
> > > Again, no it isn't required.
> > >
> > > Requests simply have to continue to be acked, it doesn't matter if
> > > they are acked against the wrong domain because the device will simply
> > > re-issue them..
> > 
> > Ah! I start to get your point now.
> > 
> > Even if a page fault response is postponed to a new address space, which
> > could be another address space or a hardware blocking state, the
> > hardware just retries.
> 
> if blocking then the device shouldn't retry.

It does retry.

The device is waiting on a PRI, it gets back a completion. It issues
a new ATS (this is the retry) and the new domain responds back with a
failure indication.

If the new domain had a present page it would respond with a
translation

If the new domain has a non-present page then we get a new PRI.

The point is from a device perspective it is always doing something
correct.
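
The loop from the device's point of view is roughly this (pure
pseudo-code, none of these helper names are real):

	for (;;) {
		rsp = ats_translation_request(pasid, addr);
		if (rsp.present)
			break;			/* use the translation, do the DMA */
		if (rsp.error)
			goto abort_dma;		/* ATS failure: give up */

		/* non-present: fault the page, wait for the PRG response */
		if (pri_page_request(pasid, addr) != PRI_RESP_SUCCESS)
			goto abort_dma;

		/* PRG success: re-issue the ATS against whatever domain
		 * is current now */
	}
	/* ... issue the DMA with the translation ... */
	return;

abort_dma:
	/* device-specific: cancel the transaction, maybe raise an error */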

> btw if a stale request targets a virtual address which is outside of the
> valid VMAs of the new address space then a visible side-effect will
> be incurred in handle_mm_fault() on the new space. Is that desired?

The whole thing is racy, if someone is radically changing the
underlying mappings while DMA is ongoing then there is no way to
synchronize 'before' and 'after' against a concurrent external device.

So who cares?

What we care about is that the ATC is coherent and never has stale
data. The invalidation after changing the translation ensures this
regardless of any outstanding un-acked PRI.

> Or if a pending response carrying an error code (Invalid Request) from
> the old address space is received by the device when the new address
> space is already activated, the hardware will report an error even
> though there might be a valid mapping in the new space.

Again, all racy. If a DMA is ongoing at the same instant things are
changed there is no definitive way to say if it resolved before or
after.

The only thing we care about is that dmas that are completed before
see the before translation and dmas that are started after see the
after translation.

DMAs that cross choose one at random.

> I don't think atomic replace is the main usage for this draining 
> requirement. Instead I'm more interested in the basic popular usage: 
> attach-detach-attach and not convinced that no draining is required
> between iommu/device to avoid interference between activities
> from old/new address space.

Something like IDXD needs to halt DMAs on the PASID and flush all
outstanding DMA to get to a state where the PASID is quiet from the
device perspective. This is the only way to stop interference.

If the device is still issuing DMA after the domain changes then it is
never going to work right.

If *IDXD* needs some help to flush PRIs after it halts DMAs (because
it can't do it on its own for some reason) then IDXD should have an
explicit call to do that, after suspending new DMA.

We don't know what things devices will need to do here, devices that
are able to wait for PRIs to complete may want a cancelling flush to
speed that up, and that shouldn't be part of the translation change.

IOW the act of halting DMA and the act of changing the translation
really should be different things. Then we get into interesting
questions like what sequence is required for a successful FLR. :\
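
To make the ordering concrete, I'd expect a driver to do something like
the below; iopf_queue_drain_dev() is a hypothetical API that does not
exist in this series, and driver_halt_dma() stands in for whatever
device-specific quiesce is needed:

	ret = driver_halt_dma(pdev);	/* stop new DMA/PRI at the device */
	if (ret)
		return ret;

	ret = pci_reset_function(pdev);	/* e.g. FLR; translation untouched */
	if (ret)
		return ret;

	/* only now flush/cancel the PRIs left over from before the reset */
	return iopf_queue_drain_dev(&pdev->dev);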

Jason


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v7 12/12] iommu: Improve iopf_queue_flush_dev()
  2023-12-04  3:46     ` Baolu Lu
@ 2023-12-04 13:27       ` Jason Gunthorpe
  2023-12-05  1:13         ` Baolu Lu
  0 siblings, 1 reply; 49+ messages in thread
From: Jason Gunthorpe @ 2023-12-04 13:27 UTC (permalink / raw)
  To: Baolu Lu
  Cc: Joerg Roedel, Will Deacon, Robin Murphy, Kevin Tian,
	Jean-Philippe Brucker, Nicolin Chen, Yi Liu, Jacob Pan, Yan Zhao,
	iommu, kvm, linux-kernel

On Mon, Dec 04, 2023 at 11:46:30AM +0800, Baolu Lu wrote:
> On 12/2/23 4:35 AM, Jason Gunthorpe wrote:

> I am wondering whether we can take patch 1/12 ~ 10/12 of this series as
> a first step, a refactoring effort to support delivering iopf to
> userspace? I will follow up with one or multiple series to add the
> optimizations.

I think that is reasonable, though I would change the earlier patch to
use RCU to obtain the fault data.

Jason

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v7 12/12] iommu: Improve iopf_queue_flush_dev()
  2023-12-04 13:27       ` Jason Gunthorpe
@ 2023-12-05  1:13         ` Baolu Lu
  0 siblings, 0 replies; 49+ messages in thread
From: Baolu Lu @ 2023-12-05  1:13 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: baolu.lu, Joerg Roedel, Will Deacon, Robin Murphy, Kevin Tian,
	Jean-Philippe Brucker, Nicolin Chen, Yi Liu, Jacob Pan, Yan Zhao,
	iommu, kvm, linux-kernel

On 12/4/23 9:27 PM, Jason Gunthorpe wrote:
> On Mon, Dec 04, 2023 at 11:46:30AM +0800, Baolu Lu wrote:
>> On 12/2/23 4:35 AM, Jason Gunthorpe wrote:
>> I am wondering whether we can take patch 1/12 ~ 10/12 of this series as
>> a first step, a refactoring effort to support delivering iopf to
>> userspace? I will follow up with one or multiple series to add the
>> optimizations.
> I think that is reasonable, though I would change the earlier patch to
> use RCU to obtain the fault data.

All right! I will do this in the updated version.

Best regards,
baolu

^ permalink raw reply	[flat|nested] 49+ messages in thread

* RE: [PATCH v7 12/12] iommu: Improve iopf_queue_flush_dev()
  2023-12-04 13:25             ` Jason Gunthorpe
@ 2023-12-05  1:32               ` Tian, Kevin
  2023-12-05  1:53                 ` Jason Gunthorpe
  0 siblings, 1 reply; 49+ messages in thread
From: Tian, Kevin @ 2023-12-05  1:32 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Baolu Lu, Joerg Roedel, Will Deacon, Robin Murphy,
	Jean-Philippe Brucker, Nicolin Chen, Liu, Yi L, Jacob Pan, Zhao,
	Yan Y, iommu, kvm, linux-kernel

> From: Jason Gunthorpe <jgg@ziepe.ca>
> Sent: Monday, December 4, 2023 9:25 PM
> 
> On Mon, Dec 04, 2023 at 05:37:13AM +0000, Tian, Kevin wrote:
> > > From: Baolu Lu <baolu.lu@linux.intel.com>
> > > Sent: Monday, December 4, 2023 9:33 AM
> > >
> > > On 12/3/23 10:14 PM, Jason Gunthorpe wrote:
> > > > On Sun, Dec 03, 2023 at 04:53:08PM +0800, Baolu Lu wrote:
> > > >> Even if atomic replacement were to be implemented,
> > > >> it would be necessary to ensure that all translation requests,
> > > >> translated requests, page requests and responses for the old domain
> > > >> are drained before switching to the new domain.
> > > >
> > > > Again, no it isn't required.
> > > >
> > > > Requests simply have to continue to be acked, it doesn't matter if
> > > > they are acked against the wrong domain because the device will simply
> > > > re-issue them..
> > >
> > > Ah! I start to get your point now.
> > >
> > > Even if a page fault response is postponed to a new address space, which
> > > could be another address space or a hardware blocking state, the
> > > hardware just retries.
> >
> > if blocking then the device shouldn't retry.
> 
> It does retry.
> 
> The device is waiting on a PRI, it gets back a completion. It issues
> a new ATS (this is the retry) and the new domain responds back with a
> failure indication.

I'm not sure that is the standard behavior defined by PCIe spec.

According to "10.4.2 Page Request Group Response Message", function's
response to Page Request failure is implementation specific.

so a new ATS is optional and likely the device will instead abort the DMA
if PRI response already indicates a failure.

> 
> If the new domain had a present page it would respond with a
> translation
> 
> If the new domain has a non-present page then we get a new PRI.
> 
> The point is from a device perspective it is always doing something
> correct.
> 
> > btw if a stale request targets a virtual address which is outside of the
> > valid VMAs of the new address space then a visible side-effect will
> > be incurred in handle_mm_fault() on the new space. Is that desired?
> 
> The whole thing is racy, if someone is radically changing the
> underlying mappings while DMA is ongoing then there is no way to
> synchronize 'before' and 'after' against a concurrent external device.
> 
> So who cares?
> 
> What we care about is that the ATC is coherent and never has stale
> data. The invalidation after changing the translation ensures this
> regardless of any outstanding un-acked PRI.
> 
> > Or if a pending response carrying an error code (Invalid Request) from
> > the old address space is received by the device when the new address
> > space is already activated, the hardware will report an error even
> > though there might be a valid mapping in the new space.
> 
> Again, all racy. If a DMA is ongoing at the same instant things are
> changed there is no definitive way to say if it resolved before or
> after.
> 
> The only thing we care about is that dmas that are completed before
> see the before translation and dmas that are started after see the
> after translation.
> 
> DMAs that cross choose one at random.

Yes that makes sense for replacement.

But here we are talking about a draining requirement when disabling
a pasid entry, which is certainly not involved in replacement.

> 
> > I don't think atomic replace is the main usage for this draining
> > requirement. Instead I'm more interested in the basic popular usage:
> > attach-detach-attach and not convinced that no draining is required
> > between iommu/device to avoid interference between activities
> > from old/new address space.
> 
> Something like IDXD needs to halt DMAs on the PASID and flush all
> outstanding DMA to get to a state where the PASID is quiet from the
> device perspective. This is the only way to stop interference.

why is it IDXD specific behavior? I suppose all devices need to quiesce
the outstanding DMAs when tearing down the binding between the
PASID and previous address space.

and here what you described is the normal behavior. In this case
I agree that no draining is required on the iommu side, given the device
should have quiesced all outstanding DMAs including page requests.

but there are also terminal conditions, e.g. when a workqueue is
reset after a hang, hence additional draining is required from the
iommu side to ensure all the outstanding page requests/responses
are properly handled.

vt-d spec defines a draining process to cope with those terminal
conditions (see 7.9 Pending Page Request Handling on Terminal
Conditions). intel-iommu driver just implements it by default for
simplicity (one may consider providing explicit API for drivers to
call but not sure of the necessity if such terminal conditions
apply to most devices). anyway this is not a fast path.

another example might be stop marker. A device using stop marker
doesn't need to wait for outstanding page requests. According to the
PCIe spec (10.4.1.2 Managing PASID Usage on PRG Requests) the device
simply marks outstanding page requests as stale and sends a stop
marker message to the IOMMU. Page responses for those stale
requests are ignored. But presumably the iommu driver still needs
to drain those requests until the stop marker message in unbind
to avoid them being incorrectly routed to a new address space in
case the PASID is rebound to another process immediately.

> 
> If the device is still issuing DMA after the domain changes then it is
> never going to work right.
> 
> If *IDXD* needs some help to flush PRIs after it halts DMAs (because
> it can't do it on its own for some reason) then IDXD should have an
> explicit call to do that, after suspending new DMA.

as above I don't think IDXD itself has any special requirements. We
are discussing general device terminal conditions which are considered
by the iommu spec.

> 
> We don't know what things devices will need to do here, devices that
> are able to wait for PRIs to complete may want a cancelling flush to
> speed that up, and that shouldn't be part of the translation change.
> 
> IOW the act of halting DMA and the act of changing the translation
> really should be different things. Then we get into interesting
> questions like what sequence is required for a successful FLR. :\
> 
> Jason


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v7 12/12] iommu: Improve iopf_queue_flush_dev()
  2023-12-05  1:32               ` Tian, Kevin
@ 2023-12-05  1:53                 ` Jason Gunthorpe
  2023-12-05  3:23                   ` Tian, Kevin
  0 siblings, 1 reply; 49+ messages in thread
From: Jason Gunthorpe @ 2023-12-05  1:53 UTC (permalink / raw)
  To: Tian, Kevin
  Cc: Baolu Lu, Joerg Roedel, Will Deacon, Robin Murphy,
	Jean-Philippe Brucker, Nicolin Chen, Liu, Yi L, Jacob Pan, Zhao,
	Yan Y, iommu, kvm, linux-kernel

On Tue, Dec 05, 2023 at 01:32:26AM +0000, Tian, Kevin wrote:
> > From: Jason Gunthorpe <jgg@ziepe.ca>
> > Sent: Monday, December 4, 2023 9:25 PM
> > 
> > On Mon, Dec 04, 2023 at 05:37:13AM +0000, Tian, Kevin wrote:
> > > > From: Baolu Lu <baolu.lu@linux.intel.com>
> > > > Sent: Monday, December 4, 2023 9:33 AM
> > > >
> > > > On 12/3/23 10:14 PM, Jason Gunthorpe wrote:
> > > > > On Sun, Dec 03, 2023 at 04:53:08PM +0800, Baolu Lu wrote:
> > > > >> Even if atomic replacement were to be implemented,
> > > > >> it would be necessary to ensure that all translation requests,
> > > > >> translated requests, page requests and responses for the old domain
> > > > >> are drained before switching to the new domain.
> > > > >
> > > > > Again, no it isn't required.
> > > > >
> > > > > Requests simply have to continue to be acked, it doesn't matter if
> > > > > they are acked against the wrong domain because the device will simply
> > > > > re-issue them..
> > > >
> > > > Ah! I start to get your point now.
> > > >
> > > > Even if a page fault response is postponed to a new address space, which
> > > > could be another address space or a hardware blocking state, the
> > > > hardware just retries.
> > >
> > > if blocking then the device shouldn't retry.
> > 
> > It does retry.
> > 
> > The device is waiting on a PRI, it gets back an completion. It issues
> > a new ATS (this is the rety) and the new-domain responds back with a
> > failure indication.
> 
> I'm not sure that is the standard behavior defined by PCIe spec.

> According to "10.4.2 Page Request Group Response Message", function's
> response to Page Request failure is implementation specific.
> 
> so a new ATS is optional and likely the device will instead abort the DMA
> if PRI response already indicates a failure.

I didn't say the PRI would fail, I said the ATS would fail with a
non-present.

It has to work this way or it is completely broken with respect to
existing races on the mm side. Agents must retry non-present ATS
answers until they get a present translation or an ATS failure.

> > Again, all racy. If a DMA is ongoing at the same instant things are
> > changed there is no definitive way to say if it resolved before or
> > after.
> > 
> > The only thing we care about is that dmas that are completed before
> > see the before translation and dmas that are started after see the
> > after translation.
> > 
> > DMAs that cross choose one at random.
> 
> Yes that makes sense for replacement.
> 
> But here we are talking about a draining requirement when disabling
> a pasid entry, which is certainly not involved in replacement.

It is the same argument, you are replacing a PTE that was non-present
with one that is failing/blocking - the result of a DMA that crosses
this event can be either.

> > > I don't think atomic replace is the main usage for this draining
> > > requirement. Instead I'm more interested in the basic popular usage:
> > > attach-detach-attach and not convinced that no draining is required
> > > between iommu/device to avoid interference between activities
> > > from old/new address space.
> > 
> > Something like IDXD needs to halt DMAs on the PASID and flush all
> > outstanding DMA to get to a state where the PASID is quiet from the
> > device perspective. This is the only way to stop interference.
> 
> why is it IDXD specific behavior? I suppose all devices need to quiesce
> the outstanding DMAs when tearing down the binding between the
> PASID and previous address space.

Because it is such simple HW, I assume this is why this code is being
pushed here :)

> > but there are also terminal conditions, e.g. when a workqueue is
> > reset after a hang, hence additional draining is required from the
> > iommu side to ensure all the outstanding page requests/responses
> > are properly handled.

Then it should be coded as an explicit drain request from device when
and where they need it.

It should not be integrated into the iommu side because it is
nonsensical. Devices expecting consistent behavior must stop DMA
before changing translation, and if they need help to do it they must
call APIs. Changing translation is not required after a so called
"terminal event".

> vt-d spec defines a draining process to cope with those terminal
> conditions (see 7.9 Pending Page Request Handling on Terminal
> Conditions). intel-iommu driver just implements it by default for
> simplicity (one may consider providing explicit API for drivers to
> call but not sure of the necessity if such terminal conditions
> apply to most devices). anyway this is not a fast path.

It is not "by default" it is in the wrong place. These terminal
conditions are things like FLR. FLR has nothing to do with changing
the translation. I can trigger FLR and keep the current translation
and still would want to flush out all the PRIs before starting DMA
again to avoid protocol confusion.

An API is absolutely necessary. Confusing the cases that need draining
with translation change is just not logically right.

eg we do need to modify VFIO to do the drain on FLR like the spec
explains!

Draining has to be ordered correctly with whatever the device is
doing. Drain needs to come after FLR, for instance. It needs to come
after a work queue reset, because drain doesn't make any sense unless
it is coupled with a DMA stop at the device.

Hacking a DMA stop by forcing a blocking translation is not logically
correct, with wrong ordering the device may see unexpected translation
failures which may trigger AERs or bad things..

> > another example might be stop marker. A device using stop marker
> > doesn't need to wait for outstanding page requests. According to the
> > PCIe spec (10.4.1.2 Managing PASID Usage on PRG Requests) the device
> > simply marks outstanding page requests as stale and sends a stop
> > marker message to the IOMMU. Page responses for those stale
> > requests are ignored. But presumably the iommu driver still needs
> > to drain those requests until the stop marker message in unbind
> > to avoid them being incorrectly routed to a new address space in
> > case the PASID is rebound to another process immediately.

Stop marker doesn't change anything, in all processing it just removes
requests that have yet to complete. If a device is using stop then
most likely the whole thing is racy and the OS simply has to be ready
to handle stop at any time.

Jason

^ permalink raw reply	[flat|nested] 49+ messages in thread

* RE: [PATCH v7 12/12] iommu: Improve iopf_queue_flush_dev()
  2023-12-05  1:53                 ` Jason Gunthorpe
@ 2023-12-05  3:23                   ` Tian, Kevin
  2023-12-05 15:52                     ` Jason Gunthorpe
  0 siblings, 1 reply; 49+ messages in thread
From: Tian, Kevin @ 2023-12-05  3:23 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Baolu Lu, Joerg Roedel, Will Deacon, Robin Murphy,
	Jean-Philippe Brucker, Nicolin Chen, Liu, Yi L, Jacob Pan, Zhao,
	Yan Y, iommu, kvm, linux-kernel

> From: Jason Gunthorpe <jgg@ziepe.ca>
> Sent: Tuesday, December 5, 2023 9:53 AM
> 
> On Tue, Dec 05, 2023 at 01:32:26AM +0000, Tian, Kevin wrote:
> > > From: Jason Gunthorpe <jgg@ziepe.ca>
> > > Sent: Monday, December 4, 2023 9:25 PM
> > >
> > > On Mon, Dec 04, 2023 at 05:37:13AM +0000, Tian, Kevin wrote:
> > > > > From: Baolu Lu <baolu.lu@linux.intel.com>
> > > > > Sent: Monday, December 4, 2023 9:33 AM
> > > > >
> > > > > On 12/3/23 10:14 PM, Jason Gunthorpe wrote:
> > > > > > On Sun, Dec 03, 2023 at 04:53:08PM +0800, Baolu Lu wrote:
> > > > > >> Even if atomic replacement were to be implemented,
> > > > > >> it would be necessary to ensure that all translation requests,
> > > > > >> translated requests, page requests and responses for the old
> > > > > >> domain are drained before switching to the new domain.
> > > > > >
> > > > > > Again, no it isn't required.
> > > > > >
> > > > > > Requests simply have to continue to be acked, it doesn't matter if
> > > > > > they are acked against the wrong domain because the device will
> > > > > > simply re-issue them..
> > > > >
> > > > > Ah! I start to get your point now.
> > > > >
> > > > > Even if a page fault response is postponed to a new address space,
> > > > > which could be another address space or a hardware blocking state,
> > > > > the hardware just retries.
> > > >
> > > > if blocking then the device shouldn't retry.
> > >
> > > It does retry.
> > >
> > > The device is waiting on a PRI, it gets back a completion. It issues
> > > a new ATS (this is the retry) and the new domain responds back with a
> > > failure indication.
> >
> > I'm not sure that is the standard behavior defined by PCIe spec.
> 
> > According to "10.4.2 Page Request Group Response Message", function's
> > response to Page Request failure is implementation specific.
> >
> > so a new ATS is optional and likely the device will instead abort the DMA
> > if PRI response already indicates a failure.
> 
> I didn't say the PRI would fail, I said the ATS would fail with a
> non-present.
> 
> It has to work this way or it is completely broken with respect to
> existing races on the mm side. Agents must retry non-present ATS
> answers until they get a present translation or an ATS failure.

My understanding of the sequence is like below:

<'D' for device, 'I' for IOMMU>

  (D) send a ATS translation request
  (I) respond translation result
  (D) If success then sends DMA to the target page
      otherwise send a PRI request
        (I) raise an IOMMU interrupt allowing sw to fix the translation
        (I) generate a PRI response to device
        (D) if success then jump to the first step to retry
            otherwise abort the current request

If mm changes the mapping after a successful PRI response, the mmu
notifier callback in the iommu driver needs to wait for device TLB
invalidation completion, which the device will order properly with
outstanding DMA requests using the old translation.
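
i.e. the notifier side looks roughly like below (names are illustrative,
not the exact intel-iommu code):

	static void sva_mm_invalidate(struct mmu_notifier *mn,
				      struct mm_struct *mm,
				      unsigned long start, unsigned long end)
	{
		struct sva_domain *sdomain = container_of(mn, struct sva_domain, mn);

		/* flush the IOTLB for the range ... */
		iotlb_invalidate_range(sdomain, start, end);
		/*
		 * ... then issue the device TLB (ATC) invalidation and wait
		 * for its completion before returning, so outstanding DMA
		 * using the old translation is ordered against this point.
		 */
		atc_invalidate_range_sync(sdomain, start, end);
	}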

If you refer to the 'retry' after receiving a successful PRI response,
then yes.

but there is really no reason to retry upon a PRI response failure, which
indicates that the faulting address is not a valid one that the OS would
like to fix.

> 
> > > Again, all racy. If a DMA is ongoing at the same instant things are
> > > changed there is no definitive way to say if it resolved before or
> > > after.
> > >
> > > The only thing we care about is that dmas that are completed before
> > > see the before translation and dmas that are started after see the
> > > after translation.
> > >
> > > DMAs that cross choose one at random.
> >
> > Yes that makes sense for replacement.
> >
> > But here we are talking about a draining requirement when disabling
> > a pasid entry, which is certainly not involved in replacement.
> 
> It is the same argument, you are replacing a PTE that was non-present

s/non-present/present/

> with one that is failing/blocking - the result of a DMA that crosses
> this event can be either.

kind of

> 
> > > > I don't think atomic replace is the main usage for this draining
> > > > requirement. Instead I'm more interested in the basic popular usage:
> > > > attach-detach-attach and not convinced that no draining is required
> > > > between iommu/device to avoid interference between activities
> > > > from old/new address space.
> > >
> > > Something like IDXD needs to halt DMAs on the PASID and flush all
> > > outstanding DMA to get to a state where the PASID is quiet from the
> > > device perspective. This is the only way to stop interference.
> >
> > why is it IDXD specific behavior? I suppose all devices need to quiesce
> > the outstanding DMAs when tearing down the binding between the
> > PASID and previous address space.
> 
> Because it is such simple HW, I assume this is why this code is being
> pushed here :)
> 
> > but there are also terminal conditions, e.g. when a workqueue is
> > reset after a hang, hence additional draining is required from the
> > iommu side to ensure all the outstanding page requests/responses
> > are properly handled.
> 
> Then it should be coded as an explicit drain request from device when
> and where they need it.
> 
> It should not be integrated into the iommu side because it is
> nonsensical. Devices expecting consistent behavior must stop DMA
> before changing translation, and if they need help to do it they must
> call APIs. Changing translation is not required after a so called
> "terminal event".
> 
> > vt-d spec defines a draining process to cope with those terminal
> > conditions (see 7.9 Pending Page Request Handling on Terminal
> > Conditions). intel-iommu driver just implements it by default for
> > simplicity (one may consider providing explicit API for drivers to
> > call but not sure of the necessity if such terminal conditions
> > apply to most devices). anyway this is not a fast path.
> 
> It is not "by default" it is in the wrong place. These terminal
> conditions are things like FLR. FLR has nothing to do with changing
> the translation. I can trigger FLR and keep the current translation
> and still would want to flush out all the PRIs before starting DMA
> again to avoid protocol confusion.
> 
> An API is absolutely necessary. Confusing the cases that need draining
> with translation change is just not logically right.
> 
> eg we do need to modify VFIO to do the drain on FLR like the spec
> explains!
> 
> Draining has to be ordered correctly with whatever the device is
> doing. Drain needs to come after FLR, for instance. It needs to come
> after a work queue reset, because drain doesn't make any sense unless
> it is coupled with a DMA stop at the device.

Okay, that makes sense. As Baolu and you already agreed, let's separate
this fix out of this series.

The minor interesting aspect is how to document this requirement
clearly so drivers won't skip calling it when SVA is enabled.
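
Maybe kernel-doc on the drain API itself, something like (the name
iopf_queue_drain_dev() is hypothetical, wording just a strawman):

	/**
	 * iopf_queue_drain_dev - drain page requests/responses for a device
	 *
	 * Must be called by the device driver *after* the device has stopped
	 * issuing DMA and page requests (workqueue reset, FLR, stop marker
	 * sent, ...) and *before* the PASID is reused for another address
	 * space. Not required on a normal translation change where the
	 * device has quiesced itself.
	 */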

> 
> Hacking a DMA stop by forcing a blocking translation is not logically
> correct, with wrong ordering the device may see unexpected translation
> failures which may trigger AERs or bad things..

where is such a hack? though the current implementation of draining
is not clean, it's put inside the pasid-disable sequence instead of
forcing a blocking translation implicitly in the iommu driver, i.e. it's
still the driver making the decision on what translation to be used...

> 
> > another example might be stop marker. A device using stop marker
> > doesn't need to wait for outstanding page requests. According to the
> > PCIe spec (10.4.1.2 Managing PASID Usage on PRG Requests) the device
> > simply marks outstanding page requests as stale and sends a stop
> > marker message to the IOMMU. Page responses for those stale
> > requests are ignored. But presumably the iommu driver still needs
> > to drain those requests until the stop marker message in unbind
> > to avoid them being incorrectly routed to a new address space in
> > case the PASID is rebound to another process immediately.
> 
> Stop marker doesn't change anything, in all processing it just removes
> requests that have yet to complete. If a device is using stop then
> most likely the whole thing is racy and the OS simply has to be ready
> to handle stop at any time.
> 

I'm not sure whether a device supporting stop marker may provide some
abort-dma cmd to avoid waiting. But I guess such abort semantics will
likely be used in terminal conditions too, so the same argument still
holds. Presumably a normal translation change should always use
stop-wait semantics for outstanding DMAs. 😊

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v7 08/12] iommu: Prepare for separating SVA and IOPF
  2023-11-15  3:02 ` [PATCH v7 08/12] iommu: Prepare for separating SVA and IOPF Lu Baolu
@ 2023-12-05  7:10   ` Yi Liu
  0 siblings, 0 replies; 49+ messages in thread
From: Yi Liu @ 2023-12-05  7:10 UTC (permalink / raw)
  To: Lu Baolu, Joerg Roedel, Will Deacon, Robin Murphy,
	Jason Gunthorpe, Kevin Tian, Jean-Philippe Brucker, Nicolin Chen
  Cc: Jacob Pan, Yan Zhao, iommu, kvm, linux-kernel, Jason Gunthorpe

On 2023/11/15 11:02, Lu Baolu wrote:
> Move iopf_group data structure to iommu.h to make it a minimal set of
> faults that a domain's page fault handler should handle.
> 
> Add a new function, iopf_free_group(), to free a fault group after all
> faults in the group are handled. This function will be made global so
> that it can be called from other files, such as iommu-sva.c.
> 
> Move iopf_queue data structure to iommu.h to allow the workqueue to be
> scheduled out of this file.
> 
> This will simplify the sequential patches.
> 
> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
> Reviewed-by: Kevin Tian <kevin.tian@intel.com>
> Tested-by: Yan Zhao <yan.y.zhao@intel.com>
> ---
>   include/linux/iommu.h      | 20 +++++++++++++++++++-
>   drivers/iommu/io-pgfault.c | 37 +++++++++++++------------------------
>   2 files changed, 32 insertions(+), 25 deletions(-)

Reviewed-by: Yi Liu <yi.l.liu@intel.com>

> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index 42b62bc8737a..0d3c5a56b078 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -41,7 +41,6 @@ struct iommu_dirty_ops;
>   struct notifier_block;
>   struct iommu_sva;
>   struct iommu_dma_cookie;
> -struct iopf_queue;
>   
>   #define IOMMU_FAULT_PERM_READ	(1 << 0) /* read */
>   #define IOMMU_FAULT_PERM_WRITE	(1 << 1) /* write */
> @@ -126,6 +125,25 @@ struct iopf_fault {
>   	struct list_head list;
>   };
>   
> +struct iopf_group {
> +	struct iopf_fault last_fault;
> +	struct list_head faults;
> +	struct work_struct work;
> +	struct device *dev;
> +};
> +
> +/**
> + * struct iopf_queue - IO Page Fault queue
> + * @wq: the fault workqueue
> + * @devices: devices attached to this queue
> + * @lock: protects the device list
> + */
> +struct iopf_queue {
> +	struct workqueue_struct *wq;
> +	struct list_head devices;
> +	struct mutex lock;
> +};
> +
>   /* iommu fault flags */
>   #define IOMMU_FAULT_READ	0x0
>   #define IOMMU_FAULT_WRITE	0x1
> diff --git a/drivers/iommu/io-pgfault.c b/drivers/iommu/io-pgfault.c
> index c45977bb7da3..09e05f483b4f 100644
> --- a/drivers/iommu/io-pgfault.c
> +++ b/drivers/iommu/io-pgfault.c
> @@ -13,24 +13,17 @@
>   
>   #include "iommu-sva.h"
>   
> -/**
> - * struct iopf_queue - IO Page Fault queue
> - * @wq: the fault workqueue
> - * @devices: devices attached to this queue
> - * @lock: protects the device list
> - */
> -struct iopf_queue {
> -	struct workqueue_struct		*wq;
> -	struct list_head		devices;
> -	struct mutex			lock;
> -};
> +static void iopf_free_group(struct iopf_group *group)
> +{
> +	struct iopf_fault *iopf, *next;
>   
> -struct iopf_group {
> -	struct iopf_fault		last_fault;
> -	struct list_head		faults;
> -	struct work_struct		work;
> -	struct device			*dev;
> -};
> +	list_for_each_entry_safe(iopf, next, &group->faults, list) {
> +		if (!(iopf->fault.prm.flags & IOMMU_FAULT_PAGE_REQUEST_LAST_PAGE))
> +			kfree(iopf);
> +	}
> +
> +	kfree(group);
> +}
>   
>   static int iopf_complete_group(struct device *dev, struct iopf_fault *iopf,
>   			       enum iommu_page_response_code status)
> @@ -50,9 +43,9 @@ static int iopf_complete_group(struct device *dev, struct iopf_fault *iopf,
>   
>   static void iopf_handler(struct work_struct *work)
>   {
> +	struct iopf_fault *iopf;
>   	struct iopf_group *group;
>   	struct iommu_domain *domain;
> -	struct iopf_fault *iopf, *next;
>   	enum iommu_page_response_code status = IOMMU_PAGE_RESP_SUCCESS;
>   
>   	group = container_of(work, struct iopf_group, work);
> @@ -61,7 +54,7 @@ static void iopf_handler(struct work_struct *work)
>   	if (!domain || !domain->iopf_handler)
>   		status = IOMMU_PAGE_RESP_INVALID;
>   
> -	list_for_each_entry_safe(iopf, next, &group->faults, list) {
> +	list_for_each_entry(iopf, &group->faults, list) {
>   		/*
>   		 * For the moment, errors are sticky: don't handle subsequent
>   		 * faults in the group if there is an error.
> @@ -69,14 +62,10 @@ static void iopf_handler(struct work_struct *work)
>   		if (status == IOMMU_PAGE_RESP_SUCCESS)
>   			status = domain->iopf_handler(&iopf->fault,
>   						      domain->fault_data);
> -
> -		if (!(iopf->fault.prm.flags &
> -		      IOMMU_FAULT_PAGE_REQUEST_LAST_PAGE))
> -			kfree(iopf);
>   	}
>   
>   	iopf_complete_group(group->dev, &group->last_fault, status);
> -	kfree(group);
> +	iopf_free_group(group);
>   }
>   
>   /**

-- 
Regards,
Yi Liu

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v7 09/12] iommu: Make iommu_queue_iopf() more generic
  2023-11-15  3:02 ` [PATCH v7 09/12] iommu: Make iommu_queue_iopf() more generic Lu Baolu
  2023-12-01 19:14   ` Jason Gunthorpe
@ 2023-12-05  7:13   ` Yi Liu
  2023-12-05 12:13     ` Baolu Lu
  1 sibling, 1 reply; 49+ messages in thread
From: Yi Liu @ 2023-12-05  7:13 UTC (permalink / raw)
  To: Lu Baolu, Joerg Roedel, Will Deacon, Robin Murphy,
	Jason Gunthorpe, Kevin Tian, Jean-Philippe Brucker, Nicolin Chen
  Cc: Jacob Pan, Yan Zhao, iommu, kvm, linux-kernel

On 2023/11/15 11:02, Lu Baolu wrote:
> Make iommu_queue_iopf() more generic by making the iopf_group a minimal
> set of iopf's that an iopf handler of domain should handle and respond
> to. Add domain parameter to struct iopf_group so that the handler can
> retrieve and use it directly.
> 
> Change iommu_queue_iopf() to forward groups of iopf's to the domain's
> iopf handler. This is also a necessary step to decouple the sva iopf
> handling code from this interface.
> 
> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
> Reviewed-by: Kevin Tian <kevin.tian@intel.com>
> Tested-by: Yan Zhao <yan.y.zhao@intel.com>
> ---
>   include/linux/iommu.h      |  4 +--
>   drivers/iommu/iommu-sva.h  |  6 ++---
>   drivers/iommu/io-pgfault.c | 55 +++++++++++++++++++++++++++++---------
>   drivers/iommu/iommu-sva.c  |  3 +--
>   4 files changed, 48 insertions(+), 20 deletions(-)
> 
> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index 0d3c5a56b078..96f6f210093e 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -130,6 +130,7 @@ struct iopf_group {
>   	struct list_head faults;
>   	struct work_struct work;
>   	struct device *dev;
> +	struct iommu_domain *domain;
>   };
>   
>   /**
> @@ -209,8 +210,7 @@ struct iommu_domain {
>   	unsigned long pgsize_bitmap;	/* Bitmap of page sizes in use */
>   	struct iommu_domain_geometry geometry;
>   	struct iommu_dma_cookie *iova_cookie;
> -	enum iommu_page_response_code (*iopf_handler)(struct iommu_fault *fault,
> -						      void *data);
> +	int (*iopf_handler)(struct iopf_group *group);
>   	void *fault_data;
>   	union {
>   		struct {
> diff --git a/drivers/iommu/iommu-sva.h b/drivers/iommu/iommu-sva.h
> index de7819c796ce..27c8da115b41 100644
> --- a/drivers/iommu/iommu-sva.h
> +++ b/drivers/iommu/iommu-sva.h
> @@ -22,8 +22,7 @@ int iopf_queue_flush_dev(struct device *dev);
>   struct iopf_queue *iopf_queue_alloc(const char *name);
>   void iopf_queue_free(struct iopf_queue *queue);
>   int iopf_queue_discard_partial(struct iopf_queue *queue);
> -enum iommu_page_response_code
> -iommu_sva_handle_iopf(struct iommu_fault *fault, void *data);
> +int iommu_sva_handle_iopf(struct iopf_group *group);
>   
>   #else /* CONFIG_IOMMU_SVA */
>   static inline int iommu_queue_iopf(struct iommu_fault *fault, struct device *dev)
> @@ -62,8 +61,7 @@ static inline int iopf_queue_discard_partial(struct iopf_queue *queue)
>   	return -ENODEV;
>   }
>   
> -static inline enum iommu_page_response_code
> -iommu_sva_handle_iopf(struct iommu_fault *fault, void *data)
> +static inline int iommu_sva_handle_iopf(struct iopf_group *group)
>   {
>   	return IOMMU_PAGE_RESP_INVALID;
>   }
> diff --git a/drivers/iommu/io-pgfault.c b/drivers/iommu/io-pgfault.c
> index 09e05f483b4f..544653de0d45 100644
> --- a/drivers/iommu/io-pgfault.c
> +++ b/drivers/iommu/io-pgfault.c
> @@ -13,6 +13,9 @@
>   
>   #include "iommu-sva.h"
>   
> +enum iommu_page_response_code
> +iommu_sva_handle_mm(struct iommu_fault *fault, struct mm_struct *mm);
> +
>   static void iopf_free_group(struct iopf_group *group)
>   {
>   	struct iopf_fault *iopf, *next;
> @@ -45,23 +48,18 @@ static void iopf_handler(struct work_struct *work)
>   {
>   	struct iopf_fault *iopf;
>   	struct iopf_group *group;
> -	struct iommu_domain *domain;
>   	enum iommu_page_response_code status = IOMMU_PAGE_RESP_SUCCESS;
>   
>   	group = container_of(work, struct iopf_group, work);
> -	domain = iommu_get_domain_for_dev_pasid(group->dev,
> -				group->last_fault.fault.prm.pasid, 0);
> -	if (!domain || !domain->iopf_handler)
> -		status = IOMMU_PAGE_RESP_INVALID;
> -
>   	list_for_each_entry(iopf, &group->faults, list) {
>   		/*
>   		 * For the moment, errors are sticky: don't handle subsequent
>   		 * faults in the group if there is an error.
>   		 */
> -		if (status == IOMMU_PAGE_RESP_SUCCESS)
> -			status = domain->iopf_handler(&iopf->fault,
> -						      domain->fault_data);
> +		if (status != IOMMU_PAGE_RESP_SUCCESS)
> +			break;
> +
> +		status = iommu_sva_handle_mm(&iopf->fault, group->domain->mm);
>   	}
>   
>   	iopf_complete_group(group->dev, &group->last_fault, status);
> @@ -113,6 +111,7 @@ int iommu_queue_iopf(struct iommu_fault *fault, struct device *dev)
>   	int ret;
>   	struct iopf_group *group;
>   	struct iopf_fault *iopf, *next;
> +	struct iommu_domain *domain = NULL;
>   	struct iommu_fault_param *iopf_param;
>   	struct dev_iommu *param = dev->iommu;
>   
> @@ -143,6 +142,23 @@ int iommu_queue_iopf(struct iommu_fault *fault, struct device *dev)
>   		return 0;
>   	}
>   
> +	if (fault->prm.flags & IOMMU_FAULT_PAGE_REQUEST_PASID_VALID) {
> +		domain = iommu_get_domain_for_dev_pasid(dev, fault->prm.pasid, 0);
> +		if (IS_ERR(domain))
> +			domain = NULL;
> +	}
> +
> +	if (!domain)
> +		domain = iommu_get_domain_for_dev(dev);
> +
> +	if (!domain || !domain->iopf_handler) {
> +		dev_warn_ratelimited(dev,
> +			"iopf (pasid %d) without domain attached or handler installed\n",
> +			 fault->prm.pasid);
> +		ret = -ENODEV;
> +		goto cleanup_partial;
> +	}
> +
>   	group = kzalloc(sizeof(*group), GFP_KERNEL);
>   	if (!group) {
>   		/*
> @@ -157,8 +173,8 @@ int iommu_queue_iopf(struct iommu_fault *fault, struct device *dev)
>   	group->dev = dev;
>   	group->last_fault.fault = *fault;
>   	INIT_LIST_HEAD(&group->faults);
> +	group->domain = domain;
>   	list_add(&group->last_fault.list, &group->faults);
> -	INIT_WORK(&group->work, iopf_handler);
>   
>   	/* See if we have partial faults for this group */
>   	list_for_each_entry_safe(iopf, next, &iopf_param->partial, list) {
> @@ -167,9 +183,13 @@ int iommu_queue_iopf(struct iommu_fault *fault, struct device *dev)
>   			list_move(&iopf->list, &group->faults);
>   	}
>   
> -	queue_work(iopf_param->queue->wq, &group->work);
> -	return 0;
> +	mutex_unlock(&iopf_param->lock);
> +	ret = domain->iopf_handler(group);
> +	mutex_lock(&iopf_param->lock);

After this change, this function (iommu_queue_iopf) does not queue
anything. Should this function be renamed? Other than this, I didn't
see any other problems.
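
To be clear, the flow after this patch (reading the diff above) becomes:

	iommu_queue_iopf()                  /* assembles the iopf_group */
	  -> domain->iopf_handler(group)    /* for SVA: iommu_sva_handle_iopf() */
	    -> queue_work()                 /* defers to the iopf workqueue */
	      -> iopf_handler()             /* calls iommu_sva_handle_mm() per fault */
	        -> iopf_complete_group()    /* sends the page response */

so the queueing now happens in the domain's iopf handler rather than here.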

Reviewed-by: Yi Liu <yi.l.liu@intel.com>

> +	if (ret)
> +		iopf_free_group(group);
>   
> +	return ret;
>   cleanup_partial:
>   	list_for_each_entry_safe(iopf, next, &iopf_param->partial, list) {
>   		if (iopf->fault.prm.grpid == fault->prm.grpid) {
> @@ -181,6 +201,17 @@ int iommu_queue_iopf(struct iommu_fault *fault, struct device *dev)
>   }
>   EXPORT_SYMBOL_GPL(iommu_queue_iopf);
>   
> +int iommu_sva_handle_iopf(struct iopf_group *group)
> +{
> +	struct iommu_fault_param *fault_param = group->dev->iommu->fault_param;
> +
> +	INIT_WORK(&group->work, iopf_handler);
> +	if (!queue_work(fault_param->queue->wq, &group->work))
> +		return -EBUSY;
> +
> +	return 0;
> +}
> +
>   /**
>    * iopf_queue_flush_dev - Ensure that all queued faults have been processed
>    * @dev: the endpoint whose faults need to be flushed.
> diff --git a/drivers/iommu/iommu-sva.c b/drivers/iommu/iommu-sva.c
> index b78671a8a914..ba0d5b7e106a 100644
> --- a/drivers/iommu/iommu-sva.c
> +++ b/drivers/iommu/iommu-sva.c
> @@ -149,11 +149,10 @@ EXPORT_SYMBOL_GPL(iommu_sva_get_pasid);
>    * I/O page fault handler for SVA
>    */
>   enum iommu_page_response_code
> -iommu_sva_handle_iopf(struct iommu_fault *fault, void *data)
> +iommu_sva_handle_mm(struct iommu_fault *fault, struct mm_struct *mm)
>   {
>   	vm_fault_t ret;
>   	struct vm_area_struct *vma;
> -	struct mm_struct *mm = data;
>   	unsigned int access_flags = 0;
>   	unsigned int fault_flags = FAULT_FLAG_REMOTE;
>   	struct iommu_fault_page_request *prm = &fault->prm;

-- 
Regards,
Yi Liu

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v7 02/12] iommu/arm-smmu-v3: Remove unrecoverable faults reporting
  2023-12-04 10:54   ` Yi Liu
@ 2023-12-05 11:48     ` Baolu Lu
  0 siblings, 0 replies; 49+ messages in thread
From: Baolu Lu @ 2023-12-05 11:48 UTC (permalink / raw)
  To: Yi Liu, Joerg Roedel, Will Deacon, Robin Murphy, Jason Gunthorpe,
	Kevin Tian, Jean-Philippe Brucker, Nicolin Chen
  Cc: baolu.lu, Jacob Pan, Yan Zhao, iommu, kvm, linux-kernel

On 2023/12/4 18:54, Yi Liu wrote:
> On 2023/11/15 11:02, Lu Baolu wrote:
>> No device driver registers a fault handler to handle the reported
>> unrecoverable faults. Remove it to avoid dead code.
> 
> I noticed only ARM code is removed. So the intel iommu driver does not
> have code that tries to report unrecoverable faults?

Yes.

Best regards,
baolu

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v7 03/12] iommu: Remove unrecoverable fault data
  2023-12-04 10:58   ` Yi Liu
@ 2023-12-05 11:55     ` Baolu Lu
  0 siblings, 0 replies; 49+ messages in thread
From: Baolu Lu @ 2023-12-05 11:55 UTC (permalink / raw)
  To: Yi Liu, Joerg Roedel, Will Deacon, Robin Murphy, Jason Gunthorpe,
	Kevin Tian, Jean-Philippe Brucker, Nicolin Chen
  Cc: baolu.lu, Jacob Pan, Yan Zhao, iommu, kvm, linux-kernel, Jason Gunthorpe

On 2023/12/4 18:58, Yi Liu wrote:
> On 2023/11/15 11:02, Lu Baolu wrote:
>> The unrecoverable fault data is not used anywhere. Remove it to avoid
>> dead code.
>>
>> Suggested-by: Kevin Tian <kevin.tian@intel.com>
>> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
>> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
>> Reviewed-by: Kevin Tian <kevin.tian@intel.com>
>> ---
>>   include/linux/iommu.h | 70 +------------------------------------------
>>   1 file changed, 1 insertion(+), 69 deletions(-)
>>
>> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
>> index c2e2225184cf..81eee1afec72 100644
>> --- a/include/linux/iommu.h
>> +++ b/include/linux/iommu.h
>> @@ -50,69 +50,9 @@ struct iommu_dma_cookie;
>>   /* Generic fault types, can be expanded IRQ remapping fault */
>>   enum iommu_fault_type {
>> -    IOMMU_FAULT_DMA_UNRECOV = 1,    /* unrecoverable fault */
>>       IOMMU_FAULT_PAGE_REQ,        /* page request fault */
> 
> a nit, do you know why this enum was starting from 1? Should it still
> start from 1 after deleting UNRECOV?

As Jason suggested, we will address this issue in another thread. I am
not sure for now whether we will remove the fault type field or re-use
the previous scheme.

Best regards,
baolu

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v7 05/12] iommu: Merge iopf_device_param into iommu_fault_param
  2023-12-04 12:32   ` Yi Liu
@ 2023-12-05 12:01     ` Baolu Lu
  0 siblings, 0 replies; 49+ messages in thread
From: Baolu Lu @ 2023-12-05 12:01 UTC (permalink / raw)
  To: Yi Liu, Joerg Roedel, Will Deacon, Robin Murphy, Jason Gunthorpe,
	Kevin Tian, Jean-Philippe Brucker, Nicolin Chen
  Cc: baolu.lu, Jacob Pan, Yan Zhao, iommu, kvm, linux-kernel, Jason Gunthorpe

On 2023/12/4 20:32, Yi Liu wrote:
> On 2023/11/15 11:02, Lu Baolu wrote:
>> The struct dev_iommu contains two pointers, fault_param and iopf_param.
>> The fault_param pointer points to a data structure that is used to store
>> pending faults that are awaiting responses. The iopf_param pointer points
>> to a data structure that is used to store partial faults that are part of
>> a Page Request Group.
>>
>> The fault_param and iopf_param pointers are essentially duplicates. This
>> causes memory waste. Merge the iopf_device_param pointer into the
>> iommu_fault_param pointer to consolidate the code and save memory. The
>> consolidated pointer would be allocated on demand when the device driver
>> enables the iopf on device, and would be freed after iopf is disabled.
>>
>> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
>> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
>> Reviewed-by: Kevin Tian <kevin.tian@intel.com>
>> Tested-by: Yan Zhao <yan.y.zhao@intel.com>
>> ---
>>   include/linux/iommu.h      |  18 ++++--
>>   drivers/iommu/io-pgfault.c | 113 ++++++++++++++++++-------------------
>>   drivers/iommu/iommu.c      |  34 ++---------
>>   3 files changed, 75 insertions(+), 90 deletions(-)
>>
>> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
>> index 79775859af42..108ab50da1ad 100644
>> --- a/include/linux/iommu.h
>> +++ b/include/linux/iommu.h
>> @@ -42,6 +42,7 @@ struct notifier_block;
>>   struct iommu_sva;
>>   struct iommu_fault_event;
>>   struct iommu_dma_cookie;
>> +struct iopf_queue;
>>   #define IOMMU_FAULT_PERM_READ    (1 << 0) /* read */
>>   #define IOMMU_FAULT_PERM_WRITE    (1 << 1) /* write */
>> @@ -590,21 +591,31 @@ struct iommu_fault_event {
>>    * struct iommu_fault_param - per-device IOMMU fault data
>>    * @handler: Callback function to handle IOMMU faults at device level
>>    * @data: handler private data
>> - * @faults: holds the pending faults which needs response
>>    * @lock: protect pending faults list
>> + * @dev: the device that owns this param
>> + * @queue: IOPF queue
>> + * @queue_list: index into queue->devices
>> + * @partial: faults that are part of a Page Request Group for which 
>> the last
>> + *           request hasn't been submitted yet.
>> + * @faults: holds the pending faults which needs response
> 
> since you already moved this line, maybe fix this typo as well.
> s/needs/need/
> 
>>    */
>>   struct iommu_fault_param {
>>       iommu_dev_fault_handler_t handler;
>>       void *data;
>> +    struct mutex lock;
> 
> can you share why move this line up? It results in a line move as well
> in the above kdoc.

This mutex protects the whole data structure. So I moved it up.

> 
>> +
>> +    struct device *dev;
>> +    struct iopf_queue *queue;
>> +    struct list_head queue_list;
>> +
>> +    struct list_head partial;
>>       struct list_head faults;
>> -    struct mutex lock;
>>   };
>>   /**
>>    * struct dev_iommu - Collection of per-device IOMMU data
>>    *
>>    * @fault_param: IOMMU detected device fault reporting data
>> - * @iopf_param:     I/O Page Fault queue and data
>>    * @fwspec:     IOMMU fwspec data
>>    * @iommu_dev:     IOMMU device this device is linked to
>>    * @priv:     IOMMU Driver private data
>> @@ -620,7 +631,6 @@ struct iommu_fault_param {
>>   struct dev_iommu {
>>       struct mutex lock;
>>       struct iommu_fault_param    *fault_param;
>> -    struct iopf_device_param    *iopf_param;
>>       struct iommu_fwspec        *fwspec;
>>       struct iommu_device        *iommu_dev;
>>       void                *priv;
>> diff --git a/drivers/iommu/io-pgfault.c b/drivers/iommu/io-pgfault.c
>> index 24b5545352ae..b1cf28055525 100644
>> --- a/drivers/iommu/io-pgfault.c
>> +++ b/drivers/iommu/io-pgfault.c
>> @@ -25,21 +25,6 @@ struct iopf_queue {
>>       struct mutex            lock;
>>   };
>> -/**
>> - * struct iopf_device_param - IO Page Fault data attached to a device
>> - * @dev: the device that owns this param
>> - * @queue: IOPF queue
>> - * @queue_list: index into queue->devices
>> - * @partial: faults that are part of a Page Request Group for which 
>> the last
>> - *           request hasn't been submitted yet.
>> - */
>> -struct iopf_device_param {
>> -    struct device            *dev;
>> -    struct iopf_queue        *queue;
>> -    struct list_head        queue_list;
>> -    struct list_head        partial;
>> -};
>> -
>>   struct iopf_fault {
>>       struct iommu_fault        fault;
>>       struct list_head        list;
>> @@ -144,7 +129,7 @@ int iommu_queue_iopf(struct iommu_fault *fault, 
>> void *cookie)
>>       int ret;
>>       struct iopf_group *group;
>>       struct iopf_fault *iopf, *next;
>> -    struct iopf_device_param *iopf_param;
>> +    struct iommu_fault_param *iopf_param;
>>       struct device *dev = cookie;
>>       struct dev_iommu *param = dev->iommu;
>> @@ -159,7 +144,7 @@ int iommu_queue_iopf(struct iommu_fault *fault, void *cookie)
>>        * As long as we're holding param->lock, the queue can't be unlinked
>>        * from the device and therefore cannot disappear.
>>        */
>> -    iopf_param = param->iopf_param;
>> +    iopf_param = param->fault_param;
>>       if (!iopf_param)
>>           return -ENODEV;
>> @@ -229,14 +214,14 @@ EXPORT_SYMBOL_GPL(iommu_queue_iopf);
>>   int iopf_queue_flush_dev(struct device *dev)
>>   {
>>       int ret = 0;
>> -    struct iopf_device_param *iopf_param;
>> +    struct iommu_fault_param *iopf_param;
>>       struct dev_iommu *param = dev->iommu;
>>       if (!param)
>>           return -ENODEV;
>>       mutex_lock(&param->lock);
>> -    iopf_param = param->iopf_param;
>> +    iopf_param = param->fault_param;
>>       if (iopf_param)
>>           flush_workqueue(iopf_param->queue->wq);
>>       else
>> @@ -260,7 +245,7 @@ EXPORT_SYMBOL_GPL(iopf_queue_flush_dev);
>>   int iopf_queue_discard_partial(struct iopf_queue *queue)
>>   {
>>       struct iopf_fault *iopf, *next;
>> -    struct iopf_device_param *iopf_param;
>> +    struct iommu_fault_param *iopf_param;
>>       if (!queue)
>>           return -EINVAL;
>> @@ -287,34 +272,38 @@ EXPORT_SYMBOL_GPL(iopf_queue_discard_partial);
>>    */
>>   int iopf_queue_add_device(struct iopf_queue *queue, struct device *dev)
>>   {
>> -    int ret = -EBUSY;
>> -    struct iopf_device_param *iopf_param;
>> +    int ret = 0;
>>       struct dev_iommu *param = dev->iommu;
>> -
>> -    if (!param)
>> -        return -ENODEV;
>> -
>> -    iopf_param = kzalloc(sizeof(*iopf_param), GFP_KERNEL);
>> -    if (!iopf_param)
>> -        return -ENOMEM;
>> -
>> -    INIT_LIST_HEAD(&iopf_param->partial);
>> -    iopf_param->queue = queue;
>> -    iopf_param->dev = dev;
>> +    struct iommu_fault_param *fault_param;
>>       mutex_lock(&queue->lock);
>>       mutex_lock(&param->lock);
>> -    if (!param->iopf_param) {
>> -        list_add(&iopf_param->queue_list, &queue->devices);
>> -        param->iopf_param = iopf_param;
>> -        ret = 0;
>> +    if (param->fault_param) {
>> +        ret = -EBUSY;
>> +        goto done_unlock;
>>       }
>> +
>> +    get_device(dev);
> 
> noticed the old code has this get as well. :) but still want to ask if
> it is really needed.

It's not needed. It was part of iommu_register_device_fault_handler(),
which will be removed in the next patch.

> 
>> +    fault_param = kzalloc(sizeof(*fault_param), GFP_KERNEL);
>> +    if (!fault_param) {
>> +        put_device(dev);
>> +        ret = -ENOMEM;
>> +        goto done_unlock;
>> +    }
>> +
>> +    mutex_init(&fault_param->lock);
>> +    INIT_LIST_HEAD(&fault_param->faults);
>> +    INIT_LIST_HEAD(&fault_param->partial);
>> +    fault_param->dev = dev;
>> +    list_add(&fault_param->queue_list, &queue->devices);
>> +    fault_param->queue = queue;
>> +
>> +    param->fault_param = fault_param;
>> +
>> +done_unlock:
>>       mutex_unlock(&param->lock);
>>       mutex_unlock(&queue->lock);
>> -    if (ret)
>> -        kfree(iopf_param);
>> -
>>       return ret;
>>   }
>>   EXPORT_SYMBOL_GPL(iopf_queue_add_device);
>> @@ -330,34 +319,42 @@ EXPORT_SYMBOL_GPL(iopf_queue_add_device);
>>    */
>>   int iopf_queue_remove_device(struct iopf_queue *queue, struct device *dev)
>>   {
>> -    int ret = -EINVAL;
>> +    int ret = 0;
>>       struct iopf_fault *iopf, *next;
>> -    struct iopf_device_param *iopf_param;
>>       struct dev_iommu *param = dev->iommu;
>> -
>> -    if (!param || !queue)
>> -        return -EINVAL;
>> +    struct iommu_fault_param *fault_param = param->fault_param;
>>       mutex_lock(&queue->lock);
>>       mutex_lock(&param->lock);
>> -    iopf_param = param->iopf_param;
>> -    if (iopf_param && iopf_param->queue == queue) {
>> -        list_del(&iopf_param->queue_list);
>> -        param->iopf_param = NULL;
>> -        ret = 0;
>> +    if (!fault_param) {
>> +        ret = -ENODEV;
>> +        goto unlock;
>>       }
>> -    mutex_unlock(&param->lock);
>> -    mutex_unlock(&queue->lock);
>> -    if (ret)
>> -        return ret;
>> +
>> +    if (fault_param->queue != queue) {
>> +        ret = -EINVAL;
>> +        goto unlock;
>> +    }
>> +
>> +    if (!list_empty(&fault_param->faults)) {
>> +        ret = -EBUSY;
>> +        goto unlock;
>> +    }
>> +
>> +    list_del(&fault_param->queue_list);
>>       /* Just in case some faults are still stuck */
>> -    list_for_each_entry_safe(iopf, next, &iopf_param->partial, list)
>> +    list_for_each_entry_safe(iopf, next, &fault_param->partial, list)
>>           kfree(iopf);
>> -    kfree(iopf_param);
>> +    param->fault_param = NULL;
>> +    kfree(fault_param);
>> +    put_device(dev);
>> +unlock:
>> +    mutex_unlock(&param->lock);
>> +    mutex_unlock(&queue->lock);
>> -    return 0;
>> +    return ret;
>>   }
>>   EXPORT_SYMBOL_GPL(iopf_queue_remove_device);
>> @@ -403,7 +400,7 @@ EXPORT_SYMBOL_GPL(iopf_queue_alloc);
>>    */
>>   void iopf_queue_free(struct iopf_queue *queue)
>>   {
>> -    struct iopf_device_param *iopf_param, *next;
>> +    struct iommu_fault_param *iopf_param, *next;
>>       if (!queue)
>>           return;
>> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
>> index f24513e2b025..9c9eacfa6761 100644
>> --- a/drivers/iommu/iommu.c
>> +++ b/drivers/iommu/iommu.c
>> @@ -1326,27 +1326,18 @@ int iommu_register_device_fault_handler(struct device *dev,
>>       struct dev_iommu *param = dev->iommu;
>>       int ret = 0;
>> -    if (!param)
>> +    if (!param || !param->fault_param)
>>           return -EINVAL;
>>       mutex_lock(&param->lock);
>>       /* Only allow one fault handler registered for each device */
>> -    if (param->fault_param) {
>> +    if (param->fault_param->handler) {
>>           ret = -EBUSY;
>>           goto done_unlock;
>>       }
>> -    get_device(dev);
>> -    param->fault_param = kzalloc(sizeof(*param->fault_param), GFP_KERNEL);
>> -    if (!param->fault_param) {
>> -        put_device(dev);
>> -        ret = -ENOMEM;
>> -        goto done_unlock;
>> -    }
>>       param->fault_param->handler = handler;
>>       param->fault_param->data = data;
>> -    mutex_init(&param->fault_param->lock);
>> -    INIT_LIST_HEAD(&param->fault_param->faults);
>>   done_unlock:
>>       mutex_unlock(&param->lock);
>> @@ -1367,29 +1358,16 @@ EXPORT_SYMBOL_GPL(iommu_register_device_fault_handler);
>>   int iommu_unregister_device_fault_handler(struct device *dev)
>>   {
>>       struct dev_iommu *param = dev->iommu;
>> -    int ret = 0;
>> -    if (!param)
>> +    if (!param || !param->fault_param)
>>           return -EINVAL;
>>       mutex_lock(&param->lock);
>> -
>> -    if (!param->fault_param)
>> -        goto unlock;
>> -
>> -    /* we cannot unregister handler if there are pending faults */
>> -    if (!list_empty(&param->fault_param->faults)) {
>> -        ret = -EBUSY;
>> -        goto unlock;
>> -    }
>> -
>> -    kfree(param->fault_param);
>> -    param->fault_param = NULL;
>> -    put_device(dev);
>> -unlock:
>> +    param->fault_param->handler = NULL;
>> +    param->fault_param->data = NULL;
>>       mutex_unlock(&param->lock);
>> -    return ret;
>> +    return 0;
>>   }
>>   EXPORT_SYMBOL_GPL(iommu_unregister_device_fault_handler);
> 

Best regards,
baolu


* Re: [PATCH v7 06/12] iommu: Remove iommu_[un]register_device_fault_handler()
  2023-12-04 12:36   ` Yi Liu
@ 2023-12-05 12:09     ` Baolu Lu
  0 siblings, 0 replies; 49+ messages in thread
From: Baolu Lu @ 2023-12-05 12:09 UTC (permalink / raw)
  To: Yi Liu, Joerg Roedel, Will Deacon, Robin Murphy, Jason Gunthorpe,
	Kevin Tian, Jean-Philippe Brucker, Nicolin Chen
  Cc: baolu.lu, Jacob Pan, Yan Zhao, iommu, kvm, linux-kernel, Jason Gunthorpe

On 2023/12/4 20:36, Yi Liu wrote:
>>   /**
>>    * iommu_report_device_fault() - Report fault event to device driver
>>    * @dev: the device
>> @@ -1395,10 +1325,6 @@ int iommu_report_device_fault(struct device *dev, struct iommu_fault_event *evt)
>>       /* we only report device fault if there is a handler registered */
>>       mutex_lock(&param->lock);
>>       fparam = param->fault_param;
>> -    if (!fparam || !fparam->handler) {
> 
> should it still check the fparam?

Yes.
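
That is, only the handler test goes away; the NULL check stays, roughly
like this (a sketch of the intended result, based on the hunk above
rather than the final code):

	mutex_lock(&param->lock);
	fparam = param->fault_param;
	if (!fparam) {
		ret = -EINVAL;
		goto done_unlock;
	}
	/* ... deliver the fault via fparam ... */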

Best regards,
baolu


* Re: [PATCH v7 09/12] iommu: Make iommu_queue_iopf() more generic
  2023-12-05  7:13   ` Yi Liu
@ 2023-12-05 12:13     ` Baolu Lu
  0 siblings, 0 replies; 49+ messages in thread
From: Baolu Lu @ 2023-12-05 12:13 UTC (permalink / raw)
  To: Yi Liu, Joerg Roedel, Will Deacon, Robin Murphy, Jason Gunthorpe,
	Kevin Tian, Jean-Philippe Brucker, Nicolin Chen
  Cc: baolu.lu, Jacob Pan, Yan Zhao, iommu, kvm, linux-kernel

On 2023/12/5 15:13, Yi Liu wrote:
>> @@ -157,8 +173,8 @@ int iommu_queue_iopf(struct iommu_fault *fault, struct device *dev)
>>       group->dev = dev;
>>       group->last_fault.fault = *fault;
>>       INIT_LIST_HEAD(&group->faults);
>> +    group->domain = domain;
>>       list_add(&group->last_fault.list, &group->faults);
>> -    INIT_WORK(&group->work, iopf_handler);
>>       /* See if we have partial faults for this group */
>>       list_for_each_entry_safe(iopf, next, &iopf_param->partial, list) {
>> @@ -167,9 +183,13 @@ int iommu_queue_iopf(struct iommu_fault *fault, struct device *dev)
>>               list_move(&iopf->list, &group->faults);
>>       }
>> -    queue_work(iopf_param->queue->wq, &group->work);
>> -    return 0;
>> +    mutex_unlock(&iopf_param->lock);
>> +    ret = domain->iopf_handler(group);
>> +    mutex_lock(&iopf_param->lock);
> 
> After this change, this function (iommu_queue_iopf) does not queue
> anything. Should this function be renamed? Apart from this, I didn't
> see any other problems.

It's renamed in the next patch.
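
For reference, the resulting dispatch looks like this (taken from the
hunk above; the unlock/relock pair around the call means the handler
does not run under the per-device fault lock):

	mutex_unlock(&iopf_param->lock);
	ret = domain->iopf_handler(group);
	mutex_lock(&iopf_param->lock);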

> 
> Reviewed-by: Yi Liu <yi.l.liu@intel.com>

Thank you!

Best regards,
baolu


* Re: [PATCH v7 12/12] iommu: Improve iopf_queue_flush_dev()
  2023-12-05  3:23                   ` Tian, Kevin
@ 2023-12-05 15:52                     ` Jason Gunthorpe
  0 siblings, 0 replies; 49+ messages in thread
From: Jason Gunthorpe @ 2023-12-05 15:52 UTC (permalink / raw)
  To: Tian, Kevin
  Cc: Baolu Lu, Joerg Roedel, Will Deacon, Robin Murphy,
	Jean-Philippe Brucker, Nicolin Chen, Liu, Yi L, Jacob Pan, Zhao,
	Yan Y, iommu, kvm, linux-kernel

On Tue, Dec 05, 2023 at 03:23:05AM +0000, Tian, Kevin wrote:

> > I didn't say the PRI would fail, I said the ATS would fail with a
> > non-present.
> > 
> > It has to work this way or it is completely broken with respect to
> > existing races in the mm side. Agents must retry non-present ATS
> > answers until you get a present or an ATS failure.
> 
> My understanding of the sequence is like below:
> 
> <'D' for device, 'I' for IOMMU>
> 
>   (D) send a ATS translation request
>   (I) respond translation result
>   (D) If success then sends DMA to the target page
>       otherwise send a PRI request
>         (I) raise an IOMMU interrupt allowing sw to fix the translation
>         (I) generate a PRI response to device
>         (D) if success then jump to the first step to retry
>             otherwise abort the current request
> 
> If the mm changes the mapping after a successful PRI response, the mmu
> notifier callback in the iommu driver needs to wait for the device TLB
> invalidation to complete, which the device will order properly against
> outstanding DMA requests still using the old translation.
> 
> If you refer to the 'retry' after receiving a successful PRI response,
> then yes.
> 
> but there is really no reason to retry upon a PRI response failure,
> which indicates that the faulting address is not a valid one that the
> OS is willing to fix.

Right
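
To make the ordering concrete, the software half of that sequence is
roughly the sketch below (the vendor_*() helpers are hypothetical
stand-ins for driver-specific invalidation entry points):

	/* mmu notifier callback: a CPU mapping was changed or torn down */
	static void sva_range_invalidate(struct mmu_notifier *mn,
					 unsigned long start,
					 unsigned long end)
	{
		/* 1. knock the stale entries out of the IOMMU's TLB */
		vendor_flush_iotlb_range(mn, start, end);

		/* 2. invalidate the device TLB (ATC) and wait for the
		 * completion; the device orders that completion behind
		 * DMA it already issued with the old translation
		 */
		vendor_flush_dev_iotlb_range(mn, start, end);
	}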

> > Draining has to be ordered correctly with whatever the device is
> > doing. Drain needs to come after FLR, for instance. It needs to come
> > after a work queue reset, because drain doesn't make any sense unless
> > it is coupled with a DMA stop at the device.
> 
> Okay, that makes sense. As Baolu and you already agreed, let's separate
> this fix out of this series.
> 
> The minor but interesting aspect is how to document this requirement
> clearly so drivers won't skip calling it when SVA is enabled.

All changes to a translation inside kernel drivers should only be done
once DMA is halted; otherwise things can become a security problem.

We should document this clearly; it is already expected in common
cases like using the DMA API and in drivers' remove() paths. It also
applies when the driver is manually changing a PASID.

The issue is not drain, it is that the HW is still doing DMA on the
PASID and the PASID may be assigned to a new process. The kernel
*must* prevent this directly and strongly.

If the device requires a drain to halt its DMA, then that is a device
specific sequence. Otherwise it should simply halt its DMA in whatever
device specific way it has.
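
Put differently, a driver that tears down or reuses a PASID is expected
to follow roughly this ordering (mydev_*() is a hypothetical
device-specific helper; iopf_queue_flush_dev() is the helper touched by
this series):

	/* 1. device-specific: stop the device issuing DMA/PRI on the PASID */
	mydev_stop_dma(mdev, pasid);

	/* 2. drain the page faults already reported for this device */
	iopf_queue_flush_dev(mdev->dev);

	/* 3. only now tear down or reassign the translation */
	iommu_detach_device_pasid(domain, mdev->dev, pasid);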

> > Hacking a DMA stop by forcing a blocking translation is not logically
> > correct; with the wrong ordering the device may see unexpected
> > translation failures, which may trigger AERs or other bad things.
> 
> Where is such a hack? Though the current implementation of draining
> is not clean, it's put inside the pasid-disable sequence instead of
> forcing a blocking translation implicitly in the iommu driver, i.e.
> it's still the driver making the decision about which translation to
> use...

That misunderstands the purpose of drain.

In normal operating cases PRI just flows and the device will
eventually, naturally, reach a stable terminal case. We don't provide
any ordering guarantees across translation changes so PRI just follows
that design. If you change the translation with ongoing DMA then you
just don't know what order things will happen in.

The main purpose of drain is to keep the PRI protocol itself in sync
with events on the device side that cause it to forget about the tags
it has already issued. E.g. an FLR should reset the tag record. If a
device then issues a new PRI with a tag that matches a tag that was
outstanding prior to the FLR, we can get corruption.

So any drain sequence should start with the device halting new
PRIs. We flush all PRI tags from the system completely, and then the
device may resume issuing new PRIs.
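
As a sketch, a device-specific reset path that keeps the tag space
consistent would look like this (the mydev_*() helpers are
hypothetical; iopf_queue_discard_partial() is from this series):

	mydev_block_page_requests(mdev);   /* device stops issuing PRIs */
	mydev_flr(mdev);                   /* the device's tag record resets */
	iopf_queue_discard_partial(queue); /* drop stale partial-group state */
	mydev_resume_page_requests(mdev);  /* new tags can be issued safely */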

When the device resets its tag labels is a property of the device.

Notice none of this has anything to do with change of
translation. Change of translation, or flush of ATC, does not
invalidate the tags.

A secondary case is to help devices halt their DMA when they cannot do
this on their own.

Jason


end of thread

Thread overview: 49+ messages
2023-11-15  3:02 [PATCH v7 00/12] iommu: Prepare to deliver page faults to user space Lu Baolu
2023-11-15  3:02 ` [PATCH v7 01/12] iommu: Move iommu fault data to linux/iommu.h Lu Baolu
2023-12-04 10:52   ` Yi Liu
2023-11-15  3:02 ` [PATCH v7 02/12] iommu/arm-smmu-v3: Remove unrecoverable faults reporting Lu Baolu
2023-12-01 15:42   ` Jason Gunthorpe
2023-12-04 10:54   ` Yi Liu
2023-12-05 11:48     ` Baolu Lu
2023-11-15  3:02 ` [PATCH v7 03/12] iommu: Remove unrecoverable fault data Lu Baolu
2023-12-04 10:58   ` Yi Liu
2023-12-05 11:55     ` Baolu Lu
2023-11-15  3:02 ` [PATCH v7 04/12] iommu: Cleanup iopf data structure definitions Lu Baolu
2023-12-04 11:03   ` Yi Liu
2023-11-15  3:02 ` [PATCH v7 05/12] iommu: Merge iopf_device_param into iommu_fault_param Lu Baolu
2023-12-04 12:32   ` Yi Liu
2023-12-05 12:01     ` Baolu Lu
2023-11-15  3:02 ` [PATCH v7 06/12] iommu: Remove iommu_[un]register_device_fault_handler() Lu Baolu
2023-12-04 12:36   ` Yi Liu
2023-12-05 12:09     ` Baolu Lu
2023-11-15  3:02 ` [PATCH v7 07/12] iommu: Merge iommu_fault_event and iopf_fault Lu Baolu
2023-12-01 19:09   ` Jason Gunthorpe
2023-12-04 12:40   ` Yi Liu
2023-11-15  3:02 ` [PATCH v7 08/12] iommu: Prepare for separating SVA and IOPF Lu Baolu
2023-12-05  7:10   ` Yi Liu
2023-11-15  3:02 ` [PATCH v7 09/12] iommu: Make iommu_queue_iopf() more generic Lu Baolu
2023-12-01 19:14   ` Jason Gunthorpe
2023-12-05  7:13   ` Yi Liu
2023-12-05 12:13     ` Baolu Lu
2023-11-15  3:02 ` [PATCH v7 10/12] iommu: Separate SVA and IOPF Lu Baolu
2023-11-15  3:02 ` [PATCH v7 11/12] iommu: Consolidate per-device fault data management Lu Baolu
2023-12-01 19:46   ` Jason Gunthorpe
2023-12-04  0:58     ` Baolu Lu
2023-11-15  3:02 ` [PATCH v7 12/12] iommu: Improve iopf_queue_flush_dev() Lu Baolu
2023-12-01 20:35   ` Jason Gunthorpe
2023-12-03  8:53     ` Baolu Lu
2023-12-03 14:14       ` Jason Gunthorpe
2023-12-04  1:32         ` Baolu Lu
2023-12-04  5:37           ` Tian, Kevin
2023-12-04 13:25             ` Jason Gunthorpe
2023-12-05  1:32               ` Tian, Kevin
2023-12-05  1:53                 ` Jason Gunthorpe
2023-12-05  3:23                   ` Tian, Kevin
2023-12-05 15:52                     ` Jason Gunthorpe
2023-12-04 13:12           ` Jason Gunthorpe
2023-12-04  3:46     ` Baolu Lu
2023-12-04 13:27       ` Jason Gunthorpe
2023-12-05  1:13         ` Baolu Lu
2023-11-24  6:30 ` [PATCH v7 00/12] iommu: Prepare to deliver page faults to user space liulongfang
2023-11-24 12:01   ` Baolu Lu
2023-11-25  4:05     ` liulongfang
