* [PATCH v5 0/9] IOMMUFD: Deliver IO page faults to user space
@ 2024-04-30 14:57 Lu Baolu
  2024-04-30 14:57 ` [PATCH v5 1/9] iommu: Introduce domain attachment handle Lu Baolu
                   ` (8 more replies)
  0 siblings, 9 replies; 57+ messages in thread
From: Lu Baolu @ 2024-04-30 14:57 UTC (permalink / raw)
  To: Jason Gunthorpe, Kevin Tian, Joerg Roedel, Will Deacon,
	Robin Murphy, Jean-Philippe Brucker, Nicolin Chen, Yi Liu,
	Jacob Pan, Joel Granados
  Cc: iommu, virtualization, linux-kernel, Lu Baolu

This series implements the functionality of delivering IO page faults
to user space through the IOMMUFD framework. One feasible use case is
nested translation, a hardware feature that supports two-stage
translation tables in the IOMMU. The second-stage translation table is
managed by the host VMM, while the first-stage translation table is
owned by user space. This allows user space to control the IOMMU
mappings for its devices.

When an IO page fault occurs on the first-stage translation table, the
IOMMU hardware can deliver the page fault to user space through the
IOMMUFD framework. User space can then handle the page fault and
respond to the device through IOMMUFD. This allows user space to
implement its own IO page fault handling policies.

A user space application capable of handling IO page faults should
allocate a fault object and bind it to any domain for which it is
willing to handle the faults generated. On a successful return of
fault object allocation, the user can retrieve and respond to page
faults by reading from and writing to the returned file descriptor
(FD).
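
For illustration, the expected flow looks roughly like the sketch
below, built against the uAPI added in this series. Error handling is
omitted, a single-fault group is assumed, and the association of the
fault object with a hw_pagetable (done at allocation time, see patch
7) is not shown:

	struct iommu_fault_alloc cmd = { .size = sizeof(cmd) };
	struct iommu_hwpt_pgfault fault;
	struct iommu_hwpt_page_response resp = { .size = sizeof(resp) };

	/* Allocate a fault object; the kernel returns an ID and an FD. */
	ioctl(iommufd, IOMMU_FAULT_ALLOC, &cmd);

	/* Retrieve one fault message. */
	read(cmd.out_fault_fd, &fault, sizeof(fault));

	/* ... resolve the fault, e.g. map the faulting IOVA ... */

	/* Respond, echoing the identifiers and the cookie back. */
	resp.dev_id = fault.dev_id;
	resp.pasid = fault.pasid;
	resp.grpid = fault.grpid;
	resp.code = IOMMUFD_PAGE_RESP_SUCCESS;
	resp.cookie = fault.cookie;
	write(cmd.out_fault_fd, &resp, sizeof(resp));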

The iommu selftest framework has been updated to test the IO page fault
delivery and response functionality.

The series and related patches are available on GitHub:
https://github.com/LuBaolu/intel-iommu/commits/iommufd-io-pgfault-delivery-v5

Change log:
v5:
 - Removed the attach handle reference count from the core. Drivers now
   synchronize their use of handles with domain attach/detach.
 - Automatically respond to all outstanding faults in the hwpt detach
   and replace paths.
 - Added support for getting a domain-type specific attach handle.
 - Reorganized the series by changing the patch order.
 - Miscellaneous cleanup.

v4: https://lore.kernel.org/linux-iommu/20240403011519.78512-1-baolu.lu@linux.intel.com/
 - Add the iommu domain attachment handle to replace the iopf-specific
   domain attachment interfaces introduced in the previous v3.
 - Replace the iommu_sva with iommu domain attachment handle.
 - Refine some fields in the fault and response message encoding
   according to feedback collected during v3 review period.
 - Refine and fix some problems in the fault FD implementation.
 - Miscellaneous cleanup.

v3: https://lore.kernel.org/linux-iommu/20240122073903.24406-1-baolu.lu@linux.intel.com/
 - Add iopf domain attach/detach/replace interfaces to manage the
   reference counters of hwpt and device, ensuring that both can only be
   destroyed after all outstanding IOPFs have been responded to. 
 - Relocate the fault handling file descriptor from hwpt to a fault
   object to enable a single fault handling object to be utilized
   across multiple domains.
 - Miscellaneous cleanup and performance improvements.

v2: https://lore.kernel.org/linux-iommu/20231026024930.382898-1-baolu.lu@linux.intel.com/
 - Move all iommu refactoring patches into a separate series and discuss
   it in a different thread. The latest patch series [v6] is available at
   https://lore.kernel.org/linux-iommu/20230928042734.16134-1-baolu.lu@linux.intel.com/
 - We discussed the timeout of the pending page fault messages. We
   agreed that we shouldn't apply any timeout policy for the page fault
   handling in user space.
   https://lore.kernel.org/linux-iommu/20230616113232.GA84678@myrica/
 - Jason suggested that we adopt a simple file descriptor interface for
   reading and responding to I/O page requests, so that user space
   applications can improve performance using io_uring.
   https://lore.kernel.org/linux-iommu/ZJWjD1ajeem6pK3I@ziepe.ca/

v1: https://lore.kernel.org/linux-iommu/20230530053724.232765-1-baolu.lu@linux.intel.com/

Lu Baolu (9):
  iommu: Introduce domain attachment handle
  iommu: Replace sva_iommu with iommu_attach_handle
  iommu: Add attachment handle to struct iopf_group
  iommufd: Add fault and response message definitions
  iommufd: Add iommufd fault object
  iommufd: Fault-capable hwpt attach/detach/replace
  iommufd: Associate fault object with iommufd_hw_pgtable
  iommufd/selftest: Add IOPF support for mock device
  iommufd/selftest: Add coverage for IOPF test

 include/linux/iommu.h                         |  32 +-
 include/linux/uacce.h                         |   2 +-
 drivers/dma/idxd/idxd.h                       |   2 +-
 drivers/iommu/iommu-priv.h                    |  17 +
 drivers/iommu/iommufd/iommufd_private.h       |  43 ++
 drivers/iommu/iommufd/iommufd_test.h          |   8 +
 include/uapi/linux/iommufd.h                  | 122 ++++++
 tools/testing/selftests/iommu/iommufd_utils.h |  84 +++-
 drivers/dma/idxd/cdev.c                       |   4 +-
 drivers/iommu/io-pgfault.c                    |  34 +-
 drivers/iommu/iommu-sva.c                     |  43 +-
 drivers/iommu/iommu.c                         | 156 +++++++-
 drivers/iommu/iommufd/device.c                |  16 +-
 drivers/iommu/iommufd/fault.c                 | 373 ++++++++++++++++++
 drivers/iommu/iommufd/hw_pagetable.c          |  35 +-
 drivers/iommu/iommufd/main.c                  |   8 +-
 drivers/iommu/iommufd/selftest.c              |  63 +++
 drivers/misc/uacce/uacce.c                    |   2 +-
 tools/testing/selftests/iommu/iommufd.c       |  18 +
 .../selftests/iommu/iommufd_fail_nth.c        |   2 +-
 drivers/iommu/iommufd/Makefile                |   1 +
 21 files changed, 959 insertions(+), 106 deletions(-)
 create mode 100644 drivers/iommu/iommufd/fault.c

-- 
2.34.1



* [PATCH v5 1/9] iommu: Introduce domain attachment handle
  2024-04-30 14:57 [PATCH v5 0/9] IOMMUFD: Deliver IO page faults to user space Lu Baolu
@ 2024-04-30 14:57 ` Lu Baolu
  2024-05-15  7:17   ` Tian, Kevin
  2024-04-30 14:57 ` [PATCH v5 2/9] iommu: Replace sva_iommu with iommu_attach_handle Lu Baolu
                   ` (7 subsequent siblings)
  8 siblings, 1 reply; 57+ messages in thread
From: Lu Baolu @ 2024-04-30 14:57 UTC (permalink / raw)
  To: Jason Gunthorpe, Kevin Tian, Joerg Roedel, Will Deacon,
	Robin Murphy, Jean-Philippe Brucker, Nicolin Chen, Yi Liu,
	Jacob Pan, Joel Granados
  Cc: iommu, virtualization, linux-kernel, Lu Baolu, Jason Gunthorpe

Currently, when a domain is attached to a device or its PASID, the
domain pointer is stored within the iommu group. It can be retrieved
for use during the window between attachment and detachment.

With new features being introduced, there is a need to store more
information than just a domain pointer. This information essentially
represents the association between a domain and a device. For example,
the SVA code already has a custom struct iommu_sva which represents a
bond between an SVA domain and a PASID of a device. Looking forward,
IOMMUFD needs a place to store the iommufd_device pointer in the core,
so that the device object ID can be quickly retrieved in the critical
fault handling path.

Introduce a domain attachment handle that explicitly represents the
attachment relationship between a domain and a device or its PASID.
Caller-specific data fields can be added later to store additional
information beyond the domain pointer, depending on the use case.
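
For illustration, a caller that has attached a domain can later
recover the handle as sketched below; per the kdoc added in this
patch, the caller must synchronize such lookups with domain attachment
and detachment:

	struct iommu_attach_handle *handle;

	handle = iommu_attach_handle_get(dev->iommu_group, IOMMU_NO_PASID, 0);
	if (IS_ERR(handle))
		return PTR_ERR(handle);
	/* handle->domain is valid until the domain is detached. */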

Co-developed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
---
 drivers/iommu/iommu-priv.h |   6 ++
 drivers/iommu/iommu.c      | 156 ++++++++++++++++++++++++++++++++-----
 2 files changed, 142 insertions(+), 20 deletions(-)

diff --git a/drivers/iommu/iommu-priv.h b/drivers/iommu/iommu-priv.h
index 5f731d994803..da1addaa1a31 100644
--- a/drivers/iommu/iommu-priv.h
+++ b/drivers/iommu/iommu-priv.h
@@ -28,4 +28,10 @@ void iommu_device_unregister_bus(struct iommu_device *iommu,
 				 const struct bus_type *bus,
 				 struct notifier_block *nb);
 
+struct iommu_attach_handle {
+	struct iommu_domain		*domain;
+};
+
+struct iommu_attach_handle *
+iommu_attach_handle_get(struct iommu_group *group, ioasid_t pasid, unsigned int type);
 #endif /* __LINUX_IOMMU_PRIV_H */
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index a95a483def2d..0cdd58e69e20 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -2039,6 +2039,89 @@ void iommu_domain_free(struct iommu_domain *domain)
 }
 EXPORT_SYMBOL_GPL(iommu_domain_free);
 
+/* Add an attach handle to the group's pasid array. */
+static struct iommu_attach_handle *
+iommu_attach_handle_set(struct iommu_domain *domain,
+			struct iommu_group *group, ioasid_t pasid)
+{
+	struct iommu_attach_handle *handle;
+	void *curr;
+
+	handle = kzalloc(sizeof(*handle), GFP_KERNEL);
+	if (!handle)
+		return ERR_PTR(-ENOMEM);
+
+	handle->domain = domain;
+	curr = xa_cmpxchg(&group->pasid_array, pasid, NULL, handle, GFP_KERNEL);
+	if (curr) {
+		kfree(handle);
+		return xa_err(curr) ? curr : ERR_PTR(-EBUSY);
+	}
+
+	return handle;
+}
+
+/* Remove the attach handle stored in group's pasid array. */
+static void iommu_attach_handle_remove(struct iommu_group *group, ioasid_t pasid)
+{
+	struct iommu_attach_handle *handle;
+
+	handle = xa_erase(&group->pasid_array, pasid);
+	kfree(handle);
+}
+
+static struct iommu_attach_handle *
+iommu_attach_handle_replace(struct iommu_domain *domain,
+			    struct iommu_group *group, ioasid_t pasid)
+{
+	struct iommu_attach_handle *handle, *curr;
+
+	handle = kzalloc(sizeof(*handle), GFP_KERNEL);
+	if (!handle)
+		return ERR_PTR(-ENOMEM);
+
+	handle->domain = domain;
+	curr = xa_store(&group->pasid_array, pasid, handle, GFP_KERNEL);
+	if (xa_err(curr)) {
+		kfree(handle);
+		return curr;
+	}
+	kfree(curr);
+
+	return handle;
+}
+
+/*
+ * iommu_attach_handle_get - Return the attach handle
+ * @group: the iommu group that domain was attached to
+ * @pasid: the pasid within the group
+ * @type: matched domain type, 0 for any match
+ *
+ * Return the handle, ERR_PTR(-ENOENT) if none, or ERR_PTR(-EBUSY) on type mismatch.
+ *
+ * Return the attach handle to the caller. The life cycle of an iommu attach
+ * handle is from the time when the domain is attached to the time when the
+ * domain is detached. Callers are required to synchronize the call of
+ * iommu_attach_handle_get() with domain attachment and detachment. The attach
+ * handle can only be used during its life cycle.
+ */
+struct iommu_attach_handle *
+iommu_attach_handle_get(struct iommu_group *group, ioasid_t pasid, unsigned int type)
+{
+	struct iommu_attach_handle *handle;
+
+	xa_lock(&group->pasid_array);
+	handle = xa_load(&group->pasid_array, pasid);
+	if (!handle)
+		handle = ERR_PTR(-ENOENT);
+	else if (type && handle->domain->type != type)
+		handle = ERR_PTR(-EBUSY);
+	xa_unlock(&group->pasid_array);
+
+	return handle;
+}
+EXPORT_SYMBOL_NS_GPL(iommu_attach_handle_get, IOMMUFD_INTERNAL);
+
 /*
  * Put the group's domain back to the appropriate core-owned domain - either the
  * standard kernel-mode DMA configuration or an all-DMA-blocked domain.
@@ -2187,12 +2270,25 @@ static int __iommu_attach_group(struct iommu_domain *domain,
  */
 int iommu_attach_group(struct iommu_domain *domain, struct iommu_group *group)
 {
+	struct iommu_attach_handle *handle;
 	int ret;
 
 	mutex_lock(&group->mutex);
+	handle = iommu_attach_handle_set(domain, group, IOMMU_NO_PASID);
+	if (IS_ERR(handle)) {
+		ret = PTR_ERR(handle);
+		goto out_unlock;
+	}
 	ret = __iommu_attach_group(domain, group);
+	if (ret)
+		goto out_remove_handle;
 	mutex_unlock(&group->mutex);
 
+	return 0;
+out_remove_handle:
+	iommu_attach_handle_remove(group, IOMMU_NO_PASID);
+out_unlock:
+	mutex_unlock(&group->mutex);
 	return ret;
 }
 EXPORT_SYMBOL_GPL(iommu_attach_group);
@@ -2211,13 +2307,33 @@ EXPORT_SYMBOL_GPL(iommu_attach_group);
 int iommu_group_replace_domain(struct iommu_group *group,
 			       struct iommu_domain *new_domain)
 {
+	struct iommu_domain *old_domain = group->domain;
+	struct iommu_attach_handle *handle;
 	int ret;
 
 	if (!new_domain)
 		return -EINVAL;
 
+	if (new_domain == old_domain)
+		return 0;
+
 	mutex_lock(&group->mutex);
 	ret = __iommu_group_set_domain(group, new_domain);
+	if (ret)
+		goto out_unlock;
+
+	handle = iommu_attach_handle_replace(new_domain, group, IOMMU_NO_PASID);
+	if (IS_ERR(handle)) {
+		ret = PTR_ERR(handle);
+		goto out_old_domain;
+	}
+	mutex_unlock(&group->mutex);
+
+	return 0;
+
+out_old_domain:
+	__iommu_group_set_domain(group, old_domain);
+out_unlock:
 	mutex_unlock(&group->mutex);
 	return ret;
 }
@@ -2352,6 +2468,7 @@ void iommu_detach_group(struct iommu_domain *domain, struct iommu_group *group)
 {
 	mutex_lock(&group->mutex);
 	__iommu_group_set_core_domain(group);
+	iommu_attach_handle_remove(group, IOMMU_NO_PASID);
 	mutex_unlock(&group->mutex);
 }
 EXPORT_SYMBOL_GPL(iommu_detach_group);
@@ -3354,8 +3471,8 @@ int iommu_attach_device_pasid(struct iommu_domain *domain,
 {
 	/* Caller must be a probed driver on dev */
 	struct iommu_group *group = dev->iommu_group;
+	struct iommu_attach_handle *handle;
 	struct group_device *device;
-	void *curr;
 	int ret;
 
 	if (!domain->ops->set_dev_pasid)
@@ -3376,17 +3493,22 @@ int iommu_attach_device_pasid(struct iommu_domain *domain,
 		}
 	}
 
-	curr = xa_cmpxchg(&group->pasid_array, pasid, NULL, domain, GFP_KERNEL);
-	if (curr) {
-		ret = xa_err(curr) ? : -EBUSY;
+	handle = iommu_attach_handle_set(domain, group, pasid);
+	if (IS_ERR(handle)) {
+		ret = PTR_ERR(handle);
 		goto out_unlock;
 	}
 
 	ret = __iommu_set_group_pasid(domain, group, pasid);
-	if (ret) {
-		__iommu_remove_group_pasid(group, pasid);
-		xa_erase(&group->pasid_array, pasid);
-	}
+	if (ret)
+		goto out_put_handle;
+	mutex_unlock(&group->mutex);
+
+	return 0;
+
+out_put_handle:
+	__iommu_remove_group_pasid(group, pasid);
+	iommu_attach_handle_remove(group, pasid);
 out_unlock:
 	mutex_unlock(&group->mutex);
 	return ret;
@@ -3410,7 +3532,7 @@ void iommu_detach_device_pasid(struct iommu_domain *domain, struct device *dev,
 
 	mutex_lock(&group->mutex);
 	__iommu_remove_group_pasid(group, pasid);
-	WARN_ON(xa_erase(&group->pasid_array, pasid) != domain);
+	iommu_attach_handle_remove(group, pasid);
 	mutex_unlock(&group->mutex);
 }
 EXPORT_SYMBOL_GPL(iommu_detach_device_pasid);
@@ -3433,20 +3555,14 @@ struct iommu_domain *iommu_get_domain_for_dev_pasid(struct device *dev,
 						    ioasid_t pasid,
 						    unsigned int type)
 {
-	/* Caller must be a probed driver on dev */
 	struct iommu_group *group = dev->iommu_group;
-	struct iommu_domain *domain;
+	struct iommu_attach_handle *handle;
 
-	if (!group)
-		return NULL;
+	handle = iommu_attach_handle_get(group, pasid, type);
+	if (IS_ERR(handle))
+		return (void *)handle;
 
-	xa_lock(&group->pasid_array);
-	domain = xa_load(&group->pasid_array, pasid);
-	if (type && domain && domain->type != type)
-		domain = ERR_PTR(-EBUSY);
-	xa_unlock(&group->pasid_array);
-
-	return domain;
+	return handle->domain;
 }
 EXPORT_SYMBOL_GPL(iommu_get_domain_for_dev_pasid);
 
-- 
2.34.1



* [PATCH v5 2/9] iommu: Replace sva_iommu with iommu_attach_handle
  2024-04-30 14:57 [PATCH v5 0/9] IOMMUFD: Deliver IO page faults to user space Lu Baolu
  2024-04-30 14:57 ` [PATCH v5 1/9] iommu: Introduce domain attachment handle Lu Baolu
@ 2024-04-30 14:57 ` Lu Baolu
  2024-05-07 23:58   ` Jason Gunthorpe
  2024-05-15  7:21   ` Tian, Kevin
  2024-04-30 14:57 ` [PATCH v5 3/9] iommu: Add attachment handle to struct iopf_group Lu Baolu
                   ` (6 subsequent siblings)
  8 siblings, 2 replies; 57+ messages in thread
From: Lu Baolu @ 2024-04-30 14:57 UTC (permalink / raw)
  To: Jason Gunthorpe, Kevin Tian, Joerg Roedel, Will Deacon,
	Robin Murphy, Jean-Philippe Brucker, Nicolin Chen, Yi Liu,
	Jacob Pan, Joel Granados
  Cc: iommu, virtualization, linux-kernel, Lu Baolu

The struct iommu_sva represents a bond between an SVA domain and a
device. It is functionally equivalent to the iommu_attach_handle. To
avoid code duplication, replace iommu_sva with the iommu_attach_handle
and remove the code that manages iommu_sva.
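
For reference, a converted SVA consumer now follows the calling
convention sketched below (mirroring the uacce/idxd changes in this
patch; since the !CONFIG_IOMMU_SVA stub now returns ERR_PTR(-ENODEV),
IS_ERR() replaces the old NULL check):

	struct iommu_attach_handle *handle;
	u32 pasid;

	handle = iommu_sva_bind_device(dev, current->mm);
	if (IS_ERR(handle))
		return PTR_ERR(handle);
	pasid = iommu_sva_get_pasid(handle);
	/* ... program the pasid into the device ... */
	iommu_sva_unbind_device(handle);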

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
---
 include/linux/iommu.h      | 28 +++++++------------------
 include/linux/uacce.h      |  2 +-
 drivers/dma/idxd/idxd.h    |  2 +-
 drivers/iommu/iommu-priv.h |  7 +++++++
 drivers/dma/idxd/cdev.c    |  4 ++--
 drivers/iommu/iommu-sva.c  | 43 ++++++++++++++++++--------------------
 drivers/misc/uacce/uacce.c |  2 +-
 7 files changed, 40 insertions(+), 48 deletions(-)

diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 2e925b5eba53..be9c9a10169d 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -39,7 +39,6 @@ struct iommu_domain;
 struct iommu_domain_ops;
 struct iommu_dirty_ops;
 struct notifier_block;
-struct iommu_sva;
 struct iommu_dma_cookie;
 struct iommu_fault_param;
 
@@ -986,20 +985,9 @@ struct iommu_fwspec {
 /* ATS is supported */
 #define IOMMU_FWSPEC_PCI_RC_ATS			(1 << 0)
 
-/**
- * struct iommu_sva - handle to a device-mm bond
- */
-struct iommu_sva {
-	struct device			*dev;
-	struct iommu_domain		*domain;
-	struct list_head		handle_item;
-	refcount_t			users;
-};
-
 struct iommu_mm_data {
 	u32			pasid;
 	struct list_head	sva_domains;
-	struct list_head	sva_handles;
 };
 
 int iommu_fwspec_init(struct device *dev, struct fwnode_handle *iommu_fwnode,
@@ -1527,24 +1515,24 @@ static inline u32 mm_get_enqcmd_pasid(struct mm_struct *mm)
 }
 
 void mm_pasid_drop(struct mm_struct *mm);
-struct iommu_sva *iommu_sva_bind_device(struct device *dev,
-					struct mm_struct *mm);
-void iommu_sva_unbind_device(struct iommu_sva *handle);
-u32 iommu_sva_get_pasid(struct iommu_sva *handle);
+struct iommu_attach_handle *iommu_sva_bind_device(struct device *dev,
+						  struct mm_struct *mm);
+void iommu_sva_unbind_device(struct iommu_attach_handle *handle);
+u32 iommu_sva_get_pasid(struct iommu_attach_handle *handle);
 struct iommu_domain *iommu_sva_domain_alloc(struct device *dev,
 					    struct mm_struct *mm);
 #else
-static inline struct iommu_sva *
+static inline struct iommu_attach_handle *
 iommu_sva_bind_device(struct device *dev, struct mm_struct *mm)
 {
-	return NULL;
+	return ERR_PTR(-ENODEV);
 }
 
-static inline void iommu_sva_unbind_device(struct iommu_sva *handle)
+static inline void iommu_sva_unbind_device(struct iommu_attach_handle *handle)
 {
 }
 
-static inline u32 iommu_sva_get_pasid(struct iommu_sva *handle)
+static inline u32 iommu_sva_get_pasid(struct iommu_attach_handle *handle)
 {
 	return IOMMU_PASID_INVALID;
 }
diff --git a/include/linux/uacce.h b/include/linux/uacce.h
index e290c0269944..1548119c89ae 100644
--- a/include/linux/uacce.h
+++ b/include/linux/uacce.h
@@ -97,7 +97,7 @@ struct uacce_queue {
 	struct mutex mutex;
 	enum uacce_q_state state;
 	u32 pasid;
-	struct iommu_sva *handle;
+	struct iommu_attach_handle *handle;
 	struct address_space *mapping;
 };
 
diff --git a/drivers/dma/idxd/idxd.h b/drivers/dma/idxd/idxd.h
index 7b98944135eb..45588156ca60 100644
--- a/drivers/dma/idxd/idxd.h
+++ b/drivers/dma/idxd/idxd.h
@@ -335,7 +335,7 @@ struct idxd_device {
 	struct idxd_wq **wqs;
 	struct idxd_engine **engines;
 
-	struct iommu_sva *sva;
+	struct iommu_attach_handle *sva;
 	unsigned int pasid;
 
 	int num_groups;
diff --git a/drivers/iommu/iommu-priv.h b/drivers/iommu/iommu-priv.h
index da1addaa1a31..ae65e0b85d69 100644
--- a/drivers/iommu/iommu-priv.h
+++ b/drivers/iommu/iommu-priv.h
@@ -30,6 +30,13 @@ void iommu_device_unregister_bus(struct iommu_device *iommu,
 
 struct iommu_attach_handle {
 	struct iommu_domain		*domain;
+	union {
+		/* attach data for SVA domain */
+		struct {
+			struct device	*dev;
+			refcount_t	users;
+		};
+	};
 };
 
 struct iommu_attach_handle *
diff --git a/drivers/dma/idxd/cdev.c b/drivers/dma/idxd/cdev.c
index c095a2c8f659..f8ac47bdfc4c 100644
--- a/drivers/dma/idxd/cdev.c
+++ b/drivers/dma/idxd/cdev.c
@@ -45,7 +45,7 @@ struct idxd_user_context {
 	unsigned int pasid;
 	struct mm_struct *mm;
 	unsigned int flags;
-	struct iommu_sva *sva;
+	struct iommu_attach_handle *sva;
 	struct idxd_dev idxd_dev;
 	u64 counters[COUNTER_MAX];
 	int id;
@@ -225,7 +225,7 @@ static int idxd_cdev_open(struct inode *inode, struct file *filp)
 	struct idxd_wq *wq;
 	struct device *dev, *fdev;
 	int rc = 0;
-	struct iommu_sva *sva;
+	struct iommu_attach_handle *sva;
 	unsigned int pasid;
 	struct idxd_cdev *idxd_cdev;
 
diff --git a/drivers/iommu/iommu-sva.c b/drivers/iommu/iommu-sva.c
index 640acc804e8c..d49737e43de2 100644
--- a/drivers/iommu/iommu-sva.c
+++ b/drivers/iommu/iommu-sva.c
@@ -41,7 +41,6 @@ static struct iommu_mm_data *iommu_alloc_mm_data(struct mm_struct *mm, struct de
 	}
 	iommu_mm->pasid = pasid;
 	INIT_LIST_HEAD(&iommu_mm->sva_domains);
-	INIT_LIST_HEAD(&iommu_mm->sva_handles);
 	/*
 	 * Make sure the write to mm->iommu_mm is not reordered in front of
 	 * initialization to iommu_mm fields. If it does, readers may see a
@@ -67,13 +66,17 @@ static struct iommu_mm_data *iommu_alloc_mm_data(struct mm_struct *mm, struct de
  *
  * On error, returns an ERR_PTR value.
  */
-struct iommu_sva *iommu_sva_bind_device(struct device *dev, struct mm_struct *mm)
+struct iommu_attach_handle *iommu_sva_bind_device(struct device *dev, struct mm_struct *mm)
 {
+	struct iommu_group *group = dev->iommu_group;
+	struct iommu_attach_handle *handle;
 	struct iommu_mm_data *iommu_mm;
 	struct iommu_domain *domain;
-	struct iommu_sva *handle;
 	int ret;
 
+	if (!group)
+		return ERR_PTR(-ENODEV);
+
 	mutex_lock(&iommu_sva_lock);
 
 	/* Allocate mm->pasid if necessary. */
@@ -83,17 +86,15 @@ struct iommu_sva *iommu_sva_bind_device(struct device *dev, struct mm_struct *mm
 		goto out_unlock;
 	}
 
-	list_for_each_entry(handle, &mm->iommu_mm->sva_handles, handle_item) {
-		if (handle->dev == dev) {
-			refcount_inc(&handle->users);
-			mutex_unlock(&iommu_sva_lock);
-			return handle;
-		}
+	/* A bond already exists, just take a reference. */
+	handle = iommu_attach_handle_get(group, iommu_mm->pasid, IOMMU_DOMAIN_SVA);
+	if (!IS_ERR(handle)) {
+		refcount_inc(&handle->users);
+		mutex_unlock(&iommu_sva_lock);
+		return handle;
 	}
-
-	handle = kzalloc(sizeof(*handle), GFP_KERNEL);
-	if (!handle) {
-		ret = -ENOMEM;
+	if (handle == ERR_PTR(-EBUSY)) {
+		ret = PTR_ERR(handle);
 		goto out_unlock;
 	}
 
@@ -110,7 +111,7 @@ struct iommu_sva *iommu_sva_bind_device(struct device *dev, struct mm_struct *mm
 	domain = iommu_sva_domain_alloc(dev, mm);
 	if (!domain) {
 		ret = -ENOMEM;
-		goto out_free_handle;
+		goto out_unlock;
 	}
 
 	ret = iommu_attach_device_pasid(domain, dev, iommu_mm->pasid);
@@ -120,17 +121,15 @@ struct iommu_sva *iommu_sva_bind_device(struct device *dev, struct mm_struct *mm
 	list_add(&domain->next, &mm->iommu_mm->sva_domains);
 
 out:
+	handle = iommu_attach_handle_get(group, iommu_mm->pasid, 0);
 	refcount_set(&handle->users, 1);
-	list_add(&handle->handle_item, &mm->iommu_mm->sva_handles);
-	mutex_unlock(&iommu_sva_lock);
 	handle->dev = dev;
-	handle->domain = domain;
+	mutex_unlock(&iommu_sva_lock);
+
 	return handle;
 
 out_free_domain:
 	iommu_domain_free(domain);
-out_free_handle:
-	kfree(handle);
 out_unlock:
 	mutex_unlock(&iommu_sva_lock);
 	return ERR_PTR(ret);
@@ -145,7 +144,7 @@ EXPORT_SYMBOL_GPL(iommu_sva_bind_device);
  * not be issuing any more transaction for this PASID. All outstanding page
  * requests for this PASID must have been flushed to the IOMMU.
  */
-void iommu_sva_unbind_device(struct iommu_sva *handle)
+void iommu_sva_unbind_device(struct iommu_attach_handle *handle)
 {
 	struct iommu_domain *domain = handle->domain;
 	struct iommu_mm_data *iommu_mm = domain->mm->iommu_mm;
@@ -156,7 +155,6 @@ void iommu_sva_unbind_device(struct iommu_sva *handle)
 		mutex_unlock(&iommu_sva_lock);
 		return;
 	}
-	list_del(&handle->handle_item);
 
 	iommu_detach_device_pasid(domain, dev, iommu_mm->pasid);
 	if (--domain->users == 0) {
@@ -164,11 +162,10 @@ void iommu_sva_unbind_device(struct iommu_sva *handle)
 		iommu_domain_free(domain);
 	}
 	mutex_unlock(&iommu_sva_lock);
-	kfree(handle);
 }
 EXPORT_SYMBOL_GPL(iommu_sva_unbind_device);
 
-u32 iommu_sva_get_pasid(struct iommu_sva *handle)
+u32 iommu_sva_get_pasid(struct iommu_attach_handle *handle)
 {
 	struct iommu_domain *domain = handle->domain;
 
diff --git a/drivers/misc/uacce/uacce.c b/drivers/misc/uacce/uacce.c
index bdc2e6fda782..b325097421c1 100644
--- a/drivers/misc/uacce/uacce.c
+++ b/drivers/misc/uacce/uacce.c
@@ -106,7 +106,7 @@ static long uacce_fops_compat_ioctl(struct file *filep,
 static int uacce_bind_queue(struct uacce_device *uacce, struct uacce_queue *q)
 {
 	u32 pasid;
-	struct iommu_sva *handle;
+	struct iommu_attach_handle *handle;
 
 	if (!(uacce->flags & UACCE_DEV_SVA))
 		return 0;
-- 
2.34.1



* [PATCH v5 3/9] iommu: Add attachment handle to struct iopf_group
  2024-04-30 14:57 [PATCH v5 0/9] IOMMUFD: Deliver IO page faults to user space Lu Baolu
  2024-04-30 14:57 ` [PATCH v5 1/9] iommu: Introduce domain attachment handle Lu Baolu
  2024-04-30 14:57 ` [PATCH v5 2/9] iommu: Replace sva_iommu with iommu_attach_handle Lu Baolu
@ 2024-04-30 14:57 ` Lu Baolu
  2024-05-08  0:04   ` Jason Gunthorpe
  2024-05-15  7:31   ` Tian, Kevin
  2024-04-30 14:57 ` [PATCH v5 4/9] iommufd: Add fault and response message definitions Lu Baolu
                   ` (5 subsequent siblings)
  8 siblings, 2 replies; 57+ messages in thread
From: Lu Baolu @ 2024-04-30 14:57 UTC (permalink / raw)
  To: Jason Gunthorpe, Kevin Tian, Joerg Roedel, Will Deacon,
	Robin Murphy, Jean-Philippe Brucker, Nicolin Chen, Yi Liu,
	Jacob Pan, Joel Granados
  Cc: iommu, virtualization, linux-kernel, Lu Baolu

Previously, the domain that a page fault targets was stored in an
iopf_group, which represents a minimal set of page faults. With the
introduction of the attachment handle, replace the domain with the
handle so that the fault handler can obtain more information as needed
when handling the faults.

iommu_report_device_fault() is currently used for SVA page faults,
which are handled internally within the IOMMU core. The domain is
retrieved with iommu_get_domain_for_dev_pasid() if the pasid in the
fault message is valid. This doesn't work in the IOMMUFD case, where,
if the pasid table of a device is wholly managed by user space, there
is no domain attached to the PASID of the device. Improve this code to
try fetching the attachment handle on the PASID if possible, and fall
back to the RID otherwise.
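
As an illustration, a fault handler can then recover caller-specific
context from the group. A hypothetical handler, following the
iopf_handler prototype used by the iommu core:

	static int example_iopf_handler(struct iopf_group *group)
	{
		struct iommu_attach_handle *handle = group->attach_handle;

		/* group->domain equals handle->domain; caller-specific
		 * fields of the handle provide additional context. */
		return 0;
	}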

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
---
 include/linux/iommu.h      |  1 +
 drivers/iommu/io-pgfault.c | 34 ++++++++++++++--------------------
 2 files changed, 15 insertions(+), 20 deletions(-)

diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index be9c9a10169d..4ff68d50ae39 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -128,6 +128,7 @@ struct iopf_group {
 	struct list_head pending_node;
 	struct work_struct work;
 	struct iommu_domain *domain;
+	struct iommu_attach_handle *attach_handle;
 	/* The device's fault data parameter. */
 	struct iommu_fault_param *fault_param;
 };
diff --git a/drivers/iommu/io-pgfault.c b/drivers/iommu/io-pgfault.c
index 06d78fcc79fd..f19215d4ffa2 100644
--- a/drivers/iommu/io-pgfault.c
+++ b/drivers/iommu/io-pgfault.c
@@ -59,28 +59,19 @@ void iopf_free_group(struct iopf_group *group)
 }
 EXPORT_SYMBOL_GPL(iopf_free_group);
 
-static struct iommu_domain *get_domain_for_iopf(struct device *dev,
-						struct iommu_fault *fault)
+static int get_attach_handle_for_iopf(struct device *dev, ioasid_t pasid,
+				      struct iopf_group *group)
 {
-	struct iommu_domain *domain;
+	struct iommu_attach_handle *handle;
 
-	if (fault->prm.flags & IOMMU_FAULT_PAGE_REQUEST_PASID_VALID) {
-		domain = iommu_get_domain_for_dev_pasid(dev, fault->prm.pasid, 0);
-		if (IS_ERR(domain))
-			domain = NULL;
-	} else {
-		domain = iommu_get_domain_for_dev(dev);
-	}
+	handle = iommu_attach_handle_get(dev->iommu_group, pasid, 0);
+	if (IS_ERR(handle))
+		return PTR_ERR(handle);
 
-	if (!domain || !domain->iopf_handler) {
-		dev_warn_ratelimited(dev,
-			"iopf (pasid %d) without domain attached or handler installed\n",
-			 fault->prm.pasid);
+	group->attach_handle = handle;
+	group->domain = handle->domain;
 
-		return NULL;
-	}
-
-	return domain;
+	return 0;
 }
 
 /* Non-last request of a group. Postpone until the last one. */
@@ -206,8 +197,11 @@ void iommu_report_device_fault(struct device *dev, struct iopf_fault *evt)
 	if (group == &abort_group)
 		goto err_abort;
 
-	group->domain = get_domain_for_iopf(dev, fault);
-	if (!group->domain)
+	if (!(fault->prm.flags & IOMMU_FAULT_PAGE_REQUEST_PASID_VALID) ||
+	    get_attach_handle_for_iopf(dev, fault->prm.pasid, group))
+		get_attach_handle_for_iopf(dev, IOMMU_NO_PASID, group);
+
+	if (!group->attach_handle || !group->domain->iopf_handler)
 		goto err_abort;
 
 	/*
-- 
2.34.1



* [PATCH v5 4/9] iommufd: Add fault and response message definitions
  2024-04-30 14:57 [PATCH v5 0/9] IOMMUFD: Deliver IO page faults to user space Lu Baolu
                   ` (2 preceding siblings ...)
  2024-04-30 14:57 ` [PATCH v5 3/9] iommu: Add attachment handle to struct iopf_group Lu Baolu
@ 2024-04-30 14:57 ` Lu Baolu
  2024-05-15  7:43   ` Tian, Kevin
  2024-04-30 14:57 ` [PATCH v5 5/9] iommufd: Add iommufd fault object Lu Baolu
                   ` (4 subsequent siblings)
  8 siblings, 1 reply; 57+ messages in thread
From: Lu Baolu @ 2024-04-30 14:57 UTC (permalink / raw)
  To: Jason Gunthorpe, Kevin Tian, Joerg Roedel, Will Deacon,
	Robin Murphy, Jean-Philippe Brucker, Nicolin Chen, Yi Liu,
	Jacob Pan, Joel Granados
  Cc: iommu, virtualization, linux-kernel, Lu Baolu

struct iommu_hwpt_pgfault represents a fault message that user space
can retrieve. Multiple iommu_hwpt_pgfaults might be put in an iopf
group, with the IOMMU_PGFAULT_FLAGS_LAST_PAGE flag set only for the
last iommu_hwpt_pgfault.

struct iommu_hwpt_page_response is a response message that user space
should send to the kernel after finishing handling a group of fault
messages. The @dev_id, @pasid, and @grpid fields in the message
identify an outstanding iopf group for a device. The @cookie field,
which matches the cookie field of the last fault in the group, will
be used by the kernel to look up the pending message.
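
A sketch of how user space might consume these messages; the buffer
size and the respond_group() helper are illustrative assumptions, and
error handling is omitted:

	struct iommu_hwpt_pgfault msgs[32];
	ssize_t len = read(fault_fd, msgs, sizeof(msgs));
	size_t i;

	for (i = 0; i < len / sizeof(msgs[0]); i++) {
		/* ... handle msgs[i] ... */
		if (msgs[i].flags & IOMMU_PGFAULT_FLAGS_LAST_PAGE)
			respond_group(fault_fd, &msgs[i]);
	}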

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
---
 include/uapi/linux/iommufd.h | 96 ++++++++++++++++++++++++++++++++++++
 1 file changed, 96 insertions(+)

diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h
index 1dfeaa2e649e..83b45dce94a4 100644
--- a/include/uapi/linux/iommufd.h
+++ b/include/uapi/linux/iommufd.h
@@ -692,4 +692,100 @@ struct iommu_hwpt_invalidate {
 	__u32 __reserved;
 };
 #define IOMMU_HWPT_INVALIDATE _IO(IOMMUFD_TYPE, IOMMUFD_CMD_HWPT_INVALIDATE)
+
+/**
+ * enum iommu_hwpt_pgfault_flags - flags for struct iommu_hwpt_pgfault
+ * @IOMMU_PGFAULT_FLAGS_PASID_VALID: The pasid field of the fault data is
+ *                                   valid.
+ * @IOMMU_PGFAULT_FLAGS_LAST_PAGE: It's the last fault of a fault group.
+ */
+enum iommu_hwpt_pgfault_flags {
+	IOMMU_PGFAULT_FLAGS_PASID_VALID		= (1 << 0),
+	IOMMU_PGFAULT_FLAGS_LAST_PAGE		= (1 << 1),
+};
+
+/**
+ * enum iommu_hwpt_pgfault_perm - perm bits for struct iommu_hwpt_pgfault
+ * @IOMMU_PGFAULT_PERM_READ: request for read permission
+ * @IOMMU_PGFAULT_PERM_WRITE: request for write permission
+ * @IOMMU_PGFAULT_PERM_EXEC: (PCIE 10.4.1) request with a PASID that has the
+ *                           Execute Requested bit set in PASID TLP Prefix.
+ * @IOMMU_PGFAULT_PERM_PRIV: (PCIE 10.4.1) request with a PASID that has the
+ *                           Privileged Mode Requested bit set in PASID TLP
+ *                           Prefix.
+ */
+enum iommu_hwpt_pgfault_perm {
+	IOMMU_PGFAULT_PERM_READ			= (1 << 0),
+	IOMMU_PGFAULT_PERM_WRITE		= (1 << 1),
+	IOMMU_PGFAULT_PERM_EXEC			= (1 << 2),
+	IOMMU_PGFAULT_PERM_PRIV			= (1 << 3),
+};
+
+/**
+ * struct iommu_hwpt_pgfault - iommu page fault data
+ * @size: sizeof(struct iommu_hwpt_pgfault)
+ * @flags: Combination of enum iommu_hwpt_pgfault_flags
+ * @dev_id: id of the originated device
+ * @pasid: Process Address Space ID
+ * @grpid: Page Request Group Index
+ * @perm: Combination of enum iommu_hwpt_pgfault_perm
+ * @addr: Page address
+ * @length: a hint of how much data the requestor is expecting to fetch. For
+ *          example, if the PRI initiator knows it is going to do a 10MB
+ *          transfer, it could fill in 10MB and the OS could pre-fault in
+ *          10MB of IOVA. It defaults to 0 if there's no such hint.
+ * @cookie: kernel-managed cookie identifying a group of fault messages. The
+ *          cookie number encoded in the last page fault of the group should
+ *          be echoed back in the response message.
+ */
+struct iommu_hwpt_pgfault {
+	__u32 size;
+	__u32 flags;
+	__u32 dev_id;
+	__u32 pasid;
+	__u32 grpid;
+	__u32 perm;
+	__u64 addr;
+	__u32 length;
+	__u32 cookie;
+};
+
+/**
+ * enum iommufd_page_response_code - Return status of fault handlers
+ * @IOMMUFD_PAGE_RESP_SUCCESS: Fault has been handled and the page tables
+ *                             populated, retry the access. This is the
+ *                             "Success" defined in PCI 10.4.2.1.
+ * @IOMMUFD_PAGE_RESP_INVALID: Could not handle this fault, don't retry the
+ *                             access. This is the "Invalid Request" in PCI
+ *                             10.4.2.1.
+ * @IOMMUFD_PAGE_RESP_FAILURE: General error. Drop all subsequent faults from
+ *                             this device if possible. This is the "Response
+ *                             Failure" in PCI 10.4.2.1.
+ */
+enum iommufd_page_response_code {
+	IOMMUFD_PAGE_RESP_SUCCESS = 0,
+	IOMMUFD_PAGE_RESP_INVALID,
+	IOMMUFD_PAGE_RESP_FAILURE,
+};
+
+/**
+ * struct iommu_hwpt_page_response - IOMMU page fault response
+ * @size: sizeof(struct iommu_hwpt_page_response)
+ * @flags: Must be set to 0
+ * @dev_id: device ID of target device for the response
+ * @pasid: Process Address Space ID
+ * @grpid: Page Request Group Index
+ * @code: One of response code in enum iommufd_page_response_code.
+ * @cookie: The kernel-managed cookie reported in the fault message.
+ */
+struct iommu_hwpt_page_response {
+	__u32 size;
+	__u32 flags;
+	__u32 dev_id;
+	__u32 pasid;
+	__u32 grpid;
+	__u32 code;
+	__u32 cookie;
+	__u32 reserved;
+};
 #endif
-- 
2.34.1



* [PATCH v5 5/9] iommufd: Add iommufd fault object
  2024-04-30 14:57 [PATCH v5 0/9] IOMMUFD: Deliver IO page faults to user space Lu Baolu
                   ` (3 preceding siblings ...)
  2024-04-30 14:57 ` [PATCH v5 4/9] iommufd: Add fault and response message definitions Lu Baolu
@ 2024-04-30 14:57 ` Lu Baolu
  2024-05-08  0:11   ` Jason Gunthorpe
                     ` (2 more replies)
  2024-04-30 14:57 ` [PATCH v5 6/9] iommufd: Fault-capable hwpt attach/detach/replace Lu Baolu
                   ` (3 subsequent siblings)
  8 siblings, 3 replies; 57+ messages in thread
From: Lu Baolu @ 2024-04-30 14:57 UTC (permalink / raw)
  To: Jason Gunthorpe, Kevin Tian, Joerg Roedel, Will Deacon,
	Robin Murphy, Jean-Philippe Brucker, Nicolin Chen, Yi Liu,
	Jacob Pan, Joel Granados
  Cc: iommu, virtualization, linux-kernel, Lu Baolu

An iommufd fault object provides an interface for delivering I/O page
faults to user space. These objects are created and destroyed by user
space, and they can be associated with or dissociated from hardware page
table objects during page table allocation or destruction.

User space interacts with the fault object through a file interface. This
interface offers a straightforward and efficient way for user space to
handle page faults. It allows user space to read fault messages
sequentially and respond to them by writing to the same file. The file
interface supports reading messages in poll mode, so it's recommended that
user space applications use io_uring to enhance read and write efficiency.
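
For example, a conventional event loop can wait for fault messages as
sketched below; drain_and_respond() is a hypothetical helper that
issues the read() and write() calls, and io_uring would instead submit
those operations asynchronously:

	struct pollfd pfd = { .fd = fault_fd, .events = POLLIN };

	while (poll(&pfd, 1, -1) >= 0) {
		if (pfd.revents & POLLIN)
			drain_and_respond(fault_fd);
	}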

A fault object can be associated with any iopf-capable iommufd_hw_pgtable
during the pgtable's allocation. All I/O page faults triggered by devices
when accessing the I/O addresses of an iommufd_hw_pgtable are routed
through the fault object to user space. Similarly, user space's responses
to these page faults are routed back to the iommu device driver through
the same fault object.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
---
 include/linux/iommu.h                   |   3 +
 drivers/iommu/iommu-priv.h              |   4 +
 drivers/iommu/iommufd/iommufd_private.h |  24 +++
 include/uapi/linux/iommufd.h            |  18 ++
 drivers/iommu/iommufd/device.c          |   1 +
 drivers/iommu/iommufd/fault.c           | 238 ++++++++++++++++++++++++
 drivers/iommu/iommufd/main.c            |   6 +
 drivers/iommu/iommufd/Makefile          |   1 +
 8 files changed, 295 insertions(+)
 create mode 100644 drivers/iommu/iommufd/fault.c

diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 4ff68d50ae39..a7301b27e01f 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -131,6 +131,9 @@ struct iopf_group {
 	struct iommu_attach_handle *attach_handle;
 	/* The device's fault data parameter. */
 	struct iommu_fault_param *fault_param;
+	/* Used by handler provider to hook the group on its own lists. */
+	struct list_head node;
+	u32 cookie;
 };
 
 /**
diff --git a/drivers/iommu/iommu-priv.h b/drivers/iommu/iommu-priv.h
index ae65e0b85d69..1a0450a83bd0 100644
--- a/drivers/iommu/iommu-priv.h
+++ b/drivers/iommu/iommu-priv.h
@@ -36,6 +36,10 @@ struct iommu_attach_handle {
 			struct device	*dev;
 			refcount_t	users;
 		};
+		/* attach data for IOMMUFD */
+		struct {
+			void		*idev;
+		};
 	};
 };
 
diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h
index 991f864d1f9b..acb89bbf9f8e 100644
--- a/drivers/iommu/iommufd/iommufd_private.h
+++ b/drivers/iommu/iommufd/iommufd_private.h
@@ -128,6 +128,7 @@ enum iommufd_object_type {
 	IOMMUFD_OBJ_HWPT_NESTED,
 	IOMMUFD_OBJ_IOAS,
 	IOMMUFD_OBJ_ACCESS,
+	IOMMUFD_OBJ_FAULT,
 #ifdef CONFIG_IOMMUFD_TEST
 	IOMMUFD_OBJ_SELFTEST,
 #endif
@@ -395,6 +396,8 @@ struct iommufd_device {
 	/* always the physical device */
 	struct device *dev;
 	bool enforce_cache_coherency;
+	/* outstanding faults awaiting response indexed by fault group id */
+	struct xarray faults;
 };
 
 static inline struct iommufd_device *
@@ -426,6 +429,27 @@ void iopt_remove_access(struct io_pagetable *iopt,
 			u32 iopt_access_list_id);
 void iommufd_access_destroy_object(struct iommufd_object *obj);
 
+/*
+ * An iommufd_fault object represents an interface to deliver I/O page faults
+ * to the user space. These objects are created/destroyed by the user space and
+ * associated with hardware page table objects during page-table allocation.
+ */
+struct iommufd_fault {
+	struct iommufd_object obj;
+	struct iommufd_ctx *ictx;
+	struct file *filep;
+
+	/* The lists of outstanding faults protected by below mutex. */
+	struct mutex mutex;
+	struct list_head deliver;
+	struct list_head response;
+
+	struct wait_queue_head wait_queue;
+};
+
+int iommufd_fault_alloc(struct iommufd_ucmd *ucmd);
+void iommufd_fault_destroy(struct iommufd_object *obj);
+
 #ifdef CONFIG_IOMMUFD_TEST
 int iommufd_test(struct iommufd_ucmd *ucmd);
 void iommufd_selftest_destroy(struct iommufd_object *obj);
diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h
index 83b45dce94a4..1819b28e9e6b 100644
--- a/include/uapi/linux/iommufd.h
+++ b/include/uapi/linux/iommufd.h
@@ -50,6 +50,7 @@ enum {
 	IOMMUFD_CMD_HWPT_SET_DIRTY_TRACKING,
 	IOMMUFD_CMD_HWPT_GET_DIRTY_BITMAP,
 	IOMMUFD_CMD_HWPT_INVALIDATE,
+	IOMMUFD_CMD_FAULT_ALLOC,
 };
 
 /**
@@ -788,4 +789,21 @@ struct iommu_hwpt_page_response {
 	__u32 cookie;
 	__u32 reserved;
 };
+
+/**
+ * struct iommu_fault_alloc - ioctl(IOMMU_FAULT_ALLOC)
+ * @size: sizeof(struct iommu_fault_alloc)
+ * @flags: Must be 0
+ * @out_fault_id: The ID of the new FAULT
+ * @out_fault_fd: The fd of the new FAULT
+ *
+ * Explicitly allocate a fault handling object.
+ */
+struct iommu_fault_alloc {
+	__u32 size;
+	__u32 flags;
+	__u32 out_fault_id;
+	__u32 out_fault_fd;
+};
+#define IOMMU_FAULT_ALLOC _IO(IOMMUFD_TYPE, IOMMUFD_CMD_FAULT_ALLOC)
 #endif
diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c
index 873630c111c1..22e22969363a 100644
--- a/drivers/iommu/iommufd/device.c
+++ b/drivers/iommu/iommufd/device.c
@@ -215,6 +215,7 @@ struct iommufd_device *iommufd_device_bind(struct iommufd_ctx *ictx,
 	refcount_inc(&idev->obj.users);
 	/* igroup refcount moves into iommufd_device */
 	idev->igroup = igroup;
+	xa_init_flags(&idev->faults, XA_FLAGS_ALLOC1);
 
 	/*
 	 * If the caller fails after this success it must call
diff --git a/drivers/iommu/iommufd/fault.c b/drivers/iommu/iommufd/fault.c
new file mode 100644
index 000000000000..13125c0feecb
--- /dev/null
+++ b/drivers/iommu/iommufd/fault.c
@@ -0,0 +1,238 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/* Copyright (C) 2024 Intel Corporation
+ */
+#define pr_fmt(fmt) "iommufd: " fmt
+
+#include <linux/file.h>
+#include <linux/fs.h>
+#include <linux/module.h>
+#include <linux/mutex.h>
+#include <linux/iommufd.h>
+#include <linux/poll.h>
+#include <linux/anon_inodes.h>
+#include <uapi/linux/iommufd.h>
+
+#include "../iommu-priv.h"
+#include "iommufd_private.h"
+
+void iommufd_fault_destroy(struct iommufd_object *obj)
+{
+	struct iommufd_fault *fault = container_of(obj, struct iommufd_fault, obj);
+	struct iopf_group *group, *next;
+
+	/*
+	 * The iommufd object's reference count is zero at this point.
+	 * We can be confident that no other threads are currently
+	 * accessing this pointer. Therefore, acquiring the mutex here
+	 * is unnecessary.
+	 */
+	list_for_each_entry_safe(group, next, &fault->deliver, node) {
+		list_del(&group->node);
+		iopf_group_response(group, IOMMU_PAGE_RESP_INVALID);
+		iopf_free_group(group);
+	}
+}
+
+static void iommufd_compose_fault_message(struct iommu_fault *fault,
+					  struct iommu_hwpt_pgfault *hwpt_fault,
+					  struct iommufd_device *idev,
+					  u32 cookie)
+{
+	hwpt_fault->size = sizeof(*hwpt_fault);
+	hwpt_fault->flags = fault->prm.flags;
+	hwpt_fault->dev_id = idev->obj.id;
+	hwpt_fault->pasid = fault->prm.pasid;
+	hwpt_fault->grpid = fault->prm.grpid;
+	hwpt_fault->perm = fault->prm.perm;
+	hwpt_fault->addr = fault->prm.addr;
+	hwpt_fault->length = 0;
+	hwpt_fault->cookie = cookie;
+}
+
+static ssize_t iommufd_fault_fops_read(struct file *filep, char __user *buf,
+				       size_t count, loff_t *ppos)
+{
+	size_t fault_size = sizeof(struct iommu_hwpt_pgfault);
+	struct iommufd_fault *fault = filep->private_data;
+	struct iommu_hwpt_pgfault data;
+	struct iommufd_device *idev;
+	struct iopf_group *group;
+	struct iopf_fault *iopf;
+	size_t done = 0;
+	int rc;
+
+	if (*ppos || count % fault_size)
+		return -ESPIPE;
+
+	mutex_lock(&fault->mutex);
+	while (!list_empty(&fault->deliver) && count > done) {
+		group = list_first_entry(&fault->deliver,
+					 struct iopf_group, node);
+
+		if (list_count_nodes(&group->faults) * fault_size > count - done)
+			break;
+
+		idev = group->attach_handle->idev;
+		if (!idev)
+			break;
+
+		rc = xa_alloc(&idev->faults, &group->cookie, group,
+			      xa_limit_32b, GFP_KERNEL);
+		if (rc)
+			break;
+
+		list_for_each_entry(iopf, &group->faults, list) {
+			iommufd_compose_fault_message(&iopf->fault,
+						      &data, idev,
+						      group->cookie);
+			rc = copy_to_user(buf + done, &data, fault_size);
+			if (rc) {
+				xa_erase(&idev->faults, group->cookie);
+				break;
+			}
+			done += fault_size;
+		}
+
+		list_del(&group->node);
+	}
+	mutex_unlock(&fault->mutex);
+
+	return done;
+}
+
+static ssize_t iommufd_fault_fops_write(struct file *filep, const char __user *buf,
+					size_t count, loff_t *ppos)
+{
+	size_t response_size = sizeof(struct iommu_hwpt_page_response);
+	struct iommufd_fault *fault = filep->private_data;
+	struct iommu_hwpt_page_response response;
+	struct iommufd_device *idev = NULL;
+	struct iopf_group *group;
+	size_t done = 0;
+	int rc;
+
+	if (*ppos || count % response_size)
+		return -ESPIPE;
+
+	mutex_lock(&fault->mutex);
+	while (count > done) {
+		rc = copy_from_user(&response, buf + done, response_size);
+		if (rc)
+			break;
+
+		if (!idev || idev->obj.id != response.dev_id)
+			idev = container_of(iommufd_get_object(fault->ictx,
+							       response.dev_id,
+							       IOMMUFD_OBJ_DEVICE),
+					    struct iommufd_device, obj);
+		if (IS_ERR(idev))
+			break;
+
+		group = xa_erase(&idev->faults, response.cookie);
+		if (!group)
+			break;
+
+		iopf_group_response(group, response.code);
+		iopf_free_group(group);
+		done += response_size;
+
+		iommufd_put_object(fault->ictx, &idev->obj);
+	}
+	mutex_unlock(&fault->mutex);
+
+	return done;
+}
+
+static __poll_t iommufd_fault_fops_poll(struct file *filep,
+					struct poll_table_struct *wait)
+{
+	struct iommufd_fault *fault = filep->private_data;
+	__poll_t pollflags = EPOLLOUT;
+
+	poll_wait(filep, &fault->wait_queue, wait);
+	mutex_lock(&fault->mutex);
+	if (!list_empty(&fault->deliver))
+		pollflags |= EPOLLIN | EPOLLRDNORM;
+	mutex_unlock(&fault->mutex);
+
+	return pollflags;
+}
+
+static int iommufd_fault_fops_release(struct inode *inode, struct file *filep)
+{
+	struct iommufd_fault *fault = filep->private_data;
+
+	iommufd_ctx_put(fault->ictx);
+	refcount_dec(&fault->obj.users);
+	return 0;
+}
+
+static const struct file_operations iommufd_fault_fops = {
+	.owner		= THIS_MODULE,
+	.open		= nonseekable_open,
+	.read		= iommufd_fault_fops_read,
+	.write		= iommufd_fault_fops_write,
+	.poll		= iommufd_fault_fops_poll,
+	.release	= iommufd_fault_fops_release,
+	.llseek		= no_llseek,
+};
+
+int iommufd_fault_alloc(struct iommufd_ucmd *ucmd)
+{
+	struct iommu_fault_alloc *cmd = ucmd->cmd;
+	struct iommufd_fault *fault;
+	struct file *filep;
+	int fdno;
+	int rc;
+
+	if (cmd->flags)
+		return -EOPNOTSUPP;
+
+	fault = iommufd_object_alloc(ucmd->ictx, fault, IOMMUFD_OBJ_FAULT);
+	if (IS_ERR(fault))
+		return PTR_ERR(fault);
+
+	fault->ictx = ucmd->ictx;
+	INIT_LIST_HEAD(&fault->deliver);
+	mutex_init(&fault->mutex);
+	init_waitqueue_head(&fault->wait_queue);
+
+	filep = anon_inode_getfile("[iommufd-pgfault]", &iommufd_fault_fops,
+				   fault, O_RDWR);
+	if (IS_ERR(filep)) {
+		rc = PTR_ERR(filep);
+		goto out_abort;
+	}
+
+	refcount_inc(&fault->obj.users);
+	iommufd_ctx_get(fault->ictx);
+	fault->filep = filep;
+
+	fdno = get_unused_fd_flags(O_CLOEXEC);
+	if (fdno < 0) {
+		rc = fdno;
+		goto out_fput;
+	}
+
+	cmd->out_fault_id = fault->obj.id;
+	cmd->out_fault_fd = fdno;
+
+	rc = iommufd_ucmd_respond(ucmd, sizeof(*cmd));
+	if (rc)
+		goto out_put_fdno;
+	iommufd_object_finalize(ucmd->ictx, &fault->obj);
+
+	fd_install(fdno, fault->filep);
+
+	return 0;
+out_put_fdno:
+	put_unused_fd(fdno);
+out_fput:
+	fput(filep);
+	refcount_dec(&fault->obj.users);
+	iommufd_ctx_put(fault->ictx);
+out_abort:
+	iommufd_object_abort_and_destroy(ucmd->ictx, &fault->obj);
+
+	return rc;
+}
diff --git a/drivers/iommu/iommufd/main.c b/drivers/iommu/iommufd/main.c
index 39b32932c61e..792db077d33e 100644
--- a/drivers/iommu/iommufd/main.c
+++ b/drivers/iommu/iommufd/main.c
@@ -332,6 +332,7 @@ union ucmd_buffer {
 	struct iommu_ioas_unmap unmap;
 	struct iommu_option option;
 	struct iommu_vfio_ioas vfio_ioas;
+	struct iommu_fault_alloc fault;
 #ifdef CONFIG_IOMMUFD_TEST
 	struct iommu_test_cmd test;
 #endif
@@ -381,6 +382,8 @@ static const struct iommufd_ioctl_op iommufd_ioctl_ops[] = {
 		 val64),
 	IOCTL_OP(IOMMU_VFIO_IOAS, iommufd_vfio_ioas, struct iommu_vfio_ioas,
 		 __reserved),
+	IOCTL_OP(IOMMU_FAULT_ALLOC, iommufd_fault_alloc, struct iommu_fault_alloc,
+		 out_fault_fd),
 #ifdef CONFIG_IOMMUFD_TEST
 	IOCTL_OP(IOMMU_TEST_CMD, iommufd_test, struct iommu_test_cmd, last),
 #endif
@@ -513,6 +516,9 @@ static const struct iommufd_object_ops iommufd_object_ops[] = {
 		.destroy = iommufd_hwpt_nested_destroy,
 		.abort = iommufd_hwpt_nested_abort,
 	},
+	[IOMMUFD_OBJ_FAULT] = {
+		.destroy = iommufd_fault_destroy,
+	},
 #ifdef CONFIG_IOMMUFD_TEST
 	[IOMMUFD_OBJ_SELFTEST] = {
 		.destroy = iommufd_selftest_destroy,
diff --git a/drivers/iommu/iommufd/Makefile b/drivers/iommu/iommufd/Makefile
index 34b446146961..b94a74366eed 100644
--- a/drivers/iommu/iommufd/Makefile
+++ b/drivers/iommu/iommufd/Makefile
@@ -6,6 +6,7 @@ iommufd-y := \
 	ioas.o \
 	main.o \
 	pages.o \
+	fault.o \
 	vfio_compat.o
 
 iommufd-$(CONFIG_IOMMUFD_TEST) += selftest.o
-- 
2.34.1



* [PATCH v5 6/9] iommufd: Fault-capable hwpt attach/detach/replace
  2024-04-30 14:57 [PATCH v5 0/9] IOMMUFD: Deliver IO page faults to user space Lu Baolu
                   ` (4 preceding siblings ...)
  2024-04-30 14:57 ` [PATCH v5 5/9] iommufd: Add iommufd fault object Lu Baolu
@ 2024-04-30 14:57 ` Lu Baolu
  2024-05-08  0:18   ` Jason Gunthorpe
  2024-05-15  8:43   ` Tian, Kevin
  2024-04-30 14:57 ` [PATCH v5 7/9] iommufd: Associate fault object with iommufd_hw_pgtable Lu Baolu
                   ` (2 subsequent siblings)
  8 siblings, 2 replies; 57+ messages in thread
From: Lu Baolu @ 2024-04-30 14:57 UTC (permalink / raw)
  To: Jason Gunthorpe, Kevin Tian, Joerg Roedel, Will Deacon,
	Robin Murphy, Jean-Philippe Brucker, Nicolin Chen, Yi Liu,
	Jacob Pan, Joel Granados
  Cc: iommu, virtualization, linux-kernel, Lu Baolu

Add iopf-capable hw page table attach/detach/replace helpers. The
pointer to iommufd_device is stored in the domain attachment handle so
that it can be echoed back in the iopf_group.

The iopf-capable hw page tables can only be attached to devices that
support the IOMMU_DEV_FEAT_IOPF feature. On the first attachment of an
iopf-capable hw_pagetable to the device, the IOPF feature is enabled on
the device. Similarly, after the last iopf-capable hwpt is detached from
the device, the IOPF feature is disabled on the device.

The current implementation allows a replacement between iopf-capable and
non-iopf-capable hw page tables. This matches the nested translation use
case, where a parent domain is attached by default and can then be
replaced with a nested user domain with iopf support.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
---
 drivers/iommu/iommufd/iommufd_private.h |  10 ++
 drivers/iommu/iommufd/device.c          |  15 ++-
 drivers/iommu/iommufd/fault.c           | 118 ++++++++++++++++++++++++
 3 files changed, 140 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h
index acb89bbf9f8e..18c8d7d38dcd 100644
--- a/drivers/iommu/iommufd/iommufd_private.h
+++ b/drivers/iommu/iommufd/iommufd_private.h
@@ -293,6 +293,7 @@ int iommufd_check_iova_range(struct io_pagetable *iopt,
 struct iommufd_hw_pagetable {
 	struct iommufd_object obj;
 	struct iommu_domain *domain;
+	struct iommufd_fault *fault;
 };
 
 struct iommufd_hwpt_paging {
@@ -396,6 +397,7 @@ struct iommufd_device {
 	/* always the physical device */
 	struct device *dev;
 	bool enforce_cache_coherency;
+	bool iopf_enabled;
 	/* outstanding faults awaiting response indexed by fault group id */
 	struct xarray faults;
 };
@@ -450,6 +452,14 @@ struct iommufd_fault {
 int iommufd_fault_alloc(struct iommufd_ucmd *ucmd);
 void iommufd_fault_destroy(struct iommufd_object *obj);
 
+int iommufd_fault_domain_attach_dev(struct iommufd_hw_pagetable *hwpt,
+				    struct iommufd_device *idev);
+void iommufd_fault_domain_detach_dev(struct iommufd_hw_pagetable *hwpt,
+				     struct iommufd_device *idev);
+int iommufd_fault_domain_replace_dev(struct iommufd_device *idev,
+				     struct iommufd_hw_pagetable *hwpt,
+				     struct iommufd_hw_pagetable *old);
+
 #ifdef CONFIG_IOMMUFD_TEST
 int iommufd_test(struct iommufd_ucmd *ucmd);
 void iommufd_selftest_destroy(struct iommufd_object *obj);
diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c
index 22e22969363a..2bee3f399ec7 100644
--- a/drivers/iommu/iommufd/device.c
+++ b/drivers/iommu/iommufd/device.c
@@ -377,7 +377,10 @@ int iommufd_hw_pagetable_attach(struct iommufd_hw_pagetable *hwpt,
 	 * attachment.
 	 */
 	if (list_empty(&idev->igroup->device_list)) {
-		rc = iommu_attach_group(hwpt->domain, idev->igroup->group);
+		if (hwpt->fault)
+			rc = iommufd_fault_domain_attach_dev(hwpt, idev);
+		else
+			rc = iommu_attach_group(hwpt->domain, idev->igroup->group);
 		if (rc)
 			goto err_unresv;
 		idev->igroup->hwpt = hwpt;
@@ -403,7 +406,10 @@ iommufd_hw_pagetable_detach(struct iommufd_device *idev)
 	mutex_lock(&idev->igroup->lock);
 	list_del(&idev->group_item);
 	if (list_empty(&idev->igroup->device_list)) {
-		iommu_detach_group(hwpt->domain, idev->igroup->group);
+		if (hwpt->fault)
+			iommufd_fault_domain_detach_dev(hwpt, idev);
+		else
+			iommu_detach_group(hwpt->domain, idev->igroup->group);
 		idev->igroup->hwpt = NULL;
 	}
 	if (hwpt_is_paging(hwpt))
@@ -498,7 +504,10 @@ iommufd_device_do_replace(struct iommufd_device *idev,
 			goto err_unlock;
 	}
 
-	rc = iommu_group_replace_domain(igroup->group, hwpt->domain);
+	if (old_hwpt->fault || hwpt->fault)
+		rc = iommufd_fault_domain_replace_dev(idev, hwpt, old_hwpt);
+	else
+		rc = iommu_group_replace_domain(igroup->group, hwpt->domain);
 	if (rc)
 		goto err_unresv;
 
diff --git a/drivers/iommu/iommufd/fault.c b/drivers/iommu/iommufd/fault.c
index 13125c0feecb..6357229bf3b4 100644
--- a/drivers/iommu/iommufd/fault.c
+++ b/drivers/iommu/iommufd/fault.c
@@ -15,6 +15,124 @@
 #include "../iommu-priv.h"
 #include "iommufd_private.h"
 
+static int iommufd_fault_iopf_enable(struct iommufd_device *idev)
+{
+	int ret;
+
+	if (idev->iopf_enabled)
+		return 0;
+
+	ret = iommu_dev_enable_feature(idev->dev, IOMMU_DEV_FEAT_IOPF);
+	if (ret)
+		return ret;
+
+	idev->iopf_enabled = true;
+
+	return 0;
+}
+
+static void iommufd_fault_iopf_disable(struct iommufd_device *idev)
+{
+	if (!idev->iopf_enabled)
+		return;
+
+	iommu_dev_disable_feature(idev->dev, IOMMU_DEV_FEAT_IOPF);
+	idev->iopf_enabled = false;
+}
+
+int iommufd_fault_domain_attach_dev(struct iommufd_hw_pagetable *hwpt,
+				    struct iommufd_device *idev)
+{
+	struct iommu_attach_handle *handle;
+	int ret;
+
+	if (!hwpt->fault)
+		return -EINVAL;
+
+	ret = iommufd_fault_iopf_enable(idev);
+	if (ret)
+		return ret;
+
+	ret = iommu_attach_group(hwpt->domain, idev->igroup->group);
+	if (ret) {
+		iommufd_fault_iopf_disable(idev);
+		return ret;
+	}
+
+	handle = iommu_attach_handle_get(idev->igroup->group, IOMMU_NO_PASID, 0);
+	handle->idev = idev;
+
+	return 0;
+}
+
+static void iommufd_auto_response_faults(struct iommufd_hw_pagetable *hwpt,
+					 struct iommufd_device *idev)
+{
+	struct iommufd_fault *fault = hwpt->fault;
+	struct iopf_group *group, *next;
+	unsigned long index;
+
+	if (!fault)
+		return;
+
+	mutex_lock(&fault->mutex);
+	list_for_each_entry_safe(group, next, &fault->deliver, node) {
+		if (group->domain != hwpt->domain ||
+		    group->fault_param->dev != idev->dev)
+			continue;
+		list_del(&group->node);
+		iopf_group_response(group, IOMMU_PAGE_RESP_INVALID);
+		iopf_free_group(group);
+	}
+
+	xa_for_each(&idev->faults, index, group) {
+		if (group->domain != hwpt->domain)
+			continue;
+		xa_erase(&idev->faults, index);
+		iopf_group_response(group, IOMMU_PAGE_RESP_INVALID);
+		iopf_free_group(group);
+	}
+	mutex_unlock(&fault->mutex);
+}
+
+void iommufd_fault_domain_detach_dev(struct iommufd_hw_pagetable *hwpt,
+				     struct iommufd_device *idev)
+{
+	iommu_detach_group(hwpt->domain, idev->igroup->group);
+	iommufd_fault_iopf_disable(idev);
+	iommufd_auto_response_faults(hwpt, idev);
+}
+
+int iommufd_fault_domain_replace_dev(struct iommufd_device *idev,
+				     struct iommufd_hw_pagetable *hwpt,
+				     struct iommufd_hw_pagetable *old)
+{
+	struct iommu_attach_handle *handle;
+	int ret;
+
+	if (hwpt->fault)
+		ret = iommufd_fault_iopf_enable(idev);
+	else
+		iommufd_fault_iopf_disable(idev);
+
+	ret = iommu_group_replace_domain(idev->igroup->group, hwpt->domain);
+	if (ret)
+		goto out_cleanup;
+
+	iommufd_auto_response_faults(old, idev);
+	handle = iommu_attach_handle_get(idev->igroup->group, IOMMU_NO_PASID, 0);
+	handle->idev = idev;
+
+	return 0;
+out_cleanup:
+	if (old->fault)
+		ret = iommufd_fault_iopf_enable(idev);
+	else
+		iommufd_fault_iopf_disable(idev);
+
+	return ret;
+}
+
 void iommufd_fault_destroy(struct iommufd_object *obj)
 {
 	struct iommufd_fault *fault = container_of(obj, struct iommufd_fault, obj);
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [PATCH v5 7/9] iommufd: Associate fault object with iommufd_hw_pgtable
  2024-04-30 14:57 [PATCH v5 0/9] IOMMUFD: Deliver IO page faults to user space Lu Baolu
                   ` (5 preceding siblings ...)
  2024-04-30 14:57 ` [PATCH v5 6/9] iommufd: Fault-capable hwpt attach/detach/replace Lu Baolu
@ 2024-04-30 14:57 ` Lu Baolu
  2024-05-08  0:25   ` Jason Gunthorpe
  2024-05-15  8:50   ` Tian, Kevin
  2024-04-30 14:57 ` [PATCH v5 8/9] iommufd/selftest: Add IOPF support for mock device Lu Baolu
  2024-04-30 14:57 ` [PATCH v5 9/9] iommufd/selftest: Add coverage for IOPF test Lu Baolu
  8 siblings, 2 replies; 57+ messages in thread
From: Lu Baolu @ 2024-04-30 14:57 UTC (permalink / raw)
  To: Jason Gunthorpe, Kevin Tian, Joerg Roedel, Will Deacon,
	Robin Murphy, Jean-Philippe Brucker, Nicolin Chen, Yi Liu,
	Jacob Pan, Joel Granados
  Cc: iommu, virtualization, linux-kernel, Lu Baolu

When allocating a user iommufd_hw_pagetable, user space is allowed to
associate a fault object with the hw_pagetable by specifying the fault
object ID in the page table allocation data and setting the
IOMMU_HWPT_FAULT_ID_VALID flag bit.

On a successful return of hwpt allocation, the user can retrieve and
respond to page faults by reading and writing the file interface of the
fault object.

Once a fault object has been associated with a hwpt, the hwpt becomes
iopf-capable, indicated by a non-NULL hwpt->fault field. Attaching,
detaching, or replacing an iopf-capable hwpt on a RID or PASID takes
different paths from those of non-iopf-capable hwpts. A user-space
sketch of the allocation flow follows.

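A minimal user-space sketch of this flow (hedged: it assumes the uAPI
added by this series; iommufd, dev_id and parent_hwpt_id are
placeholders and error handling is elided):

	struct iommu_fault_alloc fault_cmd = {
		.size = sizeof(fault_cmd),
	};
	struct iommu_hwpt_alloc hwpt_cmd = {
		.size = sizeof(hwpt_cmd),
		.flags = IOMMU_HWPT_FAULT_ID_VALID,
		.dev_id = dev_id,
		.pt_id = parent_hwpt_id,
	};

	/* Allocate the fault object; returns an object ID and a fault FD. */
	ioctl(iommufd, IOMMU_FAULT_ALLOC, &fault_cmd);

	/* Allocate an iopf-capable hwpt bound to that fault object. */
	hwpt_cmd.fault_id = fault_cmd.out_fault_id;
	ioctl(iommufd, IOMMU_HWPT_ALLOC, &hwpt_cmd);

	/* Faults are then read from, and responses written to,
	 * fault_cmd.out_fault_fd. */
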
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
---
 drivers/iommu/iommufd/iommufd_private.h |  9 +++++++
 include/uapi/linux/iommufd.h            |  8 ++++++
 drivers/iommu/iommufd/fault.c           | 17 ++++++++++++
 drivers/iommu/iommufd/hw_pagetable.c    | 35 +++++++++++++++++++------
 drivers/iommu/iommufd/main.c            |  2 +-
 5 files changed, 62 insertions(+), 9 deletions(-)

diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h
index 18c8d7d38dcd..100c7b585e7e 100644
--- a/drivers/iommu/iommufd/iommufd_private.h
+++ b/drivers/iommu/iommufd/iommufd_private.h
@@ -449,8 +449,17 @@ struct iommufd_fault {
 	struct wait_queue_head wait_queue;
 };
 
+static inline struct iommufd_fault *
+iommufd_get_fault(struct iommufd_ucmd *ucmd, u32 id)
+{
+	return container_of(iommufd_get_object(ucmd->ictx, id,
+					       IOMMUFD_OBJ_FAULT),
+			    struct iommufd_fault, obj);
+}
+
 int iommufd_fault_alloc(struct iommufd_ucmd *ucmd);
 void iommufd_fault_destroy(struct iommufd_object *obj);
+int iommufd_fault_iopf_handler(struct iopf_group *group);
 
 int iommufd_fault_domain_attach_dev(struct iommufd_hw_pagetable *hwpt,
 				    struct iommufd_device *idev);
diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h
index 1819b28e9e6b..3d566c1ffcc6 100644
--- a/include/uapi/linux/iommufd.h
+++ b/include/uapi/linux/iommufd.h
@@ -357,10 +357,13 @@ struct iommu_vfio_ioas {
  *                                the parent HWPT in a nesting configuration.
  * @IOMMU_HWPT_ALLOC_DIRTY_TRACKING: Dirty tracking support for device IOMMU is
  *                                   enforced on device attachment
+ * @IOMMU_HWPT_FAULT_ID_VALID: The fault_id field of hwpt allocation data is
+ *                             valid.
  */
 enum iommufd_hwpt_alloc_flags {
 	IOMMU_HWPT_ALLOC_NEST_PARENT = 1 << 0,
 	IOMMU_HWPT_ALLOC_DIRTY_TRACKING = 1 << 1,
+	IOMMU_HWPT_FAULT_ID_VALID = 1 << 2,
 };
 
 /**
@@ -412,6 +415,9 @@ enum iommu_hwpt_data_type {
  * @data_type: One of enum iommu_hwpt_data_type
  * @data_len: Length of the type specific data
  * @data_uptr: User pointer to the type specific data
+ * @fault_id: The ID of the IOMMUFD_FAULT object. Valid only if the
+ *            IOMMU_HWPT_FAULT_ID_VALID flag is set.
+ * @__reserved2: Padding to 64-bit alignment. Must be 0.
  *
  * Explicitly allocate a hardware page table object. This is the same object
  * type that is returned by iommufd_device_attach() and represents the
@@ -442,6 +448,8 @@ struct iommu_hwpt_alloc {
 	__u32 data_type;
 	__u32 data_len;
 	__aligned_u64 data_uptr;
+	__u32 fault_id;
+	__u32 __reserved2;
 };
 #define IOMMU_HWPT_ALLOC _IO(IOMMUFD_TYPE, IOMMUFD_CMD_HWPT_ALLOC)
 
diff --git a/drivers/iommu/iommufd/fault.c b/drivers/iommu/iommufd/fault.c
index 6357229bf3b4..802d0f819b22 100644
--- a/drivers/iommu/iommufd/fault.c
+++ b/drivers/iommu/iommufd/fault.c
@@ -354,3 +354,20 @@ int iommufd_fault_alloc(struct iommufd_ucmd *ucmd)
 
 	return rc;
 }
+
+int iommufd_fault_iopf_handler(struct iopf_group *group)
+{
+	struct iommufd_hw_pagetable *hwpt;
+	struct iommufd_fault *fault;
+
+	hwpt = group->domain->fault_data;
+	fault = hwpt->fault;
+
+	mutex_lock(&fault->mutex);
+	list_add_tail(&group->node, &fault->deliver);
+	mutex_unlock(&fault->mutex);
+
+	wake_up_interruptible(&fault->wait_queue);
+
+	return 0;
+}
diff --git a/drivers/iommu/iommufd/hw_pagetable.c b/drivers/iommu/iommufd/hw_pagetable.c
index 33d142f8057d..f7d05e12dea1 100644
--- a/drivers/iommu/iommufd/hw_pagetable.c
+++ b/drivers/iommu/iommufd/hw_pagetable.c
@@ -8,6 +8,15 @@
 #include "../iommu-priv.h"
 #include "iommufd_private.h"
 
+static void __iommufd_hwpt_destroy(struct iommufd_hw_pagetable *hwpt)
+{
+	if (hwpt->domain)
+		iommu_domain_free(hwpt->domain);
+
+	if (hwpt->fault)
+		iommufd_put_object(hwpt->fault->ictx, &hwpt->fault->obj);
+}
+
 void iommufd_hwpt_paging_destroy(struct iommufd_object *obj)
 {
 	struct iommufd_hwpt_paging *hwpt_paging =
@@ -22,9 +31,7 @@ void iommufd_hwpt_paging_destroy(struct iommufd_object *obj)
 					 hwpt_paging->common.domain);
 	}
 
-	if (hwpt_paging->common.domain)
-		iommu_domain_free(hwpt_paging->common.domain);
-
+	__iommufd_hwpt_destroy(&hwpt_paging->common);
 	refcount_dec(&hwpt_paging->ioas->obj.users);
 }
 
@@ -49,9 +56,7 @@ void iommufd_hwpt_nested_destroy(struct iommufd_object *obj)
 	struct iommufd_hwpt_nested *hwpt_nested =
 		container_of(obj, struct iommufd_hwpt_nested, common.obj);
 
-	if (hwpt_nested->common.domain)
-		iommu_domain_free(hwpt_nested->common.domain);
-
+	__iommufd_hwpt_destroy(&hwpt_nested->common);
 	refcount_dec(&hwpt_nested->parent->common.obj.users);
 }
 
@@ -213,7 +218,8 @@ iommufd_hwpt_nested_alloc(struct iommufd_ctx *ictx,
 	struct iommufd_hw_pagetable *hwpt;
 	int rc;
 
-	if (flags || !user_data->len || !ops->domain_alloc_user)
+	if ((flags & ~IOMMU_HWPT_FAULT_ID_VALID) ||
+	    !user_data->len || !ops->domain_alloc_user)
 		return ERR_PTR(-EOPNOTSUPP);
 	if (parent->auto_domain || !parent->nest_parent)
 		return ERR_PTR(-EINVAL);
@@ -227,7 +233,7 @@ iommufd_hwpt_nested_alloc(struct iommufd_ctx *ictx,
 	refcount_inc(&parent->common.obj.users);
 	hwpt_nested->parent = parent;
 
-	hwpt->domain = ops->domain_alloc_user(idev->dev, flags,
+	hwpt->domain = ops->domain_alloc_user(idev->dev, 0,
 					      parent->common.domain, user_data);
 	if (IS_ERR(hwpt->domain)) {
 		rc = PTR_ERR(hwpt->domain);
@@ -308,6 +314,19 @@ int iommufd_hwpt_alloc(struct iommufd_ucmd *ucmd)
 		goto out_put_pt;
 	}
 
+	if (cmd->flags & IOMMU_HWPT_FAULT_ID_VALID) {
+		struct iommufd_fault *fault;
+
+		fault = iommufd_get_fault(ucmd, cmd->fault_id);
+		if (IS_ERR(fault)) {
+			rc = PTR_ERR(fault);
+			goto out_hwpt;
+		}
+		hwpt->fault = fault;
+		hwpt->domain->iopf_handler = iommufd_fault_iopf_handler;
+		hwpt->domain->fault_data = hwpt;
+	}
+
 	cmd->out_hwpt_id = hwpt->obj.id;
 	rc = iommufd_ucmd_respond(ucmd, sizeof(*cmd));
 	if (rc)
diff --git a/drivers/iommu/iommufd/main.c b/drivers/iommu/iommufd/main.c
index 792db077d33e..d8414ee9feae 100644
--- a/drivers/iommu/iommufd/main.c
+++ b/drivers/iommu/iommufd/main.c
@@ -359,7 +359,7 @@ static const struct iommufd_ioctl_op iommufd_ioctl_ops[] = {
 	IOCTL_OP(IOMMU_GET_HW_INFO, iommufd_get_hw_info, struct iommu_hw_info,
 		 __reserved),
 	IOCTL_OP(IOMMU_HWPT_ALLOC, iommufd_hwpt_alloc, struct iommu_hwpt_alloc,
-		 __reserved),
+		 __reserved2),
 	IOCTL_OP(IOMMU_HWPT_GET_DIRTY_BITMAP, iommufd_hwpt_get_dirty_bitmap,
 		 struct iommu_hwpt_get_dirty_bitmap, data),
 	IOCTL_OP(IOMMU_HWPT_INVALIDATE, iommufd_hwpt_invalidate,
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [PATCH v5 8/9] iommufd/selftest: Add IOPF support for mock device
  2024-04-30 14:57 [PATCH v5 0/9] IOMMUFD: Deliver IO page faults to user space Lu Baolu
                   ` (6 preceding siblings ...)
  2024-04-30 14:57 ` [PATCH v5 7/9] iommufd: Associate fault object with iommufd_hw_pgtable Lu Baolu
@ 2024-04-30 14:57 ` Lu Baolu
  2024-04-30 14:57 ` [PATCH v5 9/9] iommufd/selftest: Add coverage for IOPF test Lu Baolu
  8 siblings, 0 replies; 57+ messages in thread
From: Lu Baolu @ 2024-04-30 14:57 UTC (permalink / raw)
  To: Jason Gunthorpe, Kevin Tian, Joerg Roedel, Will Deacon,
	Robin Murphy, Jean-Philippe Brucker, Nicolin Chen, Yi Liu,
	Jacob Pan, Joel Granados
  Cc: iommu, virtualization, linux-kernel, Lu Baolu

Extend the selftest mock device to support generating and responding to
an IOPF. Also add an ioctl interface that allows userspace applications
to trigger an IOPF on the mock device. This allows userspace
applications to test IOMMUFD's handling of IOPFs without relying on any
real hardware.

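A hedged sketch of the trigger from user space, mirroring the selftest
helper added in the next patch (iommufd and dev_id are placeholders):

	struct iommu_test_cmd cmd = {
		.size = sizeof(cmd),
		.op = IOMMU_TEST_OP_TRIGGER_IOPF,
		.trigger_iopf = {
			.dev_id = dev_id,
			.pasid = 0x1,
			.grpid = 0x2,
			.perm = IOMMU_PGFAULT_PERM_READ,
			.addr = 0x1000,
		},
	};

	/* Inject a page request into the mock device's IOPF path. */
	ioctl(iommufd, _IOMMU_TEST_CMD(IOMMU_TEST_OP_TRIGGER_IOPF), &cmd);
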
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
---
 drivers/iommu/iommufd/iommufd_test.h |  8 ++++
 drivers/iommu/iommufd/selftest.c     | 63 ++++++++++++++++++++++++++++
 2 files changed, 71 insertions(+)

diff --git a/drivers/iommu/iommufd/iommufd_test.h b/drivers/iommu/iommufd/iommufd_test.h
index e854d3f67205..acbbba1c6671 100644
--- a/drivers/iommu/iommufd/iommufd_test.h
+++ b/drivers/iommu/iommufd/iommufd_test.h
@@ -22,6 +22,7 @@ enum {
 	IOMMU_TEST_OP_MOCK_DOMAIN_FLAGS,
 	IOMMU_TEST_OP_DIRTY,
 	IOMMU_TEST_OP_MD_CHECK_IOTLB,
+	IOMMU_TEST_OP_TRIGGER_IOPF,
 };
 
 enum {
@@ -127,6 +128,13 @@ struct iommu_test_cmd {
 			__u32 id;
 			__u32 iotlb;
 		} check_iotlb;
+		struct {
+			__u32 dev_id;
+			__u32 pasid;
+			__u32 grpid;
+			__u32 perm;
+			__u64 addr;
+		} trigger_iopf;
 	};
 	__u32 last;
 };
diff --git a/drivers/iommu/iommufd/selftest.c b/drivers/iommu/iommufd/selftest.c
index 97ce62602e66..f925f5e5c00a 100644
--- a/drivers/iommu/iommufd/selftest.c
+++ b/drivers/iommu/iommufd/selftest.c
@@ -504,6 +504,8 @@ static bool mock_domain_capable(struct device *dev, enum iommu_cap cap)
 	return false;
 }
 
+static struct iopf_queue *mock_iommu_iopf_queue;
+
 static struct iommu_device mock_iommu_device = {
 };
 
@@ -514,6 +516,29 @@ static struct iommu_device *mock_probe_device(struct device *dev)
 	return &mock_iommu_device;
 }
 
+static void mock_domain_page_response(struct device *dev, struct iopf_fault *evt,
+				      struct iommu_page_response *msg)
+{
+}
+
+static int mock_dev_enable_feat(struct device *dev, enum iommu_dev_features feat)
+{
+	if (feat != IOMMU_DEV_FEAT_IOPF || !mock_iommu_iopf_queue)
+		return -ENODEV;
+
+	return iopf_queue_add_device(mock_iommu_iopf_queue, dev);
+}
+
+static int mock_dev_disable_feat(struct device *dev, enum iommu_dev_features feat)
+{
+	if (feat != IOMMU_DEV_FEAT_IOPF || !mock_iommu_iopf_queue)
+		return -ENODEV;
+
+	iopf_queue_remove_device(mock_iommu_iopf_queue, dev);
+
+	return 0;
+}
+
 static const struct iommu_ops mock_ops = {
 	/*
 	 * IOMMU_DOMAIN_BLOCKED cannot be returned from def_domain_type()
@@ -529,6 +554,9 @@ static const struct iommu_ops mock_ops = {
 	.capable = mock_domain_capable,
 	.device_group = generic_device_group,
 	.probe_device = mock_probe_device,
+	.page_response = mock_domain_page_response,
+	.dev_enable_feat = mock_dev_enable_feat,
+	.dev_disable_feat = mock_dev_disable_feat,
 	.default_domain_ops =
 		&(struct iommu_domain_ops){
 			.free = mock_domain_free,
@@ -1397,6 +1425,31 @@ static int iommufd_test_dirty(struct iommufd_ucmd *ucmd, unsigned int mockpt_id,
 	return rc;
 }
 
+static int iommufd_test_trigger_iopf(struct iommufd_ucmd *ucmd,
+				     struct iommu_test_cmd *cmd)
+{
+	struct iopf_fault event = { };
+	struct iommufd_device *idev;
+
+	idev = iommufd_get_device(ucmd, cmd->trigger_iopf.dev_id);
+	if (IS_ERR(idev))
+		return PTR_ERR(idev);
+
+	event.fault.prm.flags = IOMMU_FAULT_PAGE_REQUEST_LAST_PAGE;
+	if (cmd->trigger_iopf.pasid != IOMMU_NO_PASID)
+		event.fault.prm.flags |= IOMMU_FAULT_PAGE_REQUEST_PASID_VALID;
+	event.fault.type = IOMMU_FAULT_PAGE_REQ;
+	event.fault.prm.addr = cmd->trigger_iopf.addr;
+	event.fault.prm.pasid = cmd->trigger_iopf.pasid;
+	event.fault.prm.grpid = cmd->trigger_iopf.grpid;
+	event.fault.prm.perm = cmd->trigger_iopf.perm;
+
+	iommu_report_device_fault(idev->dev, &event);
+	iommufd_put_object(ucmd->ictx, &idev->obj);
+
+	return 0;
+}
+
 void iommufd_selftest_destroy(struct iommufd_object *obj)
 {
 	struct selftest_obj *sobj = container_of(obj, struct selftest_obj, obj);
@@ -1472,6 +1525,8 @@ int iommufd_test(struct iommufd_ucmd *ucmd)
 					  cmd->dirty.page_size,
 					  u64_to_user_ptr(cmd->dirty.uptr),
 					  cmd->dirty.flags);
+	case IOMMU_TEST_OP_TRIGGER_IOPF:
+		return iommufd_test_trigger_iopf(ucmd, cmd);
 	default:
 		return -EOPNOTSUPP;
 	}
@@ -1513,6 +1568,9 @@ int __init iommufd_test_init(void)
 				  &iommufd_mock_bus_type.nb);
 	if (rc)
 		goto err_sysfs;
+
+	mock_iommu_iopf_queue = iopf_queue_alloc("mock-iopfq");
+
 	return 0;
 
 err_sysfs:
@@ -1528,6 +1586,11 @@ int __init iommufd_test_init(void)
 
 void iommufd_test_exit(void)
 {
+	if (mock_iommu_iopf_queue) {
+		iopf_queue_free(mock_iommu_iopf_queue);
+		mock_iommu_iopf_queue = NULL;
+	}
+
 	iommu_device_sysfs_remove(&mock_iommu_device);
 	iommu_device_unregister_bus(&mock_iommu_device,
 				    &iommufd_mock_bus_type.bus,
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [PATCH v5 9/9] iommufd/selftest: Add coverage for IOPF test
  2024-04-30 14:57 [PATCH v5 0/9] IOMMUFD: Deliver IO page faults to user space Lu Baolu
                   ` (7 preceding siblings ...)
  2024-04-30 14:57 ` [PATCH v5 8/9] iommufd/selftest: Add IOPF support for mock device Lu Baolu
@ 2024-04-30 14:57 ` Lu Baolu
  8 siblings, 0 replies; 57+ messages in thread
From: Lu Baolu @ 2024-04-30 14:57 UTC (permalink / raw)
  To: Jason Gunthorpe, Kevin Tian, Joerg Roedel, Will Deacon,
	Robin Murphy, Jean-Philippe Brucker, Nicolin Chen, Yi Liu,
	Jacob Pan, Joel Granados
  Cc: iommu, virtualization, linux-kernel, Lu Baolu

Extend the selftest tool to add coverage for IOPF handling. This
includes the following tests:

- Allocating and destroying an iommufd fault object.
- Allocating and destroying an IOPF-capable HWPT.
- Attaching/detaching/replacing an IOPF-capable HWPT on a device.
- Triggering an IOPF on the mock device.
- Retrieving and responding to the IOPF through the file interface (see
  the sketch after this list).
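
The retrieve/respond step, as a hedged minimal sketch (field names
follow the uAPI in this series; error handling elided):

	struct iommu_hwpt_pgfault pgfault = {};
	struct iommu_hwpt_page_response resp = {
		.size = sizeof(resp),
	};

	/* One fault message per struct iommu_hwpt_pgfault. */
	read(fault_fd, &pgfault, sizeof(pgfault));

	resp.dev_id = pgfault.dev_id;
	resp.pasid = pgfault.pasid;
	resp.grpid = pgfault.grpid;
	resp.cookie = pgfault.cookie;
	resp.code = 0;		/* success */
	write(fault_fd, &resp, sizeof(resp));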

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
---
 tools/testing/selftests/iommu/iommufd_utils.h | 84 +++++++++++++++++--
 tools/testing/selftests/iommu/iommufd.c       | 18 ++++
 .../selftests/iommu/iommufd_fail_nth.c        |  2 +-
 3 files changed, 98 insertions(+), 6 deletions(-)

diff --git a/tools/testing/selftests/iommu/iommufd_utils.h b/tools/testing/selftests/iommu/iommufd_utils.h
index 8d2b46b2114d..7f33389e070c 100644
--- a/tools/testing/selftests/iommu/iommufd_utils.h
+++ b/tools/testing/selftests/iommu/iommufd_utils.h
@@ -153,7 +153,7 @@ static int _test_cmd_mock_domain_replace(int fd, __u32 stdev_id, __u32 pt_id,
 	EXPECT_ERRNO(_errno, _test_cmd_mock_domain_replace(self->fd, stdev_id, \
 							   pt_id, NULL))
 
-static int _test_cmd_hwpt_alloc(int fd, __u32 device_id, __u32 pt_id,
+static int _test_cmd_hwpt_alloc(int fd, __u32 device_id, __u32 pt_id, __u32 ft_id,
 				__u32 flags, __u32 *hwpt_id, __u32 data_type,
 				void *data, size_t data_len)
 {
@@ -165,6 +165,7 @@ static int _test_cmd_hwpt_alloc(int fd, __u32 device_id, __u32 pt_id,
 		.data_type = data_type,
 		.data_len = data_len,
 		.data_uptr = (uint64_t)data,
+		.fault_id = ft_id,
 	};
 	int ret;
 
@@ -177,24 +178,30 @@ static int _test_cmd_hwpt_alloc(int fd, __u32 device_id, __u32 pt_id,
 }
 
 #define test_cmd_hwpt_alloc(device_id, pt_id, flags, hwpt_id)                  \
-	ASSERT_EQ(0, _test_cmd_hwpt_alloc(self->fd, device_id, pt_id, flags,   \
+	ASSERT_EQ(0, _test_cmd_hwpt_alloc(self->fd, device_id, pt_id, 0, flags,   \
 					  hwpt_id, IOMMU_HWPT_DATA_NONE, NULL, \
 					  0))
 #define test_err_hwpt_alloc(_errno, device_id, pt_id, flags, hwpt_id)   \
 	EXPECT_ERRNO(_errno, _test_cmd_hwpt_alloc(                      \
-				     self->fd, device_id, pt_id, flags, \
+				     self->fd, device_id, pt_id, 0, flags, \
 				     hwpt_id, IOMMU_HWPT_DATA_NONE, NULL, 0))
 
 #define test_cmd_hwpt_alloc_nested(device_id, pt_id, flags, hwpt_id,         \
 				   data_type, data, data_len)                \
-	ASSERT_EQ(0, _test_cmd_hwpt_alloc(self->fd, device_id, pt_id, flags, \
+	ASSERT_EQ(0, _test_cmd_hwpt_alloc(self->fd, device_id, pt_id, 0, flags, \
 					  hwpt_id, data_type, data, data_len))
 #define test_err_hwpt_alloc_nested(_errno, device_id, pt_id, flags, hwpt_id, \
 				   data_type, data, data_len)                \
 	EXPECT_ERRNO(_errno,                                                 \
-		     _test_cmd_hwpt_alloc(self->fd, device_id, pt_id, flags, \
+		     _test_cmd_hwpt_alloc(self->fd, device_id, pt_id, 0, flags, \
 					  hwpt_id, data_type, data, data_len))
 
+#define test_cmd_hwpt_alloc_iopf(device_id, pt_id, fault_id, flags, hwpt_id,    \
+				   data_type, data, data_len)                   \
+	ASSERT_EQ(0, _test_cmd_hwpt_alloc(self->fd, device_id, pt_id, fault_id, \
+					  flags, hwpt_id, data_type, data,      \
+					  data_len))
+
 #define test_cmd_hwpt_check_iotlb(hwpt_id, iotlb_id, expected)                 \
 	({                                                                     \
 		struct iommu_test_cmd test_cmd = {                             \
@@ -684,3 +691,70 @@ static int _test_cmd_get_hw_info(int fd, __u32 device_id, void *data,
 
 #define test_cmd_get_hw_capabilities(device_id, caps, mask) \
 	ASSERT_EQ(0, _test_cmd_get_hw_info(self->fd, device_id, NULL, 0, &caps))
+
+static int _test_ioctl_fault_alloc(int fd, __u32 *fault_id, __u32 *fault_fd)
+{
+	struct iommu_fault_alloc cmd = {
+		.size = sizeof(cmd),
+	};
+	int ret;
+
+	ret = ioctl(fd, IOMMU_FAULT_ALLOC, &cmd);
+	if (ret)
+		return ret;
+	*fault_id = cmd.out_fault_id;
+	*fault_fd = cmd.out_fault_fd;
+	return 0;
+}
+
+#define test_ioctl_fault_alloc(fault_id, fault_fd)                       \
+	({                                                               \
+		ASSERT_EQ(0, _test_ioctl_fault_alloc(self->fd, fault_id, \
+						     fault_fd));         \
+		ASSERT_NE(0, *(fault_id));                               \
+		ASSERT_NE(0, *(fault_fd));                               \
+	})
+
+static int _test_cmd_trigger_iopf(int fd, __u32 device_id, __u32 fault_fd)
+{
+	struct iommu_test_cmd trigger_iopf_cmd = {
+		.size = sizeof(trigger_iopf_cmd),
+		.op = IOMMU_TEST_OP_TRIGGER_IOPF,
+		.trigger_iopf = {
+			.dev_id = device_id,
+			.pasid = 0x1,
+			.grpid = 0x2,
+			.perm = IOMMU_PGFAULT_PERM_READ | IOMMU_PGFAULT_PERM_WRITE,
+			.addr = 0xdeadbeaf,
+		},
+	};
+	struct iommu_hwpt_page_response response = {
+		.size = sizeof(struct iommu_hwpt_page_response),
+		.dev_id = device_id,
+		.pasid = 0x1,
+		.grpid = 0x2,
+		.code = 0,
+	};
+	struct iommu_hwpt_pgfault fault = {};
+	ssize_t bytes;
+	int ret;
+
+	ret = ioctl(fd, _IOMMU_TEST_CMD(IOMMU_TEST_OP_TRIGGER_IOPF), &trigger_iopf_cmd);
+	if (ret)
+		return ret;
+
+	bytes = read(fault_fd, &fault, sizeof(fault));
+	if (bytes <= 0)
+		return -EIO;
+
+	response.cookie = fault.cookie;
+
+	bytes = write(fault_fd, &response, sizeof(response));
+	if (bytes <= 0)
+		return -EIO;
+
+	return 0;
+}
+
+#define test_cmd_trigger_iopf(device_id, fault_fd) \
+	ASSERT_EQ(0, _test_cmd_trigger_iopf(self->fd, device_id, fault_fd))
diff --git a/tools/testing/selftests/iommu/iommufd.c b/tools/testing/selftests/iommu/iommufd.c
index edf1c99c9936..5b0169875a4d 100644
--- a/tools/testing/selftests/iommu/iommufd.c
+++ b/tools/testing/selftests/iommu/iommufd.c
@@ -279,6 +279,9 @@ TEST_F(iommufd_ioas, alloc_hwpt_nested)
 	uint32_t parent_hwpt_id = 0;
 	uint32_t parent_hwpt_id_not_work = 0;
 	uint32_t test_hwpt_id = 0;
+	uint32_t iopf_hwpt_id;
+	uint32_t fault_id;
+	uint32_t fault_fd;
 
 	if (self->device_id) {
 		/* Negative tests */
@@ -326,6 +329,7 @@ TEST_F(iommufd_ioas, alloc_hwpt_nested)
 					   sizeof(data));
 
 		/* Allocate two nested hwpts sharing one common parent hwpt */
+		test_ioctl_fault_alloc(&fault_id, &fault_fd);
 		test_cmd_hwpt_alloc_nested(self->device_id, parent_hwpt_id, 0,
 					   &nested_hwpt_id[0],
 					   IOMMU_HWPT_DATA_SELFTEST, &data,
@@ -334,6 +338,10 @@ TEST_F(iommufd_ioas, alloc_hwpt_nested)
 					   &nested_hwpt_id[1],
 					   IOMMU_HWPT_DATA_SELFTEST, &data,
 					   sizeof(data));
+		test_cmd_hwpt_alloc_iopf(self->device_id, parent_hwpt_id, fault_id,
+					 IOMMU_HWPT_FAULT_ID_VALID, &iopf_hwpt_id,
+					 IOMMU_HWPT_DATA_SELFTEST, &data,
+					 sizeof(data));
 		test_cmd_hwpt_check_iotlb_all(nested_hwpt_id[0],
 					      IOMMU_TEST_IOTLB_DEFAULT);
 		test_cmd_hwpt_check_iotlb_all(nested_hwpt_id[1],
@@ -504,14 +512,24 @@ TEST_F(iommufd_ioas, alloc_hwpt_nested)
 			     _test_ioctl_destroy(self->fd, nested_hwpt_id[1]));
 		test_ioctl_destroy(nested_hwpt_id[0]);
 
+		/* Switch from nested_hwpt_id[1] to iopf_hwpt_id */
+		test_cmd_mock_domain_replace(self->stdev_id, iopf_hwpt_id);
+		EXPECT_ERRNO(EBUSY,
+			     _test_ioctl_destroy(self->fd, iopf_hwpt_id));
+		/* Trigger an IOPF on the device */
+		test_cmd_trigger_iopf(self->device_id, fault_fd);
+
 		/* Detach from nested_hwpt_id[1] and destroy it */
 		test_cmd_mock_domain_replace(self->stdev_id, parent_hwpt_id);
 		test_ioctl_destroy(nested_hwpt_id[1]);
+		test_ioctl_destroy(iopf_hwpt_id);
 
 		/* Detach from the parent hw_pagetable and destroy it */
 		test_cmd_mock_domain_replace(self->stdev_id, self->ioas_id);
 		test_ioctl_destroy(parent_hwpt_id);
 		test_ioctl_destroy(parent_hwpt_id_not_work);
+		close(fault_fd);
+		test_ioctl_destroy(fault_id);
 	} else {
 		test_err_hwpt_alloc(ENOENT, self->device_id, self->ioas_id, 0,
 				    &parent_hwpt_id);
diff --git a/tools/testing/selftests/iommu/iommufd_fail_nth.c b/tools/testing/selftests/iommu/iommufd_fail_nth.c
index f590417cd67a..c5d5e69452b0 100644
--- a/tools/testing/selftests/iommu/iommufd_fail_nth.c
+++ b/tools/testing/selftests/iommu/iommufd_fail_nth.c
@@ -615,7 +615,7 @@ TEST_FAIL_NTH(basic_fail_nth, device)
 	if (_test_cmd_get_hw_info(self->fd, idev_id, &info, sizeof(info), NULL))
 		return -1;
 
-	if (_test_cmd_hwpt_alloc(self->fd, idev_id, ioas_id, 0, &hwpt_id,
+	if (_test_cmd_hwpt_alloc(self->fd, idev_id, ioas_id, 0, 0, &hwpt_id,
 				 IOMMU_HWPT_DATA_NONE, 0, 0))
 		return -1;
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 57+ messages in thread

* Re: [PATCH v5 2/9] iommu: Replace sva_iommu with iommu_attach_handle
  2024-04-30 14:57 ` [PATCH v5 2/9] iommu: Replace sva_iommu with iommu_attach_handle Lu Baolu
@ 2024-05-07 23:58   ` Jason Gunthorpe
  2024-05-15  7:21   ` Tian, Kevin
  1 sibling, 0 replies; 57+ messages in thread
From: Jason Gunthorpe @ 2024-05-07 23:58 UTC (permalink / raw)
  To: Lu Baolu
  Cc: Kevin Tian, Joerg Roedel, Will Deacon, Robin Murphy,
	Jean-Philippe Brucker, Nicolin Chen, Yi Liu, Jacob Pan,
	Joel Granados, iommu, virtualization, linux-kernel

On Tue, Apr 30, 2024 at 10:57:03PM +0800, Lu Baolu wrote:
> diff --git a/drivers/iommu/iommu-priv.h b/drivers/iommu/iommu-priv.h
> index da1addaa1a31..ae65e0b85d69 100644
> --- a/drivers/iommu/iommu-priv.h
> +++ b/drivers/iommu/iommu-priv.h
> @@ -30,6 +30,13 @@ void iommu_device_unregister_bus(struct iommu_device *iommu,
>  
>  struct iommu_attach_handle {
>  	struct iommu_domain		*domain;
> +	union {
> +		/* attach data for SVA domain */
> +		struct {
> +			struct device	*dev;
> +			refcount_t	users;
> +		};
> +	};
>  };

FWIW I was thinking of having the caller allocate the handle and pass it
down, but this seems workable too and is a bit simpler.

> diff --git a/drivers/misc/uacce/uacce.c b/drivers/misc/uacce/uacce.c
> index bdc2e6fda782..b325097421c1 100644
> --- a/drivers/misc/uacce/uacce.c
> +++ b/drivers/misc/uacce/uacce.c
> @@ -106,7 +106,7 @@ static long uacce_fops_compat_ioctl(struct file *filep,
>  static int uacce_bind_queue(struct uacce_device *uacce, struct uacce_queue *q)
>  {
>  	u32 pasid;
> -	struct iommu_sva *handle;
> +	struct iommu_attach_handle *handle;

Though I'm much less keen on this..

Maybe

  struct iommu_attach_handle {
  	struct iommu_domain		*domain;
 	union {
	      struct iommu_sva sva;
 	};
  };

?

Then container_of(sva) to get back to handle and keep the meaningful
type?

Jason

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v5 3/9] iommu: Add attachment handle to struct iopf_group
  2024-04-30 14:57 ` [PATCH v5 3/9] iommu: Add attachment handle to struct iopf_group Lu Baolu
@ 2024-05-08  0:04   ` Jason Gunthorpe
  2024-05-10  3:14     ` Baolu Lu
  2024-05-15  7:31   ` Tian, Kevin
  1 sibling, 1 reply; 57+ messages in thread
From: Jason Gunthorpe @ 2024-05-08  0:04 UTC (permalink / raw)
  To: Lu Baolu
  Cc: Kevin Tian, Joerg Roedel, Will Deacon, Robin Murphy,
	Jean-Philippe Brucker, Nicolin Chen, Yi Liu, Jacob Pan,
	Joel Granados, iommu, virtualization, linux-kernel

On Tue, Apr 30, 2024 at 10:57:04PM +0800, Lu Baolu wrote:
> @@ -206,8 +197,11 @@ void iommu_report_device_fault(struct device *dev, struct iopf_fault *evt)
>  	if (group == &abort_group)
>  		goto err_abort;
>  
> -	group->domain = get_domain_for_iopf(dev, fault);
> -	if (!group->domain)
> +	if (!(fault->prm.flags & IOMMU_FAULT_PAGE_REQUEST_PASID_VALID) ||
> +	    get_attach_handle_for_iopf(dev, fault->prm.pasid, group))
> +		get_attach_handle_for_iopf(dev, IOMMU_NO_PASID, group);

That seems a bit weird looking?

get_attach_handle_for_iopf(dev, 
   (fault->prm.flags &
   IOMMU_FAULT_PAGE_REQUEST_PASID_VALID) ? fault->prm.pasid : IOMMU_NO_PASID,
   group);

Jason

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v5 5/9] iommufd: Add iommufd fault object
  2024-04-30 14:57 ` [PATCH v5 5/9] iommufd: Add iommufd fault object Lu Baolu
@ 2024-05-08  0:11   ` Jason Gunthorpe
  2024-05-08 10:05     ` Baolu Lu
  2024-05-08  0:22   ` Jason Gunthorpe
  2024-05-15  8:37   ` Tian, Kevin
  2 siblings, 1 reply; 57+ messages in thread
From: Jason Gunthorpe @ 2024-05-08  0:11 UTC (permalink / raw)
  To: Lu Baolu
  Cc: Kevin Tian, Joerg Roedel, Will Deacon, Robin Murphy,
	Jean-Philippe Brucker, Nicolin Chen, Yi Liu, Jacob Pan,
	Joel Granados, iommu, virtualization, linux-kernel

On Tue, Apr 30, 2024 at 10:57:06PM +0800, Lu Baolu wrote:
> diff --git a/drivers/iommu/iommu-priv.h b/drivers/iommu/iommu-priv.h
> index ae65e0b85d69..1a0450a83bd0 100644
> --- a/drivers/iommu/iommu-priv.h
> +++ b/drivers/iommu/iommu-priv.h
> @@ -36,6 +36,10 @@ struct iommu_attach_handle {
>  			struct device	*dev;
>  			refcount_t	users;
>  		};
> +		/* attach data for IOMMUFD */
> +		struct {
> +			void		*idev;
> +		};

We can use a proper type here, just forward declare it.

But this sequence in the other patch:

+       ret = iommu_attach_group(hwpt->domain, idev->igroup->group);
+       if (ret) {
+               iommufd_fault_iopf_disable(idev);
+               return ret;
+       }
+
+       handle = iommu_attach_handle_get(idev->igroup->group, IOMMU_NO_PASID, 0);
+       handle->idev = idev;

This is why I was imagining the caller would allocate, because now we
have the issue that a fault-capable domain was installed into the IOMMU
before its handle could be fully set up, so we have a race where a
fault could come in right between those things. Then what happens?
I suppose we can retry the fault and by the time it comes back the
race should resolve. A bit ugly I suppose.

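A hedged sketch of that caller-allocated alternative (hypothetical
signature, not what this series implements):

	handle = kzalloc(sizeof(*handle), GFP_KERNEL);
	if (!handle)
		return -ENOMEM;
	handle->idev = idev;
	/* The handle is fully initialized before the domain is installed,
	 * so a fault can never observe a half-constructed handle. */
	ret = iommu_attach_group_handle(hwpt->domain, idev->igroup->group,
					handle);
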
> diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h
> index 83b45dce94a4..1819b28e9e6b 100644
> --- a/include/uapi/linux/iommufd.h
> +++ b/include/uapi/linux/iommufd.h
> @@ -50,6 +50,7 @@ enum {
>  	IOMMUFD_CMD_HWPT_SET_DIRTY_TRACKING,
>  	IOMMUFD_CMD_HWPT_GET_DIRTY_BITMAP,
>  	IOMMUFD_CMD_HWPT_INVALIDATE,
> +	IOMMUFD_CMD_FAULT_ALLOC,
>  };

I think I'd call this a CMD_FAULT_QUEUE_ALLOC - does that make sense?

Jason

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v5 6/9] iommufd: Fault-capable hwpt attach/detach/replace
  2024-04-30 14:57 ` [PATCH v5 6/9] iommufd: Fault-capable hwpt attach/detach/replace Lu Baolu
@ 2024-05-08  0:18   ` Jason Gunthorpe
  2024-05-10  3:20     ` Baolu Lu
  2024-05-15  8:43   ` Tian, Kevin
  1 sibling, 1 reply; 57+ messages in thread
From: Jason Gunthorpe @ 2024-05-08  0:18 UTC (permalink / raw)
  To: Lu Baolu
  Cc: Kevin Tian, Joerg Roedel, Will Deacon, Robin Murphy,
	Jean-Philippe Brucker, Nicolin Chen, Yi Liu, Jacob Pan,
	Joel Granados, iommu, virtualization, linux-kernel

On Tue, Apr 30, 2024 at 10:57:07PM +0800, Lu Baolu wrote:
> diff --git a/drivers/iommu/iommufd/fault.c b/drivers/iommu/iommufd/fault.c
> index 13125c0feecb..6357229bf3b4 100644
> --- a/drivers/iommu/iommufd/fault.c
> +++ b/drivers/iommu/iommufd/fault.c
> @@ -15,6 +15,124 @@
>  #include "../iommu-priv.h"
>  #include "iommufd_private.h"
>  
> +static int iommufd_fault_iopf_enable(struct iommufd_device *idev)
> +{
> +	int ret;
> +
> +	if (idev->iopf_enabled)
> +		return 0;
> +
> +	ret = iommu_dev_enable_feature(idev->dev, IOMMU_DEV_FEAT_IOPF);
> +	if (ret)
> +		return ret;
> +
> +	idev->iopf_enabled = true;
> +
> +	return 0;
> +}

I would greatly prefer we remove this from the drivers :\ I guess it
is Ok for now

Doesn't this need a counter? We can have many fault capable PASIDs?
That will get changed in the PASID series?
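
A counted variant might look like this (hedged sketch; iopf_refcount is
a hypothetical field, not in this series):

	static int iommufd_fault_iopf_enable(struct iommufd_device *idev)
	{
		int ret;

		/* Only the first fault-capable attachment enables IOPF. */
		if (idev->iopf_refcount++)
			return 0;

		ret = iommu_dev_enable_feature(idev->dev, IOMMU_DEV_FEAT_IOPF);
		if (ret)
			idev->iopf_refcount = 0;
		return ret;
	}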

Jason


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v5 5/9] iommufd: Add iommufd fault object
  2024-04-30 14:57 ` [PATCH v5 5/9] iommufd: Add iommufd fault object Lu Baolu
  2024-05-08  0:11   ` Jason Gunthorpe
@ 2024-05-08  0:22   ` Jason Gunthorpe
  2024-05-10  9:13     ` Baolu Lu
  2024-05-15  8:37   ` Tian, Kevin
  2 siblings, 1 reply; 57+ messages in thread
From: Jason Gunthorpe @ 2024-05-08  0:22 UTC (permalink / raw)
  To: Lu Baolu
  Cc: Kevin Tian, Joerg Roedel, Will Deacon, Robin Murphy,
	Jean-Philippe Brucker, Nicolin Chen, Yi Liu, Jacob Pan,
	Joel Granados, iommu, virtualization, linux-kernel

On Tue, Apr 30, 2024 at 10:57:06PM +0800, Lu Baolu wrote:
> +static ssize_t iommufd_fault_fops_read(struct file *filep, char __user *buf,
> +				       size_t count, loff_t *ppos)
> +{
> +	size_t fault_size = sizeof(struct iommu_hwpt_pgfault);
> +	struct iommufd_fault *fault = filep->private_data;
> +	struct iommu_hwpt_pgfault data;
> +	struct iommufd_device *idev;
> +	struct iopf_group *group;
> +	struct iopf_fault *iopf;
> +	size_t done = 0;
> +	int rc;
> +
> +	if (*ppos || count % fault_size)
> +		return -ESPIPE;
> +
> +	mutex_lock(&fault->mutex);
> +	while (!list_empty(&fault->deliver) && count > done) {
> +		group = list_first_entry(&fault->deliver,
> +					 struct iopf_group, node);
> +
> +		if (list_count_nodes(&group->faults) * fault_size > count - done)
> +			break;

Can this list_count be precomputed when we build the fault group?

> +
> +		idev = group->attach_handle->idev;
> +		if (!idev)
> +			break;

This check should be done before adding the fault to the linked
list. See my other note about the race.

> +
> +		rc = xa_alloc(&idev->faults, &group->cookie, group,
> +			      xa_limit_32b, GFP_KERNEL);
> +		if (rc)
> +			break;

This error handling is not quite right: if done == 0 then this should
return rc.


> +
> +		list_for_each_entry(iopf, &group->faults, list) {
> +			iommufd_compose_fault_message(&iopf->fault,
> +						      &data, idev,
> +						      group->cookie);
> +			rc = copy_to_user(buf + done, &data, fault_size);
> +			if (rc) {
> +				xa_erase(&idev->faults, group->cookie);
> +				break;

Same here

(same comment on the write side too)

Jason

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v5 7/9] iommufd: Associate fault object with iommufd_hw_pgtable
  2024-04-30 14:57 ` [PATCH v5 7/9] iommufd: Associate fault object with iommufd_hw_pgtable Lu Baolu
@ 2024-05-08  0:25   ` Jason Gunthorpe
  2024-05-10  3:23     ` Baolu Lu
  2024-05-15  8:50   ` Tian, Kevin
  1 sibling, 1 reply; 57+ messages in thread
From: Jason Gunthorpe @ 2024-05-08  0:25 UTC (permalink / raw)
  To: Lu Baolu
  Cc: Kevin Tian, Joerg Roedel, Will Deacon, Robin Murphy,
	Jean-Philippe Brucker, Nicolin Chen, Yi Liu, Jacob Pan,
	Joel Granados, iommu, virtualization, linux-kernel

On Tue, Apr 30, 2024 at 10:57:08PM +0800, Lu Baolu wrote:
>  /**
> @@ -412,6 +415,9 @@ enum iommu_hwpt_data_type {
>   * @data_type: One of enum iommu_hwpt_data_type
>   * @data_len: Length of the type specific data
>   * @data_uptr: User pointer to the type specific data
> + * @fault_id: The ID of IOMMUFD_FAULT object. Valid only if flags field of
> + *            IOMMU_HWPT_FAULT_ID_VALID is set.
> + * @__reserved2: Padding to 64-bit alignment. Must be 0.
>   *
>   * Explicitly allocate a hardware page table object. This is the same object
>   * type that is returned by iommufd_device_attach() and represents the
> @@ -442,6 +448,8 @@ struct iommu_hwpt_alloc {
>  	__u32 data_type;
>  	__u32 data_len;
>  	__aligned_u64 data_uptr;
> +	__u32 fault_id;
> +	__u32 __reserved2;
>  };
>  #define IOMMU_HWPT_ALLOC _IO(IOMMUFD_TYPE, IOMMUFD_CMD_HWPT_ALLOC)

[..]

> @@ -359,7 +359,7 @@ static const struct iommufd_ioctl_op iommufd_ioctl_ops[] = {
>  	IOCTL_OP(IOMMU_GET_HW_INFO, iommufd_get_hw_info, struct iommu_hw_info,
>  		 __reserved),
>  	IOCTL_OP(IOMMU_HWPT_ALLOC, iommufd_hwpt_alloc, struct iommu_hwpt_alloc,
> -		 __reserved),
> +		 __reserved2),

This is not how the back-compat mechanism works. The value here is the
absolute minimum size; it should never increase. The first __reserved
is always the right value.

If you change it then old userspace that doesn't include the fault_id
will stop working.
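
For illustration (hedged; based on how iommufd's ioctl dispatch checks
the per-op minimum size in main.c):

	/* IOCTL_OP(..., struct iommu_hwpt_alloc, __reserved) records: */
	min_size = offsetofend(struct iommu_hwpt_alloc, __reserved);

	/* Old userspace passes the old, smaller struct: */
	if (ucmd.user_size < min_size)
		return -EINVAL;	/* bumping min_size to __reserved2 breaks this */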

Jason

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v5 5/9] iommufd: Add iommufd fault object
  2024-05-08  0:11   ` Jason Gunthorpe
@ 2024-05-08 10:05     ` Baolu Lu
  2024-05-15  7:57       ` Tian, Kevin
  0 siblings, 1 reply; 57+ messages in thread
From: Baolu Lu @ 2024-05-08 10:05 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: baolu.lu, Kevin Tian, Joerg Roedel, Will Deacon, Robin Murphy,
	Jean-Philippe Brucker, Nicolin Chen, Yi Liu, Jacob Pan,
	Joel Granados, iommu, virtualization, linux-kernel

On 2024/5/8 8:11, Jason Gunthorpe wrote:
> On Tue, Apr 30, 2024 at 10:57:06PM +0800, Lu Baolu wrote:
>> diff --git a/drivers/iommu/iommu-priv.h b/drivers/iommu/iommu-priv.h
>> index ae65e0b85d69..1a0450a83bd0 100644
>> --- a/drivers/iommu/iommu-priv.h
>> +++ b/drivers/iommu/iommu-priv.h
>> @@ -36,6 +36,10 @@ struct iommu_attach_handle {
>>   			struct device	*dev;
>>   			refcount_t	users;
>>   		};
>> +		/* attach data for IOMMUFD */
>> +		struct {
>> +			void		*idev;
>> +		};
> We can use a proper type here, just forward declare it.
> 
> But this sequence in the other patch:
> 
> +       ret = iommu_attach_group(hwpt->domain, idev->igroup->group);
> +       if (ret) {
> +               iommufd_fault_iopf_disable(idev);
> +               return ret;
> +       }
> +
> +       handle = iommu_attach_handle_get(idev->igroup->group, IOMMU_NO_PASID, 0);
> +       handle->idev = idev;
> 
> Is why I was imagining the caller would allocate, because now we have
> the issue that a fault capable domain was installed into the IOMMU
> before it's handle could be fully setup, so we have a race where a
> fault could come in right between those things. Then what happens?
> I suppose we can retry the fault and by the time it comes back the
> race should resolve. A bit ugly I suppose.

You are right. It makes more sense if the attached data is allocated and
managed by the caller. I will go in this direction and update my series.
I will also consider other review comments you have given in other
places.

Best regards,
baolu

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v5 3/9] iommu: Add attachment handle to struct iopf_group
  2024-05-08  0:04   ` Jason Gunthorpe
@ 2024-05-10  3:14     ` Baolu Lu
  2024-05-10 13:38       ` Jason Gunthorpe
  0 siblings, 1 reply; 57+ messages in thread
From: Baolu Lu @ 2024-05-10  3:14 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: baolu.lu, Kevin Tian, Joerg Roedel, Will Deacon, Robin Murphy,
	Jean-Philippe Brucker, Nicolin Chen, Yi Liu, Jacob Pan,
	Joel Granados, iommu, virtualization, linux-kernel

On 5/8/24 8:04 AM, Jason Gunthorpe wrote:
> On Tue, Apr 30, 2024 at 10:57:04PM +0800, Lu Baolu wrote:
>> @@ -206,8 +197,11 @@ void iommu_report_device_fault(struct device *dev, struct iopf_fault *evt)
>>   	if (group == &abort_group)
>>   		goto err_abort;
>>   
>> -	group->domain = get_domain_for_iopf(dev, fault);
>> -	if (!group->domain)
>> +	if (!(fault->prm.flags & IOMMU_FAULT_PAGE_REQUEST_PASID_VALID) ||
>> +	    get_attach_handle_for_iopf(dev, fault->prm.pasid, group))
>> +		get_attach_handle_for_iopf(dev, IOMMU_NO_PASID, group);
> That seems a bit weird looking?

Agreed.

> get_attach_handle_for_iopf(dev,
>     (fault->prm.flags &
>     IOMMU_FAULT_PAGE_REQUEST_PASID_VALID) ? fault->prm.pasid : IOMMU_NO_PASID,
>     group);

The logic here is that it tries the PASID domain first and, if that
doesn't exist, falls back to the RID domain. I explained this in the
commit message:

"
... if the pasid table of a device is wholly managed by user space,
there is no domain attached to the PASID of the device ...
"

Perhaps I can improve it like this,

	int rc = -EINVAL;
	...
	if (fault->prm.flags & IOMMU_FAULT_PAGE_REQUEST_PASID_VALID)
		rc = get_attach_handle_for_iopf(dev, fault->prm.pasid, group);
	if (rc)
		rc = get_attach_handle_for_iopf(dev, IOMMU_NO_PASID, group);

	if (rc || !group->attach_handle->domain->iopf_handler)
		goto err_abort;

Best regards,
baolu

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v5 6/9] iommufd: Fault-capable hwpt attach/detach/replace
  2024-05-08  0:18   ` Jason Gunthorpe
@ 2024-05-10  3:20     ` Baolu Lu
  2024-05-10 13:39       ` Jason Gunthorpe
  0 siblings, 1 reply; 57+ messages in thread
From: Baolu Lu @ 2024-05-10  3:20 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: baolu.lu, Kevin Tian, Joerg Roedel, Will Deacon, Robin Murphy,
	Jean-Philippe Brucker, Nicolin Chen, Yi Liu, Jacob Pan,
	Joel Granados, iommu, virtualization, linux-kernel

On 5/8/24 8:18 AM, Jason Gunthorpe wrote:
> On Tue, Apr 30, 2024 at 10:57:07PM +0800, Lu Baolu wrote:
>> diff --git a/drivers/iommu/iommufd/fault.c b/drivers/iommu/iommufd/fault.c
>> index 13125c0feecb..6357229bf3b4 100644
>> --- a/drivers/iommu/iommufd/fault.c
>> +++ b/drivers/iommu/iommufd/fault.c
>> @@ -15,6 +15,124 @@
>>   #include "../iommu-priv.h"
>>   #include "iommufd_private.h"
>>   
>> +static int iommufd_fault_iopf_enable(struct iommufd_device *idev)
>> +{
>> +	int ret;
>> +
>> +	if (idev->iopf_enabled)
>> +		return 0;
>> +
>> +	ret = iommu_dev_enable_feature(idev->dev, IOMMU_DEV_FEAT_IOPF);
>> +	if (ret)
>> +		return ret;
>> +
>> +	idev->iopf_enabled = true;
>> +
>> +	return 0;
>> +}
> I would greatly prefer we remove this from the drivers :\ I guess it
> is Ok for now
> 
> Doesn't this need a counter? We can have many fault capable PASIDs?
> That will get changed in the PASID series?

Okay, let's design this more gracefully after the PASID interfaces are
landed. For now, we assume that the device driver will do this.

Best regards,
baolu

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v5 7/9] iommufd: Associate fault object with iommufd_hw_pgtable
  2024-05-08  0:25   ` Jason Gunthorpe
@ 2024-05-10  3:23     ` Baolu Lu
  0 siblings, 0 replies; 57+ messages in thread
From: Baolu Lu @ 2024-05-10  3:23 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: baolu.lu, Kevin Tian, Joerg Roedel, Will Deacon, Robin Murphy,
	Jean-Philippe Brucker, Nicolin Chen, Yi Liu, Jacob Pan,
	Joel Granados, iommu, virtualization, linux-kernel

On 5/8/24 8:25 AM, Jason Gunthorpe wrote:
> On Tue, Apr 30, 2024 at 10:57:08PM +0800, Lu Baolu wrote:
>>   /**
>> @@ -412,6 +415,9 @@ enum iommu_hwpt_data_type {
>>    * @data_type: One of enum iommu_hwpt_data_type
>>    * @data_len: Length of the type specific data
>>    * @data_uptr: User pointer to the type specific data
>> + * @fault_id: The ID of IOMMUFD_FAULT object. Valid only if flags field of
>> + *            IOMMU_HWPT_FAULT_ID_VALID is set.
>> + * @__reserved2: Padding to 64-bit alignment. Must be 0.
>>    *
>>    * Explicitly allocate a hardware page table object. This is the same object
>>    * type that is returned by iommufd_device_attach() and represents the
>> @@ -442,6 +448,8 @@ struct iommu_hwpt_alloc {
>>   	__u32 data_type;
>>   	__u32 data_len;
>>   	__aligned_u64 data_uptr;
>> +	__u32 fault_id;
>> +	__u32 __reserved2;
>>   };
>>   #define IOMMU_HWPT_ALLOC _IO(IOMMUFD_TYPE, IOMMUFD_CMD_HWPT_ALLOC)
> [..]
> 
>> @@ -359,7 +359,7 @@ static const struct iommufd_ioctl_op iommufd_ioctl_ops[] = {
>>   	IOCTL_OP(IOMMU_GET_HW_INFO, iommufd_get_hw_info, struct iommu_hw_info,
>>   		 __reserved),
>>   	IOCTL_OP(IOMMU_HWPT_ALLOC, iommufd_hwpt_alloc, struct iommu_hwpt_alloc,
>> -		 __reserved),
>> +		 __reserved2),
> This is now how the back compat mechanism works. The value here is the
> absolute minimum size, it should never increase. The first __reserved
> is always the right value.
> 
> If you change it then old userspace that doesn't include the fault_id
> will stop working.

Yeah! I will remove this change.

Best regards,
baolu

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v5 5/9] iommufd: Add iommufd fault object
  2024-05-08  0:22   ` Jason Gunthorpe
@ 2024-05-10  9:13     ` Baolu Lu
  0 siblings, 0 replies; 57+ messages in thread
From: Baolu Lu @ 2024-05-10  9:13 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: baolu.lu, Kevin Tian, Joerg Roedel, Will Deacon, Robin Murphy,
	Jean-Philippe Brucker, Nicolin Chen, Yi Liu, Jacob Pan,
	Joel Granados, iommu, virtualization, linux-kernel

On 2024/5/8 8:22, Jason Gunthorpe wrote:
> On Tue, Apr 30, 2024 at 10:57:06PM +0800, Lu Baolu wrote:
>> +static ssize_t iommufd_fault_fops_read(struct file *filep, char __user *buf,
>> +				       size_t count, loff_t *ppos)
>> +{
>> +	size_t fault_size = sizeof(struct iommu_hwpt_pgfault);
>> +	struct iommufd_fault *fault = filep->private_data;
>> +	struct iommu_hwpt_pgfault data;
>> +	struct iommufd_device *idev;
>> +	struct iopf_group *group;
>> +	struct iopf_fault *iopf;
>> +	size_t done = 0;
>> +	int rc;
>> +
>> +	if (*ppos || count % fault_size)
>> +		return -ESPIPE;
>> +
>> +	mutex_lock(&fault->mutex);
>> +	while (!list_empty(&fault->deliver) && count > done) {
>> +		group = list_first_entry(&fault->deliver,
>> +					 struct iopf_group, node);
>> +
>> +		if (list_count_nodes(&group->faults) * fault_size > count - done)
>> +			break;
> 
> Can this list_count be precomputed when we build the fault group?

Yes. Done.

> 
>> +
>> +		idev = group->attach_handle->idev;
>> +		if (!idev)
>> +			break;
> 
> This check should be done before adding the fault to the linked
> list. See my other note about the race.

Done.

> 
>> +
>> +		rc = xa_alloc(&idev->faults, &group->cookie, group,
>> +			      xa_limit_32b, GFP_KERNEL);
>> +		if (rc)
>> +			break;
> 
> This error handling is not quite right, if done == 0 then this should
> return rc.
> 
> 
>> +
>> +		list_for_each_entry(iopf, &group->faults, list) {
>> +			iommufd_compose_fault_message(&iopf->fault,
>> +						      &data, idev,
>> +						      group->cookie);
>> +			rc = copy_to_user(buf + done, &data, fault_size);
>> +			if (rc) {
>> +				xa_erase(&idev->faults, group->cookie);
>> +				break;
> 
> Same here
> 
> (same comment on the write side too)

All fixed. Thank you!

Best regards,
baolu


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v5 3/9] iommu: Add attachment handle to struct iopf_group
  2024-05-10  3:14     ` Baolu Lu
@ 2024-05-10 13:38       ` Jason Gunthorpe
  2024-05-10 14:30         ` Baolu Lu
  0 siblings, 1 reply; 57+ messages in thread
From: Jason Gunthorpe @ 2024-05-10 13:38 UTC (permalink / raw)
  To: Baolu Lu
  Cc: Kevin Tian, Joerg Roedel, Will Deacon, Robin Murphy,
	Jean-Philippe Brucker, Nicolin Chen, Yi Liu, Jacob Pan,
	Joel Granados, iommu, virtualization, linux-kernel

On Fri, May 10, 2024 at 11:14:20AM +0800, Baolu Lu wrote:
> On 5/8/24 8:04 AM, Jason Gunthorpe wrote:
> > On Tue, Apr 30, 2024 at 10:57:04PM +0800, Lu Baolu wrote:
> > > @@ -206,8 +197,11 @@ void iommu_report_device_fault(struct device *dev, struct iopf_fault *evt)
> > >   	if (group == &abort_group)
> > >   		goto err_abort;
> > > -	group->domain = get_domain_for_iopf(dev, fault);
> > > -	if (!group->domain)
> > > +	if (!(fault->prm.flags & IOMMU_FAULT_PAGE_REQUEST_PASID_VALID) ||
> > > +	    get_attach_handle_for_iopf(dev, fault->prm.pasid, group))
> > > +		get_attach_handle_for_iopf(dev, IOMMU_NO_PASID, group);
> > That seems a bit weird looking?
> 
> Agreed.
> 
> > get_attach_handle_for_iopf(dev,
> >     (fault->prm.flags &
> >     IOMMU_FAULT_PAGE_REQUEST_PASID_VALID) ? fault->prm.pasid : IOMMU_NO_PASID,
> >     group);
> 
> The logic here is that it tries the PASID domain and if it doesn't
> exist, then tries the RID domain as well. I explained this in the commit
> message:
> 
> "
> ... if the pasid table of a device is wholly managed by user space,
> there is no domain attached to the PASID of the device ...
> "

Okay, it needs a comment in the code, and the RID fallback should be
based around checking for a NESTING domain type which includes the
PASID table. (i.e. ARM and AMD, not Intel)

We shouldn't just elevate a random PASID to the RID if it isn't
appropriate.

Jason

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v5 6/9] iommufd: Fault-capable hwpt attach/detach/replace
  2024-05-10  3:20     ` Baolu Lu
@ 2024-05-10 13:39       ` Jason Gunthorpe
  0 siblings, 0 replies; 57+ messages in thread
From: Jason Gunthorpe @ 2024-05-10 13:39 UTC (permalink / raw)
  To: Baolu Lu
  Cc: Kevin Tian, Joerg Roedel, Will Deacon, Robin Murphy,
	Jean-Philippe Brucker, Nicolin Chen, Yi Liu, Jacob Pan,
	Joel Granados, iommu, virtualization, linux-kernel

On Fri, May 10, 2024 at 11:20:01AM +0800, Baolu Lu wrote:
> On 5/8/24 8:18 AM, Jason Gunthorpe wrote:
> > On Tue, Apr 30, 2024 at 10:57:07PM +0800, Lu Baolu wrote:
> > > diff --git a/drivers/iommu/iommufd/fault.c b/drivers/iommu/iommufd/fault.c
> > > index 13125c0feecb..6357229bf3b4 100644
> > > --- a/drivers/iommu/iommufd/fault.c
> > > +++ b/drivers/iommu/iommufd/fault.c
> > > @@ -15,6 +15,124 @@
> > >   #include "../iommu-priv.h"
> > >   #include "iommufd_private.h"
> > > +static int iommufd_fault_iopf_enable(struct iommufd_device *idev)
> > > +{
> > > +	int ret;
> > > +
> > > +	if (idev->iopf_enabled)
> > > +		return 0;
> > > +
> > > +	ret = iommu_dev_enable_feature(idev->dev, IOMMU_DEV_FEAT_IOPF);
> > > +	if (ret)
> > > +		return ret;
> > > +
> > > +	idev->iopf_enabled = true;
> > > +
> > > +	return 0;
> > > +}
> > I would greatly prefer we remove this from the drivers :\ I guess it
> > is Ok for now
> > 
> > Doesn't this need a counter? We can have many fault capable PASIDs?
> > That will get changed in the PASID series?
> 
> Okay, let's design this more gracefully after the PASID interfaces are
> landed. For now, we assume that the device driver will do this.

Well, for now the device drivers do still need these calls for this to
work.

I'm trying to get them into NOPs in the drivers so we can remove this.

Jason

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v5 3/9] iommu: Add attachment handle to struct iopf_group
  2024-05-10 13:38       ` Jason Gunthorpe
@ 2024-05-10 14:30         ` Baolu Lu
  2024-05-10 16:28           ` Jason Gunthorpe
  0 siblings, 1 reply; 57+ messages in thread
From: Baolu Lu @ 2024-05-10 14:30 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: baolu.lu, Kevin Tian, Joerg Roedel, Will Deacon, Robin Murphy,
	Jean-Philippe Brucker, Nicolin Chen, Yi Liu, Jacob Pan,
	Joel Granados, iommu, virtualization, linux-kernel

On 2024/5/10 21:38, Jason Gunthorpe wrote:
> On Fri, May 10, 2024 at 11:14:20AM +0800, Baolu Lu wrote:
>> On 5/8/24 8:04 AM, Jason Gunthorpe wrote:
>>> On Tue, Apr 30, 2024 at 10:57:04PM +0800, Lu Baolu wrote:
>>>> @@ -206,8 +197,11 @@ void iommu_report_device_fault(struct device *dev, struct iopf_fault *evt)
>>>>    	if (group == &abort_group)
>>>>    		goto err_abort;
>>>> -	group->domain = get_domain_for_iopf(dev, fault);
>>>> -	if (!group->domain)
>>>> +	if (!(fault->prm.flags & IOMMU_FAULT_PAGE_REQUEST_PASID_VALID) ||
>>>> +	    get_attach_handle_for_iopf(dev, fault->prm.pasid, group))
>>>> +		get_attach_handle_for_iopf(dev, IOMMU_NO_PASID, group);
>>> That seems a bit weird looking?
>> Agreed.
>>
>>> get_attach_handle_for_iopf(dev,
>>>      (fault->prm.flags &
>>>      IOMMU_FAULT_PAGE_REQUEST_PASID_VALID) ? fault->prm.pasid : IOMMU_NO_PASID,
>>>      group);
>> The logic here is that it tries the PASID domain and if it doesn't
>> exist, then tries the RID domain as well. I explained this in the commit
>> message:
>>
>> "
>> ... if the pasid table of a device is wholly managed by user space,
>> there is no domain attached to the PASID of the device ...
>> "
> Okay, it needs a comment in the code, and the RID fallback should be
> based around checking for a NESTING domain type which includes the
> PASID table. (i.e. ARM and AMD, not Intel)
It appears that Intel is just special. ARM, AMD, and RISC-V all support
a NESTING domain which includes the PASID table. So, how about defining
a special NESTING domain type for Intel? Like this,

diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 35ae9a6f73d3..09b4e671dcee 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -173,6 +173,8 @@ struct iommu_domain_geometry {

 #define __IOMMU_DOMAIN_NESTED  (1U << 6)  /* User-managed address space nested
                                              on a stage-2 translation        */
+#define __IOMMU_DOMAIN_PASID   (1U << 7)  /* User-managed domain for a
+                                             specific PASID */

  #define IOMMU_DOMAIN_ALLOC_FLAGS ~__IOMMU_DOMAIN_DMA_FQ
  /*
@@ -204,6 +206,9 @@ struct iommu_domain_geometry {
  #define IOMMU_DOMAIN_SVA       (__IOMMU_DOMAIN_SVA)
  #define IOMMU_DOMAIN_PLATFORM  (__IOMMU_DOMAIN_PLATFORM)
  #define IOMMU_DOMAIN_NESTED    (__IOMMU_DOMAIN_NESTED)
+#define IOMMU_DOMAIN_NESTED_PASID                              \
+                               (__IOMMU_DOMAIN_NESTED |        \
+                                __IOMMU_DOMAIN_PASID)

And the current IOMMU_DOMAIN_NESTED domain type is just for a nesting
domain with a PASID table.

The Intel IOMMU driver will convert to use IOMMU_DOMAIN_NESTED_PASID in
the PASID interface series.

> We shouldn't just elevate a random PASID to the RID if it isn't
> appropriate.

If the above proposal looks good to you, then I just need to add a
domain type check before the RID fallback.

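Something along these lines (hedged sketch; domain_owns_pasid_table()
is hypothetical, and the exact check depends on how the new domain type
lands):

	int ret = -EINVAL;

	if (fault->prm.flags & IOMMU_FAULT_PAGE_REQUEST_PASID_VALID)
		ret = get_attach_handle_for_iopf(dev, fault->prm.pasid, group);

	/* Fall back to the RID only when the attached domain owns the
	 * whole user PASID table (e.g. ARM/AMD nesting). */
	if (ret && domain_owns_pasid_table(dev))
		ret = get_attach_handle_for_iopf(dev, IOMMU_NO_PASID, group);
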
Best regards,
baolu

^ permalink raw reply related	[flat|nested] 57+ messages in thread

* Re: [PATCH v5 3/9] iommu: Add attachment handle to struct iopf_group
  2024-05-10 14:30         ` Baolu Lu
@ 2024-05-10 16:28           ` Jason Gunthorpe
  2024-05-15  7:28             ` Tian, Kevin
  0 siblings, 1 reply; 57+ messages in thread
From: Jason Gunthorpe @ 2024-05-10 16:28 UTC (permalink / raw)
  To: Baolu Lu
  Cc: Kevin Tian, Joerg Roedel, Will Deacon, Robin Murphy,
	Jean-Philippe Brucker, Nicolin Chen, Yi Liu, Jacob Pan,
	Joel Granados, iommu, virtualization, linux-kernel

On Fri, May 10, 2024 at 10:30:10PM +0800, Baolu Lu wrote:

> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index 35ae9a6f73d3..09b4e671dcee 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -173,6 +173,8 @@ struct iommu_domain_geometry {
> 
>  #define __IOMMU_DOMAIN_NESTED  (1U << 6)  /* User-managed address space nested
>                                               on a stage-2 translation        */
> +#define __IOMMU_DOMAIN_PASID   (1U << 7)  /* User-managed domain for a
> +                                             specific PASID */
> 
>  #define IOMMU_DOMAIN_ALLOC_FLAGS ~__IOMMU_DOMAIN_DMA_FQ
>  /*
> @@ -204,6 +206,9 @@ struct iommu_domain_geometry {
>  #define IOMMU_DOMAIN_SVA       (__IOMMU_DOMAIN_SVA)
>  #define IOMMU_DOMAIN_PLATFORM  (__IOMMU_DOMAIN_PLATFORM)
>  #define IOMMU_DOMAIN_NESTED    (__IOMMU_DOMAIN_NESTED)
> +#define IOMMU_DOMAIN_NESTED_PASID                              \
> +                               (__IOMMU_DOMAIN_NESTED |        \
> +                                __IOMMU_DOMAIN_PASID)

Yeah, something like that, I don't know about naming though..

How about

DOMAIN_NESTED = Attachment to RID or PASID
DOMAIN_NESTED_PASID_TABLE = RID and the entire PASID space

?

Jason

^ permalink raw reply	[flat|nested] 57+ messages in thread

* RE: [PATCH v5 1/9] iommu: Introduce domain attachment handle
  2024-04-30 14:57 ` [PATCH v5 1/9] iommu: Introduce domain attachment handle Lu Baolu
@ 2024-05-15  7:17   ` Tian, Kevin
  2024-05-19 10:07     ` Baolu Lu
  0 siblings, 1 reply; 57+ messages in thread
From: Tian, Kevin @ 2024-05-15  7:17 UTC (permalink / raw)
  To: Lu Baolu, Jason Gunthorpe, Joerg Roedel, Will Deacon,
	Robin Murphy, Jean-Philippe Brucker, Nicolin Chen, Liu, Yi L,
	Jacob Pan, Joel Granados
  Cc: iommu, virtualization, linux-kernel, Jason Gunthorpe

> From: Lu Baolu <baolu.lu@linux.intel.com>
> Sent: Tuesday, April 30, 2024 10:57 PM
> 
> +/* Add an attach handle to the group's pasid array. */
> +static struct iommu_attach_handle *
> +iommu_attach_handle_set(struct iommu_domain *domain,
> +			struct iommu_group *group, ioasid_t pasid)
> +{
> +	struct iommu_attach_handle *handle;
> +	void *curr;
> +
> +	handle = kzalloc(sizeof(*handle), GFP_KERNEL);
> +	if (!handle)
> +		return ERR_PTR(-ENOMEM);
> +
> +	handle->domain = domain;
> +	curr = xa_cmpxchg(&group->pasid_array, pasid, NULL, handle,
> GFP_KERNEL);

this could be just xa_insert() which returns -EBUSY if the entry isn't NULL.

> +	if (curr) {
> +		kfree(handle);
> +		return xa_err(curr) ? curr : ERR_PTR(-EBUSY);

'curr' is not an error pointer. You need ERR_PTR(xa_err(curr)).

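Both points could be folded together, as a hedged sketch using the
xarray API:

	ret = xa_insert(&group->pasid_array, pasid, handle, GFP_KERNEL);
	if (ret) {
		kfree(handle);
		return ERR_PTR(ret);	/* -EBUSY if the entry was occupied */
	}
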
>  int iommu_attach_group(struct iommu_domain *domain, struct
> iommu_group *group)
>  {
> +	struct iommu_attach_handle *handle;
>  	int ret;
> 
>  	mutex_lock(&group->mutex);
> +	handle = iommu_attach_handle_set(domain, group,
> IOMMU_NO_PASID);
> +	if (IS_ERR(handle)) {
> +		ret = PTR_ERR(handle);
> +		goto out_unlock;
> +	}
>  	ret = __iommu_attach_group(domain, group);
> +	if (ret)
> +		goto out_remove_handle;
>  	mutex_unlock(&group->mutex);

It's slightly better to set the handle after __iommu_attach_group().

Doing so is aligned with the sequence in iommu_group_replace_domain().

> @@ -2211,13 +2307,33 @@ EXPORT_SYMBOL_GPL(iommu_attach_group);
>  int iommu_group_replace_domain(struct iommu_group *group,
>  			       struct iommu_domain *new_domain)
>  {
> +	struct iommu_domain *old_domain = group->domain;
> +	struct iommu_attach_handle *handle;
>  	int ret;
> 
>  	if (!new_domain)
>  		return -EINVAL;
> 
> +	if (new_domain == old_domain)
> +		return 0;
> +

__iommu_group_set_domain() already does the check.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* RE: [PATCH v5 2/9] iommu: Replace sva_iommu with iommu_attach_handle
  2024-04-30 14:57 ` [PATCH v5 2/9] iommu: Replace sva_iommu with iommu_attach_handle Lu Baolu
  2024-05-07 23:58   ` Jason Gunthorpe
@ 2024-05-15  7:21   ` Tian, Kevin
  2024-05-19 10:14     ` Baolu Lu
  1 sibling, 1 reply; 57+ messages in thread
From: Tian, Kevin @ 2024-05-15  7:21 UTC (permalink / raw)
  To: Lu Baolu, Jason Gunthorpe, Joerg Roedel, Will Deacon,
	Robin Murphy, Jean-Philippe Brucker, Nicolin Chen, Liu, Yi L,
	Jacob Pan, Joel Granados
  Cc: iommu, virtualization, linux-kernel

> From: Lu Baolu <baolu.lu@linux.intel.com>
> Sent: Tuesday, April 30, 2024 10:57 PM
> 
>  #else
> -static inline struct iommu_sva *
> +static inline struct iommu_attach_handle *
>  iommu_sva_bind_device(struct device *dev, struct mm_struct *mm)
>  {
> -	return NULL;
> +	return ERR_PTR(-ENODEV);
>  }
> 

this should be a separate fix.

existing drivers (idxd and uacce) only check IS_ERR() on the return
value. A NULL pointer may lead to an error reported at a later point.
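
To illustrate the hazard (the calling pattern below is paraphrased from
those drivers, not copied):

	handle = iommu_sva_bind_device(dev, mm);
	if (IS_ERR(handle))
		return PTR_ERR(handle);
	/* With the old stub returning NULL, this check passes and the
	 * bogus handle is only dereferenced much later. */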

^ permalink raw reply	[flat|nested] 57+ messages in thread

* RE: [PATCH v5 3/9] iommu: Add attachment handle to struct iopf_group
  2024-05-10 16:28           ` Jason Gunthorpe
@ 2024-05-15  7:28             ` Tian, Kevin
  0 siblings, 0 replies; 57+ messages in thread
From: Tian, Kevin @ 2024-05-15  7:28 UTC (permalink / raw)
  To: Jason Gunthorpe, Baolu Lu
  Cc: Joerg Roedel, Will Deacon, Robin Murphy, Jean-Philippe Brucker,
	Nicolin Chen, Liu, Yi L, Jacob Pan, Joel Granados, iommu,
	virtualization, linux-kernel

> From: Jason Gunthorpe <jgg@ziepe.ca>
> Sent: Saturday, May 11, 2024 12:29 AM
> 
> On Fri, May 10, 2024 at 10:30:10PM +0800, Baolu Lu wrote:
> 
> > diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> > index 35ae9a6f73d3..09b4e671dcee 100644
> > --- a/include/linux/iommu.h
> > +++ b/include/linux/iommu.h
> > @@ -173,6 +173,8 @@ struct iommu_domain_geometry {
> >
> > #define __IOMMU_DOMAIN_NESTED  (1U << 6)  /* User-managed address space
> >                                               nested on a stage-2 translation */
> > +#define __IOMMU_DOMAIN_PASID   (1U << 7)  /* User-managed domain for a
> > +                                             specific PASID */
> >
> >  #define IOMMU_DOMAIN_ALLOC_FLAGS ~__IOMMU_DOMAIN_DMA_FQ
> >  /*
> > @@ -204,6 +206,9 @@ struct iommu_domain_geometry {
> >  #define IOMMU_DOMAIN_SVA       (__IOMMU_DOMAIN_SVA)
> >  #define IOMMU_DOMAIN_PLATFORM  (__IOMMU_DOMAIN_PLATFORM)
> >  #define IOMMU_DOMAIN_NESTED    (__IOMMU_DOMAIN_NESTED)
> > +#define IOMMU_DOMAIN_NESTED_PASID                              \
> > +                               (__IOMMU_DOMAIN_NESTED |        \
> > +                                __IOMMU_DOMAIN_PASID)
> 
> Yeah, something like that, I don't know about naming though..
> 
> How about
> 
> DOMAIN_NESTED = Attachment to RID or PASID
> DOMAIN_NESTED_PASID_TABLE = RID and the entire PASID space
> 

this sounds clearer

^ permalink raw reply	[flat|nested] 57+ messages in thread

* RE: [PATCH v5 3/9] iommu: Add attachment handle to struct iopf_group
  2024-04-30 14:57 ` [PATCH v5 3/9] iommu: Add attachment handle to struct iopf_group Lu Baolu
  2024-05-08  0:04   ` Jason Gunthorpe
@ 2024-05-15  7:31   ` Tian, Kevin
  2024-05-19 14:03     ` Baolu Lu
  1 sibling, 1 reply; 57+ messages in thread
From: Tian, Kevin @ 2024-05-15  7:31 UTC (permalink / raw)
  To: Lu Baolu, Jason Gunthorpe, Joerg Roedel, Will Deacon,
	Robin Murphy, Jean-Philippe Brucker, Nicolin Chen, Liu, Yi L,
	Jacob Pan, Joel Granados
  Cc: iommu, virtualization, linux-kernel

> From: Lu Baolu <baolu.lu@linux.intel.com>
> Sent: Tuesday, April 30, 2024 10:57 PM
> 
> Previously, the domain that a page fault targets is stored in an
> iopf_group, which represents a minimal set of page faults. With the
> introduction of attachment handle, replace the domain with the handle

It's better to use 'attach handle' as the code does.

> +	handle = iommu_attach_handle_get(dev->iommu_group, pasid, 0);
> +	if (IS_ERR(handle))
> +		return PTR_ERR(handle);
> 
> -	if (!domain || !domain->iopf_handler) {
> -		dev_warn_ratelimited(dev,
> -			"iopf (pasid %d) without domain attached or handler
> installed\n",
> -			 fault->prm.pasid);
> +	group->attach_handle = handle;
> +	group->domain = handle->domain;

this change also removes the warning message. Is it desired?


^ permalink raw reply	[flat|nested] 57+ messages in thread

* RE: [PATCH v5 4/9] iommufd: Add fault and response message definitions
  2024-04-30 14:57 ` [PATCH v5 4/9] iommufd: Add fault and response message definitions Lu Baolu
@ 2024-05-15  7:43   ` Tian, Kevin
  2024-05-19 14:37     ` Baolu Lu
  0 siblings, 1 reply; 57+ messages in thread
From: Tian, Kevin @ 2024-05-15  7:43 UTC (permalink / raw)
  To: Lu Baolu, Jason Gunthorpe, Joerg Roedel, Will Deacon,
	Robin Murphy, Jean-Philippe Brucker, Nicolin Chen, Liu, Yi L,
	Jacob Pan, Joel Granados
  Cc: iommu, virtualization, linux-kernel

> From: Lu Baolu <baolu.lu@linux.intel.com>
> Sent: Tuesday, April 30, 2024 10:57 PM
> 
> iommu_hwpt_pgfaults represent fault messages that the userspace can
> retrieve. Multiple iommu_hwpt_pgfaults might be put in an iopf group,
> with the IOMMU_PGFAULT_FLAGS_LAST_PAGE flag set only for the last
> iommu_hwpt_pgfault.

Do you envision extending the same structure to report unrecoverable
faults in the future?

If yes, this could be named more neutrally, e.g. iommu_hwpt_faults with
flags to indicate it's a recoverable PRI request.

If it's only for PRI, probably iommu_hwpt_pgreqs is clearer.

> +
> +/**
> + * struct iommu_hwpt_pgfault - iommu page fault data
> + * @size: sizeof(struct iommu_hwpt_pgfault)
> + * @flags: Combination of enum iommu_hwpt_pgfault_flags
> + * @dev_id: id of the originated device
> + * @pasid: Process Address Space ID
> + * @grpid: Page Request Group Index
> + * @perm: Combination of enum iommu_hwpt_pgfault_perm
> + * @addr: Page address

'Fault address'

> + * @length: a hint of how much data the requestor is expecting to fetch. For
> + *          example, if the PRI initiator knows it is going to do a 10MB
> + *          transfer, it could fill in 10MB and the OS could pre-fault in
> + *          10MB of IOVA. It's default to 0 if there's no such hint.

This is not clear to me, and I don't remember the PCIe spec defining such
a mechanism.

> +/**
> + * enum iommufd_page_response_code - Return status of fault handlers
> + * @IOMMUFD_PAGE_RESP_SUCCESS: Fault has been handled and the page tables
> + *                             populated, retry the access. This is the
> + *                             "Success" defined in PCI 10.4.2.1.
> + * @IOMMUFD_PAGE_RESP_INVALID: General error. Drop all subsequent faults
> + *                             from this device if possible. This is the
> + *                             "Response Failure" in PCI 10.4.2.1.
> + * @IOMMUFD_PAGE_RESP_FAILURE: Could not handle this fault, don't retry the
> + *                             access. This is the "Invalid Request" in PCI
> + *                             10.4.2.1.

The comments for 'INVALID' and 'FAILURE' are swapped. Also I'd rather
use the spec wording to be accurate.

> + */
> +enum iommufd_page_response_code {
> +	IOMMUFD_PAGE_RESP_SUCCESS = 0,
> +	IOMMUFD_PAGE_RESP_INVALID,
> +	IOMMUFD_PAGE_RESP_FAILURE,
> +};
> +
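
For reference, a sketch of the enum with its comments matched to the
PCI Express 10.4.2.1 terms:

enum iommufd_page_response_code {
	/* "Success": the fault has been handled, retry the access */
	IOMMUFD_PAGE_RESP_SUCCESS = 0,
	/* "Invalid Request": could not handle this fault, don't retry */
	IOMMUFD_PAGE_RESP_INVALID,
	/* "Response Failure": general error, drop subsequent faults */
	IOMMUFD_PAGE_RESP_FAILURE,
};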

^ permalink raw reply	[flat|nested] 57+ messages in thread

* RE: [PATCH v5 5/9] iommufd: Add iommufd fault object
  2024-05-08 10:05     ` Baolu Lu
@ 2024-05-15  7:57       ` Tian, Kevin
  2024-05-20  0:41         ` Baolu Lu
  0 siblings, 1 reply; 57+ messages in thread
From: Tian, Kevin @ 2024-05-15  7:57 UTC (permalink / raw)
  To: Baolu Lu, Jason Gunthorpe
  Cc: Joerg Roedel, Will Deacon, Robin Murphy, Jean-Philippe Brucker,
	Nicolin Chen, Liu, Yi L, Jacob Pan, Joel Granados, iommu,
	virtualization, linux-kernel

> From: Baolu Lu <baolu.lu@linux.intel.com>
> Sent: Wednesday, May 8, 2024 6:05 PM
> 
> On 2024/5/8 8:11, Jason Gunthorpe wrote:
> > On Tue, Apr 30, 2024 at 10:57:06PM +0800, Lu Baolu wrote:
> >> diff --git a/drivers/iommu/iommu-priv.h b/drivers/iommu/iommu-priv.h
> >> index ae65e0b85d69..1a0450a83bd0 100644
> >> --- a/drivers/iommu/iommu-priv.h
> >> +++ b/drivers/iommu/iommu-priv.h
> >> @@ -36,6 +36,10 @@ struct iommu_attach_handle {
> >>   			struct device	*dev;
> >>   			refcount_t	users;
> >>   		};
> >> +		/* attach data for IOMMUFD */
> >> +		struct {
> >> +			void		*idev;
> >> +		};
> > We can use a proper type here, just forward declare it.
> >
> > But this sequence in the other patch:
> >
> > +       ret = iommu_attach_group(hwpt->domain, idev->igroup->group);
> > +       if (ret) {
> > +               iommufd_fault_iopf_disable(idev);
> > +               return ret;
> > +       }
> > +
> > +       handle = iommu_attach_handle_get(idev->igroup->group, IOMMU_NO_PASID, 0);
> > +       handle->idev = idev;
> >
> > Is why I was imagining the caller would allocate, because now we have
> > the issue that a fault capable domain was installed into the IOMMU
> > before it's handle could be fully setup, so we have a race where a
> > fault could come in right between those things. Then what happens?
> > I suppose we can retry the fault and by the time it comes back the
> > race should resolve. A bit ugly I suppose.
> 
> You are right. It makes more sense if the attached data is allocated and
> managed by the caller. I will go in this direction and update my series.
> I will also consider other review comments you have given in other
> places.
> 

Does this direction imply a new iommu_attach_group_handle() helper
to pass in the caller-allocated handle pointer or exposing a new
iommu_group_set_handle() to set the handle to the group pasid_array 
and then having iommu_attach_group() to update the domain info in
the handle?

^ permalink raw reply	[flat|nested] 57+ messages in thread

* RE: [PATCH v5 5/9] iommufd: Add iommufd fault object
  2024-04-30 14:57 ` [PATCH v5 5/9] iommufd: Add iommufd fault object Lu Baolu
  2024-05-08  0:11   ` Jason Gunthorpe
  2024-05-08  0:22   ` Jason Gunthorpe
@ 2024-05-15  8:37   ` Tian, Kevin
  2024-05-20  1:15     ` Baolu Lu
                       ` (3 more replies)
  2 siblings, 4 replies; 57+ messages in thread
From: Tian, Kevin @ 2024-05-15  8:37 UTC (permalink / raw)
  To: Lu Baolu, Jason Gunthorpe, Joerg Roedel, Will Deacon,
	Robin Murphy, Jean-Philippe Brucker, Nicolin Chen, Liu, Yi L,
	Jacob Pan, Joel Granados
  Cc: iommu, virtualization, linux-kernel

> From: Lu Baolu <baolu.lu@linux.intel.com>
> Sent: Tuesday, April 30, 2024 10:57 PM
> 
> @@ -131,6 +131,9 @@ struct iopf_group {
>  	struct iommu_attach_handle *attach_handle;
>  	/* The device's fault data parameter. */
>  	struct iommu_fault_param *fault_param;
> +	/* Used by handler provider to hook the group on its own lists. */
> +	struct list_head node;
> +	u32 cookie;

Better to put these together with attach_handle.

Rename 'node' to 'handle_node'.
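
In other words, something along these lines (sketch):

struct iopf_group {
	...
	struct iommu_attach_handle *attach_handle;
	/* Used by the handler provider to hook the group on its own lists. */
	struct list_head handle_node;
	u32 cookie;
	/* The device's fault data parameter. */
	struct iommu_fault_param *fault_param;
	...
};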

> @@ -128,6 +128,7 @@ enum iommufd_object_type {
>  	IOMMUFD_OBJ_HWPT_NESTED,
>  	IOMMUFD_OBJ_IOAS,
>  	IOMMUFD_OBJ_ACCESS,
> +	IOMMUFD_OBJ_FAULT,

Agree with Jason that 'FAULT_QUEUE' sounds like a clearer object name.

> @@ -395,6 +396,8 @@ struct iommufd_device {
>  	/* always the physical device */
>  	struct device *dev;
>  	bool enforce_cache_coherency;
> +	/* outstanding faults awaiting response indexed by fault group id */
> +	struct xarray faults;

this...

> +struct iommufd_fault {
> +	struct iommufd_object obj;
> +	struct iommufd_ctx *ictx;
> +	struct file *filep;
> +
> +	/* The lists of outstanding faults protected by below mutex. */
> +	struct mutex mutex;
> +	struct list_head deliver;
> +	struct list_head response;

...and here worth a discussion.

First, the response list is not used. If the choice of queueing faults
per device is kept, it should be removed.

But I wonder whether it makes more sense to keep this response
queue per fault object. Sounds simpler to me.

Also it's unclear why we need the response message to carry the
same info as the request while only id/code/cookie are used.

+struct iommu_hwpt_page_response {
+	__u32 size;
+	__u32 flags;
+	__u32 dev_id;
+	__u32 pasid;
+	__u32 grpid;
+	__u32 code;
+	__u32 cookie;
+	__u32 reserved;
+};

If we keep the response queue in the fault object, the response message
only needs to carry size/flags/code/cookie and cookie can identify the
pending message uniquely in the response queue.
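
A hypothetical trimmed layout along those lines:

struct iommu_hwpt_page_response {
	__u32 size;
	__u32 flags;
	__u32 code;
	/* Uniquely identifies the pending request in the response queue. */
	__u32 cookie;
};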

> +static ssize_t iommufd_fault_fops_write(struct file *filep, const char __user *buf,
> +					size_t count, loff_t *ppos)
> +{
> +	size_t response_size = sizeof(struct iommu_hwpt_page_response);
> +	struct iommufd_fault *fault = filep->private_data;
> +	struct iommu_hwpt_page_response response;
> +	struct iommufd_device *idev = NULL;
> +	struct iopf_group *group;
> +	size_t done = 0;
> +	int rc;
> +
> +	if (*ppos || count % response_size)
> +		return -ESPIPE;
> +
> +	mutex_lock(&fault->mutex);
> +	while (count > done) {
> +		rc = copy_from_user(&response, buf + done, response_size);
> +		if (rc)
> +			break;
> +
> +		if (!idev || idev->obj.id != response.dev_id)
> +			idev = container_of(iommufd_get_object(fault->ictx,
> +							       response.dev_id,
> +							       IOMMUFD_OBJ_DEVICE),
> +					    struct iommufd_device, obj);
> +		if (IS_ERR(idev))
> +			break;
> +
> +		group = xa_erase(&idev->faults, response.cookie);
> +		if (!group)
> +			break;

is 'continue' better?

> +
> +		iopf_group_response(group, response.code);

PCIe spec states that a Response Failure disables the PRI interface. For SR-IOV
it'd be dangerous to allow the user to send such a code to a VF and close the
entire shared PRI interface.

Just another example of the lack of coordination for shared capabilities
between PF/VF. But exposing such a gap to userspace makes it worse.

I guess we don't want to make this work depend on that cleanup. The
minimal correct thing is to disallow attaching a VF to a fault-capable hwpt,
with a note here that once we turn on support for VFs the Response Failure
code should not be forwarded to the hardware. Instead it's an indication
that the user cannot serve more requests, and such a situation waits for
a vPRI reset to recover.
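
Something as simple as the following guard in the fault-capable attach
path would do (hypothetical; field names taken from this series):

	if (hwpt->fault && dev_is_pci(idev->dev) &&
	    to_pci_dev(idev->dev)->is_virtfn)
		return -EOPNOTSUPP;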

> +		iopf_free_group(group);
> +		done += response_size;
> +
> +		iommufd_put_object(fault->ictx, &idev->obj);

get/put is unpaired:

		if (!idev || idev->obj.id != response.dev_id)
			idev = iommufd_get_object();

		...

		iommufd_put_object(idev);

The intention might be to reuse idev if multiple fault responses are
for the same idev. But idev is put in each iteration, so the following
messages will access idev without holding a reference.
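
One way to keep the get/put paired per iteration (a sketch only; the
series ultimately took a different route, see below):

	while (count > done) {
		rc = copy_from_user(&response, buf + done, response_size);
		if (rc)
			break;

		idev = container_of(iommufd_get_object(fault->ictx,
						       response.dev_id,
						       IOMMUFD_OBJ_DEVICE),
				    struct iommufd_device, obj);
		if (IS_ERR(idev))
			break;

		group = xa_erase(&idev->faults, response.cookie);
		if (group) {
			iopf_group_response(group, response.code);
			iopf_free_group(group);
			done += response_size;
		}
		/* Drop the reference taken above before the next iteration. */
		iommufd_put_object(fault->ictx, &idev->obj);
		if (!group)
			break;
	}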

^ permalink raw reply	[flat|nested] 57+ messages in thread

* RE: [PATCH v5 6/9] iommufd: Fault-capable hwpt attach/detach/replace
  2024-04-30 14:57 ` [PATCH v5 6/9] iommufd: Fault-capable hwpt attach/detach/replace Lu Baolu
  2024-05-08  0:18   ` Jason Gunthorpe
@ 2024-05-15  8:43   ` Tian, Kevin
  2024-05-20  2:10     ` Baolu Lu
  1 sibling, 1 reply; 57+ messages in thread
From: Tian, Kevin @ 2024-05-15  8:43 UTC (permalink / raw)
  To: Lu Baolu, Jason Gunthorpe, Joerg Roedel, Will Deacon,
	Robin Murphy, Jean-Philippe Brucker, Nicolin Chen, Liu, Yi L,
	Jacob Pan, Joel Granados
  Cc: iommu, virtualization, linux-kernel

> From: Lu Baolu <baolu.lu@linux.intel.com>
> Sent: Tuesday, April 30, 2024 10:57 PM
> +
> +int iommufd_fault_domain_replace_dev(struct iommufd_device *idev,
> +				     struct iommufd_hw_pagetable *hwpt,
> +				     struct iommufd_hw_pagetable *old)
> +{
> +	struct iommu_attach_handle *handle;
> +	int ret;
> +
> +	if (hwpt->fault)
> +		ret = iommufd_fault_iopf_enable(idev);
> +	else
> +		iommufd_fault_iopf_disable(idev);
> +
> +	ret = iommu_group_replace_domain(idev->igroup->group, hwpt->domain);
> +	if (ret)
> +		goto out_cleanup;
> +
> +	iommufd_auto_response_faults(old, idev);
> +	handle = iommu_attach_handle_get(idev->igroup->group, IOMMU_NO_PASID, 0);
> +	handle->idev = idev;

why is auto response required in replace? new requests can come
after the auto response anyway...

The user should prepare for faults delivered to the old or new hwpt
in the transition window.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* RE: [PATCH v5 7/9] iommufd: Associate fault object with iommufd_hw_pgtable
  2024-04-30 14:57 ` [PATCH v5 7/9] iommufd: Associate fault object with iommufd_hw_pgtable Lu Baolu
  2024-05-08  0:25   ` Jason Gunthorpe
@ 2024-05-15  8:50   ` Tian, Kevin
  2024-05-20  2:18     ` Baolu Lu
  1 sibling, 1 reply; 57+ messages in thread
From: Tian, Kevin @ 2024-05-15  8:50 UTC (permalink / raw)
  To: Lu Baolu, Jason Gunthorpe, Joerg Roedel, Will Deacon,
	Robin Murphy, Jean-Philippe Brucker, Nicolin Chen, Liu, Yi L,
	Jacob Pan, Joel Granados
  Cc: iommu, virtualization, linux-kernel

> From: Lu Baolu <baolu.lu@linux.intel.com>
> Sent: Tuesday, April 30, 2024 10:57 PM
>
> @@ -227,7 +233,7 @@ iommufd_hwpt_nested_alloc(struct iommufd_ctx *ictx,
>  	refcount_inc(&parent->common.obj.users);
>  	hwpt_nested->parent = parent;
> 
> -	hwpt->domain = ops->domain_alloc_user(idev->dev, flags,
> +	hwpt->domain = ops->domain_alloc_user(idev->dev, 0,
>  					      parent->common.domain, user_data);

it reads slightly better to clear the fault bit and still pass in flags.

> @@ -308,6 +314,19 @@ int iommufd_hwpt_alloc(struct iommufd_ucmd *ucmd)
>  		goto out_put_pt;
>  	}
> 
> +	if (cmd->flags & IOMMU_HWPT_FAULT_ID_VALID) {
> +		struct iommufd_fault *fault;
> +
> +		fault = iommufd_get_fault(ucmd, cmd->fault_id);
> +		if (IS_ERR(fault)) {
> +			rc = PTR_ERR(fault);
> +			goto out_hwpt;
> +		}
> +		hwpt->fault = fault;
> +		hwpt->domain->iopf_handler = iommufd_fault_iopf_handler;
> +		hwpt->domain->fault_data = hwpt;
> +	}

this is nesting specific. why not move it to nested_alloc()?

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v5 1/9] iommu: Introduce domain attachment handle
  2024-05-15  7:17   ` Tian, Kevin
@ 2024-05-19 10:07     ` Baolu Lu
  0 siblings, 0 replies; 57+ messages in thread
From: Baolu Lu @ 2024-05-19 10:07 UTC (permalink / raw)
  To: Tian, Kevin, Jason Gunthorpe, Joerg Roedel, Will Deacon,
	Robin Murphy, Jean-Philippe Brucker, Nicolin Chen, Liu, Yi L,
	Jacob Pan, Joel Granados
  Cc: baolu.lu, iommu, virtualization, linux-kernel, Jason Gunthorpe

On 5/15/24 3:17 PM, Tian, Kevin wrote:
>> From: Lu Baolu <baolu.lu@linux.intel.com>
>> Sent: Tuesday, April 30, 2024 10:57 PM
>>
>> +/* Add an attach handle to the group's pasid array. */
>> +static struct iommu_attach_handle *
>> +iommu_attach_handle_set(struct iommu_domain *domain,
>> +			struct iommu_group *group, ioasid_t pasid)
>> +{
>> +	struct iommu_attach_handle *handle;
>> +	void *curr;
>> +
>> +	handle = kzalloc(sizeof(*handle), GFP_KERNEL);
>> +	if (!handle)
>> +		return ERR_PTR(-ENOMEM);
>> +
>> +	handle->domain = domain;
>> +	curr = xa_cmpxchg(&group->pasid_array, pasid, NULL, handle, GFP_KERNEL);
> 
> this could be just xa_insert() which returns -EBUSY if the entry isn't NULL.

Yes, exactly.

> 
>> +	if (curr) {
>> +		kfree(handle);
>> +		return xa_err(curr) ? curr : ERR_PTR(-EBUSY);
> 
> 'curr' is not an error pointer. You need ERR_PTR(xa_err(curr)).

Fixed.

> 
>>   int iommu_attach_group(struct iommu_domain *domain, struct iommu_group *group)
>>   {
>> +	struct iommu_attach_handle *handle;
>>   	int ret;
>>
>>   	mutex_lock(&group->mutex);
>> +	handle = iommu_attach_handle_set(domain, group, IOMMU_NO_PASID);
>> +	if (IS_ERR(handle)) {
>> +		ret = PTR_ERR(handle);
>> +		goto out_unlock;
>> +	}
>>   	ret = __iommu_attach_group(domain, group);
>> +	if (ret)
>> +		goto out_remove_handle;
>>   	mutex_unlock(&group->mutex);
> 
> It's slightly better to set the handle after __iommu_attach_group().
> 
> Doing so aligns with the sequence in iommu_group_replace_domain().

I did it this way because, coding-wise, erasing an attach handle is
easier than rolling the group back to the previous domain. I didn't
realize attaching the group and storing the attach handle had to be
done in a specific order.

> 
>> @@ -2211,13 +2307,33 @@ EXPORT_SYMBOL_GPL(iommu_attach_group);
>>   int iommu_group_replace_domain(struct iommu_group *group,
>>   			       struct iommu_domain *new_domain)
>>   {
>> +	struct iommu_domain *old_domain = group->domain;
>> +	struct iommu_attach_handle *handle;
>>   	int ret;
>>
>>   	if (!new_domain)
>>   		return -EINVAL;
>>
>> +	if (new_domain == old_domain)
>> +		return 0;
>> +
> 
> __iommu_group_set_domain() already does the check.

Removed.

Best regards,
baolu

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v5 2/9] iommu: Replace sva_iommu with iommu_attach_handle
  2024-05-15  7:21   ` Tian, Kevin
@ 2024-05-19 10:14     ` Baolu Lu
  2024-05-20  3:18       ` Tian, Kevin
  0 siblings, 1 reply; 57+ messages in thread
From: Baolu Lu @ 2024-05-19 10:14 UTC (permalink / raw)
  To: Tian, Kevin, Jason Gunthorpe, Joerg Roedel, Will Deacon,
	Robin Murphy, Jean-Philippe Brucker, Nicolin Chen, Liu, Yi L,
	Jacob Pan, Joel Granados
  Cc: baolu.lu, iommu, virtualization, linux-kernel

On 5/15/24 3:21 PM, Tian, Kevin wrote:
>> From: Lu Baolu <baolu.lu@linux.intel.com>
>> Sent: Tuesday, April 30, 2024 10:57 PM
>>
>>   #else
>> -static inline struct iommu_sva *
>> +static inline struct iommu_attach_handle *
>>   iommu_sva_bind_device(struct device *dev, struct mm_struct *mm)
>>   {
>> -	return NULL;
>> +	return ERR_PTR(-ENODEV);
>>   }
>>
> 
> this should be a separate fix.

Yes. It could be a fix.

> 
> existing drivers (idxd and uacce) only check IS_ERR() on the return
> value. A NULL pointer may lead to an error reported at a later point.

Though I don't think it could cause any real problem because this
interface should always be called after the sva enabling one.

Best regards,
baolu

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v5 3/9] iommu: Add attachment handle to struct iopf_group
  2024-05-15  7:31   ` Tian, Kevin
@ 2024-05-19 14:03     ` Baolu Lu
  2024-05-20  3:20       ` Tian, Kevin
  0 siblings, 1 reply; 57+ messages in thread
From: Baolu Lu @ 2024-05-19 14:03 UTC (permalink / raw)
  To: Tian, Kevin, Jason Gunthorpe, Joerg Roedel, Will Deacon,
	Robin Murphy, Jean-Philippe Brucker, Nicolin Chen, Liu, Yi L,
	Jacob Pan, Joel Granados
  Cc: baolu.lu, iommu, virtualization, linux-kernel

On 2024/5/15 15:31, Tian, Kevin wrote:
>> From: Lu Baolu <baolu.lu@linux.intel.com>
>> Sent: Tuesday, April 30, 2024 10:57 PM
>>
>> Previously, the domain that a page fault targets is stored in an
>> iopf_group, which represents a minimal set of page faults. With the
>> introduction of attachment handle, replace the domain with the handle
> 
> It's better to use 'attach handle' as the code does.

Done.

> 
>> +	handle = iommu_attach_handle_get(dev->iommu_group, pasid, 0);
>> +	if (IS_ERR(handle))
>> +		return PTR_ERR(handle);
>>
>> -	if (!domain || !domain->iopf_handler) {
>> -		dev_warn_ratelimited(dev,
>> -			"iopf (pasid %d) without domain attached or handler
>> installed\n",
>> -			 fault->prm.pasid);
>> +	group->attach_handle = handle;
>> +	group->domain = handle->domain;
> 
> this change also removes the warning message. Is it desired?

Not desired.

Perhaps we can add a message when the iopf handling is aborted.
Something like below:

err_abort:
+        dev_warn_ratelimited(dev, "iopf with pasid %d aborted\n",
+                             fault->prm.pasid);
         iopf_group_response(group, IOMMU_PAGE_RESP_FAILURE);

Though I am not sure which is better, dev_warn() or dev_info().

Best regards,
baolu

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v5 4/9] iommufd: Add fault and response message definitions
  2024-05-15  7:43   ` Tian, Kevin
@ 2024-05-19 14:37     ` Baolu Lu
  2024-05-20  3:24       ` Tian, Kevin
  0 siblings, 1 reply; 57+ messages in thread
From: Baolu Lu @ 2024-05-19 14:37 UTC (permalink / raw)
  To: Tian, Kevin, Jason Gunthorpe, Joerg Roedel, Will Deacon,
	Robin Murphy, Jean-Philippe Brucker, Nicolin Chen, Liu, Yi L,
	Jacob Pan, Joel Granados
  Cc: baolu.lu, iommu, virtualization, linux-kernel

On 2024/5/15 15:43, Tian, Kevin wrote:
>> From: Lu Baolu <baolu.lu@linux.intel.com>
>> Sent: Tuesday, April 30, 2024 10:57 PM
>>
>> iommu_hwpt_pgfaults represent fault messages that the userspace can
>> retrieve. Multiple iommu_hwpt_pgfaults might be put in an iopf group,
>> with the IOMMU_PGFAULT_FLAGS_LAST_PAGE flag set only for the last
>> iommu_hwpt_pgfault.
> 
> Do you envision extending the same structure to report unrecoverable
> faults in the future?

I am not envisioning extending this to report unrecoverable faults in
the future. The unrecoverable faults are not always related to a hwpt,
and therefore it's more suitable to route them through a viommu object
which is under discussion in Nicolin's series.

> 
> If yes, this could be named more neutrally, e.g. iommu_hwpt_faults with
> flags to indicate it's a recoverable PRI request.
> 
> If it's only for PRI, probably iommu_hwpt_pgreqs is clearer.
> 
>> +
>> +/**
>> + * struct iommu_hwpt_pgfault - iommu page fault data
>> + * @size: sizeof(struct iommu_hwpt_pgfault)
>> + * @flags: Combination of enum iommu_hwpt_pgfault_flags
>> + * @dev_id: id of the originated device
>> + * @pasid: Process Address Space ID
>> + * @grpid: Page Request Group Index
>> + * @perm: Combination of enum iommu_hwpt_pgfault_perm
>> + * @addr: Page address
> 
> 'Fault address'

Okay.

> 
>> + * @length: a hint of how much data the requestor is expecting to fetch. For
>> + *          example, if the PRI initiator knows it is going to do a 10MB
>> + *          transfer, it could fill in 10MB and the OS could pre-fault in
>> + *          10MB of IOVA. It's default to 0 if there's no such hint.
> 
> This is not clear to me, and I don't remember the PCIe spec defining such
> a mechanism.

This came up in a previous discussion. While it's not currently part of
the PCI specification and may not be in the future, we'd like to add
this mechanism for potential future advanced device features as it
offers significant optimization benefits.

> 
>> +/**
>> + * enum iommufd_page_response_code - Return status of fault handlers
>> + * @IOMMUFD_PAGE_RESP_SUCCESS: Fault has been handled and the page tables
>> + *                             populated, retry the access. This is the
>> + *                             "Success" defined in PCI 10.4.2.1.
>> + * @IOMMUFD_PAGE_RESP_INVALID: General error. Drop all subsequent faults
>> + *                             from this device if possible. This is the
>> + *                             "Response Failure" in PCI 10.4.2.1.
>> + * @IOMMUFD_PAGE_RESP_FAILURE: Could not handle this fault, don't retry the
>> + *                             access. This is the "Invalid Request" in PCI
>> + *                             10.4.2.1.
> 
> The comments for 'INVALID' and 'FAILURE' are swapped. Also I'd rather
> use the spec wording to be accurate.

Yes. Fixed.

> 
>> + */
>> +enum iommufd_page_response_code {
>> +	IOMMUFD_PAGE_RESP_SUCCESS = 0,
>> +	IOMMUFD_PAGE_RESP_INVALID,
>> +	IOMMUFD_PAGE_RESP_FAILURE,
>> +};
>> +

Best regards,
baolu

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v5 5/9] iommufd: Add iommufd fault object
  2024-05-15  7:57       ` Tian, Kevin
@ 2024-05-20  0:41         ` Baolu Lu
  2024-05-20  3:26           ` Tian, Kevin
  0 siblings, 1 reply; 57+ messages in thread
From: Baolu Lu @ 2024-05-20  0:41 UTC (permalink / raw)
  To: Tian, Kevin, Jason Gunthorpe
  Cc: baolu.lu, Joerg Roedel, Will Deacon, Robin Murphy,
	Jean-Philippe Brucker, Nicolin Chen, Liu, Yi L, Jacob Pan,
	Joel Granados, iommu, virtualization, linux-kernel

On 5/15/24 3:57 PM, Tian, Kevin wrote:
>> From: Baolu Lu <baolu.lu@linux.intel.com>
>> Sent: Wednesday, May 8, 2024 6:05 PM
>>
>> On 2024/5/8 8:11, Jason Gunthorpe wrote:
>>> On Tue, Apr 30, 2024 at 10:57:06PM +0800, Lu Baolu wrote:
>>>> diff --git a/drivers/iommu/iommu-priv.h b/drivers/iommu/iommu-priv.h
>>>> index ae65e0b85d69..1a0450a83bd0 100644
>>>> --- a/drivers/iommu/iommu-priv.h
>>>> +++ b/drivers/iommu/iommu-priv.h
>>>> @@ -36,6 +36,10 @@ struct iommu_attach_handle {
>>>>    			struct device	*dev;
>>>>    			refcount_t	users;
>>>>    		};
>>>> +		/* attach data for IOMMUFD */
>>>> +		struct {
>>>> +			void		*idev;
>>>> +		};
>>> We can use a proper type here, just forward declare it.
>>>
>>> But this sequence in the other patch:
>>>
>>> +       ret = iommu_attach_group(hwpt->domain, idev->igroup->group);
>>> +       if (ret) {
>>> +               iommufd_fault_iopf_disable(idev);
>>> +               return ret;
>>> +       }
>>> +
>>> +       handle = iommu_attach_handle_get(idev->igroup->group, IOMMU_NO_PASID, 0);
>>> +       handle->idev = idev;
>>>
>>> Is why I was imagining the caller would allocate, because now we have
>>> the issue that a fault capable domain was installed into the IOMMU
>>> before it's handle could be fully setup, so we have a race where a
>>> fault could come in right between those things. Then what happens?
>>> I suppose we can retry the fault and by the time it comes back the
>>> race should resolve. A bit ugly I suppose.
>>
>> You are right. It makes more sense if the attached data is allocated and
>> managed by the caller. I will go in this direction and update my series.
>> I will also consider other review comments you have given in other
>> places.
>>
> 
> Does this direction imply a new iommu_attach_group_handle() helper
> to pass in the caller-allocated handle pointer or exposing a new
> iommu_group_set_handle() to set the handle to the group pasid_array
> and then having iommu_attach_group() to update the domain info in
> the handle?

I will add new iommu_attach/replace/detach_group_handle() helpers. Like
below:

+/**
+ * iommu_attach_group_handle - Attach an IOMMU domain to an IOMMU group
+ * @domain: IOMMU domain to attach
+ * @group: IOMMU group that will be attached
+ * @handle: attach handle
+ *
+ * Returns 0 on success and error code on failure.
+ *
+ * This is a variant of iommu_attach_group(). It allows the caller to provide
+ * an attach handle and use it when the domain is attached. This is currently
+ * only designed for IOMMUFD to deliver the I/O page faults.
+ */
+int iommu_attach_group_handle(struct iommu_domain *domain,
+                             struct iommu_group *group,
+                             struct iommu_attach_handle *handle)
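
A minimal sketch of what the body might look like (assuming the handle
is inserted into the pasid_array before the actual attach, so a fault
racing with the attach can find it):

int iommu_attach_group_handle(struct iommu_domain *domain,
			      struct iommu_group *group,
			      struct iommu_attach_handle *handle)
{
	int ret;

	if (handle)
		handle->domain = domain;

	mutex_lock(&group->mutex);
	ret = xa_insert(&group->pasid_array, IOMMU_NO_PASID, handle, GFP_KERNEL);
	if (ret)
		goto out_unlock;

	ret = __iommu_attach_group(domain, group);
	if (ret)
		xa_erase(&group->pasid_array, IOMMU_NO_PASID);
out_unlock:
	mutex_unlock(&group->mutex);
	return ret;
}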

Best regards,
baolu

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v5 5/9] iommufd: Add iommufd fault object
  2024-05-15  8:37   ` Tian, Kevin
@ 2024-05-20  1:15     ` Baolu Lu
  2024-05-20  1:24     ` Baolu Lu
                       ` (2 subsequent siblings)
  3 siblings, 0 replies; 57+ messages in thread
From: Baolu Lu @ 2024-05-20  1:15 UTC (permalink / raw)
  To: Tian, Kevin, Jason Gunthorpe, Joerg Roedel, Will Deacon,
	Robin Murphy, Jean-Philippe Brucker, Nicolin Chen, Liu, Yi L,
	Jacob Pan, Joel Granados
  Cc: baolu.lu, iommu, virtualization, linux-kernel

On 5/15/24 4:37 PM, Tian, Kevin wrote:
>> @@ -395,6 +396,8 @@ struct iommufd_device {
>>   	/* always the physical device */
>>   	struct device *dev;
>>   	bool enforce_cache_coherency;
>> +	/* outstanding faults awaiting response indexed by fault group id */
>> +	struct xarray faults;
> this...
> 
>> +struct iommufd_fault {
>> +	struct iommufd_object obj;
>> +	struct iommufd_ctx *ictx;
>> +	struct file *filep;
>> +
>> +	/* The lists of outstanding faults protected by below mutex. */
>> +	struct mutex mutex;
>> +	struct list_head deliver;
>> +	struct list_head response;
> ...and here worth a discussion.
> 
> First, the response list is not used. If the choice of queueing faults
> per device is kept, it should be removed.

You are right. I have removed the response list in the new version.

> 
> But I wonder whether it makes more sense to keep this response
> queue per fault object. Sounds simpler to me.
> 
> Also it's unclear why we need the response message to carry the
> same info as the request while only id/code/cookie are used.
> 
> +struct iommu_hwpt_page_response {
> +	__u32 size;
> +	__u32 flags;
> +	__u32 dev_id;
> +	__u32 pasid;
> +	__u32 grpid;
> +	__u32 code;
> +	__u32 cookie;
> +	__u32 reserved;
> +};
> 
> If we keep the response queue in the fault object, the response message
> only needs to carry size/flags/code/cookie and cookie can identify the
> pending message uniquely in the response queue.

It seems fine from the code's point of view. Let's wait and see whether
there are any concerns from the uAPI's perspective.

Best regards,
baolu

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v5 5/9] iommufd: Add iommufd fault object
  2024-05-15  8:37   ` Tian, Kevin
  2024-05-20  1:15     ` Baolu Lu
@ 2024-05-20  1:24     ` Baolu Lu
  2024-05-20  1:33     ` Baolu Lu
  2024-05-20  1:38     ` Baolu Lu
  3 siblings, 0 replies; 57+ messages in thread
From: Baolu Lu @ 2024-05-20  1:24 UTC (permalink / raw)
  To: Tian, Kevin, Jason Gunthorpe, Joerg Roedel, Will Deacon,
	Robin Murphy, Jean-Philippe Brucker, Nicolin Chen, Liu, Yi L,
	Jacob Pan, Joel Granados
  Cc: baolu.lu, iommu, virtualization, linux-kernel

On 5/15/24 4:37 PM, Tian, Kevin wrote:
>> +static ssize_t iommufd_fault_fops_write(struct file *filep, const char __user *buf,
>> +					size_t count, loff_t *ppos)
>> +{
>> +	size_t response_size = sizeof(struct iommu_hwpt_page_response);
>> +	struct iommufd_fault *fault = filep->private_data;
>> +	struct iommu_hwpt_page_response response;
>> +	struct iommufd_device *idev = NULL;
>> +	struct iopf_group *group;
>> +	size_t done = 0;
>> +	int rc;
>> +
>> +	if (*ppos || count % response_size)
>> +		return -ESPIPE;
>> +
>> +	mutex_lock(&fault->mutex);
>> +	while (count > done) {
>> +		rc = copy_from_user(&response, buf + done, response_size);
>> +		if (rc)
>> +			break;
>> +
>> +		if (!idev || idev->obj.id != response.dev_id)
>> +			idev = container_of(iommufd_get_object(fault->ictx,
>> +							       response.dev_id,
>> +							       IOMMUFD_OBJ_DEVICE),
>> +					    struct iommufd_device, obj);
>> +		if (IS_ERR(idev))
>> +			break;
>> +
>> +		group = xa_erase(&idev->faults, response.cookie);
>> +		if (!group)
>> +			break;
> is 'continue' better?

If we can't find a matched iopf group here, it means userspace provided
something wrong. The current logic is that we stop here and tell
userspace that only part of the faults have been responded to and it
should retry the remaining responses with the right message.

Best regards,
baolu

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v5 5/9] iommufd: Add iommufd fault object
  2024-05-15  8:37   ` Tian, Kevin
  2024-05-20  1:15     ` Baolu Lu
  2024-05-20  1:24     ` Baolu Lu
@ 2024-05-20  1:33     ` Baolu Lu
  2024-05-20  3:33       ` Tian, Kevin
  2024-05-20  1:38     ` Baolu Lu
  3 siblings, 1 reply; 57+ messages in thread
From: Baolu Lu @ 2024-05-20  1:33 UTC (permalink / raw)
  To: Tian, Kevin, Jason Gunthorpe, Joerg Roedel, Will Deacon,
	Robin Murphy, Jean-Philippe Brucker, Nicolin Chen, Liu, Yi L,
	Jacob Pan, Joel Granados
  Cc: baolu.lu, iommu, virtualization, linux-kernel

On 5/15/24 4:37 PM, Tian, Kevin wrote:
>> +
>> +		iopf_group_response(group, response.code);
> PCIe spec states that a Response Failure disables the PRI interface. For SR-IOV
> it'd be dangerous to allow the user to send such a code to a VF and close the
> entire shared PRI interface.
> 
> Just another example of the lack of coordination for shared capabilities
> between PF/VF. But exposing such a gap to userspace makes it worse.

Yes. You are right.

> 
> I guess we don't want to make this work depend on that cleanup. The
> minimal correct thing is to disallow attaching a VF to a fault-capable hwpt,
> with a note here that once we turn on support for VFs the Response Failure
> code should not be forwarded to the hardware. Instead it's an indication
> that the user cannot serve more requests, and such a situation waits for
> a vPRI reset to recover.

Is it the same thing to disallow PRI for VF in IOMMUFD?

Best regards,
baolu

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v5 5/9] iommufd: Add iommufd fault object
  2024-05-15  8:37   ` Tian, Kevin
                       ` (2 preceding siblings ...)
  2024-05-20  1:33     ` Baolu Lu
@ 2024-05-20  1:38     ` Baolu Lu
  3 siblings, 0 replies; 57+ messages in thread
From: Baolu Lu @ 2024-05-20  1:38 UTC (permalink / raw)
  To: Tian, Kevin, Jason Gunthorpe, Joerg Roedel, Will Deacon,
	Robin Murphy, Jean-Philippe Brucker, Nicolin Chen, Liu, Yi L,
	Jacob Pan, Joel Granados
  Cc: baolu.lu, iommu, virtualization, linux-kernel

On 5/15/24 4:37 PM, Tian, Kevin wrote:
>> +		iopf_free_group(group);
>> +		done += response_size;
>> +
>> +		iommufd_put_object(fault->ictx, &idev->obj);
> get/put is unpaired:
> 
> 		if (!idev || idev->obj.id != response.dev_id)
> 			idev = iommufd_get_object();
> 
> 		...
> 
> 		iommufd_put_object(idev);
> 
> The intention might be to reuse idev if multiple fault responses are
> for the same idev. But idev is put in each iteration, so the following
> messages will access idev without holding a reference.

Good catch. Let me fix it by putting the response queue in the fault
object.
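
Concretely, that might look like this (a sketch based on the discussion
above):

struct iommufd_fault {
	struct iommufd_object obj;
	struct iommufd_ctx *ictx;
	struct file *filep;

	/* Protects both the deliver list and the response index below. */
	struct mutex mutex;
	struct list_head deliver;
	/* Outstanding faults awaiting response, indexed by cookie. */
	struct xarray response;
};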

Best regards,
baolu

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v5 6/9] iommufd: Fault-capable hwpt attach/detach/replace
  2024-05-15  8:43   ` Tian, Kevin
@ 2024-05-20  2:10     ` Baolu Lu
  2024-05-20  3:35       ` Tian, Kevin
  0 siblings, 1 reply; 57+ messages in thread
From: Baolu Lu @ 2024-05-20  2:10 UTC (permalink / raw)
  To: Tian, Kevin, Jason Gunthorpe, Joerg Roedel, Will Deacon,
	Robin Murphy, Jean-Philippe Brucker, Nicolin Chen, Liu, Yi L,
	Jacob Pan, Joel Granados
  Cc: baolu.lu, iommu, virtualization, linux-kernel

On 5/15/24 4:43 PM, Tian, Kevin wrote:
>> From: Lu Baolu <baolu.lu@linux.intel.com>
>> Sent: Tuesday, April 30, 2024 10:57 PM
>> +
>> +int iommufd_fault_domain_replace_dev(struct iommufd_device *idev,
>> +				     struct iommufd_hw_pagetable *hwpt,
>> +				     struct iommufd_hw_pagetable *old)
>> +{
>> +	struct iommu_attach_handle *handle;
>> +	int ret;
>> +
>> +	if (hwpt->fault)
>> +		ret = iommufd_fault_iopf_enable(idev);
>> +	else
>> +		iommufd_fault_iopf_disable(idev);
>> +
>> +	ret = iommu_group_replace_domain(idev->igroup->group, hwpt->domain);
>> +	if (ret)
>> +		goto out_cleanup;
>> +
>> +	iommufd_auto_response_faults(old, idev);
>> +	handle = iommu_attach_handle_get(idev->igroup->group, IOMMU_NO_PASID, 0);
>> +	handle->idev = idev;
> 
> why is auto response required in replace? new requests can come
> after the auto response anyway...
> 
> The user should prepare for faults delivered to the old or new hwpt
> in the transition window.

The current design of replace allows switching between a hwpt that is not
IOPF-capable and one that is. This implies that if we switch from an
IOPF-capable hwpt to a non-IOPF-capable one, the outstanding faults need
to be auto-responded.

Best regards,
baolu

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v5 7/9] iommufd: Associate fault object with iommufd_hw_pgtable
  2024-05-15  8:50   ` Tian, Kevin
@ 2024-05-20  2:18     ` Baolu Lu
  2024-05-20  3:39       ` Tian, Kevin
  0 siblings, 1 reply; 57+ messages in thread
From: Baolu Lu @ 2024-05-20  2:18 UTC (permalink / raw)
  To: Tian, Kevin, Jason Gunthorpe, Joerg Roedel, Will Deacon,
	Robin Murphy, Jean-Philippe Brucker, Nicolin Chen, Liu, Yi L,
	Jacob Pan, Joel Granados
  Cc: baolu.lu, iommu, virtualization, linux-kernel

On 5/15/24 4:50 PM, Tian, Kevin wrote:
>> From: Lu Baolu <baolu.lu@linux.intel.com>
>> Sent: Tuesday, April 30, 2024 10:57 PM
>>
>> @@ -227,7 +233,7 @@ iommufd_hwpt_nested_alloc(struct iommufd_ctx *ictx,
>>   	refcount_inc(&parent->common.obj.users);
>>   	hwpt_nested->parent = parent;
>>
>> -	hwpt->domain = ops->domain_alloc_user(idev->dev, flags,
>> +	hwpt->domain = ops->domain_alloc_user(idev->dev, 0,
>>   					      parent->common.domain, user_data);
> 
> it reads slightly better to clear the fault bit and still pass in flags.
> 

Done.

-       hwpt->domain = ops->domain_alloc_user(idev->dev, 0,
+       hwpt->domain = ops->domain_alloc_user(idev->dev,
+                                             flags & ~IOMMU_HWPT_FAULT_ID_VALID,
                                               parent->common.domain, user_data);

>> @@ -308,6 +314,19 @@ int iommufd_hwpt_alloc(struct iommufd_ucmd *ucmd)
>>   		goto out_put_pt;
>>   	}
>>
>> +	if (cmd->flags & IOMMU_HWPT_FAULT_ID_VALID) {
>> +		struct iommufd_fault *fault;
>> +
>> +		fault = iommufd_get_fault(ucmd, cmd->fault_id);
>> +		if (IS_ERR(fault)) {
>> +			rc = PTR_ERR(fault);
>> +			goto out_hwpt;
>> +		}
>> +		hwpt->fault = fault;
>> +		hwpt->domain->iopf_handler = iommufd_fault_iopf_handler;
>> +		hwpt->domain->fault_data = hwpt;
>> +	}
> 
> this is nesting specific. why not move it to nested_alloc()?

Nesting is currently a use case for userspace I/O page faults, but this
design should be general enough to support other scenarios as well.

Best regards,
baolu

^ permalink raw reply	[flat|nested] 57+ messages in thread

* RE: [PATCH v5 2/9] iommu: Replace sva_iommu with iommu_attach_handle
  2024-05-19 10:14     ` Baolu Lu
@ 2024-05-20  3:18       ` Tian, Kevin
  0 siblings, 0 replies; 57+ messages in thread
From: Tian, Kevin @ 2024-05-20  3:18 UTC (permalink / raw)
  To: Baolu Lu, Jason Gunthorpe, Joerg Roedel, Will Deacon,
	Robin Murphy, Jean-Philippe Brucker, Nicolin Chen, Liu, Yi L,
	Jacob Pan, Joel Granados
  Cc: iommu, virtualization, linux-kernel

> From: Baolu Lu <baolu.lu@linux.intel.com>
> Sent: Sunday, May 19, 2024 6:14 PM
> 
> On 5/15/24 3:21 PM, Tian, Kevin wrote:
> >> From: Lu Baolu <baolu.lu@linux.intel.com>
> >> Sent: Tuesday, April 30, 2024 10:57 PM
> >>
> >>   #else
> >> -static inline struct iommu_sva *
> >> +static inline struct iommu_attach_handle *
> >>   iommu_sva_bind_device(struct device *dev, struct mm_struct *mm)
> >>   {
> >> -	return NULL;
> >> +	return ERR_PTR(-ENODEV);
> >>   }
> >>
> >
> > this should be a separate fix.
> 
> Yes. It could be a fix.
> 
> >
> > existing drivers (idxd and uacce) only check IS_ERR() on the return
> > value. A NULL pointer may lead to an error reported at a later point.
> 
> Though I don't think it could cause any real problem because this
> interface should always be called after the sva enabling one.
> 

correct.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* RE: [PATCH v5 3/9] iommu: Add attachment handle to struct iopf_group
  2024-05-19 14:03     ` Baolu Lu
@ 2024-05-20  3:20       ` Tian, Kevin
  0 siblings, 0 replies; 57+ messages in thread
From: Tian, Kevin @ 2024-05-20  3:20 UTC (permalink / raw)
  To: Baolu Lu, Jason Gunthorpe, Joerg Roedel, Will Deacon,
	Robin Murphy, Jean-Philippe Brucker, Nicolin Chen, Liu, Yi L,
	Jacob Pan, Joel Granados
  Cc: iommu, virtualization, linux-kernel

> From: Baolu Lu <baolu.lu@linux.intel.com>
> Sent: Sunday, May 19, 2024 10:04 PM
> 
> On 2024/5/15 15:31, Tian, Kevin wrote:
> >> From: Lu Baolu <baolu.lu@linux.intel.com>
> >> Sent: Tuesday, April 30, 2024 10:57 PM
> >>
> >> +	handle = iommu_attach_handle_get(dev->iommu_group, pasid, 0);
> >> +	if (IS_ERR(handle))
> >> +		return PTR_ERR(handle);
> >>
> >> -	if (!domain || !domain->iopf_handler) {
> >> -		dev_warn_ratelimited(dev,
> >> -			"iopf (pasid %d) without domain attached or handler
> >> installed\n",
> >> -			 fault->prm.pasid);
> >> +	group->attach_handle = handle;
> >> +	group->domain = handle->domain;
> >
> > this change also removes the warning message. Is it desired?
> 
> Not desired.
> 
> Perhaps we can add a message when the iopf handling is aborted.
> Something like below:
> 
> err_abort:
> +        dev_warn_ratelimited(dev, "iopf with pasid %d aborted\n",
> +                             fault->prm.pasid);
>          iopf_group_response(group, IOMMU_PAGE_RESP_FAILURE);
> 
> Though I am not sure which is better, dev_warn() or dev_info().
> 

yes this works. dev_warn() is fine as long as we don't expect it to
happen in sane cases.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* RE: [PATCH v5 4/9] iommufd: Add fault and response message definitions
  2024-05-19 14:37     ` Baolu Lu
@ 2024-05-20  3:24       ` Tian, Kevin
  2024-05-20  3:33         ` Baolu Lu
  0 siblings, 1 reply; 57+ messages in thread
From: Tian, Kevin @ 2024-05-20  3:24 UTC (permalink / raw)
  To: Baolu Lu, Jason Gunthorpe, Joerg Roedel, Will Deacon,
	Robin Murphy, Jean-Philippe Brucker, Nicolin Chen, Liu, Yi L,
	Jacob Pan, Joel Granados
  Cc: iommu, virtualization, linux-kernel

> From: Baolu Lu <baolu.lu@linux.intel.com>
> Sent: Sunday, May 19, 2024 10:38 PM
> 
> On 2024/5/15 15:43, Tian, Kevin wrote:
> >> From: Lu Baolu <baolu.lu@linux.intel.com>
> >> Sent: Tuesday, April 30, 2024 10:57 PM
> >>
> >> iommu_hwpt_pgfaults represent fault messages that the userspace can
> >> retrieve. Multiple iommu_hwpt_pgfaults might be put in an iopf group,
> >> with the IOMMU_PGFAULT_FLAGS_LAST_PAGE flag set only for the last
> >> iommu_hwpt_pgfault.
> >
> > Do you envision extending the same structure to report unrecoverable
> > faults in the future?
> 
> I am not envisioning extending this to report unrecoverable faults in
> the future. The unrecoverable faults are not always related to a hwpt,
> and therefore it's more suitable to route them through a viommu object
> which is under discussion in Nicolin's series.

OK, I'll take a look at that series when I reach it in my TODO list. 😊

> >> + * @length: a hint of how much data the requestor is expecting to fetch. For
> >> + *          example, if the PRI initiator knows it is going to do a 10MB
> >> + *          transfer, it could fill in 10MB and the OS could pre-fault in
> >> + *          10MB of IOVA. It's default to 0 if there's no such hint.
> >
> > This is not clear to me, and I don't remember the PCIe spec defining such
> > a mechanism.
> 
> This came up in a previous discussion. While it's not currently part of

Can you provide a link to that discussion?

> the PCI specification and may not be in the future, we'd like to add
> this mechanism for potential future advanced device features as it
> offers significant optimization benefits.
> 

We design uAPI for real usage. It's a bit weird to introduce a format
for unknown future features w/o an actual user to demonstrate its
correctness.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* RE: [PATCH v5 5/9] iommufd: Add iommufd fault object
  2024-05-20  0:41         ` Baolu Lu
@ 2024-05-20  3:26           ` Tian, Kevin
  2024-05-20  3:28             ` Baolu Lu
  0 siblings, 1 reply; 57+ messages in thread
From: Tian, Kevin @ 2024-05-20  3:26 UTC (permalink / raw)
  To: Baolu Lu, Jason Gunthorpe
  Cc: Joerg Roedel, Will Deacon, Robin Murphy, Jean-Philippe Brucker,
	Nicolin Chen, Liu, Yi L, Jacob Pan, Joel Granados, iommu,
	virtualization, linux-kernel

> From: Baolu Lu <baolu.lu@linux.intel.com>
> Sent: Monday, May 20, 2024 8:41 AM
> 
> On 5/15/24 3:57 PM, Tian, Kevin wrote:
> >> From: Baolu Lu <baolu.lu@linux.intel.com>
> >> Sent: Wednesday, May 8, 2024 6:05 PM
> >>
> >> On 2024/5/8 8:11, Jason Gunthorpe wrote:
> >>> On Tue, Apr 30, 2024 at 10:57:06PM +0800, Lu Baolu wrote:
> >>>> diff --git a/drivers/iommu/iommu-priv.h b/drivers/iommu/iommu-priv.h
> >>>> index ae65e0b85d69..1a0450a83bd0 100644
> >>>> --- a/drivers/iommu/iommu-priv.h
> >>>> +++ b/drivers/iommu/iommu-priv.h
> >>>> @@ -36,6 +36,10 @@ struct iommu_attach_handle {
> >>>>    			struct device	*dev;
> >>>>    			refcount_t	users;
> >>>>    		};
> >>>> +		/* attach data for IOMMUFD */
> >>>> +		struct {
> >>>> +			void		*idev;
> >>>> +		};
> >>> We can use a proper type here, just forward declare it.
> >>>
> >>> But this sequence in the other patch:
> >>>
> >>> +       ret = iommu_attach_group(hwpt->domain, idev->igroup->group);
> >>> +       if (ret) {
> >>> +               iommufd_fault_iopf_disable(idev);
> >>> +               return ret;
> >>> +       }
> >>> +
> >>> +       handle = iommu_attach_handle_get(idev->igroup->group, IOMMU_NO_PASID, 0);
> >>> +       handle->idev = idev;
> >>>
> >>> Is why I was imagining the caller would allocate, because now we have
> >>> the issue that a fault capable domain was installed into the IOMMU
> >>> before it's handle could be fully setup, so we have a race where a
> >>> fault could come in right between those things. Then what happens?
> >>> I suppose we can retry the fault and by the time it comes back the
> >>> race should resolve. A bit ugly I suppose.
> >>
> >> You are right. It makes more sense if the attached data is allocated and
> >> managed by the caller. I will go in this direction and update my series.
> >> I will also consider other review comments you have given in other
> >> places.
> >>
> >
> > Does this direction imply a new iommu_attach_group_handle() helper
> > to pass in the caller-allocated handle pointer or exposing a new
> > iommu_group_set_handle() to set the handle to the group pasid_array
> > and then having iommu_attach_group() to update the domain info in
> > the handle?
> 
> I will add new iommu_attach/replace/detach_group_handle() helpers. Like
> below:
> 
> +/**
> + * iommu_attach_group_handle - Attach an IOMMU domain to an IOMMU group
> + * @domain: IOMMU domain to attach
> + * @group: IOMMU group that will be attached
> + * @handle: attach handle
> + *
> + * Returns 0 on success and error code on failure.
> + *
> + * This is a variant of iommu_attach_group(). It allows the caller to provide
> + * an attach handle and use it when the domain is attached. This is currently
> + * only designed for IOMMUFD to deliver the I/O page faults.
> + */
> +int iommu_attach_group_handle(struct iommu_domain *domain,
> +                             struct iommu_group *group,
> +                             struct iommu_attach_handle *handle)
> 

"currently only designed for IOMMUFD" doesn't sound correct.

design-wise this can be used by anyone who relies on the handle.
There is nothing tied to IOMMUFD.

s/designed for/used by/ is more accurate.



^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v5 5/9] iommufd: Add iommufd fault object
  2024-05-20  3:26           ` Tian, Kevin
@ 2024-05-20  3:28             ` Baolu Lu
  0 siblings, 0 replies; 57+ messages in thread
From: Baolu Lu @ 2024-05-20  3:28 UTC (permalink / raw)
  To: Tian, Kevin, Jason Gunthorpe
  Cc: baolu.lu, Joerg Roedel, Will Deacon, Robin Murphy,
	Jean-Philippe Brucker, Nicolin Chen, Liu, Yi L, Jacob Pan,
	Joel Granados, iommu, virtualization, linux-kernel

On 5/20/24 11:26 AM, Tian, Kevin wrote:
>> From: Baolu Lu <baolu.lu@linux.intel.com>
>> Sent: Monday, May 20, 2024 8:41 AM
>>
>> On 5/15/24 3:57 PM, Tian, Kevin wrote:
>>>> From: Baolu Lu <baolu.lu@linux.intel.com>
>>>> Sent: Wednesday, May 8, 2024 6:05 PM
>>>>
>>>> On 2024/5/8 8:11, Jason Gunthorpe wrote:
>>>>> On Tue, Apr 30, 2024 at 10:57:06PM +0800, Lu Baolu wrote:
>>>>>> diff --git a/drivers/iommu/iommu-priv.h b/drivers/iommu/iommu-priv.h
>>>>>> index ae65e0b85d69..1a0450a83bd0 100644
>>>>>> --- a/drivers/iommu/iommu-priv.h
>>>>>> +++ b/drivers/iommu/iommu-priv.h
>>>>>> @@ -36,6 +36,10 @@ struct iommu_attach_handle {
>>>>>>     			struct device	*dev;
>>>>>>     			refcount_t	users;
>>>>>>     		};
>>>>>> +		/* attach data for IOMMUFD */
>>>>>> +		struct {
>>>>>> +			void		*idev;
>>>>>> +		};
>>>>> We can use a proper type here, just forward declare it.
>>>>>
>>>>> But this sequence in the other patch:
>>>>>
>>>>> +       ret = iommu_attach_group(hwpt->domain, idev->igroup->group);
>>>>> +       if (ret) {
>>>>> +               iommufd_fault_iopf_disable(idev);
>>>>> +               return ret;
>>>>> +       }
>>>>> +
>>>>> +       handle = iommu_attach_handle_get(idev->igroup->group, IOMMU_NO_PASID, 0);
>>>>> +       handle->idev = idev;
>>>>>
>>>>> Is why I was imagining the caller would allocate, because now we have
>>>>> the issue that a fault capable domain was installed into the IOMMU
>>>>> before it's handle could be fully setup, so we have a race where a
>>>>> fault could come in right between those things. Then what happens?
>>>>> I suppose we can retry the fault and by the time it comes back the
>>>>> race should resolve. A bit ugly I suppose.
>>>>
>>>> You are right. It makes more sense if the attached data is allocated and
>>>> managed by the caller. I will go in this direction and update my series.
>>>> I will also consider other review comments you have given in other
>>>> places.
>>>>
>>>
>>> Does this direction imply a new iommu_attach_group_handle() helper
>>> to pass in the caller-allocated handle pointer or exposing a new
>>> iommu_group_set_handle() to set the handle to the group pasid_array
>>> and then having iommu_attach_group() to update the domain info in
>>> the handle?
>>
>> I will add new iommu_attach/replace/detach_group_handle() helpers. Like
>> below:
>>
>> +/**
>> + * iommu_attach_group_handle - Attach an IOMMU domain to an IOMMU group
>> + * @domain: IOMMU domain to attach
>> + * @group: IOMMU group that will be attached
>> + * @handle: attach handle
>> + *
>> + * Returns 0 on success and error code on failure.
>> + *
>> + * This is a variant of iommu_attach_group(). It allows the caller to provide
>> + * an attach handle and use it when the domain is attached. This is currently
>> + * only designed for IOMMUFD to deliver the I/O page faults.
>> + */
>> +int iommu_attach_group_handle(struct iommu_domain *domain,
>> +                             struct iommu_group *group,
>> +                             struct iommu_attach_handle *handle)
>>
> 
> "currently only designed for IOMMUFD" doesn't sound correct.
> 
> design-wise this can be used by anyone who relies on the handle.
> There is nothing tied to IOMMUFD.
> 
> s/designed for/used by/ is more accurate.

Done.

Best regards,
baolu

^ permalink raw reply	[flat|nested] 57+ messages in thread

* RE: [PATCH v5 5/9] iommufd: Add iommufd fault object
  2024-05-20  1:33     ` Baolu Lu
@ 2024-05-20  3:33       ` Tian, Kevin
  0 siblings, 0 replies; 57+ messages in thread
From: Tian, Kevin @ 2024-05-20  3:33 UTC (permalink / raw)
  To: Baolu Lu, Jason Gunthorpe, Joerg Roedel, Will Deacon,
	Robin Murphy, Jean-Philippe Brucker, Nicolin Chen, Liu, Yi L,
	Jacob Pan, Joel Granados
  Cc: iommu, virtualization, linux-kernel

> From: Baolu Lu <baolu.lu@linux.intel.com>
> Sent: Monday, May 20, 2024 9:34 AM
> 
> On 5/15/24 4:37 PM, Tian, Kevin wrote:
> >> +
> >> +		iopf_group_response(group, response.code);
> > PCIe spec states that a Response Failure disables the PRI interface. For SR-IOV
> > it'd be dangerous to allow the user to send such a code to a VF and close the
> > entire shared PRI interface.
> >
> > Just another example of the lack of coordination for shared capabilities
> > between PF/VF. But exposing such a gap to userspace makes it worse.
> 
> Yes. You are right.
> 
> >
> > I guess we don't want to make this work depend on that cleanup. The
> > minimal correct thing is to disallow attaching a VF to a fault-capable hwpt,
> > with a note here that once we turn on support for VFs the Response Failure
> > code should not be forwarded to the hardware. Instead it's an indication
> > that the user cannot serve more requests, and such a situation waits for
> > a vPRI reset to recover.
> 
> Is it the same thing to disallow PRI for VF in IOMMUFD?
> 

yes

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v5 4/9] iommufd: Add fault and response message definitions
  2024-05-20  3:24       ` Tian, Kevin
@ 2024-05-20  3:33         ` Baolu Lu
  2024-05-20  4:59           ` Tian, Kevin
  0 siblings, 1 reply; 57+ messages in thread
From: Baolu Lu @ 2024-05-20  3:33 UTC (permalink / raw)
  To: Tian, Kevin, Jason Gunthorpe, Joerg Roedel, Will Deacon,
	Robin Murphy, Jean-Philippe Brucker, Nicolin Chen, Liu, Yi L,
	Jacob Pan, Joel Granados
  Cc: baolu.lu, iommu, virtualization, linux-kernel

On 5/20/24 11:24 AM, Tian, Kevin wrote:
>> From: Baolu Lu <baolu.lu@linux.intel.com>
>> Sent: Sunday, May 19, 2024 10:38 PM
>>
>> On 2024/5/15 15:43, Tian, Kevin wrote:
>>>> From: Lu Baolu <baolu.lu@linux.intel.com>
>>>> Sent: Tuesday, April 30, 2024 10:57 PM
>>>>
>>>> iommu_hwpt_pgfaults represent fault messages that the userspace can
>>>> retrieve. Multiple iommu_hwpt_pgfaults might be put in an iopf group,
>>>> with the IOMMU_PGFAULT_FLAGS_LAST_PAGE flag set only for the last
>>>> iommu_hwpt_pgfault.
>>>
>>> Do you envision extending the same structure to report unrecoverable
>>> faults in the future?
>>
>> I am not envisioning extending this to report unrecoverable faults in
>> the future. The unrecoverable faults are not always related to a hwpt,
>> and therefore it's more suitable to route them through a viommu object
>> which is under discussion in Nicolin's series.
> 
> OK, I'll take a look at that series when I reach it in my TODO list. 😊
> 
>>>> + * @length: a hint of how much data the requestor is expecting to fetch. For
>>>> + *          example, if the PRI initiator knows it is going to do a 10MB
>>>> + *          transfer, it could fill in 10MB and the OS could pre-fault in
>>>> + *          10MB of IOVA. It's default to 0 if there's no such hint.
>>>
>>> This is not clear to me, and I don't remember the PCIe spec defining such
>>> a mechanism.
>>
>> This came up in a previous discussion. While it's not currently part of
> 
> Can you provide a link to that discussion?

https://lore.kernel.org/linux-iommu/20240322170410.GH66976@ziepe.ca/

> 
>> the PCI specification and may not be in the future, we'd like to add
>> this mechanism for potential future advanced device features as it
>> offers significant optimization benefits.
>>
> 
> We design uAPIs for real usage. It's a bit weird to introduce a format
> for unknown future features w/o an actual user to demonstrate its
> correctness.

Best regards,
baolu
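
For reference, a minimal userspace sketch of consuming the fault FD with
the @length hint, assuming the uAPI proposed in this series (struct
iommu_hwpt_pgfault carrying flags/pasid/addr/length and the
IOMMU_PGFAULT_FLAGS_LAST_PAGE flag); error handling is elided:

#include <stdio.h>
#include <unistd.h>
#include <linux/iommufd.h>

static void drain_fault_group(int fault_fd)
{
	struct iommu_hwpt_pgfault fault;

	/* each read() returns one fault message; a group ends at LAST_PAGE */
	while (read(fault_fd, &fault, sizeof(fault)) == sizeof(fault)) {
		/* a zero length means the initiator gave no size hint */
		unsigned long span = fault.length ? fault.length : 4096;

		printf("pasid %u addr 0x%llx pre-fault %lu bytes\n",
		       fault.pasid, (unsigned long long)fault.addr, span);

		if (fault.flags & IOMMU_PGFAULT_FLAGS_LAST_PAGE)
			break;
	}
}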

* RE: [PATCH v5 6/9] iommufd: Fault-capable hwpt attach/detach/replace
  2024-05-20  2:10     ` Baolu Lu
@ 2024-05-20  3:35       ` Tian, Kevin
  2024-05-20  3:55         ` Baolu Lu
  0 siblings, 1 reply; 57+ messages in thread
From: Tian, Kevin @ 2024-05-20  3:35 UTC (permalink / raw)
  To: Baolu Lu, Jason Gunthorpe, Joerg Roedel, Will Deacon,
	Robin Murphy, Jean-Philippe Brucker, Nicolin Chen, Liu, Yi L,
	Jacob Pan, Joel Granados
  Cc: iommu, virtualization, linux-kernel

> From: Baolu Lu <baolu.lu@linux.intel.com>
> Sent: Monday, May 20, 2024 10:10 AM
> 
> On 5/15/24 4:43 PM, Tian, Kevin wrote:
> >> From: Lu Baolu <baolu.lu@linux.intel.com>
> >> Sent: Tuesday, April 30, 2024 10:57 PM
> >> +
> >> +int iommufd_fault_domain_replace_dev(struct iommufd_device *idev,
> >> +				     struct iommufd_hw_pagetable *hwpt,
> >> +				     struct iommufd_hw_pagetable *old)
> >> +{
> >> +	struct iommu_attach_handle *handle;
> >> +	int ret;
> >> +
> >> +	if (hwpt->fault)
> >> +		ret = iommufd_fault_iopf_enable(idev);
> >> +	else
> >> +		iommufd_fault_iopf_disable(idev);
> >> +
> >> +	ret = iommu_group_replace_domain(idev->igroup->group, hwpt->domain);
> >> +	if (ret)
> >> +		goto out_cleanup;
> >> +
> >> +	iommufd_auto_response_faults(old, idev);
> >> +	handle = iommu_attach_handle_get(idev->igroup->group, IOMMU_NO_PASID, 0);
> >> +	handle->idev = idev;
> >
> > why is auto response required in replace? new requests can come
> > after the auto response anyway...
> >
> > The user should prepare for faults delivered to the old or new hwpt
> > in the transition window.
> 
> The current design of replace allows switching between a hwpt that is
> not IOPF-capable and one that is. This implies that if we switch from an
> IOPF-capable hwpt to a non-IOPF-capable one, outstanding faults in the
> response queue need to be auto-responded.
> 

then do it only for that scenario?

* RE: [PATCH v5 7/9] iommufd: Associate fault object with iommufd_hw_pgtable
  2024-05-20  2:18     ` Baolu Lu
@ 2024-05-20  3:39       ` Tian, Kevin
  2024-05-20  4:00         ` Baolu Lu
  0 siblings, 1 reply; 57+ messages in thread
From: Tian, Kevin @ 2024-05-20  3:39 UTC (permalink / raw)
  To: Baolu Lu, Jason Gunthorpe, Joerg Roedel, Will Deacon,
	Robin Murphy, Jean-Philippe Brucker, Nicolin Chen, Liu, Yi L,
	Jacob Pan, Joel Granados
  Cc: iommu, virtualization, linux-kernel

> From: Baolu Lu <baolu.lu@linux.intel.com>
> Sent: Monday, May 20, 2024 10:19 AM
> 
> On 5/15/24 4:50 PM, Tian, Kevin wrote:
> >> From: Lu Baolu <baolu.lu@linux.intel.com>
> >> Sent: Tuesday, April 30, 2024 10:57 PM
> >>
> >> @@ -308,6 +314,19 @@ int iommufd_hwpt_alloc(struct iommufd_ucmd *ucmd)
> >>   		goto out_put_pt;
> >>   	}
> >>
> >> +	if (cmd->flags & IOMMU_HWPT_FAULT_ID_VALID) {
> >> +		struct iommufd_fault *fault;
> >> +
> >> +		fault = iommufd_get_fault(ucmd, cmd->fault_id);
> >> +		if (IS_ERR(fault)) {
> >> +			rc = PTR_ERR(fault);
> >> +			goto out_hwpt;
> >> +		}
> >> +		hwpt->fault = fault;
> >> +		hwpt->domain->iopf_handler = iommufd_fault_iopf_handler;
> >> +		hwpt->domain->fault_data = hwpt;
> >> +	}
> >
> > This is nesting specific. Why not move it to nested_alloc()?
> 
> Nesting is currently a use case for userspace I/O page faults, but this
> design should be general enough to support other scenarios as well.
> 

Do we allow a user page table w/o nesting?

What would be a scenario in which the user doesn't manage the
page table but still wants to handle the I/O page fault? The fault
should always be delivered to the owner managing the page table...

* Re: [PATCH v5 6/9] iommufd: Fault-capable hwpt attach/detach/replace
  2024-05-20  3:35       ` Tian, Kevin
@ 2024-05-20  3:55         ` Baolu Lu
  0 siblings, 0 replies; 57+ messages in thread
From: Baolu Lu @ 2024-05-20  3:55 UTC (permalink / raw)
  To: Tian, Kevin, Jason Gunthorpe, Joerg Roedel, Will Deacon,
	Robin Murphy, Jean-Philippe Brucker, Nicolin Chen, Liu, Yi L,
	Jacob Pan, Joel Granados
  Cc: baolu.lu, iommu, virtualization, linux-kernel

On 5/20/24 11:35 AM, Tian, Kevin wrote:
>> From: Baolu Lu <baolu.lu@linux.intel.com>
>> Sent: Monday, May 20, 2024 10:10 AM
>>
>> On 5/15/24 4:43 PM, Tian, Kevin wrote:
>>>> From: Lu Baolu <baolu.lu@linux.intel.com>
>>>> Sent: Tuesday, April 30, 2024 10:57 PM
>>>> +
>>>> +int iommufd_fault_domain_replace_dev(struct iommufd_device *idev,
>>>> +				     struct iommufd_hw_pagetable *hwpt,
>>>> +				     struct iommufd_hw_pagetable *old)
>>>> +{
>>>> +	struct iommu_attach_handle *handle;
>>>> +	int ret;
>>>> +
>>>> +	if (hwpt->fault)
>>>> +		ret = iommufd_fault_iopf_enable(idev);
>>>> +	else
>>>> +		iommufd_fault_iopf_disable(idev);
>>>> +
>>>> +	ret = iommu_group_replace_domain(idev->igroup->group, hwpt->domain);
>>>> +	if (ret)
>>>> +		goto out_cleanup;
>>>> +
>>>> +	iommufd_auto_response_faults(old, idev);
>>>> +	handle = iommu_attach_handle_get(idev->igroup->group, IOMMU_NO_PASID, 0);
>>>> +	handle->idev = idev;
>>>
>>> why is auto response required in replace? new requests can come
>>> after the auto response anyway...
>>>
>>> The user should prepare for faults delivered to the old or new hwpt
>>> in the transition window.
>>
>> The current design of replace allows switching between a hwpt that is
>> not IOPF-capable and one that is. This implies that if we switch from an
>> IOPF-capable hwpt to a non-IOPF-capable one, outstanding faults in the
>> response queue need to be auto-responded.
>>
> 
> then do it only for that scenario?

Yes. Will do this in the new version.

Best regards,
baolu
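
A sketch of the agreed change, assuming the replace path keeps the
shape quoted above, with hwpt->fault marking a fault-capable hwpt:

	/*
	 * Only flush the response queue when moving away from a
	 * fault-capable hwpt; a fault-capable new hwpt can still accept
	 * responses for the outstanding faults.
	 */
	if (old->fault && !hwpt->fault)
		iommufd_auto_response_faults(old, idev);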

* Re: [PATCH v5 7/9] iommufd: Associate fault object with iommufd_hw_pgtable
  2024-05-20  3:39       ` Tian, Kevin
@ 2024-05-20  4:00         ` Baolu Lu
  0 siblings, 0 replies; 57+ messages in thread
From: Baolu Lu @ 2024-05-20  4:00 UTC (permalink / raw)
  To: Tian, Kevin, Jason Gunthorpe, Joerg Roedel, Will Deacon,
	Robin Murphy, Jean-Philippe Brucker, Nicolin Chen, Liu, Yi L,
	Jacob Pan, Joel Granados
  Cc: baolu.lu, iommu, virtualization, linux-kernel

On 5/20/24 11:39 AM, Tian, Kevin wrote:
>> From: Baolu Lu <baolu.lu@linux.intel.com>
>> Sent: Monday, May 20, 2024 10:19 AM
>>
>> On 5/15/24 4:50 PM, Tian, Kevin wrote:
>>>> From: Lu Baolu <baolu.lu@linux.intel.com>
>>>> Sent: Tuesday, April 30, 2024 10:57 PM
>>>>
>>>> @@ -308,6 +314,19 @@ int iommufd_hwpt_alloc(struct iommufd_ucmd *ucmd)
>>>>    		goto out_put_pt;
>>>>    	}
>>>>
>>>> +	if (cmd->flags & IOMMU_HWPT_FAULT_ID_VALID) {
>>>> +		struct iommufd_fault *fault;
>>>> +
>>>> +		fault = iommufd_get_fault(ucmd, cmd->fault_id);
>>>> +		if (IS_ERR(fault)) {
>>>> +			rc = PTR_ERR(fault);
>>>> +			goto out_hwpt;
>>>> +		}
>>>> +		hwpt->fault = fault;
>>>> +		hwpt->domain->iopf_handler = iommufd_fault_iopf_handler;
>>>> +		hwpt->domain->fault_data = hwpt;
>>>> +	}
>>>
>>> This is nesting specific. Why not move it to nested_alloc()?
>>
>> Nesting is currently a use case for userspace I/O page faults, but this
>> design should be general enough to support other scenarios as well.
>>
> 
> Do we allow a user page table w/o nesting?
> 
> What would be a scenario in which the user doesn't manage the
> page table but still wants to handle the I/O page fault? The fault
> should always be delivered to the owner managing the page table...

I am not sure. But if nesting is the only case for a user page table,
it's fine to move the above code to the nested_alloc helper.

Best regards,
baolu
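
Under that conclusion, the fault binding would simply move out of
iommufd_hwpt_alloc(); a sketch, reusing the names from the quoted hunk
and assuming a nested-alloc helper along the lines of
iommufd_hwpt_nested_alloc():

	/*
	 * Inside the nested-alloc path only, since only a user-managed
	 * (nested) page table delivers its faults to userspace.
	 */
	if (cmd->flags & IOMMU_HWPT_FAULT_ID_VALID) {
		struct iommufd_fault *fault;

		fault = iommufd_get_fault(ucmd, cmd->fault_id);
		if (IS_ERR(fault)) {
			rc = PTR_ERR(fault);
			goto out_hwpt;
		}
		hwpt->fault = fault;
		hwpt->domain->iopf_handler = iommufd_fault_iopf_handler;
		hwpt->domain->fault_data = hwpt;
	}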

* RE: [PATCH v5 4/9] iommufd: Add fault and response message definitions
  2024-05-20  3:33         ` Baolu Lu
@ 2024-05-20  4:59           ` Tian, Kevin
  0 siblings, 0 replies; 57+ messages in thread
From: Tian, Kevin @ 2024-05-20  4:59 UTC (permalink / raw)
  To: Baolu Lu, Jason Gunthorpe, Joerg Roedel, Will Deacon,
	Robin Murphy, Jean-Philippe Brucker, Nicolin Chen, Liu, Yi L,
	Jacob Pan, Joel Granados
  Cc: iommu, virtualization, linux-kernel

> From: Baolu Lu <baolu.lu@linux.intel.com>
> Sent: Monday, May 20, 2024 11:33 AM
> 
> On 5/20/24 11:24 AM, Tian, Kevin wrote:
> >> From: Baolu Lu <baolu.lu@linux.intel.com>
> >> Sent: Sunday, May 19, 2024 10:38 PM
> >>
> >> On 2024/5/15 15:43, Tian, Kevin wrote:
> >>>> From: Lu Baolu <baolu.lu@linux.intel.com>
> >>>> Sent: Tuesday, April 30, 2024 10:57 PM
> >>>>
> >>>> + * @length: a hint of how much data the requestor is expecting to
> >>>> + *          fetch. For example, if the PRI initiator knows it is going
> >>>> + *          to do a 10MB transfer, it could fill in 10MB and the OS
> >>>> + *          could pre-fault in 10MB of IOVA. It defaults to 0 if
> >>>> + *          there's no such hint.
> >>>
> >>> This is not clear to me, and I don't remember the PCIe spec defining
> >>> such a mechanism.
> >>
> >> This came up in a previous discussion. While it's not currently part of
> >
> > Can you provide a link to that discussion?
> 
> https://lore.kernel.org/linux-iommu/20240322170410.GH66976@ziepe.ca/
> 

We can always extend the uAPI for new usage, e.g. with a new flag
bit to indicate an additional field carrying the number of pages.
But requiring the user to handle a non-zero length now (though trivial)
is an unnecessary burden.

Do we want the response message to also carry a length field, i.e.
allowing the user to partially fix the fault?
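
The extension pattern suggested above, as a hedged sketch following the
flag names used in this series; the *_LENGTH_VALID bit is hypothetical,
not part of the posted uAPI:

enum iommu_hwpt_pgfault_flags {
	IOMMU_PGFAULT_FLAGS_PASID_VALID		= (1 << 0),
	IOMMU_PGFAULT_FLAGS_LAST_PAGE		= (1 << 1),
	/* hypothetical future bit: set only when @length carries a hint */
	IOMMU_PGFAULT_FLAGS_LENGTH_VALID	= (1 << 2),
};

Userspace built against the current uAPI would never see the new bit,
and the kernel would only set it once a real initiator supplies the
hint.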

end of thread, other threads:[~2024-05-20  4:59 UTC | newest]

Thread overview: 57+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-04-30 14:57 [PATCH v5 0/9] IOMMUFD: Deliver IO page faults to user space Lu Baolu
2024-04-30 14:57 ` [PATCH v5 1/9] iommu: Introduce domain attachment handle Lu Baolu
2024-05-15  7:17   ` Tian, Kevin
2024-05-19 10:07     ` Baolu Lu
2024-04-30 14:57 ` [PATCH v5 2/9] iommu: Replace sva_iommu with iommu_attach_handle Lu Baolu
2024-05-07 23:58   ` Jason Gunthorpe
2024-05-15  7:21   ` Tian, Kevin
2024-05-19 10:14     ` Baolu Lu
2024-05-20  3:18       ` Tian, Kevin
2024-04-30 14:57 ` [PATCH v5 3/9] iommu: Add attachment handle to struct iopf_group Lu Baolu
2024-05-08  0:04   ` Jason Gunthorpe
2024-05-10  3:14     ` Baolu Lu
2024-05-10 13:38       ` Jason Gunthorpe
2024-05-10 14:30         ` Baolu Lu
2024-05-10 16:28           ` Jason Gunthorpe
2024-05-15  7:28             ` Tian, Kevin
2024-05-15  7:31   ` Tian, Kevin
2024-05-19 14:03     ` Baolu Lu
2024-05-20  3:20       ` Tian, Kevin
2024-04-30 14:57 ` [PATCH v5 4/9] iommufd: Add fault and response message definitions Lu Baolu
2024-05-15  7:43   ` Tian, Kevin
2024-05-19 14:37     ` Baolu Lu
2024-05-20  3:24       ` Tian, Kevin
2024-05-20  3:33         ` Baolu Lu
2024-05-20  4:59           ` Tian, Kevin
2024-04-30 14:57 ` [PATCH v5 5/9] iommufd: Add iommufd fault object Lu Baolu
2024-05-08  0:11   ` Jason Gunthorpe
2024-05-08 10:05     ` Baolu Lu
2024-05-15  7:57       ` Tian, Kevin
2024-05-20  0:41         ` Baolu Lu
2024-05-20  3:26           ` Tian, Kevin
2024-05-20  3:28             ` Baolu Lu
2024-05-08  0:22   ` Jason Gunthorpe
2024-05-10  9:13     ` Baolu Lu
2024-05-15  8:37   ` Tian, Kevin
2024-05-20  1:15     ` Baolu Lu
2024-05-20  1:24     ` Baolu Lu
2024-05-20  1:33     ` Baolu Lu
2024-05-20  3:33       ` Tian, Kevin
2024-05-20  1:38     ` Baolu Lu
2024-04-30 14:57 ` [PATCH v5 6/9] iommufd: Fault-capable hwpt attach/detach/replace Lu Baolu
2024-05-08  0:18   ` Jason Gunthorpe
2024-05-10  3:20     ` Baolu Lu
2024-05-10 13:39       ` Jason Gunthorpe
2024-05-15  8:43   ` Tian, Kevin
2024-05-20  2:10     ` Baolu Lu
2024-05-20  3:35       ` Tian, Kevin
2024-05-20  3:55         ` Baolu Lu
2024-04-30 14:57 ` [PATCH v5 7/9] iommufd: Associate fault object with iommufd_hw_pgtable Lu Baolu
2024-05-08  0:25   ` Jason Gunthorpe
2024-05-10  3:23     ` Baolu Lu
2024-05-15  8:50   ` Tian, Kevin
2024-05-20  2:18     ` Baolu Lu
2024-05-20  3:39       ` Tian, Kevin
2024-05-20  4:00         ` Baolu Lu
2024-04-30 14:57 ` [PATCH v5 8/9] iommufd/selftest: Add IOPF support for mock device Lu Baolu
2024-04-30 14:57 ` [PATCH v5 9/9] iommufd/selftest: Add coverage for IOPF test Lu Baolu
