* [PATCH v3 0/8] IOMMUFD: Deliver IO page faults to user space
@ 2024-01-22  7:38 Lu Baolu
  2024-01-22  7:38 ` [PATCH v3 1/8] iommu: Add iopf domain attach/detach/replace interface Lu Baolu
                   ` (7 more replies)
  0 siblings, 8 replies; 43+ messages in thread
From: Lu Baolu @ 2024-01-22  7:38 UTC (permalink / raw)
  To: Jason Gunthorpe, Kevin Tian, Joerg Roedel, Will Deacon,
	Robin Murphy, Jean-Philippe Brucker, Nicolin Chen, Yi Liu,
	Jacob Pan, Joel Granados
  Cc: iommu, virtualization, linux-kernel, Lu Baolu

This series implements the functionality of delivering IO page faults to
user space through the IOMMUFD framework. One feasible use case is
nested translation. Nested translation is a hardware feature that
supports two-stage translation tables for IOMMU. The second-stage
translation table is managed by the host VMM, while the first-stage
translation table is owned by user space. This allows user space to
control the IOMMU mappings for its devices.

When an IO page fault occurs on the first-stage translation table, the
IOMMU hardware can deliver the page fault to user space through the
IOMMUFD framework. User space can then handle the page fault and respond
to the device through IOMMUFD. This allows user space to implement its
own IO page fault handling policies.

A user space application capable of handling IO page faults should
allocate a fault object and bind it to any domain whose generated
faults it is willing to handle. On successful allocation of the fault
object, the user can retrieve and respond to page faults by reading
from or writing to the returned file descriptor (FD).

The iommu selftest framework has been updated to test the IO page fault
delivery and response functionality.

This series is based on the page fault handling framework refactoring
in the IOMMU core [1].

The series and related patches are available on GitHub: [2]

[1] https://lore.kernel.org/linux-iommu/20240122054308.23901-1-baolu.lu@linux.intel.com/
[2] https://github.com/LuBaolu/intel-iommu/commits/iommufd-io-pgfault-delivery-v3

Best regards,
baolu

Change log:

v3:
 - Add iopf domain attach/detach/replace interfaces to manage the
   reference counters of hwpt and device, ensuring that both can only be
   destroyed after all outstanding IOPFs have been responded to. 
 - Relocate the fault handling file descriptor from hwpt to a fault
   object to enable a single fault handling object to be utilized
   across multiple domains.
 - Miscellaneous cleanup and performance improvements.

v2: https://lore.kernel.org/linux-iommu/20231026024930.382898-1-baolu.lu@linux.intel.com/
 - Move all iommu refactoring patches into a separate series and discuss
   it in a different thread. The latest patch series [v6] is available at
   https://lore.kernel.org/linux-iommu/20230928042734.16134-1-baolu.lu@linux.intel.com/
 - We discussed the timeout of the pending page fault messages. We
   agreed that we shouldn't apply any timeout policy for the page fault
   handling in user space.
   https://lore.kernel.org/linux-iommu/20230616113232.GA84678@myrica/
 - Jason suggested that we adopt a simple file descriptor interface for
   reading and responding to I/O page requests, so that user space
   applications can improve performance using io_uring.
   https://lore.kernel.org/linux-iommu/ZJWjD1ajeem6pK3I@ziepe.ca/

v1: https://lore.kernel.org/linux-iommu/20230530053724.232765-1-baolu.lu@linux.intel.com/

Lu Baolu (8):
  iommu: Add iopf domain attach/detach/replace interface
  iommu/sva: Use iopf domain attach/detach interface
  iommufd: Add fault and response message definitions
  iommufd: Add iommufd fault object
  iommufd: Associate fault object with iommufd_hw_pgtable
  iommufd: IOPF-capable hw page table attach/detach/replace
  iommufd/selftest: Add IOPF support for mock device
  iommufd/selftest: Add coverage for IOPF test

 include/linux/iommu.h                         |  40 +-
 drivers/iommu/iommufd/iommufd_private.h       |  41 ++
 drivers/iommu/iommufd/iommufd_test.h          |   8 +
 include/uapi/linux/iommufd.h                  |  91 ++++
 tools/testing/selftests/iommu/iommufd_utils.h |  83 +++-
 drivers/iommu/io-pgfault.c                    | 215 ++++++++--
 drivers/iommu/iommu-sva.c                     |  48 ++-
 drivers/iommu/iommufd/device.c                |  16 +-
 drivers/iommu/iommufd/fault.c                 | 391 ++++++++++++++++++
 drivers/iommu/iommufd/hw_pagetable.c          |  36 +-
 drivers/iommu/iommufd/main.c                  |   6 +
 drivers/iommu/iommufd/selftest.c              |  63 +++
 tools/testing/selftests/iommu/iommufd.c       |  17 +
 .../selftests/iommu/iommufd_fail_nth.c        |   2 +-
 drivers/iommu/iommufd/Makefile                |   1 +
 15 files changed, 1001 insertions(+), 57 deletions(-)
 create mode 100644 drivers/iommu/iommufd/fault.c

-- 
2.34.1


^ permalink raw reply	[flat|nested] 43+ messages in thread

* [PATCH v3 1/8] iommu: Add iopf domain attach/detach/replace interface
  2024-01-22  7:38 [PATCH v3 0/8] IOMMUFD: Deliver IO page faults to user space Lu Baolu
@ 2024-01-22  7:38 ` Lu Baolu
  2024-02-07  8:11   ` Tian, Kevin
  2024-01-22  7:38 ` [PATCH v3 2/8] iommu/sva: Use iopf domain attach/detach interface Lu Baolu
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 43+ messages in thread
From: Lu Baolu @ 2024-01-22  7:38 UTC (permalink / raw)
  To: Jason Gunthorpe, Kevin Tian, Joerg Roedel, Will Deacon,
	Robin Murphy, Jean-Philippe Brucker, Nicolin Chen, Yi Liu,
	Jacob Pan, Joel Granados
  Cc: iommu, virtualization, linux-kernel, Lu Baolu

There is a slight difference between iopf domains and non-iopf domains.
For the latter, the domain is only referenced between attach and detach.
For the former, due to asynchronous iopf handling paths, the domain may
still be referenced after detach, which leads to potential use-after-free
(UAF) issues.

Introduce iopf-specific domain attach/detach/replace interface where the
caller provides an attach cookie. This cookie can only be freed after all
outstanding iopf groups are handled and the domain is detached from the
RID or PASID. The presence of this attach cookie indicates that a domain
has been attached to the RID or PASID and won't be released until all
outstanding iopf groups are handled.

The cookie data structure also includes a private field for storing a
caller-specific pointer that will be passed back to its page fault handler.
This field provides flexibility for various uses. For example, the IOMMUFD
could use it to store the iommufd_device pointer, so that it could easily
retrieve the dev_id of the device that triggered the fault.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
---
 include/linux/iommu.h      |  36 +++++++++
 drivers/iommu/io-pgfault.c | 158 +++++++++++++++++++++++++++++++++++++
 2 files changed, 194 insertions(+)

diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 1ccad10e8164..6d85be23952a 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -120,6 +120,16 @@ struct iommu_page_response {
 	u32	code;
 };
 
+struct iopf_attach_cookie {
+	struct iommu_domain *domain;
+	struct device *dev;
+	unsigned int pasid;
+	refcount_t users;
+
+	void *private;
+	void (*release)(struct iopf_attach_cookie *cookie);
+};
+
 struct iopf_fault {
 	struct iommu_fault fault;
 	/* node for pending lists */
@@ -699,6 +709,7 @@ struct iommu_fault_param {
 	struct device *dev;
 	struct iopf_queue *queue;
 	struct list_head queue_list;
+	struct xarray pasid_cookie;
 
 	struct list_head partial;
 	struct list_head faults;
@@ -1552,6 +1563,12 @@ void iopf_free_group(struct iopf_group *group);
 void iommu_report_device_fault(struct device *dev, struct iopf_fault *evt);
 void iopf_group_response(struct iopf_group *group,
 			 enum iommu_page_response_code status);
+int iopf_domain_attach(struct iommu_domain *domain, struct device *dev,
+		       ioasid_t pasid, struct iopf_attach_cookie *cookie);
+void iopf_domain_detach(struct iommu_domain *domain, struct device *dev,
+			ioasid_t pasid);
+int iopf_domain_replace(struct iommu_domain *domain, struct device *dev,
+			ioasid_t pasid, struct iopf_attach_cookie *cookie);
 #else
 static inline int
 iopf_queue_add_device(struct iopf_queue *queue, struct device *dev)
@@ -1596,5 +1613,24 @@ static inline void iopf_group_response(struct iopf_group *group,
 				       enum iommu_page_response_code status)
 {
 }
+
+static inline int iopf_domain_attach(struct iommu_domain *domain,
+				     struct device *dev, ioasid_t pasid,
+				     struct iopf_attach_cookie *cookie)
+{
+	return -ENODEV;
+}
+
+static inline void iopf_domain_detach(struct iommu_domain *domain,
+				      struct device *dev, ioasid_t pasid)
+{
+}
+
+static inline int iopf_domain_replace(struct iommu_domain *domain,
+				      struct device *dev, ioasid_t pasid,
+				      struct iopf_attach_cookie *cookie)
+{
+	return -ENODEV;
+}
 #endif /* CONFIG_IOMMU_IOPF */
 #endif /* __LINUX_IOMMU_H */
diff --git a/drivers/iommu/io-pgfault.c b/drivers/iommu/io-pgfault.c
index b64229dab976..f7ce41573799 100644
--- a/drivers/iommu/io-pgfault.c
+++ b/drivers/iommu/io-pgfault.c
@@ -39,6 +39,103 @@ static void iopf_put_dev_fault_param(struct iommu_fault_param *fault_param)
 		kfree_rcu(fault_param, rcu);
 }
 
+/* Get the domain attachment cookie for pasid of a device. */
+static struct iopf_attach_cookie __maybe_unused *
+iopf_pasid_cookie_get(struct device *dev, ioasid_t pasid)
+{
+	struct iommu_fault_param *iopf_param = iopf_get_dev_fault_param(dev);
+	struct iopf_attach_cookie *curr;
+
+	if (!iopf_param)
+		return ERR_PTR(-ENODEV);
+
+	xa_lock(&iopf_param->pasid_cookie);
+	curr = xa_load(&iopf_param->pasid_cookie, pasid);
+	if (curr && !refcount_inc_not_zero(&curr->users))
+		curr = ERR_PTR(-EINVAL);
+	xa_unlock(&iopf_param->pasid_cookie);
+
+	iopf_put_dev_fault_param(iopf_param);
+
+	return curr;
+}
+
+/* Put the domain attachment cookie. */
+static void iopf_pasid_cookie_put(struct iopf_attach_cookie *cookie)
+{
+	if (cookie && refcount_dec_and_test(&cookie->users))
+		cookie->release(cookie);
+}
+
+/*
+ * Set the domain attachment cookie for pasid of a device. Return 0 on
+ * success, or error number on failure.
+ */
+static int iopf_pasid_cookie_set(struct iommu_domain *domain, struct device *dev,
+				 ioasid_t pasid, struct iopf_attach_cookie *cookie)
+{
+	struct iommu_fault_param *iopf_param = iopf_get_dev_fault_param(dev);
+	struct iopf_attach_cookie *curr;
+
+	if (!iopf_param)
+		return -ENODEV;
+
+	refcount_set(&cookie->users, 1);
+	cookie->dev = dev;
+	cookie->pasid = pasid;
+	cookie->domain = domain;
+
+	curr = xa_cmpxchg(&iopf_param->pasid_cookie, pasid, NULL, cookie, GFP_KERNEL);
+	iopf_put_dev_fault_param(iopf_param);
+
+	return curr ? xa_err(curr) : 0;
+}
+
+/* Clear the domain attachment cookie for pasid of a device. */
+static void iopf_pasid_cookie_clear(struct device *dev, ioasid_t pasid)
+{
+	struct iommu_fault_param *iopf_param = iopf_get_dev_fault_param(dev);
+	struct iopf_attach_cookie *curr;
+
+	if (WARN_ON(!iopf_param))
+		return;
+
+	curr = xa_erase(&iopf_param->pasid_cookie, pasid);
+	/* paired with iopf_pasid_cookie_set/replace() */
+	iopf_pasid_cookie_put(curr);
+
+	iopf_put_dev_fault_param(iopf_param);
+}
+
+/* Replace the domain attachment cookie for pasid of a device. */
+static int iopf_pasid_cookie_replace(struct iommu_domain *domain, struct device *dev,
+				     ioasid_t pasid, struct iopf_attach_cookie *cookie)
+{
+	struct iommu_fault_param *iopf_param = iopf_get_dev_fault_param(dev);
+	struct iopf_attach_cookie *curr;
+
+	if (!iopf_param)
+		return -ENODEV;
+
+	if (cookie) {
+		refcount_set(&cookie->users, 1);
+		cookie->dev = dev;
+		cookie->pasid = pasid;
+		cookie->domain = domain;
+	}
+
+	curr = xa_store(&iopf_param->pasid_cookie, pasid, cookie, GFP_KERNEL);
+	if (xa_err(curr))
+		return xa_err(curr);
+
+	/* paired with iopf_pasid_cookie_set/replace() */
+	iopf_pasid_cookie_put(curr);
+
+	iopf_put_dev_fault_param(iopf_param);
+
+	return 0;
+}
+
 static void __iopf_free_group(struct iopf_group *group)
 {
 	struct iopf_fault *iopf, *next;
@@ -362,6 +459,7 @@ int iopf_queue_add_device(struct iopf_queue *queue, struct device *dev)
 	mutex_init(&fault_param->lock);
 	INIT_LIST_HEAD(&fault_param->faults);
 	INIT_LIST_HEAD(&fault_param->partial);
+	xa_init(&fault_param->pasid_cookie);
 	fault_param->dev = dev;
 	refcount_set(&fault_param->users, 1);
 	list_add(&fault_param->queue_list, &queue->devices);
@@ -502,3 +600,63 @@ void iopf_queue_free(struct iopf_queue *queue)
 	kfree(queue);
 }
 EXPORT_SYMBOL_GPL(iopf_queue_free);
+
+int iopf_domain_attach(struct iommu_domain *domain, struct device *dev,
+		       ioasid_t pasid, struct iopf_attach_cookie *cookie)
+{
+	int ret;
+
+	if (!domain->iopf_handler)
+		return -EINVAL;
+
+	if (pasid == IOMMU_NO_PASID)
+		ret = iommu_attach_group(domain, dev->iommu_group);
+	else
+		ret = iommu_attach_device_pasid(domain, dev, pasid);
+	if (ret)
+		return ret;
+
+	ret = iopf_pasid_cookie_set(domain, dev, pasid, cookie);
+	if (ret) {
+		if (pasid == IOMMU_NO_PASID)
+			iommu_detach_group(domain, dev->iommu_group);
+		else
+			iommu_detach_device_pasid(domain, dev, pasid);
+	}
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(iopf_domain_attach);
+
+void iopf_domain_detach(struct iommu_domain *domain, struct device *dev, ioasid_t pasid)
+{
+	iopf_pasid_cookie_clear(dev, pasid);
+
+	if (pasid == IOMMU_NO_PASID)
+		iommu_detach_group(domain, dev->iommu_group);
+	else
+		iommu_detach_device_pasid(domain, dev, pasid);
+}
+EXPORT_SYMBOL_GPL(iopf_domain_detach);
+
+int iopf_domain_replace(struct iommu_domain *domain, struct device *dev,
+			ioasid_t pasid, struct iopf_attach_cookie *cookie)
+{
+	struct iommu_domain *old_domain = iommu_get_domain_for_dev(dev);
+	int ret;
+
+	if (!old_domain || pasid != IOMMU_NO_PASID ||
+	    (!old_domain->iopf_handler && !domain->iopf_handler))
+		return -EINVAL;
+
+	ret = iommu_group_replace_domain(dev->iommu_group, domain);
+	if (ret)
+		return ret;
+
+	ret = iopf_pasid_cookie_replace(domain, dev, pasid, cookie);
+	if (ret)
+		iommu_group_replace_domain(dev->iommu_group, old_domain);
+
+	return ret;
+}
+EXPORT_SYMBOL_NS_GPL(iopf_domain_replace, IOMMUFD_INTERNAL);
-- 
2.34.1



* [PATCH v3 2/8] iommu/sva: Use iopf domain attach/detach interface
  2024-01-22  7:38 [PATCH v3 0/8] IOMMUFD: Deliver IO page faults to user space Lu Baolu
  2024-01-22  7:38 ` [PATCH v3 1/8] iommu: Add iopf domain attach/detach/replace interface Lu Baolu
@ 2024-01-22  7:38 ` Lu Baolu
  2024-03-08 17:46   ` Jason Gunthorpe
  2024-01-22  7:38 ` [PATCH v3 3/8] iommufd: Add fault and response message definitions Lu Baolu
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 43+ messages in thread
From: Lu Baolu @ 2024-01-22  7:38 UTC (permalink / raw)
  To: Jason Gunthorpe, Kevin Tian, Joerg Roedel, Will Deacon,
	Robin Murphy, Jean-Philippe Brucker, Nicolin Chen, Yi Liu,
	Jacob Pan, Joel Granados
  Cc: iommu, virtualization, linux-kernel, Lu Baolu

The iommu sva implementation relies on iopf handling. Allocate an
attachment cookie and use the iopf domain attach/detach interface.
The SVA domain is guaranteed to be released after all outstanding
page faults are handled.

In the fault delivery path, the attachment cookie is retrieved
instead of the domain. This ensures that the page fault is forwarded
only if an iopf-capable domain is attached, and the domain will only
be released after all outstanding faults are handled.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
---
 include/linux/iommu.h      |  2 +-
 drivers/iommu/io-pgfault.c | 59 +++++++++++++++++++-------------------
 drivers/iommu/iommu-sva.c  | 48 ++++++++++++++++++++++++-------
 3 files changed, 68 insertions(+), 41 deletions(-)

diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 6d85be23952a..511dc7b4bdb2 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -142,9 +142,9 @@ struct iopf_group {
 	/* list node for iommu_fault_param::faults */
 	struct list_head pending_node;
 	struct work_struct work;
-	struct iommu_domain *domain;
 	/* The device's fault data parameter. */
 	struct iommu_fault_param *fault_param;
+	struct iopf_attach_cookie *cookie;
 };
 
 /**
diff --git a/drivers/iommu/io-pgfault.c b/drivers/iommu/io-pgfault.c
index f7ce41573799..2567d8c04e46 100644
--- a/drivers/iommu/io-pgfault.c
+++ b/drivers/iommu/io-pgfault.c
@@ -40,7 +40,7 @@ static void iopf_put_dev_fault_param(struct iommu_fault_param *fault_param)
 }
 
 /* Get the domain attachment cookie for pasid of a device. */
-static struct iopf_attach_cookie __maybe_unused *
+static struct iopf_attach_cookie *
 iopf_pasid_cookie_get(struct device *dev, ioasid_t pasid)
 {
 	struct iommu_fault_param *iopf_param = iopf_get_dev_fault_param(dev);
@@ -147,6 +147,7 @@ static void __iopf_free_group(struct iopf_group *group)
 
 	/* Pair with iommu_report_device_fault(). */
 	iopf_put_dev_fault_param(group->fault_param);
+	iopf_pasid_cookie_put(group->cookie);
 }
 
 void iopf_free_group(struct iopf_group *group)
@@ -156,30 +157,6 @@ void iopf_free_group(struct iopf_group *group)
 }
 EXPORT_SYMBOL_GPL(iopf_free_group);
 
-static struct iommu_domain *get_domain_for_iopf(struct device *dev,
-						struct iommu_fault *fault)
-{
-	struct iommu_domain *domain;
-
-	if (fault->prm.flags & IOMMU_FAULT_PAGE_REQUEST_PASID_VALID) {
-		domain = iommu_get_domain_for_dev_pasid(dev, fault->prm.pasid, 0);
-		if (IS_ERR(domain))
-			domain = NULL;
-	} else {
-		domain = iommu_get_domain_for_dev(dev);
-	}
-
-	if (!domain || !domain->iopf_handler) {
-		dev_warn_ratelimited(dev,
-			"iopf (pasid %d) without domain attached or handler installed\n",
-			 fault->prm.pasid);
-
-		return NULL;
-	}
-
-	return domain;
-}
-
 /* Non-last request of a group. Postpone until the last one. */
 static int report_partial_fault(struct iommu_fault_param *fault_param,
 				struct iommu_fault *fault)
@@ -199,10 +176,20 @@ static int report_partial_fault(struct iommu_fault_param *fault_param,
 	return 0;
 }
 
+static ioasid_t fault_to_pasid(struct iommu_fault *fault)
+{
+	if (fault->prm.flags & IOMMU_FAULT_PAGE_REQUEST_PASID_VALID)
+		return fault->prm.pasid;
+
+	return IOMMU_NO_PASID;
+}
+
 static struct iopf_group *iopf_group_alloc(struct iommu_fault_param *iopf_param,
 					   struct iopf_fault *evt,
 					   struct iopf_group *abort_group)
 {
+	ioasid_t pasid = fault_to_pasid(&evt->fault);
+	struct iopf_attach_cookie *cookie;
 	struct iopf_fault *iopf, *next;
 	struct iopf_group *group;
 
@@ -215,7 +202,23 @@ static struct iopf_group *iopf_group_alloc(struct iommu_fault_param *iopf_param,
 		group = abort_group;
 	}
 
+	cookie = iopf_pasid_cookie_get(iopf_param->dev, pasid);
+	if (!cookie && pasid != IOMMU_NO_PASID)
+		cookie = iopf_pasid_cookie_get(iopf_param->dev, IOMMU_NO_PASID);
+	if (IS_ERR(cookie) || !cookie) {
+		/*
+		 * The PASID of this device was not attached by an iopf-capable
+		 * domain. Ask the caller to abort handling of this fault.
+		 * Otherwise, the reference count will be switched to the new
+		 * iopf group and will be released in iopf_free_group().
+		 */
+		kfree(group);
+		group = abort_group;
+		cookie = NULL;
+	}
+
 	group->fault_param = iopf_param;
+	group->cookie = cookie;
 	group->last_fault.fault = evt->fault;
 	INIT_LIST_HEAD(&group->faults);
 	INIT_LIST_HEAD(&group->pending_node);
@@ -305,15 +308,11 @@ void iommu_report_device_fault(struct device *dev, struct iopf_fault *evt)
 	if (group == &abort_group)
 		goto err_abort;
 
-	group->domain = get_domain_for_iopf(dev, fault);
-	if (!group->domain)
-		goto err_abort;
-
 	/*
 	 * On success iopf_handler must call iopf_group_response() and
 	 * iopf_free_group()
 	 */
-	if (group->domain->iopf_handler(group))
+	if (group->cookie->domain->iopf_handler(group))
 		goto err_abort;
 
 	return;
diff --git a/drivers/iommu/iommu-sva.c b/drivers/iommu/iommu-sva.c
index b51995b4fe90..fff3ee1ee9ce 100644
--- a/drivers/iommu/iommu-sva.c
+++ b/drivers/iommu/iommu-sva.c
@@ -50,6 +50,39 @@ static struct iommu_mm_data *iommu_alloc_mm_data(struct mm_struct *mm, struct de
 	return iommu_mm;
 }
 
+static void release_attach_cookie(struct iopf_attach_cookie *cookie)
+{
+	struct iommu_domain *domain = cookie->domain;
+
+	mutex_lock(&iommu_sva_lock);
+	if (--domain->users == 0) {
+		list_del(&domain->next);
+		iommu_domain_free(domain);
+	}
+	mutex_unlock(&iommu_sva_lock);
+
+	kfree(cookie);
+}
+
+static int sva_attach_device_pasid(struct iommu_domain *domain,
+				   struct device *dev, ioasid_t pasid)
+{
+	struct iopf_attach_cookie *cookie;
+	int ret;
+
+	cookie = kzalloc(sizeof(*cookie), GFP_KERNEL);
+	if (!cookie)
+		return -ENOMEM;
+
+	cookie->release = release_attach_cookie;
+
+	ret = iopf_domain_attach(domain, dev, pasid, cookie);
+	if (ret)
+		kfree(cookie);
+
+	return ret;
+}
+
 /**
  * iommu_sva_bind_device() - Bind a process address space to a device
  * @dev: the device
@@ -90,7 +123,7 @@ struct iommu_sva *iommu_sva_bind_device(struct device *dev, struct mm_struct *mm
 
 	/* Search for an existing domain. */
 	list_for_each_entry(domain, &mm->iommu_mm->sva_domains, next) {
-		ret = iommu_attach_device_pasid(domain, dev, iommu_mm->pasid);
+		ret = sva_attach_device_pasid(domain, dev, iommu_mm->pasid);
 		if (!ret) {
 			domain->users++;
 			goto out;
@@ -104,7 +137,7 @@ struct iommu_sva *iommu_sva_bind_device(struct device *dev, struct mm_struct *mm
 		goto out_free_handle;
 	}
 
-	ret = iommu_attach_device_pasid(domain, dev, iommu_mm->pasid);
+	ret = sva_attach_device_pasid(domain, dev, iommu_mm->pasid);
 	if (ret)
 		goto out_free_domain;
 	domain->users = 1;
@@ -140,13 +173,7 @@ void iommu_sva_unbind_device(struct iommu_sva *handle)
 	struct iommu_mm_data *iommu_mm = domain->mm->iommu_mm;
 	struct device *dev = handle->dev;
 
-	mutex_lock(&iommu_sva_lock);
-	iommu_detach_device_pasid(domain, dev, iommu_mm->pasid);
-	if (--domain->users == 0) {
-		list_del(&domain->next);
-		iommu_domain_free(domain);
-	}
-	mutex_unlock(&iommu_sva_lock);
+	iopf_domain_detach(domain, dev, iommu_mm->pasid);
 	kfree(handle);
 }
 EXPORT_SYMBOL_GPL(iommu_sva_unbind_device);
@@ -242,7 +269,8 @@ static void iommu_sva_handle_iopf(struct work_struct *work)
 		if (status != IOMMU_PAGE_RESP_SUCCESS)
 			break;
 
-		status = iommu_sva_handle_mm(&iopf->fault, group->domain->mm);
+		status = iommu_sva_handle_mm(&iopf->fault,
+					     group->cookie->domain->mm);
 	}
 
 	iopf_group_response(group, status);
-- 
2.34.1



* [PATCH v3 3/8] iommufd: Add fault and response message definitions
  2024-01-22  7:38 [PATCH v3 0/8] IOMMUFD: Deliver IO page faults to user space Lu Baolu
  2024-01-22  7:38 ` [PATCH v3 1/8] iommu: Add iopf domain attach/detach/replace interface Lu Baolu
  2024-01-22  7:38 ` [PATCH v3 2/8] iommu/sva: Use iopf domain attach/detach interface Lu Baolu
@ 2024-01-22  7:38 ` Lu Baolu
  2024-03-08 17:50   ` Jason Gunthorpe
  2024-01-22  7:38 ` [PATCH v3 4/8] iommufd: Add iommufd fault object Lu Baolu
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 43+ messages in thread
From: Lu Baolu @ 2024-01-22  7:38 UTC (permalink / raw)
  To: Jason Gunthorpe, Kevin Tian, Joerg Roedel, Will Deacon,
	Robin Murphy, Jean-Philippe Brucker, Nicolin Chen, Yi Liu,
	Jacob Pan, Joel Granados
  Cc: iommu, virtualization, linux-kernel, Lu Baolu

iommu_hwpt_pgfaults represent fault messages that user space can
retrieve. Multiple iommu_hwpt_pgfaults might be put in an iopf group,
with the IOMMU_PGFAULT_FLAGS_LAST_PAGE flag set only for the last
iommu_hwpt_pgfault.

An iommu_hwpt_page_response is a response message that the userspace
should send to the kernel after finishing handling a group of fault
messages. The @dev_id, @pasid, and @grpid fields in the message
identify an outstanding iopf group for a device. The @addr field,
which matches the fault address of the last fault in the group, will
be used by the kernel for a sanity check.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
---
 include/uapi/linux/iommufd.h | 67 ++++++++++++++++++++++++++++++++++++
 1 file changed, 67 insertions(+)

diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h
index 1dfeaa2e649e..d59e839ae49e 100644
--- a/include/uapi/linux/iommufd.h
+++ b/include/uapi/linux/iommufd.h
@@ -692,4 +692,71 @@ struct iommu_hwpt_invalidate {
 	__u32 __reserved;
 };
 #define IOMMU_HWPT_INVALIDATE _IO(IOMMUFD_TYPE, IOMMUFD_CMD_HWPT_INVALIDATE)
+
+/**
+ * enum iommu_hwpt_pgfault_flags - flags for struct iommu_hwpt_pgfault
+ * @IOMMU_PGFAULT_FLAGS_PASID_VALID: The pasid field of the fault data is
+ *                                   valid.
+ * @IOMMU_PGFAULT_FLAGS_LAST_PAGE: It's the last fault of a fault group.
+ */
+enum iommu_hwpt_pgfault_flags {
+	IOMMU_PGFAULT_FLAGS_PASID_VALID		= (1 << 0),
+	IOMMU_PGFAULT_FLAGS_LAST_PAGE		= (1 << 1),
+};
+
+/**
+ * enum iommu_hwpt_pgfault_perm - perm bits for struct iommu_hwpt_pgfault
+ * @IOMMU_PGFAULT_PERM_READ: request for read permission
+ * @IOMMU_PGFAULT_PERM_WRITE: request for write permission
+ * @IOMMU_PGFAULT_PERM_EXEC: request for execute permission
+ * @IOMMU_PGFAULT_PERM_PRIV: request for privileged permission
+ */
+enum iommu_hwpt_pgfault_perm {
+	IOMMU_PGFAULT_PERM_READ			= (1 << 0),
+	IOMMU_PGFAULT_PERM_WRITE		= (1 << 1),
+	IOMMU_PGFAULT_PERM_EXEC			= (1 << 2),
+	IOMMU_PGFAULT_PERM_PRIV			= (1 << 3),
+};
+
+/**
+ * struct iommu_hwpt_pgfault - iommu page fault data
+ * @size: sizeof(struct iommu_hwpt_pgfault)
+ * @flags: Combination of enum iommu_hwpt_pgfault_flags
+ * @dev_id: id of the originated device
+ * @pasid: Process Address Space ID
+ * @grpid: Page Request Group Index
+ * @perm: Combination of enum iommu_hwpt_pgfault_perm
+ * @addr: page address
+ */
+struct iommu_hwpt_pgfault {
+	__u32 size;
+	__u32 flags;
+	__u32 dev_id;
+	__u32 pasid;
+	__u32 grpid;
+	__u32 perm;
+	__u64 addr;
+};
+
+/**
+ * struct iommu_hwpt_page_response - IOMMU page fault response
+ * @size: sizeof(struct iommu_hwpt_page_response)
+ * @flags: Must be set to 0
+ * @dev_id: device ID of target device for the response
+ * @pasid: Process Address Space ID
+ * @grpid: Page Request Group Index
+ * @code: response code. The supported codes include:
+ *        0: Successful; 1: Response Failure; 2: Invalid Request.
+ * @addr: The fault address. Must match the addr field of the
+ *        last iommu_hwpt_pgfault of a reported iopf group.
+ */
+struct iommu_hwpt_page_response {
+	__u32 size;
+	__u32 flags;
+	__u32 dev_id;
+	__u32 pasid;
+	__u32 grpid;
+	__u32 code;
+	__u64 addr;
+};
 #endif
-- 
2.34.1



* [PATCH v3 4/8] iommufd: Add iommufd fault object
  2024-01-22  7:38 [PATCH v3 0/8] IOMMUFD: Deliver IO page faults to user space Lu Baolu
                   ` (2 preceding siblings ...)
  2024-01-22  7:38 ` [PATCH v3 3/8] iommufd: Add fault and response message definitions Lu Baolu
@ 2024-01-22  7:38 ` Lu Baolu
  2024-03-08 18:03   ` Jason Gunthorpe
  2024-03-20 16:18   ` Shameerali Kolothum Thodi
  2024-01-22  7:39 ` [PATCH v3 5/8] iommufd: Associate fault object with iommufd_hw_pgtable Lu Baolu
                   ` (3 subsequent siblings)
  7 siblings, 2 replies; 43+ messages in thread
From: Lu Baolu @ 2024-01-22  7:38 UTC (permalink / raw)
  To: Jason Gunthorpe, Kevin Tian, Joerg Roedel, Will Deacon,
	Robin Murphy, Jean-Philippe Brucker, Nicolin Chen, Yi Liu,
	Jacob Pan, Joel Granados
  Cc: iommu, virtualization, linux-kernel, Lu Baolu

An iommufd fault object provides an interface for delivering I/O page
faults to user space. These objects are created and destroyed by user
space, and they can be associated with or dissociated from hardware page
table objects during page table allocation or destruction.

User space interacts with the fault object through a file interface. This
interface offers a straightforward and efficient way for user space to
handle page faults. It allows user space to read fault messages
sequentially and respond to them by writing to the same file. The file
interface supports reading messages in poll mode, so it's recommended that
user space applications use io_uring to enhance read and write efficiency.

A fault object can be associated with any iopf-capable iommufd_hw_pgtable
during the pgtable's allocation. All I/O page faults triggered by devices
when accessing the I/O addresses of an iommufd_hw_pgtable are routed
through the fault object to user space. Similarly, user space's responses
to these page faults are routed back to the iommu device driver through
the same fault object.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
---
 include/linux/iommu.h                   |   2 +
 drivers/iommu/iommufd/iommufd_private.h |  23 +++
 include/uapi/linux/iommufd.h            |  18 ++
 drivers/iommu/iommufd/device.c          |   1 +
 drivers/iommu/iommufd/fault.c           | 255 ++++++++++++++++++++++++
 drivers/iommu/iommufd/main.c            |   6 +
 drivers/iommu/iommufd/Makefile          |   1 +
 7 files changed, 306 insertions(+)
 create mode 100644 drivers/iommu/iommufd/fault.c

diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 511dc7b4bdb2..4372648ac22e 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -145,6 +145,8 @@ struct iopf_group {
 	/* The device's fault data parameter. */
 	struct iommu_fault_param *fault_param;
 	struct iopf_attach_cookie *cookie;
+	/* Used by handler provider to hook the group on its own lists. */
+	struct list_head node;
 };
 
 /**
diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h
index 991f864d1f9b..52d83e888bd0 100644
--- a/drivers/iommu/iommufd/iommufd_private.h
+++ b/drivers/iommu/iommufd/iommufd_private.h
@@ -128,6 +128,7 @@ enum iommufd_object_type {
 	IOMMUFD_OBJ_HWPT_NESTED,
 	IOMMUFD_OBJ_IOAS,
 	IOMMUFD_OBJ_ACCESS,
+	IOMMUFD_OBJ_FAULT,
 #ifdef CONFIG_IOMMUFD_TEST
 	IOMMUFD_OBJ_SELFTEST,
 #endif
@@ -395,6 +396,8 @@ struct iommufd_device {
 	/* always the physical device */
 	struct device *dev;
 	bool enforce_cache_coherency;
+	/* outstanding faults awaiting response indexed by fault group id */
+	struct xarray faults;
 };
 
 static inline struct iommufd_device *
@@ -426,6 +429,26 @@ void iopt_remove_access(struct io_pagetable *iopt,
 			u32 iopt_access_list_id);
 void iommufd_access_destroy_object(struct iommufd_object *obj);
 
+/*
+ * An iommufd_fault object represents an interface to deliver I/O page faults
+ * to the user space. These objects are created/destroyed by the user space and
+ * associated with hardware page table objects during page-table allocation.
+ */
+struct iommufd_fault {
+	struct iommufd_object obj;
+	struct iommufd_ctx *ictx;
+
+	/* The lists of outstanding faults protected by below mutex. */
+	struct mutex mutex;
+	struct list_head deliver;
+	struct list_head response;
+
+	struct wait_queue_head wait_queue;
+};
+
+int iommufd_fault_alloc(struct iommufd_ucmd *ucmd);
+void iommufd_fault_destroy(struct iommufd_object *obj);
+
 #ifdef CONFIG_IOMMUFD_TEST
 int iommufd_test(struct iommufd_ucmd *ucmd);
 void iommufd_selftest_destroy(struct iommufd_object *obj);
diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h
index d59e839ae49e..c32d62b02306 100644
--- a/include/uapi/linux/iommufd.h
+++ b/include/uapi/linux/iommufd.h
@@ -50,6 +50,7 @@ enum {
 	IOMMUFD_CMD_HWPT_SET_DIRTY_TRACKING,
 	IOMMUFD_CMD_HWPT_GET_DIRTY_BITMAP,
 	IOMMUFD_CMD_HWPT_INVALIDATE,
+	IOMMUFD_CMD_FAULT_ALLOC,
 };
 
 /**
@@ -759,4 +760,21 @@ struct iommu_hwpt_page_response {
 	__u32 code;
 	__u64 addr;
 };
+
+/**
+ * struct iommu_fault_alloc - ioctl(IOMMU_FAULT_ALLOC)
+ * @size: sizeof(struct iommu_fault_alloc)
+ * @flags: Must be 0
+ * @out_fault_id: The ID of the new FAULT
+ * @out_fault_fd: The fd of the new FAULT
+ *
+ * Explicitly allocate a fault handling object.
+ */
+struct iommu_fault_alloc {
+	__u32 size;
+	__u32 flags;
+	__u32 out_fault_id;
+	__u32 out_fault_fd;
+};
+#define IOMMU_FAULT_ALLOC _IO(IOMMUFD_TYPE, IOMMUFD_CMD_FAULT_ALLOC)
 #endif
diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c
index 873630c111c1..d70913ee8fdf 100644
--- a/drivers/iommu/iommufd/device.c
+++ b/drivers/iommu/iommufd/device.c
@@ -215,6 +215,7 @@ struct iommufd_device *iommufd_device_bind(struct iommufd_ctx *ictx,
 	refcount_inc(&idev->obj.users);
 	/* igroup refcount moves into iommufd_device */
 	idev->igroup = igroup;
+	xa_init(&idev->faults);
 
 	/*
 	 * If the caller fails after this success it must call
diff --git a/drivers/iommu/iommufd/fault.c b/drivers/iommu/iommufd/fault.c
new file mode 100644
index 000000000000..9844a85feeb2
--- /dev/null
+++ b/drivers/iommu/iommufd/fault.c
@@ -0,0 +1,255 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/* Copyright (C) 2024 Intel Corporation
+ */
+#define pr_fmt(fmt) "iommufd: " fmt
+
+#include <linux/file.h>
+#include <linux/fs.h>
+#include <linux/module.h>
+#include <linux/mutex.h>
+#include <linux/iommufd.h>
+#include <linux/poll.h>
+#include <linux/anon_inodes.h>
+#include <uapi/linux/iommufd.h>
+
+#include "iommufd_private.h"
+
+static int device_add_fault(struct iopf_group *group)
+{
+	struct iommufd_device *idev = group->cookie->private;
+	void *curr;
+
+	curr = xa_cmpxchg(&idev->faults, group->last_fault.fault.prm.grpid,
+			  NULL, group, GFP_KERNEL);
+
+	return curr ? xa_err(curr) : 0;
+}
+
+static void device_remove_fault(struct iopf_group *group)
+{
+	struct iommufd_device *idev = group->cookie->private;
+
+	xa_store(&idev->faults, group->last_fault.fault.prm.grpid,
+		 NULL, GFP_KERNEL);
+}
+
+static struct iopf_group *device_get_fault(struct iommufd_device *idev,
+					   unsigned long grpid)
+{
+	return xa_load(&idev->faults, grpid);
+}
+
+void iommufd_fault_destroy(struct iommufd_object *obj)
+{
+	struct iommufd_fault *fault = container_of(obj, struct iommufd_fault, obj);
+	struct iopf_group *group, *next;
+
+	mutex_lock(&fault->mutex);
+	list_for_each_entry_safe(group, next, &fault->deliver, node) {
+		list_del(&group->node);
+		iopf_group_response(group, IOMMU_PAGE_RESP_INVALID);
+		iopf_free_group(group);
+	}
+	list_for_each_entry_safe(group, next, &fault->response, node) {
+		list_del(&group->node);
+		device_remove_fault(group);
+		iopf_group_response(group, IOMMU_PAGE_RESP_INVALID);
+		iopf_free_group(group);
+	}
+	mutex_unlock(&fault->mutex);
+
+	mutex_destroy(&fault->mutex);
+}
+
+static void iommufd_compose_fault_message(struct iommu_fault *fault,
+					  struct iommu_hwpt_pgfault *hwpt_fault,
+					  struct iommufd_device *idev)
+{
+	hwpt_fault->size = sizeof(*hwpt_fault);
+	hwpt_fault->flags = fault->prm.flags;
+	hwpt_fault->dev_id = idev->obj.id;
+	hwpt_fault->pasid = fault->prm.pasid;
+	hwpt_fault->grpid = fault->prm.grpid;
+	hwpt_fault->perm = fault->prm.perm;
+	hwpt_fault->addr = fault->prm.addr;
+}
+
+static ssize_t iommufd_fault_fops_read(struct file *filep, char __user *buf,
+				       size_t count, loff_t *ppos)
+{
+	size_t fault_size = sizeof(struct iommu_hwpt_pgfault);
+	struct iommufd_fault *fault = filep->private_data;
+	struct iommu_hwpt_pgfault data;
+	struct iommufd_device *idev;
+	struct iopf_group *group;
+	struct iopf_fault *iopf;
+	size_t done = 0;
+	int rc;
+
+	if (*ppos || count % fault_size)
+		return -ESPIPE;
+
+	mutex_lock(&fault->mutex);
+	while (!list_empty(&fault->deliver) && count > done) {
+		group = list_first_entry(&fault->deliver,
+					 struct iopf_group, node);
+
+		if (list_count_nodes(&group->faults) * fault_size > count - done)
+			break;
+
+		idev = (struct iommufd_device *)group->cookie->private;
+		list_for_each_entry(iopf, &group->faults, list) {
+			iommufd_compose_fault_message(&iopf->fault, &data, idev);
+			rc = copy_to_user(buf + done, &data, fault_size);
+			if (rc)
+				goto err_unlock;
+			done += fault_size;
+		}
+
+		rc = device_add_fault(group);
+		if (rc)
+			goto err_unlock;
+
+		list_move_tail(&group->node, &fault->response);
+	}
+	mutex_unlock(&fault->mutex);
+
+	return done;
+err_unlock:
+	mutex_unlock(&fault->mutex);
+	return rc;
+}
+
+static ssize_t iommufd_fault_fops_write(struct file *filep, const char __user *buf,
+					size_t count, loff_t *ppos)
+{
+	size_t response_size = sizeof(struct iommu_hwpt_page_response);
+	struct iommufd_fault *fault = filep->private_data;
+	struct iommu_hwpt_page_response response;
+	struct iommufd_device *idev;
+	struct iopf_group *group;
+	size_t done = 0;
+	int rc;
+
+	if (*ppos || count % response_size)
+		return -ESPIPE;
+
+	while (!list_empty(&fault->response) && count > done) {
+		rc = copy_from_user(&response, buf + done, response_size);
+		if (rc)
+			break;
+
+		idev = container_of(iommufd_get_object(fault->ictx,
+						       response.dev_id,
+						       IOMMUFD_OBJ_DEVICE),
+				    struct iommufd_device, obj);
+		if (IS_ERR(idev))
+			break;
+
+		group = device_get_fault(idev, response.grpid);
+		if (!group ||
+		    response.addr != group->last_fault.fault.prm.addr ||
+		    ((group->last_fault.fault.prm.flags & IOMMU_FAULT_PAGE_REQUEST_PASID_VALID) &&
+		      response.pasid != group->last_fault.fault.prm.pasid)) {
+			iommufd_put_object(fault->ictx, &idev->obj);
+			break;
+		}
+
+		iopf_group_response(group, response.code);
+
+		mutex_lock(&fault->mutex);
+		list_del(&group->node);
+		mutex_unlock(&fault->mutex);
+
+		device_remove_fault(group);
+		iopf_free_group(group);
+		done += response_size;
+
+		iommufd_put_object(fault->ictx, &idev->obj);
+	}
+
+	return done;
+}
+
+static __poll_t iommufd_fault_fops_poll(struct file *filep,
+					struct poll_table_struct *wait)
+{
+	struct iommufd_fault *fault = filep->private_data;
+	__poll_t pollflags = 0;
+
+	poll_wait(filep, &fault->wait_queue, wait);
+	mutex_lock(&fault->mutex);
+	if (!list_empty(&fault->deliver))
+		pollflags = EPOLLIN | EPOLLRDNORM;
+	mutex_unlock(&fault->mutex);
+
+	return pollflags;
+}
+
+static const struct file_operations iommufd_fault_fops = {
+	.owner		= THIS_MODULE,
+	.open		= nonseekable_open,
+	.read		= iommufd_fault_fops_read,
+	.write		= iommufd_fault_fops_write,
+	.poll		= iommufd_fault_fops_poll,
+	.llseek		= no_llseek,
+};
+
+static int get_fault_fd(struct iommufd_fault *fault)
+{
+	struct file *filep;
+	int fdno;
+
+	fdno = get_unused_fd_flags(O_CLOEXEC);
+	if (fdno < 0)
+		return fdno;
+
+	filep = anon_inode_getfile("[iommufd-pgfault]", &iommufd_fault_fops,
+				   fault, O_RDWR);
+	if (IS_ERR(filep)) {
+		put_unused_fd(fdno);
+		return PTR_ERR(filep);
+	}
+
+	fd_install(fdno, filep);
+
+	return fdno;
+}
+
+int iommufd_fault_alloc(struct iommufd_ucmd *ucmd)
+{
+	struct iommu_fault_alloc *cmd = ucmd->cmd;
+	struct iommufd_fault *fault;
+	int rc;
+
+	if (cmd->flags)
+		return -EOPNOTSUPP;
+
+	fault = iommufd_object_alloc(ucmd->ictx, fault, IOMMUFD_OBJ_FAULT);
+	if (IS_ERR(fault))
+		return PTR_ERR(fault);
+
+	fault->ictx = ucmd->ictx;
+	INIT_LIST_HEAD(&fault->deliver);
+	INIT_LIST_HEAD(&fault->response);
+	mutex_init(&fault->mutex);
+	init_waitqueue_head(&fault->wait_queue);
+
+	rc = get_fault_fd(fault);
+	if (rc < 0)
+		goto out_abort;
+
+	cmd->out_fault_id = fault->obj.id;
+	cmd->out_fault_fd = rc;
+
+	rc = iommufd_ucmd_respond(ucmd, sizeof(*cmd));
+	if (rc)
+		goto out_abort;
+	iommufd_object_finalize(ucmd->ictx, &fault->obj);
+
+	return 0;
+out_abort:
+	iommufd_object_abort_and_destroy(ucmd->ictx, &fault->obj);
+
+	return rc;
+}
diff --git a/drivers/iommu/iommufd/main.c b/drivers/iommu/iommufd/main.c
index 39b32932c61e..792db077d33e 100644
--- a/drivers/iommu/iommufd/main.c
+++ b/drivers/iommu/iommufd/main.c
@@ -332,6 +332,7 @@ union ucmd_buffer {
 	struct iommu_ioas_unmap unmap;
 	struct iommu_option option;
 	struct iommu_vfio_ioas vfio_ioas;
+	struct iommu_fault_alloc fault;
 #ifdef CONFIG_IOMMUFD_TEST
 	struct iommu_test_cmd test;
 #endif
@@ -381,6 +382,8 @@ static const struct iommufd_ioctl_op iommufd_ioctl_ops[] = {
 		 val64),
 	IOCTL_OP(IOMMU_VFIO_IOAS, iommufd_vfio_ioas, struct iommu_vfio_ioas,
 		 __reserved),
+	IOCTL_OP(IOMMU_FAULT_ALLOC, iommufd_fault_alloc, struct iommu_fault_alloc,
+		 out_fault_fd),
 #ifdef CONFIG_IOMMUFD_TEST
 	IOCTL_OP(IOMMU_TEST_CMD, iommufd_test, struct iommu_test_cmd, last),
 #endif
@@ -513,6 +516,9 @@ static const struct iommufd_object_ops iommufd_object_ops[] = {
 		.destroy = iommufd_hwpt_nested_destroy,
 		.abort = iommufd_hwpt_nested_abort,
 	},
+	[IOMMUFD_OBJ_FAULT] = {
+		.destroy = iommufd_fault_destroy,
+	},
 #ifdef CONFIG_IOMMUFD_TEST
 	[IOMMUFD_OBJ_SELFTEST] = {
 		.destroy = iommufd_selftest_destroy,
diff --git a/drivers/iommu/iommufd/Makefile b/drivers/iommu/iommufd/Makefile
index 34b446146961..b94a74366eed 100644
--- a/drivers/iommu/iommufd/Makefile
+++ b/drivers/iommu/iommufd/Makefile
@@ -6,6 +6,7 @@ iommufd-y := \
 	ioas.o \
 	main.o \
 	pages.o \
+	fault.o \
 	vfio_compat.o
 
 iommufd-$(CONFIG_IOMMUFD_TEST) += selftest.o
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [PATCH v3 5/8] iommufd: Associate fault object with iommufd_hw_pgtable
  2024-01-22  7:38 [PATCH v3 0/8] IOMMUFD: Deliver IO page faults to user space Lu Baolu
                   ` (3 preceding siblings ...)
  2024-01-22  7:38 ` [PATCH v3 4/8] iommufd: Add iommufd fault object Lu Baolu
@ 2024-01-22  7:39 ` Lu Baolu
  2024-02-07  8:14   ` Tian, Kevin
                     ` (2 more replies)
  2024-01-22  7:39 ` [PATCH v3 6/8] iommufd: IOPF-capable hw page table attach/detach/replace Lu Baolu
                   ` (2 subsequent siblings)
  7 siblings, 3 replies; 43+ messages in thread
From: Lu Baolu @ 2024-01-22  7:39 UTC (permalink / raw)
  To: Jason Gunthorpe, Kevin Tian, Joerg Roedel, Will Deacon,
	Robin Murphy, Jean-Philippe Brucker, Nicolin Chen, Yi Liu,
	Jacob Pan, Joel Granados
  Cc: iommu, virtualization, linux-kernel, Lu Baolu

When allocating a user iommufd_hw_pagetable, user space may associate a
fault object with the hw_pagetable by specifying the fault object ID in
the page table allocation data and setting the IOMMU_HWPT_FAULT_ID_VALID
flag bit.

On successful return from the hwpt allocation, the user can retrieve and
respond to page faults by reading from and writing to the file interface
of the fault object.

Once a fault object has been associated with a hwpt, the hwpt becomes
iopf-capable, indicated by fault_capable being set to true. Attaching,
detaching, or replacing an iopf-capable hwpt on an RID or PASID differs
from the non-iopf-capable case. The implementation of these paths is
introduced in the next patch.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
---
 drivers/iommu/iommufd/iommufd_private.h | 11 ++++++++
 include/uapi/linux/iommufd.h            |  6 +++++
 drivers/iommu/iommufd/fault.c           | 14 ++++++++++
 drivers/iommu/iommufd/hw_pagetable.c    | 36 +++++++++++++++++++------
 4 files changed, 59 insertions(+), 8 deletions(-)

diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h
index 52d83e888bd0..2780bed0c6b1 100644
--- a/drivers/iommu/iommufd/iommufd_private.h
+++ b/drivers/iommu/iommufd/iommufd_private.h
@@ -293,6 +293,8 @@ int iommufd_check_iova_range(struct io_pagetable *iopt,
 struct iommufd_hw_pagetable {
 	struct iommufd_object obj;
 	struct iommu_domain *domain;
+	struct iommufd_fault *fault;
+	bool fault_capable : 1;
 };
 
 struct iommufd_hwpt_paging {
@@ -446,8 +448,17 @@ struct iommufd_fault {
 	struct wait_queue_head wait_queue;
 };
 
+static inline struct iommufd_fault *
+iommufd_get_fault(struct iommufd_ucmd *ucmd, u32 id)
+{
+	return container_of(iommufd_get_object(ucmd->ictx, id,
+					       IOMMUFD_OBJ_FAULT),
+			    struct iommufd_fault, obj);
+}
+
 int iommufd_fault_alloc(struct iommufd_ucmd *ucmd);
 void iommufd_fault_destroy(struct iommufd_object *obj);
+int iommufd_fault_iopf_handler(struct iopf_group *group);
 
 #ifdef CONFIG_IOMMUFD_TEST
 int iommufd_test(struct iommufd_ucmd *ucmd);
diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h
index c32d62b02306..7481cdd57027 100644
--- a/include/uapi/linux/iommufd.h
+++ b/include/uapi/linux/iommufd.h
@@ -357,10 +357,13 @@ struct iommu_vfio_ioas {
  *                                the parent HWPT in a nesting configuration.
  * @IOMMU_HWPT_ALLOC_DIRTY_TRACKING: Dirty tracking support for device IOMMU is
  *                                   enforced on device attachment
+ * @IOMMU_HWPT_FAULT_ID_VALID: The fault_id field of hwpt allocation data is
+ *                             valid.
  */
 enum iommufd_hwpt_alloc_flags {
 	IOMMU_HWPT_ALLOC_NEST_PARENT = 1 << 0,
 	IOMMU_HWPT_ALLOC_DIRTY_TRACKING = 1 << 1,
+	IOMMU_HWPT_FAULT_ID_VALID = 1 << 2,
 };
 
 /**
@@ -411,6 +414,8 @@ enum iommu_hwpt_data_type {
  * @__reserved: Must be 0
  * @data_type: One of enum iommu_hwpt_data_type
  * @data_len: Length of the type specific data
+ * @fault_id: The ID of an IOMMUFD_FAULT object. Valid only if the
+ *            IOMMU_HWPT_FAULT_ID_VALID bit is set in @flags.
  * @data_uptr: User pointer to the type specific data
  *
  * Explicitly allocate a hardware page table object. This is the same object
@@ -441,6 +446,7 @@ struct iommu_hwpt_alloc {
 	__u32 __reserved;
 	__u32 data_type;
 	__u32 data_len;
+	__u32 fault_id;
 	__aligned_u64 data_uptr;
 };
 #define IOMMU_HWPT_ALLOC _IO(IOMMUFD_TYPE, IOMMUFD_CMD_HWPT_ALLOC)
diff --git a/drivers/iommu/iommufd/fault.c b/drivers/iommu/iommufd/fault.c
index 9844a85feeb2..e752d1c49dde 100644
--- a/drivers/iommu/iommufd/fault.c
+++ b/drivers/iommu/iommufd/fault.c
@@ -253,3 +253,17 @@ int iommufd_fault_alloc(struct iommufd_ucmd *ucmd)
 
 	return rc;
 }
+
+int iommufd_fault_iopf_handler(struct iopf_group *group)
+{
+	struct iommufd_hw_pagetable *hwpt = group->cookie->domain->fault_data;
+	struct iommufd_fault *fault = hwpt->fault;
+
+	mutex_lock(&fault->mutex);
+	list_add_tail(&group->node, &fault->deliver);
+	mutex_unlock(&fault->mutex);
+
+	wake_up_interruptible(&fault->wait_queue);
+
+	return 0;
+}
diff --git a/drivers/iommu/iommufd/hw_pagetable.c b/drivers/iommu/iommufd/hw_pagetable.c
index 3f3f1fa1a0a9..2703d5aea4f5 100644
--- a/drivers/iommu/iommufd/hw_pagetable.c
+++ b/drivers/iommu/iommufd/hw_pagetable.c
@@ -8,6 +8,15 @@
 #include "../iommu-priv.h"
 #include "iommufd_private.h"
 
+static void __iommufd_hwpt_destroy(struct iommufd_hw_pagetable *hwpt)
+{
+	if (hwpt->domain)
+		iommu_domain_free(hwpt->domain);
+
+	if (hwpt->fault)
+		iommufd_put_object(hwpt->fault->ictx, &hwpt->fault->obj);
+}
+
 void iommufd_hwpt_paging_destroy(struct iommufd_object *obj)
 {
 	struct iommufd_hwpt_paging *hwpt_paging =
@@ -22,9 +31,7 @@ void iommufd_hwpt_paging_destroy(struct iommufd_object *obj)
 					 hwpt_paging->common.domain);
 	}
 
-	if (hwpt_paging->common.domain)
-		iommu_domain_free(hwpt_paging->common.domain);
-
+	__iommufd_hwpt_destroy(&hwpt_paging->common);
 	refcount_dec(&hwpt_paging->ioas->obj.users);
 }
 
@@ -49,9 +56,7 @@ void iommufd_hwpt_nested_destroy(struct iommufd_object *obj)
 	struct iommufd_hwpt_nested *hwpt_nested =
 		container_of(obj, struct iommufd_hwpt_nested, common.obj);
 
-	if (hwpt_nested->common.domain)
-		iommu_domain_free(hwpt_nested->common.domain);
-
+	__iommufd_hwpt_destroy(&hwpt_nested->common);
 	refcount_dec(&hwpt_nested->parent->common.obj.users);
 }
 
@@ -213,7 +218,8 @@ iommufd_hwpt_nested_alloc(struct iommufd_ctx *ictx,
 	struct iommufd_hw_pagetable *hwpt;
 	int rc;
 
-	if (flags || !user_data->len || !ops->domain_alloc_user)
+	if ((flags & ~IOMMU_HWPT_FAULT_ID_VALID) ||
+	    !user_data->len || !ops->domain_alloc_user)
 		return ERR_PTR(-EOPNOTSUPP);
 	if (parent->auto_domain || !parent->nest_parent)
 		return ERR_PTR(-EINVAL);
@@ -227,7 +233,7 @@ iommufd_hwpt_nested_alloc(struct iommufd_ctx *ictx,
 	refcount_inc(&parent->common.obj.users);
 	hwpt_nested->parent = parent;
 
-	hwpt->domain = ops->domain_alloc_user(idev->dev, flags,
+	hwpt->domain = ops->domain_alloc_user(idev->dev, 0,
 					      parent->common.domain, user_data);
 	if (IS_ERR(hwpt->domain)) {
 		rc = PTR_ERR(hwpt->domain);
@@ -307,6 +313,20 @@ int iommufd_hwpt_alloc(struct iommufd_ucmd *ucmd)
 		goto out_put_pt;
 	}
 
+	if (cmd->flags & IOMMU_HWPT_FAULT_ID_VALID) {
+		struct iommufd_fault *fault;
+
+		fault = iommufd_get_fault(ucmd, cmd->fault_id);
+		if (IS_ERR(fault)) {
+			rc = PTR_ERR(fault);
+			goto out_hwpt;
+		}
+		hwpt->fault = fault;
+		hwpt->domain->iopf_handler = iommufd_fault_iopf_handler;
+		hwpt->domain->fault_data = hwpt;
+		hwpt->fault_capable = true;
+	}
+
 	cmd->out_hwpt_id = hwpt->obj.id;
 	rc = iommufd_ucmd_respond(ucmd, sizeof(*cmd));
 	if (rc)
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [PATCH v3 6/8] iommufd: IOPF-capable hw page table attach/detach/replace
  2024-01-22  7:38 [PATCH v3 0/8] IOMMUFD: Deliver IO page faults to user space Lu Baolu
                   ` (4 preceding siblings ...)
  2024-01-22  7:39 ` [PATCH v3 5/8] iommufd: Associate fault object with iommufd_hw_pgtable Lu Baolu
@ 2024-01-22  7:39 ` Lu Baolu
       [not found]   ` <CGME20240220135755eucas1p24e882cb5a735caed4bbfedb22a811442@eucas1p2.samsung.com>
  2024-01-22  7:39 ` [PATCH v3 7/8] iommufd/selftest: Add IOPF support for mock device Lu Baolu
  2024-01-22  7:39 ` [PATCH v3 8/8] iommufd/selftest: Add coverage for IOPF test Lu Baolu
  7 siblings, 1 reply; 43+ messages in thread
From: Lu Baolu @ 2024-01-22  7:39 UTC (permalink / raw)
  To: Jason Gunthorpe, Kevin Tian, Joerg Roedel, Will Deacon,
	Robin Murphy, Jean-Philippe Brucker, Nicolin Chen, Yi Liu,
	Jacob Pan, Joel Granados
  Cc: iommu, virtualization, linux-kernel, Lu Baolu

The iopf-capable hw page table attach/detach/replace paths must use the
iommu iopf-specific interfaces. The pointer to the iommufd_device is
stored in the private field of the attachment cookie, so that it can be
easily retrieved in the fault handling paths. References to the
iommufd_device and iommufd_hw_pagetable objects are held until the cookie
is released, which happens after the hw_pagetable is detached from the
device and all outstanding iopf's have been responded to. This guarantees
that both the device and the hw_pagetable remain valid until the domain
is detached and all outstanding faults are handled.

Iopf-capable hw page tables can only be attached to devices that support
the IOMMU_DEV_FEAT_IOPF feature. On the first attachment of an
iopf-capable hw_pagetable to a device, the IOPF feature is enabled on
that device. Similarly, after the last iopf-capable hwpt is detached from
the device, the IOPF feature is disabled on it.

The current implementation allows a replacement between iopf-capable and
non-iopf-capable hw page tables. This matches the nested translation use
case, where a parent domain is attached by default and can then be
replaced with a nested user domain with iopf support.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
---
 drivers/iommu/iommufd/iommufd_private.h |   7 ++
 drivers/iommu/iommufd/device.c          |  15 ++-
 drivers/iommu/iommufd/fault.c           | 122 ++++++++++++++++++++++++
 3 files changed, 141 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h
index 2780bed0c6b1..9844a1289c01 100644
--- a/drivers/iommu/iommufd/iommufd_private.h
+++ b/drivers/iommu/iommufd/iommufd_private.h
@@ -398,6 +398,7 @@ struct iommufd_device {
 	/* always the physical device */
 	struct device *dev;
 	bool enforce_cache_coherency;
+	bool iopf_enabled;
 	/* outstanding faults awaiting response indexed by fault group id */
 	struct xarray faults;
 };
@@ -459,6 +460,12 @@ iommufd_get_fault(struct iommufd_ucmd *ucmd, u32 id)
 int iommufd_fault_alloc(struct iommufd_ucmd *ucmd);
 void iommufd_fault_destroy(struct iommufd_object *obj);
 int iommufd_fault_iopf_handler(struct iopf_group *group);
+int iommufd_fault_domain_attach_dev(struct iommufd_hw_pagetable *hwpt,
+				    struct iommufd_device *idev);
+void iommufd_fault_domain_detach_dev(struct iommufd_hw_pagetable *hwpt,
+				     struct iommufd_device *idev);
+int iommufd_fault_domain_replace_dev(struct iommufd_hw_pagetable *hwpt,
+				     struct iommufd_device *idev);
 
 #ifdef CONFIG_IOMMUFD_TEST
 int iommufd_test(struct iommufd_ucmd *ucmd);
diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c
index d70913ee8fdf..c4737e876ebc 100644
--- a/drivers/iommu/iommufd/device.c
+++ b/drivers/iommu/iommufd/device.c
@@ -377,7 +377,10 @@ int iommufd_hw_pagetable_attach(struct iommufd_hw_pagetable *hwpt,
 	 * attachment.
 	 */
 	if (list_empty(&idev->igroup->device_list)) {
-		rc = iommu_attach_group(hwpt->domain, idev->igroup->group);
+		if (hwpt->fault_capable)
+			rc = iommufd_fault_domain_attach_dev(hwpt, idev);
+		else
+			rc = iommu_attach_group(hwpt->domain, idev->igroup->group);
 		if (rc)
 			goto err_unresv;
 		idev->igroup->hwpt = hwpt;
@@ -403,7 +406,10 @@ iommufd_hw_pagetable_detach(struct iommufd_device *idev)
 	mutex_lock(&idev->igroup->lock);
 	list_del(&idev->group_item);
 	if (list_empty(&idev->igroup->device_list)) {
-		iommu_detach_group(hwpt->domain, idev->igroup->group);
+		if (hwpt->fault_capable)
+			iommufd_fault_domain_detach_dev(hwpt, idev);
+		else
+			iommu_detach_group(hwpt->domain, idev->igroup->group);
 		idev->igroup->hwpt = NULL;
 	}
 	if (hwpt_is_paging(hwpt))
@@ -498,7 +504,10 @@ iommufd_device_do_replace(struct iommufd_device *idev,
 			goto err_unlock;
 	}
 
-	rc = iommu_group_replace_domain(igroup->group, hwpt->domain);
+	if (old_hwpt->fault_capable || hwpt->fault_capable)
+		rc = iommufd_fault_domain_replace_dev(hwpt, idev);
+	else
+		rc = iommu_group_replace_domain(igroup->group, hwpt->domain);
 	if (rc)
 		goto err_unresv;
 
diff --git a/drivers/iommu/iommufd/fault.c b/drivers/iommu/iommufd/fault.c
index e752d1c49dde..a4a49f3cd4c2 100644
--- a/drivers/iommu/iommufd/fault.c
+++ b/drivers/iommu/iommufd/fault.c
@@ -267,3 +267,125 @@ int iommufd_fault_iopf_handler(struct iopf_group *group)
 
 	return 0;
 }
+
+static void release_attach_cookie(struct iopf_attach_cookie *cookie)
+{
+	struct iommufd_hw_pagetable *hwpt = cookie->domain->fault_data;
+	struct iommufd_device *idev = cookie->private;
+
+	refcount_dec(&idev->obj.users);
+	refcount_dec(&hwpt->obj.users);
+	kfree(cookie);
+}
+
+static int iommufd_fault_iopf_enable(struct iommufd_device *idev)
+{
+	int ret;
+
+	if (idev->iopf_enabled)
+		return 0;
+
+	ret = iommu_dev_enable_feature(idev->dev, IOMMU_DEV_FEAT_IOPF);
+	if (ret)
+		return ret;
+
+	idev->iopf_enabled = true;
+
+	return 0;
+}
+
+static void iommufd_fault_iopf_disable(struct iommufd_device *idev)
+{
+	if (!idev->iopf_enabled)
+		return;
+
+	iommu_dev_disable_feature(idev->dev, IOMMU_DEV_FEAT_IOPF);
+	idev->iopf_enabled = false;
+}
+
+int iommufd_fault_domain_attach_dev(struct iommufd_hw_pagetable *hwpt,
+				    struct iommufd_device *idev)
+{
+	struct iopf_attach_cookie *cookie;
+	int ret;
+
+	cookie = kzalloc(sizeof(*cookie), GFP_KERNEL);
+	if (!cookie)
+		return -ENOMEM;
+
+	refcount_inc(&hwpt->obj.users);
+	refcount_inc(&idev->obj.users);
+	cookie->release = release_attach_cookie;
+	cookie->private = idev;
+
+	if (!idev->iopf_enabled) {
+		ret = iommufd_fault_iopf_enable(idev);
+		if (ret)
+			goto out_put_cookie;
+	}
+
+	ret = iopf_domain_attach(hwpt->domain, idev->dev, IOMMU_NO_PASID, cookie);
+	if (ret)
+		goto out_disable_iopf;
+
+	return 0;
+out_disable_iopf:
+	iommufd_fault_iopf_disable(idev);
+out_put_cookie:
+	release_attach_cookie(cookie);
+
+	return ret;
+}
+
+void iommufd_fault_domain_detach_dev(struct iommufd_hw_pagetable *hwpt,
+				     struct iommufd_device *idev)
+{
+	iopf_domain_detach(hwpt->domain, idev->dev, IOMMU_NO_PASID);
+	iommufd_fault_iopf_disable(idev);
+}
+
+int iommufd_fault_domain_replace_dev(struct iommufd_hw_pagetable *hwpt,
+				     struct iommufd_device *idev)
+{
+	bool iopf_enabled_originally = idev->iopf_enabled;
+	struct iopf_attach_cookie *cookie = NULL;
+	int ret;
+
+	if (hwpt->fault_capable) {
+		cookie = kzalloc(sizeof(*cookie), GFP_KERNEL);
+		if (!cookie)
+			return -ENOMEM;
+
+		refcount_inc(&hwpt->obj.users);
+		refcount_inc(&idev->obj.users);
+		cookie->release = release_attach_cookie;
+		cookie->private = idev;
+
+		if (!idev->iopf_enabled) {
+			ret = iommufd_fault_iopf_enable(idev);
+			if (ret) {
+				release_attach_cookie(cookie);
+				return ret;
+			}
+		}
+	}
+
+	ret = iopf_domain_replace(hwpt->domain, idev->dev, IOMMU_NO_PASID, cookie);
+	if (ret) {
+		goto out_put_cookie;
+	}
+
+	if (iopf_enabled_originally && !hwpt->fault_capable)
+		iommufd_fault_iopf_disable(idev);
+
+	return 0;
+out_put_cookie:
+	if (hwpt->fault_capable)
+		release_attach_cookie(cookie);
+	if (iopf_enabled_originally && !idev->iopf_enabled)
+		iommufd_fault_iopf_enable(idev);
+	else if (!iopf_enabled_originally && idev->iopf_enabled)
+		iommufd_fault_iopf_disable(idev);
+
+	return ret;
+}
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [PATCH v3 7/8] iommufd/selftest: Add IOPF support for mock device
  2024-01-22  7:38 [PATCH v3 0/8] IOMMUFD: Deliver IO page faults to user space Lu Baolu
                   ` (5 preceding siblings ...)
  2024-01-22  7:39 ` [PATCH v3 6/8] iommufd: IOPF-capable hw page table attach/detach/replace Lu Baolu
@ 2024-01-22  7:39 ` Lu Baolu
  2024-01-22  7:39 ` [PATCH v3 8/8] iommufd/selftest: Add coverage for IOPF test Lu Baolu
  7 siblings, 0 replies; 43+ messages in thread
From: Lu Baolu @ 2024-01-22  7:39 UTC (permalink / raw)
  To: Jason Gunthorpe, Kevin Tian, Joerg Roedel, Will Deacon,
	Robin Murphy, Jean-Philippe Brucker, Nicolin Chen, Yi Liu,
	Jacob Pan, Joel Granados
  Cc: iommu, virtualization, linux-kernel, Lu Baolu

Extend the selftest mock device to support generating and responding to
an IOPF. Also add an ioctl interface so that userspace applications can
trigger an IOPF on the mock device. This allows userspace applications to
test IOMMUFD's handling of IOPFs without relying on any real hardware.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
---
 drivers/iommu/iommufd/iommufd_test.h |  8 ++++
 drivers/iommu/iommufd/selftest.c     | 63 ++++++++++++++++++++++++++++
 2 files changed, 71 insertions(+)

diff --git a/drivers/iommu/iommufd/iommufd_test.h b/drivers/iommu/iommufd/iommufd_test.h
index 482d4059f5db..ff9dcd812618 100644
--- a/drivers/iommu/iommufd/iommufd_test.h
+++ b/drivers/iommu/iommufd/iommufd_test.h
@@ -22,6 +22,7 @@ enum {
 	IOMMU_TEST_OP_MOCK_DOMAIN_FLAGS,
 	IOMMU_TEST_OP_DIRTY,
 	IOMMU_TEST_OP_MD_CHECK_IOTLB,
+	IOMMU_TEST_OP_TRIGGER_IOPF,
 };
 
 enum {
@@ -126,6 +127,13 @@ struct iommu_test_cmd {
 			__u32 id;
 			__u32 iotlb;
 		} check_iotlb;
+		struct {
+			__u32 dev_id;
+			__u32 pasid;
+			__u32 grpid;
+			__u32 perm;
+			__u64 addr;
+		} trigger_iopf;
 	};
 	__u32 last;
 };
diff --git a/drivers/iommu/iommufd/selftest.c b/drivers/iommu/iommufd/selftest.c
index 2fb2597e069f..2ca226f88856 100644
--- a/drivers/iommu/iommufd/selftest.c
+++ b/drivers/iommu/iommufd/selftest.c
@@ -445,6 +445,8 @@ static bool mock_domain_capable(struct device *dev, enum iommu_cap cap)
 	return false;
 }
 
+static struct iopf_queue *mock_iommu_iopf_queue;
+
 static struct iommu_device mock_iommu_device = {
 };
 
@@ -455,6 +457,29 @@ static struct iommu_device *mock_probe_device(struct device *dev)
 	return &mock_iommu_device;
 }
 
+static void mock_domain_page_response(struct device *dev, struct iopf_fault *evt,
+				      struct iommu_page_response *msg)
+{
+}
+
+static int mock_dev_enable_feat(struct device *dev, enum iommu_dev_features feat)
+{
+	if (feat != IOMMU_DEV_FEAT_IOPF || !mock_iommu_iopf_queue)
+		return -ENODEV;
+
+	return iopf_queue_add_device(mock_iommu_iopf_queue, dev);
+}
+
+static int mock_dev_disable_feat(struct device *dev, enum iommu_dev_features feat)
+{
+	if (feat != IOMMU_DEV_FEAT_IOPF || !mock_iommu_iopf_queue)
+		return -ENODEV;
+
+	iopf_queue_remove_device(mock_iommu_iopf_queue, dev);
+
+	return 0;
+}
+
 static const struct iommu_ops mock_ops = {
 	/*
 	 * IOMMU_DOMAIN_BLOCKED cannot be returned from def_domain_type()
@@ -470,6 +495,9 @@ static const struct iommu_ops mock_ops = {
 	.capable = mock_domain_capable,
 	.device_group = generic_device_group,
 	.probe_device = mock_probe_device,
+	.page_response = mock_domain_page_response,
+	.dev_enable_feat = mock_dev_enable_feat,
+	.dev_disable_feat = mock_dev_disable_feat,
 	.default_domain_ops =
 		&(struct iommu_domain_ops){
 			.free = mock_domain_free,
@@ -1333,6 +1361,31 @@ static int iommufd_test_dirty(struct iommufd_ucmd *ucmd, unsigned int mockpt_id,
 	return rc;
 }
 
+static int iommufd_test_trigger_iopf(struct iommufd_ucmd *ucmd,
+				     struct iommu_test_cmd *cmd)
+{
+	struct iopf_fault event = { };
+	struct iommufd_device *idev;
+
+	idev = iommufd_get_device(ucmd, cmd->trigger_iopf.dev_id);
+	if (IS_ERR(idev))
+		return PTR_ERR(idev);
+
+	event.fault.prm.flags = IOMMU_FAULT_PAGE_REQUEST_LAST_PAGE;
+	if (cmd->trigger_iopf.pasid != IOMMU_NO_PASID)
+		event.fault.prm.flags |= IOMMU_FAULT_PAGE_REQUEST_PASID_VALID;
+	event.fault.type = IOMMU_FAULT_PAGE_REQ;
+	event.fault.prm.addr = cmd->trigger_iopf.addr;
+	event.fault.prm.pasid = cmd->trigger_iopf.pasid;
+	event.fault.prm.grpid = cmd->trigger_iopf.grpid;
+	event.fault.prm.perm = cmd->trigger_iopf.perm;
+
+	iommu_report_device_fault(idev->dev, &event);
+	iommufd_put_object(ucmd->ictx, &idev->obj);
+
+	return 0;
+}
+
 void iommufd_selftest_destroy(struct iommufd_object *obj)
 {
 	struct selftest_obj *sobj = container_of(obj, struct selftest_obj, obj);
@@ -1408,6 +1461,8 @@ int iommufd_test(struct iommufd_ucmd *ucmd)
 					  cmd->dirty.page_size,
 					  u64_to_user_ptr(cmd->dirty.uptr),
 					  cmd->dirty.flags);
+	case IOMMU_TEST_OP_TRIGGER_IOPF:
+		return iommufd_test_trigger_iopf(ucmd, cmd);
 	default:
 		return -EOPNOTSUPP;
 	}
@@ -1449,6 +1504,9 @@ int __init iommufd_test_init(void)
 				  &iommufd_mock_bus_type.nb);
 	if (rc)
 		goto err_sysfs;
+
+	mock_iommu_iopf_queue = iopf_queue_alloc("mock-iopfq");
+
 	return 0;
 
 err_sysfs:
@@ -1464,6 +1522,11 @@ int __init iommufd_test_init(void)
 
 void iommufd_test_exit(void)
 {
+	if (mock_iommu_iopf_queue) {
+		iopf_queue_free(mock_iommu_iopf_queue);
+		mock_iommu_iopf_queue = NULL;
+	}
+
 	iommu_device_sysfs_remove(&mock_iommu_device);
 	iommu_device_unregister_bus(&mock_iommu_device,
 				    &iommufd_mock_bus_type.bus,
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [PATCH v3 8/8] iommufd/selftest: Add coverage for IOPF test
  2024-01-22  7:38 [PATCH v3 0/8] IOMMUFD: Deliver IO page faults to user space Lu Baolu
                   ` (6 preceding siblings ...)
  2024-01-22  7:39 ` [PATCH v3 7/8] iommufd/selftest: Add IOPF support for mock device Lu Baolu
@ 2024-01-22  7:39 ` Lu Baolu
  7 siblings, 0 replies; 43+ messages in thread
From: Lu Baolu @ 2024-01-22  7:39 UTC (permalink / raw)
  To: Jason Gunthorpe, Kevin Tian, Joerg Roedel, Will Deacon,
	Robin Murphy, Jean-Philippe Brucker, Nicolin Chen, Yi Liu,
	Jacob Pan, Joel Granados
  Cc: iommu, virtualization, linux-kernel, Lu Baolu

Extend the selftest tool to add coverage for IOPF handling. This
includes the following tests:

- Allocating and destroying an iommufd fault object.
- Allocating and destroying an IOPF-capable HWPT.
- Attaching/detaching/replacing an IOPF-capable HWPT on a device.
- Triggering an IOPF on the mock device.
- Retrieving and responding to the IOPF through the file interface.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
---
 tools/testing/selftests/iommu/iommufd_utils.h | 83 +++++++++++++++++--
 tools/testing/selftests/iommu/iommufd.c       | 17 ++++
 .../selftests/iommu/iommufd_fail_nth.c        |  2 +-
 3 files changed, 96 insertions(+), 6 deletions(-)

diff --git a/tools/testing/selftests/iommu/iommufd_utils.h b/tools/testing/selftests/iommu/iommufd_utils.h
index c646264aa41f..bf6027f2a16d 100644
--- a/tools/testing/selftests/iommu/iommufd_utils.h
+++ b/tools/testing/selftests/iommu/iommufd_utils.h
@@ -153,7 +153,7 @@ static int _test_cmd_mock_domain_replace(int fd, __u32 stdev_id, __u32 pt_id,
 	EXPECT_ERRNO(_errno, _test_cmd_mock_domain_replace(self->fd, stdev_id, \
 							   pt_id, NULL))
 
-static int _test_cmd_hwpt_alloc(int fd, __u32 device_id, __u32 pt_id,
+static int _test_cmd_hwpt_alloc(int fd, __u32 device_id, __u32 pt_id, __u32 ft_id,
 				__u32 flags, __u32 *hwpt_id, __u32 data_type,
 				void *data, size_t data_len)
 {
@@ -165,6 +165,7 @@ static int _test_cmd_hwpt_alloc(int fd, __u32 device_id, __u32 pt_id,
 		.data_type = data_type,
 		.data_len = data_len,
 		.data_uptr = (uint64_t)data,
+		.fault_id = ft_id,
 	};
 	int ret;
 
@@ -177,24 +178,30 @@ static int _test_cmd_hwpt_alloc(int fd, __u32 device_id, __u32 pt_id,
 }
 
 #define test_cmd_hwpt_alloc(device_id, pt_id, flags, hwpt_id)                  \
-	ASSERT_EQ(0, _test_cmd_hwpt_alloc(self->fd, device_id, pt_id, flags,   \
+	ASSERT_EQ(0, _test_cmd_hwpt_alloc(self->fd, device_id, pt_id, 0, flags,   \
 					  hwpt_id, IOMMU_HWPT_DATA_NONE, NULL, \
 					  0))
 #define test_err_hwpt_alloc(_errno, device_id, pt_id, flags, hwpt_id)   \
 	EXPECT_ERRNO(_errno, _test_cmd_hwpt_alloc(                      \
-				     self->fd, device_id, pt_id, flags, \
+				     self->fd, device_id, pt_id, 0, flags, \
 				     hwpt_id, IOMMU_HWPT_DATA_NONE, NULL, 0))
 
 #define test_cmd_hwpt_alloc_nested(device_id, pt_id, flags, hwpt_id,         \
 				   data_type, data, data_len)                \
-	ASSERT_EQ(0, _test_cmd_hwpt_alloc(self->fd, device_id, pt_id, flags, \
+	ASSERT_EQ(0, _test_cmd_hwpt_alloc(self->fd, device_id, pt_id, 0, flags, \
 					  hwpt_id, data_type, data, data_len))
 #define test_err_hwpt_alloc_nested(_errno, device_id, pt_id, flags, hwpt_id, \
 				   data_type, data, data_len)                \
 	EXPECT_ERRNO(_errno,                                                 \
-		     _test_cmd_hwpt_alloc(self->fd, device_id, pt_id, flags, \
+		     _test_cmd_hwpt_alloc(self->fd, device_id, pt_id, 0, flags, \
 					  hwpt_id, data_type, data, data_len))
 
+#define test_cmd_hwpt_alloc_iopf(device_id, pt_id, fault_id, flags, hwpt_id,    \
+				   data_type, data, data_len)                   \
+	ASSERT_EQ(0, _test_cmd_hwpt_alloc(self->fd, device_id, pt_id, fault_id, \
+					  flags, hwpt_id, data_type, data,      \
+					  data_len))
+
 #define test_cmd_hwpt_check_iotlb(hwpt_id, iotlb_id, expected)                 \
 	({                                                                     \
 		struct iommu_test_cmd test_cmd = {                             \
@@ -673,3 +680,69 @@ static int _test_cmd_get_hw_info(int fd, __u32 device_id, void *data,
 
 #define test_cmd_get_hw_capabilities(device_id, caps, mask) \
 	ASSERT_EQ(0, _test_cmd_get_hw_info(self->fd, device_id, NULL, 0, &caps))
+
+static int _test_ioctl_fault_alloc(int fd, __u32 *fault_id, __u32 *fault_fd)
+{
+	struct iommu_fault_alloc cmd = {
+		.size = sizeof(cmd),
+	};
+	int ret;
+
+	ret = ioctl(fd, IOMMU_FAULT_ALLOC, &cmd);
+	if (ret)
+		return ret;
+	*fault_id = cmd.out_fault_id;
+	*fault_fd = cmd.out_fault_fd;
+	return 0;
+}
+
+#define test_ioctl_fault_alloc(fault_id, fault_fd)                       \
+	({                                                               \
+		ASSERT_EQ(0, _test_ioctl_fault_alloc(self->fd, fault_id, \
+						     fault_fd));         \
+		ASSERT_NE(0, *(fault_id));                               \
+		ASSERT_NE(0, *(fault_fd));                               \
+	})
+
+static int _test_cmd_trigger_iopf(int fd, __u32 device_id, __u32 fault_fd)
+{
+	struct iommu_test_cmd trigger_iopf_cmd = {
+		.size = sizeof(trigger_iopf_cmd),
+		.op = IOMMU_TEST_OP_TRIGGER_IOPF,
+		.trigger_iopf = {
+			.dev_id = device_id,
+			.pasid = 0x1,
+			.grpid = 0x2,
+			.perm = IOMMU_PGFAULT_PERM_READ | IOMMU_PGFAULT_PERM_WRITE,
+			.addr = 0xdeadbeaf,
+		},
+	};
+	struct iommu_hwpt_page_response response = {
+		.size = sizeof(struct iommu_hwpt_page_response),
+		.dev_id = device_id,
+		.pasid = 0x1,
+		.grpid = 0x2,
+		.code = 0,
+		.addr = 0xdeadbeaf,
+	};
+	struct iommu_hwpt_pgfault fault = {};
+	ssize_t bytes;
+	int ret;
+
+	ret = ioctl(fd, _IOMMU_TEST_CMD(IOMMU_TEST_OP_TRIGGER_IOPF), &trigger_iopf_cmd);
+	if (ret)
+		return ret;
+
+	bytes = read(fault_fd, &fault, sizeof(fault));
+	if (bytes < 0)
+		return bytes;
+
+	bytes = write(fault_fd, &response, sizeof(response));
+	if (bytes < 0)
+		return bytes;
+
+	return 0;
+}
+
+#define test_cmd_trigger_iopf(device_id, fault_fd) \
+	ASSERT_EQ(0, _test_cmd_trigger_iopf(self->fd, device_id, fault_fd))
diff --git a/tools/testing/selftests/iommu/iommufd.c b/tools/testing/selftests/iommu/iommufd.c
index 1a881e7a21d1..d7049df62ed2 100644
--- a/tools/testing/selftests/iommu/iommufd.c
+++ b/tools/testing/selftests/iommu/iommufd.c
@@ -278,6 +278,9 @@ TEST_F(iommufd_ioas, alloc_hwpt_nested)
 	uint32_t parent_hwpt_id = 0;
 	uint32_t parent_hwpt_id_not_work = 0;
 	uint32_t test_hwpt_id = 0;
+	uint32_t iopf_hwpt_id;
+	uint32_t fault_id;
+	uint32_t fault_fd;
 
 	if (self->device_id) {
 		/* Negative tests */
@@ -325,6 +328,7 @@ TEST_F(iommufd_ioas, alloc_hwpt_nested)
 					   sizeof(data));
 
 		/* Allocate two nested hwpts sharing one common parent hwpt */
+		test_ioctl_fault_alloc(&fault_id, &fault_fd);
 		test_cmd_hwpt_alloc_nested(self->device_id, parent_hwpt_id, 0,
 					   &nested_hwpt_id[0],
 					   IOMMU_HWPT_DATA_SELFTEST, &data,
@@ -333,6 +337,10 @@ TEST_F(iommufd_ioas, alloc_hwpt_nested)
 					   &nested_hwpt_id[1],
 					   IOMMU_HWPT_DATA_SELFTEST, &data,
 					   sizeof(data));
+		test_cmd_hwpt_alloc_iopf(self->device_id, parent_hwpt_id, fault_id,
+					 IOMMU_HWPT_FAULT_ID_VALID, &iopf_hwpt_id,
+					 IOMMU_HWPT_DATA_SELFTEST, &data,
+					 sizeof(data));
 		test_cmd_hwpt_check_iotlb_all(nested_hwpt_id[0],
 					      IOMMU_TEST_IOTLB_DEFAULT);
 		test_cmd_hwpt_check_iotlb_all(nested_hwpt_id[1],
@@ -503,14 +511,23 @@ TEST_F(iommufd_ioas, alloc_hwpt_nested)
 			     _test_ioctl_destroy(self->fd, nested_hwpt_id[1]));
 		test_ioctl_destroy(nested_hwpt_id[0]);
 
+		/* Switch from nested_hwpt_id[1] to iopf_hwpt_id */
+		test_cmd_mock_domain_replace(self->stdev_id, iopf_hwpt_id);
+		EXPECT_ERRNO(EBUSY,
+			     _test_ioctl_destroy(self->fd, iopf_hwpt_id));
+		/* Trigger an IOPF on the device */
+		test_cmd_trigger_iopf(self->device_id, fault_fd);
+
 		/* Detach from nested_hwpt_id[1] and destroy it */
 		test_cmd_mock_domain_replace(self->stdev_id, parent_hwpt_id);
 		test_ioctl_destroy(nested_hwpt_id[1]);
+		test_ioctl_destroy(iopf_hwpt_id);
 
 		/* Detach from the parent hw_pagetable and destroy it */
 		test_cmd_mock_domain_replace(self->stdev_id, self->ioas_id);
 		test_ioctl_destroy(parent_hwpt_id);
 		test_ioctl_destroy(parent_hwpt_id_not_work);
+		test_ioctl_destroy(fault_id);
 	} else {
 		test_err_hwpt_alloc(ENOENT, self->device_id, self->ioas_id, 0,
 				    &parent_hwpt_id);
diff --git a/tools/testing/selftests/iommu/iommufd_fail_nth.c b/tools/testing/selftests/iommu/iommufd_fail_nth.c
index f590417cd67a..c5d5e69452b0 100644
--- a/tools/testing/selftests/iommu/iommufd_fail_nth.c
+++ b/tools/testing/selftests/iommu/iommufd_fail_nth.c
@@ -615,7 +615,7 @@ TEST_FAIL_NTH(basic_fail_nth, device)
 	if (_test_cmd_get_hw_info(self->fd, idev_id, &info, sizeof(info), NULL))
 		return -1;
 
-	if (_test_cmd_hwpt_alloc(self->fd, idev_id, ioas_id, 0, &hwpt_id,
+	if (_test_cmd_hwpt_alloc(self->fd, idev_id, ioas_id, 0, 0, &hwpt_id,
 				 IOMMU_HWPT_DATA_NONE, 0, 0))
 		return -1;
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 43+ messages in thread

* RE: [PATCH v3 1/8] iommu: Add iopf domain attach/detach/replace interface
  2024-01-22  7:38 ` [PATCH v3 1/8] iommu: Add iopf domain attach/detach/replace interface Lu Baolu
@ 2024-02-07  8:11   ` Tian, Kevin
  2024-02-21  5:52     ` Baolu Lu
  0 siblings, 1 reply; 43+ messages in thread
From: Tian, Kevin @ 2024-02-07  8:11 UTC (permalink / raw)
  To: Lu Baolu, Jason Gunthorpe, Joerg Roedel, Will Deacon,
	Robin Murphy, Jean-Philippe Brucker, Nicolin Chen, Liu, Yi L,
	Jacob Pan, Joel Granados
  Cc: iommu, virtualization, linux-kernel

> From: Lu Baolu <baolu.lu@linux.intel.com>
> Sent: Monday, January 22, 2024 3:39 PM
> 
> There is a slight difference between iopf domains and non-iopf domains.
> In the latter, references to domains occur between attach and detach;
> While in the former, due to the existence of asynchronous iopf handling
> paths, references to the domain may occur after detach, which leads to
> potential UAF issues.

Does UAF still exist if iommu driver follows the guidance you just added
to iopf_queue_remove_device()?

it clearly says that the driver needs to disable IOMMU PRI reception,
remove device from iopf queue and disable PRI on the device.

presumably those are all about what needs to be done in the detach
operation. Then once detach completes there should be no more
reference to the domain from the iopf path?

> 
> +struct iopf_attach_cookie {
> +	struct iommu_domain *domain;
> +	struct device *dev;
> +	unsigned int pasid;
> +	refcount_t users;
> +
> +	void *private;
> +	void (*release)(struct iopf_attach_cookie *cookie);
> +};

this cookie has nothing specific to iopf.

it may make more sense to build a generic iommu_attach_device_cookie()
helper so the same object can be reused by other usages in the future too.

within iommu core it can check domain iopf handler and this generic cookie
to update iopf specific data e.g. the pasid_cookie xarray.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* RE: [PATCH v3 5/8] iommufd: Associate fault object with iommufd_hw_pgtable
  2024-01-22  7:39 ` [PATCH v3 5/8] iommufd: Associate fault object with iommufd_hw_pgtable Lu Baolu
@ 2024-02-07  8:14   ` Tian, Kevin
  2024-02-21  6:06     ` Baolu Lu
  2024-03-02  2:36   ` Zhangfei Gao
  2024-03-08 19:05   ` Jason Gunthorpe
  2 siblings, 1 reply; 43+ messages in thread
From: Tian, Kevin @ 2024-02-07  8:14 UTC (permalink / raw)
  To: Lu Baolu, Jason Gunthorpe, Joerg Roedel, Will Deacon,
	Robin Murphy, Jean-Philippe Brucker, Nicolin Chen, Liu, Yi L,
	Jacob Pan, Joel Granados
  Cc: iommu, virtualization, linux-kernel

> From: Lu Baolu <baolu.lu@linux.intel.com>
> Sent: Monday, January 22, 2024 3:39 PM
> 
> +
> +int iommufd_fault_iopf_handler(struct iopf_group *group)
> +{
> +	struct iommufd_hw_pagetable *hwpt = group->cookie->domain-
> >fault_data;
> +	struct iommufd_fault *fault = hwpt->fault;
> +

why not directly using iommufd_fault as the fault_data?

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v3 6/8] iommufd: IOPF-capable hw page table attach/detach/replace
       [not found]   ` <CGME20240220135755eucas1p24e882cb5a735caed4bbfedb22a811442@eucas1p2.samsung.com>
@ 2024-02-20 13:57     ` Joel Granados
  2024-02-21  6:15       ` Baolu Lu
  0 siblings, 1 reply; 43+ messages in thread
From: Joel Granados @ 2024-02-20 13:57 UTC (permalink / raw)
  To: Lu Baolu
  Cc: Jason Gunthorpe, Kevin Tian, Joerg Roedel, Will Deacon,
	Robin Murphy, Jean-Philippe Brucker, Nicolin Chen, Yi Liu,
	Jacob Pan, iommu, virtualization, linux-kernel

On Mon, Jan 22, 2024 at 03:39:01PM +0800, Lu Baolu wrote:
> The iopf-capable hw page table attach/detach/replace should use the iommu
> iopf-specific interfaces. The pointer to iommufd_device is stored in the
> private field of the attachment cookie, so that it can be easily retrieved
> in the fault handling paths. The references to iommufd_device and
> iommufd_hw_pagetable objects are held until the cookie is released, which
> happens after the hw_pagetable is detached from the device and all
> outstanding iopf's are responded to. This guarantees that both the device
> and hw_pagetable are valid before domain detachment and outstanding faults
> are handled.
> 
> The iopf-capable hw page tables can only be attached to devices that
> support the IOMMU_DEV_FEAT_IOPF feature. On the first attachment of an
> iopf-capable hw_pagetable to the device, the IOPF feature is enabled on
> the device. Similarly, after the last iopf-capable hwpt is detached from
> the device, the IOPF feature is disabled on the device.
> 
> The current implementation allows a replacement between iopf-capable and
> non-iopf-capable hw page tables. This matches the nested translation use
> case, where a parent domain is attached by default and can then be
> replaced with a nested user domain with iopf support.
> 
> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
> ---
>  drivers/iommu/iommufd/iommufd_private.h |   7 ++
>  drivers/iommu/iommufd/device.c          |  15 ++-
>  drivers/iommu/iommufd/fault.c           | 122 ++++++++++++++++++++++++
>  3 files changed, 141 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h
> index 2780bed0c6b1..9844a1289c01 100644
> --- a/drivers/iommu/iommufd/iommufd_private.h
> +++ b/drivers/iommu/iommufd/iommufd_private.h
> @@ -398,6 +398,7 @@ struct iommufd_device {
>  	/* always the physical device */
>  	struct device *dev;
>  	bool enforce_cache_coherency;
> +	bool iopf_enabled;
>  	/* outstanding faults awaiting response indexed by fault group id */
>  	struct xarray faults;
>  };
> @@ -459,6 +460,12 @@ iommufd_get_fault(struct iommufd_ucmd *ucmd, u32 id)
>  int iommufd_fault_alloc(struct iommufd_ucmd *ucmd);
>  void iommufd_fault_destroy(struct iommufd_object *obj);
>  int iommufd_fault_iopf_handler(struct iopf_group *group);
> +int iommufd_fault_domain_attach_dev(struct iommufd_hw_pagetable *hwpt,
> +				    struct iommufd_device *idev);
> +void iommufd_fault_domain_detach_dev(struct iommufd_hw_pagetable *hwpt,
> +				     struct iommufd_device *idev);
> +int iommufd_fault_domain_replace_dev(struct iommufd_hw_pagetable *hwpt,
> +				     struct iommufd_device *idev);
>  
>  #ifdef CONFIG_IOMMUFD_TEST
>  int iommufd_test(struct iommufd_ucmd *ucmd);
> diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c
> index d70913ee8fdf..c4737e876ebc 100644
> --- a/drivers/iommu/iommufd/device.c
> +++ b/drivers/iommu/iommufd/device.c
> @@ -377,7 +377,10 @@ int iommufd_hw_pagetable_attach(struct iommufd_hw_pagetable *hwpt,
>  	 * attachment.
>  	 */
>  	if (list_empty(&idev->igroup->device_list)) {
> -		rc = iommu_attach_group(hwpt->domain, idev->igroup->group);
> +		if (hwpt->fault_capable)
> +			rc = iommufd_fault_domain_attach_dev(hwpt, idev);
> +		else
> +			rc = iommu_attach_group(hwpt->domain, idev->igroup->group);
>  		if (rc)
>  			goto err_unresv;
>  		idev->igroup->hwpt = hwpt;
> @@ -403,7 +406,10 @@ iommufd_hw_pagetable_detach(struct iommufd_device *idev)
>  	mutex_lock(&idev->igroup->lock);
>  	list_del(&idev->group_item);
>  	if (list_empty(&idev->igroup->device_list)) {
> -		iommu_detach_group(hwpt->domain, idev->igroup->group);
> +		if (hwpt->fault_capable)
> +			iommufd_fault_domain_detach_dev(hwpt, idev);
> +		else
> +			iommu_detach_group(hwpt->domain, idev->igroup->group);
>  		idev->igroup->hwpt = NULL;
>  	}
>  	if (hwpt_is_paging(hwpt))
> @@ -498,7 +504,10 @@ iommufd_device_do_replace(struct iommufd_device *idev,
>  			goto err_unlock;
>  	}
>  
> -	rc = iommu_group_replace_domain(igroup->group, hwpt->domain);
> +	if (old_hwpt->fault_capable || hwpt->fault_capable)
> +		rc = iommufd_fault_domain_replace_dev(hwpt, idev);
> +	else
> +		rc = iommu_group_replace_domain(igroup->group, hwpt->domain);
>  	if (rc)
>  		goto err_unresv;
>  
> diff --git a/drivers/iommu/iommufd/fault.c b/drivers/iommu/iommufd/fault.c
> index e752d1c49dde..a4a49f3cd4c2 100644
> --- a/drivers/iommu/iommufd/fault.c
> +++ b/drivers/iommu/iommufd/fault.c
> @@ -267,3 +267,125 @@ int iommufd_fault_iopf_handler(struct iopf_group *group)
>  
>  	return 0;
>  }
> +
> +static void release_attach_cookie(struct iopf_attach_cookie *cookie)
> +{
> +	struct iommufd_hw_pagetable *hwpt = cookie->domain->fault_data;
There is a possibility here of cookie->domain being NULL: when
release_attach_cookie() is called from iommufd_fault_domain_attach_dev()
and idev->iopf_enabled is false, the domain has not been set yet.

> +	struct iommufd_device *idev = cookie->private;
> +
> +	refcount_dec(&idev->obj.users);
> +	refcount_dec(&hwpt->obj.users);
You should decrease this ref count only if the cookie actually had a
domain.

This function could be something like this:

	static void release_attach_cookie(struct iopf_attach_cookie *cookie)
	{
		struct iommufd_hw_pagetable *hwpt;
		struct iommufd_device *idev = cookie->private;

		refcount_dec(&idev->obj.users);
		if (cookie->domain) {
			hwpt = cookie->domain->fault_data;
			refcount_dec(&hwpt->obj.users);
		}
		kfree(cookie);
	}


> +	kfree(cookie);
> +}
> +
> +static int iommufd_fault_iopf_enable(struct iommufd_device *idev)
> +{
> +	int ret;
> +
> +	if (idev->iopf_enabled)
> +		return 0;
> +
> +	ret = iommu_dev_enable_feature(idev->dev, IOMMU_DEV_FEAT_IOPF);
> +	if (ret)
> +		return ret;
> +
> +	idev->iopf_enabled = true;
> +
> +	return 0;
> +}
> +
> +static void iommufd_fault_iopf_disable(struct iommufd_device *idev)
> +{
> +	if (!idev->iopf_enabled)
> +		return;
> +
> +	iommu_dev_disable_feature(idev->dev, IOMMU_DEV_FEAT_IOPF);
> +	idev->iopf_enabled = false;
> +}
> +
> +int iommufd_fault_domain_attach_dev(struct iommufd_hw_pagetable *hwpt,
> +				    struct iommufd_device *idev)
> +{
> +	struct iopf_attach_cookie *cookie;
> +	int ret;
> +
> +	cookie = kzalloc(sizeof(*cookie), GFP_KERNEL);
> +	if (!cookie)
> +		return -ENOMEM;
> +
> +	refcount_inc(&hwpt->obj.users);
> +	refcount_inc(&idev->obj.users);
> +	cookie->release = release_attach_cookie;
> +	cookie->private = idev;
> +
> +	if (!idev->iopf_enabled) {
> +		ret = iommufd_fault_iopf_enable(idev);
> +		if (ret)
> +			goto out_put_cookie;
You have not set the domain here, so release_attach_cookie() will try to
dereference a NULL pointer.

> +	}
> +
> +	ret = iopf_domain_attach(hwpt->domain, idev->dev, IOMMU_NO_PASID, cookie);
> +	if (ret)
> +		goto out_disable_iopf;
> +
> +	return 0;
> +out_disable_iopf:
> +	iommufd_fault_iopf_disable(idev);
> +out_put_cookie:
> +	release_attach_cookie(cookie);
> +
> +	return ret;
> +}
> +
> +void iommufd_fault_domain_detach_dev(struct iommufd_hw_pagetable *hwpt,
> +				     struct iommufd_device *idev)
> +{
> +	iopf_domain_detach(hwpt->domain, idev->dev, IOMMU_NO_PASID);
> +	iommufd_fault_iopf_disable(idev);
> +}
> +
> +int iommufd_fault_domain_replace_dev(struct iommufd_hw_pagetable *hwpt,
> +				     struct iommufd_device *idev)
> +{
> +	bool iopf_enabled_originally = idev->iopf_enabled;
> +	struct iopf_attach_cookie *cookie = NULL;
> +	int ret;
> +
> +	if (hwpt->fault_capable) {
> +		cookie = kzalloc(sizeof(*cookie), GFP_KERNEL);
> +		if (!cookie)
> +			return -ENOMEM;
> +
> +		refcount_inc(&hwpt->obj.users);
> +		refcount_inc(&idev->obj.users);
> +		cookie->release = release_attach_cookie;
> +		cookie->private = idev;
> +
> +		if (!idev->iopf_enabled) {
> +			ret = iommufd_fault_iopf_enable(idev);
> +			if (ret) {
> +				release_attach_cookie(cookie);
> +				return ret;
> +			}
> +		}
> +	}
> +
> +	ret = iopf_domain_replace(hwpt->domain, idev->dev, IOMMU_NO_PASID, cookie);
> +	if (ret) {
> +		goto out_put_cookie;
> +	}
> +
> +	if (iopf_enabled_originally && !hwpt->fault_capable)
> +		iommufd_fault_iopf_disable(idev);
> +
> +	return 0;
> +out_put_cookie:
> +	if (hwpt->fault_capable)
> +		release_attach_cookie(cookie);
> +	if (iopf_enabled_originally && !idev->iopf_enabled)
> +		iommufd_fault_iopf_enable(idev);
> +	else if (!iopf_enabled_originally && idev->iopf_enabled)
> +		iommufd_fault_iopf_disable(idev);
> +
> +	return ret;
> +}
> -- 
> 2.34.1
> 

-- 

Joel Granados

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v3 1/8] iommu: Add iopf domain attach/detach/replace interface
  2024-02-07  8:11   ` Tian, Kevin
@ 2024-02-21  5:52     ` Baolu Lu
  2024-02-21  6:49       ` Tian, Kevin
  0 siblings, 1 reply; 43+ messages in thread
From: Baolu Lu @ 2024-02-21  5:52 UTC (permalink / raw)
  To: Tian, Kevin, Jason Gunthorpe, Joerg Roedel, Will Deacon,
	Robin Murphy, Jean-Philippe Brucker, Nicolin Chen, Liu, Yi L,
	Jacob Pan, Joel Granados
  Cc: baolu.lu, iommu, virtualization, linux-kernel

On 2024/2/7 16:11, Tian, Kevin wrote:
>> From: Lu Baolu <baolu.lu@linux.intel.com>
>> Sent: Monday, January 22, 2024 3:39 PM
>>
>> There is a slight difference between iopf domains and non-iopf domains.
>> In the latter, references to domains occur between attach and detach;
>> While in the former, due to the existence of asynchronous iopf handling
>> paths, references to the domain may occur after detach, which leads to
>> potential UAF issues.
> 
> Does UAF still exist if iommu driver follows the guidance you just added
> to iopf_queue_remove_device()?
> 
> it clearly says that the driver needs to disable IOMMU PRI reception,
> remove device from iopf queue and disable PRI on the device.

The iopf_queue_remove_device() function is only called after the last
iopf-capable domain is detached from the device. It may not be called
during domain replacement. Hence, there is no guarantee that
iopf_queue_remove_device() will be called when a domain is detached from
the device.

> 
> presumably those are all about what needs to be done in the detach
> operation. Then once detach completes there should be no more
> reference to the domain from the iopf path?

The domain pointer stored in the iopf_group structure is still
referenced until the iopf response arrives, which may happen after the
domain has been detached from the device. Thus, the domain can only be
freed after the iopf response.

> 
>>
>> +struct iopf_attach_cookie {
>> +	struct iommu_domain *domain;
>> +	struct device *dev;
>> +	unsigned int pasid;
>> +	refcount_t users;
>> +
>> +	void *private;
>> +	void (*release)(struct iopf_attach_cookie *cookie);
>> +};
> 
> this cookie has nothing specific to iopf.
> 
> it may makes more sense to build a generic iommu_attach_device_cookie()
> helper so the same object can be reused in future other usages too.
> 
> within iommu core it can check domain iopf handler and this generic cookie
> to update iopf specific data e.g. the pasid_cookie xarray.

This means attaching an iopf-capable domain follows two steps:

1) Attaching the domain to the device.
2) Setting up the iopf data necessary for handling faults.

This creates a time window during which the iopf is enabled, but the
software cannot handle it. Or not?

Best regards,
baolu

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v3 5/8] iommufd: Associate fault object with iommufd_hw_pgtable
  2024-02-07  8:14   ` Tian, Kevin
@ 2024-02-21  6:06     ` Baolu Lu
  0 siblings, 0 replies; 43+ messages in thread
From: Baolu Lu @ 2024-02-21  6:06 UTC (permalink / raw)
  To: Tian, Kevin, Jason Gunthorpe, Joerg Roedel, Will Deacon,
	Robin Murphy, Jean-Philippe Brucker, Nicolin Chen, Liu, Yi L,
	Jacob Pan, Joel Granados
  Cc: baolu.lu, iommu, virtualization, linux-kernel

On 2024/2/7 16:14, Tian, Kevin wrote:
>> From: Lu Baolu <baolu.lu@linux.intel.com>
>> Sent: Monday, January 22, 2024 3:39 PM
>>
>> +
>> +int iommufd_fault_iopf_handler(struct iopf_group *group)
>> +{
>> +	struct iommufd_hw_pagetable *hwpt = group->cookie->domain-
>>> fault_data;
>> +	struct iommufd_fault *fault = hwpt->fault;
>> +
> 
> why not directly using iommufd_fault as the fault_data?

The relationship among these structures is:

     iommufd_hwpt -> iommu_domain
           ^
           |
           v
     iommufd_fault

It appears preferable to hook the hwpt, rather than the iommufd_fault,
to the iommu_domain structure.

Best regards,
baolu


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v3 6/8] iommufd: IOPF-capable hw page table attach/detach/replace
  2024-02-20 13:57     ` Joel Granados
@ 2024-02-21  6:15       ` Baolu Lu
  0 siblings, 0 replies; 43+ messages in thread
From: Baolu Lu @ 2024-02-21  6:15 UTC (permalink / raw)
  To: Joel Granados
  Cc: baolu.lu, Jason Gunthorpe, Kevin Tian, Joerg Roedel, Will Deacon,
	Robin Murphy, Jean-Philippe Brucker, Nicolin Chen, Yi Liu,
	Jacob Pan, iommu, virtualization, linux-kernel

On 2024/2/20 21:57, Joel Granados wrote:
>> diff --git a/drivers/iommu/iommufd/fault.c b/drivers/iommu/iommufd/fault.c
>> index e752d1c49dde..a4a49f3cd4c2 100644
>> --- a/drivers/iommu/iommufd/fault.c
>> +++ b/drivers/iommu/iommufd/fault.c
>> @@ -267,3 +267,125 @@ int iommufd_fault_iopf_handler(struct iopf_group *group)
>>   
>>   	return 0;
>>   }
>> +
>> +static void release_attach_cookie(struct iopf_attach_cookie *cookie)
>> +{
>> +	struct iommufd_hw_pagetable *hwpt = cookie->domain->fault_data;
> There is a possibility here of cookie->domain being NULL. When you call
> release_attach_cookie from iommufd_fault_domain_attach_dev if
> idev->iopf_enabled is false. In this case, you have not set the domain
> yet.

Yes. Good catch!

> 
>> +	struct iommufd_device *idev = cookie->private;
>> +
>> +	refcount_dec(&idev->obj.users);
>> +	refcount_dec(&hwpt->obj.users);
> You should decrease this ref count only if the cookie actually had a
> domain.
> 
> This function could be something like this:
> 
> 	static void release_attach_cookie(struct iopf_attach_cookie *cookie)
> 	{
> 		struct iommufd_hw_pagetable *hwpt;
> 		struct iommufd_device *idev = cookie->private;
> 
> 		refcount_dec(&idev->obj.users);
> 		if (cookie->domain) {
> 			hwpt = cookie->domain->fault_data;
> 			refcount_dec(&hwpt->obj.users);
> 		}
> 		kfree(cookie);
> 	}

Yeah, fixed.

>> +	kfree(cookie);
>> +}
>> +
>> +static int iommufd_fault_iopf_enable(struct iommufd_device *idev)
>> +{
>> +	int ret;
>> +
>> +	if (idev->iopf_enabled)
>> +		return 0;
>> +
>> +	ret = iommu_dev_enable_feature(idev->dev, IOMMU_DEV_FEAT_IOPF);
>> +	if (ret)
>> +		return ret;
>> +
>> +	idev->iopf_enabled = true;
>> +
>> +	return 0;
>> +}
>> +
>> +static void iommufd_fault_iopf_disable(struct iommufd_device *idev)
>> +{
>> +	if (!idev->iopf_enabled)
>> +		return;
>> +
>> +	iommu_dev_disable_feature(idev->dev, IOMMU_DEV_FEAT_IOPF);
>> +	idev->iopf_enabled = false;
>> +}
>> +
>> +int iommufd_fault_domain_attach_dev(struct iommufd_hw_pagetable *hwpt,
>> +				    struct iommufd_device *idev)
>> +{
>> +	struct iopf_attach_cookie *cookie;
>> +	int ret;
>> +
>> +	cookie = kzalloc(sizeof(*cookie), GFP_KERNEL);
>> +	if (!cookie)
>> +		return -ENOMEM;
>> +
>> +	refcount_inc(&hwpt->obj.users);
>> +	refcount_inc(&idev->obj.users);
>> +	cookie->release = release_attach_cookie;
>> +	cookie->private = idev;
>> +
>> +	if (!idev->iopf_enabled) {
>> +		ret = iommufd_fault_iopf_enable(idev);
>> +		if (ret)
>> +			goto out_put_cookie;
> You have not set domain here and release_attach_cookie will try to
> access a null address.

Fixed as above.

Best regards,
baolu

^ permalink raw reply	[flat|nested] 43+ messages in thread

* RE: [PATCH v3 1/8] iommu: Add iopf domain attach/detach/replace interface
  2024-02-21  5:52     ` Baolu Lu
@ 2024-02-21  6:49       ` Tian, Kevin
  2024-02-21  7:21         ` Baolu Lu
  0 siblings, 1 reply; 43+ messages in thread
From: Tian, Kevin @ 2024-02-21  6:49 UTC (permalink / raw)
  To: Baolu Lu, Jason Gunthorpe, Joerg Roedel, Will Deacon,
	Robin Murphy, Jean-Philippe Brucker, Nicolin Chen, Liu, Yi L,
	Jacob Pan, Joel Granados
  Cc: iommu, virtualization, linux-kernel

> From: Baolu Lu <baolu.lu@linux.intel.com>
> Sent: Wednesday, February 21, 2024 1:53 PM
> 
> On 2024/2/7 16:11, Tian, Kevin wrote:
> >> From: Lu Baolu <baolu.lu@linux.intel.com>
> >> Sent: Monday, January 22, 2024 3:39 PM
> >>
> >> There is a slight difference between iopf domains and non-iopf domains.
> >> In the latter, references to domains occur between attach and detach;
> >> While in the former, due to the existence of asynchronous iopf handling
> >> paths, references to the domain may occur after detach, which leads to
> >> potential UAF issues.
> >
> > Does UAF still exist if iommu driver follows the guidance you just added
> > to iopf_queue_remove_device()?
> >
> > it clearly says that the driver needs to disable IOMMU PRI reception,
> > remove device from iopf queue and disable PRI on the device.
> 
> The iopf_queue_remove_device() function is only called after the last
> iopf-capable domain is detached from the device. It may not be called
> during domain replacement. Hence, there is no guarantee that
> iopf_queue_remove_device() will be called when a domain is detached from
> the device.

oh yes. More accurately, even the last detach may not trigger it.

e.g. idxd driver does it at device/driver unbind.

> 
> >
> > presumably those are all about what needs to be done in the detach
> > operation. Then once detach completes there should be no more
> > reference to the domain from the iopf path?
> 
> The domain pointer stored in the iopf_group structure is only released
> after the iopf response, possibly after the domain is detached from the
> device. Thus, the domain pointer can only be freed after the iopf
> response.

make sense.

> 
> >
> >>
> >> +struct iopf_attach_cookie {
> >> +	struct iommu_domain *domain;
> >> +	struct device *dev;
> >> +	unsigned int pasid;
> >> +	refcount_t users;
> >> +
> >> +	void *private;
> >> +	void (*release)(struct iopf_attach_cookie *cookie);
> >> +};
> >
> > this cookie has nothing specific to iopf.
> >
> > it may makes more sense to build a generic iommu_attach_device_cookie()
> > helper so the same object can be reused in future other usages too.
> >
> > within iommu core it can check domain iopf handler and this generic cookie
> > to update iopf specific data e.g. the pasid_cookie xarray.
> 
> This means attaching an iopf-capable domain follows two steps:
> 
> 1) Attaching the domain to the device.
> 2) Setting up the iopf data, necessary for handling iopf data.
> 
> This creates a time window during which the iopf is enabled, but the
> software cannot handle it. Or not?
> 

why two steps? in attach you can set up the iopf data when recognizing
that the domain is iopf capable...

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v3 1/8] iommu: Add iopf domain attach/detach/replace interface
  2024-02-21  6:49       ` Tian, Kevin
@ 2024-02-21  7:21         ` Baolu Lu
  2024-02-21  7:22           ` Tian, Kevin
  0 siblings, 1 reply; 43+ messages in thread
From: Baolu Lu @ 2024-02-21  7:21 UTC (permalink / raw)
  To: Tian, Kevin, Jason Gunthorpe, Joerg Roedel, Will Deacon,
	Robin Murphy, Jean-Philippe Brucker, Nicolin Chen, Liu, Yi L,
	Jacob Pan, Joel Granados
  Cc: baolu.lu, iommu, virtualization, linux-kernel

On 2024/2/21 14:49, Tian, Kevin wrote:
>>>> +struct iopf_attach_cookie {
>>>> +	struct iommu_domain *domain;
>>>> +	struct device *dev;
>>>> +	unsigned int pasid;
>>>> +	refcount_t users;
>>>> +
>>>> +	void *private;
>>>> +	void (*release)(struct iopf_attach_cookie *cookie);
>>>> +};
>>> this cookie has nothing specific to iopf.
>>>
>>> it may makes more sense to build a generic iommu_attach_device_cookie()
>>> helper so the same object can be reused in future other usages too.
>>>
>>> within iommu core it can check domain iopf handler and this generic cookie
>>> to update iopf specific data e.g. the pasid_cookie xarray.
>> This means attaching an iopf-capable domain follows two steps:
>>
>> 1) Attaching the domain to the device.
>> 2) Setting up the iopf data, necessary for handling iopf data.
>>
>> This creates a time window during which the iopf is enabled, but the
>> software cannot handle it. Or not?
>>
> why two steps? in attach you can setup the iopf data when recognizing
> that the domain is iopf capable...

Oh, maybe I misunderstood. So your proposal is to make the new interface
generic, not for iopf only?

Best regards,
baolu

^ permalink raw reply	[flat|nested] 43+ messages in thread

* RE: [PATCH v3 1/8] iommu: Add iopf domain attach/detach/replace interface
  2024-02-21  7:21         ` Baolu Lu
@ 2024-02-21  7:22           ` Tian, Kevin
  0 siblings, 0 replies; 43+ messages in thread
From: Tian, Kevin @ 2024-02-21  7:22 UTC (permalink / raw)
  To: Baolu Lu, Jason Gunthorpe, Joerg Roedel, Will Deacon,
	Robin Murphy, Jean-Philippe Brucker, Nicolin Chen, Liu, Yi L,
	Jacob Pan, Joel Granados
  Cc: iommu, virtualization, linux-kernel

> From: Baolu Lu <baolu.lu@linux.intel.com>
> Sent: Wednesday, February 21, 2024 3:21 PM
> 
> On 2024/2/21 14:49, Tian, Kevin wrote:
> >>>> +struct iopf_attach_cookie {
> >>>> +	struct iommu_domain *domain;
> >>>> +	struct device *dev;
> >>>> +	unsigned int pasid;
> >>>> +	refcount_t users;
> >>>> +
> >>>> +	void *private;
> >>>> +	void (*release)(struct iopf_attach_cookie *cookie);
> >>>> +};
> >>> this cookie has nothing specific to iopf.
> >>>
> >>> it may makes more sense to build a generic
> iommu_attach_device_cookie()
> >>> helper so the same object can be reused in future other usages too.
> >>>
> >>> within iommu core it can check domain iopf handler and this generic
> cookie
> >>> to update iopf specific data e.g. the pasid_cookie xarray.
> >> This means attaching an iopf-capable domain follows two steps:
> >>
> >> 1) Attaching the domain to the device.
> >> 2) Setting up the iopf data, necessary for handling iopf data.
> >>
> >> This creates a time window during which the iopf is enabled, but the
> >> software cannot handle it. Or not?
> >>
> > why two steps? in attach you can setup the iopf data when recognizing
> > that the domain is iopf capable...
> 
> Oh, maybe I misunderstood. So your proposal is to make the new interface
> generic, not for iopf only?
> 

yes.


* Re: [PATCH v3 5/8] iommufd: Associate fault object with iommufd_hw_pgtable
  2024-01-22  7:39 ` [PATCH v3 5/8] iommufd: Associate fault object with iommufd_hw_pgtable Lu Baolu
  2024-02-07  8:14   ` Tian, Kevin
@ 2024-03-02  2:36   ` Zhangfei Gao
  2024-03-06 15:15     ` Zhangfei Gao
  2024-03-08 19:05   ` Jason Gunthorpe
  2 siblings, 1 reply; 43+ messages in thread
From: Zhangfei Gao @ 2024-03-02  2:36 UTC (permalink / raw)
  To: Lu Baolu
  Cc: Jason Gunthorpe, Kevin Tian, Joerg Roedel, Will Deacon,
	Robin Murphy, Jean-Philippe Brucker, Nicolin Chen, Yi Liu,
	Jacob Pan, Joel Granados, iommu, virtualization, linux-kernel

On Mon, 22 Jan 2024 at 15:46, Lu Baolu <baolu.lu@linux.intel.com> wrote:
>
> When allocating a user iommufd_hw_pagetable, the user space is allowed to
> associate a fault object with the hw_pagetable by specifying the fault
> object ID in the page table allocation data and setting the
> IOMMU_HWPT_FAULT_ID_VALID flag bit.
>
> On a successful return of hwpt allocation, the user can retrieve and
> respond to page faults by reading and writing the file interface of the
> fault object.
>
> Once a fault object has been associated with a hwpt, the hwpt is
> iopf-capable, indicated by fault_capable set to true. Attaching,
> detaching, or replacing an iopf-capable hwpt to an RID or PASID will
> differ from those that are not iopf-capable. The implementation of these
> will be introduced in the next patch.
>
> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
> ---
>  drivers/iommu/iommufd/iommufd_private.h | 11 ++++++++
>  include/uapi/linux/iommufd.h            |  6 +++++
>  drivers/iommu/iommufd/fault.c           | 14 ++++++++++
>  drivers/iommu/iommufd/hw_pagetable.c    | 36 +++++++++++++++++++------
>  4 files changed, 59 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h
> index 52d83e888bd0..2780bed0c6b1 100644
> --- a/drivers/iommu/iommufd/iommufd_private.h
> +++ b/drivers/iommu/iommufd/iommufd_private.h
> @@ -293,6 +293,8 @@ int iommufd_check_iova_range(struct io_pagetable *iopt,
>  struct iommufd_hw_pagetable {
>         struct iommufd_object obj;
>         struct iommu_domain *domain;
> +       struct iommufd_fault *fault;
> +       bool fault_capable : 1;
>  };
>
>  struct iommufd_hwpt_paging {
> @@ -446,8 +448,17 @@ struct iommufd_fault {
>         struct wait_queue_head wait_queue;
>  };
>
> +static inline struct iommufd_fault *
> +iommufd_get_fault(struct iommufd_ucmd *ucmd, u32 id)
> +{
> +       return container_of(iommufd_get_object(ucmd->ictx, id,
> +                                              IOMMUFD_OBJ_FAULT),
> +                           struct iommufd_fault, obj);
> +}
> +
>  int iommufd_fault_alloc(struct iommufd_ucmd *ucmd);
>  void iommufd_fault_destroy(struct iommufd_object *obj);
> +int iommufd_fault_iopf_handler(struct iopf_group *group);
>
>  #ifdef CONFIG_IOMMUFD_TEST
>  int iommufd_test(struct iommufd_ucmd *ucmd);
> diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h
> index c32d62b02306..7481cdd57027 100644
> --- a/include/uapi/linux/iommufd.h
> +++ b/include/uapi/linux/iommufd.h
> @@ -357,10 +357,13 @@ struct iommu_vfio_ioas {
>   *                                the parent HWPT in a nesting configuration.
>   * @IOMMU_HWPT_ALLOC_DIRTY_TRACKING: Dirty tracking support for device IOMMU is
>   *                                   enforced on device attachment
> + * @IOMMU_HWPT_FAULT_ID_VALID: The fault_id field of hwpt allocation data is
> + *                             valid.
>   */
>  enum iommufd_hwpt_alloc_flags {
>         IOMMU_HWPT_ALLOC_NEST_PARENT = 1 << 0,
>         IOMMU_HWPT_ALLOC_DIRTY_TRACKING = 1 << 1,
> +       IOMMU_HWPT_FAULT_ID_VALID = 1 << 2,
>  };
>
>  /**
> @@ -411,6 +414,8 @@ enum iommu_hwpt_data_type {
>   * @__reserved: Must be 0
>   * @data_type: One of enum iommu_hwpt_data_type
>   * @data_len: Length of the type specific data
> + * @fault_id: The ID of IOMMUFD_FAULT object. Valid only if flags field of
> + *            IOMMU_HWPT_FAULT_ID_VALID is set.
>   * @data_uptr: User pointer to the type specific data
>   *
>   * Explicitly allocate a hardware page table object. This is the same object
> @@ -441,6 +446,7 @@ struct iommu_hwpt_alloc {
>         __u32 __reserved;
>         __u32 data_type;
>         __u32 data_len;
> +       __u32 fault_id;
>         __aligned_u64 data_uptr;
>  };
>  #define IOMMU_HWPT_ALLOC _IO(IOMMUFD_TYPE, IOMMUFD_CMD_HWPT_ALLOC)
> diff --git a/drivers/iommu/iommufd/fault.c b/drivers/iommu/iommufd/fault.c
> index 9844a85feeb2..e752d1c49dde 100644
> --- a/drivers/iommu/iommufd/fault.c
> +++ b/drivers/iommu/iommufd/fault.c
> @@ -253,3 +253,17 @@ int iommufd_fault_alloc(struct iommufd_ucmd *ucmd)
>
>         return rc;
>  }
> +
> +int iommufd_fault_iopf_handler(struct iopf_group *group)
> +{
> +       struct iommufd_hw_pagetable *hwpt = group->cookie->domain->fault_data;
> +       struct iommufd_fault *fault = hwpt->fault;
> +
> +       mutex_lock(&fault->mutex);
> +       list_add_tail(&group->node, &fault->deliver);
> +       mutex_unlock(&fault->mutex);
> +
> +       wake_up_interruptible(&fault->wait_queue);
> +
> +       return 0;
> +}
> diff --git a/drivers/iommu/iommufd/hw_pagetable.c b/drivers/iommu/iommufd/hw_pagetable.c
> index 3f3f1fa1a0a9..2703d5aea4f5 100644
> --- a/drivers/iommu/iommufd/hw_pagetable.c
> +++ b/drivers/iommu/iommufd/hw_pagetable.c
> @@ -8,6 +8,15 @@
>  #include "../iommu-priv.h"
>  #include "iommufd_private.h"
>
> +static void __iommufd_hwpt_destroy(struct iommufd_hw_pagetable *hwpt)
> +{
> +       if (hwpt->domain)
> +               iommu_domain_free(hwpt->domain);
> +
> +       if (hwpt->fault)
> +               iommufd_put_object(hwpt->fault->ictx, &hwpt->fault->obj);
> +}
> +
>  void iommufd_hwpt_paging_destroy(struct iommufd_object *obj)
>  {
>         struct iommufd_hwpt_paging *hwpt_paging =
> @@ -22,9 +31,7 @@ void iommufd_hwpt_paging_destroy(struct iommufd_object *obj)
>                                          hwpt_paging->common.domain);
>         }
>
> -       if (hwpt_paging->common.domain)
> -               iommu_domain_free(hwpt_paging->common.domain);
> -
> +       __iommufd_hwpt_destroy(&hwpt_paging->common);
>         refcount_dec(&hwpt_paging->ioas->obj.users);
>  }
>
> @@ -49,9 +56,7 @@ void iommufd_hwpt_nested_destroy(struct iommufd_object *obj)
>         struct iommufd_hwpt_nested *hwpt_nested =
>                 container_of(obj, struct iommufd_hwpt_nested, common.obj);
>
> -       if (hwpt_nested->common.domain)
> -               iommu_domain_free(hwpt_nested->common.domain);
> -
> +       __iommufd_hwpt_destroy(&hwpt_nested->common);
>         refcount_dec(&hwpt_nested->parent->common.obj.users);
>  }
>
> @@ -213,7 +218,8 @@ iommufd_hwpt_nested_alloc(struct iommufd_ctx *ictx,
>         struct iommufd_hw_pagetable *hwpt;
>         int rc;
>
> -       if (flags || !user_data->len || !ops->domain_alloc_user)
> +       if ((flags & ~IOMMU_HWPT_FAULT_ID_VALID) ||
> +           !user_data->len || !ops->domain_alloc_user)
>                 return ERR_PTR(-EOPNOTSUPP);
>         if (parent->auto_domain || !parent->nest_parent)
>                 return ERR_PTR(-EINVAL);
> @@ -227,7 +233,7 @@ iommufd_hwpt_nested_alloc(struct iommufd_ctx *ictx,
>         refcount_inc(&parent->common.obj.users);
>         hwpt_nested->parent = parent;
>
> -       hwpt->domain = ops->domain_alloc_user(idev->dev, flags,
> +       hwpt->domain = ops->domain_alloc_user(idev->dev, 0,
>                                               parent->common.domain, user_data);

Why remove flags? Is that a typo, or is there a reason?
arm_smmu_domain_alloc_user cannot get flags from the user app.
The user should set IOMMU_HWPT_FAULT_ID_VALID, right?

Thanks


* Re: [PATCH v3 5/8] iommufd: Associate fault object with iommufd_hw_pgtable
  2024-03-02  2:36   ` Zhangfei Gao
@ 2024-03-06 15:15     ` Zhangfei Gao
  2024-03-06 16:01       ` Jason Gunthorpe
  0 siblings, 1 reply; 43+ messages in thread
From: Zhangfei Gao @ 2024-03-06 15:15 UTC (permalink / raw)
  To: Lu Baolu
  Cc: Jason Gunthorpe, Kevin Tian, Joerg Roedel, Will Deacon,
	Robin Murphy, Jean-Philippe Brucker, Nicolin Chen, Yi Liu,
	Jacob Pan, Joel Granados, iommu, virtualization, linux-kernel

Hi, Baolu

On Sat, 2 Mar 2024 at 10:36, Zhangfei Gao <zhangfei.gao@linaro.org> wrote:
>
> On Mon, 22 Jan 2024 at 15:46, Lu Baolu <baolu.lu@linux.intel.com> wrote:
> >
> > When allocating a user iommufd_hw_pagetable, the user space is allowed to
> > associate a fault object with the hw_pagetable by specifying the fault
> > object ID in the page table allocation data and setting the
> > IOMMU_HWPT_FAULT_ID_VALID flag bit.
> >
> > On a successful return of hwpt allocation, the user can retrieve and
> > respond to page faults by reading and writing the file interface of the
> > fault object.
> >
> > Once a fault object has been associated with a hwpt, the hwpt is
> > iopf-capable, indicated by fault_capable set to true. Attaching,
> > detaching, or replacing an iopf-capable hwpt to an RID or PASID will
> > differ from those that are not iopf-capable. The implementation of these
> > will be introduced in the next patch.
> >
> > Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
> > ---
> >  drivers/iommu/iommufd/iommufd_private.h | 11 ++++++++
> >  include/uapi/linux/iommufd.h            |  6 +++++
> >  drivers/iommu/iommufd/fault.c           | 14 ++++++++++
> >  drivers/iommu/iommufd/hw_pagetable.c    | 36 +++++++++++++++++++------
> >  4 files changed, 59 insertions(+), 8 deletions(-)
> >
> > diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h
> > index 52d83e888bd0..2780bed0c6b1 100644
> > --- a/drivers/iommu/iommufd/iommufd_private.h
> > +++ b/drivers/iommu/iommufd/iommufd_private.h
> > @@ -293,6 +293,8 @@ int iommufd_check_iova_range(struct io_pagetable *iopt,
> >  struct iommufd_hw_pagetable {
> >         struct iommufd_object obj;
> >         struct iommu_domain *domain;
> > +       struct iommufd_fault *fault;
> > +       bool fault_capable : 1;
> >  };
> >
> >  struct iommufd_hwpt_paging {
> > @@ -446,8 +448,17 @@ struct iommufd_fault {
> >         struct wait_queue_head wait_queue;
> >  };
> >
> > +static inline struct iommufd_fault *
> > +iommufd_get_fault(struct iommufd_ucmd *ucmd, u32 id)
> > +{
> > +       return container_of(iommufd_get_object(ucmd->ictx, id,
> > +                                              IOMMUFD_OBJ_FAULT),
> > +                           struct iommufd_fault, obj);
> > +}
> > +
> >  int iommufd_fault_alloc(struct iommufd_ucmd *ucmd);
> >  void iommufd_fault_destroy(struct iommufd_object *obj);
> > +int iommufd_fault_iopf_handler(struct iopf_group *group);
> >
> >  #ifdef CONFIG_IOMMUFD_TEST
> >  int iommufd_test(struct iommufd_ucmd *ucmd);
> > diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h
> > index c32d62b02306..7481cdd57027 100644
> > --- a/include/uapi/linux/iommufd.h
> > +++ b/include/uapi/linux/iommufd.h
> > @@ -357,10 +357,13 @@ struct iommu_vfio_ioas {
> >   *                                the parent HWPT in a nesting configuration.
> >   * @IOMMU_HWPT_ALLOC_DIRTY_TRACKING: Dirty tracking support for device IOMMU is
> >   *                                   enforced on device attachment
> > + * @IOMMU_HWPT_FAULT_ID_VALID: The fault_id field of hwpt allocation data is
> > + *                             valid.
> >   */
> >  enum iommufd_hwpt_alloc_flags {
> >         IOMMU_HWPT_ALLOC_NEST_PARENT = 1 << 0,
> >         IOMMU_HWPT_ALLOC_DIRTY_TRACKING = 1 << 1,
> > +       IOMMU_HWPT_FAULT_ID_VALID = 1 << 2,
> >  };
> >
> >  /**
> > @@ -411,6 +414,8 @@ enum iommu_hwpt_data_type {
> >   * @__reserved: Must be 0
> >   * @data_type: One of enum iommu_hwpt_data_type
> >   * @data_len: Length of the type specific data
> > + * @fault_id: The ID of IOMMUFD_FAULT object. Valid only if flags field of
> > + *            IOMMU_HWPT_FAULT_ID_VALID is set.
> >   * @data_uptr: User pointer to the type specific data
> >   *
> >   * Explicitly allocate a hardware page table object. This is the same object
> > @@ -441,6 +446,7 @@ struct iommu_hwpt_alloc {
> >         __u32 __reserved;
> >         __u32 data_type;
> >         __u32 data_len;
> > +       __u32 fault_id;
> >         __aligned_u64 data_uptr;
> >  };
> >  #define IOMMU_HWPT_ALLOC _IO(IOMMUFD_TYPE, IOMMUFD_CMD_HWPT_ALLOC)
> > diff --git a/drivers/iommu/iommufd/fault.c b/drivers/iommu/iommufd/fault.c
> > index 9844a85feeb2..e752d1c49dde 100644
> > --- a/drivers/iommu/iommufd/fault.c
> > +++ b/drivers/iommu/iommufd/fault.c
> > @@ -253,3 +253,17 @@ int iommufd_fault_alloc(struct iommufd_ucmd *ucmd)
> >
> >         return rc;
> >  }
> > +
> > +int iommufd_fault_iopf_handler(struct iopf_group *group)
> > +{
> > +       struct iommufd_hw_pagetable *hwpt = group->cookie->domain->fault_data;
> > +       struct iommufd_fault *fault = hwpt->fault;
> > +
> > +       mutex_lock(&fault->mutex);
> > +       list_add_tail(&group->node, &fault->deliver);
> > +       mutex_unlock(&fault->mutex);
> > +
> > +       wake_up_interruptible(&fault->wait_queue);
> > +
> > +       return 0;
> > +}
> > diff --git a/drivers/iommu/iommufd/hw_pagetable.c b/drivers/iommu/iommufd/hw_pagetable.c
> > index 3f3f1fa1a0a9..2703d5aea4f5 100644
> > --- a/drivers/iommu/iommufd/hw_pagetable.c
> > +++ b/drivers/iommu/iommufd/hw_pagetable.c
> > @@ -8,6 +8,15 @@
> >  #include "../iommu-priv.h"
> >  #include "iommufd_private.h"
> >
> > +static void __iommufd_hwpt_destroy(struct iommufd_hw_pagetable *hwpt)
> > +{
> > +       if (hwpt->domain)
> > +               iommu_domain_free(hwpt->domain);
> > +
> > +       if (hwpt->fault)
> > +               iommufd_put_object(hwpt->fault->ictx, &hwpt->fault->obj);
> > +}
> > +
> >  void iommufd_hwpt_paging_destroy(struct iommufd_object *obj)
> >  {
> >         struct iommufd_hwpt_paging *hwpt_paging =
> > @@ -22,9 +31,7 @@ void iommufd_hwpt_paging_destroy(struct iommufd_object *obj)
> >                                          hwpt_paging->common.domain);
> >         }
> >
> > -       if (hwpt_paging->common.domain)
> > -               iommu_domain_free(hwpt_paging->common.domain);
> > -
> > +       __iommufd_hwpt_destroy(&hwpt_paging->common);
> >         refcount_dec(&hwpt_paging->ioas->obj.users);
> >  }
> >
> > @@ -49,9 +56,7 @@ void iommufd_hwpt_nested_destroy(struct iommufd_object *obj)
> >         struct iommufd_hwpt_nested *hwpt_nested =
> >                 container_of(obj, struct iommufd_hwpt_nested, common.obj);
> >
> > -       if (hwpt_nested->common.domain)
> > -               iommu_domain_free(hwpt_nested->common.domain);
> > -
> > +       __iommufd_hwpt_destroy(&hwpt_nested->common);
> >         refcount_dec(&hwpt_nested->parent->common.obj.users);
> >  }
> >
> > @@ -213,7 +218,8 @@ iommufd_hwpt_nested_alloc(struct iommufd_ctx *ictx,
> >         struct iommufd_hw_pagetable *hwpt;
> >         int rc;
> >
> > -       if (flags || !user_data->len || !ops->domain_alloc_user)
> > +       if ((flags & ~IOMMU_HWPT_FAULT_ID_VALID) ||
> > +           !user_data->len || !ops->domain_alloc_user)
> >                 return ERR_PTR(-EOPNOTSUPP);
> >         if (parent->auto_domain || !parent->nest_parent)
> >                 return ERR_PTR(-EINVAL);
> > @@ -227,7 +233,7 @@ iommufd_hwpt_nested_alloc(struct iommufd_ctx *ictx,
> >         refcount_inc(&parent->common.obj.users);
> >         hwpt_nested->parent = parent;
> >
> > -       hwpt->domain = ops->domain_alloc_user(idev->dev, flags,
> > +       hwpt->domain = ops->domain_alloc_user(idev->dev, 0,
> >                                               parent->common.domain, user_data);
>
> Why remove flags? Is that a typo, or is there a reason?
> arm_smmu_domain_alloc_user cannot get flags from the user app.
> The user should set IOMMU_HWPT_FAULT_ID_VALID, right?
>

Double checked: this path does not pass flags, so 0 is OK; only
domain_alloc_user in iommufd_hwpt_paging_alloc requires flags.

In my debugging, I need this patch; otherwise a NULL pointer
dereference happens since SVA is not enabled.

Subject: [PATCH] iommufd: enable/disable IOMMU_DEV_FEAT_SVA when
 iopf_enable/disable

When iopf is enabled, IOMMU_DEV_FEAT_SVA has to be enabled as well,
which calls iopf_queue_add_device(master->smmu->evtq.iopf, dev).
Otherwise a NULL pointer dereference happens since fault_param is NULL.

Disable IOMMU_DEV_FEAT_SVA in iopf_disable accordingly.

Signed-off-by: Zhangfei Gao <zhangfei.gao@linaro.org>
---
 drivers/iommu/iommufd/fault.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/drivers/iommu/iommufd/fault.c b/drivers/iommu/iommufd/fault.c
index a4a49f3cd4c2..89837d83e757 100644
--- a/drivers/iommu/iommufd/fault.c
+++ b/drivers/iommu/iommufd/fault.c
@@ -289,6 +289,12 @@ static int iommufd_fault_iopf_enable(struct iommufd_device *idev)
        if (ret)
                return ret;

+       ret = iommu_dev_enable_feature(idev->dev, IOMMU_DEV_FEAT_SVA);
+       if (ret) {
+               iommu_dev_disable_feature(idev->dev, IOMMU_DEV_FEAT_IOPF);
+               return ret;
+       }
+
        idev->iopf_enabled = true;

        return 0;
@@ -299,6 +305,7 @@ static void iommufd_fault_iopf_disable(struct iommufd_device *idev)
        if (!idev->iopf_enabled)
                return;

+       iommu_dev_disable_feature(idev->dev, IOMMU_DEV_FEAT_SVA);
        iommu_dev_disable_feature(idev->dev, IOMMU_DEV_FEAT_IOPF);
        idev->iopf_enabled = false;
 }


And in iommufd_fault_domain_replace_dev,
        if (iopf_enabled_originally && !hwpt->fault_capable)
                iommufd_fault_iopf_disable(idev);
will cause an iopf enable/disable/enable sequence. Is this required? I
am a bit confused here.

Thanks


* Re: [PATCH v3 5/8] iommufd: Associate fault object with iommufd_hw_pgtable
  2024-03-06 15:15     ` Zhangfei Gao
@ 2024-03-06 16:01       ` Jason Gunthorpe
  2024-03-07  1:54         ` Baolu Lu
  0 siblings, 1 reply; 43+ messages in thread
From: Jason Gunthorpe @ 2024-03-06 16:01 UTC (permalink / raw)
  To: Zhangfei Gao
  Cc: Lu Baolu, Kevin Tian, Joerg Roedel, Will Deacon, Robin Murphy,
	Jean-Philippe Brucker, Nicolin Chen, Yi Liu, Jacob Pan,
	Joel Granados, iommu, virtualization, linux-kernel

On Wed, Mar 06, 2024 at 11:15:50PM +0800, Zhangfei Gao wrote:
> 
> Double checked: this path does not pass flags, so 0 is OK; only
> domain_alloc_user in iommufd_hwpt_paging_alloc requires flags.
> 
> In my debugging, I need this patch; otherwise a NULL pointer
> dereference happens since SVA is not enabled.

This is some driver bug, we need to get rid of these
iommu_dev_enable_feature() requirements.

Attaching domains with properties should be entirely sufficient.

Jason


* Re: [PATCH v3 5/8] iommufd: Associate fault object with iommufd_hw_pgtable
  2024-03-06 16:01       ` Jason Gunthorpe
@ 2024-03-07  1:54         ` Baolu Lu
  2024-03-08 17:19           ` Jason Gunthorpe
  0 siblings, 1 reply; 43+ messages in thread
From: Baolu Lu @ 2024-03-07  1:54 UTC (permalink / raw)
  To: Jason Gunthorpe, Zhangfei Gao
  Cc: baolu.lu, Kevin Tian, Joerg Roedel, Will Deacon, Robin Murphy,
	Jean-Philippe Brucker, Nicolin Chen, Yi Liu, Jacob Pan,
	Joel Granados, iommu, virtualization, linux-kernel

On 2024/3/7 0:01, Jason Gunthorpe wrote:
> On Wed, Mar 06, 2024 at 11:15:50PM +0800, Zhangfei Gao wrote:
>> Double checked: this path does not pass flags, so 0 is OK; only
>> domain_alloc_user in iommufd_hwpt_paging_alloc requires flags.
>>
>> In my debugging, I need this patch; otherwise a NULL pointer
>> dereference happens since SVA is not enabled.
> This is some driver bug, we need to get rid of these
> iommu_dev_enable_feature() requirements.

Yes. In particular, iopf should be independent of SVA.

The problem in the arm smmu v3 driver is that enabling iopf is actually
done in the SVA enabling path, while the iopf enabling path does nothing
except for some checks. That iopf is tied to SVA doesn't matter for SVA
itself, but when it comes to delivering iopf to user space, we need to
decouple them.

Best regards,
baolu


* Re: [PATCH v3 5/8] iommufd: Associate fault object with iommufd_hw_pgtable
  2024-03-07  1:54         ` Baolu Lu
@ 2024-03-08 17:19           ` Jason Gunthorpe
  0 siblings, 0 replies; 43+ messages in thread
From: Jason Gunthorpe @ 2024-03-08 17:19 UTC (permalink / raw)
  To: Baolu Lu
  Cc: Zhangfei Gao, Kevin Tian, Joerg Roedel, Will Deacon,
	Robin Murphy, Jean-Philippe Brucker, Nicolin Chen, Yi Liu,
	Jacob Pan, Joel Granados, iommu, virtualization, linux-kernel

On Thu, Mar 07, 2024 at 09:54:53AM +0800, Baolu Lu wrote:
> On 2024/3/7 0:01, Jason Gunthorpe wrote:
> > On Wed, Mar 06, 2024 at 11:15:50PM +0800, Zhangfei Gao wrote:
> > > Double checked: this path does not pass flags, so 0 is OK; only
> > > domain_alloc_user in iommufd_hwpt_paging_alloc requires flags.
> > > 
> > > In my debugging, I need this patch; otherwise a NULL pointer
> > > dereference happens since SVA is not enabled.
> > This is some driver bug, we need to get rid of these
> > iommu_dev_enable_feature() requirements.
> 
> Yes. In particular, iopf should be independent of SVA.
> 
> The problem in the arm smmu v3 driver is that enabling iopf is actually
> done in the SVA enabling path, while the iopf enabling path does nothing
> except for some checks. That iopf is tied to SVA doesn't matter for SVA
> itself, but when it comes to delivering iopf to user space, we need to
> decouple them.

Yes. Each driver is going to need to get to the point where the
feature ops are a NOP, then we can remove them.

I did half the ARM driver in my part 2; the iopf bit still needs
doing there.

Jason


* Re: [PATCH v3 2/8] iommu/sva: Use iopf domain attach/detach interface
  2024-01-22  7:38 ` [PATCH v3 2/8] iommu/sva: Use iopf domain attach/detach interface Lu Baolu
@ 2024-03-08 17:46   ` Jason Gunthorpe
  2024-03-14  7:41     ` Baolu Lu
  0 siblings, 1 reply; 43+ messages in thread
From: Jason Gunthorpe @ 2024-03-08 17:46 UTC (permalink / raw)
  To: Lu Baolu
  Cc: Kevin Tian, Joerg Roedel, Will Deacon, Robin Murphy,
	Jean-Philippe Brucker, Nicolin Chen, Yi Liu, Jacob Pan,
	Joel Granados, iommu, virtualization, linux-kernel

On Mon, Jan 22, 2024 at 03:38:57PM +0800, Lu Baolu wrote:
> @@ -215,7 +202,23 @@ static struct iopf_group *iopf_group_alloc(struct iommu_fault_param *iopf_param,
>  		group = abort_group;
>  	}
>  
> +	cookie = iopf_pasid_cookie_get(iopf_param->dev, pasid);
> +	if (!cookie && pasid != IOMMU_NO_PASID)
> +		cookie = iopf_pasid_cookie_get(iopf_param->dev, IOMMU_NO_PASID);
> +	if (IS_ERR(cookie) || !cookie) {
> +		/*
> +		 * The PASID of this device was not attached by an I/O-capable
> +		 * domain. Ask the caller to abort handling of this fault.
> +		 * Otherwise, the reference count will be switched to the new
> +		 * iopf group and will be released in iopf_free_group().
> +		 */
> +		kfree(group);
> +		group = abort_group;
> +		cookie = NULL;
> +	}

If this is the main point of the cookie mechanism, why not just have
a refcount inside the domain itself?

I'm really having a hard time making sense of this cookie thing: we
have a lifetime issue on the domain pointer, so why is adding another
structure the answer?

I see we also need to stick a pointer in the domain for iommufd to get
back to the hwpt, but that doesn't seem to need such a big system to
accomplish: just add a void *. It would make sense for the domain to
have some optional pointer to a struct for all the fault-related data
that gets allocated when a PRI domain is created.

Also, I thought we'd basically said that domain detach is supposed to
flush any open PRIs before returning. What happened to that as a
solution to the lifetime problem?

Regardless I think we should split this into two series - improve the PRI
lifetime model for domains and then put iommufd on top of it..

Jason


* Re: [PATCH v3 3/8] iommufd: Add fault and response message definitions
  2024-01-22  7:38 ` [PATCH v3 3/8] iommufd: Add fault and response message definitions Lu Baolu
@ 2024-03-08 17:50   ` Jason Gunthorpe
  2024-03-14 13:41     ` Baolu Lu
  0 siblings, 1 reply; 43+ messages in thread
From: Jason Gunthorpe @ 2024-03-08 17:50 UTC (permalink / raw)
  To: Lu Baolu
  Cc: Kevin Tian, Joerg Roedel, Will Deacon, Robin Murphy,
	Jean-Philippe Brucker, Nicolin Chen, Yi Liu, Jacob Pan,
	Joel Granados, iommu, virtualization, linux-kernel

On Mon, Jan 22, 2024 at 03:38:58PM +0800, Lu Baolu wrote:

> +/**
> + * enum iommu_hwpt_pgfault_flags - flags for struct iommu_hwpt_pgfault
> + * @IOMMU_PGFAULT_FLAGS_PASID_VALID: The pasid field of the fault data is
> + *                                   valid.
> + * @IOMMU_PGFAULT_FLAGS_LAST_PAGE: It's the last fault of a fault group.
> + */
> +enum iommu_hwpt_pgfault_flags {
> +	IOMMU_PGFAULT_FLAGS_PASID_VALID		= (1 << 0),
> +	IOMMU_PGFAULT_FLAGS_LAST_PAGE		= (1 << 1),
> +};
> +
> +/**
> + * enum iommu_hwpt_pgfault_perm - perm bits for struct iommu_hwpt_pgfault
> + * @IOMMU_PGFAULT_PERM_READ: request for read permission
> + * @IOMMU_PGFAULT_PERM_WRITE: request for write permission
> + * @IOMMU_PGFAULT_PERM_EXEC: request for execute permission
> + * @IOMMU_PGFAULT_PERM_PRIV: request for privileged permission

You are going to have to elaborate on what PRIV is for. We don't have
any concept of this in the UAPI for iommufd, so what is a userspace
supposed to do if it hits this? EXEC is similar; we can't actually
enable exec permissions from userspace, IIRC.

> +enum iommu_hwpt_pgfault_perm {
> +	IOMMU_PGFAULT_PERM_READ			= (1 << 0),
> +	IOMMU_PGFAULT_PERM_WRITE		= (1 << 1),
> +	IOMMU_PGFAULT_PERM_EXEC			= (1 << 2),
> +	IOMMU_PGFAULT_PERM_PRIV			= (1 << 3),
> +};
> +
> +/**
> + * struct iommu_hwpt_pgfault - iommu page fault data
> + * @size: sizeof(struct iommu_hwpt_pgfault)
> + * @flags: Combination of enum iommu_hwpt_pgfault_flags
> + * @dev_id: id of the originated device
> + * @pasid: Process Address Space ID
> + * @grpid: Page Request Group Index
> + * @perm: Combination of enum iommu_hwpt_pgfault_perm
> + * @addr: page address
> + */
> +struct iommu_hwpt_pgfault {
> +	__u32 size;
> +	__u32 flags;
> +	__u32 dev_id;
> +	__u32 pasid;
> +	__u32 grpid;
> +	__u32 perm;
> +	__u64 addr;
> +};

Do we need an addr + size here? I've seen a few things where I wonder
if that might become an enhancement someday.

> +/**
> + * struct iommu_hwpt_page_response - IOMMU page fault response
> + * @size: sizeof(struct iommu_hwpt_page_response)
> + * @flags: Must be set to 0
> + * @dev_id: device ID of target device for the response
> + * @pasid: Process Address Space ID
> + * @grpid: Page Request Group Index
> + * @code: response code. The supported codes include:
> + *        0: Successful; 1: Response Failure; 2: Invalid Request.

This should be an enum

> + * @addr: The fault address. Must match the addr field of the
> + *        last iommu_hwpt_pgfault of a reported iopf group.
> + */
> +struct iommu_hwpt_page_response {
> +	__u32 size;
> +	__u32 flags;
> +	__u32 dev_id;
> +	__u32 pasid;
> +	__u32 grpid;
> +	__u32 code;
> +	__u64 addr;
> +};

Do we want some kind of opaque ID value from the kernel here to match
request with response exactly? Or is the plan to search on the addr?

Jason


* Re: [PATCH v3 4/8] iommufd: Add iommufd fault object
  2024-01-22  7:38 ` [PATCH v3 4/8] iommufd: Add iommufd fault object Lu Baolu
@ 2024-03-08 18:03   ` Jason Gunthorpe
  2024-03-15  1:46     ` Baolu Lu
  2024-03-20 16:18   ` Shameerali Kolothum Thodi
  1 sibling, 1 reply; 43+ messages in thread
From: Jason Gunthorpe @ 2024-03-08 18:03 UTC (permalink / raw)
  To: Lu Baolu
  Cc: Kevin Tian, Joerg Roedel, Will Deacon, Robin Murphy,
	Jean-Philippe Brucker, Nicolin Chen, Yi Liu, Jacob Pan,
	Joel Granados, iommu, virtualization, linux-kernel

On Mon, Jan 22, 2024 at 03:38:59PM +0800, Lu Baolu wrote:
> --- /dev/null
> +++ b/drivers/iommu/iommufd/fault.c
> @@ -0,0 +1,255 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/* Copyright (C) 2024 Intel Corporation
> + */
> +#define pr_fmt(fmt) "iommufd: " fmt
> +
> +#include <linux/file.h>
> +#include <linux/fs.h>
> +#include <linux/module.h>
> +#include <linux/mutex.h>
> +#include <linux/iommufd.h>
> +#include <linux/poll.h>
> +#include <linux/anon_inodes.h>
> +#include <uapi/linux/iommufd.h>
> +
> +#include "iommufd_private.h"
> +
> +static int device_add_fault(struct iopf_group *group)
> +{
> +	struct iommufd_device *idev = group->cookie->private;
> +	void *curr;
> +
> +	curr = xa_cmpxchg(&idev->faults, group->last_fault.fault.prm.grpid,
> +			  NULL, group, GFP_KERNEL);
> +
> +	return curr ? xa_err(curr) : 0;
> +}
> +
> +static void device_remove_fault(struct iopf_group *group)
> +{
> +	struct iommufd_device *idev = group->cookie->private;
> +
> +	xa_store(&idev->faults, group->last_fault.fault.prm.grpid,
> +		 NULL, GFP_KERNEL);

xa_erase ?

Is grpid OK to use this way? Doesn't it come from the originating
device?

> +void iommufd_fault_destroy(struct iommufd_object *obj)
> +{
> +	struct iommufd_fault *fault = container_of(obj, struct iommufd_fault, obj);
> +	struct iopf_group *group, *next;
> +
> +	mutex_lock(&fault->mutex);
> +	list_for_each_entry_safe(group, next, &fault->deliver, node) {
> +		list_del(&group->node);
> +		iopf_group_response(group, IOMMU_PAGE_RESP_INVALID);
> +		iopf_free_group(group);
> +	}
> +	list_for_each_entry_safe(group, next, &fault->response, node) {
> +		list_del(&group->node);
> +		device_remove_fault(group);
> +		iopf_group_response(group, IOMMU_PAGE_RESP_INVALID);
> +		iopf_free_group(group);
> +	}
> +	mutex_unlock(&fault->mutex);
> +
> +	mutex_destroy(&fault->mutex);

It is really weird to lock a mutex we are about to destroy. At this
point the refcount on the iommufd_object is zero, so there had better
not be any other threads with access to this pointer!

> +static ssize_t iommufd_fault_fops_read(struct file *filep, char __user *buf,
> +				       size_t count, loff_t *ppos)
> +{
> +	size_t fault_size = sizeof(struct iommu_hwpt_pgfault);
> +	struct iommufd_fault *fault = filep->private_data;
> +	struct iommu_hwpt_pgfault data;
> +	struct iommufd_device *idev;
> +	struct iopf_group *group;
> +	struct iopf_fault *iopf;
> +	size_t done = 0;
> +	int rc;
> +
> +	if (*ppos || count % fault_size)
> +		return -ESPIPE;
> +
> +	mutex_lock(&fault->mutex);
> +	while (!list_empty(&fault->deliver) && count > done) {
> +		group = list_first_entry(&fault->deliver,
> +					 struct iopf_group, node);
> +
> +		if (list_count_nodes(&group->faults) * fault_size > count - done)
> +			break;
> +
> +		idev = (struct iommufd_device *)group->cookie->private;
> +		list_for_each_entry(iopf, &group->faults, list) {
> +			iommufd_compose_fault_message(&iopf->fault, &data, idev);
> +			rc = copy_to_user(buf + done, &data, fault_size);
> +			if (rc)
> +				goto err_unlock;
> +			done += fault_size;
> +		}
> +
> +		rc = device_add_fault(group);

See I wonder if this should be some xa_alloc or something instead of
trying to use the grpid?

> +	while (!list_empty(&fault->response) && count > done) {
> +		rc = copy_from_user(&response, buf + done, response_size);
> +		if (rc)
> +			break;
> +
> +		idev = container_of(iommufd_get_object(fault->ictx,
> +						       response.dev_id,
> +						       IOMMUFD_OBJ_DEVICE),
> +				    struct iommufd_device, obj);

It seems unfortunate we do this lookup for every iteration of the loop,
I would lift it up and cache it..

> +		if (IS_ERR(idev))
> +			break;
> +
> +		group = device_get_fault(idev, response.grpid);

This looks locked wrong; it should hold the fault mutex here and we
should probably do an xchg to zero it at the same time we fetch it.

> +		if (!group ||
> +		    response.addr != group->last_fault.fault.prm.addr ||
> +		    ((group->last_fault.fault.prm.flags & IOMMU_FAULT_PAGE_REQUEST_PASID_VALID) &&
> +		      response.pasid != group->last_fault.fault.prm.pasid)) {

Why? If grpid is unique then just trust it.

> +			iommufd_put_object(fault->ictx, &idev->obj);
> +			break;
> +		}
> +
> +		iopf_group_response(group, response.code);
> +
> +		mutex_lock(&fault->mutex);
> +		list_del(&group->node);
> +		mutex_unlock(&fault->mutex);
> +
> +		device_remove_fault(group);
> +		iopf_free_group(group);
> +		done += response_size;
> +
> +		iommufd_put_object(fault->ictx, &idev->obj);
> +	}
> +
> +	return done;
> +}
> +
> +static __poll_t iommufd_fault_fops_poll(struct file *filep,
> +					struct poll_table_struct *wait)
> +{
> +	struct iommufd_fault *fault = filep->private_data;
> +	__poll_t pollflags = 0;
> +
> +	poll_wait(filep, &fault->wait_queue, wait);
> +	mutex_lock(&fault->mutex);
> +	if (!list_empty(&fault->deliver))
> +		pollflags = EPOLLIN | EPOLLRDNORM;
> +	mutex_unlock(&fault->mutex);
> +
> +	return pollflags;
> +}
> +
> +static const struct file_operations iommufd_fault_fops = {
> +	.owner		= THIS_MODULE,
> +	.open		= nonseekable_open,
> +	.read		= iommufd_fault_fops_read,
> +	.write		= iommufd_fault_fops_write,
> +	.poll		= iommufd_fault_fops_poll,
> +	.llseek		= no_llseek,
> +};

No release? That seems wrong..

> +static int get_fault_fd(struct iommufd_fault *fault)
> +{
> +	struct file *filep;
> +	int fdno;
> +
> +	fdno = get_unused_fd_flags(O_CLOEXEC);
> +	if (fdno < 0)
> +		return fdno;
> +
> +	filep = anon_inode_getfile("[iommufd-pgfault]", &iommufd_fault_fops,
> +				   fault, O_RDWR);
                                   ^^^^^^

See here you stick the iommufd_object into the FD but we don't
refcount it...

And iommufd_fault_destroy() doesn't do anything about this, so it just
UAFs the fault memory in the FD.

You need to refcount the iommufd_object here and add a release
function for the fd to put it back

*and* this FD needs to take a reference on the base iommufd_ctx fd too
as we can't tolerate them being destroyed out of sequence.

> +int iommufd_fault_alloc(struct iommufd_ucmd *ucmd)
> +{
> +	struct iommu_fault_alloc *cmd = ucmd->cmd;
> +	struct iommufd_fault *fault;
> +	int rc;
> +
> +	if (cmd->flags)
> +		return -EOPNOTSUPP;
> +
> +	fault = iommufd_object_alloc(ucmd->ictx, fault, IOMMUFD_OBJ_FAULT);
> +	if (IS_ERR(fault))
> +		return PTR_ERR(fault);
> +
> +	fault->ictx = ucmd->ictx;
> +	INIT_LIST_HEAD(&fault->deliver);
> +	INIT_LIST_HEAD(&fault->response);
> +	mutex_init(&fault->mutex);
> +	init_waitqueue_head(&fault->wait_queue);
> +
> +	rc = get_fault_fd(fault);
> +	if (rc < 0)
> +		goto out_abort;

And this is ordered wrong, fd_install() has to happen after return to
user space is guaranteed, as it cannot be undone.

> +	cmd->out_fault_id = fault->obj.id;
> +	cmd->out_fault_fd = rc;
> +
> +	rc = iommufd_ucmd_respond(ucmd, sizeof(*cmd));
> +	if (rc)
> +		goto out_abort;
> +	iommufd_object_finalize(ucmd->ictx, &fault->obj);

fd_install() goes here

> +	return 0;
> +out_abort:
> +	iommufd_object_abort_and_destroy(ucmd->ictx, &fault->obj);
> +
> +	return rc;
> +}

Jason

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v3 5/8] iommufd: Associate fault object with iommufd_hw_pgtable
  2024-01-22  7:39 ` [PATCH v3 5/8] iommufd: Associate fault object with iommufd_hw_pgtable Lu Baolu
  2024-02-07  8:14   ` Tian, Kevin
  2024-03-02  2:36   ` Zhangfei Gao
@ 2024-03-08 19:05   ` Jason Gunthorpe
  2024-03-15  1:16     ` Baolu Lu
  2 siblings, 1 reply; 43+ messages in thread
From: Jason Gunthorpe @ 2024-03-08 19:05 UTC (permalink / raw)
  To: Lu Baolu
  Cc: Kevin Tian, Joerg Roedel, Will Deacon, Robin Murphy,
	Jean-Philippe Brucker, Nicolin Chen, Yi Liu, Jacob Pan,
	Joel Granados, iommu, virtualization, linux-kernel

On Mon, Jan 22, 2024 at 03:39:00PM +0800, Lu Baolu wrote:

> @@ -411,6 +414,8 @@ enum iommu_hwpt_data_type {
>   * @__reserved: Must be 0
>   * @data_type: One of enum iommu_hwpt_data_type
>   * @data_len: Length of the type specific data
> + * @fault_id: The ID of IOMMUFD_FAULT object. Valid only if the
> + *            IOMMU_HWPT_FAULT_ID_VALID flag is set in the flags field.
>   * @data_uptr: User pointer to the type specific data
>   *
>   * Explicitly allocate a hardware page table object. This is the same object
> @@ -441,6 +446,7 @@ struct iommu_hwpt_alloc {
>  	__u32 __reserved;
>  	__u32 data_type;
>  	__u32 data_len;
> +	__u32 fault_id;
>  	__aligned_u64 data_uptr;
>  };

?? We can't add fault_id in the middle of the struct??

> +	if (cmd->flags & IOMMU_HWPT_FAULT_ID_VALID) {
> +		struct iommufd_fault *fault;
> +
> +		fault = iommufd_get_fault(ucmd, cmd->fault_id);
> +		if (IS_ERR(fault)) {
> +			rc = PTR_ERR(fault);
> +			goto out_hwpt;
> +		}
> +		hwpt->fault = fault;
> +		hwpt->domain->iopf_handler = iommufd_fault_iopf_handler;
> +		hwpt->domain->fault_data = hwpt;
> +		hwpt->fault_capable = true;

I wonder if there should be an iommu API to make a domain fault
capable?

Jason


* Re: [PATCH v3 2/8] iommu/sva: Use iopf domain attach/detach interface
  2024-03-08 17:46   ` Jason Gunthorpe
@ 2024-03-14  7:41     ` Baolu Lu
  2024-03-22 16:59       ` Jason Gunthorpe
  0 siblings, 1 reply; 43+ messages in thread
From: Baolu Lu @ 2024-03-14  7:41 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: baolu.lu, Kevin Tian, Joerg Roedel, Will Deacon, Robin Murphy,
	Jean-Philippe Brucker, Nicolin Chen, Yi Liu, Jacob Pan,
	Joel Granados, iommu, virtualization, linux-kernel

On 2024/3/9 1:46, Jason Gunthorpe wrote:
> On Mon, Jan 22, 2024 at 03:38:57PM +0800, Lu Baolu wrote:
>> @@ -215,7 +202,23 @@ static struct iopf_group *iopf_group_alloc(struct iommu_fault_param *iopf_param,
>>   		group = abort_group;
>>   	}
>>   
>> +	cookie = iopf_pasid_cookie_get(iopf_param->dev, pasid);
>> +	if (!cookie && pasid != IOMMU_NO_PASID)
>> +		cookie = iopf_pasid_cookie_get(iopf_param->dev, IOMMU_NO_PASID);
>> +	if (IS_ERR(cookie) || !cookie) {
>> +		/*
>> +		 * The PASID of this device was not attached by an I/O-capable
>> +		 * domain. Ask the caller to abort handling of this fault.
>> +		 * Otherwise, the reference count will be switched to the new
>> +		 * iopf group and will be released in iopf_free_group().
>> +		 */
>> +		kfree(group);
>> +		group = abort_group;
>> +		cookie = NULL;
>> +	}
> 
> If this is the main point of the cookie mechansim - why not just have
> a refcount inside the domain itself?
> 
> I'm really having a hard time making sense of this cookie thing, we
> have a lifetime issue on the domain pointer, why is adding another
> structure the answer?

The whole cookie mechanism aims to address two things:

- Extend the domain lifetime until all pending page faults are resolved.
- Associate information about the iommu device with each attachment of
   the domain so that the iommufd device object ID can be quickly
   retrieved in the fault delivery path.


> I see we also need to stick a pointer in the domain for iommufd to get
> back to the hwpt, but that doesn't seem to need such a big system to
> accomplish - just add a void *. It would make sense for the domain to
> have some optional pointer to a struct for all the fault related data
> that becomes allocated when a PRI domain is created..

It's not about getting back the hwpt from the domain, just getting the
iommufd_device in the fault delivery path. The iommufd_device is not
per-domain, but per-domain-attachment.

> 
> Also, I thought we'd basically said that domain detatch is supposed to
> flush any open PRIs before returning, what happened to that as a
> solution to the lifetime problem?

If I remember correctly, what we discussed was to flush all pending
page faults when disabling PRI. Anyway, the current driver behavior is
to flush faults during domain detachment, and if we keep this behavior
we no longer need to worry about the domain lifetime.

> 
> Regardless I think we should split this into two series - improve the PRI
> lifetime model for domains and then put iommufd on top of it..

Yes, agreed. Let's tackle those two points in a separate series.

Best regards,
baolu


* Re: [PATCH v3 3/8] iommufd: Add fault and response message definitions
  2024-03-08 17:50   ` Jason Gunthorpe
@ 2024-03-14 13:41     ` Baolu Lu
  2024-03-22 17:04       ` Jason Gunthorpe
  0 siblings, 1 reply; 43+ messages in thread
From: Baolu Lu @ 2024-03-14 13:41 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: baolu.lu, Kevin Tian, Joerg Roedel, Will Deacon, Robin Murphy,
	Jean-Philippe Brucker, Nicolin Chen, Yi Liu, Jacob Pan,
	Joel Granados, iommu, virtualization, linux-kernel

On 2024/3/9 1:50, Jason Gunthorpe wrote:
> On Mon, Jan 22, 2024 at 03:38:58PM +0800, Lu Baolu wrote:
> 
>> +/**
>> + * enum iommu_hwpt_pgfault_flags - flags for struct iommu_hwpt_pgfault
>> + * @IOMMU_PGFAULT_FLAGS_PASID_VALID: The pasid field of the fault data is
>> + *                                   valid.
>> + * @IOMMU_PGFAULT_FLAGS_LAST_PAGE: It's the last fault of a fault group.
>> + */
>> +enum iommu_hwpt_pgfault_flags {
>> +	IOMMU_PGFAULT_FLAGS_PASID_VALID		= (1 << 0),
>> +	IOMMU_PGFAULT_FLAGS_LAST_PAGE		= (1 << 1),
>> +};
>> +
>> +/**
>> + * enum iommu_hwpt_pgfault_perm - perm bits for struct iommu_hwpt_pgfault
>> + * @IOMMU_PGFAULT_PERM_READ: request for read permission
>> + * @IOMMU_PGFAULT_PERM_WRITE: request for write permission
>> + * @IOMMU_PGFAULT_PERM_EXEC: request for execute permission
>> + * @IOMMU_PGFAULT_PERM_PRIV: request for privileged permission
> 
> You are going to have to elaborate what PRIV is for.. We don't have
> any concept of this in the UAPI for iommufd so what is a userspace
> supposed to do if it hits this? EXEC is similar, we can't actually
> enable exec permissions from userspace IIRC..

The PCIe spec, section "10.4.1 Page Request Message" and "6.20.2 PASID
Information Layout":

The PCI PASID TLP Prefix defines "Execute Requested" and "Privileged
Mode Requested" bits.

PERM_EXEC indicates a page request with a PASID that has the "Execute
Requested" bit set. Similarly, PERM_PRIV indicates a page request with a
PASID that has the "Privileged Mode Requested" bit set.

> 
>> +enum iommu_hwpt_pgfault_perm {
>> +	IOMMU_PGFAULT_PERM_READ			= (1 << 0),
>> +	IOMMU_PGFAULT_PERM_WRITE		= (1 << 1),
>> +	IOMMU_PGFAULT_PERM_EXEC			= (1 << 2),
>> +	IOMMU_PGFAULT_PERM_PRIV			= (1 << 3),
>> +};
>> +
>> +/**
>> + * struct iommu_hwpt_pgfault - iommu page fault data
>> + * @size: sizeof(struct iommu_hwpt_pgfault)
>> + * @flags: Combination of enum iommu_hwpt_pgfault_flags
>> + * @dev_id: id of the originated device
>> + * @pasid: Process Address Space ID
>> + * @grpid: Page Request Group Index
>> + * @perm: Combination of enum iommu_hwpt_pgfault_perm
>> + * @addr: page address
>> + */
>> +struct iommu_hwpt_pgfault {
>> +	__u32 size;
>> +	__u32 flags;
>> +	__u32 dev_id;
>> +	__u32 pasid;
>> +	__u32 grpid;
>> +	__u32 perm;
>> +	__u64 addr;
>> +};
> 
> Do we need an addr + size here? I've seen a few things where I wonder
> if that might become an enhancment someday.

I am not sure. The page size is not part of ATS/PRI. Can you please
elaborate a bit on how the size could be used? Perhaps I misunderstood
something here?

> 
>> +/**
>> + * struct iommu_hwpt_page_response - IOMMU page fault response
>> + * @size: sizeof(struct iommu_hwpt_page_response)
>> + * @flags: Must be set to 0
>> + * @dev_id: device ID of target device for the response
>> + * @pasid: Process Address Space ID
>> + * @grpid: Page Request Group Index
>> + * @code: response code. The supported codes include:
>> + *        0: Successful; 1: Response Failure; 2: Invalid Request.
> 
> This should be an enum

Sure.

> 
>> + * @addr: The fault address. Must match the addr field of the
>> + *        last iommu_hwpt_pgfault of a reported iopf group.
>> + */
>> +struct iommu_hwpt_page_response {
>> +	__u32 size;
>> +	__u32 flags;
>> +	__u32 dev_id;
>> +	__u32 pasid;
>> +	__u32 grpid;
>> +	__u32 code;
>> +	__u64 addr;
>> +};
> 
> Do we want some kind of opaque ID value from the kernel here to match
> request with response exactly? Or is the plan to search on the addr?

I am using "addr" as the opaque data to look up the request in this
series. Is that enough?

Best regards,
baolu


* Re: [PATCH v3 5/8] iommufd: Associate fault object with iommufd_hw_pgtable
  2024-03-08 19:05   ` Jason Gunthorpe
@ 2024-03-15  1:16     ` Baolu Lu
  2024-03-22 17:06       ` Jason Gunthorpe
  0 siblings, 1 reply; 43+ messages in thread
From: Baolu Lu @ 2024-03-15  1:16 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: baolu.lu, Kevin Tian, Joerg Roedel, Will Deacon, Robin Murphy,
	Jean-Philippe Brucker, Nicolin Chen, Yi Liu, Jacob Pan,
	Joel Granados, iommu, virtualization, linux-kernel

On 3/9/24 3:05 AM, Jason Gunthorpe wrote:
> On Mon, Jan 22, 2024 at 03:39:00PM +0800, Lu Baolu wrote:
> 
>> @@ -411,6 +414,8 @@ enum iommu_hwpt_data_type {
>>    * @__reserved: Must be 0
>>    * @data_type: One of enum iommu_hwpt_data_type
>>    * @data_len: Length of the type specific data
>> + * @fault_id: The ID of IOMMUFD_FAULT object. Valid only if the
>> + *            IOMMU_HWPT_FAULT_ID_VALID flag is set in the flags field.
>>    * @data_uptr: User pointer to the type specific data
>>    *
>>    * Explicitly allocate a hardware page table object. This is the same object
>> @@ -441,6 +446,7 @@ struct iommu_hwpt_alloc {
>>   	__u32 __reserved;
>>   	__u32 data_type;
>>   	__u32 data_len;
>> +	__u32 fault_id;
>>   	__aligned_u64 data_uptr;
>>   };
> 
> ?? We can't add fault_id in the middle of the struct??

Yes. I should add the new field at the end.

By the way, with a __u32 added, this data structure is not 64-bit-
aligned anymore. Do we need to add another unused __u32 entry, or just let
the compiler handle it?

> 
>> +	if (cmd->flags & IOMMU_HWPT_FAULT_ID_VALID) {
>> +		struct iommufd_fault *fault;
>> +
>> +		fault = iommufd_get_fault(ucmd, cmd->fault_id);
>> +		if (IS_ERR(fault)) {
>> +			rc = PTR_ERR(fault);
>> +			goto out_hwpt;
>> +		}
>> +		hwpt->fault = fault;
>> +		hwpt->domain->iopf_handler = iommufd_fault_iopf_handler;
>> +		hwpt->domain->fault_data = hwpt;
>> +		hwpt->fault_capable = true;
> 
> I wonder if there should be an iommu API to make a domain fault
> capable?

The iommu core identifies a fault-capable domain by checking its
domain->iopf_handler. Anyway, what's the difference between a fault-capable
and a non-fault-capable domain from the iommu core's point of view?

Best regards,
baolu


* Re: [PATCH v3 4/8] iommufd: Add iommufd fault object
  2024-03-08 18:03   ` Jason Gunthorpe
@ 2024-03-15  1:46     ` Baolu Lu
  2024-03-22 17:09       ` Jason Gunthorpe
  0 siblings, 1 reply; 43+ messages in thread
From: Baolu Lu @ 2024-03-15  1:46 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: baolu.lu, Kevin Tian, Joerg Roedel, Will Deacon, Robin Murphy,
	Jean-Philippe Brucker, Nicolin Chen, Yi Liu, Jacob Pan,
	Joel Granados, iommu, virtualization, linux-kernel

On 3/9/24 2:03 AM, Jason Gunthorpe wrote:
> On Mon, Jan 22, 2024 at 03:38:59PM +0800, Lu Baolu wrote:
>> --- /dev/null
>> +++ b/drivers/iommu/iommufd/fault.c
>> @@ -0,0 +1,255 @@
>> +// SPDX-License-Identifier: GPL-2.0-only
>> +/* Copyright (C) 2024 Intel Corporation
>> + */
>> +#define pr_fmt(fmt) "iommufd: " fmt
>> +
>> +#include <linux/file.h>
>> +#include <linux/fs.h>
>> +#include <linux/module.h>
>> +#include <linux/mutex.h>
>> +#include <linux/iommufd.h>
>> +#include <linux/poll.h>
>> +#include <linux/anon_inodes.h>
>> +#include <uapi/linux/iommufd.h>
>> +
>> +#include "iommufd_private.h"
>> +
>> +static int device_add_fault(struct iopf_group *group)
>> +{
>> +	struct iommufd_device *idev = group->cookie->private;
>> +	void *curr;
>> +
>> +	curr = xa_cmpxchg(&idev->faults, group->last_fault.fault.prm.grpid,
>> +			  NULL, group, GFP_KERNEL);
>> +
>> +	return curr ? xa_err(curr) : 0;
>> +}
>> +
>> +static void device_remove_fault(struct iopf_group *group)
>> +{
>> +	struct iommufd_device *idev = group->cookie->private;
>> +
>> +	xa_store(&idev->faults, group->last_fault.fault.prm.grpid,
>> +		 NULL, GFP_KERNEL);
> 
> xa_erase ?

Yes. Sure.

> Is grpid OK to use this way? Doesn't it come from the originating
> device?

The group ID is generated by the hardware. Here, we use it as an index
in the fault array to ensure it can be quickly retrieved in the page
fault response path.

>> +void iommufd_fault_destroy(struct iommufd_object *obj)
>> +{
>> +	struct iommufd_fault *fault = container_of(obj, struct iommufd_fault, obj);
>> +	struct iopf_group *group, *next;
>> +
>> +	mutex_lock(&fault->mutex);
>> +	list_for_each_entry_safe(group, next, &fault->deliver, node) {
>> +		list_del(&group->node);
>> +		iopf_group_response(group, IOMMU_PAGE_RESP_INVALID);
>> +		iopf_free_group(group);
>> +	}
>> +	list_for_each_entry_safe(group, next, &fault->response, node) {
>> +		list_del(&group->node);
>> +		device_remove_fault(group);
>> +		iopf_group_response(group, IOMMU_PAGE_RESP_INVALID);
>> +		iopf_free_group(group);
>> +	}
>> +	mutex_unlock(&fault->mutex);
>> +
>> +	mutex_destroy(&fault->mutex);
> 
> It is really weird to lock a mutex we are about to destroy? At this
> point the refcount on the iommufd_object is zero so there had better
> not be any other threads with access to this pointer!

You are right. I will remove the lock/unlock and add a comment instead.

> 
>> +static ssize_t iommufd_fault_fops_read(struct file *filep, char __user *buf,
>> +				       size_t count, loff_t *ppos)
>> +{
>> +	size_t fault_size = sizeof(struct iommu_hwpt_pgfault);
>> +	struct iommufd_fault *fault = filep->private_data;
>> +	struct iommu_hwpt_pgfault data;
>> +	struct iommufd_device *idev;
>> +	struct iopf_group *group;
>> +	struct iopf_fault *iopf;
>> +	size_t done = 0;
>> +	int rc;
>> +
>> +	if (*ppos || count % fault_size)
>> +		return -ESPIPE;
>> +
>> +	mutex_lock(&fault->mutex);
>> +	while (!list_empty(&fault->deliver) && count > done) {
>> +		group = list_first_entry(&fault->deliver,
>> +					 struct iopf_group, node);
>> +
>> +		if (list_count_nodes(&group->faults) * fault_size > count - done)
>> +			break;
>> +
>> +		idev = (struct iommufd_device *)group->cookie->private;
>> +		list_for_each_entry(iopf, &group->faults, list) {
>> +			iommufd_compose_fault_message(&iopf->fault, &data, idev);
>> +			rc = copy_to_user(buf + done, &data, fault_size);
>> +			if (rc)
>> +				goto err_unlock;
>> +			done += fault_size;
>> +		}
>> +
>> +		rc = device_add_fault(group);
> 
> See I wonder if this should be some xa_alloc or something instead of
> trying to use the grpid?

So this magic number will be passed to user space in the fault message,
and the user will then include it in the response message. The response
message is valid only when the magic number matches. Do I understand you
correctly?

> 
>> +	while (!list_empty(&fault->response) && count > done) {
>> +		rc = copy_from_user(&response, buf + done, response_size);
>> +		if (rc)
>> +			break;
>> +
>> +		idev = container_of(iommufd_get_object(fault->ictx,
>> +						       response.dev_id,
>> +						       IOMMUFD_OBJ_DEVICE),
>> +				    struct iommufd_device, obj);
> 
> It seems unfortunate we do this lookup for every iteration of the loop,
> I would lift it up and cache it..

Okay.

> 
>> +		if (IS_ERR(idev))
>> +			break;
>> +
>> +		group = device_get_fault(idev, response.grpid);
> 
> This looks locked wrong; it should hold the fault mutex here and we
> should probably do an xchg to zero it at the same time we fetch it.

Okay, will fix it.

> 
>> +		if (!group ||
>> +		    response.addr != group->last_fault.fault.prm.addr ||
>> +		    ((group->last_fault.fault.prm.flags & IOMMU_FAULT_PAGE_REQUEST_PASID_VALID) &&
>> +		      response.pasid != group->last_fault.fault.prm.pasid)) {
> 
> Why? If grpid is unique then just trust it.

I was just thinking that we should add some sanity checks here to
ensure the response message is composed correctly.

> 
>> +			iommufd_put_object(fault->ictx, &idev->obj);
>> +			break;
>> +		}
>> +
>> +		iopf_group_response(group, response.code);
>> +
>> +		mutex_lock(&fault->mutex);
>> +		list_del(&group->node);
>> +		mutex_unlock(&fault->mutex);
>> +
>> +		device_remove_fault(group);
>> +		iopf_free_group(group);
>> +		done += response_size;
>> +
>> +		iommufd_put_object(fault->ictx, &idev->obj);
>> +	}
>> +
>> +	return done;
>> +}
>> +
>> +static __poll_t iommufd_fault_fops_poll(struct file *filep,
>> +					struct poll_table_struct *wait)
>> +{
>> +	struct iommufd_fault *fault = filep->private_data;
>> +	__poll_t pollflags = 0;
>> +
>> +	poll_wait(filep, &fault->wait_queue, wait);
>> +	mutex_lock(&fault->mutex);
>> +	if (!list_empty(&fault->deliver))
>> +		pollflags = EPOLLIN | EPOLLRDNORM;
>> +	mutex_unlock(&fault->mutex);
>> +
>> +	return pollflags;
>> +}
>> +
>> +static const struct file_operations iommufd_fault_fops = {
>> +	.owner		= THIS_MODULE,
>> +	.open		= nonseekable_open,
>> +	.read		= iommufd_fault_fops_read,
>> +	.write		= iommufd_fault_fops_write,
>> +	.poll		= iommufd_fault_fops_poll,
>> +	.llseek		= no_llseek,
>> +};
> 
> No release? That seems wrong..

Yes. Will fix it.

> 
>> +static int get_fault_fd(struct iommufd_fault *fault)
>> +{
>> +	struct file *filep;
>> +	int fdno;
>> +
>> +	fdno = get_unused_fd_flags(O_CLOEXEC);
>> +	if (fdno < 0)
>> +		return fdno;
>> +
>> +	filep = anon_inode_getfile("[iommufd-pgfault]", &iommufd_fault_fops,
>> +				   fault, O_RDWR);
>                                     ^^^^^^
> 
> See here you stick the iommufd_object into the FD but we don't
> refcount it...
> 
> And iommufd_fault_destroy() doesn't do anything about this, so it just
> UAFs the fault memory in the FD.
> 
> You need to refcount the iommufd_object here and add a release
> function for the fd to put it back
> 
> *and* this FD needs to take a reference on the base iommufd_ctx fd too
> as we can't tolerate them being destroyed out of sequence.

Good catch. I will fix it.

> 
>> +int iommufd_fault_alloc(struct iommufd_ucmd *ucmd)
>> +{
>> +	struct iommu_fault_alloc *cmd = ucmd->cmd;
>> +	struct iommufd_fault *fault;
>> +	int rc;
>> +
>> +	if (cmd->flags)
>> +		return -EOPNOTSUPP;
>> +
>> +	fault = iommufd_object_alloc(ucmd->ictx, fault, IOMMUFD_OBJ_FAULT);
>> +	if (IS_ERR(fault))
>> +		return PTR_ERR(fault);
>> +
>> +	fault->ictx = ucmd->ictx;
>> +	INIT_LIST_HEAD(&fault->deliver);
>> +	INIT_LIST_HEAD(&fault->response);
>> +	mutex_init(&fault->mutex);
>> +	init_waitqueue_head(&fault->wait_queue);
>> +
>> +	rc = get_fault_fd(fault);
>> +	if (rc < 0)
>> +		goto out_abort;
> 
> And this is ordered wrong, fd_install() has to happen after return to
> user space is guaranteed, as it cannot be undone.
> 
>> +	cmd->out_fault_id = fault->obj.id;
>> +	cmd->out_fault_fd = rc;
>> +
>> +	rc = iommufd_ucmd_respond(ucmd, sizeof(*cmd));
>> +	if (rc)
>> +		goto out_abort;
>> +	iommufd_object_finalize(ucmd->ictx, &fault->obj);
> 
> fd_install() goes here

Yes. Sure.

> 
>> +	return 0;
>> +out_abort:
>> +	iommufd_object_abort_and_destroy(ucmd->ictx, &fault->obj);
>> +
>> +	return rc;
>> +}
> 
> Jason

Best regards,
baolu


* RE: [PATCH v3 4/8] iommufd: Add iommufd fault object
  2024-01-22  7:38 ` [PATCH v3 4/8] iommufd: Add iommufd fault object Lu Baolu
  2024-03-08 18:03   ` Jason Gunthorpe
@ 2024-03-20 16:18   ` Shameerali Kolothum Thodi
  2024-03-22 17:22     ` Jason Gunthorpe
  1 sibling, 1 reply; 43+ messages in thread
From: Shameerali Kolothum Thodi @ 2024-03-20 16:18 UTC (permalink / raw)
  To: Lu Baolu, Jason Gunthorpe, Kevin Tian, Joerg Roedel, Will Deacon,
	Robin Murphy, Jean-Philippe Brucker, Nicolin Chen, Yi Liu,
	Jacob Pan, Joel Granados
  Cc: iommu, virtualization, linux-kernel



> -----Original Message-----
> From: Lu Baolu <baolu.lu@linux.intel.com>
> Sent: Monday, January 22, 2024 7:39 AM
> To: Jason Gunthorpe <jgg@ziepe.ca>; Kevin Tian <kevin.tian@intel.com>; Joerg
> Roedel <joro@8bytes.org>; Will Deacon <will@kernel.org>; Robin Murphy
> <robin.murphy@arm.com>; Jean-Philippe Brucker <jean-philippe@linaro.org>;
> Nicolin Chen <nicolinc@nvidia.com>; Yi Liu <yi.l.liu@intel.com>; Jacob Pan
> <jacob.jun.pan@linux.intel.com>; Joel Granados <j.granados@samsung.com>
> Cc: iommu@lists.linux.dev; virtualization@lists.linux-foundation.org; linux-
> kernel@vger.kernel.org; Lu Baolu <baolu.lu@linux.intel.com>
> Subject: [PATCH v3 4/8] iommufd: Add iommufd fault object
> 
> An iommufd fault object provides an interface for delivering I/O page
> faults to user space. These objects are created and destroyed by user
> space, and they can be associated with or dissociated from hardware page
> table objects during page table allocation or destruction.
> 
> User space interacts with the fault object through a file interface. This
> interface offers a straightforward and efficient way for user space to
> handle page faults. It allows user space to read fault messages
> sequentially and respond to them by writing to the same file. The file
> interface supports reading messages in poll mode, so it's recommended that
> user space applications use io_uring to enhance read and write efficiency.
> 
> A fault object can be associated with any iopf-capable iommufd_hw_pgtable
> during the pgtable's allocation. All I/O page faults triggered by devices
> when accessing the I/O addresses of an iommufd_hw_pgtable are routed
> through the fault object to user space. Similarly, user space's responses
> to these page faults are routed back to the iommu device driver through
> the same fault object.
> 
> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>

[...]

> +static __poll_t iommufd_fault_fops_poll(struct file *filep,
> +					struct poll_table_struct *wait)
> +{
> +	struct iommufd_fault *fault = filep->private_data;
> +	__poll_t pollflags = 0;
> +
> +	poll_wait(filep, &fault->wait_queue, wait);
> +	mutex_lock(&fault->mutex);
> +	if (!list_empty(&fault->deliver))
> +		pollflags = EPOLLIN | EPOLLRDNORM;
> +	mutex_unlock(&fault->mutex);
> +
> +	return pollflags;
> +}
> +
> +static const struct file_operations iommufd_fault_fops = {
> +	.owner		= THIS_MODULE,
> +	.open		= nonseekable_open,
> +	.read		= iommufd_fault_fops_read,
> +	.write		= iommufd_fault_fops_write,
> +	.poll		= iommufd_fault_fops_poll,
> +	.llseek		= no_llseek,
> +};

Hi

I am trying to enable Qemu vSVA support on ARM with this series.
I am using the io_uring APIs with the fault fd to handle page faults
in Qemu.

Please find the implementation here[1]. This is still a work in progress 
and is based on Nicolin's latest nested Qemu branch.

And I am running into a problem when we have the poll interface added
for the fault fd in kernel.

What I have noticed is that:
- The read interface works fine and I can receive struct iommu_hwpt_pgfault data.
- But once the guest handles the page faults and returns the page response,
  the write to the fault fd never reaches the kernel. The sequence is like below:
 
  sqe = io_uring_get_sqe(ring);
  io_uring_prep_write(sqe, hwpt->fault_fd, resp, sizeof(*resp), 0);
  io_uring_sqe_set_data(sqe, resp);
  io_uring_submit(ring);
  ret = io_uring_wait_cqe(ring, &cqe); 
  ....
Please find the function here[2]

The above cqe wait never returns and the hardware times out without
receiving a page response. My understanding of the io_uring default op is
that it tries to issue an sqe as non-blocking first. But it looks like the
above write sequence ends up in the kernel's poll_wait() as well. Not sure
how we can avoid that for the write.

All works fine if I comment out the poll for the fault_fd from the kernel.
But then of course Qemu ends up repeatedly reading the ring queue for
any pending page faults.

It might be something I am missing in my understanding of the io_uring
APIs. I just thought of checking with you whether you have any Qemu
implementation using the io_uring APIs to test this.

Also appreciate any pointers in resolving this.

Thanks,
Shameer
[1] https://github.com/hisilicon/qemu/tree/iommufd_vsmmu-02292024-vsva-wip
[2] https://github.com/hisilicon/qemu/blob/2b984fb5c692a03e6f5463d005670d2e2a2c7304/hw/arm/smmuv3.c#L1310



* Re: [PATCH v3 2/8] iommu/sva: Use iopf domain attach/detach interface
  2024-03-14  7:41     ` Baolu Lu
@ 2024-03-22 16:59       ` Jason Gunthorpe
  2024-03-25  3:52         ` Baolu Lu
  0 siblings, 1 reply; 43+ messages in thread
From: Jason Gunthorpe @ 2024-03-22 16:59 UTC (permalink / raw)
  To: Baolu Lu
  Cc: Kevin Tian, Joerg Roedel, Will Deacon, Robin Murphy,
	Jean-Philippe Brucker, Nicolin Chen, Yi Liu, Jacob Pan,
	Joel Granados, iommu, virtualization, linux-kernel

On Thu, Mar 14, 2024 at 03:41:23PM +0800, Baolu Lu wrote:

> The whole cookie mechanism aims to address two things:
> 
> - Extend the domain lifetime until all pending page faults are
> resolved.

Like you answered, I think the flush is a simpler scheme..

> - Associate information about the iommu device with each attachment of
>   the domain so that the iommufd device object ID could be quickly
>   retrieved in the fault delivering path.
 
> > I see we also need to stick a pointer in the domain for iommufd to get
> > back to the hwpt, but that doesn't seem to need such a big system to
> > accomplish - just add a void *. It would make sense for the domain to
> > have some optional pointer to a struct for all the fault related data
> > that becomes allocated when a PRI domain is created..
> 
> It's not getting back hwpt from domain, just getting the iommufd_device
> in the fault delivering path. The iommufd_device is not per-domain, but
> per-domain-attachment.

It does make sense you'd need that, but I think something like this is
a more direct way to get it. The caller allocates the handle struct. The
iopf path will provide the handle from the XA to the callback.
container_of(), not void *, is used in the caller's API.

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 68e648b5576706..0d29d8f0847cd9 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -46,6 +46,8 @@ static unsigned int iommu_def_domain_type __read_mostly;
 static bool iommu_dma_strict __read_mostly = IS_ENABLED(CONFIG_IOMMU_DEFAULT_DMA_STRICT);
 static u32 iommu_cmd_line __read_mostly;
 
+enum { IOMMU_PASID_ARRAY_DOMAIN, IOMMU_PASID_ARRAY_HANDLE };
+
 struct iommu_group {
 	struct kobject kobj;
 	struct kobject *devices_kobj;
@@ -3516,18 +3518,20 @@ static void __iommu_remove_group_pasid(struct iommu_group *group,
 }
 
 /*
- * iommu_attach_device_pasid() - Attach a domain to pasid of device
+ * __iommu_attach_device_pasid() - Attach a domain to pasid of device
  * @domain: the iommu domain.
  * @dev: the attached device.
  * @pasid: the pasid of the device.
  *
  * Return: 0 on success, or an error.
  */
-int iommu_attach_device_pasid(struct iommu_domain *domain,
-			      struct device *dev, ioasid_t pasid)
+int __iommu_attach_device_pasid(struct iommu_domain *domain, struct device *dev,
+				ioasid_t pasid,
+				struct iommu_pasid_handle *handle)
 {
 	/* Caller must be a probed driver on dev */
 	struct iommu_group *group = dev->iommu_group;
+	void *xa_entry;
 	void *curr;
 	int ret;
 
@@ -3541,9 +3545,14 @@ int iommu_attach_device_pasid(struct iommu_domain *domain,
 		return -EINVAL;
 
 	mutex_lock(&group->mutex);
-	curr = xa_cmpxchg(&group->pasid_array, pasid, NULL, domain, GFP_KERNEL);
+	if (handle)
+		xa_entry = xa_tag_pointer(handle, IOMMU_PASID_ARRAY_HANDLE);
+	else
+		xa_entry = xa_tag_pointer(domain, IOMMU_PASID_ARRAY_DOMAIN);
+	curr = xa_cmpxchg(&group->pasid_array, pasid, NULL, xa_entry,
+			  GFP_KERNEL);
 	if (curr) {
-		ret = xa_err(curr) ? : -EBUSY;
+		ret = xa_err(curr) ?: -EBUSY;
 		goto out_unlock;
 	}
 
@@ -3556,7 +3565,7 @@ int iommu_attach_device_pasid(struct iommu_domain *domain,
 	mutex_unlock(&group->mutex);
 	return ret;
 }
-EXPORT_SYMBOL_GPL(iommu_attach_device_pasid);
+EXPORT_SYMBOL_GPL(__iommu_attach_device_pasid);
 
 /*
  * iommu_detach_device_pasid() - Detach the domain from pasid of device
@@ -3600,13 +3609,23 @@ struct iommu_domain *iommu_get_domain_for_dev_pasid(struct device *dev,
 {
 	/* Caller must be a probed driver on dev */
 	struct iommu_group *group = dev->iommu_group;
+	struct iommu_pasid_handle *handle;
 	struct iommu_domain *domain;
+	void *xa_entry;
 
 	if (!group)
 		return NULL;
 
 	xa_lock(&group->pasid_array);
-	domain = xa_load(&group->pasid_array, pasid);
+	xa_entry = xa_load(&group->pasid_array, pasid);
+	if (xa_pointer_tag(xa_entry) == IOMMU_PASID_ARRAY_HANDLE) {
+		handle = xa_untag_pointer(xa_entry);
+		domain = handle->domain;
+	} else if (xa_pointer_tag(xa_entry) == IOMMU_PASID_ARRAY_DOMAIN) {
+		domain = xa_untag_pointer(xa_entry);
+	} else {
+		domain = NULL;
+	}
 	if (type && domain && domain->type != type)
 		domain = ERR_PTR(-EBUSY);
 	xa_unlock(&group->pasid_array);
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 1ea2a820e1eb03..47c121d4e13f65 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -947,8 +947,27 @@ void iommu_device_release_dma_owner(struct device *dev);
 
 struct iommu_domain *iommu_sva_domain_alloc(struct device *dev,
 					    struct mm_struct *mm);
-int iommu_attach_device_pasid(struct iommu_domain *domain,
-			      struct device *dev, ioasid_t pasid);
+
+struct iommu_pasid_handle {
+	struct iommu_domain *domain;
+};
+
+int __iommu_attach_device_pasid(struct iommu_domain *domain, struct device *dev,
+				ioasid_t pasid,
+				struct iommu_pasid_handle *handle);
+
+static inline int iommu_attach_device_pasid(struct iommu_domain *domain,
+					    struct device *dev, ioasid_t pasid)
+{
+	return __iommu_attach_device_pasid(domain, dev, pasid, NULL);
+}
+static inline int
+iommu_attach_device_pasid_handle(struct iommu_pasid_handle *handle,
+				 struct device *dev, ioasid_t pasid)
+{
+	return __iommu_attach_device_pasid(handle->domain, dev, pasid, handle);
+}
+
 void iommu_detach_device_pasid(struct iommu_domain *domain,
 			       struct device *dev, ioasid_t pasid);
 struct iommu_domain *

^ permalink raw reply related	[flat|nested] 43+ messages in thread

* Re: [PATCH v3 3/8] iommufd: Add fault and response message definitions
  2024-03-14 13:41     ` Baolu Lu
@ 2024-03-22 17:04       ` Jason Gunthorpe
  2024-03-25  3:57         ` Baolu Lu
  0 siblings, 1 reply; 43+ messages in thread
From: Jason Gunthorpe @ 2024-03-22 17:04 UTC (permalink / raw)
  To: Baolu Lu
  Cc: Kevin Tian, Joerg Roedel, Will Deacon, Robin Murphy,
	Jean-Philippe Brucker, Nicolin Chen, Yi Liu, Jacob Pan,
	Joel Granados, iommu, virtualization, linux-kernel

On Thu, Mar 14, 2024 at 09:41:45PM +0800, Baolu Lu wrote:
> On 2024/3/9 1:50, Jason Gunthorpe wrote:
> > On Mon, Jan 22, 2024 at 03:38:58PM +0800, Lu Baolu wrote:
> > 
> > > +/**
> > > + * enum iommu_hwpt_pgfault_flags - flags for struct iommu_hwpt_pgfault
> > > + * @IOMMU_PGFAULT_FLAGS_PASID_VALID: The pasid field of the fault data is
> > > + *                                   valid.
> > > + * @IOMMU_PGFAULT_FLAGS_LAST_PAGE: It's the last fault of a fault group.
> > > + */
> > > +enum iommu_hwpt_pgfault_flags {
> > > +	IOMMU_PGFAULT_FLAGS_PASID_VALID		= (1 << 0),
> > > +	IOMMU_PGFAULT_FLAGS_LAST_PAGE		= (1 << 1),
> > > +};
> > > +
> > > +/**
> > > + * enum iommu_hwpt_pgfault_perm - perm bits for struct iommu_hwpt_pgfault
> > > + * @IOMMU_PGFAULT_PERM_READ: request for read permission
> > > + * @IOMMU_PGFAULT_PERM_WRITE: request for write permission
> > > + * @IOMMU_PGFAULT_PERM_EXEC: request for execute permission
> > > + * @IOMMU_PGFAULT_PERM_PRIV: request for privileged permission
> > 
> > You are going to have to elaborate what PRIV is for.. We don't have
> > any concept of this in the UAPI for iommufd so what is a userspace
> > supposed to do if it hits this? EXEC is similar, we can't actually
> > enable exec permissions from userspace IIRC..
> 
> The PCIe spec, section "10.4.1 Page Request Message" and "6.20.2 PASID
> Information Layout":
> 
> The PCI PASID TLP Prefix defines "Execute Requested" and "Privileged
> Mode Requested" bits.
> 
> PERM_EXEC indicates a page request with a PASID that has the "Execute
> Requested" bit set. Similarly, PERM_PRIV indicates a page request with a
>  PASID that has "Privileged Mode Requested" bit set.

Oh, I see! OK Maybe just add a note that it follows PCIE 10.4.1

> > > +struct iommu_hwpt_pgfault {
> > > +	__u32 size;
> > > +	__u32 flags;
> > > +	__u32 dev_id;
> > > +	__u32 pasid;
> > > +	__u32 grpid;
> > > +	__u32 perm;
> > > +	__u64 addr;
> > > +};
> > 
> > Do we need an addr + size here? I've seen a few things where I wonder
> > if that might become an enhancment someday.
> 
> I am not sure. The page size is not part of ATS/PRI. Can you please
> elaborate a bit about how the size could be used? Perhaps I
> misunderstood here?

size would be advice on how much data the requestor is expecting to
fetch. Eg if the PRI initiator knows it is going to do a 10MB transfer
it could fill in 10MB and the OS could pre-fault in 10MB of IOVA.

It is not in the spec, it may never be in the spec, but it seems like
it would be good to consider it, at least make sure we have
compatibility to add it later.

> > > + * @addr: The fault address. Must match the addr field of the
> > > + *        last iommu_hwpt_pgfault of a reported iopf group.
> > > + */
> > > +struct iommu_hwpt_page_response {
> > > +	__u32 size;
> > > +	__u32 flags;
> > > +	__u32 dev_id;
> > > +	__u32 pasid;
> > > +	__u32 grpid;
> > > +	__u32 code;
> > > +	__u64 addr;
> > > +};
> > 
> > Do we want some kind of opaque ID value from the kernel here to match
> > request with response exactly? Or is the plan to search on the addr?
> 
> I am using the "addr" as the opaque data to search for the request in
> this series. Is it enough?

I'm not sure, the other discussion about grpid seems to be the main
question so lets see there.

Jason

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v3 5/8] iommufd: Associate fault object with iommufd_hw_pgtable
  2024-03-15  1:16     ` Baolu Lu
@ 2024-03-22 17:06       ` Jason Gunthorpe
  2024-03-25  4:59         ` Baolu Lu
  0 siblings, 1 reply; 43+ messages in thread
From: Jason Gunthorpe @ 2024-03-22 17:06 UTC (permalink / raw)
  To: Baolu Lu
  Cc: Kevin Tian, Joerg Roedel, Will Deacon, Robin Murphy,
	Jean-Philippe Brucker, Nicolin Chen, Yi Liu, Jacob Pan,
	Joel Granados, iommu, virtualization, linux-kernel

On Fri, Mar 15, 2024 at 09:16:43AM +0800, Baolu Lu wrote:
> On 3/9/24 3:05 AM, Jason Gunthorpe wrote:
> > On Mon, Jan 22, 2024 at 03:39:00PM +0800, Lu Baolu wrote:
> > 
> > > @@ -411,6 +414,8 @@ enum iommu_hwpt_data_type {
> > >    * @__reserved: Must be 0
> > >    * @data_type: One of enum iommu_hwpt_data_type
> > >    * @data_len: Length of the type specific data
> > > + * @fault_id: The ID of IOMMUFD_FAULT object. Valid only if flags field of
> > > + *            IOMMU_HWPT_FAULT_ID_VALID is set.
> > >    * @data_uptr: User pointer to the type specific data
> > >    *
> > >    * Explicitly allocate a hardware page table object. This is the same object
> > > @@ -441,6 +446,7 @@ struct iommu_hwpt_alloc {
> > >   	__u32 __reserved;
> > >   	__u32 data_type;
> > >   	__u32 data_len;
> > > +	__u32 fault_id;
> > >   	__aligned_u64 data_uptr;
> > >   };
> > 
> > ?? We can't add fault_id in the middle of the struct??
> 
> Yes. I should add the new field at the end.
> 
> By the way, with a __u32 added, this data structure is not 64-byte-
> aligned anymore. Do we need to add another unused u32 entry, or just let
> the compiler handle it?

Yes, add a reserved u32 to ensure the struct is always without
implicit padding.

> > 
> > > +	if (cmd->flags & IOMMU_HWPT_FAULT_ID_VALID) {
> > > +		struct iommufd_fault *fault;
> > > +
> > > +		fault = iommufd_get_fault(ucmd, cmd->fault_id);
> > > +		if (IS_ERR(fault)) {
> > > +			rc = PTR_ERR(fault);
> > > +			goto out_hwpt;
> > > +		}
> > > +		hwpt->fault = fault;
> > > +		hwpt->domain->iopf_handler = iommufd_fault_iopf_handler;
> > > +		hwpt->domain->fault_data = hwpt;
> > > +		hwpt->fault_capable = true;
> > 
> > I wonder if there should be an iommu API to make a domain fault
> > capable?
> 
> The iommu core identifies a fault-capable domain by checking its
> domain->iopf_handler. Anyway, what's the difference between a fault or
> non-fault capable domain from iommu core's point of view?

From the core? Nothing. I'm just wondering from an API perspective if
we should have a little inline to indicate it.

Jason

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v3 4/8] iommufd: Add iommufd fault object
  2024-03-15  1:46     ` Baolu Lu
@ 2024-03-22 17:09       ` Jason Gunthorpe
  2024-03-25  5:01         ` Baolu Lu
  0 siblings, 1 reply; 43+ messages in thread
From: Jason Gunthorpe @ 2024-03-22 17:09 UTC (permalink / raw)
  To: Baolu Lu
  Cc: Kevin Tian, Joerg Roedel, Will Deacon, Robin Murphy,
	Jean-Philippe Brucker, Nicolin Chen, Yi Liu, Jacob Pan,
	Joel Granados, iommu, virtualization, linux-kernel

On Fri, Mar 15, 2024 at 09:46:06AM +0800, Baolu Lu wrote:
> On 3/9/24 2:03 AM, Jason Gunthorpe wrote:
> > On Mon, Jan 22, 2024 at 03:38:59PM +0800, Lu Baolu wrote:
> > > --- /dev/null
> > > +++ b/drivers/iommu/iommufd/fault.c
> > > @@ -0,0 +1,255 @@
> > > +// SPDX-License-Identifier: GPL-2.0-only
> > > +/* Copyright (C) 2024 Intel Corporation
> > > + */
> > > +#define pr_fmt(fmt) "iommufd: " fmt
> > > +
> > > +#include <linux/file.h>
> > > +#include <linux/fs.h>
> > > +#include <linux/module.h>
> > > +#include <linux/mutex.h>
> > > +#include <linux/iommufd.h>
> > > +#include <linux/poll.h>
> > > +#include <linux/anon_inodes.h>
> > > +#include <uapi/linux/iommufd.h>
> > > +
> > > +#include "iommufd_private.h"
> > > +
> > > +static int device_add_fault(struct iopf_group *group)
> > > +{
> > > +	struct iommufd_device *idev = group->cookie->private;
> > > +	void *curr;
> > > +
> > > +	curr = xa_cmpxchg(&idev->faults, group->last_fault.fault.prm.grpid,
> > > +			  NULL, group, GFP_KERNEL);
> > > +
> > > +	return curr ? xa_err(curr) : 0;
> > > +}
> > > +
> > > +static void device_remove_fault(struct iopf_group *group)
> > > +{
> > > +	struct iommufd_device *idev = group->cookie->private;
> > > +
> > > +	xa_store(&idev->faults, group->last_fault.fault.prm.grpid,
> > > +		 NULL, GFP_KERNEL);
> > 
> > xa_erase ?
> 
> Yes. Sure.
> 
> > Is grpid OK to use this way? Doesn't it come from the originating
> > device?
> 
> The group ID is generated by the hardware. Here, we use it as an index
> in the fault array to ensure it can be quickly retrieved in the page
> fault response path.

I'm nervous about this, we are trusting HW outside the kernel to
provide unique grp id's which are integral to how the kernel
operates..

> > > +static ssize_t iommufd_fault_fops_read(struct file *filep, char __user *buf,
> > > +				       size_t count, loff_t *ppos)
> > > +{
> > > +	size_t fault_size = sizeof(struct iommu_hwpt_pgfault);
> > > +	struct iommufd_fault *fault = filep->private_data;
> > > +	struct iommu_hwpt_pgfault data;
> > > +	struct iommufd_device *idev;
> > > +	struct iopf_group *group;
> > > +	struct iopf_fault *iopf;
> > > +	size_t done = 0;
> > > +	int rc;
> > > +
> > > +	if (*ppos || count % fault_size)
> > > +		return -ESPIPE;
> > > +
> > > +	mutex_lock(&fault->mutex);
> > > +	while (!list_empty(&fault->deliver) && count > done) {
> > > +		group = list_first_entry(&fault->deliver,
> > > +					 struct iopf_group, node);
> > > +
> > > +		if (list_count_nodes(&group->faults) * fault_size > count - done)
> > > +			break;
> > > +
> > > +		idev = (struct iommufd_device *)group->cookie->private;
> > > +		list_for_each_entry(iopf, &group->faults, list) {
> > > +			iommufd_compose_fault_message(&iopf->fault, &data, idev);
> > > +			rc = copy_to_user(buf + done, &data, fault_size);
> > > +			if (rc)
> > > +				goto err_unlock;
> > > +			done += fault_size;
> > > +		}
> > > +
> > > +		rc = device_add_fault(group);
> > 
> > See I wonder if this should be some xa_alloc or something instead of
> > trying to use the grpid?
> 
> So this magic number will be passed to user space in the fault message.
> And the user will then include this number in its response message. The
> response message is valid only when the magic number matches. Do I get
> you correctly?

Yes, then it is simple xa_alloc() and xa_load() without any other
searching and we don't have to rely on the grpid to be correctly
formed by the PCI device.

But I don't know about performance; xa_alloc() is pretty fast but
trusting the grpid would be faster..

IMHO from a uapi perspective we should have a definite "cookie" that
gets echo'd back. If the kernel uses xa_alloc or grpid to build that
cookie it doesn't matter to the uAPI.

Jason

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v3 4/8] iommufd: Add iommufd fault object
  2024-03-20 16:18   ` Shameerali Kolothum Thodi
@ 2024-03-22 17:22     ` Jason Gunthorpe
  2024-03-25  3:26       ` Baolu Lu
  0 siblings, 1 reply; 43+ messages in thread
From: Jason Gunthorpe @ 2024-03-22 17:22 UTC (permalink / raw)
  To: Shameerali Kolothum Thodi
  Cc: Lu Baolu, Kevin Tian, Joerg Roedel, Will Deacon, Robin Murphy,
	Jean-Philippe Brucker, Nicolin Chen, Yi Liu, Jacob Pan,
	Joel Granados, iommu, virtualization, linux-kernel

On Wed, Mar 20, 2024 at 04:18:05PM +0000, Shameerali Kolothum Thodi wrote:
> 
> What I have noticed is that,
> -read interface works fine and I can receive struct iommu_hwpt_pgfault data.
> -But once Guest handles the page faults and returns the page response,
>  the write to fault fd never reaches the kernel. The sequence is like below,
>  
>   sqe = io_uring_get_sqe(ring);
>   io_uring_prep_write(sqe, hwpt->fault_fd, resp, sizeof(*resp), 0);
>   io_uring_sqe_set_data(sqe, resp);
>   io_uring_submit(ring);
>   ret = io_uring_wait_cqe(ring, &cqe); 
>   ....
> Please find the function here[2]
> 
> The above cqe wait never returns and hardware times out without receiving
> page response. My understanding of io_uring default op is that it tries to 
> issue an sqe as non-blocking first. But it looks like the above write sequence
> ends up in kernel poll_wait() as well. Not sure how we can avoid that for
> write.

Ah, right, it is because poll can't be choosy about read/write, it has
to work equally for both directions. iommufd_fault_fops_poll() never
returns EPOLLOUT

It should just always return EPOLLOUT because we don't have any queue
to manage.

Jason

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v3 4/8] iommufd: Add iommufd fault object
  2024-03-22 17:22     ` Jason Gunthorpe
@ 2024-03-25  3:26       ` Baolu Lu
  2024-03-25  4:02         ` Baolu Lu
  0 siblings, 1 reply; 43+ messages in thread
From: Baolu Lu @ 2024-03-25  3:26 UTC (permalink / raw)
  To: Jason Gunthorpe, Shameerali Kolothum Thodi
  Cc: baolu.lu, Kevin Tian, Joerg Roedel, Will Deacon, Robin Murphy,
	Jean-Philippe Brucker, Nicolin Chen, Yi Liu, Jacob Pan,
	Joel Granados, iommu, virtualization, linux-kernel

On 3/23/24 1:22 AM, Jason Gunthorpe wrote:
> On Wed, Mar 20, 2024 at 04:18:05PM +0000, Shameerali Kolothum Thodi wrote:
>> What I have noticed is that,
>> -read interface works fine and I can receive struct iommu_hwpt_pgfault data.
>> -But once Guest handles the page faults and returns the page response,
>>   the write to fault fd never reaches the kernel. The sequence is like below,
>>   
>>    sqe = io_uring_get_sqe(ring);
>>    io_uring_prep_write(sqe, hwpt->fault_fd, resp, sizeof(*resp), 0);
>>    io_uring_sqe_set_data(sqe, resp);
>>    io_uring_submit(ring);
>>    ret = io_uring_wait_cqe(ring, &cqe);
>>    ....
>> Please find the function here[2]
>>
>> The above cqe wait never returns and hardware times out without receiving
>> page response. My understanding of io_uring default op is that it tries to
>> issue an sqe as non-blocking first. But it looks like the above write sequence
>> ends up in kernel poll_wait() as well. Not sure how we can avoid that for
>> write.
> Ah, right, it is because poll can't be choosy about read/write, it has
> to work equally for both directions. iommufd_fault_fops_poll() never
> returns EPOLLOUT
> 
> It should just always return EPOLLOUT because we don't have any queue
> to manage.

Are you suggesting the poll file operation to be like below?

static __poll_t iommufd_fault_fops_poll(struct file *filep,
                                         struct poll_table_struct *wait)
{
         struct iommufd_fault *fault = filep->private_data;
         __poll_t pollflags = EPOLLOUT;

         poll_wait(filep, &fault->wait_queue, wait);
         mutex_lock(&fault->mutex);
         if (!list_empty(&fault->deliver))
                 pollflags = EPOLLIN | EPOLLRDNORM;
         mutex_unlock(&fault->mutex);

         return pollflags;
}

The diff is,

diff --git a/drivers/iommu/iommufd/fault.c b/drivers/iommu/iommufd/fault.c
index ede16702d433..a33f8aa92575 100644
--- a/drivers/iommu/iommufd/fault.c
+++ b/drivers/iommu/iommufd/fault.c
@@ -175,7 +175,7 @@ static __poll_t iommufd_fault_fops_poll(struct file 
*filep,
                                         struct poll_table_struct *wait)
  {
         struct iommufd_fault *fault = filep->private_data;
-       __poll_t pollflags = 0;
+       __poll_t pollflags = EPOLLOUT;

         poll_wait(filep, &fault->wait_queue, wait);
         mutex_lock(&fault->mutex);


I was originally thinking that the poll file operation was specifically
designed for polling on read events associated with IOMMU faults.

Best regards,
baolu

^ permalink raw reply related	[flat|nested] 43+ messages in thread

* Re: [PATCH v3 2/8] iommu/sva: Use iopf domain attach/detach interface
  2024-03-22 16:59       ` Jason Gunthorpe
@ 2024-03-25  3:52         ` Baolu Lu
  0 siblings, 0 replies; 43+ messages in thread
From: Baolu Lu @ 2024-03-25  3:52 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: baolu.lu, Kevin Tian, Joerg Roedel, Will Deacon, Robin Murphy,
	Jean-Philippe Brucker, Nicolin Chen, Yi Liu, Jacob Pan,
	Joel Granados, iommu, virtualization, linux-kernel



On 3/23/24 12:59 AM, Jason Gunthorpe wrote:
> On Thu, Mar 14, 2024 at 03:41:23PM +0800, Baolu Lu wrote:
> 
>> The whole cookie mechanism aims to address two things:
>>
>> - Extend the domain lifetime until all pending page faults are
>> resolved.
> Like you answered, I think the flush is a simpler scheme..

Yeah! Let me head in this direction.

> 
>> - Associate information about the iommu device with each attachment of
>>    the domain so that the iommufd device object ID could be quickly
>>    retrieved in the fault delivering path.
>   
>>> I see we also need to stick a pointer in the domain for iommufd to get
>>> back to the hwpt, but that doesn't seem to need such a big system to
>>> accomplish - just add a void *. It would make sense for the domain to
>>> have some optional pointer to a struct for all the fault related data
>>> that becomes allocated when a PRI domain is created..
>> It's not getting back hwpt from domain, just getting the iommufd_device
>> in the fault delivering path. The iommufd_device is not per-domain, but
>> per-domain-attachment.
> It does make sense you'd need that, but I think something like this is
> a more direct way to get it. Caller allocates the handle struct. The
> iopf will provide the handle from the XA to the
> callback. container_of() not void * is used to in the caller's API.

Your code looks attractive to me. It's much simpler. Thank you!

Best regards,
baolu

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v3 3/8] iommufd: Add fault and response message definitions
  2024-03-22 17:04       ` Jason Gunthorpe
@ 2024-03-25  3:57         ` Baolu Lu
  0 siblings, 0 replies; 43+ messages in thread
From: Baolu Lu @ 2024-03-25  3:57 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: baolu.lu, Kevin Tian, Joerg Roedel, Will Deacon, Robin Murphy,
	Jean-Philippe Brucker, Nicolin Chen, Yi Liu, Jacob Pan,
	Joel Granados, iommu, virtualization, linux-kernel

On 3/23/24 1:04 AM, Jason Gunthorpe wrote:
>>>> +struct iommu_hwpt_pgfault {
>>>> +	__u32 size;
>>>> +	__u32 flags;
>>>> +	__u32 dev_id;
>>>> +	__u32 pasid;
>>>> +	__u32 grpid;
>>>> +	__u32 perm;
>>>> +	__u64 addr;
>>>> +};
>>> Do we need an addr + size here? I've seen a few things where I wonder
>>> if that might become an enhancment someday.
>> I am not sure. The page size is not part of ATS/PRI. Can you please
>> elaborate a bit about how the size could be used? Perhaps I
>> misunderstood here?
> size would be an advice how much data the requestor is expecting to
> fetch. Eg if the PRI initiator knows it is going to do a 10MB transfer
> it could fill in 10MB and the OS could pre-fault in 10MB of IOVA.
> 
> It is not in the spec, it may never be in the spec, but it seems like
> it would be good to consider it, at least make sure we have
> compatibility to add it later.

Thanks for the explanation. It sounds reasonable. I will take it and add
some comments to explain the motivation as you described above.

Best regards,
baolu

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v3 4/8] iommufd: Add iommufd fault object
  2024-03-25  3:26       ` Baolu Lu
@ 2024-03-25  4:02         ` Baolu Lu
  0 siblings, 0 replies; 43+ messages in thread
From: Baolu Lu @ 2024-03-25  4:02 UTC (permalink / raw)
  To: Jason Gunthorpe, Shameerali Kolothum Thodi
  Cc: baolu.lu, Kevin Tian, Joerg Roedel, Will Deacon, Robin Murphy,
	Jean-Philippe Brucker, Nicolin Chen, Yi Liu, Jacob Pan,
	Joel Granados, iommu, virtualization, linux-kernel

On 3/25/24 11:26 AM, Baolu Lu wrote:
> On 3/23/24 1:22 AM, Jason Gunthorpe wrote:
>> On Wed, Mar 20, 2024 at 04:18:05PM +0000, Shameerali Kolothum Thodi 
>> wrote:
>>> What I have noticed is that,
>>> -read interface works fine and I can receive struct
>>> iommu_hwpt_pgfault data.
>>> -But once Guest handles the page faults and returns the page response,
>>>   the write to fault fd never reaches the kernel. The sequence is 
>>> like below,
>>>    sqe = io_uring_get_sqe(ring);
>>>    io_uring_prep_write(sqe, hwpt->fault_fd, resp, sizeof(*resp), 0);
>>>    io_uring_sqe_set_data(sqe, resp);
>>>    io_uring_submit(ring);
>>>    ret = io_uring_wait_cqe(ring, &cqe);
>>>    ....
>>> Please find the function here[2]
>>>
>>> The above cqe wait never returns and hardware times out without 
>>> receiving
>>> page response. My understanding of io_uring default op is that it 
>>> tries to
>>> issue an sqe as non-blocking first. But it looks like the above write 
>>> sequence
>>> ends up in kernel poll_wait() as well. Not sure how we can avoid that for
>>> write.
>> Ah, right, it is because poll can't be choosy about read/write, it has
>> to work equally for both directions. iommufd_fault_fops_poll() never
>> returns EPOLLOUT
>>
>> It should just always return EPOLLOUT because we don't have any queue
>> to manage.
> 
> Are you suggesting the poll file operation to be like below?
> 
> static __poll_t iommufd_fault_fops_poll(struct file *filep,
>                                          struct poll_table_struct *wait)
> {
>          struct iommufd_fault *fault = filep->private_data;
>          __poll_t pollflags = EPOLLOUT;
> 
>          poll_wait(filep, &fault->wait_queue, wait);
>          mutex_lock(&fault->mutex);
>          if (!list_empty(&fault->deliver))
>                  pollflags = EPOLLIN | EPOLLRDNORM;

If I understood it correctly, here it should be,

		pollflags |= EPOLLIN | EPOLLRDNORM;

>          mutex_unlock(&fault->mutex);
> 
>          return pollflags;
> }

Best regards,
baolu

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v3 5/8] iommufd: Associate fault object with iommufd_hw_pgtable
  2024-03-22 17:06       ` Jason Gunthorpe
@ 2024-03-25  4:59         ` Baolu Lu
  0 siblings, 0 replies; 43+ messages in thread
From: Baolu Lu @ 2024-03-25  4:59 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: baolu.lu, Kevin Tian, Joerg Roedel, Will Deacon, Robin Murphy,
	Jean-Philippe Brucker, Nicolin Chen, Yi Liu, Jacob Pan,
	Joel Granados, iommu, virtualization, linux-kernel

On 2024/3/23 1:06, Jason Gunthorpe wrote:
> On Fri, Mar 15, 2024 at 09:16:43AM +0800, Baolu Lu wrote:
>> On 3/9/24 3:05 AM, Jason Gunthorpe wrote:
>>> On Mon, Jan 22, 2024 at 03:39:00PM +0800, Lu Baolu wrote:
>>>
>>>> @@ -411,6 +414,8 @@ enum iommu_hwpt_data_type {
>>>>     * @__reserved: Must be 0
>>>>     * @data_type: One of enum iommu_hwpt_data_type
>>>>     * @data_len: Length of the type specific data
>>>> + * @fault_id: The ID of IOMMUFD_FAULT object. Valid only if flags field of
>>>> + *            IOMMU_HWPT_FAULT_ID_VALID is set.
>>>>     * @data_uptr: User pointer to the type specific data
>>>>     *
>>>>     * Explicitly allocate a hardware page table object. This is the same object
>>>> @@ -441,6 +446,7 @@ struct iommu_hwpt_alloc {
>>>>    	__u32 __reserved;
>>>>    	__u32 data_type;
>>>>    	__u32 data_len;
>>>> +	__u32 fault_id;
>>>>    	__aligned_u64 data_uptr;
>>>>    };
>>>
>>> ?? We can't add fault_id in the middle of the struct??
>>
>> Yes. I should add the new field at the end.
>>
>> By the way, with a __u32 added, this data structure is not 64-byte-
>> aligned anymore. Do we need to add another unused u32 entry, or just let
>> the compiler handle it?
> 
> Yes, add a reserved u32 to ensure the struct is always without
> implicit padding.

Sure.

> 
>>>
>>>> +	if (cmd->flags & IOMMU_HWPT_FAULT_ID_VALID) {
>>>> +		struct iommufd_fault *fault;
>>>> +
>>>> +		fault = iommufd_get_fault(ucmd, cmd->fault_id);
>>>> +		if (IS_ERR(fault)) {
>>>> +			rc = PTR_ERR(fault);
>>>> +			goto out_hwpt;
>>>> +		}
>>>> +		hwpt->fault = fault;
>>>> +		hwpt->domain->iopf_handler = iommufd_fault_iopf_handler;
>>>> +		hwpt->domain->fault_data = hwpt;
>>>> +		hwpt->fault_capable = true;
>>>
>>> I wonder if there should be an iommu API to make a domain fault
>>> capable?
>>
>> The iommu core identifies a fault-capable domain by checking its
>> domain->iopf_handler. Anyway, what's the difference between a fault or
>> non-fault capable domain from iommu core's point of view?
> 
>  From the core? Nothing. I'm just wondering from an API perspective if
> we should have a little inline to indicate it.

I have no objection if there's a consumer for it.

Best regards,
baolu


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v3 4/8] iommufd: Add iommufd fault object
  2024-03-22 17:09       ` Jason Gunthorpe
@ 2024-03-25  5:01         ` Baolu Lu
  0 siblings, 0 replies; 43+ messages in thread
From: Baolu Lu @ 2024-03-25  5:01 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: baolu.lu, Kevin Tian, Joerg Roedel, Will Deacon, Robin Murphy,
	Jean-Philippe Brucker, Nicolin Chen, Yi Liu, Jacob Pan,
	Joel Granados, iommu, virtualization, linux-kernel

On 2024/3/23 1:09, Jason Gunthorpe wrote:
> On Fri, Mar 15, 2024 at 09:46:06AM +0800, Baolu Lu wrote:
>> On 3/9/24 2:03 AM, Jason Gunthorpe wrote:
>>> On Mon, Jan 22, 2024 at 03:38:59PM +0800, Lu Baolu wrote:
>>>> --- /dev/null
>>>> +++ b/drivers/iommu/iommufd/fault.c
>>>> @@ -0,0 +1,255 @@
>>>> +// SPDX-License-Identifier: GPL-2.0-only
>>>> +/* Copyright (C) 2024 Intel Corporation
>>>> + */
>>>> +#define pr_fmt(fmt) "iommufd: " fmt
>>>> +
>>>> +#include <linux/file.h>
>>>> +#include <linux/fs.h>
>>>> +#include <linux/module.h>
>>>> +#include <linux/mutex.h>
>>>> +#include <linux/iommufd.h>
>>>> +#include <linux/poll.h>
>>>> +#include <linux/anon_inodes.h>
>>>> +#include <uapi/linux/iommufd.h>
>>>> +
>>>> +#include "iommufd_private.h"
>>>> +
>>>> +static int device_add_fault(struct iopf_group *group)
>>>> +{
>>>> +	struct iommufd_device *idev = group->cookie->private;
>>>> +	void *curr;
>>>> +
>>>> +	curr = xa_cmpxchg(&idev->faults, group->last_fault.fault.prm.grpid,
>>>> +			  NULL, group, GFP_KERNEL);
>>>> +
>>>> +	return curr ? xa_err(curr) : 0;
>>>> +}
>>>> +
>>>> +static void device_remove_fault(struct iopf_group *group)
>>>> +{
>>>> +	struct iommufd_device *idev = group->cookie->private;
>>>> +
>>>> +	xa_store(&idev->faults, group->last_fault.fault.prm.grpid,
>>>> +		 NULL, GFP_KERNEL);
>>>
>>> xa_erase ?
>>
>> Yes. Sure.
>>
>>> Is grpid OK to use this way? Doesn't it come from the originating
>>> device?
>>
>> The group ID is generated by the hardware. Here, we use it as an index
>> in the fault array to ensure it can be quickly retrieved in the page
>> fault response path.
> 
> I'm nervous about this, we are trusting HW outside the kernel to
> provide unique grp id's which are integral to how the kernel
> operates..

Agreed.

> 
>>>> +static ssize_t iommufd_fault_fops_read(struct file *filep, char __user *buf,
>>>> +				       size_t count, loff_t *ppos)
>>>> +{
>>>> +	size_t fault_size = sizeof(struct iommu_hwpt_pgfault);
>>>> +	struct iommufd_fault *fault = filep->private_data;
>>>> +	struct iommu_hwpt_pgfault data;
>>>> +	struct iommufd_device *idev;
>>>> +	struct iopf_group *group;
>>>> +	struct iopf_fault *iopf;
>>>> +	size_t done = 0;
>>>> +	int rc;
>>>> +
>>>> +	if (*ppos || count % fault_size)
>>>> +		return -ESPIPE;
>>>> +
>>>> +	mutex_lock(&fault->mutex);
>>>> +	while (!list_empty(&fault->deliver) && count > done) {
>>>> +		group = list_first_entry(&fault->deliver,
>>>> +					 struct iopf_group, node);
>>>> +
>>>> +		if (list_count_nodes(&group->faults) * fault_size > count - done)
>>>> +			break;
>>>> +
>>>> +		idev = (struct iommufd_device *)group->cookie->private;
>>>> +		list_for_each_entry(iopf, &group->faults, list) {
>>>> +			iommufd_compose_fault_message(&iopf->fault, &data, idev);
>>>> +			rc = copy_to_user(buf + done, &data, fault_size);
>>>> +			if (rc)
>>>> +				goto err_unlock;
>>>> +			done += fault_size;
>>>> +		}
>>>> +
>>>> +		rc = device_add_fault(group);
>>>
>>> See I wonder if this should be some xa_alloc or something instead of
>>> trying to use the grpid?
>>
>> So this magic number will be passed to user space in the fault message.
>> And the user will then include this number in its response message. The
>> response message is valid only when the magic number matches. Do I get
>> you correctly?
> 
> Yes, then it is simple xa_alloc() and xa_load() without any other
> searching and we don't have to rely on the grpid to be correctly
> formed by the PCI device.
> 
> But I don't know about performance: xa_alloc() is pretty fast, but
> trusting the grpid would be faster..
> 
> IMHO from a uapi perspective we should have a definite "cookie" that
> gets echoed back. If the kernel uses xa_alloc or grpid to build that
> cookie, it doesn't matter to the uAPI.

Okay, I will head in this direction.

Best regards,
baolu


^ permalink raw reply	[flat|nested] 43+ messages in thread

end of thread, other threads:[~2024-03-25  5:01 UTC | newest]

Thread overview: 43+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-01-22  7:38 [PATCH v3 0/8] IOMMUFD: Deliver IO page faults to user space Lu Baolu
2024-01-22  7:38 ` [PATCH v3 1/8] iommu: Add iopf domain attach/detach/replace interface Lu Baolu
2024-02-07  8:11   ` Tian, Kevin
2024-02-21  5:52     ` Baolu Lu
2024-02-21  6:49       ` Tian, Kevin
2024-02-21  7:21         ` Baolu Lu
2024-02-21  7:22           ` Tian, Kevin
2024-01-22  7:38 ` [PATCH v3 2/8] iommu/sva: Use iopf domain attach/detach interface Lu Baolu
2024-03-08 17:46   ` Jason Gunthorpe
2024-03-14  7:41     ` Baolu Lu
2024-03-22 16:59       ` Jason Gunthorpe
2024-03-25  3:52         ` Baolu Lu
2024-01-22  7:38 ` [PATCH v3 3/8] iommufd: Add fault and response message definitions Lu Baolu
2024-03-08 17:50   ` Jason Gunthorpe
2024-03-14 13:41     ` Baolu Lu
2024-03-22 17:04       ` Jason Gunthorpe
2024-03-25  3:57         ` Baolu Lu
2024-01-22  7:38 ` [PATCH v3 4/8] iommufd: Add iommufd fault object Lu Baolu
2024-03-08 18:03   ` Jason Gunthorpe
2024-03-15  1:46     ` Baolu Lu
2024-03-22 17:09       ` Jason Gunthorpe
2024-03-25  5:01         ` Baolu Lu
2024-03-20 16:18   ` Shameerali Kolothum Thodi
2024-03-22 17:22     ` Jason Gunthorpe
2024-03-25  3:26       ` Baolu Lu
2024-03-25  4:02         ` Baolu Lu
2024-01-22  7:39 ` [PATCH v3 5/8] iommufd: Associate fault object with iommufd_hw_pgtable Lu Baolu
2024-02-07  8:14   ` Tian, Kevin
2024-02-21  6:06     ` Baolu Lu
2024-03-02  2:36   ` Zhangfei Gao
2024-03-06 15:15     ` Zhangfei Gao
2024-03-06 16:01       ` Jason Gunthorpe
2024-03-07  1:54         ` Baolu Lu
2024-03-08 17:19           ` Jason Gunthorpe
2024-03-08 19:05   ` Jason Gunthorpe
2024-03-15  1:16     ` Baolu Lu
2024-03-22 17:06       ` Jason Gunthorpe
2024-03-25  4:59         ` Baolu Lu
2024-01-22  7:39 ` [PATCH v3 6/8] iommufd: IOPF-capable hw page table attach/detach/replace Lu Baolu
     [not found]   ` <CGME20240220135755eucas1p24e882cb5a735caed4bbfedb22a811442@eucas1p2.samsung.com>
2024-02-20 13:57     ` Joel Granados
2024-02-21  6:15       ` Baolu Lu
2024-01-22  7:39 ` [PATCH v3 7/8] iommufd/selftest: Add IOPF support for mock device Lu Baolu
2024-01-22  7:39 ` [PATCH v3 8/8] iommufd/selftest: Add coverage for IOPF test Lu Baolu
