* [PATCH v3 00/17] Add iommufd physical device operations for replace and alloc hwpt
@ 2023-03-21 19:14 Jason Gunthorpe
  2023-03-21 19:14 ` [PATCH v3 01/17] iommufd: Move isolated msi enforcement to iommufd_device_bind() Jason Gunthorpe
                   ` (17 more replies)
  0 siblings, 18 replies; 46+ messages in thread
From: Jason Gunthorpe @ 2023-03-21 19:14 UTC (permalink / raw)
  To: iommu, Kevin Tian, linux-kselftest; +Cc: Lu Baolu, kvm, Nicolin Chen, Yi Liu

This is the basic functionality for iommufd to support
iommufd_device_replace() and IOMMU_HWPT_ALLOC for physical devices.

iommufd_device_replace() allows changing the HWPT associated with the
device to a new IOAS or HWPT. Replace does this in a way that failure
leaves things unchanged, and it utilizes the new
iommu_group_replace_domain() API to allow the iommu driver to perform an
optional non-disruptive change.
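
The transactional shape of replace can be sketched in a minimal userspace model (not kernel code; the struct and names are illustrative): the attempt is made first, and the device's current table is only updated once the attempt has succeeded, so any failure leaves the old attachment in place.

```c
#include <assert.h>

/* Hypothetical model of the replace semantics described above. */
struct dev_model {
	int attached_hwpt;      /* stands in for the currently bound HWPT */
};

/* Stand-in for the driver-level attach; fails for an invalid id. */
static int driver_attach(int hwpt_id)
{
	return hwpt_id >= 0 ? 0 : -1;
}

static int replace_hwpt(struct dev_model *d, int new_hwpt)
{
	int rc = driver_attach(new_hwpt);

	if (rc)
		return rc;      /* failure leaves attached_hwpt unchanged */
	d->attached_hwpt = new_hwpt;
	return 0;
}
```

The point of the model is only the ordering: commit to the new table after, never before, the fallible step.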

IOMMU_HWPT_ALLOC allows HWPTs to be explicitly allocated by the user and
used by attach or replace. At this point it isn't very useful, since the
HWPT is the same as the automatically managed HWPT from the IOAS. However,
a following series will allow userspace to customize the created HWPT.
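
As a rough sketch of the userspace side, the request struct for IOMMU_HWPT_ALLOC might be filled in as below. The field layout follows this series' uapi additions but should be treated as illustrative, and no ioctl() is actually issued here; `prepare_hwpt_alloc()` is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Mirror of the uapi request as added by this series (illustrative). */
struct iommu_hwpt_alloc {
	uint32_t size;
	uint32_t flags;
	uint32_t dev_id;
	uint32_t pt_id;
	uint32_t out_hwpt_id;
	uint32_t __reserved;
};

static int prepare_hwpt_alloc(struct iommu_hwpt_alloc *cmd,
			      uint32_t dev_id, uint32_t ioas_id)
{
	memset(cmd, 0, sizeof(*cmd));
	cmd->size = sizeof(*cmd);
	cmd->dev_id = dev_id;   /* the already-bound device */
	cmd->pt_id = ioas_id;   /* the IOAS backing the new HWPT */
	/* The kernel rejects a non-zero __reserved (see the v3 changelog). */
	return cmd->__reserved == 0 ? 0 : -1;
}
```

On success the kernel would return the new HWPT's object id in `out_hwpt_id`, which can then be passed to attach or replace.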

The implementation is complicated because we have to introduce some
per-iommu_group memory in iommufd and redo how we think about multi-device
groups to be more explicit. This solves all the locking problems in the
prior attempts.

This series is infrastructure work for the following series which:
 - Add replace for attach
 - Expose replace through VFIO APIs
 - Implement driver parameters for HWPT creation (nesting)

Once review of this is complete I will keep it on a side branch and
accumulate the following series when they are ready so we can have a
stable base and make more incremental progress. When we have all the parts
together to get a full implementation it can go to Linus.

I have this on github:

https://github.com/jgunthorpe/linux/commits/iommufd_hwpt

v3:
 - Refine comments and commit messages
 - Adjust the flow in iommufd_device_auto_get_domain() so pt_id is only
   set on success
 - Reject replace on non-attached devices
 - Add missing __reserved check for IOMMU_HWPT_ALLOC
v2: https://lore.kernel.org/r/0-v2-51b9896e7862+8a8c-iommufd_alloc_jgg@nvidia.com
 - Use WARN_ON for the igroup->group test and move that logic to a
   function iommufd_group_try_get()
 - Change igroup->devices to igroup->device_list since replace will need
   to iterate over all attached idevs
 - Rename to iommufd_group_setup_msi()
 - New patch to export iommu_get_resv_regions()
 - New patch to use per-device reserved regions instead of per-group
   regions
 - Split out the reorganizing of iommufd_device_change_pt() from the
   replace patch
 - Replace uses the per-dev reserved regions
 - Use stdev_id in a few more places in the selftest
 - Fix error handling in IOMMU_HWPT_ALLOC
 - Clarify comments
 - Rebase on v6.3-rc1
v1: https://lore.kernel.org/all/0-v1-7612f88c19f5+2f21-iommufd_alloc_jgg@nvidia.com/

Jason Gunthorpe (15):
  iommufd: Move isolated msi enforcement to iommufd_device_bind()
  iommufd: Add iommufd_group
  iommufd: Replace the hwpt->devices list with iommufd_group
  iommu: Export iommu_get_resv_regions()
  iommufd: Keep track of each device's reserved regions instead of
    groups
  iommufd: Use the iommufd_group to avoid duplicate MSI setup
  iommufd: Make sw_msi_start a group global
  iommufd: Move putting a hwpt to a helper function
  iommufd: Add enforced_cache_coherency to iommufd_hw_pagetable_alloc()
  iommufd: Reorganize iommufd_device_attach into
    iommufd_device_change_pt
  iommufd: Add iommufd_device_replace()
  iommufd: Make destroy_rwsem use a lock class per object type
  iommufd: Add IOMMU_HWPT_ALLOC
  iommufd/selftest: Return the real idev id from selftest mock_domain
  iommufd/selftest: Add a selftest for IOMMU_HWPT_ALLOC

Nicolin Chen (2):
  iommu: Introduce a new iommu_group_replace_domain() API
  iommufd/selftest: Test iommufd_device_replace()

 drivers/iommu/iommu-priv.h                    |  10 +
 drivers/iommu/iommu.c                         |  41 +-
 drivers/iommu/iommufd/device.c                | 512 +++++++++++++-----
 drivers/iommu/iommufd/hw_pagetable.c          |  96 +++-
 drivers/iommu/iommufd/io_pagetable.c          |  27 +-
 drivers/iommu/iommufd/iommufd_private.h       |  51 +-
 drivers/iommu/iommufd/iommufd_test.h          |   6 +
 drivers/iommu/iommufd/main.c                  |  17 +-
 drivers/iommu/iommufd/selftest.c              |  40 ++
 include/linux/iommufd.h                       |   1 +
 include/uapi/linux/iommufd.h                  |  26 +
 tools/testing/selftests/iommu/iommufd.c       |  64 ++-
 .../selftests/iommu/iommufd_fail_nth.c        |  52 +-
 tools/testing/selftests/iommu/iommufd_utils.h |  61 ++-
 14 files changed, 804 insertions(+), 200 deletions(-)
 create mode 100644 drivers/iommu/iommu-priv.h


base-commit: fd8c1a4aee973e87d890a5861e106625a33b2c4e
-- 
2.40.0


^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH v3 01/17] iommufd: Move isolated msi enforcement to iommufd_device_bind()
  2023-03-21 19:14 [PATCH v3 00/17] Add iommufd physical device operations for replace and alloc hwpt Jason Gunthorpe
@ 2023-03-21 19:14 ` Jason Gunthorpe
  2023-03-21 19:14 ` [PATCH v3 02/17] iommufd: Add iommufd_group Jason Gunthorpe
                   ` (16 subsequent siblings)
  17 siblings, 0 replies; 46+ messages in thread
From: Jason Gunthorpe @ 2023-03-21 19:14 UTC (permalink / raw)
  To: iommu, Kevin Tian, linux-kselftest; +Cc: Lu Baolu, kvm, Nicolin Chen, Yi Liu

With the recent rework this no longer needs to be done at domain
attachment time; we know whether the device is usable by iommufd when we
bind it.

The value of msi_device_has_isolated_msi() is not allowed to change while
a driver is bound.
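
The check being moved reduces to a small predicate, modeled here outside the kernel (the real code also emits a dev_warn() in the opt-in case): insecure MSI is rejected unless the module parameter explicitly allows it.

```c
#include <assert.h>
#include <errno.h>

/*
 * Model of the bind-time MSI isolation check described above.
 * 'isolated' stands for iommu_group_has_isolated_msi() and
 * 'allow_unsafe' for the allow_unsafe_interrupts module parameter.
 */
static int check_msi_isolation(int isolated, int allow_unsafe)
{
	if (!isolated && !allow_unsafe)
		return -EPERM;  /* refuse the bind */
	return 0;               /* ok, possibly with a warning */
}
```

Because both inputs are fixed for as long as a driver is bound, evaluating this once at bind time is sufficient.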

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
 drivers/iommu/iommufd/device.c | 38 ++++++++++++++++++----------------
 1 file changed, 20 insertions(+), 18 deletions(-)

diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c
index c6f4852a8a0c08..e66303c17c894c 100644
--- a/drivers/iommu/iommufd/device.c
+++ b/drivers/iommu/iommufd/device.c
@@ -60,6 +60,26 @@ struct iommufd_device *iommufd_device_bind(struct iommufd_ctx *ictx,
 	if (!group)
 		return ERR_PTR(-ENODEV);
 
+	/*
+	 * For historical compat with VFIO the insecure interrupt path is
+	 * allowed if the module parameter is set. Secure/Isolated means that a
+	 * MemWr operation from the device (eg a simple DMA) cannot trigger an
+	 * interrupt outside this iommufd context.
+	 */
+	if (!iommufd_selftest_is_mock_dev(dev) &&
+	    !iommu_group_has_isolated_msi(group)) {
+		if (!allow_unsafe_interrupts) {
+			rc = -EPERM;
+			goto out_group_put;
+		}
+
+		dev_warn(
+			dev,
+			"MSI interrupts are not secure, they cannot be isolated by the platform. "
+			"Check that platform features like interrupt remapping are enabled. "
+			"Use the \"allow_unsafe_interrupts\" module parameter to override\n");
+	}
+
 	rc = iommu_device_claim_dma_owner(dev, ictx);
 	if (rc)
 		goto out_group_put;
@@ -146,24 +166,6 @@ static int iommufd_device_setup_msi(struct iommufd_device *idev,
 		 */
 		hwpt->msi_cookie = true;
 	}
-
-	/*
-	 * For historical compat with VFIO the insecure interrupt path is
-	 * allowed if the module parameter is set. Insecure means that a MemWr
-	 * operation from the device (eg a simple DMA) cannot trigger an
-	 * interrupt outside this iommufd context.
-	 */
-	if (!iommufd_selftest_is_mock_dev(idev->dev) &&
-	    !iommu_group_has_isolated_msi(idev->group)) {
-		if (!allow_unsafe_interrupts)
-			return -EPERM;
-
-		dev_warn(
-			idev->dev,
-			"MSI interrupts are not secure, they cannot be isolated by the platform. "
-			"Check that platform features like interrupt remapping are enabled. "
-			"Use the \"allow_unsafe_interrupts\" module parameter to override\n");
-	}
 	return 0;
 }
 
-- 
2.40.0



* [PATCH v3 02/17] iommufd: Add iommufd_group
  2023-03-21 19:14 [PATCH v3 00/17] Add iommufd physical device operations for replace and alloc hwpt Jason Gunthorpe
  2023-03-21 19:14 ` [PATCH v3 01/17] iommufd: Move isolated msi enforcement to iommufd_device_bind() Jason Gunthorpe
@ 2023-03-21 19:14 ` Jason Gunthorpe
  2023-03-21 19:14 ` [PATCH v3 03/17] iommufd: Replace the hwpt->devices list with iommufd_group Jason Gunthorpe
                   ` (15 subsequent siblings)
  17 siblings, 0 replies; 46+ messages in thread
From: Jason Gunthorpe @ 2023-03-21 19:14 UTC (permalink / raw)
  To: iommu, Kevin Tian, linux-kselftest; +Cc: Lu Baolu, kvm, Nicolin Chen, Yi Liu

When the hwpt-to-device attachment was fairly static we could get away
with the simple approach of keeping track of the groups via a device
list, but with replace this becomes infeasible.

Add an automatically managed struct that is 1:1 with the iommu_group
per-ictx so we can store the necessary tracking information there.
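
The get-or-create logic in the patch's iommufd_get_group() can be condensed into a single-threaded model of the cmpxchg pattern (a plain array stands in for the xarray, and the retry loop is collapsed to one round; names are illustrative): an allocator that loses the race frees its copy and adopts the installed object, so there is exactly one object per group id.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define NSLOTS 8
static void *slots[NSLOTS];     /* stands in for ictx->groups (xarray) */

/* Single-threaded stand-in for __xa_cmpxchg(): returns the old value. */
static void *cmpxchg_slot(int id, void *old, void *new_obj)
{
	void *cur = slots[id];

	if (cur == old)
		slots[id] = new_obj;
	return cur;
}

static void *get_group(int id)
{
	void *mine = malloc(1);
	void *old = cmpxchg_slot(id, NULL, mine);

	if (old == NULL)
		return mine;    /* our object was installed */
	free(mine);             /* lost the race: use the existing object */
	return old;
}
```

The kernel version additionally retries under the xa_lock and handles the case where the found object's refcount has already dropped to zero, but the install-or-adopt shape is the same.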

Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
 drivers/iommu/iommufd/device.c          | 141 +++++++++++++++++++++---
 drivers/iommu/iommufd/iommufd_private.h |   9 +-
 drivers/iommu/iommufd/main.c            |   2 +
 3 files changed, 135 insertions(+), 17 deletions(-)

diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c
index e66303c17c894c..3fd623208c691f 100644
--- a/drivers/iommu/iommufd/device.c
+++ b/drivers/iommu/iommufd/device.c
@@ -15,13 +15,121 @@ MODULE_PARM_DESC(
 	"Allow IOMMUFD to bind to devices even if the platform cannot isolate "
 	"the MSI interrupt window. Enabling this is a security weakness.");
 
+static void iommufd_group_release(struct kref *kref)
+{
+	struct iommufd_group *igroup =
+		container_of(kref, struct iommufd_group, ref);
+
+	xa_cmpxchg(&igroup->ictx->groups, iommu_group_id(igroup->group), igroup,
+		   NULL, GFP_KERNEL);
+	iommu_group_put(igroup->group);
+	kfree(igroup);
+}
+
+static void iommufd_put_group(struct iommufd_group *group)
+{
+	kref_put(&group->ref, iommufd_group_release);
+}
+
+static bool iommufd_group_try_get(struct iommufd_group *igroup,
+				  struct iommu_group *group)
+{
+	if (!igroup)
+		return false;
+	/*
+	 * Group IDs cannot be re-used until the group is put back, which does
+	 * not happen while we can still get an igroup pointer under the xa_lock.
+	 */
+	if (WARN_ON(igroup->group != group))
+		return false;
+	return kref_get_unless_zero(&igroup->ref);
+}
+
+/*
+ * iommufd needs to store some more data for each iommu_group, we keep a
+ * parallel xarray indexed by iommu_group id to hold this instead of putting it
+ * in the core structure. To keep things simple the iommufd_group memory is
+ * unique within the iommufd_ctx. This makes it easy to check there are no
+ * memory leaks.
+ */
+static struct iommufd_group *iommufd_get_group(struct iommufd_ctx *ictx,
+					       struct device *dev)
+{
+	struct iommufd_group *new_igroup;
+	struct iommufd_group *cur_igroup;
+	struct iommufd_group *igroup;
+	struct iommu_group *group;
+	unsigned int id;
+
+	group = iommu_group_get(dev);
+	if (!group)
+		return ERR_PTR(-ENODEV);
+
+	id = iommu_group_id(group);
+
+	xa_lock(&ictx->groups);
+	igroup = xa_load(&ictx->groups, id);
+	if (iommufd_group_try_get(igroup, group)) {
+		xa_unlock(&ictx->groups);
+		iommu_group_put(group);
+		return igroup;
+	}
+	xa_unlock(&ictx->groups);
+
+	new_igroup = kzalloc(sizeof(*new_igroup), GFP_KERNEL);
+	if (!new_igroup) {
+		iommu_group_put(group);
+		return ERR_PTR(-ENOMEM);
+	}
+
+	kref_init(&new_igroup->ref);
+	/* group reference moves into new_igroup */
+	new_igroup->group = group;
+
+	/*
+	 * The ictx is not additionally refcounted here because all objects using
+	 * an igroup must put it before their destroy completes.
+	 */
+	new_igroup->ictx = ictx;
+
+	/*
+	 * We dropped the lock so igroup is invalid. NULL is a safe and likely
+	 * value to assume for the xa_cmpxchg algorithm.
+	 */
+	cur_igroup = NULL;
+	xa_lock(&ictx->groups);
+	while (true) {
+		igroup = __xa_cmpxchg(&ictx->groups, id, cur_igroup, new_igroup,
+				      GFP_KERNEL);
+		if (xa_is_err(igroup)) {
+			xa_unlock(&ictx->groups);
+			iommufd_put_group(new_igroup);
+			return ERR_PTR(xa_err(igroup));
+		}
+
+		/* new_igroup was successfully installed */
+		if (cur_igroup == igroup) {
+			xa_unlock(&ictx->groups);
+			return new_igroup;
+		}
+
+		/* Check again if the current group is any good */
+		if (iommufd_group_try_get(igroup, group)) {
+			xa_unlock(&ictx->groups);
+			iommufd_put_group(new_igroup);
+			return igroup;
+		}
+		cur_igroup = igroup;
+	}
+}
+
 void iommufd_device_destroy(struct iommufd_object *obj)
 {
 	struct iommufd_device *idev =
 		container_of(obj, struct iommufd_device, obj);
 
 	iommu_device_release_dma_owner(idev->dev);
-	iommu_group_put(idev->group);
+	iommufd_put_group(idev->igroup);
 	if (!iommufd_selftest_is_mock_dev(idev->dev))
 		iommufd_ctx_put(idev->ictx);
 }
@@ -46,7 +154,7 @@ struct iommufd_device *iommufd_device_bind(struct iommufd_ctx *ictx,
 					   struct device *dev, u32 *id)
 {
 	struct iommufd_device *idev;
-	struct iommu_group *group;
+	struct iommufd_group *igroup;
 	int rc;
 
 	/*
@@ -56,9 +164,9 @@ struct iommufd_device *iommufd_device_bind(struct iommufd_ctx *ictx,
 	if (!device_iommu_capable(dev, IOMMU_CAP_CACHE_COHERENCY))
 		return ERR_PTR(-EINVAL);
 
-	group = iommu_group_get(dev);
-	if (!group)
-		return ERR_PTR(-ENODEV);
+	igroup = iommufd_get_group(ictx, dev);
+	if (IS_ERR(igroup))
+		return ERR_CAST(igroup);
 
 	/*
 	 * For historical compat with VFIO the insecure interrupt path is
@@ -67,7 +175,7 @@ struct iommufd_device *iommufd_device_bind(struct iommufd_ctx *ictx,
 	 * interrupt outside this iommufd context.
 	 */
 	if (!iommufd_selftest_is_mock_dev(dev) &&
-	    !iommu_group_has_isolated_msi(group)) {
+	    !iommu_group_has_isolated_msi(igroup->group)) {
 		if (!allow_unsafe_interrupts) {
 			rc = -EPERM;
 			goto out_group_put;
@@ -97,8 +205,8 @@ struct iommufd_device *iommufd_device_bind(struct iommufd_ctx *ictx,
 		device_iommu_capable(dev, IOMMU_CAP_ENFORCE_CACHE_COHERENCY);
 	/* The calling driver is a user until iommufd_device_unbind() */
 	refcount_inc(&idev->obj.users);
-	/* group refcount moves into iommufd_device */
-	idev->group = group;
+	/* igroup refcount moves into iommufd_device */
+	idev->igroup = igroup;
 
 	/*
 	 * If the caller fails after this success it must call
@@ -113,7 +221,7 @@ struct iommufd_device *iommufd_device_bind(struct iommufd_ctx *ictx,
 out_release_owner:
 	iommu_device_release_dma_owner(dev);
 out_group_put:
-	iommu_group_put(group);
+	iommufd_put_group(igroup);
 	return ERR_PTR(rc);
 }
 EXPORT_SYMBOL_NS_GPL(iommufd_device_bind, IOMMUFD);
@@ -170,14 +278,14 @@ static int iommufd_device_setup_msi(struct iommufd_device *idev,
 }
 
 static bool iommufd_hw_pagetable_has_group(struct iommufd_hw_pagetable *hwpt,
-					   struct iommu_group *group)
+					   struct iommufd_group *igroup)
 {
 	struct iommufd_device *cur_dev;
 
 	lockdep_assert_held(&hwpt->devices_lock);
 
 	list_for_each_entry(cur_dev, &hwpt->devices, devices_item)
-		if (cur_dev->group == group)
+		if (cur_dev->igroup->group == igroup->group)
 			return true;
 	return false;
 }
@@ -211,7 +319,8 @@ int iommufd_hw_pagetable_attach(struct iommufd_hw_pagetable *hwpt,
 	}
 
 	rc = iopt_table_enforce_group_resv_regions(&hwpt->ioas->iopt, idev->dev,
-						   idev->group, &sw_msi_start);
+						   idev->igroup->group,
+						   &sw_msi_start);
 	if (rc)
 		return rc;
 
@@ -223,8 +332,8 @@ int iommufd_hw_pagetable_attach(struct iommufd_hw_pagetable *hwpt,
 	 * FIXME: Hack around missing a device-centric iommu api, only attach to
 	 * the group once for the first device that is in the group.
 	 */
-	if (!iommufd_hw_pagetable_has_group(hwpt, idev->group)) {
-		rc = iommu_attach_group(hwpt->domain, idev->group);
+	if (!iommufd_hw_pagetable_has_group(hwpt, idev->igroup)) {
+		rc = iommu_attach_group(hwpt->domain, idev->igroup->group);
 		if (rc)
 			goto err_unresv;
 	}
@@ -237,8 +346,8 @@ int iommufd_hw_pagetable_attach(struct iommufd_hw_pagetable *hwpt,
 void iommufd_hw_pagetable_detach(struct iommufd_hw_pagetable *hwpt,
 				 struct iommufd_device *idev)
 {
-	if (!iommufd_hw_pagetable_has_group(hwpt, idev->group))
-		iommu_detach_group(hwpt->domain, idev->group);
+	if (!iommufd_hw_pagetable_has_group(hwpt, idev->igroup))
+		iommu_detach_group(hwpt->domain, idev->igroup->group);
 	iopt_remove_reserved_iova(&hwpt->ioas->iopt, idev->dev);
 }
 
diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h
index d523ef12890e1e..2544f10dae9aef 100644
--- a/drivers/iommu/iommufd/iommufd_private.h
+++ b/drivers/iommu/iommufd/iommufd_private.h
@@ -17,6 +17,7 @@ struct iommufd_device;
 struct iommufd_ctx {
 	struct file *file;
 	struct xarray objects;
+	struct xarray groups;
 
 	u8 account_mode;
 	/* Compatibility with VFIO no iommu */
@@ -262,6 +263,12 @@ void iommufd_hw_pagetable_detach(struct iommufd_hw_pagetable *hwpt,
 				 struct iommufd_device *idev);
 void iommufd_hw_pagetable_destroy(struct iommufd_object *obj);
 
+struct iommufd_group {
+	struct kref ref;
+	struct iommufd_ctx *ictx;
+	struct iommu_group *group;
+};
+
 /*
  * A iommufd_device object represents the binding relationship between a
  * consuming driver and the iommufd. These objects are created/destroyed by
@@ -270,12 +277,12 @@ void iommufd_hw_pagetable_destroy(struct iommufd_object *obj);
 struct iommufd_device {
 	struct iommufd_object obj;
 	struct iommufd_ctx *ictx;
+	struct iommufd_group *igroup;
 	struct iommufd_hw_pagetable *hwpt;
 	/* Head at iommufd_hw_pagetable::devices */
 	struct list_head devices_item;
 	/* always the physical device */
 	struct device *dev;
-	struct iommu_group *group;
 	bool enforce_cache_coherency;
 };
 
diff --git a/drivers/iommu/iommufd/main.c b/drivers/iommu/iommufd/main.c
index 3fbe636c3d8a69..e5ed5dfa91a0b5 100644
--- a/drivers/iommu/iommufd/main.c
+++ b/drivers/iommu/iommufd/main.c
@@ -183,6 +183,7 @@ static int iommufd_fops_open(struct inode *inode, struct file *filp)
 	}
 
 	xa_init_flags(&ictx->objects, XA_FLAGS_ALLOC1 | XA_FLAGS_ACCOUNT);
+	xa_init(&ictx->groups);
 	ictx->file = filp;
 	filp->private_data = ictx;
 	return 0;
@@ -218,6 +219,7 @@ static int iommufd_fops_release(struct inode *inode, struct file *filp)
 		if (WARN_ON(!destroyed))
 			break;
 	}
+	WARN_ON(!xa_empty(&ictx->groups));
 	kfree(ictx);
 	return 0;
 }
-- 
2.40.0



* [PATCH v3 03/17] iommufd: Replace the hwpt->devices list with iommufd_group
  2023-03-21 19:14 [PATCH v3 00/17] Add iommufd physical device operations for replace and alloc hwpt Jason Gunthorpe
  2023-03-21 19:14 ` [PATCH v3 01/17] iommufd: Move isolated msi enforcement to iommufd_device_bind() Jason Gunthorpe
  2023-03-21 19:14 ` [PATCH v3 02/17] iommufd: Add iommufd_group Jason Gunthorpe
@ 2023-03-21 19:14 ` Jason Gunthorpe
  2023-03-23  7:21   ` Tian, Kevin
  2023-03-21 19:14 ` [PATCH v3 04/17] iommu: Export iommu_get_resv_regions() Jason Gunthorpe
                   ` (14 subsequent siblings)
  17 siblings, 1 reply; 46+ messages in thread
From: Jason Gunthorpe @ 2023-03-21 19:14 UTC (permalink / raw)
  To: iommu, Kevin Tian, linux-kselftest; +Cc: Lu Baolu, kvm, Nicolin Chen, Yi Liu

The devices list was used as a simple way to avoid having per-group
information. Now that this seems to be unavoidable, just commit to
per-group information fully and remove the devices list from the HWPT.

The iommufd_group stores the currently assigned HWPT for the entire group
and we can manage the per-device attach/detach with a list in the
iommufd_group.

For destruction the flow is organized to make the following patches
easier: the actual call to iommufd_object_destroy_user() is done at the
top of the call chain without holding any locks. The HWPT to be destroyed
is returned out of the locked region to make this possible. Later patches
create locking that requires this.
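
The per-group attach accounting the patch introduces can be sketched as a counter model (illustrative names, not kernel code): the iommu domain is attached when the first device in the group attaches, all members must agree on one HWPT, and the last detach tears the attachment down and hands the HWPT back for the caller to destroy outside the lock.

```c
#include <assert.h>

/* Model of the igroup-level attach/detach logic described above. */
struct igroup_model {
	int ndevs;              /* stands in for the device_list length */
	int attached_hwpt;      /* 0 means no HWPT attached */
};

static int group_attach(struct igroup_model *g, int hwpt)
{
	if (g->ndevs && g->attached_hwpt != hwpt)
		return -1;      /* group already bound to a different hwpt */
	if (g->ndevs == 0)
		g->attached_hwpt = hwpt;        /* first device attaches */
	g->ndevs++;
	return 0;
}

static int group_detach(struct igroup_model *g)
{
	int hwpt = g->attached_hwpt;

	if (--g->ndevs == 0)
		g->attached_hwpt = 0;           /* last device detaches */
	return hwpt;            /* caller destroys the hwpt */
}
```

Returning the HWPT from detach mirrors how iommufd_hw_pagetable_detach() now hands the object back so the destroy can happen with no locks held.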

Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
 drivers/iommu/iommufd/device.c          | 78 ++++++++++++-------------
 drivers/iommu/iommufd/hw_pagetable.c    | 23 +++-----
 drivers/iommu/iommufd/iommufd_private.h | 13 ++---
 3 files changed, 51 insertions(+), 63 deletions(-)

diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c
index 3fd623208c691f..66de0274d65629 100644
--- a/drivers/iommu/iommufd/device.c
+++ b/drivers/iommu/iommufd/device.c
@@ -20,9 +20,12 @@ static void iommufd_group_release(struct kref *kref)
 	struct iommufd_group *igroup =
 		container_of(kref, struct iommufd_group, ref);
 
+	WARN_ON(igroup->hwpt || !list_empty(&igroup->device_list));
+
 	xa_cmpxchg(&igroup->ictx->groups, iommu_group_id(igroup->group), igroup,
 		   NULL, GFP_KERNEL);
 	iommu_group_put(igroup->group);
+	mutex_destroy(&igroup->lock);
 	kfree(igroup);
 }
 
@@ -83,6 +86,8 @@ static struct iommufd_group *iommufd_get_group(struct iommufd_ctx *ictx,
 	}
 
 	kref_init(&new_igroup->ref);
+	mutex_init(&new_igroup->lock);
+	INIT_LIST_HEAD(&new_igroup->device_list);
 	/* group reference moves into new_igroup */
 	new_igroup->group = group;
 
@@ -277,28 +282,15 @@ static int iommufd_device_setup_msi(struct iommufd_device *idev,
 	return 0;
 }
 
-static bool iommufd_hw_pagetable_has_group(struct iommufd_hw_pagetable *hwpt,
-					   struct iommufd_group *igroup)
-{
-	struct iommufd_device *cur_dev;
-
-	lockdep_assert_held(&hwpt->devices_lock);
-
-	list_for_each_entry(cur_dev, &hwpt->devices, devices_item)
-		if (cur_dev->igroup->group == igroup->group)
-			return true;
-	return false;
-}
-
 int iommufd_hw_pagetable_attach(struct iommufd_hw_pagetable *hwpt,
 				struct iommufd_device *idev)
 {
 	phys_addr_t sw_msi_start = PHYS_ADDR_MAX;
 	int rc;
 
-	lockdep_assert_held(&hwpt->devices_lock);
+	lockdep_assert_held(&idev->igroup->lock);
 
-	if (WARN_ON(idev->hwpt))
+	if (idev->igroup->hwpt != NULL && idev->igroup->hwpt != hwpt)
 		return -EINVAL;
 
 	/*
@@ -313,7 +305,7 @@ int iommufd_hw_pagetable_attach(struct iommufd_hw_pagetable *hwpt,
 				hwpt->domain->ops->enforce_cache_coherency(
 					hwpt->domain);
 		if (!hwpt->enforce_cache_coherency) {
-			WARN_ON(list_empty(&hwpt->devices));
+			WARN_ON(list_empty(&idev->igroup->device_list));
 			return -EINVAL;
 		}
 	}
@@ -329,26 +321,40 @@ int iommufd_hw_pagetable_attach(struct iommufd_hw_pagetable *hwpt,
 		goto err_unresv;
 
 	/*
-	 * FIXME: Hack around missing a device-centric iommu api, only attach to
-	 * the group once for the first device that is in the group.
+	 * Only attach to the group once for the first device that is in the
+	 * group. All the other devices will follow this attachment. The user
+	 * should attach every device individually to the hwpt, as the per-device
+	 * reserved regions are only updated during individual device attachment.
 	 */
-	if (!iommufd_hw_pagetable_has_group(hwpt, idev->igroup)) {
+	if (list_empty(&idev->igroup->device_list)) {
 		rc = iommu_attach_group(hwpt->domain, idev->igroup->group);
 		if (rc)
 			goto err_unresv;
+		idev->igroup->hwpt = hwpt;
 	}
+	refcount_inc(&hwpt->obj.users);
+	list_add_tail(&idev->group_item, &idev->igroup->device_list);
 	return 0;
 err_unresv:
 	iopt_remove_reserved_iova(&hwpt->ioas->iopt, idev->dev);
 	return rc;
 }
 
-void iommufd_hw_pagetable_detach(struct iommufd_hw_pagetable *hwpt,
-				 struct iommufd_device *idev)
+struct iommufd_hw_pagetable *
+iommufd_hw_pagetable_detach(struct iommufd_device *idev)
 {
-	if (!iommufd_hw_pagetable_has_group(hwpt, idev->igroup))
+	struct iommufd_hw_pagetable *hwpt = idev->igroup->hwpt;
+
+	lockdep_assert_held(&idev->igroup->lock);
+
+	list_del(&idev->group_item);
+	if (list_empty(&idev->igroup->device_list)) {
 		iommu_detach_group(hwpt->domain, idev->igroup->group);
+		idev->igroup->hwpt = NULL;
+	}
 	iopt_remove_reserved_iova(&hwpt->ioas->iopt, idev->dev);
+	/* Caller must destroy hwpt */
+	return hwpt;
 }
 
 static int iommufd_device_do_attach(struct iommufd_device *idev,
@@ -356,16 +362,9 @@ static int iommufd_device_do_attach(struct iommufd_device *idev,
 {
 	int rc;
 
-	mutex_lock(&hwpt->devices_lock);
+	mutex_lock(&idev->igroup->lock);
 	rc = iommufd_hw_pagetable_attach(hwpt, idev);
-	if (rc)
-		goto out_unlock;
-
-	idev->hwpt = hwpt;
-	refcount_inc(&hwpt->obj.users);
-	list_add(&idev->devices_item, &hwpt->devices);
-out_unlock:
-	mutex_unlock(&hwpt->devices_lock);
+	mutex_unlock(&idev->igroup->lock);
 	return rc;
 }
 
@@ -375,7 +374,7 @@ static int iommufd_device_do_attach(struct iommufd_device *idev,
  * Automatic domain selection will never pick a manually created domain.
  */
 static int iommufd_device_auto_get_domain(struct iommufd_device *idev,
-					  struct iommufd_ioas *ioas)
+					  struct iommufd_ioas *ioas, u32 *pt_id)
 {
 	struct iommufd_hw_pagetable *hwpt;
 	int rc;
@@ -402,6 +401,7 @@ static int iommufd_device_auto_get_domain(struct iommufd_device *idev,
 		 */
 		if (rc == -EINVAL)
 			continue;
+		*pt_id = hwpt->obj.id;
 		goto out_unlock;
 	}
 
@@ -411,6 +411,7 @@ static int iommufd_device_auto_get_domain(struct iommufd_device *idev,
 		goto out_unlock;
 	}
 	hwpt->auto_domain = true;
+	*pt_id = hwpt->obj.id;
 
 	mutex_unlock(&ioas->mutex);
 	iommufd_object_finalize(idev->ictx, &hwpt->obj);
@@ -455,7 +456,7 @@ int iommufd_device_attach(struct iommufd_device *idev, u32 *pt_id)
 		struct iommufd_ioas *ioas =
 			container_of(pt_obj, struct iommufd_ioas, obj);
 
-		rc = iommufd_device_auto_get_domain(idev, ioas);
+		rc = iommufd_device_auto_get_domain(idev, ioas, pt_id);
 		if (rc)
 			goto out_put_pt_obj;
 		break;
@@ -466,7 +467,6 @@ int iommufd_device_attach(struct iommufd_device *idev, u32 *pt_id)
 	}
 
 	refcount_inc(&idev->obj.users);
-	*pt_id = idev->hwpt->obj.id;
 	rc = 0;
 
 out_put_pt_obj:
@@ -484,13 +484,11 @@ EXPORT_SYMBOL_NS_GPL(iommufd_device_attach, IOMMUFD);
  */
 void iommufd_device_detach(struct iommufd_device *idev)
 {
-	struct iommufd_hw_pagetable *hwpt = idev->hwpt;
+	struct iommufd_hw_pagetable *hwpt;
 
-	mutex_lock(&hwpt->devices_lock);
-	list_del(&idev->devices_item);
-	idev->hwpt = NULL;
-	iommufd_hw_pagetable_detach(hwpt, idev);
-	mutex_unlock(&hwpt->devices_lock);
+	mutex_lock(&idev->igroup->lock);
+	hwpt = iommufd_hw_pagetable_detach(idev);
+	mutex_unlock(&idev->igroup->lock);
 
 	if (hwpt->auto_domain)
 		iommufd_object_destroy_user(idev->ictx, &hwpt->obj);
diff --git a/drivers/iommu/iommufd/hw_pagetable.c b/drivers/iommu/iommufd/hw_pagetable.c
index 6cdb6749d359f3..566eba0cd9b917 100644
--- a/drivers/iommu/iommufd/hw_pagetable.c
+++ b/drivers/iommu/iommufd/hw_pagetable.c
@@ -11,8 +11,6 @@ void iommufd_hw_pagetable_destroy(struct iommufd_object *obj)
 	struct iommufd_hw_pagetable *hwpt =
 		container_of(obj, struct iommufd_hw_pagetable, obj);
 
-	WARN_ON(!list_empty(&hwpt->devices));
-
 	if (!list_empty(&hwpt->hwpt_item)) {
 		mutex_lock(&hwpt->ioas->mutex);
 		list_del(&hwpt->hwpt_item);
@@ -25,7 +23,6 @@ void iommufd_hw_pagetable_destroy(struct iommufd_object *obj)
 		iommu_domain_free(hwpt->domain);
 
 	refcount_dec(&hwpt->ioas->obj.users);
-	mutex_destroy(&hwpt->devices_lock);
 }
 
 /**
@@ -52,9 +49,7 @@ iommufd_hw_pagetable_alloc(struct iommufd_ctx *ictx, struct iommufd_ioas *ioas,
 	if (IS_ERR(hwpt))
 		return hwpt;
 
-	INIT_LIST_HEAD(&hwpt->devices);
 	INIT_LIST_HEAD(&hwpt->hwpt_item);
-	mutex_init(&hwpt->devices_lock);
 	/* Pairs with iommufd_hw_pagetable_destroy() */
 	refcount_inc(&ioas->obj.users);
 	hwpt->ioas = ioas;
@@ -65,13 +60,16 @@ iommufd_hw_pagetable_alloc(struct iommufd_ctx *ictx, struct iommufd_ioas *ioas,
 		goto out_abort;
 	}
 
-	mutex_lock(&hwpt->devices_lock);
+	mutex_lock(&idev->igroup->lock);
 
 	/*
 	 * immediate_attach exists only to accommodate iommu drivers that cannot
 	 * directly allocate a domain. These drivers do not finish creating the
 	 * domain until attach is completed. Thus we must have this call
 	 * sequence. Once those drivers are fixed this should be removed.
+	 *
+	 * Note we hold the igroup->lock here which prevents any other thread
+	 * from observing igroup->hwpt until we finish setting it up.
 	 */
 	if (immediate_attach) {
 		rc = iommufd_hw_pagetable_attach(hwpt, idev);
@@ -84,21 +82,14 @@ iommufd_hw_pagetable_alloc(struct iommufd_ctx *ictx, struct iommufd_ioas *ioas,
 		goto out_detach;
 	list_add_tail(&hwpt->hwpt_item, &hwpt->ioas->hwpt_list);
 
-	if (immediate_attach) {
-		/* See iommufd_device_do_attach() */
-		refcount_inc(&hwpt->obj.users);
-		idev->hwpt = hwpt;
-		list_add(&idev->devices_item, &hwpt->devices);
-	}
-
-	mutex_unlock(&hwpt->devices_lock);
+	mutex_unlock(&idev->igroup->lock);
 	return hwpt;
 
 out_detach:
 	if (immediate_attach)
-		iommufd_hw_pagetable_detach(hwpt, idev);
+		iommufd_hw_pagetable_detach(idev);
 out_unlock:
-	mutex_unlock(&hwpt->devices_lock);
+	mutex_unlock(&idev->igroup->lock);
 out_abort:
 	iommufd_object_abort_and_destroy(ictx, &hwpt->obj);
 	return ERR_PTR(rc);
diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h
index 2544f10dae9aef..2ff192777f27d3 100644
--- a/drivers/iommu/iommufd/iommufd_private.h
+++ b/drivers/iommu/iommufd/iommufd_private.h
@@ -250,8 +250,6 @@ struct iommufd_hw_pagetable {
 	bool msi_cookie : 1;
 	/* Head at iommufd_ioas::hwpt_list */
 	struct list_head hwpt_item;
-	struct mutex devices_lock;
-	struct list_head devices;
 };
 
 struct iommufd_hw_pagetable *
@@ -259,14 +257,17 @@ iommufd_hw_pagetable_alloc(struct iommufd_ctx *ictx, struct iommufd_ioas *ioas,
 			   struct iommufd_device *idev, bool immediate_attach);
 int iommufd_hw_pagetable_attach(struct iommufd_hw_pagetable *hwpt,
 				struct iommufd_device *idev);
-void iommufd_hw_pagetable_detach(struct iommufd_hw_pagetable *hwpt,
-				 struct iommufd_device *idev);
+struct iommufd_hw_pagetable *
+iommufd_hw_pagetable_detach(struct iommufd_device *idev);
 void iommufd_hw_pagetable_destroy(struct iommufd_object *obj);
 
 struct iommufd_group {
 	struct kref ref;
+	struct mutex lock;
 	struct iommufd_ctx *ictx;
 	struct iommu_group *group;
+	struct iommufd_hw_pagetable *hwpt;
+	struct list_head device_list;
 };
 
 /*
@@ -278,9 +279,7 @@ struct iommufd_device {
 	struct iommufd_object obj;
 	struct iommufd_ctx *ictx;
 	struct iommufd_group *igroup;
-	struct iommufd_hw_pagetable *hwpt;
-	/* Head at iommufd_hw_pagetable::devices */
-	struct list_head devices_item;
+	struct list_head group_item;
 	/* always the physical device */
 	struct device *dev;
 	bool enforce_cache_coherency;
-- 
2.40.0



* [PATCH v3 04/17] iommu: Export iommu_get_resv_regions()
  2023-03-21 19:14 [PATCH v3 00/17] Add iommufd physical device operations for replace and alloc hwpt Jason Gunthorpe
                   ` (2 preceding siblings ...)
  2023-03-21 19:14 ` [PATCH v3 03/17] iommufd: Replace the hwpt->devices list with iommufd_group Jason Gunthorpe
@ 2023-03-21 19:14 ` Jason Gunthorpe
  2023-03-21 19:14 ` [PATCH v3 05/17] iommufd: Keep track of each device's reserved regions instead of groups Jason Gunthorpe
                   ` (13 subsequent siblings)
  17 siblings, 0 replies; 46+ messages in thread
From: Jason Gunthorpe @ 2023-03-21 19:14 UTC (permalink / raw)
  To: iommu, Kevin Tian, linux-kselftest; +Cc: Lu Baolu, kvm, Nicolin Chen, Yi Liu

iommufd wants to use this in the next patch. For some reason
iommu_put_resv_regions() was already exported.

Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
 drivers/iommu/iommu.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 10db680acaed5a..76969904b93af4 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -2639,6 +2639,14 @@ int iommu_set_pgtable_quirks(struct iommu_domain *domain,
 }
 EXPORT_SYMBOL_GPL(iommu_set_pgtable_quirks);
 
+/**
+ * iommu_get_resv_regions - get reserved regions
+ * @dev: device for which to get reserved regions
+ * @list: reserved region list for device
+ *
+ * This returns a list of reserved IOVA regions specific to this device.
+ * A domain user should not map IOVA in these ranges.
+ */
 void iommu_get_resv_regions(struct device *dev, struct list_head *list)
 {
 	const struct iommu_ops *ops = dev_iommu_ops(dev);
@@ -2646,9 +2654,10 @@ void iommu_get_resv_regions(struct device *dev, struct list_head *list)
 	if (ops->get_resv_regions)
 		ops->get_resv_regions(dev, list);
 }
+EXPORT_SYMBOL_GPL(iommu_get_resv_regions);
 
 /**
- * iommu_put_resv_regions - release resered regions
+ * iommu_put_resv_regions - release reserved regions
  * @dev: device for which to free reserved regions
  * @list: reserved region list for device
  *
-- 
2.40.0



* [PATCH v3 05/17] iommufd: Keep track of each device's reserved regions instead of groups
  2023-03-21 19:14 [PATCH v3 00/17] Add iommufd physical device operations for replace and alloc hwpt Jason Gunthorpe
                   ` (3 preceding siblings ...)
  2023-03-21 19:14 ` [PATCH v3 04/17] iommu: Export iommu_get_resv_regions() Jason Gunthorpe
@ 2023-03-21 19:14 ` Jason Gunthorpe
  2023-03-21 19:14 ` [PATCH v3 06/17] iommufd: Use the iommufd_group to avoid duplicate MSI setup Jason Gunthorpe
                   ` (12 subsequent siblings)
  17 siblings, 0 replies; 46+ messages in thread
From: Jason Gunthorpe @ 2023-03-21 19:14 UTC (permalink / raw)
  To: iommu, Kevin Tian, linux-kselftest; +Cc: Lu Baolu, kvm, Nicolin Chen, Yi Liu

The driver facing API in the iommu core makes the reserved regions
per-device. An algorithm in the core code consolidates the regions of all
the devices in a group to return the group view.

To allow for devices to be hotplugged into the group, iommufd would re-load
the entire group's reserved regions for each device, just in case they had
changed.

Further, iommufd already has to deal with duplicated/overlapping reserved
regions, as it must union all the groups together.

Thus simplify all of this to just use the device reserved regions
interface directly from the iommu driver.
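
As an illustration of that union requirement, here is a small userspace
sketch of coalescing duplicated/overlapping (or abutting) reserved ranges
into a sorted, non-overlapping set. The struct and function names are
invented for this example and are not the kernel's:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one reserved IOVA range. */
struct resv_range {
	unsigned long start;
	unsigned long last;	/* inclusive: start + length - 1 */
};

/*
 * Merge @add into the sorted, non-overlapping array @set of @n ranges,
 * coalescing anything it overlaps or abuts. Returns the new count.
 * (Wraparound at the top of the address space is ignored in this sketch.)
 */
static size_t resv_insert(struct resv_range *set, size_t n,
			  struct resv_range add)
{
	size_t i = 0, j, k;

	/* Skip ranges that end strictly before (and don't abut) @add. */
	while (i < n && set[i].last + 1 < add.start)
		i++;

	/* Absorb every existing range that overlaps or abuts @add. */
	j = i;
	while (j < n && set[j].start <= add.last + 1) {
		if (set[j].start < add.start)
			add.start = set[j].start;
		if (set[j].last > add.last)
			add.last = set[j].last;
		j++;
	}

	if (j == i) {
		/* Nothing absorbed: shift the tail right to make room. */
		for (k = n; k > i; k--)
			set[k] = set[k - 1];
	} else {
		/* Absorbed j - i ranges: slide the tail left over the gap. */
		for (k = j; k < n; k++)
			set[i + 1 + k - j] = set[k];
	}
	set[i] = add;
	return n - (j - i) + 1;
}
```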

Suggested-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
 drivers/iommu/iommufd/device.c          |  5 ++---
 drivers/iommu/iommufd/io_pagetable.c    | 27 ++++++++++---------------
 drivers/iommu/iommufd/iommufd_private.h |  7 +++----
 3 files changed, 16 insertions(+), 23 deletions(-)

diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c
index 66de0274d65629..4566833494e076 100644
--- a/drivers/iommu/iommufd/device.c
+++ b/drivers/iommu/iommufd/device.c
@@ -310,9 +310,8 @@ int iommufd_hw_pagetable_attach(struct iommufd_hw_pagetable *hwpt,
 		}
 	}
 
-	rc = iopt_table_enforce_group_resv_regions(&hwpt->ioas->iopt, idev->dev,
-						   idev->igroup->group,
-						   &sw_msi_start);
+	rc = iopt_table_enforce_dev_resv_regions(
+		&hwpt->ioas->iopt, idev->dev, &sw_msi_start);
 	if (rc)
 		return rc;
 
diff --git a/drivers/iommu/iommufd/io_pagetable.c b/drivers/iommu/iommufd/io_pagetable.c
index e0ae72b9e67f86..f842768b2e250b 100644
--- a/drivers/iommu/iommufd/io_pagetable.c
+++ b/drivers/iommu/iommufd/io_pagetable.c
@@ -1161,25 +1161,22 @@ void iopt_remove_access(struct io_pagetable *iopt,
 	up_write(&iopt->domains_rwsem);
 }
 
-/* Narrow the valid_iova_itree to include reserved ranges from a group. */
-int iopt_table_enforce_group_resv_regions(struct io_pagetable *iopt,
-					  struct device *device,
-					  struct iommu_group *group,
-					  phys_addr_t *sw_msi_start)
+/* Narrow the valid_iova_itree to include reserved ranges from a device. */
+int iopt_table_enforce_dev_resv_regions(struct io_pagetable *iopt,
+					struct device *dev,
+					phys_addr_t *sw_msi_start)
 {
 	struct iommu_resv_region *resv;
-	struct iommu_resv_region *tmp;
-	LIST_HEAD(group_resv_regions);
+	LIST_HEAD(resv_regions);
 	unsigned int num_hw_msi = 0;
 	unsigned int num_sw_msi = 0;
 	int rc;
 
 	down_write(&iopt->iova_rwsem);
-	rc = iommu_get_group_resv_regions(group, &group_resv_regions);
-	if (rc)
-		goto out_unlock;
+	/* FIXME: drivers allocate memory but there is no failure propagated */
+	iommu_get_resv_regions(dev, &resv_regions);
 
-	list_for_each_entry(resv, &group_resv_regions, list) {
+	list_for_each_entry(resv, &resv_regions, list) {
 		if (resv->type == IOMMU_RESV_DIRECT_RELAXABLE)
 			continue;
 
@@ -1191,7 +1188,7 @@ int iopt_table_enforce_group_resv_regions(struct io_pagetable *iopt,
 		}
 
 		rc = iopt_reserve_iova(iopt, resv->start,
-				       resv->length - 1 + resv->start, device);
+				       resv->length - 1 + resv->start, dev);
 		if (rc)
 			goto out_reserved;
 	}
@@ -1206,11 +1203,9 @@ int iopt_table_enforce_group_resv_regions(struct io_pagetable *iopt,
 	goto out_free_resv;
 
 out_reserved:
-	__iopt_remove_reserved_iova(iopt, device);
+	__iopt_remove_reserved_iova(iopt, dev);
 out_free_resv:
-	list_for_each_entry_safe(resv, tmp, &group_resv_regions, list)
-		kfree(resv);
-out_unlock:
+	iommu_put_resv_regions(dev, &resv_regions);
 	up_write(&iopt->iova_rwsem);
 	return rc;
 }
diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h
index 2ff192777f27d3..22863759c3bfb0 100644
--- a/drivers/iommu/iommufd/iommufd_private.h
+++ b/drivers/iommu/iommufd/iommufd_private.h
@@ -76,10 +76,9 @@ int iopt_table_add_domain(struct io_pagetable *iopt,
 			  struct iommu_domain *domain);
 void iopt_table_remove_domain(struct io_pagetable *iopt,
 			      struct iommu_domain *domain);
-int iopt_table_enforce_group_resv_regions(struct io_pagetable *iopt,
-					  struct device *device,
-					  struct iommu_group *group,
-					  phys_addr_t *sw_msi_start);
+int iopt_table_enforce_dev_resv_regions(struct io_pagetable *iopt,
+					struct device *dev,
+					phys_addr_t *sw_msi_start);
 int iopt_set_allow_iova(struct io_pagetable *iopt,
 			struct rb_root_cached *allowed_iova);
 int iopt_reserve_iova(struct io_pagetable *iopt, unsigned long start,
-- 
2.40.0



* [PATCH v3 06/17] iommufd: Use the iommufd_group to avoid duplicate MSI setup
  2023-03-21 19:14 [PATCH v3 00/17] Add iommufd physical device operations for replace and alloc hwpt Jason Gunthorpe
                   ` (4 preceding siblings ...)
  2023-03-21 19:14 ` [PATCH v3 05/17] iommufd: Keep track of each device's reserved regions instead of groups Jason Gunthorpe
@ 2023-03-21 19:14 ` Jason Gunthorpe
  2023-03-21 19:14 ` [PATCH v3 07/17] iommufd: Make sw_msi_start a group global Jason Gunthorpe
                   ` (11 subsequent siblings)
  17 siblings, 0 replies; 46+ messages in thread
From: Jason Gunthorpe @ 2023-03-21 19:14 UTC (permalink / raw)
  To: iommu, Kevin Tian, linux-kselftest; +Cc: Lu Baolu, kvm, Nicolin Chen, Yi Liu

This only needs to be done once per group, not once per device. Doing it
once per device was a way to make the device list work. Since we are
abandoning that approach we can optimize things a bit.

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
 drivers/iommu/iommufd/device.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c
index 4566833494e076..7f063b679a24be 100644
--- a/drivers/iommu/iommufd/device.c
+++ b/drivers/iommu/iommufd/device.c
@@ -315,10 +315,6 @@ int iommufd_hw_pagetable_attach(struct iommufd_hw_pagetable *hwpt,
 	if (rc)
 		return rc;
 
-	rc = iommufd_device_setup_msi(idev, hwpt, sw_msi_start);
-	if (rc)
-		goto err_unresv;
-
 	/*
 	 * Only attach to the group once for the first device that is in the
 	 * group. All the other devices will follow this attachment. The user
@@ -326,6 +322,10 @@ int iommufd_hw_pagetable_attach(struct iommufd_hw_pagetable *hwpt,
 	 * regions are only updated during individual device attachment.
 	 */
 	if (list_empty(&idev->igroup->device_list)) {
+		rc = iommufd_device_setup_msi(idev, hwpt, sw_msi_start);
+		if (rc)
+			goto err_unresv;
+
 		rc = iommu_attach_group(hwpt->domain, idev->igroup->group);
 		if (rc)
 			goto err_unresv;
-- 
2.40.0



* [PATCH v3 07/17] iommufd: Make sw_msi_start a group global
  2023-03-21 19:14 [PATCH v3 00/17] Add iommufd physical device operations for replace and alloc hwpt Jason Gunthorpe
                   ` (5 preceding siblings ...)
  2023-03-21 19:14 ` [PATCH v3 06/17] iommufd: Use the iommufd_group to avoid duplicate MSI setup Jason Gunthorpe
@ 2023-03-21 19:14 ` Jason Gunthorpe
  2023-03-21 19:14 ` [PATCH v3 08/17] iommufd: Move putting a hwpt to a helper function Jason Gunthorpe
                   ` (10 subsequent siblings)
  17 siblings, 0 replies; 46+ messages in thread
From: Jason Gunthorpe @ 2023-03-21 19:14 UTC (permalink / raw)
  To: iommu, Kevin Tian, linux-kselftest; +Cc: Lu Baolu, kvm, Nicolin Chen, Yi Liu

sw_msi_start is only set by the ARM drivers and it is always constant.
Due to the way vfio/iommufd allow domains to be re-used between devices,
we have a built-in assumption that there is only one value for
sw_msi_start and it is global to the system.

To make replace simpler, in cases where we may not reparse
iommu_get_resv_regions(), move sw_msi_start to the iommufd_group so it
is always available once any HWPT has been attached.
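
The patch below encodes this by initializing sw_msi_start to PHYS_ADDR_MAX
as a "not yet discovered" sentinel that the first attaching device fills
in. A minimal userspace sketch of that first-writer-wins pattern (the
struct and function names are stand-ins for illustration, not the kernel's):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SW_MSI_UNSET UINT64_MAX	/* stand-in for PHYS_ADDR_MAX */

/* Hypothetical stand-in for the per-group state added by this patch. */
struct group_state {
	uint64_t sw_msi_start;
};

static void group_init(struct group_state *g)
{
	g->sw_msi_start = SW_MSI_UNSET;	/* "not discovered yet" sentinel */
}

/*
 * Record the SW MSI window reported while parsing one device's reserved
 * regions. Since the value is assumed constant system-wide (as the commit
 * message states), the first device sets it and later devices agree.
 */
static void group_note_sw_msi(struct group_state *g, uint64_t start)
{
	if (g->sw_msi_start == SW_MSI_UNSET)
		g->sw_msi_start = start;
}

static bool group_has_sw_msi(const struct group_state *g)
{
	return g->sw_msi_start != SW_MSI_UNSET;
}
```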

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
 drivers/iommu/iommufd/device.c          | 14 +++++++-------
 drivers/iommu/iommufd/iommufd_private.h |  1 +
 2 files changed, 8 insertions(+), 7 deletions(-)

diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c
index 7f063b679a24be..616a23939e968b 100644
--- a/drivers/iommu/iommufd/device.c
+++ b/drivers/iommu/iommufd/device.c
@@ -88,6 +88,7 @@ static struct iommufd_group *iommufd_get_group(struct iommufd_ctx *ictx,
 	kref_init(&new_igroup->ref);
 	mutex_init(&new_igroup->lock);
 	INIT_LIST_HEAD(&new_igroup->device_list);
+	new_igroup->sw_msi_start = PHYS_ADDR_MAX;
 	/* group reference moves into new_igroup */
 	new_igroup->group = group;
 
@@ -249,10 +250,10 @@ void iommufd_device_unbind(struct iommufd_device *idev)
 }
 EXPORT_SYMBOL_NS_GPL(iommufd_device_unbind, IOMMUFD);
 
-static int iommufd_device_setup_msi(struct iommufd_device *idev,
-				    struct iommufd_hw_pagetable *hwpt,
-				    phys_addr_t sw_msi_start)
+static int iommufd_group_setup_msi(struct iommufd_group *igroup,
+				   struct iommufd_hw_pagetable *hwpt)
 {
+	phys_addr_t sw_msi_start = igroup->sw_msi_start;
 	int rc;
 
 	/*
@@ -285,7 +286,6 @@ static int iommufd_device_setup_msi(struct iommufd_device *idev,
 int iommufd_hw_pagetable_attach(struct iommufd_hw_pagetable *hwpt,
 				struct iommufd_device *idev)
 {
-	phys_addr_t sw_msi_start = PHYS_ADDR_MAX;
 	int rc;
 
 	lockdep_assert_held(&idev->igroup->lock);
@@ -310,8 +310,8 @@ int iommufd_hw_pagetable_attach(struct iommufd_hw_pagetable *hwpt,
 		}
 	}
 
-	rc = iopt_table_enforce_dev_resv_regions(
-		&hwpt->ioas->iopt, idev->dev, &sw_msi_start);
+	rc = iopt_table_enforce_dev_resv_regions(&hwpt->ioas->iopt, idev->dev,
+						 &idev->igroup->sw_msi_start);
 	if (rc)
 		return rc;
 
@@ -322,7 +322,7 @@ int iommufd_hw_pagetable_attach(struct iommufd_hw_pagetable *hwpt,
 	 * regions are only updated during individual device attachment.
 	 */
 	if (list_empty(&idev->igroup->device_list)) {
-		rc = iommufd_device_setup_msi(idev, hwpt, sw_msi_start);
+		rc = iommufd_group_setup_msi(idev->igroup, hwpt);
 		if (rc)
 			goto err_unresv;
 
diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h
index 22863759c3bfb0..7f4936cf537be4 100644
--- a/drivers/iommu/iommufd/iommufd_private.h
+++ b/drivers/iommu/iommufd/iommufd_private.h
@@ -267,6 +267,7 @@ struct iommufd_group {
 	struct iommu_group *group;
 	struct iommufd_hw_pagetable *hwpt;
 	struct list_head device_list;
+	phys_addr_t sw_msi_start;
 };
 
 /*
-- 
2.40.0



* [PATCH v3 08/17] iommufd: Move putting a hwpt to a helper function
  2023-03-21 19:14 [PATCH v3 00/17] Add iommufd physical device operations for replace and alloc hwpt Jason Gunthorpe
                   ` (6 preceding siblings ...)
  2023-03-21 19:14 ` [PATCH v3 07/17] iommufd: Make sw_msi_start a group global Jason Gunthorpe
@ 2023-03-21 19:14 ` Jason Gunthorpe
  2023-03-21 19:14 ` [PATCH v3 09/17] iommufd: Add enforced_cache_coherency to iommufd_hw_pagetable_alloc() Jason Gunthorpe
                   ` (9 subsequent siblings)
  17 siblings, 0 replies; 46+ messages in thread
From: Jason Gunthorpe @ 2023-03-21 19:14 UTC (permalink / raw)
  To: iommu, Kevin Tian, linux-kselftest; +Cc: Lu Baolu, kvm, Nicolin Chen, Yi Liu

The next patch will need to call this from two places.

Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
 drivers/iommu/iommufd/device.c          |  6 +-----
 drivers/iommu/iommufd/iommufd_private.h | 10 ++++++++++
 2 files changed, 11 insertions(+), 5 deletions(-)

diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c
index 616a23939e968b..8946de73660ece 100644
--- a/drivers/iommu/iommufd/device.c
+++ b/drivers/iommu/iommufd/device.c
@@ -489,11 +489,7 @@ void iommufd_device_detach(struct iommufd_device *idev)
 	hwpt = iommufd_hw_pagetable_detach(idev);
 	mutex_unlock(&idev->igroup->lock);
 
-	if (hwpt->auto_domain)
-		iommufd_object_destroy_user(idev->ictx, &hwpt->obj);
-	else
-		refcount_dec(&hwpt->obj.users);
-
+	iommufd_hw_pagetable_put(idev->ictx, hwpt);
 	refcount_dec(&idev->obj.users);
 }
 EXPORT_SYMBOL_NS_GPL(iommufd_device_detach, IOMMUFD);
diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h
index 7f4936cf537be4..8bf053f4d4a9ce 100644
--- a/drivers/iommu/iommufd/iommufd_private.h
+++ b/drivers/iommu/iommufd/iommufd_private.h
@@ -260,6 +260,16 @@ struct iommufd_hw_pagetable *
 iommufd_hw_pagetable_detach(struct iommufd_device *idev);
 void iommufd_hw_pagetable_destroy(struct iommufd_object *obj);
 
+static inline void iommufd_hw_pagetable_put(struct iommufd_ctx *ictx,
+					    struct iommufd_hw_pagetable *hwpt)
+{
+	lockdep_assert_not_held(&hwpt->ioas->mutex);
+	if (hwpt->auto_domain)
+		iommufd_object_destroy_user(ictx, &hwpt->obj);
+	else
+		refcount_dec(&hwpt->obj.users);
+}
+
 struct iommufd_group {
 	struct kref ref;
 	struct mutex lock;
-- 
2.40.0



* [PATCH v3 09/17] iommufd: Add enforced_cache_coherency to iommufd_hw_pagetable_alloc()
  2023-03-21 19:14 [PATCH v3 00/17] Add iommufd physical device operations for replace and alloc hwpt Jason Gunthorpe
                   ` (7 preceding siblings ...)
  2023-03-21 19:14 ` [PATCH v3 08/17] iommufd: Move putting a hwpt to a helper function Jason Gunthorpe
@ 2023-03-21 19:14 ` Jason Gunthorpe
  2023-03-21 19:14 ` [PATCH v3 10/17] iommufd: Reorganize iommufd_device_attach into iommufd_device_change_pt Jason Gunthorpe
                   ` (8 subsequent siblings)
  17 siblings, 0 replies; 46+ messages in thread
From: Jason Gunthorpe @ 2023-03-21 19:14 UTC (permalink / raw)
  To: iommu, Kevin Tian, linux-kselftest; +Cc: Lu Baolu, kvm, Nicolin Chen, Yi Liu

Logically the HWPT should have the coherency set properly for the device
that it is being created for when it is created.

This was happening implicitly if the immediate_attach was set because
iommufd_hw_pagetable_attach() does it as the first thing.

Do it unconditionally so !immediate_attach works properly.

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
 drivers/iommu/iommufd/device.c          | 20 +++++-------------
 drivers/iommu/iommufd/hw_pagetable.c    | 27 +++++++++++++++++++++++++
 drivers/iommu/iommufd/iommufd_private.h |  1 +
 3 files changed, 33 insertions(+), 15 deletions(-)

diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c
index 8946de73660ece..ea01718f46ccb1 100644
--- a/drivers/iommu/iommufd/device.c
+++ b/drivers/iommu/iommufd/device.c
@@ -293,21 +293,11 @@ int iommufd_hw_pagetable_attach(struct iommufd_hw_pagetable *hwpt,
 	if (idev->igroup->hwpt != NULL && idev->igroup->hwpt != hwpt)
 		return -EINVAL;
 
-	/*
-	 * Try to upgrade the domain we have, it is an iommu driver bug to
-	 * report IOMMU_CAP_ENFORCE_CACHE_COHERENCY but fail
-	 * enforce_cache_coherency when there are no devices attached to the
-	 * domain.
-	 */
-	if (idev->enforce_cache_coherency && !hwpt->enforce_cache_coherency) {
-		if (hwpt->domain->ops->enforce_cache_coherency)
-			hwpt->enforce_cache_coherency =
-				hwpt->domain->ops->enforce_cache_coherency(
-					hwpt->domain);
-		if (!hwpt->enforce_cache_coherency) {
-			WARN_ON(list_empty(&idev->igroup->device_list));
-			return -EINVAL;
-		}
+	/* Try to upgrade the domain we have */
+	if (idev->enforce_cache_coherency) {
+		rc = iommufd_hw_pagetable_enforce_cc(hwpt);
+		if (rc)
+			return rc;
 	}
 
 	rc = iopt_table_enforce_dev_resv_regions(&hwpt->ioas->iopt, idev->dev,
diff --git a/drivers/iommu/iommufd/hw_pagetable.c b/drivers/iommu/iommufd/hw_pagetable.c
index 566eba0cd9b917..2584f9038b29a2 100644
--- a/drivers/iommu/iommufd/hw_pagetable.c
+++ b/drivers/iommu/iommufd/hw_pagetable.c
@@ -25,6 +25,20 @@ void iommufd_hw_pagetable_destroy(struct iommufd_object *obj)
 	refcount_dec(&hwpt->ioas->obj.users);
 }
 
+int iommufd_hw_pagetable_enforce_cc(struct iommufd_hw_pagetable *hwpt)
+{
+	if (hwpt->enforce_cache_coherency)
+		return 0;
+
+	if (hwpt->domain->ops->enforce_cache_coherency)
+		hwpt->enforce_cache_coherency =
+			hwpt->domain->ops->enforce_cache_coherency(
+				hwpt->domain);
+	if (!hwpt->enforce_cache_coherency)
+		return -EINVAL;
+	return 0;
+}
+
 /**
  * iommufd_hw_pagetable_alloc() - Get an iommu_domain for a device
  * @ictx: iommufd context
@@ -60,6 +74,19 @@ iommufd_hw_pagetable_alloc(struct iommufd_ctx *ictx, struct iommufd_ioas *ioas,
 		goto out_abort;
 	}
 
+	/*
+	 * Set the coherency mode before we do iopt_table_add_domain() as some
+	 * iommus have a per-PTE bit that controls it and need to decide before
+	 * doing any maps. It is an iommu driver bug to report
+	 * IOMMU_CAP_ENFORCE_CACHE_COHERENCY but fail enforce_cache_coherency on
+	 * a new domain.
+	 */
+	if (idev->enforce_cache_coherency) {
+		rc = iommufd_hw_pagetable_enforce_cc(hwpt);
+		if (WARN_ON(rc))
+			goto out_abort;
+	}
+
 	mutex_lock(&idev->igroup->lock);
 
 	/*
diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h
index 8bf053f4d4a9ce..471a3fdff1e0b6 100644
--- a/drivers/iommu/iommufd/iommufd_private.h
+++ b/drivers/iommu/iommufd/iommufd_private.h
@@ -254,6 +254,7 @@ struct iommufd_hw_pagetable {
 struct iommufd_hw_pagetable *
 iommufd_hw_pagetable_alloc(struct iommufd_ctx *ictx, struct iommufd_ioas *ioas,
 			   struct iommufd_device *idev, bool immediate_attach);
+int iommufd_hw_pagetable_enforce_cc(struct iommufd_hw_pagetable *hwpt);
 int iommufd_hw_pagetable_attach(struct iommufd_hw_pagetable *hwpt,
 				struct iommufd_device *idev);
 struct iommufd_hw_pagetable *
-- 
2.40.0



* [PATCH v3 10/17] iommufd: Reorganize iommufd_device_attach into iommufd_device_change_pt
  2023-03-21 19:14 [PATCH v3 00/17] Add iommufd physical device operations for replace and alloc hwpt Jason Gunthorpe
                   ` (8 preceding siblings ...)
  2023-03-21 19:14 ` [PATCH v3 09/17] iommufd: Add enforced_cache_coherency to iommufd_hw_pagetable_alloc() Jason Gunthorpe
@ 2023-03-21 19:14 ` Jason Gunthorpe
  2023-03-23  7:25   ` Tian, Kevin
  2023-03-21 19:14 ` [PATCH v3 11/17] iommu: Introduce a new iommu_group_replace_domain() API Jason Gunthorpe
                   ` (7 subsequent siblings)
  17 siblings, 1 reply; 46+ messages in thread
From: Jason Gunthorpe @ 2023-03-21 19:14 UTC (permalink / raw)
  To: iommu, Kevin Tian, linux-kselftest; +Cc: Lu Baolu, kvm, Nicolin Chen, Yi Liu

The code flow for attaching a PT for the first time and for replacing a
PT is very similar, except for the lowest do_attach step.

Reorganize this so that the do_attach step is a function pointer.

Replace requires destroying the old HWPT once it is replaced. This
destruction cannot be done under all the locks that are held while the
function pointer runs, so the signature allows returning a HWPT which
will be destroyed by the caller after everything is unlocked.
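
A tiny userspace sketch of that "hand the old object back for destruction
after unlock" pattern; the types and names here are invented for
illustration (the real code returns the old hwpt from the locked helper
and calls iommufd_hw_pagetable_put() only after the locks are dropped):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; not the kernel's types. */
struct hwpt {
	int users;
};

struct device_ctx {
	struct hwpt *hwpt;
	int lock_held;	/* models the group lock for the sketch */
};

/*
 * Swap in a new hwpt while the lock is held, but return the old one to
 * the caller: its destruction may need to take other locks, so it must
 * happen only after everything here is unlocked.
 */
static struct hwpt *do_replace_locked(struct device_ctx *ctx,
				      struct hwpt *new_hwpt)
{
	struct hwpt *old = ctx->hwpt;

	assert(ctx->lock_held);
	ctx->hwpt = new_hwpt;
	return old;	/* caller destroys after unlock */
}

static void hwpt_put(struct hwpt *hwpt)
{
	if (hwpt)
		hwpt->users--;
}
```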

Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
 drivers/iommu/iommufd/device.c | 138 +++++++++++++++++++++++----------
 1 file changed, 96 insertions(+), 42 deletions(-)

diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c
index ea01718f46ccb1..40466152b68132 100644
--- a/drivers/iommu/iommufd/device.c
+++ b/drivers/iommu/iommufd/device.c
@@ -346,27 +346,43 @@ iommufd_hw_pagetable_detach(struct iommufd_device *idev)
 	return hwpt;
 }
 
-static int iommufd_device_do_attach(struct iommufd_device *idev,
-				    struct iommufd_hw_pagetable *hwpt)
+static struct iommufd_hw_pagetable *
+iommufd_device_do_attach(struct iommufd_device *idev,
+			 struct iommufd_hw_pagetable *hwpt)
 {
 	int rc;
 
 	mutex_lock(&idev->igroup->lock);
 	rc = iommufd_hw_pagetable_attach(hwpt, idev);
 	mutex_unlock(&idev->igroup->lock);
-	return rc;
+	if (rc)
+		return ERR_PTR(rc);
+	return NULL;
 }
 
+typedef struct iommufd_hw_pagetable *(*attach_fn)(
+	struct iommufd_device *idev, struct iommufd_hw_pagetable *hwpt);
+
 /*
  * When automatically managing the domains we search for a compatible domain in
  * the iopt and if one is found use it, otherwise create a new domain.
  * Automatic domain selection will never pick a manually created domain.
  */
-static int iommufd_device_auto_get_domain(struct iommufd_device *idev,
-					  struct iommufd_ioas *ioas, u32 *pt_id)
+static struct iommufd_hw_pagetable *
+iommufd_device_auto_get_domain(struct iommufd_device *idev,
+			       struct iommufd_ioas *ioas, u32 *pt_id,
+			       attach_fn do_attach)
 {
+	/*
+	 * iommufd_hw_pagetable_attach() is called by
+	 * iommufd_hw_pagetable_alloc() in immediate attachment mode, same as
+	 * iommufd_device_do_attach(). So if we are in this mode then we prefer
+	 * to use the immediate_attach path as it supports drivers that can't
+	 * directly allocate a domain.
+	 */
+	bool immediate_attach = do_attach == iommufd_device_do_attach;
+	struct iommufd_hw_pagetable *destroy_hwpt;
 	struct iommufd_hw_pagetable *hwpt;
-	int rc;
 
 	/*
 	 * There is no differentiation when domains are allocated, so any domain
@@ -380,52 +396,58 @@ static int iommufd_device_auto_get_domain(struct iommufd_device *idev,
 
 		if (!iommufd_lock_obj(&hwpt->obj))
 			continue;
-		rc = iommufd_device_do_attach(idev, hwpt);
-		iommufd_put_object(&hwpt->obj);
-
-		/*
-		 * -EINVAL means the domain is incompatible with the device.
-		 * Other error codes should propagate to userspace as failure.
-		 * Success means the domain is attached.
-		 */
-		if (rc == -EINVAL)
-			continue;
+		destroy_hwpt = (*do_attach)(idev, hwpt);
+		if (IS_ERR(destroy_hwpt)) {
+			iommufd_put_object(&hwpt->obj);
+			/*
+			 * -EINVAL means the domain is incompatible with the
+			 * device. Other error codes should propagate to
+			 * userspace as failure. Success means the domain is
+			 * attached.
+			 */
+			if (PTR_ERR(destroy_hwpt) == -EINVAL)
+				continue;
+			goto out_unlock;
+		}
 		*pt_id = hwpt->obj.id;
+		iommufd_put_object(&hwpt->obj);
 		goto out_unlock;
 	}
 
-	hwpt = iommufd_hw_pagetable_alloc(idev->ictx, ioas, idev, true);
+	hwpt = iommufd_hw_pagetable_alloc(idev->ictx, ioas, idev,
+					  immediate_attach);
 	if (IS_ERR(hwpt)) {
-		rc = PTR_ERR(hwpt);
+		destroy_hwpt = ERR_CAST(hwpt);
 		goto out_unlock;
 	}
+
+	if (!immediate_attach) {
+		destroy_hwpt = (*do_attach)(idev, hwpt);
+		if (IS_ERR(destroy_hwpt))
+			goto out_abort;
+	} else {
+		destroy_hwpt = NULL;
+	}
+
 	hwpt->auto_domain = true;
 	*pt_id = hwpt->obj.id;
 
 	mutex_unlock(&ioas->mutex);
 	iommufd_object_finalize(idev->ictx, &hwpt->obj);
-	return 0;
+	return destroy_hwpt;
+
+out_abort:
+	iommufd_object_abort_and_destroy(idev->ictx, &hwpt->obj);
 out_unlock:
 	mutex_unlock(&ioas->mutex);
-	return rc;
+	return destroy_hwpt;
 }
 
-/**
- * iommufd_device_attach - Connect a device from an iommu_domain
- * @idev: device to attach
- * @pt_id: Input a IOMMUFD_OBJ_IOAS, or IOMMUFD_OBJ_HW_PAGETABLE
- *         Output the IOMMUFD_OBJ_HW_PAGETABLE ID
- *
- * This connects the device to an iommu_domain, either automatically or manually
- * selected. Once this completes the device could do DMA.
- *
- * The caller should return the resulting pt_id back to userspace.
- * This function is undone by calling iommufd_device_detach().
- */
-int iommufd_device_attach(struct iommufd_device *idev, u32 *pt_id)
+static int iommufd_device_change_pt(struct iommufd_device *idev, u32 *pt_id,
+				    attach_fn do_attach)
 {
+	struct iommufd_hw_pagetable *destroy_hwpt;
 	struct iommufd_object *pt_obj;
-	int rc;
 
 	pt_obj = iommufd_get_object(idev->ictx, *pt_id, IOMMUFD_OBJ_ANY);
 	if (IS_ERR(pt_obj))
@@ -436,8 +458,8 @@ int iommufd_device_attach(struct iommufd_device *idev, u32 *pt_id)
 		struct iommufd_hw_pagetable *hwpt =
 			container_of(pt_obj, struct iommufd_hw_pagetable, obj);
 
-		rc = iommufd_device_do_attach(idev, hwpt);
-		if (rc)
+		destroy_hwpt = (*do_attach)(idev, hwpt);
+		if (IS_ERR(destroy_hwpt))
 			goto out_put_pt_obj;
 		break;
 	}
@@ -445,22 +467,54 @@ int iommufd_device_attach(struct iommufd_device *idev, u32 *pt_id)
 		struct iommufd_ioas *ioas =
 			container_of(pt_obj, struct iommufd_ioas, obj);
 
-		rc = iommufd_device_auto_get_domain(idev, ioas, pt_id);
-		if (rc)
+		destroy_hwpt = iommufd_device_auto_get_domain(idev, ioas, pt_id,
+							      do_attach);
+		if (IS_ERR(destroy_hwpt))
 			goto out_put_pt_obj;
 		break;
 	}
 	default:
-		rc = -EINVAL;
+		destroy_hwpt = ERR_PTR(-EINVAL);
 		goto out_put_pt_obj;
 	}
+	iommufd_put_object(pt_obj);
 
-	refcount_inc(&idev->obj.users);
-	rc = 0;
+	/* This destruction has to be after we unlock everything */
+	if (destroy_hwpt)
+		iommufd_hw_pagetable_put(idev->ictx, destroy_hwpt);
+	return 0;
 
 out_put_pt_obj:
 	iommufd_put_object(pt_obj);
-	return rc;
+	return PTR_ERR(destroy_hwpt);
+}
+
+/**
+ * iommufd_device_attach - Connect a device to an iommu_domain
+ * @idev: device to attach
+ * @pt_id: Input a IOMMUFD_OBJ_IOAS, or IOMMUFD_OBJ_HW_PAGETABLE
+ *         Output the IOMMUFD_OBJ_HW_PAGETABLE ID
+ *
+ * This connects the device to an iommu_domain, either automatically or manually
+ * selected. Once this completes the device could do DMA.
+ *
+ * The caller should return the resulting pt_id back to userspace.
+ * This function is undone by calling iommufd_device_detach().
+ */
+int iommufd_device_attach(struct iommufd_device *idev, u32 *pt_id)
+{
+	int rc;
+
+	rc = iommufd_device_change_pt(idev, pt_id, &iommufd_device_do_attach);
+	if (rc)
+		return rc;
+
+	/*
+	 * Pairs with iommufd_device_detach() - catches caller bugs attempting
+	 * to destroy a device with an attachment.
+	 */
+	refcount_inc(&idev->obj.users);
+	return 0;
 }
 EXPORT_SYMBOL_NS_GPL(iommufd_device_attach, IOMMUFD);
 
-- 
2.40.0



* [PATCH v3 11/17] iommu: Introduce a new iommu_group_replace_domain() API
  2023-03-21 19:14 [PATCH v3 00/17] Add iommufd physical device operations for replace and alloc hwpt Jason Gunthorpe
                   ` (9 preceding siblings ...)
  2023-03-21 19:14 ` [PATCH v3 10/17] iommufd: Reorganize iommufd_device_attach into iommufd_device_change_pt Jason Gunthorpe
@ 2023-03-21 19:14 ` Jason Gunthorpe
  2023-03-21 19:14 ` [PATCH v3 12/17] iommufd: Add iommufd_device_replace() Jason Gunthorpe
                   ` (6 subsequent siblings)
  17 siblings, 0 replies; 46+ messages in thread
From: Jason Gunthorpe @ 2023-03-21 19:14 UTC (permalink / raw)
  To: iommu, Kevin Tian, linux-kselftest; +Cc: Lu Baolu, kvm, Nicolin Chen, Yi Liu

From: Nicolin Chen <nicolinc@nvidia.com>

qemu has a need to replace the translations associated with a domain
when the guest does large-scale operations like switching between an
IDENTITY domain and, say, dma-iommu.c.

Currently, it does this by replacing all the mappings in a single
domain, but this is very inefficient and means that domains have to be
per-device rather than per-translation.

Provide a high-level API to allow replacements of one domain with
another. This is similar to a detach/attach cycle except it doesn't
force the group to go to the blocking domain in-between.

By removing this forced blocking domain the iommu driver has the
opportunity to implement a non-disruptive replacement of the domain to the
greatest extent its hardware allows. This allows the qemu emulation of the
vIOMMU to be more complete, as real hardware often has a non-disruptive
replacement capability.

It could be possible to address this by simply removing the protection
from the iommu_attach_group(), but it is not so clear if that is safe for
the few users. Thus, add a new API to serve this new purpose.

All drivers are already required to support changing between active
UNMANAGED domains when using their attach_dev ops.

This API is expected to be used only by IOMMUFD, so add to the iommu-priv
header and mark it as IOMMUFD_INTERNAL.
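
The difference from a detach/attach cycle can be sketched with a toy
model that counts how many times a device's active translation changes;
all the types and names here are invented for illustration:

```c
#include <assert.h>

enum domain_kind { DOM_BLOCKING, DOM_A, DOM_B };

struct dev_state {
	enum domain_kind attached;
	int transitions;	/* times the active translation changed */
};

static void set_domain(struct dev_state *d, enum domain_kind dom)
{
	if (d->attached != dom) {
		d->attached = dom;
		d->transitions++;
	}
}

/* Classic cycle: DMA is forced through the blocking domain in between. */
static void detach_then_attach(struct dev_state *d, enum domain_kind new_dom)
{
	set_domain(d, DOM_BLOCKING);
	set_domain(d, new_dom);
}

/* Replace: a single switch, with no blocking window for ongoing DMA. */
static void replace_domain(struct dev_state *d, enum domain_kind new_dom)
{
	set_domain(d, new_dom);
}
```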

Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
 drivers/iommu/iommu-priv.h | 10 ++++++++++
 drivers/iommu/iommu.c      | 30 ++++++++++++++++++++++++++++++
 2 files changed, 40 insertions(+)
 create mode 100644 drivers/iommu/iommu-priv.h

diff --git a/drivers/iommu/iommu-priv.h b/drivers/iommu/iommu-priv.h
new file mode 100644
index 00000000000000..7c8011bfd15374
--- /dev/null
+++ b/drivers/iommu/iommu-priv.h
@@ -0,0 +1,10 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+#ifndef __LINUX_IOMMU_PRIV_H
+#define __LINUX_IOMMU_PRIV_H
+
+#include <linux/iommu.h>
+
+int iommu_group_replace_domain(struct iommu_group *group,
+			       struct iommu_domain *new_domain);
+
+#endif /* __LINUX_IOMMU_PRIV_H */
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 76969904b93af4..91a948a76db5ee 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -33,6 +33,7 @@
 #include <linux/msi.h>
 
 #include "dma-iommu.h"
+#include "iommu-priv.h"
 
 #include "iommu-sva.h"
 
@@ -2191,6 +2192,35 @@ int iommu_attach_group(struct iommu_domain *domain, struct iommu_group *group)
 }
 EXPORT_SYMBOL_GPL(iommu_attach_group);
 
+/**
+ * iommu_group_replace_domain - replace the domain that a group is attached to
+ * @new_domain: new IOMMU domain to replace with
+ * @group: IOMMU group that will be attached to the new domain
+ *
+ * This API allows the group to switch domains without being forced to go to
+ * the blocking domain in-between.
+ *
+ * If the currently attached domain is a core domain (e.g. a default_domain),
+ * it will act just like the iommu_attach_group().
+ */
+int iommu_group_replace_domain(struct iommu_group *group,
+			       struct iommu_domain *new_domain)
+{
+	int ret;
+
+	if (!new_domain)
+		return -EINVAL;
+
+	mutex_lock(&group->mutex);
+	ret = __iommu_group_set_domain(group, new_domain);
+	if (ret)
+		__iommu_group_for_each_dev(group, group->domain,
+					   iommu_group_do_attach_device);
+	mutex_unlock(&group->mutex);
+	return ret;
+}
+EXPORT_SYMBOL_NS_GPL(iommu_group_replace_domain, IOMMUFD_INTERNAL);
+
 static int iommu_group_do_set_platform_dma(struct device *dev, void *data)
 {
 	const struct iommu_ops *ops = dev_iommu_ops(dev);
-- 
2.40.0


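The failure path above re-attaches group->domain to every device when __iommu_group_set_domain() fails, so a failed replace leaves the group on its old domain rather than in a blocking domain. A minimal userspace model of that rollback sequencing, for illustration only (all struct and function names here are invented, not kernel APIs):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model of a group of devices sharing one attached domain. */
struct domain { int id; };
struct device { bool reject_domain2; };

struct group {
	struct domain *domain;
	struct device *devs;
	size_t ndevs;
};

static int attach_dev(struct device *dev, struct domain *d)
{
	return (dev->reject_domain2 && d->id == 2) ? -1 : 0;
}

/*
 * Attach every device to the new domain; on any failure, re-attach the
 * devices already moved back to the old domain so nothing changes.
 */
static int group_replace_domain(struct group *g, struct domain *new_domain)
{
	struct domain *old = g->domain;
	size_t i;

	for (i = 0; i < g->ndevs; i++) {
		if (attach_dev(&g->devs[i], new_domain))
			goto rollback;
	}
	g->domain = new_domain;
	return 0;
rollback:
	while (i--)
		attach_dev(&g->devs[i], old);
	return -1;
}
```

The real function additionally holds group->mutex around the whole sequence; the sketch only captures the all-or-nothing ordering.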
^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH v3 12/17] iommufd: Add iommufd_device_replace()
  2023-03-21 19:14 [PATCH v3 00/17] Add iommufd physical device operations for replace and alloc hwpt Jason Gunthorpe
                   ` (10 preceding siblings ...)
  2023-03-21 19:14 ` [PATCH v3 11/17] iommu: Introduce a new iommu_group_replace_domain() API Jason Gunthorpe
@ 2023-03-21 19:14 ` Jason Gunthorpe
  2023-03-23  7:31   ` Tian, Kevin
  2023-03-21 19:14 ` [PATCH v3 13/17] iommufd: Make destroy_rwsem use a lock class per object type Jason Gunthorpe
                   ` (5 subsequent siblings)
  17 siblings, 1 reply; 46+ messages in thread
From: Jason Gunthorpe @ 2023-03-21 19:14 UTC (permalink / raw)
  To: iommu, Kevin Tian, linux-kselftest; +Cc: Lu Baolu, kvm, Nicolin Chen, Yi Liu

Replace allows all the devices in a group to move in one step to a new
HWPT. Further, the HWPT move is done without going through a blocking
domain so that the IOMMU driver can implement some level of
non-disruption to ongoing DMA if that has meaning for it (e.g. for future
special driver domains).

Replace uses much of the same logic as a normal attach, except the actual
domain changeover has different restrictions, and things are carefully
sequenced so that a failure leaves everything the way it was, rather than
trapped in a blocking domain after an ENOMEM.

Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
 drivers/iommu/iommufd/device.c | 96 ++++++++++++++++++++++++++++++++++
 drivers/iommu/iommufd/main.c   |  1 +
 2 files changed, 97 insertions(+)

diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c
index 40466152b68132..5b5c2745b4a088 100644
--- a/drivers/iommu/iommufd/device.c
+++ b/drivers/iommu/iommufd/device.c
@@ -4,6 +4,7 @@
 #include <linux/iommufd.h>
 #include <linux/slab.h>
 #include <linux/iommu.h>
+#include "../iommu-priv.h"
 
 #include "io_pagetable.h"
 #include "iommufd_private.h"
@@ -360,6 +361,81 @@ iommufd_device_do_attach(struct iommufd_device *idev,
 	return NULL;
 }
 
+static struct iommufd_hw_pagetable *
+iommufd_device_do_replace(struct iommufd_device *idev,
+			  struct iommufd_hw_pagetable *hwpt)
+{
+	struct iommufd_group *igroup = idev->igroup;
+	struct iommufd_hw_pagetable *old_hwpt;
+	unsigned int num_devices = 0;
+	struct iommufd_device *cur;
+	int rc;
+
+	mutex_lock(&idev->igroup->lock);
+
+	if (igroup->hwpt == NULL) {
+		rc = -EINVAL;
+		goto err_unlock;
+	}
+
+	if (hwpt == igroup->hwpt) {
+		mutex_unlock(&idev->igroup->lock);
+		return NULL;
+	}
+
+	/* Try to upgrade the domain we have */
+	list_for_each_entry(cur, &igroup->device_list, group_item) {
+		num_devices++;
+		if (cur->enforce_cache_coherency) {
+			rc = iommufd_hw_pagetable_enforce_cc(hwpt);
+			if (rc)
+				goto err_unlock;
+		}
+	}
+
+	old_hwpt = igroup->hwpt;
+	if (hwpt->ioas != old_hwpt->ioas) {
+		list_for_each_entry(cur, &igroup->device_list, group_item) {
+			rc = iopt_table_enforce_dev_resv_regions(
+				&hwpt->ioas->iopt, cur->dev, NULL);
+			if (rc)
+				goto err_unresv;
+		}
+	}
+
+	rc = iommufd_group_setup_msi(idev->igroup, hwpt);
+	if (rc)
+		goto err_unresv;
+
+	rc = iommu_group_replace_domain(igroup->group, hwpt->domain);
+	if (rc)
+		goto err_unresv;
+
+	if (hwpt->ioas != old_hwpt->ioas) {
+		list_for_each_entry(cur, &igroup->device_list, group_item)
+			iopt_remove_reserved_iova(&old_hwpt->ioas->iopt,
+						  cur->dev);
+	}
+
+	igroup->hwpt = hwpt;
+
+	/* Move the refcounts held by the device_list to the new hwpt */
+	refcount_add(num_devices, &hwpt->obj.users);
+	if (num_devices > 1)
+		WARN_ON(refcount_sub_and_test(num_devices - 1,
+					      &old_hwpt->obj.users));
+	mutex_unlock(&idev->igroup->lock);
+
+	/* Caller must destroy old_hwpt */
+	return old_hwpt;
+err_unresv:
+	list_for_each_entry(cur, &igroup->device_list, group_item)
+		iopt_remove_reserved_iova(&hwpt->ioas->iopt, cur->dev);
+err_unlock:
+	mutex_unlock(&idev->igroup->lock);
+	return ERR_PTR(rc);
+}
+
 typedef struct iommufd_hw_pagetable *(*attach_fn)(
 	struct iommufd_device *idev, struct iommufd_hw_pagetable *hwpt);
 
@@ -518,6 +594,26 @@ int iommufd_device_attach(struct iommufd_device *idev, u32 *pt_id)
 }
 EXPORT_SYMBOL_NS_GPL(iommufd_device_attach, IOMMUFD);
 
+/**
+ * iommufd_device_replace - Change the device's iommu_domain
+ * @idev: device to change
+ * @pt_id: Input an IOMMUFD_OBJ_IOAS or IOMMUFD_OBJ_HW_PAGETABLE
+ *         Output the IOMMUFD_OBJ_HW_PAGETABLE ID
+ *
+ * This is the same as
+ *   iommufd_device_detach();
+ *   iommufd_device_attach();
+ * If it fails then no change is made to the attachment. The iommu driver may
+ * implement this so there is no disruption in translation. This can only be
+ * called if iommufd_device_attach() has already succeeded.
+ */
+int iommufd_device_replace(struct iommufd_device *idev, u32 *pt_id)
+{
+	return iommufd_device_change_pt(idev, pt_id,
+					&iommufd_device_do_replace);
+}
+EXPORT_SYMBOL_NS_GPL(iommufd_device_replace, IOMMUFD);
+
 /**
  * iommufd_device_detach - Disconnect a device to an iommu_domain
  * @idev: device to detach
diff --git a/drivers/iommu/iommufd/main.c b/drivers/iommu/iommufd/main.c
index e5ed5dfa91a0b5..8597f2fb88da3a 100644
--- a/drivers/iommu/iommufd/main.c
+++ b/drivers/iommu/iommufd/main.c
@@ -461,5 +461,6 @@ module_exit(iommufd_exit);
 MODULE_ALIAS_MISCDEV(VFIO_MINOR);
 MODULE_ALIAS("devname:vfio/vfio");
 #endif
+MODULE_IMPORT_NS(IOMMUFD_INTERNAL);
 MODULE_DESCRIPTION("I/O Address Space Management for passthrough devices");
 MODULE_LICENSE("GPL");
-- 
2.40.0


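The refcount handling at the end of iommufd_device_do_replace() can be summarized: the device_list holds num_devices references on the old hwpt; replace adds num_devices references to the new hwpt, drops all but one of the old ones, and leaves that last one for the caller to put when it destroys old_hwpt. A toy model of that bookkeeping, with plain integers standing in for refcount_t (names are illustrative, not kernel APIs):

```c
#include <assert.h>

struct hwpt { int users; };

/*
 * Move the device_list's num_devices references from old to new. One old
 * reference is deliberately kept; the caller owns it and must put it
 * (i.e. destroy old_hwpt) after the lock is released.
 */
static void replace_move_refs(struct hwpt *new_hwpt, struct hwpt *old_hwpt,
			      int num_devices)
{
	new_hwpt->users += num_devices;		/* devices now reference new */
	old_hwpt->users -= num_devices - 1;	/* caller keeps one old ref */
}
```

In the kernel this is the refcount_add()/refcount_sub_and_test() pair guarded by the num_devices > 1 check, since refcount_sub of zero is not meaningful there.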

* [PATCH v3 13/17] iommufd: Make destroy_rwsem use a lock class per object type
  2023-03-21 19:14 [PATCH v3 00/17] Add iommufd physical device operations for replace and alloc hwpt Jason Gunthorpe
                   ` (11 preceding siblings ...)
  2023-03-21 19:14 ` [PATCH v3 12/17] iommufd: Add iommufd_device_replace() Jason Gunthorpe
@ 2023-03-21 19:14 ` Jason Gunthorpe
  2023-03-23  7:54   ` Tian, Kevin
  2023-03-21 19:14 ` [PATCH v3 14/17] iommufd/selftest: Test iommufd_device_replace() Jason Gunthorpe
                   ` (4 subsequent siblings)
  17 siblings, 1 reply; 46+ messages in thread
From: Jason Gunthorpe @ 2023-03-21 19:14 UTC (permalink / raw)
  To: iommu, Kevin Tian, linux-kselftest; +Cc: Lu Baolu, kvm, Nicolin Chen, Yi Liu

The selftest invokes things like replace under the object lock of its
idev, which protects the idev in a similar way to a real user.
Unfortunately this triggers lockdep. Using a lock class per object type
solves the problem.

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
 drivers/iommu/iommufd/iommufd_private.h |  1 +
 drivers/iommu/iommufd/main.c            | 11 ++++++++++-
 2 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h
index 471a3fdff1e0b6..f80b012e1bc200 100644
--- a/drivers/iommu/iommufd/iommufd_private.h
+++ b/drivers/iommu/iommufd/iommufd_private.h
@@ -119,6 +119,7 @@ enum iommufd_object_type {
 #ifdef CONFIG_IOMMUFD_TEST
 	IOMMUFD_OBJ_SELFTEST,
 #endif
+	IOMMUFD_OBJ_MAX,
 };
 
 /* Base struct for all objects with a userspace ID handle. */
diff --git a/drivers/iommu/iommufd/main.c b/drivers/iommu/iommufd/main.c
index 8597f2fb88da3a..9cba592d0482e7 100644
--- a/drivers/iommu/iommufd/main.c
+++ b/drivers/iommu/iommufd/main.c
@@ -32,6 +32,7 @@ struct iommufd_object *_iommufd_object_alloc(struct iommufd_ctx *ictx,
 					     size_t size,
 					     enum iommufd_object_type type)
 {
+	static struct lock_class_key obj_keys[IOMMUFD_OBJ_MAX];
 	struct iommufd_object *obj;
 	int rc;
 
@@ -39,7 +40,15 @@ struct iommufd_object *_iommufd_object_alloc(struct iommufd_ctx *ictx,
 	if (!obj)
 		return ERR_PTR(-ENOMEM);
 	obj->type = type;
-	init_rwsem(&obj->destroy_rwsem);
+	/*
+	 * In most cases the destroy_rwsem is obtained with try so it doesn't
+	 * interact with lockdep, however on destroy we have to sleep. This
+	 * means if we have to destroy an object while holding a get on another
+	 * object it triggers lockdep. Using one locking class per object type
+	 * is a simple and reasonable way to avoid this.
+	 */
+	__init_rwsem(&obj->destroy_rwsem, "iommufd_object::destroy_rwsem",
+		     &obj_keys[type]);
 	refcount_set(&obj->users, 1);
 
 	/*
-- 
2.40.0


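The fix gives each object type its own static lock_class_key, so lockdep treats the destroy_rwsem of different object types as distinct lock classes even though they share one init site. The indexing scheme can be sketched in userspace as follows (the enum values, struct, and helper here are illustrative stand-ins, not the kernel definitions):

```c
#include <assert.h>

enum obj_type { OBJ_IOAS, OBJ_HW_PAGETABLE, OBJ_DEVICE, OBJ_MAX };

/* Stand-in for the kernel's struct lock_class_key: only its address matters. */
struct lock_class_key { char dummy; };

/* One static key per object type, mirroring obj_keys[IOMMUFD_OBJ_MAX]. */
static struct lock_class_key obj_keys[OBJ_MAX];

/* Objects of the same type share a key; different types get distinct keys. */
static struct lock_class_key *key_for(enum obj_type t)
{
	return &obj_keys[t];
}
```

This is why IOMMUFD_OBJ_MAX is added to the enum: it sizes the static key array at compile time.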

* [PATCH v3 14/17] iommufd/selftest: Test iommufd_device_replace()
  2023-03-21 19:14 [PATCH v3 00/17] Add iommufd physical device operations for replace and alloc hwpt Jason Gunthorpe
                   ` (12 preceding siblings ...)
  2023-03-21 19:14 ` [PATCH v3 13/17] iommufd: Make destroy_rwsem use a lock class per object type Jason Gunthorpe
@ 2023-03-21 19:14 ` Jason Gunthorpe
  2023-03-23  7:57   ` Tian, Kevin
  2023-03-21 19:14 ` [PATCH v3 15/17] iommufd: Add IOMMU_HWPT_ALLOC Jason Gunthorpe
                   ` (3 subsequent siblings)
  17 siblings, 1 reply; 46+ messages in thread
From: Jason Gunthorpe @ 2023-03-21 19:14 UTC (permalink / raw)
  To: iommu, Kevin Tian, linux-kselftest; +Cc: Lu Baolu, kvm, Nicolin Chen, Yi Liu

From: Nicolin Chen <nicolinc@nvidia.com>

Allow the selftest to call the function on the mock idev, and add some
tests to exercise it.

Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 drivers/iommu/iommufd/iommufd_test.h          |  4 ++
 drivers/iommu/iommufd/selftest.c              | 39 +++++++++++++++++++
 include/linux/iommufd.h                       |  1 +
 tools/testing/selftests/iommu/iommufd.c       | 34 +++++++++++++++-
 .../selftests/iommu/iommufd_fail_nth.c        | 27 +++++++++++++
 tools/testing/selftests/iommu/iommufd_utils.h | 28 +++++++++++++
 6 files changed, 132 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/iommufd/iommufd_test.h b/drivers/iommu/iommufd/iommufd_test.h
index b3d69cca77295c..e3f1035cbd6464 100644
--- a/drivers/iommu/iommufd/iommufd_test.h
+++ b/drivers/iommu/iommufd/iommufd_test.h
@@ -17,6 +17,7 @@ enum {
 	IOMMU_TEST_OP_ACCESS_PAGES,
 	IOMMU_TEST_OP_ACCESS_RW,
 	IOMMU_TEST_OP_SET_TEMP_MEMORY_LIMIT,
+	IOMMU_TEST_OP_MOCK_DOMAIN_REPLACE,
 };
 
 enum {
@@ -52,6 +53,9 @@ struct iommu_test_cmd {
 			__u32 out_stdev_id;
 			__u32 out_hwpt_id;
 		} mock_domain;
+		struct {
+			__u32 pt_id;
+		} mock_domain_replace;
 		struct {
 			__aligned_u64 iova;
 			__aligned_u64 length;
diff --git a/drivers/iommu/iommufd/selftest.c b/drivers/iommu/iommufd/selftest.c
index 58471f9452be55..dd792294298ffd 100644
--- a/drivers/iommu/iommufd/selftest.c
+++ b/drivers/iommu/iommufd/selftest.c
@@ -455,6 +455,42 @@ static int iommufd_test_mock_domain(struct iommufd_ucmd *ucmd,
 	return rc;
 }
 
+/* Replace the mock domain with a manually allocated hw_pagetable */
+static int iommufd_test_mock_domain_replace(struct iommufd_ucmd *ucmd,
+					    unsigned int device_id, u32 pt_id,
+					    struct iommu_test_cmd *cmd)
+{
+	struct iommufd_object *dev_obj;
+	struct selftest_obj *sobj;
+	int rc;
+
+	/*
+	 * Prefer to use the OBJ_SELFTEST because the destroy_rwsem will ensure
+	 * it doesn't race with detach, which is not allowed.
+	 */
+	dev_obj =
+		iommufd_get_object(ucmd->ictx, device_id, IOMMUFD_OBJ_SELFTEST);
+	if (IS_ERR(dev_obj))
+		return PTR_ERR(dev_obj);
+
+	sobj = container_of(dev_obj, struct selftest_obj, obj);
+	if (sobj->type != TYPE_IDEV) {
+		rc = -EINVAL;
+		goto out_dev_obj;
+	}
+
+	rc = iommufd_device_replace(sobj->idev.idev, &pt_id);
+	if (rc)
+		goto out_dev_obj;
+
+	cmd->mock_domain_replace.pt_id = pt_id;
+	rc = iommufd_ucmd_respond(ucmd, sizeof(*cmd));
+
+out_dev_obj:
+	iommufd_put_object(dev_obj);
+	return rc;
+}
+
 /* Add an additional reserved IOVA to the IOAS */
 static int iommufd_test_add_reserved(struct iommufd_ucmd *ucmd,
 				     unsigned int mockpt_id,
@@ -944,6 +980,9 @@ int iommufd_test(struct iommufd_ucmd *ucmd)
 						 cmd->add_reserved.length);
 	case IOMMU_TEST_OP_MOCK_DOMAIN:
 		return iommufd_test_mock_domain(ucmd, cmd);
+	case IOMMU_TEST_OP_MOCK_DOMAIN_REPLACE:
+		return iommufd_test_mock_domain_replace(
+			ucmd, cmd->id, cmd->mock_domain_replace.pt_id, cmd);
 	case IOMMU_TEST_OP_MD_CHECK_MAP:
 		return iommufd_test_md_check_pa(
 			ucmd, cmd->id, cmd->check_map.iova,
diff --git a/include/linux/iommufd.h b/include/linux/iommufd.h
index c0b5b3ac34f1e0..3044a432a83e22 100644
--- a/include/linux/iommufd.h
+++ b/include/linux/iommufd.h
@@ -22,6 +22,7 @@ struct iommufd_device *iommufd_device_bind(struct iommufd_ctx *ictx,
 void iommufd_device_unbind(struct iommufd_device *idev);
 
 int iommufd_device_attach(struct iommufd_device *idev, u32 *pt_id);
+int iommufd_device_replace(struct iommufd_device *idev, u32 *pt_id);
 void iommufd_device_detach(struct iommufd_device *idev);
 
 struct iommufd_access_ops {
diff --git a/tools/testing/selftests/iommu/iommufd.c b/tools/testing/selftests/iommu/iommufd.c
index da0443ba16830f..77b0601fd13a71 100644
--- a/tools/testing/selftests/iommu/iommufd.c
+++ b/tools/testing/selftests/iommu/iommufd.c
@@ -1001,6 +1001,7 @@ FIXTURE(iommufd_mock_domain)
 	uint32_t ioas_id;
 	uint32_t hwpt_id;
 	uint32_t hwpt_ids[2];
+	uint32_t stdev_ids[2];
 	int mmap_flags;
 	size_t mmap_buf_size;
 };
@@ -1022,7 +1023,8 @@ FIXTURE_SETUP(iommufd_mock_domain)
 	ASSERT_GE(ARRAY_SIZE(self->hwpt_ids), variant->mock_domains);
 
 	for (i = 0; i != variant->mock_domains; i++)
-		test_cmd_mock_domain(self->ioas_id, NULL, &self->hwpt_ids[i]);
+		test_cmd_mock_domain(self->ioas_id, &self->stdev_ids[i],
+				     &self->hwpt_ids[i]);
 	self->hwpt_id = self->hwpt_ids[0];
 
 	self->mmap_flags = MAP_SHARED | MAP_ANONYMOUS;
@@ -1274,6 +1276,36 @@ TEST_F(iommufd_mock_domain, user_copy)
 	test_ioctl_destroy(ioas_id);
 }
 
+TEST_F(iommufd_mock_domain, replace)
+{
+	uint32_t ioas_id;
+
+	test_ioctl_ioas_alloc(&ioas_id);
+
+	test_cmd_mock_domain_replace(self->stdev_ids[0], ioas_id);
+
+	/*
+	 * Replacing the IOAS causes the prior HWPT to be deallocated, thus we
+	 * should get ENOENT when we try to use it.
+	 */
+	if (variant->mock_domains == 1)
+		test_err_mock_domain_replace(ENOENT, self->stdev_ids[0],
+					     self->hwpt_ids[0]);
+
+	test_cmd_mock_domain_replace(self->stdev_ids[0], ioas_id);
+	if (variant->mock_domains >= 2) {
+		test_cmd_mock_domain_replace(self->stdev_ids[0],
+					     self->hwpt_ids[1]);
+		test_cmd_mock_domain_replace(self->stdev_ids[0],
+					     self->hwpt_ids[1]);
+		test_cmd_mock_domain_replace(self->stdev_ids[0],
+					     self->hwpt_ids[0]);
+	}
+
+	test_cmd_mock_domain_replace(self->stdev_ids[0], self->ioas_id);
+	test_ioctl_destroy(ioas_id);
+}
+
 /* VFIO compatibility IOCTLs */
 
 TEST_F(iommufd, simple_ioctls)
diff --git a/tools/testing/selftests/iommu/iommufd_fail_nth.c b/tools/testing/selftests/iommu/iommufd_fail_nth.c
index d9afcb23810e1a..baaea15b355297 100644
--- a/tools/testing/selftests/iommu/iommufd_fail_nth.c
+++ b/tools/testing/selftests/iommu/iommufd_fail_nth.c
@@ -569,4 +569,31 @@ TEST_FAIL_NTH(basic_fail_nth, access_pin_domain)
 	return 0;
 }
 
+/* device.c */
+TEST_FAIL_NTH(basic_fail_nth, device)
+{
+	uint32_t ioas_id;
+	uint32_t ioas_id2;
+	uint32_t stdev_id;
+
+	self->fd = open("/dev/iommu", O_RDWR);
+	if (self->fd == -1)
+		return -1;
+
+	if (_test_ioctl_ioas_alloc(self->fd, &ioas_id))
+		return -1;
+
+	if (_test_ioctl_ioas_alloc(self->fd, &ioas_id2))
+		return -1;
+
+	fail_nth_enable();
+
+	if (_test_cmd_mock_domain(self->fd, ioas_id, &stdev_id, NULL))
+		return -1;
+
+	if (_test_cmd_mock_domain_replace(self->fd, stdev_id, ioas_id2, NULL))
+		return -1;
+	return 0;
+}
+
 TEST_HARNESS_MAIN
diff --git a/tools/testing/selftests/iommu/iommufd_utils.h b/tools/testing/selftests/iommu/iommufd_utils.h
index 85d6662ef8e867..3222f246600422 100644
--- a/tools/testing/selftests/iommu/iommufd_utils.h
+++ b/tools/testing/selftests/iommu/iommufd_utils.h
@@ -66,6 +66,34 @@ static int _test_cmd_mock_domain(int fd, unsigned int ioas_id, __u32 *stdev_id,
 	EXPECT_ERRNO(_errno, _test_cmd_mock_domain(self->fd, ioas_id, \
 						   stdev_id, hwpt_id))
 
+static int _test_cmd_mock_domain_replace(int fd, __u32 stdev_id, __u32 pt_id,
+					 __u32 *hwpt_id)
+{
+	struct iommu_test_cmd cmd = {
+		.size = sizeof(cmd),
+		.op = IOMMU_TEST_OP_MOCK_DOMAIN_REPLACE,
+		.id = stdev_id,
+		.mock_domain_replace = {
+			.pt_id = pt_id,
+		},
+	};
+	int ret;
+
+	ret = ioctl(fd, IOMMU_TEST_CMD, &cmd);
+	if (ret)
+		return ret;
+	if (hwpt_id)
+		*hwpt_id = cmd.mock_domain_replace.pt_id;
+	return 0;
+}
+
+#define test_cmd_mock_domain_replace(stdev_id, pt_id)                         \
+	ASSERT_EQ(0, _test_cmd_mock_domain_replace(self->fd, stdev_id, pt_id, \
+						   NULL))
+#define test_err_mock_domain_replace(_errno, stdev_id, pt_id)                  \
+	EXPECT_ERRNO(_errno, _test_cmd_mock_domain_replace(self->fd, stdev_id, \
+							   pt_id, NULL))
+
 static int _test_cmd_create_access(int fd, unsigned int ioas_id,
 				   __u32 *access_id, unsigned int flags)
 {
-- 
2.40.0



* [PATCH v3 15/17] iommufd: Add IOMMU_HWPT_ALLOC
  2023-03-21 19:14 [PATCH v3 00/17] Add iommufd physical device operations for replace and alloc hwpt Jason Gunthorpe
                   ` (13 preceding siblings ...)
  2023-03-21 19:14 ` [PATCH v3 14/17] iommufd/selftest: Test iommufd_device_replace() Jason Gunthorpe
@ 2023-03-21 19:14 ` Jason Gunthorpe
  2023-03-23  8:00   ` Tian, Kevin
  2023-03-21 19:14 ` [PATCH v3 16/17] iommufd/selftest: Return the real idev id from selftest mock_domain Jason Gunthorpe
                   ` (2 subsequent siblings)
  17 siblings, 1 reply; 46+ messages in thread
From: Jason Gunthorpe @ 2023-03-21 19:14 UTC (permalink / raw)
  To: iommu, Kevin Tian, linux-kselftest; +Cc: Lu Baolu, kvm, Nicolin Chen, Yi Liu

This allows userspace to manually create HWPTs on IOASes and then use
those HWPTs as inputs to iommufd_device_attach/replace().

A following series will extend this to allow creating iommu_domains with
driver specific parameters.

Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
 drivers/iommu/iommufd/hw_pagetable.c    | 46 +++++++++++++++++++++++++
 drivers/iommu/iommufd/iommufd_private.h |  9 +++++
 drivers/iommu/iommufd/main.c            |  3 ++
 include/uapi/linux/iommufd.h            | 26 ++++++++++++++
 4 files changed, 84 insertions(+)

diff --git a/drivers/iommu/iommufd/hw_pagetable.c b/drivers/iommu/iommufd/hw_pagetable.c
index 2584f9038b29a2..30c4c44a99b7a9 100644
--- a/drivers/iommu/iommufd/hw_pagetable.c
+++ b/drivers/iommu/iommufd/hw_pagetable.c
@@ -3,6 +3,7 @@
  * Copyright (c) 2021-2022, NVIDIA CORPORATION & AFFILIATES
  */
 #include <linux/iommu.h>
+#include <uapi/linux/iommufd.h>
 
 #include "iommufd_private.h"
 
@@ -121,3 +122,48 @@ iommufd_hw_pagetable_alloc(struct iommufd_ctx *ictx, struct iommufd_ioas *ioas,
 	iommufd_object_abort_and_destroy(ictx, &hwpt->obj);
 	return ERR_PTR(rc);
 }
+
+int iommufd_hwpt_alloc(struct iommufd_ucmd *ucmd)
+{
+	struct iommu_hwpt_alloc *cmd = ucmd->cmd;
+	struct iommufd_hw_pagetable *hwpt;
+	struct iommufd_device *idev;
+	struct iommufd_ioas *ioas;
+	int rc;
+
+	if (cmd->flags || cmd->__reserved)
+		return -EOPNOTSUPP;
+
+	idev = iommufd_get_device(ucmd, cmd->dev_id);
+	if (IS_ERR(idev))
+		return PTR_ERR(idev);
+
+	ioas = iommufd_get_ioas(ucmd, cmd->pt_id);
+	if (IS_ERR(ioas)) {
+		rc = PTR_ERR(ioas);
+		goto out_put_idev;
+	}
+
+	mutex_lock(&ioas->mutex);
+	hwpt = iommufd_hw_pagetable_alloc(ucmd->ictx, ioas, idev, false);
+	mutex_unlock(&ioas->mutex);
+	if (IS_ERR(hwpt)) {
+		rc = PTR_ERR(hwpt);
+		goto out_put_ioas;
+	}
+
+	cmd->out_hwpt_id = hwpt->obj.id;
+	rc = iommufd_ucmd_respond(ucmd, sizeof(*cmd));
+	if (rc)
+		goto out_hwpt;
+	iommufd_object_finalize(ucmd->ictx, &hwpt->obj);
+	goto out_put_ioas;
+
+out_hwpt:
+	iommufd_object_abort_and_destroy(ucmd->ictx, &hwpt->obj);
+out_put_ioas:
+	iommufd_put_object(&ioas->obj);
+out_put_idev:
+	iommufd_put_object(&idev->obj);
+	return rc;
+}
diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h
index f80b012e1bc200..cb693190bf51c5 100644
--- a/drivers/iommu/iommufd/iommufd_private.h
+++ b/drivers/iommu/iommufd/iommufd_private.h
@@ -261,6 +261,7 @@ int iommufd_hw_pagetable_attach(struct iommufd_hw_pagetable *hwpt,
 struct iommufd_hw_pagetable *
 iommufd_hw_pagetable_detach(struct iommufd_device *idev);
 void iommufd_hw_pagetable_destroy(struct iommufd_object *obj);
+int iommufd_hwpt_alloc(struct iommufd_ucmd *ucmd);
 
 static inline void iommufd_hw_pagetable_put(struct iommufd_ctx *ictx,
 					    struct iommufd_hw_pagetable *hwpt)
@@ -297,6 +298,14 @@ struct iommufd_device {
 	bool enforce_cache_coherency;
 };
 
+static inline struct iommufd_device *
+iommufd_get_device(struct iommufd_ucmd *ucmd, u32 id)
+{
+	return container_of(iommufd_get_object(ucmd->ictx, id,
+					       IOMMUFD_OBJ_DEVICE),
+			    struct iommufd_device, obj);
+}
+
 void iommufd_device_destroy(struct iommufd_object *obj);
 
 struct iommufd_access {
diff --git a/drivers/iommu/iommufd/main.c b/drivers/iommu/iommufd/main.c
index 9cba592d0482e7..694da191e4b155 100644
--- a/drivers/iommu/iommufd/main.c
+++ b/drivers/iommu/iommufd/main.c
@@ -261,6 +261,7 @@ static int iommufd_option(struct iommufd_ucmd *ucmd)
 
 union ucmd_buffer {
 	struct iommu_destroy destroy;
+	struct iommu_hwpt_alloc hwpt;
 	struct iommu_ioas_alloc alloc;
 	struct iommu_ioas_allow_iovas allow_iovas;
 	struct iommu_ioas_copy ioas_copy;
@@ -292,6 +293,8 @@ struct iommufd_ioctl_op {
 	}
 static const struct iommufd_ioctl_op iommufd_ioctl_ops[] = {
 	IOCTL_OP(IOMMU_DESTROY, iommufd_destroy, struct iommu_destroy, id),
+	IOCTL_OP(IOMMU_HWPT_ALLOC, iommufd_hwpt_alloc, struct iommu_hwpt_alloc,
+		 __reserved),
 	IOCTL_OP(IOMMU_IOAS_ALLOC, iommufd_ioas_alloc_ioctl,
 		 struct iommu_ioas_alloc, out_ioas_id),
 	IOCTL_OP(IOMMU_IOAS_ALLOW_IOVAS, iommufd_ioas_allow_iovas,
diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h
index 98ebba80cfa1fc..8245c01adca673 100644
--- a/include/uapi/linux/iommufd.h
+++ b/include/uapi/linux/iommufd.h
@@ -45,6 +45,7 @@ enum {
 	IOMMUFD_CMD_IOAS_UNMAP,
 	IOMMUFD_CMD_OPTION,
 	IOMMUFD_CMD_VFIO_IOAS,
+	IOMMUFD_CMD_HWPT_ALLOC,
 };
 
 /**
@@ -344,4 +345,29 @@ struct iommu_vfio_ioas {
 	__u16 __reserved;
 };
 #define IOMMU_VFIO_IOAS _IO(IOMMUFD_TYPE, IOMMUFD_CMD_VFIO_IOAS)
+
+/**
+ * struct iommu_hwpt_alloc - ioctl(IOMMU_HWPT_ALLOC)
+ * @size: sizeof(struct iommu_hwpt_alloc)
+ * @flags: Must be 0
+ * @dev_id: The device to allocate this HWPT for
+ * @pt_id: The IOAS to connect this HWPT to
+ * @out_hwpt_id: The ID of the new HWPT
+ * @__reserved: Must be 0
+ *
+ * Explicitly allocate a hardware page table object. This is the same object
+ * type that is returned by iommufd_device_attach() and represents the
+ * underlying iommu driver's iommu_domain kernel object.
+ *
+ * A HWPT will be created with the IOVA mappings from the given IOAS.
+ */
+struct iommu_hwpt_alloc {
+	__u32 size;
+	__u32 flags;
+	__u32 dev_id;
+	__u32 pt_id;
+	__u32 out_hwpt_id;
+	__u32 __reserved;
+};
+#define IOMMU_HWPT_ALLOC _IO(IOMMUFD_TYPE, IOMMUFD_CMD_HWPT_ALLOC)
 #endif
-- 
2.40.0


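The new uapi struct is six __u32 fields, so it is 24 bytes with no implicit padding, and userspace is expected to fill @size with sizeof() the struct. A quick userspace check of that layout, using a local mirror of the struct rather than the installed uapi header:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local mirror of struct iommu_hwpt_alloc from the patch, for layout checks. */
struct iommu_hwpt_alloc {
	uint32_t size;
	uint32_t flags;
	uint32_t dev_id;
	uint32_t pt_id;
	uint32_t out_hwpt_id;
	uint32_t __reserved;
};
```

Since every member is a naturally aligned 32-bit field, the layout is identical on 32-bit and 64-bit userspace, which is what makes the ioctl compat-safe without a translation layer.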

* [PATCH v3 16/17] iommufd/selftest: Return the real idev id from selftest mock_domain
  2023-03-21 19:14 [PATCH v3 00/17] Add iommufd physical device operations for replace and alloc hwpt Jason Gunthorpe
                   ` (14 preceding siblings ...)
  2023-03-21 19:14 ` [PATCH v3 15/17] iommufd: Add IOMMU_HWPT_ALLOC Jason Gunthorpe
@ 2023-03-21 19:14 ` Jason Gunthorpe
  2023-03-23  8:02   ` Tian, Kevin
  2023-03-21 19:14 ` [PATCH v3 17/17] iommufd/selftest: Add a selftest for IOMMU_HWPT_ALLOC Jason Gunthorpe
  2023-03-23  8:04 ` [PATCH v3 00/17] Add iommufd physical device operations for replace and alloc hwpt Tian, Kevin
  17 siblings, 1 reply; 46+ messages in thread
From: Jason Gunthorpe @ 2023-03-21 19:14 UTC (permalink / raw)
  To: iommu, Kevin Tian, linux-kselftest; +Cc: Lu Baolu, kvm, Nicolin Chen, Yi Liu

Now that we actually call iommufd_device_bind(), we can return the
idev_id from that function to userspace for use in other APIs.

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
 drivers/iommu/iommufd/iommufd_test.h           |  2 ++
 drivers/iommu/iommufd/selftest.c               |  1 +
 tools/testing/selftests/iommu/iommufd.c        | 17 +++++++++--------
 .../testing/selftests/iommu/iommufd_fail_nth.c | 18 ++++++++++--------
 tools/testing/selftests/iommu/iommufd_utils.h  | 12 +++++++-----
 5 files changed, 29 insertions(+), 21 deletions(-)

diff --git a/drivers/iommu/iommufd/iommufd_test.h b/drivers/iommu/iommufd/iommufd_test.h
index e3f1035cbd6464..dd9168a20ddf4c 100644
--- a/drivers/iommu/iommufd/iommufd_test.h
+++ b/drivers/iommu/iommufd/iommufd_test.h
@@ -52,6 +52,8 @@ struct iommu_test_cmd {
 		struct {
 			__u32 out_stdev_id;
 			__u32 out_hwpt_id;
+			/* out_idev_id is the standard iommufd_bind object */
+			__u32 out_idev_id;
 		} mock_domain;
 		struct {
 			__u32 pt_id;
diff --git a/drivers/iommu/iommufd/selftest.c b/drivers/iommu/iommufd/selftest.c
index dd792294298ffd..a9c22560f51aa9 100644
--- a/drivers/iommu/iommufd/selftest.c
+++ b/drivers/iommu/iommufd/selftest.c
@@ -443,6 +443,7 @@ static int iommufd_test_mock_domain(struct iommufd_ucmd *ucmd,
 	/* Userspace must destroy the device_id to destroy the object */
 	cmd->mock_domain.out_hwpt_id = pt_id;
 	cmd->mock_domain.out_stdev_id = sobj->obj.id;
+	cmd->mock_domain.out_idev_id = idev_id;
 	iommufd_object_finalize(ucmd->ictx, &sobj->obj);
 	return iommufd_ucmd_respond(ucmd, sizeof(*cmd));
 
diff --git a/tools/testing/selftests/iommu/iommufd.c b/tools/testing/selftests/iommu/iommufd.c
index 77b0601fd13a71..83c98a371f9e7a 100644
--- a/tools/testing/selftests/iommu/iommufd.c
+++ b/tools/testing/selftests/iommu/iommufd.c
@@ -214,7 +214,7 @@ FIXTURE_SETUP(iommufd_ioas)
 
 	for (i = 0; i != variant->mock_domains; i++) {
 		test_cmd_mock_domain(self->ioas_id, &self->stdev_id,
-				     &self->hwpt_id);
+				     &self->hwpt_id, NULL);
 		self->base_iova = MOCK_APERTURE_START;
 	}
 }
@@ -265,7 +265,7 @@ TEST_F(iommufd_ioas, hwpt_attach)
 {
 	/* Create a device attached directly to a hwpt */
 	if (self->stdev_id) {
-		test_cmd_mock_domain(self->hwpt_id, NULL, NULL);
+		test_cmd_mock_domain(self->hwpt_id, NULL, NULL, NULL);
 	} else {
 		test_err_mock_domain(ENOENT, self->hwpt_id, NULL, NULL);
 	}
@@ -650,7 +650,7 @@ TEST_F(iommufd_ioas, access_pin)
 				   _IOMMU_TEST_CMD(IOMMU_TEST_OP_ACCESS_PAGES),
 				   &access_cmd));
 		test_cmd_mock_domain(self->ioas_id, &mock_stdev_id,
-				     &mock_hwpt_id);
+				     &mock_hwpt_id, NULL);
 		check_map_cmd.id = mock_hwpt_id;
 		ASSERT_EQ(0, ioctl(self->fd,
 				   _IOMMU_TEST_CMD(IOMMU_TEST_OP_MD_CHECK_MAP),
@@ -805,7 +805,7 @@ TEST_F(iommufd_ioas, fork_gone)
 		 * If a domain already existed then everything was pinned within
 		 * the fork, so this copies from one domain to another.
 		 */
-		test_cmd_mock_domain(self->ioas_id, NULL, NULL);
+		test_cmd_mock_domain(self->ioas_id, NULL, NULL, NULL);
 		check_access_rw(_metadata, self->fd, access_id,
 				MOCK_APERTURE_START, 0);
 
@@ -854,7 +854,7 @@ TEST_F(iommufd_ioas, fork_present)
 	ASSERT_EQ(8, read(efd, &tmp, sizeof(tmp)));
 
 	/* Read pages from the remote process */
-	test_cmd_mock_domain(self->ioas_id, NULL, NULL);
+	test_cmd_mock_domain(self->ioas_id, NULL, NULL, NULL);
 	check_access_rw(_metadata, self->fd, access_id, MOCK_APERTURE_START, 0);
 
 	ASSERT_EQ(0, close(pipefds[1]));
@@ -1002,6 +1002,7 @@ FIXTURE(iommufd_mock_domain)
 	uint32_t hwpt_id;
 	uint32_t hwpt_ids[2];
 	uint32_t stdev_ids[2];
+	uint32_t idev_ids[2];
 	int mmap_flags;
 	size_t mmap_buf_size;
 };
@@ -1024,7 +1025,7 @@ FIXTURE_SETUP(iommufd_mock_domain)
 
 	for (i = 0; i != variant->mock_domains; i++)
 		test_cmd_mock_domain(self->ioas_id, &self->stdev_ids[i],
-				     &self->hwpt_ids[i]);
+				     &self->hwpt_ids[i], &self->idev_ids[i]);
 	self->hwpt_id = self->hwpt_ids[0];
 
 	self->mmap_flags = MAP_SHARED | MAP_ANONYMOUS;
@@ -1218,7 +1219,7 @@ TEST_F(iommufd_mock_domain, all_aligns_copy)
 			/* Add and destroy a domain while the area exists */
 			old_id = self->hwpt_ids[1];
 			test_cmd_mock_domain(self->ioas_id, &mock_stdev_id,
-					     &self->hwpt_ids[1]);
+					     &self->hwpt_ids[1], NULL);
 
 			check_mock_iova(buf + start, iova, length);
 			check_refs(buf + start / PAGE_SIZE * PAGE_SIZE,
@@ -1427,7 +1428,7 @@ FIXTURE_SETUP(vfio_compat_mock_domain)
 
 	/* Create what VFIO would consider a group */
 	test_ioctl_ioas_alloc(&self->ioas_id);
-	test_cmd_mock_domain(self->ioas_id, NULL, NULL);
+	test_cmd_mock_domain(self->ioas_id, NULL, NULL, NULL);
 
 	/* Attach it to the vfio compat */
 	vfio_ioas_cmd.ioas_id = self->ioas_id;
diff --git a/tools/testing/selftests/iommu/iommufd_fail_nth.c b/tools/testing/selftests/iommu/iommufd_fail_nth.c
index baaea15b355297..7a3d149019da92 100644
--- a/tools/testing/selftests/iommu/iommufd_fail_nth.c
+++ b/tools/testing/selftests/iommu/iommufd_fail_nth.c
@@ -313,7 +313,7 @@ TEST_FAIL_NTH(basic_fail_nth, map_domain)
 
 	fail_nth_enable();
 
-	if (_test_cmd_mock_domain(self->fd, ioas_id, &stdev_id, &hwpt_id))
+	if (_test_cmd_mock_domain(self->fd, ioas_id, &stdev_id, &hwpt_id, NULL))
 		return -1;
 
 	if (_test_ioctl_ioas_map(self->fd, ioas_id, buffer, 262144, &iova,
@@ -324,7 +324,7 @@ TEST_FAIL_NTH(basic_fail_nth, map_domain)
 	if (_test_ioctl_destroy(self->fd, stdev_id))
 		return -1;
 
-	if (_test_cmd_mock_domain(self->fd, ioas_id, &stdev_id, &hwpt_id))
+	if (_test_cmd_mock_domain(self->fd, ioas_id, &stdev_id, &hwpt_id, NULL))
 		return -1;
 	return 0;
 }
@@ -348,12 +348,13 @@ TEST_FAIL_NTH(basic_fail_nth, map_two_domains)
 	if (_test_ioctl_set_temp_memory_limit(self->fd, 32))
 		return -1;
 
-	if (_test_cmd_mock_domain(self->fd, ioas_id, &stdev_id, &hwpt_id))
+	if (_test_cmd_mock_domain(self->fd, ioas_id, &stdev_id, &hwpt_id, NULL))
 		return -1;
 
 	fail_nth_enable();
 
-	if (_test_cmd_mock_domain(self->fd, ioas_id, &stdev_id2, &hwpt_id2))
+	if (_test_cmd_mock_domain(self->fd, ioas_id, &stdev_id2, &hwpt_id2,
+				  NULL))
 		return -1;
 
 	if (_test_ioctl_ioas_map(self->fd, ioas_id, buffer, 262144, &iova,
@@ -367,9 +368,10 @@ TEST_FAIL_NTH(basic_fail_nth, map_two_domains)
 	if (_test_ioctl_destroy(self->fd, stdev_id2))
 		return -1;
 
-	if (_test_cmd_mock_domain(self->fd, ioas_id, &stdev_id, &hwpt_id))
+	if (_test_cmd_mock_domain(self->fd, ioas_id, &stdev_id, &hwpt_id, NULL))
 		return -1;
-	if (_test_cmd_mock_domain(self->fd, ioas_id, &stdev_id2, &hwpt_id2))
+	if (_test_cmd_mock_domain(self->fd, ioas_id, &stdev_id2, &hwpt_id2,
+				  NULL))
 		return -1;
 	return 0;
 }
@@ -526,7 +528,7 @@ TEST_FAIL_NTH(basic_fail_nth, access_pin_domain)
 	if (_test_ioctl_set_temp_memory_limit(self->fd, 32))
 		return -1;
 
-	if (_test_cmd_mock_domain(self->fd, ioas_id, &stdev_id, &hwpt_id))
+	if (_test_cmd_mock_domain(self->fd, ioas_id, &stdev_id, &hwpt_id, NULL))
 		return -1;
 
 	if (_test_ioctl_ioas_map(self->fd, ioas_id, buffer, BUFFER_SIZE, &iova,
@@ -588,7 +590,7 @@ TEST_FAIL_NTH(basic_fail_nth, device)
 
 	fail_nth_enable();
 
-	if (_test_cmd_mock_domain(self->fd, ioas_id, &stdev_id, NULL))
+	if (_test_cmd_mock_domain(self->fd, ioas_id, &stdev_id, NULL, NULL))
 		return -1;
 
 	if (_test_cmd_mock_domain_replace(self->fd, stdev_id, ioas_id2, NULL))
diff --git a/tools/testing/selftests/iommu/iommufd_utils.h b/tools/testing/selftests/iommu/iommufd_utils.h
index 3222f246600422..e67a929a5c87d3 100644
--- a/tools/testing/selftests/iommu/iommufd_utils.h
+++ b/tools/testing/selftests/iommu/iommufd_utils.h
@@ -39,7 +39,7 @@ static unsigned long BUFFER_SIZE;
 	})
 
 static int _test_cmd_mock_domain(int fd, unsigned int ioas_id, __u32 *stdev_id,
-				 __u32 *hwpt_id)
+				 __u32 *hwpt_id, __u32 *idev_id)
 {
 	struct iommu_test_cmd cmd = {
 		.size = sizeof(cmd),
@@ -57,14 +57,16 @@ static int _test_cmd_mock_domain(int fd, unsigned int ioas_id, __u32 *stdev_id,
 	assert(cmd.id != 0);
 	if (hwpt_id)
 		*hwpt_id = cmd.mock_domain.out_hwpt_id;
+	if (idev_id)
+		*idev_id = cmd.mock_domain.out_idev_id;
 	return 0;
 }
-#define test_cmd_mock_domain(ioas_id, stdev_id, hwpt_id) \
-	ASSERT_EQ(0,                                     \
-		  _test_cmd_mock_domain(self->fd, ioas_id, stdev_id, hwpt_id))
+#define test_cmd_mock_domain(ioas_id, stdev_id, hwpt_id, idev_id)       \
+	ASSERT_EQ(0, _test_cmd_mock_domain(self->fd, ioas_id, stdev_id, \
+					   hwpt_id, idev_id))
 #define test_err_mock_domain(_errno, ioas_id, stdev_id, hwpt_id)      \
 	EXPECT_ERRNO(_errno, _test_cmd_mock_domain(self->fd, ioas_id, \
-						   stdev_id, hwpt_id))
+						   stdev_id, hwpt_id, NULL))
 
 static int _test_cmd_mock_domain_replace(int fd, __u32 stdev_id, __u32 pt_id,
 					 __u32 *hwpt_id)
-- 
2.40.0


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH v3 17/17] iommufd/selftest: Add a selftest for IOMMU_HWPT_ALLOC
  2023-03-21 19:14 [PATCH v3 00/17] Add iommufd physical device operations for replace and alloc hwpt Jason Gunthorpe
                   ` (15 preceding siblings ...)
  2023-03-21 19:14 ` [PATCH v3 16/17] iommufd/selftest: Return the real idev id from selftest mock_domain Jason Gunthorpe
@ 2023-03-21 19:14 ` Jason Gunthorpe
  2023-03-23  8:03   ` Tian, Kevin
  2023-03-23  8:04 ` [PATCH v3 00/17] Add iommufd physical device operations for replace and alloc hwpt Tian, Kevin
  17 siblings, 1 reply; 46+ messages in thread
From: Jason Gunthorpe @ 2023-03-21 19:14 UTC (permalink / raw)
  To: iommu, Kevin Tian, linux-kselftest; +Cc: Lu Baolu, kvm, Nicolin Chen, Yi Liu

Test the basic flow.

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
 tools/testing/selftests/iommu/iommufd.c       | 15 +++++++++++++
 .../selftests/iommu/iommufd_fail_nth.c        | 11 +++++++++-
 tools/testing/selftests/iommu/iommufd_utils.h | 21 +++++++++++++++++++
 3 files changed, 46 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/iommu/iommufd.c b/tools/testing/selftests/iommu/iommufd.c
index 83c98a371f9e7a..c07252dbf62d72 100644
--- a/tools/testing/selftests/iommu/iommufd.c
+++ b/tools/testing/selftests/iommu/iommufd.c
@@ -1307,6 +1307,21 @@ TEST_F(iommufd_mock_domain, replace)
 	test_ioctl_destroy(ioas_id);
 }
 
+TEST_F(iommufd_mock_domain, alloc_hwpt)
+{
+	int i;
+
+	for (i = 0; i != variant->mock_domains; i++) {
+		uint32_t stddev_id;
+		uint32_t hwpt_id;
+
+		test_cmd_hwpt_alloc(self->idev_ids[0], self->ioas_id, &hwpt_id);
+		test_cmd_mock_domain(hwpt_id, &stddev_id, NULL, NULL);
+		test_ioctl_destroy(stddev_id);
+		test_ioctl_destroy(hwpt_id);
+	}
+}
+
 /* VFIO compatibility IOCTLs */
 
 TEST_F(iommufd, simple_ioctls)
diff --git a/tools/testing/selftests/iommu/iommufd_fail_nth.c b/tools/testing/selftests/iommu/iommufd_fail_nth.c
index 7a3d149019da92..7e1afb6ff9bd8d 100644
--- a/tools/testing/selftests/iommu/iommufd_fail_nth.c
+++ b/tools/testing/selftests/iommu/iommufd_fail_nth.c
@@ -577,6 +577,8 @@ TEST_FAIL_NTH(basic_fail_nth, device)
 	uint32_t ioas_id;
 	uint32_t ioas_id2;
 	uint32_t stdev_id;
+	uint32_t idev_id;
+	uint32_t hwpt_id;
 
 	self->fd = open("/dev/iommu", O_RDWR);
 	if (self->fd == -1)
@@ -590,11 +592,18 @@ TEST_FAIL_NTH(basic_fail_nth, device)
 
 	fail_nth_enable();
 
-	if (_test_cmd_mock_domain(self->fd, ioas_id, &stdev_id, NULL, NULL))
+	if (_test_cmd_mock_domain(self->fd, ioas_id, &stdev_id, NULL,
+				  &idev_id))
+		return -1;
+
+	if (_test_cmd_hwpt_alloc(self->fd, idev_id, ioas_id, &hwpt_id))
 		return -1;
 
 	if (_test_cmd_mock_domain_replace(self->fd, stdev_id, ioas_id2, NULL))
 		return -1;
+
+	if (_test_cmd_mock_domain_replace(self->fd, stdev_id, hwpt_id, NULL))
+		return -1;
 	return 0;
 }
 
diff --git a/tools/testing/selftests/iommu/iommufd_utils.h b/tools/testing/selftests/iommu/iommufd_utils.h
index e67a929a5c87d3..9b6dcb921750b6 100644
--- a/tools/testing/selftests/iommu/iommufd_utils.h
+++ b/tools/testing/selftests/iommu/iommufd_utils.h
@@ -96,6 +96,27 @@ static int _test_cmd_mock_domain_replace(int fd, __u32 stdev_id, __u32 pt_id,
 	EXPECT_ERRNO(_errno, _test_cmd_mock_domain_replace(self->fd, stdev_id, \
 							   pt_id, NULL))
 
+static int _test_cmd_hwpt_alloc(int fd, __u32 device_id, __u32 pt_id,
+					 __u32 *hwpt_id)
+{
+	struct iommu_hwpt_alloc cmd = {
+		.size = sizeof(cmd),
+		.dev_id = device_id,
+		.pt_id = pt_id,
+	};
+	int ret;
+
+	ret = ioctl(fd, IOMMU_HWPT_ALLOC, &cmd);
+	if (ret)
+		return ret;
+	if (hwpt_id)
+		*hwpt_id = cmd.out_hwpt_id;
+	return 0;
+}
+
+#define test_cmd_hwpt_alloc(device_id, pt_id, hwpt_id) \
+	ASSERT_EQ(0, _test_cmd_hwpt_alloc(self->fd, device_id, pt_id, hwpt_id))
+
 static int _test_cmd_create_access(int fd, unsigned int ioas_id,
 				   __u32 *access_id, unsigned int flags)
 {
-- 
2.40.0



* RE: [PATCH v3 03/17] iommufd: Replace the hwpt->devices list with iommufd_group
  2023-03-21 19:14 ` [PATCH v3 03/17] iommufd: Replace the hwpt->devices list with iommufd_group Jason Gunthorpe
@ 2023-03-23  7:21   ` Tian, Kevin
  2023-03-23 14:23     ` Jason Gunthorpe
  2023-04-11 14:31     ` Jason Gunthorpe
  0 siblings, 2 replies; 46+ messages in thread
From: Tian, Kevin @ 2023-03-23  7:21 UTC (permalink / raw)
  To: Jason Gunthorpe, iommu, linux-kselftest
  Cc: Lu Baolu, kvm, Nicolin Chen, Liu, Yi L

> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Wednesday, March 22, 2023 3:15 AM
> 
>  	/*
> -	 * FIXME: Hack around missing a device-centric iommu api, only
> attach to
> -	 * the group once for the first device that is in the group.
> +	 * Only attach to the group once for the first device that is in the
> +	 * group. All the other devices will follow this attachment. The user
> +	 * should attach every device individually to as the per-device
> reserved

"individually to the hwpt"

> 
> -	mutex_lock(&hwpt->devices_lock);
> +	mutex_lock(&idev->igroup->lock);
> 
>  	/*
>  	 * immediate_attach exists only to accommodate iommu drivers that
> cannot
>  	 * directly allocate a domain. These drivers do not finish creating the
>  	 * domain until attach is completed. Thus we must have this call
>  	 * sequence. Once those drivers are fixed this should be removed.
> +	 *
> +	 * Note we hold the igroup->lock here which prevents any other
> thread
> +	 * from observing igroup->hwpt until we finish setting it up.
>  	 */
>  	if (immediate_attach) {
>  		rc = iommufd_hw_pagetable_attach(hwpt, idev);

I thought about whether holding igroup->lock is necessary here.

The caller should avoid racing attach/detach/replace on the same device.

Then the only possible race would come from other devices in the group.

But if the above attach call succeeds it implies that no other device in
the group has been attached yet, so the only allowed operation is to
attach them to the same ioas as the calling device (replace cannot happen
w/o attach first). Then it's already protected by ioas->mutex in
iommufd_device_auto_get_domain().

If there is no oversight then we can directly put the lock in
iommufd_hw_pagetable_attach/detach(), which also simplifies its
callers in device.c a bit.



* RE: [PATCH v3 10/17] iommufd: Reorganize iommufd_device_attach into iommufd_device_change_pt
  2023-03-21 19:14 ` [PATCH v3 10/17] iommufd: Reorganize iommufd_device_attach into iommufd_device_change_pt Jason Gunthorpe
@ 2023-03-23  7:25   ` Tian, Kevin
  2023-03-23 14:26     ` Jason Gunthorpe
  0 siblings, 1 reply; 46+ messages in thread
From: Tian, Kevin @ 2023-03-23  7:25 UTC (permalink / raw)
  To: Jason Gunthorpe, iommu, linux-kselftest
  Cc: Lu Baolu, kvm, Nicolin Chen, Liu, Yi L

> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Wednesday, March 22, 2023 3:15 AM
> +int iommufd_device_attach(struct iommufd_device *idev, u32 *pt_id)
> +{
> +	int rc;
> +
> +	rc = iommufd_device_change_pt(idev, pt_id,
> &iommufd_device_do_attach);
> +	if (rc)
> +		return rc;
> +
> +	/*
> +	 * Pairs with iommufd_device_detach() - catches caller bugs
> attempting
> +	 * to destroy a device with an attachment.
> +	 */
> +	refcount_inc(&idev->obj.users);
> +	return 0;

While it's the right thing to do, I'm not sure what "catch caller bugs"
means here. There is no WARN_ON to catch.

Otherwise it's good.

Reviewed-by: Kevin Tian <kevin.tian@intel.com>


* RE: [PATCH v3 12/17] iommufd: Add iommufd_device_replace()
  2023-03-21 19:14 ` [PATCH v3 12/17] iommufd: Add iommufd_device_replace() Jason Gunthorpe
@ 2023-03-23  7:31   ` Tian, Kevin
  2023-03-23 14:30     ` Jason Gunthorpe
  0 siblings, 1 reply; 46+ messages in thread
From: Tian, Kevin @ 2023-03-23  7:31 UTC (permalink / raw)
  To: Jason Gunthorpe, iommu, linux-kselftest
  Cc: Lu Baolu, kvm, Nicolin Chen, Liu, Yi L

> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Wednesday, March 22, 2023 3:15 AM
> +
> +	mutex_lock(&idev->igroup->lock);
> +
> +	if (igroup->hwpt == NULL) {
> +		rc = -EINVAL;
> +		goto err_unlock;
> +	}
> +
> +	if (hwpt == igroup->hwpt) {
> +		mutex_unlock(&idev->igroup->lock);
> +		return NULL;
> +	}

goto err_unlock;

> +
> +	/* Move the refcounts held by the device_list to the new hwpt */
> +	refcount_add(num_devices, &hwpt->obj.users);
> +	if (num_devices > 1)
> +		WARN_ON(refcount_sub_and_test(num_devices - 1,
> +					      &old_hwpt->obj.users));

A comment is welcomed to match "caller must destroy old_hwpt".

Otherwise looks good.

Reviewed-by: Kevin Tian <kevin.tian@intel.com>


* RE: [PATCH v3 13/17] iommufd: Make destroy_rwsem use a lock class per object type
  2023-03-21 19:14 ` [PATCH v3 13/17] iommufd: Make destroy_rwsem use a lock class per object type Jason Gunthorpe
@ 2023-03-23  7:54   ` Tian, Kevin
  0 siblings, 0 replies; 46+ messages in thread
From: Tian, Kevin @ 2023-03-23  7:54 UTC (permalink / raw)
  To: Jason Gunthorpe, iommu, linux-kselftest
  Cc: Lu Baolu, kvm, Nicolin Chen, Liu, Yi L

> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Wednesday, March 22, 2023 3:15 AM
> 
> The selftest invokes things like replace under the object lock of its
> idev which protects the idev in a similar way to a real user.
> Unfortunately this triggers lockdep. A lock class per type will solve the
> problem.
> 
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

Reviewed-by: Kevin Tian <kevin.tian@intel.com>


* RE: [PATCH v3 14/17] iommufd/selftest: Test iommufd_device_replace()
  2023-03-21 19:14 ` [PATCH v3 14/17] iommufd/selftest: Test iommufd_device_replace() Jason Gunthorpe
@ 2023-03-23  7:57   ` Tian, Kevin
  2023-03-23 14:32     ` Jason Gunthorpe
  0 siblings, 1 reply; 46+ messages in thread
From: Tian, Kevin @ 2023-03-23  7:57 UTC (permalink / raw)
  To: Jason Gunthorpe, iommu, linux-kselftest
  Cc: Lu Baolu, kvm, Nicolin Chen, Liu, Yi L

> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Wednesday, March 22, 2023 3:15 AM
> 
> From: Nicolin Chen <nicolinc@nvidia.com>
> 
> Allow the selftest to call the function on the mock idev, add some tests
> to exercise it.
> 
> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
> Signed-off-by: Yi Liu <yi.l.liu@intel.com>

Missing your s-o-b

> +
> +	test_cmd_mock_domain_replace(self->stdev_ids[0], ioas_id);
> +	if (variant->mock_domains >= 2) {
> +		test_cmd_mock_domain_replace(self->stdev_ids[0],
> +					     self->hwpt_ids[1]);
> +		test_cmd_mock_domain_replace(self->stdev_ids[0],
> +					     self->hwpt_ids[1]);

Duplicated or intended?


* RE: [PATCH v3 15/17] iommufd: Add IOMMU_HWPT_ALLOC
  2023-03-21 19:14 ` [PATCH v3 15/17] iommufd: Add IOMMU_HWPT_ALLOC Jason Gunthorpe
@ 2023-03-23  8:00   ` Tian, Kevin
  0 siblings, 0 replies; 46+ messages in thread
From: Tian, Kevin @ 2023-03-23  8:00 UTC (permalink / raw)
  To: Jason Gunthorpe, iommu, linux-kselftest
  Cc: Lu Baolu, kvm, Nicolin Chen, Liu, Yi L

> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Wednesday, March 22, 2023 3:15 AM
> 
> This allows userspace to manually create HWPTs on IOAS's and then use
> those HWPTs as inputs to iommufd_device_attach/replace().
> 
> Following series will extend this to allow creating iommu_domains with
> driver specific parameters.
> 
> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

Reviewed-by: Kevin Tian <kevin.tian@intel.com>


* RE: [PATCH v3 16/17] iommufd/selftest: Return the real idev id from selftest mock_domain
  2023-03-21 19:14 ` [PATCH v3 16/17] iommufd/selftest: Return the real idev id from selftest mock_domain Jason Gunthorpe
@ 2023-03-23  8:02   ` Tian, Kevin
  0 siblings, 0 replies; 46+ messages in thread
From: Tian, Kevin @ 2023-03-23  8:02 UTC (permalink / raw)
  To: Jason Gunthorpe, iommu, linux-kselftest
  Cc: Lu Baolu, kvm, Nicolin Chen, Liu, Yi L

> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Wednesday, March 22, 2023 3:15 AM
> 
> Now that we actually call iommufd_device_bind() we can return the
> idev_id from that function to userspace for use in other APIs.
> 
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

Reviewed-by: Kevin Tian <kevin.tian@intel.com>


* RE: [PATCH v3 17/17] iommufd/selftest: Add a selftest for IOMMU_HWPT_ALLOC
  2023-03-21 19:14 ` [PATCH v3 17/17] iommufd/selftest: Add a selftest for IOMMU_HWPT_ALLOC Jason Gunthorpe
@ 2023-03-23  8:03   ` Tian, Kevin
  0 siblings, 0 replies; 46+ messages in thread
From: Tian, Kevin @ 2023-03-23  8:03 UTC (permalink / raw)
  To: Jason Gunthorpe, iommu, linux-kselftest
  Cc: Lu Baolu, kvm, Nicolin Chen, Liu, Yi L

> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Wednesday, March 22, 2023 3:15 AM
> 
> Test the basic flow.
> 
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

Reviewed-by: Kevin Tian <kevin.tian@intel.com>


* RE: [PATCH v3 00/17] Add iommufd physical device operations for replace and alloc hwpt
  2023-03-21 19:14 [PATCH v3 00/17] Add iommufd physical device operations for replace and alloc hwpt Jason Gunthorpe
                   ` (16 preceding siblings ...)
  2023-03-21 19:14 ` [PATCH v3 17/17] iommufd/selftest: Add a selftest for IOMMU_HWPT_ALLOC Jason Gunthorpe
@ 2023-03-23  8:04 ` Tian, Kevin
  2023-03-23 14:35   ` Jason Gunthorpe
  17 siblings, 1 reply; 46+ messages in thread
From: Tian, Kevin @ 2023-03-23  8:04 UTC (permalink / raw)
  To: Jason Gunthorpe, iommu, linux-kselftest
  Cc: Lu Baolu, kvm, Nicolin Chen, Liu, Yi L

> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Wednesday, March 22, 2023 3:15 AM
> 
> The implementation is complicated because we have to introduce some
> per-iommu_group memory in iommufd and redo how we think about multi-
> device
> groups to be more explicit. This solves all the locking problems in the
> prior attempts.
> 

By tracking groups explicitly should we also move DMA ownership claim
to be per group?


* Re: [PATCH v3 03/17] iommufd: Replace the hwpt->devices list with iommufd_group
  2023-03-23  7:21   ` Tian, Kevin
@ 2023-03-23 14:23     ` Jason Gunthorpe
  2023-03-24  1:37       ` Tian, Kevin
  2023-04-11 14:31     ` Jason Gunthorpe
  1 sibling, 1 reply; 46+ messages in thread
From: Jason Gunthorpe @ 2023-03-23 14:23 UTC (permalink / raw)
  To: Tian, Kevin
  Cc: iommu, linux-kselftest, Lu Baolu, kvm, Nicolin Chen, Liu, Yi L

On Thu, Mar 23, 2023 at 07:21:42AM +0000, Tian, Kevin wrote:
> > From: Jason Gunthorpe <jgg@nvidia.com>
> > Sent: Wednesday, March 22, 2023 3:15 AM
> > 
> >  	/*
> > -	 * FIXME: Hack around missing a device-centric iommu api, only
> > attach to
> > -	 * the group once for the first device that is in the group.
> > +	 * Only attach to the group once for the first device that is in the
> > +	 * group. All the other devices will follow this attachment. The user
> > +	 * should attach every device individually to as the per-device
> > reserved
> 
> "individually to the hwpt"

Done

> I thought about whether holding igroup->lock is necessary here.
> 
> The caller should avoid racing attach/detach/replace on the same device.

I think even if the caller races we should be fine

The point of the lock scope was the capture these lines:

	rc = iopt_table_add_domain(&hwpt->ioas->iopt, hwpt->domain);
	if (rc)
		goto out_detach;
	list_add_tail(&hwpt->hwpt_item, &hwpt->ioas->hwpt_list);

But based on the current arrangement none of them rely on the igroup
mutex so it does seem we can narrow it

Thanks,
Jason


* Re: [PATCH v3 10/17] iommufd: Reorganize iommufd_device_attach into iommufd_device_change_pt
  2023-03-23  7:25   ` Tian, Kevin
@ 2023-03-23 14:26     ` Jason Gunthorpe
  0 siblings, 0 replies; 46+ messages in thread
From: Jason Gunthorpe @ 2023-03-23 14:26 UTC (permalink / raw)
  To: Tian, Kevin
  Cc: iommu, linux-kselftest, Lu Baolu, kvm, Nicolin Chen, Liu, Yi L

On Thu, Mar 23, 2023 at 07:25:55AM +0000, Tian, Kevin wrote:
> > From: Jason Gunthorpe <jgg@nvidia.com>
> > Sent: Wednesday, March 22, 2023 3:15 AM
> > +int iommufd_device_attach(struct iommufd_device *idev, u32 *pt_id)
> > +{
> > +	int rc;
> > +
> > +	rc = iommufd_device_change_pt(idev, pt_id,
> > &iommufd_device_do_attach);
> > +	if (rc)
> > +		return rc;
> > +
> > +	/*
> > +	 * Pairs with iommufd_device_detach() - catches caller bugs
> > attempting
> > +	 * to destroy a device with an attachment.
> > +	 */
> > +	refcount_inc(&idev->obj.users);
> > +	return 0;
> 
> While it's the right thing to do, I'm not sure what "catch caller bugs"
> means here. There is no WARN_ON to catch.

The WARN_ON is on the destroy path of the device

	was_destroyed = iommufd_object_destroy_user(idev->ictx, &idev->obj);
	WARN_ON(!was_destroyed);

It is the idev we are incr'ing

Jason


* Re: [PATCH v3 12/17] iommufd: Add iommufd_device_replace()
  2023-03-23  7:31   ` Tian, Kevin
@ 2023-03-23 14:30     ` Jason Gunthorpe
  2023-03-24  1:42       ` Tian, Kevin
  0 siblings, 1 reply; 46+ messages in thread
From: Jason Gunthorpe @ 2023-03-23 14:30 UTC (permalink / raw)
  To: Tian, Kevin
  Cc: iommu, linux-kselftest, Lu Baolu, kvm, Nicolin Chen, Liu, Yi L

On Thu, Mar 23, 2023 at 07:31:02AM +0000, Tian, Kevin wrote:
> > From: Jason Gunthorpe <jgg@nvidia.com>
> > Sent: Wednesday, March 22, 2023 3:15 AM
> > +
> > +	mutex_lock(&idev->igroup->lock);
> > +
> > +	if (igroup->hwpt == NULL) {
> > +		rc = -EINVAL;
> > +		goto err_unlock;
> > +	}
> > +
> > +	if (hwpt == igroup->hwpt) {
> > +		mutex_unlock(&idev->igroup->lock);
> > +		return NULL;
> > +	}
> 
> goto err_unlock;

No, this is a success path, it should not jump to an err label or use

  return ERR_PTR(0)

> > +	/* Move the refcounts held by the device_list to the new hwpt */
> > +	refcount_add(num_devices, &hwpt->obj.users);
> > +	if (num_devices > 1)
> > +		WARN_ON(refcount_sub_and_test(num_devices - 1,
> > +					      &old_hwpt->obj.users));
> 
> A comment is welcomed to match "caller must destroy old_hwpt".

??

Jason


* Re: [PATCH v3 14/17] iommufd/selftest: Test iommufd_device_replace()
  2023-03-23  7:57   ` Tian, Kevin
@ 2023-03-23 14:32     ` Jason Gunthorpe
  0 siblings, 0 replies; 46+ messages in thread
From: Jason Gunthorpe @ 2023-03-23 14:32 UTC (permalink / raw)
  To: Tian, Kevin
  Cc: iommu, linux-kselftest, Lu Baolu, kvm, Nicolin Chen, Liu, Yi L

On Thu, Mar 23, 2023 at 07:57:34AM +0000, Tian, Kevin wrote:
> > From: Jason Gunthorpe <jgg@nvidia.com>
> > Sent: Wednesday, March 22, 2023 3:15 AM
> > 
> > From: Nicolin Chen <nicolinc@nvidia.com>
> > 
> > Allow the selftest to call the function on the mock idev, add some tests
> > to exercise it.
> > 
> > Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
> > Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> 
> Miss your s-o-b
> 
> > +
> > +	test_cmd_mock_domain_replace(self->stdev_ids[0], ioas_id);
> > +	if (variant->mock_domains >= 2) {
> > +		test_cmd_mock_domain_replace(self->stdev_ids[0],
> > +					     self->hwpt_ids[1]);
> > +		test_cmd_mock_domain_replace(self->stdev_ids[0],
> > +					     self->hwpt_ids[1]);
>
> Duplicated or intended?

Intended, it tests replacing with the same hwpt

Jason


* Re: [PATCH v3 00/17] Add iommufd physical device operations for replace and alloc hwpt
  2023-03-23  8:04 ` [PATCH v3 00/17] Add iommufd physical device operations for replace and alloc hwpt Tian, Kevin
@ 2023-03-23 14:35   ` Jason Gunthorpe
  0 siblings, 0 replies; 46+ messages in thread
From: Jason Gunthorpe @ 2023-03-23 14:35 UTC (permalink / raw)
  To: Tian, Kevin
  Cc: iommu, linux-kselftest, Lu Baolu, kvm, Nicolin Chen, Liu, Yi L

On Thu, Mar 23, 2023 at 08:04:24AM +0000, Tian, Kevin wrote:
> > From: Jason Gunthorpe <jgg@nvidia.com>
> > Sent: Wednesday, March 22, 2023 3:15 AM
> > 
> > The implementation is complicated because we have to introduce some
> > per-iommu_group memory in iommufd and redo how we think about multi-
> > device
> > groups to be more explicit. This solves all the locking problems in the
> > prior attempts.
> 
> By tracking groups explicitly should we also move DMA ownership claim
> to be per group?

That looks like it would make things more complicated as we'd have to
be a lot more careful with the group refcount to avoid bugs.

Jason


* RE: [PATCH v3 03/17] iommufd: Replace the hwpt->devices list with iommufd_group
  2023-03-23 14:23     ` Jason Gunthorpe
@ 2023-03-24  1:37       ` Tian, Kevin
  2023-03-24 15:02         ` Jason Gunthorpe
  0 siblings, 1 reply; 46+ messages in thread
From: Tian, Kevin @ 2023-03-24  1:37 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: iommu, linux-kselftest, Lu Baolu, kvm, Nicolin Chen, Liu, Yi L

> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Thursday, March 23, 2023 10:24 PM
> 
> On Thu, Mar 23, 2023 at 07:21:42AM +0000, Tian, Kevin wrote:
> > > From: Jason Gunthorpe <jgg@nvidia.com>
> > > Sent: Wednesday, March 22, 2023 3:15 AM
> > >
> > >  	/*
> > > -	 * FIXME: Hack around missing a device-centric iommu api, only
> > > attach to
> > > -	 * the group once for the first device that is in the group.
> > > +	 * Only attach to the group once for the first device that is in the
> > > +	 * group. All the other devices will follow this attachment. The user
> > > +	 * should attach every device individually to as the per-device
> > > reserved
> >
> > "individually to the hwpt"
> 
> Done
> 
> > I thought about whether holding igroup->lock is necessary here.
> >
> > The caller should avoid racing attach/detach/replace on the same device.
> 
> I think even if the caller races we should be fine

If vfio races attach/detach then lots of things are messed.

e.g. iommufd_device_detach() directly calls list_del(&idev->group_item)
w/o checking whether the device has been attached.

And with that race UAF could occur if we narrow down the lock scope
to iommufd_hw_pagetable_attach():

              cpu0                                cpu1
vfio_iommufd_attach()
  iommufd_device_attach()
    iommufd_device_auto_get_domain()
      mutex_lock(&ioas->mutex);
      iommufd_hw_pagetable_alloc()
        hwpt = iommufd_object_alloc() //hwpt.users=1
        hwpt->domain = iommu_domain_alloc(idev->dev->bus);
        iommufd_hw_pagetable_attach() //hwpt.users=2
                                          vfio_iommufd_detach()
                                            iommufd_device_detach()
                                              mutex_lock(&idev->igroup->lock);
                                              hwpt = iommufd_hw_pagetable_detach()
                                              mutex_unlock(&idev->igroup->lock);
                                              iommufd_hw_pagetable_put(hwpt)
                                                iommufd_object_destroy_user(hwpt) //hwpt.users=0
                                                  iommufd_hw_pagetable_destroy(hwpt)
                                                    iommu_domain_free(hwpt->domain);
        iopt_table_add_domain(&hwpt->ioas->iopt, hwpt->domain); //UAF

From past discussion we assumed the calling driver, i.e. vfio, should do
the right thing, e.g. by holding dev_set->lock; otherwise it is already
broken.

igroup->lock is really for protection across devices in the group. But as
pointed out below we can narrow its scope in this function, as another
device cannot detach from this hwpt w/o first attaching to it, which is
already protected by ioas->mutex.

> 
> The point of the lock scope was the capture these lines:
> 
> 	rc = iopt_table_add_domain(&hwpt->ioas->iopt, hwpt->domain);
> 	if (rc)
> 		goto out_detach;
> 	list_add_tail(&hwpt->hwpt_item, &hwpt->ioas->hwpt_list);
> 
> But based on the current arrangement none of them rely on the igroup
> mutex so it does seem we can narrow it
> 


* RE: [PATCH v3 12/17] iommufd: Add iommufd_device_replace()
  2023-03-23 14:30     ` Jason Gunthorpe
@ 2023-03-24  1:42       ` Tian, Kevin
  2023-03-24 15:03         ` Jason Gunthorpe
  0 siblings, 1 reply; 46+ messages in thread
From: Tian, Kevin @ 2023-03-24  1:42 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: iommu, linux-kselftest, Lu Baolu, kvm, Nicolin Chen, Liu, Yi L

> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Thursday, March 23, 2023 10:31 PM
> 
> On Thu, Mar 23, 2023 at 07:31:02AM +0000, Tian, Kevin wrote:
> > > From: Jason Gunthorpe <jgg@nvidia.com>
> > > Sent: Wednesday, March 22, 2023 3:15 AM
> > > +
> > > +	mutex_lock(&idev->igroup->lock);
> > > +
> > > +	if (igroup->hwpt == NULL) {
> > > +		rc = -EINVAL;
> > > +		goto err_unlock;
> > > +	}
> > > +
> > > +	if (hwpt == igroup->hwpt) {
> > > +		mutex_unlock(&idev->igroup->lock);
> > > +		return NULL;
> > > +	}
> >
> > goto err_unlock;
> 
> No, this is a success path, it should not jumpt to an err label or use
> 
>   return ERR_PTR(0)

My bad.

> 
> > > +	/* Move the refcounts held by the device_list to the new hwpt */
> > > +	refcount_add(num_devices, &hwpt->obj.users);
> > > +	if (num_devices > 1)
> > > +		WARN_ON(refcount_sub_and_test(num_devices - 1,
> > > +					      &old_hwpt->obj.users));
> >
> > A comment is welcomed to match "caller must destroy old_hwpt".
> 
> ??
> 

When moving the refcnt it's not intuitive why we subtract from the
old_hwpt only when num_devices is greater than 1. It's really because
the destroy must be done by the caller.


* Re: [PATCH v3 03/17] iommufd: Replace the hwpt->devices list with iommufd_group
  2023-03-24  1:37       ` Tian, Kevin
@ 2023-03-24 15:02         ` Jason Gunthorpe
  2023-03-28  2:32           ` Tian, Kevin
  0 siblings, 1 reply; 46+ messages in thread
From: Jason Gunthorpe @ 2023-03-24 15:02 UTC (permalink / raw)
  To: Tian, Kevin
  Cc: iommu, linux-kselftest, Lu Baolu, kvm, Nicolin Chen, Liu, Yi L

On Fri, Mar 24, 2023 at 01:37:51AM +0000, Tian, Kevin wrote:

> If vfio races attach/detach then lots of things are messed.
> 
> e.g. iommufd_device_detach() directly calls list_del(&idev->group_item)
> w/o checking whether the device has been attached.

Yeah, you obviously can't race attach/detach or detach/replace

> And with that race UAF could occur if we narrow down the lock scope
> to iommufd_hw_pagetable_attach():
> 
>               cpu0                                cpu1
> vfio_iommufd_attach()
>   iommufd_device_attach()
>     iommufd_device_auto_get_domain()
>       mutex_lock(&ioas->mutex);
>       iommufd_hw_pagetable_alloc()
>         hwpt = iommufd_object_alloc() //hwpt.users=1
>         hwpt->domain = iommu_domain_alloc(idev->dev->bus);
>         iommufd_hw_pagetable_attach() //hwpt.users=2
>                                           vfio_iommufd_detach()
>                                             iommufd_device_detach()
>                                               mutex_lock(&idev->igroup->lock);
>                                               hwpt = iommufd_hw_pagetable_detach()
>                                               mutex_unlock(&idev->igroup->lock);
>                                               iommufd_hw_pagetable_put(hwpt)
>                                                 iommufd_object_destroy_user(hwpt) //hwpt.users=0
>                                                   iommufd_hw_pagetable_destroy(hwpt)
>                                                     iommu_domain_free(hwpt->domain);
>         iopt_table_add_domain(&hwpt->ioas->iopt, hwpt->domain); //UAF

You didn't balance the refcounts properly, the cpu1 put will get to
hwpt.users=1

There is a refcount_inc in iommufd_hw_pagetable_attach(), the
iommufd_hw_pagetable_alloc() retains its reference and so the domain
is guarenteed valid

Jason


* Re: [PATCH v3 12/17] iommufd: Add iommufd_device_replace()
  2023-03-24  1:42       ` Tian, Kevin
@ 2023-03-24 15:03         ` Jason Gunthorpe
  0 siblings, 0 replies; 46+ messages in thread
From: Jason Gunthorpe @ 2023-03-24 15:03 UTC (permalink / raw)
  To: Tian, Kevin
  Cc: iommu, linux-kselftest, Lu Baolu, kvm, Nicolin Chen, Liu, Yi L

On Fri, Mar 24, 2023 at 01:42:58AM +0000, Tian, Kevin wrote:

> > > > +	/* Move the refcounts held by the device_list to the new hwpt */
> > > > +	refcount_add(num_devices, &hwpt->obj.users);
> > > > +	if (num_devices > 1)
> > > > +		WARN_ON(refcount_sub_and_test(num_devices - 1,
> > > > +					      &old_hwpt->obj.users));
> > >
> > > A comment is welcomed to match "caller must destroy old_hwpt".
> > 
> > ??
> > 
> 
> It's not intuitive when moving the refcnt why we subtract num_devices
> from the old_hwpt only when it's greater than 1. It's really about
> the destroy must be done by the caller.


	/*
	 * Move the refcounts held by the device_list to the new hwpt. Retain a
	 * refcount for this thread as the caller will free it.
	 */


Jason


* RE: [PATCH v3 03/17] iommufd: Replace the hwpt->devices list with iommufd_group
  2023-03-24 15:02         ` Jason Gunthorpe
@ 2023-03-28  2:32           ` Tian, Kevin
  2023-03-28 11:38             ` Jason Gunthorpe
  0 siblings, 1 reply; 46+ messages in thread
From: Tian, Kevin @ 2023-03-28  2:32 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: iommu, linux-kselftest, Lu Baolu, kvm, Nicolin Chen, Liu, Yi L

> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Friday, March 24, 2023 11:03 PM
> 
> On Fri, Mar 24, 2023 at 01:37:51AM +0000, Tian, Kevin wrote:
> 
> > If vfio races attach/detach then lots of things are messed.
> >
> > e.g. iommufd_device_detach() directly calls list_del(&idev->group_item)
> > w/o checking whether the device has been attached.
> 
> Yeah, you obviously can't race attach/detach or detach/replace
> 
> > And with that race UAF could occur if we narrow down the lock scope
> > to iommufd_hw_pagetable_attach():
> >
> >               cpu0                                cpu1
> > vfio_iommufd_attach()
> >   iommufd_device_attach()
> >     iommufd_device_auto_get_domain()
> >       mutex_lock(&ioas->mutex);
> >       iommufd_hw_pagetable_alloc()
> >         hwpt = iommufd_object_alloc() //hwpt.users=1
> >         hwpt->domain = iommu_domain_alloc(idev->dev->bus);
> >         iommufd_hw_pagetable_attach() //hwpt.users=2
> >                                           vfio_iommufd_detach()
> >                                             iommufd_device_detach()
> >                                               mutex_lock(&idev->igroup->lock);
> >                                               hwpt = iommufd_hw_pagetable_detach()
> >                                               mutex_unlock(&idev->igroup->lock);
> >                                               iommufd_hw_pagetable_put(hwpt)
> >                                                 iommufd_object_destroy_user(hwpt)
> //hwpt.users=0
> >                                                   iommufd_hw_pagetable_destroy(hwpt)
> >                                                     iommu_domain_free(hwpt->domain);
> >         iopt_table_add_domain(&hwpt->ioas->iopt, hwpt->domain); //UAF
> 
> You didn't balance the refcounts properly, the cpu1 put will get to
> hwpt.users=1
> 

iommufd_object_destroy_user() decrements the count twice if the value
is two:

	refcount_dec(&obj->users);
	if (!refcount_dec_if_one(&obj->users)) {


* Re: [PATCH v3 03/17] iommufd: Replace the hwpt->devices list with iommufd_group
  2023-03-28  2:32           ` Tian, Kevin
@ 2023-03-28 11:38             ` Jason Gunthorpe
  2023-03-29  3:03               ` Tian, Kevin
  0 siblings, 1 reply; 46+ messages in thread
From: Jason Gunthorpe @ 2023-03-28 11:38 UTC (permalink / raw)
  To: Tian, Kevin
  Cc: iommu, linux-kselftest, Lu Baolu, kvm, Nicolin Chen, Liu, Yi L

On Tue, Mar 28, 2023 at 02:32:22AM +0000, Tian, Kevin wrote:
> > From: Jason Gunthorpe <jgg@nvidia.com>
> > Sent: Friday, March 24, 2023 11:03 PM
> > 
> > On Fri, Mar 24, 2023 at 01:37:51AM +0000, Tian, Kevin wrote:
> > 
> > > If vfio races attach/detach then lots of things are messed.
> > >
> > > e.g. iommufd_device_detach() directly calls list_del(&idev->group_item)
> > > w/o checking whether the device has been attached.
> > 
> > Yeah, you obviously can't race attach/detach or detach/replace
> > 
> > > And with that race UAF could occur if we narrow down the lock scope
> > > to iommufd_hw_pagetable_attach():
> > >
> > >               cpu0                                cpu1
> > > vfio_iommufd_attach()
> > >   iommufd_device_attach()
> > >     iommufd_device_auto_get_domain()
> > >       mutex_lock(&ioas->mutex);
> > >       iommufd_hw_pagetable_alloc()
> > >         hwpt = iommufd_object_alloc() //hwpt.users=1
> > >         hwpt->domain = iommu_domain_alloc(idev->dev->bus);
> > >         iommufd_hw_pagetable_attach() //hwpt.users=2
> > >                                           vfio_iommufd_detach()
> > >                                             iommufd_device_detach()
> > >                                               mutex_lock(&idev->igroup->lock);
> > >                                               hwpt = iommufd_hw_pagetable_detach()
> > >                                               mutex_unlock(&idev->igroup->lock);
> > >                                               iommufd_hw_pagetable_put(hwpt)
> > >                                                 iommufd_object_destroy_user(hwpt)
> > //hwpt.users=0
> > >                                                   iommufd_hw_pagetable_destroy(hwpt)
> > >                                                     iommu_domain_free(hwpt->domain);
> > >         iopt_table_add_domain(&hwpt->ioas->iopt, hwpt->domain); //UAF
> > 
> > You didn't balance the refcounts properly, the cpu1 put will get to
> > hwpt.users=1
> > 
> 
> iommufd_object_destroy_user() decrements the count twice if the value
> is two:
> 
> 	refcount_dec(&obj->users);
> 	if (!refcount_dec_if_one(&obj->users)) {

Ohh, it isn't allowed to call iommufd_object_destroy_user() until
finalize has passed..

Jason


* RE: [PATCH v3 03/17] iommufd: Replace the hwpt->devices list with iommufd_group
  2023-03-28 11:38             ` Jason Gunthorpe
@ 2023-03-29  3:03               ` Tian, Kevin
  0 siblings, 0 replies; 46+ messages in thread
From: Tian, Kevin @ 2023-03-29  3:03 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: iommu, linux-kselftest, Lu Baolu, kvm, Nicolin Chen, Liu, Yi L

> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Tuesday, March 28, 2023 7:39 PM
> 
> On Tue, Mar 28, 2023 at 02:32:22AM +0000, Tian, Kevin wrote:
> > > From: Jason Gunthorpe <jgg@nvidia.com>
> > > Sent: Friday, March 24, 2023 11:03 PM
> > >
> > > On Fri, Mar 24, 2023 at 01:37:51AM +0000, Tian, Kevin wrote:
> > >
> > > > If vfio races attach/detach then lots of things are messed.
> > > >
> > > > e.g. iommufd_device_detach() directly calls list_del(&idev->group_item)
> > > > w/o checking whether the device has been attached.
> > >
> > > Yeah, you obviously can't race attach/detach or detach/replace
> > >
> > > > And with that race UAF could occur if we narrow down the lock scope
> > > > to iommufd_hw_pagetable_attach():
> > > >
> > > >               cpu0                                cpu1
> > > > vfio_iommufd_attach()
> > > >   iommufd_device_attach()
> > > >     iommufd_device_auto_get_domain()
> > > >       mutex_lock(&ioas->mutex);
> > > >       iommufd_hw_pagetable_alloc()
> > > >         hwpt = iommufd_object_alloc() //hwpt.users=1
> > > >         hwpt->domain = iommu_domain_alloc(idev->dev->bus);
> > > >         iommufd_hw_pagetable_attach() //hwpt.users=2
> > > >                                           vfio_iommufd_detach()
> > > >                                             iommufd_device_detach()
> > > >                                               mutex_lock(&idev->igroup->lock);
> > > >                                               hwpt = iommufd_hw_pagetable_detach()
> > > >                                               mutex_unlock(&idev->igroup->lock);
> > > >                                               iommufd_hw_pagetable_put(hwpt)
> > > >                                                 iommufd_object_destroy_user(hwpt)
> > > //hwpt.users=0
> > > >                                                   iommufd_hw_pagetable_destroy(hwpt)
> > > >                                                     iommu_domain_free(hwpt->domain);
> > > >         iopt_table_add_domain(&hwpt->ioas->iopt, hwpt->domain); //UAF
> > >
> > > You didn't balance the refcounts properly, the cpu1 put will get to
> > > hwpt.users=1
> > >
> >
> > iommufd_object_destroy_user() decrements the count twice if the value
> > is two:
> >
> > 	refcount_dec(&obj->users);
> > 	if (!refcount_dec_if_one(&obj->users)) {
> 
> Ohh, it isn't allowed to call iommufd_object_destroy_user() until
> finalize has passed..
> 

ah you are right. In this case iommufd_get_object() will fail in the first
place.


* Re: [PATCH v3 03/17] iommufd: Replace the hwpt->devices list with iommufd_group
  2023-03-23  7:21   ` Tian, Kevin
  2023-03-23 14:23     ` Jason Gunthorpe
@ 2023-04-11 14:31     ` Jason Gunthorpe
  2023-04-12  8:27       ` Tian, Kevin
  1 sibling, 1 reply; 46+ messages in thread
From: Jason Gunthorpe @ 2023-04-11 14:31 UTC (permalink / raw)
  To: Tian, Kevin
  Cc: iommu, linux-kselftest, Lu Baolu, kvm, Nicolin Chen, Liu, Yi L

On Thu, Mar 23, 2023 at 07:21:42AM +0000, Tian, Kevin wrote:

> If no oversight then we can directly put the lock in
> iommufd_hw_pagetable_attach/detach() which can also simplify a bit on 
> its callers in device.c.

So, I did this, and syzkaller explains why this can't be done:

https://lore.kernel.org/r/0000000000006e66d605f83e09bc@google.com

We can't allow the hwpt to be discovered by a parallel
iommufd_hw_pagetable_attach() until it is done being setup, otherwise
if we fail to set it up we can't destroy the hwpt.
 
	if (immediate_attach) {
		rc = iommufd_hw_pagetable_attach(hwpt, idev);
		if (rc)
			goto out_abort;
	}

	rc = iopt_table_add_domain(&hwpt->ioas->iopt, hwpt->domain);
	if (rc)
		goto out_detach;
	list_add_tail(&hwpt->hwpt_item, &hwpt->ioas->hwpt_list);
	return hwpt;

out_detach:
	if (immediate_attach)
		iommufd_hw_pagetable_detach(idev);
out_abort:
	iommufd_object_abort_and_destroy(ictx, &hwpt->obj);

As some other idev could be pointing at it too now.

So the lock has to come back out..

Jason


* RE: [PATCH v3 03/17] iommufd: Replace the hwpt->devices list with iommufd_group
  2023-04-11 14:31     ` Jason Gunthorpe
@ 2023-04-12  8:27       ` Tian, Kevin
  2023-04-12 11:17         ` Jason Gunthorpe
  0 siblings, 1 reply; 46+ messages in thread
From: Tian, Kevin @ 2023-04-12  8:27 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: iommu, linux-kselftest, Lu Baolu, kvm, Nicolin Chen, Liu, Yi L

> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Tuesday, April 11, 2023 10:31 PM
> 
> On Thu, Mar 23, 2023 at 07:21:42AM +0000, Tian, Kevin wrote:
> 
> > If no oversight then we can directly put the lock in
> > iommufd_hw_pagetable_attach/detach() which can also simplify a bit on
> > its callers in device.c.
> 
> So, I did this, and syzkaller explains why this can't be done:
> 
> https://lore.kernel.org/r/0000000000006e66d605f83e09bc@google.com
> 
> We can't allow the hwpt to be discovered by a parallel
> iommufd_hw_pagetable_attach() until it is done being setup, otherwise
> if we fail to set it up we can't destroy the hwpt.
> 
> 	if (immediate_attach) {
> 		rc = iommufd_hw_pagetable_attach(hwpt, idev);
> 		if (rc)
> 			goto out_abort;
> 	}
> 
> 	rc = iopt_table_add_domain(&hwpt->ioas->iopt, hwpt->domain);
> 	if (rc)
> 		goto out_detach;
> 	list_add_tail(&hwpt->hwpt_item, &hwpt->ioas->hwpt_list);
> 	return hwpt;
> 
> out_detach:
> 	if (immediate_attach)
> 		iommufd_hw_pagetable_detach(idev);
> out_abort:
> 	iommufd_object_abort_and_destroy(ictx, &hwpt->obj);
> 
> As some other idev could be pointing at it too now.

How could this happen before this object is finalized? iirc you pointed to
me this fact in previous discussion.

For this specific lockdep issue isn't the simple fix is to move the group lock
into iommufd_hw_pagetable_detach() just like done in attach()?

> 
> So the lock has to come back out..
> 
> Jason



* Re: [PATCH v3 03/17] iommufd: Replace the hwpt->devices list with iommufd_group
  2023-04-12  8:27       ` Tian, Kevin
@ 2023-04-12 11:17         ` Jason Gunthorpe
  2023-04-13  2:52           ` Tian, Kevin
  0 siblings, 1 reply; 46+ messages in thread
From: Jason Gunthorpe @ 2023-04-12 11:17 UTC (permalink / raw)
  To: Tian, Kevin
  Cc: iommu, linux-kselftest, Lu Baolu, kvm, Nicolin Chen, Liu, Yi L

On Wed, Apr 12, 2023 at 08:27:36AM +0000, Tian, Kevin wrote:
> > From: Jason Gunthorpe <jgg@nvidia.com>
> > Sent: Tuesday, April 11, 2023 10:31 PM
> > 
> > On Thu, Mar 23, 2023 at 07:21:42AM +0000, Tian, Kevin wrote:
> > 
> > > If no oversight then we can directly put the lock in
> > > iommufd_hw_pagetable_attach/detach() which can also simplify a bit on
> > > its callers in device.c.
> > 
> > So, I did this, and syzkaller explains why this can't be done:
> > 
> > https://lore.kernel.org/r/0000000000006e66d605f83e09bc@google.com
> > 
> > We can't allow the hwpt to be discovered by a parallel
> > iommufd_hw_pagetable_attach() until it is done being setup, otherwise
> > if we fail to set it up we can't destroy the hwpt.
> > 
> > 	if (immediate_attach) {
> > 		rc = iommufd_hw_pagetable_attach(hwpt, idev);
> > 		if (rc)
> > 			goto out_abort;
> > 	}
> > 
> > 	rc = iopt_table_add_domain(&hwpt->ioas->iopt, hwpt->domain);
> > 	if (rc)
> > 		goto out_detach;
> > 	list_add_tail(&hwpt->hwpt_item, &hwpt->ioas->hwpt_list);
> > 	return hwpt;
> > 
> > out_detach:
> > 	if (immediate_attach)
> > 		iommufd_hw_pagetable_detach(idev);
> > out_abort:
> > 	iommufd_object_abort_and_destroy(ictx, &hwpt->obj);
> > 
> > As some other idev could be pointing at it too now.
> 
> How could this happen before this object is finalized? iirc you pointed to
> me this fact in previous discussion.

It only is unavailable through the xarray, but we've added it to at
least one internal list on the group already, it is kind of sketchy to
work like this, it should all be atomic..

Jason


* RE: [PATCH v3 03/17] iommufd: Replace the hwpt->devices list with iommufd_group
  2023-04-12 11:17         ` Jason Gunthorpe
@ 2023-04-13  2:52           ` Tian, Kevin
  2023-04-14 13:31             ` Jason Gunthorpe
  0 siblings, 1 reply; 46+ messages in thread
From: Tian, Kevin @ 2023-04-13  2:52 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: iommu, linux-kselftest, Lu Baolu, kvm, Nicolin Chen, Liu, Yi L

> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Wednesday, April 12, 2023 7:18 PM
> 
> On Wed, Apr 12, 2023 at 08:27:36AM +0000, Tian, Kevin wrote:
> > > From: Jason Gunthorpe <jgg@nvidia.com>
> > > Sent: Tuesday, April 11, 2023 10:31 PM
> > >
> > > On Thu, Mar 23, 2023 at 07:21:42AM +0000, Tian, Kevin wrote:
> > >
> > > > If no oversight then we can directly put the lock in
> > > > iommufd_hw_pagetable_attach/detach() which can also simplify a bit
> on
> > > > its callers in device.c.
> > >
> > > So, I did this, and syzkaller explains why this can't be done:
> > >
> > > https://lore.kernel.org/r/0000000000006e66d605f83e09bc@google.com
> > >
> > > We can't allow the hwpt to be discovered by a parallel
> > > iommufd_hw_pagetable_attach() until it is done being setup, otherwise
> > > if we fail to set it up we can't destroy the hwpt.
> > >
> > > 	if (immediate_attach) {
> > > 		rc = iommufd_hw_pagetable_attach(hwpt, idev);
> > > 		if (rc)
> > > 			goto out_abort;
> > > 	}
> > >
> > > 	rc = iopt_table_add_domain(&hwpt->ioas->iopt, hwpt->domain);
> > > 	if (rc)
> > > 		goto out_detach;
> > > 	list_add_tail(&hwpt->hwpt_item, &hwpt->ioas->hwpt_list);
> > > 	return hwpt;
> > >
> > > out_detach:
> > > 	if (immediate_attach)
> > > 		iommufd_hw_pagetable_detach(idev);
> > > out_abort:
> > > 	iommufd_object_abort_and_destroy(ictx, &hwpt->obj);
> > >
> > > As some other idev could be pointing at it too now.
> >
> > How could this happen before this object is finalized? iirc you pointed to
> > me this fact in previous discussion.
> 
> It only is unavailable through the xarray, but we've added it to at
> least one internal list on the group already, it is kind of sketchy to
> work like this, it should all be atomic..
> 

which internal list? group has a list for attached devices but regarding
to hwpt it's stored in a single field igroup->hwpt.

with that being set in iommufd_hw_pagetable_attach() another device
cannot race attaching to a different ioas/hwpt (mismatching igroup->hwpt)
or the same hwpt being created (not finalized and holding ioas->mutex).

So although it's added to the group, nothing will reference its state
before it's finalized.

btw removing this lock in this file also makes it easier to support siov
device which doesn't have group. We can have internal group attach
and pasid attach wrappers within device.c and leave igroup->lock held
in the group attach path.

Otherwise we'll have to create a locking wrapper used in this file to
touch igroup->lock in particular for iommufd_device which has a igroup
object.


* Re: [PATCH v3 03/17] iommufd: Replace the hwpt->devices list with iommufd_group
  2023-04-13  2:52           ` Tian, Kevin
@ 2023-04-14 13:31             ` Jason Gunthorpe
  2023-04-20  6:15               ` Tian, Kevin
  0 siblings, 1 reply; 46+ messages in thread
From: Jason Gunthorpe @ 2023-04-14 13:31 UTC (permalink / raw)
  To: Tian, Kevin
  Cc: iommu, linux-kselftest, Lu Baolu, kvm, Nicolin Chen, Liu, Yi L

On Thu, Apr 13, 2023 at 02:52:54AM +0000, Tian, Kevin wrote:
> > From: Jason Gunthorpe <jgg@nvidia.com>
> > Sent: Wednesday, April 12, 2023 7:18 PM
> > 
> > On Wed, Apr 12, 2023 at 08:27:36AM +0000, Tian, Kevin wrote:
> > > > From: Jason Gunthorpe <jgg@nvidia.com>
> > > > Sent: Tuesday, April 11, 2023 10:31 PM
> > > >
> > > > On Thu, Mar 23, 2023 at 07:21:42AM +0000, Tian, Kevin wrote:
> > > >
> > > > > If no oversight then we can directly put the lock in
> > > > > iommufd_hw_pagetable_attach/detach() which can also simplify a bit
> > on
> > > > > its callers in device.c.
> > > >
> > > > So, I did this, and syzkaller explains why this can't be done:
> > > >
> > > > https://lore.kernel.org/r/0000000000006e66d605f83e09bc@google.com
> > > >
> > > > We can't allow the hwpt to be discovered by a parallel
> > > > iommufd_hw_pagetable_attach() until it is done being setup, otherwise
> > > > if we fail to set it up we can't destroy the hwpt.
> > > >
> > > > 	if (immediate_attach) {
> > > > 		rc = iommufd_hw_pagetable_attach(hwpt, idev);
> > > > 		if (rc)
> > > > 			goto out_abort;
> > > > 	}
> > > >
> > > > 	rc = iopt_table_add_domain(&hwpt->ioas->iopt, hwpt->domain);
> > > > 	if (rc)
> > > > 		goto out_detach;
> > > > 	list_add_tail(&hwpt->hwpt_item, &hwpt->ioas->hwpt_list);
> > > > 	return hwpt;
> > > >
> > > > out_detach:
> > > > 	if (immediate_attach)
> > > > 		iommufd_hw_pagetable_detach(idev);
> > > > out_abort:
> > > > 	iommufd_object_abort_and_destroy(ictx, &hwpt->obj);
> > > >
> > > > As some other idev could be pointing at it too now.
> > >
> > > How could this happen before this object is finalized? iirc you pointed to
> > > me this fact in previous discussion.
> > 
> > It only is unavailable through the xarray, but we've added it to at
> > least one internal list on the group already, it is kind of sketchy to
> > work like this, it should all be atomic..
> > 
> 
> which internal list? group has a list for attached devices but regarding
> to hwpt it's stored in a single field igroup->hwpt.

It is added to 

	list_add_tail(&hwpt->hwpt_item, &hwpt->ioas->hwpt_list);

Which can be observed from

	mutex_lock(&ioas->mutex);
	list_for_each_entry(hwpt, &ioas->hwpt_list, hwpt_item) {
		if (!hwpt->auto_domain)
			continue;

		if (!iommufd_lock_obj(&hwpt->obj))
			continue;

If iommufd_lock_obj() has happened then
iommufd_object_abort_and_destroy() is in trouble.

Thus we need to hold the ioas->mutex right up until we know we can't
call iommufd_object_abort_and_destroy(), or lift out the hwpt list_add

This could maybe also be fixed by holding the destroy_rw_sem right up
until finalize. Though, I think I looked at this once and decided
against it for some reason..

> btw removing this lock in this file also makes it easier to support siov
> device which doesn't have group. We can have internal group attach
> and pasid attach wrappers within device.c and leave igroup->lock held
> in the group attach path.

Yeah, I expect this will need more work when we get to PASID support

Most likely the resolution will be something like PASID domains can't
be used as PF/VF domains because they don't have the right reserved
regions, so they shouldn't be in the hwpt_list at all, so we can use a
more relaxed locking.

Jason


* RE: [PATCH v3 03/17] iommufd: Replace the hwpt->devices list with iommufd_group
  2023-04-14 13:31             ` Jason Gunthorpe
@ 2023-04-20  6:15               ` Tian, Kevin
  2023-04-20 15:34                 ` Jason Gunthorpe
  0 siblings, 1 reply; 46+ messages in thread
From: Tian, Kevin @ 2023-04-20  6:15 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: iommu, linux-kselftest, Lu Baolu, kvm, Nicolin Chen, Liu, Yi L

> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Friday, April 14, 2023 9:32 PM
> 
> On Thu, Apr 13, 2023 at 02:52:54AM +0000, Tian, Kevin wrote:
> > > From: Jason Gunthorpe <jgg@nvidia.com>
> > > Sent: Wednesday, April 12, 2023 7:18 PM
> > >
> > > On Wed, Apr 12, 2023 at 08:27:36AM +0000, Tian, Kevin wrote:
> > > > > From: Jason Gunthorpe <jgg@nvidia.com>
> > > > > Sent: Tuesday, April 11, 2023 10:31 PM
> > > > >
> > > > > On Thu, Mar 23, 2023 at 07:21:42AM +0000, Tian, Kevin wrote:
> > > > >
> > > > > > If no oversight then we can directly put the lock in
> > > > > > iommufd_hw_pagetable_attach/detach() which can also simplify a
> bit
> > > on
> > > > > > its callers in device.c.
> > > > >
> > > > > So, I did this, and syzkaller explains why this can't be done:
> > > > >
> > > > >
> https://lore.kernel.org/r/0000000000006e66d605f83e09bc@google.com
> > > > >
> > > > > We can't allow the hwpt to be discovered by a parallel
> > > > > iommufd_hw_pagetable_attach() until it is done being setup,
> otherwise
> > > > > if we fail to set it up we can't destroy the hwpt.
> > > > >
> > > > > 	if (immediate_attach) {
> > > > > 		rc = iommufd_hw_pagetable_attach(hwpt, idev);
> > > > > 		if (rc)
> > > > > 			goto out_abort;
> > > > > 	}
> > > > >
> > > > > 	rc = iopt_table_add_domain(&hwpt->ioas->iopt, hwpt->domain);
> > > > > 	if (rc)
> > > > > 		goto out_detach;
> > > > > 	list_add_tail(&hwpt->hwpt_item, &hwpt->ioas->hwpt_list);
> > > > > 	return hwpt;
> > > > >
> > > > > out_detach:
> > > > > 	if (immediate_attach)
> > > > > 		iommufd_hw_pagetable_detach(idev);
> > > > > out_abort:
> > > > > 	iommufd_object_abort_and_destroy(ictx, &hwpt->obj);
> > > > >
> > > > > As some other idev could be pointing at it too now.
> > > >
> > > > How could this happen before this object is finalized? iirc you pointed to
> > > > me this fact in previous discussion.
> > >
> > > It only is unavailable through the xarray, but we've added it to at
> > > least one internal list on the group already, it is kind of sketchy to
> > > work like this, it should all be atomic..
> > >
> >
> > which internal list? group has a list for attached devices but regarding
> > to hwpt it's stored in a single field igroup->hwpt.
> 
> It is added to
> 
> 	list_add_tail(&hwpt->hwpt_item, &hwpt->ioas->hwpt_list);

this is called under ioas->mutex.

> 
> Which can be observed from
> 
> 	mutex_lock(&ioas->mutex);
> 	list_for_each_entry(hwpt, &ioas->hwpt_list, hwpt_item) {
> 		if (!hwpt->auto_domain)
> 			continue;
> 
> 		if (!iommufd_lock_obj(&hwpt->obj))
> 			continue;

this is called also under ioas->mutex. So no race. 😊

> 
> If iommufd_lock_obj() has happened then
> iommufd_object_abort_and_destroy() is in trouble.
> 
> Thus we need to hold the ioas->mutex right up until we know we can't
> call iommufd_object_abort_and_destroy(), or lift out the hwpt list_add
> 
> This could maybe also be fixed by holding the destroy_rw_sem right up
> until finalize. Though, I think I looked at this once and decided
> against it for some reason..
> 
> > btw removing this lock in this file also makes it easier to support siov
> > device which doesn't have group. We can have internal group attach
> > and pasid attach wrappers within device.c and leave igroup->lock held
> > in the group attach path.
> 
> Yeah, I expect this will need more work when we get to PASID support
> 
> Most likely the resolution will be something like PASID domains can't
> be used as PF/VF domains because they don't have the right reserved
> regions, so they shouldn't be in the hwpt_list at all, so we can use a
> more relaxed locking.
> 

Yes, with pasid there is no reserved region conceptually. Currently in
our internal implementation we still add the reserved regions of
the device to the IOAS upon pasid attach, which looks unnecessary.

But I didn't get why a domain cannot be shared by PASID and PF/VF.
Upon attach we add the reserved regions (if any) of the device to
IOAS. I don't think there is a requirement that reserved regions
must exist for a successful attachment?

A similar example is on Jacob's DMA API PASID work. The DMA
domain is first attached to RID and then to a PASID.


* Re: [PATCH v3 03/17] iommufd: Replace the hwpt->devices list with iommufd_group
  2023-04-20  6:15               ` Tian, Kevin
@ 2023-04-20 15:34                 ` Jason Gunthorpe
  0 siblings, 0 replies; 46+ messages in thread
From: Jason Gunthorpe @ 2023-04-20 15:34 UTC (permalink / raw)
  To: Tian, Kevin
  Cc: iommu, linux-kselftest, Lu Baolu, kvm, Nicolin Chen, Liu, Yi L

On Thu, Apr 20, 2023 at 06:15:16AM +0000, Tian, Kevin wrote:
> > > which internal list? group has a list for attached devices but regarding
> > > to hwpt it's stored in a single field igroup->hwpt.
> > 
> > It is added to
> > 
> > 	list_add_tail(&hwpt->hwpt_item, &hwpt->ioas->hwpt_list);
> 
> this is called under ioas->mutex.

Yes.. But.. that is troubled too, we are calling destroy under the
same mutex, there is a missing fault point to catch it in the test,
and hwpt_alloc doesn't have the lock wide enough :\

So you want to argue that it is safe to do this:

   mutex_lock(&ioas->mutex);
   alloc
   attach
   detach
   abort
   mutex_unlock(&ioas->mutex);

Even if attach/detach lock/unlock the group mutex during their cycle?

It seems OK..

Though I don't much like the locking pattern where we succeed the
attach, drop all the locks, and then fail and relock to do the error
unwind.. Sketchy..

Jason


end of thread, other threads:[~2023-04-20 15:34 UTC | newest]

Thread overview: 46+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-03-21 19:14 [PATCH v3 00/17] Add iommufd physical device operations for replace and alloc hwpt Jason Gunthorpe
2023-03-21 19:14 ` [PATCH v3 01/17] iommufd: Move isolated msi enforcement to iommufd_device_bind() Jason Gunthorpe
2023-03-21 19:14 ` [PATCH v3 02/17] iommufd: Add iommufd_group Jason Gunthorpe
2023-03-21 19:14 ` [PATCH v3 03/17] iommufd: Replace the hwpt->devices list with iommufd_group Jason Gunthorpe
2023-03-23  7:21   ` Tian, Kevin
2023-03-23 14:23     ` Jason Gunthorpe
2023-03-24  1:37       ` Tian, Kevin
2023-03-24 15:02         ` Jason Gunthorpe
2023-03-28  2:32           ` Tian, Kevin
2023-03-28 11:38             ` Jason Gunthorpe
2023-03-29  3:03               ` Tian, Kevin
2023-04-11 14:31     ` Jason Gunthorpe
2023-04-12  8:27       ` Tian, Kevin
2023-04-12 11:17         ` Jason Gunthorpe
2023-04-13  2:52           ` Tian, Kevin
2023-04-14 13:31             ` Jason Gunthorpe
2023-04-20  6:15               ` Tian, Kevin
2023-04-20 15:34                 ` Jason Gunthorpe
2023-03-21 19:14 ` [PATCH v3 04/17] iommu: Export iommu_get_resv_regions() Jason Gunthorpe
2023-03-21 19:14 ` [PATCH v3 05/17] iommufd: Keep track of each device's reserved regions instead of groups Jason Gunthorpe
2023-03-21 19:14 ` [PATCH v3 06/17] iommufd: Use the iommufd_group to avoid duplicate MSI setup Jason Gunthorpe
2023-03-21 19:14 ` [PATCH v3 07/17] iommufd: Make sw_msi_start a group global Jason Gunthorpe
2023-03-21 19:14 ` [PATCH v3 08/17] iommufd: Move putting a hwpt to a helper function Jason Gunthorpe
2023-03-21 19:14 ` [PATCH v3 09/17] iommufd: Add enforced_cache_coherency to iommufd_hw_pagetable_alloc() Jason Gunthorpe
2023-03-21 19:14 ` [PATCH v3 10/17] iommufd: Reorganize iommufd_device_attach into iommufd_device_change_pt Jason Gunthorpe
2023-03-23  7:25   ` Tian, Kevin
2023-03-23 14:26     ` Jason Gunthorpe
2023-03-21 19:14 ` [PATCH v3 11/17] iommu: Introduce a new iommu_group_replace_domain() API Jason Gunthorpe
2023-03-21 19:14 ` [PATCH v3 12/17] iommufd: Add iommufd_device_replace() Jason Gunthorpe
2023-03-23  7:31   ` Tian, Kevin
2023-03-23 14:30     ` Jason Gunthorpe
2023-03-24  1:42       ` Tian, Kevin
2023-03-24 15:03         ` Jason Gunthorpe
2023-03-21 19:14 ` [PATCH v3 13/17] iommufd: Make destroy_rwsem use a lock class per object type Jason Gunthorpe
2023-03-23  7:54   ` Tian, Kevin
2023-03-21 19:14 ` [PATCH v3 14/17] iommufd/selftest: Test iommufd_device_replace() Jason Gunthorpe
2023-03-23  7:57   ` Tian, Kevin
2023-03-23 14:32     ` Jason Gunthorpe
2023-03-21 19:14 ` [PATCH v3 15/17] iommufd: Add IOMMU_HWPT_ALLOC Jason Gunthorpe
2023-03-23  8:00   ` Tian, Kevin
2023-03-21 19:14 ` [PATCH v3 16/17] iommufd/selftest: Return the real idev id from selftest mock_domain Jason Gunthorpe
2023-03-23  8:02   ` Tian, Kevin
2023-03-21 19:14 ` [PATCH v3 17/17] iommufd/selftest: Add a selftest for IOMMU_HWPT_ALLOC Jason Gunthorpe
2023-03-23  8:03   ` Tian, Kevin
2023-03-23  8:04 ` [PATCH v3 00/17] Add iommufd physical device operations for replace and alloc hwpt Tian, Kevin
2023-03-23 14:35   ` Jason Gunthorpe
