iommu.lists.linux-foundation.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/4] iommu: Extend changing default domain to normal group
@ 2023-02-13  7:49 Lu Baolu
  2023-02-13  7:49 ` [PATCH 1/4] iommu: Add dev_iommu->ops_rwsem Lu Baolu
                   ` (3 more replies)
  0 siblings, 4 replies; 17+ messages in thread
From: Lu Baolu @ 2023-02-13  7:49 UTC (permalink / raw)
  To: iommu
  Cc: Joerg Roedel, Jason Gunthorpe, Christoph Hellwig, Kevin Tian,
	Will Deacon, Robin Murphy, linux-kernel, Lu Baolu

The IOMMU group sysfs interface allows users to change the default
domain of a group. The current implementation uses device_lock() to make
sure that the devices in the group are not bound to any driver and won't
be bound during the process of changing the default domain. In order to
avoid a possible deadlock caused by lock order of device_lock and
group->mutex, it limits the functionality to singleton groups only.

The recently implemented DMA ownership framework can be applied here to
replace device_lock(). In addition, add an rw lock to ensure that the
iommu ops of the devices is always valid during the process of changing
default domain.

With above replacement and enhancement, the device_lock() could be
removed and the singleton-group-only limitation could be removed.

The whole series is also available on github:

https://github.com/LuBaolu/intel-iommu/commits/iommu-sysfs-default-domain-extension-v1

Please help to review and suggest.

Lu Baolu (4):
  iommu: Add dev_iommu->ops_rwsem
  iommu: Use group ownership to avoid driver attachment
  iommu: Remove unnecessary device_lock()
  iommu: Cleanup iommu_change_dev_def_domain()

 include/linux/iommu.h |   3 +
 drivers/iommu/iommu.c | 187 +++++++++++++++++-------------------------
 2 files changed, 78 insertions(+), 112 deletions(-)

-- 
2.34.1


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH 1/4] iommu: Add dev_iommu->ops_rwsem
  2023-02-13  7:49 [PATCH 0/4] iommu: Extend changing default domain to normal group Lu Baolu
@ 2023-02-13  7:49 ` Lu Baolu
  2023-02-13 14:16   ` Jason Gunthorpe
  2023-02-13  7:49 ` [PATCH 2/4] iommu: Use group ownership to avoid driver attachment Lu Baolu
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 17+ messages in thread
From: Lu Baolu @ 2023-02-13  7:49 UTC (permalink / raw)
  To: iommu
  Cc: Joerg Roedel, Jason Gunthorpe, Christoph Hellwig, Kevin Tian,
	Will Deacon, Robin Murphy, linux-kernel, Lu Baolu

Add a RW semaphore to make sure that iommu_ops of a device is consistent
in any non-driver-oriented path, such as a store operation on the iommu
group sysfs node.

Add a pair of helpers to freeze and unfreeze the iommu ops of all devices
in an iommu group, and use them in iommu_group_store_type().

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
---
 include/linux/iommu.h |  3 +++
 drivers/iommu/iommu.c | 53 ++++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 55 insertions(+), 1 deletion(-)

diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 3589d1b8f922..a4204e1bfef3 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -402,6 +402,8 @@ struct iommu_fault_param {
  * @fwspec:	 IOMMU fwspec data
  * @iommu_dev:	 IOMMU device this device is linked to
  * @priv:	 IOMMU Driver private data
+ * @ops_rwsem:	 RW semaphore to synchronize between device release
+ *		 path and the sysfs interfaces.
  * @max_pasids:  number of PASIDs this device can consume
  * @attach_deferred: the dma domain attachment is deferred
  *
@@ -415,6 +417,7 @@ struct dev_iommu {
 	struct iommu_fwspec		*fwspec;
 	struct iommu_device		*iommu_dev;
 	void				*priv;
+	struct rw_semaphore		ops_rwsem;
 	u32				max_pasids;
 	u32				attach_deferred:1;
 };
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 5f1dc9aaba52..4f71dcd2621b 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -267,6 +267,7 @@ static struct dev_iommu *dev_iommu_get(struct device *dev)
 		return NULL;
 
 	mutex_init(&param->lock);
+	init_rwsem(&param->ops_rwsem);
 	dev->iommu = param;
 	return param;
 }
@@ -461,12 +462,19 @@ void iommu_release_device(struct device *dev)
 
 	iommu_device_unlink(dev->iommu->iommu_dev, dev);
 
+	/*
+	 * The device's iommu_ops will be released in .release_device
+	 * callback. Hold ops_rwsem to avoid use after release.
+	 */
+	down_write(&dev->iommu->ops_rwsem);
 	ops = dev_iommu_ops(dev);
 	if (ops->release_device)
 		ops->release_device(dev);
+	module_put(ops->owner);
+	dev->iommu->iommu_dev = NULL;
+	up_write(&dev->iommu->ops_rwsem);
 
 	iommu_group_remove_device(dev);
-	module_put(ops->owner);
 	dev_iommu_free(dev);
 }
 
@@ -2911,6 +2919,46 @@ static int iommu_change_dev_def_domain(struct iommu_group *group,
 	return ret;
 }
 
+static int iommu_group_freeze_dev_ops(struct iommu_group *group)
+{
+	struct group_device *device;
+	struct device *dev;
+
+	mutex_lock(&group->mutex);
+	list_for_each_entry(device, &group->devices, list) {
+		dev = device->dev;
+		down_read(&dev->iommu->ops_rwsem);
+		/* .release_device has been called. */
+		if (!dev->iommu->iommu_dev) {
+			up_read(&dev->iommu->ops_rwsem);
+			goto restore_out;
+		}
+	}
+	mutex_unlock(&group->mutex);
+
+	return 0;
+
+restore_out:
+	list_for_each_entry(device, &group->devices, list) {
+		if (device->dev == dev)
+			break;
+		up_read(&device->dev->iommu->ops_rwsem);
+	}
+	mutex_unlock(&group->mutex);
+
+	return -EINVAL;
+}
+
+static void iommu_group_unfreeze_dev_ops(struct iommu_group *group)
+{
+	struct group_device *device;
+
+	mutex_lock(&group->mutex);
+	list_for_each_entry(device, &group->devices, list)
+		up_read(&device->dev->iommu->ops_rwsem);
+	mutex_unlock(&group->mutex);
+}
+
 /*
  * Changing the default domain through sysfs requires the users to unbind the
  * drivers from the devices in the iommu group, except for a DMA -> DMA-FQ
@@ -2988,6 +3036,8 @@ static ssize_t iommu_group_store_type(struct iommu_group *group,
 	 */
 	mutex_unlock(&group->mutex);
 
+	iommu_group_freeze_dev_ops(group);
+
 	/* Check if the device in the group still has a driver bound to it */
 	device_lock(dev);
 	if (device_is_bound(dev) && !(req_type == IOMMU_DOMAIN_DMA_FQ &&
@@ -3002,6 +3052,7 @@ static ssize_t iommu_group_store_type(struct iommu_group *group,
 
 out:
 	device_unlock(dev);
+	iommu_group_unfreeze_dev_ops(group);
 	put_device(dev);
 
 	return ret;
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 2/4] iommu: Use group ownership to avoid driver attachment
  2023-02-13  7:49 [PATCH 0/4] iommu: Extend changing default domain to normal group Lu Baolu
  2023-02-13  7:49 ` [PATCH 1/4] iommu: Add dev_iommu->ops_rwsem Lu Baolu
@ 2023-02-13  7:49 ` Lu Baolu
  2023-02-13 14:19   ` Jason Gunthorpe
  2023-02-13  7:49 ` [PATCH 3/4] iommu: Remove unnecessary device_lock() Lu Baolu
  2023-02-13  7:49 ` [PATCH 4/4] iommu: Cleanup iommu_change_dev_def_domain() Lu Baolu
  3 siblings, 1 reply; 17+ messages in thread
From: Lu Baolu @ 2023-02-13  7:49 UTC (permalink / raw)
  To: iommu
  Cc: Joerg Roedel, Jason Gunthorpe, Christoph Hellwig, Kevin Tian,
	Will Deacon, Robin Murphy, linux-kernel, Lu Baolu

The iommu_group_store_type() requires the devices in the iommu group are
not bound to any device driver during the whole operation. The existing
code locks the device with device_lock(dev) and use device_is_bound() to
check whether any driver is bound to device.

In fact, this can be achieved through the DMA ownership helpers. Replace
them with iommu_group_claim/release_dma_owner() helpers.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
---
 drivers/iommu/iommu.c | 27 +++++++++++++--------------
 1 file changed, 13 insertions(+), 14 deletions(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 4f71dcd2621b..6547cb38480c 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -2807,12 +2807,6 @@ static int iommu_change_dev_def_domain(struct iommu_group *group,
 
 	mutex_lock(&group->mutex);
 
-	if (group->default_domain != group->domain) {
-		dev_err_ratelimited(prev_dev, "Group not assigned to default domain\n");
-		ret = -EBUSY;
-		goto out;
-	}
-
 	/*
 	 * iommu group wasn't locked while acquiring device lock in
 	 * iommu_group_store_type(). So, make sure that the device count hasn't
@@ -2971,6 +2965,7 @@ static void iommu_group_unfreeze_dev_ops(struct iommu_group *group)
 static ssize_t iommu_group_store_type(struct iommu_group *group,
 				      const char *buf, size_t count)
 {
+	bool group_owner_claimed = false;
 	struct group_device *grp_dev;
 	struct device *dev;
 	int ret, req_type;
@@ -2992,6 +2987,14 @@ static ssize_t iommu_group_store_type(struct iommu_group *group,
 	else
 		return -EINVAL;
 
+	if (req_type != IOMMU_DOMAIN_DMA_FQ ||
+	    group->default_domain->type != IOMMU_DOMAIN_DMA) {
+		ret = iommu_group_claim_dma_owner(group, (void *)buf);
+		if (ret)
+			return ret;
+		group_owner_claimed = true;
+	}
+
 	/*
 	 * Lock/Unlock the group mutex here before device lock to
 	 * 1. Make sure that the iommu group has only one device (this is a
@@ -3001,6 +3004,8 @@ static ssize_t iommu_group_store_type(struct iommu_group *group,
 	mutex_lock(&group->mutex);
 	if (iommu_group_device_count(group) != 1) {
 		mutex_unlock(&group->mutex);
+		if (group_owner_claimed)
+			iommu_group_release_dma_owner(group);
 		pr_err_ratelimited("Cannot change default domain: Group has more than one device\n");
 		return -EINVAL;
 	}
@@ -3038,22 +3043,16 @@ static ssize_t iommu_group_store_type(struct iommu_group *group,
 
 	iommu_group_freeze_dev_ops(group);
 
-	/* Check if the device in the group still has a driver bound to it */
 	device_lock(dev);
-	if (device_is_bound(dev) && !(req_type == IOMMU_DOMAIN_DMA_FQ &&
-	    group->default_domain->type == IOMMU_DOMAIN_DMA)) {
-		pr_err_ratelimited("Device is still bound to driver\n");
-		ret = -EBUSY;
-		goto out;
-	}
 
 	ret = iommu_change_dev_def_domain(group, dev, req_type);
 	ret = ret ?: count;
 
-out:
 	device_unlock(dev);
 	iommu_group_unfreeze_dev_ops(group);
 	put_device(dev);
+	if (group_owner_claimed)
+		iommu_group_release_dma_owner(group);
 
 	return ret;
 }
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 3/4] iommu: Remove unnecessary device_lock()
  2023-02-13  7:49 [PATCH 0/4] iommu: Extend changing default domain to normal group Lu Baolu
  2023-02-13  7:49 ` [PATCH 1/4] iommu: Add dev_iommu->ops_rwsem Lu Baolu
  2023-02-13  7:49 ` [PATCH 2/4] iommu: Use group ownership to avoid driver attachment Lu Baolu
@ 2023-02-13  7:49 ` Lu Baolu
  2023-02-13  7:49 ` [PATCH 4/4] iommu: Cleanup iommu_change_dev_def_domain() Lu Baolu
  3 siblings, 0 replies; 17+ messages in thread
From: Lu Baolu @ 2023-02-13  7:49 UTC (permalink / raw)
  To: iommu
  Cc: Joerg Roedel, Jason Gunthorpe, Christoph Hellwig, Kevin Tian,
	Will Deacon, Robin Murphy, linux-kernel, Lu Baolu

device_lock() was used in iommu_group_store_type() to prevent the
devices in an iommu group from being attached by any device driver.
On the other hand, in order to avoid lock race between group->mutex
and device_lock(), it limited the usage scenario to the singleton
groups.

We have used the DMA ownership framework to avoid driver attachment
and ensured that device ops are always valid, there's no need to
lock device anymore. Remove use of device_lock() and the singleton
group limitation.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
---
 drivers/iommu/iommu.c | 78 ++++---------------------------------------
 1 file changed, 6 insertions(+), 72 deletions(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 6547cb38480c..1e69def20a67 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -2786,8 +2786,6 @@ EXPORT_SYMBOL_GPL(iommu_dev_disable_feature);
  * Changes the default domain of an iommu group that has *only* one device
  *
  * @group: The group for which the default domain should be changed
- * @prev_dev: The device in the group (this is used to make sure that the device
- *	 hasn't changed after the caller has called this function)
  * @type: The type of the new default domain that gets associated with the group
  *
  * Returns 0 on success and error code on failure
@@ -2797,8 +2795,7 @@ EXPORT_SYMBOL_GPL(iommu_dev_disable_feature);
  *    group's default domain type through /sys/kernel/iommu_groups/<grp_id>/type
  *    Please take a closer look if intended to use for other purposes.
  */
-static int iommu_change_dev_def_domain(struct iommu_group *group,
-				       struct device *prev_dev, int type)
+static int iommu_change_dev_def_domain(struct iommu_group *group, int type)
 {
 	struct iommu_domain *prev_dom;
 	struct group_device *grp_dev;
@@ -2821,7 +2818,7 @@ static int iommu_change_dev_def_domain(struct iommu_group *group,
 	 * waiting for T1 to release other device locks.
 	 */
 	if (iommu_group_device_count(group) != 1) {
-		dev_err_ratelimited(prev_dev, "Cannot change default domain: Group has more than one device\n");
+		pr_err_ratelimited("Cannot change default domain: Group has more than one device\n");
 		ret = -EINVAL;
 		goto out;
 	}
@@ -2830,12 +2827,6 @@ static int iommu_change_dev_def_domain(struct iommu_group *group,
 	grp_dev = list_first_entry(&group->devices, struct group_device, list);
 	dev = grp_dev->dev;
 
-	if (prev_dev != dev) {
-		dev_err_ratelimited(prev_dev, "Cannot change default domain: Device has been changed\n");
-		ret = -EBUSY;
-		goto out;
-	}
-
 	prev_dom = group->default_domain;
 	if (!prev_dom) {
 		ret = -EINVAL;
@@ -2851,8 +2842,8 @@ static int iommu_change_dev_def_domain(struct iommu_group *group,
 		 */
 		type = dev_def_dom ? : iommu_def_domain_type;
 	} else if (dev_def_dom && type != dev_def_dom) {
-		dev_err_ratelimited(prev_dev, "Device cannot be in %s domain\n",
-				    iommu_domain_type_str(type));
+		pr_err_ratelimited("Device cannot be in %s domain\n",
+				   iommu_domain_type_str(type));
 		ret = -EINVAL;
 		goto out;
 	}
@@ -2959,15 +2950,11 @@ static void iommu_group_unfreeze_dev_ops(struct iommu_group *group)
  * transition. Return failure if this isn't met.
  *
  * We need to consider the race between this and the device release path.
- * device_lock(dev) is used here to guarantee that the device release path
- * will not be entered at the same time.
  */
 static ssize_t iommu_group_store_type(struct iommu_group *group,
 				      const char *buf, size_t count)
 {
 	bool group_owner_claimed = false;
-	struct group_device *grp_dev;
-	struct device *dev;
 	int ret, req_type;
 
 	if (!capable(CAP_SYS_ADMIN) || !capable(CAP_SYS_RAWIO))
@@ -2995,66 +2982,13 @@ static ssize_t iommu_group_store_type(struct iommu_group *group,
 		group_owner_claimed = true;
 	}
 
-	/*
-	 * Lock/Unlock the group mutex here before device lock to
-	 * 1. Make sure that the iommu group has only one device (this is a
-	 *    prerequisite for step 2)
-	 * 2. Get struct *dev which is needed to lock device
-	 */
-	mutex_lock(&group->mutex);
-	if (iommu_group_device_count(group) != 1) {
-		mutex_unlock(&group->mutex);
-		if (group_owner_claimed)
-			iommu_group_release_dma_owner(group);
-		pr_err_ratelimited("Cannot change default domain: Group has more than one device\n");
-		return -EINVAL;
-	}
-
-	/* Since group has only one device */
-	grp_dev = list_first_entry(&group->devices, struct group_device, list);
-	dev = grp_dev->dev;
-	get_device(dev);
-
-	/*
-	 * Don't hold the group mutex because taking group mutex first and then
-	 * the device lock could potentially cause a deadlock as below. Assume
-	 * two threads T1 and T2. T1 is trying to change default domain of an
-	 * iommu group and T2 is trying to hot unplug a device or release [1] VF
-	 * of a PCIe device which is in the same iommu group. T1 takes group
-	 * mutex and before it could take device lock assume T2 has taken device
-	 * lock and is yet to take group mutex. Now, both the threads will be
-	 * waiting for the other thread to release lock. Below, lock order was
-	 * suggested.
-	 * device_lock(dev);
-	 *	mutex_lock(&group->mutex);
-	 *		iommu_change_dev_def_domain();
-	 *	mutex_unlock(&group->mutex);
-	 * device_unlock(dev);
-	 *
-	 * [1] Typical device release path
-	 * device_lock() from device/driver core code
-	 *  -> bus_notifier()
-	 *   -> iommu_bus_notifier()
-	 *    -> iommu_release_device()
-	 *     -> ops->release_device() vendor driver calls back iommu core code
-	 *      -> mutex_lock() from iommu core code
-	 */
-	mutex_unlock(&group->mutex);
-
 	iommu_group_freeze_dev_ops(group);
-
-	device_lock(dev);
-
-	ret = iommu_change_dev_def_domain(group, dev, req_type);
-	ret = ret ?: count;
-
-	device_unlock(dev);
+	ret = iommu_change_dev_def_domain(group, req_type);
 	iommu_group_unfreeze_dev_ops(group);
-	put_device(dev);
 	if (group_owner_claimed)
 		iommu_group_release_dma_owner(group);
 
-	return ret;
+	return ret ?: count;
 }
 
 static bool iommu_is_default_domain(struct iommu_group *group)
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 4/4] iommu: Cleanup iommu_change_dev_def_domain()
  2023-02-13  7:49 [PATCH 0/4] iommu: Extend changing default domain to normal group Lu Baolu
                   ` (2 preceding siblings ...)
  2023-02-13  7:49 ` [PATCH 3/4] iommu: Remove unnecessary device_lock() Lu Baolu
@ 2023-02-13  7:49 ` Lu Baolu
  3 siblings, 0 replies; 17+ messages in thread
From: Lu Baolu @ 2023-02-13  7:49 UTC (permalink / raw)
  To: iommu
  Cc: Joerg Roedel, Jason Gunthorpe, Christoph Hellwig, Kevin Tian,
	Will Deacon, Robin Murphy, linux-kernel, Lu Baolu

As the singleton group limitation has been removed, cleanup the code
in iommu_change_dev_def_domain() accordingly.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
---
 drivers/iommu/iommu.c | 41 ++++++++++-------------------------------
 1 file changed, 10 insertions(+), 31 deletions(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 1e69def20a67..67b978403a05 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -2783,7 +2783,7 @@ int iommu_dev_disable_feature(struct device *dev, enum iommu_dev_features feat)
 EXPORT_SYMBOL_GPL(iommu_dev_disable_feature);
 
 /*
- * Changes the default domain of an iommu group that has *only* one device
+ * Changes the default domain of an iommu group
  *
  * @group: The group for which the default domain should be changed
  * @type: The type of the new default domain that gets associated with the group
@@ -2797,33 +2797,14 @@ EXPORT_SYMBOL_GPL(iommu_dev_disable_feature);
  */
 static int iommu_change_dev_def_domain(struct iommu_group *group, int type)
 {
+	struct __group_domain_type gtype = {NULL, 0};
 	struct iommu_domain *prev_dom;
 	struct group_device *grp_dev;
-	int ret, dev_def_dom;
 	struct device *dev;
+	int ret;
 
 	mutex_lock(&group->mutex);
 
-	/*
-	 * iommu group wasn't locked while acquiring device lock in
-	 * iommu_group_store_type(). So, make sure that the device count hasn't
-	 * changed while acquiring device lock.
-	 *
-	 * Changing default domain of an iommu group with two or more devices
-	 * isn't supported because there could be a potential deadlock. Consider
-	 * the following scenario. T1 is trying to acquire device locks of all
-	 * the devices in the group and before it could acquire all of them,
-	 * there could be another thread T2 (from different sub-system and use
-	 * case) that has already acquired some of the device locks and might be
-	 * waiting for T1 to release other device locks.
-	 */
-	if (iommu_group_device_count(group) != 1) {
-		pr_err_ratelimited("Cannot change default domain: Group has more than one device\n");
-		ret = -EINVAL;
-		goto out;
-	}
-
-	/* Since group has only one device */
 	grp_dev = list_first_entry(&group->devices, struct group_device, list);
 	dev = grp_dev->dev;
 
@@ -2833,15 +2814,16 @@ static int iommu_change_dev_def_domain(struct iommu_group *group, int type)
 		goto out;
 	}
 
-	dev_def_dom = iommu_get_def_domain_type(dev);
+	__iommu_group_for_each_dev(group, &gtype,
+				   probe_get_default_domain_type);
 	if (!type) {
 		/*
 		 * If the user hasn't requested any specific type of domain and
 		 * if the device supports both the domains, then default to the
 		 * domain the device was booted with
 		 */
-		type = dev_def_dom ? : iommu_def_domain_type;
-	} else if (dev_def_dom && type != dev_def_dom) {
+		type = gtype.type ? : iommu_def_domain_type;
+	} else if (gtype.type && type != gtype.type) {
 		pr_err_ratelimited("Device cannot be in %s domain\n",
 				   iommu_domain_type_str(type));
 		ret = -EINVAL;
@@ -2870,16 +2852,14 @@ static int iommu_change_dev_def_domain(struct iommu_group *group, int type)
 	if (ret)
 		goto out;
 
-	ret = iommu_create_device_direct_mappings(group, dev);
+	ret = iommu_group_create_direct_mappings(group);
 	if (ret)
 		goto free_new_domain;
 
-	ret = __iommu_attach_device(group->default_domain, dev);
+	ret = __iommu_attach_group(group->default_domain, group);
 	if (ret)
 		goto free_new_domain;
 
-	group->domain = group->default_domain;
-
 	/*
 	 * Release the mutex here because ops->probe_finalize() call-back of
 	 * some vendor IOMMU drivers calls arm_iommu_attach_device() which
@@ -2889,7 +2869,7 @@ static int iommu_change_dev_def_domain(struct iommu_group *group, int type)
 	mutex_unlock(&group->mutex);
 
 	/* Make sure dma_ops is appropriatley set */
-	iommu_group_do_probe_finalize(dev, group->default_domain);
+	__iommu_group_dma_finalize(group);
 	iommu_domain_free(prev_dom);
 	return 0;
 
@@ -2897,7 +2877,6 @@ static int iommu_change_dev_def_domain(struct iommu_group *group, int type)
 	iommu_domain_free(group->default_domain);
 	group->default_domain = prev_dom;
 	group->domain = prev_dom;
-
 out:
 	mutex_unlock(&group->mutex);
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/4] iommu: Add dev_iommu->ops_rwsem
  2023-02-13  7:49 ` [PATCH 1/4] iommu: Add dev_iommu->ops_rwsem Lu Baolu
@ 2023-02-13 14:16   ` Jason Gunthorpe
  2023-02-15  5:34     ` Baolu Lu
  0 siblings, 1 reply; 17+ messages in thread
From: Jason Gunthorpe @ 2023-02-13 14:16 UTC (permalink / raw)
  To: Lu Baolu
  Cc: iommu, Joerg Roedel, Christoph Hellwig, Kevin Tian, Will Deacon,
	Robin Murphy, linux-kernel

On Mon, Feb 13, 2023 at 03:49:38PM +0800, Lu Baolu wrote:

> +static int iommu_group_freeze_dev_ops(struct iommu_group *group)
> +{
> +	struct group_device *device;
> +	struct device *dev;
> +
> +	mutex_lock(&group->mutex);
> +	list_for_each_entry(device, &group->devices, list) {
> +		dev = device->dev;
> +		down_read(&dev->iommu->ops_rwsem);

This isn't allowed, you can't obtain locks in a loop like this, it
will deadlock.

You don't need more locks, we already have the group mutex, the
release path should be fixed to use it properly as I was trying to do here:

https://lore.kernel.org/kvm/4-v1-ef00ffecea52+2cb-iommu_group_lifetime_jgg@nvidia.com/
https://lore.kernel.org/kvm/YyyTxx0HnA3maxEk@nvidia.com/

Then what you'd do in a path like this is:

  mutex_lock(&group->mutex);
  dev = iommu_group_first_device(group)
  if (!dev)
     /* Racing with group cleanup */
     return -EINVAL;
  
  /* Now dev->ops is valid and must remain valid so long as
     group->mutex is held */
   

The only reason this doesn't work already is because of the above
stuff about release not holding the group mutex when manipulating the
devices in the group. This is simply mis-locked.

Robin explained it was done like this because
arm_iommu_detach_device() re-enters the iommu core during release_dev,
so I suggest fixing that by simply not doing that (see above)

Below is the lastest attempt I had tried, I didn't have time to get back
to it and we fixed the bug another way. It needs a bit of adjusting to
also remove the device from the group after release is called within
the same mutex critical region.

Jason

diff --git a/arch/arm/include/asm/dma-iommu.h b/arch/arm/include/asm/dma-iommu.h
index fe9ef6f79e9cfe..ea7198a1786186 100644
--- a/arch/arm/include/asm/dma-iommu.h
+++ b/arch/arm/include/asm/dma-iommu.h
@@ -31,6 +31,7 @@ void arm_iommu_release_mapping(struct dma_iommu_mapping *mapping);
 int arm_iommu_attach_device(struct device *dev,
 					struct dma_iommu_mapping *mapping);
 void arm_iommu_detach_device(struct device *dev);
+void arm_iommu_release_device(struct device *dev);
 
 #endif /* __KERNEL__ */
 #endif
diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
index c135f6e37a00ca..3d7b385e3981ef 100644
--- a/arch/arm/mm/dma-mapping.c
+++ b/arch/arm/mm/dma-mapping.c
@@ -1680,13 +1680,15 @@ int arm_iommu_attach_device(struct device *dev,
 EXPORT_SYMBOL_GPL(arm_iommu_attach_device);
 
 /**
- * arm_iommu_detach_device
+ * arm_iommu_release_device
  * @dev: valid struct device pointer
  *
- * Detaches the provided device from a previously attached map.
- * This overwrites the dma_ops pointer with appropriate non-IOMMU ops.
+ * This is like arm_iommu_detach_device() except it handles the special
+ * case of being called under an iommu driver's release operation. In this
+ * case the driver must have already detached the device from any attached
+ * domain before calling this function.
  */
-void arm_iommu_detach_device(struct device *dev)
+void arm_iommu_release_device(struct device *dev)
 {
 	struct dma_iommu_mapping *mapping;
 
@@ -1696,13 +1698,34 @@ void arm_iommu_detach_device(struct device *dev)
 		return;
 	}
 
-	iommu_detach_device(mapping->domain, dev);
 	kref_put(&mapping->kref, release_iommu_mapping);
 	to_dma_iommu_mapping(dev) = NULL;
 	set_dma_ops(dev, NULL);
 
 	pr_debug("Detached IOMMU controller from %s device.\n", dev_name(dev));
 }
+EXPORT_SYMBOL_GPL(arm_iommu_release_device);
+
+/**
+ * arm_iommu_detach_device
+ * @dev: valid struct device pointer
+ *
+ * Detaches the provided device from a previously attached map.
+ * This overwrites the dma_ops pointer with appropriate non-IOMMU ops.
+ */
+void arm_iommu_detach_device(struct device *dev)
+{
+	struct dma_iommu_mapping *mapping;
+
+	mapping = to_dma_iommu_mapping(dev);
+	if (!mapping) {
+		dev_warn(dev, "Not attached\n");
+		return;
+	}
+
+	iommu_detach_device(mapping->domain, dev);
+	arm_iommu_release_device(dev);
+}
 EXPORT_SYMBOL_GPL(arm_iommu_detach_device);
 
 static void arm_setup_iommu_dma_ops(struct device *dev, u64 dma_base, u64 size,
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index de91dd88705bd3..f3dbd980133567 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -101,6 +101,10 @@ static int iommu_create_device_direct_mappings(struct iommu_group *group,
 static struct iommu_group *iommu_group_get_for_dev(struct device *dev);
 static ssize_t iommu_group_store_type(struct iommu_group *group,
 				      const char *buf, size_t count);
+static struct group_device *
+__iommu_group_remove_device(struct iommu_group *group, struct device *dev);
+static void __iommu_group_remove_device_sysfs(struct iommu_group *group,
+					      struct group_device *device);
 
 #define IOMMU_GROUP_ATTR(_name, _mode, _show, _store)		\
 struct iommu_group_attribute iommu_group_attr_##_name =		\
@@ -430,18 +434,50 @@ int iommu_probe_device(struct device *dev)
 
 void iommu_release_device(struct device *dev)
 {
+	struct iommu_group *group = dev->iommu_group;
 	const struct iommu_ops *ops;
+	struct group_device *device;
 
 	if (!dev->iommu)
 		return;
 
 	iommu_device_unlink(dev->iommu->iommu_dev, dev);
 
+	mutex_lock(&group->mutex);
+	device = __iommu_group_remove_device(group, dev);
 	ops = dev_iommu_ops(dev);
+
+	/*
+	 * If the group has become empty then ownership must have been released,
+	 * and the current domain must be set back to NULL or the default
+	 * domain.
+	 */
+	if (list_empty(&group->devices))
+		WARN_ON(group->owner_cnt ||
+			group->domain != group->default_domain);
+
+	/*
+	 * release_device() must stop using any attached domain on the device.
+	 * If there are still other devices in the group they are not effected
+	 * by this callback.
+	 *
+	 * The IOMMU driver must set the device to either an identity or
+	 * blocking translation and stop using any domain pointer, as it is
+	 * going to be freed.
+	 */
 	if (ops->release_device)
 		ops->release_device(dev);
+	mutex_unlock(&group->mutex);
+
+	__iommu_group_remove_device_sysfs(group, device);
+
+	/*
+	 * This will eventually call iommu_group_release() which will free the
+	 * iommu_domains.
+	 */
+	dev->iommu_group = NULL;
+	kobject_put(group->devices_kobj);
 
-	iommu_group_remove_device(dev);
 	module_put(ops->owner);
 	dev_iommu_free(dev);
 }
@@ -1039,44 +1075,71 @@ int iommu_group_add_device(struct iommu_group *group, struct device *dev)
 }
 EXPORT_SYMBOL_GPL(iommu_group_add_device);
 
-/**
- * iommu_group_remove_device - remove a device from it's current group
- * @dev: device to be removed
- *
- * This function is called by an iommu driver to remove the device from
- * it's current group.  This decrements the iommu group reference count.
- */
-void iommu_group_remove_device(struct device *dev)
+static struct group_device *
+__iommu_group_remove_device(struct iommu_group *group, struct device *dev)
 {
-	struct iommu_group *group = dev->iommu_group;
-	struct group_device *tmp_device, *device = NULL;
+	struct group_device *device;
+
+	lockdep_assert_held(&group->mutex);
 
 	if (!group)
-		return;
+		return NULL;
 
 	dev_info(dev, "Removing from iommu group %d\n", group->id);
 
-	mutex_lock(&group->mutex);
-	list_for_each_entry(tmp_device, &group->devices, list) {
-		if (tmp_device->dev == dev) {
-			device = tmp_device;
+	list_for_each_entry(device, &group->devices, list) {
+		if (device->dev == dev) {
 			list_del(&device->list);
-			break;
+			return device;
 		}
 	}
-	mutex_unlock(&group->mutex);
+	return NULL;
+}
 
+static void __iommu_group_remove_device_sysfs(struct iommu_group *group,
+					      struct group_device *device)
+{
 	if (!device)
 		return;
 
 	sysfs_remove_link(group->devices_kobj, device->name);
-	sysfs_remove_link(&dev->kobj, "iommu_group");
+	sysfs_remove_link(&device->dev->kobj, "iommu_group");
 
-	trace_remove_device_from_group(group->id, dev);
+	trace_remove_device_from_group(group->id, device->dev);
 
 	kfree(device->name);
 	kfree(device);
-	dev->iommu_group = NULL;
+}
+
+/**
+ * iommu_group_remove_device - remove a device from it's current group
+ * @dev: device to be removed
+ *
+ * This function is used by non-iommu drivers to create non-iommu subystem
+ * groups (eg VFIO mdev, SPAPR) Using this from inside an iommu driver is an
+ * abuse. Instead the driver should return the correct group from
+ * ops->device_group()
+ */
+void iommu_group_remove_device(struct device *dev)
+{
+	struct iommu_group *group = dev->iommu_group;
+	struct group_device *device;
+
+	/*
+	 * Since we don't do ops->release_device() there is no way for the
+	 * driver to stop using any attached domain before we free it. If a
+	 * domain is attached while this function is called it will eventually
+	 * UAF.
+	 *
+	 * Thus it is only useful for cases like VFIO/SPAPR that don't use an
+	 * iommu driver, or for cases like FSL that don't use default domains.
+	 */
+	WARN_ON(group->domain);
+
+	mutex_lock(&group->mutex);
+	device = __iommu_group_remove_device(group, dev);
+	mutex_unlock(&group->mutex);
+	__iommu_group_remove_device_sysfs(group, device);
 	kobject_put(group->devices_kobj);
 }
 EXPORT_SYMBOL_GPL(iommu_group_remove_device);
diff --git a/drivers/iommu/ipmmu-vmsa.c b/drivers/iommu/ipmmu-vmsa.c
index a003bd5fc65c13..703a6083172de1 100644
--- a/drivers/iommu/ipmmu-vmsa.c
+++ b/drivers/iommu/ipmmu-vmsa.c
@@ -302,11 +302,8 @@ static void ipmmu_utlb_enable(struct ipmmu_vmsa_domain *domain,
 /*
  * Disable MMU translation for the microTLB.
  */
-static void ipmmu_utlb_disable(struct ipmmu_vmsa_domain *domain,
-			       unsigned int utlb)
+static void ipmmu_utlb_disable(struct ipmmu_vmsa_device *mmu, unsigned int utlb)
 {
-	struct ipmmu_vmsa_device *mmu = domain->mmu;
-
 	ipmmu_imuctr_write(mmu, utlb, 0);
 	mmu->utlb_ctx[utlb] = IPMMU_CTX_INVALID;
 }
@@ -647,11 +644,11 @@ static void ipmmu_detach_device(struct iommu_domain *io_domain,
 				struct device *dev)
 {
 	struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
-	struct ipmmu_vmsa_domain *domain = to_vmsa_domain(io_domain);
+	struct ipmmu_vmsa_device *mmu = to_ipmmu(dev);
 	unsigned int i;
 
 	for (i = 0; i < fwspec->num_ids; ++i)
-		ipmmu_utlb_disable(domain, fwspec->ids[i]);
+		ipmmu_utlb_disable(mmu, fwspec->ids[i]);
 
 	/*
 	 * TODO: Optimize by disabling the context when no device is attached.
@@ -847,7 +844,8 @@ static void ipmmu_probe_finalize(struct device *dev)
 
 static void ipmmu_release_device(struct device *dev)
 {
-	arm_iommu_detach_device(dev);
+	ipmmu_detach_device(NULL, dev);
+	arm_iommu_release_device(dev);
 }
 
 static struct iommu_group *ipmmu_find_group(struct device *dev)

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH 2/4] iommu: Use group ownership to avoid driver attachment
  2023-02-13  7:49 ` [PATCH 2/4] iommu: Use group ownership to avoid driver attachment Lu Baolu
@ 2023-02-13 14:19   ` Jason Gunthorpe
  2023-02-15  5:51     ` Baolu Lu
  0 siblings, 1 reply; 17+ messages in thread
From: Jason Gunthorpe @ 2023-02-13 14:19 UTC (permalink / raw)
  To: Lu Baolu
  Cc: iommu, Joerg Roedel, Christoph Hellwig, Kevin Tian, Will Deacon,
	Robin Murphy, linux-kernel

On Mon, Feb 13, 2023 at 03:49:39PM +0800, Lu Baolu wrote:
> The iommu_group_store_type() requires the devices in the iommu group are
> not bound to any device driver during the whole operation. The existing
> code locks the device with device_lock(dev) and use device_is_bound() to
> check whether any driver is bound to device.
> 
> In fact, this can be achieved through the DMA ownership helpers. Replace
> them with iommu_group_claim/release_dma_owner() helpers.
> 
> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
> ---
>  drivers/iommu/iommu.c | 27 +++++++++++++--------------
>  1 file changed, 13 insertions(+), 14 deletions(-)
> 
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index 4f71dcd2621b..6547cb38480c 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -2807,12 +2807,6 @@ static int iommu_change_dev_def_domain(struct iommu_group *group,
>  
>  	mutex_lock(&group->mutex);
>  
> -	if (group->default_domain != group->domain) {
> -		dev_err_ratelimited(prev_dev, "Group not assigned to default domain\n");
> -		ret = -EBUSY;
> -		goto out;
> -	}
> -
>  	/*
>  	 * iommu group wasn't locked while acquiring device lock in
>  	 * iommu_group_store_type(). So, make sure that the device count hasn't
> @@ -2971,6 +2965,7 @@ static void iommu_group_unfreeze_dev_ops(struct iommu_group *group)
>  static ssize_t iommu_group_store_type(struct iommu_group *group,
>  				      const char *buf, size_t count)
>  {
> +	bool group_owner_claimed = false;
>  	struct group_device *grp_dev;
>  	struct device *dev;
>  	int ret, req_type;
> @@ -2992,6 +2987,14 @@ static ssize_t iommu_group_store_type(struct iommu_group *group,
>  	else
>  		return -EINVAL;
>  
> +	if (req_type != IOMMU_DOMAIN_DMA_FQ ||
> +	    group->default_domain->type != IOMMU_DOMAIN_DMA) {
> +		ret = iommu_group_claim_dma_owner(group, (void *)buf);
> +		if (ret)
> +			return ret;
> +		group_owner_claimed = true;
> +	}

I don't get it, this should be done unconditionally. If we couldn't
take ownership then we simply can't progress.

But there is more to it than that, a device that is owned should not
be release and to achieve this the general logic around the owner
scheme assumes that a driver is attached.

So if you call it from this non-driver context you have to hold the
group_mutex as previously discussed, which also means this needs to be
an externally version of iommu_group_claim_dma_owner()

Jason

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/4] iommu: Add dev_iommu->ops_rwsem
  2023-02-13 14:16   ` Jason Gunthorpe
@ 2023-02-15  5:34     ` Baolu Lu
  2023-02-15 11:24       ` Robin Murphy
  0 siblings, 1 reply; 17+ messages in thread
From: Baolu Lu @ 2023-02-15  5:34 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: baolu.lu, iommu, Joerg Roedel, Christoph Hellwig, Kevin Tian,
	Will Deacon, Robin Murphy, linux-kernel

On 2/13/23 10:16 PM, Jason Gunthorpe wrote:
> On Mon, Feb 13, 2023 at 03:49:38PM +0800, Lu Baolu wrote:
> 
>> +static int iommu_group_freeze_dev_ops(struct iommu_group *group)
>> +{
>> +	struct group_device *device;
>> +	struct device *dev;
>> +
>> +	mutex_lock(&group->mutex);
>> +	list_for_each_entry(device, &group->devices, list) {
>> +		dev = device->dev;
>> +		down_read(&dev->iommu->ops_rwsem);
> 
> This isn't allowed, you can't obtain locks in a loop like this, it
> will deadlock.
> 
> You don't need more locks, we already have the group mutex, the
> release path should be fixed to use it properly as I was trying to do here:
> 
> https://lore.kernel.org/kvm/4-v1-ef00ffecea52+2cb-iommu_group_lifetime_jgg@nvidia.com/
> https://lore.kernel.org/kvm/YyyTxx0HnA3maxEk@nvidia.com/
> 
> Then what you'd do in a path like this is:
> 
>    mutex_lock(&group->mutex);
>    dev = iommu_group_first_device(group)
>    if (!dev)
>       /* Racing with group cleanup */
>       return -EINVAL;
>    
>    /* Now dev->ops is valid and must remain valid so long as
>       group->mutex is held */
>     
> 
> The only reason this doesn't work already is because of the above
> stuff about release not holding the group mutex when manipulating the
> devices in the group. This is simply mis-locked.
> 
> Robin explained it was done like this because
> arm_iommu_detach_device() re-enters the iommu core during release_dev,
> so I suggest fixing that by simply not doing that (see above)
> 
> Below is the lastest attempt I had tried, I didn't have time to get back
> to it and we fixed the bug another way. It needs a bit of adjusting to
> also remove the device from the group after release is called within
> the same mutex critical region.

Yes. If we can make remove device from list and device release in the
same mutex critical region, we don't need any other lock mechanism
anymore.

The ipmmu driver supports default domain. When code comes to release
device, the device driver has already been unbound. The default domain
should have been attached to the device. Then iommu_detach_device() does
nothing because what it really does is just attaching default domain.

How about removing iommu_detach_device() from ipmmu's release path like
below?

diff --git a/drivers/iommu/ipmmu-vmsa.c b/drivers/iommu/ipmmu-vmsa.c
index bdf1a4e5eae0..0bc29009703e 100644
--- a/drivers/iommu/ipmmu-vmsa.c
+++ b/drivers/iommu/ipmmu-vmsa.c
@@ -820,7 +820,16 @@ static void ipmmu_probe_finalize(struct device *dev)

  static void ipmmu_release_device(struct device *dev)
  {
-       arm_iommu_detach_device(dev);
+       struct dma_iommu_mapping *mapping = to_dma_iommu_mapping(dev);
+
+       if (!mapping) {
+               dev_warn(dev, "Not attached\n");
+               return;
+       }
+
+       arm_iommu_release_mapping(mapping);
+       to_dma_iommu_mapping(dev) = NULL;
+       set_dma_ops(dev, NULL);
  }

After fixing this in ipmmu driver, we can safely put removing device and
release_device in a group->mutex critical region in two steps:

Step 1: Refactor iommu_group_remove_device() w/o functionality change:

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 05522eace439..17b2e358a6fd 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -1065,6 +1065,46 @@ int iommu_group_add_device(struct iommu_group 
*group, struct device *dev)
  }
  EXPORT_SYMBOL_GPL(iommu_group_add_device);

+/*
+ * Remove a device from a group's device list and return the group device
+ * if successful.
+ */
+static struct group_device *
+__iommu_group_remove_device(struct iommu_group *group, struct device *dev)
+{
+	struct group_device *device;
+
+	lockdep_assert_held(&group->mutex);
+	list_for_each_entry(device, &group->devices, list) {
+		if (device->dev == dev) {
+			list_del(&device->list);
+			return device;
+		}
+	}
+
+	return NULL;
+}
+
+/*
+ * Release a device from its group and decrements the iommu group reference
+ * count.
+ */
+static void __iommu_group_release_device(struct iommu_group *group,
+					 struct group_device *grp_dev)
+{
+	struct device *dev = grp_dev->dev;
+
+	sysfs_remove_link(group->devices_kobj, grp_dev->name);
+	sysfs_remove_link(&dev->kobj, "iommu_group");
+
+	trace_remove_device_from_group(group->id, dev);
+
+	kfree(grp_dev->name);
+	kfree(grp_dev);
+	dev->iommu_group = NULL;
+	kobject_put(group->devices_kobj);
+}
+
  /**
   * iommu_group_remove_device - remove a device from it's current group
   * @dev: device to be removed
@@ -1075,7 +1115,7 @@ EXPORT_SYMBOL_GPL(iommu_group_add_device);
  void iommu_group_remove_device(struct device *dev)
  {
  	struct iommu_group *group = dev->iommu_group;
-	struct group_device *tmp_device, *device = NULL;
+	struct group_device *device;

  	if (!group)
  		return;
@@ -1083,27 +1123,11 @@ void iommu_group_remove_device(struct device *dev)
  	dev_info(dev, "Removing from iommu group %d\n", group->id);

  	mutex_lock(&group->mutex);
-	list_for_each_entry(tmp_device, &group->devices, list) {
-		if (tmp_device->dev == dev) {
-			device = tmp_device;
-			list_del(&device->list);
-			break;
-		}
-	}
+	device = __iommu_group_remove_device(group, dev);
  	mutex_unlock(&group->mutex);

-	if (!device)
-		return;
-
-	sysfs_remove_link(group->devices_kobj, device->name);
-	sysfs_remove_link(&dev->kobj, "iommu_group");
-
-	trace_remove_device_from_group(group->id, dev);
-
-	kfree(device->name);
-	kfree(device);
-	dev->iommu_group = NULL;
-	kobject_put(group->devices_kobj);
+	if (device)
+		__iommu_group_release_device(group, device);
  }
  EXPORT_SYMBOL_GPL(iommu_group_remove_device);


Step 2: Put removing group and release_device in a same critical region:

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 17b2e358a6fd..eeb2907472bc 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -101,6 +101,10 @@ static int 
iommu_create_device_direct_mappings(struct iommu_group *group,
  static struct iommu_group *iommu_group_get_for_dev(struct device *dev);
  static ssize_t iommu_group_store_type(struct iommu_group *group,
  				      const char *buf, size_t count);
+static struct group_device *
+__iommu_group_remove_device(struct iommu_group *group, struct device *dev);
+static void __iommu_group_release_device(struct iommu_group *group,
+					 struct group_device *grp_dev);

  #define IOMMU_GROUP_ATTR(_name, _mode, _show, _store)		\
  struct iommu_group_attribute iommu_group_attr_##_name =		\
@@ -466,18 +470,25 @@ int iommu_probe_device(struct device *dev)

  void iommu_release_device(struct device *dev)
  {
+	struct iommu_group *group = dev->iommu_group;
+	struct group_device *device;
  	const struct iommu_ops *ops;

-	if (!dev->iommu)
+	if (!dev->iommu || !group)
  		return;

  	iommu_device_unlink(dev->iommu->iommu_dev, dev);

+	mutex_lock(&group->mutex);
  	ops = dev_iommu_ops(dev);
  	if (ops->release_device)
  		ops->release_device(dev);
+	device = __iommu_group_remove_device(group, dev);
+	mutex_unlock(&group->mutex);
+
+	if (device)
+		__iommu_group_release_device(group, device);

-	iommu_group_remove_device(dev);
  	module_put(ops->owner);
  	dev_iommu_free(dev);
  }

Best regards,
baolu

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH 2/4] iommu: Use group ownership to avoid driver attachment
  2023-02-13 14:19   ` Jason Gunthorpe
@ 2023-02-15  5:51     ` Baolu Lu
  2023-02-15  6:56       ` Tian, Kevin
  2023-02-15 12:56       ` Jason Gunthorpe
  0 siblings, 2 replies; 17+ messages in thread
From: Baolu Lu @ 2023-02-15  5:51 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: baolu.lu, iommu, Joerg Roedel, Christoph Hellwig, Kevin Tian,
	Will Deacon, Robin Murphy, linux-kernel

On 2/13/23 10:19 PM, Jason Gunthorpe wrote:
> On Mon, Feb 13, 2023 at 03:49:39PM +0800, Lu Baolu wrote:
>> The iommu_group_store_type() requires the devices in the iommu group are
>> not bound to any device driver during the whole operation. The existing
>> code locks the device with device_lock(dev) and use device_is_bound() to
>> check whether any driver is bound to device.
>>
>> In fact, this can be achieved through the DMA ownership helpers. Replace
>> them with iommu_group_claim/release_dma_owner() helpers.
>>
>> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
>> ---
>>   drivers/iommu/iommu.c | 27 +++++++++++++--------------
>>   1 file changed, 13 insertions(+), 14 deletions(-)
>>
>> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
>> index 4f71dcd2621b..6547cb38480c 100644
>> --- a/drivers/iommu/iommu.c
>> +++ b/drivers/iommu/iommu.c
>> @@ -2807,12 +2807,6 @@ static int iommu_change_dev_def_domain(struct iommu_group *group,
>>   
>>   	mutex_lock(&group->mutex);
>>   
>> -	if (group->default_domain != group->domain) {
>> -		dev_err_ratelimited(prev_dev, "Group not assigned to default domain\n");
>> -		ret = -EBUSY;
>> -		goto out;
>> -	}
>> -
>>   	/*
>>   	 * iommu group wasn't locked while acquiring device lock in
>>   	 * iommu_group_store_type(). So, make sure that the device count hasn't
>> @@ -2971,6 +2965,7 @@ static void iommu_group_unfreeze_dev_ops(struct iommu_group *group)
>>   static ssize_t iommu_group_store_type(struct iommu_group *group,
>>   				      const char *buf, size_t count)
>>   {
>> +	bool group_owner_claimed = false;
>>   	struct group_device *grp_dev;
>>   	struct device *dev;
>>   	int ret, req_type;
>> @@ -2992,6 +2987,14 @@ static ssize_t iommu_group_store_type(struct iommu_group *group,
>>   	else
>>   		return -EINVAL;
>>   
>> +	if (req_type != IOMMU_DOMAIN_DMA_FQ ||
>> +	    group->default_domain->type != IOMMU_DOMAIN_DMA) {
>> +		ret = iommu_group_claim_dma_owner(group, (void *)buf);
>> +		if (ret)
>> +			return ret;
>> +		group_owner_claimed = true;
>> +	}
> 
> I don't get it, this should be done unconditionally. If we couldn't
> take ownership then we simply can't progress.

The existing code allows the user to switch the default domain from
strict to lazy invalidation mode. The default domain is not changed,
hence it should be seamless and transparent to the device driver.

> 
> But there is more to it than that, a device that is owned should not
> be release and to achieve this the general logic around the owner
> scheme assumes that a driver is attached.

Yes. Current ownership scheme was built on this assumption.

> 
> So if you call it from this non-driver context you have to hold the
> group_mutex as previously discussed,

Yes.

> which also means this needs to be
> an externally version of iommu_group_claim_dma_owner()

Sorry! What does "an externally version of
iommu_group_claim_dma_owner()" mean?

My understanding is that we should limit iommu_group_claim_dma_owner()
use in the driver context. For this non-driver context, we should not
use iommu_group_claim_dma_owner() directly, but hold the group->mutex
and check the group->owner_cnt directly:

         mutex_lock(&group->mutex);
         if (group->owner_cnt) {
                 ret = -EPERM;
                 goto unlock_out;
         }

the group->mutex should be held until everything is done.

Best regards,
baolu

^ permalink raw reply	[flat|nested] 17+ messages in thread

* RE: [PATCH 2/4] iommu: Use group ownership to avoid driver attachment
  2023-02-15  5:51     ` Baolu Lu
@ 2023-02-15  6:56       ` Tian, Kevin
  2023-02-15  7:28         ` Baolu Lu
  2023-02-15 12:56       ` Jason Gunthorpe
  1 sibling, 1 reply; 17+ messages in thread
From: Tian, Kevin @ 2023-02-15  6:56 UTC (permalink / raw)
  To: Baolu Lu, Jason Gunthorpe
  Cc: iommu, Joerg Roedel, Christoph Hellwig, Will Deacon,
	Robin Murphy, linux-kernel

> From: Baolu Lu <baolu.lu@linux.intel.com>
> Sent: Wednesday, February 15, 2023 1:51 PM
> 
> On 2/13/23 10:19 PM, Jason Gunthorpe wrote:
> > On Mon, Feb 13, 2023 at 03:49:39PM +0800, Lu Baolu wrote:
> >> @@ -2992,6 +2987,14 @@ static ssize_t iommu_group_store_type(struct
> iommu_group *group,
> >>   	else
> >>   		return -EINVAL;
> >>
> >> +	if (req_type != IOMMU_DOMAIN_DMA_FQ ||
> >> +	    group->default_domain->type != IOMMU_DOMAIN_DMA) {
> >> +		ret = iommu_group_claim_dma_owner(group, (void *)buf);
> >> +		if (ret)
> >> +			return ret;
> >> +		group_owner_claimed = true;
> >> +	}
> >
> > I don't get it, this should be done unconditionally. If we couldn't
> > take ownership then we simply can't progress.
> 
> The existing code allows the user to switch the default domain from
> strict to lazy invalidation mode. The default domain is not changed,
> hence it should be seamless and transparent to the device driver.

Is there real usage relying on this transition for a bound device?

In concept strict->lazy transition implies relaxed DMA security. It's hard
to think of a motivation of doing so while the device might be doing
in-fly DMAs.

Presumably such perf/security tradeoff should be planned way before
binding device/driver together.

btw if strict->lazy is allowed why lazy->strict is prohibited?

> 
> > which also means this needs to be
> > an externally version of iommu_group_claim_dma_owner()
> 
> Sorry! What does "an externally version of
> iommu_group_claim_dma_owner()" mean?
> 
> My understanding is that we should limit iommu_group_claim_dma_owner()
> use in the driver context. For this non-driver context, we should not
> use iommu_group_claim_dma_owner() directly, but hold the group->mutex
> and check the group->owner_cnt directly:
> 
>          mutex_lock(&group->mutex);
>          if (group->owner_cnt) {
>                  ret = -EPERM;
>                  goto unlock_out;
>          }
> 
> the group->mutex should be held until everything is done.
> 

I guess you two meant the same thing.

	mutex_lock(&group->mutex);
	iommu_group_claim_dma_owner_unlocked();
	//blah blah
	mutex_unlock(&group->mutex);

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 2/4] iommu: Use group ownership to avoid driver attachment
  2023-02-15  6:56       ` Tian, Kevin
@ 2023-02-15  7:28         ` Baolu Lu
  2023-02-15 11:09           ` Robin Murphy
  0 siblings, 1 reply; 17+ messages in thread
From: Baolu Lu @ 2023-02-15  7:28 UTC (permalink / raw)
  To: Tian, Kevin, Jason Gunthorpe
  Cc: baolu.lu, iommu, Joerg Roedel, Christoph Hellwig, Will Deacon,
	Robin Murphy, linux-kernel

On 2023/2/15 14:56, Tian, Kevin wrote:
>> From: Baolu Lu<baolu.lu@linux.intel.com>
>> Sent: Wednesday, February 15, 2023 1:51 PM
>>
>> On 2/13/23 10:19 PM, Jason Gunthorpe wrote:
>>> On Mon, Feb 13, 2023 at 03:49:39PM +0800, Lu Baolu wrote:
>>>> @@ -2992,6 +2987,14 @@ static ssize_t iommu_group_store_type(struct
>> iommu_group *group,
>>>>    	else
>>>>    		return -EINVAL;
>>>>
>>>> +	if (req_type != IOMMU_DOMAIN_DMA_FQ ||
>>>> +	    group->default_domain->type != IOMMU_DOMAIN_DMA) {
>>>> +		ret = iommu_group_claim_dma_owner(group, (void *)buf);
>>>> +		if (ret)
>>>> +			return ret;
>>>> +		group_owner_claimed = true;
>>>> +	}
>>> I don't get it, this should be done unconditionally. If we couldn't
>>> take ownership then we simply can't progress.
>> The existing code allows the user to switch the default domain from
>> strict to lazy invalidation mode. The default domain is not changed,
>> hence it should be seamless and transparent to the device driver.
> Is there real usage relying on this transition for a bound device?
> 
> In concept strict->lazy transition implies relaxed DMA security. It's hard
> to think of a motivation of doing so while the device might be doing
> in-fly DMAs.
> 
> Presumably such perf/security tradeoff should be planned way before
> binding device/driver together.
> 
> btw if strict->lazy is allowed why lazy->strict is prohibited?
> 

We all know, strict vs. lazy is a tradeoff between performance and
security.

strict -> lazy: driver works in secure mode. This transition trades off
security for better performance.

lazy->strict: The driver is already working in non-safety mode. This
transition only results in worse performance. It makes no sense. If user
want to put the driver in a secure mode, they need to unbind the driver,
reset the device and do the lazy->strict transition.

Robin might have better insights.

Best regards,
baolu

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 2/4] iommu: Use group ownership to avoid driver attachment
  2023-02-15  7:28         ` Baolu Lu
@ 2023-02-15 11:09           ` Robin Murphy
  2023-02-16  0:42             ` Baolu Lu
  0 siblings, 1 reply; 17+ messages in thread
From: Robin Murphy @ 2023-02-15 11:09 UTC (permalink / raw)
  To: Baolu Lu, Tian, Kevin, Jason Gunthorpe
  Cc: iommu, Joerg Roedel, Christoph Hellwig, Will Deacon, linux-kernel

On 2023-02-15 07:28, Baolu Lu wrote:
> On 2023/2/15 14:56, Tian, Kevin wrote:
>>> From: Baolu Lu<baolu.lu@linux.intel.com>
>>> Sent: Wednesday, February 15, 2023 1:51 PM
>>>
>>> On 2/13/23 10:19 PM, Jason Gunthorpe wrote:
>>>> On Mon, Feb 13, 2023 at 03:49:39PM +0800, Lu Baolu wrote:
>>>>> @@ -2992,6 +2987,14 @@ static ssize_t iommu_group_store_type(struct
>>> iommu_group *group,
>>>>>        else
>>>>>            return -EINVAL;
>>>>>
>>>>> +    if (req_type != IOMMU_DOMAIN_DMA_FQ ||
>>>>> +        group->default_domain->type != IOMMU_DOMAIN_DMA) {
>>>>> +        ret = iommu_group_claim_dma_owner(group, (void *)buf);
>>>>> +        if (ret)
>>>>> +            return ret;
>>>>> +        group_owner_claimed = true;
>>>>> +    }
>>>> I don't get it, this should be done unconditionally. If we couldn't
>>>> take ownership then we simply can't progress.
>>> The existing code allows the user to switch the default domain from
>>> strict to lazy invalidation mode. The default domain is not changed,
>>> hence it should be seamless and transparent to the device driver.
>> Is there real usage relying on this transition for a bound device?
>>
>> In concept strict->lazy transition implies relaxed DMA security. It's 
>> hard
>> to think of a motivation of doing so while the device might be doing
>> in-fly DMAs.
>>
>> Presumably such perf/security tradeoff should be planned way before
>> binding device/driver together.
>>
>> btw if strict->lazy is allowed why lazy->strict is prohibited?
>>
> 
> We all know, strict vs. lazy is a tradeoff between performance and
> security.
> 
> strict -> lazy: driver works in secure mode. This transition trades off
> security for better performance.
> 
> lazy->strict: The driver is already working in non-safety mode. This
> transition only results in worse performance. It makes no sense. If user
> want to put the driver in a secure mode, they need to unbind the driver,
> reset the device and do the lazy->strict transition.
> 
> Robin might have better insights.

Yes, this was added for a definite use-case in ChromeOS, where 
strict->lazy needs to support being done "live" since the device in 
question is the storage controller for the already-mounted root 
filesystem. Your reasoning seems to match what I summarised in the 
original commit message :)

Thanks,
Robin.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/4] iommu: Add dev_iommu->ops_rwsem
  2023-02-15  5:34     ` Baolu Lu
@ 2023-02-15 11:24       ` Robin Murphy
  2023-02-16  0:40         ` Baolu Lu
  0 siblings, 1 reply; 17+ messages in thread
From: Robin Murphy @ 2023-02-15 11:24 UTC (permalink / raw)
  To: Baolu Lu, Jason Gunthorpe
  Cc: iommu, Joerg Roedel, Christoph Hellwig, Kevin Tian, Will Deacon,
	linux-kernel

On 2023-02-15 05:34, Baolu Lu wrote:
> On 2/13/23 10:16 PM, Jason Gunthorpe wrote:
>> On Mon, Feb 13, 2023 at 03:49:38PM +0800, Lu Baolu wrote:
>>
>>> +static int iommu_group_freeze_dev_ops(struct iommu_group *group)
>>> +{
>>> +    struct group_device *device;
>>> +    struct device *dev;
>>> +
>>> +    mutex_lock(&group->mutex);
>>> +    list_for_each_entry(device, &group->devices, list) {
>>> +        dev = device->dev;
>>> +        down_read(&dev->iommu->ops_rwsem);
>>
>> This isn't allowed, you can't obtain locks in a loop like this, it
>> will deadlock.
>>
>> You don't need more locks, we already have the group mutex, the
>> release path should be fixed to use it properly as I was trying to do 
>> here:
>>
>> https://lore.kernel.org/kvm/4-v1-ef00ffecea52+2cb-iommu_group_lifetime_jgg@nvidia.com/
>> https://lore.kernel.org/kvm/YyyTxx0HnA3maxEk@nvidia.com/
>>
>> Then what you'd do in a path like this is:
>>
>>    mutex_lock(&group->mutex);
>>    dev = iommu_group_first_device(group)
>>    if (!dev)
>>       /* Racing with group cleanup */
>>       return -EINVAL;
>>    /* Now dev->ops is valid and must remain valid so long as
>>       group->mutex is held */
>>
>> The only reason this doesn't work already is because of the above
>> stuff about release not holding the group mutex when manipulating the
>> devices in the group. This is simply mis-locked.
>>
>> Robin explained it was done like this because
>> arm_iommu_detach_device() re-enters the iommu core during release_dev,
>> so I suggest fixing that by simply not doing that (see above)
>>
>> Below is the lastest attempt I had tried, I didn't have time to get back
>> to it and we fixed the bug another way. It needs a bit of adjusting to
>> also remove the device from the group after release is called within
>> the same mutex critical region.
> 
> Yes. If we can make remove device from list and device release in the
> same mutex critical region, we don't need any other lock mechanism
> anymore.
> 
> The ipmmu driver supports default domain.

It supports default domains *on arm64*. Nothing on 32-bit ARM uses 
default domains, they won't even exist since iommu-dma is not enabled, 
and either way the ARM DMA ops don't understand groups. I don't see an 
obvious satisfactory solution while the arm_iommu_* APIs still exist, 
but if we need bodges in the interim could we please try to concentrate 
them in ARM-specific code?

Thanks,
Robin.

> When code comes to release
> device, the device driver has already been unbound. The default domain
> should have been attached to the device. Then iommu_detach_device() does
> nothing because what it really does is just attaching default domain.
> 
> How about removing iommu_detach_device() from ipmmu's release path like
> below?
> 
> diff --git a/drivers/iommu/ipmmu-vmsa.c b/drivers/iommu/ipmmu-vmsa.c
> index bdf1a4e5eae0..0bc29009703e 100644
> --- a/drivers/iommu/ipmmu-vmsa.c
> +++ b/drivers/iommu/ipmmu-vmsa.c
> @@ -820,7 +820,16 @@ static void ipmmu_probe_finalize(struct device *dev)
> 
>   static void ipmmu_release_device(struct device *dev)
>   {
> -       arm_iommu_detach_device(dev);
> +       struct dma_iommu_mapping *mapping = to_dma_iommu_mapping(dev);
> +
> +       if (!mapping) {
> +               dev_warn(dev, "Not attached\n");
> +               return;
> +       }
> +
> +       arm_iommu_release_mapping(mapping);
> +       to_dma_iommu_mapping(dev) = NULL;
> +       set_dma_ops(dev, NULL);
>   }
> 
> After fixing this in ipmmu driver, we can safely put removing device and
> release_device in a group->mutex critical region in two steps:
> 
> Step 1: Refactor iommu_group_remove_device() w/o functionality change:
> 
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index 05522eace439..17b2e358a6fd 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -1065,6 +1065,46 @@ int iommu_group_add_device(struct iommu_group 
> *group, struct device *dev)
>   }
>   EXPORT_SYMBOL_GPL(iommu_group_add_device);
> 
> +/*
> + * Remove a device from a group's device list and return the group device
> + * if successful.
> + */
> +static struct group_device *
> +__iommu_group_remove_device(struct iommu_group *group, struct device *dev)
> +{
> +    struct group_device *device;
> +
> +    lockdep_assert_held(&group->mutex);
> +    list_for_each_entry(device, &group->devices, list) {
> +        if (device->dev == dev) {
> +            list_del(&device->list);
> +            return device;
> +        }
> +    }
> +
> +    return NULL;
> +}
> +
> +/*
> + * Release a device from its group and decrements the iommu group 
> reference
> + * count.
> + */
> +static void __iommu_group_release_device(struct iommu_group *group,
> +                     struct group_device *grp_dev)
> +{
> +    struct device *dev = grp_dev->dev;
> +
> +    sysfs_remove_link(group->devices_kobj, grp_dev->name);
> +    sysfs_remove_link(&dev->kobj, "iommu_group");
> +
> +    trace_remove_device_from_group(group->id, dev);
> +
> +    kfree(grp_dev->name);
> +    kfree(grp_dev);
> +    dev->iommu_group = NULL;
> +    kobject_put(group->devices_kobj);
> +}
> +
>   /**
>    * iommu_group_remove_device - remove a device from it's current group
>    * @dev: device to be removed
> @@ -1075,7 +1115,7 @@ EXPORT_SYMBOL_GPL(iommu_group_add_device);
>   void iommu_group_remove_device(struct device *dev)
>   {
>       struct iommu_group *group = dev->iommu_group;
> -    struct group_device *tmp_device, *device = NULL;
> +    struct group_device *device;
> 
>       if (!group)
>           return;
> @@ -1083,27 +1123,11 @@ void iommu_group_remove_device(struct device *dev)
>       dev_info(dev, "Removing from iommu group %d\n", group->id);
> 
>       mutex_lock(&group->mutex);
> -    list_for_each_entry(tmp_device, &group->devices, list) {
> -        if (tmp_device->dev == dev) {
> -            device = tmp_device;
> -            list_del(&device->list);
> -            break;
> -        }
> -    }
> +    device = __iommu_group_remove_device(group, dev);
>       mutex_unlock(&group->mutex);
> 
> -    if (!device)
> -        return;
> -
> -    sysfs_remove_link(group->devices_kobj, device->name);
> -    sysfs_remove_link(&dev->kobj, "iommu_group");
> -
> -    trace_remove_device_from_group(group->id, dev);
> -
> -    kfree(device->name);
> -    kfree(device);
> -    dev->iommu_group = NULL;
> -    kobject_put(group->devices_kobj);
> +    if (device)
> +        __iommu_group_release_device(group, device);
>   }
>   EXPORT_SYMBOL_GPL(iommu_group_remove_device);
> 
> 
> Step 2: Put removing group and release_device in a same critical region:
> 
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index 17b2e358a6fd..eeb2907472bc 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -101,6 +101,10 @@ static int 
> iommu_create_device_direct_mappings(struct iommu_group *group,
>   static struct iommu_group *iommu_group_get_for_dev(struct device *dev);
>   static ssize_t iommu_group_store_type(struct iommu_group *group,
>                         const char *buf, size_t count);
> +static struct group_device *
> +__iommu_group_remove_device(struct iommu_group *group, struct device 
> *dev);
> +static void __iommu_group_release_device(struct iommu_group *group,
> +                     struct group_device *grp_dev);
> 
>   #define IOMMU_GROUP_ATTR(_name, _mode, _show, _store)        \
>   struct iommu_group_attribute iommu_group_attr_##_name =        \
> @@ -466,18 +470,25 @@ int iommu_probe_device(struct device *dev)
> 
>   void iommu_release_device(struct device *dev)
>   {
> +    struct iommu_group *group = dev->iommu_group;
> +    struct group_device *device;
>       const struct iommu_ops *ops;
> 
> -    if (!dev->iommu)
> +    if (!dev->iommu || !group)
>           return;
> 
>       iommu_device_unlink(dev->iommu->iommu_dev, dev);
> 
> +    mutex_lock(&group->mutex);
>       ops = dev_iommu_ops(dev);
>       if (ops->release_device)
>           ops->release_device(dev);
> +    device = __iommu_group_remove_device(group, dev);
> +    mutex_unlock(&group->mutex);
> +
> +    if (device)
> +        __iommu_group_release_device(group, device);
> 
> -    iommu_group_remove_device(dev);
>       module_put(ops->owner);
>       dev_iommu_free(dev);
>   }
> 
> Best regards,
> baolu

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 2/4] iommu: Use group ownership to avoid driver attachment
  2023-02-15  5:51     ` Baolu Lu
  2023-02-15  6:56       ` Tian, Kevin
@ 2023-02-15 12:56       ` Jason Gunthorpe
  2023-02-16  0:36         ` Baolu Lu
  1 sibling, 1 reply; 17+ messages in thread
From: Jason Gunthorpe @ 2023-02-15 12:56 UTC (permalink / raw)
  To: Baolu Lu
  Cc: iommu, Joerg Roedel, Christoph Hellwig, Kevin Tian, Will Deacon,
	Robin Murphy, linux-kernel

On Wed, Feb 15, 2023 at 01:51:14PM +0800, Baolu Lu wrote:
> On 2/13/23 10:19 PM, Jason Gunthorpe wrote:
> > On Mon, Feb 13, 2023 at 03:49:39PM +0800, Lu Baolu wrote:
> > > The iommu_group_store_type() requires the devices in the iommu group are
> > > not bound to any device driver during the whole operation. The existing
> > > code locks the device with device_lock(dev) and use device_is_bound() to
> > > check whether any driver is bound to device.
> > > 
> > > In fact, this can be achieved through the DMA ownership helpers. Replace
> > > them with iommu_group_claim/release_dma_owner() helpers.
> > > 
> > > Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
> > > ---
> > >   drivers/iommu/iommu.c | 27 +++++++++++++--------------
> > >   1 file changed, 13 insertions(+), 14 deletions(-)
> > > 
> > > diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> > > index 4f71dcd2621b..6547cb38480c 100644
> > > --- a/drivers/iommu/iommu.c
> > > +++ b/drivers/iommu/iommu.c
> > > @@ -2807,12 +2807,6 @@ static int iommu_change_dev_def_domain(struct iommu_group *group,
> > >   	mutex_lock(&group->mutex);
> > > -	if (group->default_domain != group->domain) {
> > > -		dev_err_ratelimited(prev_dev, "Group not assigned to default domain\n");
> > > -		ret = -EBUSY;
> > > -		goto out;
> > > -	}
> > > -
> > >   	/*
> > >   	 * iommu group wasn't locked while acquiring device lock in
> > >   	 * iommu_group_store_type(). So, make sure that the device count hasn't
> > > @@ -2971,6 +2965,7 @@ static void iommu_group_unfreeze_dev_ops(struct iommu_group *group)
> > >   static ssize_t iommu_group_store_type(struct iommu_group *group,
> > >   				      const char *buf, size_t count)
> > >   {
> > > +	bool group_owner_claimed = false;
> > >   	struct group_device *grp_dev;
> > >   	struct device *dev;
> > >   	int ret, req_type;
> > > @@ -2992,6 +2987,14 @@ static ssize_t iommu_group_store_type(struct iommu_group *group,
> > >   	else
> > >   		return -EINVAL;
> > > +	if (req_type != IOMMU_DOMAIN_DMA_FQ ||
> > > +	    group->default_domain->type != IOMMU_DOMAIN_DMA) {
> > > +		ret = iommu_group_claim_dma_owner(group, (void *)buf);
> > > +		if (ret)
> > > +			return ret;
> > > +		group_owner_claimed = true;
> > > +	}
> > 
> > I don't get it, this should be done unconditionally. If we couldn't
> > take ownership then we simply can't progress.
> 
> The existing code allows the user to switch the default domain from
> strict to lazy invalidation mode. The default domain is not changed,
> hence it should be seamless and transparent to the device driver.

So make that a special case, get the group lock check if it is this
case and then just adjust it and exit, otherwise get ownership under
the group lock as discussed.
> 
> > which also means this needs to be
> > an externally version of iommu_group_claim_dma_owner()
> 
> Sorry! What does "an externally version of
> iommu_group_claim_dma_owner()" mean?
> 

Oops "externally locked"

> My understanding is that we should limit iommu_group_claim_dma_owner()
> use in the driver context. For this non-driver context, we should not
> use iommu_group_claim_dma_owner() directly, but hold the group->mutex
> and check the group->owner_cnt directly:
> 
>         mutex_lock(&group->mutex);
>         if (group->owner_cnt) {
>                 ret = -EPERM;
>                 goto unlock_out;
>         }
> 
> the group->mutex should be held until everything is done.

Yes, that would be fine as long as we can hold the group mutex
throughout

Jason

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 2/4] iommu: Use group ownership to avoid driver attachment
  2023-02-15 12:56       ` Jason Gunthorpe
@ 2023-02-16  0:36         ` Baolu Lu
  0 siblings, 0 replies; 17+ messages in thread
From: Baolu Lu @ 2023-02-16  0:36 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: baolu.lu, iommu, Joerg Roedel, Christoph Hellwig, Kevin Tian,
	Will Deacon, Robin Murphy, linux-kernel

On 2/15/23 8:56 PM, Jason Gunthorpe wrote:
> On Wed, Feb 15, 2023 at 01:51:14PM +0800, Baolu Lu wrote:
>> On 2/13/23 10:19 PM, Jason Gunthorpe wrote:
>>> On Mon, Feb 13, 2023 at 03:49:39PM +0800, Lu Baolu wrote:
>>>> The iommu_group_store_type() requires the devices in the iommu group are
>>>> not bound to any device driver during the whole operation. The existing
>>>> code locks the device with device_lock(dev) and use device_is_bound() to
>>>> check whether any driver is bound to device.
>>>>
>>>> In fact, this can be achieved through the DMA ownership helpers. Replace
>>>> them with iommu_group_claim/release_dma_owner() helpers.
>>>>
>>>> Signed-off-by: Lu Baolu<baolu.lu@linux.intel.com>
>>>> ---
>>>>    drivers/iommu/iommu.c | 27 +++++++++++++--------------
>>>>    1 file changed, 13 insertions(+), 14 deletions(-)
>>>>
>>>> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
>>>> index 4f71dcd2621b..6547cb38480c 100644
>>>> --- a/drivers/iommu/iommu.c
>>>> +++ b/drivers/iommu/iommu.c
>>>> @@ -2807,12 +2807,6 @@ static int iommu_change_dev_def_domain(struct iommu_group *group,
>>>>    	mutex_lock(&group->mutex);
>>>> -	if (group->default_domain != group->domain) {
>>>> -		dev_err_ratelimited(prev_dev, "Group not assigned to default domain\n");
>>>> -		ret = -EBUSY;
>>>> -		goto out;
>>>> -	}
>>>> -
>>>>    	/*
>>>>    	 * iommu group wasn't locked while acquiring device lock in
>>>>    	 * iommu_group_store_type(). So, make sure that the device count hasn't
>>>> @@ -2971,6 +2965,7 @@ static void iommu_group_unfreeze_dev_ops(struct iommu_group *group)
>>>>    static ssize_t iommu_group_store_type(struct iommu_group *group,
>>>>    				      const char *buf, size_t count)
>>>>    {
>>>> +	bool group_owner_claimed = false;
>>>>    	struct group_device *grp_dev;
>>>>    	struct device *dev;
>>>>    	int ret, req_type;
>>>> @@ -2992,6 +2987,14 @@ static ssize_t iommu_group_store_type(struct iommu_group *group,
>>>>    	else
>>>>    		return -EINVAL;
>>>> +	if (req_type != IOMMU_DOMAIN_DMA_FQ ||
>>>> +	    group->default_domain->type != IOMMU_DOMAIN_DMA) {
>>>> +		ret = iommu_group_claim_dma_owner(group, (void *)buf);
>>>> +		if (ret)
>>>> +			return ret;
>>>> +		group_owner_claimed = true;
>>>> +	}
>>> I don't get it, this should be done unconditionally. If we couldn't
>>> take ownership then we simply can't progress.
>> The existing code allows the user to switch the default domain from
>> strict to lazy invalidation mode. The default domain is not changed,
>> hence it should be seamless and transparent to the device driver.
> So make that a special case, get the group lock check if it is this
> case and then just adjust it and exit, otherwise get ownership under
> the group lock as discussed.

OK. Will do like this in the next version.

Best regards,
baolu

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/4] iommu: Add dev_iommu->ops_rwsem
  2023-02-15 11:24       ` Robin Murphy
@ 2023-02-16  0:40         ` Baolu Lu
  0 siblings, 0 replies; 17+ messages in thread
From: Baolu Lu @ 2023-02-16  0:40 UTC (permalink / raw)
  To: Robin Murphy, Jason Gunthorpe
  Cc: baolu.lu, iommu, Joerg Roedel, Christoph Hellwig, Kevin Tian,
	Will Deacon, linux-kernel

On 2/15/23 7:24 PM, Robin Murphy wrote:
> On 2023-02-15 05:34, Baolu Lu wrote:
>> On 2/13/23 10:16 PM, Jason Gunthorpe wrote:
>>> On Mon, Feb 13, 2023 at 03:49:38PM +0800, Lu Baolu wrote:
>>>
>>>> +static int iommu_group_freeze_dev_ops(struct iommu_group *group)
>>>> +{
>>>> +    struct group_device *device;
>>>> +    struct device *dev;
>>>> +
>>>> +    mutex_lock(&group->mutex);
>>>> +    list_for_each_entry(device, &group->devices, list) {
>>>> +        dev = device->dev;
>>>> +        down_read(&dev->iommu->ops_rwsem);
>>>
>>> This isn't allowed, you can't obtain locks in a loop like this, it
>>> will deadlock.
>>>
>>> You don't need more locks, we already have the group mutex, the
>>> release path should be fixed to use it properly as I was trying to do 
>>> here:
>>>
>>> https://lore.kernel.org/kvm/4-v1-ef00ffecea52+2cb-iommu_group_lifetime_jgg@nvidia.com/
>>> https://lore.kernel.org/kvm/YyyTxx0HnA3maxEk@nvidia.com/
>>>
>>> Then what you'd do in a path like this is:
>>>
>>>    mutex_lock(&group->mutex);
>>>    dev = iommu_group_first_device(group)
>>>    if (!dev)
>>>       /* Racing with group cleanup */
>>>       return -EINVAL;
>>>    /* Now dev->ops is valid and must remain valid so long as
>>>       group->mutex is held */
>>>
>>> The only reason this doesn't work already is because of the above
>>> stuff about release not holding the group mutex when manipulating the
>>> devices in the group. This is simply mis-locked.
>>>
>>> Robin explained it was done like this because
>>> arm_iommu_detach_device() re-enters the iommu core during release_dev,
>>> so I suggest fixing that by simply not doing that (see above)
>>>
>>> Below is the lastest attempt I had tried, I didn't have time to get back
>>> to it and we fixed the bug another way. It needs a bit of adjusting to
>>> also remove the device from the group after release is called within
>>> the same mutex critical region.
>>
>> Yes. If we can make remove device from list and device release in the
>> same mutex critical region, we don't need any other lock mechanism
>> anymore.
>>
>> The ipmmu driver supports default domain.
> 
> It supports default domains *on arm64*. Nothing on 32-bit ARM uses 
> default domains, they won't even exist since iommu-dma is not enabled, 
> and either way the ARM DMA ops don't understand groups. I don't see an 
> obvious satisfactory solution while the arm_iommu_* APIs still exist, 
> but if we need bodges in the interim could we please try to concentrate 
> them in ARM-specific code?

Yes, sure. Thanks for the guide.

Best regards,
baolu

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 2/4] iommu: Use group ownership to avoid driver attachment
  2023-02-15 11:09           ` Robin Murphy
@ 2023-02-16  0:42             ` Baolu Lu
  0 siblings, 0 replies; 17+ messages in thread
From: Baolu Lu @ 2023-02-16  0:42 UTC (permalink / raw)
  To: Robin Murphy, Tian, Kevin, Jason Gunthorpe
  Cc: baolu.lu, iommu, Joerg Roedel, Christoph Hellwig, Will Deacon,
	linux-kernel

On 2/15/23 7:09 PM, Robin Murphy wrote:
> On 2023-02-15 07:28, Baolu Lu wrote:
>> On 2023/2/15 14:56, Tian, Kevin wrote:
>>>> From: Baolu Lu<baolu.lu@linux.intel.com>
>>>> Sent: Wednesday, February 15, 2023 1:51 PM
>>>>
>>>> On 2/13/23 10:19 PM, Jason Gunthorpe wrote:
>>>>> On Mon, Feb 13, 2023 at 03:49:39PM +0800, Lu Baolu wrote:
>>>>>> @@ -2992,6 +2987,14 @@ static ssize_t iommu_group_store_type(struct
>>>> iommu_group *group,
>>>>>>        else
>>>>>>            return -EINVAL;
>>>>>>
>>>>>> +    if (req_type != IOMMU_DOMAIN_DMA_FQ ||
>>>>>> +        group->default_domain->type != IOMMU_DOMAIN_DMA) {
>>>>>> +        ret = iommu_group_claim_dma_owner(group, (void *)buf);
>>>>>> +        if (ret)
>>>>>> +            return ret;
>>>>>> +        group_owner_claimed = true;
>>>>>> +    }
>>>>> I don't get it, this should be done unconditionally. If we couldn't
>>>>> take ownership then we simply can't progress.
>>>> The existing code allows the user to switch the default domain from
>>>> strict to lazy invalidation mode. The default domain is not changed,
>>>> hence it should be seamless and transparent to the device driver.
>>> Is there real usage relying on this transition for a bound device?
>>>
>>> In concept strict->lazy transition implies relaxed DMA security. It's 
>>> hard
>>> to think of a motivation of doing so while the device might be doing
>>> in-fly DMAs.
>>>
>>> Presumably such perf/security tradeoff should be planned way before
>>> binding device/driver together.
>>>
>>> btw if strict->lazy is allowed why lazy->strict is prohibited?
>>>
>>
>> We all know, strict vs. lazy is a tradeoff between performance and
>> security.
>>
>> strict -> lazy: driver works in secure mode. This transition trades off
>> security for better performance.
>>
>> lazy->strict: The driver is already working in non-safety mode. This
>> transition only results in worse performance. It makes no sense. If user
>> want to put the driver in a secure mode, they need to unbind the driver,
>> reset the device and do the lazy->strict transition.
>>
>> Robin might have better insights.
> 
> Yes, this was added for a definite use-case in ChromeOS, where 
> strict->lazy needs to support being done "live" since the device in 
> question is the storage controller for the already-mounted root 
> filesystem.

Thanks for letting us know this.

> Your reasoning seems to match what I summarised in the 
> original commit message 😄

Haha, it seems that my memory is till good. :-)

Best regards,
baolu

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2023-02-16  0:51 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-02-13  7:49 [PATCH 0/4] iommu: Extend changing default domain to normal group Lu Baolu
2023-02-13  7:49 ` [PATCH 1/4] iommu: Add dev_iommu->ops_rwsem Lu Baolu
2023-02-13 14:16   ` Jason Gunthorpe
2023-02-15  5:34     ` Baolu Lu
2023-02-15 11:24       ` Robin Murphy
2023-02-16  0:40         ` Baolu Lu
2023-02-13  7:49 ` [PATCH 2/4] iommu: Use group ownership to avoid driver attachment Lu Baolu
2023-02-13 14:19   ` Jason Gunthorpe
2023-02-15  5:51     ` Baolu Lu
2023-02-15  6:56       ` Tian, Kevin
2023-02-15  7:28         ` Baolu Lu
2023-02-15 11:09           ` Robin Murphy
2023-02-16  0:42             ` Baolu Lu
2023-02-15 12:56       ` Jason Gunthorpe
2023-02-16  0:36         ` Baolu Lu
2023-02-13  7:49 ` [PATCH 3/4] iommu: Remove unnecessary device_lock() Lu Baolu
2023-02-13  7:49 ` [PATCH 4/4] iommu: Cleanup iommu_change_dev_def_domain() Lu Baolu

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).