* [PATCH v2 0/3] iommufd: Remove iommufd_hw_pagetable_has_group
@ 2023-01-28 21:18 Nicolin Chen
  2023-01-28 21:18 ` [PATCH v2 1/3] iommufd: Add devices_users to track the hw_pagetable usage by device Nicolin Chen
                   ` (2 more replies)
  0 siblings, 3 replies; 44+ messages in thread
From: Nicolin Chen @ 2023-01-28 21:18 UTC (permalink / raw)
  To: jgg, kevin.tian; +Cc: yi.l.liu, iommu, linux-kernel

The iommufd_hw_pagetable_has_group() helper is not a device-centric API and
has been a bit of a hack. It also requires the hw_pagetable to keep tracking
an attached device list, along with a device lock to protect that list.

However, the coming domain replacement series would overcomplicate this
list/lock solution, especially when handling nested hw_pagetable use cases.
So, as a preparatory series, remove the device list/lock and also fix the
iommufd_hw_pagetable_has_group hack.

The device-list-based iommufd_hw_pagetable_has_group() can be replaced with
a domain-pointer comparison between hwpt->domain and the domain returned by
iommu_get_domain_for_dev(). The remaining dependency on list_empty() of the
device list can be replaced with a refcount. Since removing the device lock
could introduce a race condition, the ioas mutex is widened to provide the
alternative protection.
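
A minimal sketch of the two replacements (the exact code is in the patches;
iommufd_hw_pagetable_has_device() is the helper added by patch 3):

	/* device-centric check replacing iommufd_hw_pagetable_has_group() */
	static bool iommufd_hw_pagetable_has_device(struct iommufd_hw_pagetable *hwpt,
						    struct device *dev)
	{
		return hwpt->domain == iommu_get_domain_for_dev(dev);
	}

	/* first-device check replacing list_empty(&hwpt->devices) */
	if (refcount_read(hwpt->devices_users) == 1)
		iopt_table_add_domain(&hwpt->ioas->iopt, hwpt->domain);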

You can also find this series on Github:
https://github.com/nicolinc/iommufd/commits/remove_iommufd_hw_pagetable_has_group

Changelog
v1->v2:
 * Fixed a copy-n-paste mistake at a lockdep_assert_held() line

Thanks
Nicolin Chen

Nicolin Chen (2):
  iommufd/device: Make hwpt_list list_add/del symmetric
  iommufd/device: Change iommufd_hw_pagetable_has_group to device
    centric

Yi Liu (1):
  iommufd: Add devices_users to track the hw_pagetable usage by device

 drivers/iommu/iommufd/device.c          | 72 ++++++++++---------------
 drivers/iommu/iommufd/hw_pagetable.c    | 16 ++++--
 drivers/iommu/iommufd/iommufd_private.h |  3 +-
 3 files changed, 40 insertions(+), 51 deletions(-)

-- 
2.39.1



* [PATCH v2 1/3] iommufd: Add devices_users to track the hw_pagetable usage by device
  2023-01-28 21:18 [PATCH v2 0/3] iommufd: Remove iommufd_hw_pagetable_has_group Nicolin Chen
@ 2023-01-28 21:18 ` Nicolin Chen
  2023-01-29  9:23   ` Tian, Kevin
  2023-01-30 15:02   ` Jason Gunthorpe
  2023-01-28 21:18 ` [PATCH v2 2/3] iommufd/device: Make hwpt_list list_add/del symmetric Nicolin Chen
  2023-01-28 21:18 ` [PATCH v2 3/3] iommufd/device: Change iommufd_hw_pagetable_has_group to device centric Nicolin Chen
  2 siblings, 2 replies; 44+ messages in thread
From: Nicolin Chen @ 2023-01-28 21:18 UTC (permalink / raw)
  To: jgg, kevin.tian; +Cc: yi.l.liu, iommu, linux-kernel

From: Yi Liu <yi.l.liu@intel.com>

Currently, hw_pagetable tracks the attached devices using a device list.
When attaching the first device to a kernel-managed hw_pagetable, the hwpt
should be linked to the IOAS. When detaching the last device from this hwpt,
the link with the IOAS should be removed too. This first-or-last device
check is done with list_empty(hwpt->devices).

However, with a nested configuration, when a device is attached to the
user-managed stage-1 hw_pagetable, it will be added to this user-managed
hwpt's device list instead of the kernel-managed stage-2 hwpt's one. This
breaks the logic for linking/unlinking a kernel-managed hw_pagetable
to/from the IOAS/IOPT: e.g. the stage-2 hw_pagetable would be linked to the
IOAS multiple times when multiple devices are attached via different
stage-1 hwpts, yet it would be unlinked as soon as any one of them is
detached, because each stage-1 hwpt's device list becomes empty on its own.

Add a devices_users refcount to struct iommufd_hw_pagetable to track how
many attached devices are using the hw_pagetable. Make this field a pointer
and only allocate it for a stage-2 hw_pagetable. A stage-1 hw_pagetable
should reuse the stage-2 hw_pagetable's devices_users, because when a
device attaches to a stage-1 hw_pagetable, linking the stage-2 hwpt to the
IOAS is still required. So, with a nested configuration, increase the
devices_users on the stage-2 (parent) hwpt, no matter whether the device is
attached to the stage-1 or the stage-2 hwpt.

Adding this devices_users also reduces the dependency on the device list,
which helps the following patches remove the device list completely.
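
A rough sketch of the intended counting (illustrative only; the stage-1 part
belongs to the later nesting series):

	attach a device to a stage-2 hwpt:
		refcount_inc(s2_hwpt->devices_users);
	attach a device to a stage-1 (nested) hwpt:
		/* the stage-1 hwpt reuses its stage-2 parent's counter */
		refcount_inc(s1_hwpt->devices_users);	/* same refcount_t as the s2 one */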

Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Co-developed-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
---
 drivers/iommu/iommufd/device.c          |  8 +++++---
 drivers/iommu/iommufd/hw_pagetable.c    | 11 +++++++++++
 drivers/iommu/iommufd/iommufd_private.h |  1 +
 3 files changed, 17 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c
index 9f3b9674d72e..208757c39c90 100644
--- a/drivers/iommu/iommufd/device.c
+++ b/drivers/iommu/iommufd/device.c
@@ -212,7 +212,7 @@ static int iommufd_device_do_attach(struct iommufd_device *idev,
 				hwpt->domain->ops->enforce_cache_coherency(
 					hwpt->domain);
 		if (!hwpt->enforce_cache_coherency) {
-			WARN_ON(list_empty(&hwpt->devices));
+			WARN_ON(refcount_read(hwpt->devices_users) == 1);
 			rc = -EINVAL;
 			goto out_unlock;
 		}
@@ -236,7 +236,7 @@ static int iommufd_device_do_attach(struct iommufd_device *idev,
 		if (rc)
 			goto out_iova;
 
-		if (list_empty(&hwpt->devices)) {
+		if (refcount_read(hwpt->devices_users) == 1) {
 			rc = iopt_table_add_domain(&hwpt->ioas->iopt,
 						   hwpt->domain);
 			if (rc)
@@ -246,6 +246,7 @@ static int iommufd_device_do_attach(struct iommufd_device *idev,
 
 	idev->hwpt = hwpt;
 	refcount_inc(&hwpt->obj.users);
+	refcount_inc(hwpt->devices_users);
 	list_add(&idev->devices_item, &hwpt->devices);
 	mutex_unlock(&hwpt->devices_lock);
 	return 0;
@@ -387,9 +388,10 @@ void iommufd_device_detach(struct iommufd_device *idev)
 
 	mutex_lock(&hwpt->ioas->mutex);
 	mutex_lock(&hwpt->devices_lock);
+	refcount_dec(hwpt->devices_users);
 	list_del(&idev->devices_item);
 	if (!iommufd_hw_pagetable_has_group(hwpt, idev->group)) {
-		if (list_empty(&hwpt->devices)) {
+		if (refcount_read(hwpt->devices_users) == 1) {
 			iopt_table_remove_domain(&hwpt->ioas->iopt,
 						 hwpt->domain);
 			list_del(&hwpt->hwpt_item);
diff --git a/drivers/iommu/iommufd/hw_pagetable.c b/drivers/iommu/iommufd/hw_pagetable.c
index 43d473989a06..910e759ffeac 100644
--- a/drivers/iommu/iommufd/hw_pagetable.c
+++ b/drivers/iommu/iommufd/hw_pagetable.c
@@ -16,6 +16,8 @@ void iommufd_hw_pagetable_destroy(struct iommufd_object *obj)
 	iommu_domain_free(hwpt->domain);
 	refcount_dec(&hwpt->ioas->obj.users);
 	mutex_destroy(&hwpt->devices_lock);
+	WARN_ON(!refcount_dec_if_one(hwpt->devices_users));
+	kfree(hwpt->devices_users);
 }
 
 /**
@@ -46,11 +48,20 @@ iommufd_hw_pagetable_alloc(struct iommufd_ctx *ictx, struct iommufd_ioas *ioas,
 	INIT_LIST_HEAD(&hwpt->devices);
 	INIT_LIST_HEAD(&hwpt->hwpt_item);
 	mutex_init(&hwpt->devices_lock);
+	hwpt->devices_users = kzalloc(sizeof(*hwpt->devices_users), GFP_KERNEL);
+	if (!hwpt->devices_users) {
+		rc = -ENOMEM;
+		goto out_free_domain;
+	}
+	refcount_set(hwpt->devices_users, 1);
+
 	/* Pairs with iommufd_hw_pagetable_destroy() */
 	refcount_inc(&ioas->obj.users);
 	hwpt->ioas = ioas;
 	return hwpt;
 
+out_free_domain:
+	iommu_domain_free(hwpt->domain);
 out_abort:
 	iommufd_object_abort(ictx, &hwpt->obj);
 	return ERR_PTR(rc);
diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h
index 222e86591f8a..f128d77fb076 100644
--- a/drivers/iommu/iommufd/iommufd_private.h
+++ b/drivers/iommu/iommufd/iommufd_private.h
@@ -247,6 +247,7 @@ struct iommufd_hw_pagetable {
 	/* Head at iommufd_ioas::hwpt_list */
 	struct list_head hwpt_item;
 	struct mutex devices_lock;
+	refcount_t *devices_users;
 	struct list_head devices;
 };
 
-- 
2.39.1



* [PATCH v2 2/3] iommufd/device: Make hwpt_list list_add/del symmetric
  2023-01-28 21:18 [PATCH v2 0/3] iommufd: Remove iommufd_hw_pagetable_has_group Nicolin Chen
  2023-01-28 21:18 ` [PATCH v2 1/3] iommufd: Add devices_users to track the hw_pagetable usage by device Nicolin Chen
@ 2023-01-28 21:18 ` Nicolin Chen
  2023-01-29  9:24   ` Tian, Kevin
  2023-01-30 14:59   ` Jason Gunthorpe
  2023-01-28 21:18 ` [PATCH v2 3/3] iommufd/device: Change iommufd_hw_pagetable_has_group to device centric Nicolin Chen
  2 siblings, 2 replies; 44+ messages in thread
From: Nicolin Chen @ 2023-01-28 21:18 UTC (permalink / raw)
  To: jgg, kevin.tian; +Cc: yi.l.liu, iommu, linux-kernel

Since the list_del() of hwpt_item is done in iommufd_device_detach(), move
its list_add_tail() to the corresponding place in iommufd_device_do_attach().

Also, take the ioas->mutex in the callers, outside of the
iommufd_device_auto_get_domain() and iommufd_device_do_attach() calls, to
serialize the attach/detach routines. This adds an additional layer of
locking protection so that the following patch can safely remove the
devices_lock.
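
The resulting attach flow looks roughly like this (a call-flow sketch; see
the diff below for the exact code):

	iommufd_device_attach()
		mutex_lock(&hwpt->ioas->mutex);		/* now taken by the caller */
		iommufd_device_do_attach()		/* or iommufd_device_auto_get_domain() */
			mutex_lock(&hwpt->devices_lock);
			...
			if (refcount_read(hwpt->devices_users) == 1) {
				iopt_table_add_domain(&hwpt->ioas->iopt, hwpt->domain);
				list_add_tail(&hwpt->hwpt_item, &hwpt->ioas->hwpt_list);
			}
			...
			mutex_unlock(&hwpt->devices_lock);
		mutex_unlock(&hwpt->ioas->mutex);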

Co-developed-by: Yi Liu <yi.l.liu@intel.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
---
 drivers/iommu/iommufd/device.c | 26 ++++++++++++--------------
 1 file changed, 12 insertions(+), 14 deletions(-)

diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c
index 208757c39c90..9375fcac884c 100644
--- a/drivers/iommu/iommufd/device.c
+++ b/drivers/iommu/iommufd/device.c
@@ -198,6 +198,8 @@ static int iommufd_device_do_attach(struct iommufd_device *idev,
 	phys_addr_t sw_msi_start = PHYS_ADDR_MAX;
 	int rc;
 
+	lockdep_assert_held(&hwpt->ioas->mutex);
+
 	mutex_lock(&hwpt->devices_lock);
 
 	/*
@@ -241,6 +243,7 @@ static int iommufd_device_do_attach(struct iommufd_device *idev,
 						   hwpt->domain);
 			if (rc)
 				goto out_detach;
+			list_add_tail(&hwpt->hwpt_item, &hwpt->ioas->hwpt_list);
 		}
 	}
 
@@ -271,12 +274,13 @@ static int iommufd_device_auto_get_domain(struct iommufd_device *idev,
 	struct iommufd_hw_pagetable *hwpt;
 	int rc;
 
+	lockdep_assert_held(&ioas->mutex);
+
 	/*
 	 * There is no differentiation when domains are allocated, so any domain
 	 * that is willing to attach to the device is interchangeable with any
 	 * other.
 	 */
-	mutex_lock(&ioas->mutex);
 	list_for_each_entry(hwpt, &ioas->hwpt_list, hwpt_item) {
 		if (!hwpt->auto_domain)
 			continue;
@@ -290,29 +294,23 @@ static int iommufd_device_auto_get_domain(struct iommufd_device *idev,
 		 */
 		if (rc == -EINVAL)
 			continue;
-		goto out_unlock;
+		return rc;
 	}
 
 	hwpt = iommufd_hw_pagetable_alloc(idev->ictx, ioas, idev->dev);
-	if (IS_ERR(hwpt)) {
-		rc = PTR_ERR(hwpt);
-		goto out_unlock;
-	}
+	if (IS_ERR(hwpt))
+		return PTR_ERR(hwpt);
 	hwpt->auto_domain = true;
 
 	rc = iommufd_device_do_attach(idev, hwpt);
 	if (rc)
 		goto out_abort;
-	list_add_tail(&hwpt->hwpt_item, &ioas->hwpt_list);
 
-	mutex_unlock(&ioas->mutex);
 	iommufd_object_finalize(idev->ictx, &hwpt->obj);
 	return 0;
 
 out_abort:
 	iommufd_object_abort_and_destroy(idev->ictx, &hwpt->obj);
-out_unlock:
-	mutex_unlock(&ioas->mutex);
 	return rc;
 }
 
@@ -342,20 +340,20 @@ int iommufd_device_attach(struct iommufd_device *idev, u32 *pt_id)
 		struct iommufd_hw_pagetable *hwpt =
 			container_of(pt_obj, struct iommufd_hw_pagetable, obj);
 
+		mutex_lock(&hwpt->ioas->mutex);
 		rc = iommufd_device_do_attach(idev, hwpt);
+		mutex_unlock(&hwpt->ioas->mutex);
 		if (rc)
 			goto out_put_pt_obj;
-
-		mutex_lock(&hwpt->ioas->mutex);
-		list_add_tail(&hwpt->hwpt_item, &hwpt->ioas->hwpt_list);
-		mutex_unlock(&hwpt->ioas->mutex);
 		break;
 	}
 	case IOMMUFD_OBJ_IOAS: {
 		struct iommufd_ioas *ioas =
 			container_of(pt_obj, struct iommufd_ioas, obj);
 
+		mutex_lock(&ioas->mutex);
 		rc = iommufd_device_auto_get_domain(idev, ioas);
+		mutex_unlock(&ioas->mutex);
 		if (rc)
 			goto out_put_pt_obj;
 		break;
-- 
2.39.1



* [PATCH v2 3/3] iommufd/device: Change iommufd_hw_pagetable_has_group to device centric
  2023-01-28 21:18 [PATCH v2 0/3] iommufd: Remove iommufd_hw_pagetable_has_group Nicolin Chen
  2023-01-28 21:18 ` [PATCH v2 1/3] iommufd: Add devices_users to track the hw_pagetable usage by device Nicolin Chen
  2023-01-28 21:18 ` [PATCH v2 2/3] iommufd/device: Make hwpt_list list_add/del symmetric Nicolin Chen
@ 2023-01-28 21:18 ` Nicolin Chen
  2023-01-29  9:37   ` Tian, Kevin
  2 siblings, 1 reply; 44+ messages in thread
From: Nicolin Chen @ 2023-01-28 21:18 UTC (permalink / raw)
  To: jgg, kevin.tian; +Cc: yi.l.liu, iommu, linux-kernel

iommufd_hw_pagetable_has_group() is a little hack to check whether a
hw_pagetable has an idev->group attached. Not only is it not device-centric,
it also requires the hw_pagetable to maintain a device list protected by a
devices_lock, which gets overcomplicated across the routines for different
use cases, especially the upcoming nested hwpts.

Since comparing the iommu_group->domain (via iommu_get_domain_for_dev())
against the hwpt->domain serves the same purpose while being device-centric,
switch to that and drop the device list along with the devices_lock.

Note that the detach routine previously checked !has_group because a
list_del() had already removed the device from the device list. Now the
device is still attached to the hwpt at that point, so the if condition is
inverted.
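
In short, the check flips like this (a sketch; the iopt_table_remove_domain()
handling is omitted, see the diff below):

	/* before: the device was already removed from the list */
	list_del(&idev->devices_item);
	if (!iommufd_hw_pagetable_has_group(hwpt, idev->group))
		iommu_detach_group(hwpt->domain, idev->group);

	/* after: the device is still attached here, so the condition inverts */
	if (iommufd_hw_pagetable_has_device(hwpt, idev->dev))
		iommu_detach_group(hwpt->domain, idev->group);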

Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
---
 drivers/iommu/iommufd/device.c          | 40 ++++++++-----------------
 drivers/iommu/iommufd/hw_pagetable.c    |  5 ----
 drivers/iommu/iommufd/iommufd_private.h |  2 --
 3 files changed, 12 insertions(+), 35 deletions(-)

diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c
index 9375fcac884c..f582e59cc51c 100644
--- a/drivers/iommu/iommufd/device.c
+++ b/drivers/iommu/iommufd/device.c
@@ -24,8 +24,6 @@ struct iommufd_device {
 	struct iommufd_object obj;
 	struct iommufd_ctx *ictx;
 	struct iommufd_hw_pagetable *hwpt;
-	/* Head at iommufd_hw_pagetable::devices */
-	struct list_head devices_item;
 	/* always the physical device */
 	struct device *dev;
 	struct iommu_group *group;
@@ -181,15 +179,15 @@ static int iommufd_device_setup_msi(struct iommufd_device *idev,
 	return 0;
 }
 
-static bool iommufd_hw_pagetable_has_group(struct iommufd_hw_pagetable *hwpt,
-					   struct iommu_group *group)
+static bool iommufd_hw_pagetable_has_device(struct iommufd_hw_pagetable *hwpt,
+					    struct device *dev)
 {
-	struct iommufd_device *cur_dev;
-
-	list_for_each_entry(cur_dev, &hwpt->devices, devices_item)
-		if (cur_dev->group == group)
-			return true;
-	return false;
+	/*
+	 * iommu_get_domain_for_dev() returns an iommu_group->domain ptr, if it
+	 * is the same domain as the hwpt->domain, it means that this hwpt has
+	 * the iommu_group/device.
+	 */
+	return hwpt->domain == iommu_get_domain_for_dev(dev);
 }
 
 static int iommufd_device_do_attach(struct iommufd_device *idev,
@@ -200,8 +198,6 @@ static int iommufd_device_do_attach(struct iommufd_device *idev,
 
 	lockdep_assert_held(&hwpt->ioas->mutex);
 
-	mutex_lock(&hwpt->devices_lock);
-
 	/*
 	 * Try to upgrade the domain we have, it is an iommu driver bug to
 	 * report IOMMU_CAP_ENFORCE_CACHE_COHERENCY but fail
@@ -215,25 +211,20 @@ static int iommufd_device_do_attach(struct iommufd_device *idev,
 					hwpt->domain);
 		if (!hwpt->enforce_cache_coherency) {
 			WARN_ON(refcount_read(hwpt->devices_users) == 1);
-			rc = -EINVAL;
-			goto out_unlock;
+			return -EINVAL;
 		}
 	}
 
 	rc = iopt_table_enforce_group_resv_regions(&hwpt->ioas->iopt, idev->dev,
 						   idev->group, &sw_msi_start);
 	if (rc)
-		goto out_unlock;
+		return rc;
 
 	rc = iommufd_device_setup_msi(idev, hwpt, sw_msi_start);
 	if (rc)
 		goto out_iova;
 
-	/*
-	 * FIXME: Hack around missing a device-centric iommu api, only attach to
-	 * the group once for the first device that is in the group.
-	 */
-	if (!iommufd_hw_pagetable_has_group(hwpt, idev->group)) {
+	if (!iommufd_hw_pagetable_has_device(hwpt, idev->dev)) {
 		rc = iommu_attach_group(hwpt->domain, idev->group);
 		if (rc)
 			goto out_iova;
@@ -250,16 +241,12 @@ static int iommufd_device_do_attach(struct iommufd_device *idev,
 	idev->hwpt = hwpt;
 	refcount_inc(&hwpt->obj.users);
 	refcount_inc(hwpt->devices_users);
-	list_add(&idev->devices_item, &hwpt->devices);
-	mutex_unlock(&hwpt->devices_lock);
 	return 0;
 
 out_detach:
 	iommu_detach_group(hwpt->domain, idev->group);
 out_iova:
 	iopt_remove_reserved_iova(&hwpt->ioas->iopt, idev->dev);
-out_unlock:
-	mutex_unlock(&hwpt->devices_lock);
 	return rc;
 }
 
@@ -385,10 +372,8 @@ void iommufd_device_detach(struct iommufd_device *idev)
 	struct iommufd_hw_pagetable *hwpt = idev->hwpt;
 
 	mutex_lock(&hwpt->ioas->mutex);
-	mutex_lock(&hwpt->devices_lock);
 	refcount_dec(hwpt->devices_users);
-	list_del(&idev->devices_item);
-	if (!iommufd_hw_pagetable_has_group(hwpt, idev->group)) {
+	if (iommufd_hw_pagetable_has_device(hwpt, idev->dev)) {
 		if (refcount_read(hwpt->devices_users) == 1) {
 			iopt_table_remove_domain(&hwpt->ioas->iopt,
 						 hwpt->domain);
@@ -397,7 +382,6 @@ void iommufd_device_detach(struct iommufd_device *idev)
 		iommu_detach_group(hwpt->domain, idev->group);
 	}
 	iopt_remove_reserved_iova(&hwpt->ioas->iopt, idev->dev);
-	mutex_unlock(&hwpt->devices_lock);
 	mutex_unlock(&hwpt->ioas->mutex);
 
 	if (hwpt->auto_domain)
diff --git a/drivers/iommu/iommufd/hw_pagetable.c b/drivers/iommu/iommufd/hw_pagetable.c
index 910e759ffeac..868a126ff37d 100644
--- a/drivers/iommu/iommufd/hw_pagetable.c
+++ b/drivers/iommu/iommufd/hw_pagetable.c
@@ -11,11 +11,8 @@ void iommufd_hw_pagetable_destroy(struct iommufd_object *obj)
 	struct iommufd_hw_pagetable *hwpt =
 		container_of(obj, struct iommufd_hw_pagetable, obj);
 
-	WARN_ON(!list_empty(&hwpt->devices));
-
 	iommu_domain_free(hwpt->domain);
 	refcount_dec(&hwpt->ioas->obj.users);
-	mutex_destroy(&hwpt->devices_lock);
 	WARN_ON(!refcount_dec_if_one(hwpt->devices_users));
 	kfree(hwpt->devices_users);
 }
@@ -45,9 +42,7 @@ iommufd_hw_pagetable_alloc(struct iommufd_ctx *ictx, struct iommufd_ioas *ioas,
 		goto out_abort;
 	}
 
-	INIT_LIST_HEAD(&hwpt->devices);
 	INIT_LIST_HEAD(&hwpt->hwpt_item);
-	mutex_init(&hwpt->devices_lock);
 	hwpt->devices_users = kzalloc(sizeof(*hwpt->devices_users), GFP_KERNEL);
 	if (!hwpt->devices_users) {
 		rc = -ENOMEM;
diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h
index f128d77fb076..1c8e59b37f46 100644
--- a/drivers/iommu/iommufd/iommufd_private.h
+++ b/drivers/iommu/iommufd/iommufd_private.h
@@ -246,9 +246,7 @@ struct iommufd_hw_pagetable {
 	bool msi_cookie : 1;
 	/* Head at iommufd_ioas::hwpt_list */
 	struct list_head hwpt_item;
-	struct mutex devices_lock;
 	refcount_t *devices_users;
-	struct list_head devices;
 };
 
 struct iommufd_hw_pagetable *
-- 
2.39.1



* RE: [PATCH v2 1/3] iommufd: Add devices_users to track the hw_pagetable usage by device
  2023-01-28 21:18 ` [PATCH v2 1/3] iommufd: Add devices_users to track the hw_pagetable usage by device Nicolin Chen
@ 2023-01-29  9:23   ` Tian, Kevin
  2023-01-29  9:30     ` Nicolin Chen
  2023-01-30  2:22     ` Liu, Yi L
  2023-01-30 15:02   ` Jason Gunthorpe
  1 sibling, 2 replies; 44+ messages in thread
From: Tian, Kevin @ 2023-01-29  9:23 UTC (permalink / raw)
  To: Nicolin Chen, jgg; +Cc: Liu, Yi L, iommu, linux-kernel

> From: Nicolin Chen <nicolinc@nvidia.com>
> Sent: Sunday, January 29, 2023 5:18 AM
> 
> From: Yi Liu <yi.l.liu@intel.com>
> 
> Currently, hw_pagetable tracks the attached devices using a device list.
> When attaching the first device to the kernel-managed hw_pagetable, it
> should be linked to IOAS. When detaching the last device from this hwpt,
> the link with IOAS should be removed too. And this first-or-last device
> check is done with list_empty(hwpt->devices).
> 
> However, with a nested configuration, when a device is attached to the
> user-managed stage-1 hw_pagetable, it will be added to this user-managed
> hwpt's device list instead of the kernel-managed stage-2 hwpt's one. And
> this breaks the logic for a kernel-managed hw_pagetable link/disconnect
> to/from IOAS/IOPT. e.g. the stage-2 hw_pagetable would be linked to IOAS
> multiple times if multiple device is attached, but it will become empty
> as soon as one device detached.
> 
> Add a devices_users in struct iommufd_hw_pagetable to track the users of

device_users

> hw_pagetable by the attached devices. Make this field as a pointer, only
> allocate for a stage-2 hw_pagetable. A stage-1 hw_pagetable should reuse
> the stage-2 hw_pagetable's devices_users, because when a device attaches
> to a stage-1 hw_pagetable, linking the stage-2 hwpt to the IOAS is still
> required. So, with a nested configuration, increase the devices_users on
> the stage-2 (parent) hwpt, no matter a device is attached to the stage-1
> or the stage-2 hwpt.

Above is very confusing w/o seeing the full series of nesting support.

As a preparatory step this should focus on existing code and what this
series tries to achieve. e.g. I'd not make device_users a pointer here.
Do that incrementally when the nesting support comes.


* RE: [PATCH v2 2/3] iommufd/device: Make hwpt_list list_add/del symmetric
  2023-01-28 21:18 ` [PATCH v2 2/3] iommufd/device: Make hwpt_list list_add/del symmetric Nicolin Chen
@ 2023-01-29  9:24   ` Tian, Kevin
  2023-01-29  9:31     ` Nicolin Chen
  2023-01-30 14:59   ` Jason Gunthorpe
  1 sibling, 1 reply; 44+ messages in thread
From: Tian, Kevin @ 2023-01-29  9:24 UTC (permalink / raw)
  To: Nicolin Chen, jgg; +Cc: Liu, Yi L, iommu, linux-kernel

> From: Nicolin Chen <nicolinc@nvidia.com>
> Sent: Sunday, January 29, 2023 5:18 AM
> 
> Since the list_del() of hwpt_item is done in iommufd_device_detach(), move
> its list_add_tail() to a similar place in iommufd_device_do_attach().

It's clearer to say that, because list_del() goes together with
iopt_table_remove_domain(), it makes sense to have list_add_tail()
together with iopt_table_add_domain().

> 
> Also move and place the mutex outside the
> iommufd_device_auto_get_domain()
> and iommufd_device_do_attach() calls, to serialize attach/detach routines.
> This adds an additional locking protection so that the following patch can
> safely remove devices_lock.
> 
> Co-developed-by: Yi Liu <yi.l.liu@intel.com>
> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>

Reviewed-by: Kevin Tian <kevin.tian@intel.com>


* Re: [PATCH v2 1/3] iommufd: Add devices_users to track the hw_pagetable usage by device
  2023-01-29  9:23   ` Tian, Kevin
@ 2023-01-29  9:30     ` Nicolin Chen
  2023-01-29  9:39       ` Tian, Kevin
  2023-01-30  2:22     ` Liu, Yi L
  1 sibling, 1 reply; 44+ messages in thread
From: Nicolin Chen @ 2023-01-29  9:30 UTC (permalink / raw)
  To: Tian, Kevin; +Cc: jgg, Liu, Yi L, iommu, linux-kernel

On Sun, Jan 29, 2023 at 09:23:05AM +0000, Tian, Kevin wrote:
> > From: Nicolin Chen <nicolinc@nvidia.com>
> > Sent: Sunday, January 29, 2023 5:18 AM
> >
> > From: Yi Liu <yi.l.liu@intel.com>
> >
> > Currently, hw_pagetable tracks the attached devices using a device list.
> > When attaching the first device to the kernel-managed hw_pagetable, it
> > should be linked to IOAS. When detaching the last device from this hwpt,
> > the link with IOAS should be removed too. And this first-or-last device
> > check is done with list_empty(hwpt->devices).
> >
> > However, with a nested configuration, when a device is attached to the
> > user-managed stage-1 hw_pagetable, it will be added to this user-managed
> > hwpt's device list instead of the kernel-managed stage-2 hwpt's one. And
> > this breaks the logic for a kernel-managed hw_pagetable link/disconnect
> > to/from IOAS/IOPT. e.g. the stage-2 hw_pagetable would be linked to IOAS
> > multiple times if multiple device is attached, but it will become empty
> > as soon as one device detached.
> >
> > Add a devices_users in struct iommufd_hw_pagetable to track the users of
> 
> device_users

I assume you are suggesting a rename right? I can do that.

> > hw_pagetable by the attached devices. Make this field as a pointer, only
> > allocate for a stage-2 hw_pagetable. A stage-1 hw_pagetable should reuse
> > the stage-2 hw_pagetable's devices_users, because when a device attaches
> > to a stage-1 hw_pagetable, linking the stage-2 hwpt to the IOAS is still
> > required. So, with a nested configuration, increase the devices_users on
> > the stage-2 (parent) hwpt, no matter a device is attached to the stage-1
> > or the stage-2 hwpt.
> 
> Above is very confusing w/o seeing the full series of nesting support.
> 
> As a preparatory step this should focus on existing code and what this
> series tries to achieve. e.g. I'd not make device_users a pointer here.
> Do that incrementally when the nesting support comes.

OK. I will shift that part to the nesting series.

Thanks
Nic


* Re: [PATCH v2 2/3] iommufd/device: Make hwpt_list list_add/del symmetric
  2023-01-29  9:24   ` Tian, Kevin
@ 2023-01-29  9:31     ` Nicolin Chen
  0 siblings, 0 replies; 44+ messages in thread
From: Nicolin Chen @ 2023-01-29  9:31 UTC (permalink / raw)
  To: Tian, Kevin; +Cc: jgg, Liu, Yi L, iommu, linux-kernel

On Sun, Jan 29, 2023 at 09:24:38AM +0000, Tian, Kevin wrote:
> 
> > From: Nicolin Chen <nicolinc@nvidia.com>
> > Sent: Sunday, January 29, 2023 5:18 AM
> >
> > Since the list_del() of hwpt_item is done in iommufd_device_detach(), move
> > its list_add_tail() to a similar place in iommufd_device_do_attach().
> 
> It's clearer to say that because list_del() is together with
> iopt_table_remove_domain() then it makes sense to have list_add_tail()
> together with iopt_table_add_domain().

OK. Will change that.

> >
> > Also move and place the mutex outside the
> > iommufd_device_auto_get_domain()
> > and iommufd_device_do_attach() calls, to serialize attach/detach routines.
> > This adds an additional locking protection so that the following patch can
> > safely remove devices_lock.
> >
> > Co-developed-by: Yi Liu <yi.l.liu@intel.com>
> > Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> > Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
> 
> Reviewed-by: Kevin Tian <kevin.tian@intel.com>

And add this too. Thanks!

Nicolin


* RE: [PATCH v2 3/3] iommufd/device: Change iommufd_hw_pagetable_has_group to device centric
  2023-01-28 21:18 ` [PATCH v2 3/3] iommufd/device: Change iommufd_hw_pagetable_has_group to device centric Nicolin Chen
@ 2023-01-29  9:37   ` Tian, Kevin
  2023-01-29 10:38     ` Nicolin Chen
  0 siblings, 1 reply; 44+ messages in thread
From: Tian, Kevin @ 2023-01-29  9:37 UTC (permalink / raw)
  To: Nicolin Chen, jgg; +Cc: Liu, Yi L, iommu, linux-kernel

> From: Nicolin Chen <nicolinc@nvidia.com>
> Sent: Sunday, January 29, 2023 5:18 AM
> 
> -static bool iommufd_hw_pagetable_has_group(struct iommufd_hw_pagetable *hwpt,
> -					   struct iommu_group *group)
> +static bool iommufd_hw_pagetable_has_device(struct iommufd_hw_pagetable *hwpt,
> +					    struct device *dev)
>  {
> -	struct iommufd_device *cur_dev;
> -
> -	list_for_each_entry(cur_dev, &hwpt->devices, devices_item)
> -		if (cur_dev->group == group)
> -			return true;
> -	return false;
> +	/*
> +	 * iommu_get_domain_for_dev() returns an iommu_group->domain ptr, if it
> +	 * is the same domain as the hwpt->domain, it means that this hwpt has
> +	 * the iommu_group/device.
> +	 */
> +	return hwpt->domain == iommu_get_domain_for_dev(dev);
>  }

Here we could have three scenarios:

1) the device is attached to blocked domain;
2) the device is attached to hwpt->domain;
3) the device is attached to another hwpt->domain;

if this function returns false then iommufd_device_do_attach() will attach
the device to the specified hwpt. But then it's wrong for 3).

Has 3) been denied in earlier path? If yes at least a WARN_ON for
case 3) makes sense here.

> @@ -385,10 +372,8 @@ void iommufd_device_detach(struct iommufd_device *idev)
>  	struct iommufd_hw_pagetable *hwpt = idev->hwpt;
> 
>  	mutex_lock(&hwpt->ioas->mutex);
> -	mutex_lock(&hwpt->devices_lock);
>  	refcount_dec(hwpt->devices_users);
> -	list_del(&idev->devices_item);
> -	if (!iommufd_hw_pagetable_has_group(hwpt, idev->group)) {
> +	if (iommufd_hw_pagetable_has_device(hwpt, idev->dev)) {
>  		if (refcount_read(hwpt->devices_users) == 1) {
>  			iopt_table_remove_domain(&hwpt->ioas->iopt,
>  						 hwpt->domain);
> @@ -397,7 +382,6 @@ void iommufd_device_detach(struct iommufd_device *idev)
>  		iommu_detach_group(hwpt->domain, idev->group);
>  	}

emmm how do we track last device detach in a group? Here the first
device detach already leads to group detach...


* RE: [PATCH v2 1/3] iommufd: Add devices_users to track the hw_pagetable usage by device
  2023-01-29  9:30     ` Nicolin Chen
@ 2023-01-29  9:39       ` Tian, Kevin
  0 siblings, 0 replies; 44+ messages in thread
From: Tian, Kevin @ 2023-01-29  9:39 UTC (permalink / raw)
  To: Nicolin Chen; +Cc: jgg, Liu, Yi L, iommu, linux-kernel

> From: Nicolin Chen <nicolinc@nvidia.com>
> Sent: Sunday, January 29, 2023 5:31 PM
> 
> > > Add a devices_users in struct iommufd_hw_pagetable to track the users
> of
> >
> > device_users
> 
> I assume you are suggesting a rename right? I can do that.
> 

yes


* Re: [PATCH v2 3/3] iommufd/device: Change iommufd_hw_pagetable_has_group to device centric
  2023-01-29  9:37   ` Tian, Kevin
@ 2023-01-29 10:38     ` Nicolin Chen
  2023-01-30  0:44       ` Tian, Kevin
  2023-01-30 10:22       ` Nicolin Chen
  0 siblings, 2 replies; 44+ messages in thread
From: Nicolin Chen @ 2023-01-29 10:38 UTC (permalink / raw)
  To: Tian, Kevin; +Cc: jgg, Liu, Yi L, iommu, linux-kernel

On Sun, Jan 29, 2023 at 09:37:00AM +0000, Tian, Kevin wrote:
> 
> > From: Nicolin Chen <nicolinc@nvidia.com>
> > Sent: Sunday, January 29, 2023 5:18 AM
> >
> > -static bool iommufd_hw_pagetable_has_group(struct iommufd_hw_pagetable *hwpt,
> > -                                        struct iommu_group *group)
> > +static bool iommufd_hw_pagetable_has_device(struct iommufd_hw_pagetable *hwpt,
> > +                                         struct device *dev)
> >  {
> > -     struct iommufd_device *cur_dev;
> > -
> > -     list_for_each_entry(cur_dev, &hwpt->devices, devices_item)
> > -             if (cur_dev->group == group)
> > -                     return true;
> > -     return false;
> > +     /*
> > +      * iommu_get_domain_for_dev() returns an iommu_group->domain ptr, if it
> > +      * is the same domain as the hwpt->domain, it means that this hwpt has
> > +      * the iommu_group/device.
> > +      */
> > +     return hwpt->domain == iommu_get_domain_for_dev(dev);
> >  }
> 
> Here we could have three scenarios:
> 
> 1) the device is attached to blocked domain;
> 2) the device is attached to hwpt->domain;
> 3) the device is attached to another hwpt->domain;
> 
> if this function returns false then iommufd_device_do_attach() will attach
> the device to the specified hwpt. But then it's wrong for 3).
> 
> Has 3) been denied in earlier path? If yes at least a WARN_ON for
> case 3) makes sense here.

The case #3 means the device is already attached to some other
domain? Then vfio_iommufd_physical_attach_ioas() returns -EBUSY
at the sanity check of vdev->iommufd_attached. And the case #3 feels
like a domain replacement use case to me. So it's probably not
necessary to add a WARN_ON?

> > @@ -385,10 +372,8 @@ void iommufd_device_detach(struct iommufd_device *idev)
> >       struct iommufd_hw_pagetable *hwpt = idev->hwpt;
> >
> >       mutex_lock(&hwpt->ioas->mutex);
> > -     mutex_lock(&hwpt->devices_lock);
> >       refcount_dec(hwpt->devices_users);
> > -     list_del(&idev->devices_item);
> > -     if (!iommufd_hw_pagetable_has_group(hwpt, idev->group)) {
> > +     if (iommufd_hw_pagetable_has_device(hwpt, idev->dev)) {
> >               if (refcount_read(hwpt->devices_users) == 1) {
> >                       iopt_table_remove_domain(&hwpt->ioas->iopt,
> >                                                hwpt->domain);
> > @@ -397,7 +382,6 @@ void iommufd_device_detach(struct iommufd_device *idev)
> >               iommu_detach_group(hwpt->domain, idev->group);
> >       }
> 
> emmm how do we track last device detach in a group? Here the first
> device detach already leads to group detach...

Oh no. That's a bug. Thanks for catching it.

We need an additional refcount somewhere to track the number of
attached devices in the iommu_group.

Nicolin


* RE: [PATCH v2 3/3] iommufd/device: Change iommufd_hw_pagetable_has_group to device centric
  2023-01-29 10:38     ` Nicolin Chen
@ 2023-01-30  0:44       ` Tian, Kevin
  2023-01-30 10:22       ` Nicolin Chen
  1 sibling, 0 replies; 44+ messages in thread
From: Tian, Kevin @ 2023-01-30  0:44 UTC (permalink / raw)
  To: Nicolin Chen; +Cc: jgg, Liu, Yi L, iommu, linux-kernel

> From: Nicolin Chen <nicolinc@nvidia.com>
> Sent: Sunday, January 29, 2023 6:39 PM
> 
> On Sun, Jan 29, 2023 at 09:37:00AM +0000, Tian, Kevin wrote:
> >
> > > From: Nicolin Chen <nicolinc@nvidia.com>
> > > Sent: Sunday, January 29, 2023 5:18 AM
> > >
> > > -static bool iommufd_hw_pagetable_has_group(struct iommufd_hw_pagetable *hwpt,
> > > -                                        struct iommu_group *group)
> > > +static bool iommufd_hw_pagetable_has_device(struct iommufd_hw_pagetable *hwpt,
> > > +                                         struct device *dev)
> > >  {
> > > -     struct iommufd_device *cur_dev;
> > > -
> > > -     list_for_each_entry(cur_dev, &hwpt->devices, devices_item)
> > > -             if (cur_dev->group == group)
> > > -                     return true;
> > > -     return false;
> > > +     /*
> > > +      * iommu_get_domain_for_dev() returns an iommu_group->domain ptr, if it
> > > +      * is the same domain as the hwpt->domain, it means that this hwpt has
> > > +      * the iommu_group/device.
> > > +      */
> > > +     return hwpt->domain == iommu_get_domain_for_dev(dev);
> > >  }
> >
> > Here we could have three scenarios:
> >
> > 1) the device is attached to blocked domain;
> > 2) the device is attached to hwpt->domain;
> > 3) the device is attached to another hwpt->domain;
> >
> > if this function returns false then iommufd_device_do_attach() will attach
> > the device to the specified hwpt. But then it's wrong for 3).
> >
> > Has 3) been denied in earlier path? If yes at least a WARN_ON for
> > case 3) makes sense here.
> 
> The case #3 means the device is already attached to some other
> domain? Then vfio_iommufd_physical_attach_ioas returns -EBUSY
> at the sanity of vdev->iommufd_attached. And the case #3 feels
> like a domain replacement use case to me. So probably not that
> necessary to add a wARN_ON?
> 

You are right. I thought about the cdev case where the device is
not attached in vfio but has a valid domain due to the attach status
of other devices in the group. But even in this case it's the user's
responsibility not to break the group boundary. So yes, it's just a
domain replacement and a WARN_ON is not required.


* RE: [PATCH v2 1/3] iommufd: Add devices_users to track the hw_pagetable usage by device
  2023-01-29  9:23   ` Tian, Kevin
  2023-01-29  9:30     ` Nicolin Chen
@ 2023-01-30  2:22     ` Liu, Yi L
  1 sibling, 0 replies; 44+ messages in thread
From: Liu, Yi L @ 2023-01-30  2:22 UTC (permalink / raw)
  To: Tian, Kevin, Nicolin Chen, jgg; +Cc: iommu, linux-kernel

> From: Tian, Kevin <kevin.tian@intel.com>
> Sent: Sunday, January 29, 2023 5:23 PM
>
> > hw_pagetable by the attached devices. Make this field as a pointer, only
> > allocate for a stage-2 hw_pagetable. A stage-1 hw_pagetable should reuse
> > the stage-2 hw_pagetable's devices_users, because when a device attaches
> > to a stage-1 hw_pagetable, linking the stage-2 hwpt to the IOAS is still
> > required. So, with a nested configuration, increase the devices_users on
> > the stage-2 (parent) hwpt, no matter a device is attached to the stage-1
> > or the stage-2 hwpt.
> 
> Above is very confusing w/o seeing the full series of nesting support.
> 
> As a preparatory step this should focus on existing code and what this
> series tries to achieve. e.g. I'd not make device_users a pointer here.
> Do that incrementally when the nesting support comes.

Yes, in the branch below, I've moved this patch to be together with the nesting
commits. Maybe I can send out the nesting RFC.

https://github.com/yiliu1765/iommufd/commits/wip/iommufd-v6.2-rc4-nesting

Regards,
Yi Liu


* Re: [PATCH v2 3/3] iommufd/device: Change iommufd_hw_pagetable_has_group to device centric
  2023-01-29 10:38     ` Nicolin Chen
  2023-01-30  0:44       ` Tian, Kevin
@ 2023-01-30 10:22       ` Nicolin Chen
  2023-02-01  3:07         ` Tian, Kevin
  1 sibling, 1 reply; 44+ messages in thread
From: Nicolin Chen @ 2023-01-30 10:22 UTC (permalink / raw)
  To: jgg, Tian, Kevin; +Cc: Liu, Yi L, iommu, linux-kernel

On Sun, Jan 29, 2023 at 02:38:55AM -0800, Nicolin Chen wrote:
 
> > > @@ -385,10 +372,8 @@ void iommufd_device_detach(struct iommufd_device *idev)
> > >       struct iommufd_hw_pagetable *hwpt = idev->hwpt;
> > >
> > >       mutex_lock(&hwpt->ioas->mutex);
> > > -     mutex_lock(&hwpt->devices_lock);
> > >       refcount_dec(hwpt->devices_users);
> > > -     list_del(&idev->devices_item);
> > > -     if (!iommufd_hw_pagetable_has_group(hwpt, idev->group)) {
> > > +     if (iommufd_hw_pagetable_has_device(hwpt, idev->dev)) {
> > >               if (refcount_read(hwpt->devices_users) == 1) {
> > >                       iopt_table_remove_domain(&hwpt->ioas->iopt,
> > >                                                hwpt->domain);
> > > @@ -397,7 +382,6 @@ void iommufd_device_detach(struct iommufd_device *idev)
> > >               iommu_detach_group(hwpt->domain, idev->group);
> > >       }
> > 
> > emmm how do we track last device detach in a group? Here the first
> > device detach already leads to group detach...
> 
> Oh no. That's a bug. Thanks for catching it.
> 
> We need an additional refcount somewhere to track the number of
> attached devices in the iommu_group.

Wondering if we can let iommu_attach/detach_device handle this:

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index d0d7c2177ad6..b38f71e92e2a 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -57,6 +57,7 @@ struct iommu_group {
 	struct iommu_domain *domain;
 	struct list_head entry;
 	unsigned int owner_cnt;
+	unsigned int attached_cnt;
 	void *owner;
 };
 
@@ -64,6 +65,7 @@ struct group_device {
 	struct list_head list;
 	struct device *dev;
 	char *name;
+	bool attached;
 };
 
 struct iommu_group_attribute {
@@ -2035,6 +2037,7 @@ static int __iommu_attach_device(struct iommu_domain *domain,
  */
 int iommu_attach_device(struct iommu_domain *domain, struct device *dev)
 {
+	struct group_device *grp_dev;
 	struct iommu_group *group;
 	int ret;
 
@@ -2042,16 +2045,22 @@ int iommu_attach_device(struct iommu_domain *domain, struct device *dev)
 	if (!group)
 		return -ENODEV;
 
-	/*
-	 * Lock the group to make sure the device-count doesn't
-	 * change while we are attaching
-	 */
 	mutex_lock(&group->mutex);
-	ret = -EINVAL;
-	if (iommu_group_device_count(group) != 1)
+
+	list_for_each_entry(grp_dev, &group->devices, list)
+		if (grp_dev->dev == dev)
+			break;
+	if (grp_dev->attached)
 		goto out_unlock;
 
-	ret = __iommu_attach_group(domain, group);
+	/* Attach the group when attaching the first device in the group */
+	if (group->attached_cnt == 0) {
+		ret = __iommu_attach_group(domain, group);
+		if (ret)
+			goto out_unlock;
+	}
+	grp_dev->attached = true;
+	group->attached_cnt++;
 
 out_unlock:
 	mutex_unlock(&group->mutex);
@@ -2071,6 +2080,7 @@ int iommu_deferred_attach(struct device *dev, struct iommu_domain *domain)
 
 void iommu_detach_device(struct iommu_domain *domain, struct device *dev)
 {
+	struct group_device *grp_dev;
 	struct iommu_group *group;
 
 	group = iommu_group_get(dev);
@@ -2078,10 +2088,20 @@ void iommu_detach_device(struct iommu_domain *domain, struct device *dev)
 		return;
 
 	mutex_lock(&group->mutex);
-	if (WARN_ON(domain != group->domain) ||
-	    WARN_ON(iommu_group_device_count(group) != 1))
+	if (WARN_ON(domain != group->domain))
 		goto out_unlock;
-	__iommu_group_set_core_domain(group);
+
+	list_for_each_entry(grp_dev, &group->devices, list)
+		if (grp_dev->dev == dev)
+			break;
+	if (!grp_dev->attached)
+		goto out_unlock;
+
+	grp_dev->attached = false;
+	group->attached_cnt--;
+	/* Detach the group when detaching the last device in the group */
+	if (group->attached_cnt == 0)
+		__iommu_group_set_core_domain(group);
 
 out_unlock:
 	mutex_unlock(&group->mutex);


* Re: [PATCH v2 2/3] iommufd/device: Make hwpt_list list_add/del symmetric
  2023-01-28 21:18 ` [PATCH v2 2/3] iommufd/device: Make hwpt_list list_add/del symmetric Nicolin Chen
  2023-01-29  9:24   ` Tian, Kevin
@ 2023-01-30 14:59   ` Jason Gunthorpe
  2023-01-30 19:03     ` Nicolin Chen
  1 sibling, 1 reply; 44+ messages in thread
From: Jason Gunthorpe @ 2023-01-30 14:59 UTC (permalink / raw)
  To: Nicolin Chen; +Cc: kevin.tian, yi.l.liu, iommu, linux-kernel

On Sat, Jan 28, 2023 at 01:18:10PM -0800, Nicolin Chen wrote:
> Since the list_del() of hwpt_item is done in iommufd_device_detach(), move
> its list_add_tail() to a similar place in iommufd_device_do_attach().
> 
> Also move and place the mutex outside the iommufd_device_auto_get_domain()
> and iommufd_device_do_attach() calls, to serialize attach/detach routines.
> This adds an additional locking protection so that the following patch can
> safely remove devices_lock.

That should be two patches then, this is just moving one line of code
from what I can tell?

Jason


* Re: [PATCH v2 1/3] iommufd: Add devices_users to track the hw_pagetable usage by device
  2023-01-28 21:18 ` [PATCH v2 1/3] iommufd: Add devices_users to track the hw_pagetable usage by device Nicolin Chen
  2023-01-29  9:23   ` Tian, Kevin
@ 2023-01-30 15:02   ` Jason Gunthorpe
  2023-01-30 19:27     ` Nicolin Chen
  1 sibling, 1 reply; 44+ messages in thread
From: Jason Gunthorpe @ 2023-01-30 15:02 UTC (permalink / raw)
  To: Nicolin Chen; +Cc: kevin.tian, yi.l.liu, iommu, linux-kernel

On Sat, Jan 28, 2023 at 01:18:09PM -0800, Nicolin Chen wrote:
> From: Yi Liu <yi.l.liu@intel.com>
> 
> Currently, hw_pagetable tracks the attached devices using a device list.
> When attaching the first device to the kernel-managed hw_pagetable, it
> should be linked to IOAS. When detaching the last device from this hwpt,
> the link with IOAS should be removed too. And this first-or-last device
> check is done with list_empty(hwpt->devices).
> 
> However, with a nested configuration, when a device is attached to the
> user-managed stage-1 hw_pagetable, it will be added to this user-managed
> hwpt's device list instead of the kernel-managed stage-2 hwpt's one. And
> this breaks the logic for a kernel-managed hw_pagetable link/disconnect
> to/from IOAS/IOPT. e.g. the stage-2 hw_pagetable would be linked to IOAS
> multiple times if multiple device is attached, but it will become empty
> as soon as one device detached.

Why this seems really weird to say.

The stage 2 is linked explicitly to the IOAS that drives its
map/unmap.

Why is there any implicit activity here? There should be no implicit
attach of the S2 to an IOAS ever.

Jason


* Re: [PATCH v2 2/3] iommufd/device: Make hwpt_list list_add/del symmetric
  2023-01-30 14:59   ` Jason Gunthorpe
@ 2023-01-30 19:03     ` Nicolin Chen
  2023-01-30 19:07       ` Jason Gunthorpe
  0 siblings, 1 reply; 44+ messages in thread
From: Nicolin Chen @ 2023-01-30 19:03 UTC (permalink / raw)
  To: Jason Gunthorpe; +Cc: kevin.tian, yi.l.liu, iommu, linux-kernel

On Mon, Jan 30, 2023 at 10:59:32AM -0400, Jason Gunthorpe wrote:
> On Sat, Jan 28, 2023 at 01:18:10PM -0800, Nicolin Chen wrote:
> > Since the list_del() of hwpt_item is done in iommufd_device_detach(), move
> > its list_add_tail() to a similar place in iommufd_device_do_attach().
> > 
> > Also move and place the mutex outside the iommufd_device_auto_get_domain()
> > and iommufd_device_do_attach() calls, to serialize attach/detach routines.
> > This adds an additional locking protection so that the following patch can
> > safely remove devices_lock.
> 
> That should be two patches then, this is just moving one line of code
> from what I can tell?

The mutex is used to protect the list. So moving the list means
we'd need to move the mutex too. What this patch does is enlarge
the protection scope a bit to cover iommufd_device_do_attach()
and iommufd_device_auto_get_domain().

I can revise a bit of the commit message to mention this.

Thanks
Nicolin


* Re: [PATCH v2 2/3] iommufd/device: Make hwpt_list list_add/del symmetric
  2023-01-30 19:03     ` Nicolin Chen
@ 2023-01-30 19:07       ` Jason Gunthorpe
  2023-01-30 19:38         ` Nicolin Chen
  0 siblings, 1 reply; 44+ messages in thread
From: Jason Gunthorpe @ 2023-01-30 19:07 UTC (permalink / raw)
  To: Nicolin Chen; +Cc: kevin.tian, yi.l.liu, iommu, linux-kernel

On Mon, Jan 30, 2023 at 11:03:09AM -0800, Nicolin Chen wrote:
> On Mon, Jan 30, 2023 at 10:59:32AM -0400, Jason Gunthorpe wrote:
> > On Sat, Jan 28, 2023 at 01:18:10PM -0800, Nicolin Chen wrote:
> > > Since the list_del() of hwpt_item is done in iommufd_device_detach(), move
> > > its list_add_tail() to a similar place in iommufd_device_do_attach().
> > > 
> > > Also move and place the mutex outside the iommufd_device_auto_get_domain()
> > > and iommufd_device_do_attach() calls, to serialize attach/detach routines.
> > > This adds an additional locking protection so that the following patch can
> > > safely remove devices_lock.
> > 
> > That should be two patches then, this is just moving one line of code
> > from what I can tell?
> 
> The mutex is used to protect the list. So moving the list means
> we'd need to the mutex too. What this patch does is to enlarge
> the protection scope a bit to cover iommufd_device_do_attach()
> and iommufd_device_auto_get_domain().

That doesn't explain why iommufd_device_auto_get_domain was changed
around, it already had the lock

Jason


* Re: [PATCH v2 1/3] iommufd: Add devices_users to track the hw_pagetable usage by device
  2023-01-30 15:02   ` Jason Gunthorpe
@ 2023-01-30 19:27     ` Nicolin Chen
  2023-01-30 19:50       ` Jason Gunthorpe
  0 siblings, 1 reply; 44+ messages in thread
From: Nicolin Chen @ 2023-01-30 19:27 UTC (permalink / raw)
  To: Jason Gunthorpe; +Cc: kevin.tian, yi.l.liu, iommu, linux-kernel

On Mon, Jan 30, 2023 at 11:02:25AM -0400, Jason Gunthorpe wrote:
> On Sat, Jan 28, 2023 at 01:18:09PM -0800, Nicolin Chen wrote:
> > From: Yi Liu <yi.l.liu@intel.com>
> > 
> > Currently, hw_pagetable tracks the attached devices using a device list.
> > When attaching the first device to the kernel-managed hw_pagetable, it
> > should be linked to IOAS. When detaching the last device from this hwpt,
> > the link with IOAS should be removed too. And this first-or-last device
> > check is done with list_empty(hwpt->devices).
> > 
> > However, with a nested configuration, when a device is attached to the
> > user-managed stage-1 hw_pagetable, it will be added to this user-managed
> > hwpt's device list instead of the kernel-managed stage-2 hwpt's one. And
> > this breaks the logic for a kernel-managed hw_pagetable link/disconnect
> > to/from IOAS/IOPT. e.g. the stage-2 hw_pagetable would be linked to IOAS
> > multiple times if multiple device is attached, but it will become empty
> > as soon as one device detached.
> 
> Why this seems really weird to say.
> 
> The stage 2 is linked explicitly to the IOAS that drives it's
> map/unmap
> 
> Why is there any implicit activity here? There should be no implicit
> attach of the S2 to an IOAS ever.

I think this is meant to describe the following use case:

Two stage-1 hwpts share the same parent s2_hwpt:

attach device1 to stage-1 hwpt1:
	...
	if (list_empty(s1_hwpt1->devices))		// empty; true
		iopt_table_add_domain(s2_hwpt->domain); // do once
	s1_hwpt1 device list cnt++;
	...

attach device2 to stage-1 hwpt2:
	...
	if (list_empty(s1_hwpt2->devices))		// empty; true
		iopt_table_add_domain(s2_hwpt->domain); // do again
	s1_hwpt2 device list cnt++;
	...

This is because each hwpt has its own device list. To prevent
a duplicated iopt_table_add_domain() call, we need a check that
covers all the device lists. So this patch adds a shared
devices_users counter among all relevant hwpts.

I can revise the commit message to make better sense.

Thanks
Nicolin


* Re: [PATCH v2 2/3] iommufd/device: Make hwpt_list list_add/del symmetric
  2023-01-30 19:07       ` Jason Gunthorpe
@ 2023-01-30 19:38         ` Nicolin Chen
  0 siblings, 0 replies; 44+ messages in thread
From: Nicolin Chen @ 2023-01-30 19:38 UTC (permalink / raw)
  To: Jason Gunthorpe; +Cc: kevin.tian, yi.l.liu, iommu, linux-kernel

On Mon, Jan 30, 2023 at 03:07:50PM -0400, Jason Gunthorpe wrote:
> On Mon, Jan 30, 2023 at 11:03:09AM -0800, Nicolin Chen wrote:
> > On Mon, Jan 30, 2023 at 10:59:32AM -0400, Jason Gunthorpe wrote:
> > > On Sat, Jan 28, 2023 at 01:18:10PM -0800, Nicolin Chen wrote:
> > > > Since the list_del() of hwpt_item is done in iommufd_device_detach(), move
> > > > its list_add_tail() to a similar place in iommufd_device_do_attach().
> > > > 
> > > > Also move and place the mutex outside the iommufd_device_auto_get_domain()
> > > > and iommufd_device_do_attach() calls, to serialize attach/detach routines.
> > > > This adds an additional locking protection so that the following patch can
> > > > safely remove devices_lock.
> > > 
> > > That should be two patches then, this is just moving one line of code
> > > from what I can tell?
> > 
> > The mutex is used to protect the list. So moving the list means
> > we'd need to the mutex too. What this patch does is to enlarge
> > the protection scope a bit to cover iommufd_device_do_attach()
> > and iommufd_device_auto_get_domain().
> 
> That doesn't explain why iommufd_device_auto_get_domain was changed
> around, it already had the lock

That is trying to make the code look like this:

iommufd_device_attach {
		...
 	case IOMMUFD_OBJ_HW_PAGETABLE:
+		mutex_lock(&hwpt->ioas->mutex);
 		rc = iommufd_device_do_attach(idev, hwpt);
+		mutex_unlock(&hwpt->ioas->mutex);
		...
 	case IOMMUFD_OBJ_IOAS:
		...
+		mutex_lock(&ioas->mutex);
 		rc = iommufd_device_auto_get_domain(idev, ioas);
+		mutex_unlock(&ioas->mutex);
		...
}

If you don't think that's necessary, I can leave things intact
in iommufd_device_auto_get_domain().

Thanks
Nic


* Re: [PATCH v2 1/3] iommufd: Add devices_users to track the hw_pagetable usage by device
  2023-01-30 19:27     ` Nicolin Chen
@ 2023-01-30 19:50       ` Jason Gunthorpe
  2023-01-30 20:04         ` Nicolin Chen
  0 siblings, 1 reply; 44+ messages in thread
From: Jason Gunthorpe @ 2023-01-30 19:50 UTC (permalink / raw)
  To: Nicolin Chen; +Cc: kevin.tian, yi.l.liu, iommu, linux-kernel

On Mon, Jan 30, 2023 at 11:27:37AM -0800, Nicolin Chen wrote:
> On Mon, Jan 30, 2023 at 11:02:25AM -0400, Jason Gunthorpe wrote:
> > On Sat, Jan 28, 2023 at 01:18:09PM -0800, Nicolin Chen wrote:
> > > From: Yi Liu <yi.l.liu@intel.com>
> > > 
> > > Currently, hw_pagetable tracks the attached devices using a device list.
> > > When attaching the first device to the kernel-managed hw_pagetable, it
> > > should be linked to IOAS. When detaching the last device from this hwpt,
> > > the link with IOAS should be removed too. And this first-or-last device
> > > check is done with list_empty(hwpt->devices).
> > > 
> > > However, with a nested configuration, when a device is attached to the
> > > user-managed stage-1 hw_pagetable, it will be added to this user-managed
> > > hwpt's device list instead of the kernel-managed stage-2 hwpt's one. And
> > > this breaks the logic for a kernel-managed hw_pagetable link/disconnect
> > > to/from IOAS/IOPT. e.g. the stage-2 hw_pagetable would be linked to IOAS
> > > multiple times if multiple device is attached, but it will become empty
> > > as soon as one device detached.
> > 
> > Why this seems really weird to say.
> > 
> > The stage 2 is linked explicitly to the IOAS that drives it's
> > map/unmap
> > 
> > Why is there any implicit activity here? There should be no implicit
> > attach of the S2 to an IOAS ever.
> 
> I think this is supposed to say the following use case:
> 
> Two stage-1 hwpts share the same parent s2_hwpt:
> 
> attach device1 to stage-1 hwpt1:
> 	...
> 	if (list_empty(s1_hwpt1->devices))		// empty; true
> 		iopt_table_add_domain(s2_hwpt->domain); // do once
> 	s1_hwpt1 device list cnt++;
> 	...

No, this doesn't make sense.

The s2_hwpt should be created explicitly, not using autodomains

When it is created it should be linked to a single IOAS and that is
when iopt_table_add_domain() should have been called.

The S1 attach should do *nothing* to a S2.
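
i.e. roughly (just a sketch of the idea, not code from this series):

	/* the S2 hwpt is created explicitly and linked to its single IOAS right away */
	s2_hwpt = iommufd_hw_pagetable_alloc(ictx, ioas, idev->dev);
	rc = iopt_table_add_domain(&ioas->iopt, s2_hwpt->domain);

	/* attaching a device to an S1 later must not touch the S2 <-> IOAS link */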

Jason


* Re: [PATCH v2 1/3] iommufd: Add devices_users to track the hw_pagetable usage by device
  2023-01-30 19:50       ` Jason Gunthorpe
@ 2023-01-30 20:04         ` Nicolin Chen
  2023-01-30 20:35           ` Jason Gunthorpe
  0 siblings, 1 reply; 44+ messages in thread
From: Nicolin Chen @ 2023-01-30 20:04 UTC (permalink / raw)
  To: Jason Gunthorpe; +Cc: kevin.tian, yi.l.liu, iommu, linux-kernel

On Mon, Jan 30, 2023 at 03:50:07PM -0400, Jason Gunthorpe wrote:
> On Mon, Jan 30, 2023 at 11:27:37AM -0800, Nicolin Chen wrote:
> > On Mon, Jan 30, 2023 at 11:02:25AM -0400, Jason Gunthorpe wrote:
> > > On Sat, Jan 28, 2023 at 01:18:09PM -0800, Nicolin Chen wrote:
> > > > From: Yi Liu <yi.l.liu@intel.com>
> > > > 
> > > > Currently, hw_pagetable tracks the attached devices using a device list.
> > > > When attaching the first device to the kernel-managed hw_pagetable, it
> > > > should be linked to IOAS. When detaching the last device from this hwpt,
> > > > the link with IOAS should be removed too. And this first-or-last device
> > > > check is done with list_empty(hwpt->devices).
> > > > 
> > > > However, with a nested configuration, when a device is attached to the
> > > > user-managed stage-1 hw_pagetable, it will be added to this user-managed
> > > > hwpt's device list instead of the kernel-managed stage-2 hwpt's one. And
> > > > this breaks the logic for a kernel-managed hw_pagetable link/disconnect
> > > > to/from IOAS/IOPT. e.g. the stage-2 hw_pagetable would be linked to IOAS
> > > > multiple times if multiple device is attached, but it will become empty
> > > > as soon as one device detached.
> > > 
> > > Why this seems really weird to say.
> > > 
> > > The stage 2 is linked explicitly to the IOAS that drives it's
> > > map/unmap
> > > 
> > > Why is there any implicit activity here? There should be no implicit
> > > attach of the S2 to an IOAS ever.
> > 
> > I think this is supposed to say the following use case:
> > 
> > Two stage-1 hwpts share the same parent s2_hwpt:
> > 
> > attach device1 to stage-1 hwpt1:
> > 	...
> > 	if (list_empty(s1_hwpt1->devices))		// empty; true
> > 		iopt_table_add_domain(s2_hwpt->domain); // do once
> > 	s1_hwpt1 device list cnt++;
> > 	...
> 
> No, this doesn't make sense.
> 
> The s2_hwpt should be created explicitly, not using autodomains

iopt_table_add_domain() is called in iommufd_device_do_attach(),
so it's shared by both a user-created hwpt and an autodomain.

> When it is created it should be linked to a single IOAS and that is
> when iopt_table_add_domain() should have been called.

I recall we've discussed that the SMMU driver only sets up the
domain when it attaches the device, so we made a compromise here...

> The S1 attach should do *nothing* to a S2.

With that compromise, a nested attach flow may be
1) create an s2 hwpt
2) create an s1 hwpt
3) attach dev to s1
    calls iopt_table_add_domain()
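
In rough pseudocode (not the exact call chain, just the ordering the
compromise gives us):

	s2_hwpt = iommufd_hw_pagetable_alloc(ioas, ...);	// S2 domain not finalised yet
	s1_hwpt = iommufd_hw_pagetable_alloc(s2_hwpt, ...);	// user-managed S1 on top
	iommufd_device_attach(idev, s1_hwpt);
		// SMMU driver finalises the domain at attach time;
		// only then can the S2 be linked to the IOAS:
		iopt_table_add_domain(&ioas->iopt, s2_hwpt->domain);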

Thanks
Nicolin

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 1/3] iommufd: Add devices_users to track the hw_pagetable usage by device
  2023-01-30 20:04         ` Nicolin Chen
@ 2023-01-30 20:35           ` Jason Gunthorpe
  2023-01-30 20:53             ` Nicolin Chen
  2023-02-01  6:57             ` Nicolin Chen
  0 siblings, 2 replies; 44+ messages in thread
From: Jason Gunthorpe @ 2023-01-30 20:35 UTC (permalink / raw)
  To: Nicolin Chen; +Cc: kevin.tian, yi.l.liu, iommu, linux-kernel

On Mon, Jan 30, 2023 at 12:04:33PM -0800, Nicolin Chen wrote:

> I recall we've discussed this that SMMU sets up domain when it
> attaches the device to, so we made a compromise here...

The ARM driver has a problem that it doesn't know what SMMU instance
will host the domain when it is allocated so it doesn't know if it
should select a S1 or S2 page table format - which is determined by
the capabilities of the specific SMMU HW block.

However, we don't have this problem when creating the S2. The S2
should be created by a special alloc_domain_iommufd() asking for the
S2 format. Not only does the new alloc_domain_iommufd API directly
request a S2 format table, but it also specifies the struct device so
any residual details can be determined directly.

Thus there is no need to do the two stage process when working with
the S2.

So fixup the driver to create fully configured iommu_domain's
immediately and get rid of this problem.

IMHO I would structure the smmu driver so that all the different
iommu_domain formats have their own ops pointer. The special
"undecided" format would have a special ops with only attach_dev and
at first attach it would switch the ops to whatever format it
selected.

I think this could get rid of a lot of the 'if undecided/S1/S2/CD'
complexity all over the place. You know what type it is because you
were called on a op that is only called on its type.
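
Roughly like this (names made up, just to show the shape):

	static const struct iommu_domain_ops arm_smmu_undecided_ops = {
		.attach_dev = arm_smmu_undecided_attach_dev,
		/* picks S1/S2/CD at first attach, then swaps domain->ops */
	};

	static const struct iommu_domain_ops arm_smmu_s1_ops = {
		.attach_dev = arm_smmu_s1_attach_dev,
		/* S1-only paths, no 'if (stage == ...)' checks needed */
	};

	static const struct iommu_domain_ops arm_smmu_s2_ops = {
		.attach_dev = arm_smmu_s2_attach_dev,
		/* likewise for S2 */
	};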

Jason

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 1/3] iommufd: Add devices_users to track the hw_pagetable usage by device
  2023-01-30 20:35           ` Jason Gunthorpe
@ 2023-01-30 20:53             ` Nicolin Chen
  2023-02-01  7:48               ` Nicolin Chen
  2023-02-01  6:57             ` Nicolin Chen
  1 sibling, 1 reply; 44+ messages in thread
From: Nicolin Chen @ 2023-01-30 20:53 UTC (permalink / raw)
  To: Jason Gunthorpe; +Cc: kevin.tian, yi.l.liu, iommu, linux-kernel

On Mon, Jan 30, 2023 at 04:35:35PM -0400, Jason Gunthorpe wrote:
> On Mon, Jan 30, 2023 at 12:04:33PM -0800, Nicolin Chen wrote:
> 
> > I recall we've discussed this that SMMU sets up domain when it
> > attaches the device to, so we made a compromise here...
> 
> The ARM driver has a problem that it doesn't know what SMMU instance
> will host the domain when it is allocated so it doesn't know if it
> should select a S1 or S2 page table format - which is determined by
> the capabilities of the specific SMMU HW block.
> 
> However, we don't have this problem when creating the S2. The S2
> should be created by a special alloc_domain_iommufd() asking for the
> S2 format. Not only does the new alloc_domain_iommufd API directly
> request a S2 format table, but it also specifies the struct device so
> any residual details can be determined directly.
> 
> Thus there is no need to do the two stage process when working with
> the S2.

Ah, right! Taking a quick look, we should be able to call
arm_smmu_domain_finalise() when handling alloc_domain_iommufd().

> So fixup the driver to create fully configured iommu_domain's
> immediately and get rid of this problem.

OK. I will draft a patch today.

> IMHO I would structure the smmu driver so that all the different
> iommu_domain formats have their own ops pointer. The special
> "undecided" format would have a special ops with only attach_dev and
> at first attach it would switch the ops to whatever format it
> selected.
> 
> I think this could get rid of a lot of the 'if undecided/S1/S2/CD'
> complexity all over the place. You know what type it is because you
> were called on a op that is only called on its type.

I see. I can try that as well. Hopefully it won't touch too
many places and raise a big concern/objection..

Thanks
Nicolin

^ permalink raw reply	[flat|nested] 44+ messages in thread

* RE: [PATCH v2 3/3] iommufd/device: Change iommufd_hw_pagetable_has_group to device centric
  2023-01-30 10:22       ` Nicolin Chen
@ 2023-02-01  3:07         ` Tian, Kevin
  2023-02-01  6:49           ` Baolu Lu
  0 siblings, 1 reply; 44+ messages in thread
From: Tian, Kevin @ 2023-02-01  3:07 UTC (permalink / raw)
  To: Nicolin Chen, jgg, Lu, Baolu, robin.murphy; +Cc: Liu, Yi L, iommu, linux-kernel

> From: Nicolin Chen <nicolinc@nvidia.com>
> Sent: Monday, January 30, 2023 6:22 PM
> 
> On Sun, Jan 29, 2023 at 02:38:55AM -0800, Nicolin Chen wrote:
> 
> > > > @@ -385,10 +372,8 @@ void iommufd_device_detach(struct
> > > > iommufd_device *idev)
> > > >       struct iommufd_hw_pagetable *hwpt = idev->hwpt;
> > > >
> > > >       mutex_lock(&hwpt->ioas->mutex);
> > > > -     mutex_lock(&hwpt->devices_lock);
> > > >       refcount_dec(hwpt->devices_users);
> > > > -     list_del(&idev->devices_item);
> > > > -     if (!iommufd_hw_pagetable_has_group(hwpt, idev->group)) {
> > > > +     if (iommufd_hw_pagetable_has_device(hwpt, idev->dev)) {
> > > >               if (refcount_read(hwpt->devices_users) == 1) {
> > > >                       iopt_table_remove_domain(&hwpt->ioas->iopt,
> > > >                                                hwpt->domain);
> > > > @@ -397,7 +382,6 @@ void iommufd_device_detach(struct
> iommufd_device
> > > > *idev)
> > > >               iommu_detach_group(hwpt->domain, idev->group);
> > > >       }
> > >
> > > emmm how do we track last device detach in a group? Here the first
> > > device detach already leads to group detach...
> >
> > Oh no. That's a bug. Thanks for catching it.
> >
> > We need an additional refcount somewhere to track the number of
> > attached devices in the iommu_group.
> 
> Wondering if we can let iommu_attach/detach_device handle this:
> 

that is the desired way to fully remove group awareness in iommufd.

but iirc there were some concerns about changing their semantics, though
I don't remember the details now. Jason might know. Also +Baolu/Robin.

otherwise, as long as the group attach/detach continues to be used,
identifying the last device in the group always needs some hack
within iommufd itself.
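
e.g. something like below (illustrative only, the helper does not
exist today):

	/* attach path */
	if (++iommufd_group_users(hwpt, idev->group) == 1)
		iommu_attach_group(hwpt->domain, idev->group);

	/* detach path */
	if (--iommufd_group_users(hwpt, idev->group) == 0)
		iommu_detach_group(hwpt->domain, idev->group);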

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 3/3] iommufd/device: Change iommufd_hw_pagetable_has_group to device centric
  2023-02-01  3:07         ` Tian, Kevin
@ 2023-02-01  6:49           ` Baolu Lu
  2023-02-01  6:59             ` Tian, Kevin
  0 siblings, 1 reply; 44+ messages in thread
From: Baolu Lu @ 2023-02-01  6:49 UTC (permalink / raw)
  To: Tian, Kevin, Nicolin Chen, jgg, Lu, Baolu, robin.murphy
  Cc: baolu.lu, Liu, Yi L, iommu, linux-kernel

On 2023/2/1 11:07, Tian, Kevin wrote:
>> From: Nicolin Chen <nicolinc@nvidia.com>
>> Sent: Monday, January 30, 2023 6:22 PM
>>
>> On Sun, Jan 29, 2023 at 02:38:55AM -0800, Nicolin Chen wrote:
>>
>>>>> @@ -385,10 +372,8 @@ void iommufd_device_detach(struct
>>>>> iommufd_device *idev)
>>>>>        struct iommufd_hw_pagetable *hwpt = idev->hwpt;
>>>>>
>>>>>        mutex_lock(&hwpt->ioas->mutex);
>>>>> -     mutex_lock(&hwpt->devices_lock);
>>>>>        refcount_dec(hwpt->devices_users);
>>>>> -     list_del(&idev->devices_item);
>>>>> -     if (!iommufd_hw_pagetable_has_group(hwpt, idev->group)) {
>>>>> +     if (iommufd_hw_pagetable_has_device(hwpt, idev->dev)) {
>>>>>                if (refcount_read(hwpt->devices_users) == 1) {
>>>>>                        iopt_table_remove_domain(&hwpt->ioas->iopt,
>>>>>                                                 hwpt->domain);
>>>>> @@ -397,7 +382,6 @@ void iommufd_device_detach(struct
>> iommufd_device
>>>>> *idev)
>>>>>                iommu_detach_group(hwpt->domain, idev->group);
>>>>>        }
>>>>
>>>> emmm how do we track last device detach in a group? Here the first
>>>> device detach already leads to group detach...
>>>
>>> Oh no. That's a bug. Thanks for catching it.
>>>
>>> We need an additional refcount somewhere to track the number of
>>> attached devices in the iommu_group.
>>
>> Wondering if we can let iommu_attach/detach_device handle this:
>>
> 
> that is the desired way to fully remove group awareness in iommufd.
> 
> but iirc there were some concerns on changing their semantics. But
> I don't remember the detail now. Jason might know. also +Baolu/Robin.
> 
> otherwise as long as the group attach/detach continues to be used
> then identifying last device in the group always needs some hack
> within iommufd itself.

I have tried to solve this problem.

https://lore.kernel.org/linux-iommu/20220106022053.2406748-1-baolu.lu@linux.intel.com/

I may need to review the original discussion to see if I can post an
updated version.

Best regards,
baolu

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 1/3] iommufd: Add devices_users to track the hw_pagetable usage by device
  2023-01-30 20:35           ` Jason Gunthorpe
  2023-01-30 20:53             ` Nicolin Chen
@ 2023-02-01  6:57             ` Nicolin Chen
  2023-02-01  7:56               ` Nicolin Chen
  2023-02-01 15:53               ` Jason Gunthorpe
  1 sibling, 2 replies; 44+ messages in thread
From: Nicolin Chen @ 2023-02-01  6:57 UTC (permalink / raw)
  To: Jason Gunthorpe; +Cc: kevin.tian, yi.l.liu, iommu, linux-kernel

On Mon, Jan 30, 2023 at 04:35:35PM -0400, Jason Gunthorpe wrote:
 
> IMHO I would structure the smmu driver so that all the different
> iommu_domain formats have their own ops pointer. The special
> "undecided" format would have a special ops with only attach_dev and
> at first attach it would switch the ops to whatever format it
> selected.
> 
> I think this could get rid of a lot of the 'if undecided/S1/S2/CD'
> complexity all over the place. You know what type it is because you
> were called on a op that is only called on its type.

An auto/unmanaged domain allocation via iommu_domain_alloc() would
be S1, while an allocation via ops->domain_alloc_user can be S1 or
S2 with a given parameter/flag. So, actually, the format is always
decided. The trouble we have with the iommu_domain_alloc() path is
that we don't pass the dev pointer down to ops->domain_alloc. So,
the SMMU driver can't know which SMMU instance the device is behind,
and thus cannot finalize the domain. Robin mentioned
that he has a patch "iommu: Pass device through ops->domain_alloc".
Perhaps that is required for us to entirely fix the add_domain()
problem?
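
i.e. the difference is just whether the device is visible at
allocation time (op signatures simplified here):

	ops->domain_alloc(type)
		// no dev -> SMMU instance unknown -> cannot finalise

	ops->domain_alloc_user(dev, ...)
		// dev known -> SMMU instance known -> finalise at alloc time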

Thanks
Nic

^ permalink raw reply	[flat|nested] 44+ messages in thread

* RE: [PATCH v2 3/3] iommufd/device: Change iommufd_hw_pagetable_has_group to device centric
  2023-02-01  6:49           ` Baolu Lu
@ 2023-02-01  6:59             ` Tian, Kevin
  2023-02-01  7:20               ` Nicolin Chen
  0 siblings, 1 reply; 44+ messages in thread
From: Tian, Kevin @ 2023-02-01  6:59 UTC (permalink / raw)
  To: Baolu Lu, Nicolin Chen, jgg, Lu, Baolu, robin.murphy
  Cc: Liu, Yi L, iommu, linux-kernel

> From: Baolu Lu <baolu.lu@linux.intel.com>
> Sent: Wednesday, February 1, 2023 2:49 PM
> 
> On 2023/2/1 11:07, Tian, Kevin wrote:
> >> From: Nicolin Chen <nicolinc@nvidia.com>
> >> Sent: Monday, January 30, 2023 6:22 PM
> >>
> >> On Sun, Jan 29, 2023 at 02:38:55AM -0800, Nicolin Chen wrote:
> >>
> >>>>> @@ -385,10 +372,8 @@ void iommufd_device_detach(struct
> >>>>> iommufd_device *idev)
> >>>>>        struct iommufd_hw_pagetable *hwpt = idev->hwpt;
> >>>>>
> >>>>>        mutex_lock(&hwpt->ioas->mutex);
> >>>>> -     mutex_lock(&hwpt->devices_lock);
> >>>>>        refcount_dec(hwpt->devices_users);
> >>>>> -     list_del(&idev->devices_item);
> >>>>> -     if (!iommufd_hw_pagetable_has_group(hwpt, idev->group)) {
> >>>>> +     if (iommufd_hw_pagetable_has_device(hwpt, idev->dev)) {
> >>>>>                if (refcount_read(hwpt->devices_users) == 1) {
> >>>>>                        iopt_table_remove_domain(&hwpt->ioas->iopt,
> >>>>>                                                 hwpt->domain);
> >>>>> @@ -397,7 +382,6 @@ void iommufd_device_detach(struct
> >> iommufd_device
> >>>>> *idev)
> >>>>>                iommu_detach_group(hwpt->domain, idev->group);
> >>>>>        }
> >>>>
> >>>> emmm how do we track last device detach in a group? Here the first
> >>>> device detach already leads to group detach...
> >>>
> >>> Oh no. That's a bug. Thanks for catching it.
> >>>
> >>> We need an additional refcount somewhere to track the number of
> >>> attached devices in the iommu_group.
> >>
> >> Wondering if we can let iommu_attach/detach_device handle this:
> >>
> >
> > that is the desired way to fully remove group awareness in iommufd.
> >
> > but iirc there were some concerns on changing their semantics. But
> > I don't remember the detail now. Jason might know. also +Baolu/Robin.
> >
> > otherwise as long as the group attach/detach continues to be used
> > then identifying last device in the group always needs some hack
> > within iommufd itself.
> 
> I have tried to solve this problem.
> 
> https://lore.kernel.org/linux-iommu/20220106022053.2406748-1-
> baolu.lu@linux.intel.com/
> 
> I may need to review the original discussion to see if I can update a
> new version.
> 

emm, looks like there are quite a few discussions to catch up on.

anyway, assuming this will happen, Nicolin, do we still want this
preparatory series for the coming nesting support?

iiuc the main motivation was the complexity of s2 attaching, but
from your discussion with Jason it looks solvable. Then if this
group hack can be separated from the nesting work, it avoids an
unnecessary dependency in between.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 3/3] iommufd/device: Change iommufd_hw_pagetable_has_group to device centric
  2023-02-01  6:59             ` Tian, Kevin
@ 2023-02-01  7:20               ` Nicolin Chen
  2023-02-02  6:32                 ` Tian, Kevin
  0 siblings, 1 reply; 44+ messages in thread
From: Nicolin Chen @ 2023-02-01  7:20 UTC (permalink / raw)
  To: Tian, Kevin
  Cc: Baolu Lu, jgg, Lu, Baolu, robin.murphy, Liu, Yi L, iommu, linux-kernel

On Wed, Feb 01, 2023 at 06:59:21AM +0000, Tian, Kevin wrote:
> External email: Use caution opening links or attachments
> 
> 
> > From: Baolu Lu <baolu.lu@linux.intel.com>
> > Sent: Wednesday, February 1, 2023 2:49 PM
> >
> > On 2023/2/1 11:07, Tian, Kevin wrote:
> > >> From: Nicolin Chen <nicolinc@nvidia.com>
> > >> Sent: Monday, January 30, 2023 6:22 PM
> > >>
> > >> On Sun, Jan 29, 2023 at 02:38:55AM -0800, Nicolin Chen wrote:
> > >>
> > >>>>> @@ -385,10 +372,8 @@ void iommufd_device_detach(struct
> > >>>>> iommufd_device *idev)
> > >>>>>        struct iommufd_hw_pagetable *hwpt = idev->hwpt;
> > >>>>>
> > >>>>>        mutex_lock(&hwpt->ioas->mutex);
> > >>>>> -     mutex_lock(&hwpt->devices_lock);
> > >>>>>        refcount_dec(hwpt->devices_users);
> > >>>>> -     list_del(&idev->devices_item);
> > >>>>> -     if (!iommufd_hw_pagetable_has_group(hwpt, idev->group)) {
> > >>>>> +     if (iommufd_hw_pagetable_has_device(hwpt, idev->dev)) {
> > >>>>>                if (refcount_read(hwpt->devices_users) == 1) {
> > >>>>>                        iopt_table_remove_domain(&hwpt->ioas->iopt,
> > >>>>>                                                 hwpt->domain);
> > >>>>> @@ -397,7 +382,6 @@ void iommufd_device_detach(struct
> > >> iommufd_device
> > >>>>> *idev)
> > >>>>>                iommu_detach_group(hwpt->domain, idev->group);
> > >>>>>        }
> > >>>>
> > >>>> emmm how do we track last device detach in a group? Here the first
> > >>>> device detach already leads to group detach...
> > >>>
> > >>> Oh no. That's a bug. Thanks for catching it.
> > >>>
> > >>> We need an additional refcount somewhere to track the number of
> > >>> attached devices in the iommu_group.
> > >>
> > >> Wondering if we can let iommu_attach/detach_device handle this:
> > >>
> > >
> > > that is the desired way to fully remove group awareness in iommufd.
> > >
> > > but iirc there were some concerns on changing their semantics. But
> > > I don't remember the detail now. Jason might know. also +Baolu/Robin.
> > >
> > > otherwise as long as the group attach/detach continues to be used
> > > then identifying last device in the group always needs some hack
> > > within iommufd itself.
> >
> > I have tried to solve this problem.
> >
> > https://lore.kernel.org/linux-iommu/20220106022053.2406748-1-
> > baolu.lu@linux.intel.com/
> >
> > I may need to review the original discussion to see if I can update a
> > new version.
> >
> 
> emm looks there are quite some discussions to catch up.
> 
> anyway assuming this will happen, Nicolin do we still want this
> preparatory series for coming nesting support?
>
> iiuc the main motivation was on the complexity of s2 attaching
> but with your discussion with Jason looks it's solvable. Then if this
> group hack can be separated from the nesting work it avoids
> unnecessary dependency in-between.

The device list is going to overcomplicate either the nesting
series or the replace series. And Jason asked me to send the
replace series prior to our nesting series.

Hoping that we could eliminate the dependencies between those two
series, I took these patches out and sent them separately as a
cleanup. Yet, I was naive to think that I could do it with a
smaller patchset.

So, assuming we drop this series and move the first two patches
back to the nesting series or the replace series, one of them
would end up doing something ugly:

	if (cur_hwpt != hwpt)
		mutex_lock(&cur_hwpt->device_lock);
	mutex_lock(&hwpt->device_lock);
	...
	mutex_unlock(&hwpt->device_lock);
	if (cur_hwpt != hwpt)
		mutex_unlock(&cur_hwpt->device_lock);

So, perhaps we should discuss about which way we want to choose.

Btw, Baolu's version has a patch similar to mine changing
iommu_attach/detach_device(), yet it also touches _group(). Could
we split that series into _device() first and _group() later?
Given that we only need a device-centric API at this moment...

Thanks
Nic

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 1/3] iommufd: Add devices_users to track the hw_pagetable usage by device
  2023-01-30 20:53             ` Nicolin Chen
@ 2023-02-01  7:48               ` Nicolin Chen
  2023-02-02  9:12                 ` Nicolin Chen
  0 siblings, 1 reply; 44+ messages in thread
From: Nicolin Chen @ 2023-02-01  7:48 UTC (permalink / raw)
  To: yi.l.liu; +Cc: kevin.tian, iommu, linux-kernel, Jason Gunthorpe

On Mon, Jan 30, 2023 at 12:54:01PM -0800, Nicolin Chen wrote:
> On Mon, Jan 30, 2023 at 04:35:35PM -0400, Jason Gunthorpe wrote:
> > On Mon, Jan 30, 2023 at 12:04:33PM -0800, Nicolin Chen wrote:
> > 
> > > I recall we've discussed this that SMMU sets up domain when it
> > > attaches the device to, so we made a compromise here...
> > 
> > The ARM driver has a problem that it doesn't know what SMMU instance
> > will host the domain when it is allocated so it doesn't know if it
> > should select a S1 or S2 page table format - which is determined by
> > the capabilities of the specific SMMU HW block.
> > 
> > However, we don't have this problem when creating the S2. The S2
> > should be created by a special alloc_domain_iommufd() asking for the
> > S2 format. Not only does the new alloc_domain_iommufd API directly
> > request a S2 format table, but it also specifies the struct device so
> > any residual details can be determined directly.
> > 
> > Thus there is no need to do the two stage process when working with
> > the S2.
> 
> Ah, right! Taking a quick look, we should be able to call that
> arm_smmu_domain_finalise when handling alloc_domain_iommufd().
> 
> > So fixup the driver to create fully configured iommu_domain's
> > immediately and get rid of this problem.
> 
> OK. I will draft a patch today.

@Yi
Do you recall doing iopt_table_add_domain() in hwpt_alloc()?

Jason has a great point above. So even with SMMU we should be able
to call iopt_table_add_domain() after a kernel-managed hwpt
allocation rather than after an iommu_attach_group(), except for
an auto_domain or a selftest mock_domain, which still needs to
attach the device first; otherwise the SMMU driver (currently)
cannot finalize the domain aperture.
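
i.e. the ordering would become (a simplified sketch of the draft
below):

	IOMMU_HWPT_ALLOC:
		hwpt->domain = domain_alloc_*(dev, ...);	// finalised at alloc
		iopt_table_add_domain(&ioas->iopt, hwpt->domain);

	auto_domain / selftest mock_domain:
		iommu_attach_group(hwpt->domain, group);	// finalises the aperture
		iopt_table_add_domain(&ioas->iopt, hwpt->domain);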

I made a draft today, and ran some sanity with SMMUv3:
  "iommufd: Attach/detach hwpt to IOAS at alloc/destroy"
  https://github.com/nicolinc/iommufd/commit/5ae54f360495aae35b5967d1eb00149912145639

The change basically moves iopt_table_add/remove_domain()
next to hwpt_alloc/destroy(); an auto_domain or a mock_domain
still needs to attach_dev first, before calling add_domain().

With this change, the follow-up attach_ioas() patches and the new
selftest should also be updated. I have them in a wip branch:
https://github.com/nicolinc/iommufd/commits/wip/iommufd-v6.2-rc4-nesting-01312023

Can you check if there's anything wrong with this approach?
And would it be possible for you to integrate this into the
nesting series?

I can also help clean up and polish the changes with a proper
commit message.

Thanks
Nic

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 1/3] iommufd: Add devices_users to track the hw_pagetable usage by device
  2023-02-01  6:57             ` Nicolin Chen
@ 2023-02-01  7:56               ` Nicolin Chen
  2023-02-01 15:53               ` Jason Gunthorpe
  1 sibling, 0 replies; 44+ messages in thread
From: Nicolin Chen @ 2023-02-01  7:56 UTC (permalink / raw)
  To: Jason Gunthorpe; +Cc: kevin.tian, yi.l.liu, iommu, linux-kernel

On Tue, Jan 31, 2023 at 10:57:19PM -0800, Nicolin Chen wrote:
> On Mon, Jan 30, 2023 at 04:35:35PM -0400, Jason Gunthorpe wrote:
>  
> > IMHO I would structure the smmu driver so that all the different
> > iommu_domain formats have their own ops pointer. The special
> > "undecided" format would have a special ops with only attach_dev and
> > at first attach it would switch the ops to whatever format it
> > selected.
> > 
> > I think this could get rid of a lot of the 'if undecided/S1/S2/CD'
> > complexity all over the place. You know what type it is because you
> > were called on a op that is only called on its type.
> 
> An auto/unmanaged domain allocation via iommu_domain_alloc() would
> be S1, while an allocation via ops->domain_alloc_user can be S1 or
> S2 with a given parameter/flag. So, actually the format is always
> decided. The trouble we have with the iommu_domain_alloc() path is
> that we don't pass the dev pointer down to ops->domain_alloc. So,

Precisely, the undecided part is not the format but the ias/oas
and feature bits of the SMMU HW device, which are needed to set up
the domain->geometry. And the regular domain_alloc() op, taking a
"type" argument alone, can't get the SMMU pointer.

Thanks
Nic

> the SMMU driver can't know which SMMU device the device is behind,
> resulting in being unable to finalizing the domain. Robin mentioned
> that he has a patch "iommu: Pass device through ops->domain_alloc".
> Perhaps that is required for us to entirely fix the add_domain()
> problem?
> 
> Thanks
> Nic

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 1/3] iommufd: Add devices_users to track the hw_pagetable usage by device
  2023-02-01  6:57             ` Nicolin Chen
  2023-02-01  7:56               ` Nicolin Chen
@ 2023-02-01 15:53               ` Jason Gunthorpe
  2023-02-01 17:46                 ` Nicolin Chen
  1 sibling, 1 reply; 44+ messages in thread
From: Jason Gunthorpe @ 2023-02-01 15:53 UTC (permalink / raw)
  To: Nicolin Chen; +Cc: kevin.tian, yi.l.liu, iommu, linux-kernel

On Tue, Jan 31, 2023 at 10:57:13PM -0800, Nicolin Chen wrote:
> On Mon, Jan 30, 2023 at 04:35:35PM -0400, Jason Gunthorpe wrote:
>  
> > IMHO I would structure the smmu driver so that all the different
> > iommu_domain formats have their own ops pointer. The special
> > "undecided" format would have a special ops with only attach_dev and
> > at first attach it would switch the ops to whatever format it
> > selected.
> > 
> > I think this could get rid of a lot of the 'if undecided/S1/S2/CD'
> > complexity all over the place. You know what type it is because you
> > were called on a op that is only called on its type.
> 
> An auto/unmanaged domain allocation via iommu_domain_alloc() would
> be S1, while an allocation via ops->domain_alloc_user can be S1 or
> S2 with a given parameter/flag. So, actually the format is always
> decided. 

No, it can't decide the S1/S2 format until it knows the smmu because
of this:

	/* Restrict the stage to what we can actually support */
	if (!(smmu->features & ARM_SMMU_FEAT_TRANS_S1))
		smmu_domain->stage = ARM_SMMU_DOMAIN_S2;
	if (!(smmu->features & ARM_SMMU_FEAT_TRANS_S2))
		smmu_domain->stage = ARM_SMMU_DOMAIN_S1;

So the format is never decided.

> that we don't pass the dev pointer down to ops->domain_alloc. So,
> the SMMU driver can't know which SMMU device the device is behind,
> resulting in being unable to finalizing the domain. Robin mentioned
> that he has a patch "iommu: Pass device through ops->domain_alloc".
> Perhaps that is required for us to entirely fix the add_domain()
> problem?

Robin is making progress, hopefully soon

So the issue is that with replace you need to have the domain populated
before we can call replace, but you can't populate the domain until it
is bound because of the above issue? That seems unsolvable without
fixing up the driver.

I'd say replace can go ahead ignoring that issue, and for now
replace will only work on ARM with fully configured domains created
by domain_alloc_user.

It will start working correctly for auto domains once Robin's changes
get finished.

Is there another issue?

Jason

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 1/3] iommufd: Add devices_users to track the hw_pagetable usage by device
  2023-02-01 15:53               ` Jason Gunthorpe
@ 2023-02-01 17:46                 ` Nicolin Chen
  2023-02-01 18:37                   ` Jason Gunthorpe
  0 siblings, 1 reply; 44+ messages in thread
From: Nicolin Chen @ 2023-02-01 17:46 UTC (permalink / raw)
  To: Jason Gunthorpe; +Cc: kevin.tian, yi.l.liu, iommu, linux-kernel

On Wed, Feb 01, 2023 at 11:53:02AM -0400, Jason Gunthorpe wrote:
> On Tue, Jan 31, 2023 at 10:57:13PM -0800, Nicolin Chen wrote:
> > On Mon, Jan 30, 2023 at 04:35:35PM -0400, Jason Gunthorpe wrote:
> >  
> > > IMHO I would structure the smmu driver so that all the different
> > > iommu_domain formats have their own ops pointer. The special
> > > "undecided" format would have a special ops with only attach_dev and
> > > at first attach it would switch the ops to whatever format it
> > > selected.
> > > 
> > > I think this could get rid of a lot of the 'if undecided/S1/S2/CD'
> > > complexity all over the place. You know what type it is because you
> > > were called on a op that is only called on its type.
> > 
> > An auto/unmanaged domain allocation via iommu_domain_alloc() would
> > be S1, while an allocation via ops->domain_alloc_user can be S1 or
> > S2 with a given parameter/flag. So, actually the format is always
> > decided. 
> 
> No, it can't decide the S1/S2 format until it knows the smmu because
> of this:
> 
> 	/* Restrict the stage to what we can actually support */
> 	if (!(smmu->features & ARM_SMMU_FEAT_TRANS_S1))
> 		smmu_domain->stage = ARM_SMMU_DOMAIN_S2;
> 	if (!(smmu->features & ARM_SMMU_FEAT_TRANS_S2))
> 		smmu_domain->stage = ARM_SMMU_DOMAIN_S1;
> 
> So the format is never decided.

OK. That's right. And the solution to that is also passing a dev
pointer into the regular ->domain_alloc() op.

> > that we don't pass the dev pointer down to ops->domain_alloc. So,
> > the SMMU driver can't know which SMMU device the device is behind,
> > resulting in being unable to finalizing the domain. Robin mentioned
> > that he has a patch "iommu: Pass device through ops->domain_alloc".
> > Perhaps that is required for us to entirely fix the add_domain()
> > problem?
> 
> Robin is making progress, hopefully soon
> 
> So the issue is with replace you need to have the domain populated
> before we can call replace but you can't populate the domain until it
> is bound because of the above issue? That seems unsovlable without
> fixing up the driver.

Not really. A REPLACE ioctl is just an ATTACH, if the device just
gets BIND-ed. So the SMMU driver will initialize ("finalise") the
domain during the replace() call, then iopt_table_add_domain() can
be done.

So, not a blocker here.

> I'd say replace can go ahead ingoring that issue and that for now
> replace will only work on ARM with domains created by
> domain_alloc_user that are fully configured.
> 
> It will start working correctly for auto domains once Robin's changes
> get finished.
> 
> Is there another issue?

Oh. I think we mixed up the topics here. These three patches were
not meant to unblock anything, but to clean up a path for the replace
series and the nesting series, regarding the device locking issue:

	if (cur_hwpt != hwpt)
		mutex_lock(&cur_hwpt->device_lock);
	mutex_lock(&hwpt->device_lock);
	...
	if (iommufd_hw_pagetable_has_group()) {	// touching device list
		...
		iommu_group_replace_domain();
		...
	}
	if (cur_hwpt && hwpt)
		list_del(&idev->devices_item);
	list_add(&idev->devices_item, &hwpt->devices);
	...
	mutex_unlock(&hwpt->device_lock);
	if (cur_hwpt != hwpt)
		mutex_unlock(&cur_hwpt->device_lock);

I just gave it another thought. Since we have patch-2 from this
series moving the ioas->mutex, it already serializes the
attach/detach routines. And I see that all the places touching
idev->devices_item and hwpt->devices are protected by ioas->mutex.
So, perhaps we can simply remove the device_lock?

do_attach():
	mutex_lock(&ioas->mutex); // protect both devices_item and hwpt_item
	...
	if (iommufd_hw_pagetable_has_group()) {	// touching device list
		...
		iommu_group_replace_domain();
		...
	}
	if (cur_hwpt && hwpt)
		list_del(&idev->devices_item);
	list_add(&idev->devices_item, &hwpt->devices);
	...
	mutex_unlock(&ioas->mutex);

do_detach():
	mutex_lock(&ioas->mutex); // protect both devices_item and hwpt_item
	...
	if (iommufd_hw_pagetable_has_group()) {	// touching device list
		...
		iommu_detach_group();
		...
	}
	list_del(&idev->devices_item);
	...
	mutex_unlock(&ioas->mutex);

If this is correct, I think I can prepare the replace series and
send it by the end of the day.

Thanks
Nic

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 1/3] iommufd: Add devices_users to track the hw_pagetable usage by device
  2023-02-01 17:46                 ` Nicolin Chen
@ 2023-02-01 18:37                   ` Jason Gunthorpe
  2023-02-01 19:25                     ` Nicolin Chen
  0 siblings, 1 reply; 44+ messages in thread
From: Jason Gunthorpe @ 2023-02-01 18:37 UTC (permalink / raw)
  To: Nicolin Chen; +Cc: kevin.tian, yi.l.liu, iommu, linux-kernel

On Wed, Feb 01, 2023 at 09:46:23AM -0800, Nicolin Chen wrote:
> > So the issue is with replace you need to have the domain populated
> > before we can call replace but you can't populate the domain until it
> > is bound because of the above issue? That seems unsovlable without
> > fixing up the driver.
> 
> Not really. A REPLACE ioctl is just an ATTACH, if the device just
> gets BIND-ed. So the SMMU driver will initialize ("finalise") the
> domain during the replace() call, then iopt_table_add_domain() can
> be done.
> 
> So, not a blocker here.

Well, yes, there sort of is because the whole flow becomes nonsensical
- we are supposed to have the iommu_domain populated by the time we do
replace. Otherwise replace is extra-pointless..

> > Is there another issue?
> 
> Oh. I think we mixed the topics here. These three patches were
> not to unblock but to clean up a way for the replace series and
> the nesting series, for the device locking issue:
> 
> 	if (cur_hwpt != hwpt)
> 		mutex_lock(&cur_hwpt->device_lock);
> 	mutex_lock(&hwpt->device_lock);
> 	...
> 	if (iommufd_hw_pagetabe_has_group()) {	// touching device list
> 		...
> 		iommu_group_replace_domain();
> 		...
> 	}
> 	if (cur_hwpt && hwpt)
> 		list_del(&idev->devices_item);
> 	list_add(&idev->devices_item, &cur_hwpt->devices);
> 	...
> 	mutex_unlock(&hwpt->device_lock);
> 	if (cur_hwpt != hwpt)
> 		mutex_unlock(&cur_hwpt->device_lock);

What is the issue? That isn't quite right, but the basic bit is fine

If you want to do replace then you have to hold both devices_lock and
you write that super ugly thing like this

lock_both:
   if (hwpt_a < hwpt_b) {
      mutex_lock(&hwpt_a->devices_lock);
      mutex_lock_nested(&hwpt_b->devices_lock, SINGLE_DEPTH_NESTING);
   } else if (hwpt_a > hwpt_b) {
      mutex_lock(&hwpt_b->devices_lock);
      mutex_lock_nested(&hwpt_a->devices_lock, SINGLE_DEPTH_NESTING);
   } else
      mutex_lock(&hwpt_a->devices_lock);

And then it is trivial, yes?

Using the group_lock in the iommu core is the right way to fix
this.. Maybe someday we can do that.

(also document that replace causes all the devices in the group to
change iommu_domains at once)

> I just gave another thought about it. Since we have the patch-2
> from this series moving the ioas->mutex, it already serializes
> attach/detach routines. And I see that all the places touching
> idev->device_item and hwpt->devices are protected by ioas->mutex.
> So, perhaps we can simply remove the device_lock?

The two hwpts are not required to have the same ioas, so this doesn't
really help..

Jason

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 1/3] iommufd: Add devices_users to track the hw_pagetable usage by device
  2023-02-01 18:37                   ` Jason Gunthorpe
@ 2023-02-01 19:25                     ` Nicolin Chen
  2023-02-01 20:00                       ` Jason Gunthorpe
  2023-02-07  4:27                       ` Liu, Yi L
  0 siblings, 2 replies; 44+ messages in thread
From: Nicolin Chen @ 2023-02-01 19:25 UTC (permalink / raw)
  To: Jason Gunthorpe; +Cc: kevin.tian, yi.l.liu, iommu, linux-kernel

On Wed, Feb 01, 2023 at 02:37:42PM -0400, Jason Gunthorpe wrote:
> On Wed, Feb 01, 2023 at 09:46:23AM -0800, Nicolin Chen wrote:
> > > So the issue is with replace you need to have the domain populated
> > > before we can call replace but you can't populate the domain until it
> > > is bound because of the above issue? That seems unsovlable without
> > > fixing up the driver.
> > 
> > Not really. A REPLACE ioctl is just an ATTACH, if the device just
> > gets BIND-ed. So the SMMU driver will initialize ("finalise") the
> > domain during the replace() call, then iopt_table_add_domain() can
> > be done.
> > 
> > So, not a blocker here.
> 
> Well, yes, there sort of is because the whole flow becomes nonsensical
> - we are supposed to have the iommu_domain populated by the time we do
> replace. Otherwise replace is extra-pointless..

The "finalise" is one of the very first lines of the attach_dev()
callback function in SMMU driver, though it might still undesirably
fail the replace().

https://github.com/nicolinc/iommufd/commit/5ae54f360495aae35b5967d1eb00149912145639
Btw, this is a draft that I made to move iopt_table_add_domain(). I
think we can have this with the nesting series.

Later, once we pass in the dev pointer to the ->domain_alloc op
using Robin's change, all the iopt_table_add_domain() can be done
within the hwpt_alloc(), prior to an attach()/replace().

> > > Is there another issue?
> > 
> > Oh. I think we mixed the topics here. These three patches were
> > not to unblock but to clean up a way for the replace series and
> > the nesting series, for the device locking issue:
> > 
> > 	if (cur_hwpt != hwpt)
> > 		mutex_lock(&cur_hwpt->device_lock);
> > 	mutex_lock(&hwpt->device_lock);
> > 	...
> > 	if (iommufd_hw_pagetabe_has_group()) {	// touching device list
> > 		...
> > 		iommu_group_replace_domain();
> > 		...
> > 	}
> > 	if (cur_hwpt && hwpt)
> > 		list_del(&idev->devices_item);
> > 	list_add(&idev->devices_item, &cur_hwpt->devices);
> > 	...
> > 	mutex_unlock(&hwpt->device_lock);
> > 	if (cur_hwpt != hwpt)
> > 		mutex_unlock(&cur_hwpt->device_lock);
> 
> What is the issue? That isn't quite right, but the basic bit is fine
> 
> If you want to do replace then you have to hold both devices_lock and
> you write that super ugly thing like this
> 
> lock_both:
>    if (hwpt_a < hwpt_b) {
>       mutex_lock(&hwpt_a->devices_lock);
>       mutex_lock_nested(&hwpt_b->devices_lock);
>    } else if (hwpt_a > hwpt_b) {
>       mutex_lock(&hwpt_b->devices_lock);
>       mutex_lock_nested(&hwpt_a->devices_lock);
>    } else
>       mutex_lock(&hwpt_a->devices_lock);
> 
> And then it is trivial, yes?

Yea. That's your previous remark.

> Using the group_lock in the iommu core is the right way to fix
> this.. Maybe someday we can do that.
> 
> (also document that replace causes all the devices in the group to
> change iommu_domains at once)

Yes. There's a discussion in PATCH-3 of this series. I drafted a
patch changing iommu_attach/detach_dev():
https://github.com/nicolinc/iommufd/commit/124f7804ef38d50490b606fd56c1e27ce551a839

Baolu had a similar patch series a year ago. So we might continue
that effort in parallel, and eventually drop the device list/lock.

> > I just gave another thought about it. Since we have the patch-2
> > from this series moving the ioas->mutex, it already serializes
> > attach/detach routines. And I see that all the places touching
> > idev->device_item and hwpt->devices are protected by ioas->mutex.
> > So, perhaps we can simply remove the device_lock?
> 
> The two hwpts are not required to have the same ioas, so this doesn't
> really help..

Hmm...in that case, we should hold two ioas->mutex locks in
addition to two device locks?

Thanks
Nic

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 1/3] iommufd: Add devices_users to track the hw_pagetable usage by device
  2023-02-01 19:25                     ` Nicolin Chen
@ 2023-02-01 20:00                       ` Jason Gunthorpe
  2023-02-01 21:18                         ` Nicolin Chen
  2023-02-07  4:27                       ` Liu, Yi L
  1 sibling, 1 reply; 44+ messages in thread
From: Jason Gunthorpe @ 2023-02-01 20:00 UTC (permalink / raw)
  To: Nicolin Chen; +Cc: kevin.tian, yi.l.liu, iommu, linux-kernel

On Wed, Feb 01, 2023 at 11:25:10AM -0800, Nicolin Chen wrote:

> The "finalise" is one of the very first lines of the attach_dev()
> callback function in SMMU driver, though it might still undesirably
> fail the replace().

It won't get that far.

Remember how this all works - only autodomains have the special path
that allocates a domain, attaches the empty domain, and then populates
it with the ioas. We made this special path specifically to accommodate
the current ARM drivers, otherwise they wouldn't work at all.

replace can't do this - replace must always start out with a
pre-existing hwpt that was allocated with a dedicated hwpt allocation
ioctl.

When the hwpt was allocated it must be linked to the IOAS at that
time, because we definitely don't do deferred IOAS linkage.

So on ARM when you create an unfinalized iommu_domain it cannot be
added to the IOAS (because it has an empty aperture) and creation will
fail, or worse, it will get added to an empty IOAS and make the IOAS
permanently unusable.

> Hmm...in that case, we should hold two ioas->mutex locks in
> addition to two device locks?

No, the device lock is the thing that protects the data you are
touching; no reason to make it any different.

Jason

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 1/3] iommufd: Add devices_users to track the hw_pagetable usage by device
  2023-02-01 20:00                       ` Jason Gunthorpe
@ 2023-02-01 21:18                         ` Nicolin Chen
  2023-02-02  7:28                           ` Nicolin Chen
  0 siblings, 1 reply; 44+ messages in thread
From: Nicolin Chen @ 2023-02-01 21:18 UTC (permalink / raw)
  To: Jason Gunthorpe; +Cc: kevin.tian, yi.l.liu, iommu, linux-kernel

On Wed, Feb 01, 2023 at 04:00:40PM -0400, Jason Gunthorpe wrote:
> On Wed, Feb 01, 2023 at 11:25:10AM -0800, Nicolin Chen wrote:
> 
> > The "finalise" is one of the very first lines of the attach_dev()
> > callback function in SMMU driver, though it might still undesirably
> > fail the replace().
> 
> It won't get that far.
> 
> Remember how this all works - only autodomains have the special path
> that allocates a domain, attaches the empty domain, and then populates
> it with the ioas. We made this special path specifically to accomodate
> the current ARM drivers, otherwise they wouldn't work at all.

Yes.

> replace can't do this - replace must always start out with a
> pre-existing hwpt that was allocated with a dedicated hwpt allocation
> ioctl.
> 
> Wwhen the hwpt was allocated it must be linked to the IOAS at that
> time, because we definately don't do defered IOAS linkage.
>
> So on ARM when you create an unfinalizes iommu_domain it cannot be
> added to the IOAS (because it has an empty aperture) and creation will
> fail, or worse, it will get added to an empty IOAS and make the IOAS
> permanently unusable.

IIUIC, user space might add some IOVA mappings to the hwpt/iopt
before doing a replace(). If we do a deferred IOAS linkage to
this hwpt/iopt, it will cause a problem because we would be adding
the reserved region for the MSI window later than the IOMMU_IOAS_MAP
calls. Thus, "we definitely don't do deferred IOAS linkage".

With this justification, I think I should include my patch moving
iopt_table_add/remove_domain() in the replace series.

> > Hmm...in that case, we should hold two ioas->mutex locks in
> > addition to two device locks?
> 
> No, the device lock is the thing that protects the data you are
> touching no reason to make it any different.

Thinking about it more: we don't really touch the old_hwpt->ioas or
its iopt, but only the new_hwpt->ioas/iopt to reserve the MSI
window, so there isn't an actual need to hold the old_hwpt->ioas's
mutex.

I will prepare the replace series and send it by the end of the
day, after a certain level of sanity testing/verification.

Thanks!
Nic

^ permalink raw reply	[flat|nested] 44+ messages in thread

* RE: [PATCH v2 3/3] iommufd/device: Change iommufd_hw_pagetable_has_group to device centric
  2023-02-01  7:20               ` Nicolin Chen
@ 2023-02-02  6:32                 ` Tian, Kevin
  2023-02-02  6:36                   ` Nicolin Chen
  0 siblings, 1 reply; 44+ messages in thread
From: Tian, Kevin @ 2023-02-02  6:32 UTC (permalink / raw)
  To: Nicolin Chen
  Cc: Baolu Lu, jgg, Lu, Baolu, robin.murphy, Liu, Yi L, iommu, linux-kernel

> From: Nicolin Chen <nicolinc@nvidia.com>
> Sent: Wednesday, February 1, 2023 3:20 PM
> 
> So, assuming we drop this series and move the first two patches
> back to the nesting series or the replace series, one of them
> would end up doing something ugly:
> 
> 	if (cur_hwpt != hwpt)
> 		mutex_lock(&cur_hwpt->device_lock);
> 	mutex_lock(&hwpt->device_lock);
> 	...
> 	mutex_unlock(&hwpt->device_lock);
> 	if (cur_hwpt != hwpt)
> 		mutex_unlock(&cur_hwpt->device_lock);
> 
> So, perhaps we should discuss about which way we want to choose.

from your discussion with Jason I think this locking open has
been settled down.

> 
> Btw, Baolu's version has a similar patch as mine changing the
> iommu_attach/detach_device(), yet also touches _group(). Could
> we bisect that series into _device() first and _group() later?
> Given that we only need a device-centric API at this moment...
> 

I'll let Baolu decide after he catches up on the comments in
that thread. But overall I think we now agree that removing the
device list/lock can be kept out of your replace/nesting series
and be cleaned up after Baolu's work completes, correct? 😊

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 3/3] iommufd/device: Change iommufd_hw_pagetable_has_group to device centric
  2023-02-02  6:32                 ` Tian, Kevin
@ 2023-02-02  6:36                   ` Nicolin Chen
  0 siblings, 0 replies; 44+ messages in thread
From: Nicolin Chen @ 2023-02-02  6:36 UTC (permalink / raw)
  To: Tian, Kevin
  Cc: Baolu Lu, jgg, Lu, Baolu, robin.murphy, Liu, Yi L, iommu, linux-kernel

On Thu, Feb 02, 2023 at 06:32:36AM +0000, Tian, Kevin wrote:
> External email: Use caution opening links or attachments
> 
> 
> > From: Nicolin Chen <nicolinc@nvidia.com>
> > Sent: Wednesday, February 1, 2023 3:20 PM
> >
> > So, assuming we drop this series and move the first two patches
> > back to the nesting series or the replace series, one of them
> > would end up doing something ugly:
> >
> >       if (cur_hwpt != hwpt)
> >               mutex_lock(&cur_hwpt->device_lock);
> >       mutex_lock(&hwpt->device_lock);
> >       ...
> >       mutex_unlock(&hwpt->device_lock);
> >       if (cur_hwpt != hwpt)
> >               mutex_unlock(&cur_hwpt->device_lock);
> >
> > So, perhaps we should discuss about which way we want to choose.
> 
> from your discussion with Jason I think this locking open has
> been settled down.

Yes :)

> >
> > Btw, Baolu's version has a similar patch as mine changing the
> > iommu_attach/detach_device(), yet also touches _group(). Could
> > we bisect that series into _device() first and _group() later?
> > Given that we only need a device-centric API at this moment...
> >
> 
> I'll let Baolu to decide after he re-catches up the comments in
> that thread. But overall I think we now agreed that removing the
> device list/lock can be kept out of your replace/nesting series
> and let it cleaned up after Baolu's work completes, correct? 😊

Correct. It's not a blocker. And I am going to post the replace
series today -- running some additional sanity tests now.

Thanks
Nic

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 1/3] iommufd: Add devices_users to track the hw_pagetable usage by device
  2023-02-01 21:18                         ` Nicolin Chen
@ 2023-02-02  7:28                           ` Nicolin Chen
  2023-02-02 15:03                             ` Jason Gunthorpe
  0 siblings, 1 reply; 44+ messages in thread
From: Nicolin Chen @ 2023-02-02  7:28 UTC (permalink / raw)
  To: Jason Gunthorpe; +Cc: kevin.tian, yi.l.liu, iommu, linux-kernel

On Wed, Feb 01, 2023 at 01:18:08PM -0800, Nicolin Chen wrote:
> On Wed, Feb 01, 2023 at 04:00:40PM -0400, Jason Gunthorpe wrote:
> > On Wed, Feb 01, 2023 at 11:25:10AM -0800, Nicolin Chen wrote:
> > 
> > > The "finalise" is one of the very first lines of the attach_dev()
> > > callback function in SMMU driver, though it might still undesirably
> > > fail the replace().
> > 
> > It won't get that far.
> > 
> > Remember how this all works - only autodomains have the special path
> > that allocates a domain, attaches the empty domain, and then populates
> > it with the ioas. We made this special path specifically to accomodate
> > the current ARM drivers, otherwise they wouldn't work at all.
> 
> Yes.
> 
> > replace can't do this - replace must always start out with a
> > pre-existing hwpt that was allocated with a dedicated hwpt allocation
> > ioctl.
> > 
> > Wwhen the hwpt was allocated it must be linked to the IOAS at that
> > time, because we definately don't do defered IOAS linkage.
> >
> > So on ARM when you create an unfinalizes iommu_domain it cannot be
> > added to the IOAS (because it has an empty aperture) and creation will
> > fail, or worse, it will get added to an empty IOAS and make the IOAS
> > permanently unusable.
> 
> IIUIC, user space might add some IOVA mappings to the hwpt/iopt,
> before doing a replace(). If we do a deferred IOAS linkage to
> this hwpt/iopt, it will cause a problem because we are adding
> the reserved region for the MSI window later than IOMMU_IOAS_MAP
> calls. Thus, "we definately don't do defered IOAS linkage".
> 
> With this justification, I think I should include my patch of
> moving iopt_table_add/remove_domain(), in the replace series.

I just posted the replace series. But I found that the baseline
(without the nesting series) always allocates iommu_domains
with the ->domain_alloc() op. So, we cannot actually move
iopt_table_add_domain() in the replace series, as I intended to.

Yet, the great news is that our nesting series replaces the
domain_alloc() op entirely with ->domain_alloc_user() for all
the iommu_domain allocations, including for auto_domains. So,
we can completely move iopt_table_add_domain() to the hwpt
allocation function. And we don't really need a big change
in the SMMU driver, nor Robin's patch that passes the dev ptr
into the domain_alloc() op. And even the device_users refcount in
this patch is no longer needed. It also simplifies the shared
device locking situation, if I am not missing anything.

So, in short, we'll have to wait for the ->domain_alloc_user()
patch (in the nesting series) to unblock the problem that we
discussed above regarding iopt_table_add_domain().

Thanks
Nic

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 1/3] iommufd: Add devices_users to track the hw_pagetable usage by device
  2023-02-01  7:48               ` Nicolin Chen
@ 2023-02-02  9:12                 ` Nicolin Chen
  2023-02-07  4:19                   ` Liu, Yi L
  0 siblings, 1 reply; 44+ messages in thread
From: Nicolin Chen @ 2023-02-02  9:12 UTC (permalink / raw)
  To: yi.l.liu; +Cc: kevin.tian, iommu, linux-kernel, Jason Gunthorpe

On Tue, Jan 31, 2023 at 11:48:04PM -0800, Nicolin Chen wrote:
> On Mon, Jan 30, 2023 at 12:54:01PM -0800, Nicolin Chen wrote:
> > On Mon, Jan 30, 2023 at 04:35:35PM -0400, Jason Gunthorpe wrote:
> > > On Mon, Jan 30, 2023 at 12:04:33PM -0800, Nicolin Chen wrote:
> > > 
> > > > I recall we've discussed this that SMMU sets up domain when it
> > > > attaches the device to, so we made a compromise here...
> > > 
> > > The ARM driver has a problem that it doesn't know what SMMU instance
> > > will host the domain when it is allocated so it doesn't know if it
> > > should select a S1 or S2 page table format - which is determined by
> > > the capabilities of the specific SMMU HW block.
> > > 
> > > However, we don't have this problem when creating the S2. The S2
> > > should be created by a special alloc_domain_iommufd() asking for the
> > > S2 format. Not only does the new alloc_domain_iommufd API directly
> > > request a S2 format table, but it also specifies the struct device so
> > > any residual details can be determined directly.
> > > 
> > > Thus there is no need to do the two stage process when working with
> > > the S2.
> > 
> > Ah, right! Taking a quick look, we should be able to call that
> > arm_smmu_domain_finalise when handling alloc_domain_iommufd().
> > 
> > > So fixup the driver to create fully configured iommu_domain's
> > > immediately and get rid of this problem.
> > 
> > OK. I will draft a patch today.
> 
> @Yi
> Do you recall doing iopt_table_add_domain() in hwpt_alloc()?
> 
> Jason has a great point above. So even SMMU should be able to
> call the iopt_table_add_domain() after a kernel-manged hwpt
> allocation rather than after an iommu_attach_group(), except
> an auto_domain or a selftest mock_domain that still needs to
> attach the device first, otherwise the SMMU driver (currently)
> cannot finalize the domain aperture.

Some update today: I found that ops->domain_alloc_user() is used
for all domain allocations inside IOMMUFD. So, without any
special case, we could do iopt_table_add_domain() entirely
in __iommufd_hw_pagetable_alloc() and accordingly do
iopt_table_remove_domain() in hw_pagetable_destroy():
https://github.com/nicolinc/iommufd/commit/85248e1c5f645e1eb701562eb078cf586af617fe
(We can also skip that "symmetric" patch for the list_add,
 moving the list_add/del calls directly to alloc/destroy.)
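
The rough shape of the change (simplified from the commit above):

	__iommufd_hw_pagetable_alloc():
		hwpt->domain = ops->domain_alloc_user(dev, ...);
		iopt_table_add_domain(&hwpt->ioas->iopt, hwpt->domain);
		list_add_tail(&hwpt->hwpt_item, &hwpt->ioas->hwpt_list);

	iommufd_hw_pagetable_destroy():
		list_del(&hwpt->hwpt_item);
		iopt_table_remove_domain(&hwpt->ioas->iopt, hwpt->domain);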

Without the complication of the add/remove_domain() calls
in the do_attach/detach() functions, there is no need for
the device_users counter any more.

I am not 100% sure if we still need the shared device lock,
though so far the sanity testing that I ran doesn't show a problem.
We may discuss it later when we converge our branches.
As before, I am also okay with doing it as incremental
changes on top of your tree and asking you to integrate them,
once you have your branch ready.

My full wip branch:
https://github.com/nicolinc/iommufd/commits/wip/iommufd-v6.2-rc5-nesting

Thanks
Nic

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 1/3] iommufd: Add devices_users to track the hw_pagetable usage by device
  2023-02-02  7:28                           ` Nicolin Chen
@ 2023-02-02 15:03                             ` Jason Gunthorpe
  0 siblings, 0 replies; 44+ messages in thread
From: Jason Gunthorpe @ 2023-02-02 15:03 UTC (permalink / raw)
  To: Nicolin Chen; +Cc: kevin.tian, yi.l.liu, iommu, linux-kernel

On Wed, Feb 01, 2023 at 11:28:05PM -0800, Nicolin Chen wrote:

> So, in short, we'll have to wait for ->domain_alloc_user()
> patch (in the nesting series) to unblock the problem that we
> discussed above regarding the iopt_table_add_domain().

I think that is probably OK

It would be good to prepare an SMMUv3 patch, assuming Robin's stuff
lands, that moves the domain iopt determination to allocation time so
we are ready.

Jason

^ permalink raw reply	[flat|nested] 44+ messages in thread

* RE: [PATCH v2 1/3] iommufd: Add devices_users to track the hw_pagetable usage by device
  2023-02-02  9:12                 ` Nicolin Chen
@ 2023-02-07  4:19                   ` Liu, Yi L
  0 siblings, 0 replies; 44+ messages in thread
From: Liu, Yi L @ 2023-02-07  4:19 UTC (permalink / raw)
  To: Nicolin Chen; +Cc: Tian, Kevin, iommu, linux-kernel, Jason Gunthorpe

> From: Nicolin Chen <nicolinc@nvidia.com>
> Sent: Thursday, February 2, 2023 5:12 PM
> 
> On Tue, Jan 31, 2023 at 11:48:04PM -0800, Nicolin Chen wrote:
> > On Mon, Jan 30, 2023 at 12:54:01PM -0800, Nicolin Chen wrote:
> > > On Mon, Jan 30, 2023 at 04:35:35PM -0400, Jason Gunthorpe wrote:
> > > > On Mon, Jan 30, 2023 at 12:04:33PM -0800, Nicolin Chen wrote:
> > > >
> > > > > I recall we've discussed this that SMMU sets up domain when it
> > > > > attaches the device to, so we made a compromise here...
> > > >
> > > > The ARM driver has a problem that it doesn't know what SMMU
> instance
> > > > will host the domain when it is allocated so it doesn't know if it
> > > > should select a S1 or S2 page table format - which is determined by
> > > > the capabilities of the specific SMMU HW block.
> > > >
> > > > However, we don't have this problem when creating the S2. The S2
> > > > should be created by a special alloc_domain_iommufd() asking for the
> > > > S2 format. Not only does the new alloc_domain_iommufd API directly
> > > > request a S2 format table, but it also specifies the struct device so
> > > > any residual details can be determined directly.
> > > >
> > > > Thus there is no need to do the two stage process when working with
> > > > the S2.
> > >
> > > Ah, right! Taking a quick look, we should be able to call that
> > > arm_smmu_domain_finalise when handling alloc_domain_iommufd().
> > >
> > > > So fixup the driver to create fully configured iommu_domain's
> > > > immediately and get rid of this problem.
> > >
> > > OK. I will draft a patch today.
> >
> > @Yi
> > Do you recall doing iopt_table_add_domain() in hwpt_alloc()?

Yeah, doing iopt_table_add_domain() in hwpt_alloc works well.
The only reason for the current code is that SMMU had a drawback
with it. Great to see it is solved.

> > Jason has a great point above. So even SMMU should be able to
> > call the iopt_table_add_domain() after a kernel-manged hwpt
> > allocation rather than after an iommu_attach_group(), except
> > an auto_domain or a selftest mock_domain that still needs to
> > attach the device first, otherwise the SMMU driver (currently)
> > cannot finalize the domain aperture.
> 
> Some update today: I found ops->domain_alloc_user() is used
> for all domain allocations inside IOMMUFD. So, without any
> special case, we could entirely do iopt_table_add_domain()
> in the __iommufd_hw_pagetable_alloc() and accordingly do
> iopt_table_remove_domain() in the hw_pagetable_destroy():
> https://github.com/nicolinc/iommufd/commit/85248e1c5f645e1eb701562e
> b078cf586af617fe
> (We can also skip that "symmetric" patch for the list_add,
>  moving the list_add/del calls directly to alloc/destroy.)
> 
> Without the complication of the add/remove_domain() calls
> in the do_attach/detach() functions, there is no need for
> the devices_users counter any more.

Yes.
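
As a side note, the simplification agreed on here can be shown with a
tiny stand-alone model (ioas, hwpt, hwpt_alloc() and hwpt_destroy()
below are toy stand-ins, not the iommufd code): once the IOAS link is
made at allocation and dropped at destruction, attach/detach no longer
needs a first-device/last-device check or a devices_users counter:

#include <stdio.h>
#include <stdlib.h>

struct ioas { int nr_domains; };	/* stand-in for the iopt domain list */
struct hwpt { struct ioas *ioas; };

static struct hwpt *hwpt_alloc(struct ioas *ioas)
{
	struct hwpt *hwpt = calloc(1, sizeof(*hwpt));

	if (!hwpt)
		return NULL;
	hwpt->ioas = ioas;
	ioas->nr_domains++;		/* "iopt_table_add_domain()" happens here */
	return hwpt;
}

static void hwpt_destroy(struct hwpt *hwpt)
{
	hwpt->ioas->nr_domains--;	/* "iopt_table_remove_domain()" here */
	free(hwpt);
}

/* Attach/detach never touch the IOAS, so no device counting is needed. */
static void device_attach(struct hwpt *hwpt) { (void)hwpt; }
static void device_detach(struct hwpt *hwpt) { (void)hwpt; }

int main(void)
{
	struct ioas ioas = { 0 };
	struct hwpt *hwpt = hwpt_alloc(&ioas);

	if (!hwpt)
		return 1;
	device_attach(hwpt);
	device_attach(hwpt);	/* a second device changes nothing */
	device_detach(hwpt);
	device_detach(hwpt);
	hwpt_destroy(hwpt);
	printf("domains still linked to the ioas: %d\n", ioas.nr_domains);
	return 0;
}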

> I am not 100% sure if we still need the shared device lock,
> though so far the sanity tests I have run don't show a problem.
> We may discuss it later when we converge our branches.
> As before, I am also okay with doing this as incremental
> changes on top of your tree and asking you to integrate them,
> once you have your branch ready.

I think reusing the device lock can simplify things and avoid hurting
readability. However, I'll double-check.

> My full wip branch:
> https://github.com/nicolinc/iommufd/commits/wip/iommufd-v6.2-rc5-nesting

Thanks, I'm now re-integrating the nesting patches.

Regards,
Yi Liu


^ permalink raw reply	[flat|nested] 44+ messages in thread

* RE: [PATCH v2 1/3] iommufd: Add devices_users to track the hw_pagetable usage by device
  2023-02-01 19:25                     ` Nicolin Chen
  2023-02-01 20:00                       ` Jason Gunthorpe
@ 2023-02-07  4:27                       ` Liu, Yi L
  1 sibling, 0 replies; 44+ messages in thread
From: Liu, Yi L @ 2023-02-07  4:27 UTC (permalink / raw)
  To: Nicolin Chen, Jason Gunthorpe; +Cc: Tian, Kevin, iommu, linux-kernel

> From: Nicolin Chen <nicolinc@nvidia.com>
> Sent: Thursday, February 2, 2023 3:25 AM
> 
> On Wed, Feb 01, 2023 at 02:37:42PM -0400, Jason Gunthorpe wrote:
> > On Wed, Feb 01, 2023 at 09:46:23AM -0800, Nicolin Chen wrote:
> > > > So the issue is that with replace you need to have the domain populated
> > > > before we can call replace, but you can't populate the domain until it
> > > > is bound, because of the above issue? That seems unsolvable without
> > > > fixing up the driver.
> > >
> > > Not really. A REPLACE ioctl is just an ATTACH if the device has
> > > only just been BIND-ed. So the SMMU driver will initialize
> > > ("finalise") the domain during the replace() call, and then
> > > iopt_table_add_domain() can be done.
> > >
> > > So, not a blocker here.
> >
> > Well, yes, there sort of is because the whole flow becomes nonsensical
> > - we are supposed to have the iommu_domain populated by the time we do
> > replace. Otherwise replace is extra-pointless..
> 
> The "finalise" is one of the very first lines of the attach_dev()
> callback function in SMMU driver, though it might still undesirably
> fail the replace().
> 
> https://github.com/nicolinc/iommufd/commit/5ae54f360495aae35b5967d1eb00149912145639
> Btw, this is a draft that I made to move iopt_table_add_domain(). I
> think we can have this with the nesting series.
> 
> Later, once we pass in the dev pointer to the ->domain_alloc op
> using Robin's change, all the iopt_table_add_domain() calls can be
> done within the hwpt_alloc(), prior to an attach()/replace().
> 
> > > > Is there another issue?
> > >
> > > Oh. I think we mixed the topics here. These three patches were
> > > not meant to unblock anything but to clean up the way for the
> > > replace series and the nesting series, regarding the device
> > > locking issue:
> > >
> > > 	if (cur_hwpt != hwpt)
> > > 		mutex_lock(&cur_hwpt->device_lock);
> > > 	mutex_lock(&hwpt->device_lock);
> > > 	...
> > > 	if (iommufd_hw_pagetable_has_group()) {	// touching device list
> > > 		...
> > > 		iommu_group_replace_domain();
> > > 		...
> > > 	}
> > > 	if (cur_hwpt && hwpt)
> > > 		list_del(&idev->devices_item);
> > > 	list_add(&idev->devices_item, &hwpt->devices);
> > > 	...
> > > 	mutex_unlock(&hwpt->device_lock);
> > > 	if (cur_hwpt != hwpt)
> > > 		mutex_unlock(&cur_hwpt->device_lock);
> >
> > What is the issue? That isn't quite right, but the basic bit is fine
> >
> > If you want to do replace then you have to hold both devices_lock and
> > you write that super ugly thing like this
> >
> > lock_both:
> >    if (hwpt_a < hwpt_b) {
> >       mutex_lock(&hwpt_a->devices_lock);
> >       mutex_lock_nested(&hwpt_b->devices_lock, SINGLE_DEPTH_NESTING);
> >    } else if (hwpt_a > hwpt_b) {
> >       mutex_lock(&hwpt_b->devices_lock);
> >       mutex_lock_nested(&hwpt_a->devices_lock, SINGLE_DEPTH_NESTING);
> >    } else
> >       mutex_lock(&hwpt_a->devices_lock);
> >
> > And then it is trivial, yes?
> 
> Yea. That's your previous remark.
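
For completeness, a small self-contained pthread sketch of the
address-ordered locking above (struct hwpt, the devices_lock field and
the lock/unlock helpers are invented for the sketch; the kernel version
would use mutex_lock()/mutex_lock_nested() as Jason shows):

#include <pthread.h>
#include <stdint.h>
#include <stdio.h>

struct hwpt {
	pthread_mutex_t devices_lock;
};

/* Take the lock at the lower address first so every caller agrees on
 * the order; a == b means a single hwpt, so only one lock is taken. */
static void hwpt_lock_pair(struct hwpt *a, struct hwpt *b)
{
	if (a == b) {
		pthread_mutex_lock(&a->devices_lock);
		return;
	}
	if ((uintptr_t)a > (uintptr_t)b) {
		struct hwpt *tmp = a;

		a = b;
		b = tmp;
	}
	pthread_mutex_lock(&a->devices_lock);
	pthread_mutex_lock(&b->devices_lock);
}

static void hwpt_unlock_pair(struct hwpt *a, struct hwpt *b)
{
	pthread_mutex_unlock(&a->devices_lock);
	if (a != b)
		pthread_mutex_unlock(&b->devices_lock);
}

int main(void)
{
	struct hwpt x = { .devices_lock = PTHREAD_MUTEX_INITIALIZER };
	struct hwpt y = { .devices_lock = PTHREAD_MUTEX_INITIALIZER };

	hwpt_lock_pair(&x, &y);		/* same order as hwpt_lock_pair(&y, &x) */
	hwpt_unlock_pair(&x, &y);
	hwpt_lock_pair(&x, &x);		/* "replace" with the same hwpt is fine too */
	hwpt_unlock_pair(&x, &x);
	printf("no ABBA ordering possible\n");
	return 0;
}

(Build with "cc -pthread"; the point is only the consistent ordering,
not the locking primitives themselves.)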
> 
> > Using the group_lock in the iommu core is the right way to fix
> > this.. Maybe someday we can do that.
> >
> > (also document that replace causes all the devices in the group to
> > change iommu_domains at once)
> 
> Yes. There's a discussion in PATCH-3 of this series. I drafted a
> patch changing iommu_attach/detach_dev():
> https://github.com/nicolinc/iommufd/commit/124f7804ef38d50490b606fd56c1e27ce551a839
> 
> Baolu had a similar patch series a year ago. So we might continue
> that effort in parallel, and eventually drop the device list/lock.
> 
> > > I just gave it another thought. Since we have patch-2 of this
> > > series moving the ioas->mutex, it already serializes the
> > > attach/detach routines. And I see that all the places touching
> > > idev->devices_item and hwpt->devices are protected by ioas->mutex.
> > > So, perhaps we can simply remove the device_lock?
> >
> > The two hwpts are not required to have the same ioas, so this doesn't
> > really help..
> 
> Hmm...in that case, we should hold two ioas->mutex locks in
> addition to two device locks?

Aha, seems so. You can replace an S1 hwpt with another S1 hwpt that
has a different ioas. 😊 Maybe this is something incremental to the
nesting series. In the nesting series, the ioas and the S1 hwpt can
share the device_lock, can't they?

Regards,
Yi Liu

^ permalink raw reply	[flat|nested] 44+ messages in thread

end of thread, other threads:[~2023-02-07  4:27 UTC | newest]

Thread overview: 44+ messages
2023-01-28 21:18 [PATCH v2 0/3] iommufd: Remove iommufd_hw_pagetable_has_group Nicolin Chen
2023-01-28 21:18 ` [PATCH v2 1/3] iommufd: Add devices_users to track the hw_pagetable usage by device Nicolin Chen
2023-01-29  9:23   ` Tian, Kevin
2023-01-29  9:30     ` Nicolin Chen
2023-01-29  9:39       ` Tian, Kevin
2023-01-30  2:22     ` Liu, Yi L
2023-01-30 15:02   ` Jason Gunthorpe
2023-01-30 19:27     ` Nicolin Chen
2023-01-30 19:50       ` Jason Gunthorpe
2023-01-30 20:04         ` Nicolin Chen
2023-01-30 20:35           ` Jason Gunthorpe
2023-01-30 20:53             ` Nicolin Chen
2023-02-01  7:48               ` Nicolin Chen
2023-02-02  9:12                 ` Nicolin Chen
2023-02-07  4:19                   ` Liu, Yi L
2023-02-01  6:57             ` Nicolin Chen
2023-02-01  7:56               ` Nicolin Chen
2023-02-01 15:53               ` Jason Gunthorpe
2023-02-01 17:46                 ` Nicolin Chen
2023-02-01 18:37                   ` Jason Gunthorpe
2023-02-01 19:25                     ` Nicolin Chen
2023-02-01 20:00                       ` Jason Gunthorpe
2023-02-01 21:18                         ` Nicolin Chen
2023-02-02  7:28                           ` Nicolin Chen
2023-02-02 15:03                             ` Jason Gunthorpe
2023-02-07  4:27                       ` Liu, Yi L
2023-01-28 21:18 ` [PATCH v2 2/3] iommufd/device: Make hwpt_list list_add/del symmetric Nicolin Chen
2023-01-29  9:24   ` Tian, Kevin
2023-01-29  9:31     ` Nicolin Chen
2023-01-30 14:59   ` Jason Gunthorpe
2023-01-30 19:03     ` Nicolin Chen
2023-01-30 19:07       ` Jason Gunthorpe
2023-01-30 19:38         ` Nicolin Chen
2023-01-28 21:18 ` [PATCH v2 3/3] iommufd/device: Change iommufd_hw_pagetable_has_group to device centric Nicolin Chen
2023-01-29  9:37   ` Tian, Kevin
2023-01-29 10:38     ` Nicolin Chen
2023-01-30  0:44       ` Tian, Kevin
2023-01-30 10:22       ` Nicolin Chen
2023-02-01  3:07         ` Tian, Kevin
2023-02-01  6:49           ` Baolu Lu
2023-02-01  6:59             ` Tian, Kevin
2023-02-01  7:20               ` Nicolin Chen
2023-02-02  6:32                 ` Tian, Kevin
2023-02-02  6:36                   ` Nicolin Chen
