All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v1 0/8] Add IO page table replacement support
@ 2023-02-02  7:05 Nicolin Chen
  2023-02-02  7:05 ` [PATCH v1 1/8] iommu: Move dev_iommu_ops() to private header Nicolin Chen
                   ` (8 more replies)
  0 siblings, 9 replies; 41+ messages in thread
From: Nicolin Chen @ 2023-02-02  7:05 UTC (permalink / raw)
  To: jgg, kevin.tian, joro, will, robin.murphy, alex.williamson, shuah
  Cc: yi.l.liu, linux-kernel, iommu, kvm, linux-kselftest, baolu.lu

Hi all,

The existing IOMMU APIs provide a pair of functions: iommu_attach_group()
for callers to attach a device from the default_domain (NULL if not being
supported) to a given iommu domain, and iommu_detach_group() for callers
to detach a device from a given domain to the default_domain. Internally,
the detach_dev op is deprecated for the newer drivers with default_domain.
This means that those drivers likely can switch an attaching domain to
another one, without stagging the device at a blocking or default domain,
for use cases such as:
1) vPASID mode, when a guest wants to replace a single pasid (PASID=0)
   table with a larger table (PASID=N)
2) Nesting mode, when switching the attaching device from an S2 domain
   to an S1 domain, or when switching between relevant S1 domains.

This series introduces a new iommu_group_replace_domain() for that. And
add corresponding support throughout the uAPI. So user space can do such
a REPLACE ioctl reusing the existing VFIO_DEVICE_ATTACH_IOMMUFD_PT. This
means that user space needs to be aware whether the device is attached or
not: an unattached device calling VFIO_DEVICE_ATTACH_IOMMUFD_PT means a
regular ATTACH; an attached device calling VFIO_DEVICE_ATTACH_IOMMUFD_PT
on the other hand means a REPLACE.

QEMU with this feature should have the vIOMMU maintain a cache of the
guest io page table addresses and assign a unique IOAS to each unique
guest page table.

As the guest writes the page table address to the HW registers qemu should
then use the 'replace domain' operation on VFIO to assign the VFIO device
to the correct de-duplicated page table.

The algorithm where QEMU uses one VFIO container per-device and removes
all the mappings to change the assignment should ideally not be used with
iommufd.

To apply this series, please rebase on top of the following patches:
1) [PATCH 00/13] Add vfio_device cdev for iommufd support
   https://lore.kernel.org/kvm/20230117134942.101112-1-yi.l.liu@intel.com/
2) (Merged) [PATCH v5 0/5] iommu: Retire detach_dev callback
   https://lore.kernel.org/linux-iommu/20230110025408.667767-1-baolu.lu@linux.intel.com/
3) (Merged) [PATCH] selftests: iommu: Fix test_cmd_destroy_access() call in user_copy
   https://lore.kernel.org/lkml/20230120074204.1368-1-nicolinc@nvidia.com/

Or you can also find this series on Github:
https://github.com/nicolinc/iommufd/commits/iommu_group_replace_domain-v1

Thank you
Nicolin Chen

Nicolin Chen (7):
  iommu: Introduce a new iommu_group_replace_domain() API
  iommufd: Create access in vfio_iommufd_emulated_bind()
  iommufd/selftest: Add IOMMU_TEST_OP_ACCESS_SET_IOAS coverage
  iommufd: Add replace support in iommufd_access_set_ioas()
  iommufd/selftest: Add coverage for access->ioas replacement
  iommufd/device: Use iommu_group_replace_domain()
  vfio-iommufd: Support IO page table replacement

Yi Liu (1):
  iommu: Move dev_iommu_ops() to private header

 drivers/iommu/iommu-priv.h                    |  22 +++
 drivers/iommu/iommu.c                         |  32 ++++
 drivers/iommu/iommufd/device.c                | 150 +++++++++++++++---
 drivers/iommu/iommufd/iommufd_private.h       |   4 +
 drivers/iommu/iommufd/iommufd_test.h          |   4 +
 drivers/iommu/iommufd/selftest.c              |  25 ++-
 drivers/vfio/iommufd.c                        |  33 ++--
 include/linux/iommu.h                         |  11 --
 include/linux/iommufd.h                       |   4 +-
 tools/testing/selftests/iommu/iommufd.c       |  29 +++-
 tools/testing/selftests/iommu/iommufd_utils.h |  22 ++-
 11 files changed, 273 insertions(+), 63 deletions(-)
 create mode 100644 drivers/iommu/iommu-priv.h

-- 
2.39.1


^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH v1 1/8] iommu: Move dev_iommu_ops() to private header
  2023-02-02  7:05 [PATCH v1 0/8] Add IO page table replacement support Nicolin Chen
@ 2023-02-02  7:05 ` Nicolin Chen
  2023-02-02  7:05 ` [PATCH v1 2/8] iommu: Introduce a new iommu_group_replace_domain() API Nicolin Chen
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 41+ messages in thread
From: Nicolin Chen @ 2023-02-02  7:05 UTC (permalink / raw)
  To: jgg, kevin.tian, joro, will, robin.murphy, alex.williamson, shuah
  Cc: yi.l.liu, linux-kernel, iommu, kvm, linux-kselftest, baolu.lu

From: Yi Liu <yi.l.liu@intel.com>

dev_iommu_ops() is essentially only used in iommu subsystem, so
move to a private header to avoid being abused by other drivers.

Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
---
 drivers/iommu/iommu-priv.h | 18 ++++++++++++++++++
 drivers/iommu/iommu.c      |  2 ++
 include/linux/iommu.h      | 11 -----------
 3 files changed, 20 insertions(+), 11 deletions(-)
 create mode 100644 drivers/iommu/iommu-priv.h

diff --git a/drivers/iommu/iommu-priv.h b/drivers/iommu/iommu-priv.h
new file mode 100644
index 000000000000..9e1497027cff
--- /dev/null
+++ b/drivers/iommu/iommu-priv.h
@@ -0,0 +1,18 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+
+#ifndef __LINUX_IOMMU_PRIV_H
+#define __LINUX_IOMMU_PRIV_H
+
+#include <linux/iommu.h>
+
+static inline const struct iommu_ops *dev_iommu_ops(struct device *dev)
+{
+	/*
+	 * Assume that valid ops must be installed if iommu_probe_device()
+	 * has succeeded. The device ops are essentially for internal use
+	 * within the IOMMU subsystem itself, so we should be able to trust
+	 * ourselves not to misuse the helper.
+	 */
+	return dev->iommu->iommu_dev->ops;
+}
+#endif /* __LINUX_IOMMU_PRIV_H */
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 4b5a21ac5e88..a18b7f1a4e6e 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -35,6 +35,8 @@
 
 #include "iommu-sva.h"
 
+#include "iommu-priv.h"
+
 static struct kset *iommu_group_kset;
 static DEFINE_IDA(iommu_group_ida);
 
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index a8063f26ff69..cb586d054c57 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -445,17 +445,6 @@ static inline void iommu_iotlb_gather_init(struct iommu_iotlb_gather *gather)
 	};
 }
 
-static inline const struct iommu_ops *dev_iommu_ops(struct device *dev)
-{
-	/*
-	 * Assume that valid ops must be installed if iommu_probe_device()
-	 * has succeeded. The device ops are essentially for internal use
-	 * within the IOMMU subsystem itself, so we should be able to trust
-	 * ourselves not to misuse the helper.
-	 */
-	return dev->iommu->iommu_dev->ops;
-}
-
 extern int bus_iommu_probe(struct bus_type *bus);
 extern bool iommu_present(struct bus_type *bus);
 extern bool device_iommu_capable(struct device *dev, enum iommu_cap cap);
-- 
2.39.1


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH v1 2/8] iommu: Introduce a new iommu_group_replace_domain() API
  2023-02-02  7:05 [PATCH v1 0/8] Add IO page table replacement support Nicolin Chen
  2023-02-02  7:05 ` [PATCH v1 1/8] iommu: Move dev_iommu_ops() to private header Nicolin Chen
@ 2023-02-02  7:05 ` Nicolin Chen
  2023-02-02 10:21   ` Baolu Lu
  2023-02-03  8:26   ` Tian, Kevin
  2023-02-02  7:05 ` [PATCH v1 3/8] iommufd: Create access in vfio_iommufd_emulated_bind() Nicolin Chen
                   ` (6 subsequent siblings)
  8 siblings, 2 replies; 41+ messages in thread
From: Nicolin Chen @ 2023-02-02  7:05 UTC (permalink / raw)
  To: jgg, kevin.tian, joro, will, robin.murphy, alex.williamson, shuah
  Cc: yi.l.liu, linux-kernel, iommu, kvm, linux-kselftest, baolu.lu

qemu has a need to replace the translations associated with a domain
when the guest does large-scale operations like switching between an
IDENTITY domain and, say, dma-iommu.c.

Currently, it does this by replacing all the mappings in a single
domain, but this is very inefficient and means that domains have to be
per-device rather than per-translation.

Provide a high-level API to allow replacements of one domain with
another. This is similar to a detach/attach cycle except it doesn't
force the group to go to the blocking domain in-between.

By removing this forced blocking domain the iommu driver has the
opportunity to implement an atomic replacement of the domains to the
greatest extent its hardware allows.

Atomic replacement allows the qemu emulation of the viommu to be more
complete, as real hardware has this ability.

All drivers are already required to support changing between active
UNMANAGED domains when using their attach_dev ops.

This API is expected to be used by IOMMUFD, so add to the iommu-priv
header and mark it as IOMMUFD_INTERNAL.

Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
---
 drivers/iommu/iommu-priv.h |  4 ++++
 drivers/iommu/iommu.c      | 30 ++++++++++++++++++++++++++++++
 2 files changed, 34 insertions(+)

diff --git a/drivers/iommu/iommu-priv.h b/drivers/iommu/iommu-priv.h
index 9e1497027cff..b546795a7e49 100644
--- a/drivers/iommu/iommu-priv.h
+++ b/drivers/iommu/iommu-priv.h
@@ -15,4 +15,8 @@ static inline const struct iommu_ops *dev_iommu_ops(struct device *dev)
 	 */
 	return dev->iommu->iommu_dev->ops;
 }
+
+extern int iommu_group_replace_domain(struct iommu_group *group,
+				      struct iommu_domain *new_domain);
+
 #endif /* __LINUX_IOMMU_PRIV_H */
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index a18b7f1a4e6e..c35da7a5c0d4 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -2151,6 +2151,36 @@ int iommu_attach_group(struct iommu_domain *domain, struct iommu_group *group)
 }
 EXPORT_SYMBOL_GPL(iommu_attach_group);
 
+/**
+ * iommu_group_replace_domain - replace the domain that a group is attached to
+ * @new_domain: new IOMMU domain to replace with
+ * @group: IOMMU group that will be attached to the new domain
+ *
+ * This API allows the group to switch domains without being forced to go to
+ * the blocking domain in-between.
+ *
+ * If the attached domain is a core domain (e.g. a default_domain), it will act
+ * just like the iommu_attach_group().
+ */
+int iommu_group_replace_domain(struct iommu_group *group,
+			       struct iommu_domain *new_domain)
+{
+	int ret;
+
+	if (!new_domain)
+		return -EINVAL;
+
+	mutex_lock(&group->mutex);
+	ret = __iommu_group_set_domain(group, new_domain);
+	if (ret) {
+		if (__iommu_group_set_domain(group, group->domain))
+			__iommu_group_set_core_domain(group);
+	}
+	mutex_unlock(&group->mutex);
+	return ret;
+}
+EXPORT_SYMBOL_NS_GPL(iommu_group_replace_domain, IOMMUFD_INTERNAL);
+
 static int iommu_group_do_set_platform_dma(struct device *dev, void *data)
 {
 	const struct iommu_ops *ops = dev_iommu_ops(dev);
-- 
2.39.1


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH v1 3/8] iommufd: Create access in vfio_iommufd_emulated_bind()
  2023-02-02  7:05 [PATCH v1 0/8] Add IO page table replacement support Nicolin Chen
  2023-02-02  7:05 ` [PATCH v1 1/8] iommu: Move dev_iommu_ops() to private header Nicolin Chen
  2023-02-02  7:05 ` [PATCH v1 2/8] iommu: Introduce a new iommu_group_replace_domain() API Nicolin Chen
@ 2023-02-02  7:05 ` Nicolin Chen
  2023-02-03  9:23   ` Tian, Kevin
  2023-02-03  9:28   ` Tian, Kevin
  2023-02-02  7:05 ` [PATCH v1 4/8] iommufd/selftest: Add IOMMU_TEST_OP_ACCESS_SET_IOAS coverage Nicolin Chen
                   ` (5 subsequent siblings)
  8 siblings, 2 replies; 41+ messages in thread
From: Nicolin Chen @ 2023-02-02  7:05 UTC (permalink / raw)
  To: jgg, kevin.tian, joro, will, robin.murphy, alex.williamson, shuah
  Cc: yi.l.liu, linux-kernel, iommu, kvm, linux-kselftest, baolu.lu

To prepare for an access->ioas replacement, move iommufd_access_create()
call into vfio_iommufd_emulated_bind(), making it symmetric with the
__vfio_iommufd_access_destroy() call in vfio_iommufd_emulated_unbind().
This means an access is created/destroyed by the bind()/unbind(), and the
vfio_iommufd_emulated_attach_ioas() only updates the access->ioas pointer.

Since there's no longer an ioas_id input for iommufd_access_create(), add
a new helper iommufd_access_set_ioas() to set access->ioas. We can later
add a "replace" feature simply to the new iommufd_access_set_ioas() too.

Leaving the access->ioas in vfio_iommufd_emulated_attach_ioas(), however,
can introduce some potential of a race condition during pin_/unpin_pages()
call where access->ioas->iopt is getting referenced. So, add an ioas_lock
to protect it.

Note that the "refcount_dec(&access->ioas->obj.users)" line is also moved
to the new iommufd_access_set_ioas() from iommufd_access_destroy_object()
for symmetry. Without this change, the old_ioas would also lose the track
of its refcount when the replace support is added.

Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
---
 drivers/iommu/iommufd/device.c          | 100 ++++++++++++++++++------
 drivers/iommu/iommufd/iommufd_private.h |   1 +
 drivers/iommu/iommufd/selftest.c        |   5 +-
 drivers/vfio/iommufd.c                  |  30 +++----
 include/linux/iommufd.h                 |   3 +-
 5 files changed, 96 insertions(+), 43 deletions(-)

diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c
index d81f93a321af..f4bd6f532a90 100644
--- a/drivers/iommu/iommufd/device.c
+++ b/drivers/iommu/iommufd/device.c
@@ -418,9 +418,9 @@ void iommufd_access_destroy_object(struct iommufd_object *obj)
 	struct iommufd_access *access =
 		container_of(obj, struct iommufd_access, obj);
 
-	iopt_remove_access(&access->ioas->iopt, access);
+	iommufd_access_set_ioas(access, 0);
 	iommufd_ctx_put(access->ictx);
-	refcount_dec(&access->ioas->obj.users);
+	mutex_destroy(&access->ioas_lock);
 }
 
 /**
@@ -437,12 +437,10 @@ void iommufd_access_destroy_object(struct iommufd_object *obj)
  * The provided ops are required to use iommufd_access_pin_pages().
  */
 struct iommufd_access *
-iommufd_access_create(struct iommufd_ctx *ictx, u32 ioas_id,
+iommufd_access_create(struct iommufd_ctx *ictx,
 		      const struct iommufd_access_ops *ops, void *data)
 {
 	struct iommufd_access *access;
-	struct iommufd_object *obj;
-	int rc;
 
 	/*
 	 * There is no uAPI for the access object, but to keep things symmetric
@@ -455,33 +453,18 @@ iommufd_access_create(struct iommufd_ctx *ictx, u32 ioas_id,
 	access->data = data;
 	access->ops = ops;
 
-	obj = iommufd_get_object(ictx, ioas_id, IOMMUFD_OBJ_IOAS);
-	if (IS_ERR(obj)) {
-		rc = PTR_ERR(obj);
-		goto out_abort;
-	}
-	access->ioas = container_of(obj, struct iommufd_ioas, obj);
-	iommufd_ref_to_users(obj);
-
 	if (ops->needs_pin_pages)
 		access->iova_alignment = PAGE_SIZE;
 	else
 		access->iova_alignment = 1;
-	rc = iopt_add_access(&access->ioas->iopt, access);
-	if (rc)
-		goto out_put_ioas;
 
 	/* The calling driver is a user until iommufd_access_destroy() */
 	refcount_inc(&access->obj.users);
+	mutex_init(&access->ioas_lock);
 	access->ictx = ictx;
 	iommufd_ctx_get(ictx);
 	iommufd_object_finalize(ictx, &access->obj);
 	return access;
-out_put_ioas:
-	refcount_dec(&access->ioas->obj.users);
-out_abort:
-	iommufd_object_abort(ictx, &access->obj);
-	return ERR_PTR(rc);
 }
 EXPORT_SYMBOL_NS_GPL(iommufd_access_create, IOMMUFD);
 
@@ -500,6 +483,50 @@ void iommufd_access_destroy(struct iommufd_access *access)
 }
 EXPORT_SYMBOL_NS_GPL(iommufd_access_destroy, IOMMUFD);
 
+int iommufd_access_set_ioas(struct iommufd_access *access, u32 ioas_id)
+{
+	struct iommufd_ioas *new_ioas = NULL, *cur_ioas;
+	struct iommufd_ctx *ictx = access->ictx;
+	struct iommufd_object *obj;
+	int rc = 0;
+
+	if (ioas_id) {
+		obj = iommufd_get_object(ictx, ioas_id, IOMMUFD_OBJ_IOAS);
+		if (IS_ERR(obj))
+			return PTR_ERR(obj);
+		new_ioas = container_of(obj, struct iommufd_ioas, obj);
+	}
+
+	mutex_lock(&access->ioas_lock);
+	cur_ioas = access->ioas;
+	if (cur_ioas == new_ioas)
+		goto out_unlock;
+
+	if (new_ioas) {
+		rc = iopt_add_access(&new_ioas->iopt, access);
+		if (rc)
+			goto out_unlock;
+		iommufd_ref_to_users(obj);
+	}
+
+	if (cur_ioas) {
+		iopt_remove_access(&cur_ioas->iopt, access);
+		refcount_dec(&cur_ioas->obj.users);
+	}
+
+	access->ioas = new_ioas;
+	mutex_unlock(&access->ioas_lock);
+
+	return 0;
+
+out_unlock:
+	mutex_unlock(&access->ioas_lock);
+	if (new_ioas)
+		iommufd_put_object(obj);
+	return rc;
+}
+EXPORT_SYMBOL_NS_GPL(iommufd_access_set_ioas, IOMMUFD);
+
 /**
  * iommufd_access_notify_unmap - Notify users of an iopt to stop using it
  * @iopt: iopt to work on
@@ -550,8 +577,8 @@ void iommufd_access_notify_unmap(struct io_pagetable *iopt, unsigned long iova,
 void iommufd_access_unpin_pages(struct iommufd_access *access,
 				unsigned long iova, unsigned long length)
 {
-	struct io_pagetable *iopt = &access->ioas->iopt;
 	struct iopt_area_contig_iter iter;
+	struct io_pagetable *iopt;
 	unsigned long last_iova;
 	struct iopt_area *area;
 
@@ -559,6 +586,13 @@ void iommufd_access_unpin_pages(struct iommufd_access *access,
 	    WARN_ON(check_add_overflow(iova, length - 1, &last_iova)))
 		return;
 
+	mutex_lock(&access->ioas_lock);
+	if (!access->ioas) {
+		mutex_unlock(&access->ioas_lock);
+		return;
+	}
+	iopt = &access->ioas->iopt;
+
 	down_read(&iopt->iova_rwsem);
 	iopt_for_each_contig_area(&iter, area, iopt, iova, last_iova)
 		iopt_area_remove_access(
@@ -568,6 +602,7 @@ void iommufd_access_unpin_pages(struct iommufd_access *access,
 				min(last_iova, iopt_area_last_iova(area))));
 	up_read(&iopt->iova_rwsem);
 	WARN_ON(!iopt_area_contig_done(&iter));
+	mutex_unlock(&access->ioas_lock);
 }
 EXPORT_SYMBOL_NS_GPL(iommufd_access_unpin_pages, IOMMUFD);
 
@@ -613,8 +648,8 @@ int iommufd_access_pin_pages(struct iommufd_access *access, unsigned long iova,
 			     unsigned long length, struct page **out_pages,
 			     unsigned int flags)
 {
-	struct io_pagetable *iopt = &access->ioas->iopt;
 	struct iopt_area_contig_iter iter;
+	struct io_pagetable *iopt;
 	unsigned long last_iova;
 	struct iopt_area *area;
 	int rc;
@@ -629,6 +664,13 @@ int iommufd_access_pin_pages(struct iommufd_access *access, unsigned long iova,
 	if (check_add_overflow(iova, length - 1, &last_iova))
 		return -EOVERFLOW;
 
+	mutex_lock(&access->ioas_lock);
+	if (!access->ioas) {
+		mutex_unlock(&access->ioas_lock);
+		return -ENOENT;
+	}
+	iopt = &access->ioas->iopt;
+
 	down_read(&iopt->iova_rwsem);
 	iopt_for_each_contig_area(&iter, area, iopt, iova, last_iova) {
 		unsigned long last = min(last_iova, iopt_area_last_iova(area));
@@ -659,6 +701,7 @@ int iommufd_access_pin_pages(struct iommufd_access *access, unsigned long iova,
 	}
 
 	up_read(&iopt->iova_rwsem);
+	mutex_unlock(&access->ioas_lock);
 	return 0;
 
 err_remove:
@@ -673,6 +716,7 @@ int iommufd_access_pin_pages(struct iommufd_access *access, unsigned long iova,
 						  iopt_area_last_iova(area))));
 	}
 	up_read(&iopt->iova_rwsem);
+	mutex_unlock(&access->ioas_lock);
 	return rc;
 }
 EXPORT_SYMBOL_NS_GPL(iommufd_access_pin_pages, IOMMUFD);
@@ -692,8 +736,8 @@ EXPORT_SYMBOL_NS_GPL(iommufd_access_pin_pages, IOMMUFD);
 int iommufd_access_rw(struct iommufd_access *access, unsigned long iova,
 		      void *data, size_t length, unsigned int flags)
 {
-	struct io_pagetable *iopt = &access->ioas->iopt;
 	struct iopt_area_contig_iter iter;
+	struct io_pagetable *iopt;
 	struct iopt_area *area;
 	unsigned long last_iova;
 	int rc;
@@ -703,6 +747,13 @@ int iommufd_access_rw(struct iommufd_access *access, unsigned long iova,
 	if (check_add_overflow(iova, length - 1, &last_iova))
 		return -EOVERFLOW;
 
+	mutex_lock(&access->ioas_lock);
+	if (!access->ioas) {
+		mutex_unlock(&access->ioas_lock);
+		return -ENOENT;
+	}
+	iopt = &access->ioas->iopt;
+
 	down_read(&iopt->iova_rwsem);
 	iopt_for_each_contig_area(&iter, area, iopt, iova, last_iova) {
 		unsigned long last = min(last_iova, iopt_area_last_iova(area));
@@ -729,6 +780,7 @@ int iommufd_access_rw(struct iommufd_access *access, unsigned long iova,
 		rc = -ENOENT;
 err_out:
 	up_read(&iopt->iova_rwsem);
+	mutex_unlock(&access->ioas_lock);
 	return rc;
 }
 EXPORT_SYMBOL_NS_GPL(iommufd_access_rw, IOMMUFD);
diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h
index 222e86591f8a..2f4bb106bac6 100644
--- a/drivers/iommu/iommufd/iommufd_private.h
+++ b/drivers/iommu/iommufd/iommufd_private.h
@@ -261,6 +261,7 @@ struct iommufd_access {
 	struct iommufd_object obj;
 	struct iommufd_ctx *ictx;
 	struct iommufd_ioas *ioas;
+	struct mutex ioas_lock;
 	const struct iommufd_access_ops *ops;
 	void *data;
 	unsigned long iova_alignment;
diff --git a/drivers/iommu/iommufd/selftest.c b/drivers/iommu/iommufd/selftest.c
index cfb5fe9a5e0e..db4011bdc8a9 100644
--- a/drivers/iommu/iommufd/selftest.c
+++ b/drivers/iommu/iommufd/selftest.c
@@ -571,7 +571,7 @@ static int iommufd_test_create_access(struct iommufd_ucmd *ucmd,
 	}
 
 	access = iommufd_access_create(
-		ucmd->ictx, ioas_id,
+		ucmd->ictx,
 		(flags & MOCK_FLAGS_ACCESS_CREATE_NEEDS_PIN_PAGES) ?
 			&selftest_access_ops_pin :
 			&selftest_access_ops,
@@ -580,6 +580,9 @@ static int iommufd_test_create_access(struct iommufd_ucmd *ucmd,
 		rc = PTR_ERR(access);
 		goto out_put_fdno;
 	}
+	rc = iommufd_access_set_ioas(access, ioas_id);
+	if (rc)
+		goto out_destroy;
 	cmd->create_access.out_access_fd = fdno;
 	rc = iommufd_ucmd_respond(ucmd, sizeof(*cmd));
 	if (rc)
diff --git a/drivers/vfio/iommufd.c b/drivers/vfio/iommufd.c
index 412644fdbf16..78a8e4641cbf 100644
--- a/drivers/vfio/iommufd.c
+++ b/drivers/vfio/iommufd.c
@@ -144,10 +144,19 @@ static const struct iommufd_access_ops vfio_user_ops = {
 int vfio_iommufd_emulated_bind(struct vfio_device *vdev,
 			       struct iommufd_ctx *ictx, u32 *out_device_id)
 {
+	struct iommufd_access *user;
+
 	lockdep_assert_held(&vdev->dev_set->lock);
 
-	vdev->iommufd_ictx = ictx;
 	iommufd_ctx_get(ictx);
+	user = iommufd_access_create(vdev->iommufd_ictx, &vfio_user_ops, vdev);
+	if (IS_ERR(user)) {
+		iommufd_ctx_put(vdev->iommufd_ictx);
+		return PTR_ERR(user);
+	}
+	iommufd_access_set_ioas(user, 0);
+	vdev->iommufd_access = user;
+	vdev->iommufd_ictx = ictx;
 	return 0;
 }
 EXPORT_SYMBOL_GPL(vfio_iommufd_emulated_bind);
@@ -171,27 +180,14 @@ EXPORT_SYMBOL_GPL(vfio_iommufd_emulated_unbind);
 
 int vfio_iommufd_emulated_attach_ioas(struct vfio_device *vdev, u32 *pt_id)
 {
-	struct iommufd_access *user;
-
 	lockdep_assert_held(&vdev->dev_set->lock);
 
 	if (!vdev->iommufd_ictx)
 		return -EINVAL;
+	if (!vdev->iommufd_access)
+		return -ENOENT;
 
-	if (!pt_id) {
-		if (vdev->iommufd_access)
-			__vfio_iommufd_access_destroy(vdev);
-		return 0;
-	}
-
-	if (vdev->iommufd_access)
-		return -EBUSY;
-
-	user = iommufd_access_create(vdev->iommufd_ictx, *pt_id, &vfio_user_ops,
-				     vdev);
-	if (IS_ERR(user))
-		return PTR_ERR(user);
-	vdev->iommufd_access = user;
+	iommufd_access_set_ioas(vdev->iommufd_access, *pt_id);
 	return 0;
 }
 EXPORT_SYMBOL_GPL(vfio_iommufd_emulated_attach_ioas);
diff --git a/include/linux/iommufd.h b/include/linux/iommufd.h
index 650d45629647..0e30f2ce27cd 100644
--- a/include/linux/iommufd.h
+++ b/include/linux/iommufd.h
@@ -40,9 +40,10 @@ enum {
 };
 
 struct iommufd_access *
-iommufd_access_create(struct iommufd_ctx *ictx, u32 ioas_id,
+iommufd_access_create(struct iommufd_ctx *ictx,
 		      const struct iommufd_access_ops *ops, void *data);
 void iommufd_access_destroy(struct iommufd_access *access);
+int iommufd_access_set_ioas(struct iommufd_access *access, u32 ioas_id);
 
 void iommufd_ctx_get(struct iommufd_ctx *ictx);
 
-- 
2.39.1


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH v1 4/8] iommufd/selftest: Add IOMMU_TEST_OP_ACCESS_SET_IOAS coverage
  2023-02-02  7:05 [PATCH v1 0/8] Add IO page table replacement support Nicolin Chen
                   ` (2 preceding siblings ...)
  2023-02-02  7:05 ` [PATCH v1 3/8] iommufd: Create access in vfio_iommufd_emulated_bind() Nicolin Chen
@ 2023-02-02  7:05 ` Nicolin Chen
  2023-02-02  7:05 ` [PATCH v1 5/8] iommufd: Add replace support in iommufd_access_set_ioas() Nicolin Chen
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 41+ messages in thread
From: Nicolin Chen @ 2023-02-02  7:05 UTC (permalink / raw)
  To: jgg, kevin.tian, joro, will, robin.murphy, alex.williamson, shuah
  Cc: yi.l.liu, linux-kernel, iommu, kvm, linux-kselftest, baolu.lu

Add a new IOMMU_TEST_OP_ACCESS_SET_IOAS to allow setting access->ioas
individually, corresponding to the iommufd_access_set_ioas() helper.

Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
---
 drivers/iommu/iommufd/iommufd_test.h          |  4 +++
 drivers/iommu/iommufd/selftest.c              | 26 +++++++++++++++----
 tools/testing/selftests/iommu/iommufd_utils.h | 22 ++++++++++++++--
 3 files changed, 45 insertions(+), 7 deletions(-)

diff --git a/drivers/iommu/iommufd/iommufd_test.h b/drivers/iommu/iommufd/iommufd_test.h
index 1d96a8f466fd..f2c61a9500e7 100644
--- a/drivers/iommu/iommufd/iommufd_test.h
+++ b/drivers/iommu/iommufd/iommufd_test.h
@@ -13,6 +13,7 @@ enum {
 	IOMMU_TEST_OP_MD_CHECK_MAP,
 	IOMMU_TEST_OP_MD_CHECK_REFS,
 	IOMMU_TEST_OP_CREATE_ACCESS,
+	IOMMU_TEST_OP_ACCESS_SET_IOAS,
 	IOMMU_TEST_OP_DESTROY_ACCESS_PAGES,
 	IOMMU_TEST_OP_ACCESS_PAGES,
 	IOMMU_TEST_OP_ACCESS_RW,
@@ -66,6 +67,9 @@ struct iommu_test_cmd {
 			__u32 out_access_fd;
 			__u32 flags;
 		} create_access;
+		struct {
+			__u32 ioas_id;
+		} access_set_ioas;
 		struct {
 			__u32 access_pages_id;
 		} destroy_access_pages;
diff --git a/drivers/iommu/iommufd/selftest.c b/drivers/iommu/iommufd/selftest.c
index db4011bdc8a9..b94870f93138 100644
--- a/drivers/iommu/iommufd/selftest.c
+++ b/drivers/iommu/iommufd/selftest.c
@@ -549,7 +549,7 @@ static struct selftest_access *iommufd_test_alloc_access(void)
 }
 
 static int iommufd_test_create_access(struct iommufd_ucmd *ucmd,
-				      unsigned int ioas_id, unsigned int flags)
+				      unsigned int flags)
 {
 	struct iommu_test_cmd *cmd = ucmd->cmd;
 	struct selftest_access *staccess;
@@ -580,9 +580,6 @@ static int iommufd_test_create_access(struct iommufd_ucmd *ucmd,
 		rc = PTR_ERR(access);
 		goto out_put_fdno;
 	}
-	rc = iommufd_access_set_ioas(access, ioas_id);
-	if (rc)
-		goto out_destroy;
 	cmd->create_access.out_access_fd = fdno;
 	rc = iommufd_ucmd_respond(ucmd, sizeof(*cmd));
 	if (rc)
@@ -601,6 +598,22 @@ static int iommufd_test_create_access(struct iommufd_ucmd *ucmd,
 	return rc;
 }
 
+static int iommufd_test_access_set_ioas(struct iommufd_ucmd *ucmd,
+					unsigned int access_id,
+					unsigned int ioas_id)
+{
+	struct selftest_access *staccess;
+	int rc;
+
+	staccess = iommufd_access_get(access_id);
+	if (IS_ERR(staccess))
+		return PTR_ERR(staccess);
+
+	rc = iommufd_access_set_ioas(staccess->access, ioas_id);
+	fput(staccess->file);
+	return rc;
+}
+
 /* Check that the pages in a page array match the pages in the user VA */
 static int iommufd_test_check_pages(void __user *uptr, struct page **pages,
 				    size_t npages)
@@ -810,8 +823,11 @@ int iommufd_test(struct iommufd_ucmd *ucmd)
 			ucmd, u64_to_user_ptr(cmd->check_refs.uptr),
 			cmd->check_refs.length, cmd->check_refs.refs);
 	case IOMMU_TEST_OP_CREATE_ACCESS:
-		return iommufd_test_create_access(ucmd, cmd->id,
+		return iommufd_test_create_access(ucmd,
 						  cmd->create_access.flags);
+	case IOMMU_TEST_OP_ACCESS_SET_IOAS:
+		return iommufd_test_access_set_ioas(
+			ucmd, cmd->id, cmd->access_set_ioas.ioas_id);
 	case IOMMU_TEST_OP_ACCESS_PAGES:
 		return iommufd_test_access_pages(
 			ucmd, cmd->id, cmd->access_pages.iova,
diff --git a/tools/testing/selftests/iommu/iommufd_utils.h b/tools/testing/selftests/iommu/iommufd_utils.h
index 0d1f46369c2a..67805afc620f 100644
--- a/tools/testing/selftests/iommu/iommufd_utils.h
+++ b/tools/testing/selftests/iommu/iommufd_utils.h
@@ -66,13 +66,31 @@ static int _test_cmd_mock_domain(int fd, unsigned int ioas_id, __u32 *device_id,
 	EXPECT_ERRNO(_errno, _test_cmd_mock_domain(self->fd, ioas_id, \
 						   device_id, hwpt_id))
 
+static int _test_cmd_access_set_ioas(int fd, __u32 access_id,
+				     unsigned int ioas_id)
+{
+	struct iommu_test_cmd cmd = {
+		.size = sizeof(cmd),
+		.op = IOMMU_TEST_OP_ACCESS_SET_IOAS,
+		.id = access_id,
+		.access_set_ioas = { .ioas_id = ioas_id },
+	};
+	int ret;
+
+	ret = ioctl(fd, IOMMU_TEST_CMD, &cmd);
+	if (ret)
+		return ret;
+	return 0;
+}
+#define test_cmd_access_set_ioas(access_id, ioas_id) \
+	ASSERT_EQ(0, _test_cmd_access_set_ioas(self->fd, access_id, ioas_id))
+
 static int _test_cmd_create_access(int fd, unsigned int ioas_id,
 				   __u32 *access_id, unsigned int flags)
 {
 	struct iommu_test_cmd cmd = {
 		.size = sizeof(cmd),
 		.op = IOMMU_TEST_OP_CREATE_ACCESS,
-		.id = ioas_id,
 		.create_access = { .flags = flags },
 	};
 	int ret;
@@ -81,7 +99,7 @@ static int _test_cmd_create_access(int fd, unsigned int ioas_id,
 	if (ret)
 		return ret;
 	*access_id = cmd.create_access.out_access_fd;
-	return 0;
+	return _test_cmd_access_set_ioas(fd, *access_id, ioas_id);
 }
 #define test_cmd_create_access(ioas_id, access_id, flags)                  \
 	ASSERT_EQ(0, _test_cmd_create_access(self->fd, ioas_id, access_id, \
-- 
2.39.1


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH v1 5/8] iommufd: Add replace support in iommufd_access_set_ioas()
  2023-02-02  7:05 [PATCH v1 0/8] Add IO page table replacement support Nicolin Chen
                   ` (3 preceding siblings ...)
  2023-02-02  7:05 ` [PATCH v1 4/8] iommufd/selftest: Add IOMMU_TEST_OP_ACCESS_SET_IOAS coverage Nicolin Chen
@ 2023-02-02  7:05 ` Nicolin Chen
  2023-02-03 10:10   ` Tian, Kevin
  2023-02-02  7:05 ` [PATCH v1 6/8] iommufd/selftest: Add coverage for access->ioas replacement Nicolin Chen
                   ` (3 subsequent siblings)
  8 siblings, 1 reply; 41+ messages in thread
From: Nicolin Chen @ 2023-02-02  7:05 UTC (permalink / raw)
  To: jgg, kevin.tian, joro, will, robin.murphy, alex.williamson, shuah
  Cc: yi.l.liu, linux-kernel, iommu, kvm, linux-kselftest, baolu.lu

Support an access->ioas replacement in iommufd_access_set_ioas(), which
sets the access->ioas to NULL provisionally so that any further incoming
iommufd_access_pin_pages() callback can be blocked.

Then, call access->ops->unmap() to clean up the entire iopt. To allow an
iommufd_access_unpin_pages() callback to happen via this unmap() call,
add an ioas_unpin pointer so the unpin routine won't be affected by the
"access->ioas = NULL" trick above.

Also, a vdev without an ops->dma_unmap implementation cannot replace its
access->ioas pointer. So add an iommufd_access_ioas_is_attached() helper
to sanity that.

Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
---
 drivers/iommu/iommufd/device.c          | 28 +++++++++++++++++++++++--
 drivers/iommu/iommufd/iommufd_private.h |  1 +
 drivers/vfio/iommufd.c                  |  4 ++++
 include/linux/iommufd.h                 |  1 +
 4 files changed, 32 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c
index f4bd6f532a90..8118d5262800 100644
--- a/drivers/iommu/iommufd/device.c
+++ b/drivers/iommu/iommufd/device.c
@@ -509,11 +509,23 @@ int iommufd_access_set_ioas(struct iommufd_access *access, u32 ioas_id)
 		iommufd_ref_to_users(obj);
 	}
 
+	/*
+	 * Set ioas to NULL to block any further iommufd_access_pin_pages().
+	 * iommufd_access_unpin_pages() can continue using access->ioas_unpin.
+	 */
+	access->ioas = NULL;
+
 	if (cur_ioas) {
+		if (new_ioas) {
+			mutex_unlock(&access->ioas_lock);
+			access->ops->unmap(access->data, 0, ULONG_MAX);
+			mutex_lock(&access->ioas_lock);
+		}
 		iopt_remove_access(&cur_ioas->iopt, access);
 		refcount_dec(&cur_ioas->obj.users);
 	}
 
+	access->ioas_unpin = new_ioas;
 	access->ioas = new_ioas;
 	mutex_unlock(&access->ioas_lock);
 
@@ -527,6 +539,18 @@ int iommufd_access_set_ioas(struct iommufd_access *access, u32 ioas_id)
 }
 EXPORT_SYMBOL_NS_GPL(iommufd_access_set_ioas, IOMMUFD);
 
+bool iommufd_access_ioas_is_attached(struct iommufd_access *access)
+{
+	bool attached;
+
+	mutex_lock(&access->ioas_lock);
+	attached = !!access->ioas;
+	mutex_unlock(&access->ioas_lock);
+
+	return attached;
+}
+EXPORT_SYMBOL_NS_GPL(iommufd_access_ioas_is_attached, IOMMUFD);
+
 /**
  * iommufd_access_notify_unmap - Notify users of an iopt to stop using it
  * @iopt: iopt to work on
@@ -587,11 +611,11 @@ void iommufd_access_unpin_pages(struct iommufd_access *access,
 		return;
 
 	mutex_lock(&access->ioas_lock);
-	if (!access->ioas) {
+	if (!access->ioas_unpin) {
 		mutex_unlock(&access->ioas_lock);
 		return;
 	}
-	iopt = &access->ioas->iopt;
+	iopt = &access->ioas_unpin->iopt;
 
 	down_read(&iopt->iova_rwsem);
 	iopt_for_each_contig_area(&iter, area, iopt, iova, last_iova)
diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h
index 2f4bb106bac6..593138bb37b8 100644
--- a/drivers/iommu/iommufd/iommufd_private.h
+++ b/drivers/iommu/iommufd/iommufd_private.h
@@ -261,6 +261,7 @@ struct iommufd_access {
 	struct iommufd_object obj;
 	struct iommufd_ctx *ictx;
 	struct iommufd_ioas *ioas;
+	struct iommufd_ioas *ioas_unpin;
 	struct mutex ioas_lock;
 	const struct iommufd_access_ops *ops;
 	void *data;
diff --git a/drivers/vfio/iommufd.c b/drivers/vfio/iommufd.c
index 78a8e4641cbf..7e09defbcffe 100644
--- a/drivers/vfio/iommufd.c
+++ b/drivers/vfio/iommufd.c
@@ -187,6 +187,10 @@ int vfio_iommufd_emulated_attach_ioas(struct vfio_device *vdev, u32 *pt_id)
 	if (!vdev->iommufd_access)
 		return -ENOENT;
 
+	if (!vdev->ops->dma_unmap && pt_id &&
+	    iommufd_access_ioas_is_attached(vdev->iommufd_access))
+		return -EBUSY;
+
 	iommufd_access_set_ioas(vdev->iommufd_access, *pt_id);
 	return 0;
 }
diff --git a/include/linux/iommufd.h b/include/linux/iommufd.h
index 0e30f2ce27cd..e8d764b2c14c 100644
--- a/include/linux/iommufd.h
+++ b/include/linux/iommufd.h
@@ -44,6 +44,7 @@ iommufd_access_create(struct iommufd_ctx *ictx,
 		      const struct iommufd_access_ops *ops, void *data);
 void iommufd_access_destroy(struct iommufd_access *access);
 int iommufd_access_set_ioas(struct iommufd_access *access, u32 ioas_id);
+bool iommufd_access_ioas_is_attached(struct iommufd_access *access);
 
 void iommufd_ctx_get(struct iommufd_ctx *ictx);
 
-- 
2.39.1


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH v1 6/8] iommufd/selftest: Add coverage for access->ioas replacement
  2023-02-02  7:05 [PATCH v1 0/8] Add IO page table replacement support Nicolin Chen
                   ` (4 preceding siblings ...)
  2023-02-02  7:05 ` [PATCH v1 5/8] iommufd: Add replace support in iommufd_access_set_ioas() Nicolin Chen
@ 2023-02-02  7:05 ` Nicolin Chen
  2023-02-02  7:05 ` [PATCH v1 7/8] iommufd/device: Use iommu_group_replace_domain() Nicolin Chen
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 41+ messages in thread
From: Nicolin Chen @ 2023-02-02  7:05 UTC (permalink / raw)
  To: jgg, kevin.tian, joro, will, robin.murphy, alex.williamson, shuah
  Cc: yi.l.liu, linux-kernel, iommu, kvm, linux-kselftest, baolu.lu

Add replace coverage as a part of user_copy test case. It basically
repeats the copy test after replacing the old ioas with a new one.

Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
---
 tools/testing/selftests/iommu/iommufd.c | 29 +++++++++++++++++++++++--
 1 file changed, 27 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/iommu/iommufd.c b/tools/testing/selftests/iommu/iommufd.c
index fa08209268c4..1e293950ac88 100644
--- a/tools/testing/selftests/iommu/iommufd.c
+++ b/tools/testing/selftests/iommu/iommufd.c
@@ -1239,7 +1239,13 @@ TEST_F(iommufd_mock_domain, user_copy)
 		.dst_iova = MOCK_APERTURE_START,
 		.length = BUFFER_SIZE,
 	};
-	unsigned int ioas_id;
+	struct iommu_ioas_unmap unmap_cmd = {
+		.size = sizeof(unmap_cmd),
+		.ioas_id = self->ioas_id,
+		.iova = MOCK_APERTURE_START,
+		.length = BUFFER_SIZE,
+	};
+	unsigned int new_ioas_id, ioas_id;
 
 	/* Pin the pages in an IOAS with no domains then copy to an IOAS with domains */
 	test_ioctl_ioas_alloc(&ioas_id);
@@ -1257,11 +1263,30 @@ TEST_F(iommufd_mock_domain, user_copy)
 	ASSERT_EQ(0, ioctl(self->fd, IOMMU_IOAS_COPY, &copy_cmd));
 	check_mock_iova(buffer, MOCK_APERTURE_START, BUFFER_SIZE);
 
+	/* Now replace the ioas with a new one */
+	test_ioctl_ioas_alloc(&new_ioas_id);
+	test_ioctl_ioas_map_id(new_ioas_id, buffer, BUFFER_SIZE,
+			       &copy_cmd.src_iova);
+	test_cmd_access_set_ioas(access_cmd.id, new_ioas_id);
+
+	/* Destroy the old ioas and cleanup copied mapping */
+	ASSERT_EQ(0, ioctl(self->fd, IOMMU_IOAS_UNMAP, &unmap_cmd));
+	test_ioctl_destroy(ioas_id);
+
+	/* Then run the same test again with the new ioas */
+	access_cmd.access_pages.iova = copy_cmd.src_iova;
+	ASSERT_EQ(0,
+		  ioctl(self->fd, _IOMMU_TEST_CMD(IOMMU_TEST_OP_ACCESS_PAGES),
+			&access_cmd));
+	copy_cmd.src_ioas_id = new_ioas_id;
+	ASSERT_EQ(0, ioctl(self->fd, IOMMU_IOAS_COPY, &copy_cmd));
+	check_mock_iova(buffer, MOCK_APERTURE_START, BUFFER_SIZE);
+
 	test_cmd_destroy_access_pages(
 		access_cmd.id, access_cmd.access_pages.out_access_pages_id);
 	test_cmd_destroy_access(access_cmd.id);
 
-	test_ioctl_destroy(ioas_id);
+	test_ioctl_destroy(new_ioas_id);
 }
 
 /* VFIO compatibility IOCTLs */
-- 
2.39.1


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH v1 7/8] iommufd/device: Use iommu_group_replace_domain()
  2023-02-02  7:05 [PATCH v1 0/8] Add IO page table replacement support Nicolin Chen
                   ` (5 preceding siblings ...)
  2023-02-02  7:05 ` [PATCH v1 6/8] iommufd/selftest: Add coverage for access->ioas replacement Nicolin Chen
@ 2023-02-02  7:05 ` Nicolin Chen
  2023-02-06  8:46   ` Tian, Kevin
  2023-02-02  7:05 ` [PATCH v1 8/8] vfio-iommufd: Support IO page table replacement Nicolin Chen
  2023-02-03  8:09 ` [PATCH v1 0/8] Add IO page table replacement support Tian, Kevin
  8 siblings, 1 reply; 41+ messages in thread
From: Nicolin Chen @ 2023-02-02  7:05 UTC (permalink / raw)
  To: jgg, kevin.tian, joro, will, robin.murphy, alex.williamson, shuah
  Cc: yi.l.liu, linux-kernel, iommu, kvm, linux-kselftest, baolu.lu

iommu_group_replace_domain() is introduced to support use cases where an
iommu_group can be attached to a new domain without getting detached from
the old one. This replacement feature will be useful, for cases such as:
1) vPASID mode, when a guest wants to replace a single pasid (PASID=0)
   table with a larger table (PASID=N)
2) Nesting mode, when switching the attaching device from an S2 domain
   to an S1 domain, or when switching between relevant S1 domains.
as it allows these cases to switch seamlessly without a DMA disruption.

So, call iommu_group_replace_domain() in the iommufd_device_do_attach().
And then decrease the refcount of the hwpt being replaced.

Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
---
 drivers/iommu/iommufd/device.c          | 26 +++++++++++++++++++++++--
 drivers/iommu/iommufd/iommufd_private.h |  2 ++
 2 files changed, 26 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c
index 8118d5262800..c7701e449f3d 100644
--- a/drivers/iommu/iommufd/device.c
+++ b/drivers/iommu/iommufd/device.c
@@ -9,6 +9,8 @@
 #include "io_pagetable.h"
 #include "iommufd_private.h"
 
+MODULE_IMPORT_NS(IOMMUFD_INTERNAL);
+
 static bool allow_unsafe_interrupts;
 module_param(allow_unsafe_interrupts, bool, S_IRUGO | S_IWUSR);
 MODULE_PARM_DESC(
@@ -197,6 +199,7 @@ static bool iommufd_hw_pagetable_has_group(struct iommufd_hw_pagetable *hwpt,
 static int iommufd_device_do_attach(struct iommufd_device *idev,
 				    struct iommufd_hw_pagetable *hwpt)
 {
+	struct iommufd_hw_pagetable *cur_hwpt = idev->hwpt;
 	phys_addr_t sw_msi_start = PHYS_ADDR_MAX;
 	int rc;
 
@@ -234,7 +237,7 @@ static int iommufd_device_do_attach(struct iommufd_device *idev,
 	 * the group once for the first device that is in the group.
 	 */
 	if (!iommufd_hw_pagetable_has_group(hwpt, idev->group)) {
-		rc = iommu_attach_group(hwpt->domain, idev->group);
+		rc = iommu_group_replace_domain(idev->group, hwpt->domain);
 		if (rc)
 			goto out_iova;
 
@@ -246,6 +249,18 @@ static int iommufd_device_do_attach(struct iommufd_device *idev,
 		}
 	}
 
+	if (cur_hwpt) {
+		/* Replace the cur_hwpt */
+		mutex_lock(&cur_hwpt->devices_lock);
+		if (cur_hwpt->ioas != hwpt->ioas)
+			iopt_remove_reserved_iova(&cur_hwpt->ioas->iopt,
+						  idev->dev);
+		list_del(&cur_hwpt->hwpt_item);
+		list_del(&idev->devices_item);
+		refcount_dec(&cur_hwpt->obj.users);
+		refcount_dec(&idev->obj.users);
+		mutex_unlock(&cur_hwpt->devices_lock);
+	}
 	idev->hwpt = hwpt;
 	refcount_inc(&hwpt->obj.users);
 	list_add(&idev->devices_item, &hwpt->devices);
@@ -253,7 +268,10 @@ static int iommufd_device_do_attach(struct iommufd_device *idev,
 	return 0;
 
 out_detach:
-	iommu_detach_group(hwpt->domain, idev->group);
+	if (cur_hwpt)
+		iommu_group_replace_domain(idev->group, cur_hwpt->domain);
+	else
+		iommu_detach_group(hwpt->domain, idev->group);
 out_iova:
 	iopt_remove_reserved_iova(&hwpt->ioas->iopt, idev->dev);
 out_unlock:
@@ -343,6 +361,9 @@ int iommufd_device_attach(struct iommufd_device *idev, u32 *pt_id)
 		struct iommufd_hw_pagetable *hwpt =
 			container_of(pt_obj, struct iommufd_hw_pagetable, obj);
 
+		if (idev->hwpt == hwpt)
+			goto out_done;
+
 		rc = iommufd_device_do_attach(idev, hwpt);
 		if (rc)
 			goto out_put_pt_obj;
@@ -367,6 +388,7 @@ int iommufd_device_attach(struct iommufd_device *idev, u32 *pt_id)
 	}
 
 	refcount_inc(&idev->obj.users);
+out_done:
 	*pt_id = idev->hwpt->obj.id;
 	rc = 0;
 
diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h
index 593138bb37b8..200c783800ad 100644
--- a/drivers/iommu/iommufd/iommufd_private.h
+++ b/drivers/iommu/iommufd/iommufd_private.h
@@ -9,6 +9,8 @@
 #include <linux/refcount.h>
 #include <linux/uaccess.h>
 
+#include "../iommu-priv.h"
+
 struct iommu_domain;
 struct iommu_group;
 struct iommu_option;
-- 
2.39.1


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH v1 8/8] vfio-iommufd: Support IO page table replacement
  2023-02-02  7:05 [PATCH v1 0/8] Add IO page table replacement support Nicolin Chen
                   ` (6 preceding siblings ...)
  2023-02-02  7:05 ` [PATCH v1 7/8] iommufd/device: Use iommu_group_replace_domain() Nicolin Chen
@ 2023-02-02  7:05 ` Nicolin Chen
  2023-02-06  8:49   ` Tian, Kevin
  2023-02-03  8:09 ` [PATCH v1 0/8] Add IO page table replacement support Tian, Kevin
  8 siblings, 1 reply; 41+ messages in thread
From: Nicolin Chen @ 2023-02-02  7:05 UTC (permalink / raw)
  To: jgg, kevin.tian, joro, will, robin.murphy, alex.williamson, shuah
  Cc: yi.l.liu, linux-kernel, iommu, kvm, linux-kselftest, baolu.lu

Remove the vdev->iommufd_attached check, since the kernel can internally
handle a replacement of the IO page table now.

Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
---
 drivers/vfio/iommufd.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/drivers/vfio/iommufd.c b/drivers/vfio/iommufd.c
index 7e09defbcffe..f9e89b3eef69 100644
--- a/drivers/vfio/iommufd.c
+++ b/drivers/vfio/iommufd.c
@@ -111,9 +111,6 @@ int vfio_iommufd_physical_attach_ioas(struct vfio_device *vdev, u32 *pt_id)
 		return 0;
 	}
 
-	if (vdev->iommufd_attached)
-		return -EBUSY;
-
 	rc = iommufd_device_attach(vdev->iommufd_device, pt_id);
 	if (rc)
 		return rc;
-- 
2.39.1


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* Re: [PATCH v1 2/8] iommu: Introduce a new iommu_group_replace_domain() API
  2023-02-02  7:05 ` [PATCH v1 2/8] iommu: Introduce a new iommu_group_replace_domain() API Nicolin Chen
@ 2023-02-02 10:21   ` Baolu Lu
  2023-02-02 19:14     ` Nicolin Chen
  2023-02-03  8:26   ` Tian, Kevin
  1 sibling, 1 reply; 41+ messages in thread
From: Baolu Lu @ 2023-02-02 10:21 UTC (permalink / raw)
  To: Nicolin Chen, jgg, kevin.tian, joro, will, robin.murphy,
	alex.williamson, shuah
  Cc: baolu.lu, yi.l.liu, linux-kernel, iommu, kvm, linux-kselftest

On 2023/2/2 15:05, Nicolin Chen wrote:
> +/**
> + * iommu_group_replace_domain - replace the domain that a group is attached to
> + * @new_domain: new IOMMU domain to replace with
> + * @group: IOMMU group that will be attached to the new domain
> + *
> + * This API allows the group to switch domains without being forced to go to
> + * the blocking domain in-between.
> + *
> + * If the attached domain is a core domain (e.g. a default_domain), it will act
> + * just like the iommu_attach_group().

I am not following above two lines. Why and how could iommufd set a
core domain to an iommu_group?

> + */
> +int iommu_group_replace_domain(struct iommu_group *group,
> +			       struct iommu_domain *new_domain)
> +{
> +	int ret;
> +
> +	if (!new_domain)
> +		return -EINVAL;
> +
> +	mutex_lock(&group->mutex);
> +	ret = __iommu_group_set_domain(group, new_domain);
> +	if (ret) {
> +		if (__iommu_group_set_domain(group, group->domain))
> +			__iommu_group_set_core_domain(group);
> +	}
> +	mutex_unlock(&group->mutex);
> +	return ret;
> +}
> +EXPORT_SYMBOL_NS_GPL(iommu_group_replace_domain, IOMMUFD_INTERNAL);

Best regards,
baolu

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v1 2/8] iommu: Introduce a new iommu_group_replace_domain() API
  2023-02-02 10:21   ` Baolu Lu
@ 2023-02-02 19:14     ` Nicolin Chen
  2023-02-03  1:33       ` Baolu Lu
  0 siblings, 1 reply; 41+ messages in thread
From: Nicolin Chen @ 2023-02-02 19:14 UTC (permalink / raw)
  To: Baolu Lu
  Cc: jgg, kevin.tian, joro, will, robin.murphy, alex.williamson,
	shuah, yi.l.liu, linux-kernel, iommu, kvm, linux-kselftest

On Thu, Feb 02, 2023 at 06:21:20PM +0800, Baolu Lu wrote:
> External email: Use caution opening links or attachments
> 
> 
> On 2023/2/2 15:05, Nicolin Chen wrote:
> > +/**
> > + * iommu_group_replace_domain - replace the domain that a group is attached to
> > + * @new_domain: new IOMMU domain to replace with
> > + * @group: IOMMU group that will be attached to the new domain
> > + *
> > + * This API allows the group to switch domains without being forced to go to
> > + * the blocking domain in-between.
> > + *
> > + * If the attached domain is a core domain (e.g. a default_domain), it will act
> > + * just like the iommu_attach_group().
> 
> I am not following above two lines. Why and how could iommufd set a
> core domain to an iommu_group?

Perhaps this isn't the best narrative. What it's supposed to say
is that this function acts as an iommu_attach_group() call if the
device is "detached", yet we have changed the semantics about the
word "detach". So, what should the correct way to write such a
note?

Thanks
Nic

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v1 2/8] iommu: Introduce a new iommu_group_replace_domain() API
  2023-02-02 19:14     ` Nicolin Chen
@ 2023-02-03  1:33       ` Baolu Lu
  2023-02-03  1:41         ` Nicolin Chen
  0 siblings, 1 reply; 41+ messages in thread
From: Baolu Lu @ 2023-02-03  1:33 UTC (permalink / raw)
  To: Nicolin Chen
  Cc: baolu.lu, jgg, kevin.tian, joro, will, robin.murphy,
	alex.williamson, shuah, yi.l.liu, linux-kernel, iommu, kvm,
	linux-kselftest

On 2023/2/3 3:14, Nicolin Chen wrote:
> On Thu, Feb 02, 2023 at 06:21:20PM +0800, Baolu Lu wrote:
>> External email: Use caution opening links or attachments
>>
>>
>> On 2023/2/2 15:05, Nicolin Chen wrote:
>>> +/**
>>> + * iommu_group_replace_domain - replace the domain that a group is attached to
>>> + * @new_domain: new IOMMU domain to replace with
>>> + * @group: IOMMU group that will be attached to the new domain
>>> + *
>>> + * This API allows the group to switch domains without being forced to go to
>>> + * the blocking domain in-between.
>>> + *
>>> + * If the attached domain is a core domain (e.g. a default_domain), it will act
>>> + * just like the iommu_attach_group().
>> I am not following above two lines. Why and how could iommufd set a
>> core domain to an iommu_group?
> Perhaps this isn't the best narrative. What it's supposed to say
> is that this function acts as an iommu_attach_group() call if the
> device is "detached", yet we have changed the semantics about the
> word "detach". So, what should the correct way to write such a
> note?

How could this interface be used as detaching a domain from a group?
Even it could be used, doesn't it act as an iommu_detach_group()?

Best regards,
baolu

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v1 2/8] iommu: Introduce a new iommu_group_replace_domain() API
  2023-02-03  1:33       ` Baolu Lu
@ 2023-02-03  1:41         ` Nicolin Chen
  2023-02-03  2:35           ` Baolu Lu
  0 siblings, 1 reply; 41+ messages in thread
From: Nicolin Chen @ 2023-02-03  1:41 UTC (permalink / raw)
  To: Baolu Lu
  Cc: jgg, kevin.tian, joro, will, robin.murphy, alex.williamson,
	shuah, yi.l.liu, linux-kernel, iommu, kvm, linux-kselftest

On Fri, Feb 03, 2023 at 09:33:44AM +0800, Baolu Lu wrote:
> External email: Use caution opening links or attachments
> 
> 
> On 2023/2/3 3:14, Nicolin Chen wrote:
> > On Thu, Feb 02, 2023 at 06:21:20PM +0800, Baolu Lu wrote:
> > > External email: Use caution opening links or attachments
> > > 
> > > 
> > > On 2023/2/2 15:05, Nicolin Chen wrote:
> > > > +/**
> > > > + * iommu_group_replace_domain - replace the domain that a group is attached to
> > > > + * @new_domain: new IOMMU domain to replace with
> > > > + * @group: IOMMU group that will be attached to the new domain
> > > > + *
> > > > + * This API allows the group to switch domains without being forced to go to
> > > > + * the blocking domain in-between.
> > > > + *
> > > > + * If the attached domain is a core domain (e.g. a default_domain), it will act
> > > > + * just like the iommu_attach_group().
> > > I am not following above two lines. Why and how could iommufd set a
> > > core domain to an iommu_group?
> > Perhaps this isn't the best narrative. What it's supposed to say
> > is that this function acts as an iommu_attach_group() call if the
> > device is "detached", yet we have changed the semantics about the
> > word "detach". So, what should the correct way to write such a
> > note?
> 
> How could this interface be used as detaching a domain from a group?
> Even it could be used, doesn't it act as an iommu_detach_group()?

No. I didn't say that. It doesn't act as detach(), but attach()
when a device is already "detached".

The original statement is saying, "if the attached domain is a
core domain", i.e. the device is detach()-ed, "it will act just
like the iommu_attach_group()".

Thanks
Nic

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v1 2/8] iommu: Introduce a new iommu_group_replace_domain() API
  2023-02-03  1:41         ` Nicolin Chen
@ 2023-02-03  2:35           ` Baolu Lu
  0 siblings, 0 replies; 41+ messages in thread
From: Baolu Lu @ 2023-02-03  2:35 UTC (permalink / raw)
  To: Nicolin Chen
  Cc: baolu.lu, jgg, kevin.tian, joro, will, robin.murphy,
	alex.williamson, shuah, yi.l.liu, linux-kernel, iommu, kvm,
	linux-kselftest

On 2023/2/3 9:41, Nicolin Chen wrote:
> On Fri, Feb 03, 2023 at 09:33:44AM +0800, Baolu Lu wrote:
>> External email: Use caution opening links or attachments
>>
>>
>> On 2023/2/3 3:14, Nicolin Chen wrote:
>>> On Thu, Feb 02, 2023 at 06:21:20PM +0800, Baolu Lu wrote:
>>>> External email: Use caution opening links or attachments
>>>>
>>>>
>>>> On 2023/2/2 15:05, Nicolin Chen wrote:
>>>>> +/**
>>>>> + * iommu_group_replace_domain - replace the domain that a group is attached to
>>>>> + * @new_domain: new IOMMU domain to replace with
>>>>> + * @group: IOMMU group that will be attached to the new domain
>>>>> + *
>>>>> + * This API allows the group to switch domains without being forced to go to
>>>>> + * the blocking domain in-between.
>>>>> + *
>>>>> + * If the attached domain is a core domain (e.g. a default_domain), it will act
>>>>> + * just like the iommu_attach_group().
>>>> I am not following above two lines. Why and how could iommufd set a
>>>> core domain to an iommu_group?
>>> Perhaps this isn't the best narrative. What it's supposed to say
>>> is that this function acts as an iommu_attach_group() call if the
>>> device is "detached", yet we have changed the semantics about the
>>> word "detach". So, what should the correct way to write such a
>>> note?
>> How could this interface be used as detaching a domain from a group?
>> Even it could be used, doesn't it act as an iommu_detach_group()?
> No. I didn't say that. It doesn't act as detach(), but attach()
> when a device is already "detached".
> 
> The original statement is saying, "if the attached domain is a
> core domain", i.e. the device is detach()-ed, "it will act just
> like the iommu_attach_group()".

Oh! My bad. I misunderstood it. Sorry for the noise. :-)

Best regards,
baolu

^ permalink raw reply	[flat|nested] 41+ messages in thread

* RE: [PATCH v1 0/8] Add IO page table replacement support
  2023-02-02  7:05 [PATCH v1 0/8] Add IO page table replacement support Nicolin Chen
                   ` (7 preceding siblings ...)
  2023-02-02  7:05 ` [PATCH v1 8/8] vfio-iommufd: Support IO page table replacement Nicolin Chen
@ 2023-02-03  8:09 ` Tian, Kevin
  2023-02-03 15:04   ` Jason Gunthorpe
  8 siblings, 1 reply; 41+ messages in thread
From: Tian, Kevin @ 2023-02-03  8:09 UTC (permalink / raw)
  To: Nicolin Chen, jgg, joro, will, robin.murphy, alex.williamson, shuah
  Cc: Liu, Yi L, linux-kernel, iommu, kvm, linux-kselftest, baolu.lu

> From: Nicolin Chen <nicolinc@nvidia.com>
> Sent: Thursday, February 2, 2023 3:05 PM
> 
> QEMU with this feature should have the vIOMMU maintain a cache of the
> guest io page table addresses and assign a unique IOAS to each unique
> guest page table.

I didn't get why we impose such requirement to userspace.

> 
> To apply this series, please rebase on top of the following patches:
> 1) [PATCH 00/13] Add vfio_device cdev for iommufd support
>    https://lore.kernel.org/kvm/20230117134942.101112-1-yi.l.liu@intel.com/
> 2) (Merged) [PATCH v5 0/5] iommu: Retire detach_dev callback
>    https://lore.kernel.org/linux-iommu/20230110025408.667767-1-
> baolu.lu@linux.intel.com/
> 3) (Merged) [PATCH] selftests: iommu: Fix test_cmd_destroy_access() call in
> user_copy
>    https://lore.kernel.org/lkml/20230120074204.1368-1-nicolinc@nvidia.com/
> 

No need to list merged work as dependency. In concept every commit
merged in the version which this series is rebased to is required. 😊

^ permalink raw reply	[flat|nested] 41+ messages in thread

* RE: [PATCH v1 2/8] iommu: Introduce a new iommu_group_replace_domain() API
  2023-02-02  7:05 ` [PATCH v1 2/8] iommu: Introduce a new iommu_group_replace_domain() API Nicolin Chen
  2023-02-02 10:21   ` Baolu Lu
@ 2023-02-03  8:26   ` Tian, Kevin
  2023-02-03 15:03     ` Jason Gunthorpe
  2023-02-03 17:45     ` Nicolin Chen
  1 sibling, 2 replies; 41+ messages in thread
From: Tian, Kevin @ 2023-02-03  8:26 UTC (permalink / raw)
  To: Nicolin Chen, jgg, joro, will, robin.murphy, alex.williamson, shuah
  Cc: Liu, Yi L, linux-kernel, iommu, kvm, linux-kselftest, baolu.lu

> From: Nicolin Chen <nicolinc@nvidia.com>
> Sent: Thursday, February 2, 2023 3:05 PM
> 
> All drivers are already required to support changing between active
> UNMANAGED domains when using their attach_dev ops.

All drivers which don't have *broken* UNMANAGED domain?

> 
> +/**
> + * iommu_group_replace_domain - replace the domain that a group is
> attached to
> + * @new_domain: new IOMMU domain to replace with
> + * @group: IOMMU group that will be attached to the new domain
> + *
> + * This API allows the group to switch domains without being forced to go to
> + * the blocking domain in-between.
> + *
> + * If the attached domain is a core domain (e.g. a default_domain), it will act
> + * just like the iommu_attach_group().

I think you meant "the currently-attached domain", which implies a
'detached' state as you replied to Baolu.

> + */
> +int iommu_group_replace_domain(struct iommu_group *group,
> +			       struct iommu_domain *new_domain)

what actual value does 'replace' give us? It's just a wrapper of
__iommu_group_set_domain() then calling it set_domain is
probably clearer. You can clarify the 'replace' behavior in the
comment.

> +{
> +	int ret;
> +
> +	if (!new_domain)
> +		return -EINVAL;
> +
> +	mutex_lock(&group->mutex);
> +	ret = __iommu_group_set_domain(group, new_domain);
> +	if (ret) {
> +		if (__iommu_group_set_domain(group, group->domain))
> +			__iommu_group_set_core_domain(group);
> +	}

Can you elaborate the error handling here? Ideally if
__iommu_group_set_domain() fails then group->domain shouldn't
be changed. Why do we need further housekeeping here?


^ permalink raw reply	[flat|nested] 41+ messages in thread

* RE: [PATCH v1 3/8] iommufd: Create access in vfio_iommufd_emulated_bind()
  2023-02-02  7:05 ` [PATCH v1 3/8] iommufd: Create access in vfio_iommufd_emulated_bind() Nicolin Chen
@ 2023-02-03  9:23   ` Tian, Kevin
  2023-02-03  9:28   ` Tian, Kevin
  1 sibling, 0 replies; 41+ messages in thread
From: Tian, Kevin @ 2023-02-03  9:23 UTC (permalink / raw)
  To: Nicolin Chen, jgg, joro, will, robin.murphy, alex.williamson, shuah
  Cc: Liu, Yi L, linux-kernel, iommu, kvm, linux-kselftest, baolu.lu

> From: Nicolin Chen <nicolinc@nvidia.com>
> Sent: Thursday, February 2, 2023 3:05 PM
> 
> To prepare for an access->ioas replacement, move iommufd_access_create()
> call into vfio_iommufd_emulated_bind(), making it symmetric with the
> __vfio_iommufd_access_destroy() call in vfio_iommufd_emulated_unbind().
> This means an access is created/destroyed by the bind()/unbind(), and the
> vfio_iommufd_emulated_attach_ioas() only updates the access->ioas pointer.
> 
> Since there's no longer an ioas_id input for iommufd_access_create(), add
> a new helper iommufd_access_set_ioas() to set access->ioas. We can later
> add a "replace" feature simply to the new iommufd_access_set_ioas() too.
> 
> Leaving the access->ioas in vfio_iommufd_emulated_attach_ioas(), however,
> can introduce some potential of a race condition during pin_/unpin_pages()
> call where access->ioas->iopt is getting referenced. So, add an ioas_lock
> to protect it.
> 

What about introducing another flag e.g. vdev->iommufd_access_attached
which got set in vfio_iommufd_emulated_attach_ioas() to fulfill what
vdev->iommufd_access guards today?

^ permalink raw reply	[flat|nested] 41+ messages in thread

* RE: [PATCH v1 3/8] iommufd: Create access in vfio_iommufd_emulated_bind()
  2023-02-02  7:05 ` [PATCH v1 3/8] iommufd: Create access in vfio_iommufd_emulated_bind() Nicolin Chen
  2023-02-03  9:23   ` Tian, Kevin
@ 2023-02-03  9:28   ` Tian, Kevin
  1 sibling, 0 replies; 41+ messages in thread
From: Tian, Kevin @ 2023-02-03  9:28 UTC (permalink / raw)
  To: Nicolin Chen, jgg, joro, will, robin.murphy, alex.williamson, shuah
  Cc: Liu, Yi L, linux-kernel, iommu, kvm, linux-kselftest, baolu.lu

> From: Tian, Kevin
> Sent: Friday, February 3, 2023 5:24 PM
> 
> > From: Nicolin Chen <nicolinc@nvidia.com>
> > Sent: Thursday, February 2, 2023 3:05 PM
> >
> > To prepare for an access->ioas replacement, move
> iommufd_access_create()
> > call into vfio_iommufd_emulated_bind(), making it symmetric with the
> > __vfio_iommufd_access_destroy() call in
> vfio_iommufd_emulated_unbind().
> > This means an access is created/destroyed by the bind()/unbind(), and the
> > vfio_iommufd_emulated_attach_ioas() only updates the access->ioas
> pointer.
> >
> > Since there's no longer an ioas_id input for iommufd_access_create(), add
> > a new helper iommufd_access_set_ioas() to set access->ioas. We can later
> > add a "replace" feature simply to the new iommufd_access_set_ioas() too.
> >
> > Leaving the access->ioas in vfio_iommufd_emulated_attach_ioas(),
> however,
> > can introduce some potential of a race condition during pin_/unpin_pages()
> > call where access->ioas->iopt is getting referenced. So, add an ioas_lock
> > to protect it.
> >
> 
> What about introducing another flag e.g. vdev->iommufd_access_attached
> which got set in vfio_iommufd_emulated_attach_ioas() to fulfill what
> vdev->iommufd_access guards today?

Obviously this doesn't work with what 'replace' requires in following
patches. So please just ignore this comment...

^ permalink raw reply	[flat|nested] 41+ messages in thread

* RE: [PATCH v1 5/8] iommufd: Add replace support in iommufd_access_set_ioas()
  2023-02-02  7:05 ` [PATCH v1 5/8] iommufd: Add replace support in iommufd_access_set_ioas() Nicolin Chen
@ 2023-02-03 10:10   ` Tian, Kevin
  2023-02-03 12:31     ` Jason Gunthorpe
  0 siblings, 1 reply; 41+ messages in thread
From: Tian, Kevin @ 2023-02-03 10:10 UTC (permalink / raw)
  To: Nicolin Chen, jgg, joro, will, robin.murphy, alex.williamson, shuah
  Cc: Liu, Yi L, linux-kernel, iommu, kvm, linux-kselftest, baolu.lu

> From: Nicolin Chen <nicolinc@nvidia.com>
> Sent: Thursday, February 2, 2023 3:05 PM
> 
> Support an access->ioas replacement in iommufd_access_set_ioas(), which
> sets the access->ioas to NULL provisionally so that any further incoming
> iommufd_access_pin_pages() callback can be blocked.
> 
> Then, call access->ops->unmap() to clean up the entire iopt. To allow an
> iommufd_access_unpin_pages() callback to happen via this unmap() call,
> add an ioas_unpin pointer so the unpin routine won't be affected by the
> "access->ioas = NULL" trick above.
> 
> Also, a vdev without an ops->dma_unmap implementation cannot replace its
> access->ioas pointer. So add an iommufd_access_ioas_is_attached() helper
> to sanity that.
> 

Presumably a driver which doesn't implement ops->dma_unmap shouldn't
be allowed to do pin/unpin. But it could use vfio_dma_rw() to access an
iova range. In the latter case I don't see why replace cannot work.

Probably what's required here is to deny !ops->dma_unmap in
vfio_pin/unpin_pages then making here replace always allowed?

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v1 5/8] iommufd: Add replace support in iommufd_access_set_ioas()
  2023-02-03 10:10   ` Tian, Kevin
@ 2023-02-03 12:31     ` Jason Gunthorpe
  2023-02-03 22:25       ` Nicolin Chen
  0 siblings, 1 reply; 41+ messages in thread
From: Jason Gunthorpe @ 2023-02-03 12:31 UTC (permalink / raw)
  To: Tian, Kevin
  Cc: Nicolin Chen, joro, will, robin.murphy, alex.williamson, shuah,
	Liu, Yi L, linux-kernel, iommu, kvm, linux-kselftest, baolu.lu

On Fri, Feb 03, 2023 at 10:10:45AM +0000, Tian, Kevin wrote:
> > From: Nicolin Chen <nicolinc@nvidia.com>
> > Sent: Thursday, February 2, 2023 3:05 PM
> > 
> > Support an access->ioas replacement in iommufd_access_set_ioas(), which
> > sets the access->ioas to NULL provisionally so that any further incoming
> > iommufd_access_pin_pages() callback can be blocked.
> > 
> > Then, call access->ops->unmap() to clean up the entire iopt. To allow an
> > iommufd_access_unpin_pages() callback to happen via this unmap() call,
> > add an ioas_unpin pointer so the unpin routine won't be affected by the
> > "access->ioas = NULL" trick above.
> > 
> > Also, a vdev without an ops->dma_unmap implementation cannot replace its
> > access->ioas pointer. So add an iommufd_access_ioas_is_attached() helper
> > to sanity that.
> > 
> 
> Presumably a driver which doesn't implement ops->dma_unmap shouldn't
> be allowed to do pin/unpin. But it could use vfio_dma_rw() to access an
> iova range. In the latter case I don't see why replace cannot work.
> 
> Probably what's required here is to deny !ops->dma_unmap in
> vfio_pin/unpin_pages then making here replace always allowed?

That makes sense

Jason

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v1 2/8] iommu: Introduce a new iommu_group_replace_domain() API
  2023-02-03  8:26   ` Tian, Kevin
@ 2023-02-03 15:03     ` Jason Gunthorpe
  2023-02-03 17:53       ` Nicolin Chen
  2023-02-06  6:57       ` Tian, Kevin
  2023-02-03 17:45     ` Nicolin Chen
  1 sibling, 2 replies; 41+ messages in thread
From: Jason Gunthorpe @ 2023-02-03 15:03 UTC (permalink / raw)
  To: Tian, Kevin
  Cc: Nicolin Chen, joro, will, robin.murphy, alex.williamson, shuah,
	Liu, Yi L, linux-kernel, iommu, kvm, linux-kselftest, baolu.lu

On Fri, Feb 03, 2023 at 08:26:44AM +0000, Tian, Kevin wrote:
> > From: Nicolin Chen <nicolinc@nvidia.com>
> > Sent: Thursday, February 2, 2023 3:05 PM
> > 
> > All drivers are already required to support changing between active
> > UNMANAGED domains when using their attach_dev ops.
> 
> All drivers which don't have *broken* UNMANAGED domain?

No, all drivers.. It has always been used by VFIO.

> > + */
> > +int iommu_group_replace_domain(struct iommu_group *group,
> > +			       struct iommu_domain *new_domain)
> 
> what actual value does 'replace' give us? It's just a wrapper of
> __iommu_group_set_domain() then calling it set_domain is
> probably clearer. You can clarify the 'replace' behavior in the
> comment.

As the APIs are setup:

attach demands that no domain is currently set (eg the domain must be blocking)

replace permits the domain to be currently set

'set' vs 'attach' is really unclear what the intended difference is.

We could also address this by simply removing the protection from
attach, but it is not so clear if that is safe for the few users.

> > +{
> > +	int ret;
> > +
> > +	if (!new_domain)
> > +		return -EINVAL;
> > +
> > +	mutex_lock(&group->mutex);
> > +	ret = __iommu_group_set_domain(group, new_domain);
> > +	if (ret) {
> > +		if (__iommu_group_set_domain(group, group->domain))
> > +			__iommu_group_set_core_domain(group);
> > +	}
> 
> Can you elaborate the error handling here? Ideally if
> __iommu_group_set_domain() fails then group->domain shouldn't
> be changed. 

That isn't what it implements though. The internal helper leaves
things in a mess, it is for the caller to fix it, and it depends on
the caller what that means.

In this case the API cannot retain a hidden reference to the new
domain, so it must be purged, one way or another.

Jason

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v1 0/8] Add IO page table replacement support
  2023-02-03  8:09 ` [PATCH v1 0/8] Add IO page table replacement support Tian, Kevin
@ 2023-02-03 15:04   ` Jason Gunthorpe
  2023-02-06  6:39     ` Tian, Kevin
  0 siblings, 1 reply; 41+ messages in thread
From: Jason Gunthorpe @ 2023-02-03 15:04 UTC (permalink / raw)
  To: Tian, Kevin
  Cc: Nicolin Chen, joro, will, robin.murphy, alex.williamson, shuah,
	Liu, Yi L, linux-kernel, iommu, kvm, linux-kselftest, baolu.lu

On Fri, Feb 03, 2023 at 08:09:30AM +0000, Tian, Kevin wrote:
> > From: Nicolin Chen <nicolinc@nvidia.com>
> > Sent: Thursday, February 2, 2023 3:05 PM
> > 
> > QEMU with this feature should have the vIOMMU maintain a cache of the
> > guest io page table addresses and assign a unique IOAS to each unique
> > guest page table.
> 
> I didn't get why we impose such requirement to userspace.

I read this as implementation guidance for qemu. qemu can do what it
wants of course

Jason

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v1 2/8] iommu: Introduce a new iommu_group_replace_domain() API
  2023-02-03  8:26   ` Tian, Kevin
  2023-02-03 15:03     ` Jason Gunthorpe
@ 2023-02-03 17:45     ` Nicolin Chen
  2023-02-06  6:40       ` Tian, Kevin
  1 sibling, 1 reply; 41+ messages in thread
From: Nicolin Chen @ 2023-02-03 17:45 UTC (permalink / raw)
  To: Tian, Kevin
  Cc: jgg, joro, will, robin.murphy, alex.williamson, shuah, Liu, Yi L,
	linux-kernel, iommu, kvm, linux-kselftest, baolu.lu

On Fri, Feb 03, 2023 at 08:26:44AM +0000, Tian, Kevin wrote:

> > +/**
> > + * iommu_group_replace_domain - replace the domain that a group is
> > attached to
> > + * @new_domain: new IOMMU domain to replace with
> > + * @group: IOMMU group that will be attached to the new domain
> > + *
> > + * This API allows the group to switch domains without being forced to go to
> > + * the blocking domain in-between.
> > + *
> > + * If the attached domain is a core domain (e.g. a default_domain), it will act
> > + * just like the iommu_attach_group().
> 
> I think you meant "the currently-attached domain", which implies a
> 'detached' state as you replied to Baolu.

Hmm, I don't see an implication, since we only have either
"the attached domain" or "a new domain" in the context?

With that being said, I can add "currently" in v2:
 * If the currently attached domain is a core domain (e.g. a default_domain),
 * it will act just like the iommu_attach_group().

Thanks
Nic

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v1 2/8] iommu: Introduce a new iommu_group_replace_domain() API
  2023-02-03 15:03     ` Jason Gunthorpe
@ 2023-02-03 17:53       ` Nicolin Chen
  2023-02-06  6:57       ` Tian, Kevin
  1 sibling, 0 replies; 41+ messages in thread
From: Nicolin Chen @ 2023-02-03 17:53 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Tian, Kevin, joro, will, robin.murphy, alex.williamson, shuah,
	Liu, Yi L, linux-kernel, iommu, kvm, linux-kselftest, baolu.lu

On Fri, Feb 03, 2023 at 11:03:20AM -0400, Jason Gunthorpe wrote:
> > > + */
> > > +int iommu_group_replace_domain(struct iommu_group *group,
> > > +			       struct iommu_domain *new_domain)
> > 
> > what actual value does 'replace' give us? It's just a wrapper of
> > __iommu_group_set_domain() then calling it set_domain is
> > probably clearer. You can clarify the 'replace' behavior in the
> > comment.
> 
> As the APIs are setup:
> 
> attach demands that no domain is currently set (eg the domain must be blocking)
> 
> replace permits the domain to be currently set
> 
> 'set' vs 'attach' is really unclear what the intended difference is.
> 
> We could also address this by simply removing the protection from
> attach, but it is not so clear if that is safe for the few users.

I can add a couple of lines to the commit message to make things
clear.

> > > +{
> > > +	int ret;
> > > +
> > > +	if (!new_domain)
> > > +		return -EINVAL;
> > > +
> > > +	mutex_lock(&group->mutex);
> > > +	ret = __iommu_group_set_domain(group, new_domain);
> > > +	if (ret) {
> > > +		if (__iommu_group_set_domain(group, group->domain))
> > > +			__iommu_group_set_core_domain(group);
> > > +	}
> > 
> > Can you elaborate the error handling here? Ideally if
> > __iommu_group_set_domain() fails then group->domain shouldn't
> > be changed. 
> 
> That isn't what it implements though. The internal helper leaves
> things in a mess, it is for the caller to fix it, and it depends on
> the caller what that means.
> 
> In this case the API cannot retain a hidden reference to the new
> domain, so it must be purged, one way or another.

Considering it is a bit different than the existing APIs like
iommu_attach_group(), perhaps I should add a line of comments
in the fallback routine.

Thanks
Nic

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v1 5/8] iommufd: Add replace support in iommufd_access_set_ioas()
  2023-02-03 12:31     ` Jason Gunthorpe
@ 2023-02-03 22:25       ` Nicolin Chen
  0 siblings, 0 replies; 41+ messages in thread
From: Nicolin Chen @ 2023-02-03 22:25 UTC (permalink / raw)
  To: Tian, Kevin, Jason Gunthorpe
  Cc: joro, will, robin.murphy, alex.williamson, shuah, Liu, Yi L,
	linux-kernel, iommu, kvm, linux-kselftest, baolu.lu

On Fri, Feb 03, 2023 at 08:31:52AM -0400, Jason Gunthorpe wrote:
> On Fri, Feb 03, 2023 at 10:10:45AM +0000, Tian, Kevin wrote:
> > > From: Nicolin Chen <nicolinc@nvidia.com>
> > > Sent: Thursday, February 2, 2023 3:05 PM
> > > 
> > > Support an access->ioas replacement in iommufd_access_set_ioas(), which
> > > sets the access->ioas to NULL provisionally so that any further incoming
> > > iommufd_access_pin_pages() callback can be blocked.
> > > 
> > > Then, call access->ops->unmap() to clean up the entire iopt. To allow an
> > > iommufd_access_unpin_pages() callback to happen via this unmap() call,
> > > add an ioas_unpin pointer so the unpin routine won't be affected by the
> > > "access->ioas = NULL" trick above.
> > > 
> > > Also, a vdev without an ops->dma_unmap implementation cannot replace its
> > > access->ioas pointer. So add an iommufd_access_ioas_is_attached() helper
> > > to sanity that.
> > > 
> > 
> > Presumably a driver which doesn't implement ops->dma_unmap shouldn't
> > be allowed to do pin/unpin. But it could use vfio_dma_rw() to access an
> > iova range. In the latter case I don't see why replace cannot work.
> > 
> > Probably what's required here is to deny !ops->dma_unmap in
> > vfio_pin/unpin_pages then making here replace always allowed?
> 
> That makes sense

Will change that in v2.

Thanks
Nic

^ permalink raw reply	[flat|nested] 41+ messages in thread

* RE: [PATCH v1 0/8] Add IO page table replacement support
  2023-02-03 15:04   ` Jason Gunthorpe
@ 2023-02-06  6:39     ` Tian, Kevin
  2023-02-06 13:26       ` Jason Gunthorpe
  0 siblings, 1 reply; 41+ messages in thread
From: Tian, Kevin @ 2023-02-06  6:39 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Nicolin Chen, joro, will, robin.murphy, alex.williamson, shuah,
	Liu, Yi L, linux-kernel, iommu, kvm, linux-kselftest, baolu.lu

> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Friday, February 3, 2023 11:04 PM
> 
> On Fri, Feb 03, 2023 at 08:09:30AM +0000, Tian, Kevin wrote:
> > > From: Nicolin Chen <nicolinc@nvidia.com>
> > > Sent: Thursday, February 2, 2023 3:05 PM
> > >
> > > QEMU with this feature should have the vIOMMU maintain a cache of the
> > > guest io page table addresses and assign a unique IOAS to each unique
> > > guest page table.
> >
> > I didn't get why we impose such requirement to userspace.
> 
> I read this as implementation guidance for qemu. qemu can do what it
> wants of course
> 

sure but I still didn't get why this is a guidance specific to the
new replace cmd...

^ permalink raw reply	[flat|nested] 41+ messages in thread

* RE: [PATCH v1 2/8] iommu: Introduce a new iommu_group_replace_domain() API
  2023-02-03 17:45     ` Nicolin Chen
@ 2023-02-06  6:40       ` Tian, Kevin
  0 siblings, 0 replies; 41+ messages in thread
From: Tian, Kevin @ 2023-02-06  6:40 UTC (permalink / raw)
  To: Nicolin Chen
  Cc: jgg, joro, will, robin.murphy, alex.williamson, shuah, Liu, Yi L,
	linux-kernel, iommu, kvm, linux-kselftest, baolu.lu

> From: Nicolin Chen <nicolinc@nvidia.com>
> Sent: Saturday, February 4, 2023 1:45 AM
> 
> On Fri, Feb 03, 2023 at 08:26:44AM +0000, Tian, Kevin wrote:
> 
> > > +/**
> > > + * iommu_group_replace_domain - replace the domain that a group is
> > > attached to
> > > + * @new_domain: new IOMMU domain to replace with
> > > + * @group: IOMMU group that will be attached to the new domain
> > > + *
> > > + * This API allows the group to switch domains without being forced to
> go to
> > > + * the blocking domain in-between.
> > > + *
> > > + * If the attached domain is a core domain (e.g. a default_domain), it will
> act
> > > + * just like the iommu_attach_group().
> >
> > I think you meant "the currently-attached domain", which implies a
> > 'detached' state as you replied to Baolu.
> 
> Hmm, I don't see an implication, since we only have either
> "the attached domain" or "a new domain" in the context?

probably just me reading it as "the newly attached domain". 😊

> 
> With that being said, I can add "currently" in v2:
>  * If the currently attached domain is a core domain (e.g. a default_domain),
>  * it will act just like the iommu_attach_group().
> 

this is clearer.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* RE: [PATCH v1 2/8] iommu: Introduce a new iommu_group_replace_domain() API
  2023-02-03 15:03     ` Jason Gunthorpe
  2023-02-03 17:53       ` Nicolin Chen
@ 2023-02-06  6:57       ` Tian, Kevin
  2023-02-06 13:25         ` Jason Gunthorpe
  1 sibling, 1 reply; 41+ messages in thread
From: Tian, Kevin @ 2023-02-06  6:57 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Nicolin Chen, joro, will, robin.murphy, alex.williamson, shuah,
	Liu, Yi L, linux-kernel, iommu, kvm, linux-kselftest, baolu.lu

> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Friday, February 3, 2023 11:03 PM
> 
> On Fri, Feb 03, 2023 at 08:26:44AM +0000, Tian, Kevin wrote:
> > > From: Nicolin Chen <nicolinc@nvidia.com>
> > > Sent: Thursday, February 2, 2023 3:05 PM
> > >
> > > All drivers are already required to support changing between active
> > > UNMANAGED domains when using their attach_dev ops.
> >
> > All drivers which don't have *broken* UNMANAGED domain?
> 
> No, all drivers.. It has always been used by VFIO.

existing iommu_attach_group() doesn't support changing between
two UNMANAGED domains. only from default->unmanaged or
blocking->unmanaged.

Above statement is true only if this series is based on your unmerged
work removing DMA domain types.

> 
> > > +{
> > > +	int ret;
> > > +
> > > +	if (!new_domain)
> > > +		return -EINVAL;
> > > +
> > > +	mutex_lock(&group->mutex);
> > > +	ret = __iommu_group_set_domain(group, new_domain);
> > > +	if (ret) {
> > > +		if (__iommu_group_set_domain(group, group->domain))
> > > +			__iommu_group_set_core_domain(group);
> > > +	}
> >
> > Can you elaborate the error handling here? Ideally if
> > __iommu_group_set_domain() fails then group->domain shouldn't
> > be changed.
> 
> That isn't what it implements though. The internal helper leaves
> things in a mess, it is for the caller to fix it, and it depends on
> the caller what that means.

I didn't see any warning of the mess and the caller's responsibility
in __iommu_group_set_domain(). Can it be documented clearly
so if someone wants to add a new caller on it he can clearly know
what to do?

and why doesn't iommu_attach_group() need to do anything
when an attach attempt fails? In the end it calls the same
iommu_group_do_attach_device() as __iommu_group_set_domain()
does.

btw looking at the code __iommu_group_set_domain():

	 * Note that this is called in error unwind paths, attaching to a
	 * domain that has already been attached cannot fail.
	 */
	ret = __iommu_group_for_each_dev(group, new_domain,
				iommu_group_do_attach_device);

with that we don't need fall back to core domain in above error
unwinding per this comment.

> 
> In this case the API cannot retain a hidden reference to the new
> domain, so it must be purged, one way or another.
> 

Could you elaborate where the hidden reference is retained?

^ permalink raw reply	[flat|nested] 41+ messages in thread

* RE: [PATCH v1 7/8] iommufd/device: Use iommu_group_replace_domain()
  2023-02-02  7:05 ` [PATCH v1 7/8] iommufd/device: Use iommu_group_replace_domain() Nicolin Chen
@ 2023-02-06  8:46   ` Tian, Kevin
  2023-02-06 19:17     ` Nicolin Chen
  0 siblings, 1 reply; 41+ messages in thread
From: Tian, Kevin @ 2023-02-06  8:46 UTC (permalink / raw)
  To: Nicolin Chen, jgg, joro, will, robin.murphy, alex.williamson, shuah
  Cc: Liu, Yi L, linux-kernel, iommu, kvm, linux-kselftest, baolu.lu

> From: Nicolin Chen <nicolinc@nvidia.com>
> Sent: Thursday, February 2, 2023 3:05 PM
>
> @@ -246,6 +249,18 @@ static int iommufd_device_do_attach(struct
> iommufd_device *idev,
>  		}
>  	}
> 
> +	if (cur_hwpt) {
> +		/* Replace the cur_hwpt */
> +		mutex_lock(&cur_hwpt->devices_lock);
> +		if (cur_hwpt->ioas != hwpt->ioas)
> +			iopt_remove_reserved_iova(&cur_hwpt->ioas->iopt,
> +						  idev->dev);
> +		list_del(&cur_hwpt->hwpt_item);

emmm shouldn't this be done only when the device is the last
one attached to the hwpt? and if it's the last one you should
also iopt_table_remove_domain() together with list_del, i.e.
similar housekeeping as done in iommufd_device_detach().

^ permalink raw reply	[flat|nested] 41+ messages in thread

* RE: [PATCH v1 8/8] vfio-iommufd: Support IO page table replacement
  2023-02-02  7:05 ` [PATCH v1 8/8] vfio-iommufd: Support IO page table replacement Nicolin Chen
@ 2023-02-06  8:49   ` Tian, Kevin
  2023-02-06 18:54     ` Nicolin Chen
  0 siblings, 1 reply; 41+ messages in thread
From: Tian, Kevin @ 2023-02-06  8:49 UTC (permalink / raw)
  To: Nicolin Chen, jgg, joro, will, robin.murphy, alex.williamson, shuah
  Cc: Liu, Yi L, linux-kernel, iommu, kvm, linux-kselftest, baolu.lu

> From: Nicolin Chen <nicolinc@nvidia.com>
> Sent: Thursday, February 2, 2023 3:05 PM
> 
> Remove the vdev->iommufd_attached check, since the kernel can internally
> handle a replacement of the IO page table now.
> 
> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
> ---
>  drivers/vfio/iommufd.c | 3 ---
>  1 file changed, 3 deletions(-)
> 
> diff --git a/drivers/vfio/iommufd.c b/drivers/vfio/iommufd.c
> index 7e09defbcffe..f9e89b3eef69 100644
> --- a/drivers/vfio/iommufd.c
> +++ b/drivers/vfio/iommufd.c
> @@ -111,9 +111,6 @@ int vfio_iommufd_physical_attach_ioas(struct
> vfio_device *vdev, u32 *pt_id)
>  		return 0;
>  	}
> 
> -	if (vdev->iommufd_attached)
> -		return -EBUSY;
> -
>  	rc = iommufd_device_attach(vdev->iommufd_device, pt_id);
>  	if (rc)
>  		return rc;

also update vfio uapi description to explain the replace semantics.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v1 2/8] iommu: Introduce a new iommu_group_replace_domain() API
  2023-02-06  6:57       ` Tian, Kevin
@ 2023-02-06 13:25         ` Jason Gunthorpe
  2023-02-07  0:32           ` Tian, Kevin
  2023-02-07 19:16           ` Nicolin Chen
  0 siblings, 2 replies; 41+ messages in thread
From: Jason Gunthorpe @ 2023-02-06 13:25 UTC (permalink / raw)
  To: Tian, Kevin
  Cc: Nicolin Chen, joro, will, robin.murphy, alex.williamson, shuah,
	Liu, Yi L, linux-kernel, iommu, kvm, linux-kselftest, baolu.lu

On Mon, Feb 06, 2023 at 06:57:35AM +0000, Tian, Kevin wrote:
> > From: Jason Gunthorpe <jgg@nvidia.com>
> > Sent: Friday, February 3, 2023 11:03 PM
> > 
> > On Fri, Feb 03, 2023 at 08:26:44AM +0000, Tian, Kevin wrote:
> > > > From: Nicolin Chen <nicolinc@nvidia.com>
> > > > Sent: Thursday, February 2, 2023 3:05 PM
> > > >
> > > > All drivers are already required to support changing between active
> > > > UNMANAGED domains when using their attach_dev ops.
> > >
> > > All drivers which don't have *broken* UNMANAGED domain?
> > 
> > No, all drivers.. It has always been used by VFIO.
> 
> existing iommu_attach_group() doesn't support changing between
> two UNMANAGED domains. only from default->unmanaged or
> blocking->unmanaged.

Yes, but before we added the blocking domains VFIO was changing
between unmanaged domains. Blocking domains are so new that no driver
could have suddenly started to depend on this.

> > > Can you elaborate the error handling here? Ideally if
> > > __iommu_group_set_domain() fails then group->domain shouldn't
> > > be changed.
> > 
> > That isn't what it implements though. The internal helper leaves
> > things in a mess, it is for the caller to fix it, and it depends on
> > the caller what that means.
> 
> I didn't see any warning of the mess and the caller's responsibility
> in __iommu_group_set_domain(). Can it be documented clearly
> so if someone wants to add a new caller on it he can clearly know
> what to do?

That would be nice..
 
> and why doesn't iommu_attach_group() need to do anything
> when an attach attempt fails? In the end it calls the same
> iommu_group_do_attach_device() as __iommu_group_set_domain()
> does.

That's a bug for sure.

 
> btw looking at the code __iommu_group_set_domain():
> 
> 	 * Note that this is called in error unwind paths, attaching to a
> 	 * domain that has already been attached cannot fail.
> 	 */
> 	ret = __iommu_group_for_each_dev(group, new_domain,
> 				iommu_group_do_attach_device);
> 
> with that we don't need fall back to core domain in above error
> unwinding per this comment.

That does make some sense.

I tried to make a patch to consolidate all this error handling once,
that would be the better way to approach this.

> > In this case the API cannot retain a hidden reference to the new
> > domain, so it must be purged, one way or another.
> 
> Could you elaborate where the hidden reference is retained?

Inside the driver, it can keep track of the domain pointer if
attach_dev succeeds

Jason

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v1 0/8] Add IO page table replacement support
  2023-02-06  6:39     ` Tian, Kevin
@ 2023-02-06 13:26       ` Jason Gunthorpe
  2023-02-07  0:34         ` Tian, Kevin
  0 siblings, 1 reply; 41+ messages in thread
From: Jason Gunthorpe @ 2023-02-06 13:26 UTC (permalink / raw)
  To: Tian, Kevin
  Cc: Nicolin Chen, joro, will, robin.murphy, alex.williamson, shuah,
	Liu, Yi L, linux-kernel, iommu, kvm, linux-kselftest, baolu.lu

On Mon, Feb 06, 2023 at 06:39:29AM +0000, Tian, Kevin wrote:
> > From: Jason Gunthorpe <jgg@nvidia.com>
> > Sent: Friday, February 3, 2023 11:04 PM
> > 
> > On Fri, Feb 03, 2023 at 08:09:30AM +0000, Tian, Kevin wrote:
> > > > From: Nicolin Chen <nicolinc@nvidia.com>
> > > > Sent: Thursday, February 2, 2023 3:05 PM
> > > >
> > > > QEMU with this feature should have the vIOMMU maintain a cache of the
> > > > guest io page table addresses and assign a unique IOAS to each unique
> > > > guest page table.
> > >
> > > I didn't get why we impose such requirement to userspace.
> > 
> > I read this as implementation guidance for qemu. qemu can do what it
> > wants of course
> > 
> 
> sure but I still didn't get why this is a guidance specific to the
> new replace cmd...

I think the guidance is about the change to VFIO uAPI where it is now
possible to change the domain attached, previously that was not
possible

Jason

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v1 8/8] vfio-iommufd: Support IO page table replacement
  2023-02-06  8:49   ` Tian, Kevin
@ 2023-02-06 18:54     ` Nicolin Chen
  0 siblings, 0 replies; 41+ messages in thread
From: Nicolin Chen @ 2023-02-06 18:54 UTC (permalink / raw)
  To: Tian, Kevin
  Cc: jgg, joro, will, robin.murphy, alex.williamson, shuah, Liu, Yi L,
	linux-kernel, iommu, kvm, linux-kselftest, baolu.lu

On Mon, Feb 06, 2023 at 08:49:16AM +0000, Tian, Kevin wrote:
> External email: Use caution opening links or attachments
> 
> 
> > From: Nicolin Chen <nicolinc@nvidia.com>
> > Sent: Thursday, February 2, 2023 3:05 PM
> >
> > Remove the vdev->iommufd_attached check, since the kernel can internally
> > handle a replacement of the IO page table now.
> >
> > Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
> > ---
> >  drivers/vfio/iommufd.c | 3 ---
> >  1 file changed, 3 deletions(-)
> >
> > diff --git a/drivers/vfio/iommufd.c b/drivers/vfio/iommufd.c
> > index 7e09defbcffe..f9e89b3eef69 100644
> > --- a/drivers/vfio/iommufd.c
> > +++ b/drivers/vfio/iommufd.c
> > @@ -111,9 +111,6 @@ int vfio_iommufd_physical_attach_ioas(struct
> > vfio_device *vdev, u32 *pt_id)
> >               return 0;
> >       }
> >
> > -     if (vdev->iommufd_attached)
> > -             return -EBUSY;
> > -
> >       rc = iommufd_device_attach(vdev->iommufd_device, pt_id);
> >       if (rc)
> >               return rc;
> 
> also update vfio uapi description to explain the replace semantics.

Will add that. Thanks!

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v1 7/8] iommufd/device: Use iommu_group_replace_domain()
  2023-02-06  8:46   ` Tian, Kevin
@ 2023-02-06 19:17     ` Nicolin Chen
  2023-02-07  0:37       ` Tian, Kevin
  0 siblings, 1 reply; 41+ messages in thread
From: Nicolin Chen @ 2023-02-06 19:17 UTC (permalink / raw)
  To: Tian, Kevin
  Cc: jgg, joro, will, robin.murphy, alex.williamson, shuah, Liu, Yi L,
	linux-kernel, iommu, kvm, linux-kselftest, baolu.lu

On Mon, Feb 06, 2023 at 08:46:04AM +0000, Tian, Kevin wrote:
> > From: Nicolin Chen <nicolinc@nvidia.com>
> > Sent: Thursday, February 2, 2023 3:05 PM
> >
> > @@ -246,6 +249,18 @@ static int iommufd_device_do_attach(struct
> > iommufd_device *idev,
> >               }
> >       }
> >
> > +     if (cur_hwpt) {
> > +             /* Replace the cur_hwpt */
> > +             mutex_lock(&cur_hwpt->devices_lock);
> > +             if (cur_hwpt->ioas != hwpt->ioas)
> > +                     iopt_remove_reserved_iova(&cur_hwpt->ioas->iopt,
> > +                                               idev->dev);
> > +             list_del(&cur_hwpt->hwpt_item);
> 
> emmm shouldn't this be done only when the device is the last
> one attached to the hwpt? and if it's the last one you should
> also iopt_table_remove_domain() together with list_del, i.e.
> similar housekeeping as done in iommufd_device_detach().

You are right. I had another patch on top of this series,
moving this list_del() and iopt_table_remove_domain() to
the destroy() callback, so I overlooked.

And I just found that the list_add_del(hwpt_item) in the
IOMMUFD_OBJ_HW_PAGETABLE case doesn't seem to call at the
first device's attachment. So, I think that we might need
my previous "symmetric" patch in this series too.

Will fix in v2. Thanks!

Nic

^ permalink raw reply	[flat|nested] 41+ messages in thread

* RE: [PATCH v1 2/8] iommu: Introduce a new iommu_group_replace_domain() API
  2023-02-06 13:25         ` Jason Gunthorpe
@ 2023-02-07  0:32           ` Tian, Kevin
  2023-02-07 12:22             ` Jason Gunthorpe
  2023-02-07 19:16           ` Nicolin Chen
  1 sibling, 1 reply; 41+ messages in thread
From: Tian, Kevin @ 2023-02-07  0:32 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Nicolin Chen, joro, will, robin.murphy, alex.williamson, shuah,
	Liu, Yi L, linux-kernel, iommu, kvm, linux-kselftest, baolu.lu

> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Monday, February 6, 2023 9:25 PM
> 
> On Mon, Feb 06, 2023 at 06:57:35AM +0000, Tian, Kevin wrote:
> > > From: Jason Gunthorpe <jgg@nvidia.com>
> > > Sent: Friday, February 3, 2023 11:03 PM
> > >
> > > On Fri, Feb 03, 2023 at 08:26:44AM +0000, Tian, Kevin wrote:
> > > > > From: Nicolin Chen <nicolinc@nvidia.com>
> > > > > Sent: Thursday, February 2, 2023 3:05 PM
> > > > >
> > > > > All drivers are already required to support changing between active
> > > > > UNMANAGED domains when using their attach_dev ops.
> > > >
> > > > All drivers which don't have *broken* UNMANAGED domain?
> > >
> > > No, all drivers.. It has always been used by VFIO.
> >
> > existing iommu_attach_group() doesn't support changing between
> > two UNMANAGED domains. only from default->unmanaged or
> > blocking->unmanaged.
> 
> Yes, but before we added the blocking domains VFIO was changing
> between unmanaged domains. Blocking domains are so new that no driver
> could have suddenly started to depend on this.

In legacy VFIO unmanaged domain was 1:1 associated with vfio
container. I didn't say how a group can switch between two
containers w/o going through transition to/from the default
domain, i.e. detach from 1st container and then attach to the 2nd.

> > btw looking at the code __iommu_group_set_domain():
> >
> > 	 * Note that this is called in error unwind paths, attaching to a
> > 	 * domain that has already been attached cannot fail.
> > 	 */
> > 	ret = __iommu_group_for_each_dev(group, new_domain,
> > 				iommu_group_do_attach_device);
> >
> > with that we don't need fall back to core domain in above error
> > unwinding per this comment.
> 
> That does make some sense.
> 
> I tried to make a patch to consolidate all this error handling once,
> that would be the better way to approach this.

that would be good.

> 
> > > In this case the API cannot retain a hidden reference to the new
> > > domain, so it must be purged, one way or another.
> >
> > Could you elaborate where the hidden reference is retained?
> 
> Inside the driver, it can keep track of the domain pointer if
> attach_dev succeeds
> 

Are you referring to no error unwinding in __iommu_group_for_each_dev()
so if it is failed some devices may have attach_dev succeeds then simply
recovering group->domain in __iommu_attach_group() is insufficient?

^ permalink raw reply	[flat|nested] 41+ messages in thread

* RE: [PATCH v1 0/8] Add IO page table replacement support
  2023-02-06 13:26       ` Jason Gunthorpe
@ 2023-02-07  0:34         ` Tian, Kevin
  0 siblings, 0 replies; 41+ messages in thread
From: Tian, Kevin @ 2023-02-07  0:34 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Nicolin Chen, joro, will, robin.murphy, alex.williamson, shuah,
	Liu, Yi L, linux-kernel, iommu, kvm, linux-kselftest, baolu.lu

> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Monday, February 6, 2023 9:26 PM
> 
> On Mon, Feb 06, 2023 at 06:39:29AM +0000, Tian, Kevin wrote:
> > > From: Jason Gunthorpe <jgg@nvidia.com>
> > > Sent: Friday, February 3, 2023 11:04 PM
> > >
> > > On Fri, Feb 03, 2023 at 08:09:30AM +0000, Tian, Kevin wrote:
> > > > > From: Nicolin Chen <nicolinc@nvidia.com>
> > > > > Sent: Thursday, February 2, 2023 3:05 PM
> > > > >
> > > > > QEMU with this feature should have the vIOMMU maintain a cache of
> the
> > > > > guest io page table addresses and assign a unique IOAS to each
> unique
> > > > > guest page table.
> > > >
> > > > I didn't get why we impose such requirement to userspace.
> > >
> > > I read this as implementation guidance for qemu. qemu can do what it
> > > wants of course
> > >
> >
> > sure but I still didn't get why this is a guidance specific to the
> > new replace cmd...
> 
> I think the guidance is about the change to VFIO uAPI where it is now
> possible to change the domain attached, previously that was not
> possible
> 

that is fine. I just didn't get why the original description emphasized
the cache and unique IOAS aspects in Qemu.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* RE: [PATCH v1 7/8] iommufd/device: Use iommu_group_replace_domain()
  2023-02-06 19:17     ` Nicolin Chen
@ 2023-02-07  0:37       ` Tian, Kevin
  0 siblings, 0 replies; 41+ messages in thread
From: Tian, Kevin @ 2023-02-07  0:37 UTC (permalink / raw)
  To: Nicolin Chen
  Cc: jgg, joro, will, robin.murphy, alex.williamson, shuah, Liu, Yi L,
	linux-kernel, iommu, kvm, linux-kselftest, baolu.lu

> From: Nicolin Chen <nicolinc@nvidia.com>
> Sent: Tuesday, February 7, 2023 3:18 AM
> 
> On Mon, Feb 06, 2023 at 08:46:04AM +0000, Tian, Kevin wrote:
> > > From: Nicolin Chen <nicolinc@nvidia.com>
> > > Sent: Thursday, February 2, 2023 3:05 PM
> > >
> > > @@ -246,6 +249,18 @@ static int iommufd_device_do_attach(struct
> > > iommufd_device *idev,
> > >               }
> > >       }
> > >
> > > +     if (cur_hwpt) {
> > > +             /* Replace the cur_hwpt */
> > > +             mutex_lock(&cur_hwpt->devices_lock);
> > > +             if (cur_hwpt->ioas != hwpt->ioas)
> > > +                     iopt_remove_reserved_iova(&cur_hwpt->ioas->iopt,
> > > +                                               idev->dev);
> > > +             list_del(&cur_hwpt->hwpt_item);
> >
> > emmm shouldn't this be done only when the device is the last
> > one attached to the hwpt? and if it's the last one you should
> > also iopt_table_remove_domain() together with list_del, i.e.
> > similar housekeeping as done in iommufd_device_detach().
> 
> You are right. I had another patch on top of this series,
> moving this list_del() and iopt_table_remove_domain() to
> the destroy() callback, so I overlooked.
> 
> And I just found that the list_add_del(hwpt_item) in the
> IOMMUFD_OBJ_HW_PAGETABLE case doesn't seem to call at the
> first device's attachment. So, I think that we might need
> my previous "symmetric" patch in this series too.
> 

Yes, that makes sense.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v1 2/8] iommu: Introduce a new iommu_group_replace_domain() API
  2023-02-07  0:32           ` Tian, Kevin
@ 2023-02-07 12:22             ` Jason Gunthorpe
  2023-02-08  4:25               ` Tian, Kevin
  0 siblings, 1 reply; 41+ messages in thread
From: Jason Gunthorpe @ 2023-02-07 12:22 UTC (permalink / raw)
  To: Tian, Kevin
  Cc: Nicolin Chen, joro, will, robin.murphy, alex.williamson, shuah,
	Liu, Yi L, linux-kernel, iommu, kvm, linux-kselftest, baolu.lu

On Tue, Feb 07, 2023 at 12:32:50AM +0000, Tian, Kevin wrote:
> > From: Jason Gunthorpe <jgg@nvidia.com>
> > Sent: Monday, February 6, 2023 9:25 PM
> > 
> > On Mon, Feb 06, 2023 at 06:57:35AM +0000, Tian, Kevin wrote:
> > > > From: Jason Gunthorpe <jgg@nvidia.com>
> > > > Sent: Friday, February 3, 2023 11:03 PM
> > > >
> > > > On Fri, Feb 03, 2023 at 08:26:44AM +0000, Tian, Kevin wrote:
> > > > > > From: Nicolin Chen <nicolinc@nvidia.com>
> > > > > > Sent: Thursday, February 2, 2023 3:05 PM
> > > > > >
> > > > > > All drivers are already required to support changing between active
> > > > > > UNMANAGED domains when using their attach_dev ops.
> > > > >
> > > > > All drivers which don't have *broken* UNMANAGED domain?
> > > >
> > > > No, all drivers.. It has always been used by VFIO.
> > >
> > > existing iommu_attach_group() doesn't support changing between
> > > two UNMANAGED domains. only from default->unmanaged or
> > > blocking->unmanaged.
> > 
> > Yes, but before we added the blocking domains VFIO was changing
> > between unmanaged domains. Blocking domains are so new that no driver
> > could have suddenly started to depend on this.
> 
> In legacy VFIO unmanaged domain was 1:1 associated with vfio
> container. I didn't say how a group can switch between two
> containers w/o going through transition to/from the default
> domain, i.e. detach from 1st container and then attach to the 2nd.

Yes, in the past we went through the default domain which is basically
another unmanaged domain type. So unmanaged -> unmanaged is OK.

> > Inside the driver, it can keep track of the domain pointer if
> > attach_dev succeeds
> 
> Are you referring to no error unwinding in __iommu_group_for_each_dev()
> so if it is failed some devices may have attach_dev succeeds then simply
> recovering group->domain in __iommu_attach_group() is insufficient?

Yes

Jason

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v1 2/8] iommu: Introduce a new iommu_group_replace_domain() API
  2023-02-06 13:25         ` Jason Gunthorpe
  2023-02-07  0:32           ` Tian, Kevin
@ 2023-02-07 19:16           ` Nicolin Chen
  1 sibling, 0 replies; 41+ messages in thread
From: Nicolin Chen @ 2023-02-07 19:16 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Tian, Kevin, joro, will, robin.murphy, alex.williamson, shuah,
	Liu, Yi L, linux-kernel, iommu, kvm, linux-kselftest, baolu.lu

On Mon, Feb 06, 2023 at 09:25:17AM -0400, Jason Gunthorpe wrote:

> > > > Can you elaborate the error handling here? Ideally if
> > > > __iommu_group_set_domain() fails then group->domain shouldn't
> > > > be changed.
> > > 
> > > That isn't what it implements though. The internal helper leaves
> > > things in a mess, it is for the caller to fix it, and it depends on
> > > the caller what that means.
> > 
> > I didn't see any warning of the mess and the caller's responsibility
> > in __iommu_group_set_domain(). Can it be documented clearly
> > so if someone wants to add a new caller on it he can clearly know
> > what to do?
> 
> That would be nice..

I'd expect the doc to come with some other patch/series than this
replace series, so I think we should be fine without adding a line
of comments in this patch?

> > btw looking at the code __iommu_group_set_domain():
> > 
> > 	 * Note that this is called in error unwind paths, attaching to a
> > 	 * domain that has already been attached cannot fail.
> > 	 */
> > 	ret = __iommu_group_for_each_dev(group, new_domain,
> > 				iommu_group_do_attach_device);
> > 
> > with that we don't need fall back to core domain in above error
> > unwinding per this comment.
> 
> That does make some sense.
> 
> I tried to make a patch to consolidate all this error handling once,
> that would be the better way to approach this.

Then, I'll drop the core-domain line. Combining my reply above:

+	mutex_lock(&group->mutex);
+	ret = __iommu_group_set_domain(group, new_domain);
+	if (ret)
+		__iommu_group_set_domain(group, group->domain);
+	mutex_unlock(&group->mutex);

Will wrap things up and send v2 today.

Thanks
Nic

^ permalink raw reply	[flat|nested] 41+ messages in thread

* RE: [PATCH v1 2/8] iommu: Introduce a new iommu_group_replace_domain() API
  2023-02-07 12:22             ` Jason Gunthorpe
@ 2023-02-08  4:25               ` Tian, Kevin
  2023-02-08 12:42                 ` Jason Gunthorpe
  0 siblings, 1 reply; 41+ messages in thread
From: Tian, Kevin @ 2023-02-08  4:25 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Nicolin Chen, joro, will, robin.murphy, alex.williamson, shuah,
	Liu, Yi L, linux-kernel, iommu, kvm, linux-kselftest, baolu.lu

> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Tuesday, February 7, 2023 8:23 PM
> 
> On Tue, Feb 07, 2023 at 12:32:50AM +0000, Tian, Kevin wrote:
> > > From: Jason Gunthorpe <jgg@nvidia.com>
> > > Sent: Monday, February 6, 2023 9:25 PM
> > >
> > > On Mon, Feb 06, 2023 at 06:57:35AM +0000, Tian, Kevin wrote:
> > > > > From: Jason Gunthorpe <jgg@nvidia.com>
> > > > > Sent: Friday, February 3, 2023 11:03 PM
> > > > >
> > > > > On Fri, Feb 03, 2023 at 08:26:44AM +0000, Tian, Kevin wrote:
> > > > > > > From: Nicolin Chen <nicolinc@nvidia.com>
> > > > > > > Sent: Thursday, February 2, 2023 3:05 PM
> > > > > > >
> > > > > > > All drivers are already required to support changing between
> active
> > > > > > > UNMANAGED domains when using their attach_dev ops.
> > > > > >
> > > > > > All drivers which don't have *broken* UNMANAGED domain?
> > > > >
> > > > > No, all drivers.. It has always been used by VFIO.
> > > >
> > > > existing iommu_attach_group() doesn't support changing between
> > > > two UNMANAGED domains. only from default->unmanaged or
> > > > blocking->unmanaged.
> > >
> > > Yes, but before we added the blocking domains VFIO was changing
> > > between unmanaged domains. Blocking domains are so new that no
> driver
> > > could have suddenly started to depend on this.
> >
> > In legacy VFIO unmanaged domain was 1:1 associated with vfio
> > container. I didn't say how a group can switch between two
> > containers w/o going through transition to/from the default
> > domain, i.e. detach from 1st container and then attach to the 2nd.
> 
> Yes, in the past we went through the default domain which is basically
> another unmanaged domain type. So unmanaged -> unmanaged is OK.
> 

it's right in concept but confusing in current context whether DMA
domain still has its own type. That's why I replied earlier the statement
makes more sense after your patch which cleans up the domain type
is merged.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v1 2/8] iommu: Introduce a new iommu_group_replace_domain() API
  2023-02-08  4:25               ` Tian, Kevin
@ 2023-02-08 12:42                 ` Jason Gunthorpe
  0 siblings, 0 replies; 41+ messages in thread
From: Jason Gunthorpe @ 2023-02-08 12:42 UTC (permalink / raw)
  To: Tian, Kevin
  Cc: Nicolin Chen, joro, will, robin.murphy, alex.williamson, shuah,
	Liu, Yi L, linux-kernel, iommu, kvm, linux-kselftest, baolu.lu

On Wed, Feb 08, 2023 at 04:25:58AM +0000, Tian, Kevin wrote:
> > From: Jason Gunthorpe <jgg@nvidia.com>
> > Sent: Tuesday, February 7, 2023 8:23 PM
> > 
> > On Tue, Feb 07, 2023 at 12:32:50AM +0000, Tian, Kevin wrote:
> > > > From: Jason Gunthorpe <jgg@nvidia.com>
> > > > Sent: Monday, February 6, 2023 9:25 PM
> > > >
> > > > On Mon, Feb 06, 2023 at 06:57:35AM +0000, Tian, Kevin wrote:
> > > > > > From: Jason Gunthorpe <jgg@nvidia.com>
> > > > > > Sent: Friday, February 3, 2023 11:03 PM
> > > > > >
> > > > > > On Fri, Feb 03, 2023 at 08:26:44AM +0000, Tian, Kevin wrote:
> > > > > > > > From: Nicolin Chen <nicolinc@nvidia.com>
> > > > > > > > Sent: Thursday, February 2, 2023 3:05 PM
> > > > > > > >
> > > > > > > > All drivers are already required to support changing between
> > active
> > > > > > > > UNMANAGED domains when using their attach_dev ops.
> > > > > > >
> > > > > > > All drivers which don't have *broken* UNMANAGED domain?
> > > > > >
> > > > > > No, all drivers.. It has always been used by VFIO.
> > > > >
> > > > > existing iommu_attach_group() doesn't support changing between
> > > > > two UNMANAGED domains. only from default->unmanaged or
> > > > > blocking->unmanaged.
> > > >
> > > > Yes, but before we added the blocking domains VFIO was changing
> > > > between unmanaged domains. Blocking domains are so new that no
> > driver
> > > > could have suddenly started to depend on this.
> > >
> > > In legacy VFIO unmanaged domain was 1:1 associated with vfio
> > > container. I didn't say how a group can switch between two
> > > containers w/o going through transition to/from the default
> > > domain, i.e. detach from 1st container and then attach to the 2nd.
> > 
> > Yes, in the past we went through the default domain which is basically
> > another unmanaged domain type. So unmanaged -> unmanaged is OK.
> > 
> 
> it's right in concept but confusing in current context whether DMA
> domain still has its own type. That's why I replied earlier the statement
> makes more sense after your patch which cleans up the domain type
> is merged.

We are just reasoning about why the existing drivers are safe with
this - and my draft patch to remove DMA simply demonstrates that DMA
== UNMANAGED, and the above reasoning about VFIO in the past
demonstrates that this has historically be expected of drivers.

So I'm not so worried about patch order as long as things do work

Jason

^ permalink raw reply	[flat|nested] 41+ messages in thread

end of thread, other threads:[~2023-02-08 12:42 UTC | newest]

Thread overview: 41+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-02-02  7:05 [PATCH v1 0/8] Add IO page table replacement support Nicolin Chen
2023-02-02  7:05 ` [PATCH v1 1/8] iommu: Move dev_iommu_ops() to private header Nicolin Chen
2023-02-02  7:05 ` [PATCH v1 2/8] iommu: Introduce a new iommu_group_replace_domain() API Nicolin Chen
2023-02-02 10:21   ` Baolu Lu
2023-02-02 19:14     ` Nicolin Chen
2023-02-03  1:33       ` Baolu Lu
2023-02-03  1:41         ` Nicolin Chen
2023-02-03  2:35           ` Baolu Lu
2023-02-03  8:26   ` Tian, Kevin
2023-02-03 15:03     ` Jason Gunthorpe
2023-02-03 17:53       ` Nicolin Chen
2023-02-06  6:57       ` Tian, Kevin
2023-02-06 13:25         ` Jason Gunthorpe
2023-02-07  0:32           ` Tian, Kevin
2023-02-07 12:22             ` Jason Gunthorpe
2023-02-08  4:25               ` Tian, Kevin
2023-02-08 12:42                 ` Jason Gunthorpe
2023-02-07 19:16           ` Nicolin Chen
2023-02-03 17:45     ` Nicolin Chen
2023-02-06  6:40       ` Tian, Kevin
2023-02-02  7:05 ` [PATCH v1 3/8] iommufd: Create access in vfio_iommufd_emulated_bind() Nicolin Chen
2023-02-03  9:23   ` Tian, Kevin
2023-02-03  9:28   ` Tian, Kevin
2023-02-02  7:05 ` [PATCH v1 4/8] iommufd/selftest: Add IOMMU_TEST_OP_ACCESS_SET_IOAS coverage Nicolin Chen
2023-02-02  7:05 ` [PATCH v1 5/8] iommufd: Add replace support in iommufd_access_set_ioas() Nicolin Chen
2023-02-03 10:10   ` Tian, Kevin
2023-02-03 12:31     ` Jason Gunthorpe
2023-02-03 22:25       ` Nicolin Chen
2023-02-02  7:05 ` [PATCH v1 6/8] iommufd/selftest: Add coverage for access->ioas replacement Nicolin Chen
2023-02-02  7:05 ` [PATCH v1 7/8] iommufd/device: Use iommu_group_replace_domain() Nicolin Chen
2023-02-06  8:46   ` Tian, Kevin
2023-02-06 19:17     ` Nicolin Chen
2023-02-07  0:37       ` Tian, Kevin
2023-02-02  7:05 ` [PATCH v1 8/8] vfio-iommufd: Support IO page table replacement Nicolin Chen
2023-02-06  8:49   ` Tian, Kevin
2023-02-06 18:54     ` Nicolin Chen
2023-02-03  8:09 ` [PATCH v1 0/8] Add IO page table replacement support Tian, Kevin
2023-02-03 15:04   ` Jason Gunthorpe
2023-02-06  6:39     ` Tian, Kevin
2023-02-06 13:26       ` Jason Gunthorpe
2023-02-07  0:34         ` Tian, Kevin

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.