All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v8 0/4] cover-letter: Add IO page table replacement support
@ 2023-07-24 19:47 Nicolin Chen
  2023-07-24 19:47 ` [PATCH v8 1/4] vfio: Do not allow !ops->dma_unmap in vfio_pin/unpin_pages() Nicolin Chen
                   ` (3 more replies)
  0 siblings, 4 replies; 22+ messages in thread
From: Nicolin Chen @ 2023-07-24 19:47 UTC (permalink / raw)
  To: jgg, kevin.tian, alex.williamson
  Cc: yi.l.liu, joro, will, robin.murphy, shuah, linux-kernel, iommu,
	kvm, linux-kselftest, mjrosato, farman

[ This series depends on the VFIO device cdev series ]

Changelog
v8:
 * Rebased on top of Jason's iommufd_hwpt series and then cdev v15 series:
   https://lore.kernel.org/all/0-v8-6659224517ea+532-iommufd_alloc_jgg@nvidia.com/
   https://lore.kernel.org/kvm/20230718135551.6592-1-yi.l.liu@intel.com/
 * Changed the order of detach() and attach() in replace(), to fix a bug
v7:
 https://lore.kernel.org/all/cover.1683593831.git.nicolinc@nvidia.com/
 * Rebased on top of v6.4-rc1 and cdev v11 candidate
 * Fixed a wrong file in replace() API patch
 * Added Kevin's "Reviewed-by" to replace() API patch
v6:
 https://lore.kernel.org/all/cover.1679939952.git.nicolinc@nvidia.com/
 * Rebased on top of cdev v8 series
   https://lore.kernel.org/kvm/20230327094047.47215-1-yi.l.liu@intel.com/
 * Added "Reviewed-by" from Kevin to PATCH-4
 * Squashed access->ioas updating lines into iommufd_access_change_pt(),
   and changed function return type accordingly for simplification.
v5:
 https://lore.kernel.org/all/cover.1679559476.git.nicolinc@nvidia.com/
 * Kept the cmd->id in the iommufd_test_create_access() so the access can
   be created with an ioas by default. Then, renamed the previous ioctl
   IOMMU_TEST_OP_ACCESS_SET_IOAS to IOMMU_TEST_OP_ACCESS_REPLACE_IOAS, so
   it would be used to replace an access->ioas pointer.
 * Added iommufd_access_replace() API after the introductions of the other
   two APIs iommufd_access_attach() and iommufd_access_detach().
 * Since vdev->iommufd_attached is also set in emulated pathway too, call
   iommufd_access_update(), similar to the physical pathway.
v4:
 https://lore.kernel.org/all/cover.1678284812.git.nicolinc@nvidia.com/
 * Rebased on top of Jason's series adding replace() and hwpt_alloc()
 https://lore.kernel.org/all/0-v2-51b9896e7862+8a8c-iommufd_alloc_jgg@nvidia.com/
 * Rebased on top of cdev series v6
 https://lore.kernel.org/kvm/20230308132903.465159-1-yi.l.liu@intel.com/
 * Dropped the patch that's moved to cdev series.
 * Added unmap function pointer sanity before calling it.
 * Added "Reviewed-by" from Kevin and Yi.
 * Added back the VFIO change updating the ATTACH uAPI.
v3:
 https://lore.kernel.org/all/cover.1677288789.git.nicolinc@nvidia.com/
 * Rebased on top of Jason's iommufd_hwpt branch:
 https://lore.kernel.org/all/0-v2-406f7ac07936+6a-iommufd_hwpt_jgg@nvidia.com/
 * Dropped patches from this series accordingly. There were a couple of
   VFIO patches that will be submitted after the VFIO cdev series. Also,
   renamed the series to be "emulated".
 * Moved dma_unmap sanity patch to the first in the series.
 * Moved dma_unmap sanity to cover both VFIO and IOMMUFD pathways.
 * Added Kevin's "Reviewed-by" to two of the patches.
 * Fixed a NULL pointer bug in vfio_iommufd_emulated_bind().
 * Moved unmap() call to the common place in iommufd_access_set_ioas().
v2:
 https://lore.kernel.org/all/cover.1675802050.git.nicolinc@nvidia.com/
 * Rebased on top of vfio_device cdev v2 series.
 * Update the kdoc and commit message of iommu_group_replace_domain().
 * Dropped revert-to-core-domain part in iommu_group_replace_domain().
 * Dropped !ops->dma_unmap check in vfio_iommufd_emulated_attach_ioas().
 * Added missing rc value in vfio_iommufd_emulated_attach_ioas() from the
   iommufd_access_set_ioas() call.
 * Added a new patch in vfio_main to deny vfio_pin/unpin_pages() calls if
   vdev->ops->dma_unmap is not implemented.
 * Added a __iommmufd_device_detach helper and let the replace routine do
   a partial detach().
 * Added restriction on auto_domains to use the replace feature.
 * Added the patch "iommufd/device: Make hwpt_list list_add/del symmetric"
   from the has_group removal series.
v1:
 https://lore.kernel.org/all/cover.1675320212.git.nicolinc@nvidia.com/

Hi all,

The existing IOMMU APIs provide a pair of functions: iommu_attach_group()
for callers to attach a device from the default_domain (NULL if not being
supported) to a given iommu domain, and iommu_detach_group() for callers
to detach a device from a given domain to the default_domain. Internally,
the detach_dev op is deprecated for the newer drivers with default_domain.
This means that those drivers likely can switch an attaching domain to
another one, without stagging the device at a blocking or default domain,
for use cases such as:
1) vPASID mode, when a guest wants to replace a single pasid (PASID=0)
   table with a larger table (PASID=N)
2) Nesting mode, when switching the attaching device from an S2 domain
   to an S1 domain, or when switching between relevant S1 domains.

This series is rebased on top of Jason Gunthorpe's series that introduces
iommu_group_replace_domain API and IOMMUFD infrastructure for the IOMMUFD
"physical" devices. The IOMMUFD "emulated" deivces will need some extra
steps to replace the access->ioas object and its iopt pointer.

You can also find this series on Github:
https://github.com/nicolinc/iommufd/commits/iommu_group_replace_domain-v8

Thank you
Nicolin Chen

Nicolin Chen (4):
  vfio: Do not allow !ops->dma_unmap in vfio_pin/unpin_pages()
  iommufd: Add iommufd_access_replace() API
  iommufd/selftest: Add IOMMU_TEST_OP_ACCESS_REPLACE_IOAS coverage
  vfio: Support IO page table replacement

 drivers/iommu/iommufd/device.c                | 72 ++++++++++++++-----
 drivers/iommu/iommufd/iommufd_test.h          |  4 ++
 drivers/iommu/iommufd/selftest.c              | 19 +++++
 drivers/vfio/iommufd.c                        | 11 +--
 drivers/vfio/vfio_main.c                      |  4 ++
 include/linux/iommufd.h                       |  1 +
 include/uapi/linux/vfio.h                     |  6 ++
 tools/testing/selftests/iommu/iommufd.c       | 29 +++++++-
 tools/testing/selftests/iommu/iommufd_utils.h | 19 +++++
 9 files changed, 142 insertions(+), 23 deletions(-)

-- 
2.41.0


^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH v8 1/4] vfio: Do not allow !ops->dma_unmap in vfio_pin/unpin_pages()
  2023-07-24 19:47 [PATCH v8 0/4] cover-letter: Add IO page table replacement support Nicolin Chen
@ 2023-07-24 19:47 ` Nicolin Chen
  2023-07-26 17:33   ` Alex Williamson
  2023-07-24 19:47 ` [PATCH v8 2/4] iommufd: Add iommufd_access_replace() API Nicolin Chen
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 22+ messages in thread
From: Nicolin Chen @ 2023-07-24 19:47 UTC (permalink / raw)
  To: jgg, kevin.tian, alex.williamson
  Cc: yi.l.liu, joro, will, robin.murphy, shuah, linux-kernel, iommu,
	kvm, linux-kselftest, mjrosato, farman

A driver that doesn't implement ops->dma_unmap shouldn't be allowed to do
vfio_pin/unpin_pages(), though it can use vfio_dma_rw() to access an iova
range. Deny !ops->dma_unmap cases in vfio_pin/unpin_pages().

Suggested-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Yi Liu <yi.l.liu@intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
---
 drivers/vfio/vfio_main.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
index 902f06e52c48..0da8ed81a97d 100644
--- a/drivers/vfio/vfio_main.c
+++ b/drivers/vfio/vfio_main.c
@@ -1483,6 +1483,8 @@ int vfio_pin_pages(struct vfio_device *device, dma_addr_t iova,
 	/* group->container cannot change while a vfio device is open */
 	if (!pages || !npage || WARN_ON(!vfio_assert_device_open(device)))
 		return -EINVAL;
+	if (!device->ops->dma_unmap)
+		return -EINVAL;
 	if (vfio_device_has_container(device))
 		return vfio_device_container_pin_pages(device, iova,
 						       npage, prot, pages);
@@ -1520,6 +1522,8 @@ void vfio_unpin_pages(struct vfio_device *device, dma_addr_t iova, int npage)
 {
 	if (WARN_ON(!vfio_assert_device_open(device)))
 		return;
+	if (WARN_ON(!device->ops->dma_unmap))
+		return;
 
 	if (vfio_device_has_container(device)) {
 		vfio_device_container_unpin_pages(device, iova, npage);
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v8 2/4] iommufd: Add iommufd_access_replace() API
  2023-07-24 19:47 [PATCH v8 0/4] cover-letter: Add IO page table replacement support Nicolin Chen
  2023-07-24 19:47 ` [PATCH v8 1/4] vfio: Do not allow !ops->dma_unmap in vfio_pin/unpin_pages() Nicolin Chen
@ 2023-07-24 19:47 ` Nicolin Chen
  2023-07-26 14:30   ` Jason Gunthorpe
  2023-07-24 19:47 ` [PATCH v8 3/4] iommufd/selftest: Add IOMMU_TEST_OP_ACCESS_REPLACE_IOAS coverage Nicolin Chen
  2023-07-24 19:47 ` [PATCH v8 4/4] vfio: Support IO page table replacement Nicolin Chen
  3 siblings, 1 reply; 22+ messages in thread
From: Nicolin Chen @ 2023-07-24 19:47 UTC (permalink / raw)
  To: jgg, kevin.tian, alex.williamson
  Cc: yi.l.liu, joro, will, robin.murphy, shuah, linux-kernel, iommu,
	kvm, linux-kselftest, mjrosato, farman

Extract all common procedure, between the iommufd_access_attach API and a
new iommufd_access_replace API, to an iommufd_access_change_pt helper. And
separate an unlocked __iommufd_access_detach to use it in the helper too.

This adds a new iommufd_access_replace() for VFIO emulated pathway to use.

Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
---
 drivers/iommu/iommufd/device.c | 72 ++++++++++++++++++++++++++--------
 include/linux/iommufd.h        |  1 +
 2 files changed, 57 insertions(+), 16 deletions(-)

diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c
index 072912d87f53..1015d6c42e2b 100644
--- a/drivers/iommu/iommufd/device.c
+++ b/drivers/iommu/iommufd/device.c
@@ -757,13 +757,11 @@ void iommufd_access_destroy(struct iommufd_access *access)
 }
 EXPORT_SYMBOL_NS_GPL(iommufd_access_destroy, IOMMUFD);
 
-void iommufd_access_detach(struct iommufd_access *access)
+static void __iommufd_access_detach(struct iommufd_access *access)
 {
 	struct iommufd_ioas *cur_ioas = access->ioas;
 
-	mutex_lock(&access->ioas_lock);
-	if (WARN_ON(!access->ioas))
-		goto out;
+	lockdep_assert_held(&access->ioas_lock);
 	/*
 	 * Set ioas to NULL to block any further iommufd_access_pin_pages().
 	 * iommufd_access_unpin_pages() can continue using access->ioas_unpin.
@@ -777,44 +775,86 @@ void iommufd_access_detach(struct iommufd_access *access)
 	}
 	iopt_remove_access(&cur_ioas->iopt, access);
 	refcount_dec(&cur_ioas->obj.users);
+}
+
+void iommufd_access_detach(struct iommufd_access *access)
+{
+	mutex_lock(&access->ioas_lock);
+	if (WARN_ON(!access->ioas))
+		goto out;
+	__iommufd_access_detach(access);
 out:
 	access->ioas_unpin = NULL;
 	mutex_unlock(&access->ioas_lock);
 }
 EXPORT_SYMBOL_NS_GPL(iommufd_access_detach, IOMMUFD);
 
-int iommufd_access_attach(struct iommufd_access *access, u32 ioas_id)
+static int iommufd_access_change_pt(struct iommufd_access *access, u32 ioas_id)
 {
+	struct iommufd_ioas *cur_ioas = access->ioas;
 	struct iommufd_ioas *new_ioas;
-	int rc = 0;
+	int rc;
 
-	mutex_lock(&access->ioas_lock);
-	if (WARN_ON(access->ioas || access->ioas_unpin)) {
-		mutex_unlock(&access->ioas_lock);
-		return -EINVAL;
-	}
+	lockdep_assert_held(&access->ioas_lock);
 
 	new_ioas = iommufd_get_ioas(access->ictx, ioas_id);
-	if (IS_ERR(new_ioas)) {
-		mutex_unlock(&access->ioas_lock);
+	if (IS_ERR(new_ioas))
 		return PTR_ERR(new_ioas);
-	}
+
+	if (cur_ioas)
+		__iommufd_access_detach(access);
 
 	rc = iopt_add_access(&new_ioas->iopt, access);
 	if (rc) {
-		mutex_unlock(&access->ioas_lock);
 		iommufd_put_object(&new_ioas->obj);
+		if (cur_ioas)
+			WARN_ON(iommufd_access_change_pt(access,
+							 cur_ioas->obj.id));
 		return rc;
 	}
 	iommufd_ref_to_users(&new_ioas->obj);
 
 	access->ioas = new_ioas;
 	access->ioas_unpin = new_ioas;
-	mutex_unlock(&access->ioas_lock);
 	return 0;
 }
+
+int iommufd_access_attach(struct iommufd_access *access, u32 ioas_id)
+{
+	int rc;
+
+	mutex_lock(&access->ioas_lock);
+	if (WARN_ON(access->ioas || access->ioas_unpin)) {
+		mutex_unlock(&access->ioas_lock);
+		return -EINVAL;
+	}
+
+	rc = iommufd_access_change_pt(access, ioas_id);
+	mutex_unlock(&access->ioas_lock);
+	return rc;
+}
 EXPORT_SYMBOL_NS_GPL(iommufd_access_attach, IOMMUFD);
 
+int iommufd_access_replace(struct iommufd_access *access, u32 ioas_id)
+{
+	int rc;
+
+	mutex_lock(&access->ioas_lock);
+	if (!access->ioas) {
+		mutex_unlock(&access->ioas_lock);
+		return -ENOENT;
+	}
+	if (access->ioas->obj.id == ioas_id) {
+		mutex_unlock(&access->ioas_lock);
+		return 0;
+	}
+
+	rc = iommufd_access_change_pt(access, ioas_id);
+	mutex_unlock(&access->ioas_lock);
+	return rc;
+}
+EXPORT_SYMBOL_NS_GPL(iommufd_access_replace, IOMMUFD);
+
 /**
  * iommufd_access_notify_unmap - Notify users of an iopt to stop using it
  * @iopt: iopt to work on
diff --git a/include/linux/iommufd.h b/include/linux/iommufd.h
index 0ac60256b659..ffc3a949f837 100644
--- a/include/linux/iommufd.h
+++ b/include/linux/iommufd.h
@@ -49,6 +49,7 @@ iommufd_access_create(struct iommufd_ctx *ictx,
 		      const struct iommufd_access_ops *ops, void *data, u32 *id);
 void iommufd_access_destroy(struct iommufd_access *access);
 int iommufd_access_attach(struct iommufd_access *access, u32 ioas_id);
+int iommufd_access_replace(struct iommufd_access *access, u32 ioas_id);
 void iommufd_access_detach(struct iommufd_access *access);
 
 void iommufd_ctx_get(struct iommufd_ctx *ictx);
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v8 3/4] iommufd/selftest: Add IOMMU_TEST_OP_ACCESS_REPLACE_IOAS coverage
  2023-07-24 19:47 [PATCH v8 0/4] cover-letter: Add IO page table replacement support Nicolin Chen
  2023-07-24 19:47 ` [PATCH v8 1/4] vfio: Do not allow !ops->dma_unmap in vfio_pin/unpin_pages() Nicolin Chen
  2023-07-24 19:47 ` [PATCH v8 2/4] iommufd: Add iommufd_access_replace() API Nicolin Chen
@ 2023-07-24 19:47 ` Nicolin Chen
  2023-07-26 17:04   ` Jason Gunthorpe
  2023-07-24 19:47 ` [PATCH v8 4/4] vfio: Support IO page table replacement Nicolin Chen
  3 siblings, 1 reply; 22+ messages in thread
From: Nicolin Chen @ 2023-07-24 19:47 UTC (permalink / raw)
  To: jgg, kevin.tian, alex.williamson
  Cc: yi.l.liu, joro, will, robin.murphy, shuah, linux-kernel, iommu,
	kvm, linux-kselftest, mjrosato, farman

Add a new IOMMU_TEST_OP_ACCESS_REPLACE_IOAS to allow replacing the
access->ioas, corresponding to the iommufd_access_replace() helper.

Then add a replace coverage as a part of user_copy test case, which
basically repeats the copy test after replacing the old ioas with a
new one.

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
---
 drivers/iommu/iommufd/iommufd_test.h          |  4 +++
 drivers/iommu/iommufd/selftest.c              | 19 ++++++++++++
 tools/testing/selftests/iommu/iommufd.c       | 29 +++++++++++++++++--
 tools/testing/selftests/iommu/iommufd_utils.h | 19 ++++++++++++
 4 files changed, 69 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/iommufd/iommufd_test.h b/drivers/iommu/iommufd/iommufd_test.h
index dd9168a20ddf..258de2253b61 100644
--- a/drivers/iommu/iommufd/iommufd_test.h
+++ b/drivers/iommu/iommufd/iommufd_test.h
@@ -18,6 +18,7 @@ enum {
 	IOMMU_TEST_OP_ACCESS_RW,
 	IOMMU_TEST_OP_SET_TEMP_MEMORY_LIMIT,
 	IOMMU_TEST_OP_MOCK_DOMAIN_REPLACE,
+	IOMMU_TEST_OP_ACCESS_REPLACE_IOAS,
 };
 
 enum {
@@ -91,6 +92,9 @@ struct iommu_test_cmd {
 		struct {
 			__u32 limit;
 		} memory_limit;
+		struct {
+			__u32 ioas_id;
+		} access_replace_ioas;
 	};
 	__u32 last;
 };
diff --git a/drivers/iommu/iommufd/selftest.c b/drivers/iommu/iommufd/selftest.c
index 9d43334e4faf..bb2cd54ca7b6 100644
--- a/drivers/iommu/iommufd/selftest.c
+++ b/drivers/iommu/iommufd/selftest.c
@@ -785,6 +785,22 @@ static int iommufd_test_create_access(struct iommufd_ucmd *ucmd,
 	return rc;
 }
 
+static int iommufd_test_access_replace_ioas(struct iommufd_ucmd *ucmd,
+					    unsigned int access_id,
+					    unsigned int ioas_id)
+{
+	struct selftest_access *staccess;
+	int rc;
+
+	staccess = iommufd_access_get(access_id);
+	if (IS_ERR(staccess))
+		return PTR_ERR(staccess);
+
+	rc = iommufd_access_replace(staccess->access, ioas_id);
+	fput(staccess->file);
+	return rc;
+}
+
 /* Check that the pages in a page array match the pages in the user VA */
 static int iommufd_test_check_pages(void __user *uptr, struct page **pages,
 				    size_t npages)
@@ -1000,6 +1016,9 @@ int iommufd_test(struct iommufd_ucmd *ucmd)
 	case IOMMU_TEST_OP_CREATE_ACCESS:
 		return iommufd_test_create_access(ucmd, cmd->id,
 						  cmd->create_access.flags);
+	case IOMMU_TEST_OP_ACCESS_REPLACE_IOAS:
+		return iommufd_test_access_replace_ioas(
+			ucmd, cmd->id, cmd->access_replace_ioas.ioas_id);
 	case IOMMU_TEST_OP_ACCESS_PAGES:
 		return iommufd_test_access_pages(
 			ucmd, cmd->id, cmd->access_pages.iova,
diff --git a/tools/testing/selftests/iommu/iommufd.c b/tools/testing/selftests/iommu/iommufd.c
index dc09c1de319f..8acd0af37aa5 100644
--- a/tools/testing/selftests/iommu/iommufd.c
+++ b/tools/testing/selftests/iommu/iommufd.c
@@ -1283,7 +1283,13 @@ TEST_F(iommufd_mock_domain, user_copy)
 		.dst_iova = MOCK_APERTURE_START,
 		.length = BUFFER_SIZE,
 	};
-	unsigned int ioas_id;
+	struct iommu_ioas_unmap unmap_cmd = {
+		.size = sizeof(unmap_cmd),
+		.ioas_id = self->ioas_id,
+		.iova = MOCK_APERTURE_START,
+		.length = BUFFER_SIZE,
+	};
+	unsigned int new_ioas_id, ioas_id;
 
 	/* Pin the pages in an IOAS with no domains then copy to an IOAS with domains */
 	test_ioctl_ioas_alloc(&ioas_id);
@@ -1301,11 +1307,30 @@ TEST_F(iommufd_mock_domain, user_copy)
 	ASSERT_EQ(0, ioctl(self->fd, IOMMU_IOAS_COPY, &copy_cmd));
 	check_mock_iova(buffer, MOCK_APERTURE_START, BUFFER_SIZE);
 
+	/* Now replace the ioas with a new one */
+	test_ioctl_ioas_alloc(&new_ioas_id);
+	test_ioctl_ioas_map_id(new_ioas_id, buffer, BUFFER_SIZE,
+			       &copy_cmd.src_iova);
+	test_cmd_access_replace_ioas(access_cmd.id, new_ioas_id);
+
+	/* Destroy the old ioas and cleanup copied mapping */
+	ASSERT_EQ(0, ioctl(self->fd, IOMMU_IOAS_UNMAP, &unmap_cmd));
+	test_ioctl_destroy(ioas_id);
+
+	/* Then run the same test again with the new ioas */
+	access_cmd.access_pages.iova = copy_cmd.src_iova;
+	ASSERT_EQ(0,
+		  ioctl(self->fd, _IOMMU_TEST_CMD(IOMMU_TEST_OP_ACCESS_PAGES),
+			&access_cmd));
+	copy_cmd.src_ioas_id = new_ioas_id;
+	ASSERT_EQ(0, ioctl(self->fd, IOMMU_IOAS_COPY, &copy_cmd));
+	check_mock_iova(buffer, MOCK_APERTURE_START, BUFFER_SIZE);
+
 	test_cmd_destroy_access_pages(
 		access_cmd.id, access_cmd.access_pages.out_access_pages_id);
 	test_cmd_destroy_access(access_cmd.id);
 
-	test_ioctl_destroy(ioas_id);
+	test_ioctl_destroy(new_ioas_id);
 }
 
 TEST_F(iommufd_mock_domain, replace)
diff --git a/tools/testing/selftests/iommu/iommufd_utils.h b/tools/testing/selftests/iommu/iommufd_utils.h
index 53b4d3f2d9fc..70353e68e599 100644
--- a/tools/testing/selftests/iommu/iommufd_utils.h
+++ b/tools/testing/selftests/iommu/iommufd_utils.h
@@ -119,6 +119,25 @@ static int _test_cmd_hwpt_alloc(int fd, __u32 device_id, __u32 pt_id,
 #define test_cmd_hwpt_alloc(device_id, pt_id, hwpt_id) \
 	ASSERT_EQ(0, _test_cmd_hwpt_alloc(self->fd, device_id, pt_id, hwpt_id))
 
+static int _test_cmd_access_replace_ioas(int fd, __u32 access_id,
+					 unsigned int ioas_id)
+{
+	struct iommu_test_cmd cmd = {
+		.size = sizeof(cmd),
+		.op = IOMMU_TEST_OP_ACCESS_REPLACE_IOAS,
+		.id = access_id,
+		.access_replace_ioas = { .ioas_id = ioas_id },
+	};
+	int ret;
+
+	ret = ioctl(fd, IOMMU_TEST_CMD, &cmd);
+	if (ret)
+		return ret;
+	return 0;
+}
+#define test_cmd_access_replace_ioas(access_id, ioas_id) \
+	ASSERT_EQ(0, _test_cmd_access_replace_ioas(self->fd, access_id, ioas_id))
+
 static int _test_cmd_create_access(int fd, unsigned int ioas_id,
 				   __u32 *access_id, unsigned int flags)
 {
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v8 4/4] vfio: Support IO page table replacement
  2023-07-24 19:47 [PATCH v8 0/4] cover-letter: Add IO page table replacement support Nicolin Chen
                   ` (2 preceding siblings ...)
  2023-07-24 19:47 ` [PATCH v8 3/4] iommufd/selftest: Add IOMMU_TEST_OP_ACCESS_REPLACE_IOAS coverage Nicolin Chen
@ 2023-07-24 19:47 ` Nicolin Chen
  2023-07-26 17:04   ` Jason Gunthorpe
  2023-07-26 17:34   ` Alex Williamson
  3 siblings, 2 replies; 22+ messages in thread
From: Nicolin Chen @ 2023-07-24 19:47 UTC (permalink / raw)
  To: jgg, kevin.tian, alex.williamson
  Cc: yi.l.liu, joro, will, robin.murphy, shuah, linux-kernel, iommu,
	kvm, linux-kselftest, mjrosato, farman

Now both the physical path and the emulated path should support an IO page
table replacement. Call iommufd_device_replace/iommufd_access_replace(),
when vdev->iommufd_attached is true.

Also update the VFIO_DEVICE_ATTACH_IOMMUFD_PT kdoc in the uAPI header.

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
---
 drivers/vfio/iommufd.c    | 11 ++++++-----
 include/uapi/linux/vfio.h |  6 ++++++
 2 files changed, 12 insertions(+), 5 deletions(-)

diff --git a/drivers/vfio/iommufd.c b/drivers/vfio/iommufd.c
index 4d84904fd927..82eba6966fa5 100644
--- a/drivers/vfio/iommufd.c
+++ b/drivers/vfio/iommufd.c
@@ -146,9 +146,9 @@ int vfio_iommufd_physical_attach_ioas(struct vfio_device *vdev, u32 *pt_id)
 		return -EINVAL;
 
 	if (vdev->iommufd_attached)
-		return -EBUSY;
-
-	rc = iommufd_device_attach(vdev->iommufd_device, pt_id);
+		rc = iommufd_device_replace(vdev->iommufd_device, pt_id);
+	else
+		rc = iommufd_device_attach(vdev->iommufd_device, pt_id);
 	if (rc)
 		return rc;
 	vdev->iommufd_attached = true;
@@ -223,8 +223,9 @@ int vfio_iommufd_emulated_attach_ioas(struct vfio_device *vdev, u32 *pt_id)
 	lockdep_assert_held(&vdev->dev_set->lock);
 
 	if (vdev->iommufd_attached)
-		return -EBUSY;
-	rc = iommufd_access_attach(vdev->iommufd_access, *pt_id);
+		rc = iommufd_access_replace(vdev->iommufd_access, *pt_id);
+	else
+		rc = iommufd_access_attach(vdev->iommufd_access, *pt_id);
 	if (rc)
 		return rc;
 	vdev->iommufd_attached = true;
diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index fa06e3eb4955..537157ff8670 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -939,6 +939,12 @@ struct vfio_device_bind_iommufd {
  * Undo by VFIO_DEVICE_DETACH_IOMMUFD_PT or device fd close.  This is only
  * allowed on cdev fds.
  *
+ * If a vfio device is currently attached to a valid hw_pagetable, without doing
+ * a VFIO_DEVICE_DETACH_IOMMUFD_PT, a second VFIO_DEVICE_ATTACH_IOMMUFD_PT ioctl
+ * passing in another hw_pagetable (hwpt) id is allowed. This action, also known
+ * as a hw_pagetable replacement, will replace the device's currently attached
+ * hw_pagetable with a new hw_pagetable corresponding to the given pt_id.
+ *
  * Return: 0 on success, -errno on failure.
  */
 struct vfio_device_attach_iommufd_pt {
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH v8 2/4] iommufd: Add iommufd_access_replace() API
  2023-07-24 19:47 ` [PATCH v8 2/4] iommufd: Add iommufd_access_replace() API Nicolin Chen
@ 2023-07-26 14:30   ` Jason Gunthorpe
  2023-07-26 20:50     ` Nicolin Chen
  0 siblings, 1 reply; 22+ messages in thread
From: Jason Gunthorpe @ 2023-07-26 14:30 UTC (permalink / raw)
  To: Nicolin Chen
  Cc: kevin.tian, alex.williamson, yi.l.liu, joro, will, robin.murphy,
	shuah, linux-kernel, iommu, kvm, linux-kselftest, mjrosato,
	farman

On Mon, Jul 24, 2023 at 12:47:05PM -0700, Nicolin Chen wrote:
> -int iommufd_access_attach(struct iommufd_access *access, u32 ioas_id)
> +static int iommufd_access_change_pt(struct iommufd_access *access, u32 ioas_id)
>  {
> +	struct iommufd_ioas *cur_ioas = access->ioas;
>  	struct iommufd_ioas *new_ioas;
> -	int rc = 0;
> +	int rc;
>  
> -	mutex_lock(&access->ioas_lock);
> -	if (WARN_ON(access->ioas || access->ioas_unpin)) {
> -		mutex_unlock(&access->ioas_lock);
> -		return -EINVAL;
> -	}
> +	lockdep_assert_held(&access->ioas_lock);
>  
>  	new_ioas = iommufd_get_ioas(access->ictx, ioas_id);
> -	if (IS_ERR(new_ioas)) {
> -		mutex_unlock(&access->ioas_lock);
> +	if (IS_ERR(new_ioas))
>  		return PTR_ERR(new_ioas);
> -	}
> +
> +	if (cur_ioas)
> +		__iommufd_access_detach(access);

The drop of the mutex while this function runs is racey with the rest
of this, we can mitigate it by blocking concurrent change while
detaching which is if access->ioas_unpin is set
  
>  	rc = iopt_add_access(&new_ioas->iopt, access);
>  	if (rc) {
> -		mutex_unlock(&access->ioas_lock);
>  		iommufd_put_object(&new_ioas->obj);
> +		if (cur_ioas)
> +			WARN_ON(iommufd_access_change_pt(access,
> +							 cur_ioas->obj.id));

We've already dropped our ref to cur_ioas, so this is also racy with
destroy.

This is what I came up with:

diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c
index 57c0e81f5073b2..e55d6e902edb98 100644
--- a/drivers/iommu/iommufd/device.c
+++ b/drivers/iommu/iommufd/device.c
@@ -758,64 +758,101 @@ void iommufd_access_destroy(struct iommufd_access *access)
 }
 EXPORT_SYMBOL_NS_GPL(iommufd_access_destroy, IOMMUFD);
 
-void iommufd_access_detach(struct iommufd_access *access)
+static int iommufd_access_change_ioas(struct iommufd_access *access,
+				      struct iommufd_ioas *new_ioas)
 {
 	struct iommufd_ioas *cur_ioas = access->ioas;
+	int rc;
+
+	lockdep_assert_held(&access->ioas_lock);
+
+	/* We are racing with a concurrent detach, bail */
+	if (access->ioas_unpin)
+		return -EBUSY;
+
+	if (IS_ERR(new_ioas))
+		return PTR_ERR(new_ioas);
+
+	if (cur_ioas == new_ioas)
+		return 0;
 
-	mutex_lock(&access->ioas_lock);
-	if (WARN_ON(!access->ioas))
-		goto out;
 	/*
 	 * Set ioas to NULL to block any further iommufd_access_pin_pages().
 	 * iommufd_access_unpin_pages() can continue using access->ioas_unpin.
 	 */
 	access->ioas = NULL;
-
-	if (access->ops->unmap) {
+	if (cur_ioas && access->ops->unmap) {
 		mutex_unlock(&access->ioas_lock);
 		access->ops->unmap(access->data, 0, ULONG_MAX);
 		mutex_lock(&access->ioas_lock);
 	}
+
+	if (new_ioas) {
+		rc = iopt_add_access(&new_ioas->iopt, access);
+		if (rc) {
+			iommufd_put_object(&new_ioas->obj);
+			access->ioas = cur_ioas;
+			return rc;
+		}
+		iommufd_ref_to_users(&new_ioas->obj);
+	}
+
+	access->ioas = new_ioas;
+	access->ioas_unpin = new_ioas;
 	iopt_remove_access(&cur_ioas->iopt, access);
 	refcount_dec(&cur_ioas->obj.users);
-out:
-	access->ioas_unpin = NULL;
+
+	return 0;
+}
+
+void iommufd_access_detach(struct iommufd_access *access)
+{
+	int rc;
+
+	mutex_lock(&access->ioas_lock);
+	if (WARN_ON(!access->ioas)) {
+		mutex_unlock(&access->ioas_lock);
+		return;
+	}
+	rc = iommufd_access_change_ioas(access, NULL);
+	WARN_ON(rc);
 	mutex_unlock(&access->ioas_lock);
 }
 EXPORT_SYMBOL_NS_GPL(iommufd_access_detach, IOMMUFD);
 
 int iommufd_access_attach(struct iommufd_access *access, u32 ioas_id)
 {
-	struct iommufd_ioas *new_ioas;
-	int rc = 0;
+	int rc;
 
 	mutex_lock(&access->ioas_lock);
-	if (WARN_ON(access->ioas || access->ioas_unpin)) {
+	if (WARN_ON(access->ioas)) {
 		mutex_unlock(&access->ioas_lock);
 		return -EINVAL;
 	}
 
-	new_ioas = iommufd_get_ioas(access->ictx, ioas_id);
-	if (IS_ERR(new_ioas)) {
-		mutex_unlock(&access->ioas_lock);
-		return PTR_ERR(new_ioas);
-	}
-
-	rc = iopt_add_access(&new_ioas->iopt, access);
-	if (rc) {
-		mutex_unlock(&access->ioas_lock);
-		iommufd_put_object(&new_ioas->obj);
-		return rc;
-	}
-	iommufd_ref_to_users(&new_ioas->obj);
-
-	access->ioas = new_ioas;
-	access->ioas_unpin = new_ioas;
+	rc = iommufd_access_change_ioas(access,
+				      iommufd_get_ioas(access->ictx, ioas_id));
 	mutex_unlock(&access->ioas_lock);
-	return 0;
+	return rc;
 }
 EXPORT_SYMBOL_NS_GPL(iommufd_access_attach, IOMMUFD);
 
+int iommufd_access_replace(struct iommufd_access *access, u32 ioas_id)
+{
+	int rc;
+
+	mutex_lock(&access->ioas_lock);
+	if (!access->ioas) {
+		mutex_unlock(&access->ioas_lock);
+		return -ENOENT;
+	}
+	rc = iommufd_access_change_ioas(access,
+				      iommufd_get_ioas(access->ictx, ioas_id));
+	mutex_unlock(&access->ioas_lock);
+	return rc;
+}
+EXPORT_SYMBOL_NS_GPL(iommufd_access_replace, IOMMUFD);
+
 /**
  * iommufd_access_notify_unmap - Notify users of an iopt to stop using it
  * @iopt: iopt to work on
diff --git a/include/linux/iommufd.h b/include/linux/iommufd.h
index 0ac60256b65929..ffc3a949f8374f 100644
--- a/include/linux/iommufd.h
+++ b/include/linux/iommufd.h
@@ -49,6 +49,7 @@ iommufd_access_create(struct iommufd_ctx *ictx,
 		      const struct iommufd_access_ops *ops, void *data, u32 *id);
 void iommufd_access_destroy(struct iommufd_access *access);
 int iommufd_access_attach(struct iommufd_access *access, u32 ioas_id);
+int iommufd_access_replace(struct iommufd_access *access, u32 ioas_id);
 void iommufd_access_detach(struct iommufd_access *access);
 
 void iommufd_ctx_get(struct iommufd_ctx *ictx);


Jason

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH v8 3/4] iommufd/selftest: Add IOMMU_TEST_OP_ACCESS_REPLACE_IOAS coverage
  2023-07-24 19:47 ` [PATCH v8 3/4] iommufd/selftest: Add IOMMU_TEST_OP_ACCESS_REPLACE_IOAS coverage Nicolin Chen
@ 2023-07-26 17:04   ` Jason Gunthorpe
  0 siblings, 0 replies; 22+ messages in thread
From: Jason Gunthorpe @ 2023-07-26 17:04 UTC (permalink / raw)
  To: Nicolin Chen
  Cc: kevin.tian, alex.williamson, yi.l.liu, joro, will, robin.murphy,
	shuah, linux-kernel, iommu, kvm, linux-kselftest, mjrosato,
	farman

On Mon, Jul 24, 2023 at 12:47:06PM -0700, Nicolin Chen wrote:
> Add a new IOMMU_TEST_OP_ACCESS_REPLACE_IOAS to allow replacing the
> access->ioas, corresponding to the iommufd_access_replace() helper.
> 
> Then add a replace coverage as a part of user_copy test case, which
> basically repeats the copy test after replacing the old ioas with a
> new one.
> 
> Reviewed-by: Kevin Tian <kevin.tian@intel.com>
> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
> ---
>  drivers/iommu/iommufd/iommufd_test.h          |  4 +++
>  drivers/iommu/iommufd/selftest.c              | 19 ++++++++++++
>  tools/testing/selftests/iommu/iommufd.c       | 29 +++++++++++++++++--
>  tools/testing/selftests/iommu/iommufd_utils.h | 19 ++++++++++++
>  4 files changed, 69 insertions(+), 2 deletions(-)

Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>

Jason

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v8 4/4] vfio: Support IO page table replacement
  2023-07-24 19:47 ` [PATCH v8 4/4] vfio: Support IO page table replacement Nicolin Chen
@ 2023-07-26 17:04   ` Jason Gunthorpe
  2023-07-26 17:34   ` Alex Williamson
  1 sibling, 0 replies; 22+ messages in thread
From: Jason Gunthorpe @ 2023-07-26 17:04 UTC (permalink / raw)
  To: Nicolin Chen
  Cc: kevin.tian, alex.williamson, yi.l.liu, joro, will, robin.murphy,
	shuah, linux-kernel, iommu, kvm, linux-kselftest, mjrosato,
	farman

On Mon, Jul 24, 2023 at 12:47:07PM -0700, Nicolin Chen wrote:
> Now both the physical path and the emulated path should support an IO page
> table replacement. Call iommufd_device_replace/iommufd_access_replace(),
> when vdev->iommufd_attached is true.
> 
> Also update the VFIO_DEVICE_ATTACH_IOMMUFD_PT kdoc in the uAPI header.
> 
> Reviewed-by: Kevin Tian <kevin.tian@intel.com>
> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
> ---
>  drivers/vfio/iommufd.c    | 11 ++++++-----
>  include/uapi/linux/vfio.h |  6 ++++++
>  2 files changed, 12 insertions(+), 5 deletions(-)

Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>

Jason

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v8 1/4] vfio: Do not allow !ops->dma_unmap in vfio_pin/unpin_pages()
  2023-07-24 19:47 ` [PATCH v8 1/4] vfio: Do not allow !ops->dma_unmap in vfio_pin/unpin_pages() Nicolin Chen
@ 2023-07-26 17:33   ` Alex Williamson
  2023-07-26 17:38     ` Jason Gunthorpe
  0 siblings, 1 reply; 22+ messages in thread
From: Alex Williamson @ 2023-07-26 17:33 UTC (permalink / raw)
  To: Nicolin Chen
  Cc: jgg, kevin.tian, yi.l.liu, joro, will, robin.murphy, shuah,
	linux-kernel, iommu, kvm, linux-kselftest, mjrosato, farman

On Mon, 24 Jul 2023 12:47:04 -0700
Nicolin Chen <nicolinc@nvidia.com> wrote:

> A driver that doesn't implement ops->dma_unmap shouldn't be allowed to do
> vfio_pin/unpin_pages(), though it can use vfio_dma_rw() to access an iova
> range. Deny !ops->dma_unmap cases in vfio_pin/unpin_pages().
> 
> Suggested-by: Kevin Tian <kevin.tian@intel.com>
> Reviewed-by: Kevin Tian <kevin.tian@intel.com>
> Reviewed-by: Yi Liu <yi.l.liu@intel.com>
> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
> ---
>  drivers/vfio/vfio_main.c | 4 ++++
>  1 file changed, 4 insertions(+)

I assume these go through iommufd.

Reviewed-by: Alex Williamson <alex.williamson@redhat.com>

> 
> diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
> index 902f06e52c48..0da8ed81a97d 100644
> --- a/drivers/vfio/vfio_main.c
> +++ b/drivers/vfio/vfio_main.c
> @@ -1483,6 +1483,8 @@ int vfio_pin_pages(struct vfio_device *device, dma_addr_t iova,
>  	/* group->container cannot change while a vfio device is open */
>  	if (!pages || !npage || WARN_ON(!vfio_assert_device_open(device)))
>  		return -EINVAL;
> +	if (!device->ops->dma_unmap)
> +		return -EINVAL;
>  	if (vfio_device_has_container(device))
>  		return vfio_device_container_pin_pages(device, iova,
>  						       npage, prot, pages);
> @@ -1520,6 +1522,8 @@ void vfio_unpin_pages(struct vfio_device *device, dma_addr_t iova, int npage)
>  {
>  	if (WARN_ON(!vfio_assert_device_open(device)))
>  		return;
> +	if (WARN_ON(!device->ops->dma_unmap))
> +		return;
>  
>  	if (vfio_device_has_container(device)) {
>  		vfio_device_container_unpin_pages(device, iova, npage);


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v8 4/4] vfio: Support IO page table replacement
  2023-07-24 19:47 ` [PATCH v8 4/4] vfio: Support IO page table replacement Nicolin Chen
  2023-07-26 17:04   ` Jason Gunthorpe
@ 2023-07-26 17:34   ` Alex Williamson
  1 sibling, 0 replies; 22+ messages in thread
From: Alex Williamson @ 2023-07-26 17:34 UTC (permalink / raw)
  To: Nicolin Chen
  Cc: jgg, kevin.tian, yi.l.liu, joro, will, robin.murphy, shuah,
	linux-kernel, iommu, kvm, linux-kselftest, mjrosato, farman

On Mon, 24 Jul 2023 12:47:07 -0700
Nicolin Chen <nicolinc@nvidia.com> wrote:

> Now both the physical path and the emulated path should support an IO page
> table replacement. Call iommufd_device_replace/iommufd_access_replace(),
> when vdev->iommufd_attached is true.
> 
> Also update the VFIO_DEVICE_ATTACH_IOMMUFD_PT kdoc in the uAPI header.
> 
> Reviewed-by: Kevin Tian <kevin.tian@intel.com>
> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
> ---
>  drivers/vfio/iommufd.c    | 11 ++++++-----
>  include/uapi/linux/vfio.h |  6 ++++++
>  2 files changed, 12 insertions(+), 5 deletions(-)

Reviewed-by: Alex Williamson <alex.williamson@redhat.com>

> 
> diff --git a/drivers/vfio/iommufd.c b/drivers/vfio/iommufd.c
> index 4d84904fd927..82eba6966fa5 100644
> --- a/drivers/vfio/iommufd.c
> +++ b/drivers/vfio/iommufd.c
> @@ -146,9 +146,9 @@ int vfio_iommufd_physical_attach_ioas(struct vfio_device *vdev, u32 *pt_id)
>  		return -EINVAL;
>  
>  	if (vdev->iommufd_attached)
> -		return -EBUSY;
> -
> -	rc = iommufd_device_attach(vdev->iommufd_device, pt_id);
> +		rc = iommufd_device_replace(vdev->iommufd_device, pt_id);
> +	else
> +		rc = iommufd_device_attach(vdev->iommufd_device, pt_id);
>  	if (rc)
>  		return rc;
>  	vdev->iommufd_attached = true;
> @@ -223,8 +223,9 @@ int vfio_iommufd_emulated_attach_ioas(struct vfio_device *vdev, u32 *pt_id)
>  	lockdep_assert_held(&vdev->dev_set->lock);
>  
>  	if (vdev->iommufd_attached)
> -		return -EBUSY;
> -	rc = iommufd_access_attach(vdev->iommufd_access, *pt_id);
> +		rc = iommufd_access_replace(vdev->iommufd_access, *pt_id);
> +	else
> +		rc = iommufd_access_attach(vdev->iommufd_access, *pt_id);
>  	if (rc)
>  		return rc;
>  	vdev->iommufd_attached = true;
> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> index fa06e3eb4955..537157ff8670 100644
> --- a/include/uapi/linux/vfio.h
> +++ b/include/uapi/linux/vfio.h
> @@ -939,6 +939,12 @@ struct vfio_device_bind_iommufd {
>   * Undo by VFIO_DEVICE_DETACH_IOMMUFD_PT or device fd close.  This is only
>   * allowed on cdev fds.
>   *
> + * If a vfio device is currently attached to a valid hw_pagetable, without doing
> + * a VFIO_DEVICE_DETACH_IOMMUFD_PT, a second VFIO_DEVICE_ATTACH_IOMMUFD_PT ioctl
> + * passing in another hw_pagetable (hwpt) id is allowed. This action, also known
> + * as a hw_pagetable replacement, will replace the device's currently attached
> + * hw_pagetable with a new hw_pagetable corresponding to the given pt_id.
> + *
>   * Return: 0 on success, -errno on failure.
>   */
>  struct vfio_device_attach_iommufd_pt {


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v8 1/4] vfio: Do not allow !ops->dma_unmap in vfio_pin/unpin_pages()
  2023-07-26 17:33   ` Alex Williamson
@ 2023-07-26 17:38     ` Jason Gunthorpe
  0 siblings, 0 replies; 22+ messages in thread
From: Jason Gunthorpe @ 2023-07-26 17:38 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Nicolin Chen, kevin.tian, yi.l.liu, joro, will, robin.murphy,
	shuah, linux-kernel, iommu, kvm, linux-kselftest, mjrosato,
	farman

On Wed, Jul 26, 2023 at 11:33:49AM -0600, Alex Williamson wrote:
> On Mon, 24 Jul 2023 12:47:04 -0700
> Nicolin Chen <nicolinc@nvidia.com> wrote:
> 
> > A driver that doesn't implement ops->dma_unmap shouldn't be allowed to do
> > vfio_pin/unpin_pages(), though it can use vfio_dma_rw() to access an iova
> > range. Deny !ops->dma_unmap cases in vfio_pin/unpin_pages().
> > 
> > Suggested-by: Kevin Tian <kevin.tian@intel.com>
> > Reviewed-by: Kevin Tian <kevin.tian@intel.com>
> > Reviewed-by: Yi Liu <yi.l.liu@intel.com>
> > Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
> > Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
> > ---
> >  drivers/vfio/vfio_main.c | 4 ++++
> >  1 file changed, 4 insertions(+)
> 
> I assume these go through iommufd.

Yep, I think it is next up, thanks

Jason

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v8 2/4] iommufd: Add iommufd_access_replace() API
  2023-07-26 14:30   ` Jason Gunthorpe
@ 2023-07-26 20:50     ` Nicolin Chen
  2023-07-26 23:36       ` Jason Gunthorpe
  0 siblings, 1 reply; 22+ messages in thread
From: Nicolin Chen @ 2023-07-26 20:50 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: kevin.tian, alex.williamson, yi.l.liu, joro, will, robin.murphy,
	shuah, linux-kernel, iommu, kvm, linux-kselftest, mjrosato,
	farman

On Wed, Jul 26, 2023 at 11:30:17AM -0300, Jason Gunthorpe wrote:
> On Mon, Jul 24, 2023 at 12:47:05PM -0700, Nicolin Chen wrote:
> > -int iommufd_access_attach(struct iommufd_access *access, u32 ioas_id)
> > +static int iommufd_access_change_pt(struct iommufd_access *access, u32 ioas_id)
> >  {
> > +	struct iommufd_ioas *cur_ioas = access->ioas;
> >  	struct iommufd_ioas *new_ioas;
> > -	int rc = 0;
> > +	int rc;
> >  
> > -	mutex_lock(&access->ioas_lock);
> > -	if (WARN_ON(access->ioas || access->ioas_unpin)) {
> > -		mutex_unlock(&access->ioas_lock);
> > -		return -EINVAL;
> > -	}
> > +	lockdep_assert_held(&access->ioas_lock);
> >  
> >  	new_ioas = iommufd_get_ioas(access->ictx, ioas_id);
> > -	if (IS_ERR(new_ioas)) {
> > -		mutex_unlock(&access->ioas_lock);
> > +	if (IS_ERR(new_ioas))
> >  		return PTR_ERR(new_ioas);
> > -	}
> > +
> > +	if (cur_ioas)
> > +		__iommufd_access_detach(access);
> 
> The drop of the mutex while this function runs is racey with the rest
> of this, we can mitigate it by blocking concurrent change while
> detaching which is if access->ioas_unpin is set

Oh. You mean that unmap part dropping the mutex right? I see.

> >  	rc = iopt_add_access(&new_ioas->iopt, access);
> >  	if (rc) {
> > -		mutex_unlock(&access->ioas_lock);
> >  		iommufd_put_object(&new_ioas->obj);
> > +		if (cur_ioas)
> > +			WARN_ON(iommufd_access_change_pt(access,
> > +							 cur_ioas->obj.id));
> 
> We've already dropped our ref to cur_ioas, so this is also racy with
> destroy.

Would it be better by calling iommufd_access_detach() that holds
the same mutex in the iommufd_access_destroy_object()? We could
also unwrap the detach and delay the refcount_dec, as you did in
your attaching patch.

> This is what I came up with:
> 
> diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c
> index 57c0e81f5073b2..e55d6e902edb98 100644
> --- a/drivers/iommu/iommufd/device.c
> +++ b/drivers/iommu/iommufd/device.c
> @@ -758,64 +758,101 @@ void iommufd_access_destroy(struct iommufd_access *access)
>  }
>  EXPORT_SYMBOL_NS_GPL(iommufd_access_destroy, IOMMUFD);
>  
> -void iommufd_access_detach(struct iommufd_access *access)
> +static int iommufd_access_change_ioas(struct iommufd_access *access,
> +				      struct iommufd_ioas *new_ioas)
>  {
>  	struct iommufd_ioas *cur_ioas = access->ioas;
> +	int rc;
> +
> +	lockdep_assert_held(&access->ioas_lock);
> +
> +	/* We are racing with a concurrent detach, bail */
> +	if (access->ioas_unpin)
> +		return -EBUSY;

I think this should check access->ioas too? I mean:

+	/* We are racing with a concurrent detach, bail */
+	if (!access->ioas && access->ioas_unpin)
+		return -EBUSY;

Otherwise, a normal detach() would fail, since an access has both
a valid ioas and a valid ioas_unpin.

> +
> +	if (IS_ERR(new_ioas))
> +		return PTR_ERR(new_ioas);
> +
> +	if (cur_ioas == new_ioas)
> +		return 0;
>  
> -	mutex_lock(&access->ioas_lock);
> -	if (WARN_ON(!access->ioas))
> -		goto out;
>  	/*
>  	 * Set ioas to NULL to block any further iommufd_access_pin_pages().
>  	 * iommufd_access_unpin_pages() can continue using access->ioas_unpin.
>  	 */
>  	access->ioas = NULL;
> -
> -	if (access->ops->unmap) {
> +	if (cur_ioas && access->ops->unmap) {
>  		mutex_unlock(&access->ioas_lock);
>  		access->ops->unmap(access->data, 0, ULONG_MAX);
>  		mutex_lock(&access->ioas_lock);
>  	}
> +
> +	if (new_ioas) {
> +		rc = iopt_add_access(&new_ioas->iopt, access);
> +		if (rc) {
> +			iommufd_put_object(&new_ioas->obj);
> +			access->ioas = cur_ioas;
> +			return rc;
> +		}
> +		iommufd_ref_to_users(&new_ioas->obj);
> +	}
> +
> +	access->ioas = new_ioas;
> +	access->ioas_unpin = new_ioas;
>  	iopt_remove_access(&cur_ioas->iopt, access);

There was a bug in my earlier version, having the same flow by
calling iopt_add_access() prior to iopt_remove_access(). But,
doing that would override the access->iopt_access_list_id and
it would then get unset by the following iopt_remove_access().

Please refer to :
https://lore.kernel.org/linux-iommu/ZJYYWz2wy%2F86FapK@Asurada-Nvidia/

If we want a cleaner detach-then-attach flow, we would need an
atomic function in the io_pagetable.c file handling the id, yet
I couldn't figure a good naming for the atomic function since
it's about acccess shifting between two iopts other than simply
"iopt_repalce_access".

So, I came up with this version calling an iopt_remove_access()
prior to iopt_add_access(), which requires an add-back the old
ioas upon an failure at iopt_add_access(new_ioas).

I will try making some change accordingly on top of this patch.

Thanks
Nicolin

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v8 2/4] iommufd: Add iommufd_access_replace() API
  2023-07-26 20:50     ` Nicolin Chen
@ 2023-07-26 23:36       ` Jason Gunthorpe
  2023-07-27  2:59         ` Nicolin Chen
  0 siblings, 1 reply; 22+ messages in thread
From: Jason Gunthorpe @ 2023-07-26 23:36 UTC (permalink / raw)
  To: Nicolin Chen
  Cc: kevin.tian, alex.williamson, yi.l.liu, joro, will, robin.murphy,
	shuah, linux-kernel, iommu, kvm, linux-kselftest, mjrosato,
	farman

On Wed, Jul 26, 2023 at 01:50:28PM -0700, Nicolin Chen wrote:
> 
> > >  	rc = iopt_add_access(&new_ioas->iopt, access);
> > >  	if (rc) {
> > > -		mutex_unlock(&access->ioas_lock);
> > >  		iommufd_put_object(&new_ioas->obj);
> > > +		if (cur_ioas)
> > > +			WARN_ON(iommufd_access_change_pt(access,
> > > +							 cur_ioas->obj.id));
> > 
> > We've already dropped our ref to cur_ioas, so this is also racy with
> > destroy.
> 
> Would it be better by calling iommufd_access_detach() that holds
> the same mutex in the iommufd_access_destroy_object()? We could
> also unwrap the detach and delay the refcount_dec, as you did in
> your attaching patch.

It is better just to integrate it with this algorithm so we don't have
the refcounting issues, like I did


> 
> > This is what I came up with:
> > 
> > diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c
> > index 57c0e81f5073b2..e55d6e902edb98 100644
> > --- a/drivers/iommu/iommufd/device.c
> > +++ b/drivers/iommu/iommufd/device.c
> > @@ -758,64 +758,101 @@ void iommufd_access_destroy(struct iommufd_access *access)
> >  }
> >  EXPORT_SYMBOL_NS_GPL(iommufd_access_destroy, IOMMUFD);
> >  
> > -void iommufd_access_detach(struct iommufd_access *access)
> > +static int iommufd_access_change_ioas(struct iommufd_access *access,
> > +				      struct iommufd_ioas *new_ioas)
> >  {
> >  	struct iommufd_ioas *cur_ioas = access->ioas;
> > +	int rc;
> > +
> > +	lockdep_assert_held(&access->ioas_lock);
> > +
> > +	/* We are racing with a concurrent detach, bail */
> > +	if (access->ioas_unpin)
> > +		return -EBUSY;
> 
> I think this should check access->ioas too? I mean:

> 
> +	/* We are racing with a concurrent detach, bail */
> +	if (!access->ioas && access->ioas_unpin)
> +		return -EBUSY;

Oh, yes, that should basically be 'cur_ioas != access->ioas_unpin' -
ie any difference means we are racing with the unmap call.

> > +	if (new_ioas) {
> > +		rc = iopt_add_access(&new_ioas->iopt, access);
> > +		if (rc) {
> > +			iommufd_put_object(&new_ioas->obj);
> > +			access->ioas = cur_ioas;
> > +			return rc;
> > +		}
> > +		iommufd_ref_to_users(&new_ioas->obj);
> > +	}
> > +
> > +	access->ioas = new_ioas;
> > +	access->ioas_unpin = new_ioas;
> >  	iopt_remove_access(&cur_ioas->iopt, access);
> 
> There was a bug in my earlier version, having the same flow by
> calling iopt_add_access() prior to iopt_remove_access(). But,
> doing that would override the access->iopt_access_list_id and
> it would then get unset by the following iopt_remove_access().

Ah, I was wondering about that order but didn't check it.

Maybe we just need to pass the ID into iopt_remove_access and keep the
right version on the stack.

> So, I came up with this version calling an iopt_remove_access()
> prior to iopt_add_access(), which requires an add-back the old
> ioas upon an failure at iopt_add_access(new_ioas).

That is also sort of reasonable if the refcounting is organized like
this does.

Jason

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v8 2/4] iommufd: Add iommufd_access_replace() API
  2023-07-26 23:36       ` Jason Gunthorpe
@ 2023-07-27  2:59         ` Nicolin Chen
  2023-07-27  7:30           ` Nicolin Chen
  2023-07-27 12:03           ` Jason Gunthorpe
  0 siblings, 2 replies; 22+ messages in thread
From: Nicolin Chen @ 2023-07-27  2:59 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: kevin.tian, alex.williamson, yi.l.liu, joro, will, robin.murphy,
	shuah, linux-kernel, iommu, kvm, linux-kselftest, mjrosato,
	farman

On Wed, Jul 26, 2023 at 08:36:31PM -0300, Jason Gunthorpe wrote:
> On Wed, Jul 26, 2023 at 01:50:28PM -0700, Nicolin Chen wrote:
> > 
> > > >  	rc = iopt_add_access(&new_ioas->iopt, access);
> > > >  	if (rc) {
> > > > -		mutex_unlock(&access->ioas_lock);
> > > >  		iommufd_put_object(&new_ioas->obj);
> > > > +		if (cur_ioas)
> > > > +			WARN_ON(iommufd_access_change_pt(access,
> > > > +							 cur_ioas->obj.id));
> > > 
> > > We've already dropped our ref to cur_ioas, so this is also racy with
> > > destroy.
> > 
> > Would it be better by calling iommufd_access_detach() that holds
> > the same mutex in the iommufd_access_destroy_object()? We could
> > also unwrap the detach and delay the refcount_dec, as you did in
> > your attaching patch.
> 
> It is better just to integrate it with this algorithm so we don't have
> the refcounting issues, like I did

OK. I will have a patch adding the iommufd_access_change_ioas
first, and it can update iommufd_access_destroy_object() too.

> > > This is what I came up with:
> > > 
> > > diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c
> > > index 57c0e81f5073b2..e55d6e902edb98 100644
> > > --- a/drivers/iommu/iommufd/device.c
> > > +++ b/drivers/iommu/iommufd/device.c
> > > @@ -758,64 +758,101 @@ void iommufd_access_destroy(struct iommufd_access *access)
> > >  }
> > >  EXPORT_SYMBOL_NS_GPL(iommufd_access_destroy, IOMMUFD);
> > >  
> > > -void iommufd_access_detach(struct iommufd_access *access)
> > > +static int iommufd_access_change_ioas(struct iommufd_access *access,
> > > +				      struct iommufd_ioas *new_ioas)
> > >  {
> > >  	struct iommufd_ioas *cur_ioas = access->ioas;
> > > +	int rc;
> > > +
> > > +	lockdep_assert_held(&access->ioas_lock);
> > > +
> > > +	/* We are racing with a concurrent detach, bail */
> > > +	if (access->ioas_unpin)
> > > +		return -EBUSY;
> > 
> > I think this should check access->ioas too? I mean:
> 
> > 
> > +	/* We are racing with a concurrent detach, bail */
> > +	if (!access->ioas && access->ioas_unpin)
> > +		return -EBUSY;
> 
> Oh, yes, that should basically be 'cur_ioas != access->ioas_unpin' -
> ie any difference means we are racing with the unmap call.

Yea, will update to 'cur_ioas != access->ioas_unpin'.

> > > +	if (new_ioas) {
> > > +		rc = iopt_add_access(&new_ioas->iopt, access);
> > > +		if (rc) {
> > > +			iommufd_put_object(&new_ioas->obj);
> > > +			access->ioas = cur_ioas;
> > > +			return rc;
> > > +		}
> > > +		iommufd_ref_to_users(&new_ioas->obj);
> > > +	}
> > > +
> > > +	access->ioas = new_ioas;
> > > +	access->ioas_unpin = new_ioas;
> > >  	iopt_remove_access(&cur_ioas->iopt, access);
> > 
> > There was a bug in my earlier version, having the same flow by
> > calling iopt_add_access() prior to iopt_remove_access(). But,
> > doing that would override the access->iopt_access_list_id and
> > it would then get unset by the following iopt_remove_access().
> 
> Ah, I was wondering about that order but didn't check it.
> 
> Maybe we just need to pass the ID into iopt_remove_access and keep the
> right version on the stack.
> 
> > So, I came up with this version calling an iopt_remove_access()
> > prior to iopt_add_access(), which requires an add-back the old
> > ioas upon an failure at iopt_add_access(new_ioas).
> 
> That is also sort of reasonable if the refcounting is organized like
> this does.

I just realized that either my v8 or your version calls unmap()
first at the entire cur_ioas. So, there seems to be no point in
doing that fallback re-add routine since the cur_ioas isn't the
same, which I don't feel quite right...

Perhaps we should pass the ID into iopt_add/remove_access like
you said above. And then we attach the new_ioas, piror to the
detach the cur_ioas?

Thanks
Nicolin

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v8 2/4] iommufd: Add iommufd_access_replace() API
  2023-07-27  2:59         ` Nicolin Chen
@ 2023-07-27  7:30           ` Nicolin Chen
  2023-07-27 12:03           ` Jason Gunthorpe
  1 sibling, 0 replies; 22+ messages in thread
From: Nicolin Chen @ 2023-07-27  7:30 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: kevin.tian, alex.williamson, yi.l.liu, joro, will, robin.murphy,
	shuah, linux-kernel, iommu, kvm, linux-kselftest, mjrosato,
	farman

On Wed, Jul 26, 2023 at 07:59:17PM -0700, Nicolin Chen wrote:
 
> > > > +	if (new_ioas) {
> > > > +		rc = iopt_add_access(&new_ioas->iopt, access);
> > > > +		if (rc) {
> > > > +			iommufd_put_object(&new_ioas->obj);
> > > > +			access->ioas = cur_ioas;
> > > > +			return rc;
> > > > +		}
> > > > +		iommufd_ref_to_users(&new_ioas->obj);
> > > > +	}
> > > > +
> > > > +	access->ioas = new_ioas;
> > > > +	access->ioas_unpin = new_ioas;
> > > >  	iopt_remove_access(&cur_ioas->iopt, access);
> > > 
> > > There was a bug in my earlier version, having the same flow by
> > > calling iopt_add_access() prior to iopt_remove_access(). But,
> > > doing that would override the access->iopt_access_list_id and
> > > it would then get unset by the following iopt_remove_access().
> > 
> > Ah, I was wondering about that order but didn't check it.
> > 
> > Maybe we just need to pass the ID into iopt_remove_access and keep the
> > right version on the stack.
> > 
> > > So, I came up with this version calling an iopt_remove_access()
> > > prior to iopt_add_access(), which requires an add-back the old
> > > ioas upon an failure at iopt_add_access(new_ioas).
> > 
> > That is also sort of reasonable if the refcounting is organized like
> > this does.
> 
> I just realized that either my v8 or your version calls unmap()
> first at the entire cur_ioas. So, there seems to be no point in
> doing that fallback re-add routine since the cur_ioas isn't the
> same, which I don't feel quite right...
> 
> Perhaps we should pass the ID into iopt_add/remove_access like
> you said above. And then we attach the new_ioas, piror to the
> detach the cur_ioas?

I sent v9 having the iopt_remove_access trick, so we can do an
iopt_remove_access only upon success. Let's continue there.

Thanks
Nic

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v8 2/4] iommufd: Add iommufd_access_replace() API
  2023-07-27  2:59         ` Nicolin Chen
  2023-07-27  7:30           ` Nicolin Chen
@ 2023-07-27 12:03           ` Jason Gunthorpe
  2023-07-27 19:04             ` Nicolin Chen
  1 sibling, 1 reply; 22+ messages in thread
From: Jason Gunthorpe @ 2023-07-27 12:03 UTC (permalink / raw)
  To: Nicolin Chen
  Cc: kevin.tian, alex.williamson, yi.l.liu, joro, will, robin.murphy,
	shuah, linux-kernel, iommu, kvm, linux-kselftest, mjrosato,
	farman

On Wed, Jul 26, 2023 at 07:59:11PM -0700, Nicolin Chen wrote:

> I just realized that either my v8 or your version calls unmap()
> first at the entire cur_ioas. So, there seems to be no point in
> doing that fallback re-add routine since the cur_ioas isn't the
> same, which I don't feel quite right...

The point is to restore the access back to how it should be on failure
so future use of the accesss still does the right thing.

We already have built into this a certain non-atomicity for mdevs,
they can see a pin failure during replace if they race an access
during this unmap window. This is similar to the real HW iommu's
without atomic replace.

> Perhaps we should pass the ID into iopt_add/remove_access like
> you said above. And then we attach the new_ioas, piror to the
> detach the cur_ioas?

If it is simple this seems like the most robust

Jason

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v8 2/4] iommufd: Add iommufd_access_replace() API
  2023-07-27 12:03           ` Jason Gunthorpe
@ 2023-07-27 19:04             ` Nicolin Chen
  2023-07-28  3:45               ` Tian, Kevin
  2023-07-28 12:27               ` Jason Gunthorpe
  0 siblings, 2 replies; 22+ messages in thread
From: Nicolin Chen @ 2023-07-27 19:04 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: kevin.tian, alex.williamson, yi.l.liu, joro, will, robin.murphy,
	shuah, linux-kernel, iommu, kvm, linux-kselftest, mjrosato,
	farman

On Thu, Jul 27, 2023 at 09:03:01AM -0300, Jason Gunthorpe wrote:
> On Wed, Jul 26, 2023 at 07:59:11PM -0700, Nicolin Chen wrote:
> 
> > I just realized that either my v8 or your version calls unmap()
> > first at the entire cur_ioas. So, there seems to be no point in
> > doing that fallback re-add routine since the cur_ioas isn't the
> > same, which I don't feel quite right...
> 
> The point is to restore the access back to how it should be on failure
> so future use of the accesss still does the right thing.
> 
> We already have built into this a certain non-atomicity for mdevs,
> they can see a pin failure during replace if they race an access
> during this unmap window. This is similar to the real HW iommu's
> without atomic replace.

I was concerned about, after the replace, mdev losing all the
mappings due to the unmap() call, which means the fallback is
not really a status quo. Do you mean that they could pin those
lost mappings back?

^ permalink raw reply	[flat|nested] 22+ messages in thread

* RE: [PATCH v8 2/4] iommufd: Add iommufd_access_replace() API
  2023-07-27 19:04             ` Nicolin Chen
@ 2023-07-28  3:45               ` Tian, Kevin
  2023-07-28  4:43                 ` Nicolin Chen
  2023-07-28 12:27               ` Jason Gunthorpe
  1 sibling, 1 reply; 22+ messages in thread
From: Tian, Kevin @ 2023-07-28  3:45 UTC (permalink / raw)
  To: Nicolin Chen, Jason Gunthorpe
  Cc: alex.williamson, Liu, Yi L, joro, will, robin.murphy, shuah,
	linux-kernel, iommu, kvm, linux-kselftest, mjrosato, farman

> From: Nicolin Chen <nicolinc@nvidia.com>
> Sent: Friday, July 28, 2023 3:04 AM
> 
> On Thu, Jul 27, 2023 at 09:03:01AM -0300, Jason Gunthorpe wrote:
> > On Wed, Jul 26, 2023 at 07:59:11PM -0700, Nicolin Chen wrote:
> >
> > > I just realized that either my v8 or your version calls unmap()
> > > first at the entire cur_ioas. So, there seems to be no point in
> > > doing that fallback re-add routine since the cur_ioas isn't the
> > > same, which I don't feel quite right...
> >
> > The point is to restore the access back to how it should be on failure
> > so future use of the accesss still does the right thing.
> >
> > We already have built into this a certain non-atomicity for mdevs,
> > they can see a pin failure during replace if they race an access
> > during this unmap window. This is similar to the real HW iommu's
> > without atomic replace.
> 
> I was concerned about, after the replace, mdev losing all the
> mappings due to the unmap() call, which means the fallback is
> not really a status quo. Do you mean that they could pin those
> lost mappings back?

None of mdev drivers does that.

but we need think about the actual usage. I don't think the user
can request ioas change w/o actually reconfiguring the mdev
device. Presumably the latter could lead to reconstructure of pinned
pages.

so in code-level as Jason said we just need ensure the access is
back to an usable state.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v8 2/4] iommufd: Add iommufd_access_replace() API
  2023-07-28  3:45               ` Tian, Kevin
@ 2023-07-28  4:43                 ` Nicolin Chen
  2023-07-28  6:20                   ` Tian, Kevin
  0 siblings, 1 reply; 22+ messages in thread
From: Nicolin Chen @ 2023-07-28  4:43 UTC (permalink / raw)
  To: Tian, Kevin
  Cc: Jason Gunthorpe, alex.williamson, Liu, Yi L, joro, will,
	robin.murphy, shuah, linux-kernel, iommu, kvm, linux-kselftest,
	mjrosato, farman

On Fri, Jul 28, 2023 at 03:45:39AM +0000, Tian, Kevin wrote:
> > From: Nicolin Chen <nicolinc@nvidia.com>
> > Sent: Friday, July 28, 2023 3:04 AM
> >
> > On Thu, Jul 27, 2023 at 09:03:01AM -0300, Jason Gunthorpe wrote:
> > > On Wed, Jul 26, 2023 at 07:59:11PM -0700, Nicolin Chen wrote:
> > >
> > > > I just realized that either my v8 or your version calls unmap()
> > > > first at the entire cur_ioas. So, there seems to be no point in
> > > > doing that fallback re-add routine since the cur_ioas isn't the
> > > > same, which I don't feel quite right...
> > >
> > > The point is to restore the access back to how it should be on failure
> > > so future use of the accesss still does the right thing.
> > >
> > > We already have built into this a certain non-atomicity for mdevs,
> > > they can see a pin failure during replace if they race an access
> > > during this unmap window. This is similar to the real HW iommu's
> > > without atomic replace.
> >
> > I was concerned about, after the replace, mdev losing all the
> > mappings due to the unmap() call, which means the fallback is
> > not really a status quo. Do you mean that they could pin those
> > lost mappings back?
> 
> None of mdev drivers does that.
> 
> but we need think about the actual usage. I don't think the user
> can request ioas change w/o actually reconfiguring the mdev
> device. Presumably the latter could lead to reconstructure of pinned
> pages.

I can understand that the user should reconfigure the IOAS on
success. Yet, should we expect it to reconfigure on a failure
also?

Thanks!
Nic

^ permalink raw reply	[flat|nested] 22+ messages in thread

* RE: [PATCH v8 2/4] iommufd: Add iommufd_access_replace() API
  2023-07-28  4:43                 ` Nicolin Chen
@ 2023-07-28  6:20                   ` Tian, Kevin
  2023-07-28 12:28                     ` Jason Gunthorpe
  0 siblings, 1 reply; 22+ messages in thread
From: Tian, Kevin @ 2023-07-28  6:20 UTC (permalink / raw)
  To: Nicolin Chen
  Cc: Jason Gunthorpe, alex.williamson, Liu, Yi L, joro, will,
	robin.murphy, shuah, linux-kernel, iommu, kvm, linux-kselftest,
	mjrosato, farman

> From: Nicolin Chen <nicolinc@nvidia.com>
> Sent: Friday, July 28, 2023 12:43 PM
> 
> On Fri, Jul 28, 2023 at 03:45:39AM +0000, Tian, Kevin wrote:
> > > From: Nicolin Chen <nicolinc@nvidia.com>
> > > Sent: Friday, July 28, 2023 3:04 AM
> > >
> > > On Thu, Jul 27, 2023 at 09:03:01AM -0300, Jason Gunthorpe wrote:
> > > > On Wed, Jul 26, 2023 at 07:59:11PM -0700, Nicolin Chen wrote:
> > > >
> > > > > I just realized that either my v8 or your version calls unmap()
> > > > > first at the entire cur_ioas. So, there seems to be no point in
> > > > > doing that fallback re-add routine since the cur_ioas isn't the
> > > > > same, which I don't feel quite right...
> > > >
> > > > The point is to restore the access back to how it should be on failure
> > > > so future use of the accesss still does the right thing.
> > > >
> > > > We already have built into this a certain non-atomicity for mdevs,
> > > > they can see a pin failure during replace if they race an access
> > > > during this unmap window. This is similar to the real HW iommu's
> > > > without atomic replace.
> > >
> > > I was concerned about, after the replace, mdev losing all the
> > > mappings due to the unmap() call, which means the fallback is
> > > not really a status quo. Do you mean that they could pin those
> > > lost mappings back?
> >
> > None of mdev drivers does that.
> >
> > but we need think about the actual usage. I don't think the user
> > can request ioas change w/o actually reconfiguring the mdev
> > device. Presumably the latter could lead to reconstructure of pinned
> > pages.
> 
> I can understand that the user should reconfigure the IOAS on
> success. Yet, should we expect it to reconfigure on a failure
> also?
> 

I thought the user will likely stop the device before changing IOAS
and then re-enable device DMA afterwards. If that is the typical
flow then no matter this replace request succeeds or fails the
re-enabling sequence should lead to the addition of pinned pages
back to the current IOAS.

But this does imply inconsistent behavior between success and failure.
Not sure whether it's worth a fix e.g. introducing another notifier for
mdev drivers to re-pin...

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v8 2/4] iommufd: Add iommufd_access_replace() API
  2023-07-27 19:04             ` Nicolin Chen
  2023-07-28  3:45               ` Tian, Kevin
@ 2023-07-28 12:27               ` Jason Gunthorpe
  1 sibling, 0 replies; 22+ messages in thread
From: Jason Gunthorpe @ 2023-07-28 12:27 UTC (permalink / raw)
  To: Nicolin Chen
  Cc: kevin.tian, alex.williamson, yi.l.liu, joro, will, robin.murphy,
	shuah, linux-kernel, iommu, kvm, linux-kselftest, mjrosato,
	farman

On Thu, Jul 27, 2023 at 12:04:00PM -0700, Nicolin Chen wrote:
> On Thu, Jul 27, 2023 at 09:03:01AM -0300, Jason Gunthorpe wrote:
> > On Wed, Jul 26, 2023 at 07:59:11PM -0700, Nicolin Chen wrote:
> > 
> > > I just realized that either my v8 or your version calls unmap()
> > > first at the entire cur_ioas. So, there seems to be no point in
> > > doing that fallback re-add routine since the cur_ioas isn't the
> > > same, which I don't feel quite right...
> > 
> > The point is to restore the access back to how it should be on failure
> > so future use of the accesss still does the right thing.
> > 
> > We already have built into this a certain non-atomicity for mdevs,
> > they can see a pin failure during replace if they race an access
> > during this unmap window. This is similar to the real HW iommu's
> > without atomic replace.
> 
> I was concerned about, after the replace, mdev losing all the
> mappings due to the unmap() call, which means the fallback is
> not really a status quo. Do you mean that they could pin those
> lost mappings back?

At this point their shouldn't be mappings in any path with a chance of
success, as I said it is racy already. Not sure we need to fuss about
it futher.

Jason

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v8 2/4] iommufd: Add iommufd_access_replace() API
  2023-07-28  6:20                   ` Tian, Kevin
@ 2023-07-28 12:28                     ` Jason Gunthorpe
  0 siblings, 0 replies; 22+ messages in thread
From: Jason Gunthorpe @ 2023-07-28 12:28 UTC (permalink / raw)
  To: Tian, Kevin
  Cc: Nicolin Chen, alex.williamson, Liu, Yi L, joro, will,
	robin.murphy, shuah, linux-kernel, iommu, kvm, linux-kselftest,
	mjrosato, farman

On Fri, Jul 28, 2023 at 06:20:56AM +0000, Tian, Kevin wrote:

> But this does imply inconsistent behavior between success and failure.
> Not sure whether it's worth a fix e.g. introducing another notifier for
> mdev drivers to re-pin...

After unmap drivers should re-establish their DMA mappings when they
are next required. It is a mdev driver bug if they don't do this..

Jason

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2023-07-28 12:28 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-07-24 19:47 [PATCH v8 0/4] cover-letter: Add IO page table replacement support Nicolin Chen
2023-07-24 19:47 ` [PATCH v8 1/4] vfio: Do not allow !ops->dma_unmap in vfio_pin/unpin_pages() Nicolin Chen
2023-07-26 17:33   ` Alex Williamson
2023-07-26 17:38     ` Jason Gunthorpe
2023-07-24 19:47 ` [PATCH v8 2/4] iommufd: Add iommufd_access_replace() API Nicolin Chen
2023-07-26 14:30   ` Jason Gunthorpe
2023-07-26 20:50     ` Nicolin Chen
2023-07-26 23:36       ` Jason Gunthorpe
2023-07-27  2:59         ` Nicolin Chen
2023-07-27  7:30           ` Nicolin Chen
2023-07-27 12:03           ` Jason Gunthorpe
2023-07-27 19:04             ` Nicolin Chen
2023-07-28  3:45               ` Tian, Kevin
2023-07-28  4:43                 ` Nicolin Chen
2023-07-28  6:20                   ` Tian, Kevin
2023-07-28 12:28                     ` Jason Gunthorpe
2023-07-28 12:27               ` Jason Gunthorpe
2023-07-24 19:47 ` [PATCH v8 3/4] iommufd/selftest: Add IOMMU_TEST_OP_ACCESS_REPLACE_IOAS coverage Nicolin Chen
2023-07-26 17:04   ` Jason Gunthorpe
2023-07-24 19:47 ` [PATCH v8 4/4] vfio: Support IO page table replacement Nicolin Chen
2023-07-26 17:04   ` Jason Gunthorpe
2023-07-26 17:34   ` Alex Williamson

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.