dri-devel.lists.freedesktop.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2 00/11] Connect VFIO to IOMMUFD
@ 2022-11-08  0:52 Jason Gunthorpe
  2022-11-08  0:52 ` [PATCH v2 01/11] vfio: Move vfio_device driver open/close code to a function Jason Gunthorpe
                   ` (13 more replies)
  0 siblings, 14 replies; 56+ messages in thread
From: Jason Gunthorpe @ 2022-11-08  0:52 UTC (permalink / raw)
  To: Alexander Gordeev, David Airlie, Tony Krowiak, Alex Williamson,
	Christian Borntraeger, Cornelia Huck, Daniel Vetter,
	Diana Craciun, dri-devel, Eric Auger, Eric Farman,
	Harald Freudenberger, Vasily Gorbik, Heiko Carstens, intel-gfx,
	intel-gvt-dev, iommu, Jani Nikula, Jason Herne, Joonas Lahtinen,
	Joerg Roedel, Kevin Tian, kvm, linux-s390, Longfang Liu,
	Matthew Rosato, Peter Oberparleiter, Halil Pasic, Robin Murphy,
	Rodrigo Vivi, Shameer Kolothum, Sven Schnelle, Tvrtko Ursulin,
	Vineeth Vijayan, Will Deacon, Yishai Hadas, Zhenyu Wang,
	Zhi Wang
  Cc: Liu, Yi L, Nicolin Chen, Lu Baolu

This series provides an alternative container layer for VFIO implemented
using iommufd. This is optional, if CONFIG_IOMMUFD is not set then it will
not be compiled in.

At this point iommufd can be injected by passing in a iommfd FD to
VFIO_GROUP_SET_CONTAINER which will use the VFIO compat layer in iommufd
to obtain the compat IOAS and then connect up all the VFIO drivers as
appropriate.

This is temporary stopping point, a following series will provide a way to
directly open a VFIO device FD and directly connect it to IOMMUFD using
native ioctls that can expose the IOMMUFD features like hwpt, future
vPASID and dynamic attachment.

This series, in compat mode, has passed all the qemu tests we have
available, including the test suites for the Intel GVT mdev. Aside from
the temporary limitation with P2P memory this is belived to be fully
compatible with VFIO.

This is on github: https://github.com/jgunthorpe/linux/commits/vfio_iommufd

It requires the iommufd series:

https://lore.kernel.org/r/0-v4-0de2f6c78ed0+9d1-iommufd_jgg@nvidia.com

v2:
 - Rebase to v6.1-rc3, v4 iommufd series
 - Fixup comments and commit messages from list remarks
 - Fix leaking of the iommufd for mdevs
 - New patch to fix vfio modaliases when vfio container is disabled
 - Add a dmesg once when the iommufd provided /dev/vfio/vfio is opened
   to signal that iommufd is providing this
v1: https://lore.kernel.org/r/0-v1-4991695894d8+211-vfio_iommufd_jgg@nvidia.com

Jason Gunthorpe (11):
  vfio: Move vfio_device driver open/close code to a function
  vfio: Move vfio_device_assign_container() into
    vfio_device_first_open()
  vfio: Rename vfio_device_assign/unassign_container()
  vfio: Move storage of allow_unsafe_interrupts to vfio_main.c
  vfio: Use IOMMU_CAP_ENFORCE_CACHE_COHERENCY for
    vfio_file_enforced_coherent()
  vfio-iommufd: Allow iommufd to be used in place of a container fd
  vfio-iommufd: Support iommufd for physical VFIO devices
  vfio-iommufd: Support iommufd for emulated VFIO devices
  vfio: Move container related MODULE_ALIAS statements into container.c
  vfio: Make vfio_container optionally compiled
  iommufd: Allow iommufd to supply /dev/vfio/vfio

 drivers/gpu/drm/i915/gvt/kvmgt.c              |   3 +
 drivers/iommu/iommufd/Kconfig                 |  12 +
 drivers/iommu/iommufd/main.c                  |  36 ++
 drivers/s390/cio/vfio_ccw_ops.c               |   3 +
 drivers/s390/crypto/vfio_ap_ops.c             |   3 +
 drivers/vfio/Kconfig                          |  36 +-
 drivers/vfio/Makefile                         |   5 +-
 drivers/vfio/container.c                      | 141 ++------
 drivers/vfio/fsl-mc/vfio_fsl_mc.c             |   3 +
 drivers/vfio/iommufd.c                        | 157 ++++++++
 .../vfio/pci/hisilicon/hisi_acc_vfio_pci.c    |   6 +
 drivers/vfio/pci/mlx5/main.c                  |   3 +
 drivers/vfio/pci/vfio_pci.c                   |   3 +
 drivers/vfio/platform/vfio_amba.c             |   3 +
 drivers/vfio/platform/vfio_platform.c         |   3 +
 drivers/vfio/vfio.h                           | 100 +++++-
 drivers/vfio/vfio_iommu_type1.c               |   5 +-
 drivers/vfio/vfio_main.c                      | 338 ++++++++++++++----
 include/linux/vfio.h                          |  39 ++
 19 files changed, 700 insertions(+), 199 deletions(-)
 create mode 100644 drivers/vfio/iommufd.c


base-commit: ca3067007d4f2aa7f3a5375bd256839e08a09453
-- 
2.38.1


^ permalink raw reply	[flat|nested] 56+ messages in thread

* [PATCH v2 01/11] vfio: Move vfio_device driver open/close code to a function
  2022-11-08  0:52 [PATCH v2 00/11] Connect VFIO to IOMMUFD Jason Gunthorpe
@ 2022-11-08  0:52 ` Jason Gunthorpe
  2022-11-08  0:52 ` [PATCH v2 02/11] vfio: Move vfio_device_assign_container() into vfio_device_first_open() Jason Gunthorpe
                   ` (12 subsequent siblings)
  13 siblings, 0 replies; 56+ messages in thread
From: Jason Gunthorpe @ 2022-11-08  0:52 UTC (permalink / raw)
  To: Alexander Gordeev, David Airlie, Tony Krowiak, Alex Williamson,
	Christian Borntraeger, Cornelia Huck, Daniel Vetter,
	Diana Craciun, dri-devel, Eric Auger, Eric Farman,
	Harald Freudenberger, Vasily Gorbik, Heiko Carstens, intel-gfx,
	intel-gvt-dev, iommu, Jani Nikula, Jason Herne, Joonas Lahtinen,
	Joerg Roedel, Kevin Tian, kvm, linux-s390, Longfang Liu,
	Matthew Rosato, Peter Oberparleiter, Halil Pasic, Robin Murphy,
	Rodrigo Vivi, Shameer Kolothum, Sven Schnelle, Tvrtko Ursulin,
	Vineeth Vijayan, Will Deacon, Yishai Hadas, Zhenyu Wang,
	Zhi Wang
  Cc: Liu, Yi L, Nicolin Chen, Lu Baolu

This error unwind is getting complicated. Move all the code into two
pair'd function. The functions should be called when the open_count == 1
after incrementing/before decrementing.

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Yi Liu <yi.l.liu@intel.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
 drivers/vfio/vfio_main.c | 95 ++++++++++++++++++++++------------------
 1 file changed, 53 insertions(+), 42 deletions(-)

diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
index 2d168793d4e1ce..2e8346d13c16ca 100644
--- a/drivers/vfio/vfio_main.c
+++ b/drivers/vfio/vfio_main.c
@@ -734,6 +734,51 @@ bool vfio_assert_device_open(struct vfio_device *device)
 	return !WARN_ON_ONCE(!READ_ONCE(device->open_count));
 }
 
+static int vfio_device_first_open(struct vfio_device *device)
+{
+	int ret;
+
+	lockdep_assert_held(&device->dev_set->lock);
+
+	if (!try_module_get(device->dev->driver->owner))
+		return -ENODEV;
+
+	/*
+	 * Here we pass the KVM pointer with the group under the lock.  If the
+	 * device driver will use it, it must obtain a reference and release it
+	 * during close_device.
+	 */
+	mutex_lock(&device->group->group_lock);
+	device->kvm = device->group->kvm;
+	if (device->ops->open_device) {
+		ret = device->ops->open_device(device);
+		if (ret)
+			goto err_module_put;
+	}
+	vfio_device_container_register(device);
+	mutex_unlock(&device->group->group_lock);
+	return 0;
+
+err_module_put:
+	device->kvm = NULL;
+	mutex_unlock(&device->group->group_lock);
+	module_put(device->dev->driver->owner);
+	return ret;
+}
+
+static void vfio_device_last_close(struct vfio_device *device)
+{
+	lockdep_assert_held(&device->dev_set->lock);
+
+	mutex_lock(&device->group->group_lock);
+	vfio_device_container_unregister(device);
+	if (device->ops->close_device)
+		device->ops->close_device(device);
+	device->kvm = NULL;
+	mutex_unlock(&device->group->group_lock);
+	module_put(device->dev->driver->owner);
+}
+
 static struct file *vfio_device_open(struct vfio_device *device)
 {
 	struct file *filep;
@@ -745,29 +790,12 @@ static struct file *vfio_device_open(struct vfio_device *device)
 	if (ret)
 		return ERR_PTR(ret);
 
-	if (!try_module_get(device->dev->driver->owner)) {
-		ret = -ENODEV;
-		goto err_unassign_container;
-	}
-
 	mutex_lock(&device->dev_set->lock);
 	device->open_count++;
 	if (device->open_count == 1) {
-		/*
-		 * Here we pass the KVM pointer with the group under the read
-		 * lock.  If the device driver will use it, it must obtain a
-		 * reference and release it during close_device.
-		 */
-		mutex_lock(&device->group->group_lock);
-		device->kvm = device->group->kvm;
-
-		if (device->ops->open_device) {
-			ret = device->ops->open_device(device);
-			if (ret)
-				goto err_undo_count;
-		}
-		vfio_device_container_register(device);
-		mutex_unlock(&device->group->group_lock);
+		ret = vfio_device_first_open(device);
+		if (ret)
+			goto err_unassign_container;
 	}
 	mutex_unlock(&device->dev_set->lock);
 
@@ -800,20 +828,11 @@ static struct file *vfio_device_open(struct vfio_device *device)
 
 err_close_device:
 	mutex_lock(&device->dev_set->lock);
-	mutex_lock(&device->group->group_lock);
-	if (device->open_count == 1 && device->ops->close_device) {
-		device->ops->close_device(device);
-
-		vfio_device_container_unregister(device);
-	}
-err_undo_count:
-	mutex_unlock(&device->group->group_lock);
+	if (device->open_count == 1)
+		vfio_device_last_close(device);
+err_unassign_container:
 	device->open_count--;
-	if (device->open_count == 0 && device->kvm)
-		device->kvm = NULL;
 	mutex_unlock(&device->dev_set->lock);
-	module_put(device->dev->driver->owner);
-err_unassign_container:
 	vfio_device_unassign_container(device);
 	return ERR_PTR(ret);
 }
@@ -1016,19 +1035,11 @@ static int vfio_device_fops_release(struct inode *inode, struct file *filep)
 
 	mutex_lock(&device->dev_set->lock);
 	vfio_assert_device_open(device);
-	mutex_lock(&device->group->group_lock);
-	if (device->open_count == 1 && device->ops->close_device)
-		device->ops->close_device(device);
-
-	vfio_device_container_unregister(device);
-	mutex_unlock(&device->group->group_lock);
+	if (device->open_count == 1)
+		vfio_device_last_close(device);
 	device->open_count--;
-	if (device->open_count == 0)
-		device->kvm = NULL;
 	mutex_unlock(&device->dev_set->lock);
 
-	module_put(device->dev->driver->owner);
-
 	vfio_device_unassign_container(device);
 
 	vfio_device_put_registration(device);
-- 
2.38.1


^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [PATCH v2 02/11] vfio: Move vfio_device_assign_container() into vfio_device_first_open()
  2022-11-08  0:52 [PATCH v2 00/11] Connect VFIO to IOMMUFD Jason Gunthorpe
  2022-11-08  0:52 ` [PATCH v2 01/11] vfio: Move vfio_device driver open/close code to a function Jason Gunthorpe
@ 2022-11-08  0:52 ` Jason Gunthorpe
  2022-11-08  0:52 ` [PATCH v2 03/11] vfio: Rename vfio_device_assign/unassign_container() Jason Gunthorpe
                   ` (11 subsequent siblings)
  13 siblings, 0 replies; 56+ messages in thread
From: Jason Gunthorpe @ 2022-11-08  0:52 UTC (permalink / raw)
  To: Alexander Gordeev, David Airlie, Tony Krowiak, Alex Williamson,
	Christian Borntraeger, Cornelia Huck, Daniel Vetter,
	Diana Craciun, dri-devel, Eric Auger, Eric Farman,
	Harald Freudenberger, Vasily Gorbik, Heiko Carstens, intel-gfx,
	intel-gvt-dev, iommu, Jani Nikula, Jason Herne, Joonas Lahtinen,
	Joerg Roedel, Kevin Tian, kvm, linux-s390, Longfang Liu,
	Matthew Rosato, Peter Oberparleiter, Halil Pasic, Robin Murphy,
	Rodrigo Vivi, Shameer Kolothum, Sven Schnelle, Tvrtko Ursulin,
	Vineeth Vijayan, Will Deacon, Yishai Hadas, Zhenyu Wang,
	Zhi Wang
  Cc: Liu, Yi L, Nicolin Chen, Lu Baolu

The only thing this function does is assert the group has an assigned
container and incrs refcounts.

The overall model we have is that once a container_users refcount is
incremented it cannot be de-assigned from the group -
vfio_group_ioctl_unset_container() will fail and the group FD cannot be
closed.

Thus we do not need to check this on every device FD open, just the
first. Reorganize the code so that only the first open and last close
manages the container.

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Yi Liu <yi.l.liu@intel.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
 drivers/vfio/container.c |  4 ++--
 drivers/vfio/vfio_main.c | 24 +++++++++++-------------
 2 files changed, 13 insertions(+), 15 deletions(-)

diff --git a/drivers/vfio/container.c b/drivers/vfio/container.c
index d74164abbf401d..dd79a66ec62cad 100644
--- a/drivers/vfio/container.c
+++ b/drivers/vfio/container.c
@@ -531,11 +531,11 @@ int vfio_device_assign_container(struct vfio_device *device)
 
 void vfio_device_unassign_container(struct vfio_device *device)
 {
-	mutex_lock(&device->group->group_lock);
+	lockdep_assert_held_write(&device->group->group_lock);
+
 	WARN_ON(device->group->container_users <= 1);
 	device->group->container_users--;
 	fput(device->group->opened_file);
-	mutex_unlock(&device->group->group_lock);
 }
 
 /*
diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
index 2e8346d13c16ca..717c7f404feeea 100644
--- a/drivers/vfio/vfio_main.c
+++ b/drivers/vfio/vfio_main.c
@@ -749,18 +749,24 @@ static int vfio_device_first_open(struct vfio_device *device)
 	 * during close_device.
 	 */
 	mutex_lock(&device->group->group_lock);
+	ret = vfio_device_assign_container(device);
+	if (ret)
+		goto err_module_put;
+
 	device->kvm = device->group->kvm;
 	if (device->ops->open_device) {
 		ret = device->ops->open_device(device);
 		if (ret)
-			goto err_module_put;
+			goto err_container;
 	}
 	vfio_device_container_register(device);
 	mutex_unlock(&device->group->group_lock);
 	return 0;
 
-err_module_put:
+err_container:
 	device->kvm = NULL;
+	vfio_device_unassign_container(device);
+err_module_put:
 	mutex_unlock(&device->group->group_lock);
 	module_put(device->dev->driver->owner);
 	return ret;
@@ -775,6 +781,7 @@ static void vfio_device_last_close(struct vfio_device *device)
 	if (device->ops->close_device)
 		device->ops->close_device(device);
 	device->kvm = NULL;
+	vfio_device_unassign_container(device);
 	mutex_unlock(&device->group->group_lock);
 	module_put(device->dev->driver->owner);
 }
@@ -784,18 +791,12 @@ static struct file *vfio_device_open(struct vfio_device *device)
 	struct file *filep;
 	int ret;
 
-	mutex_lock(&device->group->group_lock);
-	ret = vfio_device_assign_container(device);
-	mutex_unlock(&device->group->group_lock);
-	if (ret)
-		return ERR_PTR(ret);
-
 	mutex_lock(&device->dev_set->lock);
 	device->open_count++;
 	if (device->open_count == 1) {
 		ret = vfio_device_first_open(device);
 		if (ret)
-			goto err_unassign_container;
+			goto err_unlock;
 	}
 	mutex_unlock(&device->dev_set->lock);
 
@@ -830,10 +831,9 @@ static struct file *vfio_device_open(struct vfio_device *device)
 	mutex_lock(&device->dev_set->lock);
 	if (device->open_count == 1)
 		vfio_device_last_close(device);
-err_unassign_container:
+err_unlock:
 	device->open_count--;
 	mutex_unlock(&device->dev_set->lock);
-	vfio_device_unassign_container(device);
 	return ERR_PTR(ret);
 }
 
@@ -1040,8 +1040,6 @@ static int vfio_device_fops_release(struct inode *inode, struct file *filep)
 	device->open_count--;
 	mutex_unlock(&device->dev_set->lock);
 
-	vfio_device_unassign_container(device);
-
 	vfio_device_put_registration(device);
 
 	return 0;
-- 
2.38.1


^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [PATCH v2 03/11] vfio: Rename vfio_device_assign/unassign_container()
  2022-11-08  0:52 [PATCH v2 00/11] Connect VFIO to IOMMUFD Jason Gunthorpe
  2022-11-08  0:52 ` [PATCH v2 01/11] vfio: Move vfio_device driver open/close code to a function Jason Gunthorpe
  2022-11-08  0:52 ` [PATCH v2 02/11] vfio: Move vfio_device_assign_container() into vfio_device_first_open() Jason Gunthorpe
@ 2022-11-08  0:52 ` Jason Gunthorpe
  2022-11-08  0:52 ` [PATCH v2 04/11] vfio: Move storage of allow_unsafe_interrupts to vfio_main.c Jason Gunthorpe
                   ` (10 subsequent siblings)
  13 siblings, 0 replies; 56+ messages in thread
From: Jason Gunthorpe @ 2022-11-08  0:52 UTC (permalink / raw)
  To: Alexander Gordeev, David Airlie, Tony Krowiak, Alex Williamson,
	Christian Borntraeger, Cornelia Huck, Daniel Vetter,
	Diana Craciun, dri-devel, Eric Auger, Eric Farman,
	Harald Freudenberger, Vasily Gorbik, Heiko Carstens, intel-gfx,
	intel-gvt-dev, iommu, Jani Nikula, Jason Herne, Joonas Lahtinen,
	Joerg Roedel, Kevin Tian, kvm, linux-s390, Longfang Liu,
	Matthew Rosato, Peter Oberparleiter, Halil Pasic, Robin Murphy,
	Rodrigo Vivi, Shameer Kolothum, Sven Schnelle, Tvrtko Ursulin,
	Vineeth Vijayan, Will Deacon, Yishai Hadas, Zhenyu Wang,
	Zhi Wang
  Cc: Liu, Yi L, Nicolin Chen, Lu Baolu

These functions don't really assign anything anymore, they just increment
some refcounts and do a sanity check. Call them
vfio_group_[un]use_container()

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Yi Liu <yi.l.liu@intel.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
 drivers/vfio/container.c | 14 ++++++--------
 drivers/vfio/vfio.h      |  4 ++--
 drivers/vfio/vfio_main.c |  6 +++---
 3 files changed, 11 insertions(+), 13 deletions(-)

diff --git a/drivers/vfio/container.c b/drivers/vfio/container.c
index dd79a66ec62cad..499777930b08fa 100644
--- a/drivers/vfio/container.c
+++ b/drivers/vfio/container.c
@@ -511,10 +511,8 @@ void vfio_group_detach_container(struct vfio_group *group)
 	vfio_container_put(container);
 }
 
-int vfio_device_assign_container(struct vfio_device *device)
+int vfio_group_use_container(struct vfio_group *group)
 {
-	struct vfio_group *group = device->group;
-
 	lockdep_assert_held(&group->group_lock);
 
 	if (!group->container || !group->container->iommu_driver ||
@@ -529,13 +527,13 @@ int vfio_device_assign_container(struct vfio_device *device)
 	return 0;
 }
 
-void vfio_device_unassign_container(struct vfio_device *device)
+void vfio_group_unuse_container(struct vfio_group *group)
 {
-	lockdep_assert_held_write(&device->group->group_lock);
+	lockdep_assert_held(&group->group_lock);
 
-	WARN_ON(device->group->container_users <= 1);
-	device->group->container_users--;
-	fput(device->group->opened_file);
+	WARN_ON(group->container_users <= 1);
+	group->container_users--;
+	fput(group->opened_file);
 }
 
 /*
diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
index bcad54bbab08c4..f95f4925b83bbd 100644
--- a/drivers/vfio/vfio.h
+++ b/drivers/vfio/vfio.h
@@ -112,8 +112,8 @@ void vfio_unregister_iommu_driver(const struct vfio_iommu_driver_ops *ops);
 bool vfio_assert_device_open(struct vfio_device *device);
 
 struct vfio_container *vfio_container_from_file(struct file *filep);
-int vfio_device_assign_container(struct vfio_device *device);
-void vfio_device_unassign_container(struct vfio_device *device);
+int vfio_group_use_container(struct vfio_group *group);
+void vfio_group_unuse_container(struct vfio_group *group);
 int vfio_container_attach_group(struct vfio_container *container,
 				struct vfio_group *group);
 void vfio_group_detach_container(struct vfio_group *group);
diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
index 717c7f404feeea..8c2dcb481ae10b 100644
--- a/drivers/vfio/vfio_main.c
+++ b/drivers/vfio/vfio_main.c
@@ -749,7 +749,7 @@ static int vfio_device_first_open(struct vfio_device *device)
 	 * during close_device.
 	 */
 	mutex_lock(&device->group->group_lock);
-	ret = vfio_device_assign_container(device);
+	ret = vfio_group_use_container(device->group);
 	if (ret)
 		goto err_module_put;
 
@@ -765,7 +765,7 @@ static int vfio_device_first_open(struct vfio_device *device)
 
 err_container:
 	device->kvm = NULL;
-	vfio_device_unassign_container(device);
+	vfio_group_unuse_container(device->group);
 err_module_put:
 	mutex_unlock(&device->group->group_lock);
 	module_put(device->dev->driver->owner);
@@ -781,7 +781,7 @@ static void vfio_device_last_close(struct vfio_device *device)
 	if (device->ops->close_device)
 		device->ops->close_device(device);
 	device->kvm = NULL;
-	vfio_device_unassign_container(device);
+	vfio_group_unuse_container(device->group);
 	mutex_unlock(&device->group->group_lock);
 	module_put(device->dev->driver->owner);
 }
-- 
2.38.1


^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [PATCH v2 04/11] vfio: Move storage of allow_unsafe_interrupts to vfio_main.c
  2022-11-08  0:52 [PATCH v2 00/11] Connect VFIO to IOMMUFD Jason Gunthorpe
                   ` (2 preceding siblings ...)
  2022-11-08  0:52 ` [PATCH v2 03/11] vfio: Rename vfio_device_assign/unassign_container() Jason Gunthorpe
@ 2022-11-08  0:52 ` Jason Gunthorpe
  2022-11-08  0:52 ` [PATCH v2 05/11] vfio: Use IOMMU_CAP_ENFORCE_CACHE_COHERENCY for vfio_file_enforced_coherent() Jason Gunthorpe
                   ` (9 subsequent siblings)
  13 siblings, 0 replies; 56+ messages in thread
From: Jason Gunthorpe @ 2022-11-08  0:52 UTC (permalink / raw)
  To: Alexander Gordeev, David Airlie, Tony Krowiak, Alex Williamson,
	Christian Borntraeger, Cornelia Huck, Daniel Vetter,
	Diana Craciun, dri-devel, Eric Auger, Eric Farman,
	Harald Freudenberger, Vasily Gorbik, Heiko Carstens, intel-gfx,
	intel-gvt-dev, iommu, Jani Nikula, Jason Herne, Joonas Lahtinen,
	Joerg Roedel, Kevin Tian, kvm, linux-s390, Longfang Liu,
	Matthew Rosato, Peter Oberparleiter, Halil Pasic, Robin Murphy,
	Rodrigo Vivi, Shameer Kolothum, Sven Schnelle, Tvrtko Ursulin,
	Vineeth Vijayan, Will Deacon, Yishai Hadas, Zhenyu Wang,
	Zhi Wang
  Cc: Liu, Yi L, Nicolin Chen, Lu Baolu

This legacy module knob has become uAPI, when set on the vfio_iommu_type1
it disables some security protections in the iommu drivers. Move the
storage for this knob to vfio_main.c so that iommufd can access it too.

The may need enhancing as we learn more about how necessary
allow_unsafe_interrupts is in the current state of the world. If vfio
container is disabled then this option will not be available to the user.

Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
 drivers/vfio/vfio.h             | 2 ++
 drivers/vfio/vfio_iommu_type1.c | 5 ++---
 drivers/vfio/vfio_main.c        | 3 +++
 3 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
index f95f4925b83bbd..54e5a8e0834ccb 100644
--- a/drivers/vfio/vfio.h
+++ b/drivers/vfio/vfio.h
@@ -130,4 +130,6 @@ extern bool vfio_noiommu __read_mostly;
 enum { vfio_noiommu = false };
 #endif
 
+extern bool vfio_allow_unsafe_interrupts;
+
 #endif
diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 23c24fe98c00d4..186e33a006d314 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -44,9 +44,8 @@
 #define DRIVER_AUTHOR   "Alex Williamson <alex.williamson@redhat.com>"
 #define DRIVER_DESC     "Type1 IOMMU driver for VFIO"
 
-static bool allow_unsafe_interrupts;
 module_param_named(allow_unsafe_interrupts,
-		   allow_unsafe_interrupts, bool, S_IRUGO | S_IWUSR);
+		   vfio_allow_unsafe_interrupts, bool, S_IRUGO | S_IWUSR);
 MODULE_PARM_DESC(allow_unsafe_interrupts,
 		 "Enable VFIO IOMMU support for on platforms without interrupt remapping support.");
 
@@ -2282,7 +2281,7 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
 		    iommu_group_for_each_dev(iommu_group, (void *)IOMMU_CAP_INTR_REMAP,
 					     vfio_iommu_device_capable);
 
-	if (!allow_unsafe_interrupts && !msi_remap) {
+	if (!vfio_allow_unsafe_interrupts && !msi_remap) {
 		pr_warn("%s: No interrupt remapping support.  Use the module param \"allow_unsafe_interrupts\" to enable VFIO IOMMU support on this platform\n",
 		       __func__);
 		ret = -EPERM;
diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
index 8c2dcb481ae10b..e1fec1db6a3c93 100644
--- a/drivers/vfio/vfio_main.c
+++ b/drivers/vfio/vfio_main.c
@@ -51,6 +51,9 @@ static struct vfio {
 	struct ida			device_ida;
 } vfio;
 
+bool vfio_allow_unsafe_interrupts;
+EXPORT_SYMBOL_GPL(vfio_allow_unsafe_interrupts);
+
 static DEFINE_XARRAY(vfio_device_set_xa);
 static const struct file_operations vfio_group_fops;
 
-- 
2.38.1


^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [PATCH v2 05/11] vfio: Use IOMMU_CAP_ENFORCE_CACHE_COHERENCY for vfio_file_enforced_coherent()
  2022-11-08  0:52 [PATCH v2 00/11] Connect VFIO to IOMMUFD Jason Gunthorpe
                   ` (3 preceding siblings ...)
  2022-11-08  0:52 ` [PATCH v2 04/11] vfio: Move storage of allow_unsafe_interrupts to vfio_main.c Jason Gunthorpe
@ 2022-11-08  0:52 ` Jason Gunthorpe
  2022-11-10  2:48   ` Tian, Kevin
  2022-11-08  0:52 ` [PATCH v2 06/11] vfio-iommufd: Allow iommufd to be used in place of a container fd Jason Gunthorpe
                   ` (8 subsequent siblings)
  13 siblings, 1 reply; 56+ messages in thread
From: Jason Gunthorpe @ 2022-11-08  0:52 UTC (permalink / raw)
  To: Alexander Gordeev, David Airlie, Tony Krowiak, Alex Williamson,
	Christian Borntraeger, Cornelia Huck, Daniel Vetter,
	Diana Craciun, dri-devel, Eric Auger, Eric Farman,
	Harald Freudenberger, Vasily Gorbik, Heiko Carstens, intel-gfx,
	intel-gvt-dev, iommu, Jani Nikula, Jason Herne, Joonas Lahtinen,
	Joerg Roedel, Kevin Tian, kvm, linux-s390, Longfang Liu,
	Matthew Rosato, Peter Oberparleiter, Halil Pasic, Robin Murphy,
	Rodrigo Vivi, Shameer Kolothum, Sven Schnelle, Tvrtko Ursulin,
	Vineeth Vijayan, Will Deacon, Yishai Hadas, Zhenyu Wang,
	Zhi Wang
  Cc: Liu, Yi L, Nicolin Chen, Lu Baolu

iommufd doesn't establish the iommu_domains until after the device FD is
opened, even if the container has been set. This design is part of moving
away from the group centric iommu APIs.

This is fine, except that the normal sequence of establishing the kvm
wbinvd won't work:

   group = open("/dev/vfio/XX")
   ioctl(group, VFIO_GROUP_SET_CONTAINER)
   ioctl(kvm, KVM_DEV_VFIO_GROUP_ADD)
   ioctl(group, VFIO_GROUP_GET_DEVICE_FD)

As the domains don't start existing until GET_DEVICE_FD. Further,
GET_DEVICE_FD requires that KVM_DEV_VFIO_GROUP_ADD already be done as that
is what sets the group->kvm and thus device->kvm for the driver to use
during open.

Now that we have device centric cap ops and the new
IOMMU_CAP_ENFORCE_CACHE_COHERENCY we know what the iommu_domain will be
capable of without having to create it. Use this to compute
vfio_file_enforced_coherent() and resolve the ordering problems.

VFIO always tries to upgrade domains to enforce cache coherency, it never
attaches a device that supports enforce cache coherency to a less capable
domain, so the cap test is a sufficient proxy for the ultimate
outcome. iommufd also ensures that devices that set the cap will be
connected to enforcing domains.

Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
 drivers/vfio/container.c |  5 +++--
 drivers/vfio/vfio.h      |  2 --
 drivers/vfio/vfio_main.c | 29 ++++++++++++++++-------------
 3 files changed, 19 insertions(+), 17 deletions(-)

diff --git a/drivers/vfio/container.c b/drivers/vfio/container.c
index 499777930b08fa..d97747dfb05d02 100644
--- a/drivers/vfio/container.c
+++ b/drivers/vfio/container.c
@@ -188,8 +188,9 @@ void vfio_device_container_unregister(struct vfio_device *device)
 			device->group->container->iommu_data, device);
 }
 
-long vfio_container_ioctl_check_extension(struct vfio_container *container,
-					  unsigned long arg)
+static long
+vfio_container_ioctl_check_extension(struct vfio_container *container,
+				     unsigned long arg)
 {
 	struct vfio_iommu_driver *driver;
 	long ret = 0;
diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
index 54e5a8e0834ccb..247590334e14b0 100644
--- a/drivers/vfio/vfio.h
+++ b/drivers/vfio/vfio.h
@@ -119,8 +119,6 @@ int vfio_container_attach_group(struct vfio_container *container,
 void vfio_group_detach_container(struct vfio_group *group);
 void vfio_device_container_register(struct vfio_device *device);
 void vfio_device_container_unregister(struct vfio_device *device);
-long vfio_container_ioctl_check_extension(struct vfio_container *container,
-					  unsigned long arg);
 int __init vfio_container_init(void);
 void vfio_container_cleanup(void);
 
diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
index e1fec1db6a3c93..5c0e810f8b4d08 100644
--- a/drivers/vfio/vfio_main.c
+++ b/drivers/vfio/vfio_main.c
@@ -1625,24 +1625,27 @@ EXPORT_SYMBOL_GPL(vfio_file_is_group);
 bool vfio_file_enforced_coherent(struct file *file)
 {
 	struct vfio_group *group = file->private_data;
-	bool ret;
+	struct vfio_device *device;
+	bool ret = true;
 
 	if (!vfio_file_is_group(file))
 		return true;
 
-	mutex_lock(&group->group_lock);
-	if (group->container) {
-		ret = vfio_container_ioctl_check_extension(group->container,
-							   VFIO_DMA_CC_IOMMU);
-	} else {
-		/*
-		 * Since the coherency state is determined only once a container
-		 * is attached the user must do so before they can prove they
-		 * have permission.
-		 */
-		ret = true;
+	/*
+	 * If the device does not have IOMMU_CAP_ENFORCE_CACHE_COHERENCY then
+	 * any domain later attached to it will also not support it. If the cap
+	 * is set then the iommu_domain eventually attached to the device/group
+	 * must use a domain with enforce_cache_coherency().
+	 */
+	mutex_lock(&group->device_lock);
+	list_for_each_entry(device, &group->device_list, group_next) {
+		if (!device_iommu_capable(device->dev,
+					  IOMMU_CAP_ENFORCE_CACHE_COHERENCY)) {
+			ret = false;
+			break;
+		}
 	}
-	mutex_unlock(&group->group_lock);
+	mutex_unlock(&group->device_lock);
 	return ret;
 }
 EXPORT_SYMBOL_GPL(vfio_file_enforced_coherent);
-- 
2.38.1


^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [PATCH v2 06/11] vfio-iommufd: Allow iommufd to be used in place of a container fd
  2022-11-08  0:52 [PATCH v2 00/11] Connect VFIO to IOMMUFD Jason Gunthorpe
                   ` (4 preceding siblings ...)
  2022-11-08  0:52 ` [PATCH v2 05/11] vfio: Use IOMMU_CAP_ENFORCE_CACHE_COHERENCY for vfio_file_enforced_coherent() Jason Gunthorpe
@ 2022-11-08  0:52 ` Jason Gunthorpe
  2022-11-10  2:51   ` Tian, Kevin
  2022-11-08  0:52 ` [PATCH v2 07/11] vfio-iommufd: Support iommufd for physical VFIO devices Jason Gunthorpe
                   ` (7 subsequent siblings)
  13 siblings, 1 reply; 56+ messages in thread
From: Jason Gunthorpe @ 2022-11-08  0:52 UTC (permalink / raw)
  To: Alexander Gordeev, David Airlie, Tony Krowiak, Alex Williamson,
	Christian Borntraeger, Cornelia Huck, Daniel Vetter,
	Diana Craciun, dri-devel, Eric Auger, Eric Farman,
	Harald Freudenberger, Vasily Gorbik, Heiko Carstens, intel-gfx,
	intel-gvt-dev, iommu, Jani Nikula, Jason Herne, Joonas Lahtinen,
	Joerg Roedel, Kevin Tian, kvm, linux-s390, Longfang Liu,
	Matthew Rosato, Peter Oberparleiter, Halil Pasic, Robin Murphy,
	Rodrigo Vivi, Shameer Kolothum, Sven Schnelle, Tvrtko Ursulin,
	Vineeth Vijayan, Will Deacon, Yishai Hadas, Zhenyu Wang,
	Zhi Wang
  Cc: Liu, Yi L, Nicolin Chen, Lu Baolu

This makes VFIO_GROUP_SET_CONTAINER accept both a vfio container FD and an
iommufd.

In iommufd mode an IOAS will exist after the SET_CONTAINER, but it will
not be attached to any groups.

For VFIO this means that the VFIO_GROUP_GET_STATUS and
VFIO_GROUP_FLAGS_VIABLE works subtly differently. With the container FD
the iommu_group_claim_dma_owner() is done during SET_CONTAINER but for
IOMMUFD this is done during VFIO_GROUP_GET_DEVICE_FD. Meaning that
VFIO_GROUP_FLAGS_VIABLE could be set but GET_DEVICE_FD will fail due to
viability.

As GET_DEVICE_FD can fail for many reasons already this is not expected to
be a meaningful difference.

Reorganize the tests for if the group has an assigned container or iommu
into a vfio_group_has_iommu() function and consolidate all the duplicated
WARN_ON's etc related to this.

Call container functions only if a container is actually present on the
group.

Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
 drivers/vfio/Kconfig     |  1 +
 drivers/vfio/container.c |  7 +++-
 drivers/vfio/vfio.h      |  2 +
 drivers/vfio/vfio_main.c | 86 +++++++++++++++++++++++++++++++++-------
 4 files changed, 80 insertions(+), 16 deletions(-)

diff --git a/drivers/vfio/Kconfig b/drivers/vfio/Kconfig
index 86c381ceb9a1e9..1118d322eec97d 100644
--- a/drivers/vfio/Kconfig
+++ b/drivers/vfio/Kconfig
@@ -2,6 +2,7 @@
 menuconfig VFIO
 	tristate "VFIO Non-Privileged userspace driver framework"
 	select IOMMU_API
+	depends on IOMMUFD || !IOMMUFD
 	select VFIO_IOMMU_TYPE1 if MMU && (X86 || S390 || ARM || ARM64)
 	select INTERVAL_TREE
 	help
diff --git a/drivers/vfio/container.c b/drivers/vfio/container.c
index d97747dfb05d02..8772dad6808539 100644
--- a/drivers/vfio/container.c
+++ b/drivers/vfio/container.c
@@ -516,8 +516,11 @@ int vfio_group_use_container(struct vfio_group *group)
 {
 	lockdep_assert_held(&group->group_lock);
 
-	if (!group->container || !group->container->iommu_driver ||
-	    WARN_ON(!group->container_users))
+	/*
+	 * The container fd has been assigned with VFIO_GROUP_SET_CONTAINER but
+	 * VFIO_SET_IOMMU hasn't been done yet.
+	 */
+	if (!group->container->iommu_driver)
 		return -EINVAL;
 
 	if (group->type == VFIO_NO_IOMMU && !capable(CAP_SYS_RAWIO))
diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
index 247590334e14b0..985e13d52989ca 100644
--- a/drivers/vfio/vfio.h
+++ b/drivers/vfio/vfio.h
@@ -10,6 +10,7 @@
 #include <linux/cdev.h>
 #include <linux/module.h>
 
+struct iommufd_ctx;
 struct iommu_group;
 struct vfio_device;
 struct vfio_container;
@@ -60,6 +61,7 @@ struct vfio_group {
 	struct kvm			*kvm;
 	struct file			*opened_file;
 	struct blocking_notifier_head	notifier;
+	struct iommufd_ctx		*iommufd;
 };
 
 /* events for the backend driver notify callback */
diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
index 5c0e810f8b4d08..8c124290ce9f0d 100644
--- a/drivers/vfio/vfio_main.c
+++ b/drivers/vfio/vfio_main.c
@@ -35,6 +35,7 @@
 #include <linux/pm_runtime.h>
 #include <linux/interval_tree.h>
 #include <linux/iova_bitmap.h>
+#include <linux/iommufd.h>
 #include "vfio.h"
 
 #define DRIVER_VERSION	"0.3"
@@ -665,6 +666,16 @@ EXPORT_SYMBOL_GPL(vfio_unregister_group_dev);
 /*
  * VFIO Group fd, /dev/vfio/$GROUP
  */
+static bool vfio_group_has_iommu(struct vfio_group *group)
+{
+	lockdep_assert_held(&group->group_lock);
+	if (!group->container)
+		WARN_ON(group->container_users);
+	else
+		WARN_ON(!group->container_users);
+	return group->container || group->iommufd;
+}
+
 /*
  * VFIO_GROUP_UNSET_CONTAINER should fail if there are other users or
  * if there was no container to unset.  Since the ioctl is called on
@@ -676,15 +687,21 @@ static int vfio_group_ioctl_unset_container(struct vfio_group *group)
 	int ret = 0;
 
 	mutex_lock(&group->group_lock);
-	if (!group->container) {
+	if (!vfio_group_has_iommu(group)) {
 		ret = -EINVAL;
 		goto out_unlock;
 	}
-	if (group->container_users != 1) {
-		ret = -EBUSY;
-		goto out_unlock;
+	if (group->container) {
+		if (group->container_users != 1) {
+			ret = -EBUSY;
+			goto out_unlock;
+		}
+		vfio_group_detach_container(group);
+	}
+	if (group->iommufd) {
+		iommufd_ctx_put(group->iommufd);
+		group->iommufd = NULL;
 	}
-	vfio_group_detach_container(group);
 
 out_unlock:
 	mutex_unlock(&group->group_lock);
@@ -695,6 +712,7 @@ static int vfio_group_ioctl_set_container(struct vfio_group *group,
 					  int __user *arg)
 {
 	struct vfio_container *container;
+	struct iommufd_ctx *iommufd;
 	struct fd f;
 	int ret;
 	int fd;
@@ -707,7 +725,7 @@ static int vfio_group_ioctl_set_container(struct vfio_group *group,
 		return -EBADF;
 
 	mutex_lock(&group->group_lock);
-	if (group->container || WARN_ON(group->container_users)) {
+	if (vfio_group_has_iommu(group)) {
 		ret = -EINVAL;
 		goto out_unlock;
 	}
@@ -717,12 +735,28 @@ static int vfio_group_ioctl_set_container(struct vfio_group *group,
 	}
 
 	container = vfio_container_from_file(f.file);
-	ret = -EINVAL;
 	if (container) {
 		ret = vfio_container_attach_group(container, group);
 		goto out_unlock;
 	}
 
+	iommufd = iommufd_ctx_from_file(f.file);
+	if (!IS_ERR(iommufd)) {
+		u32 ioas_id;
+
+		ret = iommufd_vfio_compat_ioas_id(iommufd, &ioas_id);
+		if (ret) {
+			iommufd_ctx_put(group->iommufd);
+			goto out_unlock;
+		}
+
+		group->iommufd = iommufd;
+		goto out_unlock;
+	}
+
+	/* The FD passed is not recognized. */
+	ret = -EBADFD;
+
 out_unlock:
 	mutex_unlock(&group->group_lock);
 	fdput(f);
@@ -752,9 +786,16 @@ static int vfio_device_first_open(struct vfio_device *device)
 	 * during close_device.
 	 */
 	mutex_lock(&device->group->group_lock);
-	ret = vfio_group_use_container(device->group);
-	if (ret)
+	if (!vfio_group_has_iommu(device->group)) {
+		ret = -EINVAL;
 		goto err_module_put;
+	}
+
+	if (device->group->container) {
+		ret = vfio_group_use_container(device->group);
+		if (ret)
+			goto err_module_put;
+	}
 
 	device->kvm = device->group->kvm;
 	if (device->ops->open_device) {
@@ -762,13 +803,15 @@ static int vfio_device_first_open(struct vfio_device *device)
 		if (ret)
 			goto err_container;
 	}
-	vfio_device_container_register(device);
+	if (device->group->container)
+		vfio_device_container_register(device);
 	mutex_unlock(&device->group->group_lock);
 	return 0;
 
 err_container:
 	device->kvm = NULL;
-	vfio_group_unuse_container(device->group);
+	if (device->group->container)
+		vfio_group_unuse_container(device->group);
 err_module_put:
 	mutex_unlock(&device->group->group_lock);
 	module_put(device->dev->driver->owner);
@@ -780,11 +823,13 @@ static void vfio_device_last_close(struct vfio_device *device)
 	lockdep_assert_held(&device->dev_set->lock);
 
 	mutex_lock(&device->group->group_lock);
-	vfio_device_container_unregister(device);
+	if (device->group->container)
+		vfio_device_container_unregister(device);
 	if (device->ops->close_device)
 		device->ops->close_device(device);
 	device->kvm = NULL;
-	vfio_group_unuse_container(device->group);
+	if (device->group->container)
+		vfio_group_unuse_container(device->group);
 	mutex_unlock(&device->group->group_lock);
 	module_put(device->dev->driver->owner);
 }
@@ -900,7 +945,14 @@ static int vfio_group_ioctl_get_status(struct vfio_group *group,
 		return -ENODEV;
 	}
 
-	if (group->container)
+	/*
+	 * With the container FD the iommu_group_claim_dma_owner() is done
+	 * during SET_CONTAINER but for IOMMFD this is done during
+	 * VFIO_GROUP_GET_DEVICE_FD. Meaning that with iommufd
+	 * VFIO_GROUP_FLAGS_VIABLE could be set but GET_DEVICE_FD will fail due
+	 * to viability.
+	 */
+	if (group->container || group->iommufd)
 		status.flags |= VFIO_GROUP_FLAGS_CONTAINER_SET |
 				VFIO_GROUP_FLAGS_VIABLE;
 	else if (!iommu_group_dma_owner_claimed(group->iommu_group))
@@ -983,6 +1035,10 @@ static int vfio_group_fops_release(struct inode *inode, struct file *filep)
 	WARN_ON(group->notifier.head);
 	if (group->container)
 		vfio_group_detach_container(group);
+	if (group->iommufd) {
+		iommufd_ctx_put(group->iommufd);
+		group->iommufd = NULL;
+	}
 	group->opened_file = NULL;
 	mutex_unlock(&group->group_lock);
 	return 0;
@@ -1881,6 +1937,8 @@ static void __exit vfio_cleanup(void)
 module_init(vfio_init);
 module_exit(vfio_cleanup);
 
+MODULE_IMPORT_NS(IOMMUFD);
+MODULE_IMPORT_NS(IOMMUFD_VFIO);
 MODULE_VERSION(DRIVER_VERSION);
 MODULE_LICENSE("GPL v2");
 MODULE_AUTHOR(DRIVER_AUTHOR);
-- 
2.38.1


^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [PATCH v2 07/11] vfio-iommufd: Support iommufd for physical VFIO devices
  2022-11-08  0:52 [PATCH v2 00/11] Connect VFIO to IOMMUFD Jason Gunthorpe
                   ` (5 preceding siblings ...)
  2022-11-08  0:52 ` [PATCH v2 06/11] vfio-iommufd: Allow iommufd to be used in place of a container fd Jason Gunthorpe
@ 2022-11-08  0:52 ` Jason Gunthorpe
  2022-11-08  6:10   ` Nicolin Chen
                     ` (2 more replies)
  2022-11-08  0:52 ` [PATCH v2 08/11] vfio-iommufd: Support iommufd for emulated " Jason Gunthorpe
                   ` (6 subsequent siblings)
  13 siblings, 3 replies; 56+ messages in thread
From: Jason Gunthorpe @ 2022-11-08  0:52 UTC (permalink / raw)
  To: Alexander Gordeev, David Airlie, Tony Krowiak, Alex Williamson,
	Christian Borntraeger, Cornelia Huck, Daniel Vetter,
	Diana Craciun, dri-devel, Eric Auger, Eric Farman,
	Harald Freudenberger, Vasily Gorbik, Heiko Carstens, intel-gfx,
	intel-gvt-dev, iommu, Jani Nikula, Jason Herne, Joonas Lahtinen,
	Joerg Roedel, Kevin Tian, kvm, linux-s390, Longfang Liu,
	Matthew Rosato, Peter Oberparleiter, Halil Pasic, Robin Murphy,
	Rodrigo Vivi, Shameer Kolothum, Sven Schnelle, Tvrtko Ursulin,
	Vineeth Vijayan, Will Deacon, Yishai Hadas, Zhenyu Wang,
	Zhi Wang
  Cc: Liu, Yi L, Nicolin Chen, Lu Baolu

This creates the iommufd_device for the physical VFIO drivers. These are
all the drivers that are calling vfio_register_group_dev() and expect the
type1 code to setup a real iommu_domain against their parent struct
device.

The design gives the driver a choice in how it gets connected to iommufd
by providing bind_iommufd/unbind_iommufd/attach_ioas callbacks to
implement as required. The core code provides three default callbacks for
physical mode using a real iommu_domain. This is suitable for drivers
using vfio_register_group_dev()

Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
 drivers/vfio/Makefile                         |  1 +
 drivers/vfio/fsl-mc/vfio_fsl_mc.c             |  3 +
 drivers/vfio/iommufd.c                        | 99 +++++++++++++++++++
 .../vfio/pci/hisilicon/hisi_acc_vfio_pci.c    |  6 ++
 drivers/vfio/pci/mlx5/main.c                  |  3 +
 drivers/vfio/pci/vfio_pci.c                   |  3 +
 drivers/vfio/platform/vfio_amba.c             |  3 +
 drivers/vfio/platform/vfio_platform.c         |  3 +
 drivers/vfio/vfio.h                           | 15 +++
 drivers/vfio/vfio_main.c                      | 13 ++-
 include/linux/vfio.h                          | 25 +++++
 11 files changed, 172 insertions(+), 2 deletions(-)
 create mode 100644 drivers/vfio/iommufd.c

diff --git a/drivers/vfio/Makefile b/drivers/vfio/Makefile
index b693a1169286f8..3863922529ef20 100644
--- a/drivers/vfio/Makefile
+++ b/drivers/vfio/Makefile
@@ -6,6 +6,7 @@ obj-$(CONFIG_VFIO) += vfio.o
 vfio-y += vfio_main.o \
 	  iova_bitmap.o \
 	  container.o
+vfio-$(CONFIG_IOMMUFD) += iommufd.o
 
 obj-$(CONFIG_VFIO_VIRQFD) += vfio_virqfd.o
 obj-$(CONFIG_VFIO_IOMMU_TYPE1) += vfio_iommu_type1.o
diff --git a/drivers/vfio/fsl-mc/vfio_fsl_mc.c b/drivers/vfio/fsl-mc/vfio_fsl_mc.c
index b16874e913e4f5..5cd4bb47644039 100644
--- a/drivers/vfio/fsl-mc/vfio_fsl_mc.c
+++ b/drivers/vfio/fsl-mc/vfio_fsl_mc.c
@@ -592,6 +592,9 @@ static const struct vfio_device_ops vfio_fsl_mc_ops = {
 	.read		= vfio_fsl_mc_read,
 	.write		= vfio_fsl_mc_write,
 	.mmap		= vfio_fsl_mc_mmap,
+	.bind_iommufd	= vfio_iommufd_physical_bind,
+	.unbind_iommufd	= vfio_iommufd_physical_unbind,
+	.attach_ioas	= vfio_iommufd_physical_attach_ioas,
 };
 
 static struct fsl_mc_driver vfio_fsl_mc_driver = {
diff --git a/drivers/vfio/iommufd.c b/drivers/vfio/iommufd.c
new file mode 100644
index 00000000000000..bf755d0f375c5d
--- /dev/null
+++ b/drivers/vfio/iommufd.c
@@ -0,0 +1,99 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (c) 2021-2022, NVIDIA CORPORATION & AFFILIATES
+ */
+#include <linux/vfio.h>
+#include <linux/iommufd.h>
+
+#include "vfio.h"
+
+MODULE_IMPORT_NS(IOMMUFD);
+MODULE_IMPORT_NS(IOMMUFD_VFIO);
+
+int vfio_iommufd_bind(struct vfio_device *vdev, struct iommufd_ctx *ictx)
+{
+	u32 ioas_id;
+	u32 device_id;
+	int ret;
+
+	lockdep_assert_held(&vdev->dev_set->lock);
+
+	/*
+	 * If the driver doesn't provide this op then it means the device does
+	 * not do DMA at all. So nothing to do.
+	 */
+	if (!vdev->ops->bind_iommufd)
+		return 0;
+
+	ret = vdev->ops->bind_iommufd(vdev, ictx, &device_id);
+	if (ret)
+		return ret;
+
+	ret = iommufd_vfio_compat_ioas_id(ictx, &ioas_id);
+	if (ret)
+		goto err_unbind;
+	ret = vdev->ops->attach_ioas(vdev, &ioas_id);
+	if (ret)
+		goto err_unbind;
+	vdev->iommufd_attached = true;
+
+	/*
+	 * The legacy path has no way to return the device id or the selected
+	 * pt_id
+	 */
+	return 0;
+
+err_unbind:
+	if (vdev->ops->unbind_iommufd)
+		vdev->ops->unbind_iommufd(vdev);
+	return ret;
+}
+
+void vfio_iommufd_unbind(struct vfio_device *vdev)
+{
+	lockdep_assert_held(&vdev->dev_set->lock);
+
+	if (vdev->ops->unbind_iommufd)
+		vdev->ops->unbind_iommufd(vdev);
+}
+
+/*
+ * The physical standard ops mean that the iommufd_device is bound to the
+ * physical device vdev->dev that was provided to vfio_init_group_dev(). Drivers
+ * using this ops set should call vfio_register_group_dev()
+ */
+int vfio_iommufd_physical_bind(struct vfio_device *vdev,
+			       struct iommufd_ctx *ictx, u32 *out_device_id)
+{
+	struct iommufd_device *idev;
+
+	idev = iommufd_device_bind(ictx, vdev->dev, out_device_id);
+	if (IS_ERR(idev))
+		return PTR_ERR(idev);
+	vdev->iommufd_device = idev;
+	return 0;
+}
+EXPORT_SYMBOL_GPL(vfio_iommufd_physical_bind);
+
+void vfio_iommufd_physical_unbind(struct vfio_device *vdev)
+{
+	lockdep_assert_held(&vdev->dev_set->lock);
+
+	if (vdev->iommufd_attached) {
+		iommufd_device_detach(vdev->iommufd_device);
+		vdev->iommufd_attached = false;
+	}
+	iommufd_device_unbind(vdev->iommufd_device);
+	vdev->iommufd_device = NULL;
+}
+EXPORT_SYMBOL_GPL(vfio_iommufd_physical_unbind);
+
+int vfio_iommufd_physical_attach_ioas(struct vfio_device *vdev, u32 *pt_id)
+{
+	unsigned int flags = 0;
+
+	if (vfio_allow_unsafe_interrupts)
+		flags |= IOMMUFD_ATTACH_FLAGS_ALLOW_UNSAFE_INTERRUPT;
+	return iommufd_device_attach(vdev->iommufd_device, pt_id, flags);
+}
+EXPORT_SYMBOL_GPL(vfio_iommufd_physical_attach_ioas);
diff --git a/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c b/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
index 39eeca18a0f7c8..40019b11c5a969 100644
--- a/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
+++ b/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
@@ -1246,6 +1246,9 @@ static const struct vfio_device_ops hisi_acc_vfio_pci_migrn_ops = {
 	.mmap = hisi_acc_vfio_pci_mmap,
 	.request = vfio_pci_core_request,
 	.match = vfio_pci_core_match,
+	.bind_iommufd = vfio_iommufd_physical_bind,
+	.unbind_iommufd = vfio_iommufd_physical_unbind,
+	.attach_ioas = vfio_iommufd_physical_attach_ioas,
 };
 
 static const struct vfio_device_ops hisi_acc_vfio_pci_ops = {
@@ -1261,6 +1264,9 @@ static const struct vfio_device_ops hisi_acc_vfio_pci_ops = {
 	.mmap = vfio_pci_core_mmap,
 	.request = vfio_pci_core_request,
 	.match = vfio_pci_core_match,
+	.bind_iommufd = vfio_iommufd_physical_bind,
+	.unbind_iommufd = vfio_iommufd_physical_unbind,
+	.attach_ioas = vfio_iommufd_physical_attach_ioas,
 };
 
 static int hisi_acc_vfio_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
diff --git a/drivers/vfio/pci/mlx5/main.c b/drivers/vfio/pci/mlx5/main.c
index fd6ccb8454a24a..32d1f38d351e7e 100644
--- a/drivers/vfio/pci/mlx5/main.c
+++ b/drivers/vfio/pci/mlx5/main.c
@@ -623,6 +623,9 @@ static const struct vfio_device_ops mlx5vf_pci_ops = {
 	.mmap = vfio_pci_core_mmap,
 	.request = vfio_pci_core_request,
 	.match = vfio_pci_core_match,
+	.bind_iommufd = vfio_iommufd_physical_bind,
+	.unbind_iommufd = vfio_iommufd_physical_unbind,
+	.attach_ioas = vfio_iommufd_physical_attach_ioas,
 };
 
 static int mlx5vf_pci_probe(struct pci_dev *pdev,
diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
index 1d4919edfbde48..29091ee2e9849b 100644
--- a/drivers/vfio/pci/vfio_pci.c
+++ b/drivers/vfio/pci/vfio_pci.c
@@ -138,6 +138,9 @@ static const struct vfio_device_ops vfio_pci_ops = {
 	.mmap		= vfio_pci_core_mmap,
 	.request	= vfio_pci_core_request,
 	.match		= vfio_pci_core_match,
+	.bind_iommufd	= vfio_iommufd_physical_bind,
+	.unbind_iommufd	= vfio_iommufd_physical_unbind,
+	.attach_ioas	= vfio_iommufd_physical_attach_ioas,
 };
 
 static int vfio_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
diff --git a/drivers/vfio/platform/vfio_amba.c b/drivers/vfio/platform/vfio_amba.c
index eaea63e5294c58..5a046098d0bdf4 100644
--- a/drivers/vfio/platform/vfio_amba.c
+++ b/drivers/vfio/platform/vfio_amba.c
@@ -117,6 +117,9 @@ static const struct vfio_device_ops vfio_amba_ops = {
 	.read		= vfio_platform_read,
 	.write		= vfio_platform_write,
 	.mmap		= vfio_platform_mmap,
+	.bind_iommufd	= vfio_iommufd_physical_bind,
+	.unbind_iommufd	= vfio_iommufd_physical_unbind,
+	.attach_ioas	= vfio_iommufd_physical_attach_ioas,
 };
 
 static const struct amba_id pl330_ids[] = {
diff --git a/drivers/vfio/platform/vfio_platform.c b/drivers/vfio/platform/vfio_platform.c
index 82cedcebfd9022..b87c3b70878341 100644
--- a/drivers/vfio/platform/vfio_platform.c
+++ b/drivers/vfio/platform/vfio_platform.c
@@ -106,6 +106,9 @@ static const struct vfio_device_ops vfio_platform_ops = {
 	.read		= vfio_platform_read,
 	.write		= vfio_platform_write,
 	.mmap		= vfio_platform_mmap,
+	.bind_iommufd	= vfio_iommufd_physical_bind,
+	.unbind_iommufd	= vfio_iommufd_physical_unbind,
+	.attach_ioas	= vfio_iommufd_physical_attach_ioas,
 };
 
 static struct platform_driver vfio_platform_driver = {
diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
index 985e13d52989ca..809f2e8523968e 100644
--- a/drivers/vfio/vfio.h
+++ b/drivers/vfio/vfio.h
@@ -124,6 +124,21 @@ void vfio_device_container_unregister(struct vfio_device *device);
 int __init vfio_container_init(void);
 void vfio_container_cleanup(void);
 
+#if IS_ENABLED(CONFIG_IOMMUFD)
+int vfio_iommufd_bind(struct vfio_device *device, struct iommufd_ctx *ictx);
+void vfio_iommufd_unbind(struct vfio_device *device);
+#else
+static inline int vfio_iommufd_bind(struct vfio_device *device,
+				    struct iommufd_ctx *ictx)
+{
+	return -EOPNOTSUPP;
+}
+
+static inline void vfio_iommufd_unbind(struct vfio_device *device)
+{
+}
+#endif
+
 #ifdef CONFIG_VFIO_NOIOMMU
 extern bool vfio_noiommu __read_mostly;
 #else
diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
index 8c124290ce9f0d..2f3e35d2f2083d 100644
--- a/drivers/vfio/vfio_main.c
+++ b/drivers/vfio/vfio_main.c
@@ -528,6 +528,11 @@ static int __vfio_register_dev(struct vfio_device *device,
 	if (IS_ERR(group))
 		return PTR_ERR(group);
 
+	if (WARN_ON(device->ops->bind_iommufd &&
+		    (!device->ops->unbind_iommufd ||
+		     !device->ops->attach_ioas)))
+		return -EINVAL;
+
 	/*
 	 * If the driver doesn't specify a set then the device is added to a
 	 * singleton set just for itself.
@@ -795,6 +800,10 @@ static int vfio_device_first_open(struct vfio_device *device)
 		ret = vfio_group_use_container(device->group);
 		if (ret)
 			goto err_module_put;
+	} else if (device->group->iommufd) {
+		ret = vfio_iommufd_bind(device, device->group->iommufd);
+		if (ret)
+			goto err_module_put;
 	}
 
 	device->kvm = device->group->kvm;
@@ -812,6 +821,7 @@ static int vfio_device_first_open(struct vfio_device *device)
 	device->kvm = NULL;
 	if (device->group->container)
 		vfio_group_unuse_container(device->group);
+	vfio_iommufd_unbind(device);
 err_module_put:
 	mutex_unlock(&device->group->group_lock);
 	module_put(device->dev->driver->owner);
@@ -830,6 +840,7 @@ static void vfio_device_last_close(struct vfio_device *device)
 	device->kvm = NULL;
 	if (device->group->container)
 		vfio_group_unuse_container(device->group);
+	vfio_iommufd_unbind(device);
 	mutex_unlock(&device->group->group_lock);
 	module_put(device->dev->driver->owner);
 }
@@ -1937,8 +1948,6 @@ static void __exit vfio_cleanup(void)
 module_init(vfio_init);
 module_exit(vfio_cleanup);
 
-MODULE_IMPORT_NS(IOMMUFD);
-MODULE_IMPORT_NS(IOMMUFD_VFIO);
 MODULE_VERSION(DRIVER_VERSION);
 MODULE_LICENSE("GPL v2");
 MODULE_AUTHOR(DRIVER_AUTHOR);
diff --git a/include/linux/vfio.h b/include/linux/vfio.h
index e7cebeb875dd1a..a7fc4d747dc226 100644
--- a/include/linux/vfio.h
+++ b/include/linux/vfio.h
@@ -17,6 +17,8 @@
 #include <linux/iova_bitmap.h>
 
 struct kvm;
+struct iommufd_ctx;
+struct iommufd_device;
 
 /*
  * VFIO devices can be placed in a set, this allows all devices to share this
@@ -54,6 +56,10 @@ struct vfio_device {
 	struct completion comp;
 	struct list_head group_next;
 	struct list_head iommu_entry;
+#if IS_ENABLED(CONFIG_IOMMUFD)
+	struct iommufd_device *iommufd_device;
+	bool iommufd_attached;
+#endif
 };
 
 /**
@@ -80,6 +86,10 @@ struct vfio_device_ops {
 	char	*name;
 	int	(*init)(struct vfio_device *vdev);
 	void	(*release)(struct vfio_device *vdev);
+	int	(*bind_iommufd)(struct vfio_device *vdev,
+				struct iommufd_ctx *ictx, u32 *out_device_id);
+	void	(*unbind_iommufd)(struct vfio_device *vdev);
+	int	(*attach_ioas)(struct vfio_device *vdev, u32 *pt_id);
 	int	(*open_device)(struct vfio_device *vdev);
 	void	(*close_device)(struct vfio_device *vdev);
 	ssize_t	(*read)(struct vfio_device *vdev, char __user *buf,
@@ -96,6 +106,21 @@ struct vfio_device_ops {
 				  void __user *arg, size_t argsz);
 };
 
+#if IS_ENABLED(CONFIG_IOMMUFD)
+int vfio_iommufd_physical_bind(struct vfio_device *vdev,
+			       struct iommufd_ctx *ictx, u32 *out_device_id);
+void vfio_iommufd_physical_unbind(struct vfio_device *vdev);
+int vfio_iommufd_physical_attach_ioas(struct vfio_device *vdev, u32 *pt_id);
+#else
+#define vfio_iommufd_physical_bind                                      \
+	((int (*)(struct vfio_device *vdev, struct iommufd_ctx *ictx,   \
+		  u32 *out_device_id)) NULL)
+#define vfio_iommufd_physical_unbind \
+	((void (*)(struct vfio_device *vdev)) NULL)
+#define vfio_iommufd_physical_attach_ioas \
+	((int (*)(struct vfio_device *vdev, u32 *pt_id)) NULL)
+#endif
+
 /**
  * @migration_set_state: Optional callback to change the migration state for
  *         devices that support migration. It's mandatory for
-- 
2.38.1


^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [PATCH v2 08/11] vfio-iommufd: Support iommufd for emulated VFIO devices
  2022-11-08  0:52 [PATCH v2 00/11] Connect VFIO to IOMMUFD Jason Gunthorpe
                   ` (6 preceding siblings ...)
  2022-11-08  0:52 ` [PATCH v2 07/11] vfio-iommufd: Support iommufd for physical VFIO devices Jason Gunthorpe
@ 2022-11-08  0:52 ` Jason Gunthorpe
  2022-11-10  5:33   ` Tian, Kevin
  2022-11-08  0:52 ` [PATCH v2 09/11] vfio: Move container related MODULE_ALIAS statements into container.c Jason Gunthorpe
                   ` (5 subsequent siblings)
  13 siblings, 1 reply; 56+ messages in thread
From: Jason Gunthorpe @ 2022-11-08  0:52 UTC (permalink / raw)
  To: Alexander Gordeev, David Airlie, Tony Krowiak, Alex Williamson,
	Christian Borntraeger, Cornelia Huck, Daniel Vetter,
	Diana Craciun, dri-devel, Eric Auger, Eric Farman,
	Harald Freudenberger, Vasily Gorbik, Heiko Carstens, intel-gfx,
	intel-gvt-dev, iommu, Jani Nikula, Jason Herne, Joonas Lahtinen,
	Joerg Roedel, Kevin Tian, kvm, linux-s390, Longfang Liu,
	Matthew Rosato, Peter Oberparleiter, Halil Pasic, Robin Murphy,
	Rodrigo Vivi, Shameer Kolothum, Sven Schnelle, Tvrtko Ursulin,
	Vineeth Vijayan, Will Deacon, Yishai Hadas, Zhenyu Wang,
	Zhi Wang
  Cc: Liu, Yi L, Nicolin Chen, Lu Baolu

Emulated VFIO devices are calling vfio_register_emulated_iommu_dev() and
consist of all the mdev drivers.

Like the physical drivers, support for iommufd is provided by the driver
supplying the correct standard ops. Provide ops from the core that
duplicate what vfio_register_emulated_iommu_dev() does.

Emulated drivers are where it is more likely to see variation in the
iommfd support ops. For instance IDXD will probably need to setup both a
iommfd_device context linked to a PASID and an iommufd_access context to
support all their mdev operations.

Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
 drivers/gpu/drm/i915/gvt/kvmgt.c  |   3 +
 drivers/s390/cio/vfio_ccw_ops.c   |   3 +
 drivers/s390/crypto/vfio_ap_ops.c |   3 +
 drivers/vfio/container.c          | 110 +++++-----------------------
 drivers/vfio/iommufd.c            |  58 +++++++++++++++
 drivers/vfio/vfio.h               |  10 ++-
 drivers/vfio/vfio_main.c          | 114 +++++++++++++++++++++++++++++-
 include/linux/vfio.h              |  14 ++++
 8 files changed, 221 insertions(+), 94 deletions(-)

diff --git a/drivers/gpu/drm/i915/gvt/kvmgt.c b/drivers/gpu/drm/i915/gvt/kvmgt.c
index 7a45e5360caf2d..579b230a0f58d9 100644
--- a/drivers/gpu/drm/i915/gvt/kvmgt.c
+++ b/drivers/gpu/drm/i915/gvt/kvmgt.c
@@ -1474,6 +1474,9 @@ static const struct vfio_device_ops intel_vgpu_dev_ops = {
 	.mmap		= intel_vgpu_mmap,
 	.ioctl		= intel_vgpu_ioctl,
 	.dma_unmap	= intel_vgpu_dma_unmap,
+	.bind_iommufd	= vfio_iommufd_emulated_bind,
+	.unbind_iommufd = vfio_iommufd_emulated_unbind,
+	.attach_ioas	= vfio_iommufd_emulated_attach_ioas,
 };
 
 static int intel_vgpu_probe(struct mdev_device *mdev)
diff --git a/drivers/s390/cio/vfio_ccw_ops.c b/drivers/s390/cio/vfio_ccw_ops.c
index 6ae4d012d80084..560453d99c24fc 100644
--- a/drivers/s390/cio/vfio_ccw_ops.c
+++ b/drivers/s390/cio/vfio_ccw_ops.c
@@ -588,6 +588,9 @@ static const struct vfio_device_ops vfio_ccw_dev_ops = {
 	.ioctl = vfio_ccw_mdev_ioctl,
 	.request = vfio_ccw_mdev_request,
 	.dma_unmap = vfio_ccw_dma_unmap,
+	.bind_iommufd = vfio_iommufd_emulated_bind,
+	.unbind_iommufd = vfio_iommufd_emulated_unbind,
+	.attach_ioas = vfio_iommufd_emulated_attach_ioas,
 };
 
 struct mdev_driver vfio_ccw_mdev_driver = {
diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
index 0b4cc8c597ae67..bb7776d207924f 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -1789,6 +1789,9 @@ static const struct vfio_device_ops vfio_ap_matrix_dev_ops = {
 	.close_device = vfio_ap_mdev_close_device,
 	.ioctl = vfio_ap_mdev_ioctl,
 	.dma_unmap = vfio_ap_mdev_dma_unmap,
+	.bind_iommufd = vfio_iommufd_emulated_bind,
+	.unbind_iommufd = vfio_iommufd_emulated_unbind,
+	.attach_ioas = vfio_iommufd_emulated_attach_ioas,
 };
 
 static struct mdev_driver vfio_ap_matrix_driver = {
diff --git a/drivers/vfio/container.c b/drivers/vfio/container.c
index 8772dad6808539..7f3961fd4b5aac 100644
--- a/drivers/vfio/container.c
+++ b/drivers/vfio/container.c
@@ -540,113 +540,41 @@ void vfio_group_unuse_container(struct vfio_group *group)
 	fput(group->opened_file);
 }
 
-/*
- * Pin contiguous user pages and return their associated host pages for local
- * domain only.
- * @device [in]  : device
- * @iova [in]    : starting IOVA of user pages to be pinned.
- * @npage [in]   : count of pages to be pinned.  This count should not
- *		   be greater than VFIO_PIN_PAGES_MAX_ENTRIES.
- * @prot [in]    : protection flags
- * @pages[out]   : array of host pages
- * Return error or number of pages pinned.
- *
- * A driver may only call this function if the vfio_device was created
- * by vfio_register_emulated_iommu_dev().
- */
-int vfio_pin_pages(struct vfio_device *device, dma_addr_t iova,
-		   int npage, int prot, struct page **pages)
+int vfio_container_pin_pages(struct vfio_container *container,
+			     struct iommu_group *iommu_group, dma_addr_t iova,
+			     int npage, int prot, struct page **pages)
 {
-	struct vfio_container *container;
-	struct vfio_group *group = device->group;
-	struct vfio_iommu_driver *driver;
-	int ret;
-
-	if (!pages || !npage || !vfio_assert_device_open(device))
-		return -EINVAL;
+	struct vfio_iommu_driver *driver = container->iommu_driver;
 
 	if (npage > VFIO_PIN_PAGES_MAX_ENTRIES)
 		return -E2BIG;
 
-	/* group->container cannot change while a vfio device is open */
-	container = group->container;
-	driver = container->iommu_driver;
-	if (likely(driver && driver->ops->pin_pages))
-		ret = driver->ops->pin_pages(container->iommu_data,
-					     group->iommu_group, iova,
-					     npage, prot, pages);
-	else
-		ret = -ENOTTY;
-
-	return ret;
+	if (unlikely(!driver || !driver->ops->pin_pages))
+		return -ENOTTY;
+	return driver->ops->pin_pages(container->iommu_data, iommu_group, iova,
+				      npage, prot, pages);
 }
-EXPORT_SYMBOL(vfio_pin_pages);
 
-/*
- * Unpin contiguous host pages for local domain only.
- * @device [in]  : device
- * @iova [in]    : starting address of user pages to be unpinned.
- * @npage [in]   : count of pages to be unpinned.  This count should not
- *                 be greater than VFIO_PIN_PAGES_MAX_ENTRIES.
- */
-void vfio_unpin_pages(struct vfio_device *device, dma_addr_t iova, int npage)
+void vfio_container_unpin_pages(struct vfio_container *container,
+				dma_addr_t iova, int npage)
 {
-	struct vfio_container *container;
-	struct vfio_iommu_driver *driver;
-
 	if (WARN_ON(npage <= 0 || npage > VFIO_PIN_PAGES_MAX_ENTRIES))
 		return;
 
-	if (WARN_ON(!vfio_assert_device_open(device)))
-		return;
-
-	/* group->container cannot change while a vfio device is open */
-	container = device->group->container;
-	driver = container->iommu_driver;
-
-	driver->ops->unpin_pages(container->iommu_data, iova, npage);
+	container->iommu_driver->ops->unpin_pages(container->iommu_data, iova,
+						  npage);
 }
-EXPORT_SYMBOL(vfio_unpin_pages);
 
-/*
- * This interface allows the CPUs to perform some sort of virtual DMA on
- * behalf of the device.
- *
- * CPUs read/write from/into a range of IOVAs pointing to user space memory
- * into/from a kernel buffer.
- *
- * As the read/write of user space memory is conducted via the CPUs and is
- * not a real device DMA, it is not necessary to pin the user space memory.
- *
- * @device [in]		: VFIO device
- * @iova [in]		: base IOVA of a user space buffer
- * @data [in]		: pointer to kernel buffer
- * @len [in]		: kernel buffer length
- * @write		: indicate read or write
- * Return error code on failure or 0 on success.
- */
-int vfio_dma_rw(struct vfio_device *device, dma_addr_t iova, void *data,
-		size_t len, bool write)
+int vfio_container_dma_rw(struct vfio_container *container, dma_addr_t iova,
+			  void *data, size_t len, bool write)
 {
-	struct vfio_container *container;
-	struct vfio_iommu_driver *driver;
-	int ret = 0;
-
-	if (!data || len <= 0 || !vfio_assert_device_open(device))
-		return -EINVAL;
-
-	/* group->container cannot change while a vfio device is open */
-	container = device->group->container;
-	driver = container->iommu_driver;
+	struct vfio_iommu_driver *driver = container->iommu_driver;
 
-	if (likely(driver && driver->ops->dma_rw))
-		ret = driver->ops->dma_rw(container->iommu_data,
-					  iova, data, len, write);
-	else
-		ret = -ENOTTY;
-	return ret;
+	if (unlikely(!driver || !driver->ops->dma_rw))
+		return -ENOTTY;
+	return driver->ops->dma_rw(container->iommu_data, iova, data, len,
+				   write);
 }
-EXPORT_SYMBOL(vfio_dma_rw);
 
 int __init vfio_container_init(void)
 {
diff --git a/drivers/vfio/iommufd.c b/drivers/vfio/iommufd.c
index bf755d0f375c5d..595c7b2146f88c 100644
--- a/drivers/vfio/iommufd.c
+++ b/drivers/vfio/iommufd.c
@@ -97,3 +97,61 @@ int vfio_iommufd_physical_attach_ioas(struct vfio_device *vdev, u32 *pt_id)
 	return iommufd_device_attach(vdev->iommufd_device, pt_id, flags);
 }
 EXPORT_SYMBOL_GPL(vfio_iommufd_physical_attach_ioas);
+
+/*
+ * The emulated standard ops mean that vfio_device is going to use the
+ * "mdev path" and will call vfio_pin_pages()/vfio_dma_rw(). Drivers using this
+ * ops set should call vfio_register_emulated_iommu_dev().
+ */
+
+static void vfio_emulated_unmap(void *data, unsigned long iova,
+				unsigned long length)
+{
+	struct vfio_device *vdev = data;
+
+	vdev->ops->dma_unmap(vdev, iova, length);
+}
+
+static const struct iommufd_access_ops vfio_user_ops = {
+	.needs_pin_pages = 1,
+	.unmap = vfio_emulated_unmap,
+};
+
+int vfio_iommufd_emulated_bind(struct vfio_device *vdev,
+			       struct iommufd_ctx *ictx, u32 *out_device_id)
+{
+	lockdep_assert_held(&vdev->dev_set->lock);
+
+	vdev->iommufd_ictx = ictx;
+	iommufd_ctx_get(ictx);
+	return 0;
+}
+EXPORT_SYMBOL_GPL(vfio_iommufd_emulated_bind);
+
+void vfio_iommufd_emulated_unbind(struct vfio_device *vdev)
+{
+	lockdep_assert_held(&vdev->dev_set->lock);
+
+	if (vdev->iommufd_access) {
+		iommufd_access_destroy(vdev->iommufd_access);
+		vdev->iommufd_access = NULL;
+	}
+	iommufd_ctx_put(vdev->iommufd_ictx);
+	vdev->iommufd_ictx = NULL;
+}
+EXPORT_SYMBOL_GPL(vfio_iommufd_emulated_unbind);
+
+int vfio_iommufd_emulated_attach_ioas(struct vfio_device *vdev, u32 *pt_id)
+{
+	struct iommufd_access *user;
+
+	lockdep_assert_held(&vdev->dev_set->lock);
+
+	user = iommufd_access_create(vdev->iommufd_ictx, *pt_id, &vfio_user_ops,
+				     vdev);
+	if (IS_ERR(user))
+		return PTR_ERR(user);
+	vdev->iommufd_access = user;
+	return 0;
+}
+EXPORT_SYMBOL_GPL(vfio_iommufd_emulated_attach_ioas);
diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
index 809f2e8523968e..d57a08afb5cf5c 100644
--- a/drivers/vfio/vfio.h
+++ b/drivers/vfio/vfio.h
@@ -111,8 +111,6 @@ struct vfio_iommu_driver {
 int vfio_register_iommu_driver(const struct vfio_iommu_driver_ops *ops);
 void vfio_unregister_iommu_driver(const struct vfio_iommu_driver_ops *ops);
 
-bool vfio_assert_device_open(struct vfio_device *device);
-
 struct vfio_container *vfio_container_from_file(struct file *filep);
 int vfio_group_use_container(struct vfio_group *group);
 void vfio_group_unuse_container(struct vfio_group *group);
@@ -121,6 +119,14 @@ int vfio_container_attach_group(struct vfio_container *container,
 void vfio_group_detach_container(struct vfio_group *group);
 void vfio_device_container_register(struct vfio_device *device);
 void vfio_device_container_unregister(struct vfio_device *device);
+int vfio_container_pin_pages(struct vfio_container *container,
+			     struct iommu_group *iommu_group, dma_addr_t iova,
+			     int npage, int prot, struct page **pages);
+void vfio_container_unpin_pages(struct vfio_container *container,
+				dma_addr_t iova, int npage);
+int vfio_container_dma_rw(struct vfio_container *container, dma_addr_t iova,
+			  void *data, size_t len, bool write);
+
 int __init vfio_container_init(void);
 void vfio_container_cleanup(void);
 
diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
index 2f3e35d2f2083d..cf49c5200a4c05 100644
--- a/drivers/vfio/vfio_main.c
+++ b/drivers/vfio/vfio_main.c
@@ -771,7 +771,7 @@ static int vfio_group_ioctl_set_container(struct vfio_group *group,
 static const struct file_operations vfio_device_fops;
 
 /* true if the vfio_device has open_device() called but not close_device() */
-bool vfio_assert_device_open(struct vfio_device *device)
+static bool vfio_assert_device_open(struct vfio_device *device)
 {
 	return !WARN_ON_ONCE(!READ_ONCE(device->open_count));
 }
@@ -1875,6 +1875,118 @@ int vfio_set_irqs_validate_and_prepare(struct vfio_irq_set *hdr, int num_irqs,
 }
 EXPORT_SYMBOL(vfio_set_irqs_validate_and_prepare);
 
+/*
+ * Pin contiguous user pages and return their associated host pages for local
+ * domain only.
+ * @device [in]  : device
+ * @iova [in]    : starting IOVA of user pages to be pinned.
+ * @npage [in]   : count of pages to be pinned.  This count should not
+ *		   be greater than VFIO_PIN_PAGES_MAX_ENTRIES.
+ * @prot [in]    : protection flags
+ * @pages[out]   : array of host pages
+ * Return error or number of pages pinned.
+ *
+ * A driver may only call this function if the vfio_device was created
+ * by vfio_register_emulated_iommu_dev() due to vfio_container_pin_pages().
+ */
+int vfio_pin_pages(struct vfio_device *device, dma_addr_t iova,
+		   int npage, int prot, struct page **pages)
+{
+	/* group->container cannot change while a vfio device is open */
+	if (!pages || !npage || WARN_ON(!vfio_assert_device_open(device)))
+		return -EINVAL;
+	if (device->group->container)
+		return vfio_container_pin_pages(device->group->container,
+						device->group->iommu_group,
+						iova, npage, prot, pages);
+	if (device->iommufd_access) {
+		int ret;
+
+		if (iova > ULONG_MAX)
+			return -EINVAL;
+		ret = iommufd_access_pin_pages(
+			device->iommufd_access, iova, npage * PAGE_SIZE, pages,
+			(prot & IOMMU_WRITE) ? IOMMUFD_ACCESS_RW_WRITE : 0);
+		if (ret)
+			return ret;
+		return npage;
+	}
+	return -EINVAL;
+}
+EXPORT_SYMBOL(vfio_pin_pages);
+
+/*
+ * Unpin contiguous host pages for local domain only.
+ * @device [in]  : device
+ * @iova [in]    : starting address of user pages to be unpinned.
+ * @npage [in]   : count of pages to be unpinned.  This count should not
+ *                 be greater than VFIO_PIN_PAGES_MAX_ENTRIES.
+ */
+void vfio_unpin_pages(struct vfio_device *device, dma_addr_t iova, int npage)
+{
+	if (WARN_ON(!vfio_assert_device_open(device)))
+		return;
+
+	if (device->group->container) {
+		vfio_container_unpin_pages(device->group->container, iova,
+					   npage);
+		return;
+	}
+	if (device->iommufd_access) {
+		if (WARN_ON(iova > ULONG_MAX))
+			return;
+		iommufd_access_unpin_pages(device->iommufd_access, iova,
+					   npage * PAGE_SIZE);
+		return;
+	}
+}
+EXPORT_SYMBOL(vfio_unpin_pages);
+
+/*
+ * This interface allows the CPUs to perform some sort of virtual DMA on
+ * behalf of the device.
+ *
+ * CPUs read/write from/into a range of IOVAs pointing to user space memory
+ * into/from a kernel buffer.
+ *
+ * As the read/write of user space memory is conducted via the CPUs and is
+ * not a real device DMA, it is not necessary to pin the user space memory.
+ *
+ * @device [in]		: VFIO device
+ * @iova [in]		: base IOVA of a user space buffer
+ * @data [in]		: pointer to kernel buffer
+ * @len [in]		: kernel buffer length
+ * @write		: indicate read or write
+ * Return error code on failure or 0 on success.
+ */
+int vfio_dma_rw(struct vfio_device *device, dma_addr_t iova, void *data,
+		size_t len, bool write)
+{
+	if (!data || len <= 0 || !vfio_assert_device_open(device))
+		return -EINVAL;
+
+	if (device->group->container)
+		return vfio_container_dma_rw(device->group->container, iova,
+					     data, len, write);
+
+	if (device->iommufd_access) {
+		unsigned int flags = 0;
+
+		if (iova > ULONG_MAX)
+			return -EINVAL;
+
+		/* VFIO historically tries to auto-detect a kthread */
+		if (!current->mm)
+			flags |= IOMMUFD_ACCESS_RW_KTHREAD;
+		if (write)
+			flags |= IOMMUFD_ACCESS_RW_WRITE;
+		return iommufd_access_rw(device->iommufd_access, iova, data,
+					 len, flags);
+	}
+	return -EINVAL;
+}
+EXPORT_SYMBOL(vfio_dma_rw);
+
 /*
  * Module/class support
  */
diff --git a/include/linux/vfio.h b/include/linux/vfio.h
index a7fc4d747dc226..d5f84f98c0fa8f 100644
--- a/include/linux/vfio.h
+++ b/include/linux/vfio.h
@@ -19,6 +19,7 @@
 struct kvm;
 struct iommufd_ctx;
 struct iommufd_device;
+struct iommufd_access;
 
 /*
  * VFIO devices can be placed in a set, this allows all devices to share this
@@ -56,8 +57,10 @@ struct vfio_device {
 	struct completion comp;
 	struct list_head group_next;
 	struct list_head iommu_entry;
+	struct iommufd_access *iommufd_access;
 #if IS_ENABLED(CONFIG_IOMMUFD)
 	struct iommufd_device *iommufd_device;
+	struct iommufd_ctx *iommufd_ictx;
 	bool iommufd_attached;
 #endif
 };
@@ -111,6 +114,10 @@ int vfio_iommufd_physical_bind(struct vfio_device *vdev,
 			       struct iommufd_ctx *ictx, u32 *out_device_id);
 void vfio_iommufd_physical_unbind(struct vfio_device *vdev);
 int vfio_iommufd_physical_attach_ioas(struct vfio_device *vdev, u32 *pt_id);
+int vfio_iommufd_emulated_bind(struct vfio_device *vdev,
+			       struct iommufd_ctx *ictx, u32 *out_device_id);
+void vfio_iommufd_emulated_unbind(struct vfio_device *vdev);
+int vfio_iommufd_emulated_attach_ioas(struct vfio_device *vdev, u32 *pt_id);
 #else
 #define vfio_iommufd_physical_bind                                      \
 	((int (*)(struct vfio_device *vdev, struct iommufd_ctx *ictx,   \
@@ -119,6 +126,13 @@ int vfio_iommufd_physical_attach_ioas(struct vfio_device *vdev, u32 *pt_id);
 	((void (*)(struct vfio_device *vdev)) NULL)
 #define vfio_iommufd_physical_attach_ioas \
 	((int (*)(struct vfio_device *vdev, u32 *pt_id)) NULL)
+#define vfio_iommufd_emulated_bind                                      \
+	((int (*)(struct vfio_device *vdev, struct iommufd_ctx *ictx,   \
+		  u32 *out_device_id)) NULL)
+#define vfio_iommufd_emulated_unbind \
+	((void (*)(struct vfio_device *vdev)) NULL)
+#define vfio_iommufd_emulated_attach_ioas \
+	((int (*)(struct vfio_device *vdev, u32 *pt_id)) NULL)
 #endif
 
 /**
-- 
2.38.1


^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [PATCH v2 09/11] vfio: Move container related MODULE_ALIAS statements into container.c
  2022-11-08  0:52 [PATCH v2 00/11] Connect VFIO to IOMMUFD Jason Gunthorpe
                   ` (7 preceding siblings ...)
  2022-11-08  0:52 ` [PATCH v2 08/11] vfio-iommufd: Support iommufd for emulated " Jason Gunthorpe
@ 2022-11-08  0:52 ` Jason Gunthorpe
  2022-11-10  5:34   ` Tian, Kevin
  2022-11-11  4:13   ` Yi Liu
  2022-11-08  0:52 ` [PATCH v2 10/11] vfio: Make vfio_container optionally compiled Jason Gunthorpe
                   ` (4 subsequent siblings)
  13 siblings, 2 replies; 56+ messages in thread
From: Jason Gunthorpe @ 2022-11-08  0:52 UTC (permalink / raw)
  To: Alexander Gordeev, David Airlie, Tony Krowiak, Alex Williamson,
	Christian Borntraeger, Cornelia Huck, Daniel Vetter,
	Diana Craciun, dri-devel, Eric Auger, Eric Farman,
	Harald Freudenberger, Vasily Gorbik, Heiko Carstens, intel-gfx,
	intel-gvt-dev, iommu, Jani Nikula, Jason Herne, Joonas Lahtinen,
	Joerg Roedel, Kevin Tian, kvm, linux-s390, Longfang Liu,
	Matthew Rosato, Peter Oberparleiter, Halil Pasic, Robin Murphy,
	Rodrigo Vivi, Shameer Kolothum, Sven Schnelle, Tvrtko Ursulin,
	Vineeth Vijayan, Will Deacon, Yishai Hadas, Zhenyu Wang,
	Zhi Wang
  Cc: Liu, Yi L, Nicolin Chen, Lu Baolu

The miscdev is in container.c, so should these related MODULE_ALIAS
statements. This is necessary for the next patch to be able to fully
disable /dev/vfio/vfio.

Fixes: cdc71fe4ecbf ("vfio: Move container code into drivers/vfio/container.c")
Reported-by: "Liu, Yi L" <yi.l.liu@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
 drivers/vfio/container.c | 3 +++
 drivers/vfio/vfio_main.c | 2 --
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/vfio/container.c b/drivers/vfio/container.c
index 7f3961fd4b5aac..6b362d97d68220 100644
--- a/drivers/vfio/container.c
+++ b/drivers/vfio/container.c
@@ -608,3 +608,6 @@ void vfio_container_cleanup(void)
 	misc_deregister(&vfio_dev);
 	mutex_destroy(&vfio.iommu_drivers_lock);
 }
+
+MODULE_ALIAS_MISCDEV(VFIO_MINOR);
+MODULE_ALIAS("devname:vfio/vfio");
diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
index cf49c5200a4c05..ee09ccf4a608e1 100644
--- a/drivers/vfio/vfio_main.c
+++ b/drivers/vfio/vfio_main.c
@@ -2064,6 +2064,4 @@ MODULE_VERSION(DRIVER_VERSION);
 MODULE_LICENSE("GPL v2");
 MODULE_AUTHOR(DRIVER_AUTHOR);
 MODULE_DESCRIPTION(DRIVER_DESC);
-MODULE_ALIAS_MISCDEV(VFIO_MINOR);
-MODULE_ALIAS("devname:vfio/vfio");
 MODULE_SOFTDEP("post: vfio_iommu_type1 vfio_iommu_spapr_tce");
-- 
2.38.1


^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [PATCH v2 10/11] vfio: Make vfio_container optionally compiled
  2022-11-08  0:52 [PATCH v2 00/11] Connect VFIO to IOMMUFD Jason Gunthorpe
                   ` (8 preceding siblings ...)
  2022-11-08  0:52 ` [PATCH v2 09/11] vfio: Move container related MODULE_ALIAS statements into container.c Jason Gunthorpe
@ 2022-11-08  0:52 ` Jason Gunthorpe
  2022-11-08 22:28   ` Alex Williamson
  2022-11-08  0:52 ` [PATCH v2 11/11] iommufd: Allow iommufd to supply /dev/vfio/vfio Jason Gunthorpe
                   ` (3 subsequent siblings)
  13 siblings, 1 reply; 56+ messages in thread
From: Jason Gunthorpe @ 2022-11-08  0:52 UTC (permalink / raw)
  To: Alexander Gordeev, David Airlie, Tony Krowiak, Alex Williamson,
	Christian Borntraeger, Cornelia Huck, Daniel Vetter,
	Diana Craciun, dri-devel, Eric Auger, Eric Farman,
	Harald Freudenberger, Vasily Gorbik, Heiko Carstens, intel-gfx,
	intel-gvt-dev, iommu, Jani Nikula, Jason Herne, Joonas Lahtinen,
	Joerg Roedel, Kevin Tian, kvm, linux-s390, Longfang Liu,
	Matthew Rosato, Peter Oberparleiter, Halil Pasic, Robin Murphy,
	Rodrigo Vivi, Shameer Kolothum, Sven Schnelle, Tvrtko Ursulin,
	Vineeth Vijayan, Will Deacon, Yishai Hadas, Zhenyu Wang,
	Zhi Wang
  Cc: Liu, Yi L, Nicolin Chen, Lu Baolu

Add a kconfig CONFIG_VFIO_CONTAINER that controls compiling the container
code. If 'n' then only iommufd will provide the container service. All the
support for vfio iommu drivers, including type1, will not be built.

This allows a compilation check that no inappropriate dependencies between
the device/group and container have been created.

Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
 drivers/vfio/Kconfig  | 35 +++++++++++++++--------
 drivers/vfio/Makefile |  4 +--
 drivers/vfio/vfio.h   | 65 +++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 91 insertions(+), 13 deletions(-)

diff --git a/drivers/vfio/Kconfig b/drivers/vfio/Kconfig
index 1118d322eec97d..286c1663bd7564 100644
--- a/drivers/vfio/Kconfig
+++ b/drivers/vfio/Kconfig
@@ -3,8 +3,8 @@ menuconfig VFIO
 	tristate "VFIO Non-Privileged userspace driver framework"
 	select IOMMU_API
 	depends on IOMMUFD || !IOMMUFD
-	select VFIO_IOMMU_TYPE1 if MMU && (X86 || S390 || ARM || ARM64)
 	select INTERVAL_TREE
+	select VFIO_CONTAINER if IOMMUFD=n
 	help
 	  VFIO provides a framework for secure userspace device drivers.
 	  See Documentation/driver-api/vfio.rst for more details.
@@ -12,6 +12,18 @@ menuconfig VFIO
 	  If you don't know what to do here, say N.
 
 if VFIO
+config VFIO_CONTAINER
+	bool "Support for the VFIO container /dev/vfio/vfio"
+	select VFIO_IOMMU_TYPE1 if MMU && (X86 || S390 || ARM || ARM64)
+	default y
+	help
+	  The VFIO container is the classic interface to VFIO for establishing
+	  IOMMU mappings. If N is selected here then IOMMUFD must be used to
+	  manage the mappings.
+
+	  Unless testing IOMMUFD say Y here.
+
+if VFIO_CONTAINER
 config VFIO_IOMMU_TYPE1
 	tristate
 	default n
@@ -21,16 +33,6 @@ config VFIO_IOMMU_SPAPR_TCE
 	depends on SPAPR_TCE_IOMMU
 	default VFIO
 
-config VFIO_SPAPR_EEH
-	tristate
-	depends on EEH && VFIO_IOMMU_SPAPR_TCE
-	default VFIO
-
-config VFIO_VIRQFD
-	tristate
-	select EVENTFD
-	default n
-
 config VFIO_NOIOMMU
 	bool "VFIO No-IOMMU support"
 	help
@@ -44,6 +46,17 @@ config VFIO_NOIOMMU
 	  this mode since there is no IOMMU to provide DMA translation.
 
 	  If you don't know what to do here, say N.
+endif
+
+config VFIO_SPAPR_EEH
+	tristate
+	depends on EEH && VFIO_IOMMU_SPAPR_TCE
+	default VFIO
+
+config VFIO_VIRQFD
+	tristate
+	select EVENTFD
+	default n
 
 source "drivers/vfio/pci/Kconfig"
 source "drivers/vfio/platform/Kconfig"
diff --git a/drivers/vfio/Makefile b/drivers/vfio/Makefile
index 3863922529ef20..b953517dc70f99 100644
--- a/drivers/vfio/Makefile
+++ b/drivers/vfio/Makefile
@@ -4,9 +4,9 @@ vfio_virqfd-y := virqfd.o
 obj-$(CONFIG_VFIO) += vfio.o
 
 vfio-y += vfio_main.o \
-	  iova_bitmap.o \
-	  container.o
+	  iova_bitmap.o
 vfio-$(CONFIG_IOMMUFD) += iommufd.o
+vfio-$(CONFIG_VFIO_CONTAINER) += container.o
 
 obj-$(CONFIG_VFIO_VIRQFD) += vfio_virqfd.o
 obj-$(CONFIG_VFIO_IOMMU_TYPE1) += vfio_iommu_type1.o
diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
index d57a08afb5cf5c..3378714a746274 100644
--- a/drivers/vfio/vfio.h
+++ b/drivers/vfio/vfio.h
@@ -55,7 +55,9 @@ struct vfio_group {
 	struct list_head		device_list;
 	struct mutex			device_lock;
 	struct list_head		vfio_next;
+#if IS_ENABLED(CONFIG_VFIO_CONTAINER)
 	struct list_head		container_next;
+#endif
 	enum vfio_group_type		type;
 	struct mutex			group_lock;
 	struct kvm			*kvm;
@@ -64,6 +66,7 @@ struct vfio_group {
 	struct iommufd_ctx		*iommufd;
 };
 
+#if IS_ENABLED(CONFIG_VFIO_CONTAINER)
 /* events for the backend driver notify callback */
 enum vfio_iommu_notify_type {
 	VFIO_IOMMU_CONTAINER_CLOSE = 0,
@@ -129,6 +132,68 @@ int vfio_container_dma_rw(struct vfio_container *container, dma_addr_t iova,
 
 int __init vfio_container_init(void);
 void vfio_container_cleanup(void);
+#else
+static inline struct vfio_container *
+vfio_container_from_file(struct file *filep)
+{
+	return NULL;
+}
+
+static inline int vfio_group_use_container(struct vfio_group *group)
+{
+	return -EOPNOTSUPP;
+}
+
+static inline void vfio_group_unuse_container(struct vfio_group *group)
+{
+}
+
+static inline int vfio_container_attach_group(struct vfio_container *container,
+					      struct vfio_group *group)
+{
+	return -EOPNOTSUPP;
+}
+
+static inline void vfio_group_detach_container(struct vfio_group *group)
+{
+}
+
+static inline void vfio_device_container_register(struct vfio_device *device)
+{
+}
+
+static inline void vfio_device_container_unregister(struct vfio_device *device)
+{
+}
+
+static inline int vfio_container_pin_pages(struct vfio_container *container,
+					   struct iommu_group *iommu_group,
+					   dma_addr_t iova, int npage, int prot,
+					   struct page **pages)
+{
+	return -EOPNOTSUPP;
+}
+
+static inline void vfio_container_unpin_pages(struct vfio_container *container,
+					      dma_addr_t iova, int npage)
+{
+}
+
+static inline int vfio_container_dma_rw(struct vfio_container *container,
+					dma_addr_t iova, void *data, size_t len,
+					bool write)
+{
+	return -EOPNOTSUPP;
+}
+
+static inline int vfio_container_init(void)
+{
+	return 0;
+}
+static inline void vfio_container_cleanup(void)
+{
+}
+#endif
 
 #if IS_ENABLED(CONFIG_IOMMUFD)
 int vfio_iommufd_bind(struct vfio_device *device, struct iommufd_ctx *ictx);
-- 
2.38.1


^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [PATCH v2 11/11] iommufd: Allow iommufd to supply /dev/vfio/vfio
  2022-11-08  0:52 [PATCH v2 00/11] Connect VFIO to IOMMUFD Jason Gunthorpe
                   ` (9 preceding siblings ...)
  2022-11-08  0:52 ` [PATCH v2 10/11] vfio: Make vfio_container optionally compiled Jason Gunthorpe
@ 2022-11-08  0:52 ` Jason Gunthorpe
  2022-11-10  7:01   ` Tian, Kevin
  2022-11-11  4:16   ` Yi Liu
  2022-11-08  9:19 ` [PATCH v2 00/11] Connect VFIO to IOMMUFD Nicolin Chen
                   ` (2 subsequent siblings)
  13 siblings, 2 replies; 56+ messages in thread
From: Jason Gunthorpe @ 2022-11-08  0:52 UTC (permalink / raw)
  To: Alexander Gordeev, David Airlie, Tony Krowiak, Alex Williamson,
	Christian Borntraeger, Cornelia Huck, Daniel Vetter,
	Diana Craciun, dri-devel, Eric Auger, Eric Farman,
	Harald Freudenberger, Vasily Gorbik, Heiko Carstens, intel-gfx,
	intel-gvt-dev, iommu, Jani Nikula, Jason Herne, Joonas Lahtinen,
	Joerg Roedel, Kevin Tian, kvm, linux-s390, Longfang Liu,
	Matthew Rosato, Peter Oberparleiter, Halil Pasic, Robin Murphy,
	Rodrigo Vivi, Shameer Kolothum, Sven Schnelle, Tvrtko Ursulin,
	Vineeth Vijayan, Will Deacon, Yishai Hadas, Zhenyu Wang,
	Zhi Wang
  Cc: Liu, Yi L, Nicolin Chen, Lu Baolu

If the VFIO container is compiled out, give a kconfig option for iommufd
to provide the miscdev node with the same name and permissions as vfio
uses.

The compatibility node supports the same ioctls as VFIO and automatically
enables the VFIO compatible pinned page accounting mode.

Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
 drivers/iommu/iommufd/Kconfig | 12 ++++++++++++
 drivers/iommu/iommufd/main.c  | 36 +++++++++++++++++++++++++++++++++++
 2 files changed, 48 insertions(+)

diff --git a/drivers/iommu/iommufd/Kconfig b/drivers/iommu/iommufd/Kconfig
index 399a2edeaef6de..f387f803dc6f7f 100644
--- a/drivers/iommu/iommufd/Kconfig
+++ b/drivers/iommu/iommufd/Kconfig
@@ -12,6 +12,18 @@ config IOMMUFD
 	  If you don't know what to do here, say N.
 
 if IOMMUFD
+config IOMMUFD_VFIO_CONTAINER
+	bool "IOMMUFD provides the VFIO container /dev/vfio/vfio"
+	depends on VFIO && !VFIO_CONTAINER
+	default VFIO && !VFIO_CONTAINER
+	help
+	  IOMMUFD will provide /dev/vfio/vfio instead of VFIO. This relies on
+	  IOMMUFD providing compatibility emulation to give the same ioctls.
+	  It provides an option to build a kernel with legacy VFIO components
+	  removed.
+
+	  Unless testing IOMMUFD say N here.
+
 config IOMMUFD_TEST
 	bool "IOMMU Userspace API Test support"
 	depends on RUNTIME_TESTING_MENU
diff --git a/drivers/iommu/iommufd/main.c b/drivers/iommu/iommufd/main.c
index ab3fa05f38505d..1eeb326f74f005 100644
--- a/drivers/iommu/iommufd/main.c
+++ b/drivers/iommu/iommufd/main.c
@@ -18,6 +18,7 @@
 #include <uapi/linux/iommufd.h>
 #include <linux/iommufd.h>
 
+#include "io_pagetable.h"
 #include "iommufd_private.h"
 #include "iommufd_test.h"
 
@@ -25,6 +26,7 @@ struct iommufd_object_ops {
 	void (*destroy)(struct iommufd_object *obj);
 };
 static const struct iommufd_object_ops iommufd_object_ops[];
+static struct miscdevice vfio_misc_dev;
 
 struct iommufd_object *_iommufd_object_alloc(struct iommufd_ctx *ictx,
 					     size_t size,
@@ -170,6 +172,16 @@ static int iommufd_fops_open(struct inode *inode, struct file *filp)
 	if (!ictx)
 		return -ENOMEM;
 
+	/*
+	 * For compatibility with VFIO when /dev/vfio/vfio is opened we default
+	 * to the same rlimit accounting as vfio uses.
+	 */
+	if (IS_ENABLED(CONFIG_IOMMUFD_VFIO_CONTAINER) &&
+	    filp->private_data == &vfio_misc_dev) {
+		ictx->account_mode = IOPT_PAGES_ACCOUNT_MM;
+		pr_info_once("IOMMUFD is providing /dev/vfio/vfio, not VFIO.\n");
+	}
+
 	xa_init_flags(&ictx->objects, XA_FLAGS_ALLOC1 | XA_FLAGS_ACCOUNT);
 	ictx->file = filp;
 	filp->private_data = ictx;
@@ -395,6 +407,15 @@ static struct miscdevice iommu_misc_dev = {
 	.mode = 0660,
 };
 
+
+static struct miscdevice vfio_misc_dev = {
+	.minor = VFIO_MINOR,
+	.name = "vfio",
+	.fops = &iommufd_fops,
+	.nodename = "vfio/vfio",
+	.mode = 0666,
+};
+
 static int __init iommufd_init(void)
 {
 	int ret;
@@ -402,18 +423,33 @@ static int __init iommufd_init(void)
 	ret = misc_register(&iommu_misc_dev);
 	if (ret)
 		return ret;
+
+	if (IS_ENABLED(CONFIG_IOMMUFD_VFIO_CONTAINER)) {
+		ret = misc_register(&vfio_misc_dev);
+		if (ret)
+			goto err_misc;
+	}
 	iommufd_test_init();
 	return 0;
+err_misc:
+	misc_deregister(&iommu_misc_dev);
+	return ret;
 }
 
 static void __exit iommufd_exit(void)
 {
 	iommufd_test_exit();
+	if (IS_ENABLED(CONFIG_IOMMUFD_VFIO_CONTAINER))
+		misc_deregister(&vfio_misc_dev);
 	misc_deregister(&iommu_misc_dev);
 }
 
 module_init(iommufd_init);
 module_exit(iommufd_exit);
 
+#if IS_ENABLED(CONFIG_IOMMUFD_VFIO_CONTAINER)
+MODULE_ALIAS_MISCDEV(VFIO_MINOR);
+MODULE_ALIAS("devname:vfio/vfio");
+#endif
 MODULE_DESCRIPTION("I/O Address Space Management for passthrough devices");
 MODULE_LICENSE("GPL");
-- 
2.38.1


^ permalink raw reply related	[flat|nested] 56+ messages in thread

* Re: [PATCH v2 07/11] vfio-iommufd: Support iommufd for physical VFIO devices
  2022-11-08  0:52 ` [PATCH v2 07/11] vfio-iommufd: Support iommufd for physical VFIO devices Jason Gunthorpe
@ 2022-11-08  6:10   ` Nicolin Chen
  2022-11-08  7:41     ` Yi Liu
  2022-11-08 17:48     ` Jason Gunthorpe
  2022-11-10  3:11   ` Tian, Kevin
  2022-11-11  4:12   ` Yi Liu
  2 siblings, 2 replies; 56+ messages in thread
From: Nicolin Chen @ 2022-11-08  6:10 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: kvm, Kevin Tian, dri-devel, Vineeth Vijayan, Diana Craciun,
	Alexander Gordeev, Longfang Liu, linux-s390, Liu, Yi L,
	Matthew Rosato, Will Deacon, Joerg Roedel, Halil Pasic, iommu,
	Christian Borntraeger, intel-gfx, Zhi Wang, Jason Herne,
	Eric Farman, Vasily Gorbik, Heiko Carstens, Eric Auger,
	Alex Williamson, Harald Freudenberger, Rodrigo Vivi,
	intel-gvt-dev, Tony Krowiak, Tvrtko Ursulin, Yishai Hadas,
	Cornelia Huck, Peter Oberparleiter, Shameer Kolothum,
	Sven Schnelle, Robin Murphy, Lu Baolu

On Mon, Nov 07, 2022 at 08:52:51PM -0400, Jason Gunthorpe wrote:

> @@ -795,6 +800,10 @@ static int vfio_device_first_open(struct vfio_device *device)
>  		ret = vfio_group_use_container(device->group);
>  		if (ret)
>  			goto err_module_put;
> +	} else if (device->group->iommufd) {
> +		ret = vfio_iommufd_bind(device, device->group->iommufd);

Here we check device->group->iommufd...

> +		if (ret)
> +			goto err_module_put;
>  	}
>  
>  	device->kvm = device->group->kvm;
> @@ -812,6 +821,7 @@ static int vfio_device_first_open(struct vfio_device *device)
>  	device->kvm = NULL;
>  	if (device->group->container)
>  		vfio_group_unuse_container(device->group);
> +	vfio_iommufd_unbind(device);

...yet, missing here, which could result in kernel oops.

Should probably add something similar:
+	if (device->group->iommufd)
+		vfio_iommufd_unbind(device);

Or should check !vdev->iommufd_device inside the ->unbind.

>  err_module_put:
>  	mutex_unlock(&device->group->group_lock);
>  	module_put(device->dev->driver->owner);
> @@ -830,6 +840,7 @@ static void vfio_device_last_close(struct vfio_device *device)
>  	device->kvm = NULL;
>  	if (device->group->container)
>  		vfio_group_unuse_container(device->group);
> +	vfio_iommufd_unbind(device);

Ditto

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v2 07/11] vfio-iommufd: Support iommufd for physical VFIO devices
  2022-11-08  6:10   ` Nicolin Chen
@ 2022-11-08  7:41     ` Yi Liu
  2022-11-08 17:51       ` Jason Gunthorpe
  2022-11-08 17:48     ` Jason Gunthorpe
  1 sibling, 1 reply; 56+ messages in thread
From: Yi Liu @ 2022-11-08  7:41 UTC (permalink / raw)
  To: Nicolin Chen, Jason Gunthorpe
  Cc: kvm, Kevin Tian, dri-devel, Vineeth Vijayan, Diana Craciun,
	Alexander Gordeev, Longfang Liu, linux-s390, Matthew Rosato,
	Will Deacon, Joerg Roedel, Halil Pasic, iommu,
	Christian Borntraeger, intel-gfx, Zhi Wang, Jason Herne,
	Eric Farman, Vasily Gorbik, Heiko Carstens, Eric Auger,
	Alex Williamson, Harald Freudenberger, Rodrigo Vivi,
	intel-gvt-dev, Tony Krowiak, Tvrtko Ursulin, Yishai Hadas,
	Cornelia Huck, Peter Oberparleiter, Shameer Kolothum,
	Sven Schnelle, Robin Murphy, Lu Baolu

On 2022/11/8 14:10, Nicolin Chen wrote:
> On Mon, Nov 07, 2022 at 08:52:51PM -0400, Jason Gunthorpe wrote:
> 
>> @@ -795,6 +800,10 @@ static int vfio_device_first_open(struct vfio_device *device)
>>   		ret = vfio_group_use_container(device->group);
>>   		if (ret)
>>   			goto err_module_put;
>> +	} else if (device->group->iommufd) {
>> +		ret = vfio_iommufd_bind(device, device->group->iommufd);
> 
> Here we check device->group->iommufd...
> 
>> +		if (ret)
>> +			goto err_module_put;
>>   	}
>>   
>>   	device->kvm = device->group->kvm;
>> @@ -812,6 +821,7 @@ static int vfio_device_first_open(struct vfio_device *device)
>>   	device->kvm = NULL;
>>   	if (device->group->container)
>>   		vfio_group_unuse_container(device->group);
>> +	vfio_iommufd_unbind(device);
> 
> ...yet, missing here, which could result in kernel oops.
> 
> Should probably add something similar:
> +	if (device->group->iommufd)
> +		vfio_iommufd_unbind(device);
> 
> Or should check !vdev->iommufd_device inside the ->unbind.

this check was in prior version, but removed in this version. any
special reason? Jason?

> 
>>   err_module_put:
>>   	mutex_unlock(&device->group->group_lock);
>>   	module_put(device->dev->driver->owner);
>> @@ -830,6 +840,7 @@ static void vfio_device_last_close(struct vfio_device *device)
>>   	device->kvm = NULL;
>>   	if (device->group->container)
>>   		vfio_group_unuse_container(device->group);
>> +	vfio_iommufd_unbind(device);
> 
> Ditto

-- 
Regards,
Yi Liu

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v2 00/11] Connect VFIO to IOMMUFD
  2022-11-08  0:52 [PATCH v2 00/11] Connect VFIO to IOMMUFD Jason Gunthorpe
                   ` (10 preceding siblings ...)
  2022-11-08  0:52 ` [PATCH v2 11/11] iommufd: Allow iommufd to supply /dev/vfio/vfio Jason Gunthorpe
@ 2022-11-08  9:19 ` Nicolin Chen
  2022-11-08 15:18   ` Yi Liu
  2022-11-09  9:03 ` Tian, Kevin
  2022-11-11  3:01 ` Matthew Rosato
  13 siblings, 1 reply; 56+ messages in thread
From: Nicolin Chen @ 2022-11-08  9:19 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: kvm, Kevin Tian, dri-devel, Vineeth Vijayan, Diana Craciun,
	Alexander Gordeev, Longfang Liu, linux-s390, Liu, Yi L,
	Matthew Rosato, Will Deacon, Joerg Roedel, Halil Pasic, iommu,
	Christian Borntraeger, intel-gfx, Zhi Wang, Jason Herne,
	Eric Farman, Vasily Gorbik, Heiko Carstens, Eric Auger,
	Alex Williamson, Harald Freudenberger, Rodrigo Vivi,
	intel-gvt-dev, Tony Krowiak, Tvrtko Ursulin, Yishai Hadas,
	Cornelia Huck, Peter Oberparleiter, Shameer Kolothum,
	Sven Schnelle, Robin Murphy, Lu Baolu

On Mon, Nov 07, 2022 at 08:52:44PM -0400, Jason Gunthorpe wrote:

> This is on github: https://github.com/jgunthorpe/linux/commits/vfio_iommufd
[...]
> v2:
>  - Rebase to v6.1-rc3, v4 iommufd series
>  - Fixup comments and commit messages from list remarks
>  - Fix leaking of the iommufd for mdevs
>  - New patch to fix vfio modaliases when vfio container is disabled
>  - Add a dmesg once when the iommufd provided /dev/vfio/vfio is opened
>    to signal that iommufd is providing this

I've redone my previous sanity tests. Except those reported bugs,
things look fine. Once we fix those issues, GVT and other modules
can run some more stressful tests, I think.

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v2 00/11] Connect VFIO to IOMMUFD
  2022-11-08  9:19 ` [PATCH v2 00/11] Connect VFIO to IOMMUFD Nicolin Chen
@ 2022-11-08 15:18   ` Yi Liu
  2022-11-09 16:57     ` Jason Gunthorpe
  0 siblings, 1 reply; 56+ messages in thread
From: Yi Liu @ 2022-11-08 15:18 UTC (permalink / raw)
  To: Nicolin Chen, Jason Gunthorpe
  Cc: kvm, Kevin Tian, dri-devel, Vineeth Vijayan, Diana Craciun,
	Alexander Gordeev, Longfang Liu, linux-s390, Matthew Rosato,
	Will Deacon, Joerg Roedel, Halil Pasic, iommu,
	Christian Borntraeger, intel-gfx, Zhi Wang, Jason Herne,
	Eric Farman, Vasily Gorbik, Heiko Carstens, Eric Auger,
	Alex Williamson, Harald Freudenberger, Rodrigo Vivi,
	intel-gvt-dev, Tony Krowiak, Tvrtko Ursulin, Yishai Hadas,
	Cornelia Huck, Peter Oberparleiter, Shameer Kolothum,
	Sven Schnelle, Robin Murphy, Lu Baolu

On 2022/11/8 17:19, Nicolin Chen wrote:
> On Mon, Nov 07, 2022 at 08:52:44PM -0400, Jason Gunthorpe wrote:
> 
>> This is on github: https://github.com/jgunthorpe/linux/commits/vfio_iommufd
> [...]
>> v2:
>>   - Rebase to v6.1-rc3, v4 iommufd series
>>   - Fixup comments and commit messages from list remarks
>>   - Fix leaking of the iommufd for mdevs
>>   - New patch to fix vfio modaliases when vfio container is disabled
>>   - Add a dmesg once when the iommufd provided /dev/vfio/vfio is opened
>>     to signal that iommufd is providing this
> 
> I've redone my previous sanity tests. Except those reported bugs,
> things look fine. Once we fix those issues, GVT and other modules
> can run some more stressful tests, I think.

our side is also starting test (gvt, nic passthrough) this version. need to
wait a while for the result.

-- 
Regards,
Yi Liu

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v2 07/11] vfio-iommufd: Support iommufd for physical VFIO devices
  2022-11-08  6:10   ` Nicolin Chen
  2022-11-08  7:41     ` Yi Liu
@ 2022-11-08 17:48     ` Jason Gunthorpe
  1 sibling, 0 replies; 56+ messages in thread
From: Jason Gunthorpe @ 2022-11-08 17:48 UTC (permalink / raw)
  To: Nicolin Chen
  Cc: kvm, Kevin Tian, dri-devel, Vineeth Vijayan, Diana Craciun,
	Alexander Gordeev, Longfang Liu, linux-s390, Liu, Yi L,
	Matthew Rosato, Will Deacon, Joerg Roedel, Halil Pasic, iommu,
	Christian Borntraeger, intel-gfx, Zhi Wang, Jason Herne,
	Eric Farman, Vasily Gorbik, Heiko Carstens, Eric Auger,
	Alex Williamson, Harald Freudenberger, Rodrigo Vivi,
	intel-gvt-dev, Tony Krowiak, Tvrtko Ursulin, Yishai Hadas,
	Cornelia Huck, Peter Oberparleiter, Shameer Kolothum,
	Sven Schnelle, Robin Murphy, Lu Baolu

On Mon, Nov 07, 2022 at 10:10:59PM -0800, Nicolin Chen wrote:

> > @@ -812,6 +821,7 @@ static int vfio_device_first_open(struct vfio_device *device)
> >  	device->kvm = NULL;
> >  	if (device->group->container)
> >  		vfio_group_unuse_container(device->group);
> > +	vfio_iommufd_unbind(device);
> 
> ...yet, missing here, which could result in kernel oops.
> 
> Should probably add something similar:
> +	if (device->group->iommufd)
> +		vfio_iommufd_unbind(device);
> 
> Or should check !vdev->iommufd_device inside the ->unbind.

Lets keep it symmetric since the container is checked:

@@ -821,7 +821,8 @@ static int vfio_device_first_open(struct vfio_device *device)
        device->kvm = NULL;
        if (device->group->container)
                vfio_group_unuse_container(device->group);
-       vfio_iommufd_unbind(device);
+       else if (device->group->iommufd)
+               vfio_iommufd_unbind(device);
 err_module_put:
        mutex_unlock(&device->group->group_lock);
        module_put(device->dev->driver->owner);
@@ -840,7 +841,8 @@ static void vfio_device_last_close(struct vfio_device *device)
        device->kvm = NULL;
        if (device->group->container)
                vfio_group_unuse_container(device->group);
-       vfio_iommufd_unbind(device);
+       else if (device->group->iommufd)
+               vfio_iommufd_unbind(device);
        mutex_unlock(&device->group->group_lock);
        module_put(device->dev->driver->owner);

Thanks,
Jason

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v2 07/11] vfio-iommufd: Support iommufd for physical VFIO devices
  2022-11-08  7:41     ` Yi Liu
@ 2022-11-08 17:51       ` Jason Gunthorpe
  2022-11-10  3:12         ` Tian, Kevin
  0 siblings, 1 reply; 56+ messages in thread
From: Jason Gunthorpe @ 2022-11-08 17:51 UTC (permalink / raw)
  To: Yi Liu
  Cc: kvm, Kevin Tian, dri-devel, Vineeth Vijayan, Diana Craciun,
	Alexander Gordeev, Longfang Liu, linux-s390, Matthew Rosato,
	Will Deacon, Joerg Roedel, Halil Pasic, iommu, Nicolin Chen,
	Christian Borntraeger, intel-gfx, Zhi Wang, Jason Herne,
	Eric Farman, Vasily Gorbik, Heiko Carstens, Eric Auger,
	Alex Williamson, Harald Freudenberger, Rodrigo Vivi,
	intel-gvt-dev, Tony Krowiak, Tvrtko Ursulin, Yishai Hadas,
	Cornelia Huck, Peter Oberparleiter, Shameer Kolothum,
	Sven Schnelle, Robin Murphy, Lu Baolu

On Tue, Nov 08, 2022 at 03:41:25PM +0800, Yi Liu wrote:
> On 2022/11/8 14:10, Nicolin Chen wrote:
> > On Mon, Nov 07, 2022 at 08:52:51PM -0400, Jason Gunthorpe wrote:
> > 
> > > @@ -795,6 +800,10 @@ static int vfio_device_first_open(struct vfio_device *device)
> > >   		ret = vfio_group_use_container(device->group);
> > >   		if (ret)
> > >   			goto err_module_put;
> > > +	} else if (device->group->iommufd) {
> > > +		ret = vfio_iommufd_bind(device, device->group->iommufd);
> > 
> > Here we check device->group->iommufd...
> > 
> > > +		if (ret)
> > > +			goto err_module_put;
> > >   	}
> > >   	device->kvm = device->group->kvm;
> > > @@ -812,6 +821,7 @@ static int vfio_device_first_open(struct vfio_device *device)
> > >   	device->kvm = NULL;
> > >   	if (device->group->container)
> > >   		vfio_group_unuse_container(device->group);
> > > +	vfio_iommufd_unbind(device);
> > 
> > ...yet, missing here, which could result in kernel oops.
> > 
> > Should probably add something similar:
> > +	if (device->group->iommufd)
> > +		vfio_iommufd_unbind(device);
> > 
> > Or should check !vdev->iommufd_device inside the ->unbind.
> 
> this check was in prior version, but removed in this version. any
> special reason? Jason?

Oooh, this makes more sense - Kevin pointed out the check was wrong:

> > +void vfio_iommufd_unbind(struct vfio_device *vdev)
> > +{
> > +	lockdep_assert_held(&vdev->dev_set->lock);
> > +
> > +	if (!vdev->iommufd_device)
> > +		return;

> there is no iommufd_device in the emulated path...

And he is right, so I dropped it. But really the check was just
misspelled, it was supposed to be "device->group->iommufd" because the
caller assumed it.

Still, I think the right way to fix it is to lift the check as we
don't touch group->iommufd in iommufd.c

Thanks,
Jason

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v2 10/11] vfio: Make vfio_container optionally compiled
  2022-11-08  0:52 ` [PATCH v2 10/11] vfio: Make vfio_container optionally compiled Jason Gunthorpe
@ 2022-11-08 22:28   ` Alex Williamson
  2022-11-09  0:54     ` Jason Gunthorpe
  0 siblings, 1 reply; 56+ messages in thread
From: Alex Williamson @ 2022-11-08 22:28 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: kvm, Kevin Tian, dri-devel, Vineeth Vijayan, Diana Craciun,
	Alexander Gordeev, Longfang Liu, linux-s390, Liu, Yi L,
	Matthew Rosato, Will Deacon, Joerg Roedel, Halil Pasic, iommu,
	Nicolin Chen, Christian Borntraeger, intel-gfx, Zhi Wang,
	Jason Herne, Eric Farman, Vasily Gorbik, Heiko Carstens,
	Eric Auger, Harald Freudenberger, Rodrigo Vivi, intel-gvt-dev,
	Tony Krowiak, Tvrtko Ursulin, Yishai Hadas, Cornelia Huck,
	Peter Oberparleiter, Shameer Kolothum, Sven Schnelle,
	Robin Murphy, Lu Baolu

On Mon,  7 Nov 2022 20:52:54 -0400
Jason Gunthorpe <jgg@nvidia.com> wrote:

> Add a kconfig CONFIG_VFIO_CONTAINER that controls compiling the container
> code. If 'n' then only iommufd will provide the container service. All the
> support for vfio iommu drivers, including type1, will not be built.
> 
> This allows a compilation check that no inappropriate dependencies between
> the device/group and container have been created.
> 
> Tested-by: Nicolin Chen <nicolinc@nvidia.com>
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> ---
>  drivers/vfio/Kconfig  | 35 +++++++++++++++--------
>  drivers/vfio/Makefile |  4 +--
>  drivers/vfio/vfio.h   | 65 +++++++++++++++++++++++++++++++++++++++++++
>  3 files changed, 91 insertions(+), 13 deletions(-)
> 
> diff --git a/drivers/vfio/Kconfig b/drivers/vfio/Kconfig
> index 1118d322eec97d..286c1663bd7564 100644
> --- a/drivers/vfio/Kconfig
> +++ b/drivers/vfio/Kconfig
> @@ -3,8 +3,8 @@ menuconfig VFIO
>  	tristate "VFIO Non-Privileged userspace driver framework"
>  	select IOMMU_API
>  	depends on IOMMUFD || !IOMMUFD
> -	select VFIO_IOMMU_TYPE1 if MMU && (X86 || S390 || ARM || ARM64)
>  	select INTERVAL_TREE
> +	select VFIO_CONTAINER if IOMMUFD=n
>  	help
>  	  VFIO provides a framework for secure userspace device drivers.
>  	  See Documentation/driver-api/vfio.rst for more details.
> @@ -12,6 +12,18 @@ menuconfig VFIO
>  	  If you don't know what to do here, say N.
>  
>  if VFIO
> +config VFIO_CONTAINER
> +	bool "Support for the VFIO container /dev/vfio/vfio"
> +	select VFIO_IOMMU_TYPE1 if MMU && (X86 || S390 || ARM || ARM64)
> +	default y
> +	help
> +	  The VFIO container is the classic interface to VFIO for establishing
> +	  IOMMU mappings. If N is selected here then IOMMUFD must be used to
> +	  manage the mappings.
> +
> +	  Unless testing IOMMUFD say Y here.
> +
> +if VFIO_CONTAINER
>  config VFIO_IOMMU_TYPE1
>  	tristate
>  	default n
> @@ -21,16 +33,6 @@ config VFIO_IOMMU_SPAPR_TCE
>  	depends on SPAPR_TCE_IOMMU
>  	default VFIO
>  
> -config VFIO_SPAPR_EEH
> -	tristate
> -	depends on EEH && VFIO_IOMMU_SPAPR_TCE
> -	default VFIO
> -
> -config VFIO_VIRQFD
> -	tristate
> -	select EVENTFD
> -	default n
> -
>  config VFIO_NOIOMMU
>  	bool "VFIO No-IOMMU support"
>  	help


Perhaps this should have been obvious, but I'm realizing that
vfio-noiommu mode is completely missing without VFIO_CONTAINER, which
seems a barrier to deprecating VFIO_CONTAINER and perhaps makes it a
question whether IOMMUFD should really be taking over /dev/vfio/vfio.
No-iommu mode has users.  Thanks,

Alex


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v2 10/11] vfio: Make vfio_container optionally compiled
  2022-11-08 22:28   ` Alex Williamson
@ 2022-11-09  0:54     ` Jason Gunthorpe
  2022-11-09 17:18       ` Alex Williamson
  0 siblings, 1 reply; 56+ messages in thread
From: Jason Gunthorpe @ 2022-11-09  0:54 UTC (permalink / raw)
  To: Alex Williamson
  Cc: kvm, Kevin Tian, dri-devel, Vineeth Vijayan, Diana Craciun,
	Alexander Gordeev, Longfang Liu, linux-s390, Liu, Yi L,
	Matthew Rosato, Will Deacon, Joerg Roedel, Halil Pasic, iommu,
	Nicolin Chen, Christian Borntraeger, intel-gfx, Zhi Wang,
	Jason Herne, Eric Farman, Vasily Gorbik, Heiko Carstens,
	Eric Auger, Harald Freudenberger, Rodrigo Vivi, intel-gvt-dev,
	Tony Krowiak, Tvrtko Ursulin, Yishai Hadas, Cornelia Huck,
	Peter Oberparleiter, Shameer Kolothum, Sven Schnelle,
	Robin Murphy, Lu Baolu

On Tue, Nov 08, 2022 at 03:28:31PM -0700, Alex Williamson wrote:

> Perhaps this should have been obvious, but I'm realizing that
> vfio-noiommu mode is completely missing without VFIO_CONTAINER, which
> seems a barrier to deprecating VFIO_CONTAINER and perhaps makes it a

Yes, it is the same as the allow_unsafe_interrupts - it is something
that currently goes missing if you turn off VFIO_CONTAINER.

This seems straightforward enough to resolve in a followup, we mostly
just need someone with an existing no-iommu application to test
compatability against. Keeping it working with the device cdev will
also be a bit interesting. If you have or know about some application
I can try to make a patch.

> question whether IOMMUFD should really be taking over /dev/vfio/vfio.
> No-iommu mode has users.  

I view VFIO_CONTAINER=n as a process. An aspiration we can work
toward.

At this point there are few places that might want to use it. Android
perhaps, for example. It is also useful for testing. One of the main
values is you can switch the options and feed the kernel into an
existing test environment and see what happens. This is how we are
able to quickly get s390 mdev testing, for instance.

We are not going to get to a widely useful VFIO_CONTAINER=n if we
don't have a target that people can test against and evaluate what
compatability gaps may exist.

So, everytime we find something like this - let's think about how can
we make iommufd compatibility handle it and not jump straight to
giving up :)

I'm kind of thinking v6.4 might be a reasonable kernel target when we
might have closed off enough things.

Jason

^ permalink raw reply	[flat|nested] 56+ messages in thread

* RE: [PATCH v2 00/11] Connect VFIO to IOMMUFD
  2022-11-08  0:52 [PATCH v2 00/11] Connect VFIO to IOMMUFD Jason Gunthorpe
                   ` (11 preceding siblings ...)
  2022-11-08  9:19 ` [PATCH v2 00/11] Connect VFIO to IOMMUFD Nicolin Chen
@ 2022-11-09  9:03 ` Tian, Kevin
  2022-11-09 12:48   ` Jason Gunthorpe
  2022-11-11  3:01 ` Matthew Rosato
  13 siblings, 1 reply; 56+ messages in thread
From: Tian, Kevin @ 2022-11-09  9:03 UTC (permalink / raw)
  To: Jason Gunthorpe, Alexander Gordeev, David Airlie, Tony Krowiak,
	Alex Williamson, Christian Borntraeger, Cornelia Huck,
	Daniel Vetter, Diana Craciun, dri-devel, Eric Auger, Eric Farman,
	Harald Freudenberger, Vasily Gorbik, Heiko Carstens, intel-gfx,
	intel-gvt-dev, iommu, Jani Nikula, Jason Herne, Joonas Lahtinen,
	Joerg Roedel, kvm, linux-s390, Longfang Liu, Matthew Rosato,
	Peter Oberparleiter, Halil Pasic, Robin Murphy, Vivi, Rodrigo,
	Shameer Kolothum, Sven Schnelle, Tvrtko Ursulin, Vineeth Vijayan,
	Will Deacon, Yishai Hadas, Zhenyu Wang, Wang, Zhi A
  Cc: Liu, Yi L, Nicolin Chen, Lu Baolu

every mail in this series is shown thrice in lore:

https://lore.kernel.org/all/0-v2-65016290f146+33e-vfio_iommufd_jgg@nvidia.com/

not sure what caused it but it's annoying to check the conversation there.

the iommufd series doesn't have this problem.

> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Tuesday, November 8, 2022 8:53 AM
> 
> This series provides an alternative container layer for VFIO implemented
> using iommufd. This is optional, if CONFIG_IOMMUFD is not set then it will
> not be compiled in.
> 
> At this point iommufd can be injected by passing in a iommfd FD to
> VFIO_GROUP_SET_CONTAINER which will use the VFIO compat layer in
> iommufd
> to obtain the compat IOAS and then connect up all the VFIO drivers as
> appropriate.
> 
> This is temporary stopping point, a following series will provide a way to
> directly open a VFIO device FD and directly connect it to IOMMUFD using
> native ioctls that can expose the IOMMUFD features like hwpt, future
> vPASID and dynamic attachment.
> 
> This series, in compat mode, has passed all the qemu tests we have
> available, including the test suites for the Intel GVT mdev. Aside from
> the temporary limitation with P2P memory this is belived to be fully
> compatible with VFIO.
> 
> This is on github:
> https://github.com/jgunthorpe/linux/commits/vfio_iommufd
> 
> It requires the iommufd series:
> 
> https://lore.kernel.org/r/0-v4-0de2f6c78ed0+9d1-iommufd_jgg@nvidia.com
> 
> v2:
>  - Rebase to v6.1-rc3, v4 iommufd series
>  - Fixup comments and commit messages from list remarks
>  - Fix leaking of the iommufd for mdevs
>  - New patch to fix vfio modaliases when vfio container is disabled
>  - Add a dmesg once when the iommufd provided /dev/vfio/vfio is opened
>    to signal that iommufd is providing this
> v1: https://lore.kernel.org/r/0-v1-4991695894d8+211-
> vfio_iommufd_jgg@nvidia.com
> 
> Jason Gunthorpe (11):
>   vfio: Move vfio_device driver open/close code to a function
>   vfio: Move vfio_device_assign_container() into
>     vfio_device_first_open()
>   vfio: Rename vfio_device_assign/unassign_container()
>   vfio: Move storage of allow_unsafe_interrupts to vfio_main.c
>   vfio: Use IOMMU_CAP_ENFORCE_CACHE_COHERENCY for
>     vfio_file_enforced_coherent()
>   vfio-iommufd: Allow iommufd to be used in place of a container fd
>   vfio-iommufd: Support iommufd for physical VFIO devices
>   vfio-iommufd: Support iommufd for emulated VFIO devices
>   vfio: Move container related MODULE_ALIAS statements into container.c
>   vfio: Make vfio_container optionally compiled
>   iommufd: Allow iommufd to supply /dev/vfio/vfio
> 
>  drivers/gpu/drm/i915/gvt/kvmgt.c              |   3 +
>  drivers/iommu/iommufd/Kconfig                 |  12 +
>  drivers/iommu/iommufd/main.c                  |  36 ++
>  drivers/s390/cio/vfio_ccw_ops.c               |   3 +
>  drivers/s390/crypto/vfio_ap_ops.c             |   3 +
>  drivers/vfio/Kconfig                          |  36 +-
>  drivers/vfio/Makefile                         |   5 +-
>  drivers/vfio/container.c                      | 141 ++------
>  drivers/vfio/fsl-mc/vfio_fsl_mc.c             |   3 +
>  drivers/vfio/iommufd.c                        | 157 ++++++++
>  .../vfio/pci/hisilicon/hisi_acc_vfio_pci.c    |   6 +
>  drivers/vfio/pci/mlx5/main.c                  |   3 +
>  drivers/vfio/pci/vfio_pci.c                   |   3 +
>  drivers/vfio/platform/vfio_amba.c             |   3 +
>  drivers/vfio/platform/vfio_platform.c         |   3 +
>  drivers/vfio/vfio.h                           | 100 +++++-
>  drivers/vfio/vfio_iommu_type1.c               |   5 +-
>  drivers/vfio/vfio_main.c                      | 338 ++++++++++++++----
>  include/linux/vfio.h                          |  39 ++
>  19 files changed, 700 insertions(+), 199 deletions(-)
>  create mode 100644 drivers/vfio/iommufd.c
> 
> 
> base-commit: ca3067007d4f2aa7f3a5375bd256839e08a09453
> --
> 2.38.1


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v2 00/11] Connect VFIO to IOMMUFD
  2022-11-09  9:03 ` Tian, Kevin
@ 2022-11-09 12:48   ` Jason Gunthorpe
  2022-11-10  2:16     ` Tian, Kevin
  0 siblings, 1 reply; 56+ messages in thread
From: Jason Gunthorpe @ 2022-11-09 12:48 UTC (permalink / raw)
  To: Tian, Kevin
  Cc: kvm, dri-devel, Vineeth Vijayan, Diana Craciun,
	Alexander Gordeev, Longfang Liu, linux-s390, Liu, Yi L,
	Matthew Rosato, Will Deacon, Joerg Roedel, Halil Pasic, iommu,
	Nicolin Chen, Christian Borntraeger, intel-gfx, Wang, Zhi A,
	Jason Herne, Eric Farman, Vasily Gorbik, Heiko Carstens,
	Eric Auger, Alex Williamson, Harald Freudenberger, Vivi, Rodrigo,
	intel-gvt-dev, Tony Krowiak, Tvrtko Ursulin, Yishai Hadas,
	Cornelia Huck, Peter Oberparleiter, Shameer Kolothum,
	Sven Schnelle, Robin Murphy, Lu Baolu

On Wed, Nov 09, 2022 at 09:03:52AM +0000, Tian, Kevin wrote:
> every mail in this series is shown thrice in lore:
> 
> https://lore.kernel.org/all/0-v2-65016290f146+33e-vfio_iommufd_jgg@nvidia.com/
> 
> not sure what caused it but it's annoying to check the conversation there.

It is sort of a lore issue, it only combines messages that are exactly
the same together. Several of the mailing lists on CC here mangle the
message in various ways, eg adding trailer or whatever. This causes
repeated messages in lore.

The trick in lore is to replace "/all/" with a good list, like /kvm/
or /linux-iommu/ that shows the original non-mangled version, and only
once.

Jason

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v2 00/11] Connect VFIO to IOMMUFD
  2022-11-08 15:18   ` Yi Liu
@ 2022-11-09 16:57     ` Jason Gunthorpe
  2022-11-14 12:51       ` Yi Liu
  2022-11-15  1:16       ` Matthew Rosato
  0 siblings, 2 replies; 56+ messages in thread
From: Jason Gunthorpe @ 2022-11-09 16:57 UTC (permalink / raw)
  To: Yi Liu
  Cc: kvm, Kevin Tian, dri-devel, Vineeth Vijayan, Diana Craciun,
	Alexander Gordeev, Longfang Liu, linux-s390, Matthew Rosato,
	Will Deacon, Joerg Roedel, Halil Pasic, iommu, Nicolin Chen,
	Christian Borntraeger, intel-gfx, Zhi Wang, Jason Herne,
	Eric Farman, Vasily Gorbik, Heiko Carstens, Eric Auger,
	Alex Williamson, Harald Freudenberger, Rodrigo Vivi,
	intel-gvt-dev, Tony Krowiak, Tvrtko Ursulin, Yishai Hadas,
	Cornelia Huck, Peter Oberparleiter, Shameer Kolothum,
	Sven Schnelle, Robin Murphy, Lu Baolu

On Tue, Nov 08, 2022 at 11:18:03PM +0800, Yi Liu wrote:
> On 2022/11/8 17:19, Nicolin Chen wrote:
> > On Mon, Nov 07, 2022 at 08:52:44PM -0400, Jason Gunthorpe wrote:
> > 
> > > This is on github: https://github.com/jgunthorpe/linux/commits/vfio_iommufd
> > [...]
> > > v2:
> > >   - Rebase to v6.1-rc3, v4 iommufd series
> > >   - Fixup comments and commit messages from list remarks
> > >   - Fix leaking of the iommufd for mdevs
> > >   - New patch to fix vfio modaliases when vfio container is disabled
> > >   - Add a dmesg once when the iommufd provided /dev/vfio/vfio is opened
> > >     to signal that iommufd is providing this
> > 
> > I've redone my previous sanity tests. Except those reported bugs,
> > things look fine. Once we fix those issues, GVT and other modules
> > can run some more stressful tests, I think.
> 
> our side is also starting test (gvt, nic passthrough) this version. need to
> wait a while for the result.

I've updated the branches with the two functional fixes discussed on
the list plus all the doc updates.

Thanks,
Jason

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v2 10/11] vfio: Make vfio_container optionally compiled
  2022-11-09  0:54     ` Jason Gunthorpe
@ 2022-11-09 17:18       ` Alex Williamson
  2022-11-09 19:52         ` Jason Gunthorpe
  0 siblings, 1 reply; 56+ messages in thread
From: Alex Williamson @ 2022-11-09 17:18 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: kvm, Kevin Tian, dri-devel, Vineeth Vijayan, Diana Craciun,
	Alexander Gordeev, Longfang Liu, linux-s390, Liu, Yi L,
	Matthew Rosato, Will Deacon, Joerg Roedel, Halil Pasic, iommu,
	Nicolin Chen, Christian Borntraeger, intel-gfx, Zhi Wang,
	Jason Herne, Eric Farman, Vasily Gorbik, Heiko Carstens,
	Eric Auger, Harald Freudenberger, Rodrigo Vivi, intel-gvt-dev,
	Tony Krowiak, Tvrtko Ursulin, Yishai Hadas, Cornelia Huck,
	Peter Oberparleiter, Shameer Kolothum, Sven Schnelle,
	Robin Murphy, Lu Baolu

On Tue, 8 Nov 2022 20:54:58 -0400
Jason Gunthorpe <jgg@nvidia.com> wrote:

> On Tue, Nov 08, 2022 at 03:28:31PM -0700, Alex Williamson wrote:
> 
> > Perhaps this should have been obvious, but I'm realizing that
> > vfio-noiommu mode is completely missing without VFIO_CONTAINER, which
> > seems a barrier to deprecating VFIO_CONTAINER and perhaps makes it a  
> 
> Yes, it is the same as the allow_unsafe_interrupts - it is something
> that currently goes missing if you turn off VFIO_CONTAINER.
> 
> This seems straightforward enough to resolve in a followup, we mostly
> just need someone with an existing no-iommu application to test
> compatability against. Keeping it working with the device cdev will
> also be a bit interesting. If you have or know about some application
> I can try to make a patch.

DPDK supports no-iommu mode.

> > question whether IOMMUFD should really be taking over /dev/vfio/vfio.
> > No-iommu mode has users.    
> 
> I view VFIO_CONTAINER=n as a process. An aspiration we can work
> toward.
> 
> At this point there are few places that might want to use it. Android
> perhaps, for example. It is also useful for testing. One of the main
> values is you can switch the options and feed the kernel into an
> existing test environment and see what happens. This is how we are
> able to quickly get s390 mdev testing, for instance.
> 
> We are not going to get to a widely useful VFIO_CONTAINER=n if we
> don't have a target that people can test against and evaluate what
> compatability gaps may exist.
> 
> So, everytime we find something like this - let's think about how can
> we make iommufd compatibility handle it and not jump straight to
> giving up :)
> 
> I'm kind of thinking v6.4 might be a reasonable kernel target when we
> might have closed off enough things.

I agree that it's very useful for testing, I'm certainly not suggesting
to give up, but I'm not sure where no-iommu lives when iommufd owns
/dev/vfio/vfio.  Given the unsafe interrupts discussion, it doesn't
seem like the type of thing that would be a priority for iommufd.

We're on a path where vfio accepts an iommufd as a container, and
ultimately iommufd becomes the container provider, supplanting the
IOMMU driver registration aspect of vfio.  I absolutely want type1 and
spapr backends to get replaced by iommufd, but reluctance to support
aspects of vfio "legacy" behavior doesn't give me warm fuzzies about a
wholesale hand-off of the container to a different subsystem, for
example vs an iommufd shim spoofing type1 support.

Unfortunately we no longer have a CONFIG_EXPERIMENTAL option to hide
behind for disabling VFIO_CONTAINER, so regardless of our intentions
that a transition is some time off, it may become an issue sooner than
we expect.  Thanks,

Alex


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v2 10/11] vfio: Make vfio_container optionally compiled
  2022-11-09 17:18       ` Alex Williamson
@ 2022-11-09 19:52         ` Jason Gunthorpe
  2022-11-10  6:57           ` Tian, Kevin
  0 siblings, 1 reply; 56+ messages in thread
From: Jason Gunthorpe @ 2022-11-09 19:52 UTC (permalink / raw)
  To: Alex Williamson
  Cc: kvm, Kevin Tian, dri-devel, Vineeth Vijayan, Diana Craciun,
	Alexander Gordeev, Longfang Liu, linux-s390, Liu, Yi L,
	Matthew Rosato, Will Deacon, Joerg Roedel, Halil Pasic, iommu,
	Nicolin Chen, Christian Borntraeger, intel-gfx, Zhi Wang,
	Jason Herne, Eric Farman, Vasily Gorbik, Heiko Carstens,
	Eric Auger, Harald Freudenberger, Rodrigo Vivi, intel-gvt-dev,
	Tony Krowiak, Tvrtko Ursulin, Yishai Hadas, Cornelia Huck,
	Peter Oberparleiter, Shameer Kolothum, Sven Schnelle,
	Robin Murphy, Lu Baolu

On Wed, Nov 09, 2022 at 10:18:09AM -0700, Alex Williamson wrote:

> DPDK supports no-iommu mode.

Er? Huh? How? I thought no-iommu was for applications that didn't do
DMA? How is DPDK getting packets in/out without DMA? I guess it snoops
in /proc/ or something to learn PFNs of mlock'd memory? <shudder>

> I agree that it's very useful for testing, I'm certainly not suggesting
> to give up, but I'm not sure where no-iommu lives when iommufd owns
> /dev/vfio/vfio.  Given the unsafe interrupts discussion, it doesn't
> seem like the type of thing that would be a priority for iommufd.

Ah, I think the bit below will do the job, I'm not sure without doing
some testing (and I don't think I have the necessary Intel NIC for
DPDK). The basic point is no-iommu literally means 'do not use iommufd
at all, do not create an iommufd_device or an iommufd_access'. VFIO
can easially do that on its own.

The only slightly messy bit is that uAPI requires the SET_CONTAINER
which we can just NOP in vfio_compat. With more checks it could have
higher fidelity of error cases, but not sure it is worth it.

When we talk about the device cdev flow then I suggest that no-iommu
simply requires -1 for the iommufd during bind - ie no iommufd is
used or accepted and that is how the userspace knows/confirms it is in
no-iommu mode.

> We're on a path where vfio accepts an iommufd as a container, and
> ultimately iommufd becomes the container provider, supplanting the
> IOMMU driver registration aspect of vfio.  I absolutely want type1 and
> spapr backends to get replaced by iommufd, but reluctance to support
> aspects of vfio "legacy" behavior doesn't give me warm fuzzies about a
> wholesale hand-off of the container to a different subsystem, for
> example vs an iommufd shim spoofing type1 support.

Well, I will agree to do everything required, just let's go through the
process to understand the security situation and ensure we are still
doing things the right way.

> Unfortunately we no longer have a CONFIG_EXPERIMENTAL option to hide
> behind for disabling VFIO_CONTAINER, so regardless of our intentions
> that a transition is some time off, it may become an issue sooner than
> we expect.

We can add kconfig text discouraging that?

diff --git a/drivers/iommu/iommufd/vfio_compat.c b/drivers/iommu/iommufd/vfio_compat.c
index dbef3274803336..81f7594cfff8e0 100644
--- a/drivers/iommu/iommufd/vfio_compat.c
+++ b/drivers/iommu/iommufd/vfio_compat.c
@@ -259,6 +259,14 @@ static int iommufd_vfio_set_iommu(struct iommufd_ctx *ictx, unsigned long type)
 	struct iommufd_ioas *ioas = NULL;
 	int rc = 0;
 
+	/*
+	 * Emulation for NOIMMU is imperfect in that VFIO blocks almost all
+	 * other ioctls. We let them keep working but they mostly fail since no
+	 * IOAS should exist.
+	 */
+	if (IS_ENABLED(CONFIG_VFIO_NOIOMMU) && type == VFIO_NOIOMMU_IOMMU)
+		return 0;
+
 	if (type != VFIO_TYPE1_IOMMU && type != VFIO_TYPE1v2_IOMMU)
 		return -EINVAL;
 
diff --git a/drivers/vfio/iommufd.c b/drivers/vfio/iommufd.c
index 595c7b2146f88c..64a336ebc99b9f 100644
--- a/drivers/vfio/iommufd.c
+++ b/drivers/vfio/iommufd.c
@@ -18,6 +18,10 @@ int vfio_iommufd_bind(struct vfio_device *vdev, struct iommufd_ctx *ictx)
 
 	lockdep_assert_held(&vdev->dev_set->lock);
 
+	if (IS_ENABLED(CONFIG_VFIO_NO_IOMMU) &&
+	    vdev->group->type == VFIO_NO_IOMMU)
+		return 0;
+
 	/*
 	 * If the driver doesn't provide this op then it means the device does
 	 * not do DMA at all. So nothing to do.
@@ -53,6 +57,10 @@ void vfio_iommufd_unbind(struct vfio_device *vdev)
 {
 	lockdep_assert_held(&vdev->dev_set->lock);
 
+	if (IS_ENABLED(CONFIG_VFIO_NO_IOMMU) &&
+	    vdev->group->type == VFIO_NO_IOMMU)
+		return;
+
 	if (vdev->ops->unbind_iommufd)
 		vdev->ops->unbind_iommufd(vdev);
 }
diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
index f3c48b8c45627d..08c5b05a129c2c 100644
--- a/drivers/vfio/vfio_main.c
+++ b/drivers/vfio/vfio_main.c
@@ -749,10 +749,13 @@ static int vfio_group_ioctl_set_container(struct vfio_group *group,
 	if (!IS_ERR(iommufd)) {
 		u32 ioas_id;
 
-		ret = iommufd_vfio_compat_ioas_id(iommufd, &ioas_id);
-		if (ret) {
-			iommufd_ctx_put(group->iommufd);
-			goto out_unlock;
+		if (!IS_ENABLED(CONFIG_VFIO_NO_IOMMU) ||
+		    group->type != VFIO_NO_IOMMU) {
+			ret = iommufd_vfio_compat_ioas_id(iommufd, &ioas_id);
+			if (ret) {
+				iommufd_ctx_put(group->iommufd);
+				goto out_unlock;
+			}
 		}
 
 		group->iommufd = iommufd;

^ permalink raw reply related	[flat|nested] 56+ messages in thread

* RE: [PATCH v2 00/11] Connect VFIO to IOMMUFD
  2022-11-09 12:48   ` Jason Gunthorpe
@ 2022-11-10  2:16     ` Tian, Kevin
  0 siblings, 0 replies; 56+ messages in thread
From: Tian, Kevin @ 2022-11-10  2:16 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: kvm, dri-devel, Vineeth Vijayan, Diana Craciun,
	Alexander Gordeev, Longfang Liu, linux-s390, Liu, Yi L,
	Matthew Rosato, Will Deacon, Joerg Roedel, Halil Pasic, iommu,
	Nicolin Chen, Christian Borntraeger, intel-gfx, Wang, Zhi A,
	Jason Herne, Eric Farman, Vasily Gorbik, Heiko Carstens,
	Eric Auger, Alex Williamson, Harald Freudenberger, Vivi, Rodrigo,
	intel-gvt-dev, Tony Krowiak, Tvrtko Ursulin, Yishai Hadas,
	Cornelia Huck, Peter Oberparleiter, Shameer Kolothum,
	Sven Schnelle, Robin Murphy, Lu Baolu

> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Wednesday, November 9, 2022 8:48 PM
> 
> On Wed, Nov 09, 2022 at 09:03:52AM +0000, Tian, Kevin wrote:
> > every mail in this series is shown thrice in lore:
> >
> > https://lore.kernel.org/all/0-v2-65016290f146+33e-
> vfio_iommufd_jgg@nvidia.com/
> >
> > not sure what caused it but it's annoying to check the conversation there.
> 
> It is sort of a lore issue, it only combines messages that are exactly
> the same together. Several of the mailing lists on CC here mangle the
> message in various ways, eg adding trailer or whatever. This causes
> repeated messages in lore.
> 
> The trick in lore is to replace "/all/" with a good list, like /kvm/
> or /linux-iommu/ that shows the original non-mangled version, and only
> once.
> 

this trick works. Thanks!


^ permalink raw reply	[flat|nested] 56+ messages in thread

* RE: [PATCH v2 05/11] vfio: Use IOMMU_CAP_ENFORCE_CACHE_COHERENCY for vfio_file_enforced_coherent()
  2022-11-08  0:52 ` [PATCH v2 05/11] vfio: Use IOMMU_CAP_ENFORCE_CACHE_COHERENCY for vfio_file_enforced_coherent() Jason Gunthorpe
@ 2022-11-10  2:48   ` Tian, Kevin
  0 siblings, 0 replies; 56+ messages in thread
From: Tian, Kevin @ 2022-11-10  2:48 UTC (permalink / raw)
  To: Jason Gunthorpe, Alexander Gordeev, David Airlie, Tony Krowiak,
	Alex Williamson, Christian Borntraeger, Cornelia Huck,
	Daniel Vetter, Diana Craciun, dri-devel, Eric Auger, Eric Farman,
	Harald Freudenberger, Vasily Gorbik, Heiko Carstens, intel-gfx,
	intel-gvt-dev, iommu, Jani Nikula, Jason Herne, Joonas Lahtinen,
	Joerg Roedel, kvm, linux-s390, Longfang Liu, Matthew Rosato,
	Peter Oberparleiter, Halil Pasic, Robin Murphy, Vivi, Rodrigo,
	Shameer Kolothum, Sven Schnelle, Tvrtko Ursulin, Vineeth Vijayan,
	Will Deacon, Yishai Hadas, Zhenyu Wang, Wang, Zhi A
  Cc: Liu, Yi L, Nicolin Chen, Lu Baolu

> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Tuesday, November 8, 2022 8:53 AM
> 
> iommufd doesn't establish the iommu_domains until after the device FD is
> opened, even if the container has been set. This design is part of moving
> away from the group centric iommu APIs.
> 
> This is fine, except that the normal sequence of establishing the kvm
> wbinvd won't work:
> 
>    group = open("/dev/vfio/XX")
>    ioctl(group, VFIO_GROUP_SET_CONTAINER)
>    ioctl(kvm, KVM_DEV_VFIO_GROUP_ADD)
>    ioctl(group, VFIO_GROUP_GET_DEVICE_FD)
> 
> As the domains don't start existing until GET_DEVICE_FD. Further,
> GET_DEVICE_FD requires that KVM_DEV_VFIO_GROUP_ADD already be
> done as that
> is what sets the group->kvm and thus device->kvm for the driver to use
> during open.
> 
> Now that we have device centric cap ops and the new
> IOMMU_CAP_ENFORCE_CACHE_COHERENCY we know what the
> iommu_domain will be
> capable of without having to create it. Use this to compute
> vfio_file_enforced_coherent() and resolve the ordering problems.
> 
> VFIO always tries to upgrade domains to enforce cache coherency, it never
> attaches a device that supports enforce cache coherency to a less capable
> domain, so the cap test is a sufficient proxy for the ultimate
> outcome. iommufd also ensures that devices that set the cap will be
> connected to enforcing domains.
> 
> Tested-by: Nicolin Chen <nicolinc@nvidia.com>
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

Reviewed-by: Kevin Tian <kevin.tian@intel.com>

^ permalink raw reply	[flat|nested] 56+ messages in thread

* RE: [PATCH v2 06/11] vfio-iommufd: Allow iommufd to be used in place of a container fd
  2022-11-08  0:52 ` [PATCH v2 06/11] vfio-iommufd: Allow iommufd to be used in place of a container fd Jason Gunthorpe
@ 2022-11-10  2:51   ` Tian, Kevin
  0 siblings, 0 replies; 56+ messages in thread
From: Tian, Kevin @ 2022-11-10  2:51 UTC (permalink / raw)
  To: Jason Gunthorpe, Alexander Gordeev, David Airlie, Tony Krowiak,
	Alex Williamson, Christian Borntraeger, Cornelia Huck,
	Daniel Vetter, Diana Craciun, dri-devel, Eric Auger, Eric Farman,
	Harald Freudenberger, Vasily Gorbik, Heiko Carstens, intel-gfx,
	intel-gvt-dev, iommu, Jani Nikula, Jason Herne, Joonas Lahtinen,
	Joerg Roedel, kvm, linux-s390, Longfang Liu, Matthew Rosato,
	Peter Oberparleiter, Halil Pasic, Robin Murphy, Vivi, Rodrigo,
	Shameer Kolothum, Sven Schnelle, Tvrtko Ursulin, Vineeth Vijayan,
	Will Deacon, Yishai Hadas, Zhenyu Wang, Wang, Zhi A
  Cc: Liu, Yi L, Nicolin Chen, Lu Baolu

> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Tuesday, November 8, 2022 8:53 AM
> 
> This makes VFIO_GROUP_SET_CONTAINER accept both a vfio container FD
> and an
> iommufd.
> 
> In iommufd mode an IOAS will exist after the SET_CONTAINER, but it will
> not be attached to any groups.
> 
> For VFIO this means that the VFIO_GROUP_GET_STATUS and
> VFIO_GROUP_FLAGS_VIABLE works subtly differently. With the container FD
> the iommu_group_claim_dma_owner() is done during SET_CONTAINER but
> for
> IOMMUFD this is done during VFIO_GROUP_GET_DEVICE_FD. Meaning that
> VFIO_GROUP_FLAGS_VIABLE could be set but GET_DEVICE_FD will fail due to
> viability.
> 
> As GET_DEVICE_FD can fail for many reasons already this is not expected to
> be a meaningful difference.
> 
> Reorganize the tests for if the group has an assigned container or iommu
> into a vfio_group_has_iommu() function and consolidate all the duplicated
> WARN_ON's etc related to this.
> 
> Call container functions only if a container is actually present on the
> group.
> 
> Tested-by: Nicolin Chen <nicolinc@nvidia.com>
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

Reviewed-by: Kevin Tian <kevin.tian@intel.com>

^ permalink raw reply	[flat|nested] 56+ messages in thread

* RE: [PATCH v2 07/11] vfio-iommufd: Support iommufd for physical VFIO devices
  2022-11-08  0:52 ` [PATCH v2 07/11] vfio-iommufd: Support iommufd for physical VFIO devices Jason Gunthorpe
  2022-11-08  6:10   ` Nicolin Chen
@ 2022-11-10  3:11   ` Tian, Kevin
  2022-11-10 17:20     ` Jason Gunthorpe
  2022-11-11  4:12   ` Yi Liu
  2 siblings, 1 reply; 56+ messages in thread
From: Tian, Kevin @ 2022-11-10  3:11 UTC (permalink / raw)
  To: Jason Gunthorpe, Alexander Gordeev, David Airlie, Tony Krowiak,
	Alex Williamson, Christian Borntraeger, Cornelia Huck,
	Daniel Vetter, Diana Craciun, dri-devel, Eric Auger, Eric Farman,
	Harald Freudenberger, Vasily Gorbik, Heiko Carstens, intel-gfx,
	intel-gvt-dev, iommu, Jani Nikula, Jason Herne, Joonas Lahtinen,
	Joerg Roedel, kvm, linux-s390, Longfang Liu, Matthew Rosato,
	Peter Oberparleiter, Halil Pasic, Robin Murphy, Vivi, Rodrigo,
	Shameer Kolothum, Sven Schnelle, Tvrtko Ursulin, Vineeth Vijayan,
	Will Deacon, Yishai Hadas, Zhenyu Wang, Wang, Zhi A
  Cc: Liu, Yi L, Nicolin Chen, Lu Baolu

> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Tuesday, November 8, 2022 8:53 AM
> 
> +
> +int vfio_iommufd_bind(struct vfio_device *vdev, struct iommufd_ctx *ictx)
> +{
> +	u32 ioas_id;
> +	u32 device_id;
> +	int ret;
> +
> +	lockdep_assert_held(&vdev->dev_set->lock);
> +
> +	/*
> +	 * If the driver doesn't provide this op then it means the device does
> +	 * not do DMA at all. So nothing to do.
> +	 */
> +	if (!vdev->ops->bind_iommufd)
> +		return 0;
> +
> +	ret = vdev->ops->bind_iommufd(vdev, ictx, &device_id);
> +	if (ret)
> +		return ret;
> +
> +	ret = iommufd_vfio_compat_ioas_id(ictx, &ioas_id);
> +	if (ret)
> +		goto err_unbind;
> +	ret = vdev->ops->attach_ioas(vdev, &ioas_id);
> +	if (ret)
> +		goto err_unbind;

with our discussion in v1:

https://lore.kernel.org/all/Y2mgJqz8fvm54C+f@nvidia.com/

I got the rationale on iommufd part which doesn't have the concept
of container hence not necessarily to impose restriction on when
an user can change a compat ioas.

But from vfio side I wonder whether we should cache the compat
ioas id when it's attached by the first device and then use it all the
way for other device attachments coming after. implying IOAS_SET
only affects containers which haven't been attached.

In concept a container should be only aliased to one compat ioas
in its lifetime. 

^ permalink raw reply	[flat|nested] 56+ messages in thread

* RE: [PATCH v2 07/11] vfio-iommufd: Support iommufd for physical VFIO devices
  2022-11-08 17:51       ` Jason Gunthorpe
@ 2022-11-10  3:12         ` Tian, Kevin
  0 siblings, 0 replies; 56+ messages in thread
From: Tian, Kevin @ 2022-11-10  3:12 UTC (permalink / raw)
  To: Jason Gunthorpe, Liu, Yi L
  Cc: kvm, dri-devel, Vineeth Vijayan, Diana Craciun,
	Alexander Gordeev, Longfang Liu, linux-s390, Matthew Rosato,
	Will Deacon, Joerg Roedel, Halil Pasic, iommu, Nicolin Chen,
	Christian Borntraeger, intel-gfx, Wang, Zhi A, Jason Herne,
	Eric Farman, Vasily Gorbik, Heiko Carstens, Eric Auger,
	Alex Williamson, Harald Freudenberger, Vivi, Rodrigo,
	intel-gvt-dev, Tony Krowiak, Tvrtko Ursulin, Yishai Hadas,
	Cornelia Huck, Peter Oberparleiter, Shameer Kolothum,
	Sven Schnelle, Robin Murphy, Lu Baolu

> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Wednesday, November 9, 2022 1:52 AM
> 
> On Tue, Nov 08, 2022 at 03:41:25PM +0800, Yi Liu wrote:
> > On 2022/11/8 14:10, Nicolin Chen wrote:
> > > On Mon, Nov 07, 2022 at 08:52:51PM -0400, Jason Gunthorpe wrote:
> > >
> > > > @@ -795,6 +800,10 @@ static int vfio_device_first_open(struct
> vfio_device *device)
> > > >   		ret = vfio_group_use_container(device->group);
> > > >   		if (ret)
> > > >   			goto err_module_put;
> > > > +	} else if (device->group->iommufd) {
> > > > +		ret = vfio_iommufd_bind(device, device->group->iommufd);
> > >
> > > Here we check device->group->iommufd...
> > >
> > > > +		if (ret)
> > > > +			goto err_module_put;
> > > >   	}
> > > >   	device->kvm = device->group->kvm;
> > > > @@ -812,6 +821,7 @@ static int vfio_device_first_open(struct
> vfio_device *device)
> > > >   	device->kvm = NULL;
> > > >   	if (device->group->container)
> > > >   		vfio_group_unuse_container(device->group);
> > > > +	vfio_iommufd_unbind(device);
> > >
> > > ...yet, missing here, which could result in kernel oops.
> > >
> > > Should probably add something similar:
> > > +	if (device->group->iommufd)
> > > +		vfio_iommufd_unbind(device);
> > >
> > > Or should check !vdev->iommufd_device inside the ->unbind.
> >
> > this check was in prior version, but removed in this version. any
> > special reason? Jason?
> 
> Oooh, this makes more sense - Kevin pointed out the check was wrong:
> 
> > > +void vfio_iommufd_unbind(struct vfio_device *vdev)
> > > +{
> > > +	lockdep_assert_held(&vdev->dev_set->lock);
> > > +
> > > +	if (!vdev->iommufd_device)
> > > +		return;
> 
> > there is no iommufd_device in the emulated path...
> 
> And he is right, so I dropped it. But really the check was just
> misspelled, it was supposed to be "device->group->iommufd" because the
> caller assumed it.
> 
> Still, I think the right way to fix it is to lift the check as we
> don't touch group->iommufd in iommufd.c
> 

yes this is the right fix.

^ permalink raw reply	[flat|nested] 56+ messages in thread

* RE: [PATCH v2 08/11] vfio-iommufd: Support iommufd for emulated VFIO devices
  2022-11-08  0:52 ` [PATCH v2 08/11] vfio-iommufd: Support iommufd for emulated " Jason Gunthorpe
@ 2022-11-10  5:33   ` Tian, Kevin
  0 siblings, 0 replies; 56+ messages in thread
From: Tian, Kevin @ 2022-11-10  5:33 UTC (permalink / raw)
  To: Jason Gunthorpe, Alexander Gordeev, David Airlie, Tony Krowiak,
	Alex Williamson, Christian Borntraeger, Cornelia Huck,
	Daniel Vetter, Diana Craciun, dri-devel, Eric Auger, Eric Farman,
	Harald Freudenberger, Vasily Gorbik, Heiko Carstens, intel-gfx,
	intel-gvt-dev, iommu, Jani Nikula, Jason Herne, Joonas Lahtinen,
	Joerg Roedel, kvm, linux-s390, Longfang Liu, Matthew Rosato,
	Peter Oberparleiter, Halil Pasic, Robin Murphy, Vivi, Rodrigo,
	Shameer Kolothum, Sven Schnelle, Tvrtko Ursulin, Vineeth Vijayan,
	Will Deacon, Yishai Hadas, Zhenyu Wang, Wang, Zhi A
  Cc: Liu, Yi L, Nicolin Chen, Lu Baolu

> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Tuesday, November 8, 2022 8:53 AM
> 
> Emulated VFIO devices are calling vfio_register_emulated_iommu_dev() and
> consist of all the mdev drivers.
> 
> Like the physical drivers, support for iommufd is provided by the driver
> supplying the correct standard ops. Provide ops from the core that
> duplicate what vfio_register_emulated_iommu_dev() does.
> 
> Emulated drivers are where it is more likely to see variation in the
> iommfd support ops. For instance IDXD will probably need to setup both a
> iommfd_device context linked to a PASID and an iommufd_access context to
> support all their mdev operations.
> 
> Tested-by: Nicolin Chen <nicolinc@nvidia.com>
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

this looks good to me overall. but I'll wait for next version to ack
after the change for s390 is settled down and incorporated.

^ permalink raw reply	[flat|nested] 56+ messages in thread

* RE: [PATCH v2 09/11] vfio: Move container related MODULE_ALIAS statements into container.c
  2022-11-08  0:52 ` [PATCH v2 09/11] vfio: Move container related MODULE_ALIAS statements into container.c Jason Gunthorpe
@ 2022-11-10  5:34   ` Tian, Kevin
  2022-11-11  4:13   ` Yi Liu
  1 sibling, 0 replies; 56+ messages in thread
From: Tian, Kevin @ 2022-11-10  5:34 UTC (permalink / raw)
  To: Jason Gunthorpe, Alexander Gordeev, David Airlie, Tony Krowiak,
	Alex Williamson, Christian Borntraeger, Cornelia Huck,
	Daniel Vetter, Diana Craciun, dri-devel, Eric Auger, Eric Farman,
	Harald Freudenberger, Vasily Gorbik, Heiko Carstens, intel-gfx,
	intel-gvt-dev, iommu, Jani Nikula, Jason Herne, Joonas Lahtinen,
	Joerg Roedel, kvm, linux-s390, Longfang Liu, Matthew Rosato,
	Peter Oberparleiter, Halil Pasic, Robin Murphy, Vivi, Rodrigo,
	Shameer Kolothum, Sven Schnelle, Tvrtko Ursulin, Vineeth Vijayan,
	Will Deacon, Yishai Hadas, Zhenyu Wang, Wang, Zhi A
  Cc: Liu, Yi L, Nicolin Chen, Lu Baolu

> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Tuesday, November 8, 2022 8:53 AM
> 
> The miscdev is in container.c, so should these related MODULE_ALIAS
> statements. This is necessary for the next patch to be able to fully
> disable /dev/vfio/vfio.
> 
> Fixes: cdc71fe4ecbf ("vfio: Move container code into
> drivers/vfio/container.c")
> Reported-by: "Liu, Yi L" <yi.l.liu@intel.com>
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

Reviewed-by: Kevin Tian <kevin.tian@intel.com>

^ permalink raw reply	[flat|nested] 56+ messages in thread

* RE: [PATCH v2 10/11] vfio: Make vfio_container optionally compiled
  2022-11-09 19:52         ` Jason Gunthorpe
@ 2022-11-10  6:57           ` Tian, Kevin
  2022-11-10 17:10             ` Alex Williamson
  2022-11-10 17:52             ` Jason Gunthorpe
  0 siblings, 2 replies; 56+ messages in thread
From: Tian, Kevin @ 2022-11-10  6:57 UTC (permalink / raw)
  To: Jason Gunthorpe, Alex Williamson
  Cc: kvm, dri-devel, Vineeth Vijayan, Diana Craciun,
	Alexander Gordeev, Longfang Liu, linux-s390, Liu, Yi L,
	Matthew Rosato, Will Deacon, Joerg Roedel, Halil Pasic, iommu,
	Nicolin Chen, Christian Borntraeger, intel-gfx, Wang, Zhi A,
	Jason Herne, Eric Farman, Vasily Gorbik, Heiko Carstens,
	Eric Auger, Harald Freudenberger, Vivi, Rodrigo, intel-gvt-dev,
	Tony Krowiak, Tvrtko Ursulin, Yishai Hadas, Cornelia Huck,
	Peter Oberparleiter, Shameer Kolothum, Sven Schnelle,
	Robin Murphy, Lu Baolu

> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Thursday, November 10, 2022 3:53 AM
> 
> On Wed, Nov 09, 2022 at 10:18:09AM -0700, Alex Williamson wrote:
> 
> > DPDK supports no-iommu mode.
> 
> Er? Huh? How? I thought no-iommu was for applications that didn't do
> DMA? How is DPDK getting packets in/out without DMA? I guess it snoops
> in /proc/ or something to learn PFNs of mlock'd memory? <shudder>

iirc dpdk started with UIO plus various tricks (root privilege, hugepage, etc.)
to lock and learn PFN's from pagemap. Then when migrating it to vfio the
no-iommu option was introduced to provide UIO compatibility.

> 
> > I agree that it's very useful for testing, I'm certainly not suggesting
> > to give up, but I'm not sure where no-iommu lives when iommufd owns
> > /dev/vfio/vfio.  Given the unsafe interrupts discussion, it doesn't
> > seem like the type of thing that would be a priority for iommufd.
> 
> Ah, I think the bit below will do the job, I'm not sure without doing
> some testing (and I don't think I have the necessary Intel NIC for
> DPDK). The basic point is no-iommu literally means 'do not use iommufd
> at all, do not create an iommufd_device or an iommufd_access'. VFIO
> can easially do that on its own.
> 
> The only slightly messy bit is that uAPI requires the SET_CONTAINER
> which we can just NOP in vfio_compat. With more checks it could have
> higher fidelity of error cases, but not sure it is worth it.
> 
> When we talk about the device cdev flow then I suggest that no-iommu
> simply requires -1 for the iommufd during bind - ie no iommufd is
> used or accepted and that is how the userspace knows/confirms it is in
> no-iommu mode.
> 
> > We're on a path where vfio accepts an iommufd as a container, and
> > ultimately iommufd becomes the container provider, supplanting the
> > IOMMU driver registration aspect of vfio.  I absolutely want type1 and
> > spapr backends to get replaced by iommufd, but reluctance to support
> > aspects of vfio "legacy" behavior doesn't give me warm fuzzies about a
> > wholesale hand-off of the container to a different subsystem, for
> > example vs an iommufd shim spoofing type1 support.
> 
> Well, I will agree to do everything required, just let's go through the
> process to understand the security situation and ensure we are still
> doing things the right way.
> 
> > Unfortunately we no longer have a CONFIG_EXPERIMENTAL option to hide
> > behind for disabling VFIO_CONTAINER, so regardless of our intentions
> > that a transition is some time off, it may become an issue sooner than
> > we expect.
> 
> We can add kconfig text discouraging that?
> 
> diff --git a/drivers/iommu/iommufd/vfio_compat.c
> b/drivers/iommu/iommufd/vfio_compat.c
> index dbef3274803336..81f7594cfff8e0 100644
> --- a/drivers/iommu/iommufd/vfio_compat.c
> +++ b/drivers/iommu/iommufd/vfio_compat.c
> @@ -259,6 +259,14 @@ static int iommufd_vfio_set_iommu(struct
> iommufd_ctx *ictx, unsigned long type)
>  	struct iommufd_ioas *ioas = NULL;
>  	int rc = 0;
> 
> +	/*
> +	 * Emulation for NOIMMU is imperfect in that VFIO blocks almost all
> +	 * other ioctls. We let them keep working but they mostly fail since no
> +	 * IOAS should exist.
> +	 */
> +	if (IS_ENABLED(CONFIG_VFIO_NOIOMMU) && type ==
> VFIO_NOIOMMU_IOMMU)
> +		return 0;
> +
>  	if (type != VFIO_TYPE1_IOMMU && type != VFIO_TYPE1v2_IOMMU)
>  		return -EINVAL;
> 

also need a check in iommufd_vfio_check_extension() so only
VFIO_NOIOMMU_IOMMU is supported in no-iommu mode.

> diff --git a/drivers/vfio/iommufd.c b/drivers/vfio/iommufd.c
> index 595c7b2146f88c..64a336ebc99b9f 100644
> --- a/drivers/vfio/iommufd.c
> +++ b/drivers/vfio/iommufd.c
> @@ -18,6 +18,10 @@ int vfio_iommufd_bind(struct vfio_device *vdev,
> struct iommufd_ctx *ictx)
> 
>  	lockdep_assert_held(&vdev->dev_set->lock);
> 
> +	if (IS_ENABLED(CONFIG_VFIO_NO_IOMMU) &&
> +	    vdev->group->type == VFIO_NO_IOMMU)
> +		return 0;
> +
>  	/*
>  	 * If the driver doesn't provide this op then it means the device does
>  	 * not do DMA at all. So nothing to do.
> @@ -53,6 +57,10 @@ void vfio_iommufd_unbind(struct vfio_device *vdev)
>  {
>  	lockdep_assert_held(&vdev->dev_set->lock);
> 
> +	if (IS_ENABLED(CONFIG_VFIO_NO_IOMMU) &&
> +	    vdev->group->type == VFIO_NO_IOMMU)
> +		return;
> +
>  	if (vdev->ops->unbind_iommufd)
>  		vdev->ops->unbind_iommufd(vdev);
>  }
> diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
> index f3c48b8c45627d..08c5b05a129c2c 100644
> --- a/drivers/vfio/vfio_main.c
> +++ b/drivers/vfio/vfio_main.c
> @@ -749,10 +749,13 @@ static int vfio_group_ioctl_set_container(struct
> vfio_group *group,
>  	if (!IS_ERR(iommufd)) {
>  		u32 ioas_id;
> 
> -		ret = iommufd_vfio_compat_ioas_id(iommufd, &ioas_id);
> -		if (ret) {
> -			iommufd_ctx_put(group->iommufd);
> -			goto out_unlock;
> +		if (!IS_ENABLED(CONFIG_VFIO_NO_IOMMU) ||
> +		    group->type != VFIO_NO_IOMMU) {
> +			ret = iommufd_vfio_compat_ioas_id(iommufd,
> &ioas_id);
> +			if (ret) {
> +				iommufd_ctx_put(group->iommufd);
> +				goto out_unlock;
> +			}
>  		}

with above I suppose other ioctls (map/unmap/etc.) are implicitly blocked
given get_compat_ioas() will fail in those paths. this is good.

btw vfio container requires exact match between group->type and
container->noiommu, i.e. noiommu group can be only attached to noiommu
container. this is another thing to be paired up.

^ permalink raw reply	[flat|nested] 56+ messages in thread

* RE: [PATCH v2 11/11] iommufd: Allow iommufd to supply /dev/vfio/vfio
  2022-11-08  0:52 ` [PATCH v2 11/11] iommufd: Allow iommufd to supply /dev/vfio/vfio Jason Gunthorpe
@ 2022-11-10  7:01   ` Tian, Kevin
  2022-11-11  4:16   ` Yi Liu
  1 sibling, 0 replies; 56+ messages in thread
From: Tian, Kevin @ 2022-11-10  7:01 UTC (permalink / raw)
  To: Jason Gunthorpe, Alexander Gordeev, David Airlie, Tony Krowiak,
	Alex Williamson, Christian Borntraeger, Cornelia Huck,
	Daniel Vetter, Diana Craciun, dri-devel, Eric Auger, Eric Farman,
	Harald Freudenberger, Vasily Gorbik, Heiko Carstens, intel-gfx,
	intel-gvt-dev, iommu, Jani Nikula, Jason Herne, Joonas Lahtinen,
	Joerg Roedel, kvm, linux-s390, Longfang Liu, Matthew Rosato,
	Peter Oberparleiter, Halil Pasic, Robin Murphy, Vivi, Rodrigo,
	Shameer Kolothum, Sven Schnelle, Tvrtko Ursulin, Vineeth Vijayan,
	Will Deacon, Yishai Hadas, Zhenyu Wang, Wang, Zhi A
  Cc: Liu, Yi L, Nicolin Chen, Lu Baolu

> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Tuesday, November 8, 2022 8:53 AM
> 
> If the VFIO container is compiled out, give a kconfig option for iommufd
> to provide the miscdev node with the same name and permissions as vfio
> uses.
> 
> The compatibility node supports the same ioctls as VFIO and automatically
> enables the VFIO compatible pinned page accounting mode.
> 
> Tested-by: Nicolin Chen <nicolinc@nvidia.com>
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

Reviewed-by: Kevin Tian <kevin.tian@intel.com>

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v2 10/11] vfio: Make vfio_container optionally compiled
  2022-11-10  6:57           ` Tian, Kevin
@ 2022-11-10 17:10             ` Alex Williamson
  2022-11-10 17:52             ` Jason Gunthorpe
  1 sibling, 0 replies; 56+ messages in thread
From: Alex Williamson @ 2022-11-10 17:10 UTC (permalink / raw)
  To: Tian, Kevin
  Cc: kvm, dri-devel, Vineeth Vijayan, Diana Craciun,
	Alexander Gordeev, Longfang Liu, linux-s390, Liu, Yi L,
	Matthew Rosato, Will Deacon, Joerg Roedel, Halil Pasic, iommu,
	Nicolin Chen, Jason Gunthorpe, Christian Borntraeger, intel-gfx,
	Wang, Zhi A, Jason Herne, Eric Farman, Vasily Gorbik,
	Heiko Carstens, Eric Auger, Harald Freudenberger, Vivi, Rodrigo,
	intel-gvt-dev, Tony Krowiak, Tvrtko Ursulin, Yishai Hadas,
	Cornelia Huck, Peter Oberparleiter, Shameer Kolothum,
	Sven Schnelle, Robin Murphy, Lu Baolu

On Thu, 10 Nov 2022 06:57:57 +0000
"Tian, Kevin" <kevin.tian@intel.com> wrote:

> > From: Jason Gunthorpe <jgg@nvidia.com>
> > Sent: Thursday, November 10, 2022 3:53 AM
> > 
> > On Wed, Nov 09, 2022 at 10:18:09AM -0700, Alex Williamson wrote:
> >   
> > > DPDK supports no-iommu mode.  
> > 
> > Er? Huh? How? I thought no-iommu was for applications that didn't do
> > DMA? How is DPDK getting packets in/out without DMA? I guess it snoops
> > in /proc/ or something to learn PFNs of mlock'd memory? <shudder>  
> 
> iirc dpdk started with UIO plus various tricks (root privilege, hugepage, etc.)
> to lock and learn PFN's from pagemap. Then when migrating it to vfio the
> no-iommu option was introduced to provide UIO compatibility.

IIRC, we essentially introduced no-iommu mode vfio because DPDK started
pushing for extending interrupt support in uio-pci-generic.  The UIO
driver is also only meant for devices that don't do DMA, but obviously
DPDK didn't care about that.  Rather than extend UIO, we offered this
no-iommu mode in vfio since we already had more extensive MSI support,
were better able to impose restrictions on access to the device, and
using the same device access makes the transition to proper IOMMU
backed configurations more seamless.  Thanks,

Alex


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v2 07/11] vfio-iommufd: Support iommufd for physical VFIO devices
  2022-11-10  3:11   ` Tian, Kevin
@ 2022-11-10 17:20     ` Jason Gunthorpe
  2022-11-10 23:58       ` Tian, Kevin
  0 siblings, 1 reply; 56+ messages in thread
From: Jason Gunthorpe @ 2022-11-10 17:20 UTC (permalink / raw)
  To: Tian, Kevin
  Cc: kvm, dri-devel, Vineeth Vijayan, Diana Craciun,
	Alexander Gordeev, Longfang Liu, linux-s390, Liu, Yi L,
	Matthew Rosato, Will Deacon, Joerg Roedel, Halil Pasic, iommu,
	Nicolin Chen, Christian Borntraeger, intel-gfx, Wang, Zhi A,
	Jason Herne, Eric Farman, Vasily Gorbik, Heiko Carstens,
	Eric Auger, Alex Williamson, Harald Freudenberger, Vivi, Rodrigo,
	intel-gvt-dev, Tony Krowiak, Tvrtko Ursulin, Yishai Hadas,
	Cornelia Huck, Peter Oberparleiter, Shameer Kolothum,
	Sven Schnelle, Robin Murphy, Lu Baolu

On Thu, Nov 10, 2022 at 03:11:16AM +0000, Tian, Kevin wrote:
> > From: Jason Gunthorpe <jgg@nvidia.com>
> > Sent: Tuesday, November 8, 2022 8:53 AM
> > 
> > +
> > +int vfio_iommufd_bind(struct vfio_device *vdev, struct iommufd_ctx *ictx)
> > +{
> > +	u32 ioas_id;
> > +	u32 device_id;
> > +	int ret;
> > +
> > +	lockdep_assert_held(&vdev->dev_set->lock);
> > +
> > +	/*
> > +	 * If the driver doesn't provide this op then it means the device does
> > +	 * not do DMA at all. So nothing to do.
> > +	 */
> > +	if (!vdev->ops->bind_iommufd)
> > +		return 0;
> > +
> > +	ret = vdev->ops->bind_iommufd(vdev, ictx, &device_id);
> > +	if (ret)
> > +		return ret;
> > +
> > +	ret = iommufd_vfio_compat_ioas_id(ictx, &ioas_id);
> > +	if (ret)
> > +		goto err_unbind;
> > +	ret = vdev->ops->attach_ioas(vdev, &ioas_id);
> > +	if (ret)
> > +		goto err_unbind;
> 
> with our discussion in v1:
> 
> https://lore.kernel.org/all/Y2mgJqz8fvm54C+f@nvidia.com/
> 
> I got the rationale on iommufd part which doesn't have the concept
> of container hence not necessarily to impose restriction on when
> an user can change a compat ioas.
> 
> But from vfio side I wonder whether we should cache the compat
> ioas id when it's attached by the first device and then use it all the
> way for other device attachments coming after. implying IOAS_SET
> only affects containers which haven't been attached.

I can't see a reason to do this. IOAS_SET is a new ioctl and it has
new semantics beyond what original vfio container could do. In this
case having an impact on the next vfio_device that is opened.

This seems generally useful enough I wouldn't want to block it.

In any case, we can't *really* change this because the vfio layer is
working on IDs and the IDs can be destroyed/recreated from under
it. So if we try to hold the ID we could still end up getting it
changed anyhow.

Jason

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v2 10/11] vfio: Make vfio_container optionally compiled
  2022-11-10  6:57           ` Tian, Kevin
  2022-11-10 17:10             ` Alex Williamson
@ 2022-11-10 17:52             ` Jason Gunthorpe
  1 sibling, 0 replies; 56+ messages in thread
From: Jason Gunthorpe @ 2022-11-10 17:52 UTC (permalink / raw)
  To: Tian, Kevin
  Cc: kvm, dri-devel, Vineeth Vijayan, Diana Craciun,
	Alexander Gordeev, Longfang Liu, linux-s390, Liu, Yi L,
	Matthew Rosato, Will Deacon, Joerg Roedel, Halil Pasic, iommu,
	Nicolin Chen, Christian Borntraeger, intel-gfx, Wang, Zhi A,
	Jason Herne, Eric Farman, Vasily Gorbik, Heiko Carstens,
	Eric Auger, Alex Williamson, Harald Freudenberger, Vivi, Rodrigo,
	intel-gvt-dev, Tony Krowiak, Tvrtko Ursulin, Yishai Hadas,
	Cornelia Huck, Peter Oberparleiter, Shameer Kolothum,
	Sven Schnelle, Robin Murphy, Lu Baolu

On Thu, Nov 10, 2022 at 06:57:57AM +0000, Tian, Kevin wrote:

> > +	/*
> > +	 * Emulation for NOIMMU is imperfect in that VFIO blocks almost all
> > +	 * other ioctls. We let them keep working but they mostly fail since no
> > +	 * IOAS should exist.
> > +	 */
> > +	if (IS_ENABLED(CONFIG_VFIO_NOIOMMU) && type ==
> > VFIO_NOIOMMU_IOMMU)
> > +		return 0;
> > +
> >  	if (type != VFIO_TYPE1_IOMMU && type != VFIO_TYPE1v2_IOMMU)
> >  		return -EINVAL;
> 
> also need a check in iommufd_vfio_check_extension() so only
> VFIO_NOIOMMU_IOMMU is supported in no-iommu mode.

Mm, and some permission checks too

> > +		if (!IS_ENABLED(CONFIG_VFIO_NO_IOMMU) ||
> > +		    group->type != VFIO_NO_IOMMU) {
> > +			ret = iommufd_vfio_compat_ioas_id(iommufd,
> > &ioas_id);
> > +			if (ret) {
> > +				iommufd_ctx_put(group->iommufd);
> > +				goto out_unlock;
> > +			}
> >  		}
> 
> with above I suppose other ioctls (map/unmap/etc.) are implicitly blocked
> given get_compat_ioas() will fail in those paths. this is good.
> 
> btw vfio container requires exact match between group->type and
> container->noiommu, i.e. noiommu group can be only attached to noiommu
> container. this is another thing to be paired up.

Sure, as below

So, the missing ingredient here is somone who has the necessary device
to test dpdk? I wonder if qemu e1000 is able to do this path?

diff --git a/drivers/iommu/iommufd/vfio_compat.c b/drivers/iommu/iommufd/vfio_compat.c
index dbef3274803336..c20e55ddc9aa81 100644
--- a/drivers/iommu/iommufd/vfio_compat.c
+++ b/drivers/iommu/iommufd/vfio_compat.c
@@ -26,16 +26,35 @@ static struct iommufd_ioas *get_compat_ioas(struct iommufd_ctx *ictx)
 }
 
 /**
- * iommufd_vfio_compat_ioas_id - Return the IOAS ID that vfio should use
+ * iommufd_vfio_compat_ioas_get_id - Ensure a comat IOAS exists
+ * @ictx: Context to operate on
+ *
+ * Return the ID of the current compatability ioas. The ID can be passed into
+ * other functions that take an ioas_id.
+ */
+int iommufd_vfio_compat_ioas_get_id(struct iommufd_ctx *ictx, u32 *out_ioas_id)
+{
+	struct iommufd_ioas *ioas;
+
+	ioas = get_compat_ioas(ictx);
+	if (IS_ERR(ioas))
+		return PTR_ERR(ioas);
+	*out_ioas_id = ioas->obj.id;
+	iommufd_put_object(&ioas->obj);
+	return 0;
+}
+EXPORT_SYMBOL_NS_GPL(iommufd_vfio_compat_ioas_get_id, IOMMUFD_VFIO);
+
+/**
+ * iommufd_vfio_compat_ioas_create_id - Return the IOAS ID that vfio should use
  * @ictx: Context to operate on
- * @out_ioas_id: The ioas_id the caller should use
  *
  * The compatibility IOAS is the IOAS that the vfio compatibility ioctls operate
  * on since they do not have an IOAS ID input in their ABI. Only attaching a
- * group should cause a default creation of the internal ioas, this returns the
- * existing ioas if it has already been assigned somehow.
+ * group should cause a default creation of the internal ioas, this does nothing
+ * if an existing ioas has already been assigned somehow.
  */
-int iommufd_vfio_compat_ioas_id(struct iommufd_ctx *ictx, u32 *out_ioas_id)
+int iommufd_vfio_compat_ioas_create_id(struct iommufd_ctx *ictx)
 {
 	struct iommufd_ioas *ioas = NULL;
 	struct iommufd_ioas *out_ioas;
@@ -53,7 +72,6 @@ int iommufd_vfio_compat_ioas_id(struct iommufd_ctx *ictx, u32 *out_ioas_id)
 	}
 	xa_unlock(&ictx->objects);
 
-	*out_ioas_id = out_ioas->obj.id;
 	if (out_ioas != ioas) {
 		iommufd_put_object(&out_ioas->obj);
 		iommufd_object_abort(ictx, &ioas->obj);
@@ -68,7 +86,7 @@ int iommufd_vfio_compat_ioas_id(struct iommufd_ctx *ictx, u32 *out_ioas_id)
 	iommufd_object_finalize(ictx, &ioas->obj);
 	return 0;
 }
-EXPORT_SYMBOL_NS_GPL(iommufd_vfio_compat_ioas_id, IOMMUFD_VFIO);
+EXPORT_SYMBOL_NS_GPL(iommufd_vfio_compat_ioas_create_id, IOMMUFD_VFIO);
 
 int iommufd_vfio_ioas(struct iommufd_ucmd *ucmd)
 {
@@ -230,6 +248,9 @@ static int iommufd_vfio_check_extension(struct iommufd_ctx *ictx,
 	case VFIO_UNMAP_ALL:
 		return 1;
 
+	case VFIO_NOIOMMU_IOMMU:
+	return IS_ENABLED(CONFIG_VFIO_NOIOMMU);
+
 	case VFIO_DMA_CC_IOMMU:
 		return iommufd_vfio_cc_iommu(ictx);
 
@@ -259,6 +280,17 @@ static int iommufd_vfio_set_iommu(struct iommufd_ctx *ictx, unsigned long type)
 	struct iommufd_ioas *ioas = NULL;
 	int rc = 0;
 
+	/*
+	 * Emulation for NOIMMU is imperfect in that VFIO blocks almost all
+	 * other ioctls. We let them keep working but they mostly fail since no
+	 * IOAS should exist.
+	 */
+	if (IS_ENABLED(CONFIG_VFIO_NOIOMMU) && type == VFIO_NOIOMMU_IOMMU) {
+		if (!capable(CAP_SYS_RAWIO))
+			return -EPERM;
+		return 0;
+	}
+
 	if (type != VFIO_TYPE1_IOMMU && type != VFIO_TYPE1v2_IOMMU)
 		return -EINVAL;
 
diff --git a/drivers/vfio/iommufd.c b/drivers/vfio/iommufd.c
index 595c7b2146f88c..daa8039da7a8fa 100644
--- a/drivers/vfio/iommufd.c
+++ b/drivers/vfio/iommufd.c
@@ -18,6 +18,21 @@ int vfio_iommufd_bind(struct vfio_device *vdev, struct iommufd_ctx *ictx)
 
 	lockdep_assert_held(&vdev->dev_set->lock);
 
+	if (IS_ENABLED(CONFIG_VFIO_NO_IOMMU) &&
+	    vdev->group->type == VFIO_NO_IOMMU) {
+		if (!capable(CAP_SYS_RAWIO))
+			return -EPERM;
+
+		/*
+		 * Require no compat ioas to be assigned to proceed. The basic
+		 * statement is that the user cannot have done something that
+		 * implies they expected translation to exist
+		 */
+		if (!iommufd_vfio_compat_ioas_get_id(ictx, &ioas_id))
+			return -EPERM;
+		return 0;
+	}
+
 	/*
 	 * If the driver doesn't provide this op then it means the device does
 	 * not do DMA at all. So nothing to do.
@@ -29,7 +44,7 @@ int vfio_iommufd_bind(struct vfio_device *vdev, struct iommufd_ctx *ictx)
 	if (ret)
 		return ret;
 
-	ret = iommufd_vfio_compat_ioas_id(ictx, &ioas_id);
+	ret = iommufd_vfio_compat_ioas_get_id(ictx, &ioas_id);
 	if (ret)
 		goto err_unbind;
 	ret = vdev->ops->attach_ioas(vdev, &ioas_id);
@@ -53,6 +68,10 @@ void vfio_iommufd_unbind(struct vfio_device *vdev)
 {
 	lockdep_assert_held(&vdev->dev_set->lock);
 
+	if (IS_ENABLED(CONFIG_VFIO_NO_IOMMU) &&
+	    vdev->group->type == VFIO_NO_IOMMU)
+		return;
+
 	if (vdev->ops->unbind_iommufd)
 		vdev->ops->unbind_iommufd(vdev);
 }
diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
index f3c48b8c45627d..b59eff30968a1e 100644
--- a/drivers/vfio/vfio_main.c
+++ b/drivers/vfio/vfio_main.c
@@ -747,12 +747,13 @@ static int vfio_group_ioctl_set_container(struct vfio_group *group,
 
 	iommufd = iommufd_ctx_from_file(f.file);
 	if (!IS_ERR(iommufd)) {
-		u32 ioas_id;
-
-		ret = iommufd_vfio_compat_ioas_id(iommufd, &ioas_id);
-		if (ret) {
-			iommufd_ctx_put(group->iommufd);
-			goto out_unlock;
+		if (!IS_ENABLED(CONFIG_VFIO_NO_IOMMU) ||
+		    group->type != VFIO_NO_IOMMU) {
+			ret = iommufd_vfio_compat_ioas_create_id(iommufd);
+			if (ret) {
+				iommufd_ctx_put(group->iommufd);
+				goto out_unlock;
+			}
 		}
 
 		group->iommufd = iommufd;
diff --git a/include/linux/iommufd.h b/include/linux/iommufd.h
index 7a5d64a1dae482..bf2b3ea5f90fd2 100644
--- a/include/linux/iommufd.h
+++ b/include/linux/iommufd.h
@@ -61,7 +61,8 @@ void iommufd_access_unpin_pages(struct iommufd_access *access,
 				unsigned long iova, unsigned long length);
 int iommufd_access_rw(struct iommufd_access *access, unsigned long iova,
 		      void *data, size_t len, unsigned int flags);
-int iommufd_vfio_compat_ioas_id(struct iommufd_ctx *ictx, u32 *out_ioas_id);
+int iommufd_vfio_compat_ioas_get_id(struct iommufd_ctx *ictx, u32 *out_ioas_id);
+int iommufd_vfio_compat_ioas_create_id(struct iommufd_ctx *ictx);
 #else /* !CONFIG_IOMMUFD */
 static inline struct iommufd_ctx *iommufd_ctx_from_file(struct file *file)
 {
@@ -93,8 +94,7 @@ static inline int iommufd_access_rw(struct iommufd_access *access, unsigned long
 	return -EOPNOTSUPP;
 }
 
-static inline int iommufd_vfio_compat_ioas_id(struct iommufd_ctx *ictx,
-					      u32 *out_ioas_id)
+static inline int iommufd_vfio_compat_ioas_create_id(struct iommufd_ctx *ictx)
 {
 	return -EOPNOTSUPP;
 }


^ permalink raw reply related	[flat|nested] 56+ messages in thread

* RE: [PATCH v2 07/11] vfio-iommufd: Support iommufd for physical VFIO devices
  2022-11-10 17:20     ` Jason Gunthorpe
@ 2022-11-10 23:58       ` Tian, Kevin
  0 siblings, 0 replies; 56+ messages in thread
From: Tian, Kevin @ 2022-11-10 23:58 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: kvm, dri-devel, Vineeth Vijayan, Diana Craciun,
	Alexander Gordeev, Longfang Liu, linux-s390, Liu, Yi L,
	Matthew Rosato, Will Deacon, Joerg Roedel, Halil Pasic, iommu,
	Nicolin Chen, Christian Borntraeger, intel-gfx, Wang, Zhi A,
	Jason Herne, Eric Farman, Vasily Gorbik, Heiko Carstens,
	Eric Auger, Alex Williamson, Harald Freudenberger, Vivi, Rodrigo,
	intel-gvt-dev, Tony Krowiak, Tvrtko Ursulin, Yishai Hadas,
	Cornelia Huck, Peter Oberparleiter, Shameer Kolothum,
	Sven Schnelle, Robin Murphy, Lu Baolu

> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Friday, November 11, 2022 1:21 AM
> 
> On Thu, Nov 10, 2022 at 03:11:16AM +0000, Tian, Kevin wrote:
> > > From: Jason Gunthorpe <jgg@nvidia.com>
> > > Sent: Tuesday, November 8, 2022 8:53 AM
> > >
> > > +
> > > +int vfio_iommufd_bind(struct vfio_device *vdev, struct iommufd_ctx
> *ictx)
> > > +{
> > > +	u32 ioas_id;
> > > +	u32 device_id;
> > > +	int ret;
> > > +
> > > +	lockdep_assert_held(&vdev->dev_set->lock);
> > > +
> > > +	/*
> > > +	 * If the driver doesn't provide this op then it means the device does
> > > +	 * not do DMA at all. So nothing to do.
> > > +	 */
> > > +	if (!vdev->ops->bind_iommufd)
> > > +		return 0;
> > > +
> > > +	ret = vdev->ops->bind_iommufd(vdev, ictx, &device_id);
> > > +	if (ret)
> > > +		return ret;
> > > +
> > > +	ret = iommufd_vfio_compat_ioas_id(ictx, &ioas_id);
> > > +	if (ret)
> > > +		goto err_unbind;
> > > +	ret = vdev->ops->attach_ioas(vdev, &ioas_id);
> > > +	if (ret)
> > > +		goto err_unbind;
> >
> > with our discussion in v1:
> >
> > https://lore.kernel.org/all/Y2mgJqz8fvm54C+f@nvidia.com/
> >
> > I got the rationale on iommufd part which doesn't have the concept
> > of container hence not necessarily to impose restriction on when
> > an user can change a compat ioas.
> >
> > But from vfio side I wonder whether we should cache the compat
> > ioas id when it's attached by the first device and then use it all the
> > way for other device attachments coming after. implying IOAS_SET
> > only affects containers which haven't been attached.
> 
> I can't see a reason to do this. IOAS_SET is a new ioctl and it has
> new semantics beyond what original vfio container could do. In this
> case having an impact on the next vfio_device that is opened.
> 
> This seems generally useful enough I wouldn't want to block it.
> 
> In any case, we can't *really* change this because the vfio layer is
> working on IDs and the IDs can be destroyed/recreated from under
> it. So if we try to hold the ID we could still end up getting it
> changed anyhow.
> 

OK, this is a valid point.

So a legacy vfio application doesn't use IOAS_SET so the backward
compatibility is guaranteed.

a iommufd native application will use cdev where IOAS_SET and
compat ioas are irrelevant.

here just we allow an interesting usage where an user is allowed
to do more funny things with IOAS_SET on vfio-compat. Not sure
how useful it is but not something we want to prohibit.


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v2 00/11] Connect VFIO to IOMMUFD
  2022-11-08  0:52 [PATCH v2 00/11] Connect VFIO to IOMMUFD Jason Gunthorpe
                   ` (12 preceding siblings ...)
  2022-11-09  9:03 ` Tian, Kevin
@ 2022-11-11  3:01 ` Matthew Rosato
  2022-11-14 14:23   ` Jason Gunthorpe
  13 siblings, 1 reply; 56+ messages in thread
From: Matthew Rosato @ 2022-11-11  3:01 UTC (permalink / raw)
  To: Jason Gunthorpe, Alexander Gordeev, David Airlie, Tony Krowiak,
	Alex Williamson, Christian Borntraeger, Cornelia Huck,
	Daniel Vetter, Diana Craciun, dri-devel, Eric Auger, Eric Farman,
	Harald Freudenberger, Vasily Gorbik, Heiko Carstens, intel-gfx,
	intel-gvt-dev, iommu, Jani Nikula, Jason Herne, Joonas Lahtinen,
	Joerg Roedel, Kevin Tian, kvm, linux-s390, Longfang Liu,
	Peter Oberparleiter, Halil Pasic, Robin Murphy, Rodrigo Vivi,
	Shameer Kolothum, Sven Schnelle, Tvrtko Ursulin, Vineeth Vijayan,
	Will Deacon, Yishai Hadas, Zhenyu Wang, Zhi Wang
  Cc: Liu, Yi L, Nicolin Chen, Lu Baolu

On 11/7/22 7:52 PM, Jason Gunthorpe wrote:
> This series provides an alternative container layer for VFIO implemented
> using iommufd. This is optional, if CONFIG_IOMMUFD is not set then it will
> not be compiled in.
> 
> At this point iommufd can be injected by passing in a iommfd FD to
> VFIO_GROUP_SET_CONTAINER which will use the VFIO compat layer in iommufd
> to obtain the compat IOAS and then connect up all the VFIO drivers as
> appropriate.
> 
> This is temporary stopping point, a following series will provide a way to
> directly open a VFIO device FD and directly connect it to IOMMUFD using
> native ioctls that can expose the IOMMUFD features like hwpt, future
> vPASID and dynamic attachment.
> 
> This series, in compat mode, has passed all the qemu tests we have
> available, including the test suites for the Intel GVT mdev. Aside from
> the temporary limitation with P2P memory this is belived to be fully
> compatible with VFIO.

AFAICT there is no equivalent means to specify vfio_iommu_type1.dma_entry_limit when using iommufd; looks like we'll just always get the default 65535.  Was this because you envision the limit being not applicable for iommufd (limits will be enforced via either means and eventually we won't want to ) or was it an oversight?

Thanks,
Matt


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v2 07/11] vfio-iommufd: Support iommufd for physical VFIO devices
  2022-11-08  0:52 ` [PATCH v2 07/11] vfio-iommufd: Support iommufd for physical VFIO devices Jason Gunthorpe
  2022-11-08  6:10   ` Nicolin Chen
  2022-11-10  3:11   ` Tian, Kevin
@ 2022-11-11  4:12   ` Yi Liu
  2022-11-14 14:47     ` Jason Gunthorpe
  2 siblings, 1 reply; 56+ messages in thread
From: Yi Liu @ 2022-11-11  4:12 UTC (permalink / raw)
  To: Jason Gunthorpe, Alexander Gordeev, David Airlie, Tony Krowiak,
	Alex Williamson, Christian Borntraeger, Cornelia Huck,
	Daniel Vetter, Diana Craciun, dri-devel, Eric Auger, Eric Farman,
	Harald Freudenberger, Vasily Gorbik, Heiko Carstens, intel-gfx,
	intel-gvt-dev, iommu, Jani Nikula, Jason Herne, Joonas Lahtinen,
	Joerg Roedel, Kevin Tian, kvm, linux-s390, Longfang Liu,
	Matthew Rosato, Peter Oberparleiter, Halil Pasic, Robin Murphy,
	Rodrigo Vivi, Shameer Kolothum, Sven Schnelle, Tvrtko Ursulin,
	Vineeth Vijayan, Will Deacon, Yishai Hadas, Zhenyu Wang,
	Zhi Wang
  Cc: Nicolin Chen, Lu Baolu

On 2022/11/8 08:52, Jason Gunthorpe wrote:
> This creates the iommufd_device for the physical VFIO drivers. These are
> all the drivers that are calling vfio_register_group_dev() and expect the
> type1 code to setup a real iommu_domain against their parent struct
> device.
> 
> The design gives the driver a choice in how it gets connected to iommufd
> by providing bind_iommufd/unbind_iommufd/attach_ioas callbacks to
> implement as required. The core code provides three default callbacks for
> physical mode using a real iommu_domain. This is suitable for drivers
> using vfio_register_group_dev()
> 
> Tested-by: Nicolin Chen <nicolinc@nvidia.com>
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> ---
>   drivers/vfio/Makefile                         |  1 +
>   drivers/vfio/fsl-mc/vfio_fsl_mc.c             |  3 +
>   drivers/vfio/iommufd.c                        | 99 +++++++++++++++++++
>   .../vfio/pci/hisilicon/hisi_acc_vfio_pci.c    |  6 ++
>   drivers/vfio/pci/mlx5/main.c                  |  3 +
>   drivers/vfio/pci/vfio_pci.c                   |  3 +
>   drivers/vfio/platform/vfio_amba.c             |  3 +
>   drivers/vfio/platform/vfio_platform.c         |  3 +
>   drivers/vfio/vfio.h                           | 15 +++
>   drivers/vfio/vfio_main.c                      | 13 ++-
>   include/linux/vfio.h                          | 25 +++++
>   11 files changed, 172 insertions(+), 2 deletions(-)
>   create mode 100644 drivers/vfio/iommufd.c
> 
> diff --git a/drivers/vfio/Makefile b/drivers/vfio/Makefile
> index b693a1169286f8..3863922529ef20 100644
> --- a/drivers/vfio/Makefile
> +++ b/drivers/vfio/Makefile
> @@ -6,6 +6,7 @@ obj-$(CONFIG_VFIO) += vfio.o
>   vfio-y += vfio_main.o \
>   	  iova_bitmap.o \
>   	  container.o
> +vfio-$(CONFIG_IOMMUFD) += iommufd.o
>   
>   obj-$(CONFIG_VFIO_VIRQFD) += vfio_virqfd.o
>   obj-$(CONFIG_VFIO_IOMMU_TYPE1) += vfio_iommu_type1.o
> diff --git a/drivers/vfio/fsl-mc/vfio_fsl_mc.c b/drivers/vfio/fsl-mc/vfio_fsl_mc.c
> index b16874e913e4f5..5cd4bb47644039 100644
> --- a/drivers/vfio/fsl-mc/vfio_fsl_mc.c
> +++ b/drivers/vfio/fsl-mc/vfio_fsl_mc.c
> @@ -592,6 +592,9 @@ static const struct vfio_device_ops vfio_fsl_mc_ops = {
>   	.read		= vfio_fsl_mc_read,
>   	.write		= vfio_fsl_mc_write,
>   	.mmap		= vfio_fsl_mc_mmap,
> +	.bind_iommufd	= vfio_iommufd_physical_bind,
> +	.unbind_iommufd	= vfio_iommufd_physical_unbind,
> +	.attach_ioas	= vfio_iommufd_physical_attach_ioas,
>   };
>   
>   static struct fsl_mc_driver vfio_fsl_mc_driver = {
> diff --git a/drivers/vfio/iommufd.c b/drivers/vfio/iommufd.c
> new file mode 100644
> index 00000000000000..bf755d0f375c5d
> --- /dev/null
> +++ b/drivers/vfio/iommufd.c
> @@ -0,0 +1,99 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * Copyright (c) 2021-2022, NVIDIA CORPORATION & AFFILIATES
> + */
> +#include <linux/vfio.h>
> +#include <linux/iommufd.h>
> +
> +#include "vfio.h"
> +
> +MODULE_IMPORT_NS(IOMMUFD);
> +MODULE_IMPORT_NS(IOMMUFD_VFIO);
> +
> +int vfio_iommufd_bind(struct vfio_device *vdev, struct iommufd_ctx *ictx)
> +{
> +	u32 ioas_id;
> +	u32 device_id;
> +	int ret;
> +
> +	lockdep_assert_held(&vdev->dev_set->lock);
> +
> +	/*
> +	 * If the driver doesn't provide this op then it means the device does
> +	 * not do DMA at all. So nothing to do.
> +	 */
> +	if (!vdev->ops->bind_iommufd)
> +		return 0;
> +
> +	ret = vdev->ops->bind_iommufd(vdev, ictx, &device_id);
> +	if (ret)
> +		return ret;
> +
> +	ret = iommufd_vfio_compat_ioas_id(ictx, &ioas_id);
> +	if (ret)
> +		goto err_unbind;
> +	ret = vdev->ops->attach_ioas(vdev, &ioas_id);
> +	if (ret)
> +		goto err_unbind;
> +	vdev->iommufd_attached = true;

it's better to set this bool in vfio_iommufd_physical_attach_ioas() as
the emulated devices uses iommufd_access instead. is it? or you mean this
flag to cover both cases?
  --
Regards,
Yi Liu

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v2 09/11] vfio: Move container related MODULE_ALIAS statements into container.c
  2022-11-08  0:52 ` [PATCH v2 09/11] vfio: Move container related MODULE_ALIAS statements into container.c Jason Gunthorpe
  2022-11-10  5:34   ` Tian, Kevin
@ 2022-11-11  4:13   ` Yi Liu
  1 sibling, 0 replies; 56+ messages in thread
From: Yi Liu @ 2022-11-11  4:13 UTC (permalink / raw)
  To: Jason Gunthorpe, Alexander Gordeev, David Airlie, Tony Krowiak,
	Alex Williamson, Christian Borntraeger, Cornelia Huck,
	Daniel Vetter, Diana Craciun, dri-devel, Eric Auger, Eric Farman,
	Harald Freudenberger, Vasily Gorbik, Heiko Carstens, intel-gfx,
	intel-gvt-dev, iommu, Jani Nikula, Jason Herne, Joonas Lahtinen,
	Joerg Roedel, Kevin Tian, kvm, linux-s390, Longfang Liu,
	Matthew Rosato, Peter Oberparleiter, Halil Pasic, Robin Murphy,
	Rodrigo Vivi, Shameer Kolothum, Sven Schnelle, Tvrtko Ursulin,
	Vineeth Vijayan, Will Deacon, Yishai Hadas, Zhenyu Wang,
	Zhi Wang
  Cc: Nicolin Chen, Lu Baolu

On 2022/11/8 08:52, Jason Gunthorpe wrote:
> The miscdev is in container.c, so should these related MODULE_ALIAS
> statements. This is necessary for the next patch to be able to fully
> disable /dev/vfio/vfio.
> 
> Fixes: cdc71fe4ecbf ("vfio: Move container code into drivers/vfio/container.c")
> Reported-by: "Liu, Yi L" <yi.l.liu@intel.com>

s/Liu Yi L/Yi Liu :)

Reviewed-by: Yi Liu <yi.l.liu@intel.com>

> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> ---
>   drivers/vfio/container.c | 3 +++
>   drivers/vfio/vfio_main.c | 2 --
>   2 files changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/vfio/container.c b/drivers/vfio/container.c
> index 7f3961fd4b5aac..6b362d97d68220 100644
> --- a/drivers/vfio/container.c
> +++ b/drivers/vfio/container.c
> @@ -608,3 +608,6 @@ void vfio_container_cleanup(void)
>   	misc_deregister(&vfio_dev);
>   	mutex_destroy(&vfio.iommu_drivers_lock);
>   }
> +
> +MODULE_ALIAS_MISCDEV(VFIO_MINOR);
> +MODULE_ALIAS("devname:vfio/vfio");
> diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
> index cf49c5200a4c05..ee09ccf4a608e1 100644
> --- a/drivers/vfio/vfio_main.c
> +++ b/drivers/vfio/vfio_main.c
> @@ -2064,6 +2064,4 @@ MODULE_VERSION(DRIVER_VERSION);
>   MODULE_LICENSE("GPL v2");
>   MODULE_AUTHOR(DRIVER_AUTHOR);
>   MODULE_DESCRIPTION(DRIVER_DESC);
> -MODULE_ALIAS_MISCDEV(VFIO_MINOR);
> -MODULE_ALIAS("devname:vfio/vfio");
>   MODULE_SOFTDEP("post: vfio_iommu_type1 vfio_iommu_spapr_tce");

-- 
Regards,
Yi Liu

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v2 11/11] iommufd: Allow iommufd to supply /dev/vfio/vfio
  2022-11-08  0:52 ` [PATCH v2 11/11] iommufd: Allow iommufd to supply /dev/vfio/vfio Jason Gunthorpe
  2022-11-10  7:01   ` Tian, Kevin
@ 2022-11-11  4:16   ` Yi Liu
  2022-11-11  6:38     ` Yi Liu
  2022-11-14 14:50     ` Jason Gunthorpe
  1 sibling, 2 replies; 56+ messages in thread
From: Yi Liu @ 2022-11-11  4:16 UTC (permalink / raw)
  To: Jason Gunthorpe, Alexander Gordeev, David Airlie, Tony Krowiak,
	Alex Williamson, Christian Borntraeger, Cornelia Huck,
	Daniel Vetter, Diana Craciun, dri-devel, Eric Auger, Eric Farman,
	Harald Freudenberger, Vasily Gorbik, Heiko Carstens, intel-gfx,
	intel-gvt-dev, iommu, Jani Nikula, Jason Herne, Joonas Lahtinen,
	Joerg Roedel, Kevin Tian, kvm, linux-s390, Longfang Liu,
	Matthew Rosato, Peter Oberparleiter, Halil Pasic, Robin Murphy,
	Rodrigo Vivi, Shameer Kolothum, Sven Schnelle, Tvrtko Ursulin,
	Vineeth Vijayan, Will Deacon, Yishai Hadas, Zhenyu Wang,
	Zhi Wang
  Cc: Nicolin Chen, Lu Baolu



On 2022/11/8 08:52, Jason Gunthorpe wrote:
> If the VFIO container is compiled out, give a kconfig option for iommufd
> to provide the miscdev node with the same name and permissions as vfio
> uses.
> 
> The compatibility node supports the same ioctls as VFIO and automatically
> enables the VFIO compatible pinned page accounting mode.
> 
> Tested-by: Nicolin Chen <nicolinc@nvidia.com>
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> ---
>   drivers/iommu/iommufd/Kconfig | 12 ++++++++++++
>   drivers/iommu/iommufd/main.c  | 36 +++++++++++++++++++++++++++++++++++
>   2 files changed, 48 insertions(+)
> 
> diff --git a/drivers/iommu/iommufd/Kconfig b/drivers/iommu/iommufd/Kconfig
> index 399a2edeaef6de..f387f803dc6f7f 100644
> --- a/drivers/iommu/iommufd/Kconfig
> +++ b/drivers/iommu/iommufd/Kconfig
> @@ -12,6 +12,18 @@ config IOMMUFD
>   	  If you don't know what to do here, say N.
>   
>   if IOMMUFD
> +config IOMMUFD_VFIO_CONTAINER
> +	bool "IOMMUFD provides the VFIO container /dev/vfio/vfio"
> +	depends on VFIO && !VFIO_CONTAINER
> +	default VFIO && !VFIO_CONTAINER
> +	help
> +	  IOMMUFD will provide /dev/vfio/vfio instead of VFIO. This relies on
> +	  IOMMUFD providing compatibility emulation to give the same ioctls.
> +	  It provides an option to build a kernel with legacy VFIO components
> +	  removed.
> +
> +	  Unless testing IOMMUFD say N here.
> +
>   config IOMMUFD_TEST
>   	bool "IOMMU Userspace API Test support"
>   	depends on RUNTIME_TESTING_MENU
> diff --git a/drivers/iommu/iommufd/main.c b/drivers/iommu/iommufd/main.c
> index ab3fa05f38505d..1eeb326f74f005 100644
> --- a/drivers/iommu/iommufd/main.c
> +++ b/drivers/iommu/iommufd/main.c
> @@ -18,6 +18,7 @@
>   #include <uapi/linux/iommufd.h>
>   #include <linux/iommufd.h>
>   
> +#include "io_pagetable.h"
>   #include "iommufd_private.h"
>   #include "iommufd_test.h"
>   
> @@ -25,6 +26,7 @@ struct iommufd_object_ops {
>   	void (*destroy)(struct iommufd_object *obj);
>   };
>   static const struct iommufd_object_ops iommufd_object_ops[];
> +static struct miscdevice vfio_misc_dev;
>   
>   struct iommufd_object *_iommufd_object_alloc(struct iommufd_ctx *ictx,
>   					     size_t size,
> @@ -170,6 +172,16 @@ static int iommufd_fops_open(struct inode *inode, struct file *filp)
>   	if (!ictx)
>   		return -ENOMEM;
>   
> +	/*
> +	 * For compatibility with VFIO when /dev/vfio/vfio is opened we default
> +	 * to the same rlimit accounting as vfio uses.
> +	 */
> +	if (IS_ENABLED(CONFIG_IOMMUFD_VFIO_CONTAINER) &&
> +	    filp->private_data == &vfio_misc_dev) {
> +		ictx->account_mode = IOPT_PAGES_ACCOUNT_MM;
> +		pr_info_once("IOMMUFD is providing /dev/vfio/vfio, not VFIO.\n");
> +	}
> +
>   	xa_init_flags(&ictx->objects, XA_FLAGS_ALLOC1 | XA_FLAGS_ACCOUNT);
>   	ictx->file = filp;
>   	filp->private_data = ictx;
> @@ -395,6 +407,15 @@ static struct miscdevice iommu_misc_dev = {
>   	.mode = 0660,
>   };
>   
> +
> +static struct miscdevice vfio_misc_dev = {
> +	.minor = VFIO_MINOR,
> +	.name = "vfio",
> +	.fops = &iommufd_fops,
> +	.nodename = "vfio/vfio",
> +	.mode = 0666,
> +};
> +
>   static int __init iommufd_init(void)
>   {
>   	int ret;
> @@ -402,18 +423,33 @@ static int __init iommufd_init(void)
>   	ret = misc_register(&iommu_misc_dev);
>   	if (ret)
>   		return ret;
> +
> +	if (IS_ENABLED(CONFIG_IOMMUFD_VFIO_CONTAINER)) {
> +		ret = misc_register(&vfio_misc_dev);
> +		if (ret)
> +			goto err_misc;
> +	}
>   	iommufd_test_init();
>   	return 0;
> +err_misc:
> +	misc_deregister(&iommu_misc_dev);
> +	return ret;
>   }
>   
>   static void __exit iommufd_exit(void)
>   {
>   	iommufd_test_exit();
> +	if (IS_ENABLED(CONFIG_IOMMUFD_VFIO_CONTAINER))
> +		misc_deregister(&vfio_misc_dev);
>   	misc_deregister(&iommu_misc_dev);
>   }
>   
>   module_init(iommufd_init);
>   module_exit(iommufd_exit);
>   
> +#if IS_ENABLED(CONFIG_IOMMUFD_VFIO_CONTAINER)
> +MODULE_ALIAS_MISCDEV(VFIO_MINOR);
> +MODULE_ALIAS("devname:vfio/vfio");

will this line also result in systemd to create this devnodes at boot
based on the module info even if the IOMMUFD_VFIO_CONTAINER is not
configured?

> +#endif
>   MODULE_DESCRIPTION("I/O Address Space Management for passthrough devices");
>   MODULE_LICENSE("GPL");

-- 
Regards,
Yi Liu

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v2 11/11] iommufd: Allow iommufd to supply /dev/vfio/vfio
  2022-11-11  4:16   ` Yi Liu
@ 2022-11-11  6:38     ` Yi Liu
  2022-11-14 14:50     ` Jason Gunthorpe
  1 sibling, 0 replies; 56+ messages in thread
From: Yi Liu @ 2022-11-11  6:38 UTC (permalink / raw)
  To: Jason Gunthorpe, Alexander Gordeev, David Airlie, Tony Krowiak,
	Alex Williamson, Christian Borntraeger, Cornelia Huck,
	Daniel Vetter, Diana Craciun, dri-devel, Eric Auger, Eric Farman,
	Harald Freudenberger, Vasily Gorbik, Heiko Carstens, intel-gfx,
	intel-gvt-dev, iommu, Jani Nikula, Jason Herne, Joonas Lahtinen,
	Joerg Roedel, Kevin Tian, kvm, linux-s390, Longfang Liu,
	Matthew Rosato, Peter Oberparleiter, Halil Pasic, Robin Murphy,
	Rodrigo Vivi, Shameer Kolothum, Sven Schnelle, Tvrtko Ursulin,
	Vineeth Vijayan, Will Deacon, Yishai Hadas, Zhenyu Wang,
	Zhi Wang
  Cc: Nicolin Chen, Lu Baolu

On 2022/11/11 12:16, Yi Liu wrote:
> 
> 
> On 2022/11/8 08:52, Jason Gunthorpe wrote:
>> If the VFIO container is compiled out, give a kconfig option for iommufd
>> to provide the miscdev node with the same name and permissions as vfio
>> uses.
>>
>> The compatibility node supports the same ioctls as VFIO and automatically
>> enables the VFIO compatible pinned page accounting mode.
>>
>> Tested-by: Nicolin Chen <nicolinc@nvidia.com>
>> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
>> ---
>>   drivers/iommu/iommufd/Kconfig | 12 ++++++++++++
>>   drivers/iommu/iommufd/main.c  | 36 +++++++++++++++++++++++++++++++++++
>>   2 files changed, 48 insertions(+)
>>
>> diff --git a/drivers/iommu/iommufd/Kconfig b/drivers/iommu/iommufd/Kconfig
>> index 399a2edeaef6de..f387f803dc6f7f 100644
>> --- a/drivers/iommu/iommufd/Kconfig
>> +++ b/drivers/iommu/iommufd/Kconfig
>> @@ -12,6 +12,18 @@ config IOMMUFD
>>         If you don't know what to do here, say N.
>>   if IOMMUFD
>> +config IOMMUFD_VFIO_CONTAINER
>> +    bool "IOMMUFD provides the VFIO container /dev/vfio/vfio"
>> +    depends on VFIO && !VFIO_CONTAINER
>> +    default VFIO && !VFIO_CONTAINER
>> +    help
>> +      IOMMUFD will provide /dev/vfio/vfio instead of VFIO. This relies on
>> +      IOMMUFD providing compatibility emulation to give the same ioctls.
>> +      It provides an option to build a kernel with legacy VFIO components
>> +      removed.
>> +
>> +      Unless testing IOMMUFD say N here.
>> +
>>   config IOMMUFD_TEST
>>       bool "IOMMU Userspace API Test support"
>>       depends on RUNTIME_TESTING_MENU
>> diff --git a/drivers/iommu/iommufd/main.c b/drivers/iommu/iommufd/main.c
>> index ab3fa05f38505d..1eeb326f74f005 100644
>> --- a/drivers/iommu/iommufd/main.c
>> +++ b/drivers/iommu/iommufd/main.c
>> @@ -18,6 +18,7 @@
>>   #include <uapi/linux/iommufd.h>
>>   #include <linux/iommufd.h>
>> +#include "io_pagetable.h"
>>   #include "iommufd_private.h"
>>   #include "iommufd_test.h"
>> @@ -25,6 +26,7 @@ struct iommufd_object_ops {
>>       void (*destroy)(struct iommufd_object *obj);
>>   };
>>   static const struct iommufd_object_ops iommufd_object_ops[];
>> +static struct miscdevice vfio_misc_dev;
>>   struct iommufd_object *_iommufd_object_alloc(struct iommufd_ctx *ictx,
>>                            size_t size,
>> @@ -170,6 +172,16 @@ static int iommufd_fops_open(struct inode *inode, 
>> struct file *filp)
>>       if (!ictx)
>>           return -ENOMEM;
>> +    /*
>> +     * For compatibility with VFIO when /dev/vfio/vfio is opened we default
>> +     * to the same rlimit accounting as vfio uses.
>> +     */
>> +    if (IS_ENABLED(CONFIG_IOMMUFD_VFIO_CONTAINER) &&
>> +        filp->private_data == &vfio_misc_dev) {
>> +        ictx->account_mode = IOPT_PAGES_ACCOUNT_MM;
>> +        pr_info_once("IOMMUFD is providing /dev/vfio/vfio, not VFIO.\n");
>> +    }
>> +
>>       xa_init_flags(&ictx->objects, XA_FLAGS_ALLOC1 | XA_FLAGS_ACCOUNT);
>>       ictx->file = filp;
>>       filp->private_data = ictx;
>> @@ -395,6 +407,15 @@ static struct miscdevice iommu_misc_dev = {
>>       .mode = 0660,
>>   };
>> +
>> +static struct miscdevice vfio_misc_dev = {
>> +    .minor = VFIO_MINOR,
>> +    .name = "vfio",
>> +    .fops = &iommufd_fops,
>> +    .nodename = "vfio/vfio",
>> +    .mode = 0666,
>> +};
>> +
>>   static int __init iommufd_init(void)
>>   {
>>       int ret;
>> @@ -402,18 +423,33 @@ static int __init iommufd_init(void)
>>       ret = misc_register(&iommu_misc_dev);
>>       if (ret)
>>           return ret;
>> +
>> +    if (IS_ENABLED(CONFIG_IOMMUFD_VFIO_CONTAINER)) {
>> +        ret = misc_register(&vfio_misc_dev);
>> +        if (ret)
>> +            goto err_misc;
>> +    }
>>       iommufd_test_init();
>>       return 0;
>> +err_misc:
>> +    misc_deregister(&iommu_misc_dev);
>> +    return ret;
>>   }
>>   static void __exit iommufd_exit(void)
>>   {
>>       iommufd_test_exit();
>> +    if (IS_ENABLED(CONFIG_IOMMUFD_VFIO_CONTAINER))
>> +        misc_deregister(&vfio_misc_dev);
>>       misc_deregister(&iommu_misc_dev);
>>   }
>>   module_init(iommufd_init);
>>   module_exit(iommufd_exit);
>> +#if IS_ENABLED(CONFIG_IOMMUFD_VFIO_CONTAINER)
>> +MODULE_ALIAS_MISCDEV(VFIO_MINOR);
>> +MODULE_ALIAS("devname:vfio/vfio");
> 
> will this line also result in systemd to create this devnodes at boot
> based on the module info even if the IOMMUFD_VFIO_CONTAINER is not
> configured?

stale comment. it's already under if IS_ENABLED(CONFIG_IOMMUFD_VFIO_CONTAINER).

looks good to me.

Reviewed-by: Yi Liu <yi.l.liu@intel.com>

>> +#endif
>>   MODULE_DESCRIPTION("I/O Address Space Management for passthrough 
>> devices");
>>   MODULE_LICENSE("GPL");
> 

-- 
Regards,
Yi Liu

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v2 00/11] Connect VFIO to IOMMUFD
  2022-11-09 16:57     ` Jason Gunthorpe
@ 2022-11-14 12:51       ` Yi Liu
  2022-11-14 14:37         ` Yang, Lixiao
  2022-11-14 14:38         ` Jason Gunthorpe
  2022-11-15  1:16       ` Matthew Rosato
  1 sibling, 2 replies; 56+ messages in thread
From: Yi Liu @ 2022-11-14 12:51 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: kvm, Hao, Xudong, Kevin Tian, dri-devel, Vineeth Vijayan,
	Diana Craciun, Alexander Gordeev, Longfang Liu, linux-s390,
	Matthew Rosato, Will Deacon, Joerg Roedel, Halil Pasic, iommu,
	Nicolin Chen, Christian Borntraeger, intel-gfx, Zhi Wang,
	Jason Herne, Eric Farman, Vasily Gorbik, Heiko Carstens, Yang,
	Lixiao, Eric Auger, Alex Williamson, He, Yu,
	Harald Freudenberger, Rodrigo Vivi, intel-gvt-dev, Tony Krowiak,
	Tvrtko Ursulin, Yishai Hadas, Cornelia Huck, Peter Oberparleiter,
	Shameer Kolothum, Sven Schnelle, Robin Murphy, Lu Baolu

On 2022/11/10 00:57, Jason Gunthorpe wrote:
> On Tue, Nov 08, 2022 at 11:18:03PM +0800, Yi Liu wrote:
>> On 2022/11/8 17:19, Nicolin Chen wrote:
>>> On Mon, Nov 07, 2022 at 08:52:44PM -0400, Jason Gunthorpe wrote:
>>>
>>>> This is on github: https://github.com/jgunthorpe/linux/commits/vfio_iommufd
>>> [...]
>>>> v2:
>>>>    - Rebase to v6.1-rc3, v4 iommufd series
>>>>    - Fixup comments and commit messages from list remarks
>>>>    - Fix leaking of the iommufd for mdevs
>>>>    - New patch to fix vfio modaliases when vfio container is disabled
>>>>    - Add a dmesg once when the iommufd provided /dev/vfio/vfio is opened
>>>>      to signal that iommufd is providing this
>>>
>>> I've redone my previous sanity tests. Except those reported bugs,
>>> things look fine. Once we fix those issues, GVT and other modules
>>> can run some more stressful tests, I think.
>>
>> our side is also starting test (gvt, nic passthrough) this version. need to
>> wait a while for the result.
> 
> I've updated the branches with the two functional fixes discussed on
> the list plus all the doc updates.
> 

I see, due to timzone, the kernel we grabbed is 37c9e6e44d77a, it has
slight diff in the scripts/kernel-doc compared with the latest commit
(6bb16a9c67769). I don't think it impacts the test.

https://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd.git/log/?h=for-next 
  (37c9e6e44d77a)

Our side, Yu He, Lixiao Yang has done below tests on Intel platform with
the above kernel, results are:

1) GVT-g test suit passed, Intel iGFx passthrough passed.

2) NIC passthrough test with different guest memory (1G/4G), passed.

3) Booting two different QEMUs in the same time but one QEMU opens
legacy /dev/vfio/vfio and another opens /dev/iommu. Tests passed.

4) Tried below Kconfig combinations, results are expected.

VFIO_CONTAINER=y, IOMMUFD=y   -- test pass
VFIO_CONTAINER=y, IOMMUFD=n   -- test pass
VFIO_CONTAINER=n, IOMMUFD=y , IOMMUFD_VFIO_CONTAINER=y  -- test pass
VFIO_CONTAINER=n, IOMMUFD=y , IOMMUFD_VFIO_CONTAINER=n  -- no 
/dev/vfio/vfio, so test fail, expected

5) Tested devices from multi-device group. Assign such devices to the same
VM, pass; assign them to different VMs, fail; assign them to a VM with 
Intel virtual VT-d, fail; Results are expected.

Meanwhile, I also tested the branch in development branch for nesting,
the basic functionality looks good.

Tested-by: Yi Liu <yi.l.liu@intel.com>

-- 
Regards,
Yi Liu

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v2 00/11] Connect VFIO to IOMMUFD
  2022-11-11  3:01 ` Matthew Rosato
@ 2022-11-14 14:23   ` Jason Gunthorpe
  2022-11-14 14:55     ` Matthew Rosato
  0 siblings, 1 reply; 56+ messages in thread
From: Jason Gunthorpe @ 2022-11-14 14:23 UTC (permalink / raw)
  To: Matthew Rosato
  Cc: kvm, Kevin Tian, dri-devel, Vineeth Vijayan, Diana Craciun,
	Alexander Gordeev, Longfang Liu, linux-s390, Liu, Yi L,
	Will Deacon, Joerg Roedel, Halil Pasic, iommu, Nicolin Chen,
	Christian Borntraeger, intel-gfx, Zhi Wang, Jason Herne,
	Eric Farman, Vasily Gorbik, Heiko Carstens, Eric Auger,
	Alex Williamson, Harald Freudenberger, Rodrigo Vivi,
	intel-gvt-dev, Tony Krowiak, Tvrtko Ursulin, Yishai Hadas,
	Cornelia Huck, Peter Oberparleiter, Shameer Kolothum,
	Sven Schnelle, Robin Murphy, Lu Baolu

On Thu, Nov 10, 2022 at 10:01:13PM -0500, Matthew Rosato wrote:
> On 11/7/22 7:52 PM, Jason Gunthorpe wrote:
> > This series provides an alternative container layer for VFIO implemented
> > using iommufd. This is optional, if CONFIG_IOMMUFD is not set then it will
> > not be compiled in.
> > 
> > At this point iommufd can be injected by passing in a iommfd FD to
> > VFIO_GROUP_SET_CONTAINER which will use the VFIO compat layer in iommufd
> > to obtain the compat IOAS and then connect up all the VFIO drivers as
> > appropriate.
> > 
> > This is temporary stopping point, a following series will provide a way to
> > directly open a VFIO device FD and directly connect it to IOMMUFD using
> > native ioctls that can expose the IOMMUFD features like hwpt, future
> > vPASID and dynamic attachment.
> > 
> > This series, in compat mode, has passed all the qemu tests we have
> > available, including the test suites for the Intel GVT mdev. Aside from
> > the temporary limitation with P2P memory this is belived to be fully
> > compatible with VFIO.
> 
> AFAICT there is no equivalent means to specify
> vfio_iommu_type1.dma_entry_limit when using iommufd; looks like
> we'll just always get the default 65535.

No, there is no arbitary limit on iommufd

> Was this because you envision the limit being not applicable for
> iommufd (limits will be enforced via either means and eventually we
> won't want to ) or was it an oversight?

The limit here is primarily about limiting userspace abuse of the
interface.

iommufd is using GFP_KERNEL_ACCOUNT which shifts the responsiblity to
cgroups, which is similar to how KVM works.

So, for a VM sandbox you'd set a cgroup limit and if a hostile
userspace in the sanbox decides to try to OOM the system it will hit
that limit, regardless of which kernel APIs it tries to abuse.

This work is not entirely complete as we also need the iommu driver to
use GFP_KERNEL_ACCOUNT for allocations connected to the iommu_domain,
particularly for allocations of the IO page tables themselves - which
can be quite big.

Jason

^ permalink raw reply	[flat|nested] 56+ messages in thread

* RE: [PATCH v2 00/11] Connect VFIO to IOMMUFD
  2022-11-14 12:51       ` Yi Liu
@ 2022-11-14 14:37         ` Yang, Lixiao
  2022-11-15  5:41           ` He, Yu
  2022-11-14 14:38         ` Jason Gunthorpe
  1 sibling, 1 reply; 56+ messages in thread
From: Yang, Lixiao @ 2022-11-14 14:37 UTC (permalink / raw)
  To: Liu, Yi L, Jason Gunthorpe
  Cc: kvm, Hao, Xudong, Tian, Kevin, dri-devel, Vineeth Vijayan,
	Diana Craciun, Alexander Gordeev, Longfang Liu, linux-s390,
	Matthew Rosato, Will Deacon, Joerg Roedel, Halil Pasic, iommu,
	Nicolin Chen, Christian Borntraeger, intel-gfx, Wang, Zhi A,
	Jason Herne, Eric Farman, Vasily Gorbik, Heiko Carstens,
	Eric Auger, Alex Williamson, He, Yu, Harald Freudenberger, Vivi,
	Rodrigo, intel-gvt-dev, Tony Krowiak, Tvrtko Ursulin,
	Yishai Hadas, Cornelia Huck, Peter Oberparleiter,
	Shameer Kolothum, Sven Schnelle, Robin Murphy, Lu Baolu

On 2022/11/14 20:51, Yi Liu wrote:
> On 2022/11/10 00:57, Jason Gunthorpe wrote:
>> On Tue, Nov 08, 2022 at 11:18:03PM +0800, Yi Liu wrote:
>>> On 2022/11/8 17:19, Nicolin Chen wrote:
>>>> On Mon, Nov 07, 2022 at 08:52:44PM -0400, Jason Gunthorpe wrote:
>>>>
>>>>> This is on github:
>>>>> https://github.com/jgunthorpe/linux/commits/vfio_iommufd
>>>> [...]
>>>>> v2:
>>>>>    - Rebase to v6.1-rc3, v4 iommufd series
>>>>>    - Fixup comments and commit messages from list remarks
>>>>>    - Fix leaking of the iommufd for mdevs
>>>>>    - New patch to fix vfio modaliases when vfio container is disabled
>>>>>    - Add a dmesg once when the iommufd provided /dev/vfio/vfio is opened
>>>>>      to signal that iommufd is providing this
>>>>
>>>> I've redone my previous sanity tests. Except those reported bugs, 
>>>> things look fine. Once we fix those issues, GVT and other modules 
>>>> can run some more stressful tests, I think.
>>>
>>> our side is also starting test (gvt, nic passthrough) this version. 
>>> need to wait a while for the result.
>>
>> I've updated the branches with the two functional fixes discussed on 
>> the list plus all the doc updates.
>>
>
> I see, due to timzone, the kernel we grabbed is 37c9e6e44d77a, it has 
> slight diff in the scripts/kernel-doc compared with the latest commit 
> (6bb16a9c67769). I don't think it impacts the test.
>
> https://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd.git/log/?h=for-next
>   (37c9e6e44d77a)
>
> Our side, Yu He, Lixiao Yang has done below tests on Intel platform 
> with the above kernel, results are:
>
> 1) GVT-g test suit passed, Intel iGFx passthrough passed.
>
> 2) NIC passthrough test with different guest memory (1G/4G), passed.
>
> 3) Booting two different QEMUs in the same time but one QEMU opens 
> legacy /dev/vfio/vfio and another opens /dev/iommu. Tests passed.
>
> 4) Tried below Kconfig combinations, results are expected.
>
> VFIO_CONTAINER=y, IOMMUFD=y   -- test pass
> VFIO_CONTAINER=y, IOMMUFD=n   -- test pass
> VFIO_CONTAINER=n, IOMMUFD=y , IOMMUFD_VFIO_CONTAINER=y  -- test pass 
> VFIO_CONTAINER=n, IOMMUFD=y , IOMMUFD_VFIO_CONTAINER=n  -- no 
> /dev/vfio/vfio, so test fail, expected
>
> 5) Tested devices from multi-device group. Assign such devices to the 
> same VM, pass; assign them to different VMs, fail; assign them to a VM 
> with Intel virtual VT-d, fail; Results are expected.
>
> Meanwhile, I also tested the branch in development branch for nesting, 
> the basic functionality looks good.
>
> Tested-by: Yi Liu <yi.l.liu@intel.com>
>
Tested-by: Lixiao Yang <lixiao.yang@intel.com>

--
Regards,
Lixiao Yang

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v2 00/11] Connect VFIO to IOMMUFD
  2022-11-14 12:51       ` Yi Liu
  2022-11-14 14:37         ` Yang, Lixiao
@ 2022-11-14 14:38         ` Jason Gunthorpe
  2022-11-14 14:42           ` Yi Liu
  1 sibling, 1 reply; 56+ messages in thread
From: Jason Gunthorpe @ 2022-11-14 14:38 UTC (permalink / raw)
  To: Yi Liu
  Cc: kvm, Hao, Xudong, Kevin Tian, dri-devel, Vineeth Vijayan,
	Diana Craciun, Alexander Gordeev, Longfang Liu, linux-s390,
	Matthew Rosato, Will Deacon, Joerg Roedel, Halil Pasic, iommu,
	Nicolin Chen, Christian Borntraeger, intel-gfx, Zhi Wang,
	Jason Herne, Eric Farman, Vasily Gorbik, Heiko Carstens, Yang,
	Lixiao, Eric Auger, Alex Williamson, He, Yu,
	Harald Freudenberger, Rodrigo Vivi, intel-gvt-dev, Tony Krowiak,
	Tvrtko Ursulin, Yishai Hadas, Cornelia Huck, Peter Oberparleiter,
	Shameer Kolothum, Sven Schnelle, Robin Murphy, Lu Baolu

On Mon, Nov 14, 2022 at 08:51:58PM +0800, Yi Liu wrote:

> Our side, Yu He, Lixiao Yang has done below tests on Intel platform with
> the above kernel, results are:
> 
> 1) GVT-g test suit passed, Intel iGFx passthrough passed.
> 
> 2) NIC passthrough test with different guest memory (1G/4G), passed.
> 
> 3) Booting two different QEMUs in the same time but one QEMU opens
> legacy /dev/vfio/vfio and another opens /dev/iommu. Tests passed.
> 
> 4) Tried below Kconfig combinations, results are expected.
> 
> VFIO_CONTAINER=y, IOMMUFD=y   -- test pass
> VFIO_CONTAINER=y, IOMMUFD=n   -- test pass
> VFIO_CONTAINER=n, IOMMUFD=y , IOMMUFD_VFIO_CONTAINER=y  -- test pass
> VFIO_CONTAINER=n, IOMMUFD=y , IOMMUFD_VFIO_CONTAINER=n  -- no
> /dev/vfio/vfio, so test fail, expected
> 
> 5) Tested devices from multi-device group. Assign such devices to the same
> VM, pass; assign them to different VMs, fail; assign them to a VM with Intel
> virtual VT-d, fail; Results are expected.
> 
> Meanwhile, I also tested the branch in development branch for nesting,
> the basic functionality looks good.
> 
> Tested-by: Yi Liu <yi.l.liu@intel.com>

Great thanks!

In future I also recommend running tests with the CONFIG_IOMMUFD_TEST
turned on, it enables a bunch more fast path assertions that might
catch something interesting

Jason

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v2 00/11] Connect VFIO to IOMMUFD
  2022-11-14 14:38         ` Jason Gunthorpe
@ 2022-11-14 14:42           ` Yi Liu
  0 siblings, 0 replies; 56+ messages in thread
From: Yi Liu @ 2022-11-14 14:42 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: kvm, Hao, Xudong, Kevin Tian, dri-devel, Vineeth Vijayan,
	Diana Craciun, Alexander Gordeev, Longfang Liu, linux-s390,
	Matthew Rosato, Will Deacon, Joerg Roedel, Halil Pasic, iommu,
	Nicolin Chen, Christian Borntraeger, intel-gfx, Zhi Wang,
	Jason Herne, Eric Farman, Vasily Gorbik, Heiko Carstens, Yang,
	Lixiao, Eric Auger, Alex Williamson, He, Yu,
	Harald Freudenberger, Rodrigo Vivi, intel-gvt-dev, Tony Krowiak,
	Tvrtko Ursulin, Yishai Hadas, Cornelia Huck, Peter Oberparleiter,
	Shameer Kolothum, Sven Schnelle, Robin Murphy, Lu Baolu

On 2022/11/14 22:38, Jason Gunthorpe wrote:
> On Mon, Nov 14, 2022 at 08:51:58PM +0800, Yi Liu wrote:
> 
>> Our side, Yu He, Lixiao Yang has done below tests on Intel platform with
>> the above kernel, results are:
>>
>> 1) GVT-g test suit passed, Intel iGFx passthrough passed.
>>
>> 2) NIC passthrough test with different guest memory (1G/4G), passed.
>>
>> 3) Booting two different QEMUs in the same time but one QEMU opens
>> legacy /dev/vfio/vfio and another opens /dev/iommu. Tests passed.
>>
>> 4) Tried below Kconfig combinations, results are expected.
>>
>> VFIO_CONTAINER=y, IOMMUFD=y   -- test pass
>> VFIO_CONTAINER=y, IOMMUFD=n   -- test pass
>> VFIO_CONTAINER=n, IOMMUFD=y , IOMMUFD_VFIO_CONTAINER=y  -- test pass
>> VFIO_CONTAINER=n, IOMMUFD=y , IOMMUFD_VFIO_CONTAINER=n  -- no
>> /dev/vfio/vfio, so test fail, expected
>>
>> 5) Tested devices from multi-device group. Assign such devices to the same
>> VM, pass; assign them to different VMs, fail; assign them to a VM with Intel
>> virtual VT-d, fail; Results are expected.
>>
>> Meanwhile, I also tested the branch in development branch for nesting,
>> the basic functionality looks good.
>>
>> Tested-by: Yi Liu <yi.l.liu@intel.com>
> 
> Great thanks!

you are welcome. this is a team work. :)

> In future I also recommend running tests with the CONFIG_IOMMUFD_TEST
> turned on, it enables a bunch more fast path assertions that might
> catch something interesting

oh, sure. will try.

-- 
Regards,
Yi Liu

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v2 07/11] vfio-iommufd: Support iommufd for physical VFIO devices
  2022-11-11  4:12   ` Yi Liu
@ 2022-11-14 14:47     ` Jason Gunthorpe
  0 siblings, 0 replies; 56+ messages in thread
From: Jason Gunthorpe @ 2022-11-14 14:47 UTC (permalink / raw)
  To: Yi Liu
  Cc: kvm, Kevin Tian, dri-devel, Vineeth Vijayan, Diana Craciun,
	Alexander Gordeev, Longfang Liu, linux-s390, Matthew Rosato,
	Will Deacon, Joerg Roedel, Halil Pasic, iommu, Nicolin Chen,
	Christian Borntraeger, intel-gfx, Zhi Wang, Jason Herne,
	Eric Farman, Vasily Gorbik, Heiko Carstens, Eric Auger,
	Alex Williamson, Harald Freudenberger, Rodrigo Vivi,
	intel-gvt-dev, Tony Krowiak, Tvrtko Ursulin, Yishai Hadas,
	Cornelia Huck, Peter Oberparleiter, Shameer Kolothum,
	Sven Schnelle, Robin Murphy, Lu Baolu

On Fri, Nov 11, 2022 at 12:12:36PM +0800, Yi Liu wrote:

> > +int vfio_iommufd_bind(struct vfio_device *vdev, struct iommufd_ctx *ictx)
> > +{
> > +	u32 ioas_id;
> > +	u32 device_id;
> > +	int ret;
> > +
> > +	lockdep_assert_held(&vdev->dev_set->lock);
> > +
> > +	/*
> > +	 * If the driver doesn't provide this op then it means the device does
> > +	 * not do DMA at all. So nothing to do.
> > +	 */
> > +	if (!vdev->ops->bind_iommufd)
> > +		return 0;
> > +
> > +	ret = vdev->ops->bind_iommufd(vdev, ictx, &device_id);
> > +	if (ret)
> > +		return ret;
> > +
> > +	ret = iommufd_vfio_compat_ioas_id(ictx, &ioas_id);
> > +	if (ret)
> > +		goto err_unbind;
> > +	ret = vdev->ops->attach_ioas(vdev, &ioas_id);
> > +	if (ret)
> > +		goto err_unbind;
> > +	vdev->iommufd_attached = true;
> 
> it's better to set this bool in vfio_iommufd_physical_attach_ioas() as
> the emulated devices uses iommufd_access instead. is it? or you mean this
> flag to cover both cases?

Yes, that is probably clearer:

@@ -50,7 +50,6 @@ int vfio_iommufd_bind(struct vfio_device *vdev, struct iommufd_ctx *ictx)
 	ret = vdev->ops->attach_ioas(vdev, &ioas_id);
 	if (ret)
 		goto err_unbind;
-	vdev->iommufd_attached = true;
 
 	/*
 	 * The legacy path has no way to return the device id or the selected
@@ -110,10 +109,15 @@ EXPORT_SYMBOL_GPL(vfio_iommufd_physical_unbind);
 int vfio_iommufd_physical_attach_ioas(struct vfio_device *vdev, u32 *pt_id)
 {
 	unsigned int flags = 0;
+	int rc;
 
 	if (vfio_allow_unsafe_interrupts)
 		flags |= IOMMUFD_ATTACH_FLAGS_ALLOW_UNSAFE_INTERRUPT;
-	return iommufd_device_attach(vdev->iommufd_device, pt_id, flags);
+	rc = iommufd_device_attach(vdev->iommufd_device, pt_id, flags);
+	if (rc)
+		return rc;
+	vdev->iommufd_attached = true;
+	return 0;
 }
 EXPORT_SYMBOL_GPL(vfio_iommufd_physical_attach_ioas);
 
Thanks,
Jason

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v2 11/11] iommufd: Allow iommufd to supply /dev/vfio/vfio
  2022-11-11  4:16   ` Yi Liu
  2022-11-11  6:38     ` Yi Liu
@ 2022-11-14 14:50     ` Jason Gunthorpe
  1 sibling, 0 replies; 56+ messages in thread
From: Jason Gunthorpe @ 2022-11-14 14:50 UTC (permalink / raw)
  To: Yi Liu
  Cc: kvm, Kevin Tian, dri-devel, Vineeth Vijayan, Diana Craciun,
	Alexander Gordeev, Longfang Liu, linux-s390, Matthew Rosato,
	Will Deacon, Joerg Roedel, Halil Pasic, iommu, Nicolin Chen,
	Christian Borntraeger, intel-gfx, Zhi Wang, Jason Herne,
	Eric Farman, Vasily Gorbik, Heiko Carstens, Eric Auger,
	Alex Williamson, Harald Freudenberger, Rodrigo Vivi,
	intel-gvt-dev, Tony Krowiak, Tvrtko Ursulin, Yishai Hadas,
	Cornelia Huck, Peter Oberparleiter, Shameer Kolothum,
	Sven Schnelle, Robin Murphy, Lu Baolu

On Fri, Nov 11, 2022 at 12:16:02PM +0800, Yi Liu wrote:
> > +#if IS_ENABLED(CONFIG_IOMMUFD_VFIO_CONTAINER)
> > +MODULE_ALIAS_MISCDEV(VFIO_MINOR);
> > +MODULE_ALIAS("devname:vfio/vfio");
> 
> will this line also result in systemd to create this devnodes at boot
> based on the module info even if the IOMMUFD_VFIO_CONTAINER is not
> configured?

No, it is contained by an ifdef.

The MODULE_ALIAS mechanism works by inspecting the compiled object
files for special ELF sections, if the code is discarded by the
preprocessor then the sections will not be created by the compiler.

Jason

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v2 00/11] Connect VFIO to IOMMUFD
  2022-11-14 14:23   ` Jason Gunthorpe
@ 2022-11-14 14:55     ` Matthew Rosato
  2022-11-14 14:59       ` Jason Gunthorpe
  0 siblings, 1 reply; 56+ messages in thread
From: Matthew Rosato @ 2022-11-14 14:55 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: kvm, Kevin Tian, dri-devel, Vineeth Vijayan, Diana Craciun,
	Alexander Gordeev, Longfang Liu, linux-s390, Liu, Yi L,
	Will Deacon, Joerg Roedel, Halil Pasic, iommu, Nicolin Chen,
	Christian Borntraeger, intel-gfx, Zhi Wang, Jason Herne,
	Eric Farman, Vasily Gorbik, Heiko Carstens, Eric Auger,
	Alex Williamson, Harald Freudenberger, Rodrigo Vivi,
	intel-gvt-dev, Tony Krowiak, Tvrtko Ursulin, Yishai Hadas,
	Cornelia Huck, Peter Oberparleiter, Shameer Kolothum,
	Sven Schnelle, Robin Murphy, Lu Baolu

On 11/14/22 9:23 AM, Jason Gunthorpe wrote:
> On Thu, Nov 10, 2022 at 10:01:13PM -0500, Matthew Rosato wrote:
>> On 11/7/22 7:52 PM, Jason Gunthorpe wrote:
>>> This series provides an alternative container layer for VFIO implemented
>>> using iommufd. This is optional, if CONFIG_IOMMUFD is not set then it will
>>> not be compiled in.
>>>
>>> At this point iommufd can be injected by passing in a iommfd FD to
>>> VFIO_GROUP_SET_CONTAINER which will use the VFIO compat layer in iommufd
>>> to obtain the compat IOAS and then connect up all the VFIO drivers as
>>> appropriate.
>>>
>>> This is temporary stopping point, a following series will provide a way to
>>> directly open a VFIO device FD and directly connect it to IOMMUFD using
>>> native ioctls that can expose the IOMMUFD features like hwpt, future
>>> vPASID and dynamic attachment.
>>>
>>> This series, in compat mode, has passed all the qemu tests we have
>>> available, including the test suites for the Intel GVT mdev. Aside from
>>> the temporary limitation with P2P memory this is belived to be fully
>>> compatible with VFIO.
>>
>> AFAICT there is no equivalent means to specify
>> vfio_iommu_type1.dma_entry_limit when using iommufd; looks like
>> we'll just always get the default 65535.
> 
> No, there is no arbitary limit on iommufd

Yeah, that's what I suspected.  But FWIW, userspace checks the advertised limit via VFIO_IOMMU_GET_INFO / VFIO_IOMMU_TYPE1_INFO_DMA_AVAIL, and this is still being advertised as 65535 when using iommufd.  I don't think there is a defined way to return 'ignore this value'. 

This should go away later when we bind to iommufd directly since QEMU would not be sharing the type1 codepath in userspace. 

> 
>> Was this because you envision the limit being not applicable for
>> iommufd (limits will be enforced via either means and eventually we
>> won't want to ) or was it an oversight?
> 
> The limit here is primarily about limiting userspace abuse of the
> interface.
> 
> iommufd is using GFP_KERNEL_ACCOUNT which shifts the responsiblity to
> cgroups, which is similar to how KVM works.
> 
> So, for a VM sandbox you'd set a cgroup limit and if a hostile
> userspace in the sanbox decides to try to OOM the system it will hit
> that limit, regardless of which kernel APIs it tries to abuse.
> 
> This work is not entirely complete as we also need the iommu driver to
> use GFP_KERNEL_ACCOUNT for allocations connected to the iommu_domain,
> particularly for allocations of the IO page tables themselves - which
> can be quite big.
> 
> Jason


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v2 00/11] Connect VFIO to IOMMUFD
  2022-11-14 14:55     ` Matthew Rosato
@ 2022-11-14 14:59       ` Jason Gunthorpe
  2022-11-14 15:21         ` Matthew Rosato
  0 siblings, 1 reply; 56+ messages in thread
From: Jason Gunthorpe @ 2022-11-14 14:59 UTC (permalink / raw)
  To: Matthew Rosato
  Cc: kvm, Kevin Tian, dri-devel, Vineeth Vijayan, Diana Craciun,
	Alexander Gordeev, Longfang Liu, linux-s390, Liu, Yi L,
	Will Deacon, Joerg Roedel, Halil Pasic, iommu, Nicolin Chen,
	Christian Borntraeger, intel-gfx, Zhi Wang, Jason Herne,
	Eric Farman, Vasily Gorbik, Heiko Carstens, Eric Auger,
	Alex Williamson, Harald Freudenberger, Rodrigo Vivi,
	intel-gvt-dev, Tony Krowiak, Tvrtko Ursulin, Yishai Hadas,
	Cornelia Huck, Peter Oberparleiter, Shameer Kolothum,
	Sven Schnelle, Robin Murphy, Lu Baolu

On Mon, Nov 14, 2022 at 09:55:21AM -0500, Matthew Rosato wrote:
> >> AFAICT there is no equivalent means to specify
> >> vfio_iommu_type1.dma_entry_limit when using iommufd; looks like
> >> we'll just always get the default 65535.
> > 
> > No, there is no arbitary limit on iommufd
> 
> Yeah, that's what I suspected.  But FWIW, userspace checks the
> advertised limit via VFIO_IOMMU_GET_INFO /
> VFIO_IOMMU_TYPE1_INFO_DMA_AVAIL, and this is still being advertised
> as 65535 when using iommufd.  I don't think there is a defined way
> to return 'ignore this value'.

Is something using this? Should we make it much bigger?

Jason

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v2 00/11] Connect VFIO to IOMMUFD
  2022-11-14 14:59       ` Jason Gunthorpe
@ 2022-11-14 15:21         ` Matthew Rosato
  2022-11-14 19:27           ` Jason Gunthorpe
  0 siblings, 1 reply; 56+ messages in thread
From: Matthew Rosato @ 2022-11-14 15:21 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: kvm, Kevin Tian, dri-devel, Vineeth Vijayan, Diana Craciun,
	Alexander Gordeev, Longfang Liu, linux-s390, Liu, Yi L,
	Will Deacon, Joerg Roedel, Halil Pasic, iommu, Nicolin Chen,
	Christian Borntraeger, intel-gfx, Zhi Wang, Jason Herne,
	Eric Farman, Vasily Gorbik, Heiko Carstens, Eric Auger,
	Alex Williamson, Harald Freudenberger, Rodrigo Vivi,
	intel-gvt-dev, Tony Krowiak, Tvrtko Ursulin, Yishai Hadas,
	Cornelia Huck, Peter Oberparleiter, Shameer Kolothum,
	Sven Schnelle, Robin Murphy, Lu Baolu

On 11/14/22 9:59 AM, Jason Gunthorpe wrote:
> On Mon, Nov 14, 2022 at 09:55:21AM -0500, Matthew Rosato wrote:
>>>> AFAICT there is no equivalent means to specify
>>>> vfio_iommu_type1.dma_entry_limit when using iommufd; looks like
>>>> we'll just always get the default 65535.
>>>
>>> No, there is no arbitary limit on iommufd
>>
>> Yeah, that's what I suspected.  But FWIW, userspace checks the
>> advertised limit via VFIO_IOMMU_GET_INFO /
>> VFIO_IOMMU_TYPE1_INFO_DMA_AVAIL, and this is still being advertised
>> as 65535 when using iommufd.  I don't think there is a defined way
>> to return 'ignore this value'.
> 
> Is something using this? Should we make it much bigger?

Yes, s390 when doing lazy unmapping likes to use larger amounts of concurrent DMA, so there can be cases where we want to raise this limit.

The initial value of 65535 is already pretty arbitrary (U16_MAX) -- If iommufd is doing its own management and this value becomes deprecated in this scenario, and we can't set it to a magic value that says 'ignore me' then maybe it just makes sense for now to set it arbitrarily larger when using iommufd e.g. U32_MAX?


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v2 00/11] Connect VFIO to IOMMUFD
  2022-11-14 15:21         ` Matthew Rosato
@ 2022-11-14 19:27           ` Jason Gunthorpe
  0 siblings, 0 replies; 56+ messages in thread
From: Jason Gunthorpe @ 2022-11-14 19:27 UTC (permalink / raw)
  To: Matthew Rosato
  Cc: kvm, Kevin Tian, dri-devel, Vineeth Vijayan, Diana Craciun,
	Alexander Gordeev, Longfang Liu, linux-s390, Liu, Yi L,
	Will Deacon, Joerg Roedel, Halil Pasic, iommu, Nicolin Chen,
	Christian Borntraeger, intel-gfx, Zhi Wang, Jason Herne,
	Eric Farman, Vasily Gorbik, Heiko Carstens, Eric Auger,
	Alex Williamson, Harald Freudenberger, Rodrigo Vivi,
	intel-gvt-dev, Tony Krowiak, Tvrtko Ursulin, Yishai Hadas,
	Cornelia Huck, Peter Oberparleiter, Shameer Kolothum,
	Sven Schnelle, Robin Murphy, Lu Baolu

On Mon, Nov 14, 2022 at 10:21:50AM -0500, Matthew Rosato wrote:
> On 11/14/22 9:59 AM, Jason Gunthorpe wrote:
> > On Mon, Nov 14, 2022 at 09:55:21AM -0500, Matthew Rosato wrote:
> >>>> AFAICT there is no equivalent means to specify
> >>>> vfio_iommu_type1.dma_entry_limit when using iommufd; looks like
> >>>> we'll just always get the default 65535.
> >>>
> >>> No, there is no arbitary limit on iommufd
> >>
> >> Yeah, that's what I suspected.  But FWIW, userspace checks the
> >> advertised limit via VFIO_IOMMU_GET_INFO /
> >> VFIO_IOMMU_TYPE1_INFO_DMA_AVAIL, and this is still being advertised
> >> as 65535 when using iommufd.  I don't think there is a defined way
> >> to return 'ignore this value'.
> > 
> > Is something using this? Should we make it much bigger?
> 
> Yes, s390 when doing lazy unmapping likes to use larger amounts of
> concurrent DMA, so there can be cases where we want to raise this
> limit.
> 
> The initial value of 65535 is already pretty arbitrary (U16_MAX) --

It was choosen to match VFIO's default

> If iommufd is doing its own management and this value becomes
> deprecated in this scenario, and we can't set it to a magic value
> that says 'ignore me' then maybe it just makes sense for now to set
> it arbitrarily larger when using iommufd e.g. U32_MAX?

Sure

		/*
		 * iommufd's limit is based on the cgroup's memory limit.
		 * Normally vfio would return U16_MAX here, and provide a module
		 * parameter to adjust it. Since S390 qemu userspace actually
		 * pays attention and needs a value bigger than U16_MAX return
		 * U32_MAX.
		 */
		.avail = U32_MAX,

Thanks,
Jason

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v2 00/11] Connect VFIO to IOMMUFD
  2022-11-09 16:57     ` Jason Gunthorpe
  2022-11-14 12:51       ` Yi Liu
@ 2022-11-15  1:16       ` Matthew Rosato
  1 sibling, 0 replies; 56+ messages in thread
From: Matthew Rosato @ 2022-11-15  1:16 UTC (permalink / raw)
  To: Jason Gunthorpe, Yi Liu
  Cc: kvm, Kevin Tian, dri-devel, Vineeth Vijayan, Diana Craciun,
	Alexander Gordeev, Longfang Liu, linux-s390, Will Deacon,
	Joerg Roedel, Halil Pasic, iommu, Nicolin Chen,
	Christian Borntraeger, intel-gfx, Zhi Wang, Jason Herne,
	Eric Farman, Vasily Gorbik, Heiko Carstens, Eric Auger,
	Alex Williamson, Harald Freudenberger, Rodrigo Vivi,
	intel-gvt-dev, Tony Krowiak, Tvrtko Ursulin, Yishai Hadas,
	Cornelia Huck, Peter Oberparleiter, Shameer Kolothum,
	Sven Schnelle, Robin Murphy, Lu Baolu

On 11/9/22 11:57 AM, Jason Gunthorpe wrote:
> On Tue, Nov 08, 2022 at 11:18:03PM +0800, Yi Liu wrote:
>> On 2022/11/8 17:19, Nicolin Chen wrote:
>>> On Mon, Nov 07, 2022 at 08:52:44PM -0400, Jason Gunthorpe wrote:
>>>
>>>> This is on github: https://github.com/jgunthorpe/linux/commits/vfio_iommufd
>>> [...]
>>>> v2:
>>>>   - Rebase to v6.1-rc3, v4 iommufd series
>>>>   - Fixup comments and commit messages from list remarks
>>>>   - Fix leaking of the iommufd for mdevs
>>>>   - New patch to fix vfio modaliases when vfio container is disabled
>>>>   - Add a dmesg once when the iommufd provided /dev/vfio/vfio is opened
>>>>     to signal that iommufd is providing this
>>>
>>> I've redone my previous sanity tests. Except those reported bugs,
>>> things look fine. Once we fix those issues, GVT and other modules
>>> can run some more stressful tests, I think.
>>
>> our side is also starting test (gvt, nic passthrough) this version. need to
>> wait a while for the result.
> 
> I've updated the branches with the two functional fixes discussed on
> the list plus all the doc updates.
> 

For s390, tested vfio-pci against some data mover workloads using QEMU on s390x with CONFIG_VFIO_CONTAINER=y and =n using zPCI interpretation assists (over ism/SMC-D, mlx5 and NVMe) and without zPCI interpretation assists (over mlx5 and NVMe) - will continue testing with more aggressive workloads.  
(I did not run with CONFIG_IOMMUFD_TEST other than when building the selftest, but I see you mentioned this to Yi -- I'll incorporate that setting into future runs.)

Ran the self-tests on s390 in LPAR and within a QEMU guest -- all tests pass (used 1M hugepages)

Did light regression testing of vfio-ap and vfio-ccw on s390x with CONFIG_VFIO_CONTAINER=y and =n.

Didn't see it in your branch yet, but also verified the proposed change to iommufd_fill_cap_dma_avail (.avail = U32_MAX) would work as expected.

Tested-by: Matthew Rosato <mjrosato@linux.ibm.com> 



^ permalink raw reply	[flat|nested] 56+ messages in thread

* RE: [PATCH v2 00/11] Connect VFIO to IOMMUFD
  2022-11-14 14:37         ` Yang, Lixiao
@ 2022-11-15  5:41           ` He, Yu
  0 siblings, 0 replies; 56+ messages in thread
From: He, Yu @ 2022-11-15  5:41 UTC (permalink / raw)
  To: Yang, Lixiao, Liu, Yi L, Jason Gunthorpe
  Cc: kvm, Hao, Xudong, Tian, Kevin, dri-devel, Vineeth Vijayan,
	Diana Craciun, Alexander Gordeev, Longfang Liu, linux-s390,
	Matthew Rosato, Will Deacon, Joerg Roedel, Halil Pasic, iommu,
	Nicolin Chen, Christian Borntraeger, intel-gfx, Wang, Zhi A,
	Jason Herne, Eric Farman, Vasily Gorbik, Heiko Carstens,
	Eric Auger, Alex Williamson, Harald Freudenberger, Vivi, Rodrigo,
	intel-gvt-dev, Tony Krowiak, Tvrtko Ursulin, Yishai Hadas,
	Cornelia Huck, Peter Oberparleiter, Shameer Kolothum,
	Sven Schnelle, Robin Murphy, Lu Baolu

On 2022/11/14 22:37, Yang, Lixiao wrote:

> On 2022/11/14 20:51, Yi Liu wrote:
>> On 2022/11/10 00:57, Jason Gunthorpe wrote:
>>> On Tue, Nov 08, 2022 at 11:18:03PM +0800, Yi Liu wrote:
>>>> On 2022/11/8 17:19, Nicolin Chen wrote:
>>>>> On Mon, Nov 07, 2022 at 08:52:44PM -0400, Jason Gunthorpe wrote:
>>>>>
>>>>>> This is on github:
>>>>>> https://github.com/jgunthorpe/linux/commits/vfio_iommufd
>>>>> [...]
>>>>>> v2:
>>>>>>     - Rebase to v6.1-rc3, v4 iommufd series
>>>>>>     - Fixup comments and commit messages from list remarks
>>>>>>     - Fix leaking of the iommufd for mdevs
>>>>>>     - New patch to fix vfio modaliases when vfio container is disabled
>>>>>>     - Add a dmesg once when the iommufd provided /dev/vfio/vfio is opened
>>>>>>       to signal that iommufd is providing this
>>>>>
>>>>> I've redone my previous sanity tests. Except those reported bugs,
>>>>> things look fine. Once we fix those issues, GVT and other modules
>>>>> can run some more stressful tests, I think.
>>>>
>>>> our side is also starting test (gvt, nic passthrough) this version.
>>>> need to wait a while for the result.
>>>
>>> I've updated the branches with the two functional fixes discussed on
>>> the list plus all the doc updates.
>>>
>>
>> I see, due to timzone, the kernel we grabbed is 37c9e6e44d77a, it has
>> slight diff in the scripts/kernel-doc compared with the latest commit
>> (6bb16a9c67769). I don't think it impacts the test.
>>
>> https://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd.git/log/?h=for-next
>>    (37c9e6e44d77a)
>>
>> Our side, Yu He, Lixiao Yang has done below tests on Intel platform
>> with the above kernel, results are:
>>
>> 1) GVT-g test suit passed, Intel iGFx passthrough passed.
>>
>> 2) NIC passthrough test with different guest memory (1G/4G), passed.
>>
>> 3) Booting two different QEMUs in the same time but one QEMU opens
>> legacy /dev/vfio/vfio and another opens /dev/iommu. Tests passed.
>>
>> 4) Tried below Kconfig combinations, results are expected.
>>
>> VFIO_CONTAINER=y, IOMMUFD=y   -- test pass
>> VFIO_CONTAINER=y, IOMMUFD=n   -- test pass
>> VFIO_CONTAINER=n, IOMMUFD=y , IOMMUFD_VFIO_CONTAINER=y  -- test pass
>> VFIO_CONTAINER=n, IOMMUFD=y , IOMMUFD_VFIO_CONTAINER=n  -- no
>> /dev/vfio/vfio, so test fail, expected
>>
>> 5) Tested devices from multi-device group. Assign such devices to the
>> same VM, pass; assign them to different VMs, fail; assign them to a VM
>> with Intel virtual VT-d, fail; Results are expected.
>>
>> Meanwhile, I also tested the branch in development branch for nesting,
>> the basic functionality looks good.
>>
>> Tested-by: Yi Liu <yi.l.liu@intel.com>
>>
> Tested-by: Lixiao Yang <lixiao.yang@intel.com>
>
Tested-by: Yu He <yu.he@intel.com>


--
Best regards,
He,Yu

^ permalink raw reply	[flat|nested] 56+ messages in thread

end of thread, other threads:[~2022-11-15  5:41 UTC | newest]

Thread overview: 56+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-11-08  0:52 [PATCH v2 00/11] Connect VFIO to IOMMUFD Jason Gunthorpe
2022-11-08  0:52 ` [PATCH v2 01/11] vfio: Move vfio_device driver open/close code to a function Jason Gunthorpe
2022-11-08  0:52 ` [PATCH v2 02/11] vfio: Move vfio_device_assign_container() into vfio_device_first_open() Jason Gunthorpe
2022-11-08  0:52 ` [PATCH v2 03/11] vfio: Rename vfio_device_assign/unassign_container() Jason Gunthorpe
2022-11-08  0:52 ` [PATCH v2 04/11] vfio: Move storage of allow_unsafe_interrupts to vfio_main.c Jason Gunthorpe
2022-11-08  0:52 ` [PATCH v2 05/11] vfio: Use IOMMU_CAP_ENFORCE_CACHE_COHERENCY for vfio_file_enforced_coherent() Jason Gunthorpe
2022-11-10  2:48   ` Tian, Kevin
2022-11-08  0:52 ` [PATCH v2 06/11] vfio-iommufd: Allow iommufd to be used in place of a container fd Jason Gunthorpe
2022-11-10  2:51   ` Tian, Kevin
2022-11-08  0:52 ` [PATCH v2 07/11] vfio-iommufd: Support iommufd for physical VFIO devices Jason Gunthorpe
2022-11-08  6:10   ` Nicolin Chen
2022-11-08  7:41     ` Yi Liu
2022-11-08 17:51       ` Jason Gunthorpe
2022-11-10  3:12         ` Tian, Kevin
2022-11-08 17:48     ` Jason Gunthorpe
2022-11-10  3:11   ` Tian, Kevin
2022-11-10 17:20     ` Jason Gunthorpe
2022-11-10 23:58       ` Tian, Kevin
2022-11-11  4:12   ` Yi Liu
2022-11-14 14:47     ` Jason Gunthorpe
2022-11-08  0:52 ` [PATCH v2 08/11] vfio-iommufd: Support iommufd for emulated " Jason Gunthorpe
2022-11-10  5:33   ` Tian, Kevin
2022-11-08  0:52 ` [PATCH v2 09/11] vfio: Move container related MODULE_ALIAS statements into container.c Jason Gunthorpe
2022-11-10  5:34   ` Tian, Kevin
2022-11-11  4:13   ` Yi Liu
2022-11-08  0:52 ` [PATCH v2 10/11] vfio: Make vfio_container optionally compiled Jason Gunthorpe
2022-11-08 22:28   ` Alex Williamson
2022-11-09  0:54     ` Jason Gunthorpe
2022-11-09 17:18       ` Alex Williamson
2022-11-09 19:52         ` Jason Gunthorpe
2022-11-10  6:57           ` Tian, Kevin
2022-11-10 17:10             ` Alex Williamson
2022-11-10 17:52             ` Jason Gunthorpe
2022-11-08  0:52 ` [PATCH v2 11/11] iommufd: Allow iommufd to supply /dev/vfio/vfio Jason Gunthorpe
2022-11-10  7:01   ` Tian, Kevin
2022-11-11  4:16   ` Yi Liu
2022-11-11  6:38     ` Yi Liu
2022-11-14 14:50     ` Jason Gunthorpe
2022-11-08  9:19 ` [PATCH v2 00/11] Connect VFIO to IOMMUFD Nicolin Chen
2022-11-08 15:18   ` Yi Liu
2022-11-09 16:57     ` Jason Gunthorpe
2022-11-14 12:51       ` Yi Liu
2022-11-14 14:37         ` Yang, Lixiao
2022-11-15  5:41           ` He, Yu
2022-11-14 14:38         ` Jason Gunthorpe
2022-11-14 14:42           ` Yi Liu
2022-11-15  1:16       ` Matthew Rosato
2022-11-09  9:03 ` Tian, Kevin
2022-11-09 12:48   ` Jason Gunthorpe
2022-11-10  2:16     ` Tian, Kevin
2022-11-11  3:01 ` Matthew Rosato
2022-11-14 14:23   ` Jason Gunthorpe
2022-11-14 14:55     ` Matthew Rosato
2022-11-14 14:59       ` Jason Gunthorpe
2022-11-14 15:21         ` Matthew Rosato
2022-11-14 19:27           ` Jason Gunthorpe

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).