kvm.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v4 00/19] Add vfio_device cdev for iommufd support
@ 2023-02-21  3:47 Yi Liu
  2023-02-21  3:47 ` [PATCH v4 01/19] vfio: Allocate per device file structure Yi Liu
                   ` (18 more replies)
  0 siblings, 19 replies; 55+ messages in thread
From: Yi Liu @ 2023-02-21  3:47 UTC (permalink / raw)
  To: alex.williamson, jgg, kevin.tian
  Cc: joro, cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.l.liu, yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi,
	lulu, suravee.suthikulpanit, intel-gvt-dev, intel-gfx,
	linux-s390, yan.y.zhao, xudong.hao, terrence.xu

Existing VFIO provides group-centric user APIs for userspace. Userspace
opens the /dev/vfio/$group_id first before getting device fd and hence
getting access to device. This is not the desired model for iommufd. Per
the conclusion of community discussion[1], iommufd provides device-centric
kAPIs and requires its consumer (like VFIO) to be device-centric user
APIs. Such user APIs are used to associate device with iommufd and also
the I/O address spaces managed by the iommufd.

This series first introduces a per device file structure to be prepared
for further enhancement and refactors the kvm-vfio code to be prepared
for accepting device file from userspace. Then refactors the vfio to be
able to handle iommufd binding. This refactor includes the mechanism of
blocking device access before iommufd bind, making the device_open exclusive.
between the group path and the cdev path. Eventually, adds the cdev support
for vfio device, and makes group infrastructure optional as it is not needed
when vfio device cdev is compiled.

This is also a prerequisite for iommu nesting for vfio device[2].

The complete code can be found in below branch, simple test done with the
legacy group path and the cdev path. Draft QEMU branch can be found at[3]

https://github.com/yiliu1765/iommufd/tree/vfio_device_cdev_v4
(config CONFIG_IOMMUFD=y CONFIG_VFIO_DEVICE_CDEV=y)

base-commit: 9f764ae24522428b3a4c502d394826fe8ef0338c

[1] https://lore.kernel.org/kvm/BN9PR11MB5433B1E4AE5B0480369F97178C189@BN9PR11MB5433.namprd11.prod.outlook.com/
[2] https://lore.kernel.org/linux-iommu/20230209043153.14964-1-yi.l.liu@intel.com/
[3] https://github.com/yiliu1765/qemu/tree/iommufd_rfcv3 (it is based on Eric's
    QEMU iommufd rfcv3 (https://lore.kernel.org/kvm/20230131205305.2726330-1-eric.auger@redhat.com/)
    plus two commits to align with vfio_device_cdev v3/v4)

Change log:

v4:
 - Add r-b from Kevin on patch 09/10
 - Add a line in devices/vfio.rst to emphasize user should add group/device to
   KVM prior to invoke open_device op which may be called in the VFIO_GROUP_GET_DEVICE_FD
   or VFIO_DEVICE_BIND_IOMMUFD ioctl.
 - Modify VFIO_GROUP/VFIO_DEVICE_CDEV Kconfig dependency (Alex)
 - Select VFIO_GROUP for SPAPR (Jason)
 - Check device fully-opened in PCI hotreset path for device fd (Jason)
 - Set df->access_granted in the caller of vfio_device_open() since
   the caller may fail in other operations, but df->access_granted
   does not allow a true to false change. So it should be set only when
   the open path is really done successfully. (Yan, Kevin)
 - Fix missing iommufd_ctx_put() in the cdev path (Yan)
 - Fix an issue found in testing exclusion between group and cdev path.
   vfio_device_cdev_close() should check df->access_granted before heading
   to other operations.
 - Update vfio.rst for iommufd/cdev

v3: https://lore.kernel.org/kvm/20230213151348.56451-1-yi.l.liu@intel.com/
 - Add r-b from Kevin on patch 03, 06, 07, 08.
 - Refine the group and cdev path exclusion. Remove vfio_device:single_open;
   add vfio_group::cdev_device_open_cnt to achieve exlucsion between group
   path and cdev path (Kevin, Jason)
 - Fix a bug in the error handling path (Yan Zhao)
 - Address misc remarks from Kevin

v2: https://lore.kernel.org/kvm/20230206090532.95598-1-yi.l.liu@intel.com/
 - Add r-b from Kevin and Eric on patch 01 02 04.
 - "Split kvm/vfio: Provide struct kvm_device_ops::release() insted of ::destroy()"
   from this series and got applied. (Alex, Kevin, Jason, Mathhew)
 - Add kvm_ref_lock to protect vfio_device_file->kvm instead of reusing
   dev_set->lock as dead-lock is observed with vfio-ap which would try to
   acquire kvm_lock. This is opposite lock order with kvm_device_release()
   which holds kvm_lock first and then hold dev_set->lock. (Kevin)
 - Use a separate ioctl for detaching IOAS. (Alex)
 - Rename vfio_device_file::single_open to be is_cdev_device (Kevin, Alex)
 - Move the vfio device cdev code into device_cdev.c and add a VFIO_DEVICE_CDEV
   kconfig for it. (Kevin, Jason)

v1: https://lore.kernel.org/kvm/20230117134942.101112-1-yi.l.liu@intel.com/
 - Fix the circular refcount between kvm struct and device file reference. (JasonG)
 - Address comments from KevinT
 - Remained the ioctl for detach, needs to Alex's taste
   (https://lore.kernel.org/kvm/BN9PR11MB5276BE9F4B0613EE859317028CFF9@BN9PR11MB5276.namprd11.prod.outlook.com/)

rfc: https://lore.kernel.org/kvm/20221219084718.9342-1-yi.l.liu@intel.com/

Thanks,
	Yi Liu

Yi Liu (19):
  vfio: Allocate per device file structure
  vfio: Refine vfio file kAPIs
  vfio: Accept vfio device file in the driver facing kAPI
  kvm/vfio: Rename kvm_vfio_group to prepare for accepting vfio device
    fd
  kvm/vfio: Accept vfio device file from userspace
  vfio: Pass struct vfio_device_file * to vfio_device_open/close()
  vfio: Block device access via device fd until device is opened
  vfio/pci: Update comment around group_fd get in
    vfio_pci_ioctl_pci_hot_reset()
  vfio/pci: Accept device fd for hot reset
  vfio: Add infrastructure for bind_iommufd from userspace
  vfio-iommufd: Add detach_ioas support for physical VFIO devices
  vfio-iommufd: Add detach_ioas for emulated VFIO devices
  vfio: Add cdev_device_open_cnt to vfio_group
  vfio: Make vfio_device_open() single open for device cdev path
  vfio: Add cdev for vfio_device
  vfio: Add VFIO_DEVICE_BIND_IOMMUFD
  vfio: Add VFIO_DEVICE_AT[DE]TACH_IOMMUFD_PT
  vfio: Compile group optionally
  docs: vfio: Add vfio device cdev description

 Documentation/driver-api/vfio.rst             | 133 ++++++++-
 Documentation/virt/kvm/devices/vfio.rst       |  50 ++--
 drivers/gpu/drm/i915/gvt/kvmgt.c              |   1 +
 drivers/s390/cio/vfio_ccw_ops.c               |   1 +
 drivers/s390/crypto/vfio_ap_ops.c             |   1 +
 drivers/vfio/Kconfig                          |  26 ++
 drivers/vfio/Makefile                         |   3 +-
 drivers/vfio/device_cdev.c                    | 276 ++++++++++++++++++
 drivers/vfio/fsl-mc/vfio_fsl_mc.c             |   1 +
 drivers/vfio/group.c                          | 155 ++++++----
 drivers/vfio/iommufd.c                        |  59 +++-
 .../vfio/pci/hisilicon/hisi_acc_vfio_pci.c    |   2 +
 drivers/vfio/pci/mlx5/main.c                  |   1 +
 drivers/vfio/pci/vfio_pci.c                   |   1 +
 drivers/vfio/pci/vfio_pci_core.c              |  14 +-
 drivers/vfio/platform/vfio_amba.c             |   1 +
 drivers/vfio/platform/vfio_platform.c         |   1 +
 drivers/vfio/vfio.h                           | 188 +++++++++++-
 drivers/vfio/vfio_main.c                      | 271 +++++++++++++++--
 include/linux/iommufd.h                       |   6 +
 include/linux/vfio.h                          |  29 +-
 include/uapi/linux/kvm.h                      |  16 +-
 include/uapi/linux/vfio.h                     |  86 ++++++
 virt/kvm/vfio.c                               | 141 ++++-----
 24 files changed, 1252 insertions(+), 211 deletions(-)
 create mode 100644 drivers/vfio/device_cdev.c

-- 
2.34.1


^ permalink raw reply	[flat|nested] 55+ messages in thread

* [PATCH v4 01/19] vfio: Allocate per device file structure
  2023-02-21  3:47 [PATCH v4 00/19] Add vfio_device cdev for iommufd support Yi Liu
@ 2023-02-21  3:47 ` Yi Liu
  2023-02-21  3:47 ` [PATCH v4 02/19] vfio: Refine vfio file kAPIs Yi Liu
                   ` (17 subsequent siblings)
  18 siblings, 0 replies; 55+ messages in thread
From: Yi Liu @ 2023-02-21  3:47 UTC (permalink / raw)
  To: alex.williamson, jgg, kevin.tian
  Cc: joro, cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.l.liu, yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi,
	lulu, suravee.suthikulpanit, intel-gvt-dev, intel-gfx,
	linux-s390, yan.y.zhao, xudong.hao, terrence.xu

This is preparation for adding vfio device cdev support. vfio device
cdev requires:
1) a per device file memory to store the kvm pointer set by KVM. It will
   be propagated to vfio_device:kvm after the device cdev file is bound
   to an iommufd
2) a mechanism to block device access through device cdev fd before it
   is bound to an iommufd

To address above requirements, this adds a per device file structure
named vfio_device_file. For now, it's only a wrapper of struct vfio_device
pointer. Other fields will be added to this per file structure in future
commits.

Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
---
 drivers/vfio/group.c     | 13 +++++++++++--
 drivers/vfio/vfio.h      |  6 ++++++
 drivers/vfio/vfio_main.c | 31 ++++++++++++++++++++++++++-----
 3 files changed, 43 insertions(+), 7 deletions(-)

diff --git a/drivers/vfio/group.c b/drivers/vfio/group.c
index 0e9036e2b9c4..cf51e1a0fd96 100644
--- a/drivers/vfio/group.c
+++ b/drivers/vfio/group.c
@@ -215,19 +215,26 @@ void vfio_device_group_close(struct vfio_device *device)
 
 static struct file *vfio_device_open_file(struct vfio_device *device)
 {
+	struct vfio_device_file *df;
 	struct file *filep;
 	int ret;
 
+	df = vfio_allocate_device_file(device);
+	if (IS_ERR(df)) {
+		ret = PTR_ERR(df);
+		goto err_out;
+	}
+
 	ret = vfio_device_group_open(device);
 	if (ret)
-		goto err_out;
+		goto err_free;
 
 	/*
 	 * We can't use anon_inode_getfd() because we need to modify
 	 * the f_mode flags directly to allow more than just ioctls
 	 */
 	filep = anon_inode_getfile("[vfio-device]", &vfio_device_fops,
-				   device, O_RDWR);
+				   df, O_RDWR);
 	if (IS_ERR(filep)) {
 		ret = PTR_ERR(filep);
 		goto err_close_device;
@@ -251,6 +258,8 @@ static struct file *vfio_device_open_file(struct vfio_device *device)
 
 err_close_device:
 	vfio_device_group_close(device);
+err_free:
+	kfree(df);
 err_out:
 	return ERR_PTR(ret);
 }
diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
index e9721d8424bc..61bbf673e672 100644
--- a/drivers/vfio/vfio.h
+++ b/drivers/vfio/vfio.h
@@ -16,11 +16,17 @@ struct iommu_group;
 struct vfio_device;
 struct vfio_container;
 
+struct vfio_device_file {
+	struct vfio_device *device;
+};
+
 void vfio_device_put_registration(struct vfio_device *device);
 bool vfio_device_try_get_registration(struct vfio_device *device);
 int vfio_device_open(struct vfio_device *device, struct iommufd_ctx *iommufd);
 void vfio_device_close(struct vfio_device *device,
 		       struct iommufd_ctx *iommufd);
+struct vfio_device_file *
+vfio_allocate_device_file(struct vfio_device *device);
 
 extern const struct file_operations vfio_device_fops;
 
diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
index 3a597e799918..d99fa0cec18e 100644
--- a/drivers/vfio/vfio_main.c
+++ b/drivers/vfio/vfio_main.c
@@ -396,6 +396,20 @@ static bool vfio_assert_device_open(struct vfio_device *device)
 	return !WARN_ON_ONCE(!READ_ONCE(device->open_count));
 }
 
+struct vfio_device_file *
+vfio_allocate_device_file(struct vfio_device *device)
+{
+	struct vfio_device_file *df;
+
+	df = kzalloc(sizeof(*df), GFP_KERNEL_ACCOUNT);
+	if (!df)
+		return ERR_PTR(-ENOMEM);
+
+	df->device = device;
+
+	return df;
+}
+
 static int vfio_device_first_open(struct vfio_device *device,
 				  struct iommufd_ctx *iommufd)
 {
@@ -509,12 +523,15 @@ static inline void vfio_device_pm_runtime_put(struct vfio_device *device)
  */
 static int vfio_device_fops_release(struct inode *inode, struct file *filep)
 {
-	struct vfio_device *device = filep->private_data;
+	struct vfio_device_file *df = filep->private_data;
+	struct vfio_device *device = df->device;
 
 	vfio_device_group_close(device);
 
 	vfio_device_put_registration(device);
 
+	kfree(df);
+
 	return 0;
 }
 
@@ -1079,7 +1096,8 @@ static int vfio_ioctl_device_feature(struct vfio_device *device,
 static long vfio_device_fops_unl_ioctl(struct file *filep,
 				       unsigned int cmd, unsigned long arg)
 {
-	struct vfio_device *device = filep->private_data;
+	struct vfio_device_file *df = filep->private_data;
+	struct vfio_device *device = df->device;
 	int ret;
 
 	ret = vfio_device_pm_runtime_get(device);
@@ -1106,7 +1124,8 @@ static long vfio_device_fops_unl_ioctl(struct file *filep,
 static ssize_t vfio_device_fops_read(struct file *filep, char __user *buf,
 				     size_t count, loff_t *ppos)
 {
-	struct vfio_device *device = filep->private_data;
+	struct vfio_device_file *df = filep->private_data;
+	struct vfio_device *device = df->device;
 
 	if (unlikely(!device->ops->read))
 		return -EINVAL;
@@ -1118,7 +1137,8 @@ static ssize_t vfio_device_fops_write(struct file *filep,
 				      const char __user *buf,
 				      size_t count, loff_t *ppos)
 {
-	struct vfio_device *device = filep->private_data;
+	struct vfio_device_file *df = filep->private_data;
+	struct vfio_device *device = df->device;
 
 	if (unlikely(!device->ops->write))
 		return -EINVAL;
@@ -1128,7 +1148,8 @@ static ssize_t vfio_device_fops_write(struct file *filep,
 
 static int vfio_device_fops_mmap(struct file *filep, struct vm_area_struct *vma)
 {
-	struct vfio_device *device = filep->private_data;
+	struct vfio_device_file *df = filep->private_data;
+	struct vfio_device *device = df->device;
 
 	if (unlikely(!device->ops->mmap))
 		return -EINVAL;
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH v4 02/19] vfio: Refine vfio file kAPIs
  2023-02-21  3:47 [PATCH v4 00/19] Add vfio_device cdev for iommufd support Yi Liu
  2023-02-21  3:47 ` [PATCH v4 01/19] vfio: Allocate per device file structure Yi Liu
@ 2023-02-21  3:47 ` Yi Liu
  2023-02-21  3:47 ` [PATCH v4 03/19] vfio: Accept vfio device file in the driver facing kAPI Yi Liu
                   ` (16 subsequent siblings)
  18 siblings, 0 replies; 55+ messages in thread
From: Yi Liu @ 2023-02-21  3:47 UTC (permalink / raw)
  To: alex.williamson, jgg, kevin.tian
  Cc: joro, cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.l.liu, yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi,
	lulu, suravee.suthikulpanit, intel-gvt-dev, intel-gfx,
	linux-s390, yan.y.zhao, xudong.hao, terrence.xu

This prepares for making the below kAPIs to accept both group file
and device file instead of only vfio group file.

  bool vfio_file_enforced_coherent(struct file *file);
  void vfio_file_set_kvm(struct file *file, struct kvm *kvm);
  bool vfio_file_has_dev(struct file *file, struct vfio_device *device);

Besides above change, vfio_file_is_group() is renamed to be
vfio_file_is_valid().

Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
---
 drivers/vfio/group.c             | 74 ++++++++------------------------
 drivers/vfio/pci/vfio_pci_core.c |  4 +-
 drivers/vfio/vfio.h              |  4 ++
 drivers/vfio/vfio_main.c         | 62 ++++++++++++++++++++++++++
 include/linux/vfio.h             |  2 +-
 virt/kvm/vfio.c                  | 10 ++---
 6 files changed, 92 insertions(+), 64 deletions(-)

diff --git a/drivers/vfio/group.c b/drivers/vfio/group.c
index cf51e1a0fd96..cc0eded19a9f 100644
--- a/drivers/vfio/group.c
+++ b/drivers/vfio/group.c
@@ -751,6 +751,15 @@ bool vfio_device_has_container(struct vfio_device *device)
 	return device->group->container;
 }
 
+struct vfio_group *vfio_group_from_file(struct file *file)
+{
+	struct vfio_group *group = file->private_data;
+
+	if (file->f_op != &vfio_group_fops)
+		return NULL;
+	return group;
+}
+
 /**
  * vfio_file_iommu_group - Return the struct iommu_group for the vfio group file
  * @file: VFIO group file
@@ -761,13 +770,13 @@ bool vfio_device_has_container(struct vfio_device *device)
  */
 struct iommu_group *vfio_file_iommu_group(struct file *file)
 {
-	struct vfio_group *group = file->private_data;
+	struct vfio_group *group = vfio_group_from_file(file);
 	struct iommu_group *iommu_group = NULL;
 
 	if (!IS_ENABLED(CONFIG_SPAPR_TCE_IOMMU))
 		return NULL;
 
-	if (!vfio_file_is_group(file))
+	if (!group)
 		return NULL;
 
 	mutex_lock(&group->group_lock);
@@ -780,34 +789,11 @@ struct iommu_group *vfio_file_iommu_group(struct file *file)
 }
 EXPORT_SYMBOL_GPL(vfio_file_iommu_group);
 
-/**
- * vfio_file_is_group - True if the file is usable with VFIO aPIS
- * @file: VFIO group file
- */
-bool vfio_file_is_group(struct file *file)
-{
-	return file->f_op == &vfio_group_fops;
-}
-EXPORT_SYMBOL_GPL(vfio_file_is_group);
-
-/**
- * vfio_file_enforced_coherent - True if the DMA associated with the VFIO file
- *        is always CPU cache coherent
- * @file: VFIO group file
- *
- * Enforced coherency means that the IOMMU ignores things like the PCIe no-snoop
- * bit in DMA transactions. A return of false indicates that the user has
- * rights to access additional instructions such as wbinvd on x86.
- */
-bool vfio_file_enforced_coherent(struct file *file)
+bool vfio_group_enforced_coherent(struct vfio_group *group)
 {
-	struct vfio_group *group = file->private_data;
 	struct vfio_device *device;
 	bool ret = true;
 
-	if (!vfio_file_is_group(file))
-		return true;
-
 	/*
 	 * If the device does not have IOMMU_CAP_ENFORCE_CACHE_COHERENCY then
 	 * any domain later attached to it will also not support it. If the cap
@@ -825,46 +811,22 @@ bool vfio_file_enforced_coherent(struct file *file)
 	mutex_unlock(&group->device_lock);
 	return ret;
 }
-EXPORT_SYMBOL_GPL(vfio_file_enforced_coherent);
 
-/**
- * vfio_file_set_kvm - Link a kvm with VFIO drivers
- * @file: VFIO group file
- * @kvm: KVM to link
- *
- * When a VFIO device is first opened the KVM will be available in
- * device->kvm if one was associated with the group.
- */
-void vfio_file_set_kvm(struct file *file, struct kvm *kvm)
+void vfio_group_set_kvm(struct vfio_group *group, struct kvm *kvm)
 {
-	struct vfio_group *group = file->private_data;
-
-	if (!vfio_file_is_group(file))
-		return;
-
+	/*
+	 * When a VFIO device is first opened the KVM will be available in
+	 * device->kvm if one was associated with the group.
+	 */
 	spin_lock(&group->kvm_ref_lock);
 	group->kvm = kvm;
 	spin_unlock(&group->kvm_ref_lock);
 }
-EXPORT_SYMBOL_GPL(vfio_file_set_kvm);
 
-/**
- * vfio_file_has_dev - True if the VFIO file is a handle for device
- * @file: VFIO file to check
- * @device: Device that must be part of the file
- *
- * Returns true if given file has permission to manipulate the given device.
- */
-bool vfio_file_has_dev(struct file *file, struct vfio_device *device)
+bool vfio_group_has_dev(struct vfio_group *group, struct vfio_device *device)
 {
-	struct vfio_group *group = file->private_data;
-
-	if (!vfio_file_is_group(file))
-		return false;
-
 	return group == device->group;
 }
-EXPORT_SYMBOL_GPL(vfio_file_has_dev);
 
 static char *vfio_devnode(const struct device *dev, umode_t *mode)
 {
diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
index a6492a25ff6a..4704c1babae3 100644
--- a/drivers/vfio/pci/vfio_pci_core.c
+++ b/drivers/vfio/pci/vfio_pci_core.c
@@ -1320,8 +1320,8 @@ static int vfio_pci_ioctl_pci_hot_reset(struct vfio_pci_core_device *vdev,
 			break;
 		}
 
-		/* Ensure the FD is a vfio group FD.*/
-		if (!vfio_file_is_group(file)) {
+		/* Ensure the FD is a vfio FD.*/
+		if (!vfio_file_is_valid(file)) {
 			fput(file);
 			ret = -EINVAL;
 			break;
diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
index 61bbf673e672..f237e9410d1e 100644
--- a/drivers/vfio/vfio.h
+++ b/drivers/vfio/vfio.h
@@ -90,6 +90,10 @@ void vfio_device_group_unregister(struct vfio_device *device);
 int vfio_device_group_use_iommu(struct vfio_device *device);
 void vfio_device_group_unuse_iommu(struct vfio_device *device);
 void vfio_device_group_close(struct vfio_device *device);
+struct vfio_group *vfio_group_from_file(struct file *file);
+bool vfio_group_enforced_coherent(struct vfio_group *group);
+void vfio_group_set_kvm(struct vfio_group *group, struct kvm *kvm);
+bool vfio_group_has_dev(struct vfio_group *group, struct vfio_device *device);
 bool vfio_device_has_container(struct vfio_device *device);
 int __init vfio_group_init(void);
 void vfio_group_cleanup(void);
diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
index d99fa0cec18e..64dbe5266c4b 100644
--- a/drivers/vfio/vfio_main.c
+++ b/drivers/vfio/vfio_main.c
@@ -1167,6 +1167,68 @@ const struct file_operations vfio_device_fops = {
 	.mmap		= vfio_device_fops_mmap,
 };
 
+/**
+ * vfio_file_is_valid - True if the file is valid vfio file
+ * @file: VFIO group file or VFIO device file
+ */
+bool vfio_file_is_valid(struct file *file)
+{
+	return vfio_group_from_file(file);
+}
+EXPORT_SYMBOL_GPL(vfio_file_is_valid);
+
+/**
+ * vfio_file_enforced_coherent - True if the DMA associated with the VFIO file
+ *        is always CPU cache coherent
+ * @file: VFIO group file or VFIO device file
+ *
+ * Enforced coherency means that the IOMMU ignores things like the PCIe no-snoop
+ * bit in DMA transactions. A return of false indicates that the user has
+ * rights to access additional instructions such as wbinvd on x86.
+ */
+bool vfio_file_enforced_coherent(struct file *file)
+{
+	struct vfio_group *group = vfio_group_from_file(file);
+
+	if (group)
+		return vfio_group_enforced_coherent(group);
+
+	return true;
+}
+EXPORT_SYMBOL_GPL(vfio_file_enforced_coherent);
+
+/**
+ * vfio_file_set_kvm - Link a kvm with VFIO drivers
+ * @file: VFIO group file or VFIO device file
+ * @kvm: KVM to link
+ *
+ */
+void vfio_file_set_kvm(struct file *file, struct kvm *kvm)
+{
+	struct vfio_group *group = vfio_group_from_file(file);
+
+	if (group)
+		vfio_group_set_kvm(group, kvm);
+}
+EXPORT_SYMBOL_GPL(vfio_file_set_kvm);
+
+/**
+ * vfio_file_has_dev - True if the VFIO file is a handle for device
+ * @file: VFIO file to check, VFIO group file or VFIO device file
+ * @device: Device that must be part of the file
+ *
+ * Returns true if given file has permission to manipulate the given device.
+ */
+bool vfio_file_has_dev(struct file *file, struct vfio_device *device)
+{
+	struct vfio_group *group = vfio_group_from_file(file);
+
+	if (group)
+		return vfio_group_has_dev(group, device);
+	return false;
+}
+EXPORT_SYMBOL_GPL(vfio_file_has_dev);
+
 /*
  * Sub-module support
  */
diff --git a/include/linux/vfio.h b/include/linux/vfio.h
index 93134b023968..6a07e1c6c38e 100644
--- a/include/linux/vfio.h
+++ b/include/linux/vfio.h
@@ -245,7 +245,7 @@ int vfio_mig_get_next_state(struct vfio_device *device,
  * External user API
  */
 struct iommu_group *vfio_file_iommu_group(struct file *file);
-bool vfio_file_is_group(struct file *file);
+bool vfio_file_is_valid(struct file *file);
 bool vfio_file_enforced_coherent(struct file *file);
 void vfio_file_set_kvm(struct file *file, struct kvm *kvm);
 bool vfio_file_has_dev(struct file *file, struct vfio_device *device);
diff --git a/virt/kvm/vfio.c b/virt/kvm/vfio.c
index 9584eb57e0ed..8bac308ba630 100644
--- a/virt/kvm/vfio.c
+++ b/virt/kvm/vfio.c
@@ -64,18 +64,18 @@ static bool kvm_vfio_file_enforced_coherent(struct file *file)
 	return ret;
 }
 
-static bool kvm_vfio_file_is_group(struct file *file)
+static bool kvm_vfio_file_is_valid(struct file *file)
 {
 	bool (*fn)(struct file *file);
 	bool ret;
 
-	fn = symbol_get(vfio_file_is_group);
+	fn = symbol_get(vfio_file_is_valid);
 	if (!fn)
 		return false;
 
 	ret = fn(file);
 
-	symbol_put(vfio_file_is_group);
+	symbol_put(vfio_file_is_valid);
 
 	return ret;
 }
@@ -154,8 +154,8 @@ static int kvm_vfio_group_add(struct kvm_device *dev, unsigned int fd)
 	if (!filp)
 		return -EBADF;
 
-	/* Ensure the FD is a vfio group FD.*/
-	if (!kvm_vfio_file_is_group(filp)) {
+	/* Ensure the FD is a vfio FD.*/
+	if (!kvm_vfio_file_is_valid(filp)) {
 		ret = -EINVAL;
 		goto err_fput;
 	}
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH v4 03/19] vfio: Accept vfio device file in the driver facing kAPI
  2023-02-21  3:47 [PATCH v4 00/19] Add vfio_device cdev for iommufd support Yi Liu
  2023-02-21  3:47 ` [PATCH v4 01/19] vfio: Allocate per device file structure Yi Liu
  2023-02-21  3:47 ` [PATCH v4 02/19] vfio: Refine vfio file kAPIs Yi Liu
@ 2023-02-21  3:47 ` Yi Liu
  2023-02-22  7:15   ` Tian, Kevin
  2023-02-21  3:47 ` [PATCH v4 04/19] kvm/vfio: Rename kvm_vfio_group to prepare for accepting vfio device fd Yi Liu
                   ` (15 subsequent siblings)
  18 siblings, 1 reply; 55+ messages in thread
From: Yi Liu @ 2023-02-21  3:47 UTC (permalink / raw)
  To: alex.williamson, jgg, kevin.tian
  Cc: joro, cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.l.liu, yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi,
	lulu, suravee.suthikulpanit, intel-gvt-dev, intel-gfx,
	linux-s390, yan.y.zhao, xudong.hao, terrence.xu

This makes the vfio file kAPIs to accepte vfio device files, also a
preparation for vfio device cdev support.

For the kvm set with vfio device file, kvm pointer is stored in struct
vfio_device_file, and use kvm_ref_lock to protect kvm set and kvm
pointer usage within VFIO. This kvm pointer will be set to vfio_device
after device file is bound to iommufd in the cdev path.

Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
---
 drivers/vfio/vfio.h      |  2 ++
 drivers/vfio/vfio_main.c | 51 ++++++++++++++++++++++++++++++++++++----
 2 files changed, 49 insertions(+), 4 deletions(-)

diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
index f237e9410d1e..cee979a1b90f 100644
--- a/drivers/vfio/vfio.h
+++ b/drivers/vfio/vfio.h
@@ -18,6 +18,8 @@ struct vfio_container;
 
 struct vfio_device_file {
 	struct vfio_device *device;
+	spinlock_t kvm_ref_lock; /* protect kvm field */
+	struct kvm *kvm;
 };
 
 void vfio_device_put_registration(struct vfio_device *device);
diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
index 64dbe5266c4b..28db0a563805 100644
--- a/drivers/vfio/vfio_main.c
+++ b/drivers/vfio/vfio_main.c
@@ -406,6 +406,7 @@ vfio_allocate_device_file(struct vfio_device *device)
 		return ERR_PTR(-ENOMEM);
 
 	df->device = device;
+	spin_lock_init(&df->kvm_ref_lock);
 
 	return df;
 }
@@ -1167,13 +1168,23 @@ const struct file_operations vfio_device_fops = {
 	.mmap		= vfio_device_fops_mmap,
 };
 
+static struct vfio_device *vfio_device_from_file(struct file *file)
+{
+	struct vfio_device_file *df = file->private_data;
+
+	if (file->f_op != &vfio_device_fops)
+		return NULL;
+	return df->device;
+}
+
 /**
  * vfio_file_is_valid - True if the file is valid vfio file
  * @file: VFIO group file or VFIO device file
  */
 bool vfio_file_is_valid(struct file *file)
 {
-	return vfio_group_from_file(file);
+	return vfio_group_from_file(file) ||
+	       vfio_device_from_file(file);
 }
 EXPORT_SYMBOL_GPL(vfio_file_is_valid);
 
@@ -1188,15 +1199,36 @@ EXPORT_SYMBOL_GPL(vfio_file_is_valid);
  */
 bool vfio_file_enforced_coherent(struct file *file)
 {
-	struct vfio_group *group = vfio_group_from_file(file);
+	struct vfio_group *group;
+	struct vfio_device *device;
 
+	group = vfio_group_from_file(file);
 	if (group)
 		return vfio_group_enforced_coherent(group);
 
+	device = vfio_device_from_file(file);
+	if (device)
+		return device_iommu_capable(device->dev,
+					    IOMMU_CAP_ENFORCE_CACHE_COHERENCY);
+
 	return true;
 }
 EXPORT_SYMBOL_GPL(vfio_file_enforced_coherent);
 
+static void vfio_device_file_set_kvm(struct file *file, struct kvm *kvm)
+{
+	struct vfio_device_file *df = file->private_data;
+
+	/*
+	 * The kvm is first recorded in the vfio_device_file, and will
+	 * be propagated to vfio_device::kvm when the file is bound to
+	 * iommufd successfully in the vfio device cdev path.
+	 */
+	spin_lock(&df->kvm_ref_lock);
+	df->kvm = kvm;
+	spin_unlock(&df->kvm_ref_lock);
+}
+
 /**
  * vfio_file_set_kvm - Link a kvm with VFIO drivers
  * @file: VFIO group file or VFIO device file
@@ -1205,10 +1237,14 @@ EXPORT_SYMBOL_GPL(vfio_file_enforced_coherent);
  */
 void vfio_file_set_kvm(struct file *file, struct kvm *kvm)
 {
-	struct vfio_group *group = vfio_group_from_file(file);
+	struct vfio_group *group;
 
+	group = vfio_group_from_file(file);
 	if (group)
 		vfio_group_set_kvm(group, kvm);
+
+	if (vfio_device_from_file(file))
+		vfio_device_file_set_kvm(file, kvm);
 }
 EXPORT_SYMBOL_GPL(vfio_file_set_kvm);
 
@@ -1221,10 +1257,17 @@ EXPORT_SYMBOL_GPL(vfio_file_set_kvm);
  */
 bool vfio_file_has_dev(struct file *file, struct vfio_device *device)
 {
-	struct vfio_group *group = vfio_group_from_file(file);
+	struct vfio_group *group;
+	struct vfio_device *vdev;
 
+	group = vfio_group_from_file(file);
 	if (group)
 		return vfio_group_has_dev(group, device);
+
+	vdev = vfio_device_from_file(file);
+	if (device)
+		return vdev == device;
+
 	return false;
 }
 EXPORT_SYMBOL_GPL(vfio_file_has_dev);
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH v4 04/19] kvm/vfio: Rename kvm_vfio_group to prepare for accepting vfio device fd
  2023-02-21  3:47 [PATCH v4 00/19] Add vfio_device cdev for iommufd support Yi Liu
                   ` (2 preceding siblings ...)
  2023-02-21  3:47 ` [PATCH v4 03/19] vfio: Accept vfio device file in the driver facing kAPI Yi Liu
@ 2023-02-21  3:47 ` Yi Liu
  2023-02-21  3:47 ` [PATCH v4 05/19] kvm/vfio: Accept vfio device file from userspace Yi Liu
                   ` (14 subsequent siblings)
  18 siblings, 0 replies; 55+ messages in thread
From: Yi Liu @ 2023-02-21  3:47 UTC (permalink / raw)
  To: alex.williamson, jgg, kevin.tian
  Cc: joro, cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.l.liu, yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi,
	lulu, suravee.suthikulpanit, intel-gvt-dev, intel-gfx,
	linux-s390, yan.y.zhao, xudong.hao, terrence.xu

Meanwhile, rename related helpers. No functional change is intended.

Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
---
 virt/kvm/vfio.c | 115 ++++++++++++++++++++++++------------------------
 1 file changed, 58 insertions(+), 57 deletions(-)

diff --git a/virt/kvm/vfio.c b/virt/kvm/vfio.c
index 8bac308ba630..857d6ba349e1 100644
--- a/virt/kvm/vfio.c
+++ b/virt/kvm/vfio.c
@@ -21,7 +21,7 @@
 #include <asm/kvm_ppc.h>
 #endif
 
-struct kvm_vfio_group {
+struct kvm_vfio_file {
 	struct list_head node;
 	struct file *file;
 #ifdef CONFIG_SPAPR_TCE_IOMMU
@@ -30,7 +30,7 @@ struct kvm_vfio_group {
 };
 
 struct kvm_vfio {
-	struct list_head group_list;
+	struct list_head file_list;
 	struct mutex lock;
 	bool noncoherent;
 };
@@ -98,34 +98,35 @@ static struct iommu_group *kvm_vfio_file_iommu_group(struct file *file)
 }
 
 static void kvm_spapr_tce_release_vfio_group(struct kvm *kvm,
-					     struct kvm_vfio_group *kvg)
+					     struct kvm_vfio_file *kvf)
 {
-	if (WARN_ON_ONCE(!kvg->iommu_group))
+	if (WARN_ON_ONCE(!kvf->iommu_group))
 		return;
 
-	kvm_spapr_tce_release_iommu_group(kvm, kvg->iommu_group);
-	iommu_group_put(kvg->iommu_group);
-	kvg->iommu_group = NULL;
+	kvm_spapr_tce_release_iommu_group(kvm, kvf->iommu_group);
+	iommu_group_put(kvf->iommu_group);
+	kvf->iommu_group = NULL;
 }
 #endif
 
 /*
- * Groups can use the same or different IOMMU domains.  If the same then
- * adding a new group may change the coherency of groups we've previously
- * been told about.  We don't want to care about any of that so we retest
- * each group and bail as soon as we find one that's noncoherent.  This
- * means we only ever [un]register_noncoherent_dma once for the whole device.
+ * Groups/devices can use the same or different IOMMU domains. If the same
+ * then adding a new group/device may change the coherency of groups/devices
+ * we've previously been told about. We don't want to care about any of
+ * that so we retest each group/device and bail as soon as we find one that's
+ * noncoherent.  This means we only ever [un]register_noncoherent_dma once
+ * for the whole device.
  */
 static void kvm_vfio_update_coherency(struct kvm_device *dev)
 {
 	struct kvm_vfio *kv = dev->private;
 	bool noncoherent = false;
-	struct kvm_vfio_group *kvg;
+	struct kvm_vfio_file *kvf;
 
 	mutex_lock(&kv->lock);
 
-	list_for_each_entry(kvg, &kv->group_list, node) {
-		if (!kvm_vfio_file_enforced_coherent(kvg->file)) {
+	list_for_each_entry(kvf, &kv->file_list, node) {
+		if (!kvm_vfio_file_enforced_coherent(kvf->file)) {
 			noncoherent = true;
 			break;
 		}
@@ -143,10 +144,10 @@ static void kvm_vfio_update_coherency(struct kvm_device *dev)
 	mutex_unlock(&kv->lock);
 }
 
-static int kvm_vfio_group_add(struct kvm_device *dev, unsigned int fd)
+static int kvm_vfio_file_add(struct kvm_device *dev, unsigned int fd)
 {
 	struct kvm_vfio *kv = dev->private;
-	struct kvm_vfio_group *kvg;
+	struct kvm_vfio_file *kvf;
 	struct file *filp;
 	int ret;
 
@@ -162,27 +163,27 @@ static int kvm_vfio_group_add(struct kvm_device *dev, unsigned int fd)
 
 	mutex_lock(&kv->lock);
 
-	list_for_each_entry(kvg, &kv->group_list, node) {
-		if (kvg->file == filp) {
+	list_for_each_entry(kvf, &kv->file_list, node) {
+		if (kvf->file == filp) {
 			ret = -EEXIST;
 			goto err_unlock;
 		}
 	}
 
-	kvg = kzalloc(sizeof(*kvg), GFP_KERNEL_ACCOUNT);
-	if (!kvg) {
+	kvf = kzalloc(sizeof(*kvf), GFP_KERNEL_ACCOUNT);
+	if (!kvf) {
 		ret = -ENOMEM;
 		goto err_unlock;
 	}
 
-	kvg->file = filp;
-	list_add_tail(&kvg->node, &kv->group_list);
+	kvf->file = filp;
+	list_add_tail(&kvf->node, &kv->file_list);
 
 	kvm_arch_start_assignment(dev->kvm);
 
 	mutex_unlock(&kv->lock);
 
-	kvm_vfio_file_set_kvm(kvg->file, dev->kvm);
+	kvm_vfio_file_set_kvm(kvf->file, dev->kvm);
 	kvm_vfio_update_coherency(dev);
 
 	return 0;
@@ -193,10 +194,10 @@ static int kvm_vfio_group_add(struct kvm_device *dev, unsigned int fd)
 	return ret;
 }
 
-static int kvm_vfio_group_del(struct kvm_device *dev, unsigned int fd)
+static int kvm_vfio_file_del(struct kvm_device *dev, unsigned int fd)
 {
 	struct kvm_vfio *kv = dev->private;
-	struct kvm_vfio_group *kvg;
+	struct kvm_vfio_file *kvf;
 	struct fd f;
 	int ret;
 
@@ -208,18 +209,18 @@ static int kvm_vfio_group_del(struct kvm_device *dev, unsigned int fd)
 
 	mutex_lock(&kv->lock);
 
-	list_for_each_entry(kvg, &kv->group_list, node) {
-		if (kvg->file != f.file)
+	list_for_each_entry(kvf, &kv->file_list, node) {
+		if (kvf->file != f.file)
 			continue;
 
-		list_del(&kvg->node);
+		list_del(&kvf->node);
 		kvm_arch_end_assignment(dev->kvm);
 #ifdef CONFIG_SPAPR_TCE_IOMMU
-		kvm_spapr_tce_release_vfio_group(dev->kvm, kvg);
+		kvm_spapr_tce_release_vfio_group(dev->kvm, kvf);
 #endif
-		kvm_vfio_file_set_kvm(kvg->file, NULL);
-		fput(kvg->file);
-		kfree(kvg);
+		kvm_vfio_file_set_kvm(kvf->file, NULL);
+		fput(kvf->file);
+		kfree(kvf);
 		ret = 0;
 		break;
 	}
@@ -234,12 +235,12 @@ static int kvm_vfio_group_del(struct kvm_device *dev, unsigned int fd)
 }
 
 #ifdef CONFIG_SPAPR_TCE_IOMMU
-static int kvm_vfio_group_set_spapr_tce(struct kvm_device *dev,
-					void __user *arg)
+static int kvm_vfio_file_set_spapr_tce(struct kvm_device *dev,
+				       void __user *arg)
 {
 	struct kvm_vfio_spapr_tce param;
 	struct kvm_vfio *kv = dev->private;
-	struct kvm_vfio_group *kvg;
+	struct kvm_vfio_file *kvf;
 	struct fd f;
 	int ret;
 
@@ -254,20 +255,20 @@ static int kvm_vfio_group_set_spapr_tce(struct kvm_device *dev,
 
 	mutex_lock(&kv->lock);
 
-	list_for_each_entry(kvg, &kv->group_list, node) {
-		if (kvg->file != f.file)
+	list_for_each_entry(kvf, &kv->file_list, node) {
+		if (kvf->file != f.file)
 			continue;
 
-		if (!kvg->iommu_group) {
-			kvg->iommu_group = kvm_vfio_file_iommu_group(kvg->file);
-			if (WARN_ON_ONCE(!kvg->iommu_group)) {
+		if (!kvf->iommu_group) {
+			kvf->iommu_group = kvm_vfio_file_iommu_group(kvf->file);
+			if (WARN_ON_ONCE(!kvf->iommu_group)) {
 				ret = -EIO;
 				goto err_fdput;
 			}
 		}
 
 		ret = kvm_spapr_tce_attach_iommu_group(dev->kvm, param.tablefd,
-						       kvg->iommu_group);
+						       kvf->iommu_group);
 		break;
 	}
 
@@ -278,8 +279,8 @@ static int kvm_vfio_group_set_spapr_tce(struct kvm_device *dev,
 }
 #endif
 
-static int kvm_vfio_set_group(struct kvm_device *dev, long attr,
-			      void __user *arg)
+static int kvm_vfio_set_file(struct kvm_device *dev, long attr,
+			     void __user *arg)
 {
 	int32_t __user *argp = arg;
 	int32_t fd;
@@ -288,16 +289,16 @@ static int kvm_vfio_set_group(struct kvm_device *dev, long attr,
 	case KVM_DEV_VFIO_GROUP_ADD:
 		if (get_user(fd, argp))
 			return -EFAULT;
-		return kvm_vfio_group_add(dev, fd);
+		return kvm_vfio_file_add(dev, fd);
 
 	case KVM_DEV_VFIO_GROUP_DEL:
 		if (get_user(fd, argp))
 			return -EFAULT;
-		return kvm_vfio_group_del(dev, fd);
+		return kvm_vfio_file_del(dev, fd);
 
 #ifdef CONFIG_SPAPR_TCE_IOMMU
 	case KVM_DEV_VFIO_GROUP_SET_SPAPR_TCE:
-		return kvm_vfio_group_set_spapr_tce(dev, arg);
+		return kvm_vfio_file_set_spapr_tce(dev, arg);
 #endif
 	}
 
@@ -309,8 +310,8 @@ static int kvm_vfio_set_attr(struct kvm_device *dev,
 {
 	switch (attr->group) {
 	case KVM_DEV_VFIO_GROUP:
-		return kvm_vfio_set_group(dev, attr->attr,
-					  u64_to_user_ptr(attr->addr));
+		return kvm_vfio_set_file(dev, attr->attr,
+					 u64_to_user_ptr(attr->addr));
 	}
 
 	return -ENXIO;
@@ -339,16 +340,16 @@ static int kvm_vfio_has_attr(struct kvm_device *dev,
 static void kvm_vfio_release(struct kvm_device *dev)
 {
 	struct kvm_vfio *kv = dev->private;
-	struct kvm_vfio_group *kvg, *tmp;
+	struct kvm_vfio_file *kvf, *tmp;
 
-	list_for_each_entry_safe(kvg, tmp, &kv->group_list, node) {
+	list_for_each_entry_safe(kvf, tmp, &kv->file_list, node) {
 #ifdef CONFIG_SPAPR_TCE_IOMMU
-		kvm_spapr_tce_release_vfio_group(dev->kvm, kvg);
+		kvm_spapr_tce_release_vfio_group(dev->kvm, kvf);
 #endif
-		kvm_vfio_file_set_kvm(kvg->file, NULL);
-		fput(kvg->file);
-		list_del(&kvg->node);
-		kfree(kvg);
+		kvm_vfio_file_set_kvm(kvf->file, NULL);
+		fput(kvf->file);
+		list_del(&kvf->node);
+		kfree(kvf);
 		kvm_arch_end_assignment(dev->kvm);
 	}
 
@@ -382,7 +383,7 @@ static int kvm_vfio_create(struct kvm_device *dev, u32 type)
 	if (!kv)
 		return -ENOMEM;
 
-	INIT_LIST_HEAD(&kv->group_list);
+	INIT_LIST_HEAD(&kv->file_list);
 	mutex_init(&kv->lock);
 
 	dev->private = kv;
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH v4 05/19] kvm/vfio: Accept vfio device file from userspace
  2023-02-21  3:47 [PATCH v4 00/19] Add vfio_device cdev for iommufd support Yi Liu
                   ` (3 preceding siblings ...)
  2023-02-21  3:47 ` [PATCH v4 04/19] kvm/vfio: Rename kvm_vfio_group to prepare for accepting vfio device fd Yi Liu
@ 2023-02-21  3:47 ` Yi Liu
  2023-02-22  7:17   ` Tian, Kevin
  2023-02-21  3:47 ` [PATCH v4 06/19] vfio: Pass struct vfio_device_file * to vfio_device_open/close() Yi Liu
                   ` (13 subsequent siblings)
  18 siblings, 1 reply; 55+ messages in thread
From: Yi Liu @ 2023-02-21  3:47 UTC (permalink / raw)
  To: alex.williamson, jgg, kevin.tian
  Cc: joro, cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.l.liu, yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi,
	lulu, suravee.suthikulpanit, intel-gvt-dev, intel-gfx,
	linux-s390, yan.y.zhao, xudong.hao, terrence.xu

This defines KVM_DEV_VFIO_FILE* and make alias with KVM_DEV_VFIO_GROUP*.
Old userspace uses KVM_DEV_VFIO_GROUP* works as well.

Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 Documentation/virt/kvm/devices/vfio.rst | 50 ++++++++++++++++---------
 include/uapi/linux/kvm.h                | 16 ++++++--
 virt/kvm/vfio.c                         | 16 ++++----
 3 files changed, 53 insertions(+), 29 deletions(-)

diff --git a/Documentation/virt/kvm/devices/vfio.rst b/Documentation/virt/kvm/devices/vfio.rst
index 5722e283f1b5..50bf76fc24e5 100644
--- a/Documentation/virt/kvm/devices/vfio.rst
+++ b/Documentation/virt/kvm/devices/vfio.rst
@@ -9,24 +9,37 @@ Device types supported:
   - KVM_DEV_TYPE_VFIO
 
 Only one VFIO instance may be created per VM.  The created device
-tracks VFIO groups in use by the VM and features of those groups
-important to the correctness and acceleration of the VM.  As groups
-are enabled and disabled for use by the VM, KVM should be updated
-about their presence.  When registered with KVM, a reference to the
-VFIO-group is held by KVM.
+tracks VFIO files (group or device) in use by the VM and features
+of those groups/devices important to the correctness and acceleration
+of the VM.  As groups/devices are enabled and disabled for use by the
+VM, KVM should be updated about their presence.  When registered with
+KVM, a reference to the VFIO file is held by KVM.
 
 Groups:
-  KVM_DEV_VFIO_GROUP
-
-KVM_DEV_VFIO_GROUP attributes:
-  KVM_DEV_VFIO_GROUP_ADD: Add a VFIO group to VFIO-KVM device tracking
-	kvm_device_attr.addr points to an int32_t file descriptor
-	for the VFIO group.
-  KVM_DEV_VFIO_GROUP_DEL: Remove a VFIO group from VFIO-KVM device tracking
-	kvm_device_attr.addr points to an int32_t file descriptor
-	for the VFIO group.
-  KVM_DEV_VFIO_GROUP_SET_SPAPR_TCE: attaches a guest visible TCE table
+  KVM_DEV_VFIO_FILE
+	alias: KVM_DEV_VFIO_GROUP
+
+KVM_DEV_VFIO_FILE attributes:
+  KVM_DEV_VFIO_FILE_ADD: Add a VFIO file (group/device) to VFIO-KVM device
+	tracking
+
+	alias: KVM_DEV_VFIO_GROUP_ADD
+
+	kvm_device_attr.addr points to an int32_t file descriptor for the
+	VFIO file.
+  KVM_DEV_VFIO_FILE_DEL: Remove a VFIO file (group/device) from VFIO-KVM
+	device tracking
+
+	alias: KVM_DEV_VFIO_GROUP_DEL
+
+	kvm_device_attr.addr points to an int32_t file descriptor for the
+	VFIO file.
+
+  KVM_DEV_VFIO_FILE_SET_SPAPR_TCE: attaches a guest visible TCE table
 	allocated by sPAPR KVM.
+
+	alias: KVM_DEV_VFIO_GROUP_SET_SPAPR_TCE
+
 	kvm_device_attr.addr points to a struct::
 
 		struct kvm_vfio_spapr_tce {
@@ -40,7 +53,10 @@ KVM_DEV_VFIO_GROUP attributes:
 	- @tablefd is a file descriptor for a TCE table allocated via
 	  KVM_CREATE_SPAPR_TCE.
 
+	only accepts vfio group file as SPAPR has no iommufd support
+
 ::
 
-The GROUP_ADD operation above should be invoked before vfio_device's
-open_device op which is called in the ioctl VFIO_GROUP_GET_DEVICE_FD.
+The FILE/GROUP_ADD operation above should be invoked before vfio_device's
+open_device op which is called in the ioctl VFIO_GROUP_GET_DEVICE_FD
+or VFIO_DEVICE_BIND_IOMMUFD.
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 55155e262646..484a8133bc69 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1401,10 +1401,18 @@ struct kvm_device_attr {
 	__u64	addr;		/* userspace address of attr data */
 };
 
-#define  KVM_DEV_VFIO_GROUP			1
-#define   KVM_DEV_VFIO_GROUP_ADD			1
-#define   KVM_DEV_VFIO_GROUP_DEL			2
-#define   KVM_DEV_VFIO_GROUP_SET_SPAPR_TCE		3
+#define  KVM_DEV_VFIO_FILE	1
+
+#define   KVM_DEV_VFIO_FILE_ADD			1
+#define   KVM_DEV_VFIO_FILE_DEL			2
+#define   KVM_DEV_VFIO_FILE_SET_SPAPR_TCE	3
+
+/* KVM_DEV_VFIO_GROUP aliases are for compile time uapi compatibility */
+#define  KVM_DEV_VFIO_GROUP	KVM_DEV_VFIO_FILE
+
+#define   KVM_DEV_VFIO_GROUP_ADD	KVM_DEV_VFIO_FILE_ADD
+#define   KVM_DEV_VFIO_GROUP_DEL	KVM_DEV_VFIO_FILE_DEL
+#define   KVM_DEV_VFIO_GROUP_SET_SPAPR_TCE	KVM_DEV_VFIO_FILE_SET_SPAPR_TCE
 
 enum kvm_device_type {
 	KVM_DEV_TYPE_FSL_MPIC_20	= 1,
diff --git a/virt/kvm/vfio.c b/virt/kvm/vfio.c
index 857d6ba349e1..d869913baafd 100644
--- a/virt/kvm/vfio.c
+++ b/virt/kvm/vfio.c
@@ -286,18 +286,18 @@ static int kvm_vfio_set_file(struct kvm_device *dev, long attr,
 	int32_t fd;
 
 	switch (attr) {
-	case KVM_DEV_VFIO_GROUP_ADD:
+	case KVM_DEV_VFIO_FILE_ADD:
 		if (get_user(fd, argp))
 			return -EFAULT;
 		return kvm_vfio_file_add(dev, fd);
 
-	case KVM_DEV_VFIO_GROUP_DEL:
+	case KVM_DEV_VFIO_FILE_DEL:
 		if (get_user(fd, argp))
 			return -EFAULT;
 		return kvm_vfio_file_del(dev, fd);
 
 #ifdef CONFIG_SPAPR_TCE_IOMMU
-	case KVM_DEV_VFIO_GROUP_SET_SPAPR_TCE:
+	case KVM_DEV_VFIO_FILE_SET_SPAPR_TCE:
 		return kvm_vfio_file_set_spapr_tce(dev, arg);
 #endif
 	}
@@ -309,7 +309,7 @@ static int kvm_vfio_set_attr(struct kvm_device *dev,
 			     struct kvm_device_attr *attr)
 {
 	switch (attr->group) {
-	case KVM_DEV_VFIO_GROUP:
+	case KVM_DEV_VFIO_FILE:
 		return kvm_vfio_set_file(dev, attr->attr,
 					 u64_to_user_ptr(attr->addr));
 	}
@@ -321,12 +321,12 @@ static int kvm_vfio_has_attr(struct kvm_device *dev,
 			     struct kvm_device_attr *attr)
 {
 	switch (attr->group) {
-	case KVM_DEV_VFIO_GROUP:
+	case KVM_DEV_VFIO_FILE:
 		switch (attr->attr) {
-		case KVM_DEV_VFIO_GROUP_ADD:
-		case KVM_DEV_VFIO_GROUP_DEL:
+		case KVM_DEV_VFIO_FILE_ADD:
+		case KVM_DEV_VFIO_FILE_DEL:
 #ifdef CONFIG_SPAPR_TCE_IOMMU
-		case KVM_DEV_VFIO_GROUP_SET_SPAPR_TCE:
+		case KVM_DEV_VFIO_FILE_SET_SPAPR_TCE:
 #endif
 			return 0;
 		}
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH v4 06/19] vfio: Pass struct vfio_device_file * to vfio_device_open/close()
  2023-02-21  3:47 [PATCH v4 00/19] Add vfio_device cdev for iommufd support Yi Liu
                   ` (4 preceding siblings ...)
  2023-02-21  3:47 ` [PATCH v4 05/19] kvm/vfio: Accept vfio device file from userspace Yi Liu
@ 2023-02-21  3:47 ` Yi Liu
  2023-02-21  3:48 ` [PATCH v4 07/19] vfio: Block device access via device fd until device is opened Yi Liu
                   ` (12 subsequent siblings)
  18 siblings, 0 replies; 55+ messages in thread
From: Yi Liu @ 2023-02-21  3:47 UTC (permalink / raw)
  To: alex.williamson, jgg, kevin.tian
  Cc: joro, cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.l.liu, yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi,
	lulu, suravee.suthikulpanit, intel-gvt-dev, intel-gfx,
	linux-s390, yan.y.zhao, xudong.hao, terrence.xu

This avoids passing too much parameters in multiple functions.

Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
---
 drivers/vfio/group.c     | 19 +++++++++++++------
 drivers/vfio/vfio.h      |  8 ++++----
 drivers/vfio/vfio_main.c | 25 +++++++++++++++----------
 3 files changed, 32 insertions(+), 20 deletions(-)

diff --git a/drivers/vfio/group.c b/drivers/vfio/group.c
index cc0eded19a9f..2abf55c69281 100644
--- a/drivers/vfio/group.c
+++ b/drivers/vfio/group.c
@@ -166,8 +166,9 @@ static void vfio_device_group_get_kvm_safe(struct vfio_device *device)
 	spin_unlock(&device->group->kvm_ref_lock);
 }
 
-static int vfio_device_group_open(struct vfio_device *device)
+static int vfio_device_group_open(struct vfio_device_file *df)
 {
+	struct vfio_device *device = df->device;
 	int ret;
 
 	mutex_lock(&device->group->group_lock);
@@ -187,7 +188,11 @@ static int vfio_device_group_open(struct vfio_device *device)
 	if (device->open_count == 0)
 		vfio_device_group_get_kvm_safe(device);
 
-	ret = vfio_device_open(device, device->group->iommufd);
+	df->iommufd = device->group->iommufd;
+
+	ret = vfio_device_open(df);
+	if (ret)
+		df->iommufd = NULL;
 
 	if (device->open_count == 0)
 		vfio_device_put_kvm(device);
@@ -199,12 +204,14 @@ static int vfio_device_group_open(struct vfio_device *device)
 	return ret;
 }
 
-void vfio_device_group_close(struct vfio_device *device)
+void vfio_device_group_close(struct vfio_device_file *df)
 {
+	struct vfio_device *device = df->device;
+
 	mutex_lock(&device->group->group_lock);
 	mutex_lock(&device->dev_set->lock);
 
-	vfio_device_close(device, device->group->iommufd);
+	vfio_device_close(df);
 
 	if (device->open_count == 0)
 		vfio_device_put_kvm(device);
@@ -225,7 +232,7 @@ static struct file *vfio_device_open_file(struct vfio_device *device)
 		goto err_out;
 	}
 
-	ret = vfio_device_group_open(device);
+	ret = vfio_device_group_open(df);
 	if (ret)
 		goto err_free;
 
@@ -257,7 +264,7 @@ static struct file *vfio_device_open_file(struct vfio_device *device)
 	return filep;
 
 err_close_device:
-	vfio_device_group_close(device);
+	vfio_device_group_close(df);
 err_free:
 	kfree(df);
 err_out:
diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
index cee979a1b90f..11e56fe079a1 100644
--- a/drivers/vfio/vfio.h
+++ b/drivers/vfio/vfio.h
@@ -20,13 +20,13 @@ struct vfio_device_file {
 	struct vfio_device *device;
 	spinlock_t kvm_ref_lock; /* protect kvm field */
 	struct kvm *kvm;
+	struct iommufd_ctx *iommufd; /* protected by struct vfio_device_set::lock */
 };
 
 void vfio_device_put_registration(struct vfio_device *device);
 bool vfio_device_try_get_registration(struct vfio_device *device);
-int vfio_device_open(struct vfio_device *device, struct iommufd_ctx *iommufd);
-void vfio_device_close(struct vfio_device *device,
-		       struct iommufd_ctx *iommufd);
+int vfio_device_open(struct vfio_device_file *df);
+void vfio_device_close(struct vfio_device_file *df);
 struct vfio_device_file *
 vfio_allocate_device_file(struct vfio_device *device);
 
@@ -91,7 +91,7 @@ void vfio_device_group_register(struct vfio_device *device);
 void vfio_device_group_unregister(struct vfio_device *device);
 int vfio_device_group_use_iommu(struct vfio_device *device);
 void vfio_device_group_unuse_iommu(struct vfio_device *device);
-void vfio_device_group_close(struct vfio_device *device);
+void vfio_device_group_close(struct vfio_device_file *df);
 struct vfio_group *vfio_group_from_file(struct file *file);
 bool vfio_group_enforced_coherent(struct vfio_group *group);
 void vfio_group_set_kvm(struct vfio_group *group, struct kvm *kvm);
diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
index 28db0a563805..ea507a61e3b7 100644
--- a/drivers/vfio/vfio_main.c
+++ b/drivers/vfio/vfio_main.c
@@ -411,9 +411,10 @@ vfio_allocate_device_file(struct vfio_device *device)
 	return df;
 }
 
-static int vfio_device_first_open(struct vfio_device *device,
-				  struct iommufd_ctx *iommufd)
+static int vfio_device_first_open(struct vfio_device_file *df)
 {
+	struct vfio_device *device = df->device;
+	struct iommufd_ctx *iommufd = df->iommufd;
 	int ret;
 
 	lockdep_assert_held(&device->dev_set->lock);
@@ -445,9 +446,11 @@ static int vfio_device_first_open(struct vfio_device *device,
 	return ret;
 }
 
-static void vfio_device_last_close(struct vfio_device *device,
-				   struct iommufd_ctx *iommufd)
+static void vfio_device_last_close(struct vfio_device_file *df)
 {
+	struct vfio_device *device = df->device;
+	struct iommufd_ctx *iommufd = df->iommufd;
+
 	lockdep_assert_held(&device->dev_set->lock);
 
 	if (device->ops->close_device)
@@ -459,15 +462,16 @@ static void vfio_device_last_close(struct vfio_device *device,
 	module_put(device->dev->driver->owner);
 }
 
-int vfio_device_open(struct vfio_device *device, struct iommufd_ctx *iommufd)
+int vfio_device_open(struct vfio_device_file *df)
 {
+	struct vfio_device *device = df->device;
 	int ret = 0;
 
 	lockdep_assert_held(&device->dev_set->lock);
 
 	device->open_count++;
 	if (device->open_count == 1) {
-		ret = vfio_device_first_open(device, iommufd);
+		ret = vfio_device_first_open(df);
 		if (ret)
 			device->open_count--;
 	}
@@ -475,14 +479,15 @@ int vfio_device_open(struct vfio_device *device, struct iommufd_ctx *iommufd)
 	return ret;
 }
 
-void vfio_device_close(struct vfio_device *device,
-		       struct iommufd_ctx *iommufd)
+void vfio_device_close(struct vfio_device_file *df)
 {
+	struct vfio_device *device = df->device;
+
 	lockdep_assert_held(&device->dev_set->lock);
 
 	vfio_assert_device_open(device);
 	if (device->open_count == 1)
-		vfio_device_last_close(device, iommufd);
+		vfio_device_last_close(df);
 	device->open_count--;
 }
 
@@ -527,7 +532,7 @@ static int vfio_device_fops_release(struct inode *inode, struct file *filep)
 	struct vfio_device_file *df = filep->private_data;
 	struct vfio_device *device = df->device;
 
-	vfio_device_group_close(device);
+	vfio_device_group_close(df);
 
 	vfio_device_put_registration(device);
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH v4 07/19] vfio: Block device access via device fd until device is opened
  2023-02-21  3:47 [PATCH v4 00/19] Add vfio_device cdev for iommufd support Yi Liu
                   ` (5 preceding siblings ...)
  2023-02-21  3:47 ` [PATCH v4 06/19] vfio: Pass struct vfio_device_file * to vfio_device_open/close() Yi Liu
@ 2023-02-21  3:48 ` Yi Liu
  2023-02-22  7:55   ` Yan Zhao
  2023-02-21  3:48 ` [PATCH v4 08/19] vfio/pci: Update comment around group_fd get in vfio_pci_ioctl_pci_hot_reset() Yi Liu
                   ` (11 subsequent siblings)
  18 siblings, 1 reply; 55+ messages in thread
From: Yi Liu @ 2023-02-21  3:48 UTC (permalink / raw)
  To: alex.williamson, jgg, kevin.tian
  Cc: joro, cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.l.liu, yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi,
	lulu, suravee.suthikulpanit, intel-gvt-dev, intel-gfx,
	linux-s390, yan.y.zhao, xudong.hao, terrence.xu

Allow the vfio_device file to be in a state where the device FD is
opened but the device cannot be used by userspace (i.e. its .open_device()
hasn't been called). This inbetween state is not used when the device
FD is spawned from the group FD, however when we create the device FD
directly by opening a cdev it will be opened in the blocked state.

The reason for the inbetween state is that userspace only gets a FD but
doesn't gain access permission until binding the FD to an iommufd. So in
the blocked state, only the bind operation is allowed. Completing bind
will allow user to further access the device.

This is implemented by adding a flag in struct vfio_device_file to mark
the blocked state and using a simple smp_load_acquire() to obtain the
flag value and serialize all the device setup with the thread accessing
this device.

Following this lockless scheme, it can safely handle the device FD
unbound->bound but it cannot handle bound->unbound. To allow this we'd
need to add a lock on all the vfio ioctls which seems costly. So once
device FD is bound, it remains bound until the FD is closed.

Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
---
 drivers/vfio/group.c     |  6 ++++++
 drivers/vfio/vfio.h      |  1 +
 drivers/vfio/vfio_main.c | 16 ++++++++++++++++
 3 files changed, 23 insertions(+)

diff --git a/drivers/vfio/group.c b/drivers/vfio/group.c
index 2abf55c69281..14e29525e354 100644
--- a/drivers/vfio/group.c
+++ b/drivers/vfio/group.c
@@ -197,6 +197,12 @@ static int vfio_device_group_open(struct vfio_device_file *df)
 	if (device->open_count == 0)
 		vfio_device_put_kvm(device);
 
+	/*
+	 * Paired with smp_load_acquire() in vfio_device_fops::ioctl/
+	 * read/write/mmap
+	 */
+	smp_store_release(&df->access_granted, true);
+
 	mutex_unlock(&device->dev_set->lock);
 
 out_unlock:
diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
index 11e56fe079a1..d56cdb114024 100644
--- a/drivers/vfio/vfio.h
+++ b/drivers/vfio/vfio.h
@@ -18,6 +18,7 @@ struct vfio_container;
 
 struct vfio_device_file {
 	struct vfio_device *device;
+	bool access_granted;
 	spinlock_t kvm_ref_lock; /* protect kvm field */
 	struct kvm *kvm;
 	struct iommufd_ctx *iommufd; /* protected by struct vfio_device_set::lock */
diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
index ea507a61e3b7..91c8f25393db 100644
--- a/drivers/vfio/vfio_main.c
+++ b/drivers/vfio/vfio_main.c
@@ -1106,6 +1106,10 @@ static long vfio_device_fops_unl_ioctl(struct file *filep,
 	struct vfio_device *device = df->device;
 	int ret;
 
+	/* Paired with smp_store_release() in vfio_device_open() */
+	if (!smp_load_acquire(&df->access_granted))
+		return -EINVAL;
+
 	ret = vfio_device_pm_runtime_get(device);
 	if (ret)
 		return ret;
@@ -1133,6 +1137,10 @@ static ssize_t vfio_device_fops_read(struct file *filep, char __user *buf,
 	struct vfio_device_file *df = filep->private_data;
 	struct vfio_device *device = df->device;
 
+	/* Paired with smp_store_release() in vfio_device_open() */
+	if (!smp_load_acquire(&df->access_granted))
+		return -EINVAL;
+
 	if (unlikely(!device->ops->read))
 		return -EINVAL;
 
@@ -1146,6 +1154,10 @@ static ssize_t vfio_device_fops_write(struct file *filep,
 	struct vfio_device_file *df = filep->private_data;
 	struct vfio_device *device = df->device;
 
+	/* Paired with smp_store_release() in vfio_device_open() */
+	if (!smp_load_acquire(&df->access_granted))
+		return -EINVAL;
+
 	if (unlikely(!device->ops->write))
 		return -EINVAL;
 
@@ -1157,6 +1169,10 @@ static int vfio_device_fops_mmap(struct file *filep, struct vm_area_struct *vma)
 	struct vfio_device_file *df = filep->private_data;
 	struct vfio_device *device = df->device;
 
+	/* Paired with smp_store_release() in vfio_device_open() */
+	if (!smp_load_acquire(&df->access_granted))
+		return -EINVAL;
+
 	if (unlikely(!device->ops->mmap))
 		return -EINVAL;
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH v4 08/19] vfio/pci: Update comment around group_fd get in vfio_pci_ioctl_pci_hot_reset()
  2023-02-21  3:47 [PATCH v4 00/19] Add vfio_device cdev for iommufd support Yi Liu
                   ` (6 preceding siblings ...)
  2023-02-21  3:48 ` [PATCH v4 07/19] vfio: Block device access via device fd until device is opened Yi Liu
@ 2023-02-21  3:48 ` Yi Liu
  2023-02-22  7:20   ` Tian, Kevin
  2023-02-21  3:48 ` [PATCH v4 09/19] vfio/pci: Accept device fd for hot reset Yi Liu
                   ` (10 subsequent siblings)
  18 siblings, 1 reply; 55+ messages in thread
From: Yi Liu @ 2023-02-21  3:48 UTC (permalink / raw)
  To: alex.williamson, jgg, kevin.tian
  Cc: joro, cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.l.liu, yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi,
	lulu, suravee.suthikulpanit, intel-gvt-dev, intel-gfx,
	linux-s390, yan.y.zhao, xudong.hao, terrence.xu

this suits more on what the code does.

Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 drivers/vfio/pci/vfio_pci_core.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
index 4704c1babae3..827524510f3f 100644
--- a/drivers/vfio/pci/vfio_pci_core.c
+++ b/drivers/vfio/pci/vfio_pci_core.c
@@ -1308,8 +1308,7 @@ static int vfio_pci_ioctl_pci_hot_reset(struct vfio_pci_core_device *vdev,
 	}
 
 	/*
-	 * For each group_fd, get the group through the vfio external user
-	 * interface and store the group and iommu ID.  This ensures the group
+	 * For each group_fd, get the group file, this ensures the group
 	 * is held across the reset.
 	 */
 	for (file_idx = 0; file_idx < hdr.count; file_idx++) {
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH v4 09/19] vfio/pci: Accept device fd for hot reset
  2023-02-21  3:47 [PATCH v4 00/19] Add vfio_device cdev for iommufd support Yi Liu
                   ` (7 preceding siblings ...)
  2023-02-21  3:48 ` [PATCH v4 08/19] vfio/pci: Update comment around group_fd get in vfio_pci_ioctl_pci_hot_reset() Yi Liu
@ 2023-02-21  3:48 ` Yi Liu
  2023-02-22  7:26   ` Tian, Kevin
  2023-02-21  3:48 ` [PATCH v4 10/19] vfio: Add infrastructure for bind_iommufd from userspace Yi Liu
                   ` (9 subsequent siblings)
  18 siblings, 1 reply; 55+ messages in thread
From: Yi Liu @ 2023-02-21  3:48 UTC (permalink / raw)
  To: alex.williamson, jgg, kevin.tian
  Cc: joro, cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.l.liu, yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi,
	lulu, suravee.suthikulpanit, intel-gvt-dev, intel-gfx,
	linux-s390, yan.y.zhao, xudong.hao, terrence.xu

This prepares for using vfio device cdev as no group fd will be opened
in device cdev usage.

vfio_file_is_device_opened() is added for checking a given vfio file is
able to be a proof for the device ownership or not. The reason is that
the cdev path has the device opened in an in-between state, in which the
device is not fully opened. But to be proof of ownership, device should
be fully opened.

This also updates some comments as it also accepts device fd passed by
user. The uapi has no change, but user can specify a set of device fds
in the struct vfio_pci_hot_reset::group_fds field.

Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 drivers/vfio/pci/vfio_pci_core.c | 11 ++++++-----
 drivers/vfio/vfio_main.c         | 19 +++++++++++++++++++
 include/linux/vfio.h             |  1 +
 3 files changed, 26 insertions(+), 5 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
index 827524510f3f..09086fefd515 100644
--- a/drivers/vfio/pci/vfio_pci_core.c
+++ b/drivers/vfio/pci/vfio_pci_core.c
@@ -1280,8 +1280,9 @@ static int vfio_pci_ioctl_pci_hot_reset(struct vfio_pci_core_device *vdev,
 
 	/*
 	 * We can't let userspace give us an arbitrarily large buffer to copy,
-	 * so verify how many we think there could be.  Note groups can have
-	 * multiple devices so one group per device is the max.
+	 * so verify how many we think there could be.  Note user may provide
+	 * a set of groups, group can have multiple devices so one group per
+	 * device is the max.
 	 */
 	ret = vfio_pci_for_each_slot_or_bus(vdev->pdev, vfio_pci_count_devs,
 					    &count, slot);
@@ -1308,7 +1309,7 @@ static int vfio_pci_ioctl_pci_hot_reset(struct vfio_pci_core_device *vdev,
 	}
 
 	/*
-	 * For each group_fd, get the group file, this ensures the group
+	 * For each fd, get the file, this ensures the group or device
 	 * is held across the reset.
 	 */
 	for (file_idx = 0; file_idx < hdr.count; file_idx++) {
@@ -1320,7 +1321,7 @@ static int vfio_pci_ioctl_pci_hot_reset(struct vfio_pci_core_device *vdev,
 		}
 
 		/* Ensure the FD is a vfio FD.*/
-		if (!vfio_file_is_valid(file)) {
+		if (!vfio_file_is_device_opened(file)) {
 			fput(file);
 			ret = -EINVAL;
 			break;
@@ -2430,7 +2431,7 @@ static int vfio_pci_dev_set_hot_reset(struct vfio_device_set *dev_set,
 	list_for_each_entry(cur_vma, &dev_set->device_list, vdev.dev_set_list) {
 		/*
 		 * Test whether all the affected devices are contained by the
-		 * set of groups provided by the user.
+		 * set of files provided by the user.
 		 */
 		if (!vfio_dev_in_groups(cur_vma, groups)) {
 			ret = -EINVAL;
diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
index 91c8f25393db..2c851e172586 100644
--- a/drivers/vfio/vfio_main.c
+++ b/drivers/vfio/vfio_main.c
@@ -1209,6 +1209,25 @@ bool vfio_file_is_valid(struct file *file)
 }
 EXPORT_SYMBOL_GPL(vfio_file_is_valid);
 
+/**
+ * vfio_file_is_device_opened - True if the file is fully opened
+ * @file: VFIO group file or VFIO device file
+ */
+bool vfio_file_is_device_opened(struct file *file)
+{
+	struct vfio_device *device;
+
+	if (vfio_group_from_file(file))
+		return true;
+
+	device = vfio_device_from_file(file);
+	if (device)
+		return READ_ONCE(device->open_count);
+
+	return false;
+}
+EXPORT_SYMBOL_GPL(vfio_file_is_device_opened);
+
 /**
  * vfio_file_enforced_coherent - True if the DMA associated with the VFIO file
  *        is always CPU cache coherent
diff --git a/include/linux/vfio.h b/include/linux/vfio.h
index 6a07e1c6c38e..615f8a081a41 100644
--- a/include/linux/vfio.h
+++ b/include/linux/vfio.h
@@ -249,6 +249,7 @@ bool vfio_file_is_valid(struct file *file);
 bool vfio_file_enforced_coherent(struct file *file);
 void vfio_file_set_kvm(struct file *file, struct kvm *kvm);
 bool vfio_file_has_dev(struct file *file, struct vfio_device *device);
+bool vfio_file_is_device_opened(struct file *file);
 
 #define VFIO_PIN_PAGES_MAX_ENTRIES	(PAGE_SIZE/sizeof(unsigned long))
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH v4 10/19] vfio: Add infrastructure for bind_iommufd from userspace
  2023-02-21  3:47 [PATCH v4 00/19] Add vfio_device cdev for iommufd support Yi Liu
                   ` (8 preceding siblings ...)
  2023-02-21  3:48 ` [PATCH v4 09/19] vfio/pci: Accept device fd for hot reset Yi Liu
@ 2023-02-21  3:48 ` Yi Liu
  2023-02-21  3:48 ` [PATCH v4 11/19] vfio-iommufd: Add detach_ioas support for physical VFIO devices Yi Liu
                   ` (8 subsequent siblings)
  18 siblings, 0 replies; 55+ messages in thread
From: Yi Liu @ 2023-02-21  3:48 UTC (permalink / raw)
  To: alex.williamson, jgg, kevin.tian
  Cc: joro, cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.l.liu, yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi,
	lulu, suravee.suthikulpanit, intel-gvt-dev, intel-gfx,
	linux-s390, yan.y.zhao, xudong.hao, terrence.xu

For the device fd opened from cdev, userspace needs to bind it to an
iommufd and attach it to IOAS managed by iommufd. With such operations,
userspace can set up a secure DMA context and hence access device.

This changes the existing vfio_iommufd_bind() to accept a pt_id pointer
as an optional input, and also an dev_id pointer to selectively return
the dev_id to prepare for adding bind_iommufd ioctl, which does the bind
first and then attach IOAS.

Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
---
 drivers/vfio/group.c     | 17 ++++++++++++++---
 drivers/vfio/iommufd.c   | 21 +++++++++------------
 drivers/vfio/vfio.h      |  9 ++++++---
 drivers/vfio/vfio_main.c | 10 ++++++----
 4 files changed, 35 insertions(+), 22 deletions(-)

diff --git a/drivers/vfio/group.c b/drivers/vfio/group.c
index 14e29525e354..77559e035078 100644
--- a/drivers/vfio/group.c
+++ b/drivers/vfio/group.c
@@ -169,6 +169,7 @@ static void vfio_device_group_get_kvm_safe(struct vfio_device *device)
 static int vfio_device_group_open(struct vfio_device_file *df)
 {
 	struct vfio_device *device = df->device;
+	u32 ioas_id;
 	int ret;
 
 	mutex_lock(&device->group->group_lock);
@@ -177,6 +178,13 @@ static int vfio_device_group_open(struct vfio_device_file *df)
 		goto out_unlock;
 	}
 
+	if (device->group->iommufd) {
+		ret = iommufd_vfio_compat_ioas_id(device->group->iommufd,
+						  &ioas_id);
+		if (ret)
+			goto out_unlock;
+	}
+
 	mutex_lock(&device->dev_set->lock);
 
 	/*
@@ -188,9 +196,12 @@ static int vfio_device_group_open(struct vfio_device_file *df)
 	if (device->open_count == 0)
 		vfio_device_group_get_kvm_safe(device);
 
-	df->iommufd = device->group->iommufd;
-
-	ret = vfio_device_open(df);
+	if (device->group->iommufd) {
+		df->iommufd = device->group->iommufd;
+		ret = vfio_device_open(df, NULL, &ioas_id);
+	} else {
+		ret = vfio_device_open(df, NULL, NULL);
+	}
 	if (ret)
 		df->iommufd = NULL;
 
diff --git a/drivers/vfio/iommufd.c b/drivers/vfio/iommufd.c
index 4f82a6fa7c6c..beef6ca21107 100644
--- a/drivers/vfio/iommufd.c
+++ b/drivers/vfio/iommufd.c
@@ -10,9 +10,9 @@
 MODULE_IMPORT_NS(IOMMUFD);
 MODULE_IMPORT_NS(IOMMUFD_VFIO);
 
-int vfio_iommufd_bind(struct vfio_device *vdev, struct iommufd_ctx *ictx)
+int vfio_iommufd_bind(struct vfio_device *vdev, struct iommufd_ctx *ictx,
+		      u32 *dev_id, u32 *pt_id)
 {
-	u32 ioas_id;
 	u32 device_id;
 	int ret;
 
@@ -29,17 +29,14 @@ int vfio_iommufd_bind(struct vfio_device *vdev, struct iommufd_ctx *ictx)
 	if (ret)
 		return ret;
 
-	ret = iommufd_vfio_compat_ioas_id(ictx, &ioas_id);
-	if (ret)
-		goto err_unbind;
-	ret = vdev->ops->attach_ioas(vdev, &ioas_id);
-	if (ret)
-		goto err_unbind;
+	if (pt_id) {
+		ret = vdev->ops->attach_ioas(vdev, pt_id);
+		if (ret)
+			goto err_unbind;
+	}
 
-	/*
-	 * The legacy path has no way to return the device id or the selected
-	 * pt_id
-	 */
+	if (dev_id)
+		*dev_id = device_id;
 	return 0;
 
 err_unbind:
diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
index d56cdb114024..6f063e31d08a 100644
--- a/drivers/vfio/vfio.h
+++ b/drivers/vfio/vfio.h
@@ -26,7 +26,8 @@ struct vfio_device_file {
 
 void vfio_device_put_registration(struct vfio_device *device);
 bool vfio_device_try_get_registration(struct vfio_device *device);
-int vfio_device_open(struct vfio_device_file *df);
+int vfio_device_open(struct vfio_device_file *df,
+		     u32 *dev_id, u32 *pt_id);
 void vfio_device_close(struct vfio_device_file *df);
 struct vfio_device_file *
 vfio_allocate_device_file(struct vfio_device *device);
@@ -224,11 +225,13 @@ static inline void vfio_container_cleanup(void)
 #endif
 
 #if IS_ENABLED(CONFIG_IOMMUFD)
-int vfio_iommufd_bind(struct vfio_device *device, struct iommufd_ctx *ictx);
+int vfio_iommufd_bind(struct vfio_device *device, struct iommufd_ctx *ictx,
+		      u32 *dev_id, u32 *pt_id);
 void vfio_iommufd_unbind(struct vfio_device *device);
 #else
 static inline int vfio_iommufd_bind(struct vfio_device *device,
-				    struct iommufd_ctx *ictx)
+				    struct iommufd_ctx *ictx,
+				    u32 *dev_id, u32 *pt_id)
 {
 	return -EOPNOTSUPP;
 }
diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
index 2c851e172586..fd5b4dfa9615 100644
--- a/drivers/vfio/vfio_main.c
+++ b/drivers/vfio/vfio_main.c
@@ -411,7 +411,8 @@ vfio_allocate_device_file(struct vfio_device *device)
 	return df;
 }
 
-static int vfio_device_first_open(struct vfio_device_file *df)
+static int vfio_device_first_open(struct vfio_device_file *df,
+				  u32 *dev_id, u32 *pt_id)
 {
 	struct vfio_device *device = df->device;
 	struct iommufd_ctx *iommufd = df->iommufd;
@@ -423,7 +424,7 @@ static int vfio_device_first_open(struct vfio_device_file *df)
 		return -ENODEV;
 
 	if (iommufd)
-		ret = vfio_iommufd_bind(device, iommufd);
+		ret = vfio_iommufd_bind(device, iommufd, dev_id, pt_id);
 	else
 		ret = vfio_device_group_use_iommu(device);
 	if (ret)
@@ -462,7 +463,8 @@ static void vfio_device_last_close(struct vfio_device_file *df)
 	module_put(device->dev->driver->owner);
 }
 
-int vfio_device_open(struct vfio_device_file *df)
+int vfio_device_open(struct vfio_device_file *df,
+		     u32 *dev_id, u32 *pt_id)
 {
 	struct vfio_device *device = df->device;
 	int ret = 0;
@@ -471,7 +473,7 @@ int vfio_device_open(struct vfio_device_file *df)
 
 	device->open_count++;
 	if (device->open_count == 1) {
-		ret = vfio_device_first_open(df);
+		ret = vfio_device_first_open(df, dev_id, pt_id);
 		if (ret)
 			device->open_count--;
 	}
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH v4 11/19] vfio-iommufd: Add detach_ioas support for physical VFIO devices
  2023-02-21  3:47 [PATCH v4 00/19] Add vfio_device cdev for iommufd support Yi Liu
                   ` (9 preceding siblings ...)
  2023-02-21  3:48 ` [PATCH v4 10/19] vfio: Add infrastructure for bind_iommufd from userspace Yi Liu
@ 2023-02-21  3:48 ` Yi Liu
  2023-02-21  3:48 ` [PATCH v4 12/19] vfio-iommufd: Add detach_ioas for emulated " Yi Liu
                   ` (7 subsequent siblings)
  18 siblings, 0 replies; 55+ messages in thread
From: Yi Liu @ 2023-02-21  3:48 UTC (permalink / raw)
  To: alex.williamson, jgg, kevin.tian
  Cc: joro, cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.l.liu, yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi,
	lulu, suravee.suthikulpanit, intel-gvt-dev, intel-gfx,
	linux-s390, yan.y.zhao, xudong.hao, terrence.xu

this prepares for adding DETACH ioctl for physical VFIO devices.

Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
---
 Documentation/driver-api/vfio.rst             |  8 +++++---
 drivers/vfio/fsl-mc/vfio_fsl_mc.c             |  1 +
 drivers/vfio/iommufd.c                        | 20 +++++++++++++++++++
 .../vfio/pci/hisilicon/hisi_acc_vfio_pci.c    |  2 ++
 drivers/vfio/pci/mlx5/main.c                  |  1 +
 drivers/vfio/pci/vfio_pci.c                   |  1 +
 drivers/vfio/platform/vfio_amba.c             |  1 +
 drivers/vfio/platform/vfio_platform.c         |  1 +
 drivers/vfio/vfio_main.c                      |  3 ++-
 include/linux/vfio.h                          |  8 +++++++-
 10 files changed, 41 insertions(+), 5 deletions(-)

diff --git a/Documentation/driver-api/vfio.rst b/Documentation/driver-api/vfio.rst
index 50b690f7f663..44527420f20d 100644
--- a/Documentation/driver-api/vfio.rst
+++ b/Documentation/driver-api/vfio.rst
@@ -279,6 +279,7 @@ similar to a file operations structure::
 					struct iommufd_ctx *ictx, u32 *out_device_id);
 		void	(*unbind_iommufd)(struct vfio_device *vdev);
 		int	(*attach_ioas)(struct vfio_device *vdev, u32 *pt_id);
+		void	(*detach_ioas)(struct vfio_device *vdev);
 		int	(*open_device)(struct vfio_device *vdev);
 		void	(*close_device)(struct vfio_device *vdev);
 		ssize_t	(*read)(struct vfio_device *vdev, char __user *buf,
@@ -315,9 +316,10 @@ container_of().
 	- The [un]bind_iommufd callbacks are issued when the device is bound to
 	  and unbound from iommufd.
 
-	- The attach_ioas callback is issued when the device is attached to an
-	  IOAS managed by the bound iommufd. The attached IOAS is automatically
-	  detached when the device is unbound from iommufd.
+	- The [de]attach_ioas callback is issued when the device is attached to
+	  and detached from an IOAS managed by the bound iommufd. However, the
+	  attached IOAS can also be automatically detached when the device is
+	  unbound from iommufd.
 
 	- The read/write/mmap callbacks implement the device region access defined
 	  by the device's own VFIO_DEVICE_GET_REGION_INFO ioctl.
diff --git a/drivers/vfio/fsl-mc/vfio_fsl_mc.c b/drivers/vfio/fsl-mc/vfio_fsl_mc.c
index c89a047a4cd8..d540cf683d93 100644
--- a/drivers/vfio/fsl-mc/vfio_fsl_mc.c
+++ b/drivers/vfio/fsl-mc/vfio_fsl_mc.c
@@ -594,6 +594,7 @@ static const struct vfio_device_ops vfio_fsl_mc_ops = {
 	.bind_iommufd	= vfio_iommufd_physical_bind,
 	.unbind_iommufd	= vfio_iommufd_physical_unbind,
 	.attach_ioas	= vfio_iommufd_physical_attach_ioas,
+	.detach_ioas	= vfio_iommufd_physical_detach_ioas,
 };
 
 static struct fsl_mc_driver vfio_fsl_mc_driver = {
diff --git a/drivers/vfio/iommufd.c b/drivers/vfio/iommufd.c
index beef6ca21107..bfaa9876499b 100644
--- a/drivers/vfio/iommufd.c
+++ b/drivers/vfio/iommufd.c
@@ -88,6 +88,14 @@ int vfio_iommufd_physical_attach_ioas(struct vfio_device *vdev, u32 *pt_id)
 {
 	int rc;
 
+	lockdep_assert_held(&vdev->dev_set->lock);
+
+	if (!vdev->iommufd_device)
+		return -EINVAL;
+
+	if (vdev->iommufd_attached)
+		return -EBUSY;
+
 	rc = iommufd_device_attach(vdev->iommufd_device, pt_id);
 	if (rc)
 		return rc;
@@ -96,6 +104,18 @@ int vfio_iommufd_physical_attach_ioas(struct vfio_device *vdev, u32 *pt_id)
 }
 EXPORT_SYMBOL_GPL(vfio_iommufd_physical_attach_ioas);
 
+void vfio_iommufd_physical_detach_ioas(struct vfio_device *vdev)
+{
+	lockdep_assert_held(&vdev->dev_set->lock);
+
+	if (!vdev->iommufd_device || !vdev->iommufd_attached)
+		return;
+
+	iommufd_device_detach(vdev->iommufd_device);
+	vdev->iommufd_attached = false;
+}
+EXPORT_SYMBOL_GPL(vfio_iommufd_physical_detach_ioas);
+
 /*
  * The emulated standard ops mean that vfio_device is going to use the
  * "mdev path" and will call vfio_pin_pages()/vfio_dma_rw(). Drivers using this
diff --git a/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c b/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
index a117eaf21c14..b2f9778c8366 100644
--- a/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
+++ b/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
@@ -1373,6 +1373,7 @@ static const struct vfio_device_ops hisi_acc_vfio_pci_migrn_ops = {
 	.bind_iommufd = vfio_iommufd_physical_bind,
 	.unbind_iommufd = vfio_iommufd_physical_unbind,
 	.attach_ioas = vfio_iommufd_physical_attach_ioas,
+	.detach_ioas = vfio_iommufd_physical_detach_ioas,
 };
 
 static const struct vfio_device_ops hisi_acc_vfio_pci_ops = {
@@ -1391,6 +1392,7 @@ static const struct vfio_device_ops hisi_acc_vfio_pci_ops = {
 	.bind_iommufd = vfio_iommufd_physical_bind,
 	.unbind_iommufd = vfio_iommufd_physical_unbind,
 	.attach_ioas = vfio_iommufd_physical_attach_ioas,
+	.detach_ioas = vfio_iommufd_physical_detach_ioas,
 };
 
 static int hisi_acc_vfio_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
diff --git a/drivers/vfio/pci/mlx5/main.c b/drivers/vfio/pci/mlx5/main.c
index e897537a9e8a..6fc3410989eb 100644
--- a/drivers/vfio/pci/mlx5/main.c
+++ b/drivers/vfio/pci/mlx5/main.c
@@ -1326,6 +1326,7 @@ static const struct vfio_device_ops mlx5vf_pci_ops = {
 	.bind_iommufd = vfio_iommufd_physical_bind,
 	.unbind_iommufd = vfio_iommufd_physical_unbind,
 	.attach_ioas = vfio_iommufd_physical_attach_ioas,
+	.detach_ioas = vfio_iommufd_physical_detach_ioas,
 };
 
 static int mlx5vf_pci_probe(struct pci_dev *pdev,
diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
index 29091ee2e984..cb5b7f865d58 100644
--- a/drivers/vfio/pci/vfio_pci.c
+++ b/drivers/vfio/pci/vfio_pci.c
@@ -141,6 +141,7 @@ static const struct vfio_device_ops vfio_pci_ops = {
 	.bind_iommufd	= vfio_iommufd_physical_bind,
 	.unbind_iommufd	= vfio_iommufd_physical_unbind,
 	.attach_ioas	= vfio_iommufd_physical_attach_ioas,
+	.detach_ioas	= vfio_iommufd_physical_detach_ioas,
 };
 
 static int vfio_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
diff --git a/drivers/vfio/platform/vfio_amba.c b/drivers/vfio/platform/vfio_amba.c
index 83fe54015595..6464b3939ebc 100644
--- a/drivers/vfio/platform/vfio_amba.c
+++ b/drivers/vfio/platform/vfio_amba.c
@@ -119,6 +119,7 @@ static const struct vfio_device_ops vfio_amba_ops = {
 	.bind_iommufd	= vfio_iommufd_physical_bind,
 	.unbind_iommufd	= vfio_iommufd_physical_unbind,
 	.attach_ioas	= vfio_iommufd_physical_attach_ioas,
+	.detach_ioas	= vfio_iommufd_physical_detach_ioas,
 };
 
 static const struct amba_id pl330_ids[] = {
diff --git a/drivers/vfio/platform/vfio_platform.c b/drivers/vfio/platform/vfio_platform.c
index 22a1efca32a8..8cf22fa65baa 100644
--- a/drivers/vfio/platform/vfio_platform.c
+++ b/drivers/vfio/platform/vfio_platform.c
@@ -108,6 +108,7 @@ static const struct vfio_device_ops vfio_platform_ops = {
 	.bind_iommufd	= vfio_iommufd_physical_bind,
 	.unbind_iommufd	= vfio_iommufd_physical_unbind,
 	.attach_ioas	= vfio_iommufd_physical_attach_ioas,
+	.detach_ioas	= vfio_iommufd_physical_detach_ioas,
 };
 
 static struct platform_driver vfio_platform_driver = {
diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
index fd5b4dfa9615..484f89eef7e5 100644
--- a/drivers/vfio/vfio_main.c
+++ b/drivers/vfio/vfio_main.c
@@ -250,7 +250,8 @@ static int __vfio_register_dev(struct vfio_device *device,
 
 	if (WARN_ON(device->ops->bind_iommufd &&
 		    (!device->ops->unbind_iommufd ||
-		     !device->ops->attach_ioas)))
+		     !device->ops->attach_ioas ||
+		     !device->ops->detach_ioas)))
 		return -EINVAL;
 
 	/*
diff --git a/include/linux/vfio.h b/include/linux/vfio.h
index 615f8a081a41..234a5456b81a 100644
--- a/include/linux/vfio.h
+++ b/include/linux/vfio.h
@@ -73,7 +73,9 @@ struct vfio_device {
  * @bind_iommufd: Called when binding the device to an iommufd
  * @unbind_iommufd: Opposite of bind_iommufd
  * @attach_ioas: Called when attaching device to an IOAS/HWPT managed by the
- *		 bound iommufd. Undo in unbind_iommufd.
+ *		 bound iommufd. Undo in unbind_iommufd if @detach_ioas is not
+ *		 called
+ * @detach_ioas: Opposite of attach_ioas
  * @open_device: Called when the first file descriptor is opened for this device
  * @close_device: Opposite of open_device
  * @read: Perform read(2) on device file descriptor
@@ -97,6 +99,7 @@ struct vfio_device_ops {
 				struct iommufd_ctx *ictx, u32 *out_device_id);
 	void	(*unbind_iommufd)(struct vfio_device *vdev);
 	int	(*attach_ioas)(struct vfio_device *vdev, u32 *pt_id);
+	void	(*detach_ioas)(struct vfio_device *vdev);
 	int	(*open_device)(struct vfio_device *vdev);
 	void	(*close_device)(struct vfio_device *vdev);
 	ssize_t	(*read)(struct vfio_device *vdev, char __user *buf,
@@ -118,6 +121,7 @@ int vfio_iommufd_physical_bind(struct vfio_device *vdev,
 			       struct iommufd_ctx *ictx, u32 *out_device_id);
 void vfio_iommufd_physical_unbind(struct vfio_device *vdev);
 int vfio_iommufd_physical_attach_ioas(struct vfio_device *vdev, u32 *pt_id);
+void vfio_iommufd_physical_detach_ioas(struct vfio_device *vdev);
 int vfio_iommufd_emulated_bind(struct vfio_device *vdev,
 			       struct iommufd_ctx *ictx, u32 *out_device_id);
 void vfio_iommufd_emulated_unbind(struct vfio_device *vdev);
@@ -130,6 +134,8 @@ int vfio_iommufd_emulated_attach_ioas(struct vfio_device *vdev, u32 *pt_id);
 	((void (*)(struct vfio_device *vdev)) NULL)
 #define vfio_iommufd_physical_attach_ioas \
 	((int (*)(struct vfio_device *vdev, u32 *pt_id)) NULL)
+#define vfio_iommufd_physical_detach_ioas \
+	((void (*)(struct vfio_device *vdev)) NULL)
 #define vfio_iommufd_emulated_bind                                      \
 	((int (*)(struct vfio_device *vdev, struct iommufd_ctx *ictx,   \
 		  u32 *out_device_id)) NULL)
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH v4 12/19] vfio-iommufd: Add detach_ioas for emulated VFIO devices
  2023-02-21  3:47 [PATCH v4 00/19] Add vfio_device cdev for iommufd support Yi Liu
                   ` (10 preceding siblings ...)
  2023-02-21  3:48 ` [PATCH v4 11/19] vfio-iommufd: Add detach_ioas support for physical VFIO devices Yi Liu
@ 2023-02-21  3:48 ` Yi Liu
  2023-02-21  3:48 ` [PATCH v4 13/19] vfio: Add cdev_device_open_cnt to vfio_group Yi Liu
                   ` (6 subsequent siblings)
  18 siblings, 0 replies; 55+ messages in thread
From: Yi Liu @ 2023-02-21  3:48 UTC (permalink / raw)
  To: alex.williamson, jgg, kevin.tian
  Cc: joro, cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.l.liu, yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi,
	lulu, suravee.suthikulpanit, intel-gvt-dev, intel-gfx,
	linux-s390, yan.y.zhao, xudong.hao, terrence.xu

this prepares for adding DETACH ioctl for emulated VFIO devices.

Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
---
 drivers/gpu/drm/i915/gvt/kvmgt.c  |  1 +
 drivers/s390/cio/vfio_ccw_ops.c   |  1 +
 drivers/s390/crypto/vfio_ap_ops.c |  1 +
 drivers/vfio/iommufd.c            | 18 ++++++++++++++++++
 include/linux/vfio.h              |  3 +++
 5 files changed, 24 insertions(+)

diff --git a/drivers/gpu/drm/i915/gvt/kvmgt.c b/drivers/gpu/drm/i915/gvt/kvmgt.c
index 8ae7039b3683..8a76a84bc3c1 100644
--- a/drivers/gpu/drm/i915/gvt/kvmgt.c
+++ b/drivers/gpu/drm/i915/gvt/kvmgt.c
@@ -1474,6 +1474,7 @@ static const struct vfio_device_ops intel_vgpu_dev_ops = {
 	.bind_iommufd	= vfio_iommufd_emulated_bind,
 	.unbind_iommufd = vfio_iommufd_emulated_unbind,
 	.attach_ioas	= vfio_iommufd_emulated_attach_ioas,
+	.detach_ioas	= vfio_iommufd_emulated_detach_ioas,
 };
 
 static int intel_vgpu_probe(struct mdev_device *mdev)
diff --git a/drivers/s390/cio/vfio_ccw_ops.c b/drivers/s390/cio/vfio_ccw_ops.c
index 5b53b94f13c7..cba4971618ff 100644
--- a/drivers/s390/cio/vfio_ccw_ops.c
+++ b/drivers/s390/cio/vfio_ccw_ops.c
@@ -632,6 +632,7 @@ static const struct vfio_device_ops vfio_ccw_dev_ops = {
 	.bind_iommufd = vfio_iommufd_emulated_bind,
 	.unbind_iommufd = vfio_iommufd_emulated_unbind,
 	.attach_ioas = vfio_iommufd_emulated_attach_ioas,
+	.detach_ioas = vfio_iommufd_emulated_detach_ioas,
 };
 
 struct mdev_driver vfio_ccw_mdev_driver = {
diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
index 9c01957e56b3..f99c69d40982 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -1802,6 +1802,7 @@ static const struct vfio_device_ops vfio_ap_matrix_dev_ops = {
 	.bind_iommufd = vfio_iommufd_emulated_bind,
 	.unbind_iommufd = vfio_iommufd_emulated_unbind,
 	.attach_ioas = vfio_iommufd_emulated_attach_ioas,
+	.detach_ioas = vfio_iommufd_emulated_detach_ioas,
 };
 
 static struct mdev_driver vfio_ap_matrix_driver = {
diff --git a/drivers/vfio/iommufd.c b/drivers/vfio/iommufd.c
index bfaa9876499b..faf2516b0f06 100644
--- a/drivers/vfio/iommufd.c
+++ b/drivers/vfio/iommufd.c
@@ -165,6 +165,12 @@ int vfio_iommufd_emulated_attach_ioas(struct vfio_device *vdev, u32 *pt_id)
 
 	lockdep_assert_held(&vdev->dev_set->lock);
 
+	if (!vdev->iommufd_ictx)
+		return -EINVAL;
+
+	if (vdev->iommufd_access)
+		return -EBUSY;
+
 	user = iommufd_access_create(vdev->iommufd_ictx, *pt_id, &vfio_user_ops,
 				     vdev);
 	if (IS_ERR(user))
@@ -173,3 +179,15 @@ int vfio_iommufd_emulated_attach_ioas(struct vfio_device *vdev, u32 *pt_id)
 	return 0;
 }
 EXPORT_SYMBOL_GPL(vfio_iommufd_emulated_attach_ioas);
+
+void vfio_iommufd_emulated_detach_ioas(struct vfio_device *vdev)
+{
+	lockdep_assert_held(&vdev->dev_set->lock);
+
+	if (!vdev->iommufd_ictx || !vdev->iommufd_access)
+		return;
+
+	iommufd_access_destroy(vdev->iommufd_access);
+	vdev->iommufd_access = NULL;
+}
+EXPORT_SYMBOL_GPL(vfio_iommufd_emulated_detach_ioas);
diff --git a/include/linux/vfio.h b/include/linux/vfio.h
index 234a5456b81a..6963514d9249 100644
--- a/include/linux/vfio.h
+++ b/include/linux/vfio.h
@@ -126,6 +126,7 @@ int vfio_iommufd_emulated_bind(struct vfio_device *vdev,
 			       struct iommufd_ctx *ictx, u32 *out_device_id);
 void vfio_iommufd_emulated_unbind(struct vfio_device *vdev);
 int vfio_iommufd_emulated_attach_ioas(struct vfio_device *vdev, u32 *pt_id);
+void vfio_iommufd_emulated_detach_ioas(struct vfio_device *vdev);
 #else
 #define vfio_iommufd_physical_bind                                      \
 	((int (*)(struct vfio_device *vdev, struct iommufd_ctx *ictx,   \
@@ -143,6 +144,8 @@ int vfio_iommufd_emulated_attach_ioas(struct vfio_device *vdev, u32 *pt_id);
 	((void (*)(struct vfio_device *vdev)) NULL)
 #define vfio_iommufd_emulated_attach_ioas \
 	((int (*)(struct vfio_device *vdev, u32 *pt_id)) NULL)
+#define vfio_iommufd_emulated_detach_ioas \
+	((void (*)(struct vfio_device *vdev)) NULL)
 #endif
 
 /**
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH v4 13/19] vfio: Add cdev_device_open_cnt to vfio_group
  2023-02-21  3:47 [PATCH v4 00/19] Add vfio_device cdev for iommufd support Yi Liu
                   ` (11 preceding siblings ...)
  2023-02-21  3:48 ` [PATCH v4 12/19] vfio-iommufd: Add detach_ioas for emulated " Yi Liu
@ 2023-02-21  3:48 ` Yi Liu
  2023-02-22  7:31   ` Tian, Kevin
  2023-02-21  3:48 ` [PATCH v4 14/19] vfio: Make vfio_device_open() single open for device cdev path Yi Liu
                   ` (5 subsequent siblings)
  18 siblings, 1 reply; 55+ messages in thread
From: Yi Liu @ 2023-02-21  3:48 UTC (permalink / raw)
  To: alex.williamson, jgg, kevin.tian
  Cc: joro, cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.l.liu, yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi,
	lulu, suravee.suthikulpanit, intel-gvt-dev, intel-gfx,
	linux-s390, yan.y.zhao, xudong.hao, terrence.xu

for counting the devices that are opened via the cdev path. This count
is increased and decreased by the cdev path. The group path checks it
to achieve exclusion with the cdev path. With this, only one path (group
path or cdev path) will claim DMA ownership. This avoids scenarios in
which devices within the same group may be opened via different paths.

Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 drivers/vfio/group.c | 32 ++++++++++++++++++++++++++++++++
 drivers/vfio/vfio.h  |  3 +++
 2 files changed, 35 insertions(+)

diff --git a/drivers/vfio/group.c b/drivers/vfio/group.c
index 77559e035078..c19be9ea398b 100644
--- a/drivers/vfio/group.c
+++ b/drivers/vfio/group.c
@@ -387,6 +387,33 @@ static long vfio_group_fops_unl_ioctl(struct file *filep,
 	}
 }
 
+int vfio_device_block_group(struct vfio_device *device)
+{
+	struct vfio_group *group = device->group;
+	int ret = 0;
+
+	mutex_lock(&group->group_lock);
+	if (group->opened_file) {
+		ret = -EBUSY;
+		goto out_unlock;
+	}
+
+	group->cdev_device_open_cnt++;
+
+out_unlock:
+	mutex_unlock(&group->group_lock);
+	return ret;
+}
+
+void vfio_device_unblock_group(struct vfio_device *device)
+{
+	struct vfio_group *group = device->group;
+
+	mutex_lock(&group->group_lock);
+	group->cdev_device_open_cnt--;
+	mutex_unlock(&group->group_lock);
+}
+
 static int vfio_group_fops_open(struct inode *inode, struct file *filep)
 {
 	struct vfio_group *group =
@@ -409,6 +436,11 @@ static int vfio_group_fops_open(struct inode *inode, struct file *filep)
 		goto out_unlock;
 	}
 
+	if (group->cdev_device_open_cnt) {
+		ret = -EBUSY;
+		goto out_unlock;
+	}
+
 	/*
 	 * Do we need multiple instances of the group open?  Seems not.
 	 */
diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
index 6f063e31d08a..bf84cf36eac7 100644
--- a/drivers/vfio/vfio.h
+++ b/drivers/vfio/vfio.h
@@ -84,8 +84,11 @@ struct vfio_group {
 	struct blocking_notifier_head	notifier;
 	struct iommufd_ctx		*iommufd;
 	spinlock_t			kvm_ref_lock;
+	unsigned int			cdev_device_open_cnt;
 };
 
+int vfio_device_block_group(struct vfio_device *device);
+void vfio_device_unblock_group(struct vfio_device *device);
 int vfio_device_set_group(struct vfio_device *device,
 			  enum vfio_group_type type);
 void vfio_device_remove_group(struct vfio_device *device);
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH v4 14/19] vfio: Make vfio_device_open() single open for device cdev path
  2023-02-21  3:47 [PATCH v4 00/19] Add vfio_device cdev for iommufd support Yi Liu
                   ` (12 preceding siblings ...)
  2023-02-21  3:48 ` [PATCH v4 13/19] vfio: Add cdev_device_open_cnt to vfio_group Yi Liu
@ 2023-02-21  3:48 ` Yi Liu
  2023-02-22  7:32   ` Tian, Kevin
  2023-02-21  3:48 ` [PATCH v4 15/19] vfio: Add cdev for vfio_device Yi Liu
                   ` (4 subsequent siblings)
  18 siblings, 1 reply; 55+ messages in thread
From: Yi Liu @ 2023-02-21  3:48 UTC (permalink / raw)
  To: alex.williamson, jgg, kevin.tian
  Cc: joro, cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.l.liu, yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi,
	lulu, suravee.suthikulpanit, intel-gvt-dev, intel-gfx,
	linux-s390, yan.y.zhao, xudong.hao, terrence.xu

VFIO group has historically allowed multi-open of the device FD. This
was made secure because the "open" was executed via an ioctl to the
group FD which is itself only single open.

However, no known use of multiple device FDs today. It is kind of a
strange thing to do because new device FDs can naturally be created
via dup().

When we implement the new device uAPI (only used in cdev path) there is
no natural way to allow the device itself from being multi-opened in a
secure manner. Without the group FD we cannot prove the security context
of the opener.

Thus, when moving to the new uAPI we block the ability to multi-open
the device. Old group path still allows it.

vfio_device_open() needs to sustain both the legacy behavior i.e. multi-open
in the group path and the new behavior i.e. single-open in the cdev path.
This mixture leads to the introduction of a new is_cdev_device flag in struct
vfio_device_file.

Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 drivers/vfio/vfio.h      |  2 ++
 drivers/vfio/vfio_main.c | 10 +++++++++-
 2 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
index bf84cf36eac7..68be0e8279c7 100644
--- a/drivers/vfio/vfio.h
+++ b/drivers/vfio/vfio.h
@@ -18,6 +18,8 @@ struct vfio_container;
 
 struct vfio_device_file {
 	struct vfio_device *device;
+	bool is_cdev_device;
+
 	bool access_granted;
 	spinlock_t kvm_ref_lock; /* protect kvm field */
 	struct kvm *kvm;
diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
index 484f89eef7e5..925127a38a3a 100644
--- a/drivers/vfio/vfio_main.c
+++ b/drivers/vfio/vfio_main.c
@@ -472,6 +472,13 @@ int vfio_device_open(struct vfio_device_file *df,
 
 	lockdep_assert_held(&device->dev_set->lock);
 
+	/*
+	 * Device cdev path cannot support multiple device open since
+	 * it doesn't have a secure way for it.
+	 */
+	if (device->open_count != 0 && df->is_cdev_device)
+		return -EINVAL;
+
 	device->open_count++;
 	if (device->open_count == 1) {
 		ret = vfio_device_first_open(df, dev_id, pt_id);
@@ -535,7 +542,8 @@ static int vfio_device_fops_release(struct inode *inode, struct file *filep)
 	struct vfio_device_file *df = filep->private_data;
 	struct vfio_device *device = df->device;
 
-	vfio_device_group_close(df);
+	if (!df->is_cdev_device)
+		vfio_device_group_close(df);
 
 	vfio_device_put_registration(device);
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH v4 15/19] vfio: Add cdev for vfio_device
  2023-02-21  3:47 [PATCH v4 00/19] Add vfio_device cdev for iommufd support Yi Liu
                   ` (13 preceding siblings ...)
  2023-02-21  3:48 ` [PATCH v4 14/19] vfio: Make vfio_device_open() single open for device cdev path Yi Liu
@ 2023-02-21  3:48 ` Yi Liu
  2023-02-22  7:34   ` Tian, Kevin
  2023-02-21  3:48 ` [PATCH v4 16/19] vfio: Add VFIO_DEVICE_BIND_IOMMUFD Yi Liu
                   ` (3 subsequent siblings)
  18 siblings, 1 reply; 55+ messages in thread
From: Yi Liu @ 2023-02-21  3:48 UTC (permalink / raw)
  To: alex.williamson, jgg, kevin.tian
  Cc: joro, cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.l.liu, yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi,
	lulu, suravee.suthikulpanit, intel-gvt-dev, intel-gfx,
	linux-s390, yan.y.zhao, xudong.hao, terrence.xu

This allows user to directly open a vfio device w/o using the legacy
container/group interface, as a prerequisite for supporting new iommu
features like nested translation.

The device fd opened in this manner doesn't have the capability to access
the device as the fops open() doesn't open the device until the successful
BIND_IOMMUFD which be added in next patch.

With this patch, devices registered to vfio core have both group and device
interface created.

- group interface : /dev/vfio/$groupID
- device interface: /dev/vfio/devices/vfioX  (X is the minor number and
					      is unique across devices)

Given a vfio device the user can identify the matching vfioX by checking
the sysfs path of the device. Take PCI device (0000:6a:01.0) for example,
/sys/bus/pci/devices/0000\:6a\:01.0/vfio-dev/vfio0/dev contains the
major:minor of the matching vfioX.

Userspace then opens the /dev/vfio/devices/vfioX and checks with fstat
that the major:minor matches.

The vfio_device cdev logic in this patch:
*) __vfio_register_dev() path ends up doing cdev_device_add() for each
   vfio_device;
*) vfio_unregister_group_dev() path does cdev_device_del();

Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 drivers/vfio/Kconfig       | 12 ++++++++
 drivers/vfio/Makefile      |  1 +
 drivers/vfio/device_cdev.c | 63 ++++++++++++++++++++++++++++++++++++++
 drivers/vfio/vfio.h        | 46 ++++++++++++++++++++++++++++
 drivers/vfio/vfio_main.c   | 22 ++++++++++---
 include/linux/vfio.h       |  4 +++
 6 files changed, 144 insertions(+), 4 deletions(-)
 create mode 100644 drivers/vfio/device_cdev.c

diff --git a/drivers/vfio/Kconfig b/drivers/vfio/Kconfig
index a8f544629467..169762316513 100644
--- a/drivers/vfio/Kconfig
+++ b/drivers/vfio/Kconfig
@@ -12,6 +12,18 @@ menuconfig VFIO
 	  If you don't know what to do here, say N.
 
 if VFIO
+config VFIO_DEVICE_CDEV
+	bool "Support for the VFIO cdev /dev/vfio/devices/vfioX"
+	depends on IOMMUFD && (X86 || S390 || ARM || ARM64)
+	help
+	  The VFIO device cdev is another way for userspace to get device
+	  access. Userspace gets device fd by opening device cdev under
+	  /dev/vfio/devices/vfioX, and then bind the device fd with an iommufd
+	  to set up secure context for device access. It is not available for
+	  SPAPR_TCE_IOMMU.
+
+	  If you don't know what to do here, say N.
+
 config VFIO_CONTAINER
 	bool "Support for the VFIO container /dev/vfio/vfio"
 	select VFIO_IOMMU_TYPE1 if MMU && (X86 || S390 || ARM || ARM64)
diff --git a/drivers/vfio/Makefile b/drivers/vfio/Makefile
index 70e7dcb302ef..245394aeb94b 100644
--- a/drivers/vfio/Makefile
+++ b/drivers/vfio/Makefile
@@ -4,6 +4,7 @@ obj-$(CONFIG_VFIO) += vfio.o
 vfio-y += vfio_main.o \
 	  group.o \
 	  iova_bitmap.o
+vfio-$(CONFIG_VFIO_DEVICE_CDEV) += device_cdev.o
 vfio-$(CONFIG_IOMMUFD) += iommufd.o
 vfio-$(CONFIG_VFIO_CONTAINER) += container.o
 vfio-$(CONFIG_VFIO_VIRQFD) += virqfd.o
diff --git a/drivers/vfio/device_cdev.c b/drivers/vfio/device_cdev.c
new file mode 100644
index 000000000000..9e2c1ecaaf4f
--- /dev/null
+++ b/drivers/vfio/device_cdev.c
@@ -0,0 +1,63 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (c) 2023 Intel Corporation.
+ */
+#include <linux/vfio.h>
+
+#include "vfio.h"
+
+static dev_t device_devt;
+
+void vfio_init_device_cdev(struct vfio_device *device)
+{
+	device->device.devt = MKDEV(MAJOR(device_devt), device->index);
+	cdev_init(&device->cdev, &vfio_device_fops);
+	device->cdev.owner = THIS_MODULE;
+}
+
+/*
+ * device access via the fd opened by this function is blocked until
+ * .open_device() is called successfully during BIND_IOMMUFD.
+ */
+int vfio_device_fops_cdev_open(struct inode *inode, struct file *filep)
+{
+	struct vfio_device *device = container_of(inode->i_cdev,
+						  struct vfio_device, cdev);
+	struct vfio_device_file *df;
+	int ret;
+
+	if (!vfio_device_try_get_registration(device))
+		return -ENODEV;
+
+	df = vfio_allocate_device_file(device);
+	if (IS_ERR(df)) {
+		ret = PTR_ERR(df);
+		goto err_put_registration;
+	}
+
+	df->is_cdev_device = true;
+	filep->private_data = df;
+
+	return 0;
+
+err_put_registration:
+	vfio_device_put_registration(device);
+	return ret;
+}
+
+static char *vfio_device_devnode(const struct device *dev, umode_t *mode)
+{
+	return kasprintf(GFP_KERNEL, "vfio/devices/%s", dev_name(dev));
+}
+
+int vfio_cdev_init(struct class *device_class)
+{
+	device_class->devnode = vfio_device_devnode;
+	return alloc_chrdev_region(&device_devt, 0,
+				   MINORMASK + 1, "vfio-dev");
+}
+
+void vfio_cdev_cleanup(void)
+{
+	unregister_chrdev_region(device_devt, MINORMASK + 1);
+}
diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
index 68be0e8279c7..15734dbba9fb 100644
--- a/drivers/vfio/vfio.h
+++ b/drivers/vfio/vfio.h
@@ -246,6 +246,52 @@ static inline void vfio_iommufd_unbind(struct vfio_device *device)
 }
 #endif
 
+#if IS_ENABLED(CONFIG_VFIO_DEVICE_CDEV)
+static inline int vfio_device_add(struct vfio_device *device)
+{
+	return cdev_device_add(&device->cdev, &device->device);
+}
+
+static inline void vfio_device_del(struct vfio_device *device)
+{
+	cdev_device_del(&device->cdev, &device->device);
+}
+
+void vfio_init_device_cdev(struct vfio_device *device);
+int vfio_device_fops_cdev_open(struct inode *inode, struct file *filep);
+int vfio_cdev_init(struct class *device_class);
+void vfio_cdev_cleanup(void);
+#else
+static inline int vfio_device_add(struct vfio_device *device)
+{
+	return device_add(&device->device);
+}
+
+static inline void vfio_device_del(struct vfio_device *device)
+{
+	device_del(&device->device);
+}
+
+static inline void vfio_init_device_cdev(struct vfio_device *device)
+{
+}
+
+static inline int vfio_device_fops_cdev_open(struct inode *inode,
+					     struct file *filep)
+{
+	return 0;
+}
+
+static inline int vfio_cdev_init(struct class *device_class)
+{
+	return 0;
+}
+
+static inline void vfio_cdev_cleanup(void)
+{
+}
+#endif /* CONFIG_VFIO_DEVICE_CDEV */
+
 #if IS_ENABLED(CONFIG_VFIO_VIRQFD)
 int __init vfio_virqfd_init(void);
 void vfio_virqfd_exit(void);
diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
index 925127a38a3a..48616ef97dfd 100644
--- a/drivers/vfio/vfio_main.c
+++ b/drivers/vfio/vfio_main.c
@@ -235,6 +235,7 @@ static int vfio_init_device(struct vfio_device *device, struct device *dev,
 	device->device.release = vfio_device_release;
 	device->device.class = vfio.device_class;
 	device->device.parent = device->dev;
+	vfio_init_device_cdev(device);
 	return 0;
 
 out_uninit:
@@ -269,7 +270,7 @@ static int __vfio_register_dev(struct vfio_device *device,
 	if (ret)
 		return ret;
 
-	ret = device_add(&device->device);
+	ret = vfio_device_add(device);
 	if (ret)
 		goto err_out;
 
@@ -309,6 +310,13 @@ void vfio_unregister_group_dev(struct vfio_device *device)
 	bool interrupted = false;
 	long rc;
 
+	/*
+	 * Balances vfio_device_add in register path. Putting it as the
+	 * first operation in unregister to prevent registration refcount
+	 * from incrementing per cdev open.
+	 */
+	vfio_device_del(device);
+
 	vfio_device_put_registration(device);
 	rc = try_wait_for_completion(&device->comp);
 	while (rc <= 0) {
@@ -334,9 +342,6 @@ void vfio_unregister_group_dev(struct vfio_device *device)
 
 	vfio_device_group_unregister(device);
 
-	/* Balances device_add in register path */
-	device_del(&device->device);
-
 	/* Balances vfio_device_set_group in register path */
 	vfio_device_remove_group(device);
 }
@@ -1192,6 +1197,7 @@ static int vfio_device_fops_mmap(struct file *filep, struct vm_area_struct *vma)
 
 const struct file_operations vfio_device_fops = {
 	.owner		= THIS_MODULE,
+	.open		= vfio_device_fops_cdev_open,
 	.release	= vfio_device_fops_release,
 	.read		= vfio_device_fops_read,
 	.write		= vfio_device_fops_write,
@@ -1584,9 +1590,16 @@ static int __init vfio_init(void)
 		goto err_dev_class;
 	}
 
+	ret = vfio_cdev_init(vfio.device_class);
+	if (ret)
+		goto err_alloc_dev_chrdev;
+
 	pr_info(DRIVER_DESC " version: " DRIVER_VERSION "\n");
 	return 0;
 
+err_alloc_dev_chrdev:
+	class_destroy(vfio.device_class);
+	vfio.device_class = NULL;
 err_dev_class:
 	vfio_virqfd_exit();
 err_virqfd:
@@ -1597,6 +1610,7 @@ static int __init vfio_init(void)
 static void __exit vfio_cleanup(void)
 {
 	ida_destroy(&vfio.device_ida);
+	vfio_cdev_cleanup();
 	class_destroy(vfio.device_class);
 	vfio.device_class = NULL;
 	vfio_virqfd_exit();
diff --git a/include/linux/vfio.h b/include/linux/vfio.h
index 6963514d9249..a94cfe8d8073 100644
--- a/include/linux/vfio.h
+++ b/include/linux/vfio.h
@@ -13,6 +13,7 @@
 #include <linux/mm.h>
 #include <linux/workqueue.h>
 #include <linux/poll.h>
+#include <linux/cdev.h>
 #include <uapi/linux/vfio.h>
 #include <linux/iova_bitmap.h>
 
@@ -51,6 +52,9 @@ struct vfio_device {
 	/* Members below here are private, not for driver use */
 	unsigned int index;
 	struct device device;	/* device.kref covers object life circle */
+#if IS_ENABLED(CONFIG_VFIO_DEVICE_CDEV)
+	struct cdev cdev;
+#endif
 	refcount_t refcount;	/* user count on registered device*/
 	unsigned int open_count;
 	struct completion comp;
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH v4 16/19] vfio: Add VFIO_DEVICE_BIND_IOMMUFD
  2023-02-21  3:47 [PATCH v4 00/19] Add vfio_device cdev for iommufd support Yi Liu
                   ` (14 preceding siblings ...)
  2023-02-21  3:48 ` [PATCH v4 15/19] vfio: Add cdev for vfio_device Yi Liu
@ 2023-02-21  3:48 ` Yi Liu
  2023-02-22  7:39   ` Tian, Kevin
  2023-02-22  7:53   ` Yan Zhao
  2023-02-21  3:48 ` [PATCH v4 17/19] vfio: Add VFIO_DEVICE_AT[DE]TACH_IOMMUFD_PT Yi Liu
                   ` (2 subsequent siblings)
  18 siblings, 2 replies; 55+ messages in thread
From: Yi Liu @ 2023-02-21  3:48 UTC (permalink / raw)
  To: alex.williamson, jgg, kevin.tian
  Cc: joro, cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.l.liu, yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi,
	lulu, suravee.suthikulpanit, intel-gvt-dev, intel-gfx,
	linux-s390, yan.y.zhao, xudong.hao, terrence.xu

This adds ioctl for userspace to bind device cdev fd to iommufd.

    VFIO_DEVICE_BIND_IOMMUFD: bind device to an iommufd, hence gain DMA
			      control provided by the iommufd. open_device
			      op is called after bind_iommufd op.
			      VFIO no iommu mode is indicated by passing
			      a negative iommufd value.

Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 drivers/vfio/device_cdev.c | 137 +++++++++++++++++++++++++++++++++++++
 drivers/vfio/vfio.h        |  17 ++++-
 drivers/vfio/vfio_main.c   |  30 ++++++--
 include/linux/iommufd.h    |   6 ++
 include/uapi/linux/vfio.h  |  34 +++++++++
 5 files changed, 219 insertions(+), 5 deletions(-)

diff --git a/drivers/vfio/device_cdev.c b/drivers/vfio/device_cdev.c
index 9e2c1ecaaf4f..be62f0a5687e 100644
--- a/drivers/vfio/device_cdev.c
+++ b/drivers/vfio/device_cdev.c
@@ -3,6 +3,7 @@
  * Copyright (c) 2023 Intel Corporation.
  */
 #include <linux/vfio.h>
+#include <linux/iommufd.h>
 
 #include "vfio.h"
 
@@ -45,6 +46,142 @@ int vfio_device_fops_cdev_open(struct inode *inode, struct file *filep)
 	return ret;
 }
 
+static void vfio_device_get_kvm_safe(struct vfio_device_file *df)
+{
+	spin_lock(&df->kvm_ref_lock);
+	if (!df->kvm)
+		goto unlock;
+
+	_vfio_device_get_kvm_safe(df->device, df->kvm);
+
+unlock:
+	spin_unlock(&df->kvm_ref_lock);
+}
+
+void vfio_device_cdev_close(struct vfio_device_file *df)
+{
+	struct vfio_device *device = df->device;
+
+	mutex_lock(&device->dev_set->lock);
+	if (!smp_load_acquire(&df->access_granted)) {
+		mutex_unlock(&device->dev_set->lock);
+		return;
+	}
+	vfio_device_close(df);
+	vfio_device_put_kvm(device);
+	if (df->iommufd)
+		iommufd_ctx_put(df->iommufd);
+	mutex_unlock(&device->dev_set->lock);
+	vfio_device_unblock_group(device);
+}
+
+long vfio_device_ioctl_bind_iommufd(struct vfio_device_file *df,
+				    unsigned long arg)
+{
+	struct vfio_device *device = df->device;
+	struct vfio_device_bind_iommufd bind;
+	struct iommufd_ctx *iommufd = NULL;
+	struct fd f;
+	unsigned long minsz;
+	int ret;
+
+	minsz = offsetofend(struct vfio_device_bind_iommufd, out_devid);
+
+	if (copy_from_user(&bind, (void __user *)arg, minsz))
+		return -EFAULT;
+
+	if (bind.argsz < minsz || bind.flags)
+		return -EINVAL;
+
+	if (!device->ops->bind_iommufd)
+		return -ENODEV;
+
+	ret = vfio_device_block_group(device);
+	if (ret)
+		return ret;
+
+	mutex_lock(&device->dev_set->lock);
+	/*
+	 * If already been bound to an iommufd, or already set noiommu
+	 * then fail it.
+	 */
+	if (df->iommufd || df->noiommu) {
+		ret = -EINVAL;
+		goto out_unlock;
+	}
+
+	/* iommufd < 0 means noiommu mode */
+	if (bind.iommufd < 0) {
+		if (!capable(CAP_SYS_RAWIO)) {
+			ret = -EPERM;
+			goto out_unlock;
+		}
+		df->noiommu = true;
+	} else {
+		f = fdget(bind.iommufd);
+		if (!f.file) {
+			ret = -EBADF;
+			goto out_unlock;
+		}
+		iommufd = iommufd_ctx_from_file(f.file);
+		if (IS_ERR(iommufd)) {
+			ret = PTR_ERR(iommufd);
+			goto out_put_file;
+		}
+	}
+
+	/*
+	 * Before the device open, get the KVM pointer currently
+	 * associated with the device file (if there is) and obtain a
+	 * reference. This reference is held until device closed. Save
+	 * the pointer in the device for use by drivers.
+	 */
+	vfio_device_get_kvm_safe(df);
+
+	df->iommufd = iommufd;
+	ret = vfio_device_open(df, &bind.out_devid, NULL);
+	if (ret)
+		goto out_put_kvm;
+
+	ret = copy_to_user((void __user *)arg +
+			   offsetofend(struct vfio_device_bind_iommufd, iommufd),
+			   &bind.out_devid,
+			   sizeof(bind.out_devid)) ? -EFAULT : 0;
+	if (ret)
+		goto out_close_device;
+
+	if (iommufd)
+		fdput(f);
+	else if (df->noiommu)
+		dev_warn(device->dev, "vfio-noiommu device used by user "
+			 "(%s:%d)\n", current->comm, task_pid_nr(current));
+
+	/*
+	 * Paired with smp_load_acquire() in vfio_device_fops::ioctl/
+	 * read/write/mmap
+	 */
+	smp_store_release(&df->access_granted, true);
+	mutex_unlock(&device->dev_set->lock);
+
+	return 0;
+
+out_close_device:
+	vfio_device_close(df);
+out_put_kvm:
+	df->iommufd = NULL;
+	df->noiommu = false;
+	vfio_device_put_kvm(device);
+out_put_file:
+	if (iommufd) {
+		iommufd_ctx_put(iommufd);
+		fdput(f);
+	}
+out_unlock:
+	mutex_unlock(&device->dev_set->lock);
+	vfio_device_unblock_group(device);
+	return ret;
+}
+
 static char *vfio_device_devnode(const struct device *dev, umode_t *mode)
 {
 	return kasprintf(GFP_KERNEL, "vfio/devices/%s", dev_name(dev));
diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
index 15734dbba9fb..40b54b0ce993 100644
--- a/drivers/vfio/vfio.h
+++ b/drivers/vfio/vfio.h
@@ -23,7 +23,9 @@ struct vfio_device_file {
 	bool access_granted;
 	spinlock_t kvm_ref_lock; /* protect kvm field */
 	struct kvm *kvm;
-	struct iommufd_ctx *iommufd; /* protected by struct vfio_device_set::lock */
+	/* protected by struct vfio_device_set::lock */
+	struct iommufd_ctx *iommufd;
+	bool noiommu;
 };
 
 void vfio_device_put_registration(struct vfio_device *device);
@@ -259,6 +261,9 @@ static inline void vfio_device_del(struct vfio_device *device)
 
 void vfio_init_device_cdev(struct vfio_device *device);
 int vfio_device_fops_cdev_open(struct inode *inode, struct file *filep);
+void vfio_device_cdev_close(struct vfio_device_file *df);
+long vfio_device_ioctl_bind_iommufd(struct vfio_device_file *df,
+				    unsigned long arg);
 int vfio_cdev_init(struct class *device_class);
 void vfio_cdev_cleanup(void);
 #else
@@ -282,6 +287,16 @@ static inline int vfio_device_fops_cdev_open(struct inode *inode,
 	return 0;
 }
 
+static inline void vfio_device_cdev_close(struct vfio_device_file *df)
+{
+}
+
+static inline long vfio_device_ioctl_bind_iommufd(struct vfio_device_file *df,
+						  unsigned long arg)
+{
+	return -EOPNOTSUPP;
+}
+
 static inline int vfio_cdev_init(struct class *device_class)
 {
 	return 0;
diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
index 48616ef97dfd..5e9a4fd800be 100644
--- a/drivers/vfio/vfio_main.c
+++ b/drivers/vfio/vfio_main.c
@@ -37,6 +37,7 @@
 #include <linux/interval_tree.h>
 #include <linux/iova_bitmap.h>
 #include <linux/iommufd.h>
+#include <uapi/linux/iommufd.h>
 #include "vfio.h"
 
 #define DRIVER_VERSION	"0.3"
@@ -422,16 +423,32 @@ static int vfio_device_first_open(struct vfio_device_file *df,
 {
 	struct vfio_device *device = df->device;
 	struct iommufd_ctx *iommufd = df->iommufd;
-	int ret;
+	int ret = 0;
 
 	lockdep_assert_held(&device->dev_set->lock);
 
+	if (WARN_ON(iommufd && df->noiommu))
+		return -EINVAL;
+
 	if (!try_module_get(device->dev->driver->owner))
 		return -ENODEV;
 
+	/*
+	 * For group/container path, iommufd pointer is NULL when comes
+	 * into this helper. Its noiommu support is handled by
+	 * vfio_device_group_use_iommu()
+	 *
+	 * For iommufd compat mode, iommufd pointer here is a valid value.
+	 * Its noiommu support is in vfio_iommufd_bind().
+	 *
+	 * For device cdev path, iommufd pointer here is a valid value for
+	 * normal cases, but it is NULL if it's noiommu. Check df->noiommu
+	 * to differentiate cdev noiommu from the group/container path which
+	 * also passes NULL iommufd pointer in. If set then do nothing.
+	 */
 	if (iommufd)
 		ret = vfio_iommufd_bind(device, iommufd, dev_id, pt_id);
-	else
+	else if (!df->noiommu)
 		ret = vfio_device_group_use_iommu(device);
 	if (ret)
 		goto err_module_put;
@@ -446,7 +463,7 @@ static int vfio_device_first_open(struct vfio_device_file *df,
 err_unuse_iommu:
 	if (iommufd)
 		vfio_iommufd_unbind(device);
-	else
+	else if (!df->noiommu)
 		vfio_device_group_unuse_iommu(device);
 err_module_put:
 	module_put(device->dev->driver->owner);
@@ -464,7 +481,7 @@ static void vfio_device_last_close(struct vfio_device_file *df)
 		device->ops->close_device(device);
 	if (iommufd)
 		vfio_iommufd_unbind(device);
-	else
+	else if (!df->noiommu)
 		vfio_device_group_unuse_iommu(device);
 	module_put(device->dev->driver->owner);
 }
@@ -549,6 +566,8 @@ static int vfio_device_fops_release(struct inode *inode, struct file *filep)
 
 	if (!df->is_cdev_device)
 		vfio_device_group_close(df);
+	else
+		vfio_device_cdev_close(df);
 
 	vfio_device_put_registration(device);
 
@@ -1122,6 +1141,9 @@ static long vfio_device_fops_unl_ioctl(struct file *filep,
 	struct vfio_device *device = df->device;
 	int ret;
 
+	if (cmd == VFIO_DEVICE_BIND_IOMMUFD)
+		return vfio_device_ioctl_bind_iommufd(df, arg);
+
 	/* Paired with smp_store_release() in vfio_device_open() */
 	if (!smp_load_acquire(&df->access_granted))
 		return -EINVAL;
diff --git a/include/linux/iommufd.h b/include/linux/iommufd.h
index 650d45629647..9672cf839687 100644
--- a/include/linux/iommufd.h
+++ b/include/linux/iommufd.h
@@ -17,6 +17,12 @@ struct iommufd_ctx;
 struct iommufd_access;
 struct file;
 
+/*
+ * iommufd core init xarray with flags==XA_FLAGS_ALLOC1, so valid
+ * ID starts from 1.
+ */
+#define IOMMUFD_INVALID_ID 0
+
 struct iommufd_device *iommufd_device_bind(struct iommufd_ctx *ictx,
 					   struct device *dev, u32 *id);
 void iommufd_device_unbind(struct iommufd_device *idev);
diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index 0552e8dcf0cb..1fe69173e739 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -194,6 +194,40 @@ struct vfio_group_status {
 
 /* --------------- IOCTLs for DEVICE file descriptors --------------- */
 
+/*
+ * VFIO_DEVICE_BIND_IOMMUFD - _IOR(VFIO_TYPE, VFIO_BASE + 19,
+ *				   struct vfio_device_bind_iommufd)
+ *
+ * Bind a vfio_device to the specified iommufd.
+ *
+ * The user should provide a device cookie when calling this ioctl. The
+ * cookie is carried only in event e.g. I/O fault reported to userspace
+ * via iommufd. The user should use devid returned by this ioctl to mark
+ * the target device in other ioctls (e.g. capability query via iommufd).
+ *
+ * User is not allowed to access the device before the binding operation
+ * is completed.
+ *
+ * Unbind is automatically conducted when device fd is closed.
+ *
+ * @argsz:	 user filled size of this data.
+ * @flags:	 reserved for future extension.
+ * @dev_cookie:	 a per device cookie provided by userspace.
+ * @iommufd:	 iommufd to bind. a negative value means noiommu.
+ * @out_devid:	 the device id generated by this bind.
+ *
+ * Return: 0 on success, -errno on failure.
+ */
+struct vfio_device_bind_iommufd {
+	__u32		argsz;
+	__u32		flags;
+	__aligned_u64	dev_cookie;
+	__s32		iommufd;
+	__u32		out_devid;
+};
+
+#define VFIO_DEVICE_BIND_IOMMUFD	_IO(VFIO_TYPE, VFIO_BASE + 19)
+
 /**
  * VFIO_DEVICE_GET_INFO - _IOR(VFIO_TYPE, VFIO_BASE + 7,
  *						struct vfio_device_info)
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH v4 17/19] vfio: Add VFIO_DEVICE_AT[DE]TACH_IOMMUFD_PT
  2023-02-21  3:47 [PATCH v4 00/19] Add vfio_device cdev for iommufd support Yi Liu
                   ` (15 preceding siblings ...)
  2023-02-21  3:48 ` [PATCH v4 16/19] vfio: Add VFIO_DEVICE_BIND_IOMMUFD Yi Liu
@ 2023-02-21  3:48 ` Yi Liu
  2023-02-22  7:41   ` Tian, Kevin
  2023-02-21  3:48 ` [PATCH v4 18/19] vfio: Compile group optionally Yi Liu
  2023-02-21  3:48 ` [PATCH v4 19/19] docs: vfio: Add vfio device cdev description Yi Liu
  18 siblings, 1 reply; 55+ messages in thread
From: Yi Liu @ 2023-02-21  3:48 UTC (permalink / raw)
  To: alex.williamson, jgg, kevin.tian
  Cc: joro, cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.l.liu, yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi,
	lulu, suravee.suthikulpanit, intel-gvt-dev, intel-gfx,
	linux-s390, yan.y.zhao, xudong.hao, terrence.xu

This adds ioctl for userspace to attach device cdev fd to and detach
from IOAS/hw_pagetable managed by iommufd.

    VFIO_DEVICE_ATTACH_IOMMUFD_PT: attach vfio device to IOAS, hw_pagetable
				   managed by iommufd. Attach can be
				   undo by VFIO_DEVICE_DETACH_IOMMUFD_PT
				   or device fd close.
    VFIO_DEVICE_DETACH_IOMMUFD_PT: detach vfio device from the current attached
				   IOAS or hw_pagetable managed by iommufd.

Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 drivers/vfio/device_cdev.c | 76 ++++++++++++++++++++++++++++++++++++++
 drivers/vfio/vfio.h        | 16 ++++++++
 drivers/vfio/vfio_main.c   |  8 ++++
 include/uapi/linux/vfio.h  | 52 ++++++++++++++++++++++++++
 4 files changed, 152 insertions(+)

diff --git a/drivers/vfio/device_cdev.c b/drivers/vfio/device_cdev.c
index be62f0a5687e..6870e42944ba 100644
--- a/drivers/vfio/device_cdev.c
+++ b/drivers/vfio/device_cdev.c
@@ -182,6 +182,82 @@ long vfio_device_ioctl_bind_iommufd(struct vfio_device_file *df,
 	return ret;
 }
 
+int vfio_ioctl_device_attach(struct vfio_device_file *df,
+			     void __user *arg)
+{
+	struct vfio_device *device = df->device;
+	struct vfio_device_attach_iommufd_pt attach;
+	unsigned long minsz;
+	int ret;
+
+	minsz = offsetofend(struct vfio_device_attach_iommufd_pt, pt_id);
+
+	if (copy_from_user(&attach, (void __user *)arg, minsz))
+		return -EFAULT;
+
+	if (attach.argsz < minsz || attach.flags ||
+	    attach.pt_id == IOMMUFD_INVALID_ID)
+		return -EINVAL;
+
+	if (!device->ops->bind_iommufd)
+		return -ENODEV;
+
+	mutex_lock(&device->dev_set->lock);
+	if (df->noiommu) {
+		ret = -EINVAL;
+		goto out_unlock;
+	}
+
+	ret = device->ops->attach_ioas(device, &attach.pt_id);
+	if (ret)
+		goto out_unlock;
+
+	ret = copy_to_user((void __user *)arg +
+			   offsetofend(struct vfio_device_attach_iommufd_pt, flags),
+			   &attach.pt_id,
+			   sizeof(attach.pt_id)) ? -EFAULT : 0;
+	if (ret)
+		goto out_detach;
+	mutex_unlock(&device->dev_set->lock);
+
+	return 0;
+
+out_detach:
+	device->ops->detach_ioas(device);
+out_unlock:
+	mutex_unlock(&device->dev_set->lock);
+	return ret;
+}
+
+int vfio_ioctl_device_detach(struct vfio_device_file *df,
+			     void __user *arg)
+{
+	struct vfio_device *device = df->device;
+	struct vfio_device_detach_iommufd_pt detach;
+	unsigned long minsz;
+
+	minsz = offsetofend(struct vfio_device_detach_iommufd_pt, flags);
+
+	if (copy_from_user(&detach, (void __user *)arg, minsz))
+		return -EFAULT;
+
+	if (detach.argsz < minsz || detach.flags)
+		return -EINVAL;
+
+	if (!device->ops->bind_iommufd)
+		return -ENODEV;
+
+	mutex_lock(&device->dev_set->lock);
+	if (df->noiommu) {
+		mutex_unlock(&device->dev_set->lock);
+		return -EINVAL;
+	}
+	device->ops->detach_ioas(device);
+	mutex_unlock(&device->dev_set->lock);
+
+	return 0;
+}
+
 static char *vfio_device_devnode(const struct device *dev, umode_t *mode)
 {
 	return kasprintf(GFP_KERNEL, "vfio/devices/%s", dev_name(dev));
diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
index 40b54b0ce993..1cba2baafb08 100644
--- a/drivers/vfio/vfio.h
+++ b/drivers/vfio/vfio.h
@@ -264,6 +264,10 @@ int vfio_device_fops_cdev_open(struct inode *inode, struct file *filep);
 void vfio_device_cdev_close(struct vfio_device_file *df);
 long vfio_device_ioctl_bind_iommufd(struct vfio_device_file *df,
 				    unsigned long arg);
+int vfio_ioctl_device_attach(struct vfio_device_file *df,
+			     void __user *arg);
+int vfio_ioctl_device_detach(struct vfio_device_file *df,
+			     void __user *arg);
 int vfio_cdev_init(struct class *device_class);
 void vfio_cdev_cleanup(void);
 #else
@@ -297,6 +301,18 @@ static inline long vfio_device_ioctl_bind_iommufd(struct vfio_device_file *df,
 	return -EOPNOTSUPP;
 }
 
+static inline int vfio_ioctl_device_attach(struct vfio_device_file *df,
+					   void __user *arg)
+{
+	return -EOPNOTSUPP;
+}
+
+static inline int vfio_ioctl_device_detach(struct vfio_device_file *df,
+					   void __user *arg)
+{
+	return -EOPNOTSUPP;
+}
+
 static inline int vfio_cdev_init(struct class *device_class)
 {
 	return 0;
diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
index 5e9a4fd800be..af03ea97844e 100644
--- a/drivers/vfio/vfio_main.c
+++ b/drivers/vfio/vfio_main.c
@@ -1157,6 +1157,14 @@ static long vfio_device_fops_unl_ioctl(struct file *filep,
 		ret = vfio_ioctl_device_feature(device, (void __user *)arg);
 		break;
 
+	case VFIO_DEVICE_ATTACH_IOMMUFD_PT:
+		ret = vfio_ioctl_device_attach(df, (void __user *)arg);
+		break;
+
+	case VFIO_DEVICE_DETACH_IOMMUFD_PT:
+		ret = vfio_ioctl_device_detach(df, (void __user *)arg);
+		break;
+
 	default:
 		if (unlikely(!device->ops->ioctl))
 			ret = -EINVAL;
diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index 1fe69173e739..026af52cf22e 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -228,6 +228,58 @@ struct vfio_device_bind_iommufd {
 
 #define VFIO_DEVICE_BIND_IOMMUFD	_IO(VFIO_TYPE, VFIO_BASE + 19)
 
+/*
+ * VFIO_DEVICE_ATTACH_IOMMUFD_PT - _IOW(VFIO_TYPE, VFIO_BASE + 20,
+ *					struct vfio_device_attach_iommufd_pt)
+ *
+ * Attach a vfio device to an iommufd address space specified by IOAS
+ * id or hw_pagetable (hwpt) id.
+ *
+ * Available only after a device has been bound to iommufd via
+ * VFIO_DEVICE_BIND_IOMMUFD
+ *
+ * Undo by VFIO_DEVICE_DETACH_IOMMUFD_PT or device fd close.
+ *
+ * @argsz:	user filled size of this data.
+ * @flags:	must be 0.
+ * @pt_id:	Input the target id which can represent an ioas or a hwpt
+ *		allocated via iommufd subsystem.
+ *		Output the attached hwpt id which could be the specified
+ *		hwpt itself or a hwpt automatically created for the
+ *		specified ioas by kernel during the attachment.
+ *
+ * Return: 0 on success, -errno on failure.
+ */
+struct vfio_device_attach_iommufd_pt {
+	__u32	argsz;
+	__u32	flags;
+	__u32	pt_id;
+};
+
+#define VFIO_DEVICE_ATTACH_IOMMUFD_PT		_IO(VFIO_TYPE, VFIO_BASE + 20)
+
+/*
+ * VFIO_DEVICE_DETACH_IOMMUFD_PT - _IOW(VFIO_TYPE, VFIO_BASE + 21,
+ *					struct vfio_device_detach_iommufd_pt)
+ *
+ * Detach a vfio device from the iommufd address space it has been
+ * attached to. After it, device should be in a blocking DMA state.
+ *
+ * Available only after a device has been bound to iommufd via
+ * VFIO_DEVICE_BIND_IOMMUFD
+ *
+ * @argsz:	user filled size of this data.
+ * @flags:	must be 0.
+ *
+ * Return: 0 on success, -errno on failure.
+ */
+struct vfio_device_detach_iommufd_pt {
+	__u32	argsz;
+	__u32	flags;
+};
+
+#define VFIO_DEVICE_DETACH_IOMMUFD_PT		_IO(VFIO_TYPE, VFIO_BASE + 21)
+
 /**
  * VFIO_DEVICE_GET_INFO - _IOR(VFIO_TYPE, VFIO_BASE + 7,
  *						struct vfio_device_info)
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH v4 18/19] vfio: Compile group optionally
  2023-02-21  3:47 [PATCH v4 00/19] Add vfio_device cdev for iommufd support Yi Liu
                   ` (16 preceding siblings ...)
  2023-02-21  3:48 ` [PATCH v4 17/19] vfio: Add VFIO_DEVICE_AT[DE]TACH_IOMMUFD_PT Yi Liu
@ 2023-02-21  3:48 ` Yi Liu
  2023-02-21  3:48 ` [PATCH v4 19/19] docs: vfio: Add vfio device cdev description Yi Liu
  18 siblings, 0 replies; 55+ messages in thread
From: Yi Liu @ 2023-02-21  3:48 UTC (permalink / raw)
  To: alex.williamson, jgg, kevin.tian
  Cc: joro, cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.l.liu, yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi,
	lulu, suravee.suthikulpanit, intel-gvt-dev, intel-gfx,
	linux-s390, yan.y.zhao, xudong.hao, terrence.xu

group code is not needed for vfio device cdev, so with vfio device cdev
introduced, the group infrastructures can be compiled out if only cdev
is needed.

Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 drivers/vfio/Kconfig  | 14 ++++++++
 drivers/vfio/Makefile |  2 +-
 drivers/vfio/vfio.h   | 78 +++++++++++++++++++++++++++++++++++++++++++
 include/linux/vfio.h  | 11 ++++++
 4 files changed, 104 insertions(+), 1 deletion(-)

diff --git a/drivers/vfio/Kconfig b/drivers/vfio/Kconfig
index 169762316513..c3ab06c314ea 100644
--- a/drivers/vfio/Kconfig
+++ b/drivers/vfio/Kconfig
@@ -4,6 +4,8 @@ menuconfig VFIO
 	select IOMMU_API
 	depends on IOMMUFD || !IOMMUFD
 	select INTERVAL_TREE
+	select VFIO_GROUP if SPAPR_TCE_IOMMU
+	select VFIO_DEVICE_CDEV if !VFIO_GROUP && (X86 || S390 || ARM || ARM64)
 	select VFIO_CONTAINER if IOMMUFD=n
 	help
 	  VFIO provides a framework for secure userspace device drivers.
@@ -15,6 +17,7 @@ if VFIO
 config VFIO_DEVICE_CDEV
 	bool "Support for the VFIO cdev /dev/vfio/devices/vfioX"
 	depends on IOMMUFD && (X86 || S390 || ARM || ARM64)
+	default !VFIO_GROUP
 	help
 	  The VFIO device cdev is another way for userspace to get device
 	  access. Userspace gets device fd by opening device cdev under
@@ -24,9 +27,20 @@ config VFIO_DEVICE_CDEV
 
 	  If you don't know what to do here, say N.
 
+config VFIO_GROUP
+	bool "Support for the VFIO group /dev/vfio/$group_id"
+	default y
+	help
+	   VFIO group support provides the traditional model for accessing
+	   devices through VFIO and is used by the majority of userspace
+	   applications and drivers making use of VFIO.
+
+	   If you don't know what to do here, say Y.
+
 config VFIO_CONTAINER
 	bool "Support for the VFIO container /dev/vfio/vfio"
 	select VFIO_IOMMU_TYPE1 if MMU && (X86 || S390 || ARM || ARM64)
+	depends on VFIO_GROUP
 	default y
 	help
 	  The VFIO container is the classic interface to VFIO for establishing
diff --git a/drivers/vfio/Makefile b/drivers/vfio/Makefile
index 245394aeb94b..57c3515af606 100644
--- a/drivers/vfio/Makefile
+++ b/drivers/vfio/Makefile
@@ -2,9 +2,9 @@
 obj-$(CONFIG_VFIO) += vfio.o
 
 vfio-y += vfio_main.o \
-	  group.o \
 	  iova_bitmap.o
 vfio-$(CONFIG_VFIO_DEVICE_CDEV) += device_cdev.o
+vfio-$(CONFIG_VFIO_GROUP) += group.o
 vfio-$(CONFIG_IOMMUFD) += iommufd.o
 vfio-$(CONFIG_VFIO_CONTAINER) += container.o
 vfio-$(CONFIG_VFIO_VIRQFD) += virqfd.o
diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
index 1cba2baafb08..7cbb8aa15c59 100644
--- a/drivers/vfio/vfio.h
+++ b/drivers/vfio/vfio.h
@@ -62,6 +62,7 @@ enum vfio_group_type {
 	VFIO_NO_IOMMU,
 };
 
+#if IS_ENABLED(CONFIG_VFIO_GROUP)
 struct vfio_group {
 	struct device 			dev;
 	struct cdev			cdev;
@@ -108,6 +109,83 @@ bool vfio_group_has_dev(struct vfio_group *group, struct vfio_device *device);
 bool vfio_device_has_container(struct vfio_device *device);
 int __init vfio_group_init(void);
 void vfio_group_cleanup(void);
+#else
+struct vfio_group;
+
+static inline int vfio_device_block_group(struct vfio_device *device)
+{
+	return 0;
+}
+
+static inline void vfio_device_unblock_group(struct vfio_device *device)
+{
+}
+
+static inline int vfio_device_set_group(struct vfio_device *device,
+					enum vfio_group_type type)
+{
+	return 0;
+}
+
+static inline void vfio_device_remove_group(struct vfio_device *device)
+{
+}
+
+static inline void vfio_device_group_register(struct vfio_device *device)
+{
+}
+
+static inline void vfio_device_group_unregister(struct vfio_device *device)
+{
+}
+
+static inline int vfio_device_group_use_iommu(struct vfio_device *device)
+{
+	return -EOPNOTSUPP;
+}
+
+static inline void vfio_device_group_unuse_iommu(struct vfio_device *device)
+{
+}
+
+static inline void vfio_device_group_close(struct vfio_device_file *df)
+{
+}
+
+static inline struct vfio_group *vfio_group_from_file(struct file *file)
+{
+	return NULL;
+}
+
+static inline bool vfio_group_enforced_coherent(struct vfio_group *group)
+{
+	return true;
+}
+
+static inline void vfio_group_set_kvm(struct vfio_group *group, struct kvm *kvm)
+{
+}
+
+static inline bool vfio_group_has_dev(struct vfio_group *group,
+				      struct vfio_device *device)
+{
+	return false;
+}
+
+static inline bool vfio_device_has_container(struct vfio_device *device)
+{
+	return false;
+}
+
+static inline int __init vfio_group_init(void)
+{
+	return 0;
+}
+
+static inline void vfio_group_cleanup(void)
+{
+}
+#endif /* CONFIG_VFIO_GROUP */
 
 #if IS_ENABLED(CONFIG_VFIO_CONTAINER)
 /**
diff --git a/include/linux/vfio.h b/include/linux/vfio.h
index a94cfe8d8073..58afaf816255 100644
--- a/include/linux/vfio.h
+++ b/include/linux/vfio.h
@@ -43,7 +43,9 @@ struct vfio_device {
 	 */
 	const struct vfio_migration_ops *mig_ops;
 	const struct vfio_log_ops *log_ops;
+#if IS_ENABLED(CONFIG_VFIO_GROUP)
 	struct vfio_group *group;
+#endif
 	struct vfio_device_set *dev_set;
 	struct list_head dev_set_list;
 	unsigned int migration_flags;
@@ -58,8 +60,10 @@ struct vfio_device {
 	refcount_t refcount;	/* user count on registered device*/
 	unsigned int open_count;
 	struct completion comp;
+#if IS_ENABLED(CONFIG_VFIO_GROUP)
 	struct list_head group_next;
 	struct list_head iommu_entry;
+#endif
 	struct iommufd_access *iommufd_access;
 	void (*put_kvm)(struct kvm *kvm);
 #if IS_ENABLED(CONFIG_IOMMUFD)
@@ -257,7 +261,14 @@ int vfio_mig_get_next_state(struct vfio_device *device,
 /*
  * External user API
  */
+#if IS_ENABLED(CONFIG_VFIO_GROUP)
 struct iommu_group *vfio_file_iommu_group(struct file *file);
+#else
+static inline struct iommu_group *vfio_file_iommu_group(struct file *file)
+{
+	return NULL;
+}
+#endif
 bool vfio_file_is_valid(struct file *file);
 bool vfio_file_enforced_coherent(struct file *file);
 void vfio_file_set_kvm(struct file *file, struct kvm *kvm);
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH v4 19/19] docs: vfio: Add vfio device cdev description
  2023-02-21  3:47 [PATCH v4 00/19] Add vfio_device cdev for iommufd support Yi Liu
                   ` (17 preceding siblings ...)
  2023-02-21  3:48 ` [PATCH v4 18/19] vfio: Compile group optionally Yi Liu
@ 2023-02-21  3:48 ` Yi Liu
  18 siblings, 0 replies; 55+ messages in thread
From: Yi Liu @ 2023-02-21  3:48 UTC (permalink / raw)
  To: alex.williamson, jgg, kevin.tian
  Cc: joro, cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.l.liu, yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi,
	lulu, suravee.suthikulpanit, intel-gvt-dev, intel-gfx,
	linux-s390, yan.y.zhao, xudong.hao, terrence.xu

This gives hints for userspace applications on device cdev usage.

Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 Documentation/driver-api/vfio.rst | 125 ++++++++++++++++++++++++++++++
 1 file changed, 125 insertions(+)

diff --git a/Documentation/driver-api/vfio.rst b/Documentation/driver-api/vfio.rst
index 44527420f20d..5d290ceb2bbf 100644
--- a/Documentation/driver-api/vfio.rst
+++ b/Documentation/driver-api/vfio.rst
@@ -239,6 +239,123 @@ group and can access them as follows::
 	/* Gratuitous device reset and go... */
 	ioctl(device, VFIO_DEVICE_RESET);
 
+IOMMUFD and vfio_iommu_type1
+----------------------------
+
+IOMMUFD is the new user API to manage I/O page tables from userspace.
+It intends to be the portal of delivering advanced userspace DMA
+features (nested translation [5], PASID [6], etc.) and backward
+compatible with the vfio_iommu_type1 driver. Eventually vfio_iommu_type1
+will be deprecated.
+
+With the backward compatibility, no change is required for legacy VFIO
+drivers or applications to connect a VFIO device to IOMMUFD.
+
+	When CONFIG_IOMMUFD_VFIO_CONTAINER=n, VFIO container still provides
+	/dev/vfio/vfio which connects to vfio_iommu_type1. To disable VFIO
+	container and vfio_iommu_type1, the administrator could symbol link
+	/dev/vfio/vfio to /dev/iommu to enable VFIO container emulation
+	in IOMMUFD.
+
+	When CONFIG_IOMMUFD_VFIO_CONTAINER=y, IOMMUFD directly provides
+	/dev/vfio/vfio while the VFIO container and vfio_iommu_type1 are
+	explicitly disabled.
+
+VFIO Device cdev
+----------------
+
+Traditionally user acquires a device fd via VFIO_GROUP_GET_DEVICE_FD
+in a VFIO group.
+
+With CONFIG_VFIO_DEVICE_CDEV=y the user can now acquire a device fd
+by directly opening a character device /dev/vfio/devices/vfioX where
+"X" is the number allocated uniquely by VFIO for registered devices.
+
+The cdev only works with IOMMUFD. Both VFIO drivers and applications
+must adapt to the new cdev security model which requires using
+VFIO_DEVICE_BIND_IOMMUFD to claim DMA ownership before starting to
+actually use the device. Once bind succeeds then a VFIO device can
+be fully accessed by the user.
+
+VFIO device cdev doesn't rely on VFIO group/container/iommu drivers.
+Hence those modules can be fully compiled out in an environment
+where no legacy VFIO application exists.
+
+So far SPAPR does not support IOMMUFD yet. So it cannot support device
+cdev either.
+
+Device cdev Example
+-------------------
+
+Assume user wants to access PCI device 0000:6a:01.0::
+
+	$ ls /sys/bus/pci/devices/0000:6a:01.0/vfio-dev/
+	vfio0
+
+This device is therefore represented as vfio0. The user can verify
+its existence::
+
+	$ ls -l /dev/vfio/devices/vfio0
+	crw------- 1 root root 511, 0 Feb 16 01:22 /dev/vfio/devices/vfio0
+	$ cat /sys/bus/pci/devices/0000:6a:01.0/vfio-dev/vfio0/dev
+	511:0
+	$ ls -l /dev/char/511\:0
+	lrwxrwxrwx 1 root root 21 Feb 16 01:22 /dev/char/511:0 -> ../vfio/devices/vfio0
+
+Then provide the user with access to the device if unprivileged
+operation is desired::
+
+	$ chown user:user /dev/vfio/devices/vfio0
+
+Finally the user could get cdev fd by::
+
+	cdev_fd = open("/dev/vfio/devices/vfio0", O_RDWR);
+
+An opened cdev_fd doesn't give the user any permission of accessing
+the device except binding the cdev_fd to an iommufd. After that point
+then the device is fully accessible including attaching it to an
+IOMMUFD IOAS/HWPT to enable userspace DMA::
+
+	struct vfio_device_bind_iommufd bind = {
+		.argsz = sizeof(bind),
+		.flags = 0,
+	};
+	struct iommu_ioas_alloc alloc_data  = {
+		.size = sizeof(alloc_data),
+		.flags = 0,
+	};
+	struct vfio_device_attach_iommufd_pt attach_data = {
+		.argsz = sizeof(attach_data),
+		.flags = 0,
+	};
+	struct iommu_ioas_map map = {
+		.size = sizeof(map),
+		.flags = IOMMU_IOAS_MAP_READABLE |
+			 IOMMU_IOAS_MAP_WRITEABLE |
+			 IOMMU_IOAS_MAP_FIXED_IOVA,
+		.__reserved = 0,
+	};
+
+	iommufd = open("/dev/iommu", O_RDWR);
+
+	bind.iommufd = iommufd;
+	ioctl(cdev_fd, VFIO_DEVICE_BIND_IOMMUFD, &bind);
+
+	ioctl(iommufd, IOMMU_IOAS_ALLOC, &alloc_data);
+	attach_data.pt_id = alloc_data.out_ioas_id;
+	ioctl(cdev_fd, VFIO_DEVICE_ATTACH_IOMMUFD_PT, &attach_data);
+
+	/* Allocate some space and setup a DMA mapping */
+	map.user_va = (int64_t)mmap(0, 1024 * 1024, PROT_READ | PROT_WRITE,
+				    MAP_PRIVATE | MAP_ANONYMOUS, 0, 0);
+	map.iova = 0; /* 1MB starting at 0x0 from device view */
+	map.length = 1024 * 1024;
+	map.ioas_id = alloc_data.out_ioas_id;;
+
+	ioctl(iommufd, IOMMU_IOAS_MAP, &map);
+
+	/* Other device operations as stated in "VFIO Usage Example" */
+
 VFIO User API
 -------------------------------------------------------------------------------
 
@@ -566,3 +683,11 @@ This implementation has some specifics:
 				\-0d.1
 
 	00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev 90)
+
+.. [5] Nested translation is an IOMMU feature which supports two stage
+   address translations. This improves the address translation efficiency
+   in IOMMU virtualization.
+
+.. [6] PASID stands for Process Address Space ID, introduced by PCI
+   Express. It is a prerequisite for Shared Virtual Addressing (SVA)
+   and Scalable I/O Virtualization (Scalable IOV).
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* RE: [PATCH v4 03/19] vfio: Accept vfio device file in the driver facing kAPI
  2023-02-21  3:47 ` [PATCH v4 03/19] vfio: Accept vfio device file in the driver facing kAPI Yi Liu
@ 2023-02-22  7:15   ` Tian, Kevin
  2023-02-26 12:20     ` Liu, Yi L
  0 siblings, 1 reply; 55+ messages in thread
From: Tian, Kevin @ 2023-02-22  7:15 UTC (permalink / raw)
  To: Liu, Yi L, alex.williamson, jgg
  Cc: joro, cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, intel-gvt-dev, intel-gfx, linux-s390,
	Zhao, Yan Y, Hao, Xudong, Xu, Terrence

> From: Liu, Yi L <yi.l.liu@intel.com>
> Sent: Tuesday, February 21, 2023 11:48 AM
> 
>  /**
>   * vfio_file_is_valid - True if the file is valid vfio file
>   * @file: VFIO group file or VFIO device file
>   */
>  bool vfio_file_is_valid(struct file *file)
>  {
> -	return vfio_group_from_file(file);
> +	return vfio_group_from_file(file) ||
> +	       vfio_device_from_file(file);
>  }
>  EXPORT_SYMBOL_GPL(vfio_file_is_valid);

Following previous Alex's comment I'd leave pci hot reset path
to continue calling vfio_file_is_group() at this point so the uAPI
semantics is not changed here before patch 9 comes to properly
handle device fd in that path.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* RE: [PATCH v4 05/19] kvm/vfio: Accept vfio device file from userspace
  2023-02-21  3:47 ` [PATCH v4 05/19] kvm/vfio: Accept vfio device file from userspace Yi Liu
@ 2023-02-22  7:17   ` Tian, Kevin
  2023-02-23 10:33     ` Liu, Yi L
  0 siblings, 1 reply; 55+ messages in thread
From: Tian, Kevin @ 2023-02-22  7:17 UTC (permalink / raw)
  To: Liu, Yi L, alex.williamson, jgg
  Cc: joro, cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, intel-gvt-dev, intel-gfx, linux-s390,
	Zhao, Yan Y, Hao, Xudong, Xu, Terrence

> From: Liu, Yi L <yi.l.liu@intel.com>
> Sent: Tuesday, February 21, 2023 11:48 AM
> 
> +
> +  KVM_DEV_VFIO_FILE_SET_SPAPR_TCE: attaches a guest visible TCE table
>  	allocated by sPAPR KVM.
> +
> +	alias: KVM_DEV_VFIO_GROUP_SET_SPAPR_TCE
> +
>  	kvm_device_attr.addr points to a struct::
> 

I'm a bit lost from previous discussion in v3. If all agree that SPAPR
only works with group then is it necessary to add a FILE alias here?

^ permalink raw reply	[flat|nested] 55+ messages in thread

* RE: [PATCH v4 08/19] vfio/pci: Update comment around group_fd get in vfio_pci_ioctl_pci_hot_reset()
  2023-02-21  3:48 ` [PATCH v4 08/19] vfio/pci: Update comment around group_fd get in vfio_pci_ioctl_pci_hot_reset() Yi Liu
@ 2023-02-22  7:20   ` Tian, Kevin
  0 siblings, 0 replies; 55+ messages in thread
From: Tian, Kevin @ 2023-02-22  7:20 UTC (permalink / raw)
  To: Liu, Yi L, alex.williamson, jgg
  Cc: joro, cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, intel-gvt-dev, intel-gfx, linux-s390,
	Zhao, Yan Y, Hao, Xudong, Xu, Terrence

> From: Liu, Yi L <yi.l.liu@intel.com>
> Sent: Tuesday, February 21, 2023 11:48 AM
> 
> this suits more on what the code does.
> 
> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> ---
>  drivers/vfio/pci/vfio_pci_core.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
> index 4704c1babae3..827524510f3f 100644
> --- a/drivers/vfio/pci/vfio_pci_core.c
> +++ b/drivers/vfio/pci/vfio_pci_core.c
> @@ -1308,8 +1308,7 @@ static int vfio_pci_ioctl_pci_hot_reset(struct
> vfio_pci_core_device *vdev,
>  	}
> 
>  	/*
> -	 * For each group_fd, get the group through the vfio external user
> -	 * interface and store the group and iommu ID.  This ensures the
> group
> +	 * For each group_fd, get the group file, this ensures the group
>  	 * is held across the reset.

"Get the group file for each fd to ensure the group held across the reset"

btw I'm not sure whether it's worthy of being a separate patch. but if Alex is fine:

Reviewed-by: Kevin Tian <kevin.tian@intel.com>

>  	 */
>  	for (file_idx = 0; file_idx < hdr.count; file_idx++) {
> --
> 2.34.1


^ permalink raw reply	[flat|nested] 55+ messages in thread

* RE: [PATCH v4 09/19] vfio/pci: Accept device fd for hot reset
  2023-02-21  3:48 ` [PATCH v4 09/19] vfio/pci: Accept device fd for hot reset Yi Liu
@ 2023-02-22  7:26   ` Tian, Kevin
  2023-02-22 13:35     ` Liu, Yi L
  0 siblings, 1 reply; 55+ messages in thread
From: Tian, Kevin @ 2023-02-22  7:26 UTC (permalink / raw)
  To: Liu, Yi L, alex.williamson, jgg
  Cc: joro, cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, intel-gvt-dev, intel-gfx, linux-s390,
	Zhao, Yan Y, Hao, Xudong, Xu, Terrence

> From: Liu, Yi L <yi.l.liu@intel.com>
> Sent: Tuesday, February 21, 2023 11:48 AM
> 
>  	/*
>  	 * We can't let userspace give us an arbitrarily large buffer to copy,
> -	 * so verify how many we think there could be.  Note groups can have
> -	 * multiple devices so one group per device is the max.
> +	 * so verify how many we think there could be.  Note user may
> provide
> +	 * a set of groups, group can have multiple devices so one group per
> +	 * device is the max.

well this change doesn't include cdev

> @@ -1320,7 +1321,7 @@ static int vfio_pci_ioctl_pci_hot_reset(struct
> vfio_pci_core_device *vdev,
>  		}
> 
>  		/* Ensure the FD is a vfio FD.*/
> -		if (!vfio_file_is_valid(file)) {
> +		if (!vfio_file_is_device_opened(file)) {
>  			fput(file);
>  			ret = -EINVAL;
>  			break;

that function is not just for checking device.

Probably rename it to vfio_file_is_reset_valid().

btw this patch is insufficient to handle device fd. The current logic
requires every device in the dev_set covered by provided fd's:

static bool vfio_dev_in_groups(struct vfio_pci_core_device *vdev,
                               struct vfio_pci_group_info *groups)
{
 	unsigned int i;

	for (i = 0; i < groups->count; i++)
		if (vfio_file_has_dev(groups->files[i], &vdev->vdev))
			return true;
	return false;
}

Presumably when cdev fd is provided above should compare iommu
group of the fd and that of the vdev. Otherwise it expects the user
to have full access to every device in the set which is impractical.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* RE: [PATCH v4 13/19] vfio: Add cdev_device_open_cnt to vfio_group
  2023-02-21  3:48 ` [PATCH v4 13/19] vfio: Add cdev_device_open_cnt to vfio_group Yi Liu
@ 2023-02-22  7:31   ` Tian, Kevin
  0 siblings, 0 replies; 55+ messages in thread
From: Tian, Kevin @ 2023-02-22  7:31 UTC (permalink / raw)
  To: Liu, Yi L, alex.williamson, jgg
  Cc: joro, cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, intel-gvt-dev, intel-gfx, linux-s390,
	Zhao, Yan Y, Hao, Xudong, Xu, Terrence

> From: Liu, Yi L <yi.l.liu@intel.com>
> Sent: Tuesday, February 21, 2023 11:48 AM
> 
> +
> +void vfio_device_unblock_group(struct vfio_device *device)
> +{
> +	struct vfio_group *group = device->group;
> +
> +	mutex_lock(&group->group_lock);
> +	group->cdev_device_open_cnt--;
> +	mutex_unlock(&group->group_lock);
> +}
> +

Add a WARN_ON(group->cdev_device_open_cnt) in vfio_group_release().

with that:

Reviewed-by: Kevin Tian <kevin.tian@intel.com>

^ permalink raw reply	[flat|nested] 55+ messages in thread

* RE: [PATCH v4 14/19] vfio: Make vfio_device_open() single open for device cdev path
  2023-02-21  3:48 ` [PATCH v4 14/19] vfio: Make vfio_device_open() single open for device cdev path Yi Liu
@ 2023-02-22  7:32   ` Tian, Kevin
  0 siblings, 0 replies; 55+ messages in thread
From: Tian, Kevin @ 2023-02-22  7:32 UTC (permalink / raw)
  To: Liu, Yi L, alex.williamson, jgg
  Cc: joro, cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, intel-gvt-dev, intel-gfx, linux-s390,
	Zhao, Yan Y, Hao, Xudong, Xu, Terrence

> From: Liu, Yi L <yi.l.liu@intel.com>
> Sent: Tuesday, February 21, 2023 11:48 AM
> 
> VFIO group has historically allowed multi-open of the device FD. This
> was made secure because the "open" was executed via an ioctl to the
> group FD which is itself only single open.
> 
> However, no known use of multiple device FDs today. It is kind of a
> strange thing to do because new device FDs can naturally be created
> via dup().
> 
> When we implement the new device uAPI (only used in cdev path) there is
> no natural way to allow the device itself from being multi-opened in a
> secure manner. Without the group FD we cannot prove the security context
> of the opener.
> 
> Thus, when moving to the new uAPI we block the ability to multi-open
> the device. Old group path still allows it.
> 
> vfio_device_open() needs to sustain both the legacy behavior i.e. multi-open
> in the group path and the new behavior i.e. single-open in the cdev path.
> This mixture leads to the introduction of a new is_cdev_device flag in struct
> vfio_device_file.
> 
> Signed-off-by: Yi Liu <yi.l.liu@intel.com>

Reviewed-by: Kevin Tian <kevin.tian@intel.com>

^ permalink raw reply	[flat|nested] 55+ messages in thread

* RE: [PATCH v4 15/19] vfio: Add cdev for vfio_device
  2023-02-21  3:48 ` [PATCH v4 15/19] vfio: Add cdev for vfio_device Yi Liu
@ 2023-02-22  7:34   ` Tian, Kevin
  0 siblings, 0 replies; 55+ messages in thread
From: Tian, Kevin @ 2023-02-22  7:34 UTC (permalink / raw)
  To: Liu, Yi L, alex.williamson, jgg
  Cc: joro, cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, intel-gvt-dev, intel-gfx, linux-s390,
	Zhao, Yan Y, Hao, Xudong, Xu, Terrence

> From: Liu, Yi L <yi.l.liu@intel.com>
> Sent: Tuesday, February 21, 2023 11:48 AM
> 
> This allows user to directly open a vfio device w/o using the legacy
> container/group interface, as a prerequisite for supporting new iommu
> features like nested translation.
> 
> The device fd opened in this manner doesn't have the capability to access
> the device as the fops open() doesn't open the device until the successful
> BIND_IOMMUFD which be added in next patch.
> 
> With this patch, devices registered to vfio core have both group and device
> interface created.
> 
> - group interface : /dev/vfio/$groupID
> - device interface: /dev/vfio/devices/vfioX  (X is the minor number and
> 					      is unique across devices)
> 
> Given a vfio device the user can identify the matching vfioX by checking
> the sysfs path of the device. Take PCI device (0000:6a:01.0) for example,
> /sys/bus/pci/devices/0000\:6a\:01.0/vfio-dev/vfio0/dev contains the
> major:minor of the matching vfioX.
> 
> Userspace then opens the /dev/vfio/devices/vfioX and checks with fstat
> that the major:minor matches.
> 
> The vfio_device cdev logic in this patch:
> *) __vfio_register_dev() path ends up doing cdev_device_add() for each
>    vfio_device;
> *) vfio_unregister_group_dev() path does cdev_device_del();
> 
> Signed-off-by: Yi Liu <yi.l.liu@intel.com>

Reviewed-by: Kevin Tian <kevin.tian@intel.com>

^ permalink raw reply	[flat|nested] 55+ messages in thread

* RE: [PATCH v4 16/19] vfio: Add VFIO_DEVICE_BIND_IOMMUFD
  2023-02-21  3:48 ` [PATCH v4 16/19] vfio: Add VFIO_DEVICE_BIND_IOMMUFD Yi Liu
@ 2023-02-22  7:39   ` Tian, Kevin
  2023-02-22  7:44     ` Liu, Yi L
  2023-02-22  7:53   ` Yan Zhao
  1 sibling, 1 reply; 55+ messages in thread
From: Tian, Kevin @ 2023-02-22  7:39 UTC (permalink / raw)
  To: Liu, Yi L, alex.williamson, jgg
  Cc: joro, cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, intel-gvt-dev, intel-gfx, linux-s390,
	Zhao, Yan Y, Hao, Xudong, Xu, Terrence

> From: Liu, Yi L <yi.l.liu@intel.com>
> Sent: Tuesday, February 21, 2023 11:48 AM
> 
> +
> +void vfio_device_cdev_close(struct vfio_device_file *df)
> +{
> +	struct vfio_device *device = df->device;
> +
> +	mutex_lock(&device->dev_set->lock);
> +	if (!smp_load_acquire(&df->access_granted)) {

there is no contention with another one changing this flag at this
point so directly accessing it is fine.

but actually should check device->open_count as v3 does. Otherwise
the last error on copy_to_user() in ioctl_bind_iommufd() simply returns
here given df->access_granted hasn't been set but .open_device()
has been completed.


^ permalink raw reply	[flat|nested] 55+ messages in thread

* RE: [PATCH v4 17/19] vfio: Add VFIO_DEVICE_AT[DE]TACH_IOMMUFD_PT
  2023-02-21  3:48 ` [PATCH v4 17/19] vfio: Add VFIO_DEVICE_AT[DE]TACH_IOMMUFD_PT Yi Liu
@ 2023-02-22  7:41   ` Tian, Kevin
  0 siblings, 0 replies; 55+ messages in thread
From: Tian, Kevin @ 2023-02-22  7:41 UTC (permalink / raw)
  To: Liu, Yi L, alex.williamson, jgg
  Cc: joro, cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, intel-gvt-dev, intel-gfx, linux-s390,
	Zhao, Yan Y, Hao, Xudong, Xu, Terrence

> From: Liu, Yi L <yi.l.liu@intel.com>
> Sent: Tuesday, February 21, 2023 11:48 AM
> 
> This adds ioctl for userspace to attach device cdev fd to and detach
> from IOAS/hw_pagetable managed by iommufd.
> 
>     VFIO_DEVICE_ATTACH_IOMMUFD_PT: attach vfio device to IOAS,
> hw_pagetable
> 				   managed by iommufd. Attach can be
> 				   undo by
> VFIO_DEVICE_DETACH_IOMMUFD_PT
> 				   or device fd close.
>     VFIO_DEVICE_DETACH_IOMMUFD_PT: detach vfio device from the current
> attached
> 				   IOAS or hw_pagetable managed by
> iommufd.
> 
> Signed-off-by: Yi Liu <yi.l.liu@intel.com>

Reviewed-by: Kevin Tian <kevin.tian@intel.com>

^ permalink raw reply	[flat|nested] 55+ messages in thread

* RE: [PATCH v4 16/19] vfio: Add VFIO_DEVICE_BIND_IOMMUFD
  2023-02-22  7:39   ` Tian, Kevin
@ 2023-02-22  7:44     ` Liu, Yi L
  2023-02-22  7:59       ` Tian, Kevin
  2023-02-22 12:59       ` Jason Gunthorpe
  0 siblings, 2 replies; 55+ messages in thread
From: Liu, Yi L @ 2023-02-22  7:44 UTC (permalink / raw)
  To: Tian, Kevin, alex.williamson, jgg
  Cc: joro, cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, intel-gvt-dev, intel-gfx, linux-s390,
	Zhao, Yan Y, Hao, Xudong, Xu, Terrence

> From: Tian, Kevin <kevin.tian@intel.com>
> Sent: Wednesday, February 22, 2023 3:40 PM
> 
> > From: Liu, Yi L <yi.l.liu@intel.com>
> > Sent: Tuesday, February 21, 2023 11:48 AM
> >
> > +
> > +void vfio_device_cdev_close(struct vfio_device_file *df)
> > +{
> > +	struct vfio_device *device = df->device;
> > +
> > +	mutex_lock(&device->dev_set->lock);
> > +	if (!smp_load_acquire(&df->access_granted)) {
> 
> there is no contention with another one changing this flag at this
> point so directly accessing it is fine.

make sense. 

> but actually should check device->open_count as v3 does. Otherwise
> the last error on copy_to_user() in ioctl_bind_iommufd() simply returns
> here given df->access_granted hasn't been set but .open_device()
> has been completed.

If copy_to_user() failed, vfio_device_close() would be called in the
error path. Then this close function just returns.

Regards,
Yi Liu

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v4 16/19] vfio: Add VFIO_DEVICE_BIND_IOMMUFD
  2023-02-21  3:48 ` [PATCH v4 16/19] vfio: Add VFIO_DEVICE_BIND_IOMMUFD Yi Liu
  2023-02-22  7:39   ` Tian, Kevin
@ 2023-02-22  7:53   ` Yan Zhao
  2023-02-22  8:28     ` Liu, Yi L
  1 sibling, 1 reply; 55+ messages in thread
From: Yan Zhao @ 2023-02-22  7:53 UTC (permalink / raw)
  To: Yi Liu
  Cc: alex.williamson, jgg, kevin.tian, joro, cohuck, eric.auger,
	nicolinc, kvm, mjrosato, chao.p.peng, yi.y.sun, peterx, jasowang,
	shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	intel-gvt-dev, intel-gfx, linux-s390, xudong.hao, terrence.xu

On Mon, Feb 20, 2023 at 07:48:09PM -0800, Yi Liu wrote:
> This adds ioctl for userspace to bind device cdev fd to iommufd.
> 
>     VFIO_DEVICE_BIND_IOMMUFD: bind device to an iommufd, hence gain DMA
> 			      control provided by the iommufd. open_device
> 			      op is called after bind_iommufd op.
> 			      VFIO no iommu mode is indicated by passing
> 			      a negative iommufd value.
> 
> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> ---
>  drivers/vfio/device_cdev.c | 137 +++++++++++++++++++++++++++++++++++++
>  drivers/vfio/vfio.h        |  17 ++++-
>  drivers/vfio/vfio_main.c   |  30 ++++++--
>  include/linux/iommufd.h    |   6 ++
>  include/uapi/linux/vfio.h  |  34 +++++++++
>  5 files changed, 219 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/vfio/device_cdev.c b/drivers/vfio/device_cdev.c
> index 9e2c1ecaaf4f..be62f0a5687e 100644
> --- a/drivers/vfio/device_cdev.c
> +++ b/drivers/vfio/device_cdev.c
> @@ -3,6 +3,7 @@
>   * Copyright (c) 2023 Intel Corporation.
>   */
>  #include <linux/vfio.h>
> +#include <linux/iommufd.h>
>  
>  #include "vfio.h"
>  
> @@ -45,6 +46,142 @@ int vfio_device_fops_cdev_open(struct inode *inode, struct file *filep)
>  	return ret;
>  }
>  
> +static void vfio_device_get_kvm_safe(struct vfio_device_file *df)
> +{
> +	spin_lock(&df->kvm_ref_lock);
> +	if (!df->kvm)
> +		goto unlock;
> +
> +	_vfio_device_get_kvm_safe(df->device, df->kvm);
> +
> +unlock:
> +	spin_unlock(&df->kvm_ref_lock);
> +}
> +
> +void vfio_device_cdev_close(struct vfio_device_file *df)
> +{
> +	struct vfio_device *device = df->device;
> +
> +	mutex_lock(&device->dev_set->lock);
> +	if (!smp_load_acquire(&df->access_granted)) {
> +		mutex_unlock(&device->dev_set->lock);
> +		return;
> +	}
> +	vfio_device_close(df);
> +	vfio_device_put_kvm(device);
> +	if (df->iommufd)
> +		iommufd_ctx_put(df->iommufd);
> +	mutex_unlock(&device->dev_set->lock);
> +	vfio_device_unblock_group(device);
> +}
> +
> +long vfio_device_ioctl_bind_iommufd(struct vfio_device_file *df,
> +				    unsigned long arg)
> +{
> +	struct vfio_device *device = df->device;
> +	struct vfio_device_bind_iommufd bind;
> +	struct iommufd_ctx *iommufd = NULL;
> +	struct fd f;
> +	unsigned long minsz;
> +	int ret;
> +
> +	minsz = offsetofend(struct vfio_device_bind_iommufd, out_devid);
> +
> +	if (copy_from_user(&bind, (void __user *)arg, minsz))
> +		return -EFAULT;
> +
> +	if (bind.argsz < minsz || bind.flags)
> +		return -EINVAL;
> +
> +	if (!device->ops->bind_iommufd)
> +		return -ENODEV;
> +
> +	ret = vfio_device_block_group(device);
> +	if (ret)
> +		return ret;
> +
> +	mutex_lock(&device->dev_set->lock);
> +	/*
> +	 * If already been bound to an iommufd, or already set noiommu
> +	 * then fail it.
> +	 */
> +	if (df->iommufd || df->noiommu) {
> +		ret = -EINVAL;
> +		goto out_unlock;
> +	}
> +
> +	/* iommufd < 0 means noiommu mode */
> +	if (bind.iommufd < 0) {
> +		if (!capable(CAP_SYS_RAWIO)) {
> +			ret = -EPERM;
> +			goto out_unlock;
> +		}
> +		df->noiommu = true;
> +	} else {
> +		f = fdget(bind.iommufd);
> +		if (!f.file) {
> +			ret = -EBADF;
> +			goto out_unlock;
> +		}
> +		iommufd = iommufd_ctx_from_file(f.file);
> +		if (IS_ERR(iommufd)) {
> +			ret = PTR_ERR(iommufd);
> +			goto out_put_file;
> +		}
> +	}
> +
> +	/*
> +	 * Before the device open, get the KVM pointer currently
> +	 * associated with the device file (if there is) and obtain a
> +	 * reference. This reference is held until device closed. Save
> +	 * the pointer in the device for use by drivers.
> +	 */
> +	vfio_device_get_kvm_safe(df);
> +
> +	df->iommufd = iommufd;
> +	ret = vfio_device_open(df, &bind.out_devid, NULL);
> +	if (ret)
> +		goto out_put_kvm;
> +
> +	ret = copy_to_user((void __user *)arg +
> +			   offsetofend(struct vfio_device_bind_iommufd, iommufd),
> +			   &bind.out_devid,
> +			   sizeof(bind.out_devid)) ? -EFAULT : 0;
> +	if (ret)
> +		goto out_close_device;
> +
> +	if (iommufd)
> +		fdput(f);
> +	else if (df->noiommu)
> +		dev_warn(device->dev, "vfio-noiommu device used by user "
> +			 "(%s:%d)\n", current->comm, task_pid_nr(current));
> +
> +	/*
> +	 * Paired with smp_load_acquire() in vfio_device_fops::ioctl/
> +	 * read/write/mmap
> +	 */
> +	smp_store_release(&df->access_granted, true);
> +	mutex_unlock(&device->dev_set->lock);
> +
> +	return 0;
> +
> +out_close_device:
> +	vfio_device_close(df);
> +out_put_kvm:
> +	df->iommufd = NULL;
> +	df->noiommu = false;
> +	vfio_device_put_kvm(device);
> +out_put_file:
> +	if (iommufd) {
> +		iommufd_ctx_put(iommufd);
> +		fdput(f);
even if iommufd is NULL, still need to fdput(f) if f.file is true, right?

> +	}
> +out_unlock:
> +	mutex_unlock(&device->dev_set->lock);
> +	vfio_device_unblock_group(device);
> +	return ret;
> +}
> +

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v4 07/19] vfio: Block device access via device fd until device is opened
  2023-02-21  3:48 ` [PATCH v4 07/19] vfio: Block device access via device fd until device is opened Yi Liu
@ 2023-02-22  7:55   ` Yan Zhao
  2023-02-22  8:29     ` Liu, Yi L
  0 siblings, 1 reply; 55+ messages in thread
From: Yan Zhao @ 2023-02-22  7:55 UTC (permalink / raw)
  To: Yi Liu
  Cc: alex.williamson, jgg, kevin.tian, joro, cohuck, eric.auger,
	nicolinc, kvm, mjrosato, chao.p.peng, yi.y.sun, peterx, jasowang,
	shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	intel-gvt-dev, intel-gfx, linux-s390, xudong.hao, terrence.xu

On Mon, Feb 20, 2023 at 07:48:00PM -0800, Yi Liu wrote:
> Allow the vfio_device file to be in a state where the device FD is
> opened but the device cannot be used by userspace (i.e. its .open_device()
> hasn't been called). This inbetween state is not used when the device
> FD is spawned from the group FD, however when we create the device FD
> directly by opening a cdev it will be opened in the blocked state.
> 
> The reason for the inbetween state is that userspace only gets a FD but
> doesn't gain access permission until binding the FD to an iommufd. So in
> the blocked state, only the bind operation is allowed. Completing bind
> will allow user to further access the device.
> 
> This is implemented by adding a flag in struct vfio_device_file to mark
> the blocked state and using a simple smp_load_acquire() to obtain the
> flag value and serialize all the device setup with the thread accessing
> this device.
> 
> Following this lockless scheme, it can safely handle the device FD
> unbound->bound but it cannot handle bound->unbound. To allow this we'd
> need to add a lock on all the vfio ioctls which seems costly. So once
> device FD is bound, it remains bound until the FD is closed.
> 
> Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> Reviewed-by: Kevin Tian <kevin.tian@intel.com>
> ---
>  drivers/vfio/group.c     |  6 ++++++
>  drivers/vfio/vfio.h      |  1 +
>  drivers/vfio/vfio_main.c | 16 ++++++++++++++++
>  3 files changed, 23 insertions(+)
> 
> diff --git a/drivers/vfio/group.c b/drivers/vfio/group.c
> index 2abf55c69281..14e29525e354 100644
> --- a/drivers/vfio/group.c
> +++ b/drivers/vfio/group.c
> @@ -197,6 +197,12 @@ static int vfio_device_group_open(struct vfio_device_file *df)
>  	if (device->open_count == 0)
>  		vfio_device_put_kvm(device);
>  
> +	/*
> +	 * Paired with smp_load_acquire() in vfio_device_fops::ioctl/
> +	 * read/write/mmap
> +	 */
> +	smp_store_release(&df->access_granted, true);
> +
>  	mutex_unlock(&device->dev_set->lock);
>  
>  out_unlock:
> diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
> index 11e56fe079a1..d56cdb114024 100644
> --- a/drivers/vfio/vfio.h
> +++ b/drivers/vfio/vfio.h
> @@ -18,6 +18,7 @@ struct vfio_container;
>  
>  struct vfio_device_file {
>  	struct vfio_device *device;
> +	bool access_granted;
>  	spinlock_t kvm_ref_lock; /* protect kvm field */
>  	struct kvm *kvm;
>  	struct iommufd_ctx *iommufd; /* protected by struct vfio_device_set::lock */
> diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
> index ea507a61e3b7..91c8f25393db 100644
> --- a/drivers/vfio/vfio_main.c
> +++ b/drivers/vfio/vfio_main.c
> @@ -1106,6 +1106,10 @@ static long vfio_device_fops_unl_ioctl(struct file *filep,
>  	struct vfio_device *device = df->device;
>  	int ret;
>  
> +	/* Paired with smp_store_release() in vfio_device_open() */

Nit pick: not vfio_device_open() now :)

> +	if (!smp_load_acquire(&df->access_granted))
> +		return -EINVAL;
> +
>  	ret = vfio_device_pm_runtime_get(device);
>  	if (ret)
>  		return ret;
> @@ -1133,6 +1137,10 @@ static ssize_t vfio_device_fops_read(struct file *filep, char __user *buf,
>  	struct vfio_device_file *df = filep->private_data;
>  	struct vfio_device *device = df->device;
>  
> +	/* Paired with smp_store_release() in vfio_device_open() */
> +	if (!smp_load_acquire(&df->access_granted))
> +		return -EINVAL;
> +
>  	if (unlikely(!device->ops->read))
>  		return -EINVAL;
>  
> @@ -1146,6 +1154,10 @@ static ssize_t vfio_device_fops_write(struct file *filep,
>  	struct vfio_device_file *df = filep->private_data;
>  	struct vfio_device *device = df->device;
>  
> +	/* Paired with smp_store_release() in vfio_device_open() */
> +	if (!smp_load_acquire(&df->access_granted))
> +		return -EINVAL;
> +
>  	if (unlikely(!device->ops->write))
>  		return -EINVAL;
>  
> @@ -1157,6 +1169,10 @@ static int vfio_device_fops_mmap(struct file *filep, struct vm_area_struct *vma)
>  	struct vfio_device_file *df = filep->private_data;
>  	struct vfio_device *device = df->device;
>  
> +	/* Paired with smp_store_release() in vfio_device_open() */
> +	if (!smp_load_acquire(&df->access_granted))
> +		return -EINVAL;
> +
>  	if (unlikely(!device->ops->mmap))
>  		return -EINVAL;
>  
> -- 
> 2.34.1
> 

^ permalink raw reply	[flat|nested] 55+ messages in thread

* RE: [PATCH v4 16/19] vfio: Add VFIO_DEVICE_BIND_IOMMUFD
  2023-02-22  7:44     ` Liu, Yi L
@ 2023-02-22  7:59       ` Tian, Kevin
  2023-02-22 12:59       ` Jason Gunthorpe
  1 sibling, 0 replies; 55+ messages in thread
From: Tian, Kevin @ 2023-02-22  7:59 UTC (permalink / raw)
  To: Liu, Yi L, alex.williamson, jgg
  Cc: joro, cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, intel-gvt-dev, intel-gfx, linux-s390,
	Zhao, Yan Y, Hao, Xudong, Xu, Terrence

> From: Liu, Yi L <yi.l.liu@intel.com>
> Sent: Wednesday, February 22, 2023 3:44 PM
> 
> > From: Tian, Kevin <kevin.tian@intel.com>
> > Sent: Wednesday, February 22, 2023 3:40 PM
> >
> > > From: Liu, Yi L <yi.l.liu@intel.com>
> > > Sent: Tuesday, February 21, 2023 11:48 AM
> > >
> > > +
> > > +void vfio_device_cdev_close(struct vfio_device_file *df)
> > > +{
> > > +	struct vfio_device *device = df->device;
> > > +
> > > +	mutex_lock(&device->dev_set->lock);
> > > +	if (!smp_load_acquire(&df->access_granted)) {
> >
> > there is no contention with another one changing this flag at this
> > point so directly accessing it is fine.
> 
> make sense.
> 
> > but actually should check device->open_count as v3 does. Otherwise
> > the last error on copy_to_user() in ioctl_bind_iommufd() simply returns
> > here given df->access_granted hasn't been set but .open_device()
> > has been completed.
> 
> If copy_to_user() failed, vfio_device_close() would be called in the
> error path. Then this close function just returns.
> 

yeah, I misread it.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* RE: [PATCH v4 16/19] vfio: Add VFIO_DEVICE_BIND_IOMMUFD
  2023-02-22  7:53   ` Yan Zhao
@ 2023-02-22  8:28     ` Liu, Yi L
  0 siblings, 0 replies; 55+ messages in thread
From: Liu, Yi L @ 2023-02-22  8:28 UTC (permalink / raw)
  To: Zhao, Yan Y
  Cc: alex.williamson, jgg, Tian, Kevin, joro, cohuck, eric.auger,
	nicolinc, kvm, mjrosato, chao.p.peng, yi.y.sun, peterx, jasowang,
	shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	intel-gvt-dev, intel-gfx, linux-s390, Hao, Xudong, Xu, Terrence

> From: Zhao, Yan Y <yan.y.zhao@intel.com>
> Sent: Wednesday, February 22, 2023 3:53 PM
> 
> On Mon, Feb 20, 2023 at 07:48:09PM -0800, Yi Liu wrote:
> > This adds ioctl for userspace to bind device cdev fd to iommufd.
> >
> >     VFIO_DEVICE_BIND_IOMMUFD: bind device to an iommufd, hence gain
> DMA
> > 			      control provided by the iommufd. open_device
> > 			      op is called after bind_iommufd op.
> > 			      VFIO no iommu mode is indicated by passing
> > 			      a negative iommufd value.
> >
> > Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> > ---
> >  drivers/vfio/device_cdev.c | 137
> +++++++++++++++++++++++++++++++++++++
> >  drivers/vfio/vfio.h        |  17 ++++-
> >  drivers/vfio/vfio_main.c   |  30 ++++++--
> >  include/linux/iommufd.h    |   6 ++
> >  include/uapi/linux/vfio.h  |  34 +++++++++
> >  5 files changed, 219 insertions(+), 5 deletions(-)
> >
> > diff --git a/drivers/vfio/device_cdev.c b/drivers/vfio/device_cdev.c
> > index 9e2c1ecaaf4f..be62f0a5687e 100644
> > --- a/drivers/vfio/device_cdev.c
> > +++ b/drivers/vfio/device_cdev.c
> > @@ -3,6 +3,7 @@
> >   * Copyright (c) 2023 Intel Corporation.
> >   */
> >  #include <linux/vfio.h>
> > +#include <linux/iommufd.h>
> >
> >  #include "vfio.h"
> >
> > @@ -45,6 +46,142 @@ int vfio_device_fops_cdev_open(struct inode
> *inode, struct file *filep)
> >  	return ret;
> >  }
> >
> > +static void vfio_device_get_kvm_safe(struct vfio_device_file *df)
> > +{
> > +	spin_lock(&df->kvm_ref_lock);
> > +	if (!df->kvm)
> > +		goto unlock;
> > +
> > +	_vfio_device_get_kvm_safe(df->device, df->kvm);
> > +
> > +unlock:
> > +	spin_unlock(&df->kvm_ref_lock);
> > +}
> > +
> > +void vfio_device_cdev_close(struct vfio_device_file *df)
> > +{
> > +	struct vfio_device *device = df->device;
> > +
> > +	mutex_lock(&device->dev_set->lock);
> > +	if (!smp_load_acquire(&df->access_granted)) {
> > +		mutex_unlock(&device->dev_set->lock);
> > +		return;
> > +	}
> > +	vfio_device_close(df);
> > +	vfio_device_put_kvm(device);
> > +	if (df->iommufd)
> > +		iommufd_ctx_put(df->iommufd);
> > +	mutex_unlock(&device->dev_set->lock);
> > +	vfio_device_unblock_group(device);
> > +}
> > +
> > +long vfio_device_ioctl_bind_iommufd(struct vfio_device_file *df,
> > +				    unsigned long arg)
> > +{
> > +	struct vfio_device *device = df->device;
> > +	struct vfio_device_bind_iommufd bind;
> > +	struct iommufd_ctx *iommufd = NULL;
> > +	struct fd f;
> > +	unsigned long minsz;
> > +	int ret;
> > +
> > +	minsz = offsetofend(struct vfio_device_bind_iommufd, out_devid);
> > +
> > +	if (copy_from_user(&bind, (void __user *)arg, minsz))
> > +		return -EFAULT;
> > +
> > +	if (bind.argsz < minsz || bind.flags)
> > +		return -EINVAL;
> > +
> > +	if (!device->ops->bind_iommufd)
> > +		return -ENODEV;
> > +
> > +	ret = vfio_device_block_group(device);
> > +	if (ret)
> > +		return ret;
> > +
> > +	mutex_lock(&device->dev_set->lock);
> > +	/*
> > +	 * If already been bound to an iommufd, or already set noiommu
> > +	 * then fail it.
> > +	 */
> > +	if (df->iommufd || df->noiommu) {
> > +		ret = -EINVAL;
> > +		goto out_unlock;
> > +	}
> > +
> > +	/* iommufd < 0 means noiommu mode */
> > +	if (bind.iommufd < 0) {
> > +		if (!capable(CAP_SYS_RAWIO)) {
> > +			ret = -EPERM;
> > +			goto out_unlock;
> > +		}
> > +		df->noiommu = true;
> > +	} else {
> > +		f = fdget(bind.iommufd);
> > +		if (!f.file) {
> > +			ret = -EBADF;
> > +			goto out_unlock;
> > +		}
> > +		iommufd = iommufd_ctx_from_file(f.file);
> > +		if (IS_ERR(iommufd)) {
> > +			ret = PTR_ERR(iommufd);
> > +			goto out_put_file;
> > +		}
> > +	}
> > +
> > +	/*
> > +	 * Before the device open, get the KVM pointer currently
> > +	 * associated with the device file (if there is) and obtain a
> > +	 * reference. This reference is held until device closed. Save
> > +	 * the pointer in the device for use by drivers.
> > +	 */
> > +	vfio_device_get_kvm_safe(df);
> > +
> > +	df->iommufd = iommufd;
> > +	ret = vfio_device_open(df, &bind.out_devid, NULL);
> > +	if (ret)
> > +		goto out_put_kvm;
> > +
> > +	ret = copy_to_user((void __user *)arg +
> > +			   offsetofend(struct vfio_device_bind_iommufd,
> iommufd),
> > +			   &bind.out_devid,
> > +			   sizeof(bind.out_devid)) ? -EFAULT : 0;
> > +	if (ret)
> > +		goto out_close_device;
> > +
> > +	if (iommufd)
> > +		fdput(f);

Here check f.file

> > +	else if (df->noiommu)
> > +		dev_warn(device->dev, "vfio-noiommu device used by user
> "
> > +			 "(%s:%d)\n", current->comm,
> task_pid_nr(current));
> > +
> > +	/*
> > +	 * Paired with smp_load_acquire() in vfio_device_fops::ioctl/
> > +	 * read/write/mmap
> > +	 */
> > +	smp_store_release(&df->access_granted, true);
> > +	mutex_unlock(&device->dev_set->lock);
> > +
> > +	return 0;
> > +
> > +out_close_device:
> > +	vfio_device_close(df);
> > +out_put_kvm:
> > +	df->iommufd = NULL;
> > +	df->noiommu = false;
> > +	vfio_device_put_kvm(device);
> > +out_put_file:
> > +	if (iommufd) {
> > +		iommufd_ctx_put(iommufd);
> > +		fdput(f);
> even if iommufd is NULL, still need to fdput(f) if f.file is true, right?

Yes. it is. And also another line in above. 😊 may need to wrap the
fdget() and iommufd_ctx_from_file() into a helper. So fdput is called
in the helper.

> > +	}
> > +out_unlock:
> > +	mutex_unlock(&device->dev_set->lock);
> > +	vfio_device_unblock_group(device);
> > +	return ret;
> > +}
> > +

^ permalink raw reply	[flat|nested] 55+ messages in thread

* RE: [PATCH v4 07/19] vfio: Block device access via device fd until device is opened
  2023-02-22  7:55   ` Yan Zhao
@ 2023-02-22  8:29     ` Liu, Yi L
  0 siblings, 0 replies; 55+ messages in thread
From: Liu, Yi L @ 2023-02-22  8:29 UTC (permalink / raw)
  To: Zhao, Yan Y
  Cc: alex.williamson, jgg, Tian, Kevin, joro, cohuck, eric.auger,
	nicolinc, kvm, mjrosato, chao.p.peng, yi.y.sun, peterx, jasowang,
	shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	intel-gvt-dev, intel-gfx, linux-s390, Hao, Xudong, Xu, Terrence

> From: Zhao, Yan Y <yan.y.zhao@intel.com>
> Sent: Wednesday, February 22, 2023 3:56 PM
> 
> On Mon, Feb 20, 2023 at 07:48:00PM -0800, Yi Liu wrote:
> > Allow the vfio_device file to be in a state where the device FD is
> > opened but the device cannot be used by userspace (i.e.
> its .open_device()
> > hasn't been called). This inbetween state is not used when the device
> > FD is spawned from the group FD, however when we create the device FD
> > directly by opening a cdev it will be opened in the blocked state.
> >
> > The reason for the inbetween state is that userspace only gets a FD but
> > doesn't gain access permission until binding the FD to an iommufd. So in
> > the blocked state, only the bind operation is allowed. Completing bind
> > will allow user to further access the device.
> >
> > This is implemented by adding a flag in struct vfio_device_file to mark
> > the blocked state and using a simple smp_load_acquire() to obtain the
> > flag value and serialize all the device setup with the thread accessing
> > this device.
> >
> > Following this lockless scheme, it can safely handle the device FD
> > unbound->bound but it cannot handle bound->unbound. To allow this
> we'd
> > need to add a lock on all the vfio ioctls which seems costly. So once
> > device FD is bound, it remains bound until the FD is closed.
> >
> > Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
> > Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> > Reviewed-by: Kevin Tian <kevin.tian@intel.com>
> > ---
> >  drivers/vfio/group.c     |  6 ++++++
> >  drivers/vfio/vfio.h      |  1 +
> >  drivers/vfio/vfio_main.c | 16 ++++++++++++++++
> >  3 files changed, 23 insertions(+)
> >
> > diff --git a/drivers/vfio/group.c b/drivers/vfio/group.c
> > index 2abf55c69281..14e29525e354 100644
> > --- a/drivers/vfio/group.c
> > +++ b/drivers/vfio/group.c
> > @@ -197,6 +197,12 @@ static int vfio_device_group_open(struct
> vfio_device_file *df)
> >  	if (device->open_count == 0)
> >  		vfio_device_put_kvm(device);
> >
> > +	/*
> > +	 * Paired with smp_load_acquire() in vfio_device_fops::ioctl/
> > +	 * read/write/mmap
> > +	 */
> > +	smp_store_release(&df->access_granted, true);
> > +
> >  	mutex_unlock(&device->dev_set->lock);
> >
> >  out_unlock:
> > diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
> > index 11e56fe079a1..d56cdb114024 100644
> > --- a/drivers/vfio/vfio.h
> > +++ b/drivers/vfio/vfio.h
> > @@ -18,6 +18,7 @@ struct vfio_container;
> >
> >  struct vfio_device_file {
> >  	struct vfio_device *device;
> > +	bool access_granted;
> >  	spinlock_t kvm_ref_lock; /* protect kvm field */
> >  	struct kvm *kvm;
> >  	struct iommufd_ctx *iommufd; /* protected by struct
> vfio_device_set::lock */
> > diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
> > index ea507a61e3b7..91c8f25393db 100644
> > --- a/drivers/vfio/vfio_main.c
> > +++ b/drivers/vfio/vfio_main.c
> > @@ -1106,6 +1106,10 @@ static long vfio_device_fops_unl_ioctl(struct
> file *filep,
> >  	struct vfio_device *device = df->device;
> >  	int ret;
> >
> > +	/* Paired with smp_store_release() in vfio_device_open() */
> 
> Nit pick: not vfio_device_open() now :)

Yes. need to update it here and below :-

> 
> > +	if (!smp_load_acquire(&df->access_granted))
> > +		return -EINVAL;
> > +
> >  	ret = vfio_device_pm_runtime_get(device);
> >  	if (ret)
> >  		return ret;
> > @@ -1133,6 +1137,10 @@ static ssize_t vfio_device_fops_read(struct file
> *filep, char __user *buf,
> >  	struct vfio_device_file *df = filep->private_data;
> >  	struct vfio_device *device = df->device;
> >
> > +	/* Paired with smp_store_release() in vfio_device_open() */
> > +	if (!smp_load_acquire(&df->access_granted))
> > +		return -EINVAL;
> > +
> >  	if (unlikely(!device->ops->read))
> >  		return -EINVAL;
> >
> > @@ -1146,6 +1154,10 @@ static ssize_t vfio_device_fops_write(struct file
> *filep,
> >  	struct vfio_device_file *df = filep->private_data;
> >  	struct vfio_device *device = df->device;
> >
> > +	/* Paired with smp_store_release() in vfio_device_open() */
> > +	if (!smp_load_acquire(&df->access_granted))
> > +		return -EINVAL;
> > +
> >  	if (unlikely(!device->ops->write))
> >  		return -EINVAL;
> >
> > @@ -1157,6 +1169,10 @@ static int vfio_device_fops_mmap(struct file
> *filep, struct vm_area_struct *vma)
> >  	struct vfio_device_file *df = filep->private_data;
> >  	struct vfio_device *device = df->device;
> >
> > +	/* Paired with smp_store_release() in vfio_device_open() */
> > +	if (!smp_load_acquire(&df->access_granted))
> > +		return -EINVAL;
> > +
> >  	if (unlikely(!device->ops->mmap))
> >  		return -EINVAL;
> >
> > --
> > 2.34.1
> >

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v4 16/19] vfio: Add VFIO_DEVICE_BIND_IOMMUFD
  2023-02-22  7:44     ` Liu, Yi L
  2023-02-22  7:59       ` Tian, Kevin
@ 2023-02-22 12:59       ` Jason Gunthorpe
  2023-02-24  4:58         ` Yan Zhao
  1 sibling, 1 reply; 55+ messages in thread
From: Jason Gunthorpe @ 2023-02-22 12:59 UTC (permalink / raw)
  To: Liu, Yi L
  Cc: Tian, Kevin, alex.williamson, joro, cohuck, eric.auger, nicolinc,
	kvm, mjrosato, chao.p.peng, yi.y.sun, peterx, jasowang,
	shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	intel-gvt-dev, intel-gfx, linux-s390, Zhao, Yan Y, Hao, Xudong,
	Xu, Terrence

On Wed, Feb 22, 2023 at 07:44:12AM +0000, Liu, Yi L wrote:
> > From: Tian, Kevin <kevin.tian@intel.com>
> > Sent: Wednesday, February 22, 2023 3:40 PM
> > 
> > > From: Liu, Yi L <yi.l.liu@intel.com>
> > > Sent: Tuesday, February 21, 2023 11:48 AM
> > >
> > > +
> > > +void vfio_device_cdev_close(struct vfio_device_file *df)
> > > +{
> > > +	struct vfio_device *device = df->device;
> > > +
> > > +	mutex_lock(&device->dev_set->lock);
> > > +	if (!smp_load_acquire(&df->access_granted)) {
> > 
> > there is no contention with another one changing this flag at this
> > point so directly accessing it is fine.
> 
> make sense. 

Have to use READ_ONCE though

Jason

^ permalink raw reply	[flat|nested] 55+ messages in thread

* RE: [PATCH v4 09/19] vfio/pci: Accept device fd for hot reset
  2023-02-22  7:26   ` Tian, Kevin
@ 2023-02-22 13:35     ` Liu, Yi L
  2023-02-22 17:17       ` Jason Gunthorpe
  0 siblings, 1 reply; 55+ messages in thread
From: Liu, Yi L @ 2023-02-22 13:35 UTC (permalink / raw)
  To: Tian, Kevin, alex.williamson, jgg
  Cc: joro, cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, intel-gvt-dev, intel-gfx, linux-s390,
	Zhao, Yan Y, Hao, Xudong, Xu, Terrence

> From: Tian, Kevin <kevin.tian@intel.com>
> Sent: Wednesday, February 22, 2023 3:26 PM
> 
> > From: Liu, Yi L <yi.l.liu@intel.com>
> > Sent: Tuesday, February 21, 2023 11:48 AM
> >
> >  	/*
> >  	 * We can't let userspace give us an arbitrarily large buffer to copy,
> > -	 * so verify how many we think there could be.  Note groups can
> have
> > -	 * multiple devices so one group per device is the max.
> > +	 * so verify how many we think there could be.  Note user may
> > provide
> > +	 * a set of groups, group can have multiple devices so one group per
> > +	 * device is the max.
> 
> well this change doesn't include cdev

For cdev, it should be the number of devices. 😊

> 
> > @@ -1320,7 +1321,7 @@ static int vfio_pci_ioctl_pci_hot_reset(struct
> > vfio_pci_core_device *vdev,
> >  		}
> >
> >  		/* Ensure the FD is a vfio FD.*/
> > -		if (!vfio_file_is_valid(file)) {
> > +		if (!vfio_file_is_device_opened(file)) {
> >  			fput(file);
> >  			ret = -EINVAL;
> >  			break;
> 
> that function is not just for checking device.
> 
> Probably rename it to vfio_file_is_reset_valid().

How about vfio_file_is_resettable()?

> btw this patch is insufficient to handle device fd. The current logic
> requires every device in the dev_set covered by provided fd's:
> 
> static bool vfio_dev_in_groups(struct vfio_pci_core_device *vdev,
>                                struct vfio_pci_group_info *groups)
> {
>  	unsigned int i;
> 
> 	for (i = 0; i < groups->count; i++)
> 		if (vfio_file_has_dev(groups->files[i], &vdev->vdev))
> 			return true;
> 	return false;
> }
> 
> Presumably when cdev fd is provided above should compare iommu
> group of the fd and that of the vdev. Otherwise it expects the user
> to have full access to every device in the set which is impractical.

Yes. This can be done by checking the device->vfio_group->iommu_group.
But group code may be compiled out eventually. If CONFIG_VFIO_GROUP==n.
needs to use iommu_group_get() to get iommu_group and compare.

btw. I have a doubt about if it is possible that the iommu_group
can disappear during the reset. If yes, maybe better store iommu_group
in vfio_device in the vfio_register_group_dev() path if CONFIG_VFIO_GROUP
is not enabled.

Regards,
Yi Liu 

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v4 09/19] vfio/pci: Accept device fd for hot reset
  2023-02-22 13:35     ` Liu, Yi L
@ 2023-02-22 17:17       ` Jason Gunthorpe
  2023-02-23  7:55         ` Tian, Kevin
  0 siblings, 1 reply; 55+ messages in thread
From: Jason Gunthorpe @ 2023-02-22 17:17 UTC (permalink / raw)
  To: Liu, Yi L
  Cc: Tian, Kevin, alex.williamson, joro, cohuck, eric.auger, nicolinc,
	kvm, mjrosato, chao.p.peng, yi.y.sun, peterx, jasowang,
	shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	intel-gvt-dev, intel-gfx, linux-s390, Zhao, Yan Y, Hao, Xudong,
	Xu, Terrence

On Wed, Feb 22, 2023 at 01:35:06PM +0000, Liu, Yi L wrote:

> > btw this patch is insufficient to handle device fd. The current logic
> > requires every device in the dev_set covered by provided fd's:

Yes, which is what it should be

> > static bool vfio_dev_in_groups(struct vfio_pci_core_device *vdev,
> >                                struct vfio_pci_group_info *groups)
> > {
> >  	unsigned int i;
> > 
> > 	for (i = 0; i < groups->count; i++)
> > 		if (vfio_file_has_dev(groups->files[i], &vdev->vdev))
> > 			return true;
> > 	return false;
> > }
> > 
> > Presumably when cdev fd is provided above should compare iommu
> > group of the fd and that of the vdev. Otherwise it expects the user
> > to have full access to every device in the set which is impractical.

No, it should check the dev's directly, userspace has to provide every
dev in the dev set to do a reset. We should not allow userspace to
take a shortcut based on hidden group stuff.

The dev set is already unrelated to the groups, and userspace cannot
discover the devset, so nothing has changed.

This is looking worse to me. I think we should not require userspace
to pass in lists of devices here. The simpler solution is to just take
in a single iommufd and use that as the ownership proof. Something
like the below.

Jason

diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c
index d81f93a321afcb..a5833bfdd7307e 100644
--- a/drivers/iommu/iommufd/device.c
+++ b/drivers/iommu/iommufd/device.c
@@ -114,6 +114,34 @@ struct iommufd_device *iommufd_device_bind(struct iommufd_ctx *ictx,
 }
 EXPORT_SYMBOL_NS_GPL(iommufd_device_bind, IOMMUFD);
 
+/**
+ * iommufd_ctx_has_device - True if the struct device is bound to this ictx
+ * @ictx: iommufd file descriptor
+ * @dev: Pointer to a physical device struct
+ *
+ * True if a iommufd_device_bind() is present for dev.
+ */
+bool iommufd_ctx_has_device(struct iommufd_ctx *ictx, struct device *dev)
+{
+	unsigned long index;
+	struct iommufd_object *obj;
+
+	if (!ictx)
+		return false;
+
+	xa_lock(&ictx->objects);
+	xa_for_each(&ictx->objects, index, obj) {
+		if (obj->type == IOMMUFD_OBJ_DEVICE &&
+		    container_of(obj, struct iommufd_device, obj)->dev == dev) {
+			xa_unlock(&ictx->objects);
+			return true;
+		}
+	}
+	xa_unlock(&ictx->objects);
+	return false;
+}
+EXPORT_SYMBOL_NS_GPL(iommufd_ctx_has_device, IOMMUFD);
+
 /**
  * iommufd_device_unbind - Undo iommufd_device_bind()
  * @idev: Device returned by iommufd_device_bind()
diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
index 26a541cc64d114..28f6db1b81c1af 100644
--- a/drivers/vfio/pci/vfio_pci_core.c
+++ b/drivers/vfio/pci/vfio_pci_core.c
@@ -27,6 +27,7 @@
 #include <linux/vgaarb.h>
 #include <linux/nospec.h>
 #include <linux/sched/mm.h>
+#include <linux/iommufd.h>
 #if IS_ENABLED(CONFIG_EEH)
 #include <asm/eeh.h>
 #endif
@@ -179,7 +180,8 @@ static void vfio_pci_probe_mmaps(struct vfio_pci_core_device *vdev)
 struct vfio_pci_group_info;
 static void vfio_pci_dev_set_try_reset(struct vfio_device_set *dev_set);
 static int vfio_pci_dev_set_hot_reset(struct vfio_device_set *dev_set,
-				      struct vfio_pci_group_info *groups);
+				      struct vfio_pci_group_info *groups,
+				      struct iommufd_ctx *iommufd_ctx);
 
 /*
  * INTx masking requires the ability to disable INTx signaling via PCI_COMMAND
@@ -1254,29 +1256,17 @@ static int vfio_pci_ioctl_get_pci_hot_reset_info(
 	return ret;
 }
 
-static int vfio_pci_ioctl_pci_hot_reset(struct vfio_pci_core_device *vdev,
-					struct vfio_pci_hot_reset __user *arg)
+static int
+vfio_pci_ioctl_pci_hot_reset_groups(struct vfio_pci_core_device *vdev,
+				    struct vfio_pci_hot_reset *hdr,
+				    struct vfio_pci_hot_reset __user *arg)
 {
-	unsigned long minsz = offsetofend(struct vfio_pci_hot_reset, count);
-	struct vfio_pci_hot_reset hdr;
 	int32_t *group_fds;
 	struct file **files;
 	struct vfio_pci_group_info info;
 	bool slot = false;
 	int file_idx, count = 0, ret = 0;
 
-	if (copy_from_user(&hdr, arg, minsz))
-		return -EFAULT;
-
-	if (hdr.argsz < minsz || hdr.flags)
-		return -EINVAL;
-
-	/* Can we do a slot or bus reset or neither? */
-	if (!pci_probe_reset_slot(vdev->pdev->slot))
-		slot = true;
-	else if (pci_probe_reset_bus(vdev->pdev->bus))
-		return -ENODEV;
-
 	/*
 	 * We can't let userspace give us an arbitrarily large buffer to copy,
 	 * so verify how many we think there could be.  Note groups can have
@@ -1288,11 +1278,11 @@ static int vfio_pci_ioctl_pci_hot_reset(struct vfio_pci_core_device *vdev,
 		return ret;
 
 	/* Somewhere between 1 and count is OK */
-	if (!hdr.count || hdr.count > count)
+	if (!hdr->count || hdr->count > count)
 		return -EINVAL;
 
-	group_fds = kcalloc(hdr.count, sizeof(*group_fds), GFP_KERNEL);
-	files = kcalloc(hdr.count, sizeof(*files), GFP_KERNEL);
+	group_fds = kcalloc(hdr->count, sizeof(*group_fds), GFP_KERNEL);
+	files = kcalloc(hdr->count, sizeof(*files), GFP_KERNEL);
 	if (!group_fds || !files) {
 		kfree(group_fds);
 		kfree(files);
@@ -1300,7 +1290,7 @@ static int vfio_pci_ioctl_pci_hot_reset(struct vfio_pci_core_device *vdev,
 	}
 
 	if (copy_from_user(group_fds, arg->group_fds,
-			   hdr.count * sizeof(*group_fds))) {
+			   hdr->count * sizeof(*group_fds))) {
 		kfree(group_fds);
 		kfree(files);
 		return -EFAULT;
@@ -1311,7 +1301,7 @@ static int vfio_pci_ioctl_pci_hot_reset(struct vfio_pci_core_device *vdev,
 	 * interface and store the group and iommu ID.  This ensures the group
 	 * is held across the reset.
 	 */
-	for (file_idx = 0; file_idx < hdr.count; file_idx++) {
+	for (file_idx = 0; file_idx < hdr->count; file_idx++) {
 		struct file *file = fget(group_fds[file_idx]);
 
 		if (!file) {
@@ -1335,10 +1325,10 @@ static int vfio_pci_ioctl_pci_hot_reset(struct vfio_pci_core_device *vdev,
 	if (ret)
 		goto hot_reset_release;
 
-	info.count = hdr.count;
+	info.count = hdr->count;
 	info.files = files;
 
-	ret = vfio_pci_dev_set_hot_reset(vdev->vdev.dev_set, &info);
+	ret = vfio_pci_dev_set_hot_reset(vdev->vdev.dev_set, &info, NULL);
 
 hot_reset_release:
 	for (file_idx--; file_idx >= 0; file_idx--)
@@ -1348,6 +1338,50 @@ static int vfio_pci_ioctl_pci_hot_reset(struct vfio_pci_core_device *vdev,
 	return ret;
 }
 
+static int vfio_pci_ioctl_pci_hot_reset(struct vfio_pci_core_device *vdev,
+					struct vfio_pci_hot_reset __user *arg)
+{
+	unsigned long minsz = offsetofend(struct vfio_pci_hot_reset, count);
+	struct vfio_pci_hot_reset hdr;
+	struct iommufd_ctx *iommufd;
+	bool slot = false;
+	struct fd f;
+	int32_t fd;
+	int ret;
+
+	if (copy_from_user(&hdr, arg, minsz))
+		return -EFAULT;
+
+	if (hdr.argsz < minsz || hdr.flags)
+		return -EINVAL;
+
+	/* Can we do a slot or bus reset or neither? */
+	if (!pci_probe_reset_slot(vdev->pdev->slot))
+		slot = true;
+	else if (pci_probe_reset_bus(vdev->pdev->bus))
+		return -ENODEV;
+
+	if (hdr.count != 1)
+		return vfio_pci_ioctl_pci_hot_reset_groups(vdev, &hdr, arg);
+
+	if (copy_from_user(&fd, arg->group_fds, sizeof(fd)))
+		return -EFAULT;
+
+	f = fdget(fd);
+	if (!f.file)
+		return -EBADF;
+	iommufd = iommufd_ctx_from_file(f.file);
+	if (IS_ERR(iommufd)) {
+		fdput(f);
+		return vfio_pci_ioctl_pci_hot_reset_groups(vdev, &hdr, arg);
+	}
+	fdput(f);
+
+	ret = vfio_pci_dev_set_hot_reset(vdev->vdev.dev_set, NULL, iommufd);
+	iommufd_ctx_put(iommufd);
+	return ret;
+}
+
 static int vfio_pci_ioctl_ioeventfd(struct vfio_pci_core_device *vdev,
 				    struct vfio_device_ioeventfd __user *arg)
 {
@@ -2317,6 +2351,9 @@ static bool vfio_dev_in_groups(struct vfio_pci_core_device *vdev,
 {
 	unsigned int i;
 
+	if (!groups)
+		return false;
+
 	for (i = 0; i < groups->count; i++)
 		if (vfio_file_has_dev(groups->files[i], &vdev->vdev))
 			return true;
@@ -2398,7 +2435,8 @@ static int vfio_pci_dev_set_pm_runtime_get(struct vfio_device_set *dev_set)
  * get each memory_lock.
  */
 static int vfio_pci_dev_set_hot_reset(struct vfio_device_set *dev_set,
-				      struct vfio_pci_group_info *groups)
+				      struct vfio_pci_group_info *groups,
+				      struct iommufd_ctx *iommufd_ctx)
 {
 	struct vfio_pci_core_device *cur_mem;
 	struct vfio_pci_core_device *cur_vma;
@@ -2432,7 +2470,8 @@ static int vfio_pci_dev_set_hot_reset(struct vfio_device_set *dev_set,
 		 * Test whether all the affected devices are contained by the
 		 * set of groups provided by the user.
 		 */
-		if (!vfio_dev_in_groups(cur_vma, groups)) {
+		if (!vfio_dev_in_groups(cur_vma, groups) &&
+		    !iommufd_ctx_has_device(iommufd_ctx, &cur_vma->pdev->dev)) {
 			ret = -EINVAL;
 			goto err_undo;
 		}
diff --git a/include/linux/iommufd.h b/include/linux/iommufd.h
index 650d45629647a7..1f58673701cb1e 100644
--- a/include/linux/iommufd.h
+++ b/include/linux/iommufd.h
@@ -58,6 +58,7 @@ void iommufd_access_unpin_pages(struct iommufd_access *access,
 int iommufd_access_rw(struct iommufd_access *access, unsigned long iova,
 		      void *data, size_t len, unsigned int flags);
 int iommufd_vfio_compat_ioas_id(struct iommufd_ctx *ictx, u32 *out_ioas_id);
+bool iommufd_ctx_has_device(struct iommufd_ctx *ictx, struct device *dev);
 #else /* !CONFIG_IOMMUFD */
 static inline struct iommufd_ctx *iommufd_ctx_from_file(struct file *file)
 {
@@ -94,5 +95,12 @@ static inline int iommufd_vfio_compat_ioas_id(struct iommufd_ctx *ictx,
 {
 	return -EOPNOTSUPP;
 }
+
+static inline bool iommufd_ctx_has_device(struct iommufd_ctx *ictx,
+					  struct device *dev)
+{
+	return false;
+}
+
 #endif /* CONFIG_IOMMUFD */
 #endif


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* RE: [PATCH v4 09/19] vfio/pci: Accept device fd for hot reset
  2023-02-22 17:17       ` Jason Gunthorpe
@ 2023-02-23  7:55         ` Tian, Kevin
  2023-02-23 13:21           ` Jason Gunthorpe
  0 siblings, 1 reply; 55+ messages in thread
From: Tian, Kevin @ 2023-02-23  7:55 UTC (permalink / raw)
  To: Jason Gunthorpe, Liu, Yi L
  Cc: linux-s390, yi.y.sun, mjrosato, kvm, joro, cohuck, Hao, Xudong,
	peterx, Zhao, Yan Y, eric.auger, alex.williamson, Xu, Terrence,
	nicolinc, shameerali.kolothum.thodi, suravee.suthikulpanit,
	intel-gfx, chao.p.peng, lulu, intel-gvt-dev, jasowang

> From: Jason Gunthorpe
> Sent: Thursday, February 23, 2023 1:18 AM
> 
> > > static bool vfio_dev_in_groups(struct vfio_pci_core_device *vdev,
> > >                                struct vfio_pci_group_info *groups)
> > > {
> > >  	unsigned int i;
> > >
> > > 	for (i = 0; i < groups->count; i++)
> > > 		if (vfio_file_has_dev(groups->files[i], &vdev->vdev))
> > > 			return true;
> > > 	return false;
> > > }
> > >
> > > Presumably when cdev fd is provided above should compare iommu
> > > group of the fd and that of the vdev. Otherwise it expects the user
> > > to have full access to every device in the set which is impractical.
> 
> No, it should check the dev's directly, userspace has to provide every
> dev in the dev set to do a reset. We should not allow userspace to
> take a shortcut based on hidden group stuff.
> 
> The dev set is already unrelated to the groups, and userspace cannot
> discover the devset, so nothing has changed.

Agree. But I envision there might be a user-visible impact.

Say a scenario where group happens to overlap with devset. Let's say
two devices in the group/devset.

An existing deployment assigns only dev1 to Qemu. In this case dev1
is resettable via group fd given dev2 cannot be opened by another
user.

Now the admin upgrades Qemu to a newer version incorporating
cdev and your change. Then immediately dev1 cannot be reset
since dev2 is not opened by this Qemu.

Do we consider it as a regression? Or is the answer to ask the user
to upgrade the mgmt stack?

> 
> This is looking worse to me. I think we should not require userspace
> to pass in lists of devices here. The simpler solution is to just take
> in a single iommufd and use that as the ownership proof. Something
> like the below.
> 

As you said the dev set info is not exposed to the admin today. It's
only available via VFIO_DEVICE_GET_PCI_HOT_RESET_INFO after a
device is opened.

My question is more on whether in real deployments the mgmt
stack always tries to identify the reset dependency indirectly (is there
a reliable way?) and assign all relevant devices to one VM. If it's not
the case, then this change (requiring user to open all devices in the
dev set) can certainly cause regression in those deployments because
old group-level check covers more devices hence has higher possibility
of being resettable than what your change implies.

Alex probably has more insight into this usage open.

Thanks
Kevin

^ permalink raw reply	[flat|nested] 55+ messages in thread

* RE: [PATCH v4 05/19] kvm/vfio: Accept vfio device file from userspace
  2023-02-22  7:17   ` Tian, Kevin
@ 2023-02-23 10:33     ` Liu, Yi L
  0 siblings, 0 replies; 55+ messages in thread
From: Liu, Yi L @ 2023-02-23 10:33 UTC (permalink / raw)
  To: Tian, Kevin, alex.williamson, jgg
  Cc: joro, cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, intel-gvt-dev, intel-gfx, linux-s390,
	Zhao, Yan Y, Hao, Xudong, Xu, Terrence

> From: Tian, Kevin <kevin.tian@intel.com>
> Sent: Wednesday, February 22, 2023 3:18 PM
> 
> > From: Liu, Yi L <yi.l.liu@intel.com>
> > Sent: Tuesday, February 21, 2023 11:48 AM
> >
> > +
> > +  KVM_DEV_VFIO_FILE_SET_SPAPR_TCE: attaches a guest visible TCE
> table
> >  	allocated by sPAPR KVM.
> > +
> > +	alias: KVM_DEV_VFIO_GROUP_SET_SPAPR_TCE
> > +
> >  	kvm_device_attr.addr points to a struct::
> >
> 
> I'm a bit lost from previous discussion in v3. If all agree that SPAPR
> only works with group then is it necessary to add a FILE alias here?

@Alex, how about your opinion?

Regards,
Yi Liu

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v4 09/19] vfio/pci: Accept device fd for hot reset
  2023-02-23  7:55         ` Tian, Kevin
@ 2023-02-23 13:21           ` Jason Gunthorpe
  2023-02-24  2:21             ` Tian, Kevin
  0 siblings, 1 reply; 55+ messages in thread
From: Jason Gunthorpe @ 2023-02-23 13:21 UTC (permalink / raw)
  To: Tian, Kevin
  Cc: Liu, Yi L, linux-s390, yi.y.sun, mjrosato, kvm, joro, cohuck,
	Hao, Xudong, peterx, Zhao, Yan Y, eric.auger, alex.williamson,
	Xu, Terrence, nicolinc, shameerali.kolothum.thodi,
	suravee.suthikulpanit, intel-gfx, chao.p.peng, lulu,
	intel-gvt-dev, jasowang

On Thu, Feb 23, 2023 at 07:55:21AM +0000, Tian, Kevin wrote:
> > From: Jason Gunthorpe
> > Sent: Thursday, February 23, 2023 1:18 AM
> > 
> > > > static bool vfio_dev_in_groups(struct vfio_pci_core_device *vdev,
> > > >                                struct vfio_pci_group_info *groups)
> > > > {
> > > >  	unsigned int i;
> > > >
> > > > 	for (i = 0; i < groups->count; i++)
> > > > 		if (vfio_file_has_dev(groups->files[i], &vdev->vdev))
> > > > 			return true;
> > > > 	return false;
> > > > }
> > > >
> > > > Presumably when cdev fd is provided above should compare iommu
> > > > group of the fd and that of the vdev. Otherwise it expects the user
> > > > to have full access to every device in the set which is impractical.
> > 
> > No, it should check the dev's directly, userspace has to provide every
> > dev in the dev set to do a reset. We should not allow userspace to
> > take a shortcut based on hidden group stuff.
> > 
> > The dev set is already unrelated to the groups, and userspace cannot
> > discover the devset, so nothing has changed.
> 
> Agree. But I envision there might be a user-visible impact.
> 
> Say a scenario where group happens to overlap with devset. Let's say
> two devices in the group/devset.
> 
> An existing deployment assigns only dev1 to Qemu. In this case dev1
> is resettable via group fd given dev2 cannot be opened by another
> user.

Oh, that is just because we took a shortcut in this logic and assumed
that if the group is open then all the devices are opened by the same
security domain.

But we can also more clearly state that any closed device is
acceptable for reset and doesn't need to be presented.

So, like this:

		if (cur_vma->vdev.open_count &&
		    !vfio_dev_in_groups(cur_vma, groups) &&
		    !iommufd_ctx_has_device(iommufd_ctx, &cur_vma->pdev->dev)) {
			ret = -EINVAL;
			goto err_undo;
		}

Jason

^ permalink raw reply	[flat|nested] 55+ messages in thread

* RE: [PATCH v4 09/19] vfio/pci: Accept device fd for hot reset
  2023-02-23 13:21           ` Jason Gunthorpe
@ 2023-02-24  2:21             ` Tian, Kevin
  2023-02-24  2:36               ` Jason Gunthorpe
  0 siblings, 1 reply; 55+ messages in thread
From: Tian, Kevin @ 2023-02-24  2:21 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Liu, Yi L, linux-s390, yi.y.sun, mjrosato, kvm, joro, cohuck,
	Hao, Xudong, peterx, Zhao, Yan Y, eric.auger, alex.williamson,
	Xu, Terrence, nicolinc, shameerali.kolothum.thodi,
	suravee.suthikulpanit, intel-gfx, chao.p.peng, lulu,
	intel-gvt-dev, jasowang

> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Thursday, February 23, 2023 9:22 PM
> 
> On Thu, Feb 23, 2023 at 07:55:21AM +0000, Tian, Kevin wrote:
> > > From: Jason Gunthorpe
> > > Sent: Thursday, February 23, 2023 1:18 AM
> > >
> > > > > static bool vfio_dev_in_groups(struct vfio_pci_core_device *vdev,
> > > > >                                struct vfio_pci_group_info *groups)
> > > > > {
> > > > >  	unsigned int i;
> > > > >
> > > > > 	for (i = 0; i < groups->count; i++)
> > > > > 		if (vfio_file_has_dev(groups->files[i], &vdev->vdev))
> > > > > 			return true;
> > > > > 	return false;
> > > > > }
> > > > >
> > > > > Presumably when cdev fd is provided above should compare iommu
> > > > > group of the fd and that of the vdev. Otherwise it expects the user
> > > > > to have full access to every device in the set which is impractical.
> > >
> > > No, it should check the dev's directly, userspace has to provide every
> > > dev in the dev set to do a reset. We should not allow userspace to
> > > take a shortcut based on hidden group stuff.
> > >
> > > The dev set is already unrelated to the groups, and userspace cannot
> > > discover the devset, so nothing has changed.
> >
> > Agree. But I envision there might be a user-visible impact.
> >
> > Say a scenario where group happens to overlap with devset. Let's say
> > two devices in the group/devset.
> >
> > An existing deployment assigns only dev1 to Qemu. In this case dev1
> > is resettable via group fd given dev2 cannot be opened by another
> > user.
> 
> Oh, that is just because we took a shortcut in this logic and assumed
> that if the group is open then all the devices are opened by the same
> security domain.
> 
> But we can also more clearly state that any closed device is
> acceptable for reset and doesn't need to be presented.
> 
> So, like this:
> 
> 		if (cur_vma->vdev.open_count &&
> 		    !vfio_dev_in_groups(cur_vma, groups) &&
> 		    !iommufd_ctx_has_device(iommufd_ctx, &cur_vma-
> >pdev->dev)) {
> 			ret = -EINVAL;
> 			goto err_undo;
> 		}
> 

Yes, this makes sense.

Yi, while you are incorporating this change please also update the
uapi header. Rename 'group_fds[]' to 'fds[]' and add comment to
explain that it could be an array of group fds or a single iommufd.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v4 09/19] vfio/pci: Accept device fd for hot reset
  2023-02-24  2:21             ` Tian, Kevin
@ 2023-02-24  2:36               ` Jason Gunthorpe
  2023-02-24  2:48                 ` Tian, Kevin
  2023-02-26  8:59                 ` Liu, Yi L
  0 siblings, 2 replies; 55+ messages in thread
From: Jason Gunthorpe @ 2023-02-24  2:36 UTC (permalink / raw)
  To: Tian, Kevin
  Cc: Liu, Yi L, linux-s390, yi.y.sun, mjrosato, kvm, joro, cohuck,
	Hao, Xudong, peterx, Zhao, Yan Y, eric.auger, alex.williamson,
	Xu, Terrence, nicolinc, shameerali.kolothum.thodi,
	suravee.suthikulpanit, intel-gfx, chao.p.peng, lulu,
	intel-gvt-dev, jasowang

On Fri, Feb 24, 2023 at 02:21:33AM +0000, Tian, Kevin wrote:

> Yi, while you are incorporating this change please also update the
> uapi header. Rename 'group_fds[]' to 'fds[]' and add comment to
> explain that it could be an array of group fds or a single iommufd.

Upon reflection we can probably make it even simpler and just have a 0
length fd array mean to use the iommufd the vfio_device is already
associated with

And the check for correctness can be simplified to simply see if each
vfio_device in the dev_set is attached to the same iommufd ctx
pointer instead of searching the xarray.

Would need to double check that the locking is OK but seems doable

Jason

^ permalink raw reply	[flat|nested] 55+ messages in thread

* RE: [PATCH v4 09/19] vfio/pci: Accept device fd for hot reset
  2023-02-24  2:36               ` Jason Gunthorpe
@ 2023-02-24  2:48                 ` Tian, Kevin
  2023-02-24  3:43                   ` Liu, Yi L
  2023-02-26  8:59                 ` Liu, Yi L
  1 sibling, 1 reply; 55+ messages in thread
From: Tian, Kevin @ 2023-02-24  2:48 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: linux-s390, Liu, Yi L, yi.y.sun, mjrosato, kvm, jasowang, joro,
	cohuck, Hao, Xudong, peterx, eric.auger, alex.williamson, Xu,
	Terrence, nicolinc, shameerali.kolothum.thodi,
	suravee.suthikulpanit, chao.p.peng, Zhao, Yan Y, lulu,
	intel-gvt-dev, intel-gfx

> From: Jason Gunthorpe
> Sent: Friday, February 24, 2023 10:36 AM
> 
> On Fri, Feb 24, 2023 at 02:21:33AM +0000, Tian, Kevin wrote:
> 
> > Yi, while you are incorporating this change please also update the
> > uapi header. Rename 'group_fds[]' to 'fds[]' and add comment to
> > explain that it could be an array of group fds or a single iommufd.
> 
> Upon reflection we can probably make it even simpler and just have a 0
> length fd array mean to use the iommufd the vfio_device is already
> associated with
> 
> And the check for correctness can be simplified to simply see if each
> vfio_device in the dev_set is attached to the same iommufd ctx
> pointer instead of searching the xarray.

yes, this is simpler

> 
> Would need to double check that the locking is OK but seems doable
> 

Locking is fine since dev_set->lock is already held in the reset path.



^ permalink raw reply	[flat|nested] 55+ messages in thread

* RE: [PATCH v4 09/19] vfio/pci: Accept device fd for hot reset
  2023-02-24  2:48                 ` Tian, Kevin
@ 2023-02-24  3:43                   ` Liu, Yi L
  2023-02-24  3:56                     ` Tian, Kevin
  2023-02-24 14:30                     ` Jason Gunthorpe
  0 siblings, 2 replies; 55+ messages in thread
From: Liu, Yi L @ 2023-02-24  3:43 UTC (permalink / raw)
  To: Tian, Kevin, Jason Gunthorpe
  Cc: linux-s390, yi.y.sun, mjrosato, kvm, jasowang, joro, cohuck, Hao,
	Xudong, peterx, eric.auger, alex.williamson, Xu, Terrence,
	nicolinc, shameerali.kolothum.thodi, suravee.suthikulpanit,
	chao.p.peng, Zhao, Yan Y, lulu, intel-gvt-dev, intel-gfx

> From: Tian, Kevin <kevin.tian@intel.com>
> Sent: Friday, February 24, 2023 10:48 AM
> 
> > From: Jason Gunthorpe
> > Sent: Friday, February 24, 2023 10:36 AM
> >
> > On Fri, Feb 24, 2023 at 02:21:33AM +0000, Tian, Kevin wrote:
> >
> > > Yi, while you are incorporating this change please also update the
> > > uapi header. Rename 'group_fds[]' to 'fds[]' and add comment to
> > > explain that it could be an array of group fds or a single iommufd.
> >
> > Upon reflection we can probably make it even simpler and just have a 0
> > length fd array mean to use the iommufd the vfio_device is already
> > associated with
> >
> > And the check for correctness can be simplified to simply see if each
> > vfio_device in the dev_set is attached to the same iommufd ctx
> > pointer instead of searching the xarray.

How about the hot reset info path? We can still keep reporting the
current information to userspace. Isn't it?

another tricky question. If user passess iommufd down for reset
in the vfio iommufd compatible mode, should we support it as
well?

> yes, this is simpler
> 
> >
> > Would need to double check that the locking is OK but seems doable
> >
> 
> Locking is fine since dev_set->lock is already held in the reset path.

dev_set->lock is held prior to call bind_iommufd, so I agree locking
is ok.

Regards,
Yi Liu

^ permalink raw reply	[flat|nested] 55+ messages in thread

* RE: [PATCH v4 09/19] vfio/pci: Accept device fd for hot reset
  2023-02-24  3:43                   ` Liu, Yi L
@ 2023-02-24  3:56                     ` Tian, Kevin
  2023-02-24  5:09                       ` Liu, Yi L
  2023-02-24 14:30                     ` Jason Gunthorpe
  1 sibling, 1 reply; 55+ messages in thread
From: Tian, Kevin @ 2023-02-24  3:56 UTC (permalink / raw)
  To: Liu, Yi L, Jason Gunthorpe
  Cc: linux-s390, yi.y.sun, mjrosato, kvm, jasowang, joro, cohuck, Hao,
	Xudong, peterx, eric.auger, alex.williamson, Xu, Terrence,
	nicolinc, shameerali.kolothum.thodi, suravee.suthikulpanit,
	chao.p.peng, Zhao, Yan Y, lulu, intel-gvt-dev, intel-gfx

> From: Liu, Yi L <yi.l.liu@intel.com>
> Sent: Friday, February 24, 2023 11:44 AM
> > > Upon reflection we can probably make it even simpler and just have a 0
> > > length fd array mean to use the iommufd the vfio_device is already
> > > associated with
> > >
> > > And the check for correctness can be simplified to simply see if each
> > > vfio_device in the dev_set is attached to the same iommufd ctx
> > > pointer instead of searching the xarray.
> 
> How about the hot reset info path? We can still keep reporting the
> current information to userspace. Isn't it?

No need to change that. It's already reported per device.

> 
> another tricky question. If user passess iommufd down for reset
> in the vfio iommufd compatible mode, should we support it as
> well?
> 

I don't see why we want to ban it. It does change the result from
error (vfio container) to success (iommufd vfio-compat) when using
the container fd/iommufd. But do we actually have a use case
relying on such error pattern?

On the other hand an user who knows the presence of vfio-compat
should be allowed to pass iommufd to reset even when it still uses
the legacy group/container interfaces.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v4 16/19] vfio: Add VFIO_DEVICE_BIND_IOMMUFD
  2023-02-22 12:59       ` Jason Gunthorpe
@ 2023-02-24  4:58         ` Yan Zhao
  2023-02-24 14:31           ` Jason Gunthorpe
  0 siblings, 1 reply; 55+ messages in thread
From: Yan Zhao @ 2023-02-24  4:58 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Liu, Yi L, Tian, Kevin, alex.williamson, joro, cohuck,
	eric.auger, nicolinc, kvm, mjrosato, chao.p.peng, yi.y.sun,
	peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, intel-gvt-dev, intel-gfx, linux-s390, Hao,
	Xudong, Xu, Terrence

On Wed, Feb 22, 2023 at 08:59:51AM -0400, Jason Gunthorpe wrote:
> On Wed, Feb 22, 2023 at 07:44:12AM +0000, Liu, Yi L wrote:
> > > From: Tian, Kevin <kevin.tian@intel.com>
> > > Sent: Wednesday, February 22, 2023 3:40 PM
> > > 
> > > > From: Liu, Yi L <yi.l.liu@intel.com>
> > > > Sent: Tuesday, February 21, 2023 11:48 AM
> > > >
> > > > +
> > > > +void vfio_device_cdev_close(struct vfio_device_file *df)
> > > > +{
> > > > +	struct vfio_device *device = df->device;
> > > > +
> > > > +	mutex_lock(&device->dev_set->lock);
> > > > +	if (!smp_load_acquire(&df->access_granted)) {
> > > 
> > > there is no contention with another one changing this flag at this
> > > point so directly accessing it is fine.
> > 
> > make sense. 
> 
> Have to use READ_ONCE though
>
Just a curious question:
given df->access_granted is now written with device->dev_set->lock held and
also read with this lock held in vfio_device_cdev_close(), is READ_ONCE
still required? And what about df->iommufd ?

Thanks
Yan

^ permalink raw reply	[flat|nested] 55+ messages in thread

* RE: [PATCH v4 09/19] vfio/pci: Accept device fd for hot reset
  2023-02-24  3:56                     ` Tian, Kevin
@ 2023-02-24  5:09                       ` Liu, Yi L
  0 siblings, 0 replies; 55+ messages in thread
From: Liu, Yi L @ 2023-02-24  5:09 UTC (permalink / raw)
  To: Tian, Kevin, Jason Gunthorpe
  Cc: linux-s390, yi.y.sun, mjrosato, kvm, jasowang, joro, cohuck, Hao,
	Xudong, peterx, eric.auger, alex.williamson, Xu, Terrence,
	nicolinc, shameerali.kolothum.thodi, suravee.suthikulpanit,
	chao.p.peng, Zhao, Yan Y, lulu, intel-gvt-dev, intel-gfx

> From: Tian, Kevin <kevin.tian@intel.com>
> Sent: Friday, February 24, 2023 11:57 AM
> 
> > From: Liu, Yi L <yi.l.liu@intel.com>
> > Sent: Friday, February 24, 2023 11:44 AM
> > > > Upon reflection we can probably make it even simpler and just have a
> 0
> > > > length fd array mean to use the iommufd the vfio_device is already
> > > > associated with
> > > >
> > > > And the check for correctness can be simplified to simply see if each
> > > > vfio_device in the dev_set is attached to the same iommufd ctx
> > > > pointer instead of searching the xarray.
> >
> > How about the hot reset info path? We can still keep reporting the
> > current information to userspace. Isn't it?
> 
> No need to change that. It's already reported per device.
> 
> >
> > another tricky question. If user passess iommufd down for reset
> > in the vfio iommufd compatible mode, should we support it as
> > well?
> >
> 
> I don't see why we want to ban it. It does change the result from
> error (vfio container) to success (iommufd vfio-compat) when using
> the container fd/iommufd. But do we actually have a use case
> relying on such error pattern?
>
> On the other hand an user who knows the presence of vfio-compat
> should be allowed to pass iommufd to reset even when it still uses
> the legacy group/container interfaces.

Yes. although I guess no user would do such a strange thing if
no special reason. 😊

Regards,
Yi Liu

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v4 09/19] vfio/pci: Accept device fd for hot reset
  2023-02-24  3:43                   ` Liu, Yi L
  2023-02-24  3:56                     ` Tian, Kevin
@ 2023-02-24 14:30                     ` Jason Gunthorpe
  1 sibling, 0 replies; 55+ messages in thread
From: Jason Gunthorpe @ 2023-02-24 14:30 UTC (permalink / raw)
  To: Liu, Yi L
  Cc: Tian, Kevin, linux-s390, yi.y.sun, mjrosato, kvm, jasowang, joro,
	cohuck, Hao, Xudong, peterx, eric.auger, alex.williamson, Xu,
	Terrence, nicolinc, shameerali.kolothum.thodi,
	suravee.suthikulpanit, chao.p.peng, Zhao, Yan Y, lulu,
	intel-gvt-dev, intel-gfx

On Fri, Feb 24, 2023 at 03:43:37AM +0000, Liu, Yi L wrote:
> > From: Tian, Kevin <kevin.tian@intel.com>
> > Sent: Friday, February 24, 2023 10:48 AM
> > 
> > > From: Jason Gunthorpe
> > > Sent: Friday, February 24, 2023 10:36 AM
> > >
> > > On Fri, Feb 24, 2023 at 02:21:33AM +0000, Tian, Kevin wrote:
> > >
> > > > Yi, while you are incorporating this change please also update the
> > > > uapi header. Rename 'group_fds[]' to 'fds[]' and add comment to
> > > > explain that it could be an array of group fds or a single iommufd.
> > >
> > > Upon reflection we can probably make it even simpler and just have a 0
> > > length fd array mean to use the iommufd the vfio_device is already
> > > associated with
> > >
> > > And the check for correctness can be simplified to simply see if each
> > > vfio_device in the dev_set is attached to the same iommufd ctx
> > > pointer instead of searching the xarray.
> 
> How about the hot reset info path? We can still keep reporting the
> current information to userspace. Isn't it?

Yeah, but I wonder if it is useful
 
> another tricky question. If user passess iommufd down for reset
> in the vfio iommufd compatible mode, should we support it as
> well?

I would say if the 0 fds mode is used and the current vfio_Device does
not have an iommufd ctx then fail.

That is the only requirement, however it got that ctx doesn't matter.

> > Locking is fine since dev_set->lock is already held in the reset path.
> 
> dev_set->lock is held prior to call bind_iommufd, so I agree locking
> is ok.

As long as the vdev's iommufd ctx and opencount cannot change under
the devset lock, which I think is the case. It should be documented
though in the vfio core code, as it is a bit subtle what the devset
lock actually covers.

Jason

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v4 16/19] vfio: Add VFIO_DEVICE_BIND_IOMMUFD
  2023-02-24  4:58         ` Yan Zhao
@ 2023-02-24 14:31           ` Jason Gunthorpe
  2023-02-27  4:46             ` Yan Zhao
  0 siblings, 1 reply; 55+ messages in thread
From: Jason Gunthorpe @ 2023-02-24 14:31 UTC (permalink / raw)
  To: Yan Zhao
  Cc: Liu, Yi L, Tian, Kevin, alex.williamson, joro, cohuck,
	eric.auger, nicolinc, kvm, mjrosato, chao.p.peng, yi.y.sun,
	peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, intel-gvt-dev, intel-gfx, linux-s390, Hao,
	Xudong, Xu, Terrence

On Fri, Feb 24, 2023 at 12:58:22PM +0800, Yan Zhao wrote:
> On Wed, Feb 22, 2023 at 08:59:51AM -0400, Jason Gunthorpe wrote:
> > On Wed, Feb 22, 2023 at 07:44:12AM +0000, Liu, Yi L wrote:
> > > > From: Tian, Kevin <kevin.tian@intel.com>
> > > > Sent: Wednesday, February 22, 2023 3:40 PM
> > > > 
> > > > > From: Liu, Yi L <yi.l.liu@intel.com>
> > > > > Sent: Tuesday, February 21, 2023 11:48 AM
> > > > >
> > > > > +
> > > > > +void vfio_device_cdev_close(struct vfio_device_file *df)
> > > > > +{
> > > > > +	struct vfio_device *device = df->device;
> > > > > +
> > > > > +	mutex_lock(&device->dev_set->lock);
> > > > > +	if (!smp_load_acquire(&df->access_granted)) {
> > > > 
> > > > there is no contention with another one changing this flag at this
> > > > point so directly accessing it is fine.
> > > 
> > > make sense. 
> > 
> > Have to use READ_ONCE though
> >
> Just a curious question:
> given df->access_granted is now written with device->dev_set->lock held and
> also read with this lock held in vfio_device_cdev_close(), is READ_ONCE
> still required? And what about df->iommufd ?

No, if the writer is under a lock held by the reader then it is always
OK to use naked read. Best to document it with a comment

Jason

^ permalink raw reply	[flat|nested] 55+ messages in thread

* RE: [PATCH v4 09/19] vfio/pci: Accept device fd for hot reset
  2023-02-24  2:36               ` Jason Gunthorpe
  2023-02-24  2:48                 ` Tian, Kevin
@ 2023-02-26  8:59                 ` Liu, Yi L
  2023-02-26 23:40                   ` Jason Gunthorpe
  1 sibling, 1 reply; 55+ messages in thread
From: Liu, Yi L @ 2023-02-26  8:59 UTC (permalink / raw)
  To: Jason Gunthorpe, Tian, Kevin
  Cc: linux-s390, yi.y.sun, mjrosato, kvm, joro, cohuck, Hao, Xudong,
	peterx, Zhao, Yan Y, eric.auger, alex.williamson, Xu, Terrence,
	nicolinc, shameerali.kolothum.thodi, suravee.suthikulpanit,
	intel-gfx, chao.p.peng, lulu, intel-gvt-dev, jasowang

> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Friday, February 24, 2023 10:36 AM
> 
> On Fri, Feb 24, 2023 at 02:21:33AM +0000, Tian, Kevin wrote:
> 
> > Yi, while you are incorporating this change please also update the
> > uapi header. Rename 'group_fds[]' to 'fds[]' and add comment to
> > explain that it could be an array of group fds or a single iommufd.
> 
> Upon reflection we can probably make it even simpler and just have a 0
> length fd array mean to use the iommufd the vfio_device is already
> associated with
> 
> And the check for correctness can be simplified to simply see if each
> vfio_device in the dev_set is attached to the same iommufd ctx
> pointer instead of searching the xarray.

Sorry, it appears to me the below concern is not solved as above logic
still requires userspace to open and bind devices to the same iommufd.

"
  > Say a scenario where group happens to overlap with devset. Let's say
  > two devices in the group/devset.
  > 
  > An existing deployment assigns only dev1 to Qemu. In this case dev1
  > is resettable via group fd given dev2 cannot be opened by another
  > user.
"

Thus, I think we still need to search the xarray to check if a device is
bound to iommufd or not. And this check needs to be more relaxed.
E.g. dev1 is bound to iommufd, but dev2 has not. However, they have
the same group, so dev2 should be considered to be "bound" as well.
When 0 length fd is used, vfio just gets the iommufd_ctx from the device
that is to be reset, then check if all other devices in the dev_set are
considered as bound with the below interface.

+/**
+ * iommufd_ctx_has_group_for_device - True if any alias of struct device
+					is bound to this ictx
+ * @ictx: iommufd file descriptor
+ * @dev: Pointer to a physical device struct
+ *
+ * True if a iommufd_device_bind() is present for any alias of this dev
+ */
+bool iommufd_ctx_has_group_for_device(struct iommufd_ctx *ictx, struct device *dev)
+{
+	unsigned long index;
+	struct iommu_group *group;
+	struct iommufd_object *obj;
+
+	if (!ictx)
+		return false;
+
+	group = iommu_group_get(dev);
+	if (!group)
+		return false;
+
+	xa_lock(&ictx->objects);
+	xa_for_each(&ictx->objects, index, obj) {
+		if (obj->type == IOMMUFD_OBJ_DEVICE)  &&
+		    container_of(obj, struct iommufd_device, obj)->group == group) {
+			xa_unlock(&ictx->objects);
+			iommu_group_put(group);
+			return true;
+		}
+	}
+	xa_unlock(&ictx->objects);
+	iommu_group_put(group);
+	return false;
+}
+EXPORT_SYMBOL_NS_GPL(iommufd_ctx_has_group_for_device, IOMMUFD);

Regards
Yi Liu

^ permalink raw reply	[flat|nested] 55+ messages in thread

* RE: [PATCH v4 03/19] vfio: Accept vfio device file in the driver facing kAPI
  2023-02-22  7:15   ` Tian, Kevin
@ 2023-02-26 12:20     ` Liu, Yi L
  0 siblings, 0 replies; 55+ messages in thread
From: Liu, Yi L @ 2023-02-26 12:20 UTC (permalink / raw)
  To: Tian, Kevin, alex.williamson, jgg
  Cc: joro, cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.y.sun, peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, intel-gvt-dev, intel-gfx, linux-s390,
	Zhao, Yan Y, Hao, Xudong, Xu, Terrence

> From: Tian, Kevin <kevin.tian@intel.com>
> Sent: Wednesday, February 22, 2023 3:16 PM
> 
> > From: Liu, Yi L <yi.l.liu@intel.com>
> > Sent: Tuesday, February 21, 2023 11:48 AM
> >
> >  /**
> >   * vfio_file_is_valid - True if the file is valid vfio file
> >   * @file: VFIO group file or VFIO device file
> >   */
> >  bool vfio_file_is_valid(struct file *file)
> >  {
> > -	return vfio_group_from_file(file);
> > +	return vfio_group_from_file(file) ||
> > +	       vfio_device_from_file(file);
> >  }
> >  EXPORT_SYMBOL_GPL(vfio_file_is_valid);
> 
> Following previous Alex's comment I'd leave pci hot reset path
> to continue calling vfio_file_is_group() at this point so the uAPI
> semantics is not changed here before patch 9 comes to properly
> handle device fd in that path.

Done.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v4 09/19] vfio/pci: Accept device fd for hot reset
  2023-02-26  8:59                 ` Liu, Yi L
@ 2023-02-26 23:40                   ` Jason Gunthorpe
  2023-02-27  2:53                     ` Liu, Yi L
  0 siblings, 1 reply; 55+ messages in thread
From: Jason Gunthorpe @ 2023-02-26 23:40 UTC (permalink / raw)
  To: Liu, Yi L
  Cc: Tian, Kevin, linux-s390, yi.y.sun, mjrosato, kvm, joro, cohuck,
	Hao, Xudong, peterx, Zhao, Yan Y, eric.auger, alex.williamson,
	Xu, Terrence, nicolinc, shameerali.kolothum.thodi,
	suravee.suthikulpanit, intel-gfx, chao.p.peng, lulu,
	intel-gvt-dev, jasowang

On Sun, Feb 26, 2023 at 08:59:01AM +0000, Liu, Yi L wrote:
> > From: Jason Gunthorpe <jgg@nvidia.com>
> > Sent: Friday, February 24, 2023 10:36 AM
> > 
> > On Fri, Feb 24, 2023 at 02:21:33AM +0000, Tian, Kevin wrote:
> > 
> > > Yi, while you are incorporating this change please also update the
> > > uapi header. Rename 'group_fds[]' to 'fds[]' and add comment to
> > > explain that it could be an array of group fds or a single iommufd.
> > 
> > Upon reflection we can probably make it even simpler and just have a 0
> > length fd array mean to use the iommufd the vfio_device is already
> > associated with
> > 
> > And the check for correctness can be simplified to simply see if each
> > vfio_device in the dev_set is attached to the same iommufd ctx
> > pointer instead of searching the xarray.
> 
> Sorry, it appears to me the below concern is not solved as above logic
> still requires userspace to open and bind devices to the same iommufd.
> 
> "
>   > Say a scenario where group happens to overlap with devset. Let's say
>   > two devices in the group/devset.
>   > 
>   > An existing deployment assigns only dev1 to Qemu. In this case dev1
>   > is resettable via group fd given dev2 cannot be opened by another
>   > user.
> "

You solve this by checking for a 0 open count as already discussed.

Jason

^ permalink raw reply	[flat|nested] 55+ messages in thread

* RE: [PATCH v4 09/19] vfio/pci: Accept device fd for hot reset
  2023-02-26 23:40                   ` Jason Gunthorpe
@ 2023-02-27  2:53                     ` Liu, Yi L
  0 siblings, 0 replies; 55+ messages in thread
From: Liu, Yi L @ 2023-02-27  2:53 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Tian, Kevin, linux-s390, yi.y.sun, mjrosato, kvm, joro, cohuck,
	Hao, Xudong, peterx, Zhao, Yan Y, eric.auger, alex.williamson,
	Xu, Terrence, nicolinc, shameerali.kolothum.thodi,
	suravee.suthikulpanit, intel-gfx, chao.p.peng, lulu,
	intel-gvt-dev, jasowang

> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Monday, February 27, 2023 7:40 AM
> On Sun, Feb 26, 2023 at 08:59:01AM +0000, Liu, Yi L wrote:
> > > From: Jason Gunthorpe <jgg@nvidia.com>
> > > Sent: Friday, February 24, 2023 10:36 AM
> > >
> > > On Fri, Feb 24, 2023 at 02:21:33AM +0000, Tian, Kevin wrote:
> > >
> > > > Yi, while you are incorporating this change please also update the
> > > > uapi header. Rename 'group_fds[]' to 'fds[]' and add comment to
> > > > explain that it could be an array of group fds or a single iommufd.
> > >
> > > Upon reflection we can probably make it even simpler and just have a 0
> > > length fd array mean to use the iommufd the vfio_device is already
> > > associated with
> > >
> > > And the check for correctness can be simplified to simply see if each
> > > vfio_device in the dev_set is attached to the same iommufd ctx
> > > pointer instead of searching the xarray.
> >
> > Sorry, it appears to me the below concern is not solved as above logic
> > still requires userspace to open and bind devices to the same iommufd.
> >
> > "
> >   > Say a scenario where group happens to overlap with devset. Let's say
> >   > two devices in the group/devset.
> >   >
> >   > An existing deployment assigns only dev1 to Qemu. In this case dev1
> >   > is resettable via group fd given dev2 cannot be opened by another
> >   > user.
> > "
> 
> You solve this by checking for a 0 open count as already discussed.

Ok. this scenario is solved. so if open_count is non-zero, it should be
either bound with iommufd or should be within groups as your below
suggestion. 

		if (cur_vma->vdev.open_count &&
		    !vfio_dev_in_groups(cur_vma, groups) &&
		    !iommufd_ctx_has_device(iommufd_ctx, &cur_vma->pdev->dev)) {
			ret = -EINVAL;
			goto err_undo;
		}

Btw. In cdev path, open_count++ and iommufd bound are done in a
single dev_set->lock lock and unlock pair, so if cur_vma->vdev has
iommufd_ctx, then its open_count is non-zero. I have another scenario
that dev1 and dev2 are from different groups but happen to be in
the same dev_set, and userspace only opens dev1, this logic will allow
userspace to reset dev1, but dev2 may be opened by another userspace.
This seems to be a problem in my prior thinking. However, dev_set->lock
Is held in the reset path, so other userspace cannot open and bind
cur_vma->vdev to iommufd during reset. 😊

Regards,
Yi Liu

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v4 16/19] vfio: Add VFIO_DEVICE_BIND_IOMMUFD
  2023-02-24 14:31           ` Jason Gunthorpe
@ 2023-02-27  4:46             ` Yan Zhao
  0 siblings, 0 replies; 55+ messages in thread
From: Yan Zhao @ 2023-02-27  4:46 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: linux-s390, Tian, Kevin, Liu, Yi L, yi.y.sun, mjrosato, kvm,
	joro, cohuck, Hao, Xudong, peterx, eric.auger, alex.williamson,
	Xu, Terrence, nicolinc, shameerali.kolothum.thodi,
	suravee.suthikulpanit, intel-gfx, chao.p.peng, lulu,
	intel-gvt-dev, jasowang

On Fri, Feb 24, 2023 at 10:31:35AM -0400, Jason Gunthorpe wrote:
> On Fri, Feb 24, 2023 at 12:58:22PM +0800, Yan Zhao wrote:
> > On Wed, Feb 22, 2023 at 08:59:51AM -0400, Jason Gunthorpe wrote:
> > > On Wed, Feb 22, 2023 at 07:44:12AM +0000, Liu, Yi L wrote:
> > > > > From: Tian, Kevin <kevin.tian@intel.com>
> > > > > Sent: Wednesday, February 22, 2023 3:40 PM
> > > > > 
> > > > > > From: Liu, Yi L <yi.l.liu@intel.com>
> > > > > > Sent: Tuesday, February 21, 2023 11:48 AM
> > > > > >
> > > > > > +
> > > > > > +void vfio_device_cdev_close(struct vfio_device_file *df)
> > > > > > +{
> > > > > > +	struct vfio_device *device = df->device;
> > > > > > +
> > > > > > +	mutex_lock(&device->dev_set->lock);
> > > > > > +	if (!smp_load_acquire(&df->access_granted)) {
> > > > > 
> > > > > there is no contention with another one changing this flag at this
> > > > > point so directly accessing it is fine.
> > > > 
> > > > make sense. 
> > > 
> > > Have to use READ_ONCE though
> > >
> > Just a curious question:
> > given df->access_granted is now written with device->dev_set->lock held and
> > also read with this lock held in vfio_device_cdev_close(), is READ_ONCE
> > still required? And what about df->iommufd ?
> 
> No, if the writer is under a lock held by the reader then it is always
> OK to use naked read. Best to document it with a comment
>
Thanks for the clarification!

^ permalink raw reply	[flat|nested] 55+ messages in thread

end of thread, other threads:[~2023-02-27  5:10 UTC | newest]

Thread overview: 55+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-02-21  3:47 [PATCH v4 00/19] Add vfio_device cdev for iommufd support Yi Liu
2023-02-21  3:47 ` [PATCH v4 01/19] vfio: Allocate per device file structure Yi Liu
2023-02-21  3:47 ` [PATCH v4 02/19] vfio: Refine vfio file kAPIs Yi Liu
2023-02-21  3:47 ` [PATCH v4 03/19] vfio: Accept vfio device file in the driver facing kAPI Yi Liu
2023-02-22  7:15   ` Tian, Kevin
2023-02-26 12:20     ` Liu, Yi L
2023-02-21  3:47 ` [PATCH v4 04/19] kvm/vfio: Rename kvm_vfio_group to prepare for accepting vfio device fd Yi Liu
2023-02-21  3:47 ` [PATCH v4 05/19] kvm/vfio: Accept vfio device file from userspace Yi Liu
2023-02-22  7:17   ` Tian, Kevin
2023-02-23 10:33     ` Liu, Yi L
2023-02-21  3:47 ` [PATCH v4 06/19] vfio: Pass struct vfio_device_file * to vfio_device_open/close() Yi Liu
2023-02-21  3:48 ` [PATCH v4 07/19] vfio: Block device access via device fd until device is opened Yi Liu
2023-02-22  7:55   ` Yan Zhao
2023-02-22  8:29     ` Liu, Yi L
2023-02-21  3:48 ` [PATCH v4 08/19] vfio/pci: Update comment around group_fd get in vfio_pci_ioctl_pci_hot_reset() Yi Liu
2023-02-22  7:20   ` Tian, Kevin
2023-02-21  3:48 ` [PATCH v4 09/19] vfio/pci: Accept device fd for hot reset Yi Liu
2023-02-22  7:26   ` Tian, Kevin
2023-02-22 13:35     ` Liu, Yi L
2023-02-22 17:17       ` Jason Gunthorpe
2023-02-23  7:55         ` Tian, Kevin
2023-02-23 13:21           ` Jason Gunthorpe
2023-02-24  2:21             ` Tian, Kevin
2023-02-24  2:36               ` Jason Gunthorpe
2023-02-24  2:48                 ` Tian, Kevin
2023-02-24  3:43                   ` Liu, Yi L
2023-02-24  3:56                     ` Tian, Kevin
2023-02-24  5:09                       ` Liu, Yi L
2023-02-24 14:30                     ` Jason Gunthorpe
2023-02-26  8:59                 ` Liu, Yi L
2023-02-26 23:40                   ` Jason Gunthorpe
2023-02-27  2:53                     ` Liu, Yi L
2023-02-21  3:48 ` [PATCH v4 10/19] vfio: Add infrastructure for bind_iommufd from userspace Yi Liu
2023-02-21  3:48 ` [PATCH v4 11/19] vfio-iommufd: Add detach_ioas support for physical VFIO devices Yi Liu
2023-02-21  3:48 ` [PATCH v4 12/19] vfio-iommufd: Add detach_ioas for emulated " Yi Liu
2023-02-21  3:48 ` [PATCH v4 13/19] vfio: Add cdev_device_open_cnt to vfio_group Yi Liu
2023-02-22  7:31   ` Tian, Kevin
2023-02-21  3:48 ` [PATCH v4 14/19] vfio: Make vfio_device_open() single open for device cdev path Yi Liu
2023-02-22  7:32   ` Tian, Kevin
2023-02-21  3:48 ` [PATCH v4 15/19] vfio: Add cdev for vfio_device Yi Liu
2023-02-22  7:34   ` Tian, Kevin
2023-02-21  3:48 ` [PATCH v4 16/19] vfio: Add VFIO_DEVICE_BIND_IOMMUFD Yi Liu
2023-02-22  7:39   ` Tian, Kevin
2023-02-22  7:44     ` Liu, Yi L
2023-02-22  7:59       ` Tian, Kevin
2023-02-22 12:59       ` Jason Gunthorpe
2023-02-24  4:58         ` Yan Zhao
2023-02-24 14:31           ` Jason Gunthorpe
2023-02-27  4:46             ` Yan Zhao
2023-02-22  7:53   ` Yan Zhao
2023-02-22  8:28     ` Liu, Yi L
2023-02-21  3:48 ` [PATCH v4 17/19] vfio: Add VFIO_DEVICE_AT[DE]TACH_IOMMUFD_PT Yi Liu
2023-02-22  7:41   ` Tian, Kevin
2023-02-21  3:48 ` [PATCH v4 18/19] vfio: Compile group optionally Yi Liu
2023-02-21  3:48 ` [PATCH v4 19/19] docs: vfio: Add vfio device cdev description Yi Liu

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).