* [PATCH 00/13] Add vfio_device cdev for iommufd support
@ 2023-01-17 13:49 Yi Liu
  2023-01-17 13:49 ` [PATCH 01/13] vfio: Allocate per device file structure Yi Liu
                   ` (12 more replies)
  0 siblings, 13 replies; 80+ messages in thread
From: Yi Liu @ 2023-01-17 13:49 UTC (permalink / raw)
  To: alex.williamson, jgg
  Cc: kevin.tian, cohuck, eric.auger, nicolinc, kvm, mjrosato,
	chao.p.peng, yi.l.liu, yi.y.sun, peterx, jasowang,
	suravee.suthikulpanit

Existing VFIO provides group-centric user APIs: userspace opens
/dev/vfio/$group_id first to get a device fd and hence access to the
device. This is not the desired model for iommufd. Per the conclusion of
the community discussion[1], iommufd provides device-centric kAPIs and
requires its consumers (like VFIO) to provide device-centric user APIs.
Such user APIs associate a device with an iommufd and with the I/O
address spaces managed by that iommufd.

This series first introduces a per-device file structure in preparation
for further enhancement, and refactors the kvm-vfio code to accept a
device file from userspace. It then refactors VFIO to handle iommufd
binding. This refactoring includes a mechanism to block device access
before the iommufd bind, and makes vfio_device_open() exclusive between
the group path and the cdev path. Finally, it adds cdev support for the
vfio device and makes the group infrastructure optional, as it is not
needed when only the vfio device cdev is compiled in.

This also serves as a base for further iommu nesting support for vfio
devices[2].

The complete code can be found in the branch below; simple tests were
done with both the legacy group path and the cdev path. A draft QEMU
branch can be found at [3].

https://github.com/yiliu1765/iommufd/tree/vfio_device_cdev_v1
(config CONFIG_IOMMUFD=y)

[1] https://lore.kernel.org/kvm/BN9PR11MB5433B1E4AE5B0480369F97178C189@BN9PR11MB5433.namprd11.prod.outlook.com/
[2] https://github.com/yiliu1765/iommufd/tree/wip/iommufd-v6.2-rc4-nesting
[3] https://github.com/yiliu1765/qemu/tree/wip/qemu-iommufd-6.2-rc4 (need further cleanup)

Change log:

v1:
 - Fixed the circular refcount between the kvm struct and the device file
   reference. (JasonG)
 - Addressed comments from KevinT
 - Retained the ioctl for detach; whether to keep it is left to Alex's
   taste
   (https://lore.kernel.org/kvm/BN9PR11MB5276BE9F4B0613EE859317028CFF9@BN9PR11MB5276.namprd11.prod.outlook.com/)

rfc: https://lore.kernel.org/kvm/20221219084718.9342-1-yi.l.liu@intel.com/

Thanks,
	Yi Liu

Yi Liu (13):
  vfio: Allocate per device file structure
  vfio: Refine vfio file kAPIs
  vfio: Accept vfio device file in the driver facing kAPI
  kvm/vfio: Rename kvm_vfio_group to prepare for accepting vfio device
    fd
  kvm/vfio: Provide struct kvm_device_ops::release() instead of
    ::destroy()
  kvm/vfio: Accept vfio device file from userspace
  vfio: Pass struct vfio_device_file * to vfio_device_open/close()
  vfio: Block device access via device fd until device is opened
  vfio: Add infrastructure for bind_iommufd and attach
  vfio: Make vfio_device_open() exclusive between group path and device
    cdev path
  vfio: Add cdev for vfio_device
  vfio: Add ioctls for device cdev iommufd
  vfio: Compile group optionally

 Documentation/virt/kvm/devices/vfio.rst |  32 +-
 drivers/vfio/Kconfig                    |  17 +
 drivers/vfio/Makefile                   |   3 +-
 drivers/vfio/group.c                    | 131 +++---
 drivers/vfio/iommufd.c                  |  79 +++-
 drivers/vfio/pci/vfio_pci_core.c        |   4 +-
 drivers/vfio/vfio.h                     | 109 ++++-
 drivers/vfio/vfio_main.c                | 506 ++++++++++++++++++++++--
 include/linux/vfio.h                    |  21 +-
 include/uapi/linux/iommufd.h            |   2 +
 include/uapi/linux/kvm.h                |  23 +-
 include/uapi/linux/vfio.h               |  64 +++
 virt/kvm/vfio.c                         | 145 +++----
 13 files changed, 909 insertions(+), 227 deletions(-)

-- 
2.34.1



* [PATCH 01/13] vfio: Allocate per device file structure
  2023-01-17 13:49 [PATCH 00/13] Add vfio_device cdev for iommufd support Yi Liu
@ 2023-01-17 13:49 ` Yi Liu
  2023-01-18  8:37   ` Tian, Kevin
  2023-01-18 13:28   ` Eric Auger
  2023-01-17 13:49 ` [PATCH 02/13] vfio: Refine vfio file kAPIs Yi Liu
                   ` (11 subsequent siblings)
  12 siblings, 2 replies; 80+ messages in thread
From: Yi Liu @ 2023-01-17 13:49 UTC (permalink / raw)
  To: alex.williamson, jgg
  Cc: kevin.tian, cohuck, eric.auger, nicolinc, kvm, mjrosato,
	chao.p.peng, yi.l.liu, yi.y.sun, peterx, jasowang,
	suravee.suthikulpanit

This is a preparation for adding vfio device cdev support. The vfio
device cdev requires:
1) per-device-file memory to store the kvm pointer set by KVM. It will
   be propagated to vfio_device:kvm after the device cdev file is bound
   to an iommufd;
2) a mechanism to block device access through the device cdev fd before
   it is bound to an iommufd.

To address the above requirements, this adds a per-device-file structure
named vfio_device_file. For now, it is only a wrapper around a struct
vfio_device pointer. Other fields will be added to this per-file
structure in future commits.

Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 drivers/vfio/group.c     | 13 +++++++++++--
 drivers/vfio/vfio.h      |  6 ++++++
 drivers/vfio/vfio_main.c | 31 ++++++++++++++++++++++++++-----
 3 files changed, 43 insertions(+), 7 deletions(-)

diff --git a/drivers/vfio/group.c b/drivers/vfio/group.c
index bb24b2f0271e..8fdb7e35b0a6 100644
--- a/drivers/vfio/group.c
+++ b/drivers/vfio/group.c
@@ -186,19 +186,26 @@ void vfio_device_group_close(struct vfio_device *device)
 
 static struct file *vfio_device_open_file(struct vfio_device *device)
 {
+	struct vfio_device_file *df;
 	struct file *filep;
 	int ret;
 
+	df = vfio_allocate_device_file(device);
+	if (IS_ERR(df)) {
+		ret = PTR_ERR(df);
+		goto err_out;
+	}
+
 	ret = vfio_device_group_open(device);
 	if (ret)
-		goto err_out;
+		goto err_free;
 
 	/*
 	 * We can't use anon_inode_getfd() because we need to modify
 	 * the f_mode flags directly to allow more than just ioctls
 	 */
 	filep = anon_inode_getfile("[vfio-device]", &vfio_device_fops,
-				   device, O_RDWR);
+				   df, O_RDWR);
 	if (IS_ERR(filep)) {
 		ret = PTR_ERR(filep);
 		goto err_close_device;
@@ -222,6 +229,8 @@ static struct file *vfio_device_open_file(struct vfio_device *device)
 
 err_close_device:
 	vfio_device_group_close(device);
+err_free:
+	kfree(df);
 err_out:
 	return ERR_PTR(ret);
 }
diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
index f8219a438bfb..1091806bc89d 100644
--- a/drivers/vfio/vfio.h
+++ b/drivers/vfio/vfio.h
@@ -16,12 +16,18 @@ struct iommu_group;
 struct vfio_device;
 struct vfio_container;
 
+struct vfio_device_file {
+	struct vfio_device *device;
+};
+
 void vfio_device_put_registration(struct vfio_device *device);
 bool vfio_device_try_get_registration(struct vfio_device *device);
 int vfio_device_open(struct vfio_device *device,
 		     struct iommufd_ctx *iommufd, struct kvm *kvm);
 void vfio_device_close(struct vfio_device *device,
 		       struct iommufd_ctx *iommufd);
+struct vfio_device_file *
+vfio_allocate_device_file(struct vfio_device *device);
 
 extern const struct file_operations vfio_device_fops;
 
diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
index 5177bb061b17..ee54c9ae0af4 100644
--- a/drivers/vfio/vfio_main.c
+++ b/drivers/vfio/vfio_main.c
@@ -344,6 +344,20 @@ static bool vfio_assert_device_open(struct vfio_device *device)
 	return !WARN_ON_ONCE(!READ_ONCE(device->open_count));
 }
 
+struct vfio_device_file *
+vfio_allocate_device_file(struct vfio_device *device)
+{
+	struct vfio_device_file *df;
+
+	df = kzalloc(sizeof(*df), GFP_KERNEL_ACCOUNT);
+	if (!df)
+		return ERR_PTR(-ENOMEM);
+
+	df->device = device;
+
+	return df;
+}
+
 static int vfio_device_first_open(struct vfio_device *device,
 				  struct iommufd_ctx *iommufd, struct kvm *kvm)
 {
@@ -461,12 +475,15 @@ static inline void vfio_device_pm_runtime_put(struct vfio_device *device)
  */
 static int vfio_device_fops_release(struct inode *inode, struct file *filep)
 {
-	struct vfio_device *device = filep->private_data;
+	struct vfio_device_file *df = filep->private_data;
+	struct vfio_device *device = df->device;
 
 	vfio_device_group_close(device);
 
 	vfio_device_put_registration(device);
 
+	kfree(df);
+
 	return 0;
 }
 
@@ -1031,7 +1048,8 @@ static int vfio_ioctl_device_feature(struct vfio_device *device,
 static long vfio_device_fops_unl_ioctl(struct file *filep,
 				       unsigned int cmd, unsigned long arg)
 {
-	struct vfio_device *device = filep->private_data;
+	struct vfio_device_file *df = filep->private_data;
+	struct vfio_device *device = df->device;
 	int ret;
 
 	ret = vfio_device_pm_runtime_get(device);
@@ -1058,7 +1076,8 @@ static long vfio_device_fops_unl_ioctl(struct file *filep,
 static ssize_t vfio_device_fops_read(struct file *filep, char __user *buf,
 				     size_t count, loff_t *ppos)
 {
-	struct vfio_device *device = filep->private_data;
+	struct vfio_device_file *df = filep->private_data;
+	struct vfio_device *device = df->device;
 
 	if (unlikely(!device->ops->read))
 		return -EINVAL;
@@ -1070,7 +1089,8 @@ static ssize_t vfio_device_fops_write(struct file *filep,
 				      const char __user *buf,
 				      size_t count, loff_t *ppos)
 {
-	struct vfio_device *device = filep->private_data;
+	struct vfio_device_file *df = filep->private_data;
+	struct vfio_device *device = df->device;
 
 	if (unlikely(!device->ops->write))
 		return -EINVAL;
@@ -1080,7 +1100,8 @@ static ssize_t vfio_device_fops_write(struct file *filep,
 
 static int vfio_device_fops_mmap(struct file *filep, struct vm_area_struct *vma)
 {
-	struct vfio_device *device = filep->private_data;
+	struct vfio_device_file *df = filep->private_data;
+	struct vfio_device *device = df->device;
 
 	if (unlikely(!device->ops->mmap))
 		return -EINVAL;
-- 
2.34.1



* [PATCH 02/13] vfio: Refine vfio file kAPIs
  2023-01-17 13:49 [PATCH 00/13] Add vfio_device cdev for iommufd support Yi Liu
  2023-01-17 13:49 ` [PATCH 01/13] vfio: Allocate per device file structure Yi Liu
@ 2023-01-17 13:49 ` Yi Liu
  2023-01-18  8:42   ` Tian, Kevin
  2023-01-18 14:37   ` Eric Auger
  2023-01-17 13:49 ` [PATCH 03/13] vfio: Accept vfio device file in the driver facing kAPI Yi Liu
                   ` (10 subsequent siblings)
  12 siblings, 2 replies; 80+ messages in thread
From: Yi Liu @ 2023-01-17 13:49 UTC (permalink / raw)
  To: alex.williamson, jgg
  Cc: kevin.tian, cohuck, eric.auger, nicolinc, kvm, mjrosato,
	chao.p.peng, yi.l.liu, yi.y.sun, peterx, jasowang,
	suravee.suthikulpanit

This prepares for making the kAPIs below accept both group files and
device files instead of only vfio group files.

  bool vfio_file_enforced_coherent(struct file *file);
  void vfio_file_set_kvm(struct file *file, struct kvm *kvm);
  bool vfio_file_has_dev(struct file *file, struct vfio_device *device);

Besides the above change, vfio_file_is_group() is renamed to
vfio_file_is_valid().

Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 drivers/vfio/group.c             | 74 ++++++++------------------------
 drivers/vfio/pci/vfio_pci_core.c |  4 +-
 drivers/vfio/vfio.h              |  4 ++
 drivers/vfio/vfio_main.c         | 62 ++++++++++++++++++++++++++
 include/linux/vfio.h             |  2 +-
 virt/kvm/vfio.c                  | 10 ++---
 6 files changed, 92 insertions(+), 64 deletions(-)

diff --git a/drivers/vfio/group.c b/drivers/vfio/group.c
index 8fdb7e35b0a6..d83cf069d290 100644
--- a/drivers/vfio/group.c
+++ b/drivers/vfio/group.c
@@ -721,6 +721,15 @@ bool vfio_device_has_container(struct vfio_device *device)
 	return device->group->container;
 }
 
+struct vfio_group *vfio_group_from_file(struct file *file)
+{
+	struct vfio_group *group = file->private_data;
+
+	if (file->f_op != &vfio_group_fops)
+		return NULL;
+	return group;
+}
+
 /**
  * vfio_file_iommu_group - Return the struct iommu_group for the vfio group file
  * @file: VFIO group file
@@ -731,13 +740,13 @@ bool vfio_device_has_container(struct vfio_device *device)
  */
 struct iommu_group *vfio_file_iommu_group(struct file *file)
 {
-	struct vfio_group *group = file->private_data;
+	struct vfio_group *group = vfio_group_from_file(file);
 	struct iommu_group *iommu_group = NULL;
 
 	if (!IS_ENABLED(CONFIG_SPAPR_TCE_IOMMU))
 		return NULL;
 
-	if (!vfio_file_is_group(file))
+	if (!group)
 		return NULL;
 
 	mutex_lock(&group->group_lock);
@@ -750,34 +759,11 @@ struct iommu_group *vfio_file_iommu_group(struct file *file)
 }
 EXPORT_SYMBOL_GPL(vfio_file_iommu_group);
 
-/**
- * vfio_file_is_group - True if the file is usable with VFIO aPIS
- * @file: VFIO group file
- */
-bool vfio_file_is_group(struct file *file)
-{
-	return file->f_op == &vfio_group_fops;
-}
-EXPORT_SYMBOL_GPL(vfio_file_is_group);
-
-/**
- * vfio_file_enforced_coherent - True if the DMA associated with the VFIO file
- *        is always CPU cache coherent
- * @file: VFIO group file
- *
- * Enforced coherency means that the IOMMU ignores things like the PCIe no-snoop
- * bit in DMA transactions. A return of false indicates that the user has
- * rights to access additional instructions such as wbinvd on x86.
- */
-bool vfio_file_enforced_coherent(struct file *file)
+bool vfio_group_enforced_coherent(struct vfio_group *group)
 {
-	struct vfio_group *group = file->private_data;
 	struct vfio_device *device;
 	bool ret = true;
 
-	if (!vfio_file_is_group(file))
-		return true;
-
 	/*
 	 * If the device does not have IOMMU_CAP_ENFORCE_CACHE_COHERENCY then
 	 * any domain later attached to it will also not support it. If the cap
@@ -795,46 +781,22 @@ bool vfio_file_enforced_coherent(struct file *file)
 	mutex_unlock(&group->device_lock);
 	return ret;
 }
-EXPORT_SYMBOL_GPL(vfio_file_enforced_coherent);
 
-/**
- * vfio_file_set_kvm - Link a kvm with VFIO drivers
- * @file: VFIO group file
- * @kvm: KVM to link
- *
- * When a VFIO device is first opened the KVM will be available in
- * device->kvm if one was associated with the group.
- */
-void vfio_file_set_kvm(struct file *file, struct kvm *kvm)
+void vfio_group_set_kvm(struct vfio_group *group, struct kvm *kvm)
 {
-	struct vfio_group *group = file->private_data;
-
-	if (!vfio_file_is_group(file))
-		return;
-
+	/*
+	 * When a VFIO device is first opened the KVM will be available in
+	 * device->kvm if one was associated with the group.
+	 */
 	mutex_lock(&group->group_lock);
 	group->kvm = kvm;
 	mutex_unlock(&group->group_lock);
 }
-EXPORT_SYMBOL_GPL(vfio_file_set_kvm);
 
-/**
- * vfio_file_has_dev - True if the VFIO file is a handle for device
- * @file: VFIO file to check
- * @device: Device that must be part of the file
- *
- * Returns true if given file has permission to manipulate the given device.
- */
-bool vfio_file_has_dev(struct file *file, struct vfio_device *device)
+bool vfio_group_has_dev(struct vfio_group *group, struct vfio_device *device)
 {
-	struct vfio_group *group = file->private_data;
-
-	if (!vfio_file_is_group(file))
-		return false;
-
 	return group == device->group;
 }
-EXPORT_SYMBOL_GPL(vfio_file_has_dev);
 
 static char *vfio_devnode(const struct device *dev, umode_t *mode)
 {
diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
index 26a541cc64d1..985c6184a587 100644
--- a/drivers/vfio/pci/vfio_pci_core.c
+++ b/drivers/vfio/pci/vfio_pci_core.c
@@ -1319,8 +1319,8 @@ static int vfio_pci_ioctl_pci_hot_reset(struct vfio_pci_core_device *vdev,
 			break;
 		}
 
-		/* Ensure the FD is a vfio group FD.*/
-		if (!vfio_file_is_group(file)) {
+		/* Ensure the FD is a vfio FD.*/
+		if (!vfio_file_is_valid(file)) {
 			fput(file);
 			ret = -EINVAL;
 			break;
diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
index 1091806bc89d..ef5de2872983 100644
--- a/drivers/vfio/vfio.h
+++ b/drivers/vfio/vfio.h
@@ -90,6 +90,10 @@ void vfio_device_group_unregister(struct vfio_device *device);
 int vfio_device_group_use_iommu(struct vfio_device *device);
 void vfio_device_group_unuse_iommu(struct vfio_device *device);
 void vfio_device_group_close(struct vfio_device *device);
+struct vfio_group *vfio_group_from_file(struct file *file);
+bool vfio_group_enforced_coherent(struct vfio_group *group);
+void vfio_group_set_kvm(struct vfio_group *group, struct kvm *kvm);
+bool vfio_group_has_dev(struct vfio_group *group, struct vfio_device *device);
 bool vfio_device_has_container(struct vfio_device *device);
 int __init vfio_group_init(void);
 void vfio_group_cleanup(void);
diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
index ee54c9ae0af4..1aedfbd15ca0 100644
--- a/drivers/vfio/vfio_main.c
+++ b/drivers/vfio/vfio_main.c
@@ -1119,6 +1119,68 @@ const struct file_operations vfio_device_fops = {
 	.mmap		= vfio_device_fops_mmap,
 };
 
+/**
+ * vfio_file_is_valid - True if the file is usable with VFIO APIs
+ * @file: VFIO group file or VFIO device file
+ */
+bool vfio_file_is_valid(struct file *file)
+{
+	return vfio_group_from_file(file);
+}
+EXPORT_SYMBOL_GPL(vfio_file_is_valid);
+
+/**
+ * vfio_file_enforced_coherent - True if the DMA associated with the VFIO file
+ *        is always CPU cache coherent
+ * @file: VFIO group or device file
+ *
+ * Enforced coherency means that the IOMMU ignores things like the PCIe no-snoop
+ * bit in DMA transactions. A return of false indicates that the user has
+ * rights to access additional instructions such as wbinvd on x86.
+ */
+bool vfio_file_enforced_coherent(struct file *file)
+{
+	struct vfio_group *group = vfio_group_from_file(file);
+
+	if (group)
+		return vfio_group_enforced_coherent(group);
+
+	return true;
+}
+EXPORT_SYMBOL_GPL(vfio_file_enforced_coherent);
+
+/**
+ * vfio_file_set_kvm - Link a kvm with VFIO drivers
+ * @file: VFIO group file or device file
+ * @kvm: KVM to link
+ *
+ */
+void vfio_file_set_kvm(struct file *file, struct kvm *kvm)
+{
+	struct vfio_group *group = vfio_group_from_file(file);
+
+	if (group)
+		vfio_group_set_kvm(group, kvm);
+}
+EXPORT_SYMBOL_GPL(vfio_file_set_kvm);
+
+/**
+ * vfio_file_has_dev - True if the VFIO file is a handle for device
+ * @file: VFIO file to check
+ * @device: Device that must be part of the file
+ *
+ * Returns true if given file has permission to manipulate the given device.
+ */
+bool vfio_file_has_dev(struct file *file, struct vfio_device *device)
+{
+	struct vfio_group *group = vfio_group_from_file(file);
+
+	if (group)
+		return vfio_group_has_dev(group, device);
+	return false;
+}
+EXPORT_SYMBOL_GPL(vfio_file_has_dev);
+
 /*
  * Sub-module support
  */
diff --git a/include/linux/vfio.h b/include/linux/vfio.h
index 35be78e9ae57..46edd6e6c0ba 100644
--- a/include/linux/vfio.h
+++ b/include/linux/vfio.h
@@ -241,7 +241,7 @@ int vfio_mig_get_next_state(struct vfio_device *device,
  * External user API
  */
 struct iommu_group *vfio_file_iommu_group(struct file *file);
-bool vfio_file_is_group(struct file *file);
+bool vfio_file_is_valid(struct file *file);
 bool vfio_file_enforced_coherent(struct file *file);
 void vfio_file_set_kvm(struct file *file, struct kvm *kvm);
 bool vfio_file_has_dev(struct file *file, struct vfio_device *device);
diff --git a/virt/kvm/vfio.c b/virt/kvm/vfio.c
index 495ceabffe88..868930c7a59b 100644
--- a/virt/kvm/vfio.c
+++ b/virt/kvm/vfio.c
@@ -64,18 +64,18 @@ static bool kvm_vfio_file_enforced_coherent(struct file *file)
 	return ret;
 }
 
-static bool kvm_vfio_file_is_group(struct file *file)
+static bool kvm_vfio_file_is_valid(struct file *file)
 {
 	bool (*fn)(struct file *file);
 	bool ret;
 
-	fn = symbol_get(vfio_file_is_group);
+	fn = symbol_get(vfio_file_is_valid);
 	if (!fn)
 		return false;
 
 	ret = fn(file);
 
-	symbol_put(vfio_file_is_group);
+	symbol_put(vfio_file_is_valid);
 
 	return ret;
 }
@@ -154,8 +154,8 @@ static int kvm_vfio_group_add(struct kvm_device *dev, unsigned int fd)
 	if (!filp)
 		return -EBADF;
 
-	/* Ensure the FD is a vfio group FD.*/
-	if (!kvm_vfio_file_is_group(filp)) {
+	/* Ensure the FD is a vfio FD.*/
+	if (!kvm_vfio_file_is_valid(filp)) {
 		ret = -EINVAL;
 		goto err_fput;
 	}
-- 
2.34.1



* [PATCH 03/13] vfio: Accept vfio device file in the driver facing kAPI
  2023-01-17 13:49 [PATCH 00/13] Add vfio_device cdev for iommufd support Yi Liu
  2023-01-17 13:49 ` [PATCH 01/13] vfio: Allocate per device file structure Yi Liu
  2023-01-17 13:49 ` [PATCH 02/13] vfio: Refine vfio file kAPIs Yi Liu
@ 2023-01-17 13:49 ` Yi Liu
  2023-01-18  8:45   ` Tian, Kevin
  2023-01-18 16:11   ` Eric Auger
  2023-01-17 13:49 ` [PATCH 04/13] kvm/vfio: Rename kvm_vfio_group to prepare for accepting vfio device fd Yi Liu
                   ` (9 subsequent siblings)
  12 siblings, 2 replies; 80+ messages in thread
From: Yi Liu @ 2023-01-17 13:49 UTC (permalink / raw)
  To: alex.williamson, jgg
  Cc: kevin.tian, cohuck, eric.auger, nicolinc, kvm, mjrosato,
	chao.p.peng, yi.l.liu, yi.y.sun, peterx, jasowang,
	suravee.suthikulpanit

This makes the vfio file kAPIs accept vfio device files as well, as a
preparation for vfio device cdev support.

Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 drivers/vfio/vfio.h      |  1 +
 drivers/vfio/vfio_main.c | 51 ++++++++++++++++++++++++++++++++++++----
 2 files changed, 48 insertions(+), 4 deletions(-)

diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
index ef5de2872983..53af6e3ea214 100644
--- a/drivers/vfio/vfio.h
+++ b/drivers/vfio/vfio.h
@@ -18,6 +18,7 @@ struct vfio_container;
 
 struct vfio_device_file {
 	struct vfio_device *device;
+	struct kvm *kvm;
 };
 
 void vfio_device_put_registration(struct vfio_device *device);
diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
index 1aedfbd15ca0..dc08d5dd62cc 100644
--- a/drivers/vfio/vfio_main.c
+++ b/drivers/vfio/vfio_main.c
@@ -1119,13 +1119,23 @@ const struct file_operations vfio_device_fops = {
 	.mmap		= vfio_device_fops_mmap,
 };
 
+static struct vfio_device *vfio_device_from_file(struct file *file)
+{
+	struct vfio_device_file *df = file->private_data;
+
+	if (file->f_op != &vfio_device_fops)
+		return NULL;
+	return df->device;
+}
+
 /**
  * vfio_file_is_valid - True if the file is usable with VFIO APIs
  * @file: VFIO group file or VFIO device file
  */
 bool vfio_file_is_valid(struct file *file)
 {
-	return vfio_group_from_file(file);
+	return vfio_group_from_file(file) ||
+	       vfio_device_from_file(file);
 }
 EXPORT_SYMBOL_GPL(vfio_file_is_valid);
 
@@ -1140,15 +1150,37 @@ EXPORT_SYMBOL_GPL(vfio_file_is_valid);
  */
 bool vfio_file_enforced_coherent(struct file *file)
 {
-	struct vfio_group *group = vfio_group_from_file(file);
+	struct vfio_group *group;
+	struct vfio_device *device;
 
+	group = vfio_group_from_file(file);
 	if (group)
 		return vfio_group_enforced_coherent(group);
 
+	device = vfio_device_from_file(file);
+	if (device)
+		return device_iommu_capable(device->dev,
+					    IOMMU_CAP_ENFORCE_CACHE_COHERENCY);
+
 	return true;
 }
 EXPORT_SYMBOL_GPL(vfio_file_enforced_coherent);
 
+static void vfio_device_file_set_kvm(struct file *file, struct kvm *kvm)
+{
+	struct vfio_device_file *df = file->private_data;
+	struct vfio_device *device = df->device;
+
+	/*
+	 * The kvm is first recorded in the df, and will be propagated
+	 * to vfio_device::kvm when the file binds iommufd successfully in
+	 * the vfio device cdev path.
+	 */
+	mutex_lock(&device->dev_set->lock);
+	df->kvm = kvm;
+	mutex_unlock(&device->dev_set->lock);
+}
+
 /**
  * vfio_file_set_kvm - Link a kvm with VFIO drivers
  * @file: VFIO group file or device file
@@ -1157,10 +1189,14 @@ EXPORT_SYMBOL_GPL(vfio_file_enforced_coherent);
  */
 void vfio_file_set_kvm(struct file *file, struct kvm *kvm)
 {
-	struct vfio_group *group = vfio_group_from_file(file);
+	struct vfio_group *group;
 
+	group = vfio_group_from_file(file);
 	if (group)
 		vfio_group_set_kvm(group, kvm);
+
+	if (vfio_device_from_file(file))
+		vfio_device_file_set_kvm(file, kvm);
 }
 EXPORT_SYMBOL_GPL(vfio_file_set_kvm);
 
@@ -1173,10 +1209,17 @@ EXPORT_SYMBOL_GPL(vfio_file_set_kvm);
  */
 bool vfio_file_has_dev(struct file *file, struct vfio_device *device)
 {
-	struct vfio_group *group = vfio_group_from_file(file);
+	struct vfio_group *group;
+	struct vfio_device *vdev;
 
+	group = vfio_group_from_file(file);
 	if (group)
 		return vfio_group_has_dev(group, device);
+
+	vdev = vfio_device_from_file(file);
+	if (vdev)
+		return vdev == device;
+
 	return false;
 }
 EXPORT_SYMBOL_GPL(vfio_file_has_dev);
-- 
2.34.1



* [PATCH 04/13] kvm/vfio: Rename kvm_vfio_group to prepare for accepting vfio device fd
  2023-01-17 13:49 [PATCH 00/13] Add vfio_device cdev for iommufd support Yi Liu
                   ` (2 preceding siblings ...)
  2023-01-17 13:49 ` [PATCH 03/13] vfio: Accept vfio device file in the driver facing kAPI Yi Liu
@ 2023-01-17 13:49 ` Yi Liu
  2023-01-18  8:47   ` Tian, Kevin
  2023-01-18 16:33   ` Eric Auger
  2023-01-17 13:49 ` [PATCH 05/13] kvm/vfio: Provide struct kvm_device_ops::release() instead of ::destroy() Yi Liu
                   ` (8 subsequent siblings)
  12 siblings, 2 replies; 80+ messages in thread
From: Yi Liu @ 2023-01-17 13:49 UTC (permalink / raw)
  To: alex.williamson, jgg
  Cc: kevin.tian, cohuck, eric.auger, nicolinc, kvm, mjrosato,
	chao.p.peng, yi.l.liu, yi.y.sun, peterx, jasowang,
	suravee.suthikulpanit

Rename the related helpers accordingly. No functional change is
intended.

Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 virt/kvm/vfio.c | 115 ++++++++++++++++++++++++------------------------
 1 file changed, 58 insertions(+), 57 deletions(-)

diff --git a/virt/kvm/vfio.c b/virt/kvm/vfio.c
index 868930c7a59b..0f54b9d308d7 100644
--- a/virt/kvm/vfio.c
+++ b/virt/kvm/vfio.c
@@ -21,7 +21,7 @@
 #include <asm/kvm_ppc.h>
 #endif
 
-struct kvm_vfio_group {
+struct kvm_vfio_file {
 	struct list_head node;
 	struct file *file;
 #ifdef CONFIG_SPAPR_TCE_IOMMU
@@ -30,7 +30,7 @@ struct kvm_vfio_group {
 };
 
 struct kvm_vfio {
-	struct list_head group_list;
+	struct list_head file_list;
 	struct mutex lock;
 	bool noncoherent;
 };
@@ -98,34 +98,35 @@ static struct iommu_group *kvm_vfio_file_iommu_group(struct file *file)
 }
 
 static void kvm_spapr_tce_release_vfio_group(struct kvm *kvm,
-					     struct kvm_vfio_group *kvg)
+					     struct kvm_vfio_file *kvf)
 {
-	if (WARN_ON_ONCE(!kvg->iommu_group))
+	if (WARN_ON_ONCE(!kvf->iommu_group))
 		return;
 
-	kvm_spapr_tce_release_iommu_group(kvm, kvg->iommu_group);
-	iommu_group_put(kvg->iommu_group);
-	kvg->iommu_group = NULL;
+	kvm_spapr_tce_release_iommu_group(kvm, kvf->iommu_group);
+	iommu_group_put(kvf->iommu_group);
+	kvf->iommu_group = NULL;
 }
 #endif
 
 /*
- * Groups can use the same or different IOMMU domains.  If the same then
- * adding a new group may change the coherency of groups we've previously
- * been told about.  We don't want to care about any of that so we retest
- * each group and bail as soon as we find one that's noncoherent.  This
- * means we only ever [un]register_noncoherent_dma once for the whole device.
+ * Groups/devices can use the same or different IOMMU domains.  If the same
+ * then adding a new group/device may change the coherency of groups/devices
+ * we've previously been told about.  We don't want to care about any of
+ * that so we retest each group/device and bail as soon as we find one that's
+ * noncoherent.  This means we only ever [un]register_noncoherent_dma once
+ * for the whole device.
  */
 static void kvm_vfio_update_coherency(struct kvm_device *dev)
 {
 	struct kvm_vfio *kv = dev->private;
 	bool noncoherent = false;
-	struct kvm_vfio_group *kvg;
+	struct kvm_vfio_file *kvf;
 
 	mutex_lock(&kv->lock);
 
-	list_for_each_entry(kvg, &kv->group_list, node) {
-		if (!kvm_vfio_file_enforced_coherent(kvg->file)) {
+	list_for_each_entry(kvf, &kv->file_list, node) {
+		if (!kvm_vfio_file_enforced_coherent(kvf->file)) {
 			noncoherent = true;
 			break;
 		}
@@ -143,10 +144,10 @@ static void kvm_vfio_update_coherency(struct kvm_device *dev)
 	mutex_unlock(&kv->lock);
 }
 
-static int kvm_vfio_group_add(struct kvm_device *dev, unsigned int fd)
+static int kvm_vfio_file_add(struct kvm_device *dev, unsigned int fd)
 {
 	struct kvm_vfio *kv = dev->private;
-	struct kvm_vfio_group *kvg;
+	struct kvm_vfio_file *kvf;
 	struct file *filp;
 	int ret;
 
@@ -162,27 +163,27 @@ static int kvm_vfio_group_add(struct kvm_device *dev, unsigned int fd)
 
 	mutex_lock(&kv->lock);
 
-	list_for_each_entry(kvg, &kv->group_list, node) {
-		if (kvg->file == filp) {
+	list_for_each_entry(kvf, &kv->file_list, node) {
+		if (kvf->file == filp) {
 			ret = -EEXIST;
 			goto err_unlock;
 		}
 	}
 
-	kvg = kzalloc(sizeof(*kvg), GFP_KERNEL_ACCOUNT);
-	if (!kvg) {
+	kvf = kzalloc(sizeof(*kvf), GFP_KERNEL_ACCOUNT);
+	if (!kvf) {
 		ret = -ENOMEM;
 		goto err_unlock;
 	}
 
-	kvg->file = filp;
-	list_add_tail(&kvg->node, &kv->group_list);
+	kvf->file = filp;
+	list_add_tail(&kvf->node, &kv->file_list);
 
 	kvm_arch_start_assignment(dev->kvm);
 
 	mutex_unlock(&kv->lock);
 
-	kvm_vfio_file_set_kvm(kvg->file, dev->kvm);
+	kvm_vfio_file_set_kvm(kvf->file, dev->kvm);
 	kvm_vfio_update_coherency(dev);
 
 	return 0;
@@ -193,10 +194,10 @@ static int kvm_vfio_group_add(struct kvm_device *dev, unsigned int fd)
 	return ret;
 }
 
-static int kvm_vfio_group_del(struct kvm_device *dev, unsigned int fd)
+static int kvm_vfio_file_del(struct kvm_device *dev, unsigned int fd)
 {
 	struct kvm_vfio *kv = dev->private;
-	struct kvm_vfio_group *kvg;
+	struct kvm_vfio_file *kvf;
 	struct fd f;
 	int ret;
 
@@ -208,18 +209,18 @@ static int kvm_vfio_group_del(struct kvm_device *dev, unsigned int fd)
 
 	mutex_lock(&kv->lock);
 
-	list_for_each_entry(kvg, &kv->group_list, node) {
-		if (kvg->file != f.file)
+	list_for_each_entry(kvf, &kv->file_list, node) {
+		if (kvf->file != f.file)
 			continue;
 
-		list_del(&kvg->node);
+		list_del(&kvf->node);
 		kvm_arch_end_assignment(dev->kvm);
 #ifdef CONFIG_SPAPR_TCE_IOMMU
-		kvm_spapr_tce_release_vfio_group(dev->kvm, kvg);
+		kvm_spapr_tce_release_vfio_group(dev->kvm, kvf);
 #endif
-		kvm_vfio_file_set_kvm(kvg->file, NULL);
-		fput(kvg->file);
-		kfree(kvg);
+		kvm_vfio_file_set_kvm(kvf->file, NULL);
+		fput(kvf->file);
+		kfree(kvf);
 		ret = 0;
 		break;
 	}
@@ -234,12 +235,12 @@ static int kvm_vfio_group_del(struct kvm_device *dev, unsigned int fd)
 }
 
 #ifdef CONFIG_SPAPR_TCE_IOMMU
-static int kvm_vfio_group_set_spapr_tce(struct kvm_device *dev,
-					void __user *arg)
+static int kvm_vfio_file_set_spapr_tce(struct kvm_device *dev,
+				       void __user *arg)
 {
 	struct kvm_vfio_spapr_tce param;
 	struct kvm_vfio *kv = dev->private;
-	struct kvm_vfio_group *kvg;
+	struct kvm_vfio_file *kvf;
 	struct fd f;
 	int ret;
 
@@ -254,20 +255,20 @@ static int kvm_vfio_group_set_spapr_tce(struct kvm_device *dev,
 
 	mutex_lock(&kv->lock);
 
-	list_for_each_entry(kvg, &kv->group_list, node) {
-		if (kvg->file != f.file)
+	list_for_each_entry(kvf, &kv->file_list, node) {
+		if (kvf->file != f.file)
 			continue;
 
-		if (!kvg->iommu_group) {
-			kvg->iommu_group = kvm_vfio_file_iommu_group(kvg->file);
-			if (WARN_ON_ONCE(!kvg->iommu_group)) {
+		if (!kvf->iommu_group) {
+			kvf->iommu_group = kvm_vfio_file_iommu_group(kvf->file);
+			if (WARN_ON_ONCE(!kvf->iommu_group)) {
 				ret = -EIO;
 				goto err_fdput;
 			}
 		}
 
 		ret = kvm_spapr_tce_attach_iommu_group(dev->kvm, param.tablefd,
-						       kvg->iommu_group);
+						       kvf->iommu_group);
 		break;
 	}
 
@@ -278,8 +279,8 @@ static int kvm_vfio_group_set_spapr_tce(struct kvm_device *dev,
 }
 #endif
 
-static int kvm_vfio_set_group(struct kvm_device *dev, long attr,
-			      void __user *arg)
+static int kvm_vfio_set_file(struct kvm_device *dev, long attr,
+			     void __user *arg)
 {
 	int32_t __user *argp = arg;
 	int32_t fd;
@@ -288,16 +289,16 @@ static int kvm_vfio_set_group(struct kvm_device *dev, long attr,
 	case KVM_DEV_VFIO_GROUP_ADD:
 		if (get_user(fd, argp))
 			return -EFAULT;
-		return kvm_vfio_group_add(dev, fd);
+		return kvm_vfio_file_add(dev, fd);
 
 	case KVM_DEV_VFIO_GROUP_DEL:
 		if (get_user(fd, argp))
 			return -EFAULT;
-		return kvm_vfio_group_del(dev, fd);
+		return kvm_vfio_file_del(dev, fd);
 
 #ifdef CONFIG_SPAPR_TCE_IOMMU
 	case KVM_DEV_VFIO_GROUP_SET_SPAPR_TCE:
-		return kvm_vfio_group_set_spapr_tce(dev, arg);
+		return kvm_vfio_file_set_spapr_tce(dev, arg);
 #endif
 	}
 
@@ -309,8 +310,8 @@ static int kvm_vfio_set_attr(struct kvm_device *dev,
 {
 	switch (attr->group) {
 	case KVM_DEV_VFIO_GROUP:
-		return kvm_vfio_set_group(dev, attr->attr,
-					  u64_to_user_ptr(attr->addr));
+		return kvm_vfio_set_file(dev, attr->attr,
+					 u64_to_user_ptr(attr->addr));
 	}
 
 	return -ENXIO;
@@ -339,16 +340,16 @@ static int kvm_vfio_has_attr(struct kvm_device *dev,
 static void kvm_vfio_destroy(struct kvm_device *dev)
 {
 	struct kvm_vfio *kv = dev->private;
-	struct kvm_vfio_group *kvg, *tmp;
+	struct kvm_vfio_file *kvf, *tmp;
 
-	list_for_each_entry_safe(kvg, tmp, &kv->group_list, node) {
+	list_for_each_entry_safe(kvf, tmp, &kv->file_list, node) {
 #ifdef CONFIG_SPAPR_TCE_IOMMU
-		kvm_spapr_tce_release_vfio_group(dev->kvm, kvg);
+		kvm_spapr_tce_release_vfio_group(dev->kvm, kvf);
 #endif
-		kvm_vfio_file_set_kvm(kvg->file, NULL);
-		fput(kvg->file);
-		list_del(&kvg->node);
-		kfree(kvg);
+		kvm_vfio_file_set_kvm(kvf->file, NULL);
+		fput(kvf->file);
+		list_del(&kvf->node);
+		kfree(kvf);
 		kvm_arch_end_assignment(dev->kvm);
 	}
 
@@ -382,7 +383,7 @@ static int kvm_vfio_create(struct kvm_device *dev, u32 type)
 	if (!kv)
 		return -ENOMEM;
 
-	INIT_LIST_HEAD(&kv->group_list);
+	INIT_LIST_HEAD(&kv->file_list);
 	mutex_init(&kv->lock);
 
 	dev->private = kv;
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH 05/13] kvm/vfio: Provide struct kvm_device_ops::release() instead of ::destroy()
  2023-01-17 13:49 [PATCH 00/13] Add vfio_device cdev for iommufd support Yi Liu
                   ` (3 preceding siblings ...)
  2023-01-17 13:49 ` [PATCH 04/13] kvm/vfio: Rename kvm_vfio_group to prepare for accepting vfio device fd Yi Liu
@ 2023-01-17 13:49 ` Yi Liu
  2023-01-18  8:56   ` Tian, Kevin
                     ` (2 more replies)
  2023-01-17 13:49 ` [PATCH 06/13] kvm/vfio: Accept vfio device file from userspace Yi Liu
                   ` (7 subsequent siblings)
  12 siblings, 3 replies; 80+ messages in thread
From: Yi Liu @ 2023-01-17 13:49 UTC (permalink / raw)
  To: alex.williamson, jgg
  Cc: kevin.tian, cohuck, eric.auger, nicolinc, kvm, mjrosato,
	chao.p.peng, yi.l.liu, yi.y.sun, peterx, jasowang,
	suravee.suthikulpanit

This is to avoid a circular refcount problem between the kvm struct and
the device file. The KVM module holds a device/group file reference when
the device/group is added, and releases it on removal or when the last
kvm reference is dropped. This reference model is fine for the group
since the group paths take no kvm reference.

But it is a problem for the device file since vfio devices may get a kvm
reference in the device open path and put it in the device file release
path, e.g. Intel kvmgt. This results in a circular dependency: the kvm
side won't put the device file reference until the kvm refcount reaches
zero, while the vfio device side needs to put its kvm reference in the
release callback.

To solve this problem for the device file, let the kvm-vfio device
provide release(), which is called once the kvm file is closed and does
not depend on the last kvm reference being dropped. This avoids the
circular refcount problem.

Suggested-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 virt/kvm/vfio.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/virt/kvm/vfio.c b/virt/kvm/vfio.c
index 0f54b9d308d7..525efe37ab6d 100644
--- a/virt/kvm/vfio.c
+++ b/virt/kvm/vfio.c
@@ -364,7 +364,7 @@ static int kvm_vfio_create(struct kvm_device *dev, u32 type);
 static struct kvm_device_ops kvm_vfio_ops = {
 	.name = "kvm-vfio",
 	.create = kvm_vfio_create,
-	.destroy = kvm_vfio_destroy,
+	.release = kvm_vfio_destroy,
 	.set_attr = kvm_vfio_set_attr,
 	.has_attr = kvm_vfio_has_attr,
 };
-- 
2.34.1



* [PATCH 06/13] kvm/vfio: Accept vfio device file from userspace
  2023-01-17 13:49 [PATCH 00/13] Add vfio_device cdev for iommufd support Yi Liu
                   ` (4 preceding siblings ...)
  2023-01-17 13:49 ` [PATCH 05/13] kvm/vfio: Provide struct kvm_device_ops::release() instead of ::destroy() Yi Liu
@ 2023-01-17 13:49 ` Yi Liu
  2023-01-18  9:18   ` Tian, Kevin
  2023-01-19  9:35   ` Eric Auger
  2023-01-17 13:49 ` [PATCH 07/13] vfio: Pass struct vfio_device_file * to vfio_device_open/close() Yi Liu
                   ` (6 subsequent siblings)
  12 siblings, 2 replies; 80+ messages in thread
From: Yi Liu @ 2023-01-17 13:49 UTC (permalink / raw)
  To: alex.williamson, jgg
  Cc: kevin.tian, cohuck, eric.auger, nicolinc, kvm, mjrosato,
	chao.p.peng, yi.l.liu, yi.y.sun, peterx, jasowang,
	suravee.suthikulpanit

This defines KVM_DEV_VFIO_FILE* and aliases KVM_DEV_VFIO_GROUP* to it,
so old userspace that uses KVM_DEV_VFIO_GROUP* keeps working.

Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 Documentation/virt/kvm/devices/vfio.rst | 32 ++++++++++++-------------
 include/uapi/linux/kvm.h                | 23 +++++++++++++-----
 virt/kvm/vfio.c                         | 18 +++++++-------
 3 files changed, 42 insertions(+), 31 deletions(-)

diff --git a/Documentation/virt/kvm/devices/vfio.rst b/Documentation/virt/kvm/devices/vfio.rst
index 2d20dc561069..ac4300ded398 100644
--- a/Documentation/virt/kvm/devices/vfio.rst
+++ b/Documentation/virt/kvm/devices/vfio.rst
@@ -9,23 +9,23 @@ Device types supported:
   - KVM_DEV_TYPE_VFIO
 
 Only one VFIO instance may be created per VM.  The created device
-tracks VFIO groups in use by the VM and features of those groups
-important to the correctness and acceleration of the VM.  As groups
-are enabled and disabled for use by the VM, KVM should be updated
-about their presence.  When registered with KVM, a reference to the
-VFIO-group is held by KVM.
-
-Groups:
-  KVM_DEV_VFIO_GROUP
-
-KVM_DEV_VFIO_GROUP attributes:
-  KVM_DEV_VFIO_GROUP_ADD: Add a VFIO group to VFIO-KVM device tracking
-	kvm_device_attr.addr points to an int32_t file descriptor
+tracks VFIO files (group or device) in use by the VM and features
+of those groups/devices important to the correctness and acceleration
+of the VM.  As groups/device are enabled and disabled for use by the
+VM, KVM should be updated about their presence.  When registered with
+KVM, a reference to the VFIO file is held by KVM.
+
+VFIO Files:
+  KVM_DEV_VFIO_FILE
+
+KVM_DEV_VFIO_FILE attributes:
+  KVM_DEV_VFIO_FILE_ADD: Add a VFIO file (group/device) to VFIO-KVM device
+	tracking kvm_device_attr.addr points to an int32_t file descriptor
+	for the VFIO file.
+  KVM_DEV_VFIO_FILE_DEL: Remove a VFIO file (group/device) from VFIO-KVM device
+	tracking kvm_device_attr.addr points to an int32_t file descriptor
 	for the VFIO group.
-  KVM_DEV_VFIO_GROUP_DEL: Remove a VFIO group from VFIO-KVM device tracking
-	kvm_device_attr.addr points to an int32_t file descriptor
-	for the VFIO group.
-  KVM_DEV_VFIO_GROUP_SET_SPAPR_TCE: attaches a guest visible TCE table
+  KVM_DEV_VFIO_FILE_SET_SPAPR_TCE: attaches a guest visible TCE table
 	allocated by sPAPR KVM.
 	kvm_device_attr.addr points to a struct::
 
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 55155e262646..ad36e144a41d 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1396,15 +1396,26 @@ struct kvm_create_device {
 
 struct kvm_device_attr {
 	__u32	flags;		/* no flags currently defined */
-	__u32	group;		/* device-defined */
-	__u64	attr;		/* group-defined */
+	union {
+		__u32	group;
+		__u32	file;
+	}; /* device-defined */
+	__u64	attr;		/* VFIO-file-defined or group-defined */
 	__u64	addr;		/* userspace address of attr data */
 };
 
-#define  KVM_DEV_VFIO_GROUP			1
-#define   KVM_DEV_VFIO_GROUP_ADD			1
-#define   KVM_DEV_VFIO_GROUP_DEL			2
-#define   KVM_DEV_VFIO_GROUP_SET_SPAPR_TCE		3
+#define  KVM_DEV_VFIO_FILE	1
+
+#define   KVM_DEV_VFIO_FILE_ADD			1
+#define   KVM_DEV_VFIO_FILE_DEL			2
+#define   KVM_DEV_VFIO_FILE_SET_SPAPR_TCE	3
+
+/* Group aliases are for compile time uapi compatibility */
+#define  KVM_DEV_VFIO_GROUP	KVM_DEV_VFIO_FILE
+
+#define   KVM_DEV_VFIO_GROUP_ADD	KVM_DEV_VFIO_FILE_ADD
+#define   KVM_DEV_VFIO_GROUP_DEL	KVM_DEV_VFIO_FILE_DEL
+#define   KVM_DEV_VFIO_GROUP_SET_SPAPR_TCE	KVM_DEV_VFIO_FILE_SET_SPAPR_TCE
 
 enum kvm_device_type {
 	KVM_DEV_TYPE_FSL_MPIC_20	= 1,
diff --git a/virt/kvm/vfio.c b/virt/kvm/vfio.c
index 525efe37ab6d..e73ca60af3ae 100644
--- a/virt/kvm/vfio.c
+++ b/virt/kvm/vfio.c
@@ -286,18 +286,18 @@ static int kvm_vfio_set_file(struct kvm_device *dev, long attr,
 	int32_t fd;
 
 	switch (attr) {
-	case KVM_DEV_VFIO_GROUP_ADD:
+	case KVM_DEV_VFIO_FILE_ADD:
 		if (get_user(fd, argp))
 			return -EFAULT;
 		return kvm_vfio_file_add(dev, fd);
 
-	case KVM_DEV_VFIO_GROUP_DEL:
+	case KVM_DEV_VFIO_FILE_DEL:
 		if (get_user(fd, argp))
 			return -EFAULT;
 		return kvm_vfio_file_del(dev, fd);
 
 #ifdef CONFIG_SPAPR_TCE_IOMMU
-	case KVM_DEV_VFIO_GROUP_SET_SPAPR_TCE:
+	case KVM_DEV_VFIO_FILE_SET_SPAPR_TCE:
 		return kvm_vfio_file_set_spapr_tce(dev, arg);
 #endif
 	}
@@ -309,7 +309,7 @@ static int kvm_vfio_set_attr(struct kvm_device *dev,
 			     struct kvm_device_attr *attr)
 {
 	switch (attr->group) {
-	case KVM_DEV_VFIO_GROUP:
+	case KVM_DEV_VFIO_FILE:
 		return kvm_vfio_set_file(dev, attr->attr,
 					 u64_to_user_ptr(attr->addr));
 	}
@@ -320,13 +320,13 @@ static int kvm_vfio_set_attr(struct kvm_device *dev,
 static int kvm_vfio_has_attr(struct kvm_device *dev,
 			     struct kvm_device_attr *attr)
 {
-	switch (attr->group) {
-	case KVM_DEV_VFIO_GROUP:
+	switch (attr->file) {
+	case KVM_DEV_VFIO_FILE:
 		switch (attr->attr) {
-		case KVM_DEV_VFIO_GROUP_ADD:
-		case KVM_DEV_VFIO_GROUP_DEL:
+		case KVM_DEV_VFIO_FILE_ADD:
+		case KVM_DEV_VFIO_FILE_DEL:
 #ifdef CONFIG_SPAPR_TCE_IOMMU
-		case KVM_DEV_VFIO_GROUP_SET_SPAPR_TCE:
+		case KVM_DEV_VFIO_FILE_SET_SPAPR_TCE:
 #endif
 			return 0;
 		}
-- 
2.34.1



* [PATCH 07/13] vfio: Pass struct vfio_device_file * to vfio_device_open/close()
  2023-01-17 13:49 [PATCH 00/13] Add vfio_device cdev for iommufd support Yi Liu
                   ` (5 preceding siblings ...)
  2023-01-17 13:49 ` [PATCH 06/13] kvm/vfio: Accept vfio device file from userspace Yi Liu
@ 2023-01-17 13:49 ` Yi Liu
  2023-01-18  9:27   ` Tian, Kevin
  2023-01-19 11:01   ` Eric Auger
  2023-01-17 13:49 ` [PATCH 08/13] vfio: Block device access via device fd until device is opened Yi Liu
                   ` (5 subsequent siblings)
  12 siblings, 2 replies; 80+ messages in thread
From: Yi Liu @ 2023-01-17 13:49 UTC (permalink / raw)
  To: alex.williamson, jgg
  Cc: kevin.tian, cohuck, eric.auger, nicolinc, kvm, mjrosato,
	chao.p.peng, yi.l.liu, yi.y.sun, peterx, jasowang,
	suravee.suthikulpanit

This avoids passing struct kvm * and struct iommufd_ctx * through
multiple functions. vfio_device_open() becomes a locked helper.

Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 drivers/vfio/group.c     | 34 +++++++++++++++++++++++++---------
 drivers/vfio/vfio.h      | 10 +++++-----
 drivers/vfio/vfio_main.c | 40 ++++++++++++++++++++++++----------------
 3 files changed, 54 insertions(+), 30 deletions(-)

diff --git a/drivers/vfio/group.c b/drivers/vfio/group.c
index d83cf069d290..7200304663e5 100644
--- a/drivers/vfio/group.c
+++ b/drivers/vfio/group.c
@@ -154,33 +154,49 @@ static int vfio_group_ioctl_set_container(struct vfio_group *group,
 	return ret;
 }
 
-static int vfio_device_group_open(struct vfio_device *device)
+static int vfio_device_group_open(struct vfio_device_file *df)
 {
+	struct vfio_device *device = df->device;
 	int ret;
 
 	mutex_lock(&device->group->group_lock);
 	if (!vfio_group_has_iommu(device->group)) {
 		ret = -EINVAL;
-		goto out_unlock;
+		goto err_unlock_group;
 	}
 
+	mutex_lock(&device->dev_set->lock);
 	/*
 	 * Here we pass the KVM pointer with the group under the lock.  If the
 	 * device driver will use it, it must obtain a reference and release it
 	 * during close_device.
 	 */
-	ret = vfio_device_open(device, device->group->iommufd,
-			       device->group->kvm);
+	df->kvm = device->group->kvm;
+	df->iommufd = device->group->iommufd;
+
+	ret = vfio_device_open(df);
+	if (ret)
+		goto err_unlock_device;
+	mutex_unlock(&device->dev_set->lock);
 
-out_unlock:
+	mutex_unlock(&device->group->group_lock);
+	return 0;
+
+err_unlock_device:
+	df->kvm = NULL;
+	df->iommufd = NULL;
+	mutex_unlock(&device->dev_set->lock);
+err_unlock_group:
 	mutex_unlock(&device->group->group_lock);
 	return ret;
 }
 
-void vfio_device_group_close(struct vfio_device *device)
+void vfio_device_group_close(struct vfio_device_file *df)
 {
+	struct vfio_device *device = df->device;
+
 	mutex_lock(&device->group->group_lock);
-	vfio_device_close(device, device->group->iommufd);
+	vfio_device_close(df);
 	mutex_unlock(&device->group->group_lock);
 }
 
@@ -196,7 +212,7 @@ static struct file *vfio_device_open_file(struct vfio_device *device)
 		goto err_out;
 	}
 
-	ret = vfio_device_group_open(device);
+	ret = vfio_device_group_open(df);
 	if (ret)
 		goto err_free;
 
@@ -228,7 +244,7 @@ static struct file *vfio_device_open_file(struct vfio_device *device)
 	return filep;
 
 err_close_device:
-	vfio_device_group_close(device);
+	vfio_device_group_close(df);
 err_free:
 	kfree(df);
 err_out:
diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
index 53af6e3ea214..3d8ba165146c 100644
--- a/drivers/vfio/vfio.h
+++ b/drivers/vfio/vfio.h
@@ -19,14 +19,14 @@ struct vfio_container;
 struct vfio_device_file {
 	struct vfio_device *device;
 	struct kvm *kvm;
+	struct iommufd_ctx *iommufd;
 };
 
 void vfio_device_put_registration(struct vfio_device *device);
 bool vfio_device_try_get_registration(struct vfio_device *device);
-int vfio_device_open(struct vfio_device *device,
-		     struct iommufd_ctx *iommufd, struct kvm *kvm);
-void vfio_device_close(struct vfio_device *device,
-		       struct iommufd_ctx *iommufd);
+int vfio_device_open(struct vfio_device_file *df);
+void vfio_device_close(struct vfio_device_file *device);
+
 struct vfio_device_file *
 vfio_allocate_device_file(struct vfio_device *device);
 
@@ -90,7 +90,7 @@ void vfio_device_group_register(struct vfio_device *device);
 void vfio_device_group_unregister(struct vfio_device *device);
 int vfio_device_group_use_iommu(struct vfio_device *device);
 void vfio_device_group_unuse_iommu(struct vfio_device *device);
-void vfio_device_group_close(struct vfio_device *device);
+void vfio_device_group_close(struct vfio_device_file *df);
 struct vfio_group *vfio_group_from_file(struct file *file);
 bool vfio_group_enforced_coherent(struct vfio_group *group);
 void vfio_group_set_kvm(struct vfio_group *group, struct kvm *kvm);
diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
index dc08d5dd62cc..3df71bd9cd1e 100644
--- a/drivers/vfio/vfio_main.c
+++ b/drivers/vfio/vfio_main.c
@@ -358,9 +358,11 @@ vfio_allocate_device_file(struct vfio_device *device)
 	return df;
 }
 
-static int vfio_device_first_open(struct vfio_device *device,
-				  struct iommufd_ctx *iommufd, struct kvm *kvm)
+static int vfio_device_first_open(struct vfio_device_file *df)
 {
+	struct vfio_device *device = df->device;
+	struct iommufd_ctx *iommufd = df->iommufd;
+	struct kvm *kvm = df->kvm;
 	int ret;
 
 	lockdep_assert_held(&device->dev_set->lock);
@@ -394,9 +396,11 @@ static int vfio_device_first_open(struct vfio_device *device,
 	return ret;
 }
 
-static void vfio_device_last_close(struct vfio_device *device,
-				   struct iommufd_ctx *iommufd)
+static void vfio_device_last_close(struct vfio_device_file *df)
 {
+	struct vfio_device *device = df->device;
+	struct iommufd_ctx *iommufd = df->iommufd;
+
 	lockdep_assert_held(&device->dev_set->lock);
 
 	if (device->ops->close_device)
@@ -409,30 +413,34 @@ static void vfio_device_last_close(struct vfio_device *device,
 	module_put(device->dev->driver->owner);
 }
 
-int vfio_device_open(struct vfio_device *device,
-		     struct iommufd_ctx *iommufd, struct kvm *kvm)
+int vfio_device_open(struct vfio_device_file *df)
 {
-	int ret = 0;
+	struct vfio_device *device = df->device;
+
+	lockdep_assert_held(&device->dev_set->lock);
 
-	mutex_lock(&device->dev_set->lock);
 	device->open_count++;
 	if (device->open_count == 1) {
-		ret = vfio_device_first_open(device, iommufd, kvm);
-		if (ret)
+		int ret;
+
+		ret = vfio_device_first_open(df);
+		if (ret) {
 			device->open_count--;
+			return ret;
+		}
 	}
-	mutex_unlock(&device->dev_set->lock);
 
-	return ret;
+	return 0;
 }
 
-void vfio_device_close(struct vfio_device *device,
-		       struct iommufd_ctx *iommufd)
+void vfio_device_close(struct vfio_device_file *df)
 {
+	struct vfio_device *device = df->device;
+
 	mutex_lock(&device->dev_set->lock);
 	vfio_assert_device_open(device);
 	if (device->open_count == 1)
-		vfio_device_last_close(device, iommufd);
+		vfio_device_last_close(df);
 	device->open_count--;
 	mutex_unlock(&device->dev_set->lock);
 }
@@ -478,7 +486,7 @@ static int vfio_device_fops_release(struct inode *inode, struct file *filep)
 	struct vfio_device_file *df = filep->private_data;
 	struct vfio_device *device = df->device;
 
-	vfio_device_group_close(device);
+	vfio_device_group_close(df);
 
 	vfio_device_put_registration(device);
 
-- 
2.34.1



* [PATCH 08/13] vfio: Block device access via device fd until device is opened
  2023-01-17 13:49 [PATCH 00/13] Add vfio_device cdev for iommufd support Yi Liu
                   ` (6 preceding siblings ...)
  2023-01-17 13:49 ` [PATCH 07/13] vfio: Pass struct vfio_device_file * to vfio_device_open/close() Yi Liu
@ 2023-01-17 13:49 ` Yi Liu
  2023-01-18  9:35   ` Tian, Kevin
                     ` (2 more replies)
  2023-01-17 13:49 ` [PATCH 09/13] vfio: Add infrastructure for bind_iommufd and attach Yi Liu
                   ` (4 subsequent siblings)
  12 siblings, 3 replies; 80+ messages in thread
From: Yi Liu @ 2023-01-17 13:49 UTC (permalink / raw)
  To: alex.williamson, jgg
  Cc: kevin.tian, cohuck, eric.auger, nicolinc, kvm, mjrosato,
	chao.p.peng, yi.l.liu, yi.y.sun, peterx, jasowang,
	suravee.suthikulpanit

Allow the vfio_device file to be in a state where the device FD is
open but the device cannot be used by userspace (i.e. its .open_device()
hasn't been called). This in-between state is not used when the device
FD is spawned from the group FD; however, when we create the device FD
directly by opening a cdev, it is opened in the blocked state.

In the blocked state, only the bind operation is allowed; other device
accesses are not. Completing bind allows the user to further access the
device.

This is implemented by adding a flag in struct vfio_device_file to mark
the blocked state and using a simple smp_load_acquire() to obtain the
flag value, serializing all the device setup with the threads accessing
this device.

Due to this scheme it is not possible to unbind the FD; once it is
bound, it remains bound until the FD is closed.

Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 drivers/vfio/vfio.h      |  1 +
 drivers/vfio/vfio_main.c | 29 +++++++++++++++++++++++++++++
 2 files changed, 30 insertions(+)

diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
index 3d8ba165146c..c69a9902ea84 100644
--- a/drivers/vfio/vfio.h
+++ b/drivers/vfio/vfio.h
@@ -20,6 +20,7 @@ struct vfio_device_file {
 	struct vfio_device *device;
 	struct kvm *kvm;
 	struct iommufd_ctx *iommufd;
+	bool access_granted;
 };
 
 void vfio_device_put_registration(struct vfio_device *device);
diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
index 3df71bd9cd1e..d442ebaa4b21 100644
--- a/drivers/vfio/vfio_main.c
+++ b/drivers/vfio/vfio_main.c
@@ -430,6 +430,11 @@ int vfio_device_open(struct vfio_device_file *df)
 		}
 	}
 
+	/*
+	 * Paired with smp_load_acquire() in vfio_device_fops::ioctl/
+	 * read/write/mmap
+	 */
+	smp_store_release(&df->access_granted, true);
 	return 0;
 }
 
@@ -1058,8 +1063,14 @@ static long vfio_device_fops_unl_ioctl(struct file *filep,
 {
 	struct vfio_device_file *df = filep->private_data;
 	struct vfio_device *device = df->device;
+	bool access;
 	int ret;
 
+	/* Paired with smp_store_release() in vfio_device_open() */
+	access = smp_load_acquire(&df->access_granted);
+	if (!access)
+		return -EINVAL;
+
 	ret = vfio_device_pm_runtime_get(device);
 	if (ret)
 		return ret;
@@ -1086,6 +1097,12 @@ static ssize_t vfio_device_fops_read(struct file *filep, char __user *buf,
 {
 	struct vfio_device_file *df = filep->private_data;
 	struct vfio_device *device = df->device;
+	bool access;
+
+	/* Paired with smp_store_release() in vfio_device_open() */
+	access = smp_load_acquire(&df->access_granted);
+	if (!access)
+		return -EINVAL;
 
 	if (unlikely(!device->ops->read))
 		return -EINVAL;
@@ -1099,6 +1116,12 @@ static ssize_t vfio_device_fops_write(struct file *filep,
 {
 	struct vfio_device_file *df = filep->private_data;
 	struct vfio_device *device = df->device;
+	bool access;
+
+	/* Paired with smp_store_release() in vfio_device_open() */
+	access = smp_load_acquire(&df->access_granted);
+	if (!access)
+		return -EINVAL;
 
 	if (unlikely(!device->ops->write))
 		return -EINVAL;
@@ -1110,6 +1133,12 @@ static int vfio_device_fops_mmap(struct file *filep, struct vm_area_struct *vma)
 {
 	struct vfio_device_file *df = filep->private_data;
 	struct vfio_device *device = df->device;
+	bool access;
+
+	/* Paired with smp_store_release() in vfio_device_open() */
+	access = smp_load_acquire(&df->access_granted);
+	if (!access)
+		return -EINVAL;
 
 	if (unlikely(!device->ops->mmap))
 		return -EINVAL;
-- 
2.34.1



* [PATCH 09/13] vfio: Add infrastructure for bind_iommufd and attach
  2023-01-17 13:49 [PATCH 00/13] Add vfio_device cdev for iommufd support Yi Liu
                   ` (7 preceding siblings ...)
  2023-01-17 13:49 ` [PATCH 08/13] vfio: Block device access via device fd until device is opened Yi Liu
@ 2023-01-17 13:49 ` Yi Liu
  2023-01-19  9:45   ` Tian, Kevin
  2023-01-19 23:05   ` Alex Williamson
  2023-01-17 13:49 ` [PATCH 10/13] vfio: Make vfio_device_open() exclusive between group path and device cdev path Yi Liu
                   ` (3 subsequent siblings)
  12 siblings, 2 replies; 80+ messages in thread
From: Yi Liu @ 2023-01-17 13:49 UTC (permalink / raw)
  To: alex.williamson, jgg
  Cc: kevin.tian, cohuck, eric.auger, nicolinc, kvm, mjrosato,
	chao.p.peng, yi.l.liu, yi.y.sun, peterx, jasowang,
	suravee.suthikulpanit

This prepares for adding ioctls to the device cdev fd. The
infrastructure includes:
    - add vfio_iommufd_attach() to support iommufd pgtable attach after
      bind_iommufd. A NULL pt_id indicates detach.
    - let vfio_iommufd_bind() accept a pt_id, e.g. the compat_ioas_id in
      the legacy group path, and also return dev_id if the caller
      requires it.

Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 drivers/vfio/group.c     | 12 +++++-
 drivers/vfio/iommufd.c   | 79 ++++++++++++++++++++++++++++++----------
 drivers/vfio/vfio.h      | 15 ++++++--
 drivers/vfio/vfio_main.c | 10 +++--
 4 files changed, 88 insertions(+), 28 deletions(-)

diff --git a/drivers/vfio/group.c b/drivers/vfio/group.c
index 7200304663e5..9484bb1c54a9 100644
--- a/drivers/vfio/group.c
+++ b/drivers/vfio/group.c
@@ -157,6 +157,8 @@ static int vfio_group_ioctl_set_container(struct vfio_group *group,
 static int vfio_device_group_open(struct vfio_device_file *df)
 {
 	struct vfio_device *device = df->device;
+	u32 ioas_id;
+	u32 *pt_id = NULL;
 	int ret;
 
 	mutex_lock(&device->group->group_lock);
@@ -165,6 +167,14 @@ static int vfio_device_group_open(struct vfio_device_file *df)
 		goto err_unlock_group;
 	}
 
+	if (device->group->iommufd) {
+		ret = iommufd_vfio_compat_ioas_id(device->group->iommufd,
+						  &ioas_id);
+		if (ret)
+			goto err_unlock_group;
+		pt_id = &ioas_id;
+	}
+
 	mutex_lock(&device->dev_set->lock);
 	/*
 	 * Here we pass the KVM pointer with the group under the lock.  If the
@@ -174,7 +184,7 @@ static int vfio_device_group_open(struct vfio_device_file *df)
 	df->kvm = device->group->kvm;
 	df->iommufd = device->group->iommufd;
 
-	ret = vfio_device_open(df);
+	ret = vfio_device_open(df, NULL, pt_id);
 	if (ret)
 		goto err_unlock_device;
 	mutex_unlock(&device->dev_set->lock);
diff --git a/drivers/vfio/iommufd.c b/drivers/vfio/iommufd.c
index 4f82a6fa7c6c..412644fdbf16 100644
--- a/drivers/vfio/iommufd.c
+++ b/drivers/vfio/iommufd.c
@@ -10,9 +10,17 @@
 MODULE_IMPORT_NS(IOMMUFD);
 MODULE_IMPORT_NS(IOMMUFD_VFIO);
 
-int vfio_iommufd_bind(struct vfio_device *vdev, struct iommufd_ctx *ictx)
+/* @pt_id == NULL implies detach */
+int vfio_iommufd_attach(struct vfio_device *vdev, u32 *pt_id)
+{
+	lockdep_assert_held(&vdev->dev_set->lock);
+
+	return vdev->ops->attach_ioas(vdev, pt_id);
+}
+
+int vfio_iommufd_bind(struct vfio_device *vdev, struct iommufd_ctx *ictx,
+		      u32 *dev_id, u32 *pt_id)
 {
-	u32 ioas_id;
 	u32 device_id;
 	int ret;
 
@@ -29,17 +37,14 @@ int vfio_iommufd_bind(struct vfio_device *vdev, struct iommufd_ctx *ictx)
 	if (ret)
 		return ret;
 
-	ret = iommufd_vfio_compat_ioas_id(ictx, &ioas_id);
-	if (ret)
-		goto err_unbind;
-	ret = vdev->ops->attach_ioas(vdev, &ioas_id);
-	if (ret)
-		goto err_unbind;
+	if (pt_id) {
+		ret = vfio_iommufd_attach(vdev, pt_id);
+		if (ret)
+			goto err_unbind;
+	}
 
-	/*
-	 * The legacy path has no way to return the device id or the selected
-	 * pt_id
-	 */
+	if (dev_id)
+		*dev_id = device_id;
 	return 0;
 
 err_unbind:
@@ -74,14 +79,18 @@ int vfio_iommufd_physical_bind(struct vfio_device *vdev,
 }
 EXPORT_SYMBOL_GPL(vfio_iommufd_physical_bind);
 
+static void __vfio_iommufd_detach(struct vfio_device *vdev)
+{
+	iommufd_device_detach(vdev->iommufd_device);
+	vdev->iommufd_attached = false;
+}
+
 void vfio_iommufd_physical_unbind(struct vfio_device *vdev)
 {
 	lockdep_assert_held(&vdev->dev_set->lock);
 
-	if (vdev->iommufd_attached) {
-		iommufd_device_detach(vdev->iommufd_device);
-		vdev->iommufd_attached = false;
-	}
+	if (vdev->iommufd_attached)
+		__vfio_iommufd_detach(vdev);
 	iommufd_device_unbind(vdev->iommufd_device);
 	vdev->iommufd_device = NULL;
 }
@@ -91,6 +100,20 @@ int vfio_iommufd_physical_attach_ioas(struct vfio_device *vdev, u32 *pt_id)
 {
 	int rc;
 
+	lockdep_assert_held(&vdev->dev_set->lock);
+
+	if (!vdev->iommufd_device)
+		return -EINVAL;
+
+	if (!pt_id) {
+		if (vdev->iommufd_attached)
+			__vfio_iommufd_detach(vdev);
+		return 0;
+	}
+
+	if (vdev->iommufd_attached)
+		return -EBUSY;
+
 	rc = iommufd_device_attach(vdev->iommufd_device, pt_id);
 	if (rc)
 		return rc;
@@ -129,14 +152,18 @@ int vfio_iommufd_emulated_bind(struct vfio_device *vdev,
 }
 EXPORT_SYMBOL_GPL(vfio_iommufd_emulated_bind);
 
+static void __vfio_iommufd_access_destroy(struct vfio_device *vdev)
+{
+	iommufd_access_destroy(vdev->iommufd_access);
+	vdev->iommufd_access = NULL;
+}
+
 void vfio_iommufd_emulated_unbind(struct vfio_device *vdev)
 {
 	lockdep_assert_held(&vdev->dev_set->lock);
 
-	if (vdev->iommufd_access) {
-		iommufd_access_destroy(vdev->iommufd_access);
-		vdev->iommufd_access = NULL;
-	}
+	if (vdev->iommufd_access)
+		__vfio_iommufd_access_destroy(vdev);
 	iommufd_ctx_put(vdev->iommufd_ictx);
 	vdev->iommufd_ictx = NULL;
 }
@@ -148,6 +175,18 @@ int vfio_iommufd_emulated_attach_ioas(struct vfio_device *vdev, u32 *pt_id)
 
 	lockdep_assert_held(&vdev->dev_set->lock);
 
+	if (!vdev->iommufd_ictx)
+		return -EINVAL;
+
+	if (!pt_id) {
+		if (vdev->iommufd_access)
+			__vfio_iommufd_access_destroy(vdev);
+		return 0;
+	}
+
+	if (vdev->iommufd_access)
+		return -EBUSY;
+
 	user = iommufd_access_create(vdev->iommufd_ictx, *pt_id, &vfio_user_ops,
 				     vdev);
 	if (IS_ERR(user))
diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
index c69a9902ea84..fe0fcfa78710 100644
--- a/drivers/vfio/vfio.h
+++ b/drivers/vfio/vfio.h
@@ -25,7 +25,8 @@ struct vfio_device_file {
 
 void vfio_device_put_registration(struct vfio_device *device);
 bool vfio_device_try_get_registration(struct vfio_device *device);
-int vfio_device_open(struct vfio_device_file *df);
+int vfio_device_open(struct vfio_device_file *df,
+		     u32 *dev_id, u32 *pt_id);
 void vfio_device_close(struct vfio_device_file *device);
 
 struct vfio_device_file *
@@ -230,11 +231,14 @@ static inline void vfio_container_cleanup(void)
 #endif
 
 #if IS_ENABLED(CONFIG_IOMMUFD)
-int vfio_iommufd_bind(struct vfio_device *device, struct iommufd_ctx *ictx);
+int vfio_iommufd_bind(struct vfio_device *device, struct iommufd_ctx *ictx,
+		      u32 *dev_id, u32 *pt_id);
 void vfio_iommufd_unbind(struct vfio_device *device);
+int vfio_iommufd_attach(struct vfio_device *vdev, u32 *pt_id);
 #else
 static inline int vfio_iommufd_bind(struct vfio_device *device,
-				    struct iommufd_ctx *ictx)
+				    struct iommufd_ctx *ictx,
+				    u32 *dev_id, u32 *pt_id)
 {
 	return -EOPNOTSUPP;
 }
@@ -242,6 +246,11 @@ static inline int vfio_iommufd_bind(struct vfio_device *device,
 static inline void vfio_iommufd_unbind(struct vfio_device *device)
 {
 }
+
+static inline int vfio_iommufd_attach(struct vfio_device *vdev, u32 *pt_id)
+{
+	return -EOPNOTSUPP;
+}
 #endif
 
 #if IS_ENABLED(CONFIG_VFIO_VIRQFD)
diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
index d442ebaa4b21..90174a9015c4 100644
--- a/drivers/vfio/vfio_main.c
+++ b/drivers/vfio/vfio_main.c
@@ -358,7 +358,8 @@ vfio_allocate_device_file(struct vfio_device *device)
 	return df;
 }
 
-static int vfio_device_first_open(struct vfio_device_file *df)
+static int vfio_device_first_open(struct vfio_device_file *df,
+				  u32 *dev_id, u32 *pt_id)
 {
 	struct vfio_device *device = df->device;
 	struct iommufd_ctx *iommufd = df->iommufd;
@@ -371,7 +372,7 @@ static int vfio_device_first_open(struct vfio_device_file *df)
 		return -ENODEV;
 
 	if (iommufd)
-		ret = vfio_iommufd_bind(device, iommufd);
+		ret = vfio_iommufd_bind(device, iommufd, dev_id, pt_id);
 	else
 		ret = vfio_device_group_use_iommu(device);
 	if (ret)
@@ -413,7 +414,8 @@ static void vfio_device_last_close(struct vfio_device_file *df)
 	module_put(device->dev->driver->owner);
 }
 
-int vfio_device_open(struct vfio_device_file *df)
+int vfio_device_open(struct vfio_device_file *df,
+		     u32 *dev_id, u32 *pt_id)
 {
 	struct vfio_device *device = df->device;
 
@@ -423,7 +425,7 @@ int vfio_device_open(struct vfio_device_file *df)
 	if (device->open_count == 1) {
 		int ret;
 
-		ret = vfio_device_first_open(df);
+		ret = vfio_device_first_open(df, dev_id, pt_id);
 		if (ret) {
 			device->open_count--;
 			return ret;
-- 
2.34.1



* [PATCH 10/13] vfio: Make vfio_device_open() exclusive between group path and device cdev path
  2023-01-17 13:49 [PATCH 00/13] Add vfio_device cdev for iommufd support Yi Liu
                   ` (8 preceding siblings ...)
  2023-01-17 13:49 ` [PATCH 09/13] vfio: Add infrastructure for bind_iommufd and attach Yi Liu
@ 2023-01-17 13:49 ` Yi Liu
  2023-01-19  9:55   ` Tian, Kevin
  2023-01-19 23:51   ` Alex Williamson
  2023-01-17 13:49 ` [PATCH 11/13] vfio: Add cdev for vfio_device Yi Liu
                   ` (2 subsequent siblings)
  12 siblings, 2 replies; 80+ messages in thread
From: Yi Liu @ 2023-01-17 13:49 UTC (permalink / raw)
  To: alex.williamson, jgg
  Cc: kevin.tian, cohuck, eric.auger, nicolinc, kvm, mjrosato,
	chao.p.peng, yi.l.liu, yi.y.sun, peterx, jasowang,
	suravee.suthikulpanit

VFIO group has historically allowed multi-open of the device FD. This
was made secure because the "open" was executed via an ioctl to the
group FD, which is itself single-open only.

No use of multiple device FDs is known. It is a strange thing to do
because new device FDs can naturally be created via dup().

When we implement the new device uAPI there is no natural way to allow
the device itself to be multi-opened in a secure manner. Without the
group FD we cannot prove the security context of the opener.

Thus, when moving to the new uAPI we block the ability to multi-open
the device. This also makes the cdev path exclusive with the group path.

The main logic is in vfio_device_open(). It needs to support both the
legacy behavior, i.e. multi-open in the group path, and the new
behavior, i.e. single-open in the cdev path. This mixture leads to the
introduction of a new single_open flag stored in both struct vfio_device
and struct vfio_device_file. vfio_device_file::single_open is set when
the vfio_device_file is allocated. Its value is propagated to struct
vfio_device after the device is opened successfully.

Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 drivers/vfio/group.c     |  2 +-
 drivers/vfio/vfio.h      |  6 +++++-
 drivers/vfio/vfio_main.c | 25 ++++++++++++++++++++++---
 include/linux/vfio.h     |  1 +
 4 files changed, 29 insertions(+), 5 deletions(-)

diff --git a/drivers/vfio/group.c b/drivers/vfio/group.c
index 9484bb1c54a9..57ebe5e1a7e6 100644
--- a/drivers/vfio/group.c
+++ b/drivers/vfio/group.c
@@ -216,7 +216,7 @@ static struct file *vfio_device_open_file(struct vfio_device *device)
 	struct file *filep;
 	int ret;
 
-	df = vfio_allocate_device_file(device);
+	df = vfio_allocate_device_file(device, false);
 	if (IS_ERR(df)) {
 		ret = PTR_ERR(df);
 		goto err_out;
diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
index fe0fcfa78710..bdcf9762521d 100644
--- a/drivers/vfio/vfio.h
+++ b/drivers/vfio/vfio.h
@@ -17,7 +17,11 @@ struct vfio_device;
 struct vfio_container;
 
 struct vfio_device_file {
+	/* static fields, init per allocation */
 	struct vfio_device *device;
+	bool single_open;
+
+	/* fields set after allocation */
 	struct kvm *kvm;
 	struct iommufd_ctx *iommufd;
 	bool access_granted;
@@ -30,7 +34,7 @@ int vfio_device_open(struct vfio_device_file *df,
 void vfio_device_close(struct vfio_device_file *device);
 
 struct vfio_device_file *
-vfio_allocate_device_file(struct vfio_device *device);
+vfio_allocate_device_file(struct vfio_device *device, bool single_open);
 
 extern const struct file_operations vfio_device_fops;
 
diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
index 90174a9015c4..78725c28b933 100644
--- a/drivers/vfio/vfio_main.c
+++ b/drivers/vfio/vfio_main.c
@@ -345,7 +345,7 @@ static bool vfio_assert_device_open(struct vfio_device *device)
 }
 
 struct vfio_device_file *
-vfio_allocate_device_file(struct vfio_device *device)
+vfio_allocate_device_file(struct vfio_device *device, bool single_open)
 {
 	struct vfio_device_file *df;
 
@@ -354,6 +354,7 @@ vfio_allocate_device_file(struct vfio_device *device)
 		return ERR_PTR(-ENOMEM);
 
 	df->device = device;
+	df->single_open = single_open;
 
 	return df;
 }
@@ -421,6 +422,16 @@ int vfio_device_open(struct vfio_device_file *df,
 
 	lockdep_assert_held(&device->dev_set->lock);
 
+	/*
+	 * The device cdev path cannot support multiple device open
+	 * since there is no secure way to do so. So a second device
+	 * open attempt must fail if the caller is from a cdev path or
+	 * the device has already been opened through a cdev path.
+	 */
+	if (device->open_count != 0 &&
+	    (df->single_open || device->single_open))
+		return -EINVAL;
+
 	device->open_count++;
 	if (device->open_count == 1) {
 		int ret;
@@ -430,6 +441,7 @@ int vfio_device_open(struct vfio_device_file *df,
 			device->open_count--;
 			return ret;
 		}
+		device->single_open = df->single_open;
 	}
 
 	/*
@@ -446,8 +458,10 @@ void vfio_device_close(struct vfio_device_file *df)
 
 	mutex_lock(&device->dev_set->lock);
 	vfio_assert_device_open(device);
-	if (device->open_count == 1)
+	if (device->open_count == 1) {
 		vfio_device_last_close(df);
+		device->single_open = false;
+	}
 	device->open_count--;
 	mutex_unlock(&device->dev_set->lock);
 }
@@ -493,7 +507,12 @@ static int vfio_device_fops_release(struct inode *inode, struct file *filep)
 	struct vfio_device_file *df = filep->private_data;
 	struct vfio_device *device = df->device;
 
-	vfio_device_group_close(df);
+	/*
+	 * The group path supports multiple device open, while the cdev
+	 * path doesn't; use vfio_device_group_close() for !single_open.
+	 */
+	if (!df->single_open)
+		vfio_device_group_close(df);
 
 	vfio_device_put_registration(device);
 
diff --git a/include/linux/vfio.h b/include/linux/vfio.h
index 46edd6e6c0ba..300318f0d448 100644
--- a/include/linux/vfio.h
+++ b/include/linux/vfio.h
@@ -63,6 +63,7 @@ struct vfio_device {
 	struct iommufd_ctx *iommufd_ictx;
 	bool iommufd_attached;
 #endif
+	bool single_open;
 };
 
 /**
-- 
2.34.1



* [PATCH 11/13] vfio: Add cdev for vfio_device
  2023-01-17 13:49 [PATCH 00/13] Add vfio_device cdev for iommufd support Yi Liu
                   ` (9 preceding siblings ...)
  2023-01-17 13:49 ` [PATCH 10/13] vfio: Make vfio_device_open() exclusive between group path and device cdev path Yi Liu
@ 2023-01-17 13:49 ` Yi Liu
  2023-01-20  7:26   ` Tian, Kevin
  2023-01-24 20:44   ` Jason Gunthorpe
  2023-01-17 13:49 ` [PATCH 12/13] vfio: Add ioctls for device cdev iommufd Yi Liu
  2023-01-17 13:49 ` [PATCH 13/13] vfio: Compile group optionally Yi Liu
  12 siblings, 2 replies; 80+ messages in thread
From: Yi Liu @ 2023-01-17 13:49 UTC (permalink / raw)
  To: alex.williamson, jgg
  Cc: kevin.tian, cohuck, eric.auger, nicolinc, kvm, mjrosato,
	chao.p.peng, yi.l.liu, yi.y.sun, peterx, jasowang,
	suravee.suthikulpanit, Joao Martins

This allows userspace to directly open a vfio device without using the
legacy container/group interface, as a prerequisite for supporting new
iommu features like nested translation.

The device fd opened in this manner doesn't have the capability to access
the device, as the fops open() doesn't open the device until a successful
VFIO_DEVICE_BIND_IOMMUFD, which will be added in the next patch.

With this patch, devices registered to the vfio core have both a group and
a device interface created.

- group interface : /dev/vfio/$groupID
- device interface: /dev/vfio/devices/vfioX  (X is the minor number and
					      is unique across devices)

Given a vfio device, the user can identify the matching vfioX by checking
the sysfs path of the device. Taking the PCI device 0000:6a:01.0 as an
example, /sys/bus/pci/devices/0000\:6a\:01.0/vfio-dev/vfio0/dev contains
the major:minor number of the matching vfioX.

Userspace then opens /dev/vfio/devices/vfioX and checks with fstat that
the major:minor matches.

The vfio_device cdev logic in this patch:
*) __vfio_register_dev() path ends up doing cdev_device_add() for each
   vfio_device;
*) vfio_unregister_group_dev() path does cdev_device_del();

Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
---
 drivers/vfio/vfio_main.c | 103 ++++++++++++++++++++++++++++++++++++---
 include/linux/vfio.h     |   7 ++-
 2 files changed, 102 insertions(+), 8 deletions(-)

diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
index 78725c28b933..6068ffb7c6b7 100644
--- a/drivers/vfio/vfio_main.c
+++ b/drivers/vfio/vfio_main.c
@@ -43,6 +43,9 @@
 static struct vfio {
 	struct class			*device_class;
 	struct ida			device_ida;
+#if IS_ENABLED(CONFIG_IOMMUFD)
+	dev_t                           device_devt;
+#endif
 } vfio;
 
 static DEFINE_XARRAY(vfio_device_set_xa);
@@ -156,7 +159,11 @@ static void vfio_device_release(struct device *dev)
 			container_of(dev, struct vfio_device, device);
 
 	vfio_release_device_set(device);
+#if IS_ENABLED(CONFIG_IOMMUFD)
+	ida_free(&vfio.device_ida, MINOR(device->device.devt));
+#else
 	ida_free(&vfio.device_ida, device->index);
+#endif
 
 	if (device->ops->release)
 		device->ops->release(device);
@@ -209,15 +216,16 @@ EXPORT_SYMBOL_GPL(_vfio_alloc_device);
 static int vfio_init_device(struct vfio_device *device, struct device *dev,
 			    const struct vfio_device_ops *ops)
 {
+	unsigned int minor;
 	int ret;
 
 	ret = ida_alloc_max(&vfio.device_ida, MINORMASK, GFP_KERNEL);
 	if (ret < 0) {
-		dev_dbg(dev, "Error to alloc index\n");
+		dev_dbg(dev, "Error to alloc minor\n");
 		return ret;
 	}
 
-	device->index = ret;
+	minor = ret;
 	init_completion(&device->comp);
 	device->dev = dev;
 	device->ops = ops;
@@ -232,17 +240,25 @@ static int vfio_init_device(struct vfio_device *device, struct device *dev,
 	device->device.release = vfio_device_release;
 	device->device.class = vfio.device_class;
 	device->device.parent = device->dev;
+#if IS_ENABLED(CONFIG_IOMMUFD)
+	device->device.devt = MKDEV(MAJOR(vfio.device_devt), minor);
+	cdev_init(&device->cdev, &vfio_device_fops);
+	device->cdev.owner = THIS_MODULE;
+#else
+	device->index = minor;
+#endif
 	return 0;
 
 out_uninit:
 	vfio_release_device_set(device);
-	ida_free(&vfio.device_ida, device->index);
+	ida_free(&vfio.device_ida, minor);
 	return ret;
 }
 
 static int __vfio_register_dev(struct vfio_device *device,
 			       enum vfio_group_type type)
 {
+	unsigned int minor;
 	int ret;
 
 	if (WARN_ON(device->ops->bind_iommufd &&
@@ -257,7 +273,12 @@ static int __vfio_register_dev(struct vfio_device *device,
 	if (!device->dev_set)
 		vfio_assign_device_set(device, device);
 
-	ret = dev_set_name(&device->device, "vfio%d", device->index);
+#if IS_ENABLED(CONFIG_IOMMUFD)
+	minor = MINOR(device->device.devt);
+#else
+	minor = device->index;
+#endif
+	ret = dev_set_name(&device->device, "vfio%d", minor);
 	if (ret)
 		return ret;
 
@@ -265,7 +286,11 @@ static int __vfio_register_dev(struct vfio_device *device,
 	if (ret)
 		return ret;
 
+#if IS_ENABLED(CONFIG_IOMMUFD)
+	ret = cdev_device_add(&device->cdev, &device->device);
+#else
 	ret = device_add(&device->device);
+#endif
 	if (ret)
 		goto err_out;
 
@@ -305,6 +330,17 @@ void vfio_unregister_group_dev(struct vfio_device *device)
 	bool interrupted = false;
 	long rc;
 
+#if IS_ENABLED(CONFIG_IOMMUFD)
+	/*
+	 * Balances device_add in register path. Putting it as the first
+	 * operation in unregister to prevent registration refcount from
+	 * incrementing per cdev open.
+	 */
+	cdev_device_del(&device->cdev, &device->device);
+#else
+	device_del(&device->device);
+#endif
+
 	vfio_device_put_registration(device);
 	rc = try_wait_for_completion(&device->comp);
 	while (rc <= 0) {
@@ -330,9 +366,6 @@ void vfio_unregister_group_dev(struct vfio_device *device)
 
 	vfio_device_group_unregister(device);
 
-	/* Balances device_add in register path */
-	device_del(&device->device);
-
 	/* Balances vfio_device_set_group in register path */
 	vfio_device_remove_group(device);
 }
@@ -502,6 +535,37 @@ static inline void vfio_device_pm_runtime_put(struct vfio_device *device)
 /*
  * VFIO Device fd
  */
+#if IS_ENABLED(CONFIG_IOMMUFD)
+static int vfio_device_fops_open(struct inode *inode, struct file *filep)
+{
+	struct vfio_device *device = container_of(inode->i_cdev,
+						  struct vfio_device, cdev);
+	struct vfio_device_file *df;
+	int ret;
+
+	if (!vfio_device_try_get_registration(device))
+		return -ENODEV;
+
+	/*
+	 * device access is blocked until .open_device() is called
+	 * in BIND_IOMMUFD.
+	 */
+	df = vfio_allocate_device_file(device, true);
+	if (IS_ERR(df)) {
+		ret = PTR_ERR(df);
+		goto err_put_registration;
+	}
+
+	filep->private_data = df;
+
+	return 0;
+
+err_put_registration:
+	vfio_device_put_registration(device);
+	return ret;
+}
+#endif
+
 static int vfio_device_fops_release(struct inode *inode, struct file *filep)
 {
 	struct vfio_device_file *df = filep->private_data;
@@ -1169,6 +1233,9 @@ static int vfio_device_fops_mmap(struct file *filep, struct vm_area_struct *vma)
 
 const struct file_operations vfio_device_fops = {
 	.owner		= THIS_MODULE,
+#if IS_ENABLED(CONFIG_IOMMUFD)
+	.open		= vfio_device_fops_open,
+#endif
 	.release	= vfio_device_fops_release,
 	.read		= vfio_device_fops_read,
 	.write		= vfio_device_fops_write,
@@ -1522,6 +1589,13 @@ EXPORT_SYMBOL(vfio_dma_rw);
 /*
  * Module/class support
  */
+#if IS_ENABLED(CONFIG_IOMMUFD)
+static char *vfio_device_devnode(const struct device *dev, umode_t *mode)
+{
+	return kasprintf(GFP_KERNEL, "vfio/devices/%s", dev_name(dev));
+}
+#endif
+
 static int __init vfio_init(void)
 {
 	int ret;
@@ -1543,9 +1617,21 @@ static int __init vfio_init(void)
 		goto err_dev_class;
 	}
 
+#if IS_ENABLED(CONFIG_IOMMUFD)
+	vfio.device_class->devnode = vfio_device_devnode;
+	ret = alloc_chrdev_region(&vfio.device_devt, 0,
+				  MINORMASK + 1, "vfio-dev");
+	if (ret)
+		goto err_alloc_dev_chrdev;
+#endif
 	pr_info(DRIVER_DESC " version: " DRIVER_VERSION "\n");
 	return 0;
 
+#if IS_ENABLED(CONFIG_IOMMUFD)
+err_alloc_dev_chrdev:
+	class_destroy(vfio.device_class);
+	vfio.device_class = NULL;
+#endif
 err_dev_class:
 	vfio_virqfd_exit();
 err_virqfd:
@@ -1556,6 +1642,9 @@ static int __init vfio_init(void)
 static void __exit vfio_cleanup(void)
 {
 	ida_destroy(&vfio.device_ida);
+#if IS_ENABLED(CONFIG_IOMMUFD)
+	unregister_chrdev_region(vfio.device_devt, MINORMASK + 1);
+#endif
 	class_destroy(vfio.device_class);
 	vfio.device_class = NULL;
 	vfio_virqfd_exit();
diff --git a/include/linux/vfio.h b/include/linux/vfio.h
index 300318f0d448..4a31842ebe0b 100644
--- a/include/linux/vfio.h
+++ b/include/linux/vfio.h
@@ -13,6 +13,7 @@
 #include <linux/mm.h>
 #include <linux/workqueue.h>
 #include <linux/poll.h>
+#include <linux/cdev.h>
 #include <uapi/linux/vfio.h>
 #include <linux/iova_bitmap.h>
 
@@ -50,8 +51,12 @@ struct vfio_device {
 	struct kvm *kvm;
 
 	/* Members below here are private, not for driver use */
-	unsigned int index;
 	struct device device;	/* device.kref covers object life circle */
+#if IS_ENABLED(CONFIG_IOMMUFD)
+	struct cdev cdev;
+#else
+	unsigned int index;
+#endif
 	refcount_t refcount;	/* user count on registered device*/
 	unsigned int open_count;
 	struct completion comp;
-- 
2.34.1



* [PATCH 12/13] vfio: Add ioctls for device cdev iommufd
  2023-01-17 13:49 [PATCH 00/13] Add vfio_device cdev for iommufd support Yi Liu
                   ` (10 preceding siblings ...)
  2023-01-17 13:49 ` [PATCH 11/13] vfio: Add cdev for vfio_device Yi Liu
@ 2023-01-17 13:49 ` Yi Liu
  2023-01-20  8:03   ` Tian, Kevin
  2023-01-17 13:49 ` [PATCH 13/13] vfio: Compile group optionally Yi Liu
  12 siblings, 1 reply; 80+ messages in thread
From: Yi Liu @ 2023-01-17 13:49 UTC (permalink / raw)
  To: alex.williamson, jgg
  Cc: kevin.tian, cohuck, eric.auger, nicolinc, kvm, mjrosato,
	chao.p.peng, yi.l.liu, yi.y.sun, peterx, jasowang,
	suravee.suthikulpanit

This adds two vfio device ioctls for userspace to use iommufd on vfio
devices.

    VFIO_DEVICE_BIND_IOMMUFD: bind the device to an iommufd, and hence
			      gain the DMA control provided by the
			      iommufd. VFIO noiommu mode is indicated
			      by passing a negative fd value.
    VFIO_DEVICE_ATTACH_IOMMUFD_PT: attach the device to an IOAS whose
				   page tables are managed by iommufd.
				   The attach can be undone by passing
				   IOMMUFD_INVALID_ID to the kernel.

The ioctls introduced here are just on par with the existing VFIO uAPI.

Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 drivers/vfio/vfio.h          |   1 +
 drivers/vfio/vfio_main.c     | 175 ++++++++++++++++++++++++++++++++++-
 include/uapi/linux/iommufd.h |   2 +
 include/uapi/linux/vfio.h    |  64 +++++++++++++
 4 files changed, 237 insertions(+), 5 deletions(-)

diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
index bdcf9762521d..444be924c915 100644
--- a/drivers/vfio/vfio.h
+++ b/drivers/vfio/vfio.h
@@ -25,6 +25,7 @@ struct vfio_device_file {
 	struct kvm *kvm;
 	struct iommufd_ctx *iommufd;
 	bool access_granted;
+	bool noiommu;
 };
 
 void vfio_device_put_registration(struct vfio_device *device);
diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
index 6068ffb7c6b7..99ebb5bd1eda 100644
--- a/drivers/vfio/vfio_main.c
+++ b/drivers/vfio/vfio_main.c
@@ -34,6 +34,7 @@
 #include <linux/interval_tree.h>
 #include <linux/iova_bitmap.h>
 #include <linux/iommufd.h>
+#include <uapi/linux/iommufd.h>
 #include "vfio.h"
 
 #define DRIVER_VERSION	"0.3"
@@ -402,12 +403,37 @@ static int vfio_device_first_open(struct vfio_device_file *df,
 
 	lockdep_assert_held(&device->dev_set->lock);
 
+	/* df->iommufd and df->noiommu should be exclusive */
+	if (WARN_ON(iommufd && df->noiommu))
+		return -EINVAL;
+
 	if (!try_module_get(device->dev->driver->owner))
 		return -ENODEV;
 
+	/*
+	 * For the group path, the iommufd pointer is NULL when it comes
+	 * into this helper. Its noiommu support is in container.c.
+	 *
+	 * For iommufd compat mode, the iommufd pointer here is a valid
+	 * value. Its noiommu support is supposed to be in
+	 * vfio_iommufd_bind().
+	 *
+	 * For the device cdev path, the iommufd pointer here is a valid
+	 * value for normal cases, but it is NULL for noiommu. The reason
+	 * is that userspace uses iommufd == -1 to indicate noiommu mode
+	 * in this path, so the caller of this helper passes in a NULL
+	 * iommufd pointer. To differentiate this from the group path,
+	 * which also passes in a NULL iommufd pointer, df->noiommu is
+	 * set to mark the noiommu case for the cdev path.
+	 *
+	 * So if df->noiommu is set, this helper just goes ahead and
+	 * opens the device. Otherwise, whether the iommufd pointer is
+	 * NULL distinguishes the group path from iommufd compat mode
+	 * and the normal cdev case.
+	 */
 	if (iommufd)
 		ret = vfio_iommufd_bind(device, iommufd, dev_id, pt_id);
-	else
+	else if (!df->noiommu)
 		ret = vfio_device_group_use_iommu(device);
 	if (ret)
 		goto err_module_put;
@@ -424,7 +450,7 @@ static int vfio_device_first_open(struct vfio_device_file *df,
 	device->kvm = NULL;
 	if (iommufd)
 		vfio_iommufd_unbind(device);
-	else
+	else if (!df->noiommu)
 		vfio_device_group_unuse_iommu(device);
 err_module_put:
 	module_put(device->dev->driver->owner);
@@ -443,7 +469,7 @@ static void vfio_device_last_close(struct vfio_device_file *df)
 	device->kvm = NULL;
 	if (iommufd)
 		vfio_iommufd_unbind(device);
-	else
+	else if (!df->noiommu)
 		vfio_device_group_unuse_iommu(device);
 	module_put(device->dev->driver->owner);
 }
@@ -485,17 +511,24 @@ int vfio_device_open(struct vfio_device_file *df,
 	return 0;
 }
 
-void vfio_device_close(struct vfio_device_file *df)
+static void __vfio_device_close(struct vfio_device_file *df)
 {
 	struct vfio_device *device = df->device;
 
-	mutex_lock(&device->dev_set->lock);
 	vfio_assert_device_open(device);
 	if (device->open_count == 1) {
 		vfio_device_last_close(df);
 		device->single_open = false;
 	}
 	device->open_count--;
+}
+
+void vfio_device_close(struct vfio_device_file *df)
+{
+	struct vfio_device *device = df->device;
+
+	mutex_lock(&device->dev_set->lock);
+	__vfio_device_close(df);
 	mutex_unlock(&device->dev_set->lock);
 }
 
@@ -577,6 +610,8 @@ static int vfio_device_fops_release(struct inode *inode, struct file *filep)
 	 */
 	if (!df->single_open)
 		vfio_device_group_close(df);
+	else
+		vfio_device_close(df);
 
 	vfio_device_put_registration(device);
 
@@ -1143,6 +1178,129 @@ static int vfio_ioctl_device_feature(struct vfio_device *device,
 	}
 }
 
+static long vfio_device_ioctl_bind_iommufd(struct vfio_device_file *df,
+					   unsigned long arg)
+{
+	struct vfio_device *device = df->device;
+	struct vfio_device_bind_iommufd bind;
+	struct iommufd_ctx *iommufd = NULL;
+	unsigned long minsz;
+	struct fd f;
+	int ret;
+
+	minsz = offsetofend(struct vfio_device_bind_iommufd, iommufd);
+
+	if (copy_from_user(&bind, (void __user *)arg, minsz))
+		return -EFAULT;
+
+	if (bind.argsz < minsz || bind.flags)
+		return -EINVAL;
+
+	if (!device->ops->bind_iommufd)
+		return -ENODEV;
+
+	mutex_lock(&device->dev_set->lock);
+	/*
+	 * If the device has already been bound to an iommufd, or
+	 * noiommu has already been set, fail.
+	 */
+	if (df->iommufd || df->noiommu) {
+		ret = -EINVAL;
+		goto out_unlock;
+	}
+
+	/* iommufd < 0 means noiommu mode */
+	if (bind.iommufd < 0) {
+		if (!capable(CAP_SYS_RAWIO)) {
+			ret = -EPERM;
+			goto out_unlock;
+		}
+		df->noiommu = true;
+	} else {
+		f = fdget(bind.iommufd);
+		if (!f.file) {
+			ret = -EBADF;
+			goto out_unlock;
+		}
+		iommufd = iommufd_ctx_from_file(f.file);
+		if (IS_ERR(iommufd)) {
+			ret = PTR_ERR(iommufd);
+			goto out_put_file;
+		}
+	}
+
+	/* df->kvm is supposed to be set in vfio_device_file_set_kvm() */
+	df->iommufd = iommufd;
+	ret = vfio_device_open(df, &bind.out_devid, NULL);
+	if (ret)
+		goto out_put_file;
+
+	ret = copy_to_user((void __user *)arg + minsz,
+			   &bind.out_devid,
+			   sizeof(bind.out_devid)) ? -EFAULT : 0;
+	if (ret)
+		goto out_close_device;
+
+	mutex_unlock(&device->dev_set->lock);
+	if (iommufd)
+		fdput(f);
+	else if (df->noiommu)
+		dev_warn(device->dev, "vfio-noiommu device used by user "
+			 "(%s:%d)\n", current->comm, task_pid_nr(current));
+	return 0;
+
+out_close_device:
+	__vfio_device_close(df);
+out_put_file:
+	if (iommufd)
+		fdput(f);
+out_unlock:
+	df->iommufd = NULL;
+	df->noiommu = false;
+	mutex_unlock(&device->dev_set->lock);
+	return ret;
+}
+
+static int vfio_ioctl_device_attach(struct vfio_device *device,
+				    struct vfio_device_feature __user *arg)
+{
+	struct vfio_device_attach_iommufd_pt attach;
+	int ret;
+	bool is_attach;
+
+	if (copy_from_user(&attach, (void __user *)arg, sizeof(attach)))
+		return -EFAULT;
+
+	if (attach.flags)
+		return -EINVAL;
+
+	if (!device->ops->bind_iommufd)
+		return -ENODEV;
+
+	mutex_lock(&device->dev_set->lock);
+	is_attach = attach.pt_id != IOMMUFD_INVALID_ID;
+	ret = vfio_iommufd_attach(device, is_attach ? &attach.pt_id : NULL);
+	if (ret)
+		goto out_unlock;
+
+	if (is_attach) {
+		ret = copy_to_user((void __user *)arg + offsetofend(
+				   struct vfio_device_attach_iommufd_pt, flags),
+				   &attach.pt_id,
+				   sizeof(attach.pt_id)) ? -EFAULT : 0;
+		if (ret)
+			goto out_detach;
+	}
+	mutex_unlock(&device->dev_set->lock);
+	return 0;
+
+out_detach:
+	vfio_iommufd_attach(device, NULL);
+out_unlock:
+	mutex_unlock(&device->dev_set->lock);
+	return ret;
+}
+
 static long vfio_device_fops_unl_ioctl(struct file *filep,
 				       unsigned int cmd, unsigned long arg)
 {
@@ -1151,6 +1309,9 @@ static long vfio_device_fops_unl_ioctl(struct file *filep,
 	bool access;
 	int ret;
 
+	if (cmd == VFIO_DEVICE_BIND_IOMMUFD)
+		return vfio_device_ioctl_bind_iommufd(df, arg);
+
 	/* Paired with smp_store_release() in vfio_device_open() */
 	access = smp_load_acquire(&df->access_granted);
 	if (!access)
@@ -1165,6 +1326,10 @@ static long vfio_device_fops_unl_ioctl(struct file *filep,
 		ret = vfio_ioctl_device_feature(device, (void __user *)arg);
 		break;
 
+	case VFIO_DEVICE_ATTACH_IOMMUFD_PT:
+		ret = vfio_ioctl_device_attach(device, (void __user *)arg);
+		break;
+
 	default:
 		if (unlikely(!device->ops->ioctl))
 			ret = -EINVAL;
diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h
index 98ebba80cfa1..87680274c01b 100644
--- a/include/uapi/linux/iommufd.h
+++ b/include/uapi/linux/iommufd.h
@@ -9,6 +9,8 @@
 
 #define IOMMUFD_TYPE (';')
 
+#define IOMMUFD_INVALID_ID 0  /* valid ID starts from 1 */
+
 /**
  * DOC: General ioctl format
  *
diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index 23105eb036fa..235d3485a883 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -190,6 +190,70 @@ struct vfio_group_status {
 
 /* --------------- IOCTLs for DEVICE file descriptors --------------- */
 
+/*
+ * VFIO_DEVICE_BIND_IOMMUFD - _IOR(VFIO_TYPE, VFIO_BASE + 19,
+ *				   struct vfio_device_bind_iommufd)
+ *
+ * Bind a vfio_device to the specified iommufd.
+ *
+ * The user should provide a device cookie when calling this ioctl. The
+ * cookie is carried only in events (e.g. I/O faults) reported to
+ * userspace via iommufd. The user should use the devid returned by this
+ * ioctl to mark the target device in other ioctls (e.g. capability query).
+ *
+ * The user is not allowed to access the device before the binding
+ * operation is completed.
+ *
+ * Unbind happens automatically when the device fd is closed.
+ *
+ * @argsz:	 user filled size of this data.
+ * @flags:	 reserved for future extension.
+ * @dev_cookie:	 a per device cookie provided by userspace.
+ * @iommufd:	 iommufd to bind. iommufd < 0 means noiommu.
+ * @out_devid:	 the device id generated by this bind.
+ *
+ * Return: 0 on success, -errno on failure.
+ */
+struct vfio_device_bind_iommufd {
+	__u32		argsz;
+	__u32		flags;
+	__aligned_u64	dev_cookie;
+	__s32		iommufd;
+	__u32		out_devid;
+};
+
+#define VFIO_DEVICE_BIND_IOMMUFD	_IO(VFIO_TYPE, VFIO_BASE + 19)
+
+/*
+ * VFIO_DEVICE_ATTACH_IOMMUFD_PT - _IOW(VFIO_TYPE, VFIO_BASE + 20,
+ *					struct vfio_device_attach_iommufd_pt)
+ *
+ * Attach a vfio device to an iommufd address space specified by an
+ * IOAS id or a hw_pagetable (hwpt) id.
+ *
+ * Available only after the device has been bound to an iommufd via
+ * VFIO_DEVICE_BIND_IOMMUFD.
+ *
+ * Undo the attachment by passing pt_id == IOMMUFD_INVALID_ID.
+ *
+ * @argsz:	user filled size of this data.
+ * @flags:	must be 0.
+ * @pt_id:	Input the target id which can represent an ioas or a hwpt
+ *		allocated via iommufd subsystem.
+ *		Output the attached hwpt id which could be the specified
+ *		hwpt itself or a hwpt automatically created for the
+ *		specified ioas by kernel during the attachment.
+ *
+ * Return: 0 on success, -errno on failure.
+ */
+struct vfio_device_attach_iommufd_pt {
+	__u32	argsz;
+	__u32	flags;
+	__u32	pt_id;
+};
+
+#define VFIO_DEVICE_ATTACH_IOMMUFD_PT		_IO(VFIO_TYPE, VFIO_BASE + 20)
+
 /**
  * VFIO_DEVICE_GET_INFO - _IOR(VFIO_TYPE, VFIO_BASE + 7,
  *						struct vfio_device_info)
-- 
2.34.1



* [PATCH 13/13] vfio: Compile group optionally
  2023-01-17 13:49 [PATCH 00/13] Add vfio_device cdev for iommufd support Yi Liu
                   ` (11 preceding siblings ...)
  2023-01-17 13:49 ` [PATCH 12/13] vfio: Add ioctls for device cdev iommufd Yi Liu
@ 2023-01-17 13:49 ` Yi Liu
  12 siblings, 0 replies; 80+ messages in thread
From: Yi Liu @ 2023-01-17 13:49 UTC (permalink / raw)
  To: alex.williamson, jgg
  Cc: kevin.tian, cohuck, eric.auger, nicolinc, kvm, mjrosato,
	chao.p.peng, yi.l.liu, yi.y.sun, peterx, jasowang,
	suravee.suthikulpanit

The group code is not needed for the vfio device cdev, so with the vfio
device cdev introduced, the group infrastructure can be compiled out.
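For illustration, a hypothetical .config fragment for a cdev-only build
under the symbols this patch introduces:

```
CONFIG_VFIO=y
CONFIG_IOMMUFD=y
# CONFIG_VFIO_GROUP is not set
# With IOMMUFD=y and nothing selecting it, VFIO_ENABLE_GROUP resolves
# to n, so group.o is not built and VFIO_CONTAINER is unavailable.
```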

Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 drivers/vfio/Kconfig  | 17 +++++++++++
 drivers/vfio/Makefile |  3 +-
 drivers/vfio/vfio.h   | 69 +++++++++++++++++++++++++++++++++++++++++++
 include/linux/vfio.h  | 11 +++++++
 4 files changed, 99 insertions(+), 1 deletion(-)

diff --git a/drivers/vfio/Kconfig b/drivers/vfio/Kconfig
index a8f544629467..7e3f6249fa15 100644
--- a/drivers/vfio/Kconfig
+++ b/drivers/vfio/Kconfig
@@ -12,9 +12,26 @@ menuconfig VFIO
 	  If you don't know what to do here, say N.
 
 if VFIO
+config VFIO_ENABLE_GROUP
+	bool
+	default !IOMMUFD
+
+config VFIO_GROUP
+	bool "Support for the VFIO group /dev/vfio/$group_id"
+	select VFIO_ENABLE_GROUP
+	default y
+	help
+	   The VFIO group is the legacy interface for userspace. For
+	   userspace adapted to iommufd and the vfio device cdev, this
+	   can be N. For now, before userspace applications are fully
+	   converted to iommufd and the vfio device cdev, this should be Y.
+
+	   If you don't know what to do here, say Y.
+
 config VFIO_CONTAINER
 	bool "Support for the VFIO container /dev/vfio/vfio"
 	select VFIO_IOMMU_TYPE1 if MMU && (X86 || S390 || ARM || ARM64)
+	depends on VFIO_ENABLE_GROUP
 	default y
 	help
 	  The VFIO container is the classic interface to VFIO for establishing
diff --git a/drivers/vfio/Makefile b/drivers/vfio/Makefile
index 70e7dcb302ef..bb3fec9ea6bf 100644
--- a/drivers/vfio/Makefile
+++ b/drivers/vfio/Makefile
@@ -2,8 +2,9 @@
 obj-$(CONFIG_VFIO) += vfio.o
 
 vfio-y += vfio_main.o \
-	  group.o \
 	  iova_bitmap.o
+
+vfio-$(CONFIG_VFIO_ENABLE_GROUP) += group.o
 vfio-$(CONFIG_IOMMUFD) += iommufd.o
 vfio-$(CONFIG_VFIO_CONTAINER) += container.o
 vfio-$(CONFIG_VFIO_VIRQFD) += virqfd.o
diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
index 444be924c915..cd282e5c07bb 100644
--- a/drivers/vfio/vfio.h
+++ b/drivers/vfio/vfio.h
@@ -63,6 +63,7 @@ enum vfio_group_type {
 	VFIO_NO_IOMMU,
 };
 
+#if IS_ENABLED(CONFIG_VFIO_ENABLE_GROUP)
 struct vfio_group {
 	struct device 			dev;
 	struct cdev			cdev;
@@ -105,6 +106,74 @@ bool vfio_group_has_dev(struct vfio_group *group, struct vfio_device *device);
 bool vfio_device_has_container(struct vfio_device *device);
 int __init vfio_group_init(void);
 void vfio_group_cleanup(void);
+#else
+struct vfio_group;
+
+static inline int vfio_device_set_group(struct vfio_device *device,
+					enum vfio_group_type type)
+{
+	return 0;
+}
+
+static inline void vfio_device_remove_group(struct vfio_device *device)
+{
+}
+
+static inline void vfio_device_group_register(struct vfio_device *device)
+{
+}
+
+static inline void vfio_device_group_unregister(struct vfio_device *device)
+{
+}
+
+static inline int vfio_device_group_use_iommu(struct vfio_device *device)
+{
+	return -EOPNOTSUPP;
+}
+
+static inline void vfio_device_group_unuse_iommu(struct vfio_device *device)
+{
+}
+
+static inline void vfio_device_group_close(struct vfio_device_file *df)
+{
+}
+
+static inline struct vfio_group *vfio_group_from_file(struct file *file)
+{
+	return NULL;
+}
+
+static inline bool vfio_group_enforced_coherent(struct vfio_group *group)
+{
+	return true;
+}
+
+static inline void vfio_group_set_kvm(struct vfio_group *group, struct kvm *kvm)
+{
+}
+
+static inline bool vfio_group_has_dev(struct vfio_group *group,
+				      struct vfio_device *device)
+{
+	return false;
+}
+
+static inline bool vfio_device_has_container(struct vfio_device *device)
+{
+	return false;
+}
+
+static inline int __init vfio_group_init(void)
+{
+	return 0;
+}
+
+static inline void vfio_group_cleanup(void)
+{
+}
+#endif /* CONFIG_VFIO_ENABLE_GROUP */
 
 #if IS_ENABLED(CONFIG_VFIO_CONTAINER)
 /* events for the backend driver notify callback */
diff --git a/include/linux/vfio.h b/include/linux/vfio.h
index 4a31842ebe0b..eb4dc3dfab03 100644
--- a/include/linux/vfio.h
+++ b/include/linux/vfio.h
@@ -43,7 +43,9 @@ struct vfio_device {
 	 */
 	const struct vfio_migration_ops *mig_ops;
 	const struct vfio_log_ops *log_ops;
+#if IS_ENABLED(CONFIG_VFIO_ENABLE_GROUP)
 	struct vfio_group *group;
+#endif
 	struct vfio_device_set *dev_set;
 	struct list_head dev_set_list;
 	unsigned int migration_flags;
@@ -60,8 +62,10 @@ struct vfio_device {
 	refcount_t refcount;	/* user count on registered device*/
 	unsigned int open_count;
 	struct completion comp;
+#if IS_ENABLED(CONFIG_VFIO_ENABLE_GROUP)
 	struct list_head group_next;
 	struct list_head iommu_entry;
+#endif
 	struct iommufd_access *iommufd_access;
 #if IS_ENABLED(CONFIG_IOMMUFD)
 	struct iommufd_device *iommufd_device;
@@ -246,7 +250,14 @@ int vfio_mig_get_next_state(struct vfio_device *device,
 /*
  * External user API
  */
+#if IS_ENABLED(CONFIG_VFIO_ENABLE_GROUP)
 struct iommu_group *vfio_file_iommu_group(struct file *file);
+#else
+static inline struct iommu_group *vfio_file_iommu_group(struct file *file)
+{
+	return NULL;
+}
+#endif
 bool vfio_file_is_valid(struct file *file);
 bool vfio_file_enforced_coherent(struct file *file);
 void vfio_file_set_kvm(struct file *file, struct kvm *kvm);
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 80+ messages in thread

* RE: [PATCH 01/13] vfio: Allocate per device file structure
  2023-01-17 13:49 ` [PATCH 01/13] vfio: Allocate per device file structure Yi Liu
@ 2023-01-18  8:37   ` Tian, Kevin
  2023-01-18 13:28   ` Eric Auger
  1 sibling, 0 replies; 80+ messages in thread
From: Tian, Kevin @ 2023-01-18  8:37 UTC (permalink / raw)
  To: Liu, Yi L, alex.williamson, jgg
  Cc: cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.y.sun, peterx, jasowang, suravee.suthikulpanit

> From: Liu, Yi L <yi.l.liu@intel.com>
> Sent: Tuesday, January 17, 2023 9:50 PM
> 
> This is preparation for adding vfio device cdev support. vfio device
> cdev requires:
> 1) a per device file memory to store the kvm pointer set by KVM. It will
>    be propagated to vfio_device:kvm after the device cdev file is bound
>    to an iommufd
> 2) a mechanism to block device access through device cdev fd before it
>    is bound to an iommufd
> 
> To address above requirements, this adds a per device file structure
> named vfio_device_file. For now, it's only a wrapper of struct vfio_device
> pointer. Other fields will be added to this per file structure in future
> commits.
> 
> Signed-off-by: Yi Liu <yi.l.liu@intel.com>

Reviewed-by: Kevin Tian <kevin.tian@intel.com>

* RE: [PATCH 02/13] vfio: Refine vfio file kAPIs
  2023-01-17 13:49 ` [PATCH 02/13] vfio: Refine vfio file kAPIs Yi Liu
@ 2023-01-18  8:42   ` Tian, Kevin
  2023-01-18 14:37   ` Eric Auger
  1 sibling, 0 replies; 80+ messages in thread
From: Tian, Kevin @ 2023-01-18  8:42 UTC (permalink / raw)
  To: Liu, Yi L, alex.williamson, jgg
  Cc: cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.y.sun, peterx, jasowang, suravee.suthikulpanit

> From: Liu, Yi L <yi.l.liu@intel.com>
> Sent: Tuesday, January 17, 2023 9:50 PM
> 
> +/**
> + * vfio_file_is_valid - True if the file is usable with VFIO aPIS

s/aPIS/API/

> +
> +/**
> + * vfio_file_enforced_coherent - True if the DMA associated with the VFIO
> file
> + *        is always CPU cache coherent
> + * @file: VFIO group or device file

to on par with other places, "VFIO group file or device file"

Reviewed-by: Kevin Tian <kevin.tian@intel.com>

* RE: [PATCH 03/13] vfio: Accept vfio device file in the driver facing kAPI
  2023-01-17 13:49 ` [PATCH 03/13] vfio: Accept vfio device file in the driver facing kAPI Yi Liu
@ 2023-01-18  8:45   ` Tian, Kevin
  2023-01-18 16:11   ` Eric Auger
  1 sibling, 0 replies; 80+ messages in thread
From: Tian, Kevin @ 2023-01-18  8:45 UTC (permalink / raw)
  To: Liu, Yi L, alex.williamson, jgg
  Cc: cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.y.sun, peterx, jasowang, suravee.suthikulpanit

> From: Liu, Yi L <yi.l.liu@intel.com>
> Sent: Tuesday, January 17, 2023 9:50 PM
> 
> This makes the vfio file kAPIs to accepte vfio device files, also a
> preparation for vfio device cdev support.
> 
> Signed-off-by: Yi Liu <yi.l.liu@intel.com>

Reviewed-by: Kevin Tian <kevin.tian@intel.com>

* RE: [PATCH 04/13] kvm/vfio: Rename kvm_vfio_group to prepare for accepting vfio device fd
  2023-01-17 13:49 ` [PATCH 04/13] kvm/vfio: Rename kvm_vfio_group to prepare for accepting vfio device fd Yi Liu
@ 2023-01-18  8:47   ` Tian, Kevin
  2023-01-18 16:33   ` Eric Auger
  1 sibling, 0 replies; 80+ messages in thread
From: Tian, Kevin @ 2023-01-18  8:47 UTC (permalink / raw)
  To: Liu, Yi L, alex.williamson, jgg
  Cc: cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.y.sun, peterx, jasowang, suravee.suthikulpanit

> From: Liu, Yi L <yi.l.liu@intel.com>
> Sent: Tuesday, January 17, 2023 9:50 PM
> 
> Meanwhile, rename related helpers. No functional change is intended.
> 
> Signed-off-by: Yi Liu <yi.l.liu@intel.com>

Reviewed-by: Kevin Tian <kevin.tian@intel.com>

* RE: [PATCH 05/13] kvm/vfio: Provide struct kvm_device_ops::release() insted of ::destroy()
  2023-01-17 13:49 ` [PATCH 05/13] kvm/vfio: Provide struct kvm_device_ops::release() insted of ::destroy() Yi Liu
@ 2023-01-18  8:56   ` Tian, Kevin
  2023-01-19  9:12   ` Eric Auger
  2023-01-19 19:07   ` Jason Gunthorpe
  2 siblings, 0 replies; 80+ messages in thread
From: Tian, Kevin @ 2023-01-18  8:56 UTC (permalink / raw)
  To: Liu, Yi L, alex.williamson, jgg
  Cc: cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.y.sun, peterx, jasowang, suravee.suthikulpanit

> From: Liu, Yi L <yi.l.liu@intel.com>
> Sent: Tuesday, January 17, 2023 9:50 PM
> 
> @@ -364,7 +364,7 @@ static int kvm_vfio_create(struct kvm_device *dev,
> u32 type);
>  static struct kvm_device_ops kvm_vfio_ops = {
>  	.name = "kvm-vfio",
>  	.create = kvm_vfio_create,
> -	.destroy = kvm_vfio_destroy,
> +	.release = kvm_vfio_destroy,

Also rename to kvm_vfio_release.

* RE: [PATCH 06/13] kvm/vfio: Accept vfio device file from userspace
  2023-01-17 13:49 ` [PATCH 06/13] kvm/vfio: Accept vfio device file from userspace Yi Liu
@ 2023-01-18  9:18   ` Tian, Kevin
  2023-01-19  9:35   ` Eric Auger
  1 sibling, 0 replies; 80+ messages in thread
From: Tian, Kevin @ 2023-01-18  9:18 UTC (permalink / raw)
  To: Liu, Yi L, alex.williamson, jgg
  Cc: cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.y.sun, peterx, jasowang, suravee.suthikulpanit

> From: Liu, Yi L <yi.l.liu@intel.com>
> Sent: Tuesday, January 17, 2023 9:50 PM
> 
> +tracks VFIO files (group or device) in use by the VM and features
> +of those groups/devices important to the correctness and acceleration
> +of the VM.  As groups/device are enabled and disabled for use by the

"groups/devices"

> +  KVM_DEV_VFIO_FILE_DEL: Remove a VFIO file (group/device) from VFIO-
> KVM device
> +	tracking kvm_device_attr.addr points to an int32_t file descriptor
>  	for the VFIO group.

"for the VFIO file"

> -  KVM_DEV_VFIO_GROUP_DEL: Remove a VFIO group from VFIO-KVM
> device tracking
> -	kvm_device_attr.addr points to an int32_t file descriptor
> -	for the VFIO group.
> -  KVM_DEV_VFIO_GROUP_SET_SPAPR_TCE: attaches a guest visible TCE
> table
> +  KVM_DEV_VFIO_FILE_SET_SPAPR_TCE: attaches a guest visible TCE table
>  	allocated by sPAPR KVM.
>  	kvm_device_attr.addr points to a struct::

btw do we want to mention the GROUP cmd alias here instead of
simply removing them?

> @@ -1396,15 +1396,26 @@ struct kvm_create_device {
> 
>  struct kvm_device_attr {
>  	__u32	flags;		/* no flags currently defined */
> -	__u32	group;		/* device-defined */
> -	__u64	attr;		/* group-defined */
> +	union {
> +		__u32	group;
> +		__u32	file;
> +	}; /* device-defined */
> +	__u64	attr;		/* VFIO-file-defined or group-defined */

remove "VFIO-"


* RE: [PATCH 07/13] vfio: Pass struct vfio_device_file * to vfio_device_open/close()
  2023-01-17 13:49 ` [PATCH 07/13] vfio: Pass struct vfio_device_file * to vfio_device_open/close() Yi Liu
@ 2023-01-18  9:27   ` Tian, Kevin
  2023-01-19 11:01   ` Eric Auger
  1 sibling, 0 replies; 80+ messages in thread
From: Tian, Kevin @ 2023-01-18  9:27 UTC (permalink / raw)
  To: Liu, Yi L, alex.williamson, jgg
  Cc: cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.y.sun, peterx, jasowang, suravee.suthikulpanit

> From: Liu, Yi L <yi.l.liu@intel.com>
> Sent: Tuesday, January 17, 2023 9:50 PM
> 
> This avoids passing struct kvm * and struct iommufd_ctx * in multiple
> functions. vfio_device_open() becomes to be a locked helper.

remove "to be"

> 
> Signed-off-by: Yi Liu <yi.l.liu@intel.com>

Reviewed-by: Kevin Tian <kevin.tian@intel.com>

* RE: [PATCH 08/13] vfio: Block device access via device fd until device is opened
  2023-01-17 13:49 ` [PATCH 08/13] vfio: Block device access via device fd until device is opened Yi Liu
@ 2023-01-18  9:35   ` Tian, Kevin
  2023-01-18 13:52     ` Jason Gunthorpe
  2023-01-19 14:00   ` Eric Auger
  2023-01-19 20:47   ` Alex Williamson
  2 siblings, 1 reply; 80+ messages in thread
From: Tian, Kevin @ 2023-01-18  9:35 UTC (permalink / raw)
  To: Liu, Yi L, alex.williamson, jgg
  Cc: cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.y.sun, peterx, jasowang, suravee.suthikulpanit

> From: Liu, Yi L <yi.l.liu@intel.com>
> Sent: Tuesday, January 17, 2023 9:50 PM
> 
> Allow the vfio_device file to be in a state where the device FD is
> opened but the device cannot be used by userspace (i.e. its .open_device()
> hasn't been called). This inbetween state is not used when the device
> FD is spawned from the group FD, however when we create the device FD
> directly by opening a cdev it will be opened in the blocked state.
> 
> In the blocked state, currently only the bind operation is allowed,
> other device accesses are not allowed. Completing bind will allow user
> to further access the device.
> 
> This is implemented by adding a flag in struct vfio_device_file to mark
> the blocked state and using a simple smp_load_acquire() to obtain the
> flag value and serialize all the device setup with the thread accessing
> this device.
> 
> Due to this scheme it is not possible to unbind the FD, once it is bound,
> it remains bound until the FD is closed.
> 

My question to the last version was not answered...

Can you elaborate why it is impossible to unbind? Is it more an
implementation choice or conceptual restriction?

* Re: [PATCH 01/13] vfio: Allocate per device file structure
  2023-01-17 13:49 ` [PATCH 01/13] vfio: Allocate per device file structure Yi Liu
  2023-01-18  8:37   ` Tian, Kevin
@ 2023-01-18 13:28   ` Eric Auger
  1 sibling, 0 replies; 80+ messages in thread
From: Eric Auger @ 2023-01-18 13:28 UTC (permalink / raw)
  To: Yi Liu, alex.williamson, jgg
  Cc: kevin.tian, cohuck, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.y.sun, peterx, jasowang, suravee.suthikulpanit

Hi Yi,

On 1/17/23 14:49, Yi Liu wrote:
> This is preparation for adding vfio device cdev support. vfio device
> cdev requires:
> 1) a per device file memory to store the kvm pointer set by KVM. It will
>    be propagated to vfio_device:kvm after the device cdev file is bound
>    to an iommufd
> 2) a mechanism to block device access through device cdev fd before it
>    is bound to an iommufd
>
> To address above requirements, this adds a per device file structure
> named vfio_device_file. For now, it's only a wrapper of struct vfio_device
> pointer. Other fields will be added to this per file structure in future
> commits.
>
> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> ---
>  drivers/vfio/group.c     | 13 +++++++++++--
>  drivers/vfio/vfio.h      |  6 ++++++
>  drivers/vfio/vfio_main.c | 31 ++++++++++++++++++++++++++-----
>  3 files changed, 43 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/vfio/group.c b/drivers/vfio/group.c
> index bb24b2f0271e..8fdb7e35b0a6 100644
> --- a/drivers/vfio/group.c
> +++ b/drivers/vfio/group.c
> @@ -186,19 +186,26 @@ void vfio_device_group_close(struct vfio_device *device)
>  
>  static struct file *vfio_device_open_file(struct vfio_device *device)
>  {
> +	struct vfio_device_file *df;
>  	struct file *filep;
>  	int ret;
>  
> +	df = vfio_allocate_device_file(device);
> +	if (IS_ERR(df)) {
> +		ret = PTR_ERR(df);
> +		goto err_out;
> +	}
> +
>  	ret = vfio_device_group_open(device);
>  	if (ret)
> -		goto err_out;
> +		goto err_free;
>  
>  	/*
>  	 * We can't use anon_inode_getfd() because we need to modify
>  	 * the f_mode flags directly to allow more than just ioctls
>  	 */
>  	filep = anon_inode_getfile("[vfio-device]", &vfio_device_fops,
> -				   device, O_RDWR);
> +				   df, O_RDWR);
>  	if (IS_ERR(filep)) {
>  		ret = PTR_ERR(filep);
>  		goto err_close_device;
> @@ -222,6 +229,8 @@ static struct file *vfio_device_open_file(struct vfio_device *device)
>  
>  err_close_device:
>  	vfio_device_group_close(device);
> +err_free:
> +	kfree(df);
>  err_out:
>  	return ERR_PTR(ret);
>  }
> diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
> index f8219a438bfb..1091806bc89d 100644
> --- a/drivers/vfio/vfio.h
> +++ b/drivers/vfio/vfio.h
> @@ -16,12 +16,18 @@ struct iommu_group;
>  struct vfio_device;
>  struct vfio_container;
>  
> +struct vfio_device_file {
> +	struct vfio_device *device;
> +};
> +
>  void vfio_device_put_registration(struct vfio_device *device);
>  bool vfio_device_try_get_registration(struct vfio_device *device);
>  int vfio_device_open(struct vfio_device *device,
>  		     struct iommufd_ctx *iommufd, struct kvm *kvm);
>  void vfio_device_close(struct vfio_device *device,
>  		       struct iommufd_ctx *iommufd);
> +struct vfio_device_file *
> +vfio_allocate_device_file(struct vfio_device *device);
>  
>  extern const struct file_operations vfio_device_fops;
>  
> diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
> index 5177bb061b17..ee54c9ae0af4 100644
> --- a/drivers/vfio/vfio_main.c
> +++ b/drivers/vfio/vfio_main.c
> @@ -344,6 +344,20 @@ static bool vfio_assert_device_open(struct vfio_device *device)
>  	return !WARN_ON_ONCE(!READ_ONCE(device->open_count));
>  }
>  
> +struct vfio_device_file *
> +vfio_allocate_device_file(struct vfio_device *device)
> +{
> +	struct vfio_device_file *df;
> +
> +	df = kzalloc(sizeof(*df), GFP_KERNEL_ACCOUNT);
> +	if (!df)
> +		return ERR_PTR(-ENOMEM);
> +
> +	df->device = device;
> +
> +	return df;
> +}
> +
>  static int vfio_device_first_open(struct vfio_device *device,
>  				  struct iommufd_ctx *iommufd, struct kvm *kvm)
>  {
> @@ -461,12 +475,15 @@ static inline void vfio_device_pm_runtime_put(struct vfio_device *device)
>   */
>  static int vfio_device_fops_release(struct inode *inode, struct file *filep)
>  {
> -	struct vfio_device *device = filep->private_data;
> +	struct vfio_device_file *df = filep->private_data;
> +	struct vfio_device *device = df->device;
>  
>  	vfio_device_group_close(device);
>  
>  	vfio_device_put_registration(device);
>  
> +	kfree(df);
> +
>  	return 0;
>  }
>  
> @@ -1031,7 +1048,8 @@ static int vfio_ioctl_device_feature(struct vfio_device *device,
>  static long vfio_device_fops_unl_ioctl(struct file *filep,
>  				       unsigned int cmd, unsigned long arg)
>  {
> -	struct vfio_device *device = filep->private_data;
> +	struct vfio_device_file *df = filep->private_data;
> +	struct vfio_device *device = df->device;
>  	int ret;
>  
>  	ret = vfio_device_pm_runtime_get(device);
> @@ -1058,7 +1076,8 @@ static long vfio_device_fops_unl_ioctl(struct file *filep,
>  static ssize_t vfio_device_fops_read(struct file *filep, char __user *buf,
>  				     size_t count, loff_t *ppos)
>  {
> -	struct vfio_device *device = filep->private_data;
> +	struct vfio_device_file *df = filep->private_data;
> +	struct vfio_device *device = df->device;
>  
>  	if (unlikely(!device->ops->read))
>  		return -EINVAL;
> @@ -1070,7 +1089,8 @@ static ssize_t vfio_device_fops_write(struct file *filep,
>  				      const char __user *buf,
>  				      size_t count, loff_t *ppos)
>  {
> -	struct vfio_device *device = filep->private_data;
> +	struct vfio_device_file *df = filep->private_data;
> +	struct vfio_device *device = df->device;
>  
>  	if (unlikely(!device->ops->write))
>  		return -EINVAL;
> @@ -1080,7 +1100,8 @@ static ssize_t vfio_device_fops_write(struct file *filep,
>  
>  static int vfio_device_fops_mmap(struct file *filep, struct vm_area_struct *vma)
>  {
> -	struct vfio_device *device = filep->private_data;
> +	struct vfio_device_file *df = filep->private_data;
> +	struct vfio_device *device = df->device;
>  
>  	if (unlikely(!device->ops->mmap))
>  		return -EINVAL;
Reviewed-by: Eric Auger <eric.auger@redhat.com>

Thanks

Eric


* Re: [PATCH 08/13] vfio: Block device access via device fd until device is opened
  2023-01-18  9:35   ` Tian, Kevin
@ 2023-01-18 13:52     ` Jason Gunthorpe
  2023-01-19  3:42       ` Tian, Kevin
  0 siblings, 1 reply; 80+ messages in thread
From: Jason Gunthorpe @ 2023-01-18 13:52 UTC (permalink / raw)
  To: Tian, Kevin
  Cc: Liu, Yi L, alex.williamson, cohuck, eric.auger, nicolinc, kvm,
	mjrosato, chao.p.peng, yi.y.sun, peterx, jasowang,
	suravee.suthikulpanit

On Wed, Jan 18, 2023 at 09:35:33AM +0000, Tian, Kevin wrote:
> > From: Liu, Yi L <yi.l.liu@intel.com>
> > Sent: Tuesday, January 17, 2023 9:50 PM
> > 
> > Allow the vfio_device file to be in a state where the device FD is
> > opened but the device cannot be used by userspace (i.e. its .open_device()
> > hasn't been called). This inbetween state is not used when the device
> > FD is spawned from the group FD, however when we create the device FD
> > directly by opening a cdev it will be opened in the blocked state.
> > 
> > In the blocked state, currently only the bind operation is allowed,
> > other device accesses are not allowed. Completing bind will allow user
> > to further access the device.
> > 
> > This is implemented by adding a flag in struct vfio_device_file to mark
> > the blocked state and using a simple smp_load_acquire() to obtain the
> > flag value and serialize all the device setup with the thread accessing
> > this device.
> > 
> > Due to this scheme it is not possible to unbind the FD, once it is bound,
> > it remains bound until the FD is closed.
> > 
> 
> My question to the last version was not answered...
> 
> Can you elaborate why it is impossible to unbind? Is it more an
> implementation choice or conceptual restriction?

At least for the implementation it is due to the use of the lockless
test for bind.

It can safely handle unbind->bind but it cannot handle bind->unbind. To
allow this we'd need to add a lock on all the vfio ioctls, which seems
costly.

Jason

* Re: [PATCH 02/13] vfio: Refine vfio file kAPIs
  2023-01-17 13:49 ` [PATCH 02/13] vfio: Refine vfio file kAPIs Yi Liu
  2023-01-18  8:42   ` Tian, Kevin
@ 2023-01-18 14:37   ` Eric Auger
  2023-01-29 13:32     ` Liu, Yi L
  1 sibling, 1 reply; 80+ messages in thread
From: Eric Auger @ 2023-01-18 14:37 UTC (permalink / raw)
  To: Yi Liu, alex.williamson, jgg
  Cc: kevin.tian, cohuck, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.y.sun, peterx, jasowang, suravee.suthikulpanit

Hi Yi,

On 1/17/23 14:49, Yi Liu wrote:
> This prepares for making the below kAPIs to accept both group file
> and device file instead of only vfio group file.
>   bool vfio_file_enforced_coherent(struct file *file);
>   void vfio_file_set_kvm(struct file *file, struct kvm *kvm);
>   bool vfio_file_has_dev(struct file *file, struct vfio_device *device);
>
> Besides above change, vfio_file_is_group() is renamed to be
> vfio_file_is_valid().
>
> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> ---
>  drivers/vfio/group.c             | 74 ++++++++------------------------
>  drivers/vfio/pci/vfio_pci_core.c |  4 +-
>  drivers/vfio/vfio.h              |  4 ++
>  drivers/vfio/vfio_main.c         | 62 ++++++++++++++++++++++++++
>  include/linux/vfio.h             |  2 +-
>  virt/kvm/vfio.c                  | 10 ++---
>  6 files changed, 92 insertions(+), 64 deletions(-)
>
> diff --git a/drivers/vfio/group.c b/drivers/vfio/group.c
> index 8fdb7e35b0a6..d83cf069d290 100644
> --- a/drivers/vfio/group.c
> +++ b/drivers/vfio/group.c
> @@ -721,6 +721,15 @@ bool vfio_device_has_container(struct vfio_device *device)
>  	return device->group->container;
>  }
>  
> +struct vfio_group *vfio_group_from_file(struct file *file)
> +{
> +	struct vfio_group *group = file->private_data;
> +
> +	if (file->f_op != &vfio_group_fops)
> +		return NULL;
> +	return group;
> +}
> +
>  /**
>   * vfio_file_iommu_group - Return the struct iommu_group for the vfio group file
>   * @file: VFIO group file
> @@ -731,13 +740,13 @@ bool vfio_device_has_container(struct vfio_device *device)
>   */
>  struct iommu_group *vfio_file_iommu_group(struct file *file)
>  {
> -	struct vfio_group *group = file->private_data;
> +	struct vfio_group *group = vfio_group_from_file(file);
>  	struct iommu_group *iommu_group = NULL;
>  
>  	if (!IS_ENABLED(CONFIG_SPAPR_TCE_IOMMU))
>  		return NULL;
>  
> -	if (!vfio_file_is_group(file))
> +	if (!group)
>  		return NULL;
>  
>  	mutex_lock(&group->group_lock);
> @@ -750,34 +759,11 @@ struct iommu_group *vfio_file_iommu_group(struct file *file)
>  }
>  EXPORT_SYMBOL_GPL(vfio_file_iommu_group);
>  
> -/**
> - * vfio_file_is_group - True if the file is usable with VFIO aPIS
> - * @file: VFIO group file
> - */
> -bool vfio_file_is_group(struct file *file)
> -{
> -	return file->f_op == &vfio_group_fops;
> -}
> -EXPORT_SYMBOL_GPL(vfio_file_is_group);
> -
> -/**
> - * vfio_file_enforced_coherent - True if the DMA associated with the VFIO file
> - *        is always CPU cache coherent
> - * @file: VFIO group file
> - *
> - * Enforced coherency means that the IOMMU ignores things like the PCIe no-snoop
> - * bit in DMA transactions. A return of false indicates that the user has
> - * rights to access additional instructions such as wbinvd on x86.
> - */
> -bool vfio_file_enforced_coherent(struct file *file)
> +bool vfio_group_enforced_coherent(struct vfio_group *group)
>  {
> -	struct vfio_group *group = file->private_data;
>  	struct vfio_device *device;
>  	bool ret = true;
>  
> -	if (!vfio_file_is_group(file))
> -		return true;
> -
>  	/*
>  	 * If the device does not have IOMMU_CAP_ENFORCE_CACHE_COHERENCY then
>  	 * any domain later attached to it will also not support it. If the cap
> @@ -795,46 +781,22 @@ bool vfio_file_enforced_coherent(struct file *file)
>  	mutex_unlock(&group->device_lock);
>  	return ret;
>  }
> -EXPORT_SYMBOL_GPL(vfio_file_enforced_coherent);
>  
> -/**
> - * vfio_file_set_kvm - Link a kvm with VFIO drivers
> - * @file: VFIO group file
> - * @kvm: KVM to link
> - *
> - * When a VFIO device is first opened the KVM will be available in
> - * device->kvm if one was associated with the group.
> - */
> -void vfio_file_set_kvm(struct file *file, struct kvm *kvm)
> +void vfio_group_set_kvm(struct vfio_group *group, struct kvm *kvm)
>  {
> -	struct vfio_group *group = file->private_data;
> -
> -	if (!vfio_file_is_group(file))
> -		return;
> -
> +	/*
> +	 * When a VFIO device is first opened the KVM will be available in
> +	 * device->kvm if one was associated with the group.
> +	 */
>  	mutex_lock(&group->group_lock);
>  	group->kvm = kvm;
>  	mutex_unlock(&group->group_lock);
>  }
> -EXPORT_SYMBOL_GPL(vfio_file_set_kvm);
>  
> -/**
> - * vfio_file_has_dev - True if the VFIO file is a handle for device
> - * @file: VFIO file to check
> - * @device: Device that must be part of the file
> - *
> - * Returns true if given file has permission to manipulate the given device.
> - */
> -bool vfio_file_has_dev(struct file *file, struct vfio_device *device)
> +bool vfio_group_has_dev(struct vfio_group *group, struct vfio_device *device)
>  {
> -	struct vfio_group *group = file->private_data;
> -
> -	if (!vfio_file_is_group(file))
> -		return false;
> -
>  	return group == device->group;
>  }
> -EXPORT_SYMBOL_GPL(vfio_file_has_dev);
>  
>  static char *vfio_devnode(const struct device *dev, umode_t *mode)
>  {
> diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
> index 26a541cc64d1..985c6184a587 100644
> --- a/drivers/vfio/pci/vfio_pci_core.c
> +++ b/drivers/vfio/pci/vfio_pci_core.c
> @@ -1319,8 +1319,8 @@ static int vfio_pci_ioctl_pci_hot_reset(struct vfio_pci_core_device *vdev,
>  			break;
>  		}
>  
> -		/* Ensure the FD is a vfio group FD.*/
> -		if (!vfio_file_is_group(file)) {
> +		/* Ensure the FD is a vfio FD.*/
> +		if (!vfio_file_is_valid(file)) {
>  			fput(file);
>  			ret = -EINVAL;
>  			break;
> diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
> index 1091806bc89d..ef5de2872983 100644
> --- a/drivers/vfio/vfio.h
> +++ b/drivers/vfio/vfio.h
> @@ -90,6 +90,10 @@ void vfio_device_group_unregister(struct vfio_device *device);
>  int vfio_device_group_use_iommu(struct vfio_device *device);
>  void vfio_device_group_unuse_iommu(struct vfio_device *device);
>  void vfio_device_group_close(struct vfio_device *device);
> +struct vfio_group *vfio_group_from_file(struct file *file);
> +bool vfio_group_enforced_coherent(struct vfio_group *group);
> +void vfio_group_set_kvm(struct vfio_group *group, struct kvm *kvm);
> +bool vfio_group_has_dev(struct vfio_group *group, struct vfio_device *device);
>  bool vfio_device_has_container(struct vfio_device *device);
>  int __init vfio_group_init(void);
>  void vfio_group_cleanup(void);
> diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
> index ee54c9ae0af4..1aedfbd15ca0 100644
> --- a/drivers/vfio/vfio_main.c
> +++ b/drivers/vfio/vfio_main.c
> @@ -1119,6 +1119,68 @@ const struct file_operations vfio_device_fops = {
>  	.mmap		= vfio_device_fops_mmap,
>  };
>  
> +/**
> + * vfio_file_is_valid - True if the file is usable with VFIO aPIS
> + * @file: VFIO group file or VFIO device file
> + */
> +bool vfio_file_is_valid(struct file *file)
> +{
> +	return vfio_group_from_file(file);
is this implicit conversion from ptr to bool always safe?
> +}
> +EXPORT_SYMBOL_GPL(vfio_file_is_valid);
> +
> +/**
> + * vfio_file_enforced_coherent - True if the DMA associated with the VFIO file
> + *        is always CPU cache coherent
> + * @file: VFIO group or device file
> + *
> + * Enforced coherency means that the IOMMU ignores things like the PCIe no-snoop
> + * bit in DMA transactions. A return of false indicates that the user has
> + * rights to access additional instructions such as wbinvd on x86.
> + */
> +bool vfio_file_enforced_coherent(struct file *file)
> +{
> +	struct vfio_group *group = vfio_group_from_file(file);
> +
> +	if (group)
> +		return vfio_group_enforced_coherent(group);
> +
> +	return true;
> +}
> +EXPORT_SYMBOL_GPL(vfio_file_enforced_coherent);
> +
> +/**
> + * vfio_file_set_kvm - Link a kvm with VFIO drivers
> + * @file: VFIO group file or device file
> + * @kvm: KVM to link
> + *
> + */
> +void vfio_file_set_kvm(struct file *file, struct kvm *kvm)
> +{
> +	struct vfio_group *group = vfio_group_from_file(file);
> +
> +	if (group)
> +		vfio_group_set_kvm(group, kvm);
> +}
> +EXPORT_SYMBOL_GPL(vfio_file_set_kvm);
> +
> +/**
> + * vfio_file_has_dev - True if the VFIO file is a handle for device
This original description sounds weird because originally it aimed
at figuring out whether the device belonged to that vfio group fd, no?
And since it will now handle both group fds and device fds it still
sounds weird to me.
> + * @file: VFIO file to check
> + * @device: Device that must be part of the file
> + *
> + * Returns true if given file has permission to manipulate the given device.
> + */
> +bool vfio_file_has_dev(struct file *file, struct vfio_device *device)
> +{
> +	struct vfio_group *group = vfio_group_from_file(file);
> +
> +	if (group)
> +		return vfio_group_has_dev(group, device);
> +	return false;
> +}
> +EXPORT_SYMBOL_GPL(vfio_file_has_dev);
> +
>  /*
>   * Sub-module support
>   */
> diff --git a/include/linux/vfio.h b/include/linux/vfio.h
> index 35be78e9ae57..46edd6e6c0ba 100644
> --- a/include/linux/vfio.h
> +++ b/include/linux/vfio.h
> @@ -241,7 +241,7 @@ int vfio_mig_get_next_state(struct vfio_device *device,
>   * External user API
>   */
>  struct iommu_group *vfio_file_iommu_group(struct file *file);
> -bool vfio_file_is_group(struct file *file);
> +bool vfio_file_is_valid(struct file *file);
>  bool vfio_file_enforced_coherent(struct file *file);
>  void vfio_file_set_kvm(struct file *file, struct kvm *kvm);
>  bool vfio_file_has_dev(struct file *file, struct vfio_device *device);
> diff --git a/virt/kvm/vfio.c b/virt/kvm/vfio.c
> index 495ceabffe88..868930c7a59b 100644
> --- a/virt/kvm/vfio.c
> +++ b/virt/kvm/vfio.c
> @@ -64,18 +64,18 @@ static bool kvm_vfio_file_enforced_coherent(struct file *file)
>  	return ret;
>  }
>  
> -static bool kvm_vfio_file_is_group(struct file *file)
> +static bool kvm_vfio_file_is_valid(struct file *file)
>  {
>  	bool (*fn)(struct file *file);
>  	bool ret;
>  
> -	fn = symbol_get(vfio_file_is_group);
> +	fn = symbol_get(vfio_file_is_valid);
>  	if (!fn)
>  		return false;
>  
>  	ret = fn(file);
>  
> -	symbol_put(vfio_file_is_group);
> +	symbol_put(vfio_file_is_valid);
>  
>  	return ret;
>  }
> @@ -154,8 +154,8 @@ static int kvm_vfio_group_add(struct kvm_device *dev, unsigned int fd)
>  	if (!filp)
>  		return -EBADF;
>  
> -	/* Ensure the FD is a vfio group FD.*/
> -	if (!kvm_vfio_file_is_group(filp)) {
> +	/* Ensure the FD is a vfio FD.*/
> +	if (!kvm_vfio_file_is_valid(filp)) {
>  		ret = -EINVAL;
>  		goto err_fput;
>  	}
Besides

Reviewed-by: Eric Auger <eric.auger@redhat.com>

Thanks

Eric


* Re: [PATCH 03/13] vfio: Accept vfio device file in the driver facing kAPI
  2023-01-17 13:49 ` [PATCH 03/13] vfio: Accept vfio device file in the driver facing kAPI Yi Liu
  2023-01-18  8:45   ` Tian, Kevin
@ 2023-01-18 16:11   ` Eric Auger
  2023-01-30  9:47     ` Liu, Yi L
  1 sibling, 1 reply; 80+ messages in thread
From: Eric Auger @ 2023-01-18 16:11 UTC (permalink / raw)
  To: Yi Liu, alex.williamson, jgg
  Cc: kevin.tian, cohuck, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.y.sun, peterx, jasowang, suravee.suthikulpanit

Hi Yi,

On 1/17/23 14:49, Yi Liu wrote:
> This makes the vfio file kAPIs to accepte vfio device files, also a
> preparation for vfio device cdev support.
>
> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> ---
>  drivers/vfio/vfio.h      |  1 +
>  drivers/vfio/vfio_main.c | 51 ++++++++++++++++++++++++++++++++++++----
>  2 files changed, 48 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
> index ef5de2872983..53af6e3ea214 100644
> --- a/drivers/vfio/vfio.h
> +++ b/drivers/vfio/vfio.h
> @@ -18,6 +18,7 @@ struct vfio_container;
>  
>  struct vfio_device_file {
>  	struct vfio_device *device;
> +	struct kvm *kvm;
>  };
>  
>  void vfio_device_put_registration(struct vfio_device *device);
> diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
> index 1aedfbd15ca0..dc08d5dd62cc 100644
> --- a/drivers/vfio/vfio_main.c
> +++ b/drivers/vfio/vfio_main.c
> @@ -1119,13 +1119,23 @@ const struct file_operations vfio_device_fops = {
>  	.mmap		= vfio_device_fops_mmap,
>  };
>  
> +static struct vfio_device *vfio_device_from_file(struct file *file)
> +{
> +	struct vfio_device_file *df = file->private_data;
> +
> +	if (file->f_op != &vfio_device_fops)
> +		return NULL;
> +	return df->device;
> +}
> +
>  /**
>   * vfio_file_is_valid - True if the file is usable with VFIO aPIS
>   * @file: VFIO group file or VFIO device file
>   */
>  bool vfio_file_is_valid(struct file *file)
>  {
> -	return vfio_group_from_file(file);
> +	return vfio_group_from_file(file) ||
> +	       vfio_device_from_file(file);
>  }
>  EXPORT_SYMBOL_GPL(vfio_file_is_valid);
>  
> @@ -1140,15 +1150,37 @@ EXPORT_SYMBOL_GPL(vfio_file_is_valid);
>   */
>  bool vfio_file_enforced_coherent(struct file *file)
>  {
> -	struct vfio_group *group = vfio_group_from_file(file);
> +	struct vfio_group *group;
> +	struct vfio_device *device;
>  
> +	group = vfio_group_from_file(file);
>  	if (group)
>  		return vfio_group_enforced_coherent(group);
>  
> +	device = vfio_device_from_file(file);
> +	if (device)
> +		return device_iommu_capable(device->dev,
> +					    IOMMU_CAP_ENFORCE_CACHE_COHERENCY);
> +
>  	return true;
>  }
>  EXPORT_SYMBOL_GPL(vfio_file_enforced_coherent);
>  
> +static void vfio_device_file_set_kvm(struct file *file, struct kvm *kvm)
> +{
> +	struct vfio_device_file *df = file->private_data;
> +	struct vfio_device *device = df->device;
> +
> +	/*
> +	 * The kvm is first recorded in the df, and will be propagated
> +	 * to vfio_device::kvm when the file binds iommufd successfully in
> +	 * the vfio device cdev path.
> +	 */
> +	mutex_lock(&device->dev_set->lock);
It is not totally obvious to me why device->dev_set->lock needs to be
held here, and why that lock in particular. Isn't it supposed to protect
the vfio_device_set? The header just mentions:
"the VFIO core will provide a lock that is held around
open_device()/close_device() for all devices in the set."

> +	df->kvm = kvm;
> +	mutex_unlock(&device->dev_set->lock);
> +}
> +
>  /**
>   * vfio_file_set_kvm - Link a kvm with VFIO drivers
>   * @file: VFIO group file or device file
> @@ -1157,10 +1189,14 @@ EXPORT_SYMBOL_GPL(vfio_file_enforced_coherent);
>   */
>  void vfio_file_set_kvm(struct file *file, struct kvm *kvm)
>  {
> -	struct vfio_group *group = vfio_group_from_file(file);
> +	struct vfio_group *group;
>  
> +	group = vfio_group_from_file(file);
>  	if (group)
>  		vfio_group_set_kvm(group, kvm);
> +
> +	if (vfio_device_from_file(file))
> +		vfio_device_file_set_kvm(file, kvm);
>  }
>  EXPORT_SYMBOL_GPL(vfio_file_set_kvm);
>  
> @@ -1173,10 +1209,17 @@ EXPORT_SYMBOL_GPL(vfio_file_set_kvm);
>   */
>  bool vfio_file_has_dev(struct file *file, struct vfio_device *device)
>  {
> -	struct vfio_group *group = vfio_group_from_file(file);
> +	struct vfio_group *group;
> +	struct vfio_device *vdev;
>  
> +	group = vfio_group_from_file(file);
>  	if (group)
>  		return vfio_group_has_dev(group, device);
> +
> +	vdev = vfio_device_from_file(file);
> +	if (vdev)
> +		return vdev == device;
> +
>  	return false;
>  }
>  EXPORT_SYMBOL_GPL(vfio_file_has_dev);
Thanks

Eric


^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 04/13] kvm/vfio: Rename kvm_vfio_group to prepare for accepting vfio device fd
  2023-01-17 13:49 ` [PATCH 04/13] kvm/vfio: Rename kvm_vfio_group to prepare for accepting vfio device fd Yi Liu
  2023-01-18  8:47   ` Tian, Kevin
@ 2023-01-18 16:33   ` Eric Auger
  1 sibling, 0 replies; 80+ messages in thread
From: Eric Auger @ 2023-01-18 16:33 UTC (permalink / raw)
  To: Yi Liu, alex.williamson, jgg
  Cc: kevin.tian, cohuck, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.y.sun, peterx, jasowang, suravee.suthikulpanit



On 1/17/23 14:49, Yi Liu wrote:
> Meanwhile, rename related helpers. No functional change is intended.
>
> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> ---
>  virt/kvm/vfio.c | 115 ++++++++++++++++++++++++------------------------
>  1 file changed, 58 insertions(+), 57 deletions(-)
>
> diff --git a/virt/kvm/vfio.c b/virt/kvm/vfio.c
> index 868930c7a59b..0f54b9d308d7 100644
> --- a/virt/kvm/vfio.c
> +++ b/virt/kvm/vfio.c
> @@ -21,7 +21,7 @@
>  #include <asm/kvm_ppc.h>
>  #endif
>  
> -struct kvm_vfio_group {
> +struct kvm_vfio_file {
>  	struct list_head node;
>  	struct file *file;
>  #ifdef CONFIG_SPAPR_TCE_IOMMU
> @@ -30,7 +30,7 @@ struct kvm_vfio_group {
>  };
>  
>  struct kvm_vfio {
> -	struct list_head group_list;
> +	struct list_head file_list;
>  	struct mutex lock;
>  	bool noncoherent;
>  };
> @@ -98,34 +98,35 @@ static struct iommu_group *kvm_vfio_file_iommu_group(struct file *file)
>  }
>  
>  static void kvm_spapr_tce_release_vfio_group(struct kvm *kvm,
> -					     struct kvm_vfio_group *kvg)
> +					     struct kvm_vfio_file *kvf)
>  {
> -	if (WARN_ON_ONCE(!kvg->iommu_group))
> +	if (WARN_ON_ONCE(!kvf->iommu_group))
>  		return;
>  
> -	kvm_spapr_tce_release_iommu_group(kvm, kvg->iommu_group);
> -	iommu_group_put(kvg->iommu_group);
> -	kvg->iommu_group = NULL;
> +	kvm_spapr_tce_release_iommu_group(kvm, kvf->iommu_group);
> +	iommu_group_put(kvf->iommu_group);
> +	kvf->iommu_group = NULL;
>  }
>  #endif
>  
>  /*
> - * Groups can use the same or different IOMMU domains.  If the same then
> - * adding a new group may change the coherency of groups we've previously
> - * been told about.  We don't want to care about any of that so we retest
> - * each group and bail as soon as we find one that's noncoherent.  This
> - * means we only ever [un]register_noncoherent_dma once for the whole device.
> + * Groups/devices can use the same or different IOMMU domains.  If the same
> + * then adding a new group/device may change the coherency of groups/devices
> + * we've previously been told about.  We don't want to care about any of
> + * that so we retest each group/device and bail as soon as we find one that's
> + * noncoherent.  This means we only ever [un]register_noncoherent_dma once
> + * for the whole device.
>   */
>  static void kvm_vfio_update_coherency(struct kvm_device *dev)
>  {
>  	struct kvm_vfio *kv = dev->private;
>  	bool noncoherent = false;
> -	struct kvm_vfio_group *kvg;
> +	struct kvm_vfio_file *kvf;
>  
>  	mutex_lock(&kv->lock);
>  
> -	list_for_each_entry(kvg, &kv->group_list, node) {
> -		if (!kvm_vfio_file_enforced_coherent(kvg->file)) {
> +	list_for_each_entry(kvf, &kv->file_list, node) {
> +		if (!kvm_vfio_file_enforced_coherent(kvf->file)) {
>  			noncoherent = true;
>  			break;
>  		}
> @@ -143,10 +144,10 @@ static void kvm_vfio_update_coherency(struct kvm_device *dev)
>  	mutex_unlock(&kv->lock);
>  }
>  
> -static int kvm_vfio_group_add(struct kvm_device *dev, unsigned int fd)
> +static int kvm_vfio_file_add(struct kvm_device *dev, unsigned int fd)
>  {
>  	struct kvm_vfio *kv = dev->private;
> -	struct kvm_vfio_group *kvg;
> +	struct kvm_vfio_file *kvf;
>  	struct file *filp;
>  	int ret;
>  
> @@ -162,27 +163,27 @@ static int kvm_vfio_group_add(struct kvm_device *dev, unsigned int fd)
>  
>  	mutex_lock(&kv->lock);
>  
> -	list_for_each_entry(kvg, &kv->group_list, node) {
> -		if (kvg->file == filp) {
> +	list_for_each_entry(kvf, &kv->file_list, node) {
> +		if (kvf->file == filp) {
>  			ret = -EEXIST;
>  			goto err_unlock;
>  		}
>  	}
>  
> -	kvg = kzalloc(sizeof(*kvg), GFP_KERNEL_ACCOUNT);
> -	if (!kvg) {
> +	kvf = kzalloc(sizeof(*kvf), GFP_KERNEL_ACCOUNT);
> +	if (!kvf) {
>  		ret = -ENOMEM;
>  		goto err_unlock;
>  	}
>  
> -	kvg->file = filp;
> -	list_add_tail(&kvg->node, &kv->group_list);
> +	kvf->file = filp;
> +	list_add_tail(&kvf->node, &kv->file_list);
>  
>  	kvm_arch_start_assignment(dev->kvm);
>  
>  	mutex_unlock(&kv->lock);
>  
> -	kvm_vfio_file_set_kvm(kvg->file, dev->kvm);
> +	kvm_vfio_file_set_kvm(kvf->file, dev->kvm);
>  	kvm_vfio_update_coherency(dev);
>  
>  	return 0;
> @@ -193,10 +194,10 @@ static int kvm_vfio_group_add(struct kvm_device *dev, unsigned int fd)
>  	return ret;
>  }
>  
> -static int kvm_vfio_group_del(struct kvm_device *dev, unsigned int fd)
> +static int kvm_vfio_file_del(struct kvm_device *dev, unsigned int fd)
>  {
>  	struct kvm_vfio *kv = dev->private;
> -	struct kvm_vfio_group *kvg;
> +	struct kvm_vfio_file *kvf;
>  	struct fd f;
>  	int ret;
>  
> @@ -208,18 +209,18 @@ static int kvm_vfio_group_del(struct kvm_device *dev, unsigned int fd)
>  
>  	mutex_lock(&kv->lock);
>  
> -	list_for_each_entry(kvg, &kv->group_list, node) {
> -		if (kvg->file != f.file)
> +	list_for_each_entry(kvf, &kv->file_list, node) {
> +		if (kvf->file != f.file)
>  			continue;
>  
> -		list_del(&kvg->node);
> +		list_del(&kvf->node);
>  		kvm_arch_end_assignment(dev->kvm);
>  #ifdef CONFIG_SPAPR_TCE_IOMMU
> -		kvm_spapr_tce_release_vfio_group(dev->kvm, kvg);
> +		kvm_spapr_tce_release_vfio_group(dev->kvm, kvf);
>  #endif
> -		kvm_vfio_file_set_kvm(kvg->file, NULL);
> -		fput(kvg->file);
> -		kfree(kvg);
> +		kvm_vfio_file_set_kvm(kvf->file, NULL);
> +		fput(kvf->file);
> +		kfree(kvf);
>  		ret = 0;
>  		break;
>  	}
> @@ -234,12 +235,12 @@ static int kvm_vfio_group_del(struct kvm_device *dev, unsigned int fd)
>  }
>  
>  #ifdef CONFIG_SPAPR_TCE_IOMMU
> -static int kvm_vfio_group_set_spapr_tce(struct kvm_device *dev,
> -					void __user *arg)
> +static int kvm_vfio_file_set_spapr_tce(struct kvm_device *dev,
> +				       void __user *arg)
>  {
>  	struct kvm_vfio_spapr_tce param;
>  	struct kvm_vfio *kv = dev->private;
> -	struct kvm_vfio_group *kvg;
> +	struct kvm_vfio_file *kvf;
>  	struct fd f;
>  	int ret;
>  
> @@ -254,20 +255,20 @@ static int kvm_vfio_group_set_spapr_tce(struct kvm_device *dev,
>  
>  	mutex_lock(&kv->lock);
>  
> -	list_for_each_entry(kvg, &kv->group_list, node) {
> -		if (kvg->file != f.file)
> +	list_for_each_entry(kvf, &kv->file_list, node) {
> +		if (kvf->file != f.file)
>  			continue;
>  
> -		if (!kvg->iommu_group) {
> -			kvg->iommu_group = kvm_vfio_file_iommu_group(kvg->file);
> -			if (WARN_ON_ONCE(!kvg->iommu_group)) {
> +		if (!kvf->iommu_group) {
> +			kvf->iommu_group = kvm_vfio_file_iommu_group(kvf->file);
> +			if (WARN_ON_ONCE(!kvf->iommu_group)) {
>  				ret = -EIO;
>  				goto err_fdput;
>  			}
>  		}
>  
>  		ret = kvm_spapr_tce_attach_iommu_group(dev->kvm, param.tablefd,
> -						       kvg->iommu_group);
> +						       kvf->iommu_group);
>  		break;
>  	}
>  
> @@ -278,8 +279,8 @@ static int kvm_vfio_group_set_spapr_tce(struct kvm_device *dev,
>  }
>  #endif
>  
> -static int kvm_vfio_set_group(struct kvm_device *dev, long attr,
> -			      void __user *arg)
> +static int kvm_vfio_set_file(struct kvm_device *dev, long attr,
> +			     void __user *arg)
>  {
>  	int32_t __user *argp = arg;
>  	int32_t fd;
> @@ -288,16 +289,16 @@ static int kvm_vfio_set_group(struct kvm_device *dev, long attr,
>  	case KVM_DEV_VFIO_GROUP_ADD:
>  		if (get_user(fd, argp))
>  			return -EFAULT;
> -		return kvm_vfio_group_add(dev, fd);
> +		return kvm_vfio_file_add(dev, fd);
>  
>  	case KVM_DEV_VFIO_GROUP_DEL:
>  		if (get_user(fd, argp))
>  			return -EFAULT;
> -		return kvm_vfio_group_del(dev, fd);
> +		return kvm_vfio_file_del(dev, fd);
>  
>  #ifdef CONFIG_SPAPR_TCE_IOMMU
>  	case KVM_DEV_VFIO_GROUP_SET_SPAPR_TCE:
> -		return kvm_vfio_group_set_spapr_tce(dev, arg);
> +		return kvm_vfio_file_set_spapr_tce(dev, arg);
>  #endif
>  	}
>  
> @@ -309,8 +310,8 @@ static int kvm_vfio_set_attr(struct kvm_device *dev,
>  {
>  	switch (attr->group) {
>  	case KVM_DEV_VFIO_GROUP:
> -		return kvm_vfio_set_group(dev, attr->attr,
> -					  u64_to_user_ptr(attr->addr));
> +		return kvm_vfio_set_file(dev, attr->attr,
> +					 u64_to_user_ptr(attr->addr));
>  	}
>  
>  	return -ENXIO;
> @@ -339,16 +340,16 @@ static int kvm_vfio_has_attr(struct kvm_device *dev,
>  static void kvm_vfio_destroy(struct kvm_device *dev)
>  {
>  	struct kvm_vfio *kv = dev->private;
> -	struct kvm_vfio_group *kvg, *tmp;
> +	struct kvm_vfio_file *kvf, *tmp;
>  
> -	list_for_each_entry_safe(kvg, tmp, &kv->group_list, node) {
> +	list_for_each_entry_safe(kvf, tmp, &kv->file_list, node) {
>  #ifdef CONFIG_SPAPR_TCE_IOMMU
> -		kvm_spapr_tce_release_vfio_group(dev->kvm, kvg);
> +		kvm_spapr_tce_release_vfio_group(dev->kvm, kvf);
>  #endif
> -		kvm_vfio_file_set_kvm(kvg->file, NULL);
> -		fput(kvg->file);
> -		list_del(&kvg->node);
> -		kfree(kvg);
> +		kvm_vfio_file_set_kvm(kvf->file, NULL);
> +		fput(kvf->file);
> +		list_del(&kvf->node);
> +		kfree(kvf);
>  		kvm_arch_end_assignment(dev->kvm);
>  	}
>  
> @@ -382,7 +383,7 @@ static int kvm_vfio_create(struct kvm_device *dev, u32 type)
>  	if (!kv)
>  		return -ENOMEM;
>  
> -	INIT_LIST_HEAD(&kv->group_list);
> +	INIT_LIST_HEAD(&kv->file_list);
>  	mutex_init(&kv->lock);
>  
>  	dev->private = kv;
Reviewed-by: Eric Auger <eric.auger@redhat.com>

Eric



* RE: [PATCH 08/13] vfio: Block device access via device fd until device is opened
  2023-01-18 13:52     ` Jason Gunthorpe
@ 2023-01-19  3:42       ` Tian, Kevin
  2023-01-19  3:43         ` Liu, Yi L
  0 siblings, 1 reply; 80+ messages in thread
From: Tian, Kevin @ 2023-01-19  3:42 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Liu, Yi L, alex.williamson, cohuck, eric.auger, nicolinc, kvm,
	mjrosato, chao.p.peng, yi.y.sun, peterx, jasowang,
	suravee.suthikulpanit

> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Wednesday, January 18, 2023 9:52 PM
> 
> On Wed, Jan 18, 2023 at 09:35:33AM +0000, Tian, Kevin wrote:
> > > From: Liu, Yi L <yi.l.liu@intel.com>
> > > Sent: Tuesday, January 17, 2023 9:50 PM
> > >
> > > Allow the vfio_device file to be in a state where the device FD is
> > > opened but the device cannot be used by userspace (i.e. its .open_device()
> > > hasn't been called). This inbetween state is not used when the device
> > > FD is spawned from the group FD, however when we create the device FD
> > > directly by opening a cdev it will be opened in the blocked state.
> > >
> > > In the blocked state, currently only the bind operation is allowed,
> > > other device accesses are not allowed. Completing bind will allow user
> > > to further access the device.
> > >
> > > This is implemented by adding a flag in struct vfio_device_file to mark
> > > the blocked state and using a simple smp_load_acquire() to obtain the
> > > flag value and serialize all the device setup with the thread accessing
> > > this device.
> > >
> > > Due to this scheme it is not possible to unbind the FD, once it is bound,
> > > it remains bound until the FD is closed.
> > >
> >
> > My question to the last version was not answered...
> >
> > Can you elaborate why it is impossible to unbind? Is it more an
> > implementation choice or conceptual restriction?
> 
> At least for the implementation it is due to the use of the lockless
> test for bind.
> 
> It can safely handle unbind->bind but it cannot handle
> bind->unbind. To allows this we'd need to add a lock on all the vfio
> ioctls which seems costly.
> 

OK, it makes sense. Yi, can you add this message in next version?
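The lockless scheme Jason describes can be modeled in userspace. Below is a minimal sketch in which C11 atomics stand in for the kernel's smp_load_acquire()/smp_store_release(); the structure and function names are illustrative, not the actual series code:

```c
/* Userspace model of the access-blocking scheme: the device FD starts
 * "blocked", ioctls test a flag locklessly with acquire semantics, and
 * bind publishes the flag with release semantics once setup is done. */
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

struct device_file {
	atomic_bool access_granted;	/* false: device FD is blocked */
};

/* Every ioctl path tests the flag without taking a lock. */
static int device_ioctl(struct device_file *df)
{
	if (!atomic_load_explicit(&df->access_granted, memory_order_acquire))
		return -EINVAL;	/* blocked until bound */
	return 0;		/* bind-side setup stores are now visible */
}

/* Bind publishes the flag after setup.  The scheme is one-way: once a
 * reader may observe the flag set without holding a lock, clearing it
 * again (unbind) cannot be serialized against in-flight ioctls without
 * adding a lock to every ioctl -- hence bind is permanent until close. */
static void device_bind(struct device_file *df)
{
	/* ... iommufd binding setup happens before this store ... */
	atomic_store_explicit(&df->access_granted, true, memory_order_release);
}
```

The unbind->bind direction is safe because a reader that misses the flag simply fails with -EINVAL; the bind->unbind direction is not, which is the restriction discussed above.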


* RE: [PATCH 08/13] vfio: Block device access via device fd until device is opened
  2023-01-19  3:42       ` Tian, Kevin
@ 2023-01-19  3:43         ` Liu, Yi L
  0 siblings, 0 replies; 80+ messages in thread
From: Liu, Yi L @ 2023-01-19  3:43 UTC (permalink / raw)
  To: Tian, Kevin, Jason Gunthorpe
  Cc: alex.williamson, cohuck, eric.auger, nicolinc, kvm, mjrosato,
	chao.p.peng, yi.y.sun, peterx, jasowang, suravee.suthikulpanit

> From: Tian, Kevin <kevin.tian@intel.com>
> Sent: Thursday, January 19, 2023 11:42 AM

> > > My question to the last version was not answered...
> > >
> > > Can you elaborate why it is impossible to unbind? Is it more an
> > > implementation choice or conceptual restriction?
> >
> > At least for the implementation it is due to the use of the lockless
> > test for bind.
> >
> > It can safely handle unbind->bind but it cannot handle
> > bind->unbind. To allows this we'd need to add a lock on all the vfio
> > ioctls which seems costly.
> >
> 
> OK, it makes sense. Yi, can you add this message in next version?
 
Yeah. 😊 

Regards,
Yi Liu 


* Re: [PATCH 05/13] kvm/vfio: Provide struct kvm_device_ops::release() instead of ::destroy()
  2023-01-17 13:49 ` [PATCH 05/13] kvm/vfio: Provide struct kvm_device_ops::release() instead of ::destroy() Yi Liu
  2023-01-18  8:56   ` Tian, Kevin
@ 2023-01-19  9:12   ` Eric Auger
  2023-01-19  9:30     ` Tian, Kevin
  2023-01-19 19:07   ` Jason Gunthorpe
  2 siblings, 1 reply; 80+ messages in thread
From: Eric Auger @ 2023-01-19  9:12 UTC (permalink / raw)
  To: Yi Liu, alex.williamson, jgg
  Cc: kevin.tian, cohuck, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.y.sun, peterx, jasowang, suravee.suthikulpanit

Hi Yi,

On 1/17/23 14:49, Yi Liu wrote:
> This is to avoid a circular refcount problem between the kvm struct and
> the device file. KVM modules holds device/group file reference when the
> device/group is added and releases it per removal or the last kvm reference
> is released. This reference model is ok for the group since there is no
> kvm reference in the group paths.
>
> But it is a problem for device file since the vfio devices may get kvm
> reference in the device open path and put it in the device file release.
> e.g. Intel kvmgt. This would result in a circular issue since the kvm
> side won't put the device file reference if kvm reference is not 0, while
> the vfio device side needs to put kvm reference in the release callback.
>
> To solve this problem for device file, let vfio provide release() which
> would be called once kvm file is closed, it won't depend on the last kvm
> reference. Hence avoid circular refcount problem.
>
> Suggested-by: Kevin Tian <kevin.tian@intel.com>
> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> ---
>  virt/kvm/vfio.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/virt/kvm/vfio.c b/virt/kvm/vfio.c
> index 0f54b9d308d7..525efe37ab6d 100644
> --- a/virt/kvm/vfio.c
> +++ b/virt/kvm/vfio.c
> @@ -364,7 +364,7 @@ static int kvm_vfio_create(struct kvm_device *dev, u32 type);
>  static struct kvm_device_ops kvm_vfio_ops = {
>  	.name = "kvm-vfio",
>  	.create = kvm_vfio_create,
> -	.destroy = kvm_vfio_destroy,
Is it safe to simply remove the destroy cb as it is called from
kvm_destroy_vm/kvm_destroy_devices?

Thanks

Eric
> +	.release = kvm_vfio_destroy,
>  	.set_attr = kvm_vfio_set_attr,
>  	.has_attr = kvm_vfio_has_attr,
>  };



* RE: [PATCH 05/13] kvm/vfio: Provide struct kvm_device_ops::release() instead of ::destroy()
  2023-01-19  9:12   ` Eric Auger
@ 2023-01-19  9:30     ` Tian, Kevin
  2023-01-20  3:52       ` Liu, Yi L
  0 siblings, 1 reply; 80+ messages in thread
From: Tian, Kevin @ 2023-01-19  9:30 UTC (permalink / raw)
  To: eric.auger, Liu, Yi L, alex.williamson, jgg
  Cc: cohuck, nicolinc, kvm, mjrosato, chao.p.peng, yi.y.sun, peterx,
	jasowang, suravee.suthikulpanit

> From: Eric Auger <eric.auger@redhat.com>
> Sent: Thursday, January 19, 2023 5:13 PM
> 
> Hi Yi,
> 
> On 1/17/23 14:49, Yi Liu wrote:
> > This is to avoid a circular refcount problem between the kvm struct and
> > the device file. KVM modules holds device/group file reference when the
> > device/group is added and releases it per removal or the last kvm reference
> > is released. This reference model is ok for the group since there is no
> > kvm reference in the group paths.
> >
> > But it is a problem for device file since the vfio devices may get kvm
> > reference in the device open path and put it in the device file release.
> > e.g. Intel kvmgt. This would result in a circular issue since the kvm
> > side won't put the device file reference if kvm reference is not 0, while
> > the vfio device side needs to put kvm reference in the release callback.
> >
> > To solve this problem for device file, let vfio provide release() which
> > would be called once kvm file is closed, it won't depend on the last kvm
> > reference. Hence avoid circular refcount problem.
> >
> > Suggested-by: Kevin Tian <kevin.tian@intel.com>
> > Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> > ---
> >  virt/kvm/vfio.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/virt/kvm/vfio.c b/virt/kvm/vfio.c
> > index 0f54b9d308d7..525efe37ab6d 100644
> > --- a/virt/kvm/vfio.c
> > +++ b/virt/kvm/vfio.c
> > @@ -364,7 +364,7 @@ static int kvm_vfio_create(struct kvm_device *dev,
> u32 type);
> >  static struct kvm_device_ops kvm_vfio_ops = {
> >  	.name = "kvm-vfio",
> >  	.create = kvm_vfio_create,
> > -	.destroy = kvm_vfio_destroy,
> Is it safe to simply remove the destroy cb as it is called from
> kvm_destroy_vm/kvm_destroy_devices?
> 

According to the definition .release is considered as an alternative
method to free the device:

	/*
	 * Destroy is responsible for freeing dev.
	 *
	 * Destroy may be called before or after destructors are called
	 * on emulated I/O regions, depending on whether a reference is
	 * held by a vcpu or other kvm component that gets destroyed
	 * after the emulated I/O.
	 */
	void (*destroy)(struct kvm_device *dev);

	/*
	 * Release is an alternative method to free the device. It is
	 * called when the device file descriptor is closed. Once
	 * release is called, the destroy method will not be called
	 * anymore as the device is removed from the device list of
	 * the VM. kvm->lock is held.
	 */
	void (*release)(struct kvm_device *dev);

Did you see any specific problem of moving this stuff to release?
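The cycle Yi's commit message describes can be modeled in a few lines. This is a deliberately abstract sketch; the structures and helpers below are invented for illustration and are not kernel code:

```c
/* Abstract model of the refcount cycle.  Start state: KVM holds one
 * reference on the device file, and the device file holds one
 * reference on the kvm struct. */
#include <assert.h>

struct refs {
	int kvm;	/* references on the kvm struct */
	int file;	/* references on the vfio device file */
};

/* ::destroy() model: KVM puts its device file reference only when the
 * last kvm reference drops -- which never happens, because the device
 * file itself still holds a kvm reference. */
static void kvm_put_with_destroy(struct refs *r)
{
	if (r->kvm == 0)
		r->file--;	/* unreachable: the cycle keeps kvm > 0 */
}

/* ::release() model: KVM puts its device file reference when the kvm
 * *fd* is closed, independent of the remaining kvm refcount; the file
 * release path can then drop its kvm reference, breaking the cycle. */
static void kvm_fd_close_with_release(struct refs *r)
{
	r->file--;		/* KVM drops the device file reference */
	if (r->file == 0)
		r->kvm--;	/* file release drops its kvm reference */
}
```

With .destroy() neither count can ever reach zero; driving the file-reference drop from the kvm fd close (.release()) is what breaks the cycle.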



* Re: [PATCH 06/13] kvm/vfio: Accept vfio device file from userspace
  2023-01-17 13:49 ` [PATCH 06/13] kvm/vfio: Accept vfio device file from userspace Yi Liu
  2023-01-18  9:18   ` Tian, Kevin
@ 2023-01-19  9:35   ` Eric Auger
  2023-01-30  7:36     ` Liu, Yi L
  1 sibling, 1 reply; 80+ messages in thread
From: Eric Auger @ 2023-01-19  9:35 UTC (permalink / raw)
  To: Yi Liu, alex.williamson, jgg
  Cc: kevin.tian, cohuck, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.y.sun, peterx, jasowang, suravee.suthikulpanit

Hi Yi,

On 1/17/23 14:49, Yi Liu wrote:
> This defines KVM_DEV_VFIO_FILE* and makes KVM_DEV_VFIO_GROUP* aliases
> of it. Old userspace that uses KVM_DEV_VFIO_GROUP* still works.
>
> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> ---
>  Documentation/virt/kvm/devices/vfio.rst | 32 ++++++++++++-------------
>  include/uapi/linux/kvm.h                | 23 +++++++++++++-----
>  virt/kvm/vfio.c                         | 18 +++++++-------
>  3 files changed, 42 insertions(+), 31 deletions(-)
>
> diff --git a/Documentation/virt/kvm/devices/vfio.rst b/Documentation/virt/kvm/devices/vfio.rst
> index 2d20dc561069..ac4300ded398 100644
> --- a/Documentation/virt/kvm/devices/vfio.rst
> +++ b/Documentation/virt/kvm/devices/vfio.rst
> @@ -9,23 +9,23 @@ Device types supported:
>    - KVM_DEV_TYPE_VFIO
>  
>  Only one VFIO instance may be created per VM.  The created device
> -tracks VFIO groups in use by the VM and features of those groups
> -important to the correctness and acceleration of the VM.  As groups
> -are enabled and disabled for use by the VM, KVM should be updated
> -about their presence.  When registered with KVM, a reference to the
> -VFIO-group is held by KVM.
> -
> -Groups:
> -  KVM_DEV_VFIO_GROUP
> -
> -KVM_DEV_VFIO_GROUP attributes:
> -  KVM_DEV_VFIO_GROUP_ADD: Add a VFIO group to VFIO-KVM device tracking
> -	kvm_device_attr.addr points to an int32_t file descriptor
> +tracks VFIO files (group or device) in use by the VM and features
> +of those groups/devices important to the correctness and acceleration
> +of the VM.  As groups/device are enabled and disabled for use by the
> +VM, KVM should be updated about their presence.  When registered with
> +KVM, a reference to the VFIO file is held by KVM.
> +
> +VFIO Files:
> +  KVM_DEV_VFIO_FILE
> +
> +KVM_DEV_VFIO_FILE attributes:
> +  KVM_DEV_VFIO_FILE_ADD: Add a VFIO file (group/device) to VFIO-KVM device
> +	tracking kvm_device_attr.addr points to an int32_t file descriptor
> +	for the VFIO file.
> +  KVM_DEV_VFIO_FILE_DEL: Remove a VFIO file (group/device) from VFIO-KVM device
> +	tracking kvm_device_attr.addr points to an int32_t file descriptor
>  	for the VFIO group.
> -  KVM_DEV_VFIO_GROUP_DEL: Remove a VFIO group from VFIO-KVM device tracking
> -	kvm_device_attr.addr points to an int32_t file descriptor
> -	for the VFIO group.
> -  KVM_DEV_VFIO_GROUP_SET_SPAPR_TCE: attaches a guest visible TCE table
> +  KVM_DEV_VFIO_FILE_SET_SPAPR_TCE: attaches a guest visible TCE table
>  	allocated by sPAPR KVM.
>  	kvm_device_attr.addr points to a struct::
>  
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index 55155e262646..ad36e144a41d 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -1396,15 +1396,26 @@ struct kvm_create_device {
>  
>  struct kvm_device_attr {
>  	__u32	flags;		/* no flags currently defined */
> -	__u32	group;		/* device-defined */
> -	__u64	attr;		/* group-defined */
> +	union {
> +		__u32	group;
> +		__u32	file;
> +	}; /* device-defined */
> +	__u64	attr;		/* VFIO-file-defined or group-defined */
I think there is a confusion here between the 'VFIO group' terminology
and the 'kvm device group' terminology. Commands for kvm devices are
gathered in groups and within groups you have sub-commands called
attributes.

See Documentation/virt/kvm/devices/arm-vgic-v3.rst for instance. So to
me this shall be left unchanged.
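For readers less familiar with the interface Eric refers to: kvm_device_attr.group selects a command group of the kvm device, and .attr a sub-command within it. A self-contained sketch follows; the struct layout and constant values are copied from the hunks quoted above so it compiles standalone, and vfio_file_add_attr() is an invented helper:

```c
/* Sketch of how userspace fills kvm_device_attr for the kvm-vfio
 * device; note that .group here is a kvm-device command group, not a
 * VFIO group id. */
#include <assert.h>
#include <stdint.h>

struct kvm_device_attr {
	uint32_t flags;		/* no flags currently defined */
	uint32_t group;		/* device-defined: selects a command group */
	uint64_t attr;		/* group-defined: selects a sub-command */
	uint64_t addr;		/* userspace address of attr data */
};

#define KVM_DEV_VFIO_GROUP	1	/* command group of the kvm-vfio device */
#define KVM_DEV_VFIO_GROUP_ADD	1	/* sub-command within that group */

/* Build the attr a user would pass to KVM_SET_DEVICE_ATTR to make the
 * kvm-vfio device track a VFIO file descriptor. */
static struct kvm_device_attr vfio_file_add_attr(int32_t *vfio_fdp)
{
	struct kvm_device_attr a = {
		.group = KVM_DEV_VFIO_GROUP,	/* NOT a VFIO group id */
		.attr = KVM_DEV_VFIO_GROUP_ADD,
		.addr = (uint64_t)(uintptr_t)vfio_fdp,
	};
	return a;
}
```

Userspace passes the filled struct to ioctl(kvm_vfio_dev_fd, KVM_SET_DEVICE_ATTR, &attr); since the .group field already has this kvm-device meaning, renaming it risks conflating the two terminologies, which is Eric's point.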
>  	__u64	addr;		/* userspace address of attr data */
>  };
>  
> -#define  KVM_DEV_VFIO_GROUP			1
> -#define   KVM_DEV_VFIO_GROUP_ADD			1
> -#define   KVM_DEV_VFIO_GROUP_DEL			2
> -#define   KVM_DEV_VFIO_GROUP_SET_SPAPR_TCE		3
> +#define  KVM_DEV_VFIO_FILE	1
> +
> +#define   KVM_DEV_VFIO_FILE_ADD			1
> +#define   KVM_DEV_VFIO_FILE_DEL			2
> +#define   KVM_DEV_VFIO_FILE_SET_SPAPR_TCE	3
> +
> +/* Group aliases are for compile time uapi compatibility */
> +#define  KVM_DEV_VFIO_GROUP	KVM_DEV_VFIO_FILE
> +
> +#define   KVM_DEV_VFIO_GROUP_ADD	KVM_DEV_VFIO_FILE_ADD
> +#define   KVM_DEV_VFIO_GROUP_DEL	KVM_DEV_VFIO_FILE_DEL
> +#define   KVM_DEV_VFIO_GROUP_SET_SPAPR_TCE	KVM_DEV_VFIO_FILE_SET_SPAPR_TCE
>  
>  enum kvm_device_type {
>  	KVM_DEV_TYPE_FSL_MPIC_20	= 1,
> diff --git a/virt/kvm/vfio.c b/virt/kvm/vfio.c
> index 525efe37ab6d..e73ca60af3ae 100644
> --- a/virt/kvm/vfio.c
> +++ b/virt/kvm/vfio.c
> @@ -286,18 +286,18 @@ static int kvm_vfio_set_file(struct kvm_device *dev, long attr,
>  	int32_t fd;
>  
>  	switch (attr) {
> -	case KVM_DEV_VFIO_GROUP_ADD:
> +	case KVM_DEV_VFIO_FILE_ADD:
>  		if (get_user(fd, argp))
>  			return -EFAULT;
>  		return kvm_vfio_file_add(dev, fd);
>  
> -	case KVM_DEV_VFIO_GROUP_DEL:
> +	case KVM_DEV_VFIO_FILE_DEL:
>  		if (get_user(fd, argp))
>  			return -EFAULT;
>  		return kvm_vfio_file_del(dev, fd);
>  
>  #ifdef CONFIG_SPAPR_TCE_IOMMU
> -	case KVM_DEV_VFIO_GROUP_SET_SPAPR_TCE:
> +	case KVM_DEV_VFIO_FILE_SET_SPAPR_TCE:
>  		return kvm_vfio_file_set_spapr_tce(dev, arg);
>  #endif
>  	}
> @@ -309,7 +309,7 @@ static int kvm_vfio_set_attr(struct kvm_device *dev,
>  			     struct kvm_device_attr *attr)
>  {
>  	switch (attr->group) {
> -	case KVM_DEV_VFIO_GROUP:
> +	case KVM_DEV_VFIO_FILE:
>  		return kvm_vfio_set_file(dev, attr->attr,
>  					 u64_to_user_ptr(attr->addr));
>  	}
> @@ -320,13 +320,13 @@ static int kvm_vfio_set_attr(struct kvm_device *dev,
>  static int kvm_vfio_has_attr(struct kvm_device *dev,
>  			     struct kvm_device_attr *attr)
>  {
> -	switch (attr->group) {
> -	case KVM_DEV_VFIO_GROUP:
> +	switch (attr->file) {
> +	case KVM_DEV_VFIO_FILE:
>  		switch (attr->attr) {
> -		case KVM_DEV_VFIO_GROUP_ADD:
> -		case KVM_DEV_VFIO_GROUP_DEL:
> +		case KVM_DEV_VFIO_FILE_ADD:
> +		case KVM_DEV_VFIO_FILE_DEL:
>  #ifdef CONFIG_SPAPR_TCE_IOMMU
> -		case KVM_DEV_VFIO_GROUP_SET_SPAPR_TCE:
> +		case KVM_DEV_VFIO_FILE_SET_SPAPR_TCE:
>  #endif
>  			return 0;
>  		}
Thanks

Eric



* RE: [PATCH 09/13] vfio: Add infrastructure for bind_iommufd and attach
  2023-01-17 13:49 ` [PATCH 09/13] vfio: Add infrastructure for bind_iommufd and attach Yi Liu
@ 2023-01-19  9:45   ` Tian, Kevin
  2023-01-30 13:52     ` Liu, Yi L
  2023-01-19 23:05   ` Alex Williamson
  1 sibling, 1 reply; 80+ messages in thread
From: Tian, Kevin @ 2023-01-19  9:45 UTC (permalink / raw)
  To: Liu, Yi L, alex.williamson, jgg
  Cc: cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.y.sun, peterx, jasowang, suravee.suthikulpanit

> From: Liu, Yi L <yi.l.liu@intel.com>
> Sent: Tuesday, January 17, 2023 9:50 PM
>  static int vfio_device_group_open(struct vfio_device_file *df)
>  {
>  	struct vfio_device *device = df->device;
> +	u32 ioas_id;
> +	u32 *pt_id = NULL;
>  	int ret;
> 
>  	mutex_lock(&device->group->group_lock);
> @@ -165,6 +167,14 @@ static int vfio_device_group_open(struct
> vfio_device_file *df)
>  		goto err_unlock_group;
>  	}
> 
> +	if (device->group->iommufd) {
> +		ret = iommufd_vfio_compat_ioas_id(device->group-
> >iommufd,
> +						  &ioas_id);
> +		if (ret)
> +			goto err_unlock_group;
> +		pt_id = &ioas_id;
> +	}
> +
>  	mutex_lock(&device->dev_set->lock);
>  	/*
>  	 * Here we pass the KVM pointer with the group under the lock.  If
> the
> @@ -174,7 +184,7 @@ static int vfio_device_group_open(struct
> vfio_device_file *df)
>  	df->kvm = device->group->kvm;
>  	df->iommufd = device->group->iommufd;
> 
> -	ret = vfio_device_open(df);
> +	ret = vfio_device_open(df, NULL, pt_id);

having both ioas_id and pt_id in one function is a bit confusing.

Does it read better with below?

if (device->group->iommufd)
	ret = vfio_device_open(df, NULL, &ioas_id);
else
	ret = vfio_device_open(df, NULL, NULL);

> +/* @pt_id == NULL implies detach */
> +int vfio_iommufd_attach(struct vfio_device *vdev, u32 *pt_id)
> +{
> +	lockdep_assert_held(&vdev->dev_set->lock);
> +
> +	return vdev->ops->attach_ioas(vdev, pt_id);
> +}

what benefit does this one-line wrapper give actually?

especially since pt_id == NULL is checked in the callback instead of in
this wrapper.



* RE: [PATCH 10/13] vfio: Make vfio_device_open() exclusive between group path and device cdev path
  2023-01-17 13:49 ` [PATCH 10/13] vfio: Make vfio_device_open() exclusive between group path and device cdev path Yi Liu
@ 2023-01-19  9:55   ` Tian, Kevin
  2023-01-30 11:59     ` Liu, Yi L
  2023-01-19 23:51   ` Alex Williamson
  1 sibling, 1 reply; 80+ messages in thread
From: Tian, Kevin @ 2023-01-19  9:55 UTC (permalink / raw)
  To: Liu, Yi L, alex.williamson, jgg
  Cc: cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.y.sun, peterx, jasowang, suravee.suthikulpanit

> From: Liu, Yi L <yi.l.liu@intel.com>
> Sent: Tuesday, January 17, 2023 9:50 PM
> 
> @@ -17,7 +17,11 @@ struct vfio_device;
>  struct vfio_container;
> 
>  struct vfio_device_file {
> +	/* static fields, init per allocation */
>  	struct vfio_device *device;
> +	bool single_open;

I wonder whether the readability is better by renaming this
to 'cdev', e.g.:

	/*
	 * Device cdev path cannot support multiple device open since
	 * it doesn't have a secure way for it. So a second device
	 * open attempt should be failed if the caller is from a cdev
	 * path or the device has already been opened by a cdev path.
	 */
	if (device->open_count != 0 &&
	    (df->cdev || device->single_open))
		return -EINVAL;

	/*
	 * group path supports multiple device open, while cdev doesn't.
	 * So use vfio_device_group_close() for the !single_open case.
	 */
	if (!df->cdev)
		vfio_device_group_close(df);

because from device file p.o.v we just want to differentiate cdev
vs. group interface. With this change we even don't need the
comment for the last condition check.

it's fine to have device->single_open as it's kind of a status bit
set in the cdev path to prevent more opens on this device.


* Re: [PATCH 07/13] vfio: Pass struct vfio_device_file * to vfio_device_open/close()
  2023-01-17 13:49 ` [PATCH 07/13] vfio: Pass struct vfio_device_file * to vfio_device_open/close() Yi Liu
  2023-01-18  9:27   ` Tian, Kevin
@ 2023-01-19 11:01   ` Eric Auger
  2023-01-19 20:35     ` Alex Williamson
  2023-01-30  9:38     ` Liu, Yi L
  1 sibling, 2 replies; 80+ messages in thread
From: Eric Auger @ 2023-01-19 11:01 UTC (permalink / raw)
  To: Yi Liu, alex.williamson, jgg
  Cc: kevin.tian, cohuck, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.y.sun, peterx, jasowang, suravee.suthikulpanit

Hi Yi,

On 1/17/23 14:49, Yi Liu wrote:
> This avoids passing struct kvm * and struct iommufd_ctx * in multiple
> functions. vfio_device_open() becomes a locked helper.
Why? Because the dev_set lock now protects the vfio_device_file fields?
This is worth explaining.
Do we need to update the comment in vfio.h related to struct
vfio_device_set?
>
> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> ---
>  drivers/vfio/group.c     | 34 +++++++++++++++++++++++++---------
>  drivers/vfio/vfio.h      | 10 +++++-----
>  drivers/vfio/vfio_main.c | 40 ++++++++++++++++++++++++----------------
>  3 files changed, 54 insertions(+), 30 deletions(-)
>
> diff --git a/drivers/vfio/group.c b/drivers/vfio/group.c
> index d83cf069d290..7200304663e5 100644
> --- a/drivers/vfio/group.c
> +++ b/drivers/vfio/group.c
> @@ -154,33 +154,49 @@ static int vfio_group_ioctl_set_container(struct vfio_group *group,
>  	return ret;
>  }
>  
> -static int vfio_device_group_open(struct vfio_device *device)
> +static int vfio_device_group_open(struct vfio_device_file *df)
>  {
> +	struct vfio_device *device = df->device;
>  	int ret;
>  
>  	mutex_lock(&device->group->group_lock);
>  	if (!vfio_group_has_iommu(device->group)) {
>  		ret = -EINVAL;
> -		goto out_unlock;
> +		goto err_unlock_group;
>  	}
>  
> +	mutex_lock(&device->dev_set->lock);
is there an explanation somewhere about locking order b/w group_lock,
dev_set lock?
>  	/*
>  	 * Here we pass the KVM pointer with the group under the lock.  If the
>  	 * device driver will use it, it must obtain a reference and release it
>  	 * during close_device.
>  	 */
Maybe this is the opportunity to rephrase the above comment. I am not a
native English speaker but the tense agreement seems weird; also, please
clarify a reference to what.
> -	ret = vfio_device_open(device, device->group->iommufd,
> -			       device->group->kvm);
> +	df->kvm = device->group->kvm;
> +	df->iommufd = device->group->iommufd;
> +
> +	ret = vfio_device_open(df);
> +	if (ret)
> +		goto err_unlock_device;
> +	mutex_unlock(&device->dev_set->lock);
>  
> -out_unlock:
> +	mutex_unlock(&device->group->group_lock);
> +	return 0;
> +
> +err_unlock_device:
> +	df->kvm = NULL;
> +	df->iommufd = NULL;
> +	mutex_unlock(&device->dev_set->lock);
> +err_unlock_group:
>  	mutex_unlock(&device->group->group_lock);
>  	return ret;
>  }
>  
> -void vfio_device_group_close(struct vfio_device *device)
> +void vfio_device_group_close(struct vfio_device_file *df)
>  {
> +	struct vfio_device *device = df->device;
> +
>  	mutex_lock(&device->group->group_lock);
> -	vfio_device_close(device, device->group->iommufd);
> +	vfio_device_close(df);
>  	mutex_unlock(&device->group->group_lock);
>  }
>  
> @@ -196,7 +212,7 @@ static struct file *vfio_device_open_file(struct vfio_device *device)
>  		goto err_out;
>  	}
>  
> -	ret = vfio_device_group_open(device);
> +	ret = vfio_device_group_open(df);
>  	if (ret)
>  		goto err_free;
>  
> @@ -228,7 +244,7 @@ static struct file *vfio_device_open_file(struct vfio_device *device)
>  	return filep;
>  
>  err_close_device:
> -	vfio_device_group_close(device);
> +	vfio_device_group_close(df);
>  err_free:
>  	kfree(df);
>  err_out:
> diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
> index 53af6e3ea214..3d8ba165146c 100644
> --- a/drivers/vfio/vfio.h
> +++ b/drivers/vfio/vfio.h
> @@ -19,14 +19,14 @@ struct vfio_container;
>  struct vfio_device_file {
>  	struct vfio_device *device;
>  	struct kvm *kvm;
> +	struct iommufd_ctx *iommufd;
>  };
>  
>  void vfio_device_put_registration(struct vfio_device *device);
>  bool vfio_device_try_get_registration(struct vfio_device *device);
> -int vfio_device_open(struct vfio_device *device,
> -		     struct iommufd_ctx *iommufd, struct kvm *kvm);
> -void vfio_device_close(struct vfio_device *device,
> -		       struct iommufd_ctx *iommufd);
> +int vfio_device_open(struct vfio_device_file *df);
> +void vfio_device_close(struct vfio_device_file *device);
> +
>  struct vfio_device_file *
>  vfio_allocate_device_file(struct vfio_device *device);
>  
> @@ -90,7 +90,7 @@ void vfio_device_group_register(struct vfio_device *device);
>  void vfio_device_group_unregister(struct vfio_device *device);
>  int vfio_device_group_use_iommu(struct vfio_device *device);
>  void vfio_device_group_unuse_iommu(struct vfio_device *device);
> -void vfio_device_group_close(struct vfio_device *device);
> +void vfio_device_group_close(struct vfio_device_file *df);
>  struct vfio_group *vfio_group_from_file(struct file *file);
>  bool vfio_group_enforced_coherent(struct vfio_group *group);
>  void vfio_group_set_kvm(struct vfio_group *group, struct kvm *kvm);
> diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
> index dc08d5dd62cc..3df71bd9cd1e 100644
> --- a/drivers/vfio/vfio_main.c
> +++ b/drivers/vfio/vfio_main.c
> @@ -358,9 +358,11 @@ vfio_allocate_device_file(struct vfio_device *device)
>  	return df;
>  }
>  
> -static int vfio_device_first_open(struct vfio_device *device,
> -				  struct iommufd_ctx *iommufd, struct kvm *kvm)
> +static int vfio_device_first_open(struct vfio_device_file *df)
>  {
> +	struct vfio_device *device = df->device;
> +	struct iommufd_ctx *iommufd = df->iommufd;
> +	struct kvm *kvm = df->kvm;
>  	int ret;
>  
>  	lockdep_assert_held(&device->dev_set->lock);
> @@ -394,9 +396,11 @@ static int vfio_device_first_open(struct vfio_device *device,
>  	return ret;
>  }
>  
> -static void vfio_device_last_close(struct vfio_device *device,
> -				   struct iommufd_ctx *iommufd)
> +static void vfio_device_last_close(struct vfio_device_file *df)
>  {
> +	struct vfio_device *device = df->device;
> +	struct iommufd_ctx *iommufd = df->iommufd;
> +
>  	lockdep_assert_held(&device->dev_set->lock);
>  
>  	if (device->ops->close_device)
> @@ -409,30 +413,34 @@ static void vfio_device_last_close(struct vfio_device *device,
>  	module_put(device->dev->driver->owner);
>  }
>  
> -int vfio_device_open(struct vfio_device *device,
> -		     struct iommufd_ctx *iommufd, struct kvm *kvm)
> +int vfio_device_open(struct vfio_device_file *df)
>  {
> -	int ret = 0;
> +	struct vfio_device *device = df->device;
> +
> +	lockdep_assert_held(&device->dev_set->lock);
>  
> -	mutex_lock(&device->dev_set->lock);
>  	device->open_count++;
>  	if (device->open_count == 1) {
> -		ret = vfio_device_first_open(device, iommufd, kvm);
> -		if (ret)
> +		int ret;
> +
> +		ret = vfio_device_first_open(df);
> +		if (ret) {
>  			device->open_count--;
> +			return ret;
nit: the original ret init and return was good enough, no need to change it?
> +		}
>  	}
> -	mutex_unlock(&device->dev_set->lock);
>  
> -	return ret;
> +	return 0;
>  }
>  
> -void vfio_device_close(struct vfio_device *device,
> -		       struct iommufd_ctx *iommufd)
> +void vfio_device_close(struct vfio_device_file *df)
>  {
> +	struct vfio_device *device = df->device;
> +
>  	mutex_lock(&device->dev_set->lock);
>  	vfio_assert_device_open(device);
>  	if (device->open_count == 1)
> -		vfio_device_last_close(device, iommufd);
> +		vfio_device_last_close(df);
>  	device->open_count--;
>  	mutex_unlock(&device->dev_set->lock);
>  }
> @@ -478,7 +486,7 @@ static int vfio_device_fops_release(struct inode *inode, struct file *filep)
>  	struct vfio_device_file *df = filep->private_data;
>  	struct vfio_device *device = df->device;
>  
> -	vfio_device_group_close(device);
> +	vfio_device_group_close(df);
>  
>  	vfio_device_put_registration(device);
>  
Thanks

Eric


^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 08/13] vfio: Block device access via device fd until device is opened
  2023-01-17 13:49 ` [PATCH 08/13] vfio: Block device access via device fd until device is opened Yi Liu
  2023-01-18  9:35   ` Tian, Kevin
@ 2023-01-19 14:00   ` Eric Auger
  2023-01-30 10:41     ` Liu, Yi L
  2023-01-19 20:47   ` Alex Williamson
  2 siblings, 1 reply; 80+ messages in thread
From: Eric Auger @ 2023-01-19 14:00 UTC (permalink / raw)
  To: Yi Liu, alex.williamson, jgg
  Cc: kevin.tian, cohuck, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.y.sun, peterx, jasowang, suravee.suthikulpanit

Hi Yi,

On 1/17/23 14:49, Yi Liu wrote:
> Allow the vfio_device file to be in a state where the device FD is
> opened but the device cannot be used by userspace (i.e. its .open_device()
> hasn't been called). This in-between state is not used when the device
> FD is spawned from the group FD, however when we create the device FD
> directly by opening a cdev it will be opened in the blocked state.
Please explain why this is needed in the commit message (although you
mentioned the rationale in the cover letter).

Eric
>
> In the blocked state, currently only the bind operation is allowed,
> other device accesses are not allowed. Completing bind will allow the
> user to further access the device.
>
> This is implemented by adding a flag in struct vfio_device_file to mark
> the blocked state and using a simple smp_load_acquire() to obtain the
> flag value and serialize all the device setup with the thread accessing
> this device.
>
> Due to this scheme it is not possible to unbind the FD, once it is bound,
> it remains bound until the FD is closed.
>
> Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> ---
>  drivers/vfio/vfio.h      |  1 +
>  drivers/vfio/vfio_main.c | 29 +++++++++++++++++++++++++++++
>  2 files changed, 30 insertions(+)
>
> diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
> index 3d8ba165146c..c69a9902ea84 100644
> --- a/drivers/vfio/vfio.h
> +++ b/drivers/vfio/vfio.h
> @@ -20,6 +20,7 @@ struct vfio_device_file {
>  	struct vfio_device *device;
>  	struct kvm *kvm;
>  	struct iommufd_ctx *iommufd;
> +	bool access_granted;
>  };
>  
>  void vfio_device_put_registration(struct vfio_device *device);
> diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
> index 3df71bd9cd1e..d442ebaa4b21 100644
> --- a/drivers/vfio/vfio_main.c
> +++ b/drivers/vfio/vfio_main.c
> @@ -430,6 +430,11 @@ int vfio_device_open(struct vfio_device_file *df)
>  		}
>  	}
>  
> +	/*
> +	 * Paired with smp_load_acquire() in vfio_device_fops::ioctl/
> +	 * read/write/mmap
> +	 */
> +	smp_store_release(&df->access_granted, true);
>  	return 0;
>  }
>  
> @@ -1058,8 +1063,14 @@ static long vfio_device_fops_unl_ioctl(struct file *filep,
>  {
>  	struct vfio_device_file *df = filep->private_data;
>  	struct vfio_device *device = df->device;
> +	bool access;
>  	int ret;
>  
> +	/* Paired with smp_store_release() in vfio_device_open() */
> +	access = smp_load_acquire(&df->access_granted);
> +	if (!access)
> +		return -EINVAL;
> +
>  	ret = vfio_device_pm_runtime_get(device);
>  	if (ret)
>  		return ret;
> @@ -1086,6 +1097,12 @@ static ssize_t vfio_device_fops_read(struct file *filep, char __user *buf,
>  {
>  	struct vfio_device_file *df = filep->private_data;
>  	struct vfio_device *device = df->device;
> +	bool access;
> +
> +	/* Paired with smp_store_release() in vfio_device_open() */
> +	access = smp_load_acquire(&df->access_granted);
> +	if (!access)
> +		return -EINVAL;
>  
>  	if (unlikely(!device->ops->read))
>  		return -EINVAL;
> @@ -1099,6 +1116,12 @@ static ssize_t vfio_device_fops_write(struct file *filep,
>  {
>  	struct vfio_device_file *df = filep->private_data;
>  	struct vfio_device *device = df->device;
> +	bool access;
> +
> +	/* Paired with smp_store_release() in vfio_device_open() */
> +	access = smp_load_acquire(&df->access_granted);
> +	if (!access)
> +		return -EINVAL;
>  
>  	if (unlikely(!device->ops->write))
>  		return -EINVAL;
> @@ -1110,6 +1133,12 @@ static int vfio_device_fops_mmap(struct file *filep, struct vm_area_struct *vma)
>  {
>  	struct vfio_device_file *df = filep->private_data;
>  	struct vfio_device *device = df->device;
> +	bool access;
> +
> +	/* Paired with smp_store_release() in vfio_device_open() */
> +	access = smp_load_acquire(&df->access_granted);
> +	if (!access)
> +		return -EINVAL;
>  
>  	if (unlikely(!device->ops->mmap))
>  		return -EINVAL;


^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 05/13] kvm/vfio: Provide struct kvm_device_ops::release() instead of ::destroy()
  2023-01-17 13:49 ` [PATCH 05/13] kvm/vfio: Provide struct kvm_device_ops::release() instead of ::destroy() Yi Liu
  2023-01-18  8:56   ` Tian, Kevin
  2023-01-19  9:12   ` Eric Auger
@ 2023-01-19 19:07   ` Jason Gunthorpe
  2023-01-19 20:04     ` Alex Williamson
                       ` (2 more replies)
  2 siblings, 3 replies; 80+ messages in thread
From: Jason Gunthorpe @ 2023-01-19 19:07 UTC (permalink / raw)
  To: Yi Liu
  Cc: alex.williamson, kevin.tian, cohuck, eric.auger, nicolinc, kvm,
	mjrosato, chao.p.peng, yi.y.sun, peterx, jasowang,
	suravee.suthikulpanit

On Tue, Jan 17, 2023 at 05:49:34AM -0800, Yi Liu wrote:
> This is to avoid a circular refcount problem between the kvm struct and
> the device file. The KVM module holds a device/group file reference when
> the device/group is added and releases it upon removal or when the last
> kvm reference is released. This reference model is fine for the group
> since there is no kvm reference in the group paths.
> 
> But it is a problem for the device file since vfio devices may take a kvm
> reference in the device open path and put it in the device file release,
> e.g. Intel kvmgt. This would result in a circular dependency since the kvm
> side won't put the device file reference while the kvm refcount is
> non-zero, whereas the vfio device side needs to put its kvm reference in
> the release callback.
> 
> To solve this problem for the device file, let vfio provide release(),
> which is called once the kvm file is closed and does not depend on the
> last kvm reference. This avoids the circular refcount problem.
> 
> Suggested-by: Kevin Tian <kevin.tian@intel.com>
> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> ---
>  virt/kvm/vfio.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>

Per Alex's remarks, please revise the commit message and add a Fixes
line of some kind noting that this solves the deadlock Matthew was working
on, and send it stand-alone right away.

Jason

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 05/13] kvm/vfio: Provide struct kvm_device_ops::release() instead of ::destroy()
  2023-01-19 19:07   ` Jason Gunthorpe
@ 2023-01-19 20:04     ` Alex Williamson
  2023-01-20 13:03     ` Liu, Yi L
  2023-01-20 14:00     ` Liu, Yi L
  2 siblings, 0 replies; 80+ messages in thread
From: Alex Williamson @ 2023-01-19 20:04 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Yi Liu, kevin.tian, cohuck, eric.auger, nicolinc, kvm, mjrosato,
	chao.p.peng, yi.y.sun, peterx, jasowang, suravee.suthikulpanit

On Thu, 19 Jan 2023 15:07:01 -0400
Jason Gunthorpe <jgg@nvidia.com> wrote:

> On Tue, Jan 17, 2023 at 05:49:34AM -0800, Yi Liu wrote:
> > This is to avoid a circular refcount problem between the kvm struct and
> > the device file. The KVM module holds a device/group file reference when
> > the device/group is added and releases it upon removal or when the last
> > kvm reference is released. This reference model is fine for the group
> > since there is no kvm reference in the group paths.
> > 
> > But it is a problem for the device file since vfio devices may take a kvm
> > reference in the device open path and put it in the device file release,
> > e.g. Intel kvmgt. This would result in a circular dependency since the kvm
> > side won't put the device file reference while the kvm refcount is
> > non-zero, whereas the vfio device side needs to put its kvm reference in
> > the release callback.
> > 
> > To solve this problem for the device file, let vfio provide release(),
> > which is called once the kvm file is closed and does not depend on the
> > last kvm reference. This avoids the circular refcount problem.
> > 
> > Suggested-by: Kevin Tian <kevin.tian@intel.com>
> > Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> > ---
> >  virt/kvm/vfio.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)  
> 
> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
> 
> Per Alex's remarks, please revise the commit message and add a Fixes
> line of some kind noting that this solves the deadlock Matthew was working
> on, and send it stand-alone right away.

Also revise the commit log since we'll be taking a reference in the
group model as well.  The function and comments should also be updated
s/destroy/release/.  Thanks,

Alex


^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 07/13] vfio: Pass struct vfio_device_file * to vfio_device_open/close()
  2023-01-19 11:01   ` Eric Auger
@ 2023-01-19 20:35     ` Alex Williamson
  2023-01-30  9:38       ` Liu, Yi L
  2023-01-30  9:38     ` Liu, Yi L
  1 sibling, 1 reply; 80+ messages in thread
From: Alex Williamson @ 2023-01-19 20:35 UTC (permalink / raw)
  To: Eric Auger
  Cc: Yi Liu, jgg, kevin.tian, cohuck, nicolinc, kvm, mjrosato,
	chao.p.peng, yi.y.sun, peterx, jasowang, suravee.suthikulpanit

On Thu, 19 Jan 2023 12:01:59 +0100
Eric Auger <eric.auger@redhat.com> wrote:

> Hi Yi,
> 
> On 1/17/23 14:49, Yi Liu wrote:
> > This avoids passing struct kvm * and struct iommufd_ctx * in multiple
> > functions. vfio_device_open() becomes a locked helper.  
> why? Because the dev_set lock now protects the vfio_device_file fields?
> Worth explaining.
> do we need to update the comment in vfio.h related to struct
> vfio_device_set?
> >
> > Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> > ---
> >  drivers/vfio/group.c     | 34 +++++++++++++++++++++++++---------
> >  drivers/vfio/vfio.h      | 10 +++++-----
> >  drivers/vfio/vfio_main.c | 40 ++++++++++++++++++++++++----------------
> >  3 files changed, 54 insertions(+), 30 deletions(-)
> >
> > diff --git a/drivers/vfio/group.c b/drivers/vfio/group.c
> > index d83cf069d290..7200304663e5 100644
> > --- a/drivers/vfio/group.c
> > +++ b/drivers/vfio/group.c
> > @@ -154,33 +154,49 @@ static int vfio_group_ioctl_set_container(struct vfio_group *group,
> >  	return ret;
> >  }
> >  
> > -static int vfio_device_group_open(struct vfio_device *device)
> > +static int vfio_device_group_open(struct vfio_device_file *df)
> >  {
> > +	struct vfio_device *device = df->device;
> >  	int ret;
> >  
> >  	mutex_lock(&device->group->group_lock);
> >  	if (!vfio_group_has_iommu(device->group)) {
> >  		ret = -EINVAL;
> > -		goto out_unlock;
> > +		goto err_unlock_group;
> >  	}
> >  
> > +	mutex_lock(&device->dev_set->lock);  
> is there an explanation somewhere about locking order b/w group_lock,
> dev_set lock?
> >  	/*
> >  	 * Here we pass the KVM pointer with the group under the lock.  If the
> >  	 * device driver will use it, it must obtain a reference and release it
> >  	 * during close_device.
> >  	 */  
> Maybe this is the opportunity to rephrase the above comment. I am not a
> native English speaker but the tense agreement seems weird; also, please
> clarify a reference to what.
> > -	ret = vfio_device_open(device, device->group->iommufd,
> > -			       device->group->kvm);
> > +	df->kvm = device->group->kvm;
> > +	df->iommufd = device->group->iommufd;
> > +
> > +	ret = vfio_device_open(df);
> > +	if (ret)
> > +		goto err_unlock_device;
> > +	mutex_unlock(&device->dev_set->lock);
> >  
> > -out_unlock:
> > +	mutex_unlock(&device->group->group_lock);
> > +	return 0;
> > +
> > +err_unlock_device:
> > +	df->kvm = NULL;
> > +	df->iommufd = NULL;
> > +	mutex_unlock(&device->dev_set->lock);
> > +err_unlock_group:
> >  	mutex_unlock(&device->group->group_lock);
> >  	return ret;
> >  }
> >  
> > -void vfio_device_group_close(struct vfio_device *device)
> > +void vfio_device_group_close(struct vfio_device_file *df)
> >  {
> > +	struct vfio_device *device = df->device;
> > +
> >  	mutex_lock(&device->group->group_lock);
> > -	vfio_device_close(device, device->group->iommufd);
> > +	vfio_device_close(df);
> >  	mutex_unlock(&device->group->group_lock);
> >  }
> >  
> > @@ -196,7 +212,7 @@ static struct file *vfio_device_open_file(struct vfio_device *device)
> >  		goto err_out;
> >  	}
> >  
> > -	ret = vfio_device_group_open(device);
> > +	ret = vfio_device_group_open(df);
> >  	if (ret)
> >  		goto err_free;
> >  
> > @@ -228,7 +244,7 @@ static struct file *vfio_device_open_file(struct vfio_device *device)
> >  	return filep;
> >  
> >  err_close_device:
> > -	vfio_device_group_close(device);
> > +	vfio_device_group_close(df);
> >  err_free:
> >  	kfree(df);
> >  err_out:
> > diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
> > index 53af6e3ea214..3d8ba165146c 100644
> > --- a/drivers/vfio/vfio.h
> > +++ b/drivers/vfio/vfio.h
> > @@ -19,14 +19,14 @@ struct vfio_container;
> >  struct vfio_device_file {
> >  	struct vfio_device *device;
> >  	struct kvm *kvm;
> > +	struct iommufd_ctx *iommufd;
> >  };
> >  
> >  void vfio_device_put_registration(struct vfio_device *device);
> >  bool vfio_device_try_get_registration(struct vfio_device *device);
> > -int vfio_device_open(struct vfio_device *device,
> > -		     struct iommufd_ctx *iommufd, struct kvm *kvm);
> > -void vfio_device_close(struct vfio_device *device,
> > -		       struct iommufd_ctx *iommufd);
> > +int vfio_device_open(struct vfio_device_file *df);
> > +void vfio_device_close(struct vfio_device_file *device);
> > +
> >  struct vfio_device_file *
> >  vfio_allocate_device_file(struct vfio_device *device);
> >  
> > @@ -90,7 +90,7 @@ void vfio_device_group_register(struct vfio_device *device);
> >  void vfio_device_group_unregister(struct vfio_device *device);
> >  int vfio_device_group_use_iommu(struct vfio_device *device);
> >  void vfio_device_group_unuse_iommu(struct vfio_device *device);
> > -void vfio_device_group_close(struct vfio_device *device);
> > +void vfio_device_group_close(struct vfio_device_file *df);
> >  struct vfio_group *vfio_group_from_file(struct file *file);
> >  bool vfio_group_enforced_coherent(struct vfio_group *group);
> >  void vfio_group_set_kvm(struct vfio_group *group, struct kvm *kvm);
> > diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
> > index dc08d5dd62cc..3df71bd9cd1e 100644
> > --- a/drivers/vfio/vfio_main.c
> > +++ b/drivers/vfio/vfio_main.c
> > @@ -358,9 +358,11 @@ vfio_allocate_device_file(struct vfio_device *device)
> >  	return df;
> >  }
> >  
> > -static int vfio_device_first_open(struct vfio_device *device,
> > -				  struct iommufd_ctx *iommufd, struct kvm *kvm)
> > +static int vfio_device_first_open(struct vfio_device_file *df)
> >  {
> > +	struct vfio_device *device = df->device;
> > +	struct iommufd_ctx *iommufd = df->iommufd;
> > +	struct kvm *kvm = df->kvm;
> >  	int ret;
> >  
> >  	lockdep_assert_held(&device->dev_set->lock);
> > @@ -394,9 +396,11 @@ static int vfio_device_first_open(struct vfio_device *device,
> >  	return ret;
> >  }
> >  
> > -static void vfio_device_last_close(struct vfio_device *device,
> > -				   struct iommufd_ctx *iommufd)
> > +static void vfio_device_last_close(struct vfio_device_file *df)
> >  {
> > +	struct vfio_device *device = df->device;
> > +	struct iommufd_ctx *iommufd = df->iommufd;
> > +
> >  	lockdep_assert_held(&device->dev_set->lock);
> >  
> >  	if (device->ops->close_device)
> > @@ -409,30 +413,34 @@ static void vfio_device_last_close(struct vfio_device *device,
> >  	module_put(device->dev->driver->owner);
> >  }
> >  
> > -int vfio_device_open(struct vfio_device *device,
> > -		     struct iommufd_ctx *iommufd, struct kvm *kvm)
> > +int vfio_device_open(struct vfio_device_file *df)
> >  {
> > -	int ret = 0;
> > +	struct vfio_device *device = df->device;
> > +
> > +	lockdep_assert_held(&device->dev_set->lock);
> >  
> > -	mutex_lock(&device->dev_set->lock);
> >  	device->open_count++;
> >  	if (device->open_count == 1) {
> > -		ret = vfio_device_first_open(device, iommufd, kvm);
> > -		if (ret)
> > +		int ret;
> > +
> > +		ret = vfio_device_first_open(df);
> > +		if (ret) {
> >  			device->open_count--;
> > +			return ret;  
> nit: the original ret init and return was good enough, no need to change it?
> > +		}
> >  	}
> > -	mutex_unlock(&device->dev_set->lock);
> >  
> > -	return ret;
> > +	return 0;
> >  }
> >  
> > -void vfio_device_close(struct vfio_device *device,
> > -		       struct iommufd_ctx *iommufd)
> > +void vfio_device_close(struct vfio_device_file *df)
> >  {
> > +	struct vfio_device *device = df->device;
> > +
> >  	mutex_lock(&device->dev_set->lock);
> >  	vfio_assert_device_open(device);
> >  	if (device->open_count == 1)
> > -		vfio_device_last_close(device, iommufd);
> > +		vfio_device_last_close(df);
> >  	device->open_count--;
> >  	mutex_unlock(&device->dev_set->lock);
> >  }

I find it strange that the dev_set->lock has been moved to the caller
for open, but not for close.  Like Eric suggests, this seems to be
because vfio_device_file is usurping dev_set->lock to protect its own
fields, but then those fields are set on open, cleared on the open
error path, but not on close??  Thanks,

Alex


^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 08/13] vfio: Block device access via device fd until device is opened
  2023-01-17 13:49 ` [PATCH 08/13] vfio: Block device access via device fd until device is opened Yi Liu
  2023-01-18  9:35   ` Tian, Kevin
  2023-01-19 14:00   ` Eric Auger
@ 2023-01-19 20:47   ` Alex Williamson
  2023-01-30 10:48     ` Liu, Yi L
  2 siblings, 1 reply; 80+ messages in thread
From: Alex Williamson @ 2023-01-19 20:47 UTC (permalink / raw)
  To: Yi Liu
  Cc: jgg, kevin.tian, cohuck, eric.auger, nicolinc, kvm, mjrosato,
	chao.p.peng, yi.y.sun, peterx, jasowang, suravee.suthikulpanit

On Tue, 17 Jan 2023 05:49:37 -0800
Yi Liu <yi.l.liu@intel.com> wrote:

> Allow the vfio_device file to be in a state where the device FD is
> opened but the device cannot be used by userspace (i.e. its .open_device()
> hasn't been called). This in-between state is not used when the device
> FD is spawned from the group FD, however when we create the device FD
> directly by opening a cdev it will be opened in the blocked state.
> 
> In the blocked state, currently only the bind operation is allowed,
> other device accesses are not allowed. Completing bind will allow the
> user to further access the device.
> 
> This is implemented by adding a flag in struct vfio_device_file to mark
> the blocked state and using a simple smp_load_acquire() to obtain the
> flag value and serialize all the device setup with the thread accessing
> this device.
> 
> Due to this scheme it is not possible to unbind the FD, once it is bound,
> it remains bound until the FD is closed.
> 
> Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> ---
>  drivers/vfio/vfio.h      |  1 +
>  drivers/vfio/vfio_main.c | 29 +++++++++++++++++++++++++++++
>  2 files changed, 30 insertions(+)
> 
> diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
> index 3d8ba165146c..c69a9902ea84 100644
> --- a/drivers/vfio/vfio.h
> +++ b/drivers/vfio/vfio.h
> @@ -20,6 +20,7 @@ struct vfio_device_file {
>  	struct vfio_device *device;
>  	struct kvm *kvm;
>  	struct iommufd_ctx *iommufd;
> +	bool access_granted;
>  };
>  
>  void vfio_device_put_registration(struct vfio_device *device);
> diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
> index 3df71bd9cd1e..d442ebaa4b21 100644
> --- a/drivers/vfio/vfio_main.c
> +++ b/drivers/vfio/vfio_main.c
> @@ -430,6 +430,11 @@ int vfio_device_open(struct vfio_device_file *df)
>  		}
>  	}
>  
> +	/*
> +	 * Paired with smp_load_acquire() in vfio_device_fops::ioctl/
> +	 * read/write/mmap
> +	 */
> +	smp_store_release(&df->access_granted, true);

Why is this happening outside of the first-open branch?  Thanks,

Alex


^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 09/13] vfio: Add infrastructure for bind_iommufd and attach
  2023-01-17 13:49 ` [PATCH 09/13] vfio: Add infrastructure for bind_iommufd and attach Yi Liu
  2023-01-19  9:45   ` Tian, Kevin
@ 2023-01-19 23:05   ` Alex Williamson
  2023-01-30 13:55     ` Liu, Yi L
  1 sibling, 1 reply; 80+ messages in thread
From: Alex Williamson @ 2023-01-19 23:05 UTC (permalink / raw)
  To: Yi Liu
  Cc: jgg, kevin.tian, cohuck, eric.auger, nicolinc, kvm, mjrosato,
	chao.p.peng, yi.y.sun, peterx, jasowang, suravee.suthikulpanit

On Tue, 17 Jan 2023 05:49:38 -0800
Yi Liu <yi.l.liu@intel.com> wrote:

> This prepares to add ioctls for device cdev fd. This infrastructure includes:
>     - add vfio_iommufd_attach() to support iommufd pgtable attach after
>       bind_iommufd. A NULL pt_id indicates detach.
>     - let vfio_iommufd_bind() accept pt_id, e.g. the compat_ioas_id in the
>       legacy group path, and also return dev_id if the caller requires it.
> 
> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> ---
>  drivers/vfio/group.c     | 12 +++++-
>  drivers/vfio/iommufd.c   | 79 ++++++++++++++++++++++++++++++----------
>  drivers/vfio/vfio.h      | 15 ++++++--
>  drivers/vfio/vfio_main.c | 10 +++--
>  4 files changed, 88 insertions(+), 28 deletions(-)
> 
> diff --git a/drivers/vfio/group.c b/drivers/vfio/group.c
> index 7200304663e5..9484bb1c54a9 100644
> --- a/drivers/vfio/group.c
> +++ b/drivers/vfio/group.c
> @@ -157,6 +157,8 @@ static int vfio_group_ioctl_set_container(struct vfio_group *group,
>  static int vfio_device_group_open(struct vfio_device_file *df)
>  {
>  	struct vfio_device *device = df->device;
> +	u32 ioas_id;
> +	u32 *pt_id = NULL;
>  	int ret;
>  
>  	mutex_lock(&device->group->group_lock);
> @@ -165,6 +167,14 @@ static int vfio_device_group_open(struct vfio_device_file *df)
>  		goto err_unlock_group;
>  	}
>  
> +	if (device->group->iommufd) {
> +		ret = iommufd_vfio_compat_ioas_id(device->group->iommufd,
> +						  &ioas_id);
> +		if (ret)
> +			goto err_unlock_group;
> +		pt_id = &ioas_id;
> +	}
> +
>  	mutex_lock(&device->dev_set->lock);
>  	/*
>  	 * Here we pass the KVM pointer with the group under the lock.  If the
> @@ -174,7 +184,7 @@ static int vfio_device_group_open(struct vfio_device_file *df)
>  	df->kvm = device->group->kvm;
>  	df->iommufd = device->group->iommufd;
>  
> -	ret = vfio_device_open(df);
> +	ret = vfio_device_open(df, NULL, pt_id);
>  	if (ret)
>  		goto err_unlock_device;
>  	mutex_unlock(&device->dev_set->lock);
> diff --git a/drivers/vfio/iommufd.c b/drivers/vfio/iommufd.c
> index 4f82a6fa7c6c..412644fdbf16 100644
> --- a/drivers/vfio/iommufd.c
> +++ b/drivers/vfio/iommufd.c
> @@ -10,9 +10,17 @@
>  MODULE_IMPORT_NS(IOMMUFD);
>  MODULE_IMPORT_NS(IOMMUFD_VFIO);
>  
> -int vfio_iommufd_bind(struct vfio_device *vdev, struct iommufd_ctx *ictx)
> +/* @pt_id == NULL implies detach */
> +int vfio_iommufd_attach(struct vfio_device *vdev, u32 *pt_id)
> +{
> +	lockdep_assert_held(&vdev->dev_set->lock);
> +
> +	return vdev->ops->attach_ioas(vdev, pt_id);
> +}


I find this patch pretty confusing; I think it's rooted in all these
multiplexed interfaces, which extend all the way out to userspace with
a magic, reserved page table ID to detach a device from an IOAS.  It
seems like it would be simpler to make a 'detach' API, a detach_ioas
callback on the vfio_device_ops, and certainly not a
vfio_iommufd_attach() function that does a detach provided the correct
args while also introducing a __vfio_iommufd_detach() function.

This series is also missing an update to
Documentation/driver-api/vfio.rst, which is already behind relative to
the iommufd interfaces.  Thanks,

Alex


^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 10/13] vfio: Make vfio_device_open() exclusive between group path and device cdev path
  2023-01-17 13:49 ` [PATCH 10/13] vfio: Make vfio_device_open() exclusive between group path and device cdev path Yi Liu
  2023-01-19  9:55   ` Tian, Kevin
@ 2023-01-19 23:51   ` Alex Williamson
  2023-01-30 12:14     ` Liu, Yi L
  1 sibling, 1 reply; 80+ messages in thread
From: Alex Williamson @ 2023-01-19 23:51 UTC (permalink / raw)
  To: Yi Liu
  Cc: jgg, kevin.tian, cohuck, eric.auger, nicolinc, kvm, mjrosato,
	chao.p.peng, yi.y.sun, peterx, jasowang, suravee.suthikulpanit

On Tue, 17 Jan 2023 05:49:39 -0800
Yi Liu <yi.l.liu@intel.com> wrote:

> VFIO group has historically allowed multi-open of the device FD. This
> was made secure because the "open" was executed via an ioctl to the
> group FD which is itself only single open.
> 
> No know use of multiple device FDs is known. It is kind of a strange
  ^^ ^^^^                               ^^^^^

> thing to do because new device FDs can naturally be created via dup().
> 
> When we implement the new device uAPI there is no natural way to allow
> the device itself to be multi-opened in a secure manner. Without
> the group FD we cannot prove the security context of the opener.
> 
> Thus, when moving to the new uAPI we block the ability to multi-open
> the device. This also makes the cdev path exclusive with group path.
> 
> The main logic is in the vfio_device_open(). It needs to sustain both
> the legacy behavior i.e. multi-open in the group path and the new
> behavior i.e. single-open in the cdev path. This mixture leads to the
> introduction of a new single_open flag stored both in struct vfio_device
> and vfio_device_file. vfio_device_file::single_open is set per the
> vfio_device_file allocation. Its value is propagated to struct vfio_device
> after device is opened successfully.
> 
> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> ---
>  drivers/vfio/group.c     |  2 +-
>  drivers/vfio/vfio.h      |  6 +++++-
>  drivers/vfio/vfio_main.c | 25 ++++++++++++++++++++++---
>  include/linux/vfio.h     |  1 +
>  4 files changed, 29 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/vfio/group.c b/drivers/vfio/group.c
> index 9484bb1c54a9..57ebe5e1a7e6 100644
> --- a/drivers/vfio/group.c
> +++ b/drivers/vfio/group.c
> @@ -216,7 +216,7 @@ static struct file *vfio_device_open_file(struct vfio_device *device)
>  	struct file *filep;
>  	int ret;
>  
> -	df = vfio_allocate_device_file(device);
> +	df = vfio_allocate_device_file(device, false);
>  	if (IS_ERR(df)) {
>  		ret = PTR_ERR(df);
>  		goto err_out;
> diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
> index fe0fcfa78710..bdcf9762521d 100644
> --- a/drivers/vfio/vfio.h
> +++ b/drivers/vfio/vfio.h
> @@ -17,7 +17,11 @@ struct vfio_device;
>  struct vfio_container;
>  
>  struct vfio_device_file {
> +	/* static fields, init per allocation */
>  	struct vfio_device *device;
> +	bool single_open;
> +
> +	/* fields set after allocation */
>  	struct kvm *kvm;
>  	struct iommufd_ctx *iommufd;
>  	bool access_granted;
> @@ -30,7 +34,7 @@ int vfio_device_open(struct vfio_device_file *df,
>  void vfio_device_close(struct vfio_device_file *device);
>  
>  struct vfio_device_file *
> -vfio_allocate_device_file(struct vfio_device *device);
> +vfio_allocate_device_file(struct vfio_device *device, bool single_open);
>  
>  extern const struct file_operations vfio_device_fops;
>  
> diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
> index 90174a9015c4..78725c28b933 100644
> --- a/drivers/vfio/vfio_main.c
> +++ b/drivers/vfio/vfio_main.c
> @@ -345,7 +345,7 @@ static bool vfio_assert_device_open(struct vfio_device *device)
>  }
>  
>  struct vfio_device_file *
> -vfio_allocate_device_file(struct vfio_device *device)
> +vfio_allocate_device_file(struct vfio_device *device, bool single_open)
>  {
>  	struct vfio_device_file *df;
>  
> @@ -354,6 +354,7 @@ vfio_allocate_device_file(struct vfio_device *device)
>  		return ERR_PTR(-ENOMEM);
>  
>  	df->device = device;
> +	df->single_open = single_open;

It doesn't make sense to me to convolute the definition of this
function with an unmemorable bool arg when the one caller that sets the
value true could simply open code it.

>  
>  	return df;
>  }
> @@ -421,6 +422,16 @@ int vfio_device_open(struct vfio_device_file *df,
>  
>  	lockdep_assert_held(&device->dev_set->lock);
>  
> +	/*
> +	 * Device cdev path cannot support multiple device open since
> +	 * it doesn't have a secure way for it. So a second device
> +	 * open attempt should be failed if the caller is from a cdev
> +	 * path or the device has already been opened by a cdev path.
> +	 */
> +	if (device->open_count != 0 &&
> +	    (df->single_open || device->single_open))
> +		return -EINVAL;

IIUC, the reason this exists is that we let the user open the device
cdev arbitrarily, but only one instance can call
ioctl(VFIO_DEVICE_BIND_IOMMUFD).  Why do we bother to let the user
create those other file instances?  What expectations are we setting
for the user by allowing them to open the device but not use it?

Clearly we're thinking about a case here where the device has been
opened via the group path and the user is now attempting to bind the
same device via the cdev path.  That seems wrong to even allow and I'm
surprised it gets this far.  In fact, where do we block a user from
opening one device in a group via cdev and another via the group?


> +
>  	device->open_count++;
>  	if (device->open_count == 1) {
>  		int ret;
> @@ -430,6 +441,7 @@ int vfio_device_open(struct vfio_device_file *df,
>  			device->open_count--;
>  			return ret;
>  		}
> +		device->single_open = df->single_open;
>  	}
>  
>  	/*
> @@ -446,8 +458,10 @@ void vfio_device_close(struct vfio_device_file *df)
>  
>  	mutex_lock(&device->dev_set->lock);
>  	vfio_assert_device_open(device);
> -	if (device->open_count == 1)
> +	if (device->open_count == 1) {
>  		vfio_device_last_close(df);
> +		device->single_open = false;
> +	}
>  	device->open_count--;
>  	mutex_unlock(&device->dev_set->lock);
>  }
> @@ -493,7 +507,12 @@ static int vfio_device_fops_release(struct inode *inode, struct file *filep)
>  	struct vfio_device_file *df = filep->private_data;
>  	struct vfio_device *device = df->device;
>  
> -	vfio_device_group_close(df);
> +	/*
> +	 * group path supports multiple device open, while cdev doesn't.
> +	 * So use vfio_device_group_close() for !single_open case.
> +	 */
> +	if (!df->single_open)
> +		vfio_device_group_close(df);

If we're going to use this to differentiate group vs cdev use cases,
then let's name it something to reflect that rather than pretending it
only limits the number of opens, ex. is_cdev_device.  Thanks,

Alex


>  
>  	vfio_device_put_registration(device);
>  
> diff --git a/include/linux/vfio.h b/include/linux/vfio.h
> index 46edd6e6c0ba..300318f0d448 100644
> --- a/include/linux/vfio.h
> +++ b/include/linux/vfio.h
> @@ -63,6 +63,7 @@ struct vfio_device {
>  	struct iommufd_ctx *iommufd_ictx;
>  	bool iommufd_attached;
>  #endif
> +	bool single_open;
>  };
>  
>  /**



* RE: [PATCH 05/13] kvm/vfio: Provide struct kvm_device_ops::release() instead of ::destroy()
  2023-01-19  9:30     ` Tian, Kevin
@ 2023-01-20  3:52       ` Liu, Yi L
  0 siblings, 0 replies; 80+ messages in thread
From: Liu, Yi L @ 2023-01-20  3:52 UTC (permalink / raw)
  To: Tian, Kevin, eric.auger, alex.williamson, jgg
  Cc: cohuck, nicolinc, kvm, mjrosato, chao.p.peng, yi.y.sun, peterx,
	jasowang, suravee.suthikulpanit, Paolo Bonzini, Christopherson,
	Sean J

> From: Tian, Kevin <kevin.tian@intel.com>
> Sent: Thursday, January 19, 2023 5:30 PM
> 
> > From: Eric Auger <eric.auger@redhat.com>
> > Sent: Thursday, January 19, 2023 5:13 PM
> >
> > Hi Yi,
> >
> > On 1/17/23 14:49, Yi Liu wrote:
> > > This is to avoid a circular refcount problem between the kvm struct and
> > > the device file. KVM modules holds device/group file reference when
> the
> > > device/group is added and releases it per removal or the last kvm
> reference
> > > is released. This reference model is ok for the group since there is no
> > > kvm reference in the group paths.
> > >
> > > But it is a problem for device file since the vfio devices may get kvm
> > > reference in the device open path and put it in the device file release.
> > > e.g. Intel kvmgt. This would result in a circular issue since the kvm
> > > side won't put the device file reference if kvm reference is not 0, while
> > > the vfio device side needs to put kvm reference in the release callback.
> > >
> > > To solve this problem for device file, let vfio provide release() which
> > > would be called once kvm file is closed, it won't depend on the last kvm
> > > reference. Hence avoid circular refcount problem.
> > >
> > > Suggested-by: Kevin Tian <kevin.tian@intel.com>
> > > Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> > > ---
> > >  virt/kvm/vfio.c | 2 +-
> > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > >
> > > diff --git a/virt/kvm/vfio.c b/virt/kvm/vfio.c
> > > index 0f54b9d308d7..525efe37ab6d 100644
> > > --- a/virt/kvm/vfio.c
> > > +++ b/virt/kvm/vfio.c
> > > @@ -364,7 +364,7 @@ static int kvm_vfio_create(struct kvm_device
> *dev,
> > u32 type);
> > >  static struct kvm_device_ops kvm_vfio_ops = {
> > >  	.name = "kvm-vfio",
> > >  	.create = kvm_vfio_create,
> > > -	.destroy = kvm_vfio_destroy,
> > Is it safe to simply remove the destroy cb as it is called from
> > kvm_destroy_vm/kvm_destroy_devices?
> >

Perhaps better to keep it. kvm_vfio_device is only one kind of kvm_device
type. For kvm_vfio_device it is now considered better to provide a release
cb, while other kvm_device types may be better off with a destroy cb.

> 
> According to the definition .release is considered as an alternative
> method to free the device:
> 
> 	/*
> 	 * Destroy is responsible for freeing dev.
> 	 *
> 	 * Destroy may be called before or after destructors are called
> 	 * on emulated I/O regions, depending on whether a reference is
> 	 * held by a vcpu or other kvm component that gets destroyed
> 	 * after the emulated I/O.
> 	 */
> 	void (*destroy)(struct kvm_device *dev);
> 
> 	/*
> 	 * Release is an alternative method to free the device. It is
> 	 * called when the device file descriptor is closed. Once
> 	 * release is called, the destroy method will not be called
> 	 * anymore as the device is removed from the device list of
> 	 * the VM. kvm->lock is held.
> 	 */
> 	void (*release)(struct kvm_device *dev);
> 
> Did you see any specific problem of moving this stuff to release?

It should only affect kvm_vfio_device itself. 😊

Regards,
Yi Liu



* RE: [PATCH 11/13] vfio: Add cdev for vfio_device
  2023-01-17 13:49 ` [PATCH 11/13] vfio: Add cdev for vfio_device Yi Liu
@ 2023-01-20  7:26   ` Tian, Kevin
  2023-01-31  6:17     ` Liu, Yi L
  2023-01-24 20:44   ` Jason Gunthorpe
  1 sibling, 1 reply; 80+ messages in thread
From: Tian, Kevin @ 2023-01-20  7:26 UTC (permalink / raw)
  To: Liu, Yi L, alex.williamson, jgg
  Cc: cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.y.sun, peterx, jasowang, suravee.suthikulpanit, Martins, Joao

> From: Liu, Yi L <yi.l.liu@intel.com>
> Sent: Tuesday, January 17, 2023 9:50 PM
> 
> @@ -156,7 +159,11 @@ static void vfio_device_release(struct device *dev)
>  			container_of(dev, struct vfio_device, device);
> 
>  	vfio_release_device_set(device);
> +#if IS_ENABLED(CONFIG_IOMMUFD)
> +	ida_free(&vfio.device_ida, MINOR(device->device.devt));
> +#else
>  	ida_free(&vfio.device_ida, device->index);
> +#endif

There are many #if in this patch, leading to bad readability.

for this what about letting device->index always storing the minor
value? then here it could just be:

	ida_free(&vfio.device_ida, device->index);

> @@ -232,17 +240,25 @@ static int vfio_init_device(struct vfio_device
> *device, struct device *dev,
>  	device->device.release = vfio_device_release;
>  	device->device.class = vfio.device_class;
>  	device->device.parent = device->dev;
> +#if IS_ENABLED(CONFIG_IOMMUFD)
> +	device->device.devt = MKDEV(MAJOR(vfio.device_devt), minor);
> +	cdev_init(&device->cdev, &vfio_device_fops);
> +	device->cdev.owner = THIS_MODULE;
> +#else
> +	device->index = minor;
> +#endif

Probably we can have a vfio_init_device_cdev() in iommufd.c and let
it be empty if !CONFIG_IOMMUFD. Then here could be:

	device->index = minor;
	vfio_init_device_cdev(device, vfio.device_devt, minor);

> @@ -257,7 +273,12 @@ static int __vfio_register_dev(struct vfio_device
> *device,
>  	if (!device->dev_set)
>  		vfio_assign_device_set(device, device);
> 
> -	ret = dev_set_name(&device->device, "vfio%d", device->index);
> +#if IS_ENABLED(CONFIG_IOMMUFD)
> +	minor = MINOR(device->device.devt);
> +#else
> +	minor = device->index;
> +#endif

then just "minor = device->index"

> 
> +#if IS_ENABLED(CONFIG_IOMMUFD)
> +	ret = cdev_device_add(&device->cdev, &device->device);
> +#else
>  	ret = device_add(&device->device);
> +#endif

also add a wrapper vfio_register_device_cdev() which does
cdev_device_add() if CONFIG_IOMMUFD and device_add() otherwise.


> +#if IS_ENABLED(CONFIG_IOMMUFD)
> +	/*
> +	 * Balances device_add in register path. Putting it as the first
> +	 * operation in unregister to prevent registration refcount from
> +	 * incrementing per cdev open.
> +	 */
> +	cdev_device_del(&device->cdev, &device->device);
> +#else
> +	device_del(&device->device);
> +#endif

ditto

> +#if IS_ENABLED(CONFIG_IOMMUFD)
> +static int vfio_device_fops_open(struct inode *inode, struct file *filep)
> +{
> +	struct vfio_device *device = container_of(inode->i_cdev,
> +						  struct vfio_device, cdev);
> +	struct vfio_device_file *df;
> +	int ret;
> +
> +	if (!vfio_device_try_get_registration(device))
> +		return -ENODEV;
> +
> +	/*
> +	 * device access is blocked until .open_device() is called
> +	 * in BIND_IOMMUFD.
> +	 */
> +	df = vfio_allocate_device_file(device, true);
> +	if (IS_ERR(df)) {
> +		ret = PTR_ERR(df);
> +		goto err_put_registration;
> +	}
> +
> +	filep->private_data = df;
> +
> +	return 0;
> +
> +err_put_registration:
> +	vfio_device_put_registration(device);
> +	return ret;
> +}
> +#endif

move to iommufd.c

> +#if IS_ENABLED(CONFIG_IOMMUFD)
> +static char *vfio_device_devnode(const struct device *dev, umode_t *mode)
> +{
> +	return kasprintf(GFP_KERNEL, "vfio/devices/%s", dev_name(dev));
> +}
> +#endif

ditto

> @@ -1543,9 +1617,21 @@ static int __init vfio_init(void)
>  		goto err_dev_class;
>  	}
> 
> +#if IS_ENABLED(CONFIG_IOMMUFD)
> +	vfio.device_class->devnode = vfio_device_devnode;
> +	ret = alloc_chrdev_region(&vfio.device_devt, 0,
> +				  MINORMASK + 1, "vfio-dev");
> +	if (ret)
> +		goto err_alloc_dev_chrdev;
> +#endif

vfio_cdev_init()

>  static void __exit vfio_cleanup(void)
>  {
>  	ida_destroy(&vfio.device_ida);
> +#if IS_ENABLED(CONFIG_IOMMUFD)
> +	unregister_chrdev_region(vfio.device_devt, MINORMASK + 1);
> +#endif

vfio_cdev_cleanup()



* RE: [PATCH 12/13] vfio: Add ioctls for device cdev iommufd
  2023-01-17 13:49 ` [PATCH 12/13] vfio: Add ioctls for device cdev iommufd Yi Liu
@ 2023-01-20  8:03   ` Tian, Kevin
  2023-02-06  9:07     ` Liu, Yi L
  0 siblings, 1 reply; 80+ messages in thread
From: Tian, Kevin @ 2023-01-20  8:03 UTC (permalink / raw)
  To: Liu, Yi L, alex.williamson, jgg
  Cc: cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.y.sun, peterx, jasowang, suravee.suthikulpanit

> From: Liu, Yi L <yi.l.liu@intel.com>
> Sent: Tuesday, January 17, 2023 9:50 PM
> 
> This adds two vfio device ioctls for userspace using iommufd on vfio
> devices.
> 
>     VFIO_DEVICE_BIND_IOMMUFD: bind device to an iommufd, hence gain
> DMA
> 			      control provided by the iommufd. VFIO no
> 			      iommu is indicated by passing a minus
> 			      fd value.

Can't this be a flag bit for better readability than using a special value?

>     VFIO_DEVICE_ATTACH_IOMMUFD_PT: attach device to ioas, page tables
> 				   managed by iommufd. Attach can be
> 				   undo by passing IOMMUFD_INVALID_ID
> 				   to kernel.

With Alex's remark we need a separate DETACH cmd now.

> 
> +	/*
> +	 * For group path, iommufd pointer is NULL when comes into this
> +	 * helper. Its noiommu support is in container.c.
> +	 *
> +	 * For iommufd compat mode, iommufd pointer here is a valid value.
> +	 * Its noiommu support is supposed to be in vfio_iommufd_bind().
> +	 *
> +	 * For device cdev path, iommufd pointer here is a valid value for
> +	 * normal cases, but it is NULL if it's noiommu. The reason is
> +	 * that userspace uses iommufd==-1 to indicate noiommu mode in
> this
> +	 * path. So caller of this helper will pass in a NULL iommufd
> +	 * pointer. To differentiate it from the group path which also
> +	 * passes NULL iommufd pointer in, df->noiommu is used. For cdev
> +	 * noiommu, df->noiommu would be set to mark noiommu case for
> cdev
> +	 * path.
> +	 *
> +	 * So if df->noiommu is set then this helper just goes ahead to
> +	 * open device. If not, it depends on if iommufd pointer is NULL
> +	 * to handle the group path, iommufd compat mode, normal cases in
> +	 * the cdev path.
> +	 */
>  	if (iommufd)
>  		ret = vfio_iommufd_bind(device, iommufd, dev_id, pt_id);
> -	else
> +	else if (!df->noiommu)
>  		ret = vfio_device_group_use_iommu(device);
>  	if (ret)
>  		goto err_module_put;

Isn't 'ret' uninitialized when df->noiommu is true?

> +static int vfio_ioctl_device_attach(struct vfio_device *device,
> +				    struct vfio_device_feature __user *arg)
> +{
> +	struct vfio_device_attach_iommufd_pt attach;
> +	int ret;
> +	bool is_attach;
> +
> +	if (copy_from_user(&attach, (void __user *)arg, sizeof(attach)))
> +		return -EFAULT;
> +
> +	if (attach.flags)
> +		return -EINVAL;
> +
> +	if (!device->ops->bind_iommufd)
> +		return -ENODEV;
> +

this should fail if noiommu is true.


* RE: [PATCH 05/13] kvm/vfio: Provide struct kvm_device_ops::release() instead of ::destroy()
  2023-01-19 19:07   ` Jason Gunthorpe
  2023-01-19 20:04     ` Alex Williamson
@ 2023-01-20 13:03     ` Liu, Yi L
  2023-01-20 14:00     ` Liu, Yi L
  2 siblings, 0 replies; 80+ messages in thread
From: Liu, Yi L @ 2023-01-20 13:03 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: alex.williamson, Tian, Kevin, cohuck, eric.auger, nicolinc, kvm,
	mjrosato, chao.p.peng, yi.y.sun, peterx, jasowang,
	suravee.suthikulpanit

> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Friday, January 20, 2023 3:07 AM
> 
> On Tue, Jan 17, 2023 at 05:49:34AM -0800, Yi Liu wrote:
> > This is to avoid a circular refcount problem between the kvm struct and
> > the device file. KVM modules holds device/group file reference when the
> > device/group is added and releases it per removal or the last kvm
> reference
> > is released. This reference model is ok for the group since there is no
> > kvm reference in the group paths.
> >
> > But it is a problem for device file since the vfio devices may get kvm
> > reference in the device open path and put it in the device file release.
> > e.g. Intel kvmgt. This would result in a circular issue since the kvm
> > side won't put the device file reference if kvm reference is not 0, while
> > the vfio device side needs to put kvm reference in the release callback.
> >
> > To solve this problem for device file, let vfio provide release() which
> > would be called once kvm file is closed, it won't depend on the last kvm
> > reference. Hence avoid circular refcount problem.
> >
> > Suggested-by: Kevin Tian <kevin.tian@intel.com>
> > Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> > ---
> >  virt/kvm/vfio.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
> 
> From Alex's remarks please revise the commit message and add a Fixes
> line of some kind that this solves the deadlock Matthew was working
> on, and send it stand alone right away

Sure. 😊 I'll rename the patch subject and commit message.

Regards,
Yi Liu


* RE: [PATCH 05/13] kvm/vfio: Provide struct kvm_device_ops::release() instead of ::destroy()
  2023-01-19 19:07   ` Jason Gunthorpe
  2023-01-19 20:04     ` Alex Williamson
  2023-01-20 13:03     ` Liu, Yi L
@ 2023-01-20 14:00     ` Liu, Yi L
  2023-01-20 14:33       ` Jason Gunthorpe
  2 siblings, 1 reply; 80+ messages in thread
From: Liu, Yi L @ 2023-01-20 14:00 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: alex.williamson, Tian, Kevin, cohuck, eric.auger, nicolinc, kvm,
	mjrosato, chao.p.peng, yi.y.sun, peterx, jasowang,
	suravee.suthikulpanit, Paolo Bonzini, Christopherson,,
	Sean

> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Friday, January 20, 2023 3:07 AM
> 
> On Tue, Jan 17, 2023 at 05:49:34AM -0800, Yi Liu wrote:
> > This is to avoid a circular refcount problem between the kvm struct and
> > the device file. KVM modules holds device/group file reference when the
> > device/group is added and releases it per removal or the last kvm
> reference
> > is released. This reference model is ok for the group since there is no
> > kvm reference in the group paths.
> >
> > But it is a problem for device file since the vfio devices may get kvm
> > reference in the device open path and put it in the device file release.
> > e.g. Intel kvmgt. This would result in a circular issue since the kvm
> > side won't put the device file reference if kvm reference is not 0, while
> > the vfio device side needs to put kvm reference in the release callback.
> >
> > To solve this problem for device file, let vfio provide release() which
> > would be called once kvm file is closed, it won't depend on the last kvm
> > reference. Hence avoid circular refcount problem.
> >
> > Suggested-by: Kevin Tian <kevin.tian@intel.com>
> > Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> > ---
> >  virt/kvm/vfio.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
> 
> From Alex's remarks please revise the commit message and add a Fixes
> line of some kind that this solves the deadlock Matthew was working
> on, and send it stand alone right away

Hi Kevin, Jason,

I got a minor question. Let me check your opinions.

So after this change, say we have thread A, which is the kvm-vfio device
release. It holds the kvm_lock and deletes the kvm-vfio device from the
kvm-device list. It also calls into vfio to set KVM==NULL, so it will
try to hold group_lock. The locking order is kvm_lock -> group_lock.

Say at the same time thread B closes the device. It holds group_lock
first and then calls kvm_put_kvm(), dropping the last reference, which
loops over the kvm-device list. Currently it does so without holding
kvm_lock. But since it also manipulates the kvm-device list, should it
hold kvm_lock? If yes, the locking order is group_lock -> kvm_lock, and
we have an A-B B-A deadlock.

Regards,
Yi Liu


* Re: [PATCH 05/13] kvm/vfio: Provide struct kvm_device_ops::release() instead of ::destroy()
  2023-01-20 14:00     ` Liu, Yi L
@ 2023-01-20 14:33       ` Jason Gunthorpe
  2023-01-20 15:09         ` Liu, Yi L
  0 siblings, 1 reply; 80+ messages in thread
From: Jason Gunthorpe @ 2023-01-20 14:33 UTC (permalink / raw)
  To: Liu, Yi L
  Cc: alex.williamson, Tian, Kevin, cohuck, eric.auger, nicolinc, kvm,
	mjrosato, chao.p.peng, yi.y.sun, peterx, jasowang,
	suravee.suthikulpanit, Paolo Bonzini, Christopherson,,
	Sean

On Fri, Jan 20, 2023 at 02:00:26PM +0000, Liu, Yi L wrote:
> Say in the same time, we have thread B closes device, it will hold
> group_lock first and then calls kvm_put_kvm() which is the last
> reference, then it would loop the kvm-device list. Currently, it is
> not holding kvm_lock. But it also manipulating the kvm-device list,
> should it hold kvm_lock? 

No. When using refcounts if the refcount is 0 it guarantees there are
no other threads that can possibly touch this memory, so any locks
internal to the memory are not required.

Jason


* RE: [PATCH 05/13] kvm/vfio: Provide struct kvm_device_ops::release() instead of ::destroy()
  2023-01-20 14:33       ` Jason Gunthorpe
@ 2023-01-20 15:09         ` Liu, Yi L
  2023-01-20 15:11           ` Liu, Yi L
  0 siblings, 1 reply; 80+ messages in thread
From: Liu, Yi L @ 2023-01-20 15:09 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: alex.williamson, Tian, Kevin, cohuck, eric.auger, nicolinc, kvm,
	mjrosato, chao.p.peng, yi.y.sun, peterx, jasowang,
	suravee.suthikulpanit, Paolo Bonzini, Christopherson,,
	Sean

> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Friday, January 20, 2023 10:34 PM
> 
> On Fri, Jan 20, 2023 at 02:00:26PM +0000, Liu, Yi L wrote:
> > Say in the same time, we have thread B closes device, it will hold
> > group_lock first and then calls kvm_put_kvm() which is the last
> > reference, then it would loop the kvm-device list. Currently, it is
> > not holding kvm_lock. But it also manipulating the kvm-device list,
> > should it hold kvm_lock?
> 
> No. When using refcounts if the refcount is 0 it guarantees there are
> no other threads that can possibly touch this memory, so any locks
> internal to the memory are not required.

Ok. The patch has been sent out standalone.

https://lore.kernel.org/kvm/20230114000351.115444-1-mjrosato@linux.ibm.com/T/#u

Regards,
Yi Liu


* RE: [PATCH 05/13] kvm/vfio: Provide struct kvm_device_ops::release() instead of ::destroy()
  2023-01-20 15:09         ` Liu, Yi L
@ 2023-01-20 15:11           ` Liu, Yi L
  0 siblings, 0 replies; 80+ messages in thread
From: Liu, Yi L @ 2023-01-20 15:11 UTC (permalink / raw)
  To: Liu, Yi L, Jason Gunthorpe
  Cc: alex.williamson, Tian, Kevin, cohuck, eric.auger, nicolinc, kvm,
	mjrosato, chao.p.peng, yi.y.sun, peterx, jasowang,
	suravee.suthikulpanit, Paolo Bonzini, Christopherson,,
	Sean

> From: Liu, Yi L <yi.l.liu@intel.com>
> Sent: Friday, January 20, 2023 11:10 PM
> 
> > From: Jason Gunthorpe <jgg@nvidia.com>
> > Sent: Friday, January 20, 2023 10:34 PM
> >
> > On Fri, Jan 20, 2023 at 02:00:26PM +0000, Liu, Yi L wrote:
> > > Say in the same time, we have thread B closes device, it will hold
> > > group_lock first and then calls kvm_put_kvm() which is the last
> > > reference, then it would loop the kvm-device list. Currently, it is
> > > not holding kvm_lock. But it also manipulating the kvm-device list,
> > > should it hold kvm_lock?
> >
> > No. When using refcounts if the refcount is 0 it guarantees there are
> > no other threads that can possibly touch this memory, so any locks
> > internal to the memory are not required.
> 
> Ok. The patch has been sent out standalone.
> 
> https://lore.kernel.org/kvm/20230114000351.115444-1-
> mjrosato@linux.ibm.com/T/#u

Wrong link. Below is the correct one. 😊

https://lore.kernel.org/kvm/20230120150528.471752-1-yi.l.liu@intel.com/T/#u

Regards,
Yi Liu


* Re: [PATCH 11/13] vfio: Add cdev for vfio_device
  2023-01-17 13:49 ` [PATCH 11/13] vfio: Add cdev for vfio_device Yi Liu
  2023-01-20  7:26   ` Tian, Kevin
@ 2023-01-24 20:44   ` Jason Gunthorpe
  1 sibling, 0 replies; 80+ messages in thread
From: Jason Gunthorpe @ 2023-01-24 20:44 UTC (permalink / raw)
  To: Yi Liu
  Cc: alex.williamson, kevin.tian, cohuck, eric.auger, nicolinc, kvm,
	mjrosato, chao.p.peng, yi.y.sun, peterx, jasowang,
	suravee.suthikulpanit, Joao Martins

On Tue, Jan 17, 2023 at 05:49:40AM -0800, Yi Liu wrote:
> This allows user to directly open a vfio device w/o using the legacy
> container/group interface, as a prerequisite for supporting new iommu
> features like nested translation.
> 
> The device fd opened in this manner doesn't have the capability to access
> the device as the fops open() doesn't open the device until the successful
> BIND_IOMMUFD which be added in next patch.
> 
> With this patch, devices registered to vfio core have both group and device
> interface created.
> 
> - group interface : /dev/vfio/$groupID
> - device interface: /dev/vfio/devices/vfioX  (X is the minor number and
> 					      is unique across devices)
> 
> Given a vfio device the user can identify the matching vfioX by checking
> the sysfs path of the device. Take PCI device (0000:6a:01.0) for example,
> /sys/bus/pci/devices/0000\:6a\:01.0/vfio-dev/vfio0/dev contains the
> major:minor of the matching vfioX.
> 
> Userspace then opens the /dev/vfio/devices/vfioX and checks with fstat
> that the major:minor matches.
> 
> The vfio_device cdev logic in this patch:
> *) __vfio_register_dev() path ends up doing cdev_device_add() for each
>    vfio_device;
> *) vfio_unregister_group_dev() path does cdev_device_del();
> 
> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
> ---
>  drivers/vfio/vfio_main.c | 103 ++++++++++++++++++++++++++++++++++++---
>  include/linux/vfio.h     |   7 ++-
>  2 files changed, 102 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
> index 78725c28b933..6068ffb7c6b7 100644
> --- a/drivers/vfio/vfio_main.c
> +++ b/drivers/vfio/vfio_main.c
> @@ -43,6 +43,9 @@
>  static struct vfio {
>  	struct class			*device_class;
>  	struct ida			device_ida;
> +#if IS_ENABLED(CONFIG_IOMMUFD)
> +	dev_t                           device_devt;
> +#endif

It is a bit strange to see this called CONFIG_IOMMUFD, maybe we should
have a CONFIG_VFIO_DEVICE_FILE that depends on IOMMUFD?

Please try to use a plain 'if (IS_ENABLED())' for more of these

It probably isn't worth saving a few bytes in memory to complicate all
the code, so maybe just always include things like this.

> @@ -156,7 +159,11 @@ static void vfio_device_release(struct device *dev)
>  			container_of(dev, struct vfio_device, device);
>  
>  	vfio_release_device_set(device);
> +#if IS_ENABLED(CONFIG_IOMMUFD)
> +	ida_free(&vfio.device_ida, MINOR(device->device.devt));
> +#else
>  	ida_free(&vfio.device_ida, device->index);
> +#endif

A vfio_device_get_index() would help this

Jason


* RE: [PATCH 02/13] vfio: Refine vfio file kAPIs
  2023-01-18 14:37   ` Eric Auger
@ 2023-01-29 13:32     ` Liu, Yi L
  0 siblings, 0 replies; 80+ messages in thread
From: Liu, Yi L @ 2023-01-29 13:32 UTC (permalink / raw)
  To: eric.auger, alex.williamson, jgg
  Cc: Tian, Kevin, cohuck, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.y.sun, peterx, jasowang, suravee.suthikulpanit

Hi Eric,

> From: Eric Auger <eric.auger@redhat.com>
> Sent: Wednesday, January 18, 2023 10:37 PM

> > +/**
> > + * vfio_file_has_dev - True if the VFIO file is a handle for device
> This original description sounds weird because originally it aimed
> at figuring whether the device belonged to that vfio group fd, no?
> And since it will handle both group fd and device fd it still sounds
> weird to me.

Yes. It is to check whether a device belongs to the vfio group specified
by the input group fd. After this commit, it means whether the input
file is a valid handle for the device, either because the device belongs
to the group behind the file or because the file is a device fd for that
device. I don't have a better name so far. Do you?

Regards,
Yi Liu


* RE: [PATCH 06/13] kvm/vfio: Accept vfio device file from userspace
  2023-01-19  9:35   ` Eric Auger
@ 2023-01-30  7:36     ` Liu, Yi L
  0 siblings, 0 replies; 80+ messages in thread
From: Liu, Yi L @ 2023-01-30  7:36 UTC (permalink / raw)
  To: eric.auger, alex.williamson, jgg
  Cc: Tian, Kevin, cohuck, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.y.sun, peterx, jasowang, suravee.suthikulpanit

> From: Eric Auger <eric.auger@redhat.com>
> Sent: Thursday, January 19, 2023 5:35 PM
> Hi Yi,
> 
> On 1/17/23 14:49, Yi Liu wrote:
> > This defines KVM_DEV_VFIO_FILE* and make alias with
> KVM_DEV_VFIO_GROUP*.
> > Old userspace uses KVM_DEV_VFIO_GROUP* works as well.
> >
> > Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> > ---
>
> > diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> > index 55155e262646..ad36e144a41d 100644
> > --- a/include/uapi/linux/kvm.h
> > +++ b/include/uapi/linux/kvm.h
> > @@ -1396,15 +1396,26 @@ struct kvm_create_device {
> >
> >  struct kvm_device_attr {
> >  	__u32	flags;		/* no flags currently defined */
> > -	__u32	group;		/* device-defined */
> > -	__u64	attr;		/* group-defined */
> > +	union {
> > +		__u32	group;
> > +		__u32	file;
> > +	}; /* device-defined */
> > +	__u64	attr;		/* VFIO-file-defined or group-defined */
> I think there is a confusion here between the 'VFIO group' terminology
> and the 'kvm device group' terminology. Commands for kvm devices are
> gathered in groups and within groups you have sub-commands called
> attributes.

You are right 😊 I will fix it in the next version. Then even the union
is not needed.

Regards,
Yi Liu



* RE: [PATCH 07/13] vfio: Pass struct vfio_device_file * to vfio_device_open/close()
  2023-01-19 20:35     ` Alex Williamson
@ 2023-01-30  9:38       ` Liu, Yi L
  0 siblings, 0 replies; 80+ messages in thread
From: Liu, Yi L @ 2023-01-30  9:38 UTC (permalink / raw)
  To: Alex Williamson, Eric Auger
  Cc: jgg, Tian, Kevin, cohuck, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.y.sun, peterx, jasowang, suravee.suthikulpanit

Hi Alex,

> From: Alex Williamson <alex.williamson@redhat.com>
> Sent: Friday, January 20, 2023 4:36 AM
> 
> On Thu, 19 Jan 2023 12:01:59 +0100
> Eric Auger <eric.auger@redhat.com> wrote:
> 
> > >
> > > -void vfio_device_close(struct vfio_device *device,
> > > -		       struct iommufd_ctx *iommufd)
> > > +void vfio_device_close(struct vfio_device_file *df)
> > >  {
> > > +	struct vfio_device *device = df->device;
> > > +
> > >  	mutex_lock(&device->dev_set->lock);
> > >  	vfio_assert_device_open(device);
> > >  	if (device->open_count == 1)
> > > -		vfio_device_last_close(device, iommufd);
> > > +		vfio_device_last_close(df);
> > >  	device->open_count--;
> > >  	mutex_unlock(&device->dev_set->lock);
> > >  }
> 
> I find it strange that the dev_set->lock has been moved to the caller
> for open, but not for close. 

dev_set->lock is held for close as well, isn't it? As the above code snippet shows,
dev_set->lock is held before calling vfio_device_last_close().

> Like Eric suggests, this seems to be
> because vfio_device_file is usurping dev_set->lock to protect its own
> fields, but then those fields are set on open, cleared on the open
> error path, but not on close??  Thanks,

Yeah. On close, the vfio_device_file itself is freed, so there is no need to
clear the vfio_device_file fields individually.

Regards,
Yi Liu


* RE: [PATCH 07/13] vfio: Pass struct vfio_device_file * to vfio_device_open/close()
  2023-01-19 11:01   ` Eric Auger
  2023-01-19 20:35     ` Alex Williamson
@ 2023-01-30  9:38     ` Liu, Yi L
  1 sibling, 0 replies; 80+ messages in thread
From: Liu, Yi L @ 2023-01-30  9:38 UTC (permalink / raw)
  To: eric.auger, alex.williamson, jgg
  Cc: Tian, Kevin, cohuck, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.y.sun, peterx, jasowang, suravee.suthikulpanit

Hi Eric,

> From: Eric Auger <eric.auger@redhat.com>
> Sent: Thursday, January 19, 2023 7:02 PM
> 
> Hi Yi,
> 
> On 1/17/23 14:49, Yi Liu wrote:
> > This avoids passing struct kvm * and struct iommufd_ctx * in multiple
> > functions. vfio_device_open() becomes to be a locked helper.
> why? because dev_set lock now protects vfio_device_file fields? worth to
> explain.

Yeah, this is because the vfio_device_file fields are mainly referenced
under the dev_set lock.

> do we need to update the comment in vfio.h related to struct
> vfio_device_set?

Yes.

> >
> > Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> > ---
> >  drivers/vfio/group.c     | 34 +++++++++++++++++++++++++---------
> >  drivers/vfio/vfio.h      | 10 +++++-----
> >  drivers/vfio/vfio_main.c | 40 ++++++++++++++++++++++++----------------
> >  3 files changed, 54 insertions(+), 30 deletions(-)
> >
> > diff --git a/drivers/vfio/group.c b/drivers/vfio/group.c
> > index d83cf069d290..7200304663e5 100644
> > --- a/drivers/vfio/group.c
> > +++ b/drivers/vfio/group.c
> > @@ -154,33 +154,49 @@ static int vfio_group_ioctl_set_container(struct vfio_group *group,
> >  	return ret;
> >  }
> >
> > -static int vfio_device_group_open(struct vfio_device *device)
> > +static int vfio_device_group_open(struct vfio_device_file *df)
> >  {
> > +	struct vfio_device *device = df->device;
> >  	int ret;
> >
> >  	mutex_lock(&device->group->group_lock);
> >  	if (!vfio_group_has_iommu(device->group)) {
> >  		ret = -EINVAL;
> > -		goto out_unlock;
> > +		goto err_unlock_group;
> >  	}
> >
> > +	mutex_lock(&device->dev_set->lock);
> is there an explanation somewhere about locking order b/w group_lock,
> dev_set lock?

Previously, the dev_set lock was held prior to the group_lock. Now the group_lock
is taken first and then the dev_set lock, when the group code is compiled in,
e.g. in the group open path.

> >  	/*
> >  	 * Here we pass the KVM pointer with the group under the lock.  If
> the
> >  	 * device driver will use it, it must obtain a reference and release it
> >  	 * during close_device.
> >  	 */
> May be the opportunity to rephrase the above comment. I am not a native
> english speaker but the time concordance seems weird + clarify a
> reference to what.

Oh, it's a reference to the kvm pointer.

> > -	ret = vfio_device_open(device, device->group->iommufd,
> > -			       device->group->kvm);
> > +	df->kvm = device->group->kvm;
> > +	df->iommufd = device->group->iommufd;
> > +
> > +	ret = vfio_device_open(df);
> > +	if (ret)
> > +		goto err_unlock_device;
> > +	mutex_unlock(&device->dev_set->lock);
> >
> > -out_unlock:
> > +	mutex_unlock(&device->group->group_lock);
> > +	return 0;
> > +
> > +err_unlock_device:
> > +	df->kvm = NULL;
> > +	df->iommufd = NULL;
> > +	mutex_unlock(&device->dev_set->lock);
> > +err_unlock_group:
> >  	mutex_unlock(&device->group->group_lock);
> >  	return ret;
> >  }
> >
> > -void vfio_device_group_close(struct vfio_device *device)
> > +void vfio_device_group_close(struct vfio_device_file *df)
> >  {
> > +	struct vfio_device *device = df->device;
> > +
> >  	mutex_lock(&device->group->group_lock);
> > -	vfio_device_close(device, device->group->iommufd);
> > +	vfio_device_close(df);
> >  	mutex_unlock(&device->group->group_lock);
> >  }
> >
> > @@ -196,7 +212,7 @@ static struct file *vfio_device_open_file(struct
> vfio_device *device)
> >  		goto err_out;
> >  	}
> >
> > -	ret = vfio_device_group_open(device);
> > +	ret = vfio_device_group_open(df);
> >  	if (ret)
> >  		goto err_free;
> >
> > @@ -228,7 +244,7 @@ static struct file *vfio_device_open_file(struct
> vfio_device *device)
> >  	return filep;
> >
> >  err_close_device:
> > -	vfio_device_group_close(device);
> > +	vfio_device_group_close(df);
> >  err_free:
> >  	kfree(df);
> >  err_out:
> > diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
> > index 53af6e3ea214..3d8ba165146c 100644
> > --- a/drivers/vfio/vfio.h
> > +++ b/drivers/vfio/vfio.h
> > @@ -19,14 +19,14 @@ struct vfio_container;
> >  struct vfio_device_file {
> >  	struct vfio_device *device;
> >  	struct kvm *kvm;
> > +	struct iommufd_ctx *iommufd;
> >  };
> >
> >  void vfio_device_put_registration(struct vfio_device *device);
> >  bool vfio_device_try_get_registration(struct vfio_device *device);
> > -int vfio_device_open(struct vfio_device *device,
> > -		     struct iommufd_ctx *iommufd, struct kvm *kvm);
> > -void vfio_device_close(struct vfio_device *device,
> > -		       struct iommufd_ctx *iommufd);
> > +int vfio_device_open(struct vfio_device_file *df);
> > +void vfio_device_close(struct vfio_device_file *device);
> > +
> >  struct vfio_device_file *
> >  vfio_allocate_device_file(struct vfio_device *device);
> >
> > @@ -90,7 +90,7 @@ void vfio_device_group_register(struct vfio_device
> *device);
> >  void vfio_device_group_unregister(struct vfio_device *device);
> >  int vfio_device_group_use_iommu(struct vfio_device *device);
> >  void vfio_device_group_unuse_iommu(struct vfio_device *device);
> > -void vfio_device_group_close(struct vfio_device *device);
> > +void vfio_device_group_close(struct vfio_device_file *df);
> >  struct vfio_group *vfio_group_from_file(struct file *file);
> >  bool vfio_group_enforced_coherent(struct vfio_group *group);
> >  void vfio_group_set_kvm(struct vfio_group *group, struct kvm *kvm);
> > diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
> > index dc08d5dd62cc..3df71bd9cd1e 100644
> > --- a/drivers/vfio/vfio_main.c
> > +++ b/drivers/vfio/vfio_main.c
> > @@ -358,9 +358,11 @@ vfio_allocate_device_file(struct vfio_device
> *device)
> >  	return df;
> >  }
> >
> > -static int vfio_device_first_open(struct vfio_device *device,
> > -				  struct iommufd_ctx *iommufd, struct kvm
> *kvm)
> > +static int vfio_device_first_open(struct vfio_device_file *df)
> >  {
> > +	struct vfio_device *device = df->device;
> > +	struct iommufd_ctx *iommufd = df->iommufd;
> > +	struct kvm *kvm = df->kvm;
> >  	int ret;
> >
> >  	lockdep_assert_held(&device->dev_set->lock);
> > @@ -394,9 +396,11 @@ static int vfio_device_first_open(struct
> vfio_device *device,
> >  	return ret;
> >  }
> >
> > -static void vfio_device_last_close(struct vfio_device *device,
> > -				   struct iommufd_ctx *iommufd)
> > +static void vfio_device_last_close(struct vfio_device_file *df)
> >  {
> > +	struct vfio_device *device = df->device;
> > +	struct iommufd_ctx *iommufd = df->iommufd;
> > +
> >  	lockdep_assert_held(&device->dev_set->lock);
> >
> >  	if (device->ops->close_device)
> > @@ -409,30 +413,34 @@ static void vfio_device_last_close(struct
> vfio_device *device,
> >  	module_put(device->dev->driver->owner);
> >  }
> >
> > -int vfio_device_open(struct vfio_device *device,
> > -		     struct iommufd_ctx *iommufd, struct kvm *kvm)
> > +int vfio_device_open(struct vfio_device_file *df)
> >  {
> > -	int ret = 0;
> > +	struct vfio_device *device = df->device;
> > +
> > +	lockdep_assert_held(&device->dev_set->lock);
> >
> > -	mutex_lock(&device->dev_set->lock);
> >  	device->open_count++;
> >  	if (device->open_count == 1) {
> > -		ret = vfio_device_first_open(device, iommufd, kvm);
> > -		if (ret)
> > +		int ret;
> > +
> > +		ret = vfio_device_first_open(df);
> > +		if (ret) {
> >  			device->open_count--;
> > +			return ret;
> nit: the original ret init and return was good enough, no need to change it?

This change is needed by a later commit to make the flow success-oriented.

https://lore.kernel.org/kvm/20230117134942.101112-11-yi.l.liu@intel.com/

But I guess it is not needed here, so you are right. I may just keep the
existing ret init and return.

> > +		}
> >  	}
> > -	mutex_unlock(&device->dev_set->lock);
> >
> > -	return ret;
> > +	return 0;
> >  }
> >
> > -void vfio_device_close(struct vfio_device *device,
> > -		       struct iommufd_ctx *iommufd)
> > +void vfio_device_close(struct vfio_device_file *df)
> >  {
> > +	struct vfio_device *device = df->device;
> > +
> >  	mutex_lock(&device->dev_set->lock);
> >  	vfio_assert_device_open(device);
> >  	if (device->open_count == 1)
> > -		vfio_device_last_close(device, iommufd);
> > +		vfio_device_last_close(df);
> >  	device->open_count--;
> >  	mutex_unlock(&device->dev_set->lock);
> >  }
> > @@ -478,7 +486,7 @@ static int vfio_device_fops_release(struct inode
> *inode, struct file *filep)
> >  	struct vfio_device_file *df = filep->private_data;
> >  	struct vfio_device *device = df->device;
> >
> > -	vfio_device_group_close(device);
> > +	vfio_device_group_close(df);
> >
> >  	vfio_device_put_registration(device);
> >

Regards,
Yi Liu


* RE: [PATCH 03/13] vfio: Accept vfio device file in the driver facing kAPI
  2023-01-18 16:11   ` Eric Auger
@ 2023-01-30  9:47     ` Liu, Yi L
  2023-01-30 18:02       ` Jason Gunthorpe
  0 siblings, 1 reply; 80+ messages in thread
From: Liu, Yi L @ 2023-01-30  9:47 UTC (permalink / raw)
  To: eric.auger, alex.williamson, jgg
  Cc: Tian, Kevin, cohuck, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.y.sun, peterx, jasowang, suravee.suthikulpanit

Hi Eric,

> From: Eric Auger <eric.auger@redhat.com>
> Sent: Thursday, January 19, 2023 12:11 AM
> 
> Hi Yi,
> 
> On 1/17/23 14:49, Yi Liu wrote:
> > This makes the vfio file kAPIs to accepte vfio device files, also a
> > preparation for vfio device cdev support.
> >
> > Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> > ---
> >  drivers/vfio/vfio.h      |  1 +
> >  drivers/vfio/vfio_main.c | 51
> ++++++++++++++++++++++++++++++++++++----
> >  2 files changed, 48 insertions(+), 4 deletions(-)
> >
> > diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
> > index ef5de2872983..53af6e3ea214 100644
> > --- a/drivers/vfio/vfio.h
> > +++ b/drivers/vfio/vfio.h
> > @@ -18,6 +18,7 @@ struct vfio_container;
> >
> >  struct vfio_device_file {
> >  	struct vfio_device *device;
> > +	struct kvm *kvm;
> >  };
> >
> >  void vfio_device_put_registration(struct vfio_device *device);
> > diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
> > index 1aedfbd15ca0..dc08d5dd62cc 100644
> > --- a/drivers/vfio/vfio_main.c
> > +++ b/drivers/vfio/vfio_main.c
> > @@ -1119,13 +1119,23 @@ const struct file_operations vfio_device_fops
> = {
> >  	.mmap		= vfio_device_fops_mmap,
> >  };
> >
> > +static struct vfio_device *vfio_device_from_file(struct file *file)
> > +{
> > +	struct vfio_device_file *df = file->private_data;
> > +
> > +	if (file->f_op != &vfio_device_fops)
> > +		return NULL;
> > +	return df->device;
> > +}
> > +
> >  /**
> >   * vfio_file_is_valid - True if the file is usable with VFIO aPIS
> >   * @file: VFIO group file or VFIO device file
> >   */
> >  bool vfio_file_is_valid(struct file *file)
> >  {
> > -	return vfio_group_from_file(file);
> > +	return vfio_group_from_file(file) ||
> > +	       vfio_device_from_file(file);
> >  }
> >  EXPORT_SYMBOL_GPL(vfio_file_is_valid);
> >
> > @@ -1140,15 +1150,37 @@ EXPORT_SYMBOL_GPL(vfio_file_is_valid);
> >   */
> >  bool vfio_file_enforced_coherent(struct file *file)
> >  {
> > -	struct vfio_group *group = vfio_group_from_file(file);
> > +	struct vfio_group *group;
> > +	struct vfio_device *device;
> >
> > +	group = vfio_group_from_file(file);
> >  	if (group)
> >  		return vfio_group_enforced_coherent(group);
> >
> > +	device = vfio_device_from_file(file);
> > +	if (device)
> > +		return device_iommu_capable(device->dev,
> > +
> IOMMU_CAP_ENFORCE_CACHE_COHERENCY);
> > +
> >  	return true;
> >  }
> >  EXPORT_SYMBOL_GPL(vfio_file_enforced_coherent);
> >
> > +static void vfio_device_file_set_kvm(struct file *file, struct kvm *kvm)
> > +{
> > +	struct vfio_device_file *df = file->private_data;
> > +	struct vfio_device *device = df->device;
> > +
> > +	/*
> > +	 * The kvm is first recorded in the df, and will be propagated
> > +	 * to vfio_device::kvm when the file binds iommufd successfully in
> > +	 * the vfio device cdev path.
> > +	 */
> > +	mutex_lock(&device->dev_set->lock);
> it is not totally obvious to me why the
> 
> device->dev_set->lock needs to be held here and why that lock in particular.
> Isn't supposed to protect the vfio_device_set. The header just mentions
> "the VFIO core will provide a lock that is held around
> open_device()/close_device() for all devices in the set."

The reason is that df->kvm is referenced in vfio_device_first_open() in the
commit below. To avoid a race, a common lock is needed between the set_kvm
thread and the open thread. The group path uses group_lock for this. However,
in the cdev path the group code may not be compiled in, so another lock is
needed. dev_set->lock happens to be held in the open path already, so using
it avoids adding another dedicated lock.

https://lore.kernel.org/kvm/20230117134942.101112-8-yi.l.liu@intel.com/

> > +	df->kvm = kvm;
> > +	mutex_unlock(&device->dev_set->lock);

Regards,
Yi Liu


* RE: [PATCH 08/13] vfio: Block device access via device fd until device is opened
  2023-01-19 14:00   ` Eric Auger
@ 2023-01-30 10:41     ` Liu, Yi L
  0 siblings, 0 replies; 80+ messages in thread
From: Liu, Yi L @ 2023-01-30 10:41 UTC (permalink / raw)
  To: eric.auger, alex.williamson, jgg
  Cc: Tian, Kevin, cohuck, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.y.sun, peterx, jasowang, suravee.suthikulpanit

> From: Eric Auger <eric.auger@redhat.com>
> Sent: Thursday, January 19, 2023 10:01 PM
> 
> Hi Yi,
> 
> On 1/17/23 14:49, Yi Liu wrote:
> > Allow the vfio_device file to be in a state where the device FD is
> > opened but the device cannot be used by userspace (i.e.
> its .open_device()
> > hasn't been called). This inbetween state is not used when the device
> > FD is spawned from the group FD, however when we create the device FD
> > directly by opening a cdev it will be opened in the blocked state.
> Please explain why this is needed in the commit message (although you
> evoked the rationale in the cover letter).

Sure. 

Regards,
Yi Liu


* RE: [PATCH 08/13] vfio: Block device access via device fd until device is opened
  2023-01-19 20:47   ` Alex Williamson
@ 2023-01-30 10:48     ` Liu, Yi L
  0 siblings, 0 replies; 80+ messages in thread
From: Liu, Yi L @ 2023-01-30 10:48 UTC (permalink / raw)
  To: Alex Williamson
  Cc: jgg, Tian, Kevin, cohuck, eric.auger, nicolinc, kvm, mjrosato,
	chao.p.peng, yi.y.sun, peterx, jasowang, suravee.suthikulpanit

> From: Alex Williamson <alex.williamson@redhat.com>
> Sent: Friday, January 20, 2023 4:47 AM
> 
> On Tue, 17 Jan 2023 05:49:37 -0800
> Yi Liu <yi.l.liu@intel.com> wrote:
> 
> > Allow the vfio_device file to be in a state where the device FD is
> > opened but the device cannot be used by userspace (i.e.
> its .open_device()
> > hasn't been called). This inbetween state is not used when the device
> > FD is spawned from the group FD, however when we create the device FD
> > directly by opening a cdev it will be opened in the blocked state.
> >
> > In the blocked state, currently only the bind operation is allowed,
> > other device accesses are not allowed. Completing bind will allow user
> > to further access the device.
> >
> > This is implemented by adding a flag in struct vfio_device_file to mark
> > the blocked state and using a simple smp_load_acquire() to obtain the
> > flag value and serialize all the device setup with the thread accessing
> > this device.
> >
> > Due to this scheme it is not possible to unbind the FD, once it is bound,
> > it remains bound until the FD is closed.
> >
> > Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
> > Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> > ---
> >  drivers/vfio/vfio.h      |  1 +
> >  drivers/vfio/vfio_main.c | 29 +++++++++++++++++++++++++++++
> >  2 files changed, 30 insertions(+)
> >
> > diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
> > index 3d8ba165146c..c69a9902ea84 100644
> > --- a/drivers/vfio/vfio.h
> > +++ b/drivers/vfio/vfio.h
> > @@ -20,6 +20,7 @@ struct vfio_device_file {
> >  	struct vfio_device *device;
> >  	struct kvm *kvm;
> >  	struct iommufd_ctx *iommufd;
> > +	bool access_granted;
> >  };
> >
> >  void vfio_device_put_registration(struct vfio_device *device);
> > diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
> > index 3df71bd9cd1e..d442ebaa4b21 100644
> > --- a/drivers/vfio/vfio_main.c
> > +++ b/drivers/vfio/vfio_main.c
> > @@ -430,6 +430,11 @@ int vfio_device_open(struct vfio_device_file *df)
> >  		}
> >  	}
> >
> > +	/*
> > +	 * Paired with smp_load_acquire() in vfio_device_fops::ioctl/
> > +	 * read/write/mmap
> > +	 */
> > +	smp_store_release(&df->access_granted, true);
> 
> Why is this happening outside of the first-open branch?  Thanks,

The reason is that the group path allows multiple device fds, but only the
first device fd open triggers the first-open branch, while every device fd
open instance needs to set the access_granted flag. For the cdev path this
could indeed be moved into the first-open branch, as the cdev path only
allows one device fd open.

Regards,
Yi Liu


* RE: [PATCH 10/13] vfio: Make vfio_device_open() exclusive between group path and device cdev path
  2023-01-19  9:55   ` Tian, Kevin
@ 2023-01-30 11:59     ` Liu, Yi L
  0 siblings, 0 replies; 80+ messages in thread
From: Liu, Yi L @ 2023-01-30 11:59 UTC (permalink / raw)
  To: Tian, Kevin, alex.williamson, jgg
  Cc: cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.y.sun, peterx, jasowang, suravee.suthikulpanit

> From: Tian, Kevin <kevin.tian@intel.com>
> Sent: Thursday, January 19, 2023 5:55 PM
> 
> > From: Liu, Yi L <yi.l.liu@intel.com>
> > Sent: Tuesday, January 17, 2023 9:50 PM
> >
> > @@ -17,7 +17,11 @@ struct vfio_device;
> >  struct vfio_container;
> >
> >  struct vfio_device_file {
> > +	/* static fields, init per allocation */
> >  	struct vfio_device *device;
> > +	bool single_open;
> 
> I wonder whether the readability is better by renaming this
> to 'cdev', e.g.:
> 
> 	/*
> 	 * Device cdev path cannot support multiple device open since
> 	 * it doesn't have a secure way for it. So a second device
> 	 * open attempt should be failed if the caller is from a cdev
> 	 * path or the device has already been opened by a cdev path.
> 	 */
> 	if (device->open_count != 0 &&
> 	    (df->cdev || device->single_open))
> 		return -EINVAL;
> 
> 	/*
> 	 * group path supports multiple device open, while cdev doesn't.
> 	 * So use vfio_device_group_close() for !singel_open case.
> 	 */
> 	if (!df->cdev)
> 		vfio_device_group_close(df);
> 
> because from device file p.o.v we just want to differentiate cdev
> vs. group interface. With this change we even don't need the
> comment for the last condition check.
> 
> it's fine to have device->single_open as it's kind of a status bit
> set in the cdev path to prevent more opens on this device.

Ok. 

Regards,
Yi Liu


* RE: [PATCH 10/13] vfio: Make vfio_device_open() exclusive between group path and device cdev path
  2023-01-19 23:51   ` Alex Williamson
@ 2023-01-30 12:14     ` Liu, Yi L
  2023-02-02  5:34       ` Liu, Yi L
  0 siblings, 1 reply; 80+ messages in thread
From: Liu, Yi L @ 2023-01-30 12:14 UTC (permalink / raw)
  To: Alex Williamson
  Cc: jgg, Tian, Kevin, cohuck, eric.auger, nicolinc, kvm, mjrosato,
	chao.p.peng, yi.y.sun, peterx, jasowang, suravee.suthikulpanit

> From: Alex Williamson <alex.williamson@redhat.com>
> Sent: Friday, January 20, 2023 7:52 AM
> 
> On Tue, 17 Jan 2023 05:49:39 -0800
> Yi Liu <yi.l.liu@intel.com> wrote:
> 
> > VFIO group has historically allowed multi-open of the device FD. This
> > was made secure because the "open" was executed via an ioctl to the
> > group FD which is itself only single open.
> >
> > No know use of multiple device FDs is known. It is kind of a strange
>   ^^ ^^^^                               ^^^^^

How about "No known use of multiple device FDs today"

> > thing to do because new device FDs can naturally be created via dup().
> >
> > When we implement the new device uAPI there is no natural way to allow
> > the device itself from being multi-opened in a secure manner. Without
> > the group FD we cannot prove the security context of the opener.
> >
> > Thus, when moving to the new uAPI we block the ability to multi-open
> > the device. This also makes the cdev path exclusive with group path.
> >
> > The main logic is in the vfio_device_open(). It needs to sustain both
> > the legacy behavior i.e. multi-open in the group path and the new
> > behavior i.e. single-open in the cdev path. This mixture leads to the
> > introduction of a new single_open flag stored both in struct vfio_device
> > and vfio_device_file. vfio_device_file::single_open is set per the
> > vfio_device_file allocation. Its value is propagated to struct vfio_device
> > after device is opened successfully.
> >
> > Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> > ---
> >  drivers/vfio/group.c     |  2 +-
> >  drivers/vfio/vfio.h      |  6 +++++-
> >  drivers/vfio/vfio_main.c | 25 ++++++++++++++++++++++---
> >  include/linux/vfio.h     |  1 +
> >  4 files changed, 29 insertions(+), 5 deletions(-)
> >
> > diff --git a/drivers/vfio/group.c b/drivers/vfio/group.c
> > index 9484bb1c54a9..57ebe5e1a7e6 100644
> > --- a/drivers/vfio/group.c
> > +++ b/drivers/vfio/group.c
> > @@ -216,7 +216,7 @@ static struct file *vfio_device_open_file(struct
> vfio_device *device)
> >  	struct file *filep;
> >  	int ret;
> >
> > -	df = vfio_allocate_device_file(device);
> > +	df = vfio_allocate_device_file(device, false);
> >  	if (IS_ERR(df)) {
> >  		ret = PTR_ERR(df);
> >  		goto err_out;
> > diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
> > index fe0fcfa78710..bdcf9762521d 100644
> > --- a/drivers/vfio/vfio.h
> > +++ b/drivers/vfio/vfio.h
> > @@ -17,7 +17,11 @@ struct vfio_device;
> >  struct vfio_container;
> >
> >  struct vfio_device_file {
> > +	/* static fields, init per allocation */
> >  	struct vfio_device *device;
> > +	bool single_open;
> > +
> > +	/* fields set after allocation */
> >  	struct kvm *kvm;
> >  	struct iommufd_ctx *iommufd;
> >  	bool access_granted;
> > @@ -30,7 +34,7 @@ int vfio_device_open(struct vfio_device_file *df,
> >  void vfio_device_close(struct vfio_device_file *device);
> >
> >  struct vfio_device_file *
> > -vfio_allocate_device_file(struct vfio_device *device);
> > +vfio_allocate_device_file(struct vfio_device *device, bool single_open);
> >
> >  extern const struct file_operations vfio_device_fops;
> >
> > diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
> > index 90174a9015c4..78725c28b933 100644
> > --- a/drivers/vfio/vfio_main.c
> > +++ b/drivers/vfio/vfio_main.c
> > @@ -345,7 +345,7 @@ static bool vfio_assert_device_open(struct
> vfio_device *device)
> >  }
> >
> >  struct vfio_device_file *
> > -vfio_allocate_device_file(struct vfio_device *device)
> > +vfio_allocate_device_file(struct vfio_device *device, bool single_open)
> >  {
> >  	struct vfio_device_file *df;
> >
> > @@ -354,6 +354,7 @@ vfio_allocate_device_file(struct vfio_device
> *device)
> >  		return ERR_PTR(-ENOMEM);
> >
> >  	df->device = device;
> > +	df->single_open = single_open;
> 
> It doesn't make sense to me to convolute the definition of this
> function with an unmemorable bool arg when the one caller that sets the
> value true could simply open code it.

Yeah, how about renaming it as Kevin suggested?

https://lore.kernel.org/kvm/BN9PR11MB52769CBCA68CD25DAC96B33B8CC49@BN9PR11MB5276.namprd11.prod.outlook.com/
 
> 
> >
> >  	return df;
> >  }
> > @@ -421,6 +422,16 @@ int vfio_device_open(struct vfio_device_file *df,
> >
> >  	lockdep_assert_held(&device->dev_set->lock);
> >
> > +	/*
> > +	 * Device cdev path cannot support multiple device open since
> > +	 * it doesn't have a secure way for it. So a second device
> > +	 * open attempt should be failed if the caller is from a cdev
> > +	 * path or the device has already been opened by a cdev path.
> > +	 */
> > +	if (device->open_count != 0 &&
> > +	    (df->single_open || device->single_open))
> > +		return -EINVAL;
> 
> IIUC, the reason this exists is that we let the user open the device
> cdev arbitrarily, but only one instance can call
> ioctl(VFIO_DEVICE_BIND_IOMMUFD).  Why do we bother to let the user
> create those other file instances?  What expectations are we setting
> for the user by allowing them to open the device but not use it?

The user won't be able to access the device, as such a device fd is not
bound to an iommufd.

> Clearly we're thinking about a case here where the device has been
> opened via the group path and the user is now attempting to bind the
> same device via the cdev path.

This will fail, as the group path has already incremented device->open_count;
the cdev open then fails because it has df->single_open == true.

> That seems wrong to even allow and I'm
> surprised it gets this far.  In fact, where do we block a user from
> opening one device in a group via cdev and another via the group?

Such a scenario is rejected by the DMA ownership check.

The two paths exclude each other when claiming DMA ownership in such a
scenario: the group path uses the vfio_group pointer as the DMA owner
marker, while the cdev path uses the iommufd_ctx pointer, and one group
only allows a single DMA owner.

> 
> > +
> >  	device->open_count++;
> >  	if (device->open_count == 1) {
> >  		int ret;
> > @@ -430,6 +441,7 @@ int vfio_device_open(struct vfio_device_file *df,
> >  			device->open_count--;
> >  			return ret;
> >  		}
> > +		device->single_open = df->single_open;
> >  	}
> >
> >  	/*
> > @@ -446,8 +458,10 @@ void vfio_device_close(struct vfio_device_file *df)
> >
> >  	mutex_lock(&device->dev_set->lock);
> >  	vfio_assert_device_open(device);
> > -	if (device->open_count == 1)
> > +	if (device->open_count == 1) {
> >  		vfio_device_last_close(df);
> > +		device->single_open = false;
> > +	}
> >  	device->open_count--;
> >  	mutex_unlock(&device->dev_set->lock);
> >  }
> > @@ -493,7 +507,12 @@ static int vfio_device_fops_release(struct inode
> *inode, struct file *filep)
> >  	struct vfio_device_file *df = filep->private_data;
> >  	struct vfio_device *device = df->device;
> >
> > -	vfio_device_group_close(df);
> > +	/*
> > +	 * group path supports multiple device open, while cdev doesn't.
> > +	 * So use vfio_device_group_close() for !singel_open case.
> > +	 */
> > +	if (!df->single_open)
> > +		vfio_device_group_close(df);
> 
> If we're going to use this to differentiate group vs cdev use cases,
> then let's name it something to reflect that rather than pretending it
> only limits the number of opens, ex. is_cdev_device.  Thanks,

Yes, I'll follow that. Kevin had a similar comment on it.

Regards,
Yi Liu


* RE: [PATCH 09/13] vfio: Add infrastructure for bind_iommufd and attach
  2023-01-19  9:45   ` Tian, Kevin
@ 2023-01-30 13:52     ` Liu, Yi L
  0 siblings, 0 replies; 80+ messages in thread
From: Liu, Yi L @ 2023-01-30 13:52 UTC (permalink / raw)
  To: Tian, Kevin, alex.williamson, jgg
  Cc: cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.y.sun, peterx, jasowang, suravee.suthikulpanit

> From: Tian, Kevin <kevin.tian@intel.com>
> Sent: Thursday, January 19, 2023 5:45 PM
> 
> > From: Liu, Yi L <yi.l.liu@intel.com>
> > Sent: Tuesday, January 17, 2023 9:50 PM
> >  static int vfio_device_group_open(struct vfio_device_file *df)
> >  {
> >  	struct vfio_device *device = df->device;
> > +	u32 ioas_id;
> > +	u32 *pt_id = NULL;
> >  	int ret;
> >
> >  	mutex_lock(&device->group->group_lock);
> > @@ -165,6 +167,14 @@ static int vfio_device_group_open(struct
> > vfio_device_file *df)
> >  		goto err_unlock_group;
> >  	}
> >
> > +	if (device->group->iommufd) {
> > +		ret = iommufd_vfio_compat_ioas_id(device->group-
> > >iommufd,
> > +						  &ioas_id);
> > +		if (ret)
> > +			goto err_unlock_group;
> > +		pt_id = &ioas_id;
> > +	}
> > +
> >  	mutex_lock(&device->dev_set->lock);
> >  	/*
> >  	 * Here we pass the KVM pointer with the group under the lock.  If
> > the
> > @@ -174,7 +184,7 @@ static int vfio_device_group_open(struct
> > vfio_device_file *df)
> >  	df->kvm = device->group->kvm;
> >  	df->iommufd = device->group->iommufd;
> >
> > -	ret = vfio_device_open(df);
> > +	ret = vfio_device_open(df, NULL, pt_id);
> 
> having both ioas_id and pt_id in one function is a bit confusing.
> 
> Does it read better with below?
> 
> if (device->group->iommufd)
> 	ret = vfio_device_open(df, NULL, &ioas_id);
> else
> 	ret = vfio_device_open(df, NULL, NULL);

Yes. 😊

> > +/* @pt_id == NULL implies detach */
> > +int vfio_iommufd_attach(struct vfio_device *vdev, u32 *pt_id)
> > +{
> > +	lockdep_assert_held(&vdev->dev_set->lock);
> > +
> > +	return vdev->ops->attach_ioas(vdev, pt_id);
> > +}
> 
> what benefit does this one-line wrapper give actually?
> 
> especially pt_id==NULL is checked in the callback instead of in this
> wrapper.

Yep, I will just open-code it in the caller.

Regards,
Yi Liu


* RE: [PATCH 09/13] vfio: Add infrastructure for bind_iommufd and attach
  2023-01-19 23:05   ` Alex Williamson
@ 2023-01-30 13:55     ` Liu, Yi L
  0 siblings, 0 replies; 80+ messages in thread
From: Liu, Yi L @ 2023-01-30 13:55 UTC (permalink / raw)
  To: Alex Williamson
  Cc: jgg, Tian, Kevin, cohuck, eric.auger, nicolinc, kvm, mjrosato,
	chao.p.peng, yi.y.sun, peterx, jasowang, suravee.suthikulpanit

> From: Alex Williamson <alex.williamson@redhat.com>
> Sent: Friday, January 20, 2023 7:05 AM
> On Tue, 17 Jan 2023 05:49:38 -0800
> Yi Liu <yi.l.liu@intel.com> wrote:
> 
> > This prepares to add ioctls for device cdev fd. This infrastructure includes:
> >     - add vfio_iommufd_attach() to support iommufd pgtable attach after
> >       bind_iommufd. A NULL pt_id indicates detach.
> >     - let vfio_iommufd_bind() accept pt_id, e.g. the compat_ioas_id in
> the
> >       legacy group path, and also return back dev_id if caller requires it.
> >
> > Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> > ---
> >  drivers/vfio/group.c     | 12 +++++-
> >  drivers/vfio/iommufd.c   | 79 ++++++++++++++++++++++++++++++------
> ----
> >  drivers/vfio/vfio.h      | 15 ++++++--
> >  drivers/vfio/vfio_main.c | 10 +++--
> >  4 files changed, 88 insertions(+), 28 deletions(-)
> >
> > diff --git a/drivers/vfio/group.c b/drivers/vfio/group.c
> > index 7200304663e5..9484bb1c54a9 100644
> > --- a/drivers/vfio/group.c
> > +++ b/drivers/vfio/group.c
> > @@ -157,6 +157,8 @@ static int vfio_group_ioctl_set_container(struct
> vfio_group *group,
> >  static int vfio_device_group_open(struct vfio_device_file *df)
> >  {
> >  	struct vfio_device *device = df->device;
> > +	u32 ioas_id;
> > +	u32 *pt_id = NULL;
> >  	int ret;
> >
> >  	mutex_lock(&device->group->group_lock);
> > @@ -165,6 +167,14 @@ static int vfio_device_group_open(struct
> vfio_device_file *df)
> >  		goto err_unlock_group;
> >  	}
> >
> > +	if (device->group->iommufd) {
> > +		ret = iommufd_vfio_compat_ioas_id(device->group-
> >iommufd,
> > +						  &ioas_id);
> > +		if (ret)
> > +			goto err_unlock_group;
> > +		pt_id = &ioas_id;
> > +	}
> > +
> >  	mutex_lock(&device->dev_set->lock);
> >  	/*
> >  	 * Here we pass the KVM pointer with the group under the lock.  If
> the
> > @@ -174,7 +184,7 @@ static int vfio_device_group_open(struct
> vfio_device_file *df)
> >  	df->kvm = device->group->kvm;
> >  	df->iommufd = device->group->iommufd;
> >
> > -	ret = vfio_device_open(df);
> > +	ret = vfio_device_open(df, NULL, pt_id);
> >  	if (ret)
> >  		goto err_unlock_device;
> >  	mutex_unlock(&device->dev_set->lock);
> > diff --git a/drivers/vfio/iommufd.c b/drivers/vfio/iommufd.c
> > index 4f82a6fa7c6c..412644fdbf16 100644
> > --- a/drivers/vfio/iommufd.c
> > +++ b/drivers/vfio/iommufd.c
> > @@ -10,9 +10,17 @@
> >  MODULE_IMPORT_NS(IOMMUFD);
> >  MODULE_IMPORT_NS(IOMMUFD_VFIO);
> >
> > -int vfio_iommufd_bind(struct vfio_device *vdev, struct iommufd_ctx
> *ictx)
> > +/* @pt_id == NULL implies detach */
> > +int vfio_iommufd_attach(struct vfio_device *vdev, u32 *pt_id)
> > +{
> > +	lockdep_assert_held(&vdev->dev_set->lock);
> > +
> > +	return vdev->ops->attach_ioas(vdev, pt_id);
> > +}
> 
> 
> I find this patch pretty confusing, I think it's rooted in all these
> multiplexed interfaces, which extend all the way out to userspace with
> a magic, reserved page table ID to detach a device from an IOAS.  It
> seems like it would be simpler to make a 'detach' API, a detach_ioas
> callback on the vfio_device_ops, and certainly not an
> vfio_iommufd_attach() function that does a detach provided the correct
> args while also introducing a __vfio_iommufd_detach() function.

Sure. Will change it.

> 
> This series is also missing an update to
> Documentation/driver-api/vfio.rst, which is already behind relative to
> the iommufd interfaces.  Thanks,

Yeah, the vfio.rst is already a bit out-of-date. Will try to update it as well.

Regards,
Yi Liu

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 03/13] vfio: Accept vfio device file in the driver facing kAPI
  2023-01-30  9:47     ` Liu, Yi L
@ 2023-01-30 18:02       ` Jason Gunthorpe
  0 siblings, 0 replies; 80+ messages in thread
From: Jason Gunthorpe @ 2023-01-30 18:02 UTC (permalink / raw)
  To: Liu, Yi L
  Cc: eric.auger, alex.williamson, Tian, Kevin, cohuck, nicolinc, kvm,
	mjrosato, chao.p.peng, yi.y.sun, peterx, jasowang,
	suravee.suthikulpanit

On Mon, Jan 30, 2023 at 09:47:08AM +0000, Liu, Yi L wrote:

> The reason is that df->kvm is referenced in vfio_device_first_open() in
> the below commit. To avoid a race, a common lock is needed between the
> set_kvm thread and the open thread. For the group path, group_lock is used.
> However, for the cdev path, the group code may not be compiled in, so we
> need to use another lock. dev_set->lock happens to be taken in the open
> path, so using it avoids adding another dedicated lock.

Add a comment around the kvm pointer in the struct that it is weirdly
locked
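Something along these lines, say (an illustrative sketch only, not the
actual struct layout):

```c
#include <stddef.h>

struct kvm;
struct iommufd_ctx;
struct vfio_device;

/* Illustrative sketch only; the comment on the kvm field is the point. */
struct vfio_device_file {
	struct vfio_device *device;
	/*
	 * Weirdly locked: protected by dev_set->lock rather than
	 * group_lock, because the cdev path can be built without the
	 * group code. The set_kvm thread and the open thread both take
	 * dev_set->lock, which the open path already holds anyway.
	 */
	struct kvm *kvm;
	struct iommufd_ctx *iommufd;
};
```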

Jason

^ permalink raw reply	[flat|nested] 80+ messages in thread

* RE: [PATCH 11/13] vfio: Add cdev for vfio_device
  2023-01-20  7:26   ` Tian, Kevin
@ 2023-01-31  6:17     ` Liu, Yi L
  0 siblings, 0 replies; 80+ messages in thread
From: Liu, Yi L @ 2023-01-31  6:17 UTC (permalink / raw)
  To: Tian, Kevin, alex.williamson, jgg
  Cc: cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.y.sun, peterx, jasowang, suravee.suthikulpanit, Martins, Joao

> From: Tian, Kevin <kevin.tian@intel.com>
> Sent: Friday, January 20, 2023 3:27 PM
> 
> > From: Liu, Yi L <yi.l.liu@intel.com>
> > Sent: Tuesday, January 17, 2023 9:50 PM
> >
> > @@ -156,7 +159,11 @@ static void vfio_device_release(struct device *dev)
> >  			container_of(dev, struct vfio_device, device);
> >
> >  	vfio_release_device_set(device);
> > +#if IS_ENABLED(CONFIG_IOMMUFD)
> > +	ida_free(&vfio.device_ida, MINOR(device->device.devt));
> > +#else
> >  	ida_free(&vfio.device_ida, device->index);
> > +#endif
> 
> There are many #if in this patch, leading to bad readability.
> 
> for this what about letting device->index always storing the minor
> value? then here it could just be:
> 
> 	ida_free(&vfio.device_ida, device->index);

Yes.

> > @@ -232,17 +240,25 @@ static int vfio_init_device(struct vfio_device
> > *device, struct device *dev,
> >  	device->device.release = vfio_device_release;
> >  	device->device.class = vfio.device_class;
> >  	device->device.parent = device->dev;
> > +#if IS_ENABLED(CONFIG_IOMMUFD)
> > +	device->device.devt = MKDEV(MAJOR(vfio.device_devt), minor);
> > +	cdev_init(&device->cdev, &vfio_device_fops);
> > +	device->cdev.owner = THIS_MODULE;
> > +#else
> > +	device->index = minor;
> > +#endif
> 
> Probably we can have a vfio_init_device_cdev() in iommufd.c and let
> it be empty if !CONFIG_IOMMUFD. Then here could be:

Yes. Btw. would adding a separate device_cdev.c be better than reusing iommufd.c?

> 
> 	device->index = minor;
> 	vfio_init_device_cdev(device, vfio.device_devt, minor);
>
> > @@ -257,7 +273,12 @@ static int __vfio_register_dev(struct vfio_device
> > *device,
> >  	if (!device->dev_set)
> >  		vfio_assign_device_set(device, device);
> >
> > -	ret = dev_set_name(&device->device, "vfio%d", device->index);
> > +#if IS_ENABLED(CONFIG_IOMMUFD)
> > +	minor = MINOR(device->device.devt);
> > +#else
> > +	minor = device->index;
> > +#endif
> 
> then just "minor = device->index"

Yes.

> >
> > +#if IS_ENABLED(CONFIG_IOMMUFD)
> > +	ret = cdev_device_add(&device->cdev, &device->device);
> > +#else
> >  	ret = device_add(&device->device);
> > +#endif
> 
> also add a wrapper vfio_register_device_cdev() which does
> cdev_device_add() if CONFIG_IOMMUFD and device_add() otherwise.

Got it.
> 
> > +#if IS_ENABLED(CONFIG_IOMMUFD)
> > +	/*
> > +	 * Balances device_add in register path. Putting it as the first
> > +	 * operation in unregister to prevent registration refcount from
> > +	 * incrementing per cdev open.
> > +	 */
> > +	cdev_device_del(&device->cdev, &device->device);
> > +#else
> > +	device_del(&device->device);
> > +#endif
> 
> ditto
> 
> > +#if IS_ENABLED(CONFIG_IOMMUFD)
> > +static int vfio_device_fops_open(struct inode *inode, struct file *filep)
> > +{
> > +	struct vfio_device *device = container_of(inode->i_cdev,
> > +						  struct vfio_device, cdev);
> > +	struct vfio_device_file *df;
> > +	int ret;
> > +
> > +	if (!vfio_device_try_get_registration(device))
> > +		return -ENODEV;
> > +
> > +	/*
> > +	 * device access is blocked until .open_device() is called
> > +	 * in BIND_IOMMUFD.
> > +	 */
> > +	df = vfio_allocate_device_file(device, true);
> > +	if (IS_ERR(df)) {
> > +		ret = PTR_ERR(df);
> > +		goto err_put_registration;
> > +	}
> > +
> > +	filep->private_data = df;
> > +
> > +	return 0;
> > +
> > +err_put_registration:
> > +	vfio_device_put_registration(device);
> > +	return ret;
> > +}
> > +#endif
> 
> move to iommufd.c
> 
> > +#if IS_ENABLED(CONFIG_IOMMUFD)
> > +static char *vfio_device_devnode(const struct device *dev, umode_t
> *mode)
> > +{
> > +	return kasprintf(GFP_KERNEL, "vfio/devices/%s", dev_name(dev));
> > +}
> > +#endif
> 
> ditto
> 
> > @@ -1543,9 +1617,21 @@ static int __init vfio_init(void)
> >  		goto err_dev_class;
> >  	}
> >
> > +#if IS_ENABLED(CONFIG_IOMMUFD)
> > +	vfio.device_class->devnode = vfio_device_devnode;
> > +	ret = alloc_chrdev_region(&vfio.device_devt, 0,
> > +				  MINORMASK + 1, "vfio-dev");
> > +	if (ret)
> > +		goto err_alloc_dev_chrdev;
> > +#endif
> 
> vfio_cdev_init()
> 
> >  static void __exit vfio_cleanup(void)
> >  {
> >  	ida_destroy(&vfio.device_ida);
> > +#if IS_ENABLED(CONFIG_IOMMUFD)
> > +	unregister_chrdev_region(vfio.device_devt, MINORMASK + 1);
> > +#endif
> 
> vfio_cdev_cleanup()

All the above comments noted.

Regards,
Yi Liu

^ permalink raw reply	[flat|nested] 80+ messages in thread

* RE: [PATCH 10/13] vfio: Make vfio_device_open() exclusive between group path and device cdev path
  2023-01-30 12:14     ` Liu, Yi L
@ 2023-02-02  5:34       ` Liu, Yi L
  2023-02-03 17:41         ` Jason Gunthorpe
  0 siblings, 1 reply; 80+ messages in thread
From: Liu, Yi L @ 2023-02-02  5:34 UTC (permalink / raw)
  To: Alex Williamson
  Cc: jgg, Tian, Kevin, cohuck, eric.auger, nicolinc, kvm, mjrosato,
	chao.p.peng, yi.y.sun, peterx, jasowang, suravee.suthikulpanit

Hi Alex,

> From: Liu, Yi L <yi.l.liu@intel.com>
> Sent: Monday, January 30, 2023 8:15 PM
> 
> > From: Alex Williamson <alex.williamson@redhat.com>
> > Sent: Friday, January 20, 2023 7:52 AM
> >
> > On Tue, 17 Jan 2023 05:49:39 -0800
> > Yi Liu <yi.l.liu@intel.com> wrote:
> >
> > > VFIO group has historically allowed multi-open of the device FD. This
> > > was made secure because the "open" was executed via an ioctl to the
> > > group FD which is itself only single open.
> > >
> > > No know use of multiple device FDs is known. It is kind of a strange
> >   ^^ ^^^^                               ^^^^^
> 
> How about "No known use of multiple device FDs today"
> 
> > > thing to do because new device FDs can naturally be created via dup().
> > >
> > > When we implement the new device uAPI there is no natural way to
> allow
> > > the device itself from being multi-opened in a secure manner. Without
> > > the group FD we cannot prove the security context of the opener.
> > >
> > > Thus, when moving to the new uAPI we block the ability to multi-open
> > > the device. This also makes the cdev path exclusive with group path.
> > >
> > > The main logic is in the vfio_device_open(). It needs to sustain both
> > > the legacy behavior i.e. multi-open in the group path and the new
> > > behavior i.e. single-open in the cdev path. This mixture leads to the
> > > introduction of a new single_open flag stored both in struct vfio_device
> > > and vfio_device_file. vfio_device_file::single_open is set per the
> > > vfio_device_file allocation. Its value is propagated to struct vfio_device
> > > after device is opened successfully.
> > >
> > > Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> > > ---
> > >  drivers/vfio/group.c     |  2 +-
> > >  drivers/vfio/vfio.h      |  6 +++++-
> > >  drivers/vfio/vfio_main.c | 25 ++++++++++++++++++++++---
> > >  include/linux/vfio.h     |  1 +
> > >  4 files changed, 29 insertions(+), 5 deletions(-)
> > >
> > > diff --git a/drivers/vfio/group.c b/drivers/vfio/group.c
> > > index 9484bb1c54a9..57ebe5e1a7e6 100644
> > > --- a/drivers/vfio/group.c
> > > +++ b/drivers/vfio/group.c
> > > @@ -216,7 +216,7 @@ static struct file *vfio_device_open_file(struct
> > vfio_device *device)
> > >  	struct file *filep;
> > >  	int ret;
> > >
> > > -	df = vfio_allocate_device_file(device);
> > > +	df = vfio_allocate_device_file(device, false);
> > >  	if (IS_ERR(df)) {
> > >  		ret = PTR_ERR(df);
> > >  		goto err_out;
> > > diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
> > > index fe0fcfa78710..bdcf9762521d 100644
> > > --- a/drivers/vfio/vfio.h
> > > +++ b/drivers/vfio/vfio.h
> > > @@ -17,7 +17,11 @@ struct vfio_device;
> > >  struct vfio_container;
> > >
> > >  struct vfio_device_file {
> > > +	/* static fields, init per allocation */
> > >  	struct vfio_device *device;
> > > +	bool single_open;
> > > +
> > > +	/* fields set after allocation */
> > >  	struct kvm *kvm;
> > >  	struct iommufd_ctx *iommufd;
> > >  	bool access_granted;
> > > @@ -30,7 +34,7 @@ int vfio_device_open(struct vfio_device_file *df,
> > >  void vfio_device_close(struct vfio_device_file *device);
> > >
> > >  struct vfio_device_file *
> > > -vfio_allocate_device_file(struct vfio_device *device);
> > > +vfio_allocate_device_file(struct vfio_device *device, bool
> single_open);
> > >
> > >  extern const struct file_operations vfio_device_fops;
> > >
> > > diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
> > > index 90174a9015c4..78725c28b933 100644
> > > --- a/drivers/vfio/vfio_main.c
> > > +++ b/drivers/vfio/vfio_main.c
> > > @@ -345,7 +345,7 @@ static bool vfio_assert_device_open(struct
> > vfio_device *device)
> > >  }
> > >
> > >  struct vfio_device_file *
> > > -vfio_allocate_device_file(struct vfio_device *device)
> > > +vfio_allocate_device_file(struct vfio_device *device, bool single_open)
> > >  {
> > >  	struct vfio_device_file *df;
> > >
> > > @@ -354,6 +354,7 @@ vfio_allocate_device_file(struct vfio_device
> > *device)
> > >  		return ERR_PTR(-ENOMEM);
> > >
> > >  	df->device = device;
> > > +	df->single_open = single_open;
> >
> > It doesn't make sense to me to convolute the definition of this
> > function with an unmemorable bool arg when the one caller that sets the
> > value true could simply open code it.
> 
> Yeah, how about renaming it just like Kevin's suggestion?
> 
> https://lore.kernel.org/kvm/BN9PR11MB52769CBCA68CD25DAC96B33B8CC
> 49@BN9PR11MB5276.namprd11.prod.outlook.com/
> 
> >
> > >
> > >  	return df;
> > >  }
> > > @@ -421,6 +422,16 @@ int vfio_device_open(struct vfio_device_file
> *df,
> > >
> > >  	lockdep_assert_held(&device->dev_set->lock);
> > >
> > > +	/*
> > > +	 * Device cdev path cannot support multiple device open since
> > > +	 * it doesn't have a secure way for it. So a second device
> > > +	 * open attempt should be failed if the caller is from a cdev
> > > +	 * path or the device has already been opened by a cdev path.
> > > +	 */
> > > +	if (device->open_count != 0 &&
> > > +	    (df->single_open || device->single_open))
> > > +		return -EINVAL;
> >
> > IIUC, the reason this exists is that we let the user open the device
> > cdev arbitrarily, but only one instance can call
> > ioctl(VFIO_DEVICE_BIND_IOMMUFD).  Why do we bother to let the user
> > create those other file instances?  What expectations are we setting
> > for the user by allowing them to open the device but not use it?
> 
> It won't be able to access the device, as such a device fd is not bound
> to an iommufd.
> 
> > Clearly we're thinking about a case here where the device has been
> > opened via the group path and the user is now attempting to bind the
> > same device via the cdev path.
> 
> This shall fail, as the group path would inc device->open_count. Then
> the cdev path will fail, as that path would have df->single_open==true.
> 
> > That seems wrong to even allow and I'm
> > surprised it gets this far.  In fact, where do we block a user from
> > opening one device in a group via cdev and another via the group?
> 
> Such a scenario would be rejected by the DMA ownership check.
> 
> The two paths would be excluded when claiming DMA ownership in
> such scenario. The group path uses the vfio_group pointer as DMA
> owner marker. While the cdev path uses the iommufd_ctx pointer.
> But one group only allows one DMA owner.

However, there is one possibility that the group path and the cdev path
have the same DMA owner marker. If the group path is in vfio iommufd
compat mode, the iommufd is used as the container fd and its DMA marker
is also the iommufd_ctx pointer, so it is possible that two devices in
the same group may be opened by different paths (the vfio compat
mode group path and the cdev path).

This seems to be ok. The group path will attach the group to an auto-allocated
iommu_domain, while the cdev path actually waits for userspace to
attach it to an IOAS. Userspace should take care of this: it should ensure
that the devices in the same group are attached to the same domain.

This seems no different from userspace opening two devices (from the
same group) via cdev; userspace needs to attach them to the same domain
as well.

Also, there is work from Nicolin to support domain replacement. So even if
the group path has attached the group to an iommu_domain, userspace can
later replace it with the domain it desires (maybe still an auto-allocated
domain during the IOAS attachment, or a userspace-allocated domain as in
my nesting series https://github.com/yiliu1765/iommufd/commits/wip/iommufd-v6.2-rc5-nesting).

So for a VFIO device, we use the single_open flag to make opening via the
group path and the cdev path mutually exclusive. As for different devices
in the same group, userspace would normally not open them via different
paths. Even if it does, it seems to be fine per the statement above.
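To make the intended exclusion concrete, here is a minimal userspace
model of the open_count/single_open check (plain C mirroring the patch's
names; illustrative only, not the kernel code itself):

```c
#include <errno.h>
#include <stdbool.h>

/* Minimal stand-ins for the kernel structs in this patch. */
struct vfio_device {
	int open_count;
	bool single_open;	/* set once opened via the cdev path */
};

struct vfio_device_file {
	struct vfio_device *device;
	bool single_open;	/* true for a cdev-path file */
};

/*
 * Model of the check in vfio_device_open(): a second open fails if
 * either this opener or the existing opener came through the cdev
 * path, since cdev has no secure multi-open story.
 */
static int model_device_open(struct vfio_device_file *df)
{
	struct vfio_device *device = df->device;

	if (device->open_count != 0 &&
	    (df->single_open || device->single_open))
		return -EINVAL;

	device->open_count++;
	if (df->single_open)
		device->single_open = true;
	return 0;
}
```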

Regards,
Yi Liu


^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 10/13] vfio: Make vfio_device_open() exclusive between group path and device cdev path
  2023-02-02  5:34       ` Liu, Yi L
@ 2023-02-03 17:41         ` Jason Gunthorpe
  2023-02-06  4:30           ` Liu, Yi L
  0 siblings, 1 reply; 80+ messages in thread
From: Jason Gunthorpe @ 2023-02-03 17:41 UTC (permalink / raw)
  To: Liu, Yi L
  Cc: Alex Williamson, Tian, Kevin, cohuck, eric.auger, nicolinc, kvm,
	mjrosato, chao.p.peng, yi.y.sun, peterx, jasowang,
	suravee.suthikulpanit

On Thu, Feb 02, 2023 at 05:34:15AM +0000, Liu, Yi L wrote:

> This seems to be ok. The group path will attach the group to an auto-allocated
> iommu_domain, while the cdev path actually waits for userspace to
> attach it to an IOAS. Userspace should take care of it. It should ensure
> the devices in the same group should be attached to the same domain.

Aren't there problems when someone closes the group or device FD while
the other one is still open though?

Jason

^ permalink raw reply	[flat|nested] 80+ messages in thread

* RE: [PATCH 10/13] vfio: Make vfio_device_open() exclusive between group path and device cdev path
  2023-02-03 17:41         ` Jason Gunthorpe
@ 2023-02-06  4:30           ` Liu, Yi L
  2023-02-06 10:09             ` Tian, Kevin
  0 siblings, 1 reply; 80+ messages in thread
From: Liu, Yi L @ 2023-02-06  4:30 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Alex Williamson, Tian, Kevin, cohuck, eric.auger, nicolinc, kvm,
	mjrosato, chao.p.peng, yi.y.sun, peterx, jasowang,
	suravee.suthikulpanit

> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Saturday, February 4, 2023 1:41 AM
> 
> On Thu, Feb 02, 2023 at 05:34:15AM +0000, Liu, Yi L wrote:
> 
> > This seems to be ok. The group path will attach the group to an auto-
> allocated
> > iommu_domain, while the cdev path actually waits for userspace to
> > attach it to an IOAS. Userspace should take care of it. It should ensure
> > the devices in the same group should be attached to the same domain.
> 
> Aren't there problems when someone closes the group or device FD while
> the other one is still open though?

I guess not. Userspace can only open devices from the same group via both
the group path and the cdev path when the group path is in iommufd compat
mode and uses the same iommufd as the cdev path. This means the attach APIs
are the same in the two paths. I think the iommufd attach API is able to
handle one device being closed while other devices are still attached.

Regards,
Yi Liu

^ permalink raw reply	[flat|nested] 80+ messages in thread

* RE: [PATCH 12/13] vfio: Add ioctls for device cdev iommufd
  2023-01-20  8:03   ` Tian, Kevin
@ 2023-02-06  9:07     ` Liu, Yi L
  0 siblings, 0 replies; 80+ messages in thread
From: Liu, Yi L @ 2023-02-06  9:07 UTC (permalink / raw)
  To: Tian, Kevin, alex.williamson, jgg
  Cc: cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.y.sun, peterx, jasowang, suravee.suthikulpanit

> From: Tian, Kevin <kevin.tian@intel.com>
> Sent: Friday, January 20, 2023 4:03 PM
> 
> > From: Liu, Yi L <yi.l.liu@intel.com>
> > Sent: Tuesday, January 17, 2023 9:50 PM
> >
> > This adds two vfio device ioctls for userspace using iommufd on vfio
> > devices.
> >
> >     VFIO_DEVICE_BIND_IOMMUFD: bind device to an iommufd, hence gain
> > DMA
> > 			      control provided by the iommufd. VFIO no
> > 			      iommu is indicated by passing a minus
> > 			      fd value.
> 
> Can't this be a flag bit for better readability than using a special value?
> 
> >     VFIO_DEVICE_ATTACH_IOMMUFD_PT: attach device to ioas, page
> tables
> > 				   managed by iommufd. Attach can be
> > 				   undo by passing IOMMUFD_INVALID_ID
> > 				   to kernel.
> 
> With Alex' remark we need a separate DETACH cmd now.

Yes.

> >
> > +	/*
> > +	 * For group path, iommufd pointer is NULL when comes into this
> > +	 * helper. Its noiommu support is in container.c.
> > +	 *
> > +	 * For iommufd compat mode, iommufd pointer here is a valid value.
> > +	 * Its noiommu support is supposed to be in vfio_iommufd_bind().
> > +	 *
> > +	 * For device cdev path, iommufd pointer here is a valid value for
> > +	 * normal cases, but it is NULL if it's noiommu. The reason is
> > +	 * that userspace uses iommufd==-1 to indicate noiommu mode in
> > this
> > +	 * path. So caller of this helper will pass in a NULL iommufd
> > +	 * pointer. To differentiate it from the group path which also
> > +	 * passes NULL iommufd pointer in, df->noiommu is used. For cdev
> > +	 * noiommu, df->noiommu would be set to mark noiommu case for
> > cdev
> > +	 * path.
> > +	 *
> > +	 * So if df->noiommu is set then this helper just goes ahead to
> > +	 * open device. If not, it depends on if iommufd pointer is NULL
> > +	 * to handle the group path, iommufd compat mode, normal cases
> in
> > +	 * the cdev path.
> > +	 */
> >  	if (iommufd)
> >  		ret = vfio_iommufd_bind(device, iommufd, dev_id, pt_id);
> > -	else
> > +	else if (!df->noiommu)
> >  		ret = vfio_device_group_use_iommu(device);
> >  	if (ret)
> >  		goto err_module_put;
> 
> Isn't 'ret' uninitialized when df->noiommu is true?

Done.

> > +static int vfio_ioctl_device_attach(struct vfio_device *device,
> > +				    struct vfio_device_feature __user *arg)
> > +{
> > +	struct vfio_device_attach_iommufd_pt attach;
> > +	int ret;
> > +	bool is_attach;
> > +
> > +	if (copy_from_user(&attach, (void __user *)arg, sizeof(attach)))
> > +		return -EFAULT;
> > +
> > +	if (attach.flags)
> > +		return -EINVAL;
> > +
> > +	if (!device->ops->bind_iommufd)
> > +		return -ENODEV;
> > +
> 
> this should fail if noiommu is true.

Yes.

Regards,
Yi Liu

^ permalink raw reply	[flat|nested] 80+ messages in thread

* RE: [PATCH 10/13] vfio: Make vfio_device_open() exclusive between group path and device cdev path
  2023-02-06  4:30           ` Liu, Yi L
@ 2023-02-06 10:09             ` Tian, Kevin
  2023-02-06 15:10               ` Jason Gunthorpe
  0 siblings, 1 reply; 80+ messages in thread
From: Tian, Kevin @ 2023-02-06 10:09 UTC (permalink / raw)
  To: Liu, Yi L, Jason Gunthorpe
  Cc: Alex Williamson, cohuck, eric.auger, nicolinc, kvm, mjrosato,
	chao.p.peng, yi.y.sun, peterx, jasowang, suravee.suthikulpanit

> From: Liu, Yi L <yi.l.liu@intel.com>
> Sent: Monday, February 6, 2023 12:31 PM
> 
> > From: Jason Gunthorpe <jgg@nvidia.com>
> > Sent: Saturday, February 4, 2023 1:41 AM
> >
> > On Thu, Feb 02, 2023 at 05:34:15AM +0000, Liu, Yi L wrote:
> >
> > > This seems to be ok. The group path will attach the group to an auto-
> > allocated
> > > iommu_domain, while the cdev path actually waits for userspace to
> > > attach it to an IOAS. Userspace should take care of it. It should ensure
> > > the devices in the same group should be attached to the same domain.
> >
> > Aren't there problems when someone closes the group or device FD while
> > the other one is still open though?
> 
> Guess no. userspace can only open devices from the same group by both
> group path and cdev path when the group path is iommufd compat mode
> and uses the same iommufd as cdev path. This means the attach API are
> the same in the two paths. I think iommufd attach API is able to manage
> one device is closed while other devices are still attached.
> 

I guess the problem is with DMA ownership.

iommu_group_release_dma_owner() just blindly sets
group->owner_cnt to 0. So if someone closes the group FD while the
cdev FD is still open, the ownership model is completely broken.

IMHO using the iommufd_ctx to mark DMA ownership for iommufd compat
doesn't sound right. The group claim/release helpers are not designed
to be exclusive, but because they share internal logic with device
claim/release we end up allowing group/cdev to share ownership when
the same owner is marked.

It's probably simpler if we always mark DMA owner with vfio_group
for the group path, no matter vfio type1 or iommufd compat is used.
This should avoid all the tricky corner cases between the two paths.
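A rough userspace model of the corner case (illustrative only, heavily
simplified from the kernel helpers):

```c
#include <stddef.h>

/* Minimal model of the iommu group DMA-ownership logic discussed here. */
struct iommu_group_model {
	void *owner;
	unsigned int owner_cnt;
};

/* Claiming succeeds if the group is unowned or the same owner re-claims. */
static int model_claim_dma_owner(struct iommu_group_model *g, void *owner)
{
	if (g->owner_cnt) {
		if (g->owner != owner)
			return -1;	/* -EBUSY in the kernel */
		g->owner_cnt++;
		return 0;
	}
	g->owner = owner;
	g->owner_cnt = 1;
	return 0;
}

/*
 * The corner case: the group release path resets the count to zero
 * unconditionally instead of decrementing, so a cdev FD that shared the
 * same owner marker loses its claim when the group FD closes first.
 */
static void model_release_dma_owner_blind(struct iommu_group_model *g)
{
	g->owner = NULL;
	g->owner_cnt = 0;
}
```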

Thanks
Kevin

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 10/13] vfio: Make vfio_device_open() exclusive between group path and device cdev path
  2023-02-06 10:09             ` Tian, Kevin
@ 2023-02-06 15:10               ` Jason Gunthorpe
  2023-02-06 15:51                 ` Liu, Yi L
  0 siblings, 1 reply; 80+ messages in thread
From: Jason Gunthorpe @ 2023-02-06 15:10 UTC (permalink / raw)
  To: Tian, Kevin
  Cc: Liu, Yi L, Alex Williamson, cohuck, eric.auger, nicolinc, kvm,
	mjrosato, chao.p.peng, yi.y.sun, peterx, jasowang,
	suravee.suthikulpanit

On Mon, Feb 06, 2023 at 10:09:52AM +0000, Tian, Kevin wrote:
> It's probably simpler if we always mark DMA owner with vfio_group
> for the group path, no matter vfio type1 or iommufd compat is used.
> This should avoid all the tricky corner cases between the two paths.

Yes

Jason

^ permalink raw reply	[flat|nested] 80+ messages in thread

* RE: [PATCH 10/13] vfio: Make vfio_device_open() exclusive between group path and device cdev path
  2023-02-06 15:10               ` Jason Gunthorpe
@ 2023-02-06 15:51                 ` Liu, Yi L
  2023-02-07  0:35                   ` Tian, Kevin
  0 siblings, 1 reply; 80+ messages in thread
From: Liu, Yi L @ 2023-02-06 15:51 UTC (permalink / raw)
  To: Jason Gunthorpe, Tian, Kevin
  Cc: Alex Williamson, cohuck, eric.auger, nicolinc, kvm, mjrosato,
	chao.p.peng, yi.y.sun, peterx, jasowang, suravee.suthikulpanit

> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Monday, February 6, 2023 11:11 PM
> 
> On Mon, Feb 06, 2023 at 10:09:52AM +0000, Tian, Kevin wrote:
> > It's probably simpler if we always mark DMA owner with vfio_group
> > for the group path, no matter vfio type1 or iommufd compat is used.
> > This should avoid all the tricky corner cases between the two paths.
> 
> Yes

Then, we have two choices:

1) extend iommufd_device_bind() to allow a caller-specified DMA marker
2) claim DMA owner before calling iommufd_device_bind(), still need to
     extend iommufd_device_bind() to accept a flag to bypass DMA owner claim

which one would be better? or do we have a third choice?

Regards,
Yi Liu

^ permalink raw reply	[flat|nested] 80+ messages in thread

* RE: [PATCH 10/13] vfio: Make vfio_device_open() exclusive between group path and device cdev path
  2023-02-06 15:51                 ` Liu, Yi L
@ 2023-02-07  0:35                   ` Tian, Kevin
  2023-02-07 13:12                     ` Jason Gunthorpe
  0 siblings, 1 reply; 80+ messages in thread
From: Tian, Kevin @ 2023-02-07  0:35 UTC (permalink / raw)
  To: Liu, Yi L, Jason Gunthorpe
  Cc: Alex Williamson, cohuck, eric.auger, nicolinc, kvm, mjrosato,
	chao.p.peng, yi.y.sun, peterx, jasowang, suravee.suthikulpanit

> From: Liu, Yi L <yi.l.liu@intel.com>
> Sent: Monday, February 6, 2023 11:51 PM
> 
> > From: Jason Gunthorpe <jgg@nvidia.com>
> > Sent: Monday, February 6, 2023 11:11 PM
> >
> > On Mon, Feb 06, 2023 at 10:09:52AM +0000, Tian, Kevin wrote:
> > > It's probably simpler if we always mark DMA owner with vfio_group
> > > for the group path, no matter vfio type1 or iommufd compat is used.
> > > This should avoid all the tricky corner cases between the two paths.
> >
> > Yes
> 
> Then, we have two choices:
> 
> 1) extend iommufd_device_bind() to allow a caller-specified DMA marker
> 2) claim DMA owner before calling iommufd_device_bind(), still need to
>      extend iommufd_device_bind() to accept a flag to bypass DMA owner
> claim
> 
> which one would be better? or do we have a third choice?
> 

first one

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 10/13] vfio: Make vfio_device_open() exclusive between group path and device cdev path
  2023-02-07  0:35                   ` Tian, Kevin
@ 2023-02-07 13:12                     ` Jason Gunthorpe
  2023-02-07 13:19                       ` Liu, Yi L
  0 siblings, 1 reply; 80+ messages in thread
From: Jason Gunthorpe @ 2023-02-07 13:12 UTC (permalink / raw)
  To: Tian, Kevin
  Cc: Liu, Yi L, Alex Williamson, cohuck, eric.auger, nicolinc, kvm,
	mjrosato, chao.p.peng, yi.y.sun, peterx, jasowang,
	suravee.suthikulpanit

On Tue, Feb 07, 2023 at 12:35:48AM +0000, Tian, Kevin wrote:
> > From: Liu, Yi L <yi.l.liu@intel.com>
> > Sent: Monday, February 6, 2023 11:51 PM
> > 
> > > From: Jason Gunthorpe <jgg@nvidia.com>
> > > Sent: Monday, February 6, 2023 11:11 PM
> > >
> > > On Mon, Feb 06, 2023 at 10:09:52AM +0000, Tian, Kevin wrote:
> > > > It's probably simpler if we always mark DMA owner with vfio_group
> > > > for the group path, no matter vfio type1 or iommufd compat is used.
> > > > This should avoid all the tricky corner cases between the two paths.
> > >
> > > Yes
> > 
> > Then, we have two choices:
> > 
> > 1) extend iommufd_device_bind() to allow a caller-specified DMA marker
> > 2) claim DMA owner before calling iommufd_device_bind(), still need to
> >      extend iommufd_device_bind() to accept a flag to bypass DMA owner
> > claim
> > 
> > which one would be better? or do we have a third choice?
> > 
> 
> first one

Why can't this all be handled in vfio??

Jason

^ permalink raw reply	[flat|nested] 80+ messages in thread

* RE: [PATCH 10/13] vfio: Make vfio_device_open() exclusive between group path and device cdev path
  2023-02-07 13:12                     ` Jason Gunthorpe
@ 2023-02-07 13:19                       ` Liu, Yi L
  2023-02-07 13:20                         ` Jason Gunthorpe
  0 siblings, 1 reply; 80+ messages in thread
From: Liu, Yi L @ 2023-02-07 13:19 UTC (permalink / raw)
  To: Jason Gunthorpe, Tian, Kevin
  Cc: Alex Williamson, cohuck, eric.auger, nicolinc, kvm, mjrosato,
	chao.p.peng, yi.y.sun, peterx, jasowang, suravee.suthikulpanit

> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Tuesday, February 7, 2023 9:13 PM
> 
> On Tue, Feb 07, 2023 at 12:35:48AM +0000, Tian, Kevin wrote:
> > > From: Liu, Yi L <yi.l.liu@intel.com>
> > > Sent: Monday, February 6, 2023 11:51 PM
> > >
> > > > From: Jason Gunthorpe <jgg@nvidia.com>
> > > > Sent: Monday, February 6, 2023 11:11 PM
> > > >
> > > > On Mon, Feb 06, 2023 at 10:09:52AM +0000, Tian, Kevin wrote:
> > > > > It's probably simpler if we always mark DMA owner with vfio_group
> > > > > for the group path, no matter vfio type1 or iommufd compat is used.
> > > > > This should avoid all the tricky corner cases between the two paths.
> > > >
> > > > Yes
> > >
> > > Then, we have two choices:
> > >
> > > 1) extend iommufd_device_bind() to allow a caller-specified DMA
> marker
> > > 2) claim DMA owner before calling iommufd_device_bind(), still need to
> > >      extend iommufd_device_bind() to accept a flag to bypass DMA
> owner
> > > claim
> > >
> > > which one would be better? or do we have a third choice?
> > >
> >
> > first one
> 
> Why can't this all be handled in vfio??

Are you preferring the second one? Surely VFIO can claim DMA ownership
by itself. But in the vfio iommufd compat mode it still needs to call
iommufd_device_bind(), and that call should bypass the DMA owner claim
since the claim has already been done.

Regards,
Yi Liu
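For reference, the two options being weighed can be sketched as a toy
userspace model. All names here (claim_owner, device_bind,
BIND_BYPASS_OWNER_CLAIM) are illustrative stand-ins for this mail, not
the actual iommufd API:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Toy model of the ownership bookkeeping under discussion; not the
 * real iommufd API, just the two options from the thread.
 */
enum { BIND_BYPASS_OWNER_CLAIM = 1 };

struct dev_model {
	void *dma_owner;	/* current DMA ownership marker, NULL if free */
	bool bound;
};

/* Claim succeeds only if the device is unowned or owned by 'marker'. */
static int claim_owner(struct dev_model *d, void *marker)
{
	if (d->dma_owner && d->dma_owner != marker)
		return -1;	/* -EBUSY in kernel terms */
	d->dma_owner = marker;
	return 0;
}

/*
 * Option 1: bind takes a caller-specified marker and claims with it.
 * Option 2: the caller claims first, then binds with a bypass flag.
 */
static int device_bind(struct dev_model *d, void *marker, unsigned int flags)
{
	if (!(flags & BIND_BYPASS_OWNER_CLAIM) && claim_owner(d, marker))
		return -1;
	d->bound = true;
	return 0;
}
```

Under option 2, the group path would claim with its own marker first and
then bind with the bypass flag; without the flag, bind with a different
marker would hit the -EBUSY case.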

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 10/13] vfio: Make vfio_device_open() exclusive between group path and device cdev path
  2023-02-07 13:19                       ` Liu, Yi L
@ 2023-02-07 13:20                         ` Jason Gunthorpe
  2023-02-07 13:23                           ` Liu, Yi L
  0 siblings, 1 reply; 80+ messages in thread
From: Jason Gunthorpe @ 2023-02-07 13:20 UTC (permalink / raw)
  To: Liu, Yi L
  Cc: Tian, Kevin, Alex Williamson, cohuck, eric.auger, nicolinc, kvm,
	mjrosato, chao.p.peng, yi.y.sun, peterx, jasowang,
	suravee.suthikulpanit

On Tue, Feb 07, 2023 at 01:19:10PM +0000, Liu, Yi L wrote:
> > From: Jason Gunthorpe <jgg@nvidia.com>
> > Sent: Tuesday, February 7, 2023 9:13 PM
> > 
> > On Tue, Feb 07, 2023 at 12:35:48AM +0000, Tian, Kevin wrote:
> > > > From: Liu, Yi L <yi.l.liu@intel.com>
> > > > Sent: Monday, February 6, 2023 11:51 PM
> > > >
> > > > > From: Jason Gunthorpe <jgg@nvidia.com>
> > > > > Sent: Monday, February 6, 2023 11:11 PM
> > > > >
> > > > > On Mon, Feb 06, 2023 at 10:09:52AM +0000, Tian, Kevin wrote:
> > > > > > It's probably simpler if we always mark DMA owner with vfio_group
> > > > > > for the group path, no matter vfio type1 or iommufd compat is used.
> > > > > > This should avoid all the tricky corner cases between the two paths.
> > > > >
> > > > > Yes
> > > >
> > > > Then, we have two choices:
> > > >
> > > > 1) extend iommufd_device_bind() to allow a caller-specified DMA
> > marker
> > > > 2) claim DMA owner before calling iommufd_device_bind(), still need to
> > > >      extend iommufd_device_bind() to accept a flag to bypass DMA
> > owner
> > > > claim
> > > >
> > > > which one would be better? or do we have a third choice?
> > > >
> > >
> > > first one
> > 
> > Why can't this all be handled in vfio??
> 
> Are you preferring the second one? Surely VFIO can claim DMA owner
> by itself. But it is the vfio iommufd compat mode, so it still needs to call
> iommufd_device_bind(). And it should bypass DMA owner claim since
> it's already done.

No, I mean why can't vfio just call iommufd exactly once regardless of
what mode it is running in?

Jason

^ permalink raw reply	[flat|nested] 80+ messages in thread

* RE: [PATCH 10/13] vfio: Make vfio_device_open() exclusive between group path and device cdev path
  2023-02-07 13:20                         ` Jason Gunthorpe
@ 2023-02-07 13:23                           ` Liu, Yi L
  2023-02-07 13:27                             ` Jason Gunthorpe
  0 siblings, 1 reply; 80+ messages in thread
From: Liu, Yi L @ 2023-02-07 13:23 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Tian, Kevin, Alex Williamson, cohuck, eric.auger, nicolinc, kvm,
	mjrosato, chao.p.peng, yi.y.sun, peterx, jasowang,
	suravee.suthikulpanit

> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Tuesday, February 7, 2023 9:20 PM
> 
> On Tue, Feb 07, 2023 at 01:19:10PM +0000, Liu, Yi L wrote:
> > > From: Jason Gunthorpe <jgg@nvidia.com>
> > > Sent: Tuesday, February 7, 2023 9:13 PM
> > >
> > > On Tue, Feb 07, 2023 at 12:35:48AM +0000, Tian, Kevin wrote:
> > > > > From: Liu, Yi L <yi.l.liu@intel.com>
> > > > > Sent: Monday, February 6, 2023 11:51 PM
> > > > >
> > > > > > From: Jason Gunthorpe <jgg@nvidia.com>
> > > > > > Sent: Monday, February 6, 2023 11:11 PM
> > > > > >
> > > > > > On Mon, Feb 06, 2023 at 10:09:52AM +0000, Tian, Kevin wrote:
> > > > > > > It's probably simpler if we always mark DMA owner with
> vfio_group
> > > > > > > for the group path, no matter vfio type1 or iommufd compat is
> used.
> > > > > > > This should avoid all the tricky corner cases between the two
> paths.
> > > > > >
> > > > > > Yes
> > > > >
> > > > > Then, we have two choices:
> > > > >
> > > > > 1) extend iommufd_device_bind() to allow a caller-specified DMA
> > > marker
> > > > > 2) claim DMA owner before calling iommufd_device_bind(), still
> need to
> > > > >      extend iommufd_device_bind() to accept a flag to bypass DMA
> > > owner
> > > > > claim
> > > > >
> > > > > which one would be better? or do we have a third choice?
> > > > >
> > > >
> > > > first one
> > >
> > > Why can't this all be handled in vfio??
> >
> > Are you preferring the second one? Surely VFIO can claim DMA owner
> > by itself. But it is the vfio iommufd compat mode, so it still needs to call
> > iommufd_device_bind(). And it should bypass DMA owner claim since
> > it's already done.
> 
> No, I mean why can't vfio just call iommufd exactly once regardless of
> what mode it is running in?

This seems to imply moving the DMA owner claim out of iommufd_device_bind().
Is that it? Then the group path and the cdev path can each claim DMA
ownership with their own marker.

Regards,
Yi Liu

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 10/13] vfio: Make vfio_device_open() exclusive between group path and device cdev path
  2023-02-07 13:23                           ` Liu, Yi L
@ 2023-02-07 13:27                             ` Jason Gunthorpe
  2023-02-07 13:55                               ` Liu, Yi L
  2023-02-08  4:23                               ` Tian, Kevin
  0 siblings, 2 replies; 80+ messages in thread
From: Jason Gunthorpe @ 2023-02-07 13:27 UTC (permalink / raw)
  To: Liu, Yi L
  Cc: Tian, Kevin, Alex Williamson, cohuck, eric.auger, nicolinc, kvm,
	mjrosato, chao.p.peng, yi.y.sun, peterx, jasowang,
	suravee.suthikulpanit

On Tue, Feb 07, 2023 at 01:23:35PM +0000, Liu, Yi L wrote:
> > From: Jason Gunthorpe <jgg@nvidia.com>
> > Sent: Tuesday, February 7, 2023 9:20 PM
> > 
> > On Tue, Feb 07, 2023 at 01:19:10PM +0000, Liu, Yi L wrote:
> > > > From: Jason Gunthorpe <jgg@nvidia.com>
> > > > Sent: Tuesday, February 7, 2023 9:13 PM
> > > >
> > > > On Tue, Feb 07, 2023 at 12:35:48AM +0000, Tian, Kevin wrote:
> > > > > > From: Liu, Yi L <yi.l.liu@intel.com>
> > > > > > Sent: Monday, February 6, 2023 11:51 PM
> > > > > >
> > > > > > > From: Jason Gunthorpe <jgg@nvidia.com>
> > > > > > > Sent: Monday, February 6, 2023 11:11 PM
> > > > > > >
> > > > > > > On Mon, Feb 06, 2023 at 10:09:52AM +0000, Tian, Kevin wrote:
> > > > > > > > It's probably simpler if we always mark DMA owner with
> > vfio_group
> > > > > > > > for the group path, no matter vfio type1 or iommufd compat is
> > used.
> > > > > > > > This should avoid all the tricky corner cases between the two
> > paths.
> > > > > > >
> > > > > > > Yes
> > > > > >
> > > > > > Then, we have two choices:
> > > > > >
> > > > > > 1) extend iommufd_device_bind() to allow a caller-specified DMA
> > > > marker
> > > > > > 2) claim DMA owner before calling iommufd_device_bind(), still
> > need to
> > > > > >      extend iommufd_device_bind() to accept a flag to bypass DMA
> > > > owner
> > > > > > claim
> > > > > >
> > > > > > which one would be better? or do we have a third choice?
> > > > > >
> > > > >
> > > > > first one
> > > >
> > > > Why can't this all be handled in vfio??
> > >
> > > Are you preferring the second one? Surely VFIO can claim DMA owner
> > > by itself. But it is the vfio iommufd compat mode, so it still needs to call
> > > iommufd_device_bind(). And it should bypass DMA owner claim since
> > > it's already done.
> > 
> > No, I mean why can't vfio just call iommufd exactly once regardless of
> > what mode it is running in?
> 
> This seems to be moving the DMA owner claim out of iommufd_device_bind().
> Is it? Then either group and cdev can claim DMA owner with their own DMA
> marker.

No, it has nothing to do with DMA ownership

Just keep a flag in vfio saying it is in group mode or device mode and
act accordingly.

The iommufd DMA owner check is *only* to be used for protecting
against two unrelated drivers trying to claim the same device.

Jason

^ permalink raw reply	[flat|nested] 80+ messages in thread

* RE: [PATCH 10/13] vfio: Make vfio_device_open() exclusive between group path and device cdev path
  2023-02-07 13:27                             ` Jason Gunthorpe
@ 2023-02-07 13:55                               ` Liu, Yi L
  2023-02-08  4:23                               ` Tian, Kevin
  1 sibling, 0 replies; 80+ messages in thread
From: Liu, Yi L @ 2023-02-07 13:55 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Tian, Kevin, Alex Williamson, cohuck, eric.auger, nicolinc, kvm,
	mjrosato, chao.p.peng, yi.y.sun, peterx, jasowang,
	suravee.suthikulpanit

> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Tuesday, February 7, 2023 9:27 PM
>
> On Tue, Feb 07, 2023 at 01:23:35PM +0000, Liu, Yi L wrote:
> > > From: Jason Gunthorpe <jgg@nvidia.com>
> > > Sent: Tuesday, February 7, 2023 9:20 PM
> > >
> > > On Tue, Feb 07, 2023 at 01:19:10PM +0000, Liu, Yi L wrote:
> > > > > From: Jason Gunthorpe <jgg@nvidia.com>
> > > > > Sent: Tuesday, February 7, 2023 9:13 PM
> > > > >
> > > > > On Tue, Feb 07, 2023 at 12:35:48AM +0000, Tian, Kevin wrote:
> > > > > > > From: Liu, Yi L <yi.l.liu@intel.com>
> > > > > > > Sent: Monday, February 6, 2023 11:51 PM
> > > > > > >
> > > > > > > > From: Jason Gunthorpe <jgg@nvidia.com>
> > > > > > > > Sent: Monday, February 6, 2023 11:11 PM
> > > > > > > >
> > > > > > > > On Mon, Feb 06, 2023 at 10:09:52AM +0000, Tian, Kevin wrote:
> > > > > > > > > It's probably simpler if we always mark DMA owner with
> > > vfio_group
> > > > > > > > > for the group path, no matter vfio type1 or iommufd compat
> is
> > > used.
> > > > > > > > > This should avoid all the tricky corner cases between the two
> > > paths.
> > > > > > > >
> > > > > > > > Yes
> > > > > > >
> > > > > > > Then, we have two choices:
> > > > > > >
> > > > > > > 1) extend iommufd_device_bind() to allow a caller-specified
> DMA
> > > > > marker
> > > > > > > 2) claim DMA owner before calling iommufd_device_bind(), still
> > > need to
> > > > > > >      extend iommufd_device_bind() to accept a flag to bypass
> DMA
> > > > > owner
> > > > > > > claim
> > > > > > >
> > > > > > > which one would be better? or do we have a third choice?
> > > > > > >
> > > > > >
> > > > > > first one
> > > > >
> > > > > Why can't this all be handled in vfio??
> > > >
> > > > Are you preferring the second one? Surely VFIO can claim DMA owner
> > > > by itself. But it is the vfio iommufd compat mode, so it still needs to call
> > > > iommufd_device_bind(). And it should bypass DMA owner claim since
> > > > it's already done.
> > >
> > > No, I mean why can't vfio just call iommufd exactly once regardless of
> > > what mode it is running in?
> >
> > This seems to be moving the DMA owner claim out of
> iommufd_device_bind().
> > Is it? Then either group and cdev can claim DMA owner with their own
> DMA
> > marker.
> 
> No, it has nothing to do with DMA ownership

Sorry, I'm a bit lost here. Going back to Kevin's suggestion:

"It's probably simpler if we always mark DMA owner with vfio_group
for the group path, no matter vfio type1 or iommufd compat is used.
This should avoid all the tricky corner cases between the two paths."

This means enforcing that the group path uses vfio_group as the DMA
ownership marker, while the cdev path uses iommufd as the marker. Then
there is no possibility that the group path in vfio iommufd compat mode
can share the same DMA ownership marker with the cdev path. With this,
the devices within the same group can only be opened either via the
group path or via the cdev path, but not a mixture of the two.

Regards,
Yi Liu
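The marker arrangement described above can be sketched with a toy
group-level model. The struct and helpers here are illustrative
(loosely after the kernel's iommu group ownership refcounting), not
actual kernel code:

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of group-wide DMA ownership with per-path markers. */
struct iommu_group_model {
	void *owner;	/* DMA ownership marker, NULL when unclaimed */
	int owner_cnt;	/* how many claims currently hold the group */
};

/*
 * A path may claim the group only if it is free or already claimed
 * with the same marker; a different marker means another path owns it.
 */
static int group_claim(struct iommu_group_model *grp, void *marker)
{
	if (grp->owner && grp->owner != marker)
		return -1;	/* -EBUSY */
	grp->owner = marker;
	grp->owner_cnt++;
	return 0;
}

static void group_release(struct iommu_group_model *grp)
{
	if (--grp->owner_cnt == 0)
		grp->owner = NULL;
}
```

With vfio_group as the group path's marker and the iommufd context as
the cdev path's marker, a claim from the other path fails until the
first path has fully released the group, which is exactly the
"no mixture" property.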

^ permalink raw reply	[flat|nested] 80+ messages in thread

* RE: [PATCH 10/13] vfio: Make vfio_device_open() exclusive between group path and device cdev path
  2023-02-07 13:27                             ` Jason Gunthorpe
  2023-02-07 13:55                               ` Liu, Yi L
@ 2023-02-08  4:23                               ` Tian, Kevin
  2023-02-08 12:41                                 ` Jason Gunthorpe
  1 sibling, 1 reply; 80+ messages in thread
From: Tian, Kevin @ 2023-02-08  4:23 UTC (permalink / raw)
  To: Jason Gunthorpe, Liu, Yi L
  Cc: Alex Williamson, cohuck, eric.auger, nicolinc, kvm, mjrosato,
	chao.p.peng, yi.y.sun, peterx, jasowang, suravee.suthikulpanit

> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Tuesday, February 7, 2023 9:27 PM
> > >
> > > No, I mean why can't vfio just call iommufd exactly once regardless of
> > > what mode it is running in?
> >
> > This seems to be moving the DMA owner claim out of
> iommufd_device_bind().
> > Is it? Then either group and cdev can claim DMA owner with their own
> DMA
> > marker.
> 
> No, it has nothing to do with DMA ownership
> 
> Just keep a flag in vfio saying it is in group mode or device mode and
> act accordingly.

It cannot be a simple flag; it needs to be a refcnt, since multiple
devices in the group might be opened via cdev, so the device mode
should be cleared only when the last device opened via cdev is closed.

Yi actually implemented such a flavor before, introducing a
vfio_group->cdev_opened_cnt field.

Then cdev bind_iommufd checks whether vfio_group->opened_file has been
set by the group open path. If not, it increments
vfio_group->cdev_opened_cnt.

cdev close decrements vfio_group->cdev_opened_cnt.

group open checks whether vfio_group->cdev_opened_cnt is non-zero. If
not, it goes on to set vfio_group->opened_file.

In this case only one path can claim DMA ownership.

Is the above what you expect?

> 
> The iommufd DMA owner check is *only* to be used for protecting
> against two unrelated drivers trying to claim the same device.
> 

This is just one implementation choice. I don't see why it cannot be
extended to allow one driver to protect against two internal paths:
simply allow the driver to assign an owner instead of assuming
iommufd_ctx.
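The cdev_opened_cnt scheme described earlier in this mail can be
modeled as a small userspace sketch. The field names (opened_file,
cdev_opened_cnt) follow the mail; everything else is illustrative, not
kernel code (in particular, real code would need the group lock around
these checks):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model of the proposed group-path/cdev-path exclusion. */
struct vfio_group_model {
	void *opened_file;	/* set while the group path has the group open */
	int cdev_opened_cnt;	/* devices currently opened via the cdev path */
};

/* cdev bind_iommufd: refused while the group path owns the group. */
static bool cdev_bind(struct vfio_group_model *g)
{
	if (g->opened_file)
		return false;
	g->cdev_opened_cnt++;
	return true;
}

static void cdev_close(struct vfio_group_model *g)
{
	g->cdev_opened_cnt--;
}

/* group open: refused while any device is open via the cdev path. */
static bool group_open(struct vfio_group_model *g, void *file)
{
	if (g->cdev_opened_cnt)
		return false;
	g->opened_file = file;
	return true;
}

static void group_close(struct vfio_group_model *g)
{
	g->opened_file = NULL;
}
```

The refcnt (rather than a flag) is what lets a second cdev open succeed
while still blocking the group path until the last cdev close.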


^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 10/13] vfio: Make vfio_device_open() exclusive between group path and device cdev path
  2023-02-08  4:23                               ` Tian, Kevin
@ 2023-02-08 12:41                                 ` Jason Gunthorpe
  0 siblings, 0 replies; 80+ messages in thread
From: Jason Gunthorpe @ 2023-02-08 12:41 UTC (permalink / raw)
  To: Tian, Kevin
  Cc: Liu, Yi L, Alex Williamson, cohuck, eric.auger, nicolinc, kvm,
	mjrosato, chao.p.peng, yi.y.sun, peterx, jasowang,
	suravee.suthikulpanit

On Wed, Feb 08, 2023 at 04:23:16AM +0000, Tian, Kevin wrote:
> > From: Jason Gunthorpe <jgg@nvidia.com>
> > Sent: Tuesday, February 7, 2023 9:27 PM
> > > >
> > > > No, I mean why can't vfio just call iommufd exactly once regardless of
> > > > what mode it is running in?
> > >
> > > This seems to be moving the DMA owner claim out of
> > iommufd_device_bind().
> > > Is it? Then either group and cdev can claim DMA owner with their own
> > DMA
> > > marker.
> > 
> > No, it has nothing to do with DMA ownership
> > 
> > Just keep a flag in vfio saying it is in group mode or device mode and
> > act accordingly.
> 
> It cannot be a simple flag. needs to be a refcnt since multiple devices 
> in the group might be opened via cdev so the device mode should be
> cleared only when the last device via cdev is closed.
> 
> Yi actually did implement such a flavor before, kind of introducing
> a vfio_group->cdev_opened_cnt field.
> 
> Then cdev bind_iommufd checks whether vfio_group->opened_file
> has been set in the group open path. If not then increment
> vfio_group->cdev_opened_cnt.
> 
> cdev close decrements vfio_group->cdev_opened_cnt.
> 
> group open checks whether vfio_group->cdev_opened_cnt has been
> set. If not go to set vfio_group->opened_file.
> 
> In this case only one path can claim DMA ownership.
> 
> Is above what you expect?

It seems appropriate

You could also sweep the device list to see how the individual devices
are opened, to decide what to do.

> > The iommufd DMA owner check is *only* to be used for protecting
> > against two unrelated drivers trying to claim the same device.
> > 
> 
> this is just one implementation choice. I don't see why it cannot be
> extended to allow one driver to protect against two internal paths.
> Just simply allow the driver to assign an owner instead of assuming
> iommufd_ctx.

It is really not what it is for, and the owner thing is so ugly I
don't like the pattern. 

Jason

^ permalink raw reply	[flat|nested] 80+ messages in thread

end of thread, other threads:[~2023-02-08 12:41 UTC | newest]

Thread overview: 80+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-01-17 13:49 [PATCH 00/13] Add vfio_device cdev for iommufd support Yi Liu
2023-01-17 13:49 ` [PATCH 01/13] vfio: Allocate per device file structure Yi Liu
2023-01-18  8:37   ` Tian, Kevin
2023-01-18 13:28   ` Eric Auger
2023-01-17 13:49 ` [PATCH 02/13] vfio: Refine vfio file kAPIs Yi Liu
2023-01-18  8:42   ` Tian, Kevin
2023-01-18 14:37   ` Eric Auger
2023-01-29 13:32     ` Liu, Yi L
2023-01-17 13:49 ` [PATCH 03/13] vfio: Accept vfio device file in the driver facing kAPI Yi Liu
2023-01-18  8:45   ` Tian, Kevin
2023-01-18 16:11   ` Eric Auger
2023-01-30  9:47     ` Liu, Yi L
2023-01-30 18:02       ` Jason Gunthorpe
2023-01-17 13:49 ` [PATCH 04/13] kvm/vfio: Rename kvm_vfio_group to prepare for accepting vfio device fd Yi Liu
2023-01-18  8:47   ` Tian, Kevin
2023-01-18 16:33   ` Eric Auger
2023-01-17 13:49 ` [PATCH 05/13] kvm/vfio: Provide struct kvm_device_ops::release() insted of ::destroy() Yi Liu
2023-01-18  8:56   ` Tian, Kevin
2023-01-19  9:12   ` Eric Auger
2023-01-19  9:30     ` Tian, Kevin
2023-01-20  3:52       ` Liu, Yi L
2023-01-19 19:07   ` Jason Gunthorpe
2023-01-19 20:04     ` Alex Williamson
2023-01-20 13:03     ` Liu, Yi L
2023-01-20 14:00     ` Liu, Yi L
2023-01-20 14:33       ` Jason Gunthorpe
2023-01-20 15:09         ` Liu, Yi L
2023-01-20 15:11           ` Liu, Yi L
2023-01-17 13:49 ` [PATCH 06/13] kvm/vfio: Accept vfio device file from userspace Yi Liu
2023-01-18  9:18   ` Tian, Kevin
2023-01-19  9:35   ` Eric Auger
2023-01-30  7:36     ` Liu, Yi L
2023-01-17 13:49 ` [PATCH 07/13] vfio: Pass struct vfio_device_file * to vfio_device_open/close() Yi Liu
2023-01-18  9:27   ` Tian, Kevin
2023-01-19 11:01   ` Eric Auger
2023-01-19 20:35     ` Alex Williamson
2023-01-30  9:38       ` Liu, Yi L
2023-01-30  9:38     ` Liu, Yi L
2023-01-17 13:49 ` [PATCH 08/13] vfio: Block device access via device fd until device is opened Yi Liu
2023-01-18  9:35   ` Tian, Kevin
2023-01-18 13:52     ` Jason Gunthorpe
2023-01-19  3:42       ` Tian, Kevin
2023-01-19  3:43         ` Liu, Yi L
2023-01-19 14:00   ` Eric Auger
2023-01-30 10:41     ` Liu, Yi L
2023-01-19 20:47   ` Alex Williamson
2023-01-30 10:48     ` Liu, Yi L
2023-01-17 13:49 ` [PATCH 09/13] vfio: Add infrastructure for bind_iommufd and attach Yi Liu
2023-01-19  9:45   ` Tian, Kevin
2023-01-30 13:52     ` Liu, Yi L
2023-01-19 23:05   ` Alex Williamson
2023-01-30 13:55     ` Liu, Yi L
2023-01-17 13:49 ` [PATCH 10/13] vfio: Make vfio_device_open() exclusive between group path and device cdev path Yi Liu
2023-01-19  9:55   ` Tian, Kevin
2023-01-30 11:59     ` Liu, Yi L
2023-01-19 23:51   ` Alex Williamson
2023-01-30 12:14     ` Liu, Yi L
2023-02-02  5:34       ` Liu, Yi L
2023-02-03 17:41         ` Jason Gunthorpe
2023-02-06  4:30           ` Liu, Yi L
2023-02-06 10:09             ` Tian, Kevin
2023-02-06 15:10               ` Jason Gunthorpe
2023-02-06 15:51                 ` Liu, Yi L
2023-02-07  0:35                   ` Tian, Kevin
2023-02-07 13:12                     ` Jason Gunthorpe
2023-02-07 13:19                       ` Liu, Yi L
2023-02-07 13:20                         ` Jason Gunthorpe
2023-02-07 13:23                           ` Liu, Yi L
2023-02-07 13:27                             ` Jason Gunthorpe
2023-02-07 13:55                               ` Liu, Yi L
2023-02-08  4:23                               ` Tian, Kevin
2023-02-08 12:41                                 ` Jason Gunthorpe
2023-01-17 13:49 ` [PATCH 11/13] vfio: Add cdev for vfio_device Yi Liu
2023-01-20  7:26   ` Tian, Kevin
2023-01-31  6:17     ` Liu, Yi L
2023-01-24 20:44   ` Jason Gunthorpe
2023-01-17 13:49 ` [PATCH 12/13] vfio: Add ioctls for device cdev iommufd Yi Liu
2023-01-20  8:03   ` Tian, Kevin
2023-02-06  9:07     ` Liu, Yi L
2023-01-17 13:49 ` [PATCH 13/13] vfio: Compile group optionally Yi Liu
