kvm.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC 00/12] Add vfio_device cdev for iommufd support
@ 2022-12-19  8:47 Yi Liu
  2022-12-19  8:47 ` [RFC 01/12] vfio: Allocate per device file structure Yi Liu
                   ` (13 more replies)
  0 siblings, 14 replies; 44+ messages in thread
From: Yi Liu @ 2022-12-19  8:47 UTC (permalink / raw)
  To: alex.williamson, jgg
  Cc: kevin.tian, cohuck, eric.auger, nicolinc, kvm, mjrosato,
	chao.p.peng, yi.l.liu, yi.y.sun, peterx, jasowang

Existing VFIO provides group-centric user APIs for userspace. Userspace
opens the /dev/vfio/$group_id first before getting device fd and hence
getting access to device. This is not the desired model for iommufd. Per
the conclusion of community discussion[1], iommufd provides device-centric
kAPIs and requires its consumer (like VFIO) to be device-centric user
APIs. Such user APIs are used to associate device with iommufd and also
the I/O address spaces managed by the iommufd.

This series first introduces a per device file structure to be prepared
for further enhancement and refactors the kvm-vfio code to be prepared
for accepting device file from userspace. Then refactors the vfio to be
able to handle iommufd binding. This refactor includes the mechanism of
blocking device access before iommufd bind, making vfio_device_open() be
exclusive between the group path and the cdev path. Eventually, adds the
cdev support for vfio device, and makes group infrastructure optional as
it is not needed when vfio device cdev is compiled.

This is also a base for further support iommu nesting for vfio device[2].

The complete code can be found in below branch, simple test done with the
legacy group path and the cdev path. Draft QEMU branch can be found at[3]

https://github.com/yiliu1765/iommufd/tree/vfio_device_cdev_rfcv1
(config CONFIG_IOMMUFD=y)

[1] https://lore.kernel.org/kvm/BN9PR11MB5433B1E4AE5B0480369F97178C189@BN9PR11MB5433.namprd11.prod.outlook.com/
[2] https://github.com/yiliu1765/iommufd/tree/wip/iommufd-v6.1-rc3-nesting
[3] https://github.com/yiliu1765/qemu/tree/wip/qemu-iommufd-6.1-rc3

Regards,
	Yi Liu

Yi Liu (12):
  vfio: Allocate per device file structure
  vfio: Refine vfio file kAPIs
  vfio: Accept vfio device file in the driver facing kAPI
  kvm/vfio: Rename kvm_vfio_group to prepare for accepting vfio device
    fd
  kvm/vfio: Accept vfio device file from userspace
  vfio: Pass struct vfio_device_file * to vfio_device_open/close()
  vfio: Block device access via device fd until device is opened
  vfio: Add infrastructure for bind_iommufd and attach
  vfio: Make vfio_device_open() exclusive between group path and device
    cdev path
  vfio: Add cdev for vfio_device
  vfio: Add ioctls for device cdev iommufd
  vfio: Compile group optionally

 Documentation/virt/kvm/devices/vfio.rst |  32 +-
 drivers/vfio/Kconfig                    |  17 +
 drivers/vfio/Makefile                   |   3 +-
 drivers/vfio/group.c                    | 131 +++----
 drivers/vfio/iommufd.c                  |  79 +++-
 drivers/vfio/pci/vfio_pci_core.c        |   4 +-
 drivers/vfio/vfio.h                     | 108 +++++-
 drivers/vfio/vfio_main.c                | 492 ++++++++++++++++++++++--
 include/linux/vfio.h                    |  21 +-
 include/uapi/linux/iommufd.h            |   2 +
 include/uapi/linux/kvm.h                |  23 +-
 include/uapi/linux/vfio.h               |  64 +++
 virt/kvm/vfio.c                         | 143 +++----
 13 files changed, 891 insertions(+), 228 deletions(-)

-- 
2.34.1


^ permalink raw reply	[flat|nested] 44+ messages in thread

* [RFC 01/12] vfio: Allocate per device file structure
  2022-12-19  8:47 [RFC 00/12] Add vfio_device cdev for iommufd support Yi Liu
@ 2022-12-19  8:47 ` Yi Liu
  2022-12-21  3:57   ` Tian, Kevin
  2022-12-19  8:47 ` [RFC 02/12] vfio: Refine vfio file kAPIs Yi Liu
                   ` (12 subsequent siblings)
  13 siblings, 1 reply; 44+ messages in thread
From: Yi Liu @ 2022-12-19  8:47 UTC (permalink / raw)
  To: alex.williamson, jgg
  Cc: kevin.tian, cohuck, eric.auger, nicolinc, kvm, mjrosato,
	chao.p.peng, yi.l.liu, yi.y.sun, peterx, jasowang

This is preparation for adding vfio device cdev support. vfio device
cdev requires:
1) a per device file memory to store the kvm pointer set by KVM. It will
   be propagated to vfio_device:kvm after the device cdev file is bound
   to an iommufd
2) a mechanism to block device access through device cdev fd before it
   is bound to an iommufd

To address above requirements, this adds a per device file structure
named vfio_device_file. For now, it's only a wrapper of struct vfio_device
pointer. Other fields will be added to this per file structure in future
commits.

Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 drivers/vfio/group.c     | 13 +++++++++++--
 drivers/vfio/vfio.h      |  6 ++++++
 drivers/vfio/vfio_main.c | 31 +++++++++++++++++++++++++------
 3 files changed, 42 insertions(+), 8 deletions(-)

diff --git a/drivers/vfio/group.c b/drivers/vfio/group.c
index c5d8bf10495e..eb0fbc33217f 100644
--- a/drivers/vfio/group.c
+++ b/drivers/vfio/group.c
@@ -186,19 +186,26 @@ void vfio_device_group_close(struct vfio_device *device)
 
 static struct file *vfio_device_open_file(struct vfio_device *device)
 {
+	struct vfio_device_file *df;
 	struct file *filep;
 	int ret;
 
+	df = vfio_allocate_device_file(device);
+	if (IS_ERR(df)) {
+		ret = PTR_ERR(df);
+		goto err_out;
+	}
+
 	ret = vfio_device_group_open(device);
 	if (ret)
-		goto err_out;
+		goto err_free;
 
 	/*
 	 * We can't use anon_inode_getfd() because we need to modify
 	 * the f_mode flags directly to allow more than just ioctls
 	 */
 	filep = anon_inode_getfile("[vfio-device]", &vfio_device_fops,
-				   device, O_RDWR);
+				   df, O_RDWR);
 	if (IS_ERR(filep)) {
 		ret = PTR_ERR(filep);
 		goto err_close_device;
@@ -222,6 +229,8 @@ static struct file *vfio_device_open_file(struct vfio_device *device)
 
 err_close_device:
 	vfio_device_group_close(device);
+err_free:
+	kfree(df);
 err_out:
 	return ERR_PTR(ret);
 }
diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
index 2e05418fd18d..2bfa05ad4c84 100644
--- a/drivers/vfio/vfio.h
+++ b/drivers/vfio/vfio.h
@@ -16,12 +16,18 @@ struct iommu_group;
 struct vfio_device;
 struct vfio_container;
 
+struct vfio_device_file {
+	struct vfio_device *device;
+};
+
 void vfio_device_put_registration(struct vfio_device *device);
 bool vfio_device_try_get_registration(struct vfio_device *device);
 int vfio_device_open(struct vfio_device *device,
 		     struct iommufd_ctx *iommufd, struct kvm *kvm);
 void vfio_device_close(struct vfio_device *device,
 		       struct iommufd_ctx *iommufd);
+struct vfio_device_file *
+vfio_allocate_device_file(struct vfio_device *device);
 
 extern const struct file_operations vfio_device_fops;
 
diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
index e21ff965141e..2ad08ef377d2 100644
--- a/drivers/vfio/vfio_main.c
+++ b/drivers/vfio/vfio_main.c
@@ -358,6 +358,20 @@ static bool vfio_assert_device_open(struct vfio_device *device)
 	return !WARN_ON_ONCE(!READ_ONCE(device->open_count));
 }
 
+struct vfio_device_file *
+vfio_allocate_device_file(struct vfio_device *device)
+{
+	struct vfio_device_file *df;
+
+	df = kzalloc(sizeof(*df), GFP_KERNEL_ACCOUNT);
+	if (!df)
+		return ERR_PTR(-ENOMEM);
+
+	df->device = device;
+
+	return df;
+}
+
 static int vfio_device_first_open(struct vfio_device *device,
 				  struct iommufd_ctx *iommufd, struct kvm *kvm)
 {
@@ -475,10 +489,11 @@ static inline void vfio_device_pm_runtime_put(struct vfio_device *device)
  */
 static int vfio_device_fops_release(struct inode *inode, struct file *filep)
 {
-	struct vfio_device *device = filep->private_data;
+	struct vfio_device_file *df = filep->private_data;
+	struct vfio_device *device = df->device;
 
 	vfio_device_group_close(device);
-
+	kfree(df);
 	vfio_device_put_registration(device);
 
 	return 0;
@@ -943,7 +958,8 @@ static int vfio_ioctl_device_feature(struct vfio_device *device,
 static long vfio_device_fops_unl_ioctl(struct file *filep,
 				       unsigned int cmd, unsigned long arg)
 {
-	struct vfio_device *device = filep->private_data;
+	struct vfio_device_file *df = filep->private_data;
+	struct vfio_device *device = df->device;
 	int ret;
 
 	ret = vfio_device_pm_runtime_get(device);
@@ -970,7 +986,8 @@ static long vfio_device_fops_unl_ioctl(struct file *filep,
 static ssize_t vfio_device_fops_read(struct file *filep, char __user *buf,
 				     size_t count, loff_t *ppos)
 {
-	struct vfio_device *device = filep->private_data;
+	struct vfio_device_file *df = filep->private_data;
+	struct vfio_device *device = df->device;
 
 	if (unlikely(!device->ops->read))
 		return -EINVAL;
@@ -982,7 +999,8 @@ static ssize_t vfio_device_fops_write(struct file *filep,
 				      const char __user *buf,
 				      size_t count, loff_t *ppos)
 {
-	struct vfio_device *device = filep->private_data;
+	struct vfio_device_file *df = filep->private_data;
+	struct vfio_device *device = df->device;
 
 	if (unlikely(!device->ops->write))
 		return -EINVAL;
@@ -992,7 +1010,8 @@ static ssize_t vfio_device_fops_write(struct file *filep,
 
 static int vfio_device_fops_mmap(struct file *filep, struct vm_area_struct *vma)
 {
-	struct vfio_device *device = filep->private_data;
+	struct vfio_device_file *df = filep->private_data;
+	struct vfio_device *device = df->device;
 
 	if (unlikely(!device->ops->mmap))
 		return -EINVAL;
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [RFC 02/12] vfio: Refine vfio file kAPIs
  2022-12-19  8:47 [RFC 00/12] Add vfio_device cdev for iommufd support Yi Liu
  2022-12-19  8:47 ` [RFC 01/12] vfio: Allocate per device file structure Yi Liu
@ 2022-12-19  8:47 ` Yi Liu
  2022-12-19  8:47 ` [RFC 03/12] vfio: Accept vfio device file in the driver facing kAPI Yi Liu
                   ` (11 subsequent siblings)
  13 siblings, 0 replies; 44+ messages in thread
From: Yi Liu @ 2022-12-19  8:47 UTC (permalink / raw)
  To: alex.williamson, jgg
  Cc: kevin.tian, cohuck, eric.auger, nicolinc, kvm, mjrosato,
	chao.p.peng, yi.l.liu, yi.y.sun, peterx, jasowang

This prepares for making the below kAPIs to accept both group file
and device file instead of only vfio group file.

  bool vfio_file_enforced_coherent(struct file *file);
  void vfio_file_set_kvm(struct file *file, struct kvm *kvm);
  bool vfio_file_has_dev(struct file *file, struct vfio_device *device);

Besides above change, vfio_file_is_group() is renamed to be
vfio_file_is_valid().

Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 drivers/vfio/group.c             | 74 ++++++++------------------------
 drivers/vfio/pci/vfio_pci_core.c |  4 +-
 drivers/vfio/vfio.h              |  4 ++
 drivers/vfio/vfio_main.c         | 62 ++++++++++++++++++++++++++
 include/linux/vfio.h             |  2 +-
 virt/kvm/vfio.c                  | 10 ++---
 6 files changed, 92 insertions(+), 64 deletions(-)

diff --git a/drivers/vfio/group.c b/drivers/vfio/group.c
index eb0fbc33217f..fe73db270984 100644
--- a/drivers/vfio/group.c
+++ b/drivers/vfio/group.c
@@ -721,6 +721,15 @@ bool vfio_device_has_container(struct vfio_device *device)
 	return device->group->container;
 }
 
+struct vfio_group *vfio_group_from_file(struct file *file)
+{
+	struct vfio_group *group = file->private_data;
+
+	if (file->f_op != &vfio_group_fops)
+		return NULL;
+	return group;
+}
+
 /**
  * vfio_file_iommu_group - Return the struct iommu_group for the vfio group file
  * @file: VFIO group file
@@ -731,13 +740,13 @@ bool vfio_device_has_container(struct vfio_device *device)
  */
 struct iommu_group *vfio_file_iommu_group(struct file *file)
 {
-	struct vfio_group *group = file->private_data;
+	struct vfio_group *group = vfio_group_from_file(file);
 	struct iommu_group *iommu_group = NULL;
 
 	if (!IS_ENABLED(CONFIG_SPAPR_TCE_IOMMU))
 		return NULL;
 
-	if (!vfio_file_is_group(file))
+	if (!group)
 		return NULL;
 
 	mutex_lock(&group->group_lock);
@@ -750,34 +759,11 @@ struct iommu_group *vfio_file_iommu_group(struct file *file)
 }
 EXPORT_SYMBOL_GPL(vfio_file_iommu_group);
 
-/**
- * vfio_file_is_group - True if the file is usable with VFIO aPIS
- * @file: VFIO group file
- */
-bool vfio_file_is_group(struct file *file)
-{
-	return file->f_op == &vfio_group_fops;
-}
-EXPORT_SYMBOL_GPL(vfio_file_is_group);
-
-/**
- * vfio_file_enforced_coherent - True if the DMA associated with the VFIO file
- *        is always CPU cache coherent
- * @file: VFIO group file
- *
- * Enforced coherency means that the IOMMU ignores things like the PCIe no-snoop
- * bit in DMA transactions. A return of false indicates that the user has
- * rights to access additional instructions such as wbinvd on x86.
- */
-bool vfio_file_enforced_coherent(struct file *file)
+bool vfio_group_enforced_coherent(struct vfio_group *group)
 {
-	struct vfio_group *group = file->private_data;
 	struct vfio_device *device;
 	bool ret = true;
 
-	if (!vfio_file_is_group(file))
-		return true;
-
 	/*
 	 * If the device does not have IOMMU_CAP_ENFORCE_CACHE_COHERENCY then
 	 * any domain later attached to it will also not support it. If the cap
@@ -795,46 +781,22 @@ bool vfio_file_enforced_coherent(struct file *file)
 	mutex_unlock(&group->device_lock);
 	return ret;
 }
-EXPORT_SYMBOL_GPL(vfio_file_enforced_coherent);
 
-/**
- * vfio_file_set_kvm - Link a kvm with VFIO drivers
- * @file: VFIO group file
- * @kvm: KVM to link
- *
- * When a VFIO device is first opened the KVM will be available in
- * device->kvm if one was associated with the group.
- */
-void vfio_file_set_kvm(struct file *file, struct kvm *kvm)
+void vfio_group_set_kvm(struct vfio_group *group, struct kvm *kvm)
 {
-	struct vfio_group *group = file->private_data;
-
-	if (!vfio_file_is_group(file))
-		return;
-
+	/*
+	 * When a VFIO device is first opened the KVM will be available in
+	 * device->kvm if one was associated with the group.
+	 */
 	mutex_lock(&group->group_lock);
 	group->kvm = kvm;
 	mutex_unlock(&group->group_lock);
 }
-EXPORT_SYMBOL_GPL(vfio_file_set_kvm);
 
-/**
- * vfio_file_has_dev - True if the VFIO file is a handle for device
- * @file: VFIO file to check
- * @device: Device that must be part of the file
- *
- * Returns true if given file has permission to manipulate the given device.
- */
-bool vfio_file_has_dev(struct file *file, struct vfio_device *device)
+bool vfio_group_has_dev(struct vfio_group *group, struct vfio_device *device)
 {
-	struct vfio_group *group = file->private_data;
-
-	if (!vfio_file_is_group(file))
-		return false;
-
 	return group == device->group;
 }
-EXPORT_SYMBOL_GPL(vfio_file_has_dev);
 
 static char *vfio_devnode(struct device *dev, umode_t *mode)
 {
diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
index e030c2120183..a20b1294d585 100644
--- a/drivers/vfio/pci/vfio_pci_core.c
+++ b/drivers/vfio/pci/vfio_pci_core.c
@@ -1312,8 +1312,8 @@ static int vfio_pci_ioctl_pci_hot_reset(struct vfio_pci_core_device *vdev,
 			break;
 		}
 
-		/* Ensure the FD is a vfio group FD.*/
-		if (!vfio_file_is_group(file)) {
+		/* Ensure the FD is a vfio FD.*/
+		if (!vfio_file_is_valid(file)) {
 			fput(file);
 			ret = -EINVAL;
 			break;
diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
index 2bfa05ad4c84..69d0fd7e351e 100644
--- a/drivers/vfio/vfio.h
+++ b/drivers/vfio/vfio.h
@@ -90,6 +90,10 @@ void vfio_device_group_unregister(struct vfio_device *device);
 int vfio_device_group_use_iommu(struct vfio_device *device);
 void vfio_device_group_unuse_iommu(struct vfio_device *device);
 void vfio_device_group_close(struct vfio_device *device);
+struct vfio_group *vfio_group_from_file(struct file *file);
+bool vfio_group_enforced_coherent(struct vfio_group *group);
+void vfio_group_set_kvm(struct vfio_group *group, struct kvm *kvm);
+bool vfio_group_has_dev(struct vfio_group *group, struct vfio_device *device);
 bool vfio_device_has_container(struct vfio_device *device);
 int __init vfio_group_init(void);
 void vfio_group_cleanup(void);
diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
index 2ad08ef377d2..7d13c5f0bfab 100644
--- a/drivers/vfio/vfio_main.c
+++ b/drivers/vfio/vfio_main.c
@@ -1029,6 +1029,68 @@ const struct file_operations vfio_device_fops = {
 	.mmap		= vfio_device_fops_mmap,
 };
 
+/**
+ * vfio_file_is_valid - True if the file is usable with VFIO aPIS
+ * @file: VFIO group file or VFIO device file
+ */
+bool vfio_file_is_valid(struct file *file)
+{
+	return vfio_group_from_file(file);
+}
+EXPORT_SYMBOL_GPL(vfio_file_is_valid);
+
+/**
+ * vfio_file_enforced_coherent - True if the DMA associated with the VFIO file
+ *        is always CPU cache coherent
+ * @file: VFIO group or device file
+ *
+ * Enforced coherency means that the IOMMU ignores things like the PCIe no-snoop
+ * bit in DMA transactions. A return of false indicates that the user has
+ * rights to access additional instructions such as wbinvd on x86.
+ */
+bool vfio_file_enforced_coherent(struct file *file)
+{
+	struct vfio_group *group = vfio_group_from_file(file);
+
+	if (group)
+		return vfio_group_enforced_coherent(group);
+
+	return true;
+}
+EXPORT_SYMBOL_GPL(vfio_file_enforced_coherent);
+
+/**
+ * vfio_file_set_kvm - Link a kvm with VFIO drivers
+ * @file: VFIO group file or device file
+ * @kvm: KVM to link
+ *
+ */
+void vfio_file_set_kvm(struct file *file, struct kvm *kvm)
+{
+	struct vfio_group *group = vfio_group_from_file(file);
+
+	if (group)
+		vfio_group_set_kvm(group, kvm);
+}
+EXPORT_SYMBOL_GPL(vfio_file_set_kvm);
+
+/**
+ * vfio_file_has_dev - True if the VFIO file is a handle for device
+ * @file: VFIO file to check
+ * @device: Device that must be part of the file
+ *
+ * Returns true if given file has permission to manipulate the given device.
+ */
+bool vfio_file_has_dev(struct file *file, struct vfio_device *device)
+{
+	struct vfio_group *group = vfio_group_from_file(file);
+
+	if (group)
+		return vfio_group_has_dev(group, device);
+	return false;
+}
+EXPORT_SYMBOL_GPL(vfio_file_has_dev);
+
 /*
  * Sub-module support
  */
diff --git a/include/linux/vfio.h b/include/linux/vfio.h
index a615542df1e0..9e3b9b5c8c8b 100644
--- a/include/linux/vfio.h
+++ b/include/linux/vfio.h
@@ -239,7 +239,7 @@ int vfio_mig_get_next_state(struct vfio_device *device,
  * External user API
  */
 struct iommu_group *vfio_file_iommu_group(struct file *file);
-bool vfio_file_is_group(struct file *file);
+bool vfio_file_is_valid(struct file *file);
 bool vfio_file_enforced_coherent(struct file *file);
 void vfio_file_set_kvm(struct file *file, struct kvm *kvm);
 bool vfio_file_has_dev(struct file *file, struct vfio_device *device);
diff --git a/virt/kvm/vfio.c b/virt/kvm/vfio.c
index 495ceabffe88..868930c7a59b 100644
--- a/virt/kvm/vfio.c
+++ b/virt/kvm/vfio.c
@@ -64,18 +64,18 @@ static bool kvm_vfio_file_enforced_coherent(struct file *file)
 	return ret;
 }
 
-static bool kvm_vfio_file_is_group(struct file *file)
+static bool kvm_vfio_file_is_valid(struct file *file)
 {
 	bool (*fn)(struct file *file);
 	bool ret;
 
-	fn = symbol_get(vfio_file_is_group);
+	fn = symbol_get(vfio_file_is_valid);
 	if (!fn)
 		return false;
 
 	ret = fn(file);
 
-	symbol_put(vfio_file_is_group);
+	symbol_put(vfio_file_is_valid);
 
 	return ret;
 }
@@ -154,8 +154,8 @@ static int kvm_vfio_group_add(struct kvm_device *dev, unsigned int fd)
 	if (!filp)
 		return -EBADF;
 
-	/* Ensure the FD is a vfio group FD.*/
-	if (!kvm_vfio_file_is_group(filp)) {
+	/* Ensure the FD is a vfio FD.*/
+	if (!kvm_vfio_file_is_valid(filp)) {
 		ret = -EINVAL;
 		goto err_fput;
 	}
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [RFC 03/12] vfio: Accept vfio device file in the driver facing kAPI
  2022-12-19  8:47 [RFC 00/12] Add vfio_device cdev for iommufd support Yi Liu
  2022-12-19  8:47 ` [RFC 01/12] vfio: Allocate per device file structure Yi Liu
  2022-12-19  8:47 ` [RFC 02/12] vfio: Refine vfio file kAPIs Yi Liu
@ 2022-12-19  8:47 ` Yi Liu
  2022-12-21  4:07   ` Tian, Kevin
  2022-12-19  8:47 ` [RFC 04/12] kvm/vfio: Rename kvm_vfio_group to prepare for accepting vfio device fd Yi Liu
                   ` (10 subsequent siblings)
  13 siblings, 1 reply; 44+ messages in thread
From: Yi Liu @ 2022-12-19  8:47 UTC (permalink / raw)
  To: alex.williamson, jgg
  Cc: kevin.tian, cohuck, eric.auger, nicolinc, kvm, mjrosato,
	chao.p.peng, yi.l.liu, yi.y.sun, peterx, jasowang

This makes the vfio file kAPIs to accepte vfio device files, also a
preparation for vfio device cdev support.

Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 drivers/vfio/vfio.h      |  1 +
 drivers/vfio/vfio_main.c | 64 +++++++++++++++++++++++++++++++++++++---
 2 files changed, 61 insertions(+), 4 deletions(-)

diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
index 69d0fd7e351e..f0e411995997 100644
--- a/drivers/vfio/vfio.h
+++ b/drivers/vfio/vfio.h
@@ -18,6 +18,7 @@ struct vfio_container;
 
 struct vfio_device_file {
 	struct vfio_device *device;
+	struct kvm *kvm;
 };
 
 void vfio_device_put_registration(struct vfio_device *device);
diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
index 7d13c5f0bfab..481502a6964a 100644
--- a/drivers/vfio/vfio_main.c
+++ b/drivers/vfio/vfio_main.c
@@ -1029,16 +1029,40 @@ const struct file_operations vfio_device_fops = {
 	.mmap		= vfio_device_fops_mmap,
 };
 
+static struct vfio_device *vfio_device_from_file(struct file *file)
+{
+	struct vfio_device_file *df = file->private_data;
+
+	if (file->f_op != &vfio_device_fops)
+		return NULL;
+	return df->device;
+}
+
 /**
  * vfio_file_is_valid - True if the file is usable with VFIO aPIS
  * @file: VFIO group file or VFIO device file
  */
 bool vfio_file_is_valid(struct file *file)
 {
-	return vfio_group_from_file(file);
+	return vfio_group_from_file(file) ||
+	       vfio_device_from_file(file);
 }
 EXPORT_SYMBOL_GPL(vfio_file_is_valid);
 
+static bool vfio_device_enforced_coherent(struct vfio_device *device)
+{
+	bool ret;
+
+	if (!vfio_device_try_get_registration(device))
+		return true;
+
+	ret = device_iommu_capable(device->dev,
+				   IOMMU_CAP_ENFORCE_CACHE_COHERENCY);
+
+	vfio_device_put_registration(device);
+	return ret;
+}
+
 /**
  * vfio_file_enforced_coherent - True if the DMA associated with the VFIO file
  *        is always CPU cache coherent
@@ -1050,15 +1074,36 @@ EXPORT_SYMBOL_GPL(vfio_file_is_valid);
  */
 bool vfio_file_enforced_coherent(struct file *file)
 {
-	struct vfio_group *group = vfio_group_from_file(file);
+	struct vfio_group *group;
+	struct vfio_device *device;
 
+	group = vfio_group_from_file(file);
 	if (group)
 		return vfio_group_enforced_coherent(group);
 
+	device = vfio_device_from_file(file);
+	if (device)
+		return vfio_device_enforced_coherent(device);
+
 	return true;
 }
 EXPORT_SYMBOL_GPL(vfio_file_enforced_coherent);
 
+static void vfio_device_file_set_kvm(struct file *file, struct kvm *kvm)
+{
+	struct vfio_device_file *df = file->private_data;
+	struct vfio_device *device = df->device;
+
+	/*
+	 * The kvm is first recorded in the df, and will be propagated
+	 * to vfio_device::kvm when the file binds iommufd successfully in
+	 * the vfio device cdev path.
+	 */
+	mutex_lock(&device->dev_set->lock);
+	df->kvm = kvm;
+	mutex_unlock(&device->dev_set->lock);
+}
+
 /**
  * vfio_file_set_kvm - Link a kvm with VFIO drivers
  * @file: VFIO group file or device file
@@ -1067,10 +1112,14 @@ EXPORT_SYMBOL_GPL(vfio_file_enforced_coherent);
  */
 void vfio_file_set_kvm(struct file *file, struct kvm *kvm)
 {
-	struct vfio_group *group = vfio_group_from_file(file);
+	struct vfio_group *group;
 
+	group = vfio_group_from_file(file);
 	if (group)
 		vfio_group_set_kvm(group, kvm);
+
+	if (vfio_device_from_file(file))
+		vfio_device_file_set_kvm(file, kvm);
 }
 EXPORT_SYMBOL_GPL(vfio_file_set_kvm);
 
@@ -1083,10 +1132,17 @@ EXPORT_SYMBOL_GPL(vfio_file_set_kvm);
  */
 bool vfio_file_has_dev(struct file *file, struct vfio_device *device)
 {
-	struct vfio_group *group = vfio_group_from_file(file);
+	struct vfio_group *group;
+	struct vfio_device *vdev;
 
+	group = vfio_group_from_file(file);
 	if (group)
 		return vfio_group_has_dev(group, device);
+
+	vdev = vfio_device_from_file(file);
+	if (device)
+		return vdev == device;
+
 	return false;
 }
 EXPORT_SYMBOL_GPL(vfio_file_has_dev);
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [RFC 04/12] kvm/vfio: Rename kvm_vfio_group to prepare for accepting vfio device fd
  2022-12-19  8:47 [RFC 00/12] Add vfio_device cdev for iommufd support Yi Liu
                   ` (2 preceding siblings ...)
  2022-12-19  8:47 ` [RFC 03/12] vfio: Accept vfio device file in the driver facing kAPI Yi Liu
@ 2022-12-19  8:47 ` Yi Liu
  2022-12-19  8:47 ` [RFC 05/12] kvm/vfio: Accept vfio device file from userspace Yi Liu
                   ` (9 subsequent siblings)
  13 siblings, 0 replies; 44+ messages in thread
From: Yi Liu @ 2022-12-19  8:47 UTC (permalink / raw)
  To: alex.williamson, jgg
  Cc: kevin.tian, cohuck, eric.auger, nicolinc, kvm, mjrosato,
	chao.p.peng, yi.l.liu, yi.y.sun, peterx, jasowang

Meanwhile, rename related helpers. No functional change is intended.

Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 virt/kvm/vfio.c | 115 ++++++++++++++++++++++++------------------------
 1 file changed, 58 insertions(+), 57 deletions(-)

diff --git a/virt/kvm/vfio.c b/virt/kvm/vfio.c
index 868930c7a59b..0f54b9d308d7 100644
--- a/virt/kvm/vfio.c
+++ b/virt/kvm/vfio.c
@@ -21,7 +21,7 @@
 #include <asm/kvm_ppc.h>
 #endif
 
-struct kvm_vfio_group {
+struct kvm_vfio_file {
 	struct list_head node;
 	struct file *file;
 #ifdef CONFIG_SPAPR_TCE_IOMMU
@@ -30,7 +30,7 @@ struct kvm_vfio_group {
 };
 
 struct kvm_vfio {
-	struct list_head group_list;
+	struct list_head file_list;
 	struct mutex lock;
 	bool noncoherent;
 };
@@ -98,34 +98,35 @@ static struct iommu_group *kvm_vfio_file_iommu_group(struct file *file)
 }
 
 static void kvm_spapr_tce_release_vfio_group(struct kvm *kvm,
-					     struct kvm_vfio_group *kvg)
+					     struct kvm_vfio_file *kvf)
 {
-	if (WARN_ON_ONCE(!kvg->iommu_group))
+	if (WARN_ON_ONCE(!kvf->iommu_group))
 		return;
 
-	kvm_spapr_tce_release_iommu_group(kvm, kvg->iommu_group);
-	iommu_group_put(kvg->iommu_group);
-	kvg->iommu_group = NULL;
+	kvm_spapr_tce_release_iommu_group(kvm, kvf->iommu_group);
+	iommu_group_put(kvf->iommu_group);
+	kvf->iommu_group = NULL;
 }
 #endif
 
 /*
- * Groups can use the same or different IOMMU domains.  If the same then
- * adding a new group may change the coherency of groups we've previously
- * been told about.  We don't want to care about any of that so we retest
- * each group and bail as soon as we find one that's noncoherent.  This
- * means we only ever [un]register_noncoherent_dma once for the whole device.
+ * Groups/devices can use the same or different IOMMU domains.  If the same
+ * then adding a new group/device may change the coherency of groups/devices
+ * we've previously been told about.  We don't want to care about any of
+ * that so we retest each group/device and bail as soon as we find one that's
+ * noncoherent.  This means we only ever [un]register_noncoherent_dma once
+ * for the whole device.
  */
 static void kvm_vfio_update_coherency(struct kvm_device *dev)
 {
 	struct kvm_vfio *kv = dev->private;
 	bool noncoherent = false;
-	struct kvm_vfio_group *kvg;
+	struct kvm_vfio_file *kvf;
 
 	mutex_lock(&kv->lock);
 
-	list_for_each_entry(kvg, &kv->group_list, node) {
-		if (!kvm_vfio_file_enforced_coherent(kvg->file)) {
+	list_for_each_entry(kvf, &kv->file_list, node) {
+		if (!kvm_vfio_file_enforced_coherent(kvf->file)) {
 			noncoherent = true;
 			break;
 		}
@@ -143,10 +144,10 @@ static void kvm_vfio_update_coherency(struct kvm_device *dev)
 	mutex_unlock(&kv->lock);
 }
 
-static int kvm_vfio_group_add(struct kvm_device *dev, unsigned int fd)
+static int kvm_vfio_file_add(struct kvm_device *dev, unsigned int fd)
 {
 	struct kvm_vfio *kv = dev->private;
-	struct kvm_vfio_group *kvg;
+	struct kvm_vfio_file *kvf;
 	struct file *filp;
 	int ret;
 
@@ -162,27 +163,27 @@ static int kvm_vfio_group_add(struct kvm_device *dev, unsigned int fd)
 
 	mutex_lock(&kv->lock);
 
-	list_for_each_entry(kvg, &kv->group_list, node) {
-		if (kvg->file == filp) {
+	list_for_each_entry(kvf, &kv->file_list, node) {
+		if (kvf->file == filp) {
 			ret = -EEXIST;
 			goto err_unlock;
 		}
 	}
 
-	kvg = kzalloc(sizeof(*kvg), GFP_KERNEL_ACCOUNT);
-	if (!kvg) {
+	kvf = kzalloc(sizeof(*kvf), GFP_KERNEL_ACCOUNT);
+	if (!kvf) {
 		ret = -ENOMEM;
 		goto err_unlock;
 	}
 
-	kvg->file = filp;
-	list_add_tail(&kvg->node, &kv->group_list);
+	kvf->file = filp;
+	list_add_tail(&kvf->node, &kv->file_list);
 
 	kvm_arch_start_assignment(dev->kvm);
 
 	mutex_unlock(&kv->lock);
 
-	kvm_vfio_file_set_kvm(kvg->file, dev->kvm);
+	kvm_vfio_file_set_kvm(kvf->file, dev->kvm);
 	kvm_vfio_update_coherency(dev);
 
 	return 0;
@@ -193,10 +194,10 @@ static int kvm_vfio_group_add(struct kvm_device *dev, unsigned int fd)
 	return ret;
 }
 
-static int kvm_vfio_group_del(struct kvm_device *dev, unsigned int fd)
+static int kvm_vfio_file_del(struct kvm_device *dev, unsigned int fd)
 {
 	struct kvm_vfio *kv = dev->private;
-	struct kvm_vfio_group *kvg;
+	struct kvm_vfio_file *kvf;
 	struct fd f;
 	int ret;
 
@@ -208,18 +209,18 @@ static int kvm_vfio_group_del(struct kvm_device *dev, unsigned int fd)
 
 	mutex_lock(&kv->lock);
 
-	list_for_each_entry(kvg, &kv->group_list, node) {
-		if (kvg->file != f.file)
+	list_for_each_entry(kvf, &kv->file_list, node) {
+		if (kvf->file != f.file)
 			continue;
 
-		list_del(&kvg->node);
+		list_del(&kvf->node);
 		kvm_arch_end_assignment(dev->kvm);
 #ifdef CONFIG_SPAPR_TCE_IOMMU
-		kvm_spapr_tce_release_vfio_group(dev->kvm, kvg);
+		kvm_spapr_tce_release_vfio_group(dev->kvm, kvf);
 #endif
-		kvm_vfio_file_set_kvm(kvg->file, NULL);
-		fput(kvg->file);
-		kfree(kvg);
+		kvm_vfio_file_set_kvm(kvf->file, NULL);
+		fput(kvf->file);
+		kfree(kvf);
 		ret = 0;
 		break;
 	}
@@ -234,12 +235,12 @@ static int kvm_vfio_group_del(struct kvm_device *dev, unsigned int fd)
 }
 
 #ifdef CONFIG_SPAPR_TCE_IOMMU
-static int kvm_vfio_group_set_spapr_tce(struct kvm_device *dev,
-					void __user *arg)
+static int kvm_vfio_file_set_spapr_tce(struct kvm_device *dev,
+				       void __user *arg)
 {
 	struct kvm_vfio_spapr_tce param;
 	struct kvm_vfio *kv = dev->private;
-	struct kvm_vfio_group *kvg;
+	struct kvm_vfio_file *kvf;
 	struct fd f;
 	int ret;
 
@@ -254,20 +255,20 @@ static int kvm_vfio_group_set_spapr_tce(struct kvm_device *dev,
 
 	mutex_lock(&kv->lock);
 
-	list_for_each_entry(kvg, &kv->group_list, node) {
-		if (kvg->file != f.file)
+	list_for_each_entry(kvf, &kv->file_list, node) {
+		if (kvf->file != f.file)
 			continue;
 
-		if (!kvg->iommu_group) {
-			kvg->iommu_group = kvm_vfio_file_iommu_group(kvg->file);
-			if (WARN_ON_ONCE(!kvg->iommu_group)) {
+		if (!kvf->iommu_group) {
+			kvf->iommu_group = kvm_vfio_file_iommu_group(kvf->file);
+			if (WARN_ON_ONCE(!kvf->iommu_group)) {
 				ret = -EIO;
 				goto err_fdput;
 			}
 		}
 
 		ret = kvm_spapr_tce_attach_iommu_group(dev->kvm, param.tablefd,
-						       kvg->iommu_group);
+						       kvf->iommu_group);
 		break;
 	}
 
@@ -278,8 +279,8 @@ static int kvm_vfio_group_set_spapr_tce(struct kvm_device *dev,
 }
 #endif
 
-static int kvm_vfio_set_group(struct kvm_device *dev, long attr,
-			      void __user *arg)
+static int kvm_vfio_set_file(struct kvm_device *dev, long attr,
+			     void __user *arg)
 {
 	int32_t __user *argp = arg;
 	int32_t fd;
@@ -288,16 +289,16 @@ static int kvm_vfio_set_group(struct kvm_device *dev, long attr,
 	case KVM_DEV_VFIO_GROUP_ADD:
 		if (get_user(fd, argp))
 			return -EFAULT;
-		return kvm_vfio_group_add(dev, fd);
+		return kvm_vfio_file_add(dev, fd);
 
 	case KVM_DEV_VFIO_GROUP_DEL:
 		if (get_user(fd, argp))
 			return -EFAULT;
-		return kvm_vfio_group_del(dev, fd);
+		return kvm_vfio_file_del(dev, fd);
 
 #ifdef CONFIG_SPAPR_TCE_IOMMU
 	case KVM_DEV_VFIO_GROUP_SET_SPAPR_TCE:
-		return kvm_vfio_group_set_spapr_tce(dev, arg);
+		return kvm_vfio_file_set_spapr_tce(dev, arg);
 #endif
 	}
 
@@ -309,8 +310,8 @@ static int kvm_vfio_set_attr(struct kvm_device *dev,
 {
 	switch (attr->group) {
 	case KVM_DEV_VFIO_GROUP:
-		return kvm_vfio_set_group(dev, attr->attr,
-					  u64_to_user_ptr(attr->addr));
+		return kvm_vfio_set_file(dev, attr->attr,
+					 u64_to_user_ptr(attr->addr));
 	}
 
 	return -ENXIO;
@@ -339,16 +340,16 @@ static int kvm_vfio_has_attr(struct kvm_device *dev,
 static void kvm_vfio_destroy(struct kvm_device *dev)
 {
 	struct kvm_vfio *kv = dev->private;
-	struct kvm_vfio_group *kvg, *tmp;
+	struct kvm_vfio_file *kvf, *tmp;
 
-	list_for_each_entry_safe(kvg, tmp, &kv->group_list, node) {
+	list_for_each_entry_safe(kvf, tmp, &kv->file_list, node) {
 #ifdef CONFIG_SPAPR_TCE_IOMMU
-		kvm_spapr_tce_release_vfio_group(dev->kvm, kvg);
+		kvm_spapr_tce_release_vfio_group(dev->kvm, kvf);
 #endif
-		kvm_vfio_file_set_kvm(kvg->file, NULL);
-		fput(kvg->file);
-		list_del(&kvg->node);
-		kfree(kvg);
+		kvm_vfio_file_set_kvm(kvf->file, NULL);
+		fput(kvf->file);
+		list_del(&kvf->node);
+		kfree(kvf);
 		kvm_arch_end_assignment(dev->kvm);
 	}
 
@@ -382,7 +383,7 @@ static int kvm_vfio_create(struct kvm_device *dev, u32 type)
 	if (!kv)
 		return -ENOMEM;
 
-	INIT_LIST_HEAD(&kv->group_list);
+	INIT_LIST_HEAD(&kv->file_list);
 	mutex_init(&kv->lock);
 
 	dev->private = kv;
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [RFC 05/12] kvm/vfio: Accept vfio device file from userspace
  2022-12-19  8:47 [RFC 00/12] Add vfio_device cdev for iommufd support Yi Liu
                   ` (3 preceding siblings ...)
  2022-12-19  8:47 ` [RFC 04/12] kvm/vfio: Rename kvm_vfio_group to prepare for accepting vfio device fd Yi Liu
@ 2022-12-19  8:47 ` Yi Liu
  2023-01-06 14:32   ` Jason Gunthorpe
  2022-12-19  8:47 ` [RFC 06/12] vfio: Pass struct vfio_device_file * to vfio_device_open/close() Yi Liu
                   ` (8 subsequent siblings)
  13 siblings, 1 reply; 44+ messages in thread
From: Yi Liu @ 2022-12-19  8:47 UTC (permalink / raw)
  To: alex.williamson, jgg
  Cc: kevin.tian, cohuck, eric.auger, nicolinc, kvm, mjrosato,
	chao.p.peng, yi.l.liu, yi.y.sun, peterx, jasowang

This defines KVM_DEV_VFIO_FILE* and make alias with KVM_DEV_VFIO_GROUP*.
Old userspace uses KVM_DEV_VFIO_GROUP* works as well.

Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 Documentation/virt/kvm/devices/vfio.rst | 32 ++++++++++++-------------
 include/uapi/linux/kvm.h                | 23 +++++++++++++-----
 virt/kvm/vfio.c                         | 18 +++++++-------
 3 files changed, 42 insertions(+), 31 deletions(-)

diff --git a/Documentation/virt/kvm/devices/vfio.rst b/Documentation/virt/kvm/devices/vfio.rst
index 2d20dc561069..ac4300ded398 100644
--- a/Documentation/virt/kvm/devices/vfio.rst
+++ b/Documentation/virt/kvm/devices/vfio.rst
@@ -9,23 +9,23 @@ Device types supported:
   - KVM_DEV_TYPE_VFIO
 
 Only one VFIO instance may be created per VM.  The created device
-tracks VFIO groups in use by the VM and features of those groups
-important to the correctness and acceleration of the VM.  As groups
-are enabled and disabled for use by the VM, KVM should be updated
-about their presence.  When registered with KVM, a reference to the
-VFIO-group is held by KVM.
-
-Groups:
-  KVM_DEV_VFIO_GROUP
-
-KVM_DEV_VFIO_GROUP attributes:
-  KVM_DEV_VFIO_GROUP_ADD: Add a VFIO group to VFIO-KVM device tracking
-	kvm_device_attr.addr points to an int32_t file descriptor
+tracks VFIO files (group or device) in use by the VM and features
+of those groups/devices important to the correctness and acceleration
+of the VM.  As groups/device are enabled and disabled for use by the
+VM, KVM should be updated about their presence.  When registered with
+KVM, a reference to the VFIO file is held by KVM.
+
+VFIO Files:
+  KVM_DEV_VFIO_FILE
+
+KVM_DEV_VFIO_FILE attributes:
+  KVM_DEV_VFIO_FILE_ADD: Add a VFIO file (group/device) to VFIO-KVM device
+	tracking kvm_device_attr.addr points to an int32_t file descriptor
+	for the VFIO file.
+  KVM_DEV_VFIO_FILE_DEL: Remove a VFIO file (group/device) from VFIO-KVM device
+	tracking kvm_device_attr.addr points to an int32_t file descriptor
 	for the VFIO group.
-  KVM_DEV_VFIO_GROUP_DEL: Remove a VFIO group from VFIO-KVM device tracking
-	kvm_device_attr.addr points to an int32_t file descriptor
-	for the VFIO group.
-  KVM_DEV_VFIO_GROUP_SET_SPAPR_TCE: attaches a guest visible TCE table
+  KVM_DEV_VFIO_FILE_SET_SPAPR_TCE: attaches a guest visible TCE table
 	allocated by sPAPR KVM.
 	kvm_device_attr.addr points to a struct::
 
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 0d5d4419139a..77bd6bb409d2 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1398,15 +1398,26 @@ struct kvm_create_device {
 
 struct kvm_device_attr {
 	__u32	flags;		/* no flags currently defined */
-	__u32	group;		/* device-defined */
-	__u64	attr;		/* group-defined */
+	union {
+		__u32	group;
+		__u32	file;
+	}; /* device-defined */
+	__u64	attr;		/* VFIO-file-defined or group-defined */
 	__u64	addr;		/* userspace address of attr data */
 };
 
-#define  KVM_DEV_VFIO_GROUP			1
-#define   KVM_DEV_VFIO_GROUP_ADD			1
-#define   KVM_DEV_VFIO_GROUP_DEL			2
-#define   KVM_DEV_VFIO_GROUP_SET_SPAPR_TCE		3
+#define  KVM_DEV_VFIO_FILE	1
+
+#define   KVM_DEV_VFIO_FILE_ADD			1
+#define   KVM_DEV_VFIO_FILE_DEL			2
+#define   KVM_DEV_VFIO_FILE_SET_SPAPR_TCE	3
+
+/* Group aliases are for compile time uapi compatibility */
+#define  KVM_DEV_VFIO_GROUP	KVM_DEV_VFIO_FILE
+
+#define   KVM_DEV_VFIO_GROUP_ADD	KVM_DEV_VFIO_FILE_ADD
+#define   KVM_DEV_VFIO_GROUP_DEL	KVM_DEV_VFIO_FILE_DEL
+#define   KVM_DEV_VFIO_GROUP_SET_SPAPR_TCE	KVM_DEV_VFIO_FILE_SET_SPAPR_TCE
 
 enum kvm_device_type {
 	KVM_DEV_TYPE_FSL_MPIC_20	= 1,
diff --git a/virt/kvm/vfio.c b/virt/kvm/vfio.c
index 0f54b9d308d7..f41705d509d2 100644
--- a/virt/kvm/vfio.c
+++ b/virt/kvm/vfio.c
@@ -286,18 +286,18 @@ static int kvm_vfio_set_file(struct kvm_device *dev, long attr,
 	int32_t fd;
 
 	switch (attr) {
-	case KVM_DEV_VFIO_GROUP_ADD:
+	case KVM_DEV_VFIO_FILE_ADD:
 		if (get_user(fd, argp))
 			return -EFAULT;
 		return kvm_vfio_file_add(dev, fd);
 
-	case KVM_DEV_VFIO_GROUP_DEL:
+	case KVM_DEV_VFIO_FILE_DEL:
 		if (get_user(fd, argp))
 			return -EFAULT;
 		return kvm_vfio_file_del(dev, fd);
 
 #ifdef CONFIG_SPAPR_TCE_IOMMU
-	case KVM_DEV_VFIO_GROUP_SET_SPAPR_TCE:
+	case KVM_DEV_VFIO_FILE_SET_SPAPR_TCE:
 		return kvm_vfio_file_set_spapr_tce(dev, arg);
 #endif
 	}
@@ -309,7 +309,7 @@ static int kvm_vfio_set_attr(struct kvm_device *dev,
 			     struct kvm_device_attr *attr)
 {
 	switch (attr->group) {
-	case KVM_DEV_VFIO_GROUP:
+	case KVM_DEV_VFIO_FILE:
 		return kvm_vfio_set_file(dev, attr->attr,
 					 u64_to_user_ptr(attr->addr));
 	}
@@ -320,13 +320,13 @@ static int kvm_vfio_set_attr(struct kvm_device *dev,
 static int kvm_vfio_has_attr(struct kvm_device *dev,
 			     struct kvm_device_attr *attr)
 {
-	switch (attr->group) {
-	case KVM_DEV_VFIO_GROUP:
+	switch (attr->file) {
+	case KVM_DEV_VFIO_FILE:
 		switch (attr->attr) {
-		case KVM_DEV_VFIO_GROUP_ADD:
-		case KVM_DEV_VFIO_GROUP_DEL:
+		case KVM_DEV_VFIO_FILE_ADD:
+		case KVM_DEV_VFIO_FILE_DEL:
 #ifdef CONFIG_SPAPR_TCE_IOMMU
-		case KVM_DEV_VFIO_GROUP_SET_SPAPR_TCE:
+		case KVM_DEV_VFIO_FILE_SET_SPAPR_TCE:
 #endif
 			return 0;
 		}
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [RFC 06/12] vfio: Pass struct vfio_device_file * to vfio_device_open/close()
  2022-12-19  8:47 [RFC 00/12] Add vfio_device cdev for iommufd support Yi Liu
                   ` (4 preceding siblings ...)
  2022-12-19  8:47 ` [RFC 05/12] kvm/vfio: Accept vfio device file from userspace Yi Liu
@ 2022-12-19  8:47 ` Yi Liu
  2022-12-21  4:10   ` Tian, Kevin
  2022-12-19  8:47 ` [RFC 07/12] vfio: Block device access via device fd until device is opened Yi Liu
                   ` (7 subsequent siblings)
  13 siblings, 1 reply; 44+ messages in thread
From: Yi Liu @ 2022-12-19  8:47 UTC (permalink / raw)
  To: alex.williamson, jgg
  Cc: kevin.tian, cohuck, eric.auger, nicolinc, kvm, mjrosato,
	chao.p.peng, yi.l.liu, yi.y.sun, peterx, jasowang

This avoids passing struct kvm * and struct iommufd_ctx * in multiple
functions. vfio_device_open() becomes to be a locked helper, while
vfio_device_open() still holds device->dev_set->lock itself.

Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 drivers/vfio/group.c     | 34 +++++++++++++++++++++++++---------
 drivers/vfio/vfio.h      | 10 +++++-----
 drivers/vfio/vfio_main.c | 40 ++++++++++++++++++++++++----------------
 3 files changed, 54 insertions(+), 30 deletions(-)

diff --git a/drivers/vfio/group.c b/drivers/vfio/group.c
index fe73db270984..1030a0ad3cf1 100644
--- a/drivers/vfio/group.c
+++ b/drivers/vfio/group.c
@@ -154,33 +154,49 @@ static int vfio_group_ioctl_set_container(struct vfio_group *group,
 	return ret;
 }
 
-static int vfio_device_group_open(struct vfio_device *device)
+static int vfio_device_group_open(struct vfio_device_file *df)
 {
+	struct vfio_device *device = df->device;
 	int ret;
 
 	mutex_lock(&device->group->group_lock);
 	if (!vfio_group_has_iommu(device->group)) {
 		ret = -EINVAL;
-		goto out_unlock;
+		goto err_unlock_group;
 	}
 
+	mutex_lock(&device->dev_set->lock);
 	/*
 	 * Here we pass the KVM pointer with the group under the lock.  If the
 	 * device driver will use it, it must obtain a reference and release it
 	 * during close_device.
 	 */
-	ret = vfio_device_open(device, device->group->iommufd,
-			       device->group->kvm);
+	df->kvm = device->group->kvm;
+	df->iommufd = device->group->iommufd;
+
+	ret = vfio_device_open(df);
+	if (ret)
+		goto err_unlock_device;
+	mutex_unlock(&device->dev_set->lock);
 
-out_unlock:
+	mutex_unlock(&device->group->group_lock);
+	return 0;
+
+err_unlock_device:
+	df->kvm = NULL;
+	df->iommufd = NULL;
+	mutex_unlock(&device->dev_set->lock);
+err_unlock_group:
 	mutex_unlock(&device->group->group_lock);
 	return ret;
 }
 
-void vfio_device_group_close(struct vfio_device *device)
+void vfio_device_group_close(struct vfio_device_file *df)
 {
+	struct vfio_device *device = df->device;
+
 	mutex_lock(&device->group->group_lock);
-	vfio_device_close(device, device->group->iommufd);
+	vfio_device_close(df);
 	mutex_unlock(&device->group->group_lock);
 }
 
@@ -196,7 +212,7 @@ static struct file *vfio_device_open_file(struct vfio_device *device)
 		goto err_out;
 	}
 
-	ret = vfio_device_group_open(device);
+	ret = vfio_device_group_open(df);
 	if (ret)
 		goto err_free;
 
@@ -228,7 +244,7 @@ static struct file *vfio_device_open_file(struct vfio_device *device)
 	return filep;
 
 err_close_device:
-	vfio_device_group_close(device);
+	vfio_device_group_close(df);
 err_free:
 	kfree(df);
 err_out:
diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
index f0e411995997..24ccf9f0ab16 100644
--- a/drivers/vfio/vfio.h
+++ b/drivers/vfio/vfio.h
@@ -19,14 +19,14 @@ struct vfio_container;
 struct vfio_device_file {
 	struct vfio_device *device;
 	struct kvm *kvm;
+	struct iommufd_ctx *iommufd;
 };
 
 void vfio_device_put_registration(struct vfio_device *device);
 bool vfio_device_try_get_registration(struct vfio_device *device);
-int vfio_device_open(struct vfio_device *device,
-		     struct iommufd_ctx *iommufd, struct kvm *kvm);
-void vfio_device_close(struct vfio_device *device,
-		       struct iommufd_ctx *iommufd);
+int vfio_device_open(struct vfio_device_file *df);
+void vfio_device_close(struct vfio_device_file *device);
+
 struct vfio_device_file *
 vfio_allocate_device_file(struct vfio_device *device);
 
@@ -90,7 +90,7 @@ void vfio_device_group_register(struct vfio_device *device);
 void vfio_device_group_unregister(struct vfio_device *device);
 int vfio_device_group_use_iommu(struct vfio_device *device);
 void vfio_device_group_unuse_iommu(struct vfio_device *device);
-void vfio_device_group_close(struct vfio_device *device);
+void vfio_device_group_close(struct vfio_device_file *df);
 struct vfio_group *vfio_group_from_file(struct file *file);
 bool vfio_group_enforced_coherent(struct vfio_group *group);
 void vfio_group_set_kvm(struct vfio_group *group, struct kvm *kvm);
diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
index 481502a6964a..2aa3597abb40 100644
--- a/drivers/vfio/vfio_main.c
+++ b/drivers/vfio/vfio_main.c
@@ -372,9 +372,11 @@ vfio_allocate_device_file(struct vfio_device *device)
 	return df;
 }
 
-static int vfio_device_first_open(struct vfio_device *device,
-				  struct iommufd_ctx *iommufd, struct kvm *kvm)
+static int vfio_device_first_open(struct vfio_device_file *df)
 {
+	struct vfio_device *device = df->device;
+	struct iommufd_ctx *iommufd = df->iommufd;
+	struct kvm *kvm = df->kvm;
 	int ret;
 
 	lockdep_assert_held(&device->dev_set->lock);
@@ -408,9 +410,11 @@ static int vfio_device_first_open(struct vfio_device *device,
 	return ret;
 }
 
-static void vfio_device_last_close(struct vfio_device *device,
-				   struct iommufd_ctx *iommufd)
+static void vfio_device_last_close(struct vfio_device_file *df)
 {
+	struct vfio_device *device = df->device;
+	struct iommufd_ctx *iommufd = df->iommufd;
+
 	lockdep_assert_held(&device->dev_set->lock);
 
 	if (device->ops->close_device)
@@ -423,30 +427,34 @@ static void vfio_device_last_close(struct vfio_device *device,
 	module_put(device->dev->driver->owner);
 }
 
-int vfio_device_open(struct vfio_device *device,
-		     struct iommufd_ctx *iommufd, struct kvm *kvm)
+int vfio_device_open(struct vfio_device_file *df)
 {
-	int ret = 0;
+	struct vfio_device *device = df->device;
+
+	lockdep_assert_held(&device->dev_set->lock);
 
-	mutex_lock(&device->dev_set->lock);
 	device->open_count++;
 	if (device->open_count == 1) {
-		ret = vfio_device_first_open(device, iommufd, kvm);
-		if (ret)
+		int ret;
+
+		ret = vfio_device_first_open(df);
+		if (ret) {
 			device->open_count--;
+			return ret;
+		}
 	}
-	mutex_unlock(&device->dev_set->lock);
 
-	return ret;
+	return 0;
 }
 
-void vfio_device_close(struct vfio_device *device,
-		       struct iommufd_ctx *iommufd)
+void vfio_device_close(struct vfio_device_file *df)
 {
+	struct vfio_device *device = df->device;
+
 	mutex_lock(&device->dev_set->lock);
 	vfio_assert_device_open(device);
 	if (device->open_count == 1)
-		vfio_device_last_close(device, iommufd);
+		vfio_device_last_close(df);
 	device->open_count--;
 	mutex_unlock(&device->dev_set->lock);
 }
@@ -492,7 +500,7 @@ static int vfio_device_fops_release(struct inode *inode, struct file *filep)
 	struct vfio_device_file *df = filep->private_data;
 	struct vfio_device *device = df->device;
 
-	vfio_device_group_close(device);
+	vfio_device_group_close(df);
 	kfree(df);
 	vfio_device_put_registration(device);
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [RFC 07/12] vfio: Block device access via device fd until device is opened
  2022-12-19  8:47 [RFC 00/12] Add vfio_device cdev for iommufd support Yi Liu
                   ` (5 preceding siblings ...)
  2022-12-19  8:47 ` [RFC 06/12] vfio: Pass struct vfio_device_file * to vfio_device_open/close() Yi Liu
@ 2022-12-19  8:47 ` Yi Liu
  2022-12-21  4:18   ` Tian, Kevin
  2023-01-04 20:47   ` Jason Gunthorpe
  2022-12-19  8:47 ` [RFC 08/12] vfio: Add infrastructure for bind_iommufd and attach Yi Liu
                   ` (6 subsequent siblings)
  13 siblings, 2 replies; 44+ messages in thread
From: Yi Liu @ 2022-12-19  8:47 UTC (permalink / raw)
  To: alex.williamson, jgg
  Cc: kevin.tian, cohuck, eric.auger, nicolinc, kvm, mjrosato,
	chao.p.peng, yi.l.liu, yi.y.sun, peterx, jasowang

Allow the vfio_device file to be in a state where the device FD is
opened but the device is not opened. This inbetween state is not used
when the device FD is spawned from the group FD, however when we create
the device FD directly by opening a cdev it will be opened in the
blocked state.

In the blocked state, currently only the bind operation is allowed,
other device accesses are not allowed. Completing bind will allow user
to further access the device.

This is implemented by adding a flag in struct vfio_device_file to mark
the blocked state and using a simple smp_load_acquire() to obtain the
flag value and serialize all the device setup with the thread accessing
this device.

Due to this scheme it is not possible to unbind the FD, once it bound it
remains bound until the FD is closed.

Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 drivers/vfio/vfio.h      |  1 +
 drivers/vfio/vfio_main.c | 34 ++++++++++++++++++++++++++++++++++
 2 files changed, 35 insertions(+)

diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
index 24ccf9f0ab16..4f29e3a89f64 100644
--- a/drivers/vfio/vfio.h
+++ b/drivers/vfio/vfio.h
@@ -20,6 +20,7 @@ struct vfio_device_file {
 	struct vfio_device *device;
 	struct kvm *kvm;
 	struct iommufd_ctx *iommufd;
+	bool access_granted;
 };
 
 void vfio_device_put_registration(struct vfio_device *device);
diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
index 2aa3597abb40..bd1d9621af46 100644
--- a/drivers/vfio/vfio_main.c
+++ b/drivers/vfio/vfio_main.c
@@ -444,6 +444,11 @@ int vfio_device_open(struct vfio_device_file *df)
 		}
 	}
 
+	/*
+	 * Paired with smp_load_acquire() in vfio_device_fops::ioctl/
+	 * read/write/mmap
+	 */
+	smp_store_release(&df->access_granted, true);
 	return 0;
 }
 
@@ -452,6 +457,11 @@ void vfio_device_close(struct vfio_device_file *df)
 	struct vfio_device *device = df->device;
 
 	mutex_lock(&device->dev_set->lock);
+	/*
+	 * Paired with smp_load_acquire() in vfio_device_fops::ioctl/
+	 * read/write/mmap
+	 */
+	smp_store_release(&df->access_granted, false);
 	vfio_assert_device_open(device);
 	if (device->open_count == 1)
 		vfio_device_last_close(df);
@@ -968,8 +978,14 @@ static long vfio_device_fops_unl_ioctl(struct file *filep,
 {
 	struct vfio_device_file *df = filep->private_data;
 	struct vfio_device *device = df->device;
+	bool access;
 	int ret;
 
+	/* Paired with smp_store_release() in vfio_device_open/close() */
+	access = smp_load_acquire(&df->access_granted);
+	if (!access)
+		return -EINVAL;
+
 	ret = vfio_device_pm_runtime_get(device);
 	if (ret)
 		return ret;
@@ -996,6 +1012,12 @@ static ssize_t vfio_device_fops_read(struct file *filep, char __user *buf,
 {
 	struct vfio_device_file *df = filep->private_data;
 	struct vfio_device *device = df->device;
+	bool access;
+
+	/* Paired with smp_store_release() in vfio_device_open/close() */
+	access = smp_load_acquire(&df->access_granted);
+	if (!access)
+		return -EINVAL;
 
 	if (unlikely(!device->ops->read))
 		return -EINVAL;
@@ -1009,6 +1031,12 @@ static ssize_t vfio_device_fops_write(struct file *filep,
 {
 	struct vfio_device_file *df = filep->private_data;
 	struct vfio_device *device = df->device;
+	bool access;
+
+	/* Paired with smp_store_release() in vfio_device_open/close() */
+	access = smp_load_acquire(&df->access_granted);
+	if (!access)
+		return -EINVAL;
 
 	if (unlikely(!device->ops->write))
 		return -EINVAL;
@@ -1020,6 +1048,12 @@ static int vfio_device_fops_mmap(struct file *filep, struct vm_area_struct *vma)
 {
 	struct vfio_device_file *df = filep->private_data;
 	struct vfio_device *device = df->device;
+	bool access;
+
+	/* Paired with smp_store_release() in vfio_device_open/close() */
+	access = smp_load_acquire(&df->access_granted);
+	if (!access)
+		return -EINVAL;
 
 	if (unlikely(!device->ops->mmap))
 		return -EINVAL;
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [RFC 08/12] vfio: Add infrastructure for bind_iommufd and attach
  2022-12-19  8:47 [RFC 00/12] Add vfio_device cdev for iommufd support Yi Liu
                   ` (6 preceding siblings ...)
  2022-12-19  8:47 ` [RFC 07/12] vfio: Block device access via device fd until device is opened Yi Liu
@ 2022-12-19  8:47 ` Yi Liu
  2023-01-09  5:46   ` Tian, Kevin
  2022-12-19  8:47 ` [RFC 09/12] vfio: Make vfio_device_open() exclusive between group path and device cdev path Yi Liu
                   ` (5 subsequent siblings)
  13 siblings, 1 reply; 44+ messages in thread
From: Yi Liu @ 2022-12-19  8:47 UTC (permalink / raw)
  To: alex.williamson, jgg
  Cc: kevin.tian, cohuck, eric.auger, nicolinc, kvm, mjrosato,
	chao.p.peng, yi.l.liu, yi.y.sun, peterx, jasowang

This prepares to add ioctls for device cdev fd. This infrastructure includes:
    - vfio_iommufd_bind() to accept pt_id, and also return back dev_id to caller.
    - vfio_iommufd_attach() to support iommufd pgtable attach after bind_iommufd.

Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 drivers/vfio/group.c     | 12 +++++-
 drivers/vfio/iommufd.c   | 79 ++++++++++++++++++++++++++++++----------
 drivers/vfio/vfio.h      | 15 ++++++--
 drivers/vfio/vfio_main.c | 10 +++--
 4 files changed, 88 insertions(+), 28 deletions(-)

diff --git a/drivers/vfio/group.c b/drivers/vfio/group.c
index 1030a0ad3cf1..c4d0564874f2 100644
--- a/drivers/vfio/group.c
+++ b/drivers/vfio/group.c
@@ -157,6 +157,8 @@ static int vfio_group_ioctl_set_container(struct vfio_group *group,
 static int vfio_device_group_open(struct vfio_device_file *df)
 {
 	struct vfio_device *device = df->device;
+	u32 ioas_id;
+	u32 *pt_id = NULL;
 	int ret;
 
 	mutex_lock(&device->group->group_lock);
@@ -165,6 +167,14 @@ static int vfio_device_group_open(struct vfio_device_file *df)
 		goto err_unlock_group;
 	}
 
+	if (device->group->iommufd) {
+		ret = iommufd_vfio_compat_ioas_id(device->group->iommufd,
+						  &ioas_id);
+		if (ret)
+			goto err_unlock_group;
+		pt_id = &ioas_id;
+	}
+
 	mutex_lock(&device->dev_set->lock);
 	/*
 	 * Here we pass the KVM pointer with the group under the lock.  If the
@@ -174,7 +184,7 @@ static int vfio_device_group_open(struct vfio_device_file *df)
 	df->kvm = device->group->kvm;
 	df->iommufd = device->group->iommufd;
 
-	ret = vfio_device_open(df);
+	ret = vfio_device_open(df, NULL, pt_id);
 	if (ret)
 		goto err_unlock_device;
 	mutex_unlock(&device->dev_set->lock);
diff --git a/drivers/vfio/iommufd.c b/drivers/vfio/iommufd.c
index 4f82a6fa7c6c..2d8ac65ad7e3 100644
--- a/drivers/vfio/iommufd.c
+++ b/drivers/vfio/iommufd.c
@@ -10,9 +10,17 @@
 MODULE_IMPORT_NS(IOMMUFD);
 MODULE_IMPORT_NS(IOMMUFD_VFIO);
 
-int vfio_iommufd_bind(struct vfio_device *vdev, struct iommufd_ctx *ictx)
+/* @pt_id == NULL impplies detach */
+int vfio_iommufd_attach(struct vfio_device *vdev, u32 *pt_id)
+{
+	lockdep_assert_held(&vdev->dev_set->lock);
+
+	return vdev->ops->attach_ioas(vdev, pt_id);
+}
+
+int vfio_iommufd_bind(struct vfio_device *vdev, struct iommufd_ctx *ictx,
+		      u32 *dev_id, u32 *pt_id)
 {
-	u32 ioas_id;
 	u32 device_id;
 	int ret;
 
@@ -29,17 +37,14 @@ int vfio_iommufd_bind(struct vfio_device *vdev, struct iommufd_ctx *ictx)
 	if (ret)
 		return ret;
 
-	ret = iommufd_vfio_compat_ioas_id(ictx, &ioas_id);
-	if (ret)
-		goto err_unbind;
-	ret = vdev->ops->attach_ioas(vdev, &ioas_id);
-	if (ret)
-		goto err_unbind;
+	if (pt_id) {
+		ret = vfio_iommufd_attach(vdev, pt_id);
+		if (ret)
+			goto err_unbind;
+	}
 
-	/*
-	 * The legacy path has no way to return the device id or the selected
-	 * pt_id
-	 */
+	if (dev_id)
+		*dev_id = device_id;
 	return 0;
 
 err_unbind:
@@ -74,14 +79,18 @@ int vfio_iommufd_physical_bind(struct vfio_device *vdev,
 }
 EXPORT_SYMBOL_GPL(vfio_iommufd_physical_bind);
 
+static void __vfio_iommufd_detach(struct vfio_device *vdev)
+{
+	iommufd_device_detach(vdev->iommufd_device);
+	vdev->iommufd_attached = false;
+}
+
 void vfio_iommufd_physical_unbind(struct vfio_device *vdev)
 {
 	lockdep_assert_held(&vdev->dev_set->lock);
 
-	if (vdev->iommufd_attached) {
-		iommufd_device_detach(vdev->iommufd_device);
-		vdev->iommufd_attached = false;
-	}
+	if (vdev->iommufd_attached)
+		__vfio_iommufd_detach(vdev);
 	iommufd_device_unbind(vdev->iommufd_device);
 	vdev->iommufd_device = NULL;
 }
@@ -91,6 +100,20 @@ int vfio_iommufd_physical_attach_ioas(struct vfio_device *vdev, u32 *pt_id)
 {
 	int rc;
 
+	lockdep_assert_held(&vdev->dev_set->lock);
+
+	if (!vdev->iommufd_device)
+		return -EINVAL;
+
+	if (!pt_id) {
+		if (vdev->iommufd_attached)
+			__vfio_iommufd_detach(vdev);
+		return 0;
+	}
+
+	if (vdev->iommufd_attached)
+		return -EBUSY;
+
 	rc = iommufd_device_attach(vdev->iommufd_device, pt_id);
 	if (rc)
 		return rc;
@@ -129,14 +152,18 @@ int vfio_iommufd_emulated_bind(struct vfio_device *vdev,
 }
 EXPORT_SYMBOL_GPL(vfio_iommufd_emulated_bind);
 
+static void __vfio_iommufd_access_destroy(struct vfio_device *vdev)
+{
+	iommufd_access_destroy(vdev->iommufd_access);
+	vdev->iommufd_access = NULL;
+}
+
 void vfio_iommufd_emulated_unbind(struct vfio_device *vdev)
 {
 	lockdep_assert_held(&vdev->dev_set->lock);
 
-	if (vdev->iommufd_access) {
-		iommufd_access_destroy(vdev->iommufd_access);
-		vdev->iommufd_access = NULL;
-	}
+	if (vdev->iommufd_access)
+		__vfio_iommufd_access_destroy(vdev);
 	iommufd_ctx_put(vdev->iommufd_ictx);
 	vdev->iommufd_ictx = NULL;
 }
@@ -148,6 +175,18 @@ int vfio_iommufd_emulated_attach_ioas(struct vfio_device *vdev, u32 *pt_id)
 
 	lockdep_assert_held(&vdev->dev_set->lock);
 
+	if (!vdev->iommufd_ictx)
+		return -EINVAL;
+
+	if (!pt_id) {
+		if (vdev->iommufd_access)
+			__vfio_iommufd_access_destroy(vdev);
+		return 0;
+	}
+
+	if (vdev->iommufd_access)
+		return -EBUSY;
+
 	user = iommufd_access_create(vdev->iommufd_ictx, *pt_id, &vfio_user_ops,
 				     vdev);
 	if (IS_ERR(user))
diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
index 4f29e3a89f64..c099aa4e7d78 100644
--- a/drivers/vfio/vfio.h
+++ b/drivers/vfio/vfio.h
@@ -25,7 +25,8 @@ struct vfio_device_file {
 
 void vfio_device_put_registration(struct vfio_device *device);
 bool vfio_device_try_get_registration(struct vfio_device *device);
-int vfio_device_open(struct vfio_device_file *df);
+int vfio_device_open(struct vfio_device_file *df,
+		     u32 *dev_id, u32 *pt_id);
 void vfio_device_close(struct vfio_device_file *device);
 
 struct vfio_device_file *
@@ -230,11 +231,14 @@ static inline void vfio_container_cleanup(void)
 #endif
 
 #if IS_ENABLED(CONFIG_IOMMUFD)
-int vfio_iommufd_bind(struct vfio_device *device, struct iommufd_ctx *ictx);
+int vfio_iommufd_bind(struct vfio_device *device, struct iommufd_ctx *ictx,
+		      u32 *dev_id, u32 *pt_id);
 void vfio_iommufd_unbind(struct vfio_device *device);
+int vfio_iommufd_attach(struct vfio_device *vdev, u32 *pt_id);
 #else
 static inline int vfio_iommufd_bind(struct vfio_device *device,
-				    struct iommufd_ctx *ictx)
+				    struct iommufd_ctx *ictx,
+				    u32 *dev_id, u32 *pt_id)
 {
 	return -EOPNOTSUPP;
 }
@@ -242,6 +246,11 @@ static inline int vfio_iommufd_bind(struct vfio_device *device,
 static inline void vfio_iommufd_unbind(struct vfio_device *device)
 {
 }
+
+static inline int vfio_iommufd_attach(struct vfio_device *vdev, u32 *pt_id)
+{
+	return -EOPNOTSUPP;
+}
 #endif
 
 #ifdef CONFIG_VFIO_NOIOMMU
diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
index bd1d9621af46..304633eee589 100644
--- a/drivers/vfio/vfio_main.c
+++ b/drivers/vfio/vfio_main.c
@@ -372,7 +372,8 @@ vfio_allocate_device_file(struct vfio_device *device)
 	return df;
 }
 
-static int vfio_device_first_open(struct vfio_device_file *df)
+static int vfio_device_first_open(struct vfio_device_file *df,
+				  u32 *dev_id, u32 *pt_id)
 {
 	struct vfio_device *device = df->device;
 	struct iommufd_ctx *iommufd = df->iommufd;
@@ -385,7 +386,7 @@ static int vfio_device_first_open(struct vfio_device_file *df)
 		return -ENODEV;
 
 	if (iommufd)
-		ret = vfio_iommufd_bind(device, iommufd);
+		ret = vfio_iommufd_bind(device, iommufd, dev_id, pt_id);
 	else
 		ret = vfio_device_group_use_iommu(device);
 	if (ret)
@@ -427,7 +428,8 @@ static void vfio_device_last_close(struct vfio_device_file *df)
 	module_put(device->dev->driver->owner);
 }
 
-int vfio_device_open(struct vfio_device_file *df)
+int vfio_device_open(struct vfio_device_file *df,
+		     u32 *dev_id, u32 *pt_id)
 {
 	struct vfio_device *device = df->device;
 
@@ -437,7 +439,7 @@ int vfio_device_open(struct vfio_device_file *df)
 	if (device->open_count == 1) {
 		int ret;
 
-		ret = vfio_device_first_open(df);
+		ret = vfio_device_first_open(df, dev_id, pt_id);
 		if (ret) {
 			device->open_count--;
 			return ret;
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [RFC 09/12] vfio: Make vfio_device_open() exclusive between group path and device cdev path
  2022-12-19  8:47 [RFC 00/12] Add vfio_device cdev for iommufd support Yi Liu
                   ` (7 preceding siblings ...)
  2022-12-19  8:47 ` [RFC 08/12] vfio: Add infrastructure for bind_iommufd and attach Yi Liu
@ 2022-12-19  8:47 ` Yi Liu
  2023-01-09  6:03   ` Tian, Kevin
  2022-12-19  8:47 ` [RFC 10/12] vfio: Add cdev for vfio_device Yi Liu
                   ` (4 subsequent siblings)
  13 siblings, 1 reply; 44+ messages in thread
From: Yi Liu @ 2022-12-19  8:47 UTC (permalink / raw)
  To: alex.williamson, jgg
  Cc: kevin.tian, cohuck, eric.auger, nicolinc, kvm, mjrosato,
	chao.p.peng, yi.l.liu, yi.y.sun, peterx, jasowang

VFIO group has historically allowed multi-open of the device FD. This
was made secure because the "open" was executed via an ioctl to the
group FD which is itself only single open.

No know use of multiple device FDs is known. It is kind of a strange
thing to do because new device FDs can naturally be created via dup().

When we implement the new device uAPI there is no natural way to allow
the device itself from being multi-opened in a secure manner. Without
the group FD we cannot prove the security context of the opener.

Thus, when moving to the new uAPI we block the ability to multi-open
the device. This also makes the cdev path exclusive with group path.

This is done by adding a single_open flag in struct vfio_device_file and
a same flag in struct vfio_device_file. vfio_device_file::single_open is
set per the vfio_device_file allocation. Its value is propagated to struct
vfio_device after device is opened.

Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 drivers/vfio/group.c     |  2 +-
 drivers/vfio/vfio.h      |  6 +++++-
 drivers/vfio/vfio_main.c | 25 ++++++++++++++++++++++---
 include/linux/vfio.h     |  1 +
 4 files changed, 29 insertions(+), 5 deletions(-)

diff --git a/drivers/vfio/group.c b/drivers/vfio/group.c
index c4d0564874f2..0b08a277cd0e 100644
--- a/drivers/vfio/group.c
+++ b/drivers/vfio/group.c
@@ -216,7 +216,7 @@ static struct file *vfio_device_open_file(struct vfio_device *device)
 	struct file *filep;
 	int ret;
 
-	df = vfio_allocate_device_file(device);
+	df = vfio_allocate_device_file(device, false);
 	if (IS_ERR(df)) {
 		ret = PTR_ERR(df);
 		goto err_out;
diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
index c099aa4e7d78..058d7a19dada 100644
--- a/drivers/vfio/vfio.h
+++ b/drivers/vfio/vfio.h
@@ -17,7 +17,11 @@ struct vfio_device;
 struct vfio_container;
 
 struct vfio_device_file {
+	/* static fields, init per allocation */
 	struct vfio_device *device;
+	bool single_open;
+
+	/* fields set after allocation */
 	struct kvm *kvm;
 	struct iommufd_ctx *iommufd;
 	bool access_granted;
@@ -30,7 +34,7 @@ int vfio_device_open(struct vfio_device_file *df,
 void vfio_device_close(struct vfio_device_file *device);
 
 struct vfio_device_file *
-vfio_allocate_device_file(struct vfio_device *device);
+vfio_allocate_device_file(struct vfio_device *device, bool single_open);
 
 extern const struct file_operations vfio_device_fops;
 
diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
index 304633eee589..1bda847e9f10 100644
--- a/drivers/vfio/vfio_main.c
+++ b/drivers/vfio/vfio_main.c
@@ -359,7 +359,7 @@ static bool vfio_assert_device_open(struct vfio_device *device)
 }
 
 struct vfio_device_file *
-vfio_allocate_device_file(struct vfio_device *device)
+vfio_allocate_device_file(struct vfio_device *device, bool single_open)
 {
 	struct vfio_device_file *df;
 
@@ -368,6 +368,7 @@ vfio_allocate_device_file(struct vfio_device *device)
 		return ERR_PTR(-ENOMEM);
 
 	df->device = device;
+	df->single_open = single_open;
 
 	return df;
 }
@@ -435,6 +436,16 @@ int vfio_device_open(struct vfio_device_file *df,
 
 	lockdep_assert_held(&device->dev_set->lock);
 
+	/*
+	 * Device cdev path cannot support multiple device open since
+	 * it doesn't have a secure way for it. So a second device
+	 * open attempt should be failed if the caller is from a cdev
+	 * path or the device has already been opened by a cdev path.
+	 */
+	if (device->open_count != 0 &&
+	    (df->single_open || device->single_open))
+		return -EINVAL;
+
 	device->open_count++;
 	if (device->open_count == 1) {
 		int ret;
@@ -444,6 +455,7 @@ int vfio_device_open(struct vfio_device_file *df,
 			device->open_count--;
 			return ret;
 		}
+		device->single_open = df->single_open;
 	}
 
 	/*
@@ -465,8 +477,10 @@ void vfio_device_close(struct vfio_device_file *df)
 	 */
 	smp_store_release(&df->access_granted, false);
 	vfio_assert_device_open(device);
-	if (device->open_count == 1)
+	if (device->open_count == 1) {
 		vfio_device_last_close(df);
+		device->single_open = false;
+	}
 	device->open_count--;
 	mutex_unlock(&device->dev_set->lock);
 }
@@ -512,7 +526,12 @@ static int vfio_device_fops_release(struct inode *inode, struct file *filep)
 	struct vfio_device_file *df = filep->private_data;
 	struct vfio_device *device = df->device;
 
-	vfio_device_group_close(df);
+	/*
+	 * group path supports multiple device open, while cdev doesn't.
+	 * So use vfio_device_group_close() for !singel_open case.
+	 */
+	if (!df->single_open)
+		vfio_device_group_close(df);
 	kfree(df);
 	vfio_device_put_registration(device);
 
diff --git a/include/linux/vfio.h b/include/linux/vfio.h
index 9e3b9b5c8c8b..5465e29a8a83 100644
--- a/include/linux/vfio.h
+++ b/include/linux/vfio.h
@@ -63,6 +63,7 @@ struct vfio_device {
 	struct iommufd_ctx *iommufd_ictx;
 	bool iommufd_attached;
 #endif
+	bool single_open;
 };
 
 /**
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [RFC 10/12] vfio: Add cdev for vfio_device
  2022-12-19  8:47 [RFC 00/12] Add vfio_device cdev for iommufd support Yi Liu
                   ` (8 preceding siblings ...)
  2022-12-19  8:47 ` [RFC 09/12] vfio: Make vfio_device_open() exclusive between group path and device cdev path Yi Liu
@ 2022-12-19  8:47 ` Yi Liu
  2023-01-09  6:54   ` Tian, Kevin
  2022-12-19  8:47 ` [RFC 11/12] vfio: Add ioctls for device cdev iommufd Yi Liu
                   ` (3 subsequent siblings)
  13 siblings, 1 reply; 44+ messages in thread
From: Yi Liu @ 2022-12-19  8:47 UTC (permalink / raw)
  To: alex.williamson, jgg
  Cc: kevin.tian, cohuck, eric.auger, nicolinc, kvm, mjrosato,
	chao.p.peng, yi.l.liu, yi.y.sun, peterx, jasowang, Joao Martins

This allows user to directly open a vfio device w/o using the legacy
container/group interface, as a prerequisite for supporting new iommu
features like nested translation.

The device fd opened in this manner doesn't have the capability to access
the device as the fops open() doesn't open the device until the successful
BIND_IOMMUFD which be added in next patch.

With this patch, devices registered to vfio core have both group and device
interface created.

- group interface : /dev/vfio/$groupID
- device interface: /dev/vfio/devices/vfioX  (X is the minor number and
					      is unique across devices)

Given a vfio device the user can identify the matching vfioX by checking
the sysfs path of the device. Take PCI device (0000:6a:01.0) for example,
/sys/bus/pci/devices/0000\:6a\:01.0/vfio-dev/vfio0/dev contains the
major:minor of the matching vfioX.

Userspace then opens the /dev/vfio/devices/vfioX and checks with fstat
that the major:minor matches.

The vfio_device cdev logic in this patch:
*) __vfio_register_dev() path ends up doing cdev_device_add() for each
   vfio_device;
*) vfio_unregister_group_dev() path does cdev_device_del();

Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
---
 drivers/vfio/vfio_main.c | 101 +++++++++++++++++++++++++++++++++++----
 include/linux/vfio.h     |   7 ++-
 2 files changed, 99 insertions(+), 9 deletions(-)

diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
index 1bda847e9f10..4ba90b55ec44 100644
--- a/drivers/vfio/vfio_main.c
+++ b/drivers/vfio/vfio_main.c
@@ -43,6 +43,9 @@
 static struct vfio {
 	struct class			*device_class;
 	struct ida			device_ida;
+#if IS_ENABLED(CONFIG_IOMMUFD)
+	dev_t                           device_devt;
+#endif
 } vfio;
 
 static DEFINE_XARRAY(vfio_device_set_xa);
@@ -156,7 +159,7 @@ static void vfio_device_release(struct device *dev)
 			container_of(dev, struct vfio_device, device);
 
 	vfio_release_device_set(device);
-	ida_free(&vfio.device_ida, device->index);
+	ida_free(&vfio.device_ida, MINOR(device->device.devt));
 
 	/*
 	 * kvfree() cannot be done here due to a life cycle mess in
@@ -211,15 +214,16 @@ EXPORT_SYMBOL_GPL(_vfio_alloc_device);
 int vfio_init_device(struct vfio_device *device, struct device *dev,
 		     const struct vfio_device_ops *ops)
 {
+	unsigned int minor;
 	int ret;
 
 	ret = ida_alloc_max(&vfio.device_ida, MINORMASK, GFP_KERNEL);
 	if (ret < 0) {
-		dev_dbg(dev, "Error to alloc index\n");
+		dev_dbg(dev, "Error to alloc minor\n");
 		return ret;
 	}
 
-	device->index = ret;
+	minor = ret;
 	init_completion(&device->comp);
 	device->dev = dev;
 	device->ops = ops;
@@ -234,11 +238,18 @@ int vfio_init_device(struct vfio_device *device, struct device *dev,
 	device->device.release = vfio_device_release;
 	device->device.class = vfio.device_class;
 	device->device.parent = device->dev;
+#if IS_ENABLED(CONFIG_IOMMUFD)
+	device->device.devt = MKDEV(MAJOR(vfio.device_devt), minor);
+	cdev_init(&device->cdev, &vfio_device_fops);
+	device->cdev.owner = THIS_MODULE;
+#else
+	device->index = ret;
+#endif
 	return 0;
 
 out_uninit:
 	vfio_release_device_set(device);
-	ida_free(&vfio.device_ida, device->index);
+	ida_free(&vfio.device_ida, minor);
 	return ret;
 }
 EXPORT_SYMBOL_GPL(vfio_init_device);
@@ -257,6 +268,7 @@ EXPORT_SYMBOL_GPL(vfio_free_device);
 static int __vfio_register_dev(struct vfio_device *device,
 			       enum vfio_group_type type)
 {
+	unsigned int minor;
 	int ret;
 
 	if (WARN_ON(device->ops->bind_iommufd &&
@@ -271,7 +283,12 @@ static int __vfio_register_dev(struct vfio_device *device,
 	if (!device->dev_set)
 		vfio_assign_device_set(device, device);
 
-	ret = dev_set_name(&device->device, "vfio%d", device->index);
+#if IS_ENABLED(CONFIG_IOMMUFD)
+	minor = MINOR(device->device.devt);
+#else
+	minor = device->index;
+#endif
+	ret = dev_set_name(&device->device, "vfio%d", minor);
 	if (ret)
 		return ret;
 
@@ -279,7 +296,11 @@ static int __vfio_register_dev(struct vfio_device *device,
 	if (ret)
 		return ret;
 
+#if IS_ENABLED(CONFIG_IOMMUFD)
+	ret = cdev_device_add(&device->cdev, &device->device);
+#else
 	ret = device_add(&device->device);
+#endif
 	if (ret)
 		goto err_out;
 
@@ -319,6 +340,17 @@ void vfio_unregister_group_dev(struct vfio_device *device)
 	bool interrupted = false;
 	long rc;
 
+#if IS_ENABLED(CONFIG_IOMMUFD)
+	/*
+	 * Balances device_add in register path. Putting it as the first
+	 * operation in unregister to prevent registration refcount from
+	 * incrementing per cdev open.
+	 */
+	cdev_device_del(&device->cdev, &device->device);
+#else
+	device_del(&device->device);
+#endif
+
 	vfio_device_put_registration(device);
 	rc = try_wait_for_completion(&device->comp);
 	while (rc <= 0) {
@@ -344,9 +376,6 @@ void vfio_unregister_group_dev(struct vfio_device *device)
 
 	vfio_device_group_unregister(device);
 
-	/* Balances device_add in register path */
-	device_del(&device->device);
-
 	/* Balances vfio_device_set_group in register path */
 	vfio_device_remove_group(device);
 }
@@ -521,6 +550,37 @@ static inline void vfio_device_pm_runtime_put(struct vfio_device *device)
 /*
  * VFIO Device fd
  */
+#if IS_ENABLED(CONFIG_IOMMUFD)
+static int vfio_device_fops_open(struct inode *inode, struct file *filep)
+{
+	struct vfio_device *device = container_of(inode->i_cdev,
+						  struct vfio_device, cdev);
+	struct vfio_device_file *df;
+	int ret;
+
+	if (!vfio_device_try_get_registration(device))
+		return -ENODEV;
+
+	/*
+	 * vfio device open is done in BIND_IOMMUFD for cdev, before
+	 * that, device access is blocked for this cdev open.
+	 */
+	df = vfio_allocate_device_file(device, true);
+	if (IS_ERR(df)) {
+		ret = PTR_ERR(df);
+		goto err_put_registration;
+	}
+
+	filep->private_data = df;
+
+	return 0;
+
+err_put_registration:
+	vfio_device_put_registration(device);
+	return ret;
+}
+#endif
+
 static int vfio_device_fops_release(struct inode *inode, struct file *filep)
 {
 	struct vfio_device_file *df = filep->private_data;
@@ -1084,6 +1144,9 @@ static int vfio_device_fops_mmap(struct file *filep, struct vm_area_struct *vma)
 
 const struct file_operations vfio_device_fops = {
 	.owner		= THIS_MODULE,
+#if IS_ENABLED(CONFIG_IOMMUFD)
+	.open		= vfio_device_fops_open,
+#endif
 	.release	= vfio_device_fops_release,
 	.read		= vfio_device_fops_read,
 	.write		= vfio_device_fops_write,
@@ -1450,6 +1513,13 @@ EXPORT_SYMBOL(vfio_dma_rw);
 /*
  * Module/class support
  */
+#if IS_ENABLED(CONFIG_IOMMUFD)
+static char *vfio_device_devnode(struct device *dev, umode_t *mode)
+{
+	return kasprintf(GFP_KERNEL, "vfio/devices/%s", dev_name(dev));
+}
+#endif
+
 static int __init vfio_init(void)
 {
 	int ret;
@@ -1467,9 +1537,21 @@ static int __init vfio_init(void)
 		goto err_dev_class;
 	}
 
+#if IS_ENABLED(CONFIG_IOMMUFD)
+	vfio.device_class->devnode = vfio_device_devnode;
+	ret = alloc_chrdev_region(&vfio.device_devt, 0,
+				  MINORMASK + 1, "vfio-dev");
+	if (ret)
+		goto err_alloc_dev_chrdev;
+#endif
 	pr_info(DRIVER_DESC " version: " DRIVER_VERSION "\n");
 	return 0;
 
+#if IS_ENABLED(CONFIG_IOMMUFD)
+err_alloc_dev_chrdev:
+	class_destroy(vfio.device_class);
+	vfio.device_class = NULL;
+#endif
 err_dev_class:
 	vfio_group_cleanup();
 	return ret;
@@ -1478,6 +1560,9 @@ static int __init vfio_init(void)
 static void __exit vfio_cleanup(void)
 {
 	ida_destroy(&vfio.device_ida);
+#if IS_ENABLED(CONFIG_IOMMUFD)
+	unregister_chrdev_region(vfio.device_devt, MINORMASK + 1);
+#endif
 	class_destroy(vfio.device_class);
 	vfio.device_class = NULL;
 	vfio_group_cleanup();
diff --git a/include/linux/vfio.h b/include/linux/vfio.h
index 5465e29a8a83..bf909ee51185 100644
--- a/include/linux/vfio.h
+++ b/include/linux/vfio.h
@@ -13,6 +13,7 @@
 #include <linux/mm.h>
 #include <linux/workqueue.h>
 #include <linux/poll.h>
+#include <linux/cdev.h>
 #include <uapi/linux/vfio.h>
 #include <linux/iova_bitmap.h>
 
@@ -50,8 +51,12 @@ struct vfio_device {
 	struct kvm *kvm;
 
 	/* Members below here are private, not for driver use */
-	unsigned int index;
 	struct device device;	/* device.kref covers object life circle */
+#if IS_ENABLED(CONFIG_IOMMUFD)
+	struct cdev cdev;
+#else
+	unsigned int index;
+#endif
 	refcount_t refcount;	/* user count on registered device*/
 	unsigned int open_count;
 	struct completion comp;
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [RFC 11/12] vfio: Add ioctls for device cdev iommufd
  2022-12-19  8:47 [RFC 00/12] Add vfio_device cdev for iommufd support Yi Liu
                   ` (9 preceding siblings ...)
  2022-12-19  8:47 ` [RFC 10/12] vfio: Add cdev for vfio_device Yi Liu
@ 2022-12-19  8:47 ` Yi Liu
  2023-01-09  7:47   ` Tian, Kevin
  2022-12-19  8:47 ` [RFC 12/12] vfio: Compile group optionally Yi Liu
                   ` (2 subsequent siblings)
  13 siblings, 1 reply; 44+ messages in thread
From: Yi Liu @ 2022-12-19  8:47 UTC (permalink / raw)
  To: alex.williamson, jgg
  Cc: kevin.tian, cohuck, eric.auger, nicolinc, kvm, mjrosato,
	chao.p.peng, yi.l.liu, yi.y.sun, peterx, jasowang

This adds two vfio device ioctls for userspace using iommufd on vfio
devices.

    VFIO_DEVICE_BIND_IOMMUFD: bind device to an iommufd, hence gain DMA
			      control provided by the iommufd. VFIO no
			      iommu is indicated by passing a minus
			      fd value.
    VFIO_DEVICE_ATTACH_IOMMUFD_PT: attach device to ioas, page tables
				   managed by iommufd. Attach can be
				   undo by passing IOMMUFD_INVALID_ID
				   to kernel.

The ioctls introduced here are just on par with existing VFIO.

Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 drivers/vfio/vfio_main.c     | 145 +++++++++++++++++++++++++++++++++--
 include/uapi/linux/iommufd.h |   2 +
 include/uapi/linux/vfio.h    |  64 ++++++++++++++++
 3 files changed, 206 insertions(+), 5 deletions(-)

diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
index 4ba90b55ec44..d2a32b8a562b 100644
--- a/drivers/vfio/vfio_main.c
+++ b/drivers/vfio/vfio_main.c
@@ -34,6 +34,7 @@
 #include <linux/interval_tree.h>
 #include <linux/iova_bitmap.h>
 #include <linux/iommufd.h>
+#include <uapi/linux/iommufd.h>
 #include "vfio.h"
 
 #define DRIVER_VERSION	"0.3"
@@ -415,7 +416,7 @@ static int vfio_device_first_open(struct vfio_device_file *df,
 	if (!try_module_get(device->dev->driver->owner))
 		return -ENODEV;
 
-	if (iommufd)
+	if (iommufd && !IS_ERR(iommufd))
 		ret = vfio_iommufd_bind(device, iommufd, dev_id, pt_id);
 	else
 		ret = vfio_device_group_use_iommu(device);
@@ -432,7 +433,7 @@ static int vfio_device_first_open(struct vfio_device_file *df,
 
 err_unuse_iommu:
 	device->kvm = NULL;
-	if (iommufd)
+	if (iommufd && !IS_ERR(iommufd))
 		vfio_iommufd_unbind(device);
 	else
 		vfio_device_group_unuse_iommu(device);
@@ -451,7 +452,7 @@ static void vfio_device_last_close(struct vfio_device_file *df)
 	if (device->ops->close_device)
 		device->ops->close_device(device);
 	device->kvm = NULL;
-	if (iommufd)
+	if (iommufd && !IS_ERR(iommufd))
 		vfio_iommufd_unbind(device);
 	else
 		vfio_device_group_unuse_iommu(device);
@@ -495,11 +496,10 @@ int vfio_device_open(struct vfio_device_file *df,
 	return 0;
 }
 
-void vfio_device_close(struct vfio_device_file *df)
+static void __vfio_device_close(struct vfio_device_file *df)
 {
 	struct vfio_device *device = df->device;
 
-	mutex_lock(&device->dev_set->lock);
 	/*
 	 * Paired with smp_load_acquire() in vfio_device_fops::ioctl/
 	 * read/write/mmap
@@ -511,6 +511,14 @@ void vfio_device_close(struct vfio_device_file *df)
 		device->single_open = false;
 	}
 	device->open_count--;
+}
+
+void vfio_device_close(struct vfio_device_file *df)
+{
+	struct vfio_device *device = df->device;
+
+	mutex_lock(&device->dev_set->lock);
+	__vfio_device_close(df);
 	mutex_unlock(&device->dev_set->lock);
 }
 
@@ -592,6 +600,8 @@ static int vfio_device_fops_release(struct inode *inode, struct file *filep)
 	 */
 	if (!df->single_open)
 		vfio_device_group_close(df);
+	else
+		vfio_device_close(df);
 	kfree(df);
 	vfio_device_put_registration(device);
 
@@ -1054,6 +1064,124 @@ static int vfio_ioctl_device_feature(struct vfio_device *device,
 	}
 }
 
+static long vfio_device_ioctl_bind_iommufd(struct vfio_device_file *df,
+					   unsigned long arg)
+{
+	struct vfio_device *device = df->device;
+	struct vfio_device_bind_iommufd bind;
+	struct iommufd_ctx *iommufd;
+	unsigned long minsz;
+	struct fd f;
+	int ret;
+	bool access;
+
+	minsz = offsetofend(struct vfio_device_bind_iommufd, iommufd);
+
+	if (copy_from_user(&bind, (void __user *)arg, minsz))
+		return -EFAULT;
+
+	if (bind.argsz < minsz || bind.flags)
+		return -EINVAL;
+
+	if (!device->ops->bind_iommufd)
+		return -ENODEV;
+
+	/* iommufd < 0 means noiommu mode */
+	if (bind.iommufd < 0) {
+		if (!capable(CAP_SYS_RAWIO))
+			return -EPERM;
+		iommufd = ERR_PTR(-EINVAL);
+	} else {
+		f = fdget(bind.iommufd);
+		if (!f.file)
+			return -EBADF;
+
+		iommufd = iommufd_ctx_from_file(f.file);
+		if (IS_ERR(iommufd)) {
+			fdput(f);
+			return PTR_ERR(iommufd);
+		}
+	}
+
+	mutex_lock(&device->dev_set->lock);
+	/* Paired with smp_store_release() in vfio_device_open/close() */
+	access = smp_load_acquire(&df->access_granted);
+	if (access) {
+		ret = -EINVAL;
+		goto out_unlock;
+	}
+
+	/* df->kvm is supposed to be set in vfio_device_file_set_kvm() */
+	df->iommufd = iommufd;
+	ret = vfio_device_open(df, &bind.out_devid, NULL);
+	if (ret)
+		goto out_unlock;
+
+	ret = copy_to_user((void __user *)arg + minsz,
+			   &bind.out_devid,
+			   sizeof(bind.out_devid)) ? -EFAULT : 0;
+	if (ret)
+		goto out_close_device;
+
+	mutex_unlock(&device->dev_set->lock);
+	if (!IS_ERR(iommufd))
+		fdput(f);
+	else
+		dev_warn(device->dev, "vfio-noiommu device used by user "
+			 "(%s:%d)\n", current->comm, task_pid_nr(current));
+	return 0;
+
+out_close_device:
+	__vfio_device_close(df);
+out_unlock:
+	df->iommufd = NULL;
+	mutex_unlock(&device->dev_set->lock);
+	if (!IS_ERR(iommufd))
+		fdput(f);
+	return ret;
+}
+
+static int vfio_ioctl_device_attach(struct vfio_device *device,
+				    struct vfio_device_feature __user *arg)
+{
+	struct vfio_device_attach_iommufd_pt attach;
+	u32 pt_id;
+	int ret;
+
+	if (copy_from_user(&attach, (void __user *)arg, sizeof(attach)))
+		return -EFAULT;
+
+	if (attach.flags)
+		return -EINVAL;
+
+	if (!device->ops->bind_iommufd)
+		return -ENODEV;
+
+	mutex_lock(&device->dev_set->lock);
+	pt_id = attach.pt_id;
+	ret = vfio_iommufd_attach(device,
+				  pt_id != IOMMUFD_INVALID_ID ? &pt_id : NULL);
+	if (ret)
+		goto out_unlock;
+
+	if (pt_id != IOMMUFD_INVALID_ID) {
+		ret = copy_to_user((void __user *)arg + offsetofend(
+				   struct vfio_device_attach_iommufd_pt, flags),
+				   &pt_id,
+				   sizeof(pt_id)) ? -EFAULT : 0;
+		if (ret)
+			goto out_detach;
+	}
+	mutex_unlock(&device->dev_set->lock);
+	return 0;
+
+out_detach:
+	vfio_iommufd_attach(device, NULL);
+out_unlock:
+	mutex_unlock(&device->dev_set->lock);
+	return ret;
+}
+
 static long vfio_device_fops_unl_ioctl(struct file *filep,
 				       unsigned int cmd, unsigned long arg)
 {
@@ -1062,6 +1190,9 @@ static long vfio_device_fops_unl_ioctl(struct file *filep,
 	bool access;
 	int ret;
 
+	if (cmd == VFIO_DEVICE_BIND_IOMMUFD)
+		return vfio_device_ioctl_bind_iommufd(df, arg);
+
 	/* Paired with smp_store_release() in vfio_device_open/close() */
 	access = smp_load_acquire(&df->access_granted);
 	if (!access)
@@ -1076,6 +1207,10 @@ static long vfio_device_fops_unl_ioctl(struct file *filep,
 		ret = vfio_ioctl_device_feature(device, (void __user *)arg);
 		break;
 
+	case VFIO_DEVICE_ATTACH_IOMMUFD_PT:
+		ret = vfio_ioctl_device_attach(device, (void __user *)arg);
+		break;
+
 	default:
 		if (unlikely(!device->ops->ioctl))
 			ret = -EINVAL;
diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h
index 98ebba80cfa1..87680274c01b 100644
--- a/include/uapi/linux/iommufd.h
+++ b/include/uapi/linux/iommufd.h
@@ -9,6 +9,8 @@
 
 #define IOMMUFD_TYPE (';')
 
+#define IOMMUFD_INVALID_ID 0  /* valid ID starts from 1 */
+
 /**
  * DOC: General ioctl format
  *
diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index d7d8e0922376..d5cb347c0763 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -190,6 +190,70 @@ struct vfio_group_status {
 
 /* --------------- IOCTLs for DEVICE file descriptors --------------- */
 
+/*
+ * VFIO_DEVICE_BIND_IOMMUFD - _IOR(VFIO_TYPE, VFIO_BASE + 19,
+ *				   struct vfio_device_bind_iommufd)
+ *
+ * Bind a vfio_device to the specified iommufd and an ioas or a hardware
+ * page table.
+ *
+ * The user should provide a device cookie when calling this ioctl. The
+ * cookie is carried only in event e.g. I/O fault reported to userspace
+ * via iommufd. The user should use devid returned by this ioctl to mark
+ * the target device in other ioctls (e.g. capability query via iommufd).
+ *
+ * User is not allowed to access the device before the binding operation
+ * is completed.
+ *
+ * Unbind is automatically conducted when device fd is closed.
+ *
+ * @argsz:	 user filled size of this data.
+ * @flags:	 reserved for future extension.
+ * @dev_cookie:	 a per device cookie provided by userspace.
+ * @iommufd:	 iommufd to bind. iommufd < 0 means noiommu.
+ * @out_devid:	 the device id generated by this bind.
+ *
+ * Return: 0 on success, -errno on failure.
+ */
+struct vfio_device_bind_iommufd {
+	__u32		argsz;
+	__u32		flags;
+	__aligned_u64	dev_cookie;
+	__s32		iommufd;
+	__u32		out_devid;
+};
+
+#define VFIO_DEVICE_BIND_IOMMUFD	_IO(VFIO_TYPE, VFIO_BASE + 19)
+
+/*
+ * VFIO_DEVICE_ATTACH_IOMMUFD_PT - _IOW(VFIO_TYPE, VFIO_BASE + 20,
+ *					struct vfio_device_attach_iommufd_pt)
+ *
+ * Attach a vfio device to an iommufd address space specified by IOAS
+ * id or hardware page table id.
+ *
+ * Available only after a device has been bound to iommufd via
+ * VFIO_DEVICE_BIND_IOMMUFD
+ *
+ * Undo by passing pt_id == IOMMUFD_INVALID_ID
+ *
+ * @argsz:	user filled size of this data.
+ * @flags:	must be 0.
+ * @pt_id:	Input the target id, can be an ioas or a hwpt allocated
+ *		via iommufd subsystem, and output the attached pt_id. It
+ *		be the ioas, hwpt itself or an hwpt created by kernel
+ *		during the attachment.
+ *
+ * Return: 0 on success, -errno on failure.
+ */
+struct vfio_device_attach_iommufd_pt {
+	__u32	argsz;
+	__u32	flags;
+	__u32	pt_id;
+};
+
+#define VFIO_DEVICE_ATTACH_IOMMUFD_PT		_IO(VFIO_TYPE, VFIO_BASE + 20)
+
 /**
  * VFIO_DEVICE_GET_INFO - _IOR(VFIO_TYPE, VFIO_BASE + 7,
  *						struct vfio_device_info)
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [RFC 12/12] vfio: Compile group optionally
  2022-12-19  8:47 [RFC 00/12] Add vfio_device cdev for iommufd support Yi Liu
                   ` (10 preceding siblings ...)
  2022-12-19  8:47 ` [RFC 11/12] vfio: Add ioctls for device cdev iommufd Yi Liu
@ 2022-12-19  8:47 ` Yi Liu
  2022-12-19  8:51 ` [RFC 00/12] Add vfio_device cdev for iommufd support Yi Liu
  2023-01-04 15:23 ` Yi Liu
  13 siblings, 0 replies; 44+ messages in thread
From: Yi Liu @ 2022-12-19  8:47 UTC (permalink / raw)
  To: alex.williamson, jgg
  Cc: kevin.tian, cohuck, eric.auger, nicolinc, kvm, mjrosato,
	chao.p.peng, yi.l.liu, yi.y.sun, peterx, jasowang

group code is not needed for vfio device cdev, so with vfio device cdev
introduced, the group infrastructures can be compiled out.

Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 drivers/vfio/Kconfig  | 17 +++++++++++
 drivers/vfio/Makefile |  3 +-
 drivers/vfio/vfio.h   | 69 +++++++++++++++++++++++++++++++++++++++++++
 include/linux/vfio.h  | 11 +++++++
 4 files changed, 99 insertions(+), 1 deletion(-)

diff --git a/drivers/vfio/Kconfig b/drivers/vfio/Kconfig
index 286c1663bd75..49bcc198d386 100644
--- a/drivers/vfio/Kconfig
+++ b/drivers/vfio/Kconfig
@@ -12,9 +12,26 @@ menuconfig VFIO
 	  If you don't know what to do here, say N.
 
 if VFIO
+config VFIO_ENABLE_GROUP
+	bool
+	default !IOMMUFD
+
+config VFIO_GROUP
+	bool "Support for the VFIO group /dev/vfio/$group_id"
+	select VFIO_ENABLE_GROUP
+	default y
+	help
+	   VFIO group is legacy interface for userspace. For userspaces
+	   adapted to iommufd and vfio device cdev, this can be N. For
+	   now, before iommufd is ready and userspace applications fully
+	   converted to iommufd and vfio device cdev, this should be Y.
+
+	   If you don't know what to do here, say Y.
+
 config VFIO_CONTAINER
 	bool "Support for the VFIO container /dev/vfio/vfio"
 	select VFIO_IOMMU_TYPE1 if MMU && (X86 || S390 || ARM || ARM64)
+	depends on VFIO_ENABLE_GROUP
 	default y
 	help
 	  The VFIO container is the classic interface to VFIO for establishing
diff --git a/drivers/vfio/Makefile b/drivers/vfio/Makefile
index 3783db7e8082..7f47d508d866 100644
--- a/drivers/vfio/Makefile
+++ b/drivers/vfio/Makefile
@@ -4,8 +4,9 @@ vfio_virqfd-y := virqfd.o
 obj-$(CONFIG_VFIO) += vfio.o
 
 vfio-y += vfio_main.o \
-	  group.o \
 	  iova_bitmap.o
+
+vfio-$(CONFIG_VFIO_ENABLE_GROUP) += group.o
 vfio-$(CONFIG_IOMMUFD) += iommufd.o
 vfio-$(CONFIG_VFIO_CONTAINER) += container.o
 
diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
index 058d7a19dada..313b0e0af846 100644
--- a/drivers/vfio/vfio.h
+++ b/drivers/vfio/vfio.h
@@ -62,6 +62,7 @@ enum vfio_group_type {
 	VFIO_NO_IOMMU,
 };
 
+#if IS_ENABLED(CONFIG_VFIO_ENABLE_GROUP)
 struct vfio_group {
 	struct device 			dev;
 	struct cdev			cdev;
@@ -104,6 +105,74 @@ bool vfio_group_has_dev(struct vfio_group *group, struct vfio_device *device);
 bool vfio_device_has_container(struct vfio_device *device);
 int __init vfio_group_init(void);
 void vfio_group_cleanup(void);
+#else
+struct vfio_group;
+
+static inline int vfio_device_set_group(struct vfio_device *device,
+					enum vfio_group_type type)
+{
+	return 0;
+}
+
+static inline void vfio_device_remove_group(struct vfio_device *device)
+{
+}
+
+static inline void vfio_device_group_register(struct vfio_device *device)
+{
+}
+
+static inline void vfio_device_group_unregister(struct vfio_device *device)
+{
+}
+
+static inline int vfio_device_group_use_iommu(struct vfio_device *device)
+{
+	return -EOPNOTSUPP;
+}
+
+static inline void vfio_device_group_unuse_iommu(struct vfio_device *device)
+{
+}
+
+static inline void vfio_device_group_close(struct vfio_device_file *df)
+{
+}
+
+static inline struct vfio_group *vfio_group_from_file(struct file *file)
+{
+	return NULL;
+}
+
+static inline bool vfio_group_enforced_coherent(struct vfio_group *group)
+{
+	return true;
+}
+
+static inline void vfio_group_set_kvm(struct vfio_group *group, struct kvm *kvm)
+{
+}
+
+static inline bool vfio_group_has_dev(struct vfio_group *group,
+				      struct vfio_device *device)
+{
+	return false;
+}
+
+static inline bool vfio_device_has_container(struct vfio_device *device)
+{
+	return false;
+}
+
+static inline int __init vfio_group_init(void)
+{
+	return 0;
+}
+
+static inline void vfio_group_cleanup(void)
+{
+}
+#endif /* CONFIG_VFIO_ENABLE_GROUP */
 
 #if IS_ENABLED(CONFIG_VFIO_CONTAINER)
 /* events for the backend driver notify callback */
diff --git a/include/linux/vfio.h b/include/linux/vfio.h
index bf909ee51185..4fd136cbf8db 100644
--- a/include/linux/vfio.h
+++ b/include/linux/vfio.h
@@ -43,7 +43,9 @@ struct vfio_device {
 	 */
 	const struct vfio_migration_ops *mig_ops;
 	const struct vfio_log_ops *log_ops;
+#if IS_ENABLED(CONFIG_VFIO_ENABLE_GROUP)
 	struct vfio_group *group;
+#endif
 	struct vfio_device_set *dev_set;
 	struct list_head dev_set_list;
 	unsigned int migration_flags;
@@ -60,8 +62,10 @@ struct vfio_device {
 	refcount_t refcount;	/* user count on registered device*/
 	unsigned int open_count;
 	struct completion comp;
+#if IS_ENABLED(CONFIG_VFIO_ENABLE_GROUP)
 	struct list_head group_next;
 	struct list_head iommu_entry;
+#endif
 	struct iommufd_access *iommufd_access;
 #if IS_ENABLED(CONFIG_IOMMUFD)
 	struct iommufd_device *iommufd_device;
@@ -244,7 +248,14 @@ int vfio_mig_get_next_state(struct vfio_device *device,
 /*
  * External user API
  */
+#if IS_ENABLED(CONFIG_VFIO_ENABLE_GROUP)
 struct iommu_group *vfio_file_iommu_group(struct file *file);
+#else
+static inline struct iommu_group *vfio_file_iommu_group(struct file *file)
+{
+	return NULL;
+}
+#endif
 bool vfio_file_is_valid(struct file *file);
 bool vfio_file_enforced_coherent(struct file *file);
 void vfio_file_set_kvm(struct file *file, struct kvm *kvm);
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* Re: [RFC 00/12] Add vfio_device cdev for iommufd support
  2022-12-19  8:47 [RFC 00/12] Add vfio_device cdev for iommufd support Yi Liu
                   ` (11 preceding siblings ...)
  2022-12-19  8:47 ` [RFC 12/12] vfio: Compile group optionally Yi Liu
@ 2022-12-19  8:51 ` Yi Liu
  2023-01-04 15:23 ` Yi Liu
  13 siblings, 0 replies; 44+ messages in thread
From: Yi Liu @ 2022-12-19  8:51 UTC (permalink / raw)
  To: alex.williamson, jgg
  Cc: kevin.tian, cohuck, eric.auger, nicolinc, kvm, mjrosato,
	chao.p.peng, yi.y.sun, peterx, jasowang

On 2022/12/19 16:47, Yi Liu wrote:
> Existing VFIO provides group-centric user APIs for userspace. Userspace
> opens the /dev/vfio/$group_id first before getting device fd and hence
> getting access to device. This is not the desired model for iommufd. Per
> the conclusion of community discussion[1], iommufd provides device-centric
> kAPIs and requires its consumer (like VFIO) to be device-centric user
> APIs. Such user APIs are used to associate device with iommufd and also
> the I/O address spaces managed by the iommufd.
> 
> This series first introduces a per device file structure to be prepared
> for further enhancement and refactors the kvm-vfio code to be prepared
> for accepting device file from userspace. Then refactors the vfio to be
> able to handle iommufd binding. This refactor includes the mechanism of
> blocking device access before iommufd bind, making vfio_device_open() be
> exclusive between the group path and the cdev path. Eventually, adds the
> cdev support for vfio device, and makes group infrastructure optional as
> it is not needed when vfio device cdev is compiled.
> 
> This is also a base for further support iommu nesting for vfio device[2].
> 
> The complete code can be found in below branch, simple test done with the
> legacy group path and the cdev path. Draft QEMU branch can be found at[3]
> 
> https://github.com/yiliu1765/iommufd/tree/vfio_device_cdev_rfcv1
> (config CONFIG_IOMMUFD=y)
> 
> [1] https://lore.kernel.org/kvm/BN9PR11MB5433B1E4AE5B0480369F97178C189@BN9PR11MB5433.namprd11.prod.outlook.com/
> [2] https://github.com/yiliu1765/iommufd/tree/wip/iommufd-v6.1-rc3-nesting
> [3] https://github.com/yiliu1765/qemu/tree/wip/qemu-iommufd-6.1-rc3

this is targeting for 6.3. So would appreciate early comments if bandwidth
allows. Otherwise, I can wait after merge window. :-)


-- 
Thanks,
Yi Liu

^ permalink raw reply	[flat|nested] 44+ messages in thread

* RE: [RFC 01/12] vfio: Allocate per device file structure
  2022-12-19  8:47 ` [RFC 01/12] vfio: Allocate per device file structure Yi Liu
@ 2022-12-21  3:57   ` Tian, Kevin
  2022-12-21  6:46     ` Yi Liu
  0 siblings, 1 reply; 44+ messages in thread
From: Tian, Kevin @ 2022-12-21  3:57 UTC (permalink / raw)
  To: Liu, Yi L, alex.williamson, jgg
  Cc: cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.y.sun, peterx, jasowang

> From: Liu, Yi L <yi.l.liu@intel.com>
> Sent: Monday, December 19, 2022 4:47 PM
>
>  static int vfio_device_fops_release(struct inode *inode, struct file *filep)
>  {
> -	struct vfio_device *device = filep->private_data;
> +	struct vfio_device_file *df = filep->private_data;
> +	struct vfio_device *device = df->device;
> 
>  	vfio_device_group_close(device);
> -
> +	kfree(df);
>  	vfio_device_put_registration(device);

Why putting kfree() in between the two invocations? There is no
strict order requirement of doing so. It reads slightly better to
free df either in the start or in the end.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* RE: [RFC 03/12] vfio: Accept vfio device file in the driver facing kAPI
  2022-12-19  8:47 ` [RFC 03/12] vfio: Accept vfio device file in the driver facing kAPI Yi Liu
@ 2022-12-21  4:07   ` Tian, Kevin
  2022-12-21  7:02     ` Yi Liu
  2023-01-04 18:25     ` Jason Gunthorpe
  0 siblings, 2 replies; 44+ messages in thread
From: Tian, Kevin @ 2022-12-21  4:07 UTC (permalink / raw)
  To: Liu, Yi L, alex.williamson, jgg
  Cc: cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.y.sun, peterx, jasowang

> From: Liu, Yi L <yi.l.liu@intel.com>
> Sent: Monday, December 19, 2022 4:47 PM
> 
> +static bool vfio_device_enforced_coherent(struct vfio_device *device)
> +{
> +	bool ret;
> +
> +	if (!vfio_device_try_get_registration(device))
> +		return true;
> +
> +	ret = device_iommu_capable(device->dev,
> +
> IOMMU_CAP_ENFORCE_CACHE_COHERENCY);
> +
> +	vfio_device_put_registration(device);
> +	return ret;
> +}

This probably needs an explanation that recounting is required because
this might be called before vfio_device_open() is called to hold the count.

> +static void vfio_device_file_set_kvm(struct file *file, struct kvm *kvm)
> +{
> +	struct vfio_device_file *df = file->private_data;
> +	struct vfio_device *device = df->device;
> +
> +	/*
> +	 * The kvm is first recorded in the df, and will be propagated
> +	 * to vfio_device::kvm when the file binds iommufd successfully in
> +	 * the vfio device cdev path.
> +	 */

/*
 * The kvm is first recorded in vfio_device_file and later propagated
 * to vfio_device::kvm when the file is successfully bound to iommufd
 * in the cdev path
 */

^ permalink raw reply	[flat|nested] 44+ messages in thread

* RE: [RFC 06/12] vfio: Pass struct vfio_device_file * to vfio_device_open/close()
  2022-12-19  8:47 ` [RFC 06/12] vfio: Pass struct vfio_device_file * to vfio_device_open/close() Yi Liu
@ 2022-12-21  4:10   ` Tian, Kevin
  2022-12-21  7:04     ` Yi Liu
  0 siblings, 1 reply; 44+ messages in thread
From: Tian, Kevin @ 2022-12-21  4:10 UTC (permalink / raw)
  To: Liu, Yi L, alex.williamson, jgg
  Cc: cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.y.sun, peterx, jasowang

> From: Liu, Yi L <yi.l.liu@intel.com>
> Sent: Monday, December 19, 2022 4:47 PM
> 
> This avoids passing struct kvm * and struct iommufd_ctx * in multiple
> functions. vfio_device_open() becomes to be a locked helper, while
> vfio_device_open() still holds device->dev_set->lock itself.

Not sure what the words after 'while' intend to explain.

> -int vfio_device_open(struct vfio_device *device,
> -		     struct iommufd_ctx *iommufd, struct kvm *kvm)
> +int vfio_device_open(struct vfio_device_file *df)
>  {
> -	int ret = 0;
> +	struct vfio_device *device = df->device;
> +
> +	lockdep_assert_held(&device->dev_set->lock);
> 
> -	mutex_lock(&device->dev_set->lock);
>  	device->open_count++;
>  	if (device->open_count == 1) {
> -		ret = vfio_device_first_open(device, iommufd, kvm);
> -		if (ret)
> +		int ret;
> +
> +		ret = vfio_device_first_open(df);
> +		if (ret) {
>  			device->open_count--;
> +			return ret;
> +		}
>  	}
> -	mutex_unlock(&device->dev_set->lock);
> 
> -	return ret;
> +	return 0;
>  }

I don't see the point of moving 'ret' into the inner block.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* RE: [RFC 07/12] vfio: Block device access via device fd until device is opened
  2022-12-19  8:47 ` [RFC 07/12] vfio: Block device access via device fd until device is opened Yi Liu
@ 2022-12-21  4:18   ` Tian, Kevin
  2023-01-04 20:47   ` Jason Gunthorpe
  1 sibling, 0 replies; 44+ messages in thread
From: Tian, Kevin @ 2022-12-21  4:18 UTC (permalink / raw)
  To: Liu, Yi L, alex.williamson, jgg
  Cc: cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.y.sun, peterx, jasowang

> From: Liu, Yi L <yi.l.liu@intel.com>
> Sent: Monday, December 19, 2022 4:47 PM
> 
> Allow the vfio_device file to be in a state where the device FD is
> opened but the device is not opened. This inbetween state is not used

'but the device cannot be used by userspace (i.e. its .open_device()
hasn't been called)"

> when the device FD is spawned from the group FD, however when we create
> the device FD directly by opening a cdev it will be opened in the
> blocked state.
> 
> In the blocked state, currently only the bind operation is allowed,
> other device accesses are not allowed. Completing bind will allow user
> to further access the device.
> 
> This is implemented by adding a flag in struct vfio_device_file to mark
> the blocked state and using a simple smp_load_acquire() to obtain the
> flag value and serialize all the device setup with the thread accessing
> this device.
> 
> Due to this scheme it is not possible to unbind the FD, once it bound it
> remains bound until the FD is closed.

Could you elaborate why it's impossible to unbind the fd with this scheme?

> 
> +	/* Paired with smp_store_release() in vfio_device_open/close() */
> +	access = smp_load_acquire(&df->access_granted);
> +	if (!access)
> +		return -EINVAL;
> +

-EPERM?

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [RFC 01/12] vfio: Allocate per device file structure
  2022-12-21  3:57   ` Tian, Kevin
@ 2022-12-21  6:46     ` Yi Liu
  0 siblings, 0 replies; 44+ messages in thread
From: Yi Liu @ 2022-12-21  6:46 UTC (permalink / raw)
  To: Tian, Kevin, alex.williamson, jgg
  Cc: cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.y.sun, peterx, jasowang

On 2022/12/21 11:57, Tian, Kevin wrote:
>> From: Liu, Yi L <yi.l.liu@intel.com>
>> Sent: Monday, December 19, 2022 4:47 PM
>>
>>   static int vfio_device_fops_release(struct inode *inode, struct file *filep)
>>   {
>> -	struct vfio_device *device = filep->private_data;
>> +	struct vfio_device_file *df = filep->private_data;
>> +	struct vfio_device *device = df->device;
>>
>>   	vfio_device_group_close(device);
>> -
>> +	kfree(df);
>>   	vfio_device_put_registration(device);
> 
> Why putting kfree() in between the two invocations? There is no
> strict order requirement of doing so. It reads slightly better to
> free df either in the start or in the end.

yes, no strict order here. maybe in the end.

-- 
Regards,
Yi Liu

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [RFC 03/12] vfio: Accept vfio device file in the driver facing kAPI
  2022-12-21  4:07   ` Tian, Kevin
@ 2022-12-21  7:02     ` Yi Liu
  2023-01-04 18:25     ` Jason Gunthorpe
  1 sibling, 0 replies; 44+ messages in thread
From: Yi Liu @ 2022-12-21  7:02 UTC (permalink / raw)
  To: Tian, Kevin, alex.williamson, jgg
  Cc: cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.y.sun, peterx, jasowang

On 2022/12/21 12:07, Tian, Kevin wrote:
>> From: Liu, Yi L <yi.l.liu@intel.com>
>> Sent: Monday, December 19, 2022 4:47 PM
>>
>> +static bool vfio_device_enforced_coherent(struct vfio_device *device)
>> +{
>> +	bool ret;
>> +
>> +	if (!vfio_device_try_get_registration(device))
>> +		return true;
>> +
>> +	ret = device_iommu_capable(device->dev,
>> +
>> IOMMU_CAP_ENFORCE_CACHE_COHERENCY);
>> +
>> +	vfio_device_put_registration(device);
>> +	return ret;
>> +}
> 
> This probably needs an explanation that recounting is required because
> this might be called before vfio_device_open() is called to hold the count.

yes.

>> +static void vfio_device_file_set_kvm(struct file *file, struct kvm *kvm)
>> +{
>> +	struct vfio_device_file *df = file->private_data;
>> +	struct vfio_device *device = df->device;
>> +
>> +	/*
>> +	 * The kvm is first recorded in the df, and will be propagated
>> +	 * to vfio_device::kvm when the file binds iommufd successfully in
>> +	 * the vfio device cdev path.
>> +	 */
> 
> /*
>   * The kvm is first recorded in vfio_device_file and later propagated
>   * to vfio_device::kvm when the file is successfully bound to iommufd
>   * in the cdev path
>   */

got it.

-- 
Regards,
Yi Liu

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [RFC 06/12] vfio: Pass struct vfio_device_file * to vfio_device_open/close()
  2022-12-21  4:10   ` Tian, Kevin
@ 2022-12-21  7:04     ` Yi Liu
  0 siblings, 0 replies; 44+ messages in thread
From: Yi Liu @ 2022-12-21  7:04 UTC (permalink / raw)
  To: Tian, Kevin, alex.williamson, jgg
  Cc: cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.y.sun, peterx, jasowang


On 2022/12/21 12:10, Tian, Kevin wrote:
>> From: Liu, Yi L <yi.l.liu@intel.com>
>> Sent: Monday, December 19, 2022 4:47 PM
>>
>> This avoids passing struct kvm * and struct iommufd_ctx * in multiple
>> functions. vfio_device_open() becomes to be a locked helper, while
>> vfio_device_open() still holds device->dev_set->lock itself.
> 
> Not sure what the words after 'while' intend to explain.

yeah, may remove it.

>> -int vfio_device_open(struct vfio_device *device,
>> -		     struct iommufd_ctx *iommufd, struct kvm *kvm)
>> +int vfio_device_open(struct vfio_device_file *df)
>>   {
>> -	int ret = 0;
>> +	struct vfio_device *device = df->device;
>> +
>> +	lockdep_assert_held(&device->dev_set->lock);
>>
>> -	mutex_lock(&device->dev_set->lock);
>>   	device->open_count++;
>>   	if (device->open_count == 1) {
>> -		ret = vfio_device_first_open(device, iommufd, kvm);
>> -		if (ret)
>> +		int ret;
>> +
>> +		ret = vfio_device_first_open(df);
>> +		if (ret) {
>>   			device->open_count--;
>> +			return ret;
>> +		}
>>   	}
>> -	mutex_unlock(&device->dev_set->lock);
>>
>> -	return ret;
>> +	return 0;
>>   }
> 
> I don't see the point of moving 'ret' into the inner block.

just want to make the function success oriented. :-)

-- 
Regards,
Yi Liu

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [RFC 00/12] Add vfio_device cdev for iommufd support
  2022-12-19  8:47 [RFC 00/12] Add vfio_device cdev for iommufd support Yi Liu
                   ` (12 preceding siblings ...)
  2022-12-19  8:51 ` [RFC 00/12] Add vfio_device cdev for iommufd support Yi Liu
@ 2023-01-04 15:23 ` Yi Liu
  13 siblings, 0 replies; 44+ messages in thread
From: Yi Liu @ 2023-01-04 15:23 UTC (permalink / raw)
  To: alex.williamson, jgg
  Cc: kevin.tian, cohuck, eric.auger, nicolinc, kvm, mjrosato,
	chao.p.peng, yi.y.sun, peterx, jasowang

On 2022/12/19 16:47, Yi Liu wrote:
> Existing VFIO provides group-centric user APIs for userspace. Userspace
> opens the /dev/vfio/$group_id first before getting device fd and hence
> getting access to device. This is not the desired model for iommufd. Per
> the conclusion of community discussion[1], iommufd provides device-centric
> kAPIs and requires its consumer (like VFIO) to be device-centric user
> APIs. Such user APIs are used to associate device with iommufd and also
> the I/O address spaces managed by the iommufd.
> 
> This series first introduces a per device file structure to be prepared
> for further enhancement and refactors the kvm-vfio code to be prepared
> for accepting device file from userspace. Then refactors the vfio to be
> able to handle iommufd binding. This refactor includes the mechanism of
> blocking device access before iommufd bind, making vfio_device_open() be
> exclusive between the group path and the cdev path. Eventually, adds the
> cdev support for vfio device, and makes group infrastructure optional as
> it is not needed when vfio device cdev is compiled.
> 
> This is also a base for further support iommu nesting for vfio device[2].
> 
> The complete code can be found in below branch, simple test done with the
> legacy group path and the cdev path. Draft QEMU branch can be found at[3]
> 
> https://github.com/yiliu1765/iommufd/tree/vfio_device_cdev_rfcv1
> (config CONFIG_IOMMUFD=y)
> 
> [1] https://lore.kernel.org/kvm/BN9PR11MB5433B1E4AE5B0480369F97178C189@BN9PR11MB5433.namprd11.prod.outlook.com/
> [2] https://github.com/yiliu1765/iommufd/tree/wip/iommufd-v6.1-rc3-nesting
> [3] https://github.com/yiliu1765/qemu/tree/wip/qemu-iommufd-6.1-rc3

a ping in case of this series is "buried" deeply. :-)

> Regards,
> 	Yi Liu
> 
> Yi Liu (12):
>    vfio: Allocate per device file structure
>    vfio: Refine vfio file kAPIs
>    vfio: Accept vfio device file in the driver facing kAPI
>    kvm/vfio: Rename kvm_vfio_group to prepare for accepting vfio device
>      fd
>    kvm/vfio: Accept vfio device file from userspace
>    vfio: Pass struct vfio_device_file * to vfio_device_open/close()
>    vfio: Block device access via device fd until device is opened
>    vfio: Add infrastructure for bind_iommufd and attach
>    vfio: Make vfio_device_open() exclusive between group path and device
>      cdev path
>    vfio: Add cdev for vfio_device
>    vfio: Add ioctls for device cdev iommufd
>    vfio: Compile group optionally
> 
>   Documentation/virt/kvm/devices/vfio.rst |  32 +-
>   drivers/vfio/Kconfig                    |  17 +
>   drivers/vfio/Makefile                   |   3 +-
>   drivers/vfio/group.c                    | 131 +++----
>   drivers/vfio/iommufd.c                  |  79 +++-
>   drivers/vfio/pci/vfio_pci_core.c        |   4 +-
>   drivers/vfio/vfio.h                     | 108 +++++-
>   drivers/vfio/vfio_main.c                | 492 ++++++++++++++++++++++--
>   include/linux/vfio.h                    |  21 +-
>   include/uapi/linux/iommufd.h            |   2 +
>   include/uapi/linux/kvm.h                |  23 +-
>   include/uapi/linux/vfio.h               |  64 +++
>   virt/kvm/vfio.c                         | 143 +++----
>   13 files changed, 891 insertions(+), 228 deletions(-)
> 

-- 
Regards,
Yi Liu

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [RFC 03/12] vfio: Accept vfio device file in the driver facing kAPI
  2022-12-21  4:07   ` Tian, Kevin
  2022-12-21  7:02     ` Yi Liu
@ 2023-01-04 18:25     ` Jason Gunthorpe
  2023-01-09  4:13       ` Tian, Kevin
  1 sibling, 1 reply; 44+ messages in thread
From: Jason Gunthorpe @ 2023-01-04 18:25 UTC (permalink / raw)
  To: Tian, Kevin
  Cc: Liu, Yi L, alex.williamson, cohuck, eric.auger, nicolinc, kvm,
	mjrosato, chao.p.peng, yi.y.sun, peterx, jasowang

On Wed, Dec 21, 2022 at 04:07:42AM +0000, Tian, Kevin wrote:
> > From: Liu, Yi L <yi.l.liu@intel.com>
> > Sent: Monday, December 19, 2022 4:47 PM
> > 
> > +static bool vfio_device_enforced_coherent(struct vfio_device *device)
> > +{
> > +	bool ret;
> > +
> > +	if (!vfio_device_try_get_registration(device))
> > +		return true;
> > +
> > +	ret = device_iommu_capable(device->dev,
> > +
> > IOMMU_CAP_ENFORCE_CACHE_COHERENCY);
> > +
> > +	vfio_device_put_registration(device);
> > +	return ret;
> > +}
> 
> This probably needs an explanation that recounting is required because
> this might be called before vfio_device_open() is called to hold the
> count.

How does that happen? The only call site already has a struct file and
the struct file implicitly holds the registration. This should be
removed.

Just inline the whole thing and make it clear:

       device = vfio_device_from_file(file);
       if (device)
       	  return device_iommu_capable(device->dev,
                                  IOMMU_CAP_ENFORCE_CACHE_COHERENCY);

Jason

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [RFC 07/12] vfio: Block device access via device fd until device is opened
  2022-12-19  8:47 ` [RFC 07/12] vfio: Block device access via device fd until device is opened Yi Liu
  2022-12-21  4:18   ` Tian, Kevin
@ 2023-01-04 20:47   ` Jason Gunthorpe
  1 sibling, 0 replies; 44+ messages in thread
From: Jason Gunthorpe @ 2023-01-04 20:47 UTC (permalink / raw)
  To: Yi Liu
  Cc: alex.williamson, kevin.tian, cohuck, eric.auger, nicolinc, kvm,
	mjrosato, chao.p.peng, yi.y.sun, peterx, jasowang

On Mon, Dec 19, 2022 at 12:47:13AM -0800, Yi Liu wrote:

> @@ -452,6 +457,11 @@ void vfio_device_close(struct vfio_device_file *df)
>  	struct vfio_device *device = df->device;
>  
>  	mutex_lock(&device->dev_set->lock);
> +	/*
> +	 * Paired with smp_load_acquire() in vfio_device_fops::ioctl/
> +	 * read/write/mmap
> +	 */
> +	smp_store_release(&df->access_granted, false);

Storing false makes no sense, it can't do anything, this function
should only be called if we are in a release function so we know that
there can't be concurrent access to access_granted.

Jason

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [RFC 05/12] kvm/vfio: Accept vfio device file from userspace
  2022-12-19  8:47 ` [RFC 05/12] kvm/vfio: Accept vfio device file from userspace Yi Liu
@ 2023-01-06 14:32   ` Jason Gunthorpe
  2023-01-06 14:46     ` Yi Liu
  0 siblings, 1 reply; 44+ messages in thread
From: Jason Gunthorpe @ 2023-01-06 14:32 UTC (permalink / raw)
  To: Yi Liu
  Cc: alex.williamson, kevin.tian, cohuck, eric.auger, nicolinc, kvm,
	mjrosato, chao.p.peng, yi.y.sun, peterx, jasowang

On Mon, Dec 19, 2022 at 12:47:11AM -0800, Yi Liu wrote:
> This defines KVM_DEV_VFIO_FILE* and make alias with KVM_DEV_VFIO_GROUP*.
> Old userspace uses KVM_DEV_VFIO_GROUP* works as well.

Do we have a circular refcount problem with this plan?

The kvm will hold a ref on the vfio device struct file

Once the vfio device struct file reaches open_device we will hold a
ref on the kvm

At this point if both kvm and vfio device FDs are closed will the
kernel clean it up or does it leak because they both ref each other?

Please test to confirm..

Jason

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [RFC 05/12] kvm/vfio: Accept vfio device file from userspace
  2023-01-06 14:32   ` Jason Gunthorpe
@ 2023-01-06 14:46     ` Yi Liu
  2023-01-06 14:55       ` Jason Gunthorpe
  0 siblings, 1 reply; 44+ messages in thread
From: Yi Liu @ 2023-01-06 14:46 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: alex.williamson, kevin.tian, cohuck, eric.auger, nicolinc, kvm,
	mjrosato, chao.p.peng, yi.y.sun, peterx, jasowang

On 2023/1/6 22:32, Jason Gunthorpe wrote:
> On Mon, Dec 19, 2022 at 12:47:11AM -0800, Yi Liu wrote:
>> This defines KVM_DEV_VFIO_FILE* and make alias with KVM_DEV_VFIO_GROUP*.
>> Old userspace uses KVM_DEV_VFIO_GROUP* works as well.
> 
> Do we have a circular refcount problem with this plan?
> 
> The kvm will hold a ref on the vfio device struct file
> 
> Once the vfio device struct file reaches open_device we will hold a
> ref on the kvm
> 
> At this point if both kvm and vfio device FDs are closed will the
> kernel clean it up or does it leak because they both ref each other?

looks to be a circular. In my past test, seems no apparent issue. But
I'll do a test to confirm it. If this is a problem, it should be an
existing issue. right? Should have same issue with group file.

> Please test to confirm..

will do.

-- 
Regards,
Yi Liu

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [RFC 05/12] kvm/vfio: Accept vfio device file from userspace
  2023-01-06 14:46     ` Yi Liu
@ 2023-01-06 14:55       ` Jason Gunthorpe
  2023-01-06 15:04         ` Yi Liu
  0 siblings, 1 reply; 44+ messages in thread
From: Jason Gunthorpe @ 2023-01-06 14:55 UTC (permalink / raw)
  To: Yi Liu
  Cc: alex.williamson, kevin.tian, cohuck, eric.auger, nicolinc, kvm,
	mjrosato, chao.p.peng, yi.y.sun, peterx, jasowang

On Fri, Jan 06, 2023 at 10:46:56PM +0800, Yi Liu wrote:
> On 2023/1/6 22:32, Jason Gunthorpe wrote:
> > On Mon, Dec 19, 2022 at 12:47:11AM -0800, Yi Liu wrote:
> > > This defines KVM_DEV_VFIO_FILE* and make alias with KVM_DEV_VFIO_GROUP*.
> > > Old userspace uses KVM_DEV_VFIO_GROUP* works as well.
> > 
> > Do we have a circular refcount problem with this plan?
> > 
> > The kvm will hold a ref on the vfio device struct file
> > 
> > Once the vfio device struct file reaches open_device we will hold a
> > ref on the kvm
> > 
> > At this point if both kvm and vfio device FDs are closed will the
> > kernel clean it up or does it leak because they both ref each other?
> 
> looks to be a circular. In my past test, seems no apparent issue. But
> I'll do a test to confirm it. If this is a problem, it should be an
> existing issue. right? Should have same issue with group file.

The group is probably fine since the device struct file will not have
any reference it will close which will release the kvm and then the
group.

> > Please test to confirm..
> 
> will do.

Probably kvm needs to put back the VFIO file reference when its own
struct file closes, not when when the kvm->users_count reaches 0.

This will allow the VFIO device file to close and drop the users_count

Jason

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [RFC 05/12] kvm/vfio: Accept vfio device file from userspace
  2023-01-06 14:55       ` Jason Gunthorpe
@ 2023-01-06 15:04         ` Yi Liu
  2023-01-06 16:08           ` Jason Gunthorpe
  2023-01-09  4:17           ` Tian, Kevin
  0 siblings, 2 replies; 44+ messages in thread
From: Yi Liu @ 2023-01-06 15:04 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: alex.williamson, kevin.tian, cohuck, eric.auger, nicolinc, kvm,
	mjrosato, chao.p.peng, yi.y.sun, peterx, jasowang

On 2023/1/6 22:55, Jason Gunthorpe wrote:
> On Fri, Jan 06, 2023 at 10:46:56PM +0800, Yi Liu wrote:
>> On 2023/1/6 22:32, Jason Gunthorpe wrote:
>>> On Mon, Dec 19, 2022 at 12:47:11AM -0800, Yi Liu wrote:
>>>> This defines KVM_DEV_VFIO_FILE* and make alias with KVM_DEV_VFIO_GROUP*.
>>>> Old userspace uses KVM_DEV_VFIO_GROUP* works as well.
>>>
>>> Do we have a circular refcount problem with this plan?
>>>
>>> The kvm will hold a ref on the vfio device struct file
>>>
>>> Once the vfio device struct file reaches open_device we will hold a
>>> ref on the kvm
>>>
>>> At this point if both kvm and vfio device FDs are closed will the
>>> kernel clean it up or does it leak because they both ref each other?
>>
>> looks to be a circular. In my past test, seems no apparent issue. But
>> I'll do a test to confirm it. If this is a problem, it should be an
>> existing issue. right? Should have same issue with group file.
> 
> The group is probably fine since the device struct file will not have
> any reference it will close which will release the kvm and then the
> group.

you are right.

> 
>>> Please test to confirm..
>>
>> will do.
> 
> Probably kvm needs to put back the VFIO file reference when its own
> struct file closes, not when when the kvm->users_count reaches 0.

yes. Seems no need to hold device file reference until las kvm->user_count.
At least no such need per my understanding.

> 
> This will allow the VFIO device file to close and drop the users_count

yeah. It's interesting I haven't hit real problem so far. But this does
look to be a circular. When I ctrl+c to kill qemu, I can boot qemu again
with the same device assigned. anyhow, let me add some prtink to check
it. thanks for the catch.

-- 
Regards,
Yi Liu

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [RFC 05/12] kvm/vfio: Accept vfio device file from userspace
  2023-01-06 15:04         ` Yi Liu
@ 2023-01-06 16:08           ` Jason Gunthorpe
  2023-01-09  4:17           ` Tian, Kevin
  1 sibling, 0 replies; 44+ messages in thread
From: Jason Gunthorpe @ 2023-01-06 16:08 UTC (permalink / raw)
  To: Yi Liu
  Cc: alex.williamson, kevin.tian, cohuck, eric.auger, nicolinc, kvm,
	mjrosato, chao.p.peng, yi.y.sun, peterx, jasowang

On Fri, Jan 06, 2023 at 11:04:42PM +0800, Yi Liu wrote:

> > This will allow the VFIO device file to close and drop the users_count
> 
> yeah. It's interesting I haven't hit real problem so far. But this does
> look to be a circular. When I ctrl+c to kill qemu, I can boot qemu again
> with the same device assigned. anyhow, let me add some prtink to check
> it. thanks for the catch.

You have to test with one of the situations that hold the kvm ref
during open, eg gvt on intel

To make it happen hack the vfio code to hold the reference at
open_device and let it go during close_device, then you should see the
problem.

Jason

^ permalink raw reply	[flat|nested] 44+ messages in thread

* RE: [RFC 03/12] vfio: Accept vfio device file in the driver facing kAPI
  2023-01-04 18:25     ` Jason Gunthorpe
@ 2023-01-09  4:13       ` Tian, Kevin
  0 siblings, 0 replies; 44+ messages in thread
From: Tian, Kevin @ 2023-01-09  4:13 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Liu, Yi L, alex.williamson, cohuck, eric.auger, nicolinc, kvm,
	mjrosato, chao.p.peng, yi.y.sun, peterx, jasowang

> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Thursday, January 5, 2023 2:25 AM
> 
> On Wed, Dec 21, 2022 at 04:07:42AM +0000, Tian, Kevin wrote:
> > > From: Liu, Yi L <yi.l.liu@intel.com>
> > > Sent: Monday, December 19, 2022 4:47 PM
> > >
> > > +static bool vfio_device_enforced_coherent(struct vfio_device *device)
> > > +{
> > > +	bool ret;
> > > +
> > > +	if (!vfio_device_try_get_registration(device))
> > > +		return true;
> > > +
> > > +	ret = device_iommu_capable(device->dev,
> > > +
> > > IOMMU_CAP_ENFORCE_CACHE_COHERENCY);
> > > +
> > > +	vfio_device_put_registration(device);
> > > +	return ret;
> > > +}
> >
> > This probably needs an explanation that recounting is required because
> > this might be called before vfio_device_open() is called to hold the
> > count.
> 
> How does that happen? The only call site already has a struct file and
> the struct file implicitly holds the registration. This should be
> removed.
> 

You are right.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* RE: [RFC 05/12] kvm/vfio: Accept vfio device file from userspace
  2023-01-06 15:04         ` Yi Liu
  2023-01-06 16:08           ` Jason Gunthorpe
@ 2023-01-09  4:17           ` Tian, Kevin
  2023-01-09  4:26             ` Yi Liu
  1 sibling, 1 reply; 44+ messages in thread
From: Tian, Kevin @ 2023-01-09  4:17 UTC (permalink / raw)
  To: Liu, Yi L, Jason Gunthorpe
  Cc: alex.williamson, cohuck, eric.auger, nicolinc, kvm, mjrosato,
	chao.p.peng, yi.y.sun, peterx, jasowang

> From: Liu, Yi L <yi.l.liu@intel.com>
> Sent: Friday, January 6, 2023 11:05 PM
> 
> >
> > Probably kvm needs to put back the VFIO file reference when its own
> > struct file closes, not when when the kvm->users_count reaches 0.
> 
> yes. Seems no need to hold device file reference until las kvm->user_count.
> At least no such need per my understanding.
> 

looks just replacing .destroy() with .release() in kvm_vfio_ops...

        /*
         * Destroy is responsible for freeing dev.
         *
         * Destroy may be called before or after destructors are called
         * on emulated I/O regions, depending on whether a reference is
         * held by a vcpu or other kvm component that gets destroyed
         * after the emulated I/O.
         */
        void (*destroy)(struct kvm_device *dev);

        /*
         * Release is an alternative method to free the device. It is
         * called when the device file descriptor is closed. Once
         * release is called, the destroy method will not be called
         * anymore as the device is removed from the device list of
         * the VM. kvm->lock is held.
         */
        void (*release)(struct kvm_device *dev);

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [RFC 05/12] kvm/vfio: Accept vfio device file from userspace
  2023-01-09  4:17           ` Tian, Kevin
@ 2023-01-09  4:26             ` Yi Liu
  0 siblings, 0 replies; 44+ messages in thread
From: Yi Liu @ 2023-01-09  4:26 UTC (permalink / raw)
  To: Tian, Kevin, Jason Gunthorpe
  Cc: alex.williamson, cohuck, eric.auger, nicolinc, kvm, mjrosato,
	chao.p.peng, yi.y.sun, peterx, jasowang

On 2023/1/9 12:17, Tian, Kevin wrote:
>> From: Liu, Yi L <yi.l.liu@intel.com>
>> Sent: Friday, January 6, 2023 11:05 PM
>>
>>>
>>> Probably kvm needs to put back the VFIO file reference when its own
>>> struct file closes, not when when the kvm->users_count reaches 0.
>>
>> yes. Seems no need to hold device file reference until las kvm->user_count.
>> At least no such need per my understanding.
>>
> 
> looks just replacing .destroy() with .release() in kvm_vfio_ops...

yeah, this seems to be the easiest way. Let kvm_vfio_ops implement the 
.release() op. Other kvm_devices won't be affected.

>          /*
>           * Destroy is responsible for freeing dev.
>           *
>           * Destroy may be called before or after destructors are called
>           * on emulated I/O regions, depending on whether a reference is
>           * held by a vcpu or other kvm component that gets destroyed
>           * after the emulated I/O.
>           */
>          void (*destroy)(struct kvm_device *dev);
> 
>          /*
>           * Release is an alternative method to free the device. It is
>           * called when the device file descriptor is closed. Once
>           * release is called, the destroy method will not be called
>           * anymore as the device is removed from the device list of
>           * the VM. kvm->lock is held.
>           */
>          void (*release)(struct kvm_device *dev);

-- 
Regards,
Yi Liu

^ permalink raw reply	[flat|nested] 44+ messages in thread

* RE: [RFC 08/12] vfio: Add infrastructure for bind_iommufd and attach
  2022-12-19  8:47 ` [RFC 08/12] vfio: Add infrastructure for bind_iommufd and attach Yi Liu
@ 2023-01-09  5:46   ` Tian, Kevin
  2023-01-09 13:21     ` Jason Gunthorpe
  0 siblings, 1 reply; 44+ messages in thread
From: Tian, Kevin @ 2023-01-09  5:46 UTC (permalink / raw)
  To: Liu, Yi L, alex.williamson, jgg
  Cc: cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.y.sun, peterx, jasowang

> From: Liu, Yi L <yi.l.liu@intel.com>
> Sent: Monday, December 19, 2022 4:47 PM
> 
> This prepares to add ioctls for device cdev fd. This infrastructure includes:
>     - vfio_iommufd_bind() to accept pt_id, and also return back dev_id to
> caller.
>     - vfio_iommufd_attach() to support iommufd pgtable attach after
> bind_iommufd.

Please mention that pt_id==-1 implies detach.

> 
> -int vfio_iommufd_bind(struct vfio_device *vdev, struct iommufd_ctx *ictx)
> +/* @pt_id == NULL impplies detach */

s/impplies/implies/


^ permalink raw reply	[flat|nested] 44+ messages in thread

* RE: [RFC 09/12] vfio: Make vfio_device_open() exclusive between group path and device cdev path
  2022-12-19  8:47 ` [RFC 09/12] vfio: Make vfio_device_open() exclusive between group path and device cdev path Yi Liu
@ 2023-01-09  6:03   ` Tian, Kevin
  0 siblings, 0 replies; 44+ messages in thread
From: Tian, Kevin @ 2023-01-09  6:03 UTC (permalink / raw)
  To: Liu, Yi L, alex.williamson, jgg
  Cc: cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.y.sun, peterx, jasowang

> From: Liu, Yi L <yi.l.liu@intel.com>
> Sent: Monday, December 19, 2022 4:47 PM
> 
> VFIO group has historically allowed multi-open of the device FD. This
> was made secure because the "open" was executed via an ioctl to the
> group FD which is itself only single open.
> 
> No know use of multiple device FDs is known. It is kind of a strange
> thing to do because new device FDs can naturally be created via dup().
> 
> When we implement the new device uAPI there is no natural way to allow
> the device itself from being multi-opened in a secure manner. Without
> the group FD we cannot prove the security context of the opener.
> 
> Thus, when moving to the new uAPI we block the ability to multi-open
> the device. This also makes the cdev path exclusive with group path.

also highlight that the new scheme needs to sustain both the
legacy behavior i.e. multi-open in the group path and the new
behavior i.e. single-open in the cdev path. This mixture leads to
the introduction of a new single_open flag stored both in
vfio_device_file and vfio_device.


^ permalink raw reply	[flat|nested] 44+ messages in thread

* RE: [RFC 10/12] vfio: Add cdev for vfio_device
  2022-12-19  8:47 ` [RFC 10/12] vfio: Add cdev for vfio_device Yi Liu
@ 2023-01-09  6:54   ` Tian, Kevin
  0 siblings, 0 replies; 44+ messages in thread
From: Tian, Kevin @ 2023-01-09  6:54 UTC (permalink / raw)
  To: Liu, Yi L, alex.williamson, jgg
  Cc: cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.y.sun, peterx, jasowang, Martins, Joao

> From: Liu, Yi L <yi.l.liu@intel.com>
> Sent: Monday, December 19, 2022 4:47 PM
> @@ -156,7 +159,7 @@ static void vfio_device_release(struct device *dev)
>  			container_of(dev, struct vfio_device, device);
> 
>  	vfio_release_device_set(device);
> -	ida_free(&vfio.device_ida, device->index);
> +	ida_free(&vfio.device_ida, MINOR(device->device.devt));

device->index is required if !CONFIG_IOMMUFD.

> @@ -234,11 +238,18 @@ int vfio_init_device(struct vfio_device *device,
> struct device *dev,
>  	device->device.release = vfio_device_release;
>  	device->device.class = vfio.device_class;
>  	device->device.parent = device->dev;
> +#if IS_ENABLED(CONFIG_IOMMUFD)
> +	device->device.devt = MKDEV(MAJOR(vfio.device_devt), minor);
> +	cdev_init(&device->cdev, &vfio_device_fops);
> +	device->cdev.owner = THIS_MODULE;
> +#else
> +	device->index = ret;

this should be 'minor' since 'ret' has been overwritten by ops->init().

> +	/*
> +	 * vfio device open is done in BIND_IOMMUFD for cdev, before
> +	 * that, device access is blocked for this cdev open.
> +	 */

/* device access is blocked until .open_device() is called in BIND_IOMMUFD */


^ permalink raw reply	[flat|nested] 44+ messages in thread

* RE: [RFC 11/12] vfio: Add ioctls for device cdev iommufd
  2022-12-19  8:47 ` [RFC 11/12] vfio: Add ioctls for device cdev iommufd Yi Liu
@ 2023-01-09  7:47   ` Tian, Kevin
  2023-01-09 14:14     ` Jason Gunthorpe
  2023-01-09 14:55     ` Yi Liu
  0 siblings, 2 replies; 44+ messages in thread
From: Tian, Kevin @ 2023-01-09  7:47 UTC (permalink / raw)
  To: Liu, Yi L, alex.williamson, jgg
  Cc: cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.y.sun, peterx, jasowang

> From: Liu, Yi L <yi.l.liu@intel.com>
> Sent: Monday, December 19, 2022 4:47 PM
> 
> @@ -415,7 +416,7 @@ static int vfio_device_first_open(struct
> vfio_device_file *df,
>  	if (!try_module_get(device->dev->driver->owner))
>  		return -ENODEV;
> 
> -	if (iommufd)
> +	if (iommufd && !IS_ERR(iommufd))
>  		ret = vfio_iommufd_bind(device, iommufd, dev_id, pt_id);
>  	else
>  		ret = vfio_device_group_use_iommu(device);

can you elaborate how noiommu actually works in the cdev path?

I'm a bit lost here.

> @@ -592,6 +600,8 @@ static int vfio_device_fops_release(struct inode
> *inode, struct file *filep)
>  	 */
>  	if (!df->single_open)
>  		vfio_device_group_close(df);
> +	else
> +		vfio_device_close(df);
>  	kfree(df);
>  	vfio_device_put_registration(device);

belong to last patch?

> +	mutex_lock(&device->dev_set->lock);
> +	/* Paired with smp_store_release() in vfio_device_open/close() */
> +	access = smp_load_acquire(&df->access_granted);
> +	if (access) {
> +		ret = -EINVAL;
> +		goto out_unlock;
> +	}

Not sure it's required. The lock is already held then just checking
df->iommufd should be sufficient.

> +	mutex_lock(&device->dev_set->lock);
> +	pt_id = attach.pt_id;
> +	ret = vfio_iommufd_attach(device,
> +				  pt_id != IOMMUFD_INVALID_ID ? &pt_id :
> NULL);
> +	if (ret)
> +		goto out_unlock;
> +
> +	if (pt_id != IOMMUFD_INVALID_ID) {

it's clearer to use an 'attach' local variable

> diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h
> index 98ebba80cfa1..87680274c01b 100644
> --- a/include/uapi/linux/iommufd.h
> +++ b/include/uapi/linux/iommufd.h
> @@ -9,6 +9,8 @@
> 
>  #define IOMMUFD_TYPE (';')
> 
> +#define IOMMUFD_INVALID_ID 0  /* valid ID starts from 1 */

Can you point out where valid IDs are guaranteed to start
from 1?

According to _iommufd_object_alloc() it uses xa_limit_32b as
limit which includes '0' as the lowest ID

> +/*
> + * VFIO_DEVICE_BIND_IOMMUFD - _IOR(VFIO_TYPE, VFIO_BASE + 19,
> + *				   struct vfio_device_bind_iommufd)
> + *
> + * Bind a vfio_device to the specified iommufd and an ioas or a hardware
> + * page table.

this is stale. BIND now is only about bind. No ioas.

> + * VFIO_DEVICE_ATTACH_IOMMUFD_PT - _IOW(VFIO_TYPE, VFIO_BASE +
> 20,
> + *					struct
> vfio_device_attach_iommufd_pt)
> + *
> + * Attach a vfio device to an iommufd address space specified by IOAS
> + * id or hardware page table id.
> + *
> + * Available only after a device has been bound to iommufd via
> + * VFIO_DEVICE_BIND_IOMMUFD
> + *
> + * Undo by passing pt_id == IOMMUFD_INVALID_ID
> + *
> + * @argsz:	user filled size of this data.
> + * @flags:	must be 0.
> + * @pt_id:	Input the target id, can be an ioas or a hwpt allocated
> + *		via iommufd subsystem, and output the attached pt_id. It
> + *		be the ioas, hwpt itself or an hwpt created by kernel
> + *		during the attachment.

Input the target id which can represent an ioas or a hwpt allocated
via iommufd subsystem. Output the attached hwpt id which could
be the specified hwpt itself or a hwpt automatically created for the
specified ioas by kernel during the attachment.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [RFC 08/12] vfio: Add infrastructure for bind_iommufd and attach
  2023-01-09  5:46   ` Tian, Kevin
@ 2023-01-09 13:21     ` Jason Gunthorpe
  2023-01-10  2:53       ` Tian, Kevin
  0 siblings, 1 reply; 44+ messages in thread
From: Jason Gunthorpe @ 2023-01-09 13:21 UTC (permalink / raw)
  To: Tian, Kevin
  Cc: Liu, Yi L, alex.williamson, cohuck, eric.auger, nicolinc, kvm,
	mjrosato, chao.p.peng, yi.y.sun, peterx, jasowang

On Mon, Jan 09, 2023 at 05:46:04AM +0000, Tian, Kevin wrote:
> > From: Liu, Yi L <yi.l.liu@intel.com>
> > Sent: Monday, December 19, 2022 4:47 PM
> > 
> > This prepares to add ioctls for device cdev fd. This infrastructure includes:
> >     - vfio_iommufd_bind() to accept pt_id, and also return back dev_id to
> > caller.
> >     - vfio_iommufd_attach() to support iommufd pgtable attach after
> > bind_iommufd.
> 
> Please mention that pt_id==-1 implies detach.

Oh, do we want that or a dedicated ioctl? -1 isn't a u32..

Jason

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [RFC 11/12] vfio: Add ioctls for device cdev iommufd
  2023-01-09  7:47   ` Tian, Kevin
@ 2023-01-09 14:14     ` Jason Gunthorpe
  2023-01-09 15:07       ` Yi Liu
  2023-01-09 14:55     ` Yi Liu
  1 sibling, 1 reply; 44+ messages in thread
From: Jason Gunthorpe @ 2023-01-09 14:14 UTC (permalink / raw)
  To: Tian, Kevin
  Cc: Liu, Yi L, alex.williamson, cohuck, eric.auger, nicolinc, kvm,
	mjrosato, chao.p.peng, yi.y.sun, peterx, jasowang

On Mon, Jan 09, 2023 at 07:47:03AM +0000, Tian, Kevin wrote:
> > From: Liu, Yi L <yi.l.liu@intel.com>
> > Sent: Monday, December 19, 2022 4:47 PM
> > 
> > @@ -415,7 +416,7 @@ static int vfio_device_first_open(struct
> > vfio_device_file *df,
> >  	if (!try_module_get(device->dev->driver->owner))
> >  		return -ENODEV;
> > 
> > -	if (iommufd)
> > +	if (iommufd && !IS_ERR(iommufd))
> >  		ret = vfio_iommufd_bind(device, iommufd, dev_id, pt_id);
> >  	else
> >  		ret = vfio_device_group_use_iommu(device);
> 
> can you elaborate how noiommu actually works in the cdev path?

I still need someone to test the noiommu iommufd patch with dpdk, I'll
post it in a minute

For cdev conversion no-iommu should be triggered by passing in -1 for
the iommufd file descriptor to indicate there is no translation. Then
the module parameter and security checks should be done before
allowing the open to succeed with an identity translation in place.

There should be a check that there is no iommu driver controlling the
device as well..

Jason

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [RFC 11/12] vfio: Add ioctls for device cdev iommufd
  2023-01-09  7:47   ` Tian, Kevin
  2023-01-09 14:14     ` Jason Gunthorpe
@ 2023-01-09 14:55     ` Yi Liu
  2023-01-10  2:57       ` Tian, Kevin
  1 sibling, 1 reply; 44+ messages in thread
From: Yi Liu @ 2023-01-09 14:55 UTC (permalink / raw)
  To: Tian, Kevin, alex.williamson, jgg
  Cc: cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.y.sun, peterx, jasowang

On 2023/1/9 15:47, Tian, Kevin wrote:
>> From: Liu, Yi L <yi.l.liu@intel.com>
>> Sent: Monday, December 19, 2022 4:47 PM
>>
>> @@ -415,7 +416,7 @@ static int vfio_device_first_open(struct
>> vfio_device_file *df,
>>   	if (!try_module_get(device->dev->driver->owner))
>>   		return -ENODEV;
>>
>> -	if (iommufd)
>> +	if (iommufd && !IS_ERR(iommufd))
>>   		ret = vfio_iommufd_bind(device, iommufd, dev_id, pt_id);
>>   	else
>>   		ret = vfio_device_group_use_iommu(device);
> 
> can you elaborate how noiommu actually works in the cdev path?
> 
> I'm a bit lost here.
> 
>> @@ -592,6 +600,8 @@ static int vfio_device_fops_release(struct inode
>> *inode, struct file *filep)
>>   	 */
>>   	if (!df->single_open)
>>   		vfio_device_group_close(df);
>> +	else
>> +		vfio_device_close(df);
>>   	kfree(df);
>>   	vfio_device_put_registration(device);
> 
> belong to last patch?

not really. In last patch, it only adds the cdev, but no ioctls.
Only when the BIND_IOMMUFD is added, should the df->single_open
possible be true. So I added it in this patch instead of last one.

> 
>> +	mutex_lock(&device->dev_set->lock);
>> +	/* Paired with smp_store_release() in vfio_device_open/close() */
>> +	access = smp_load_acquire(&df->access_granted);
>> +	if (access) {
>> +		ret = -EINVAL;
>> +		goto out_unlock;
>> +	}
> 
> Not sure it's required. The lock is already held then just checking
> df->iommufd should be sufficient.

you are right. check df->iommufd is enough.

>> +	mutex_lock(&device->dev_set->lock);
>> +	pt_id = attach.pt_id;
>> +	ret = vfio_iommufd_attach(device,
>> +				  pt_id != IOMMUFD_INVALID_ID ? &pt_id :
>> NULL);
>> +	if (ret)
>> +		goto out_unlock;
>> +
>> +	if (pt_id != IOMMUFD_INVALID_ID) {
> 
> it's clearer to use an 'attach' local variable

not quit get. We already have 'attach' in above lines.:-)

>> diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h
>> index 98ebba80cfa1..87680274c01b 100644
>> --- a/include/uapi/linux/iommufd.h
>> +++ b/include/uapi/linux/iommufd.h
>> @@ -9,6 +9,8 @@
>>
>>   #define IOMMUFD_TYPE (';')
>>
>> +#define IOMMUFD_INVALID_ID 0  /* valid ID starts from 1 */
> 
> Can you point out where valid IDs are guaranteed to start
> from 1?
> 
> According to _iommufd_object_alloc() it uses xa_limit_32b as
> limit which includes '0' as the lowest ID

xa_init_flags(&ictx->objects, XA_FLAGS_ALLOC1 | XA_FLAGS_ACCOUNT);

yes, but the xarray init uses XA_FLAGS_ALLOC1, and it means to allocate
ID from 1.

/* ALLOC is for a normal 0-based alloc.  ALLOC1 is for an 1-based alloc */

>> +/*
>> + * VFIO_DEVICE_BIND_IOMMUFD - _IOR(VFIO_TYPE, VFIO_BASE + 19,
>> + *				   struct vfio_device_bind_iommufd)
>> + *
>> + * Bind a vfio_device to the specified iommufd and an ioas or a hardware
>> + * page table.
> 
> this is stale. BIND now is only about bind. No ioas.

Oops, yes.

>> + * VFIO_DEVICE_ATTACH_IOMMUFD_PT - _IOW(VFIO_TYPE, VFIO_BASE +
>> 20,
>> + *					struct
>> vfio_device_attach_iommufd_pt)
>> + *
>> + * Attach a vfio device to an iommufd address space specified by IOAS
>> + * id or hardware page table id.
>> + *
>> + * Available only after a device has been bound to iommufd via
>> + * VFIO_DEVICE_BIND_IOMMUFD
>> + *
>> + * Undo by passing pt_id == IOMMUFD_INVALID_ID
>> + *
>> + * @argsz:	user filled size of this data.
>> + * @flags:	must be 0.
>> + * @pt_id:	Input the target id, can be an ioas or a hwpt allocated
>> + *		via iommufd subsystem, and output the attached pt_id. It
>> + *		be the ioas, hwpt itself or an hwpt created by kernel
>> + *		during the attachment.
> 
> Input the target id which can represent an ioas or a hwpt allocated
> via iommufd subsystem. Output the attached hwpt id which could
> be the specified hwpt itself or a hwpt automatically created for the
> specified ioas by kernel during the attachment.

got it.

-- 
Regards,
Yi Liu

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [RFC 11/12] vfio: Add ioctls for device cdev iommufd
  2023-01-09 14:14     ` Jason Gunthorpe
@ 2023-01-09 15:07       ` Yi Liu
  2023-01-09 15:12         ` Jason Gunthorpe
  0 siblings, 1 reply; 44+ messages in thread
From: Yi Liu @ 2023-01-09 15:07 UTC (permalink / raw)
  To: Jason Gunthorpe, Tian, Kevin
  Cc: alex.williamson, cohuck, eric.auger, nicolinc, kvm, mjrosato,
	chao.p.peng, yi.y.sun, peterx, jasowang

On 2023/1/9 22:14, Jason Gunthorpe wrote:
> On Mon, Jan 09, 2023 at 07:47:03AM +0000, Tian, Kevin wrote:
>>> From: Liu, Yi L <yi.l.liu@intel.com>
>>> Sent: Monday, December 19, 2022 4:47 PM
>>>
>>> @@ -415,7 +416,7 @@ static int vfio_device_first_open(struct
>>> vfio_device_file *df,
>>>   	if (!try_module_get(device->dev->driver->owner))
>>>   		return -ENODEV;
>>>
>>> -	if (iommufd)
>>> +	if (iommufd && !IS_ERR(iommufd))
>>>   		ret = vfio_iommufd_bind(device, iommufd, dev_id, pt_id);
>>>   	else
>>>   		ret = vfio_device_group_use_iommu(device);
>>
>> can you elaborate how noiommu actually works in the cdev path?
> 
> I still need someone to test the noiommu iommufd patch with dpdk, I'll
> post it in a minute

I remembered I had mentioned I found one guy to help. But he mentioned
to me he has got some trouble. I think it may be due to environment. Let
me check with him tomorrow about if any update.

> For cdev conversion no-iommu should be triggered by passing in -1 for
> the iommufd file descriptor to indicate there is no translation. Then
> the module parameter and security checks should be done before
> allowing the open to succeed with an identity translation in place.
> 
> There should be a check that there is no iommu driver controlling the
> device as well..

yes. I used ERR_PTR(-EINVAL) as an indicator of noiommu in this patch.
I admit this logic is incorrect. Should be

	if (iommufd) {
		if (IS_ERR(iommufd))
			ret = 0;
   		else
			ret = vfio_iommufd_bind(device, iommufd, dev_id, pt_id);
    	} else {
   		ret = vfio_device_group_use_iommu(device);
	}

-- 
Regards,
Yi Liu

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [RFC 11/12] vfio: Add ioctls for device cdev iommufd
  2023-01-09 15:07       ` Yi Liu
@ 2023-01-09 15:12         ` Jason Gunthorpe
  2023-01-09 15:20           ` Yi Liu
  0 siblings, 1 reply; 44+ messages in thread
From: Jason Gunthorpe @ 2023-01-09 15:12 UTC (permalink / raw)
  To: Yi Liu
  Cc: Tian, Kevin, alex.williamson, cohuck, eric.auger, nicolinc, kvm,
	mjrosato, chao.p.peng, yi.y.sun, peterx, jasowang

On Mon, Jan 09, 2023 at 11:07:06PM +0800, Yi Liu wrote:
> On 2023/1/9 22:14, Jason Gunthorpe wrote:
> > On Mon, Jan 09, 2023 at 07:47:03AM +0000, Tian, Kevin wrote:
> > > > From: Liu, Yi L <yi.l.liu@intel.com>
> > > > Sent: Monday, December 19, 2022 4:47 PM
> > > > 
> > > > @@ -415,7 +416,7 @@ static int vfio_device_first_open(struct
> > > > vfio_device_file *df,
> > > >   	if (!try_module_get(device->dev->driver->owner))
> > > >   		return -ENODEV;
> > > > 
> > > > -	if (iommufd)
> > > > +	if (iommufd && !IS_ERR(iommufd))
> > > >   		ret = vfio_iommufd_bind(device, iommufd, dev_id, pt_id);
> > > >   	else
> > > >   		ret = vfio_device_group_use_iommu(device);
> > > 
> > > can you elaborate how noiommu actually works in the cdev path?
> > 
> > I still need someone to test the noiommu iommufd patch with dpdk, I'll
> > post it in a minute
> 
> I remembered I had mentioned I found one guy to help. But he mentioned
> to me he has got some trouble. I think it may be due to environment. Let
> me check with him tomorrow about if any update.
> 
> > For cdev conversion no-iommu should be triggered by passing in -1 for
> > the iommufd file descriptor to indicate there is no translation. Then
> > the module parameter and security checks should be done before
> > allowing the open to succeed with an identity translation in place.
> > 
> > There should be a check that there is no iommu driver controlling the
> > device as well..
> 
> yes. I used ERR_PTR(-EINVAL) as an indicator of noiommu in this patch.
> I admit this logic is incorrect. Should be

Oh please don't store ERR_PTRs in a structs..

Jason

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [RFC 11/12] vfio: Add ioctls for device cdev iommufd
  2023-01-09 15:12         ` Jason Gunthorpe
@ 2023-01-09 15:20           ` Yi Liu
  0 siblings, 0 replies; 44+ messages in thread
From: Yi Liu @ 2023-01-09 15:20 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Tian, Kevin, alex.williamson, cohuck, eric.auger, nicolinc, kvm,
	mjrosato, chao.p.peng, yi.y.sun, peterx, jasowang

On 2023/1/9 23:12, Jason Gunthorpe wrote:
> On Mon, Jan 09, 2023 at 11:07:06PM +0800, Yi Liu wrote:
>> On 2023/1/9 22:14, Jason Gunthorpe wrote:
>>> On Mon, Jan 09, 2023 at 07:47:03AM +0000, Tian, Kevin wrote:
>>>>> From: Liu, Yi L <yi.l.liu@intel.com>
>>>>> Sent: Monday, December 19, 2022 4:47 PM
>>>>>
>>>>> @@ -415,7 +416,7 @@ static int vfio_device_first_open(struct
>>>>> vfio_device_file *df,
>>>>>    	if (!try_module_get(device->dev->driver->owner))
>>>>>    		return -ENODEV;
>>>>>
>>>>> -	if (iommufd)
>>>>> +	if (iommufd && !IS_ERR(iommufd))
>>>>>    		ret = vfio_iommufd_bind(device, iommufd, dev_id, pt_id);
>>>>>    	else
>>>>>    		ret = vfio_device_group_use_iommu(device);
>>>>
>>>> can you elaborate how noiommu actually works in the cdev path?
>>>
>>> I still need someone to test the noiommu iommufd patch with dpdk, I'll
>>> post it in a minute
>>
>> I remembered I had mentioned I found one guy to help. But he mentioned
>> to me he has got some trouble. I think it may be due to environment. Let
>> me check with him tomorrow about if any update.
>>
>>> For cdev conversion no-iommu should be triggered by passing in -1 for
>>> the iommufd file descriptor to indicate there is no translation. Then
>>> the module parameter and security checks should be done before
>>> allowing the open to succeed with an identity translation in place.
>>>
>>> There should be a check that there is no iommu driver controlling the
>>> device as well..
>>
>> yes. I used ERR_PTR(-EINVAL) as an indicator of noiommu in this patch.
>> I admit this logic is incorrect. Should be
> 
> Oh please don't store ERR_PTRs in a structs..

okay. I need do it in other way. At first iommufd tells iommufd path
from the legacy group path. now, this is not enough. may need another
parameter to differ the iommufd normal path, iommufd noiommu path and
the legacy group path.

-- 
Regards,
Yi Liu

^ permalink raw reply	[flat|nested] 44+ messages in thread

* RE: [RFC 08/12] vfio: Add infrastructure for bind_iommufd and attach
  2023-01-09 13:21     ` Jason Gunthorpe
@ 2023-01-10  2:53       ` Tian, Kevin
  0 siblings, 0 replies; 44+ messages in thread
From: Tian, Kevin @ 2023-01-10  2:53 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Liu, Yi L, alex.williamson, cohuck, eric.auger, nicolinc, kvm,
	mjrosato, chao.p.peng, yi.y.sun, peterx, jasowang

> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Monday, January 9, 2023 9:21 PM
> 
> On Mon, Jan 09, 2023 at 05:46:04AM +0000, Tian, Kevin wrote:
> > > From: Liu, Yi L <yi.l.liu@intel.com>
> > > Sent: Monday, December 19, 2022 4:47 PM
> > >
> > > This prepares to add ioctls for device cdev fd. This infrastructure includes:
> > >     - vfio_iommufd_bind() to accept pt_id, and also return back dev_id to
> > > caller.
> > >     - vfio_iommufd_attach() to support iommufd pgtable attach after
> > > bind_iommufd.
> >
> > Please mention that pt_id==-1 implies detach.
> 
> Oh, do we want that or a dedicated ioctl? -1 isn't a u32..
> 

looks my comment was incorrect. 'detach' is marked by pt_id==NULL
in vfio_iommufd_attach() while pt_id==0 in vfio ioctl. In the latter case
u32 is fine.

aside from that I don't have a strong preference on a dedicated ioctl.

would like to hear Alex's opinion.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* RE: [RFC 11/12] vfio: Add ioctls for device cdev iommufd
  2023-01-09 14:55     ` Yi Liu
@ 2023-01-10  2:57       ` Tian, Kevin
  0 siblings, 0 replies; 44+ messages in thread
From: Tian, Kevin @ 2023-01-10  2:57 UTC (permalink / raw)
  To: Liu, Yi L, alex.williamson, jgg
  Cc: cohuck, eric.auger, nicolinc, kvm, mjrosato, chao.p.peng,
	yi.y.sun, peterx, jasowang

> From: Liu, Yi L <yi.l.liu@intel.com>
> Sent: Monday, January 9, 2023 10:55 PM
> 
> >> +	mutex_lock(&device->dev_set->lock);
> >> +	pt_id = attach.pt_id;
> >> +	ret = vfio_iommufd_attach(device,
> >> +				  pt_id != IOMMUFD_INVALID_ID ? &pt_id :
> >> NULL);
> >> +	if (ret)
> >> +		goto out_unlock;
> >> +
> >> +	if (pt_id != IOMMUFD_INVALID_ID) {
> >
> > it's clearer to use an 'attach' local variable
> 
> not quit get. We already have 'attach' in above lines.:-)

Probably a different name. I just meant a variable which is
more descriptive than above condition check (and duplicated
in two places)

> 
> >> diff --git a/include/uapi/linux/iommufd.h
> b/include/uapi/linux/iommufd.h
> >> index 98ebba80cfa1..87680274c01b 100644
> >> --- a/include/uapi/linux/iommufd.h
> >> +++ b/include/uapi/linux/iommufd.h
> >> @@ -9,6 +9,8 @@
> >>
> >>   #define IOMMUFD_TYPE (';')
> >>
> >> +#define IOMMUFD_INVALID_ID 0  /* valid ID starts from 1 */
> >
> > Can you point out where valid IDs are guaranteed to start
> > from 1?
> >
> > According to _iommufd_object_alloc() it uses xa_limit_32b as
> > limit which includes '0' as the lowest ID
> 
> xa_init_flags(&ictx->objects, XA_FLAGS_ALLOC1 | XA_FLAGS_ACCOUNT);
> 
> yes, but the xarray init uses XA_FLAGS_ALLOC1, and it means to allocate
> ID from 1.
> 
> /* ALLOC is for a normal 0-based alloc.  ALLOC1 is for an 1-based alloc */

You are right. I overlooked that flag.


^ permalink raw reply	[flat|nested] 44+ messages in thread

end of thread, other threads:[~2023-01-10  2:57 UTC | newest]

Thread overview: 44+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-12-19  8:47 [RFC 00/12] Add vfio_device cdev for iommufd support Yi Liu
2022-12-19  8:47 ` [RFC 01/12] vfio: Allocate per device file structure Yi Liu
2022-12-21  3:57   ` Tian, Kevin
2022-12-21  6:46     ` Yi Liu
2022-12-19  8:47 ` [RFC 02/12] vfio: Refine vfio file kAPIs Yi Liu
2022-12-19  8:47 ` [RFC 03/12] vfio: Accept vfio device file in the driver facing kAPI Yi Liu
2022-12-21  4:07   ` Tian, Kevin
2022-12-21  7:02     ` Yi Liu
2023-01-04 18:25     ` Jason Gunthorpe
2023-01-09  4:13       ` Tian, Kevin
2022-12-19  8:47 ` [RFC 04/12] kvm/vfio: Rename kvm_vfio_group to prepare for accepting vfio device fd Yi Liu
2022-12-19  8:47 ` [RFC 05/12] kvm/vfio: Accept vfio device file from userspace Yi Liu
2023-01-06 14:32   ` Jason Gunthorpe
2023-01-06 14:46     ` Yi Liu
2023-01-06 14:55       ` Jason Gunthorpe
2023-01-06 15:04         ` Yi Liu
2023-01-06 16:08           ` Jason Gunthorpe
2023-01-09  4:17           ` Tian, Kevin
2023-01-09  4:26             ` Yi Liu
2022-12-19  8:47 ` [RFC 06/12] vfio: Pass struct vfio_device_file * to vfio_device_open/close() Yi Liu
2022-12-21  4:10   ` Tian, Kevin
2022-12-21  7:04     ` Yi Liu
2022-12-19  8:47 ` [RFC 07/12] vfio: Block device access via device fd until device is opened Yi Liu
2022-12-21  4:18   ` Tian, Kevin
2023-01-04 20:47   ` Jason Gunthorpe
2022-12-19  8:47 ` [RFC 08/12] vfio: Add infrastructure for bind_iommufd and attach Yi Liu
2023-01-09  5:46   ` Tian, Kevin
2023-01-09 13:21     ` Jason Gunthorpe
2023-01-10  2:53       ` Tian, Kevin
2022-12-19  8:47 ` [RFC 09/12] vfio: Make vfio_device_open() exclusive between group path and device cdev path Yi Liu
2023-01-09  6:03   ` Tian, Kevin
2022-12-19  8:47 ` [RFC 10/12] vfio: Add cdev for vfio_device Yi Liu
2023-01-09  6:54   ` Tian, Kevin
2022-12-19  8:47 ` [RFC 11/12] vfio: Add ioctls for device cdev iommufd Yi Liu
2023-01-09  7:47   ` Tian, Kevin
2023-01-09 14:14     ` Jason Gunthorpe
2023-01-09 15:07       ` Yi Liu
2023-01-09 15:12         ` Jason Gunthorpe
2023-01-09 15:20           ` Yi Liu
2023-01-09 14:55     ` Yi Liu
2023-01-10  2:57       ` Tian, Kevin
2022-12-19  8:47 ` [RFC 12/12] vfio: Compile group optionally Yi Liu
2022-12-19  8:51 ` [RFC 00/12] Add vfio_device cdev for iommufd support Yi Liu
2023-01-04 15:23 ` Yi Liu

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).