* [PATCH v8 00/24] Add vfio_device cdev for iommufd support
@ 2023-03-27  9:40 Yi Liu
  2023-03-27  9:40 ` [PATCH v8 01/24] vfio: Allocate per device file structure Yi Liu
                   ` (26 more replies)
  0 siblings, 27 replies; 54+ messages in thread
From: Yi Liu @ 2023-03-27  9:40 UTC (permalink / raw)
  To: alex.williamson, jgg, kevin.tian
  Cc: joro, robin.murphy, cohuck, eric.auger, nicolinc, kvm, mjrosato,
	chao.p.peng, yi.l.liu, yi.y.sun, peterx, jasowang,
	shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	intel-gvt-dev, intel-gfx, linux-s390, xudong.hao, yan.y.zhao,
	terrence.xu, yanting.jiang

Existing VFIO provides group-centric user APIs. Userspace must open
/dev/vfio/$group_id before it can get a device fd and hence access to the
device. This is not the desired model for iommufd. Per the conclusion of
the community discussion[1], iommufd provides device-centric kAPIs and
requires its consumers (like VFIO) to provide device-centric user APIs.
Such user APIs are used to associate a device with an iommufd and with
the I/O address spaces managed by that iommufd.

This series first introduces a per-device file structure to prepare for
further enhancement, and refactors the kvm-vfio code to prepare for
accepting device files from userspace. After this, it adds a mechanism for
blocking device access before iommufd bind. It then refactors vfio to be
able to handle the cdev path (e.g. iommufd binding, no-iommufd, [de]attach
ioas). This refactoring includes making device_open exclusive between the
group and cdev paths and allowing only a single device open in the cdev
path. The vfio-iommufd code is also refactored to support cdev, e.g. by
splitting vfio_iommufd_bind() into two steps. Finally, it adds cdev support
for vfio devices and the new ioctls, then makes the group infrastructure
optional as it is not needed when vfio device cdev is compiled.
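
As a rough illustration of the "block device access before iommufd bind"
idea, here is a minimal userspace sketch using C11 atomics. It is not the
code from this series: the struct and the model_* helpers are hypothetical
stand-ins, and only the access_granted name and the acquire-load idea are
taken from the change log. The bind step publishes its state with a release
store, and the access path refuses to proceed until it observes the flag
with an acquire load.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for per-file state; not the kernel struct. */
struct model_gate_file {
	atomic_bool access_granted;	/* false until the bind step succeeds */
	int bound_ctx;			/* plain data published by the bind step */
};

void model_gate_init(struct model_gate_file *df)
{
	atomic_init(&df->access_granted, false);
	df->bound_ctx = 0;
}

/* Bind step: write the state first, then publish it with a release store. */
void model_gate_bind(struct model_gate_file *df, int ctx)
{
	df->bound_ctx = ctx;
	atomic_store_explicit(&df->access_granted, true, memory_order_release);
}

/* Access path: fail until the flag is observed with an acquire load. */
int model_gate_access(struct model_gate_file *df)
{
	if (!atomic_load_explicit(&df->access_granted, memory_order_acquire))
		return -EINVAL;
	return df->bound_ctx;
}

/* Usage: access is refused before bind and allowed afterwards. */
void model_gate_demo(void)
{
	struct model_gate_file df;

	model_gate_init(&df);
	assert(model_gate_access(&df) == -EINVAL);
	model_gate_bind(&df, 7);
	assert(model_gate_access(&df) == 7);
}
```

The release/acquire pairing is what lets the access path safely read the
state written before the flag was set; the flag only ever goes from false
to true in this sketch.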

This series is based on some preparation work done for vfio emulated devices[2]
and the vfio pci hot reset enhancements[3].

This series is a prerequisite for iommu nesting for vfio devices[4] [5].

The complete code can be found in the branch below. Simple tests have been done
on the legacy group path and the cdev path. A draft QEMU branch can be found at [6].

https://github.com/yiliu1765/iommufd/tree/vfio_device_cdev_v8
(config CONFIG_IOMMUFD=y CONFIG_VFIO_DEVICE_CDEV=y)

base-commit: 1d412cdf6cd17c347b5398416a60518671e13d37

[1] https://lore.kernel.org/kvm/BN9PR11MB5433B1E4AE5B0480369F97178C189@BN9PR11MB5433.namprd11.prod.outlook.com/
[2] https://lore.kernel.org/kvm/20230327093351.44505-1-yi.l.liu@intel.com/
[3] https://lore.kernel.org/kvm/20230327093458.44939-1-yi.l.liu@intel.com/
[4] https://lore.kernel.org/linux-iommu/20230309080910.607396-1-yi.l.liu@intel.com/
[5] https://lore.kernel.org/linux-iommu/20230309082207.612346-1-yi.l.liu@intel.com/
[6] https://github.com/yiliu1765/qemu/tree/iommufd_rfcv3 (it is based on Eric's
    QEMU iommufd rfcv3 (https://lore.kernel.org/kvm/20230131205305.2726330-1-eric.auger@redhat.com/)
    plus commits to align with vfio_device_cdev v8)

Change log:

v8:
 - Add patch 18 to determine noiommu device at vfio_device registration (Jason)
 - Add patch 19 to name noiommu device with "noiommu-" prefix to be on par
   with the group path
 - Add r-b from Kevin
 - Add t-b from Terrence

v7: https://lore.kernel.org/kvm/20230316125534.17216-1-yi.l.liu@intel.com/
 - Split the vfio-pci hot reset changes to be separate patch series (Jason, Kevin)
 - More polish on no-iommufd support (patch 11 - 13) in cdev path (Kevin)
 - iommufd_access_detach() in patch 16 is added by Nic for emulated devices (Kevin, Jason)

v6: https://lore.kernel.org/kvm/20230308132903.465159-1-yi.l.liu@intel.com/#t
 - Add r-b from Jason on patch 01 - 08 and 13 in v5
 - Based on the prerequisite mini-series which makes vfio emulated devices
   be prepared to cdev (Jason)
 - Add the approach to pass a set of device fds to do hot reset ownership
   check, while the zero-length array approach is also kept. (Jason, Kevin, Alex)
 - Drop patch 10 of v5, it is reworked by patch 13 and 17 in v6 (Jason)
 - Store vfio_group pointer in vfio_device_file to check if user is using
   legacy vfio container (Jason)
 - Drop the is_cdev_device flag (introduced in patch 14 of v5) as the group
   pointer stored in vfio_device_file can cover it.
 - Add iommu_group check in the cdev no-iommu path patch 24 (Kevin)
 - Add t-b from Terrence, Nicolin and Matthew (thanks for the help; some
   patches are new in this version, so I only added t-b to the patches that
   were also in v5 without big changes; t-b for the others can be added
   against this version).

v5: https://lore.kernel.org/kvm/20230227111135.61728-1-yi.l.liu@intel.com/
 - Add r-b from Kevin on patch 08, 13, 14, 15 and 17.
 - Rename patch 02 to limit the change for KVM facing kAPIs. The vfio pci
   hot reset path only accepts group file until patch 09. (Kevin)
 - Update comment around smp_load_acquire(&df->access_granted) (Yan)
 - Adopt Jason's suggestion on the vfio pci hot reset path, passing zero-length
   fd array to indicate using bound iommufd_ctx as ownership check. (Jason, Kevin)
 - Direct read df->access_granted value in vfio_device_cdev_close() (Kevin, Yan, Jason)
 - Wrap the iommufd get/put into a helper to refine the error path of
   vfio_device_ioctl_bind_iommufd(). (Yan)

v4: https://lore.kernel.org/kvm/20230221034812.138051-1-yi.l.liu@intel.com/
 - Add r-b from Kevin on patch 09/10
 - Add a line in devices/vfio.rst to emphasize that the user should add the
   group/device to KVM prior to invoking the open_device op, which may be
   called in the VFIO_GROUP_GET_DEVICE_FD or VFIO_DEVICE_BIND_IOMMUFD ioctl.
 - Modify VFIO_GROUP/VFIO_DEVICE_CDEV Kconfig dependency (Alex)
 - Select VFIO_GROUP for SPAPR (Jason)
 - Check device fully-opened in PCI hotreset path for device fd (Jason)
 - Set df->access_granted in the caller of vfio_device_open() since
   the caller may fail in other operations, but df->access_granted
   does not allow a true to false change. So it should be set only when
   the open path is really done successfully. (Yan, Kevin)
 - Fix missing iommufd_ctx_put() in the cdev path (Yan)
 - Fix an issue found in testing exclusion between group and cdev path.
   vfio_device_cdev_close() should check df->access_granted before heading
   to other operations.
 - Update vfio.rst for iommufd/cdev

v3: https://lore.kernel.org/kvm/20230213151348.56451-1-yi.l.liu@intel.com/
 - Add r-b from Kevin on patch 03, 06, 07, 08.
 - Refine the group and cdev path exclusion. Remove vfio_device:single_open;
   add vfio_group::cdev_device_open_cnt to achieve exclusion between group
   path and cdev path (Kevin, Jason)
 - Fix a bug in the error handling path (Yan Zhao)
 - Address misc remarks from Kevin

v2: https://lore.kernel.org/kvm/20230206090532.95598-1-yi.l.liu@intel.com/
 - Add r-b from Kevin and Eric on patch 01 02 04.
 - Split "kvm/vfio: Provide struct kvm_device_ops::release() insted of ::destroy()"
   from this series; it has been applied. (Alex, Kevin, Jason, Matthew)
 - Add kvm_ref_lock to protect vfio_device_file->kvm instead of reusing
   dev_set->lock as dead-lock is observed with vfio-ap which would try to
   acquire kvm_lock. This is opposite lock order with kvm_device_release()
   which holds kvm_lock first and then hold dev_set->lock. (Kevin)
 - Use a separate ioctl for detaching IOAS. (Alex)
 - Rename vfio_device_file::single_open to be is_cdev_device (Kevin, Alex)
 - Move the vfio device cdev code into device_cdev.c and add a VFIO_DEVICE_CDEV
   kconfig for it. (Kevin, Jason)

v1: https://lore.kernel.org/kvm/20230117134942.101112-1-yi.l.liu@intel.com/
 - Fix the circular refcount between kvm struct and device file reference. (JasonG)
 - Address comments from KevinT
 - Kept the ioctl for detach; this is subject to Alex's preference
   (https://lore.kernel.org/kvm/BN9PR11MB5276BE9F4B0613EE859317028CFF9@BN9PR11MB5276.namprd11.prod.outlook.com/)

rfc: https://lore.kernel.org/kvm/20221219084718.9342-1-yi.l.liu@intel.com/

Thanks,
	Yi Liu

Nicolin Chen (1):
  iommufd/device: Add iommufd_access_detach() API

Yi Liu (23):
  vfio: Allocate per device file structure
  vfio: Refine vfio file kAPIs for KVM
  vfio: Remove vfio_file_is_group()
  vfio: Accept vfio device file in the KVM facing kAPI
  kvm/vfio: Rename kvm_vfio_group to prepare for accepting vfio device
    fd
  kvm/vfio: Accept vfio device file from userspace
  vfio: Pass struct vfio_device_file * to vfio_device_open/close()
  vfio: Block device access via device fd until device is opened
  vfio: Add cdev_device_open_cnt to vfio_group
  vfio: Make vfio_device_open() single open for device cdev path
  vfio: Make vfio_device_first_open() to accept NULL iommufd for noiommu
  vfio-iommufd: Move noiommu support out of vfio_iommufd_bind()
  vfio-iommufd: Split bind/attach into two steps
  vfio: Record devid in vfio_device_file
  vfio-iommufd: Add detach_ioas support for physical VFIO devices
  vfio-iommufd: Add detach_ioas support for emulated VFIO devices
  vfio: Determine noiommu in vfio_device registration
  vfio: Name noiommu vfio_device with "noiommu-" prefix
  vfio: Add cdev for vfio_device
  vfio: Add VFIO_DEVICE_BIND_IOMMUFD
  vfio: Add VFIO_DEVICE_AT[DE]TACH_IOMMUFD_PT
  vfio: Compile group optionally
  docs: vfio: Add vfio device cdev description

 Documentation/driver-api/vfio.rst             | 135 +++++++-
 Documentation/virt/kvm/devices/vfio.rst       |  53 +++-
 drivers/gpu/drm/i915/gvt/kvmgt.c              |   1 +
 drivers/iommu/iommufd/Kconfig                 |   4 +-
 drivers/iommu/iommufd/device.c                |  75 ++++-
 drivers/iommu/iommufd/iommufd_private.h       |   2 +
 drivers/s390/cio/vfio_ccw_ops.c               |   1 +
 drivers/s390/crypto/vfio_ap_ops.c             |   1 +
 drivers/vfio/Kconfig                          |  27 +-
 drivers/vfio/Makefile                         |   3 +-
 drivers/vfio/device_cdev.c                    | 300 ++++++++++++++++++
 drivers/vfio/fsl-mc/vfio_fsl_mc.c             |   1 +
 drivers/vfio/group.c                          | 154 ++++++---
 drivers/vfio/iommufd.c                        | 104 +++---
 .../vfio/pci/hisilicon/hisi_acc_vfio_pci.c    |   2 +
 drivers/vfio/pci/mlx5/main.c                  |   1 +
 drivers/vfio/pci/vfio_pci.c                   |   1 +
 drivers/vfio/platform/vfio_amba.c             |   1 +
 drivers/vfio/platform/vfio_platform.c         |   1 +
 drivers/vfio/vfio.h                           | 245 ++++++++++++--
 drivers/vfio/vfio_main.c                      | 236 +++++++++++---
 include/linux/iommufd.h                       |   1 +
 include/linux/vfio.h                          |  30 +-
 include/uapi/linux/kvm.h                      |  16 +-
 include/uapi/linux/vfio.h                     |  89 ++++++
 samples/vfio-mdev/mbochs.c                    |   1 +
 samples/vfio-mdev/mdpy.c                      |   1 +
 samples/vfio-mdev/mtty.c                      |   1 +
 virt/kvm/vfio.c                               | 141 ++++----
 29 files changed, 1376 insertions(+), 252 deletions(-)
 create mode 100644 drivers/vfio/device_cdev.c

-- 
2.34.1



* [PATCH v8 01/24] vfio: Allocate per device file structure
  2023-03-27  9:40 [PATCH v8 00/24] Add vfio_device cdev for iommufd support Yi Liu
@ 2023-03-27  9:40 ` Yi Liu
  2023-03-27  9:40 ` [PATCH v8 02/24] vfio: Refine vfio file kAPIs for KVM Yi Liu
                   ` (25 subsequent siblings)
  26 siblings, 0 replies; 54+ messages in thread
From: Yi Liu @ 2023-03-27  9:40 UTC (permalink / raw)
  To: alex.williamson, jgg, kevin.tian
  Cc: joro, robin.murphy, cohuck, eric.auger, nicolinc, kvm, mjrosato,
	chao.p.peng, yi.l.liu, yi.y.sun, peterx, jasowang,
	shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	intel-gvt-dev, intel-gfx, linux-s390, xudong.hao, yan.y.zhao,
	terrence.xu, yanting.jiang

This is preparation for adding vfio device cdev support. vfio device
cdev requires:
1) Per-device-file memory to store the kvm pointer set by KVM. It will
   be propagated to vfio_device:kvm after the device cdev file is bound
   to an iommufd.
2) A mechanism to block device access through the device cdev fd before
   it is bound to an iommufd.

To address the above requirements, this adds a per-device file structure
named vfio_device_file. For now, it is only a wrapper around a struct
vfio_device pointer. Other fields will be added to this per-file structure
in future commits.
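
The wrapper pattern described above can be sketched in plain userspace C as
follows. This is only an illustration under stated assumptions: the model_*
names are hypothetical, calloc() stands in for the kernel allocator, and a
NULL return stands in for the ERR_PTR() convention used by the real helper.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the device object shared by all open files. */
struct model_device {
	const char *name;
};

/* Per-open-file wrapper; for now it only carries the device pointer. */
struct model_device_file {
	struct model_device *device;
};

/* Allocate zeroed per-file state and point it at the device. */
struct model_device_file *
model_allocate_device_file(struct model_device *device)
{
	struct model_device_file *df = calloc(1, sizeof(*df));

	if (!df)
		return NULL;
	df->device = device;
	return df;
}

/* File handlers now reach the device through the wrapper. */
const char *model_file_device_name(void *private_data)
{
	struct model_device_file *df = private_data;

	return df->device->name;
}

/* Usage: one wrapper per open, freed when the file is released. */
void model_wrapper_demo(void)
{
	struct model_device dev = { .name = "dev0" };
	struct model_device_file *df = model_allocate_device_file(&dev);

	assert(df && model_file_device_name(df) == dev.name);
	free(df);
}
```

Storing the wrapper rather than the device in the file's private data gives
each open file its own place for state added later, without touching the
shared device object.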

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Tested-by: Terrence Xu <terrence.xu@intel.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Tested-by: Matthew Rosato <mjrosato@linux.ibm.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 drivers/vfio/group.c     | 13 +++++++++++--
 drivers/vfio/vfio.h      |  6 ++++++
 drivers/vfio/vfio_main.c | 35 ++++++++++++++++++++++++++++-------
 3 files changed, 45 insertions(+), 9 deletions(-)

diff --git a/drivers/vfio/group.c b/drivers/vfio/group.c
index d0c95d033605..8a13cea43f49 100644
--- a/drivers/vfio/group.c
+++ b/drivers/vfio/group.c
@@ -218,19 +218,26 @@ void vfio_device_group_close(struct vfio_device *device)
 
 static struct file *vfio_device_open_file(struct vfio_device *device)
 {
+	struct vfio_device_file *df;
 	struct file *filep;
 	int ret;
 
+	df = vfio_allocate_device_file(device);
+	if (IS_ERR(df)) {
+		ret = PTR_ERR(df);
+		goto err_out;
+	}
+
 	ret = vfio_device_group_open(device);
 	if (ret)
-		goto err_out;
+		goto err_free;
 
 	/*
 	 * We can't use anon_inode_getfd() because we need to modify
 	 * the f_mode flags directly to allow more than just ioctls
 	 */
 	filep = anon_inode_getfile("[vfio-device]", &vfio_device_fops,
-				   device, O_RDWR);
+				   df, O_RDWR);
 	if (IS_ERR(filep)) {
 		ret = PTR_ERR(filep);
 		goto err_close_device;
@@ -254,6 +261,8 @@ static struct file *vfio_device_open_file(struct vfio_device *device)
 
 err_close_device:
 	vfio_device_group_close(device);
+err_free:
+	kfree(df);
 err_out:
 	return ERR_PTR(ret);
 }
diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
index c0aeea24fbd6..250fbd3786c5 100644
--- a/drivers/vfio/vfio.h
+++ b/drivers/vfio/vfio.h
@@ -16,11 +16,17 @@ struct iommufd_ctx;
 struct iommu_group;
 struct vfio_container;
 
+struct vfio_device_file {
+	struct vfio_device *device;
+};
+
 void vfio_device_put_registration(struct vfio_device *device);
 bool vfio_device_try_get_registration(struct vfio_device *device);
 int vfio_device_open(struct vfio_device *device, struct iommufd_ctx *iommufd);
 void vfio_device_close(struct vfio_device *device,
 		       struct iommufd_ctx *iommufd);
+struct vfio_device_file *
+vfio_allocate_device_file(struct vfio_device *device);
 
 extern const struct file_operations vfio_device_fops;
 
diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
index ebbb6b91a498..89722bf87edc 100644
--- a/drivers/vfio/vfio_main.c
+++ b/drivers/vfio/vfio_main.c
@@ -404,6 +404,20 @@ static bool vfio_assert_device_open(struct vfio_device *device)
 	return !WARN_ON_ONCE(!READ_ONCE(device->open_count));
 }
 
+struct vfio_device_file *
+vfio_allocate_device_file(struct vfio_device *device)
+{
+	struct vfio_device_file *df;
+
+	df = kzalloc(sizeof(*df), GFP_KERNEL_ACCOUNT);
+	if (!df)
+		return ERR_PTR(-ENOMEM);
+
+	df->device = device;
+
+	return df;
+}
+
 static int vfio_device_first_open(struct vfio_device *device,
 				  struct iommufd_ctx *iommufd)
 {
@@ -517,12 +531,15 @@ static inline void vfio_device_pm_runtime_put(struct vfio_device *device)
  */
 static int vfio_device_fops_release(struct inode *inode, struct file *filep)
 {
-	struct vfio_device *device = filep->private_data;
+	struct vfio_device_file *df = filep->private_data;
+	struct vfio_device *device = df->device;
 
 	vfio_device_group_close(device);
 
 	vfio_device_put_registration(device);
 
+	kfree(df);
+
 	return 0;
 }
 
@@ -1087,7 +1104,8 @@ static int vfio_ioctl_device_feature(struct vfio_device *device,
 static long vfio_device_fops_unl_ioctl(struct file *filep,
 				       unsigned int cmd, unsigned long arg)
 {
-	struct vfio_device *device = filep->private_data;
+	struct vfio_device_file *df = filep->private_data;
+	struct vfio_device *device = df->device;
 	int ret;
 
 	ret = vfio_device_pm_runtime_get(device);
@@ -1114,7 +1132,8 @@ static long vfio_device_fops_unl_ioctl(struct file *filep,
 static ssize_t vfio_device_fops_read(struct file *filep, char __user *buf,
 				     size_t count, loff_t *ppos)
 {
-	struct vfio_device *device = filep->private_data;
+	struct vfio_device_file *df = filep->private_data;
+	struct vfio_device *device = df->device;
 
 	if (unlikely(!device->ops->read))
 		return -EINVAL;
@@ -1126,7 +1145,8 @@ static ssize_t vfio_device_fops_write(struct file *filep,
 				      const char __user *buf,
 				      size_t count, loff_t *ppos)
 {
-	struct vfio_device *device = filep->private_data;
+	struct vfio_device_file *df = filep->private_data;
+	struct vfio_device *device = df->device;
 
 	if (unlikely(!device->ops->write))
 		return -EINVAL;
@@ -1136,7 +1156,8 @@ static ssize_t vfio_device_fops_write(struct file *filep,
 
 static int vfio_device_fops_mmap(struct file *filep, struct vm_area_struct *vma)
 {
-	struct vfio_device *device = filep->private_data;
+	struct vfio_device_file *df = filep->private_data;
+	struct vfio_device *device = df->device;
 
 	if (unlikely(!device->ops->mmap))
 		return -EINVAL;
@@ -1156,11 +1177,11 @@ const struct file_operations vfio_device_fops = {
 
 static struct vfio_device *vfio_device_from_file(struct file *file)
 {
-	struct vfio_device *device = file->private_data;
+	struct vfio_device_file *df = file->private_data;
 
 	if (file->f_op != &vfio_device_fops)
 		return NULL;
-	return device;
+	return df->device;
 }
 
 /**
-- 
2.34.1



* [PATCH v8 02/24] vfio: Refine vfio file kAPIs for KVM
  2023-03-27  9:40 [PATCH v8 00/24] Add vfio_device cdev for iommufd support Yi Liu
  2023-03-27  9:40 ` [PATCH v8 01/24] vfio: Allocate per device file structure Yi Liu
@ 2023-03-27  9:40 ` Yi Liu
  2023-03-27  9:40 ` [PATCH v8 03/24] vfio: Remove vfio_file_is_group() Yi Liu
                   ` (24 subsequent siblings)
  26 siblings, 0 replies; 54+ messages in thread
From: Yi Liu @ 2023-03-27  9:40 UTC (permalink / raw)
  To: alex.williamson, jgg, kevin.tian
  Cc: joro, robin.murphy, cohuck, eric.auger, nicolinc, kvm, mjrosato,
	chao.p.peng, yi.l.liu, yi.y.sun, peterx, jasowang,
	shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	intel-gvt-dev, intel-gfx, linux-s390, xudong.hao, yan.y.zhao,
	terrence.xu, yanting.jiang

This prepares the kAPIs below to accept both group files and device
files instead of only vfio group files.

  bool vfio_file_enforced_coherent(struct file *file);
  void vfio_file_set_kvm(struct file *file, struct kvm *kvm);

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Tested-by: Terrence Xu <terrence.xu@intel.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Tested-by: Matthew Rosato <mjrosato@linux.ibm.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 drivers/vfio/group.c     | 36 ++++++---------------------------
 drivers/vfio/vfio.h      |  2 ++
 drivers/vfio/vfio_main.c | 43 ++++++++++++++++++++++++++++++++++++++++
 virt/kvm/vfio.c          | 10 +++++-----
 4 files changed, 56 insertions(+), 35 deletions(-)

diff --git a/drivers/vfio/group.c b/drivers/vfio/group.c
index 8a13cea43f49..ede4723c5f72 100644
--- a/drivers/vfio/group.c
+++ b/drivers/vfio/group.c
@@ -802,24 +802,11 @@ bool vfio_file_is_group(struct file *file)
 }
 EXPORT_SYMBOL_GPL(vfio_file_is_group);
 
-/**
- * vfio_file_enforced_coherent - True if the DMA associated with the VFIO file
- *        is always CPU cache coherent
- * @file: VFIO group file
- *
- * Enforced coherency means that the IOMMU ignores things like the PCIe no-snoop
- * bit in DMA transactions. A return of false indicates that the user has
- * rights to access additional instructions such as wbinvd on x86.
- */
-bool vfio_file_enforced_coherent(struct file *file)
+bool vfio_group_enforced_coherent(struct vfio_group *group)
 {
-	struct vfio_group *group = file->private_data;
 	struct vfio_device *device;
 	bool ret = true;
 
-	if (!vfio_file_is_group(file))
-		return true;
-
 	/*
 	 * If the device does not have IOMMU_CAP_ENFORCE_CACHE_COHERENCY then
 	 * any domain later attached to it will also not support it. If the cap
@@ -837,28 +824,17 @@ bool vfio_file_enforced_coherent(struct file *file)
 	mutex_unlock(&group->device_lock);
 	return ret;
 }
-EXPORT_SYMBOL_GPL(vfio_file_enforced_coherent);
 
-/**
- * vfio_file_set_kvm - Link a kvm with VFIO drivers
- * @file: VFIO group file
- * @kvm: KVM to link
- *
- * When a VFIO device is first opened the KVM will be available in
- * device->kvm if one was associated with the group.
- */
-void vfio_file_set_kvm(struct file *file, struct kvm *kvm)
+void vfio_group_set_kvm(struct vfio_group *group, struct kvm *kvm)
 {
-	struct vfio_group *group = file->private_data;
-
-	if (!vfio_file_is_group(file))
-		return;
-
+	/*
+	 * When a VFIO device is first opened the KVM will be available in
+	 * device->kvm if one was associated with the group.
+	 */
 	spin_lock(&group->kvm_ref_lock);
 	group->kvm = kvm;
 	spin_unlock(&group->kvm_ref_lock);
 }
-EXPORT_SYMBOL_GPL(vfio_file_set_kvm);
 
 bool vfio_group_has_dev(struct vfio_group *group, struct vfio_device *device)
 {
diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
index 250fbd3786c5..56ad127ac618 100644
--- a/drivers/vfio/vfio.h
+++ b/drivers/vfio/vfio.h
@@ -92,6 +92,8 @@ void vfio_device_group_unuse_iommu(struct vfio_device *device);
 void vfio_device_group_close(struct vfio_device *device);
 struct vfio_group *vfio_group_from_file(struct file *file);
 bool vfio_group_has_dev(struct vfio_group *group, struct vfio_device *device);
+bool vfio_group_enforced_coherent(struct vfio_group *group);
+void vfio_group_set_kvm(struct vfio_group *group, struct kvm *kvm);
 bool vfio_device_has_container(struct vfio_device *device);
 int __init vfio_group_init(void);
 void vfio_group_cleanup(void);
diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
index 89722bf87edc..748bde4d74d9 100644
--- a/drivers/vfio/vfio_main.c
+++ b/drivers/vfio/vfio_main.c
@@ -1219,6 +1219,49 @@ bool vfio_file_has_dev(struct file *file, struct vfio_device *device)
 }
 EXPORT_SYMBOL_GPL(vfio_file_has_dev);
 
+/**
+ * vfio_file_enforced_coherent - True if the DMA associated with the VFIO file
+ *        is always CPU cache coherent
+ * @file: VFIO group file or VFIO device file
+ *
+ * Enforced coherency means that the IOMMU ignores things like the PCIe no-snoop
+ * bit in DMA transactions. A return of false indicates that the user has
+ * rights to access additional instructions such as wbinvd on x86.
+ */
+bool vfio_file_enforced_coherent(struct file *file)
+{
+	struct vfio_group *group;
+	struct vfio_device *device;
+
+	group = vfio_group_from_file(file);
+	if (group)
+		return vfio_group_enforced_coherent(group);
+
+	device = vfio_device_from_file(file);
+	if (device)
+		return device_iommu_capable(device->dev,
+					    IOMMU_CAP_ENFORCE_CACHE_COHERENCY);
+
+	return true;
+}
+EXPORT_SYMBOL_GPL(vfio_file_enforced_coherent);
+
+/**
+ * vfio_file_set_kvm - Link a kvm with VFIO drivers
+ * @file: VFIO group file or VFIO device file
+ * @kvm: KVM to link
+ *
+ */
+void vfio_file_set_kvm(struct file *file, struct kvm *kvm)
+{
+	struct vfio_group *group;
+
+	group = vfio_group_from_file(file);
+	if (group)
+		vfio_group_set_kvm(group, kvm);
+}
+EXPORT_SYMBOL_GPL(vfio_file_set_kvm);
+
 /*
  * Sub-module support
  */
diff --git a/virt/kvm/vfio.c b/virt/kvm/vfio.c
index 9584eb57e0ed..8bac308ba630 100644
--- a/virt/kvm/vfio.c
+++ b/virt/kvm/vfio.c
@@ -64,18 +64,18 @@ static bool kvm_vfio_file_enforced_coherent(struct file *file)
 	return ret;
 }
 
-static bool kvm_vfio_file_is_group(struct file *file)
+static bool kvm_vfio_file_is_valid(struct file *file)
 {
 	bool (*fn)(struct file *file);
 	bool ret;
 
-	fn = symbol_get(vfio_file_is_group);
+	fn = symbol_get(vfio_file_is_valid);
 	if (!fn)
 		return false;
 
 	ret = fn(file);
 
-	symbol_put(vfio_file_is_group);
+	symbol_put(vfio_file_is_valid);
 
 	return ret;
 }
@@ -154,8 +154,8 @@ static int kvm_vfio_group_add(struct kvm_device *dev, unsigned int fd)
 	if (!filp)
 		return -EBADF;
 
-	/* Ensure the FD is a vfio group FD.*/
-	if (!kvm_vfio_file_is_group(filp)) {
+	/* Ensure the FD is a vfio FD.*/
+	if (!kvm_vfio_file_is_valid(filp)) {
 		ret = -EINVAL;
 		goto err_fput;
 	}
-- 
2.34.1



* [PATCH v8 03/24] vfio: Remove vfio_file_is_group()
  2023-03-27  9:40 [PATCH v8 00/24] Add vfio_device cdev for iommufd support Yi Liu
  2023-03-27  9:40 ` [PATCH v8 01/24] vfio: Allocate per device file structure Yi Liu
  2023-03-27  9:40 ` [PATCH v8 02/24] vfio: Refine vfio file kAPIs for KVM Yi Liu
@ 2023-03-27  9:40 ` Yi Liu
  2023-03-30 23:50   ` Jason Gunthorpe
  2023-03-27  9:40 ` [PATCH v8 04/24] vfio: Accept vfio device file in the KVM facing kAPI Yi Liu
                   ` (23 subsequent siblings)
  26 siblings, 1 reply; 54+ messages in thread
From: Yi Liu @ 2023-03-27  9:40 UTC (permalink / raw)
  To: alex.williamson, jgg, kevin.tian
  Cc: joro, robin.murphy, cohuck, eric.auger, nicolinc, kvm, mjrosato,
	chao.p.peng, yi.l.liu, yi.y.sun, peterx, jasowang,
	shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	intel-gvt-dev, intel-gfx, linux-s390, xudong.hao, yan.y.zhao,
	terrence.xu, yanting.jiang

Remove vfio_file_is_group() since it has no users now.

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Tested-by: Terrence Xu <terrence.xu@intel.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 drivers/vfio/group.c | 10 ----------
 include/linux/vfio.h |  1 -
 2 files changed, 11 deletions(-)

diff --git a/drivers/vfio/group.c b/drivers/vfio/group.c
index ede4723c5f72..4f937ebaf6f7 100644
--- a/drivers/vfio/group.c
+++ b/drivers/vfio/group.c
@@ -792,16 +792,6 @@ struct iommu_group *vfio_file_iommu_group(struct file *file)
 }
 EXPORT_SYMBOL_GPL(vfio_file_iommu_group);
 
-/**
- * vfio_file_is_group - True if the file is a vfio group file
- * @file: VFIO group file
- */
-bool vfio_file_is_group(struct file *file)
-{
-	return vfio_group_from_file(file);
-}
-EXPORT_SYMBOL_GPL(vfio_file_is_group);
-
 bool vfio_group_enforced_coherent(struct vfio_group *group)
 {
 	struct vfio_device *device;
diff --git a/include/linux/vfio.h b/include/linux/vfio.h
index 79c47733ae0d..3b55dd71299b 100644
--- a/include/linux/vfio.h
+++ b/include/linux/vfio.h
@@ -257,7 +257,6 @@ int vfio_mig_get_next_state(struct vfio_device *device,
  * External user API
  */
 struct iommu_group *vfio_file_iommu_group(struct file *file);
-bool vfio_file_is_group(struct file *file);
 bool vfio_file_is_valid(struct file *file);
 bool vfio_file_enforced_coherent(struct file *file);
 void vfio_file_set_kvm(struct file *file, struct kvm *kvm);
-- 
2.34.1



* [PATCH v8 04/24] vfio: Accept vfio device file in the KVM facing kAPI
  2023-03-27  9:40 [PATCH v8 00/24] Add vfio_device cdev for iommufd support Yi Liu
                   ` (2 preceding siblings ...)
  2023-03-27  9:40 ` [PATCH v8 03/24] vfio: Remove vfio_file_is_group() Yi Liu
@ 2023-03-27  9:40 ` Yi Liu
  2023-03-27  9:40 ` [PATCH v8 05/24] kvm/vfio: Rename kvm_vfio_group to prepare for accepting vfio device fd Yi Liu
                   ` (22 subsequent siblings)
  26 siblings, 0 replies; 54+ messages in thread
From: Yi Liu @ 2023-03-27  9:40 UTC (permalink / raw)
  To: alex.williamson, jgg, kevin.tian
  Cc: joro, robin.murphy, cohuck, eric.auger, nicolinc, kvm, mjrosato,
	chao.p.peng, yi.l.liu, yi.y.sun, peterx, jasowang,
	shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	intel-gvt-dev, intel-gfx, linux-s390, xudong.hao, yan.y.zhao,
	terrence.xu, yanting.jiang

This makes the vfio file kAPIs accept vfio device files, which is also
preparation for vfio device cdev support.

When kvm is set through a vfio device file, the kvm pointer is stored in
struct vfio_device_file, and kvm_ref_lock protects both setting the kvm
pointer and using it within VFIO. This kvm pointer will be propagated to
vfio_device after the device file is bound to iommufd in the cdev path.
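
A minimal userspace sketch of this "record under a lock, consume later"
pattern is shown below. It is illustrative only: the model_* names are
hypothetical, and a C11 atomic_flag spin loop stands in for the kernel
spinlock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for the KVM object. */
struct model_kvm {
	int id;
};

struct model_kvm_file {
	atomic_flag kvm_ref_lock;	/* protects the kvm field */
	struct model_kvm *kvm;
};

static void model_lock(atomic_flag *lock)
{
	/* Spin until the previous holder clears the flag. */
	while (atomic_flag_test_and_set_explicit(lock, memory_order_acquire))
		;
}

static void model_unlock(atomic_flag *lock)
{
	atomic_flag_clear_explicit(lock, memory_order_release);
}

void model_kvm_file_init(struct model_kvm_file *df)
{
	atomic_flag_clear(&df->kvm_ref_lock);
	df->kvm = NULL;
}

/* Record the kvm pointer in the per-file structure. */
void model_kvm_file_set_kvm(struct model_kvm_file *df, struct model_kvm *kvm)
{
	model_lock(&df->kvm_ref_lock);
	df->kvm = kvm;
	model_unlock(&df->kvm_ref_lock);
}

/* Read it back later, e.g. at the point where the file is bound. */
struct model_kvm *model_kvm_file_get_kvm(struct model_kvm_file *df)
{
	struct model_kvm *kvm;

	model_lock(&df->kvm_ref_lock);
	kvm = df->kvm;
	model_unlock(&df->kvm_ref_lock);
	return kvm;
}

/* Usage: set once, then observe the same pointer from the reader side. */
void model_kvm_demo(void)
{
	struct model_kvm kvm = { .id = 1 };
	struct model_kvm_file df;

	model_kvm_file_init(&df);
	model_kvm_file_set_kvm(&df, &kvm);
	assert(model_kvm_file_get_kvm(&df) == &kvm);
}
```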

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Tested-by: Terrence Xu <terrence.xu@intel.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Tested-by: Matthew Rosato <mjrosato@linux.ibm.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 drivers/vfio/vfio.h      |  2 ++
 drivers/vfio/vfio_main.c | 18 ++++++++++++++++++
 2 files changed, 20 insertions(+)

diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
index 56ad127ac618..e4672d91a6f7 100644
--- a/drivers/vfio/vfio.h
+++ b/drivers/vfio/vfio.h
@@ -18,6 +18,8 @@ struct vfio_container;
 
 struct vfio_device_file {
 	struct vfio_device *device;
+	spinlock_t kvm_ref_lock; /* protect kvm field */
+	struct kvm *kvm;
 };
 
 void vfio_device_put_registration(struct vfio_device *device);
diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
index 748bde4d74d9..cb543791b28b 100644
--- a/drivers/vfio/vfio_main.c
+++ b/drivers/vfio/vfio_main.c
@@ -414,6 +414,7 @@ vfio_allocate_device_file(struct vfio_device *device)
 		return ERR_PTR(-ENOMEM);
 
 	df->device = device;
+	spin_lock_init(&df->kvm_ref_lock);
 
 	return df;
 }
@@ -1246,6 +1247,20 @@ bool vfio_file_enforced_coherent(struct file *file)
 }
 EXPORT_SYMBOL_GPL(vfio_file_enforced_coherent);
 
+static void vfio_device_file_set_kvm(struct file *file, struct kvm *kvm)
+{
+	struct vfio_device_file *df = file->private_data;
+
+	/*
+	 * The kvm is first recorded in the vfio_device_file, and will
+	 * be propagated to vfio_device::kvm when the file is bound to
+	 * iommufd successfully in the vfio device cdev path.
+	 */
+	spin_lock(&df->kvm_ref_lock);
+	df->kvm = kvm;
+	spin_unlock(&df->kvm_ref_lock);
+}
+
 /**
  * vfio_file_set_kvm - Link a kvm with VFIO drivers
  * @file: VFIO group file or VFIO device file
@@ -1259,6 +1274,9 @@ void vfio_file_set_kvm(struct file *file, struct kvm *kvm)
 	group = vfio_group_from_file(file);
 	if (group)
 		vfio_group_set_kvm(group, kvm);
+
+	if (vfio_device_from_file(file))
+		vfio_device_file_set_kvm(file, kvm);
 }
 EXPORT_SYMBOL_GPL(vfio_file_set_kvm);
 
-- 
2.34.1



* [PATCH v8 05/24] kvm/vfio: Rename kvm_vfio_group to prepare for accepting vfio device fd
  2023-03-27  9:40 [PATCH v8 00/24] Add vfio_device cdev for iommufd support Yi Liu
                   ` (3 preceding siblings ...)
  2023-03-27  9:40 ` [PATCH v8 04/24] vfio: Accept vfio device file in the KVM facing kAPI Yi Liu
@ 2023-03-27  9:40 ` Yi Liu
  2023-03-27  9:40 ` [PATCH v8 06/24] kvm/vfio: Accept vfio device file from userspace Yi Liu
                   ` (21 subsequent siblings)
  26 siblings, 0 replies; 54+ messages in thread
From: Yi Liu @ 2023-03-27  9:40 UTC (permalink / raw)
  To: alex.williamson, jgg, kevin.tian
  Cc: joro, robin.murphy, cohuck, eric.auger, nicolinc, kvm, mjrosato,
	chao.p.peng, yi.l.liu, yi.y.sun, peterx, jasowang,
	shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	intel-gvt-dev, intel-gfx, linux-s390, xudong.hao, yan.y.zhao,
	terrence.xu, yanting.jiang

Meanwhile, rename related helpers. No functional change is intended.

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Tested-by: Terrence Xu <terrence.xu@intel.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Tested-by: Matthew Rosato <mjrosato@linux.ibm.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 virt/kvm/vfio.c | 115 ++++++++++++++++++++++++------------------------
 1 file changed, 58 insertions(+), 57 deletions(-)

diff --git a/virt/kvm/vfio.c b/virt/kvm/vfio.c
index 8bac308ba630..857d6ba349e1 100644
--- a/virt/kvm/vfio.c
+++ b/virt/kvm/vfio.c
@@ -21,7 +21,7 @@
 #include <asm/kvm_ppc.h>
 #endif
 
-struct kvm_vfio_group {
+struct kvm_vfio_file {
 	struct list_head node;
 	struct file *file;
 #ifdef CONFIG_SPAPR_TCE_IOMMU
@@ -30,7 +30,7 @@ struct kvm_vfio_group {
 };
 
 struct kvm_vfio {
-	struct list_head group_list;
+	struct list_head file_list;
 	struct mutex lock;
 	bool noncoherent;
 };
@@ -98,34 +98,35 @@ static struct iommu_group *kvm_vfio_file_iommu_group(struct file *file)
 }
 
 static void kvm_spapr_tce_release_vfio_group(struct kvm *kvm,
-					     struct kvm_vfio_group *kvg)
+					     struct kvm_vfio_file *kvf)
 {
-	if (WARN_ON_ONCE(!kvg->iommu_group))
+	if (WARN_ON_ONCE(!kvf->iommu_group))
 		return;
 
-	kvm_spapr_tce_release_iommu_group(kvm, kvg->iommu_group);
-	iommu_group_put(kvg->iommu_group);
-	kvg->iommu_group = NULL;
+	kvm_spapr_tce_release_iommu_group(kvm, kvf->iommu_group);
+	iommu_group_put(kvf->iommu_group);
+	kvf->iommu_group = NULL;
 }
 #endif
 
 /*
- * Groups can use the same or different IOMMU domains.  If the same then
- * adding a new group may change the coherency of groups we've previously
- * been told about.  We don't want to care about any of that so we retest
- * each group and bail as soon as we find one that's noncoherent.  This
- * means we only ever [un]register_noncoherent_dma once for the whole device.
+ * Groups/devices can use the same or different IOMMU domains. If the same
+ * then adding a new group/device may change the coherency of groups/devices
+ * we've previously been told about. We don't want to care about any of
+ * that so we retest each group/device and bail as soon as we find one that's
+ * noncoherent.  This means we only ever [un]register_noncoherent_dma once
+ * for the whole device.
  */
 static void kvm_vfio_update_coherency(struct kvm_device *dev)
 {
 	struct kvm_vfio *kv = dev->private;
 	bool noncoherent = false;
-	struct kvm_vfio_group *kvg;
+	struct kvm_vfio_file *kvf;
 
 	mutex_lock(&kv->lock);
 
-	list_for_each_entry(kvg, &kv->group_list, node) {
-		if (!kvm_vfio_file_enforced_coherent(kvg->file)) {
+	list_for_each_entry(kvf, &kv->file_list, node) {
+		if (!kvm_vfio_file_enforced_coherent(kvf->file)) {
 			noncoherent = true;
 			break;
 		}
@@ -143,10 +144,10 @@ static void kvm_vfio_update_coherency(struct kvm_device *dev)
 	mutex_unlock(&kv->lock);
 }
 
-static int kvm_vfio_group_add(struct kvm_device *dev, unsigned int fd)
+static int kvm_vfio_file_add(struct kvm_device *dev, unsigned int fd)
 {
 	struct kvm_vfio *kv = dev->private;
-	struct kvm_vfio_group *kvg;
+	struct kvm_vfio_file *kvf;
 	struct file *filp;
 	int ret;
 
@@ -162,27 +163,27 @@ static int kvm_vfio_group_add(struct kvm_device *dev, unsigned int fd)
 
 	mutex_lock(&kv->lock);
 
-	list_for_each_entry(kvg, &kv->group_list, node) {
-		if (kvg->file == filp) {
+	list_for_each_entry(kvf, &kv->file_list, node) {
+		if (kvf->file == filp) {
 			ret = -EEXIST;
 			goto err_unlock;
 		}
 	}
 
-	kvg = kzalloc(sizeof(*kvg), GFP_KERNEL_ACCOUNT);
-	if (!kvg) {
+	kvf = kzalloc(sizeof(*kvf), GFP_KERNEL_ACCOUNT);
+	if (!kvf) {
 		ret = -ENOMEM;
 		goto err_unlock;
 	}
 
-	kvg->file = filp;
-	list_add_tail(&kvg->node, &kv->group_list);
+	kvf->file = filp;
+	list_add_tail(&kvf->node, &kv->file_list);
 
 	kvm_arch_start_assignment(dev->kvm);
 
 	mutex_unlock(&kv->lock);
 
-	kvm_vfio_file_set_kvm(kvg->file, dev->kvm);
+	kvm_vfio_file_set_kvm(kvf->file, dev->kvm);
 	kvm_vfio_update_coherency(dev);
 
 	return 0;
@@ -193,10 +194,10 @@ static int kvm_vfio_group_add(struct kvm_device *dev, unsigned int fd)
 	return ret;
 }
 
-static int kvm_vfio_group_del(struct kvm_device *dev, unsigned int fd)
+static int kvm_vfio_file_del(struct kvm_device *dev, unsigned int fd)
 {
 	struct kvm_vfio *kv = dev->private;
-	struct kvm_vfio_group *kvg;
+	struct kvm_vfio_file *kvf;
 	struct fd f;
 	int ret;
 
@@ -208,18 +209,18 @@ static int kvm_vfio_group_del(struct kvm_device *dev, unsigned int fd)
 
 	mutex_lock(&kv->lock);
 
-	list_for_each_entry(kvg, &kv->group_list, node) {
-		if (kvg->file != f.file)
+	list_for_each_entry(kvf, &kv->file_list, node) {
+		if (kvf->file != f.file)
 			continue;
 
-		list_del(&kvg->node);
+		list_del(&kvf->node);
 		kvm_arch_end_assignment(dev->kvm);
 #ifdef CONFIG_SPAPR_TCE_IOMMU
-		kvm_spapr_tce_release_vfio_group(dev->kvm, kvg);
+		kvm_spapr_tce_release_vfio_group(dev->kvm, kvf);
 #endif
-		kvm_vfio_file_set_kvm(kvg->file, NULL);
-		fput(kvg->file);
-		kfree(kvg);
+		kvm_vfio_file_set_kvm(kvf->file, NULL);
+		fput(kvf->file);
+		kfree(kvf);
 		ret = 0;
 		break;
 	}
@@ -234,12 +235,12 @@ static int kvm_vfio_group_del(struct kvm_device *dev, unsigned int fd)
 }
 
 #ifdef CONFIG_SPAPR_TCE_IOMMU
-static int kvm_vfio_group_set_spapr_tce(struct kvm_device *dev,
-					void __user *arg)
+static int kvm_vfio_file_set_spapr_tce(struct kvm_device *dev,
+				       void __user *arg)
 {
 	struct kvm_vfio_spapr_tce param;
 	struct kvm_vfio *kv = dev->private;
-	struct kvm_vfio_group *kvg;
+	struct kvm_vfio_file *kvf;
 	struct fd f;
 	int ret;
 
@@ -254,20 +255,20 @@ static int kvm_vfio_group_set_spapr_tce(struct kvm_device *dev,
 
 	mutex_lock(&kv->lock);
 
-	list_for_each_entry(kvg, &kv->group_list, node) {
-		if (kvg->file != f.file)
+	list_for_each_entry(kvf, &kv->file_list, node) {
+		if (kvf->file != f.file)
 			continue;
 
-		if (!kvg->iommu_group) {
-			kvg->iommu_group = kvm_vfio_file_iommu_group(kvg->file);
-			if (WARN_ON_ONCE(!kvg->iommu_group)) {
+		if (!kvf->iommu_group) {
+			kvf->iommu_group = kvm_vfio_file_iommu_group(kvf->file);
+			if (WARN_ON_ONCE(!kvf->iommu_group)) {
 				ret = -EIO;
 				goto err_fdput;
 			}
 		}
 
 		ret = kvm_spapr_tce_attach_iommu_group(dev->kvm, param.tablefd,
-						       kvg->iommu_group);
+						       kvf->iommu_group);
 		break;
 	}
 
@@ -278,8 +279,8 @@ static int kvm_vfio_group_set_spapr_tce(struct kvm_device *dev,
 }
 #endif
 
-static int kvm_vfio_set_group(struct kvm_device *dev, long attr,
-			      void __user *arg)
+static int kvm_vfio_set_file(struct kvm_device *dev, long attr,
+			     void __user *arg)
 {
 	int32_t __user *argp = arg;
 	int32_t fd;
@@ -288,16 +289,16 @@ static int kvm_vfio_set_group(struct kvm_device *dev, long attr,
 	case KVM_DEV_VFIO_GROUP_ADD:
 		if (get_user(fd, argp))
 			return -EFAULT;
-		return kvm_vfio_group_add(dev, fd);
+		return kvm_vfio_file_add(dev, fd);
 
 	case KVM_DEV_VFIO_GROUP_DEL:
 		if (get_user(fd, argp))
 			return -EFAULT;
-		return kvm_vfio_group_del(dev, fd);
+		return kvm_vfio_file_del(dev, fd);
 
 #ifdef CONFIG_SPAPR_TCE_IOMMU
 	case KVM_DEV_VFIO_GROUP_SET_SPAPR_TCE:
-		return kvm_vfio_group_set_spapr_tce(dev, arg);
+		return kvm_vfio_file_set_spapr_tce(dev, arg);
 #endif
 	}
 
@@ -309,8 +310,8 @@ static int kvm_vfio_set_attr(struct kvm_device *dev,
 {
 	switch (attr->group) {
 	case KVM_DEV_VFIO_GROUP:
-		return kvm_vfio_set_group(dev, attr->attr,
-					  u64_to_user_ptr(attr->addr));
+		return kvm_vfio_set_file(dev, attr->attr,
+					 u64_to_user_ptr(attr->addr));
 	}
 
 	return -ENXIO;
@@ -339,16 +340,16 @@ static int kvm_vfio_has_attr(struct kvm_device *dev,
 static void kvm_vfio_release(struct kvm_device *dev)
 {
 	struct kvm_vfio *kv = dev->private;
-	struct kvm_vfio_group *kvg, *tmp;
+	struct kvm_vfio_file *kvf, *tmp;
 
-	list_for_each_entry_safe(kvg, tmp, &kv->group_list, node) {
+	list_for_each_entry_safe(kvf, tmp, &kv->file_list, node) {
 #ifdef CONFIG_SPAPR_TCE_IOMMU
-		kvm_spapr_tce_release_vfio_group(dev->kvm, kvg);
+		kvm_spapr_tce_release_vfio_group(dev->kvm, kvf);
 #endif
-		kvm_vfio_file_set_kvm(kvg->file, NULL);
-		fput(kvg->file);
-		list_del(&kvg->node);
-		kfree(kvg);
+		kvm_vfio_file_set_kvm(kvf->file, NULL);
+		fput(kvf->file);
+		list_del(&kvf->node);
+		kfree(kvf);
 		kvm_arch_end_assignment(dev->kvm);
 	}
 
@@ -382,7 +383,7 @@ static int kvm_vfio_create(struct kvm_device *dev, u32 type)
 	if (!kv)
 		return -ENOMEM;
 
-	INIT_LIST_HEAD(&kv->group_list);
+	INIT_LIST_HEAD(&kv->file_list);
 	mutex_init(&kv->lock);
 
 	dev->private = kv;
-- 
2.34.1
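
The comment above kvm_vfio_update_coherency() in this patch describes
rescanning every tracked file and stopping at the first noncoherent one.
A minimal userspace sketch of that scan follows. It assumes the state is
only updated when the result changes, which is not visible in the hunk
shown here. All names prefixed with example_ are hypothetical stand-ins,
not kernel symbols.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-VM tracking state (struct kvm_vfio). */
struct example_tracker {
	bool noncoherent;	/* last published state */
	int transitions;	/* counts register/unregister events */
};

/*
 * Rescan all tracked files. enforced_coherent[i] stands in for the result
 * of kvm_vfio_file_enforced_coherent() on the i-th tracked file.
 */
static void example_update_coherency(struct example_tracker *t,
				     const bool *enforced_coherent, size_t n)
{
	bool noncoherent = false;
	size_t i;

	for (i = 0; i < n; i++) {
		if (!enforced_coherent[i]) {
			noncoherent = true;
			break;	/* one noncoherent file is enough */
		}
	}

	/* Assumption: only act when the overall state actually changes. */
	if (noncoherent != t->noncoherent) {
		t->noncoherent = noncoherent;
		t->transitions++;
	}
}
```

Rescanning everything on each add or delete keeps the logic independent of
which IOMMU domains the files share, at the cost of a linear walk.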



* [PATCH v8 06/24] kvm/vfio: Accept vfio device file from userspace
  2023-03-27  9:40 [PATCH v8 00/24] Add vfio_device cdev for iommufd support Yi Liu
                   ` (4 preceding siblings ...)
  2023-03-27  9:40 ` [PATCH v8 05/24] kvm/vfio: Rename kvm_vfio_group to prepare for accepting vfio device fd Yi Liu
@ 2023-03-27  9:40 ` Yi Liu
  2023-03-27  9:40 ` [PATCH v8 07/24] vfio: Pass struct vfio_device_file * to vfio_device_open/close() Yi Liu
                   ` (20 subsequent siblings)
  26 siblings, 0 replies; 54+ messages in thread
From: Yi Liu @ 2023-03-27  9:40 UTC (permalink / raw)
  To: alex.williamson, jgg, kevin.tian
  Cc: joro, robin.murphy, cohuck, eric.auger, nicolinc, kvm, mjrosato,
	chao.p.peng, yi.l.liu, yi.y.sun, peterx, jasowang,
	shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	intel-gvt-dev, intel-gfx, linux-s390, xudong.hao, yan.y.zhao,
	terrence.xu, yanting.jiang

Define KVM_DEV_VFIO_FILE* and make KVM_DEV_VFIO_GROUP* aliases of them, so
old userspace that uses KVM_DEV_VFIO_GROUP* continues to work.

Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Tested-by: Terrence Xu <terrence.xu@intel.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Tested-by: Matthew Rosato <mjrosato@linux.ibm.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 Documentation/virt/kvm/devices/vfio.rst | 53 +++++++++++++++++--------
 include/uapi/linux/kvm.h                | 16 ++++++--
 virt/kvm/vfio.c                         | 16 ++++----
 3 files changed, 56 insertions(+), 29 deletions(-)

diff --git a/Documentation/virt/kvm/devices/vfio.rst b/Documentation/virt/kvm/devices/vfio.rst
index 79b6811bb4f3..277d727ec1a2 100644
--- a/Documentation/virt/kvm/devices/vfio.rst
+++ b/Documentation/virt/kvm/devices/vfio.rst
@@ -9,24 +9,38 @@ Device types supported:
   - KVM_DEV_TYPE_VFIO
 
 Only one VFIO instance may be created per VM.  The created device
-tracks VFIO groups in use by the VM and features of those groups
-important to the correctness and acceleration of the VM.  As groups
-are enabled and disabled for use by the VM, KVM should be updated
-about their presence.  When registered with KVM, a reference to the
-VFIO-group is held by KVM.
+tracks VFIO files (group or device) in use by the VM and features
+of those groups/devices important to the correctness and acceleration
+of the VM.  As groups/devices are enabled and disabled for use by the
+VM, KVM should be updated about their presence.  When registered with
+KVM, a reference to the VFIO file is held by KVM.
 
 Groups:
-  KVM_DEV_VFIO_GROUP
-
-KVM_DEV_VFIO_GROUP attributes:
-  KVM_DEV_VFIO_GROUP_ADD: Add a VFIO group to VFIO-KVM device tracking
-	kvm_device_attr.addr points to an int32_t file descriptor
-	for the VFIO group.
-  KVM_DEV_VFIO_GROUP_DEL: Remove a VFIO group from VFIO-KVM device tracking
-	kvm_device_attr.addr points to an int32_t file descriptor
-	for the VFIO group.
-  KVM_DEV_VFIO_GROUP_SET_SPAPR_TCE: attaches a guest visible TCE table
+  KVM_DEV_VFIO_FILE
+	alias: KVM_DEV_VFIO_GROUP
+
+KVM_DEV_VFIO_FILE attributes:
+  KVM_DEV_VFIO_FILE_ADD: Add a VFIO file (group/device) to VFIO-KVM device
+	tracking
+
+	alias: KVM_DEV_VFIO_GROUP_ADD
+
+	kvm_device_attr.addr points to an int32_t file descriptor for the
+	VFIO file.
+
+  KVM_DEV_VFIO_FILE_DEL: Remove a VFIO file (group/device) from VFIO-KVM
+	device tracking
+
+	alias: KVM_DEV_VFIO_GROUP_DEL
+
+	kvm_device_attr.addr points to an int32_t file descriptor for the
+	VFIO file.
+
+  KVM_DEV_VFIO_FILE_SET_SPAPR_TCE: attaches a guest visible TCE table
 	allocated by sPAPR KVM.
+
+	alias: KVM_DEV_VFIO_GROUP_SET_SPAPR_TCE
+
 	kvm_device_attr.addr points to a struct::
 
 		struct kvm_vfio_spapr_tce {
@@ -40,9 +54,14 @@ KVM_DEV_VFIO_GROUP attributes:
 	- @tablefd is a file descriptor for a TCE table allocated via
 	  KVM_CREATE_SPAPR_TCE.
 
+	Only accepts a VFIO group file, as SPAPR has no iommufd support.
+
 ::
 
-The GROUP_ADD operation above should be invoked prior to accessing the
+The FILE/GROUP_ADD operation above should be invoked prior to accessing the
 device file descriptor via VFIO_GROUP_GET_DEVICE_FD in order to support
 drivers which require a kvm pointer to be set in their .open_device()
-callback.
+callback.  The same applies to a device file descriptor obtained by opening
+the character device, which gets device access via VFIO_DEVICE_BIND_IOMMUFD.
+For such file descriptors, FILE_ADD should be invoked before
+VFIO_DEVICE_BIND_IOMMUFD to support the drivers mentioned above.
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index d77aef872a0a..a8eeca70a498 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1410,10 +1410,18 @@ struct kvm_device_attr {
 	__u64	addr;		/* userspace address of attr data */
 };
 
-#define  KVM_DEV_VFIO_GROUP			1
-#define   KVM_DEV_VFIO_GROUP_ADD			1
-#define   KVM_DEV_VFIO_GROUP_DEL			2
-#define   KVM_DEV_VFIO_GROUP_SET_SPAPR_TCE		3
+#define  KVM_DEV_VFIO_FILE	1
+
+#define   KVM_DEV_VFIO_FILE_ADD			1
+#define   KVM_DEV_VFIO_FILE_DEL			2
+#define   KVM_DEV_VFIO_FILE_SET_SPAPR_TCE	3
+
+/* KVM_DEV_VFIO_GROUP aliases are for compile time uapi compatibility */
+#define  KVM_DEV_VFIO_GROUP	KVM_DEV_VFIO_FILE
+
+#define   KVM_DEV_VFIO_GROUP_ADD	KVM_DEV_VFIO_FILE_ADD
+#define   KVM_DEV_VFIO_GROUP_DEL	KVM_DEV_VFIO_FILE_DEL
+#define   KVM_DEV_VFIO_GROUP_SET_SPAPR_TCE	KVM_DEV_VFIO_FILE_SET_SPAPR_TCE
 
 enum kvm_device_type {
 	KVM_DEV_TYPE_FSL_MPIC_20	= 1,
diff --git a/virt/kvm/vfio.c b/virt/kvm/vfio.c
index 857d6ba349e1..d869913baafd 100644
--- a/virt/kvm/vfio.c
+++ b/virt/kvm/vfio.c
@@ -286,18 +286,18 @@ static int kvm_vfio_set_file(struct kvm_device *dev, long attr,
 	int32_t fd;
 
 	switch (attr) {
-	case KVM_DEV_VFIO_GROUP_ADD:
+	case KVM_DEV_VFIO_FILE_ADD:
 		if (get_user(fd, argp))
 			return -EFAULT;
 		return kvm_vfio_file_add(dev, fd);
 
-	case KVM_DEV_VFIO_GROUP_DEL:
+	case KVM_DEV_VFIO_FILE_DEL:
 		if (get_user(fd, argp))
 			return -EFAULT;
 		return kvm_vfio_file_del(dev, fd);
 
 #ifdef CONFIG_SPAPR_TCE_IOMMU
-	case KVM_DEV_VFIO_GROUP_SET_SPAPR_TCE:
+	case KVM_DEV_VFIO_FILE_SET_SPAPR_TCE:
 		return kvm_vfio_file_set_spapr_tce(dev, arg);
 #endif
 	}
@@ -309,7 +309,7 @@ static int kvm_vfio_set_attr(struct kvm_device *dev,
 			     struct kvm_device_attr *attr)
 {
 	switch (attr->group) {
-	case KVM_DEV_VFIO_GROUP:
+	case KVM_DEV_VFIO_FILE:
 		return kvm_vfio_set_file(dev, attr->attr,
 					 u64_to_user_ptr(attr->addr));
 	}
@@ -321,12 +321,12 @@ static int kvm_vfio_has_attr(struct kvm_device *dev,
 			     struct kvm_device_attr *attr)
 {
 	switch (attr->group) {
-	case KVM_DEV_VFIO_GROUP:
+	case KVM_DEV_VFIO_FILE:
 		switch (attr->attr) {
-		case KVM_DEV_VFIO_GROUP_ADD:
-		case KVM_DEV_VFIO_GROUP_DEL:
+		case KVM_DEV_VFIO_FILE_ADD:
+		case KVM_DEV_VFIO_FILE_DEL:
 #ifdef CONFIG_SPAPR_TCE_IOMMU
-		case KVM_DEV_VFIO_GROUP_SET_SPAPR_TCE:
+		case KVM_DEV_VFIO_FILE_SET_SPAPR_TCE:
 #endif
 			return 0;
 		}
-- 
2.34.1
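
The aliasing introduced by this patch can be illustrated with a small
sketch. The macro values are copied from the include/uapi/linux/kvm.h hunk
above. struct example_device_attr is a hypothetical local stand-in that
mirrors the layout of struct kvm_device_attr so the sketch builds without
kernel headers. Real code would include <linux/kvm.h> and pass the real
structure to the KVM_SET_DEVICE_ATTR ioctl on the VFIO KVM device fd.

```c
#include <assert.h>
#include <stdint.h>

/* Values copied from the uapi hunk in this patch. */
#define KVM_DEV_VFIO_FILE			1
#define KVM_DEV_VFIO_FILE_ADD			1
#define KVM_DEV_VFIO_FILE_DEL			2
#define KVM_DEV_VFIO_FILE_SET_SPAPR_TCE		3

/* Old names kept as aliases for compile-time uapi compatibility. */
#define KVM_DEV_VFIO_GROUP			KVM_DEV_VFIO_FILE
#define KVM_DEV_VFIO_GROUP_ADD			KVM_DEV_VFIO_FILE_ADD
#define KVM_DEV_VFIO_GROUP_DEL			KVM_DEV_VFIO_FILE_DEL
#define KVM_DEV_VFIO_GROUP_SET_SPAPR_TCE	KVM_DEV_VFIO_FILE_SET_SPAPR_TCE

/* Hypothetical stand-in mirroring struct kvm_device_attr. */
struct example_device_attr {
	uint32_t flags;
	uint32_t group;
	uint64_t attr;
	uint64_t addr;	/* userspace address of the attribute data */
};

/* Build the attribute that asks KVM to start tracking a VFIO file. */
static struct example_device_attr example_vfio_file_add(const int32_t *fd)
{
	struct example_device_attr a = {
		.flags = 0,
		.group = KVM_DEV_VFIO_FILE,
		.attr = KVM_DEV_VFIO_FILE_ADD,
		.addr = (uint64_t)(uintptr_t)fd,	/* points to an int32_t fd */
	};

	return a;
}

/* Old and new names resolve to the same numbers, so the ABI is unchanged. */
static_assert(KVM_DEV_VFIO_GROUP == KVM_DEV_VFIO_FILE, "group alias");
static_assert(KVM_DEV_VFIO_GROUP_ADD == KVM_DEV_VFIO_FILE_ADD, "add alias");
static_assert(KVM_DEV_VFIO_GROUP_DEL == KVM_DEV_VFIO_FILE_DEL, "del alias");
```

Because the aliases expand to identical values, a binary built against the
old names issues exactly the same requests as one built against the new
names.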



* [PATCH v8 07/24] vfio: Pass struct vfio_device_file * to vfio_device_open/close()
  2023-03-27  9:40 [PATCH v8 00/24] Add vfio_device cdev for iommufd support Yi Liu
                   ` (5 preceding siblings ...)
  2023-03-27  9:40 ` [PATCH v8 06/24] kvm/vfio: Accept vfio device file from userspace Yi Liu
@ 2023-03-27  9:40 ` Yi Liu
  2023-03-27  9:40 ` [PATCH v8 08/24] vfio: Block device access via device fd until device is opened Yi Liu
                   ` (19 subsequent siblings)
  26 siblings, 0 replies; 54+ messages in thread
From: Yi Liu @ 2023-03-27  9:40 UTC (permalink / raw)
  To: alex.williamson, jgg, kevin.tian
  Cc: joro, robin.murphy, cohuck, eric.auger, nicolinc, kvm, mjrosato,
	chao.p.peng, yi.l.liu, yi.y.sun, peterx, jasowang,
	shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	intel-gvt-dev, intel-gfx, linux-s390, xudong.hao, yan.y.zhao,
	terrence.xu, yanting.jiang

This avoids passing too many parameters through multiple functions.

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Tested-by: Terrence Xu <terrence.xu@intel.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Tested-by: Matthew Rosato <mjrosato@linux.ibm.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 drivers/vfio/group.c     | 20 ++++++++++++++------
 drivers/vfio/vfio.h      |  8 ++++----
 drivers/vfio/vfio_main.c | 25 +++++++++++++++----------
 3 files changed, 33 insertions(+), 20 deletions(-)

diff --git a/drivers/vfio/group.c b/drivers/vfio/group.c
index 4f937ebaf6f7..9a7b2765eef6 100644
--- a/drivers/vfio/group.c
+++ b/drivers/vfio/group.c
@@ -169,8 +169,9 @@ static void vfio_device_group_get_kvm_safe(struct vfio_device *device)
 	spin_unlock(&device->group->kvm_ref_lock);
 }
 
-static int vfio_device_group_open(struct vfio_device *device)
+static int vfio_device_group_open(struct vfio_device_file *df)
 {
+	struct vfio_device *device = df->device;
 	int ret;
 
 	mutex_lock(&device->group->group_lock);
@@ -190,7 +191,11 @@ static int vfio_device_group_open(struct vfio_device *device)
 	if (device->open_count == 0)
 		vfio_device_group_get_kvm_safe(device);
 
-	ret = vfio_device_open(device, device->group->iommufd);
+	df->iommufd = device->group->iommufd;
+
+	ret = vfio_device_open(df);
+	if (ret)
+		df->iommufd = NULL;
 
 	if (device->open_count == 0)
 		vfio_device_put_kvm(device);
@@ -202,12 +207,15 @@ static int vfio_device_group_open(struct vfio_device *device)
 	return ret;
 }
 
-void vfio_device_group_close(struct vfio_device *device)
+void vfio_device_group_close(struct vfio_device_file *df)
 {
+	struct vfio_device *device = df->device;
+
 	mutex_lock(&device->group->group_lock);
 	mutex_lock(&device->dev_set->lock);
 
-	vfio_device_close(device, device->group->iommufd);
+	vfio_device_close(df);
+	df->iommufd = NULL;
 
 	if (device->open_count == 0)
 		vfio_device_put_kvm(device);
@@ -228,7 +236,7 @@ static struct file *vfio_device_open_file(struct vfio_device *device)
 		goto err_out;
 	}
 
-	ret = vfio_device_group_open(device);
+	ret = vfio_device_group_open(df);
 	if (ret)
 		goto err_free;
 
@@ -260,7 +268,7 @@ static struct file *vfio_device_open_file(struct vfio_device *device)
 	return filep;
 
 err_close_device:
-	vfio_device_group_close(device);
+	vfio_device_group_close(df);
 err_free:
 	kfree(df);
 err_out:
diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
index e4672d91a6f7..cffc08f5a6f1 100644
--- a/drivers/vfio/vfio.h
+++ b/drivers/vfio/vfio.h
@@ -20,13 +20,13 @@ struct vfio_device_file {
 	struct vfio_device *device;
 	spinlock_t kvm_ref_lock; /* protect kvm field */
 	struct kvm *kvm;
+	struct iommufd_ctx *iommufd; /* protected by struct vfio_device_set::lock */
 };
 
 void vfio_device_put_registration(struct vfio_device *device);
 bool vfio_device_try_get_registration(struct vfio_device *device);
-int vfio_device_open(struct vfio_device *device, struct iommufd_ctx *iommufd);
-void vfio_device_close(struct vfio_device *device,
-		       struct iommufd_ctx *iommufd);
+int vfio_device_open(struct vfio_device_file *df);
+void vfio_device_close(struct vfio_device_file *df);
 struct vfio_device_file *
 vfio_allocate_device_file(struct vfio_device *device);
 
@@ -91,7 +91,7 @@ void vfio_device_group_register(struct vfio_device *device);
 void vfio_device_group_unregister(struct vfio_device *device);
 int vfio_device_group_use_iommu(struct vfio_device *device);
 void vfio_device_group_unuse_iommu(struct vfio_device *device);
-void vfio_device_group_close(struct vfio_device *device);
+void vfio_device_group_close(struct vfio_device_file *df);
 struct vfio_group *vfio_group_from_file(struct file *file);
 bool vfio_group_has_dev(struct vfio_group *group, struct vfio_device *device);
 bool vfio_group_enforced_coherent(struct vfio_group *group);
diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
index cb543791b28b..2ea6cb6d03c7 100644
--- a/drivers/vfio/vfio_main.c
+++ b/drivers/vfio/vfio_main.c
@@ -419,9 +419,10 @@ vfio_allocate_device_file(struct vfio_device *device)
 	return df;
 }
 
-static int vfio_device_first_open(struct vfio_device *device,
-				  struct iommufd_ctx *iommufd)
+static int vfio_device_first_open(struct vfio_device_file *df)
 {
+	struct vfio_device *device = df->device;
+	struct iommufd_ctx *iommufd = df->iommufd;
 	int ret;
 
 	lockdep_assert_held(&device->dev_set->lock);
@@ -453,9 +454,11 @@ static int vfio_device_first_open(struct vfio_device *device,
 	return ret;
 }
 
-static void vfio_device_last_close(struct vfio_device *device,
-				   struct iommufd_ctx *iommufd)
+static void vfio_device_last_close(struct vfio_device_file *df)
 {
+	struct vfio_device *device = df->device;
+	struct iommufd_ctx *iommufd = df->iommufd;
+
 	lockdep_assert_held(&device->dev_set->lock);
 
 	if (device->ops->close_device)
@@ -467,15 +470,16 @@ static void vfio_device_last_close(struct vfio_device *device,
 	module_put(device->dev->driver->owner);
 }
 
-int vfio_device_open(struct vfio_device *device, struct iommufd_ctx *iommufd)
+int vfio_device_open(struct vfio_device_file *df)
 {
+	struct vfio_device *device = df->device;
 	int ret = 0;
 
 	lockdep_assert_held(&device->dev_set->lock);
 
 	device->open_count++;
 	if (device->open_count == 1) {
-		ret = vfio_device_first_open(device, iommufd);
+		ret = vfio_device_first_open(df);
 		if (ret)
 			device->open_count--;
 	}
@@ -483,14 +487,15 @@ int vfio_device_open(struct vfio_device *device, struct iommufd_ctx *iommufd)
 	return ret;
 }
 
-void vfio_device_close(struct vfio_device *device,
-		       struct iommufd_ctx *iommufd)
+void vfio_device_close(struct vfio_device_file *df)
 {
+	struct vfio_device *device = df->device;
+
 	lockdep_assert_held(&device->dev_set->lock);
 
 	vfio_assert_device_open(device);
 	if (device->open_count == 1)
-		vfio_device_last_close(device, iommufd);
+		vfio_device_last_close(df);
 	device->open_count--;
 }
 
@@ -535,7 +540,7 @@ static int vfio_device_fops_release(struct inode *inode, struct file *filep)
 	struct vfio_device_file *df = filep->private_data;
 	struct vfio_device *device = df->device;
 
-	vfio_device_group_close(device);
+	vfio_device_group_close(df);
 
 	vfio_device_put_registration(device);
 
-- 
2.34.1
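
The refactor in this patch can be summarized with a simplified userspace
sketch: the per-file structure carries the iommufd context, so the open
and close helpers take a single argument. The example_ names and the
fail_first_open knob are hypothetical. Locking and the real first-open
work are omitted.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct example_device {
	unsigned int open_count;
	int fail_first_open;	/* test knob: make the first open fail */
};

struct example_device_file {
	struct example_device *device;
	void *iommufd;		/* per-file context carried with the file */
};

static int example_first_open(struct example_device_file *df)
{
	return df->device->fail_first_open ? -EIO : 0;
}

/* Only the first opener does the real work; a failure undoes the count. */
static int example_device_open(struct example_device_file *df)
{
	struct example_device *device = df->device;
	int ret = 0;

	device->open_count++;
	if (device->open_count == 1) {
		ret = example_first_open(df);
		if (ret)
			device->open_count--;
	}
	return ret;
}

static void example_device_close(struct example_device_file *df)
{
	df->device->open_count--;
}

/* Mirrors the group path: set iommufd before open, clear it on failure. */
static int example_group_open(struct example_device_file *df,
			      void *group_iommufd)
{
	int ret;

	df->iommufd = group_iommufd;
	ret = example_device_open(df);
	if (ret)
		df->iommufd = NULL;
	return ret;
}
```

Keeping iommufd in the file structure means later patches can add more
per-file state without touching every function signature again.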



* [PATCH v8 08/24] vfio: Block device access via device fd until device is opened
  2023-03-27  9:40 [PATCH v8 00/24] Add vfio_device cdev for iommufd support Yi Liu
                   ` (6 preceding siblings ...)
  2023-03-27  9:40 ` [PATCH v8 07/24] vfio: Pass struct vfio_device_file * to vfio_device_open/close() Yi Liu
@ 2023-03-27  9:40 ` Yi Liu
  2023-03-28 21:33   ` Alex Williamson
  2023-03-27  9:40 ` [PATCH v8 09/24] vfio: Add cdev_device_open_cnt to vfio_group Yi Liu
                   ` (18 subsequent siblings)
  26 siblings, 1 reply; 54+ messages in thread
From: Yi Liu @ 2023-03-27  9:40 UTC (permalink / raw)
  To: alex.williamson, jgg, kevin.tian
  Cc: joro, robin.murphy, cohuck, eric.auger, nicolinc, kvm, mjrosato,
	chao.p.peng, yi.l.liu, yi.y.sun, peterx, jasowang,
	shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	intel-gvt-dev, intel-gfx, linux-s390, xudong.hao, yan.y.zhao,
	terrence.xu, yanting.jiang

Allow the vfio_device file to be in a state where the device FD is
opened but the device cannot be used by userspace (i.e. its .open_device()
hasn't been called). This in-between state is not used when the device
FD is spawned from the group FD. However, when the device FD is created
directly by opening a cdev, it is opened in the blocked state.

The reason for the in-between state is that userspace only gets an FD but
doesn't gain access permission until it binds the FD to an iommufd. So in
the blocked state, only the bind operation is allowed. Completing the bind
allows userspace to access the device.

This is implemented by adding a flag in struct vfio_device_file to mark
the blocked state and using a simple smp_load_acquire() to obtain the
flag value and serialize all the device setup with the thread accessing
this device.

This lockless scheme can safely handle the device FD transition
unbound->bound, but it cannot handle bound->unbound. To allow the latter
we'd need to add a lock on all the vfio ioctls, which seems costly. So once
the device FD is bound, it remains bound until the FD is closed.

Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Tested-by: Terrence Xu <terrence.xu@intel.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Tested-by: Matthew Rosato <mjrosato@linux.ibm.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 drivers/vfio/group.c     | 11 ++++++++++-
 drivers/vfio/vfio.h      |  1 +
 drivers/vfio/vfio_main.c | 41 ++++++++++++++++++++++++++++++++++------
 3 files changed, 46 insertions(+), 7 deletions(-)

diff --git a/drivers/vfio/group.c b/drivers/vfio/group.c
index 9a7b2765eef6..4f267ae7bebc 100644
--- a/drivers/vfio/group.c
+++ b/drivers/vfio/group.c
@@ -194,9 +194,18 @@ static int vfio_device_group_open(struct vfio_device_file *df)
 	df->iommufd = device->group->iommufd;
 
 	ret = vfio_device_open(df);
-	if (ret)
+	if (ret) {
 		df->iommufd = NULL;
+		goto out_put_kvm;
+	}
+
+	/*
+	 * Paired with smp_load_acquire() in vfio_device_fops::ioctl/
+	 * read/write/mmap
+	 */
+	smp_store_release(&df->access_granted, true);
 
+out_put_kvm:
 	if (device->open_count == 0)
 		vfio_device_put_kvm(device);
 
diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
index cffc08f5a6f1..854f2c97cb9a 100644
--- a/drivers/vfio/vfio.h
+++ b/drivers/vfio/vfio.h
@@ -18,6 +18,7 @@ struct vfio_container;
 
 struct vfio_device_file {
 	struct vfio_device *device;
+	bool access_granted;
 	spinlock_t kvm_ref_lock; /* protect kvm field */
 	struct kvm *kvm;
 	struct iommufd_ctx *iommufd; /* protected by struct vfio_device_set::lock */
diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
index 2ea6cb6d03c7..b515bbda4c74 100644
--- a/drivers/vfio/vfio_main.c
+++ b/drivers/vfio/vfio_main.c
@@ -1114,6 +1114,10 @@ static long vfio_device_fops_unl_ioctl(struct file *filep,
 	struct vfio_device *device = df->device;
 	int ret;
 
+	/* Paired with smp_store_release() following vfio_device_open() */
+	if (!smp_load_acquire(&df->access_granted))
+		return -EINVAL;
+
 	ret = vfio_device_pm_runtime_get(device);
 	if (ret)
 		return ret;
@@ -1141,6 +1145,10 @@ static ssize_t vfio_device_fops_read(struct file *filep, char __user *buf,
 	struct vfio_device_file *df = filep->private_data;
 	struct vfio_device *device = df->device;
 
+	/* Paired with smp_store_release() following vfio_device_open() */
+	if (!smp_load_acquire(&df->access_granted))
+		return -EINVAL;
+
 	if (unlikely(!device->ops->read))
 		return -EINVAL;
 
@@ -1154,6 +1162,10 @@ static ssize_t vfio_device_fops_write(struct file *filep,
 	struct vfio_device_file *df = filep->private_data;
 	struct vfio_device *device = df->device;
 
+	/* Paired with smp_store_release() following vfio_device_open() */
+	if (!smp_load_acquire(&df->access_granted))
+		return -EINVAL;
+
 	if (unlikely(!device->ops->write))
 		return -EINVAL;
 
@@ -1165,6 +1177,10 @@ static int vfio_device_fops_mmap(struct file *filep, struct vm_area_struct *vma)
 	struct vfio_device_file *df = filep->private_data;
 	struct vfio_device *device = df->device;
 
+	/* Paired with smp_store_release() following vfio_device_open() */
+	if (!smp_load_acquire(&df->access_granted))
+		return -EINVAL;
+
 	if (unlikely(!device->ops->mmap))
 		return -EINVAL;
 
@@ -1201,6 +1217,24 @@ bool vfio_file_is_valid(struct file *file)
 }
 EXPORT_SYMBOL_GPL(vfio_file_is_valid);
 
+/*
+ * Return true if the input file is a vfio device file and has opened
+ * the input device. Otherwise, return false.
+ */
+static bool vfio_file_has_device_access(struct file *file,
+					struct vfio_device *device)
+{
+	struct vfio_device *vdev = vfio_device_from_file(file);
+	struct vfio_device_file *df;
+
+	if (!vdev || vdev != device)
+		return false;
+
+	df = file->private_data;
+
+	return READ_ONCE(df->access_granted);
+}
+
 /**
  * vfio_file_has_dev - True if the VFIO file is a handle for device
  * @file: VFIO file to check
@@ -1211,17 +1245,12 @@ EXPORT_SYMBOL_GPL(vfio_file_is_valid);
 bool vfio_file_has_dev(struct file *file, struct vfio_device *device)
 {
 	struct vfio_group *group;
-	struct vfio_device *vdev;
 
 	group = vfio_group_from_file(file);
 	if (group)
 		return vfio_group_has_dev(group, device);
 
-	vdev = vfio_device_from_file(file);
-	if (vdev)
-		return vdev == device;
-
-	return false;
+	return vfio_file_has_device_access(file, device);
 }
 EXPORT_SYMBOL_GPL(vfio_file_has_dev);
 
-- 
2.34.1
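
The release/acquire pairing described in this patch can be sketched in
userspace with C11 atomics. This is an analogue only: the kernel code uses
smp_store_release() and smp_load_acquire(), while the sketch assumes the
C11 <stdatomic.h> equivalents. The example_ names and the setup_state
field are hypothetical.

```c
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct vfio_device_file. */
struct example_device_file {
	int setup_state;		/* state written during device open */
	atomic_bool access_granted;	/* false until the open completes */
};

/* Writer side: finish all setup, then publish the flag with release. */
static void example_grant_access(struct example_device_file *df)
{
	df->setup_state = 42;
	atomic_store_explicit(&df->access_granted, true,
			      memory_order_release);
}

/* Reader side: an acquire load that sees true also sees the setup. */
static int example_ioctl(struct example_device_file *df, int *out)
{
	if (!atomic_load_explicit(&df->access_granted,
				  memory_order_acquire))
		return -EINVAL;	/* still in the blocked state */

	*out = df->setup_state;
	return 0;
}
```

The flag only ever goes from false to true, which is why a plain
acquire load is enough and no lock is needed on the ioctl, read, write,
and mmap paths.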



* [PATCH v8 09/24] vfio: Add cdev_device_open_cnt to vfio_group
  2023-03-27  9:40 [PATCH v8 00/24] Add vfio_device cdev for iommufd support Yi Liu
                   ` (7 preceding siblings ...)
  2023-03-27  9:40 ` [PATCH v8 08/24] vfio: Block device access via device fd until device is opened Yi Liu
@ 2023-03-27  9:40 ` Yi Liu
  2023-03-27  9:40 ` [PATCH v8 10/24] vfio: Make vfio_device_open() single open for device cdev path Yi Liu
                   ` (17 subsequent siblings)
  26 siblings, 0 replies; 54+ messages in thread
From: Yi Liu @ 2023-03-27  9:40 UTC (permalink / raw)
  To: alex.williamson, jgg, kevin.tian
  Cc: joro, robin.murphy, cohuck, eric.auger, nicolinc, kvm, mjrosato,
	chao.p.peng, yi.l.liu, yi.y.sun, peterx, jasowang,
	shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	intel-gvt-dev, intel-gfx, linux-s390, xudong.hao, yan.y.zhao,
	terrence.xu, yanting.jiang

Add cdev_device_open_cnt to struct vfio_group to count the devices that
are opened via the cdev path. This count is incremented and decremented by
the cdev path. The group path checks it to achieve exclusion with the cdev
path. With this, only one path (group path or cdev path) will claim DMA
ownership. This avoids scenarios in which devices within the same group
are opened via different paths.

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Tested-by: Terrence Xu <terrence.xu@intel.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Tested-by: Matthew Rosato <mjrosato@linux.ibm.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 drivers/vfio/group.c | 33 +++++++++++++++++++++++++++++++++
 drivers/vfio/vfio.h  |  3 +++
 2 files changed, 36 insertions(+)

diff --git a/drivers/vfio/group.c b/drivers/vfio/group.c
index 4f267ae7bebc..4de7a8fc3be1 100644
--- a/drivers/vfio/group.c
+++ b/drivers/vfio/group.c
@@ -383,6 +383,33 @@ static long vfio_group_fops_unl_ioctl(struct file *filep,
 	}
 }
 
+int vfio_device_block_group(struct vfio_device *device)
+{
+	struct vfio_group *group = device->group;
+	int ret = 0;
+
+	mutex_lock(&group->group_lock);
+	if (group->opened_file) {
+		ret = -EBUSY;
+		goto out_unlock;
+	}
+
+	group->cdev_device_open_cnt++;
+
+out_unlock:
+	mutex_unlock(&group->group_lock);
+	return ret;
+}
+
+void vfio_device_unblock_group(struct vfio_device *device)
+{
+	struct vfio_group *group = device->group;
+
+	mutex_lock(&group->group_lock);
+	group->cdev_device_open_cnt--;
+	mutex_unlock(&group->group_lock);
+}
+
 static int vfio_group_fops_open(struct inode *inode, struct file *filep)
 {
 	struct vfio_group *group =
@@ -405,6 +432,11 @@ static int vfio_group_fops_open(struct inode *inode, struct file *filep)
 		goto out_unlock;
 	}
 
+	if (group->cdev_device_open_cnt) {
+		ret = -EBUSY;
+		goto out_unlock;
+	}
+
 	/*
 	 * Do we need multiple instances of the group open?  Seems not.
 	 */
@@ -479,6 +511,7 @@ static void vfio_group_release(struct device *dev)
 	mutex_destroy(&group->device_lock);
 	mutex_destroy(&group->group_lock);
 	WARN_ON(group->iommu_group);
+	WARN_ON(group->cdev_device_open_cnt);
 	ida_free(&vfio.group_ida, MINOR(group->dev.devt));
 	kfree(group);
 }
diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
index 854f2c97cb9a..b2f20b78a707 100644
--- a/drivers/vfio/vfio.h
+++ b/drivers/vfio/vfio.h
@@ -83,8 +83,11 @@ struct vfio_group {
 	struct blocking_notifier_head	notifier;
 	struct iommufd_ctx		*iommufd;
 	spinlock_t			kvm_ref_lock;
+	unsigned int			cdev_device_open_cnt;
 };
 
+int vfio_device_block_group(struct vfio_device *device);
+void vfio_device_unblock_group(struct vfio_device *device);
 int vfio_device_set_group(struct vfio_device *device,
 			  enum vfio_group_type type);
 void vfio_device_remove_group(struct vfio_device *device);
-- 
2.34.1



* [PATCH v8 10/24] vfio: Make vfio_device_open() single open for device cdev path
  2023-03-27  9:40 [PATCH v8 00/24] Add vfio_device cdev for iommufd support Yi Liu
                   ` (8 preceding siblings ...)
  2023-03-27  9:40 ` [PATCH v8 09/24] vfio: Add cdev_device_open_cnt to vfio_group Yi Liu
@ 2023-03-27  9:40 ` Yi Liu
  2023-03-30 23:52   ` Jason Gunthorpe
  2023-03-27  9:40 ` [PATCH v8 11/24] vfio: Make vfio_device_first_open() to accept NULL iommufd for noiommu Yi Liu
                   ` (16 subsequent siblings)
  26 siblings, 1 reply; 54+ messages in thread
From: Yi Liu @ 2023-03-27  9:40 UTC (permalink / raw)
  To: alex.williamson, jgg, kevin.tian
  Cc: joro, robin.murphy, cohuck, eric.auger, nicolinc, kvm, mjrosato,
	chao.p.peng, yi.l.liu, yi.y.sun, peterx, jasowang,
	shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	intel-gvt-dev, intel-gfx, linux-s390, xudong.hao, yan.y.zhao,
	terrence.xu, yanting.jiang

VFIO group has historically allowed multi-open of the device FD. This
was made secure because the "open" was executed via an ioctl to the
group FD which is itself only single open.

However, there is no known use of multiple device FDs today. It is a
strange thing to do because new device FDs can naturally be created
via dup().

When we implement the new device uAPI (only used in cdev path) there is
no natural way to allow the device itself to be multi-opened in a
secure manner. Without the group FD we cannot prove the security context
of the opener.

Thus, when moving to the new uAPI we block the ability to open a device
multiple times. Given that the old group path still allows it, we store
a vfio_group pointer in struct vfio_device_file to differentiate the two
paths.
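
As a rough illustration of the rule, the following userspace sketch keeps
only the open count and the group pointer. The so_* types and helpers are
hypothetical names made up for this example; locking and the real
first-open work are omitted.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures. */
struct so_group { int id; };
struct so_device { unsigned int open_count; };
struct so_file {
	struct so_device *device;
	struct so_group *group; /* NULL when opened via the cdev path */
};

/* Only a file that carries a group pointer may open the device again. */
static int so_device_open(struct so_file *df)
{
	struct so_device *device = df->device;

	if (device->open_count != 0 && !df->group)
		return -EINVAL;

	device->open_count++;
	return 0;
}

static void so_device_close(struct so_file *df)
{
	assert(df->device->open_count > 0);
	df->device->open_count--;
}
```

A second open through the cdev path fails with -EINVAL, while repeated
opens through the group path keep incrementing the count.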

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Tested-by: Terrence Xu <terrence.xu@intel.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 drivers/vfio/group.c     | 2 ++
 drivers/vfio/vfio.h      | 2 ++
 drivers/vfio/vfio_main.c | 7 +++++++
 3 files changed, 11 insertions(+)

diff --git a/drivers/vfio/group.c b/drivers/vfio/group.c
index 4de7a8fc3be1..c0065e359db6 100644
--- a/drivers/vfio/group.c
+++ b/drivers/vfio/group.c
@@ -245,6 +245,8 @@ static struct file *vfio_device_open_file(struct vfio_device *device)
 		goto err_out;
 	}
 
+	df->group = device->group;
+
 	ret = vfio_device_group_open(df);
 	if (ret)
 		goto err_free;
diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
index b2f20b78a707..f1a448f9d067 100644
--- a/drivers/vfio/vfio.h
+++ b/drivers/vfio/vfio.h
@@ -18,6 +18,8 @@ struct vfio_container;
 
 struct vfio_device_file {
 	struct vfio_device *device;
+	struct vfio_group *group;
+
 	bool access_granted;
 	spinlock_t kvm_ref_lock; /* protect kvm field */
 	struct kvm *kvm;
diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
index b515bbda4c74..b3b7e2436aec 100644
--- a/drivers/vfio/vfio_main.c
+++ b/drivers/vfio/vfio_main.c
@@ -477,6 +477,13 @@ int vfio_device_open(struct vfio_device_file *df)
 
 	lockdep_assert_held(&device->dev_set->lock);
 
+	/*
+	 * Only the group path allows the device opened multiple times.
+	 * The device cdev path doesn't have a secure way for it.
+	 */
+	if (device->open_count != 0 && !df->group)
+		return -EINVAL;
+
 	device->open_count++;
 	if (device->open_count == 1) {
 		ret = vfio_device_first_open(df);
-- 
2.34.1



* [PATCH v8 11/24] vfio: Make vfio_device_first_open() to accept NULL iommufd for noiommu
  2023-03-27  9:40 [PATCH v8 00/24] Add vfio_device cdev for iommufd support Yi Liu
                   ` (9 preceding siblings ...)
  2023-03-27  9:40 ` [PATCH v8 10/24] vfio: Make vfio_device_open() single open for device cdev path Yi Liu
@ 2023-03-27  9:40 ` Yi Liu
  2023-03-30 23:56   ` Jason Gunthorpe
  2023-03-27  9:40 ` [PATCH v8 12/24] vfio-iommufd: Move noiommu support out of vfio_iommufd_bind() Yi Liu
                   ` (15 subsequent siblings)
  26 siblings, 1 reply; 54+ messages in thread
From: Yi Liu @ 2023-03-27  9:40 UTC (permalink / raw)
  To: alex.williamson, jgg, kevin.tian
  Cc: joro, robin.murphy, cohuck, eric.auger, nicolinc, kvm, mjrosato,
	chao.p.peng, yi.l.liu, yi.y.sun, peterx, jasowang,
	shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	intel-gvt-dev, intel-gfx, linux-s390, xudong.hao, yan.y.zhao,
	terrence.xu, yanting.jiang

vfio_device_first_open() requires the caller to provide either a valid
iommufd (the group path in iommufd compat mode) or a valid container
(the group path in legacy container mode). As preparation for noiommu
support in the device cdev path, it is extended to allow both to be NULL.
The caller is expected to verify noiommu permission before passing NULL
to this function.
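
The resulting three-way dispatch can be pictured with the simplified
userspace sketch below. All fo_* names are hypothetical, and the two
backend helpers are stubs that always succeed; the point is only that
ret stays 0 when neither backend is present.

```c
#include <assert.h>
#include <stddef.h>

/* Which backend the sketch ended up using. */
enum fo_backend { FO_NONE, FO_IOMMUFD, FO_CONTAINER };

/* Hypothetical stand-in for struct vfio_device_file. */
struct fo_file {
	const void *iommufd;   /* non-NULL: iommufd (compat) mode */
	const void *container; /* non-NULL: legacy container mode */
	enum fo_backend used;
};

/* Stub for the iommufd bind step. */
static int fo_bind_iommufd(struct fo_file *df)
{
	df->used = FO_IOMMUFD;
	return 0;
}

/* Stub for the legacy container step. */
static int fo_use_container(struct fo_file *df)
{
	df->used = FO_CONTAINER;
	return 0;
}

static int fo_first_open(struct fo_file *df)
{
	int ret = 0; /* stays 0 in noiommu mode, where both are NULL */

	df->used = FO_NONE;
	if (df->iommufd)
		ret = fo_bind_iommufd(df);
	else if (df->container)
		ret = fo_use_container(df);
	return ret;
}
```

The close side would mirror the same if/else-if chain, so nothing is
undone for a file that never used a backend.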

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Tested-by: Terrence Xu <terrence.xu@intel.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 drivers/vfio/group.c     |  8 ++++++++
 drivers/vfio/vfio.h      |  1 +
 drivers/vfio/vfio_main.c | 12 ++++++++----
 3 files changed, 17 insertions(+), 4 deletions(-)

diff --git a/drivers/vfio/group.c b/drivers/vfio/group.c
index c0065e359db6..36e105960dd8 100644
--- a/drivers/vfio/group.c
+++ b/drivers/vfio/group.c
@@ -771,6 +771,14 @@ void vfio_device_group_unregister(struct vfio_device *device)
 	mutex_unlock(&device->group->device_lock);
 }
 
+/* No group lock since df->group and df->group->container cannot change */
+bool vfio_device_group_uses_container(struct vfio_device_file *df)
+{
+	if (WARN_ON(!df->group))
+		return false;
+	return READ_ONCE(df->group->container);
+}
+
 int vfio_device_group_use_iommu(struct vfio_device *device)
 {
 	struct vfio_group *group = device->group;
diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
index f1a448f9d067..7d4108cbc185 100644
--- a/drivers/vfio/vfio.h
+++ b/drivers/vfio/vfio.h
@@ -95,6 +95,7 @@ int vfio_device_set_group(struct vfio_device *device,
 void vfio_device_remove_group(struct vfio_device *device);
 void vfio_device_group_register(struct vfio_device *device);
 void vfio_device_group_unregister(struct vfio_device *device);
+bool vfio_device_group_uses_container(struct vfio_device_file *df);
 int vfio_device_group_use_iommu(struct vfio_device *device);
 void vfio_device_group_unuse_iommu(struct vfio_device *device);
 void vfio_device_group_close(struct vfio_device_file *df);
diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
index b3b7e2436aec..6739203873a6 100644
--- a/drivers/vfio/vfio_main.c
+++ b/drivers/vfio/vfio_main.c
@@ -423,16 +423,20 @@ static int vfio_device_first_open(struct vfio_device_file *df)
 {
 	struct vfio_device *device = df->device;
 	struct iommufd_ctx *iommufd = df->iommufd;
-	int ret;
+	int ret = 0;
 
 	lockdep_assert_held(&device->dev_set->lock);
 
 	if (!try_module_get(device->dev->driver->owner))
 		return -ENODEV;
 
+	/*
+	 * if neither iommufd nor container is used the device is in
+	 * noiommu mode then just go ahead to open it.
+	 */
 	if (iommufd)
 		ret = vfio_iommufd_bind(device, iommufd);
-	else
+	else if (vfio_device_group_uses_container(df))
 		ret = vfio_device_group_use_iommu(device);
 	if (ret)
 		goto err_module_put;
@@ -447,7 +451,7 @@ static int vfio_device_first_open(struct vfio_device_file *df)
 err_unuse_iommu:
 	if (iommufd)
 		vfio_iommufd_unbind(device);
-	else
+	else if (vfio_device_group_uses_container(df))
 		vfio_device_group_unuse_iommu(device);
 err_module_put:
 	module_put(device->dev->driver->owner);
@@ -465,7 +469,7 @@ static void vfio_device_last_close(struct vfio_device_file *df)
 		device->ops->close_device(device);
 	if (iommufd)
 		vfio_iommufd_unbind(device);
-	else
+	else if (vfio_device_group_uses_container(df))
 		vfio_device_group_unuse_iommu(device);
 	module_put(device->dev->driver->owner);
 }
-- 
2.34.1



* [PATCH v8 12/24] vfio-iommufd: Move noiommu support out of vfio_iommufd_bind()
  2023-03-27  9:40 [PATCH v8 00/24] Add vfio_device cdev for iommufd support Yi Liu
                   ` (10 preceding siblings ...)
  2023-03-27  9:40 ` [PATCH v8 11/24] vfio: Make vfio_device_first_open() to accept NULL iommufd for noiommu Yi Liu
@ 2023-03-27  9:40 ` Yi Liu
  2023-03-27  9:40 ` [PATCH v8 13/24] vfio-iommufd: Split bind/attach into two steps Yi Liu
                   ` (14 subsequent siblings)
  26 siblings, 0 replies; 54+ messages in thread
From: Yi Liu @ 2023-03-27  9:40 UTC (permalink / raw)
  To: alex.williamson, jgg, kevin.tian
  Cc: joro, robin.murphy, cohuck, eric.auger, nicolinc, kvm, mjrosato,
	chao.p.peng, yi.l.liu, yi.y.sun, peterx, jasowang,
	shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	intel-gvt-dev, intel-gfx, linux-s390, xudong.hao, yan.y.zhao,
	terrence.xu, yanting.jiang

into vfio_device_group_open(). This is also more consistent with what
will be done in vfio device cdev path.
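
A minimal userspace sketch of the relocated check follows. The nc_* names
are hypothetical, the capable(CAP_SYS_RAWIO) test is reduced to a boolean
argument, and the compat IOAS lookup is reduced to a flag; it is meant
only to show the order of the checks, not the real implementation.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for iommufd_ctx, vfio_device, vfio_device_file. */
struct nc_ctx { bool has_compat_ioas; };
struct nc_device { bool noiommu; unsigned int open_count; };
struct nc_file { struct nc_device *device; struct nc_ctx *iommufd; };

/* noiommu needs the privilege and must not have a compat IOAS assigned. */
static int nc_enable_noiommu_compat(const struct nc_ctx *ictx, bool has_rawio)
{
	if (!has_rawio)
		return -EPERM;
	if (ictx->has_compat_ioas)
		return -EPERM;
	return 0;
}

/* Group-open side: validate once, then hide the iommufd from the open. */
static int nc_group_open_prepare(struct nc_file *df, struct nc_ctx *group_ictx,
				 bool has_rawio)
{
	assert(df && df->device);

	df->iommufd = group_ictx;
	if (df->iommufd && df->device->noiommu) {
		if (df->device->open_count == 0) {
			int ret = nc_enable_noiommu_compat(df->iommufd,
							   has_rawio);
			if (ret)
				return ret;
		}
		df->iommufd = NULL; /* the open path then sees noiommu */
	}
	return 0;
}
```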

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Tested-by: Terrence Xu <terrence.xu@intel.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 drivers/vfio/group.c   |  9 +++++++++
 drivers/vfio/iommufd.c | 35 ++++++++++++++++++-----------------
 drivers/vfio/vfio.h    |  9 +++++++++
 3 files changed, 36 insertions(+), 17 deletions(-)

diff --git a/drivers/vfio/group.c b/drivers/vfio/group.c
index 36e105960dd8..d7e4a7c2da95 100644
--- a/drivers/vfio/group.c
+++ b/drivers/vfio/group.c
@@ -192,6 +192,15 @@ static int vfio_device_group_open(struct vfio_device_file *df)
 		vfio_device_group_get_kvm_safe(device);
 
 	df->iommufd = device->group->iommufd;
+	if (df->iommufd && vfio_device_is_noiommu(device)) {
+		if (device->open_count == 0) {
+			ret = vfio_iommufd_enable_noiommu_compat(device,
+								 df->iommufd);
+			if (ret)
+				goto out_put_kvm;
+		}
+		df->iommufd = NULL;
+	}
 
 	ret = vfio_device_open(df);
 	if (ret) {
diff --git a/drivers/vfio/iommufd.c b/drivers/vfio/iommufd.c
index 44088049dbb1..d512fc057999 100644
--- a/drivers/vfio/iommufd.c
+++ b/drivers/vfio/iommufd.c
@@ -10,6 +10,24 @@
 MODULE_IMPORT_NS(IOMMUFD);
 MODULE_IMPORT_NS(IOMMUFD_VFIO);
 
+int vfio_iommufd_enable_noiommu_compat(struct vfio_device *device,
+				       struct iommufd_ctx *ictx)
+{
+	u32 ioas_id;
+
+	if (!capable(CAP_SYS_RAWIO))
+		return -EPERM;
+
+	/*
+	 * Require no compat ioas to be assigned to proceed.  The basic
+	 * statement is that the user cannot have done something that
+	 * implies they expected translation to exist
+	 */
+	if (!iommufd_vfio_compat_ioas_get_id(ictx, &ioas_id))
+		return -EPERM;
+	return 0;
+}
+
 int vfio_iommufd_bind(struct vfio_device *vdev, struct iommufd_ctx *ictx)
 {
 	u32 ioas_id;
@@ -18,20 +36,6 @@ int vfio_iommufd_bind(struct vfio_device *vdev, struct iommufd_ctx *ictx)
 
 	lockdep_assert_held(&vdev->dev_set->lock);
 
-	if (vfio_device_is_noiommu(vdev)) {
-		if (!capable(CAP_SYS_RAWIO))
-			return -EPERM;
-
-		/*
-		 * Require no compat ioas to be assigned to proceed. The basic
-		 * statement is that the user cannot have done something that
-		 * implies they expected translation to exist
-		 */
-		if (!iommufd_vfio_compat_ioas_get_id(ictx, &ioas_id))
-			return -EPERM;
-		return 0;
-	}
-
 	ret = vdev->ops->bind_iommufd(vdev, ictx, &device_id);
 	if (ret)
 		return ret;
@@ -59,9 +63,6 @@ void vfio_iommufd_unbind(struct vfio_device *vdev)
 {
 	lockdep_assert_held(&vdev->dev_set->lock);
 
-	if (vfio_device_is_noiommu(vdev))
-		return;
-
 	if (vdev->ops->unbind_iommufd)
 		vdev->ops->unbind_iommufd(vdev);
 }
diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
index 7d4108cbc185..136137b8618d 100644
--- a/drivers/vfio/vfio.h
+++ b/drivers/vfio/vfio.h
@@ -236,9 +236,18 @@ static inline void vfio_container_cleanup(void)
 #endif
 
 #if IS_ENABLED(CONFIG_IOMMUFD)
+int vfio_iommufd_enable_noiommu_compat(struct vfio_device *device,
+				       struct iommufd_ctx *ictx);
 int vfio_iommufd_bind(struct vfio_device *device, struct iommufd_ctx *ictx);
 void vfio_iommufd_unbind(struct vfio_device *device);
 #else
+static inline int
+vfio_iommufd_enable_noiommu_compat(struct vfio_device *device,
+				   struct iommufd_ctx *ictx)
+{
+	return -EOPNOTSUPP;
+}
+
 static inline int vfio_iommufd_bind(struct vfio_device *device,
 				    struct iommufd_ctx *ictx)
 {
-- 
2.34.1



* [PATCH v8 13/24] vfio-iommufd: Split bind/attach into two steps
  2023-03-27  9:40 [PATCH v8 00/24] Add vfio_device cdev for iommufd support Yi Liu
                   ` (11 preceding siblings ...)
  2023-03-27  9:40 ` [PATCH v8 12/24] vfio-iommufd: Move noiommu support out of vfio_iommufd_bind() Yi Liu
@ 2023-03-27  9:40 ` Yi Liu
  2023-03-27  9:40 ` [PATCH v8 14/24] vfio: Record devid in vfio_device_file Yi Liu
                   ` (13 subsequent siblings)
  26 siblings, 0 replies; 54+ messages in thread
From: Yi Liu @ 2023-03-27  9:40 UTC (permalink / raw)
  To: alex.williamson, jgg, kevin.tian
  Cc: joro, robin.murphy, cohuck, eric.auger, nicolinc, kvm, mjrosato,
	chao.p.peng, yi.l.liu, yi.y.sun, peterx, jasowang,
	shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	intel-gvt-dev, intel-gfx, linux-s390, xudong.hao, yan.y.zhao,
	terrence.xu, yanting.jiang

to align with the coming vfio device cdev support.
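
The two-step flow, with its unwind when the second step fails, might look
like the userspace sketch below. The ba_* names and the fail_attach knob
are hypothetical and exist only to exercise the error path; the sketch
collapses the device open and close into the bind and unbind steps.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the device state touched by bind/attach. */
struct ba_device {
	bool bound;
	bool attached;
	bool fail_attach; /* test knob: make the attach step fail */
};

/* Step 1: bind the device to the iommufd. */
static int ba_bind(struct ba_device *d)
{
	d->bound = true;
	return 0;
}

/* Undo of step 1; also drops any attachment that is still in place. */
static void ba_unbind(struct ba_device *d)
{
	d->attached = false;
	d->bound = false;
}

/* Step 2: attach the bound device to the compat IOAS. */
static int ba_attach_compat_ioas(struct ba_device *d)
{
	if (!d->bound)
		return -EINVAL;
	if (d->fail_attach)
		return -ENODEV;
	d->attached = true;
	return 0;
}

/* Group path: run both steps and unwind step 1 if step 2 fails. */
static int ba_group_open(struct ba_device *d)
{
	int ret = ba_bind(d);

	if (ret)
		return ret;
	ret = ba_attach_compat_ioas(d);
	if (ret)
		ba_unbind(d);
	return ret;
}
```

Keeping the steps separate lets a caller other than the group path run
the bind alone and decide later which IOAS to attach.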

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Tested-by: Terrence Xu <terrence.xu@intel.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 drivers/vfio/group.c   | 18 ++++++++++++++----
 drivers/vfio/iommufd.c | 33 ++++++++++++++-------------------
 drivers/vfio/vfio.h    |  9 +++++++++
 3 files changed, 37 insertions(+), 23 deletions(-)

diff --git a/drivers/vfio/group.c b/drivers/vfio/group.c
index d7e4a7c2da95..8f09e4541c3a 100644
--- a/drivers/vfio/group.c
+++ b/drivers/vfio/group.c
@@ -203,9 +203,14 @@ static int vfio_device_group_open(struct vfio_device_file *df)
 	}
 
 	ret = vfio_device_open(df);
-	if (ret) {
-		df->iommufd = NULL;
+	if (ret)
 		goto out_put_kvm;
+
+	if (df->iommufd) {
+		ret = vfio_iommufd_attach_compat_ioas(device,
+						      df->iommufd);
+		if (ret)
+			goto out_close_device;
 	}
 
 	/*
@@ -214,12 +219,17 @@ static int vfio_device_group_open(struct vfio_device_file *df)
 	 */
 	smp_store_release(&df->access_granted, true);
 
+	mutex_unlock(&device->dev_set->lock);
+	mutex_unlock(&device->group->group_lock);
+	return 0;
+
+out_close_device:
+	vfio_device_close(df);
 out_put_kvm:
+	df->iommufd = NULL;
 	if (device->open_count == 0)
 		vfio_device_put_kvm(device);
-
 	mutex_unlock(&device->dev_set->lock);
-
 out_unlock:
 	mutex_unlock(&device->group->group_lock);
 	return ret;
diff --git a/drivers/vfio/iommufd.c b/drivers/vfio/iommufd.c
index d512fc057999..b32e757bdac5 100644
--- a/drivers/vfio/iommufd.c
+++ b/drivers/vfio/iommufd.c
@@ -30,33 +30,28 @@ int vfio_iommufd_enable_noiommu_compat(struct vfio_device *device,
 
 int vfio_iommufd_bind(struct vfio_device *vdev, struct iommufd_ctx *ictx)
 {
-	u32 ioas_id;
 	u32 device_id;
-	int ret;
 
 	lockdep_assert_held(&vdev->dev_set->lock);
 
-	ret = vdev->ops->bind_iommufd(vdev, ictx, &device_id);
-	if (ret)
-		return ret;
+	/* The legacy path has no way to return the device id */
+	return vdev->ops->bind_iommufd(vdev, ictx, &device_id);
+}
+
+int vfio_iommufd_attach_compat_ioas(struct vfio_device *vdev,
+				    struct iommufd_ctx *ictx)
+{
+	u32 ioas_id;
+	int ret;
+
+	lockdep_assert_held(&vdev->dev_set->lock);
 
 	ret = iommufd_vfio_compat_ioas_get_id(ictx, &ioas_id);
 	if (ret)
-		goto err_unbind;
-	ret = vdev->ops->attach_ioas(vdev, &ioas_id);
-	if (ret)
-		goto err_unbind;
-
-	/*
-	 * The legacy path has no way to return the device id or the selected
-	 * pt_id
-	 */
-	return 0;
+		return ret;
 
-err_unbind:
-	if (vdev->ops->unbind_iommufd)
-		vdev->ops->unbind_iommufd(vdev);
-	return ret;
+	/* The legacy path has no way to return the selected pt_id */
+	return vdev->ops->attach_ioas(vdev, &ioas_id);
 }
 
 void vfio_iommufd_unbind(struct vfio_device *vdev)
diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
index 136137b8618d..abfaf85cc266 100644
--- a/drivers/vfio/vfio.h
+++ b/drivers/vfio/vfio.h
@@ -238,6 +238,8 @@ static inline void vfio_container_cleanup(void)
 #if IS_ENABLED(CONFIG_IOMMUFD)
 int vfio_iommufd_enable_noiommu_compat(struct vfio_device *device,
 				       struct iommufd_ctx *ictx);
+int vfio_iommufd_attach_compat_ioas(struct vfio_device *device,
+				    struct iommufd_ctx *ictx);
 int vfio_iommufd_bind(struct vfio_device *device, struct iommufd_ctx *ictx);
 void vfio_iommufd_unbind(struct vfio_device *device);
 #else
@@ -248,6 +250,13 @@ vfio_iommufd_enable_noiommu_compat(struct vfio_device *device,
 	return -EOPNOTSUPP;
 }
 
+static inline int
+vfio_iommufd_attach_compat_ioas(struct vfio_device *device,
+				struct iommufd_ctx *ictx)
+{
+	return -EOPNOTSUPP;
+}
+
 static inline int vfio_iommufd_bind(struct vfio_device *device,
 				    struct iommufd_ctx *ictx)
 {
-- 
2.34.1



* [PATCH v8 14/24] vfio: Record devid in vfio_device_file
  2023-03-27  9:40 [PATCH v8 00/24] Add vfio_device cdev for iommufd support Yi Liu
                   ` (12 preceding siblings ...)
  2023-03-27  9:40 ` [PATCH v8 13/24] vfio-iommufd: Split bind/attach into two steps Yi Liu
@ 2023-03-27  9:40 ` Yi Liu
  2023-03-27  9:40 ` [PATCH v8 15/24] vfio-iommufd: Add detach_ioas support for physical VFIO devices Yi Liu
                   ` (12 subsequent siblings)
  26 siblings, 0 replies; 54+ messages in thread
From: Yi Liu @ 2023-03-27  9:40 UTC (permalink / raw)
  To: alex.williamson, jgg, kevin.tian
  Cc: joro, robin.murphy, cohuck, eric.auger, nicolinc, kvm, mjrosato,
	chao.p.peng, yi.l.liu, yi.y.sun, peterx, jasowang,
	shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	intel-gvt-dev, intel-gfx, linux-s390, xudong.hao, yan.y.zhao,
	terrence.xu, yanting.jiang

.bind_iommufd() will generate an ID to represent this bond, which is
needed by userspace for further usage. Store devid in vfio_device_file
to avoid passing the pointer in multiple places.

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Tested-by: Terrence Xu <terrence.xu@intel.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 drivers/vfio/iommufd.c   | 12 +++++++-----
 drivers/vfio/vfio.h      | 10 +++++-----
 drivers/vfio/vfio_main.c |  6 +++---
 3 files changed, 15 insertions(+), 13 deletions(-)

diff --git a/drivers/vfio/iommufd.c b/drivers/vfio/iommufd.c
index b32e757bdac5..cd3cc56f6c08 100644
--- a/drivers/vfio/iommufd.c
+++ b/drivers/vfio/iommufd.c
@@ -28,14 +28,14 @@ int vfio_iommufd_enable_noiommu_compat(struct vfio_device *device,
 	return 0;
 }
 
-int vfio_iommufd_bind(struct vfio_device *vdev, struct iommufd_ctx *ictx)
+int vfio_iommufd_bind(struct vfio_device_file *df)
 {
-	u32 device_id;
+	struct vfio_device *vdev = df->device;
+	struct iommufd_ctx *ictx = df->iommufd;
 
 	lockdep_assert_held(&vdev->dev_set->lock);
 
-	/* The legacy path has no way to return the device id */
-	return vdev->ops->bind_iommufd(vdev, ictx, &device_id);
+	return vdev->ops->bind_iommufd(vdev, ictx, &df->devid);
 }
 
 int vfio_iommufd_attach_compat_ioas(struct vfio_device *vdev,
@@ -54,8 +54,10 @@ int vfio_iommufd_attach_compat_ioas(struct vfio_device *vdev,
 	return vdev->ops->attach_ioas(vdev, &ioas_id);
 }
 
-void vfio_iommufd_unbind(struct vfio_device *vdev)
+void vfio_iommufd_unbind(struct vfio_device_file *df)
 {
+	struct vfio_device *vdev = df->device;
+
 	lockdep_assert_held(&vdev->dev_set->lock);
 
 	if (vdev->ops->unbind_iommufd)
diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
index abfaf85cc266..b47b186573ac 100644
--- a/drivers/vfio/vfio.h
+++ b/drivers/vfio/vfio.h
@@ -24,6 +24,7 @@ struct vfio_device_file {
 	spinlock_t kvm_ref_lock; /* protect kvm field */
 	struct kvm *kvm;
 	struct iommufd_ctx *iommufd; /* protected by struct vfio_device_set::lock */
+	u32 devid; /* only valid when iommufd is valid */
 };
 
 void vfio_device_put_registration(struct vfio_device *device);
@@ -240,8 +241,8 @@ int vfio_iommufd_enable_noiommu_compat(struct vfio_device *device,
 				       struct iommufd_ctx *ictx);
 int vfio_iommufd_attach_compat_ioas(struct vfio_device *device,
 				    struct iommufd_ctx *ictx);
-int vfio_iommufd_bind(struct vfio_device *device, struct iommufd_ctx *ictx);
-void vfio_iommufd_unbind(struct vfio_device *device);
+int vfio_iommufd_bind(struct vfio_device_file *df);
+void vfio_iommufd_unbind(struct vfio_device_file *df);
 #else
 static inline int
 vfio_iommufd_enable_noiommu_compat(struct vfio_device *device,
@@ -257,13 +258,12 @@ vfio_iommufd_attach_compat_ioas(struct vfio_device *device,
 	return -EOPNOTSUPP;
 }
 
-static inline int vfio_iommufd_bind(struct vfio_device *device,
-				    struct iommufd_ctx *ictx)
+static inline int vfio_iommufd_bind(struct vfio_device_file *df)
 {
 	return -EOPNOTSUPP;
 }
 
-static inline void vfio_iommufd_unbind(struct vfio_device *device)
+static inline void vfio_iommufd_unbind(struct vfio_device_file *df)
 {
 }
 #endif
diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
index 6739203873a6..d54c03248794 100644
--- a/drivers/vfio/vfio_main.c
+++ b/drivers/vfio/vfio_main.c
@@ -435,7 +435,7 @@ static int vfio_device_first_open(struct vfio_device_file *df)
 	 * noiommu mode then just go ahead to open it.
 	 */
 	if (iommufd)
-		ret = vfio_iommufd_bind(device, iommufd);
+		ret = vfio_iommufd_bind(df);
 	else if (vfio_device_group_uses_container(df))
 		ret = vfio_device_group_use_iommu(device);
 	if (ret)
@@ -450,7 +450,7 @@ static int vfio_device_first_open(struct vfio_device_file *df)
 
 err_unuse_iommu:
 	if (iommufd)
-		vfio_iommufd_unbind(device);
+		vfio_iommufd_unbind(df);
 	else if (vfio_device_group_uses_container(df))
 		vfio_device_group_unuse_iommu(device);
 err_module_put:
@@ -468,7 +468,7 @@ static void vfio_device_last_close(struct vfio_device_file *df)
 	if (device->ops->close_device)
 		device->ops->close_device(device);
 	if (iommufd)
-		vfio_iommufd_unbind(device);
+		vfio_iommufd_unbind(df);
 	else if (vfio_device_group_uses_container(df))
 		vfio_device_group_unuse_iommu(device);
 	module_put(device->dev->driver->owner);
-- 
2.34.1



* [PATCH v8 15/24] vfio-iommufd: Add detach_ioas support for physical VFIO devices
  2023-03-27  9:40 [PATCH v8 00/24] Add vfio_device cdev for iommufd support Yi Liu
                   ` (13 preceding siblings ...)
  2023-03-27  9:40 ` [PATCH v8 14/24] vfio: Record devid in vfio_device_file Yi Liu
@ 2023-03-27  9:40 ` Yi Liu
  2023-03-27  9:40 ` [PATCH v8 16/24] iommufd/device: Add iommufd_access_detach() API Yi Liu
                   ` (11 subsequent siblings)
  26 siblings, 0 replies; 54+ messages in thread
From: Yi Liu @ 2023-03-27  9:40 UTC (permalink / raw)
  To: alex.williamson, jgg, kevin.tian
  Cc: joro, robin.murphy, cohuck, eric.auger, nicolinc, kvm, mjrosato,
	chao.p.peng, yi.l.liu, yi.y.sun, peterx, jasowang,
	shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	intel-gvt-dev, intel-gfx, linux-s390, xudong.hao, yan.y.zhao,
	terrence.xu, yanting.jiang

This prepares for adding the DETACH ioctl for physical VFIO devices.
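
The attach/detach state handling added for physical devices can be
sketched in userspace as follows. The det_* names are hypothetical, the
bound flag stands in for a non-NULL iommufd_device, and the dev_set lock
assertion is omitted.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the iommufd state kept in struct vfio_device. */
struct det_device {
	bool bound;         /* device has been bound to an iommufd */
	bool attached;      /* device is attached to an IOAS */
	unsigned int pt_id; /* page table the device is attached to */
};

/* Attach needs a bound device and refuses a second attach. */
static int det_attach_ioas(struct det_device *d, unsigned int pt_id)
{
	if (!d->bound)
		return -EINVAL;
	if (d->attached)
		return -EBUSY;
	d->pt_id = pt_id;
	d->attached = true;
	return 0;
}

/* Detach is a no-op unless the device is bound and attached. */
static void det_detach_ioas(struct det_device *d)
{
	if (!d->bound || !d->attached)
		return;
	d->attached = false;
}
```

After a detach the device stays bound, so it can be attached to a
different IOAS without going through bind again.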

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Tested-by: Terrence Xu <terrence.xu@intel.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Tested-by: Matthew Rosato <mjrosato@linux.ibm.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 Documentation/driver-api/vfio.rst             |  8 +++++---
 drivers/vfio/fsl-mc/vfio_fsl_mc.c             |  1 +
 drivers/vfio/iommufd.c                        | 20 +++++++++++++++++++
 .../vfio/pci/hisilicon/hisi_acc_vfio_pci.c    |  2 ++
 drivers/vfio/pci/mlx5/main.c                  |  1 +
 drivers/vfio/pci/vfio_pci.c                   |  1 +
 drivers/vfio/platform/vfio_amba.c             |  1 +
 drivers/vfio/platform/vfio_platform.c         |  1 +
 drivers/vfio/vfio_main.c                      |  3 ++-
 include/linux/vfio.h                          |  8 +++++++-
 10 files changed, 41 insertions(+), 5 deletions(-)

diff --git a/Documentation/driver-api/vfio.rst b/Documentation/driver-api/vfio.rst
index 68abc089d6dd..363e12c90b87 100644
--- a/Documentation/driver-api/vfio.rst
+++ b/Documentation/driver-api/vfio.rst
@@ -279,6 +279,7 @@ similar to a file operations structure::
 					struct iommufd_ctx *ictx, u32 *out_device_id);
 		void	(*unbind_iommufd)(struct vfio_device *vdev);
 		int	(*attach_ioas)(struct vfio_device *vdev, u32 *pt_id);
+		void	(*detach_ioas)(struct vfio_device *vdev);
 		int	(*open_device)(struct vfio_device *vdev);
 		void	(*close_device)(struct vfio_device *vdev);
 		ssize_t	(*read)(struct vfio_device *vdev, char __user *buf,
@@ -315,9 +316,10 @@ container_of().
 	- The [un]bind_iommufd callbacks are issued when the device is bound to
 	  and unbound from iommufd.
 
-	- The attach_ioas callback is issued when the device is attached to an
-	  IOAS managed by the bound iommufd. The attached IOAS is automatically
-	  detached when the device is unbound from iommufd.
+	- The [de]attach_ioas callback is issued when the device is attached to
+	  and detached from an IOAS managed by the bound iommufd. However, the
+	  attached IOAS can also be automatically detached when the device is
+	  unbound from iommufd.
 
 	- The read/write/mmap callbacks implement the device region access defined
 	  by the device's own VFIO_DEVICE_GET_REGION_INFO ioctl.
diff --git a/drivers/vfio/fsl-mc/vfio_fsl_mc.c b/drivers/vfio/fsl-mc/vfio_fsl_mc.c
index c89a047a4cd8..d540cf683d93 100644
--- a/drivers/vfio/fsl-mc/vfio_fsl_mc.c
+++ b/drivers/vfio/fsl-mc/vfio_fsl_mc.c
@@ -594,6 +594,7 @@ static const struct vfio_device_ops vfio_fsl_mc_ops = {
 	.bind_iommufd	= vfio_iommufd_physical_bind,
 	.unbind_iommufd	= vfio_iommufd_physical_unbind,
 	.attach_ioas	= vfio_iommufd_physical_attach_ioas,
+	.detach_ioas	= vfio_iommufd_physical_detach_ioas,
 };
 
 static struct fsl_mc_driver vfio_fsl_mc_driver = {
diff --git a/drivers/vfio/iommufd.c b/drivers/vfio/iommufd.c
index cd3cc56f6c08..bb23cdd646cc 100644
--- a/drivers/vfio/iommufd.c
+++ b/drivers/vfio/iommufd.c
@@ -115,6 +115,14 @@ int vfio_iommufd_physical_attach_ioas(struct vfio_device *vdev, u32 *pt_id)
 {
 	int rc;
 
+	lockdep_assert_held(&vdev->dev_set->lock);
+
+	if (WARN_ON(!vdev->iommufd_device))
+		return -EINVAL;
+
+	if (vdev->iommufd_attached)
+		return -EBUSY;
+
 	rc = iommufd_device_attach(vdev->iommufd_device, pt_id);
 	if (rc)
 		return rc;
@@ -123,6 +131,18 @@ int vfio_iommufd_physical_attach_ioas(struct vfio_device *vdev, u32 *pt_id)
 }
 EXPORT_SYMBOL_GPL(vfio_iommufd_physical_attach_ioas);
 
+void vfio_iommufd_physical_detach_ioas(struct vfio_device *vdev)
+{
+	lockdep_assert_held(&vdev->dev_set->lock);
+
+	if (WARN_ON(!vdev->iommufd_device) || !vdev->iommufd_attached)
+		return;
+
+	iommufd_device_detach(vdev->iommufd_device);
+	vdev->iommufd_attached = false;
+}
+EXPORT_SYMBOL_GPL(vfio_iommufd_physical_detach_ioas);
+
 /*
  * The emulated standard ops mean that vfio_device is going to use the
  * "mdev path" and will call vfio_pin_pages()/vfio_dma_rw(). Drivers using this
diff --git a/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c b/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
index a117eaf21c14..b2f9778c8366 100644
--- a/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
+++ b/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
@@ -1373,6 +1373,7 @@ static const struct vfio_device_ops hisi_acc_vfio_pci_migrn_ops = {
 	.bind_iommufd = vfio_iommufd_physical_bind,
 	.unbind_iommufd = vfio_iommufd_physical_unbind,
 	.attach_ioas = vfio_iommufd_physical_attach_ioas,
+	.detach_ioas = vfio_iommufd_physical_detach_ioas,
 };
 
 static const struct vfio_device_ops hisi_acc_vfio_pci_ops = {
@@ -1391,6 +1392,7 @@ static const struct vfio_device_ops hisi_acc_vfio_pci_ops = {
 	.bind_iommufd = vfio_iommufd_physical_bind,
 	.unbind_iommufd = vfio_iommufd_physical_unbind,
 	.attach_ioas = vfio_iommufd_physical_attach_ioas,
+	.detach_ioas = vfio_iommufd_physical_detach_ioas,
 };
 
 static int hisi_acc_vfio_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
diff --git a/drivers/vfio/pci/mlx5/main.c b/drivers/vfio/pci/mlx5/main.c
index d95fd382814c..42ec574a8622 100644
--- a/drivers/vfio/pci/mlx5/main.c
+++ b/drivers/vfio/pci/mlx5/main.c
@@ -1320,6 +1320,7 @@ static const struct vfio_device_ops mlx5vf_pci_ops = {
 	.bind_iommufd = vfio_iommufd_physical_bind,
 	.unbind_iommufd = vfio_iommufd_physical_unbind,
 	.attach_ioas = vfio_iommufd_physical_attach_ioas,
+	.detach_ioas = vfio_iommufd_physical_detach_ioas,
 };
 
 static int mlx5vf_pci_probe(struct pci_dev *pdev,
diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
index 29091ee2e984..cb5b7f865d58 100644
--- a/drivers/vfio/pci/vfio_pci.c
+++ b/drivers/vfio/pci/vfio_pci.c
@@ -141,6 +141,7 @@ static const struct vfio_device_ops vfio_pci_ops = {
 	.bind_iommufd	= vfio_iommufd_physical_bind,
 	.unbind_iommufd	= vfio_iommufd_physical_unbind,
 	.attach_ioas	= vfio_iommufd_physical_attach_ioas,
+	.detach_ioas	= vfio_iommufd_physical_detach_ioas,
 };
 
 static int vfio_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
diff --git a/drivers/vfio/platform/vfio_amba.c b/drivers/vfio/platform/vfio_amba.c
index 83fe54015595..6464b3939ebc 100644
--- a/drivers/vfio/platform/vfio_amba.c
+++ b/drivers/vfio/platform/vfio_amba.c
@@ -119,6 +119,7 @@ static const struct vfio_device_ops vfio_amba_ops = {
 	.bind_iommufd	= vfio_iommufd_physical_bind,
 	.unbind_iommufd	= vfio_iommufd_physical_unbind,
 	.attach_ioas	= vfio_iommufd_physical_attach_ioas,
+	.detach_ioas	= vfio_iommufd_physical_detach_ioas,
 };
 
 static const struct amba_id pl330_ids[] = {
diff --git a/drivers/vfio/platform/vfio_platform.c b/drivers/vfio/platform/vfio_platform.c
index 22a1efca32a8..8cf22fa65baa 100644
--- a/drivers/vfio/platform/vfio_platform.c
+++ b/drivers/vfio/platform/vfio_platform.c
@@ -108,6 +108,7 @@ static const struct vfio_device_ops vfio_platform_ops = {
 	.bind_iommufd	= vfio_iommufd_physical_bind,
 	.unbind_iommufd	= vfio_iommufd_physical_unbind,
 	.attach_ioas	= vfio_iommufd_physical_attach_ioas,
+	.detach_ioas	= vfio_iommufd_physical_detach_ioas,
 };
 
 static struct platform_driver vfio_platform_driver = {
diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
index d54c03248794..eb53b1f8c828 100644
--- a/drivers/vfio/vfio_main.c
+++ b/drivers/vfio/vfio_main.c
@@ -258,7 +258,8 @@ static int __vfio_register_dev(struct vfio_device *device,
 	if (WARN_ON(IS_ENABLED(CONFIG_IOMMUFD) &&
 		    (!device->ops->bind_iommufd ||
 		     !device->ops->unbind_iommufd ||
-		     !device->ops->attach_ioas)))
+		     !device->ops->attach_ioas ||
+		     !device->ops->detach_ioas)))
 		return -EINVAL;
 
 	/*
diff --git a/include/linux/vfio.h b/include/linux/vfio.h
index 3b55dd71299b..4174fff94c13 100644
--- a/include/linux/vfio.h
+++ b/include/linux/vfio.h
@@ -72,7 +72,9 @@ struct vfio_device {
  * @bind_iommufd: Called when binding the device to an iommufd
  * @unbind_iommufd: Opposite of bind_iommufd
  * @attach_ioas: Called when attaching device to an IOAS/HWPT managed by the
- *		 bound iommufd. Undo in unbind_iommufd.
+ *		 bound iommufd. Undo in unbind_iommufd if @detach_ioas is not
+ *		 called.
+ * @detach_ioas: Opposite of attach_ioas
  * @open_device: Called when the first file descriptor is opened for this device
  * @close_device: Opposite of open_device
  * @read: Perform read(2) on device file descriptor
@@ -96,6 +98,7 @@ struct vfio_device_ops {
 				struct iommufd_ctx *ictx, u32 *out_device_id);
 	void	(*unbind_iommufd)(struct vfio_device *vdev);
 	int	(*attach_ioas)(struct vfio_device *vdev, u32 *pt_id);
+	void	(*detach_ioas)(struct vfio_device *vdev);
 	int	(*open_device)(struct vfio_device *vdev);
 	void	(*close_device)(struct vfio_device *vdev);
 	ssize_t	(*read)(struct vfio_device *vdev, char __user *buf,
@@ -119,6 +122,7 @@ int vfio_iommufd_physical_bind(struct vfio_device *vdev,
 			       struct iommufd_ctx *ictx, u32 *out_device_id);
 void vfio_iommufd_physical_unbind(struct vfio_device *vdev);
 int vfio_iommufd_physical_attach_ioas(struct vfio_device *vdev, u32 *pt_id);
+void vfio_iommufd_physical_detach_ioas(struct vfio_device *vdev);
 int vfio_iommufd_emulated_bind(struct vfio_device *vdev,
 			       struct iommufd_ctx *ictx, u32 *out_device_id);
 void vfio_iommufd_emulated_unbind(struct vfio_device *vdev);
@@ -142,6 +146,8 @@ static inline int vfio_iommufd_physical_devid(struct vfio_device *vdev, u32 *id)
 	((void (*)(struct vfio_device *vdev)) NULL)
 #define vfio_iommufd_physical_attach_ioas \
 	((int (*)(struct vfio_device *vdev, u32 *pt_id)) NULL)
+#define vfio_iommufd_physical_detach_ioas \
+	((void (*)(struct vfio_device *vdev)) NULL)
 #define vfio_iommufd_emulated_bind                                      \
 	((int (*)(struct vfio_device *vdev, struct iommufd_ctx *ictx,   \
 		  u32 *out_device_id)) NULL)
-- 
2.34.1



* [PATCH v8 16/24] iommufd/device: Add iommufd_access_detach() API
  2023-03-27  9:40 [PATCH v8 00/24] Add vfio_device cdev for iommufd support Yi Liu
                   ` (14 preceding siblings ...)
  2023-03-27  9:40 ` [PATCH v8 15/24] vfio-iommufd: Add detach_ioas support for physical VFIO devices Yi Liu
@ 2023-03-27  9:40 ` Yi Liu
  2023-03-28  2:23   ` Jon Pan-Doh
  2023-03-27  9:40 ` [PATCH v8 17/24] vfio-iommufd: Add detach_ioas support for emulated VFIO devices Yi Liu
                   ` (10 subsequent siblings)
  26 siblings, 1 reply; 54+ messages in thread
From: Yi Liu @ 2023-03-27  9:40 UTC (permalink / raw)
  To: alex.williamson, jgg, kevin.tian
  Cc: joro, robin.murphy, cohuck, eric.auger, nicolinc, kvm, mjrosato,
	chao.p.peng, yi.l.liu, yi.y.sun, peterx, jasowang,
	shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	intel-gvt-dev, intel-gfx, linux-s390, xudong.hao, yan.y.zhao,
	terrence.xu, yanting.jiang

From: Nicolin Chen <nicolinc@nvidia.com>

Previously, the detach routine was only done by destroy(), which was
called by vfio_iommufd_emulated_unbind() when the device runs close().
So by the time the call trace reached the detach routine, all the
mappings in iopt had already been cleaned up.

Now there is a need for a detach uAPI. This requires not only a new
iommufd_access_detach() API, but also an access->ops->unmap() call as a
cleanup. So add one.

However, leaving that unprotected can introduce a race condition during
the pin_/unpin_pages() calls, where access->ioas->iopt is referenced.
So add an ioas_lock to protect the references to iopt.

Also, to allow the iommufd_access_unpin_pages() callback to happen via
this unmap() call, add an ioas_unpin pointer, so the unpin routine won't
be affected by the "access->ioas = NULL" trick.
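
To illustrate the locking scheme described above, here is a minimal
userspace sketch. It is only a model under stated assumptions: the
model_* names, the pinned/held counters and the use of a pthread mutex
are hypothetical stand-ins and are not the kernel structures touched by
this patch. It shows pin consulting ->ioas, unpin consulting
->ioas_unpin, and detach dropping the lock around the unmap callback.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for an IOAS: just counts outstanding pins. */
struct model_ioas {
	int pinned;
};

/* Hypothetical stand-in for an access object. */
struct model_access {
	pthread_mutex_t ioas_lock;
	struct model_ioas *ioas;	/* used by pin; NULL blocks new pins */
	struct model_ioas *ioas_unpin;	/* used by unpin; valid during detach */
	void (*unmap)(struct model_access *access);	/* consumer cleanup */
	int held;			/* consumer-side count of its own pins */
};

/* Pin succeeds only while an IOAS is attached. */
int model_pin(struct model_access *access)
{
	int rc = -1;

	pthread_mutex_lock(&access->ioas_lock);
	if (access->ioas) {
		access->ioas->pinned++;
		rc = 0;
	}
	pthread_mutex_unlock(&access->ioas_lock);
	return rc;
}

/* Unpin keeps working during detach because it looks at ioas_unpin. */
void model_unpin(struct model_access *access)
{
	pthread_mutex_lock(&access->ioas_lock);
	if (access->ioas_unpin && access->ioas_unpin->pinned > 0)
		access->ioas_unpin->pinned--;
	pthread_mutex_unlock(&access->ioas_lock);
}

/* Example consumer callback: release every pin the consumer still holds. */
void model_unmap_all(struct model_access *access)
{
	while (access->held > 0) {
		model_unpin(access);
		access->held--;
	}
}

void model_detach(struct model_access *access)
{
	pthread_mutex_lock(&access->ioas_lock);
	if (access->ioas) {
		/* Block further pins before asking the consumer to clean up. */
		access->ioas = NULL;
		if (access->unmap) {
			/* The callback takes ioas_lock itself via model_unpin(). */
			pthread_mutex_unlock(&access->ioas_lock);
			access->unmap(access);
			pthread_mutex_lock(&access->ioas_lock);
		}
	}
	access->ioas_unpin = NULL;
	pthread_mutex_unlock(&access->ioas_lock);
}
```

In this model, a pin attempted after model_detach() fails, while the
unpins issued from the callback during detach still find the IOAS
through ioas_unpin.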

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Tested-by: Terrence Xu <terrence.xu@intel.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 drivers/iommu/iommufd/device.c          | 75 +++++++++++++++++++++++--
 drivers/iommu/iommufd/iommufd_private.h |  2 +
 include/linux/iommufd.h                 |  1 +
 3 files changed, 73 insertions(+), 5 deletions(-)

diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c
index 04a57aa1ae2c..f0522d80919d 100644
--- a/drivers/iommu/iommufd/device.c
+++ b/drivers/iommu/iommufd/device.c
@@ -493,26 +493,66 @@ void iommufd_access_destroy(struct iommufd_access *access)
 }
 EXPORT_SYMBOL_NS_GPL(iommufd_access_destroy, IOMMUFD);
 
+static void __iommufd_access_detach(struct iommufd_access *access)
+{
+	struct iommufd_ioas *cur_ioas = access->ioas;
+
+	lockdep_assert_held(&access->ioas_lock);
+	/*
+	 * Set ioas to NULL to block any further iommufd_access_pin_pages().
+	 * iommufd_access_unpin_pages() can continue using access->ioas_unpin.
+	 */
+	access->ioas = NULL;
+
+	if (access->ops->unmap) {
+		mutex_unlock(&access->ioas_lock);
+		access->ops->unmap(access->data, 0, ULONG_MAX);
+		mutex_lock(&access->ioas_lock);
+	}
+	iopt_remove_access(&cur_ioas->iopt, access);
+	refcount_dec(&cur_ioas->obj.users);
+}
+
+void iommufd_access_detach(struct iommufd_access *access)
+{
+	mutex_lock(&access->ioas_lock);
+	if (WARN_ON(!access->ioas))
+		goto out;
+	__iommufd_access_detach(access);
+out:
+	access->ioas_unpin = NULL;
+	mutex_unlock(&access->ioas_lock);
+}
+EXPORT_SYMBOL_NS_GPL(iommufd_access_detach, IOMMUFD);
+
 int iommufd_access_attach(struct iommufd_access *access, u32 ioas_id)
 {
 	struct iommufd_ioas *new_ioas;
 	int rc = 0;
 
-	if (access->ioas != NULL && access->ioas->obj.id != ioas_id)
+	mutex_lock(&access->ioas_lock);
+	if (access->ioas != NULL && access->ioas->obj.id != ioas_id) {
+		mutex_unlock(&access->ioas_lock);
 		return -EINVAL;
+	}
 
 	new_ioas = iommufd_get_ioas(access->ictx, ioas_id);
-	if (IS_ERR(new_ioas))
+	if (IS_ERR(new_ioas)) {
+		mutex_unlock(&access->ioas_lock);
 		return PTR_ERR(new_ioas);
+	}
 
 	rc = iopt_add_access(&new_ioas->iopt, access);
 	if (rc) {
+		mutex_unlock(&access->ioas_lock);
 		iommufd_put_object(&new_ioas->obj);
 		return rc;
 	}
 	iommufd_ref_to_users(&new_ioas->obj);
 
 	access->ioas = new_ioas;
+	access->ioas_unpin = new_ioas;
+	mutex_unlock(&access->ioas_lock);
 	return 0;
 }
 EXPORT_SYMBOL_NS_GPL(iommufd_access_attach, IOMMUFD);
@@ -567,8 +607,8 @@ void iommufd_access_notify_unmap(struct io_pagetable *iopt, unsigned long iova,
 void iommufd_access_unpin_pages(struct iommufd_access *access,
 				unsigned long iova, unsigned long length)
 {
-	struct io_pagetable *iopt = &access->ioas->iopt;
 	struct iopt_area_contig_iter iter;
+	struct io_pagetable *iopt;
 	unsigned long last_iova;
 	struct iopt_area *area;
 
@@ -576,6 +616,13 @@ void iommufd_access_unpin_pages(struct iommufd_access *access,
 	    WARN_ON(check_add_overflow(iova, length - 1, &last_iova)))
 		return;
 
+	mutex_lock(&access->ioas_lock);
+	if (!access->ioas_unpin) {
+		mutex_unlock(&access->ioas_lock);
+		return;
+	}
+	iopt = &access->ioas_unpin->iopt;
+
 	down_read(&iopt->iova_rwsem);
 	iopt_for_each_contig_area(&iter, area, iopt, iova, last_iova)
 		iopt_area_remove_access(
@@ -585,6 +632,7 @@ void iommufd_access_unpin_pages(struct iommufd_access *access,
 				min(last_iova, iopt_area_last_iova(area))));
 	up_read(&iopt->iova_rwsem);
 	WARN_ON(!iopt_area_contig_done(&iter));
+	mutex_unlock(&access->ioas_lock);
 }
 EXPORT_SYMBOL_NS_GPL(iommufd_access_unpin_pages, IOMMUFD);
 
@@ -630,8 +678,8 @@ int iommufd_access_pin_pages(struct iommufd_access *access, unsigned long iova,
 			     unsigned long length, struct page **out_pages,
 			     unsigned int flags)
 {
-	struct io_pagetable *iopt = &access->ioas->iopt;
 	struct iopt_area_contig_iter iter;
+	struct io_pagetable *iopt;
 	unsigned long last_iova;
 	struct iopt_area *area;
 	int rc;
@@ -646,6 +694,13 @@ int iommufd_access_pin_pages(struct iommufd_access *access, unsigned long iova,
 	if (check_add_overflow(iova, length - 1, &last_iova))
 		return -EOVERFLOW;
 
+	mutex_lock(&access->ioas_lock);
+	if (!access->ioas) {
+		mutex_unlock(&access->ioas_lock);
+		return -ENOENT;
+	}
+	iopt = &access->ioas->iopt;
+
 	down_read(&iopt->iova_rwsem);
 	iopt_for_each_contig_area(&iter, area, iopt, iova, last_iova) {
 		unsigned long last = min(last_iova, iopt_area_last_iova(area));
@@ -676,6 +731,7 @@ int iommufd_access_pin_pages(struct iommufd_access *access, unsigned long iova,
 	}
 
 	up_read(&iopt->iova_rwsem);
+	mutex_unlock(&access->ioas_lock);
 	return 0;
 
 err_remove:
@@ -690,6 +746,7 @@ int iommufd_access_pin_pages(struct iommufd_access *access, unsigned long iova,
 						  iopt_area_last_iova(area))));
 	}
 	up_read(&iopt->iova_rwsem);
+	mutex_unlock(&access->ioas_lock);
 	return rc;
 }
 EXPORT_SYMBOL_NS_GPL(iommufd_access_pin_pages, IOMMUFD);
@@ -709,8 +766,8 @@ EXPORT_SYMBOL_NS_GPL(iommufd_access_pin_pages, IOMMUFD);
 int iommufd_access_rw(struct iommufd_access *access, unsigned long iova,
 		      void *data, size_t length, unsigned int flags)
 {
-	struct io_pagetable *iopt = &access->ioas->iopt;
 	struct iopt_area_contig_iter iter;
+	struct io_pagetable *iopt;
 	struct iopt_area *area;
 	unsigned long last_iova;
 	int rc;
@@ -720,6 +777,13 @@ int iommufd_access_rw(struct iommufd_access *access, unsigned long iova,
 	if (check_add_overflow(iova, length - 1, &last_iova))
 		return -EOVERFLOW;
 
+	mutex_lock(&access->ioas_lock);
+	if (!access->ioas) {
+		mutex_unlock(&access->ioas_lock);
+		return -ENOENT;
+	}
+	iopt = &access->ioas->iopt;
+
 	down_read(&iopt->iova_rwsem);
 	iopt_for_each_contig_area(&iter, area, iopt, iova, last_iova) {
 		unsigned long last = min(last_iova, iopt_area_last_iova(area));
@@ -746,6 +810,7 @@ int iommufd_access_rw(struct iommufd_access *access, unsigned long iova,
 		rc = -ENOENT;
 err_out:
 	up_read(&iopt->iova_rwsem);
+	mutex_unlock(&access->ioas_lock);
 	return rc;
 }
 EXPORT_SYMBOL_NS_GPL(iommufd_access_rw, IOMMUFD);
diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h
index 2e6e8e217cce..ec2ce3ef187d 100644
--- a/drivers/iommu/iommufd/iommufd_private.h
+++ b/drivers/iommu/iommufd/iommufd_private.h
@@ -263,6 +263,8 @@ struct iommufd_access {
 	struct iommufd_object obj;
 	struct iommufd_ctx *ictx;
 	struct iommufd_ioas *ioas;
+	struct iommufd_ioas *ioas_unpin;
+	struct mutex ioas_lock;
 	const struct iommufd_access_ops *ops;
 	void *data;
 	unsigned long iova_alignment;
diff --git a/include/linux/iommufd.h b/include/linux/iommufd.h
index ac96df406833..9e0e8894dacc 100644
--- a/include/linux/iommufd.h
+++ b/include/linux/iommufd.h
@@ -47,6 +47,7 @@ iommufd_access_create(struct iommufd_ctx *ictx,
 		      const struct iommufd_access_ops *ops, void *data, u32 *id);
 void iommufd_access_destroy(struct iommufd_access *access);
 int iommufd_access_attach(struct iommufd_access *access, u32 ioas_id);
+void iommufd_access_detach(struct iommufd_access *access);
 
 void iommufd_ctx_get(struct iommufd_ctx *ictx);
 
-- 
2.34.1



* [PATCH v8 17/24] vfio-iommufd: Add detach_ioas support for emulated VFIO devices
  2023-03-27  9:40 [PATCH v8 00/24] Add vfio_device cdev for iommufd support Yi Liu
                   ` (15 preceding siblings ...)
  2023-03-27  9:40 ` [PATCH v8 16/24] iommufd/device: Add iommufd_access_detach() API Yi Liu
@ 2023-03-27  9:40 ` Yi Liu
  2023-03-27  9:40 ` [PATCH v8 18/24] vfio: Determine noiommu in vfio_device registration Yi Liu
                   ` (9 subsequent siblings)
  26 siblings, 0 replies; 54+ messages in thread
From: Yi Liu @ 2023-03-27  9:40 UTC (permalink / raw)
  To: alex.williamson, jgg, kevin.tian
  Cc: joro, robin.murphy, cohuck, eric.auger, nicolinc, kvm, mjrosato,
	chao.p.peng, yi.l.liu, yi.y.sun, peterx, jasowang,
	shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	intel-gvt-dev, intel-gfx, linux-s390, xudong.hao, yan.y.zhao,
	terrence.xu, yanting.jiang

This prepares for adding the DETACH ioctl for emulated VFIO devices.

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Tested-by: Terrence Xu <terrence.xu@intel.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Tested-by: Matthew Rosato <mjrosato@linux.ibm.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 drivers/gpu/drm/i915/gvt/kvmgt.c  |  1 +
 drivers/s390/cio/vfio_ccw_ops.c   |  1 +
 drivers/s390/crypto/vfio_ap_ops.c |  1 +
 drivers/vfio/iommufd.c            | 12 ++++++++++++
 include/linux/vfio.h              |  3 +++
 samples/vfio-mdev/mbochs.c        |  1 +
 samples/vfio-mdev/mdpy.c          |  1 +
 samples/vfio-mdev/mtty.c          |  1 +
 8 files changed, 21 insertions(+)

diff --git a/drivers/gpu/drm/i915/gvt/kvmgt.c b/drivers/gpu/drm/i915/gvt/kvmgt.c
index de675d799c7d..9cd9e9da60dd 100644
--- a/drivers/gpu/drm/i915/gvt/kvmgt.c
+++ b/drivers/gpu/drm/i915/gvt/kvmgt.c
@@ -1474,6 +1474,7 @@ static const struct vfio_device_ops intel_vgpu_dev_ops = {
 	.bind_iommufd	= vfio_iommufd_emulated_bind,
 	.unbind_iommufd = vfio_iommufd_emulated_unbind,
 	.attach_ioas	= vfio_iommufd_emulated_attach_ioas,
+	.detach_ioas	= vfio_iommufd_emulated_detach_ioas,
 };
 
 static int intel_vgpu_probe(struct mdev_device *mdev)
diff --git a/drivers/s390/cio/vfio_ccw_ops.c b/drivers/s390/cio/vfio_ccw_ops.c
index 5b53b94f13c7..cba4971618ff 100644
--- a/drivers/s390/cio/vfio_ccw_ops.c
+++ b/drivers/s390/cio/vfio_ccw_ops.c
@@ -632,6 +632,7 @@ static const struct vfio_device_ops vfio_ccw_dev_ops = {
 	.bind_iommufd = vfio_iommufd_emulated_bind,
 	.unbind_iommufd = vfio_iommufd_emulated_unbind,
 	.attach_ioas = vfio_iommufd_emulated_attach_ioas,
+	.detach_ioas = vfio_iommufd_emulated_detach_ioas,
 };
 
 struct mdev_driver vfio_ccw_mdev_driver = {
diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
index 72e10abb103a..9902e62e7a17 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -1844,6 +1844,7 @@ static const struct vfio_device_ops vfio_ap_matrix_dev_ops = {
 	.bind_iommufd = vfio_iommufd_emulated_bind,
 	.unbind_iommufd = vfio_iommufd_emulated_unbind,
 	.attach_ioas = vfio_iommufd_emulated_attach_ioas,
+	.detach_ioas = vfio_iommufd_emulated_detach_ioas,
 };
 
 static struct mdev_driver vfio_ap_matrix_driver = {
diff --git a/drivers/vfio/iommufd.c b/drivers/vfio/iommufd.c
index bb23cdd646cc..df4d6e0739b5 100644
--- a/drivers/vfio/iommufd.c
+++ b/drivers/vfio/iommufd.c
@@ -206,3 +206,15 @@ int vfio_iommufd_emulated_attach_ioas(struct vfio_device *vdev, u32 *pt_id)
 	return 0;
 }
 EXPORT_SYMBOL_GPL(vfio_iommufd_emulated_attach_ioas);
+
+void vfio_iommufd_emulated_detach_ioas(struct vfio_device *vdev)
+{
+	lockdep_assert_held(&vdev->dev_set->lock);
+
+	if (WARN_ON(!vdev->iommufd_access) || !vdev->iommufd_attached)
+		return;
+
+	iommufd_access_detach(vdev->iommufd_access);
+	vdev->iommufd_attached = false;
+}
+EXPORT_SYMBOL_GPL(vfio_iommufd_emulated_detach_ioas);
diff --git a/include/linux/vfio.h b/include/linux/vfio.h
index 4174fff94c13..580cfc5728e8 100644
--- a/include/linux/vfio.h
+++ b/include/linux/vfio.h
@@ -127,6 +127,7 @@ int vfio_iommufd_emulated_bind(struct vfio_device *vdev,
 			       struct iommufd_ctx *ictx, u32 *out_device_id);
 void vfio_iommufd_emulated_unbind(struct vfio_device *vdev);
 int vfio_iommufd_emulated_attach_ioas(struct vfio_device *vdev, u32 *pt_id);
+void vfio_iommufd_emulated_detach_ioas(struct vfio_device *vdev);
 #else
 static inline struct iommufd_ctx *
 vfio_iommufd_physical_ictx(struct vfio_device *vdev)
@@ -155,6 +156,8 @@ static inline int vfio_iommufd_physical_devid(struct vfio_device *vdev, u32 *id)
 	((void (*)(struct vfio_device *vdev)) NULL)
 #define vfio_iommufd_emulated_attach_ioas \
 	((int (*)(struct vfio_device *vdev, u32 *pt_id)) NULL)
+#define vfio_iommufd_emulated_detach_ioas \
+	((void (*)(struct vfio_device *vdev)) NULL)
 #endif
 
 /**
diff --git a/samples/vfio-mdev/mbochs.c b/samples/vfio-mdev/mbochs.c
index 19391dda5fba..47a2914b63d9 100644
--- a/samples/vfio-mdev/mbochs.c
+++ b/samples/vfio-mdev/mbochs.c
@@ -1377,6 +1377,7 @@ static const struct vfio_device_ops mbochs_dev_ops = {
 	.bind_iommufd	= vfio_iommufd_emulated_bind,
 	.unbind_iommufd	= vfio_iommufd_emulated_unbind,
 	.attach_ioas	= vfio_iommufd_emulated_attach_ioas,
+	.detach_ioas	= vfio_iommufd_emulated_detach_ioas,
 };
 
 static struct mdev_driver mbochs_driver = {
diff --git a/samples/vfio-mdev/mdpy.c b/samples/vfio-mdev/mdpy.c
index 5f48aef36995..ce0e67f37406 100644
--- a/samples/vfio-mdev/mdpy.c
+++ b/samples/vfio-mdev/mdpy.c
@@ -666,6 +666,7 @@ static const struct vfio_device_ops mdpy_dev_ops = {
 	.bind_iommufd	= vfio_iommufd_emulated_bind,
 	.unbind_iommufd	= vfio_iommufd_emulated_unbind,
 	.attach_ioas	= vfio_iommufd_emulated_attach_ioas,
+	.detach_ioas	= vfio_iommufd_emulated_detach_ioas,
 };
 
 static struct mdev_driver mdpy_driver = {
diff --git a/samples/vfio-mdev/mtty.c b/samples/vfio-mdev/mtty.c
index 35460901b9f7..5069aef3c2a2 100644
--- a/samples/vfio-mdev/mtty.c
+++ b/samples/vfio-mdev/mtty.c
@@ -1272,6 +1272,7 @@ static const struct vfio_device_ops mtty_dev_ops = {
 	.bind_iommufd	= vfio_iommufd_emulated_bind,
 	.unbind_iommufd	= vfio_iommufd_emulated_unbind,
 	.attach_ioas	= vfio_iommufd_emulated_attach_ioas,
+	.detach_ioas	= vfio_iommufd_emulated_detach_ioas,
 };
 
 static struct mdev_driver mtty_driver = {
-- 
2.34.1



* [PATCH v8 18/24] vfio: Determine noiommu in vfio_device registration
  2023-03-27  9:40 [PATCH v8 00/24] Add vfio_device cdev for iommufd support Yi Liu
                   ` (16 preceding siblings ...)
  2023-03-27  9:40 ` [PATCH v8 17/24] vfio-iommufd: Add detach_ioas support for emulated VFIO devices Yi Liu
@ 2023-03-27  9:40 ` Yi Liu
  2023-03-28  6:36   ` Tian, Kevin
  2023-03-27  9:40 ` [PATCH v8 19/24] vfio: Name noiommu vfio_device with "noiommu-" prefix Yi Liu
                   ` (8 subsequent siblings)
  26 siblings, 1 reply; 54+ messages in thread
From: Yi Liu @ 2023-03-27  9:40 UTC (permalink / raw)
  To: alex.williamson, jgg, kevin.tian
  Cc: joro, robin.murphy, cohuck, eric.auger, nicolinc, kvm, mjrosato,
	chao.p.peng, yi.l.liu, yi.y.sun, peterx, jasowang,
	shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	intel-gvt-dev, intel-gfx, linux-s390, xudong.hao, yan.y.zhao,
	terrence.xu, yanting.jiang

This adds a noiommu flag to vfio_device, set at registration time, so
former callers of vfio_device_is_noiommu() just check the flag instead.

Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 drivers/vfio/group.c     | 2 +-
 drivers/vfio/vfio.h      | 6 +++---
 drivers/vfio/vfio_main.c | 2 ++
 include/linux/vfio.h     | 1 +
 4 files changed, 7 insertions(+), 4 deletions(-)

diff --git a/drivers/vfio/group.c b/drivers/vfio/group.c
index 8f09e4541c3a..b0a73d82662e 100644
--- a/drivers/vfio/group.c
+++ b/drivers/vfio/group.c
@@ -192,7 +192,7 @@ static int vfio_device_group_open(struct vfio_device_file *df)
 		vfio_device_group_get_kvm_safe(device);
 
 	df->iommufd = device->group->iommufd;
-	if (df->iommufd && vfio_device_is_noiommu(device)) {
+	if (df->iommufd && device->noiommu) {
 		if (device->open_count == 0) {
 			ret = vfio_iommufd_enable_noiommu_compat(device,
 								 df->iommufd);
diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
index b47b186573ac..41dfc9d5205a 100644
--- a/drivers/vfio/vfio.h
+++ b/drivers/vfio/vfio.h
@@ -108,10 +108,10 @@ bool vfio_device_has_container(struct vfio_device *device);
 int __init vfio_group_init(void);
 void vfio_group_cleanup(void);
 
-static inline bool vfio_device_is_noiommu(struct vfio_device *vdev)
+static inline void vfio_device_set_noiommu(struct vfio_device *device)
 {
-	return IS_ENABLED(CONFIG_VFIO_NOIOMMU) &&
-	       vdev->group->type == VFIO_NO_IOMMU;
+	device->noiommu = IS_ENABLED(CONFIG_VFIO_NOIOMMU) &&
+			  device->group->type == VFIO_NO_IOMMU;
 }
 
 #if IS_ENABLED(CONFIG_VFIO_CONTAINER)
diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
index eb53b1f8c828..805c34c7b0ef 100644
--- a/drivers/vfio/vfio_main.c
+++ b/drivers/vfio/vfio_main.c
@@ -277,6 +277,8 @@ static int __vfio_register_dev(struct vfio_device *device,
 	if (ret)
 		return ret;
 
+	vfio_device_set_noiommu(device);
+
 	ret = device_add(&device->device);
 	if (ret)
 		goto err_out;
diff --git a/include/linux/vfio.h b/include/linux/vfio.h
index 580cfc5728e8..5c06af04ed9e 100644
--- a/include/linux/vfio.h
+++ b/include/linux/vfio.h
@@ -62,6 +62,7 @@ struct vfio_device {
 	struct iommufd_device *iommufd_device;
 	bool iommufd_attached;
 #endif
+	bool noiommu;
 };
 
 /**
-- 
2.34.1



* [PATCH v8 19/24] vfio: Name noiommu vfio_device with "noiommu-" prefix
  2023-03-27  9:40 [PATCH v8 00/24] Add vfio_device cdev for iommufd support Yi Liu
                   ` (17 preceding siblings ...)
  2023-03-27  9:40 ` [PATCH v8 18/24] vfio: Determine noiommu in vfio_device registration Yi Liu
@ 2023-03-27  9:40 ` Yi Liu
  2023-03-28  6:37   ` Tian, Kevin
  2023-03-27  9:40 ` [PATCH v8 20/24] vfio: Add cdev for vfio_device Yi Liu
                   ` (7 subsequent siblings)
  26 siblings, 1 reply; 54+ messages in thread
From: Yi Liu @ 2023-03-27  9:40 UTC (permalink / raw)
  To: alex.williamson, jgg, kevin.tian
  Cc: joro, robin.murphy, cohuck, eric.auger, nicolinc, kvm, mjrosato,
	chao.p.peng, yi.l.liu, yi.y.sun, peterx, jasowang,
	shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	intel-gvt-dev, intel-gfx, linux-s390, xudong.hao, yan.y.zhao,
	terrence.xu, yanting.jiang

For a noiommu device, the vfio core names the cdev node with the prefix
"noiommu-".
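
As a rough userspace illustration of the resulting names, the sketch
below formats a node name the same way. The helper name
vfio_cdev_node_name() is hypothetical and does not exist in the kernel;
only the prefix and the "vfio<index>" layout are taken from this patch.

```c
#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical helper: write "vfio<index>" or "noiommu-vfio<index>"
 * into buf. Returns the snprintf() result, i.e. the length the name
 * needs, which can exceed len - 1 if buf is too small.
 */
int vfio_cdev_node_name(char *buf, size_t len, bool noiommu,
			unsigned int index)
{
	return snprintf(buf, len, "%svfio%u",
			noiommu ? "noiommu-" : "", index);
}
```

For example, index 3 with the noiommu flag set would give
"noiommu-vfio3", and index 0 without it would give "vfio0".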

Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 drivers/vfio/vfio_main.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
index 805c34c7b0ef..8e96aab27029 100644
--- a/drivers/vfio/vfio_main.c
+++ b/drivers/vfio/vfio_main.c
@@ -269,16 +269,17 @@ static int __vfio_register_dev(struct vfio_device *device,
 	if (!device->dev_set)
 		vfio_assign_device_set(device, device);
 
-	ret = dev_set_name(&device->device, "vfio%d", device->index);
-	if (ret)
-		return ret;
-
 	ret = vfio_device_set_group(device, type);
 	if (ret)
 		return ret;
 
 	vfio_device_set_noiommu(device);
 
+	ret = dev_set_name(&device->device, "%svfio%d",
+			   device->noiommu ? "noiommu-" : "", device->index);
+	if (ret)
+		goto err_out;
+
 	ret = device_add(&device->device);
 	if (ret)
 		goto err_out;
-- 
2.34.1



* [PATCH v8 20/24] vfio: Add cdev for vfio_device
  2023-03-27  9:40 [PATCH v8 00/24] Add vfio_device cdev for iommufd support Yi Liu
                   ` (18 preceding siblings ...)
  2023-03-27  9:40 ` [PATCH v8 19/24] vfio: Name noiommu vfio_device with "noiommu-" prefix Yi Liu
@ 2023-03-27  9:40 ` Yi Liu
  2023-03-29 19:57   ` Alex Williamson
  2023-03-27  9:40 ` [PATCH v8 21/24] vfio: Add VFIO_DEVICE_BIND_IOMMUFD Yi Liu
                   ` (6 subsequent siblings)
  26 siblings, 1 reply; 54+ messages in thread
From: Yi Liu @ 2023-03-27  9:40 UTC (permalink / raw)
  To: alex.williamson, jgg, kevin.tian
  Cc: joro, robin.murphy, cohuck, eric.auger, nicolinc, kvm, mjrosato,
	chao.p.peng, yi.l.liu, yi.y.sun, peterx, jasowang,
	shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	intel-gvt-dev, intel-gfx, linux-s390, xudong.hao, yan.y.zhao,
	terrence.xu, yanting.jiang

This allows the user to directly open a vfio device without using the
legacy container/group interface, as a prerequisite for supporting new
iommu features like nested translation.

The device fd opened in this manner cannot access the device, because
the fops open() doesn't open the device until a successful BIND_IOMMUFD,
which will be added in the next patch.

With this patch, devices registered to the vfio core have both the group
and device interfaces created.

- group interface : /dev/vfio/$groupID
- device interface: /dev/vfio/devices/vfioX - normal device
		    /dev/vfio/devices/noiommu-vfioX - noiommu device
		    ("X" is the minor number and is unique across devices)

Given a vfio device, the user can identify the matching vfioX by checking
the sysfs path of the device. Take PCI device (0000:6a:01.0) for example:
/sys/bus/pci/devices/0000\:6a\:01.0/vfio-dev/vfio0/dev contains the
major:minor of the matching vfioX.

Userspace then opens the /dev/vfio/devices/vfioX and checks with fstat
that the major:minor matches.
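
The lookup and check described above could be done from userspace
roughly as follows. This is only a sketch: parse_devt() and
open_vfio_cdev_checked() are hypothetical helper names, and the caller
is assumed to pass the sysfs "dev" attribute path and the cdev path for
the same vfioX.

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/sysmacros.h>
#include <sys/types.h>
#include <unistd.h>

/* Parse "major:minor" as found in a sysfs "dev" attribute. 0 on success. */
int parse_devt(const char *text, unsigned int *want_major,
	       unsigned int *want_minor)
{
	return sscanf(text, "%u:%u", want_major, want_minor) == 2 ? 0 : -1;
}

/*
 * Open cdev_path and return the fd only if it is a character device
 * whose major:minor matches the content of sysfs_dev_path.
 * Returns -1 on any failure or mismatch.
 */
int open_vfio_cdev_checked(const char *sysfs_dev_path, const char *cdev_path)
{
	unsigned int want_major, want_minor;
	char text[32] = "";
	struct stat st;
	FILE *f;
	int fd;

	f = fopen(sysfs_dev_path, "r");
	if (!f)
		return -1;
	if (!fgets(text, sizeof(text), f)) {
		fclose(f);
		return -1;
	}
	fclose(f);

	if (parse_devt(text, &want_major, &want_minor))
		return -1;

	fd = open(cdev_path, O_RDWR);
	if (fd < 0)
		return -1;

	/* Reject anything that is not the expected character device. */
	if (fstat(fd, &st) || !S_ISCHR(st.st_mode) ||
	    major(st.st_rdev) != want_major ||
	    minor(st.st_rdev) != want_minor) {
		close(fd);
		return -1;
	}
	return fd;
}
```

For the PCI device used as the example above, the two arguments would be
/sys/bus/pci/devices/0000:6a:01.0/vfio-dev/vfio0/dev and
/dev/vfio/devices/vfio0. The returned fd still cannot access the device
until it is bound with BIND_IOMMUFD.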

The vfio_device cdev logic in this patch:
*) __vfio_register_dev() path ends up doing cdev_device_add() for each
   vfio_device if VFIO_DEVICE_CDEV is configured.
*) vfio_unregister_group_dev() path does cdev_device_del().

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Tested-by: Terrence Xu <terrence.xu@intel.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Tested-by: Matthew Rosato <mjrosato@linux.ibm.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 drivers/vfio/Kconfig       | 11 +++++++
 drivers/vfio/Makefile      |  1 +
 drivers/vfio/device_cdev.c | 62 ++++++++++++++++++++++++++++++++++++++
 drivers/vfio/vfio.h        | 46 ++++++++++++++++++++++++++++
 drivers/vfio/vfio_main.c   | 26 +++++++++++-----
 include/linux/vfio.h       |  4 +++
 6 files changed, 143 insertions(+), 7 deletions(-)
 create mode 100644 drivers/vfio/device_cdev.c

diff --git a/drivers/vfio/Kconfig b/drivers/vfio/Kconfig
index 89e06c981e43..e2105b4dac2d 100644
--- a/drivers/vfio/Kconfig
+++ b/drivers/vfio/Kconfig
@@ -12,6 +12,17 @@ menuconfig VFIO
 	  If you don't know what to do here, say N.
 
 if VFIO
+config VFIO_DEVICE_CDEV
+	bool "Support for the VFIO cdev /dev/vfio/devices/vfioX"
+	depends on IOMMUFD
+	help
+	  The VFIO device cdev is another way for userspace to get device
+	  access. Userspace gets device fd by opening device cdev under
+	  /dev/vfio/devices/vfioX, and then bind the device fd with an iommufd
+	  to set up secure DMA context for device access.
+
+	  If you don't know what to do here, say N.
+
 config VFIO_CONTAINER
 	bool "Support for the VFIO container /dev/vfio/vfio"
 	select VFIO_IOMMU_TYPE1 if MMU && (X86 || S390 || ARM || ARM64)
diff --git a/drivers/vfio/Makefile b/drivers/vfio/Makefile
index 70e7dcb302ef..245394aeb94b 100644
--- a/drivers/vfio/Makefile
+++ b/drivers/vfio/Makefile
@@ -4,6 +4,7 @@ obj-$(CONFIG_VFIO) += vfio.o
 vfio-y += vfio_main.o \
 	  group.o \
 	  iova_bitmap.o
+vfio-$(CONFIG_VFIO_DEVICE_CDEV) += device_cdev.o
 vfio-$(CONFIG_IOMMUFD) += iommufd.o
 vfio-$(CONFIG_VFIO_CONTAINER) += container.o
 vfio-$(CONFIG_VFIO_VIRQFD) += virqfd.o
diff --git a/drivers/vfio/device_cdev.c b/drivers/vfio/device_cdev.c
new file mode 100644
index 000000000000..1c640016a824
--- /dev/null
+++ b/drivers/vfio/device_cdev.c
@@ -0,0 +1,62 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (c) 2023 Intel Corporation.
+ */
+#include <linux/vfio.h>
+
+#include "vfio.h"
+
+static dev_t device_devt;
+
+void vfio_init_device_cdev(struct vfio_device *device)
+{
+	device->device.devt = MKDEV(MAJOR(device_devt), device->index);
+	cdev_init(&device->cdev, &vfio_device_fops);
+	device->cdev.owner = THIS_MODULE;
+}
+
+/*
+ * device access via the fd opened by this function is blocked until
+ * .open_device() is called successfully during BIND_IOMMUFD.
+ */
+int vfio_device_fops_cdev_open(struct inode *inode, struct file *filep)
+{
+	struct vfio_device *device = container_of(inode->i_cdev,
+						  struct vfio_device, cdev);
+	struct vfio_device_file *df;
+	int ret;
+
+	if (!vfio_device_try_get_registration(device))
+		return -ENODEV;
+
+	df = vfio_allocate_device_file(device);
+	if (IS_ERR(df)) {
+		ret = PTR_ERR(df);
+		goto err_put_registration;
+	}
+
+	filep->private_data = df;
+
+	return 0;
+
+err_put_registration:
+	vfio_device_put_registration(device);
+	return ret;
+}
+
+static char *vfio_device_devnode(const struct device *dev, umode_t *mode)
+{
+	return kasprintf(GFP_KERNEL, "vfio/devices/%s", dev_name(dev));
+}
+
+int vfio_cdev_init(struct class *device_class)
+{
+	device_class->devnode = vfio_device_devnode;
+	return alloc_chrdev_region(&device_devt, 0,
+				   MINORMASK + 1, "vfio-dev");
+}
+
+void vfio_cdev_cleanup(void)
+{
+	unregister_chrdev_region(device_devt, MINORMASK + 1);
+}
diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
index 41dfc9d5205a..3a8fd0e32f59 100644
--- a/drivers/vfio/vfio.h
+++ b/drivers/vfio/vfio.h
@@ -268,6 +268,52 @@ static inline void vfio_iommufd_unbind(struct vfio_device_file *df)
 }
 #endif
 
+#if IS_ENABLED(CONFIG_VFIO_DEVICE_CDEV)
+static inline int vfio_device_add(struct vfio_device *device)
+{
+	return cdev_device_add(&device->cdev, &device->device);
+}
+
+static inline void vfio_device_del(struct vfio_device *device)
+{
+	cdev_device_del(&device->cdev, &device->device);
+}
+
+void vfio_init_device_cdev(struct vfio_device *device);
+int vfio_device_fops_cdev_open(struct inode *inode, struct file *filep);
+int vfio_cdev_init(struct class *device_class);
+void vfio_cdev_cleanup(void);
+#else
+static inline int vfio_device_add(struct vfio_device *device)
+{
+	return device_add(&device->device);
+}
+
+static inline void vfio_device_del(struct vfio_device *device)
+{
+	device_del(&device->device);
+}
+
+static inline void vfio_init_device_cdev(struct vfio_device *device)
+{
+}
+
+static inline int vfio_device_fops_cdev_open(struct inode *inode,
+					     struct file *filep)
+{
+	return 0;
+}
+
+static inline int vfio_cdev_init(struct class *device_class)
+{
+	return 0;
+}
+
+static inline void vfio_cdev_cleanup(void)
+{
+}
+#endif /* CONFIG_VFIO_DEVICE_CDEV */
+
 #if IS_ENABLED(CONFIG_VFIO_VIRQFD)
 int __init vfio_virqfd_init(void);
 void vfio_virqfd_exit(void);
diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
index 8e96aab27029..58fc3bb768f2 100644
--- a/drivers/vfio/vfio_main.c
+++ b/drivers/vfio/vfio_main.c
@@ -242,6 +242,7 @@ static int vfio_init_device(struct vfio_device *device, struct device *dev,
 	device->device.release = vfio_device_release;
 	device->device.class = vfio.device_class;
 	device->device.parent = device->dev;
+	vfio_init_device_cdev(device);
 	return 0;
 
 out_uninit:
@@ -280,7 +281,7 @@ static int __vfio_register_dev(struct vfio_device *device,
 	if (ret)
 		goto err_out;
 
-	ret = device_add(&device->device);
+	ret = vfio_device_add(device);
 	if (ret)
 		goto err_out;
 
@@ -320,6 +321,12 @@ void vfio_unregister_group_dev(struct vfio_device *device)
 	bool interrupted = false;
 	long rc;
 
+	/* Prevent new device opened in the group path */
+	vfio_device_group_unregister(device);
+
+	/* Prevent new device opened in the cdev path */
+	vfio_device_del(device);
+
 	vfio_device_put_registration(device);
 	rc = try_wait_for_completion(&device->comp);
 	while (rc <= 0) {
@@ -343,11 +350,6 @@ void vfio_unregister_group_dev(struct vfio_device *device)
 		}
 	}
 
-	vfio_device_group_unregister(device);
-
-	/* Balances device_add in register path */
-	device_del(&device->device);
-
 	/* Balances vfio_device_set_group in register path */
 	vfio_device_remove_group(device);
 }
@@ -555,7 +557,8 @@ static int vfio_device_fops_release(struct inode *inode, struct file *filep)
 	struct vfio_device_file *df = filep->private_data;
 	struct vfio_device *device = df->device;
 
-	vfio_device_group_close(df);
+	if (df->group)
+		vfio_device_group_close(df);
 
 	vfio_device_put_registration(device);
 
@@ -1204,6 +1207,7 @@ static int vfio_device_fops_mmap(struct file *filep, struct vm_area_struct *vma)
 
 const struct file_operations vfio_device_fops = {
 	.owner		= THIS_MODULE,
+	.open		= vfio_device_fops_cdev_open,
 	.release	= vfio_device_fops_release,
 	.read		= vfio_device_fops_read,
 	.write		= vfio_device_fops_write,
@@ -1590,9 +1594,16 @@ static int __init vfio_init(void)
 		goto err_dev_class;
 	}
 
+	ret = vfio_cdev_init(vfio.device_class);
+	if (ret)
+		goto err_alloc_dev_chrdev;
+
 	pr_info(DRIVER_DESC " version: " DRIVER_VERSION "\n");
 	return 0;
 
+err_alloc_dev_chrdev:
+	class_destroy(vfio.device_class);
+	vfio.device_class = NULL;
 err_dev_class:
 	vfio_virqfd_exit();
 err_virqfd:
@@ -1603,6 +1614,7 @@ static int __init vfio_init(void)
 static void __exit vfio_cleanup(void)
 {
 	ida_destroy(&vfio.device_ida);
+	vfio_cdev_cleanup();
 	class_destroy(vfio.device_class);
 	vfio.device_class = NULL;
 	vfio_virqfd_exit();
diff --git a/include/linux/vfio.h b/include/linux/vfio.h
index 5c06af04ed9e..8719ec2adbbb 100644
--- a/include/linux/vfio.h
+++ b/include/linux/vfio.h
@@ -13,6 +13,7 @@
 #include <linux/mm.h>
 #include <linux/workqueue.h>
 #include <linux/poll.h>
+#include <linux/cdev.h>
 #include <uapi/linux/vfio.h>
 #include <linux/iova_bitmap.h>
 
@@ -51,6 +52,9 @@ struct vfio_device {
 	/* Members below here are private, not for driver use */
 	unsigned int index;
 	struct device device;	/* device.kref covers object life circle */
+#if IS_ENABLED(CONFIG_VFIO_DEVICE_CDEV)
+	struct cdev cdev;
+#endif
 	refcount_t refcount;	/* user count on registered device*/
 	unsigned int open_count;
 	struct completion comp;
-- 
2.34.1



* [PATCH v8 21/24] vfio: Add VFIO_DEVICE_BIND_IOMMUFD
  2023-03-27  9:40 [PATCH v8 00/24] Add vfio_device cdev for iommufd support Yi Liu
                   ` (19 preceding siblings ...)
  2023-03-27  9:40 ` [PATCH v8 20/24] vfio: Add cdev for vfio_device Yi Liu
@ 2023-03-27  9:40 ` Yi Liu
  2023-03-29 21:00   ` Alex Williamson
  2023-03-27  9:40 ` [PATCH v8 22/24] vfio: Add VFIO_DEVICE_AT[DE]TACH_IOMMUFD_PT Yi Liu
                   ` (5 subsequent siblings)
  26 siblings, 1 reply; 54+ messages in thread
From: Yi Liu @ 2023-03-27  9:40 UTC (permalink / raw)
  To: alex.williamson, jgg, kevin.tian
  Cc: joro, robin.murphy, cohuck, eric.auger, nicolinc, kvm, mjrosato,
	chao.p.peng, yi.l.liu, yi.y.sun, peterx, jasowang,
	shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	intel-gvt-dev, intel-gfx, linux-s390, xudong.hao, yan.y.zhao,
	terrence.xu, yanting.jiang

This adds an ioctl for userspace to bind a device cdev fd to an iommufd.

    VFIO_DEVICE_BIND_IOMMUFD: bind the device to an iommufd and thereby
			      gain the DMA control provided by the iommufd.
			      The open_device op is called after the
			      bind_iommufd op. VFIO noiommu mode is
			      indicated by passing a negative iommufd value.

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Tested-by: Terrence Xu <terrence.xu@intel.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 drivers/vfio/device_cdev.c | 153 +++++++++++++++++++++++++++++++++++++
 drivers/vfio/vfio.h        |  13 ++++
 drivers/vfio/vfio_main.c   |   5 ++
 include/uapi/linux/vfio.h  |  37 +++++++++
 4 files changed, 208 insertions(+)
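
For readers following the argument checks in
vfio_device_ioctl_bind_iommufd() below, here is a minimal userspace sketch
of the argsz/flags validation. The struct is a local mirror of the uAPI
layout added by this patch. The names bind_iommufd_args, OFFSETOFEND,
check_bind_args() and bind_is_noiommu() are made up for illustration and are
not part of the patch; only the two checks shown are modelled, not the
locking or the actual bind.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Local mirror of struct vfio_device_bind_iommufd, using userspace types. */
struct bind_iommufd_args {
	uint32_t argsz;
	uint32_t flags;
	uint64_t dev_cookie __attribute__((aligned(8)));
	int32_t  iommufd;
	uint32_t out_devid;
};

/* The layout has no padding: 4 + 4 + 8 + 4 + 4 bytes. */
static_assert(sizeof(struct bind_iommufd_args) == 24, "unexpected layout");

/* Same idea as the kernel's offsetofend(): the offset just past a member. */
#define OFFSETOFEND(type, member) \
	(offsetof(type, member) + sizeof(((type *)0)->member))

/* Hypothetical helper: mirrors only the argsz/flags checks of the handler. */
static int check_bind_args(const struct bind_iommufd_args *bind)
{
	size_t minsz = OFFSETOFEND(struct bind_iommufd_args, out_devid);

	/* argsz must cover every known field, and no flag is defined yet. */
	if (bind->argsz < minsz || bind->flags)
		return -EINVAL;

	return 0;
}

/* Hypothetical helper: a negative iommufd value selects noiommu mode. */
static int bind_is_noiommu(const struct bind_iommufd_args *bind)
{
	return bind->iommufd < 0;
}
```

A caller would normally set .argsz = sizeof(struct) and leave .flags at 0,
which passes the check for both a real iommufd and the negative noiommu
value.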

diff --git a/drivers/vfio/device_cdev.c b/drivers/vfio/device_cdev.c
index 1c640016a824..2b563bac50b9 100644
--- a/drivers/vfio/device_cdev.c
+++ b/drivers/vfio/device_cdev.c
@@ -3,6 +3,7 @@
  * Copyright (c) 2023 Intel Corporation.
  */
 #include <linux/vfio.h>
+#include <linux/iommufd.h>
 
 #include "vfio.h"
 
@@ -44,6 +45,158 @@ int vfio_device_fops_cdev_open(struct inode *inode, struct file *filep)
 	return ret;
 }
 
+static void vfio_device_get_kvm_safe(struct vfio_device_file *df)
+{
+	spin_lock(&df->kvm_ref_lock);
+	if (df->kvm)
+		_vfio_device_get_kvm_safe(df->device, df->kvm);
+	spin_unlock(&df->kvm_ref_lock);
+}
+
+void vfio_device_cdev_close(struct vfio_device_file *df)
+{
+	struct vfio_device *device = df->device;
+
+	/*
+	 * The writer of df->access_granted also holds dev_set->lock,
+	 * so this read does not need smp_load_acquire() to pair with
+	 * smp_store_release() in the caller of vfio_device_open().
+	 */
+	if (!df->access_granted)
+		return;
+
+	mutex_lock(&device->dev_set->lock);
+	vfio_device_close(df);
+	vfio_device_put_kvm(device);
+	if (df->iommufd)
+		iommufd_ctx_put(df->iommufd);
+	mutex_unlock(&device->dev_set->lock);
+	vfio_device_unblock_group(device);
+}
+
+static int vfio_device_cdev_enable_noiommu(struct vfio_device *device)
+{
+	if (!capable(CAP_SYS_RAWIO))
+		return -EPERM;
+
+	if (!device->noiommu)
+		return -EINVAL;
+
+	return 0;
+}
+
+static struct iommufd_ctx *vfio_get_iommufd_from_fd(int fd)
+{
+	struct fd f;
+	struct iommufd_ctx *iommufd;
+
+	f = fdget(fd);
+	if (!f.file)
+		return ERR_PTR(-EBADF);
+
+	iommufd = iommufd_ctx_from_file(f.file);
+
+	fdput(f);
+	return iommufd;
+}
+
+long vfio_device_ioctl_bind_iommufd(struct vfio_device_file *df,
+				    struct vfio_device_bind_iommufd __user *arg)
+{
+	struct vfio_device *device = df->device;
+	struct vfio_device_bind_iommufd bind;
+	struct iommufd_ctx *iommufd = NULL;
+	unsigned long minsz;
+	int ret;
+
+	static_assert(__same_type(arg->out_devid, bind.out_devid));
+
+	minsz = offsetofend(struct vfio_device_bind_iommufd, out_devid);
+
+	if (copy_from_user(&bind, arg, minsz))
+		return -EFAULT;
+
+	if (bind.argsz < minsz || bind.flags)
+		return -EINVAL;
+
+	if (!device->ops->bind_iommufd)
+		return -ENODEV;
+
+	/* BIND_IOMMUFD only allowed for cdev fds */
+	if (df->group)
+		return -EINVAL;
+
+	ret = vfio_device_block_group(device);
+	if (ret)
+		return ret;
+
+	mutex_lock(&device->dev_set->lock);
+	/* one device cannot be bound twice */
+	if (df->access_granted) {
+		ret = -EINVAL;
+		goto out_unlock;
+	}
+
+	/* iommufd < 0 means noiommu mode */
+	if (bind.iommufd < 0) {
+		ret = vfio_device_cdev_enable_noiommu(device);
+		if (ret)
+			goto out_unlock;
+	} else {
+		iommufd = vfio_get_iommufd_from_fd(bind.iommufd);
+		if (IS_ERR(iommufd)) {
+			ret = PTR_ERR(iommufd);
+			goto out_unlock;
+		}
+	}
+
+	/*
+	 * Before the device open, get the KVM pointer currently
+	 * associated with the device file (if there is) and obtain
+	 * a reference.  This reference is held until device closed.
+	 * Save the pointer in the device for use by drivers.
+	 */
+	vfio_device_get_kvm_safe(df);
+
+	df->iommufd = iommufd;
+	ret = vfio_device_open(df);
+	if (ret)
+		goto out_put_kvm;
+
+	if (df->iommufd)
+		bind.out_devid = df->devid;
+
+	ret = copy_to_user(&arg->out_devid, &bind.out_devid,
+			   sizeof(bind.out_devid)) ? -EFAULT : 0;
+	if (ret)
+		goto out_close_device;
+
+	if (bind.iommufd < 0)
+		dev_warn(device->dev, "device is bound to vfio-noiommu by user "
+			 "(%s:%d)\n", current->comm, task_pid_nr(current));
+
+	/*
+	 * Paired with smp_load_acquire() in vfio_device_fops::ioctl/
+	 * read/write/mmap
+	 */
+	smp_store_release(&df->access_granted, true);
+	mutex_unlock(&device->dev_set->lock);
+
+	return 0;
+
+out_close_device:
+	vfio_device_close(df);
+out_put_kvm:
+	df->iommufd = NULL;
+	vfio_device_put_kvm(device);
+	if (iommufd)
+		iommufd_ctx_put(iommufd);
+out_unlock:
+	mutex_unlock(&device->dev_set->lock);
+	vfio_device_unblock_group(device);
+	return ret;
+}
+
 static char *vfio_device_devnode(const struct device *dev, umode_t *mode)
 {
 	return kasprintf(GFP_KERNEL, "vfio/devices/%s", dev_name(dev));
diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
index 3a8fd0e32f59..ace3d52b0928 100644
--- a/drivers/vfio/vfio.h
+++ b/drivers/vfio/vfio.h
@@ -281,6 +281,9 @@ static inline void vfio_device_del(struct vfio_device *device)
 
 void vfio_init_device_cdev(struct vfio_device *device);
 int vfio_device_fops_cdev_open(struct inode *inode, struct file *filep);
+void vfio_device_cdev_close(struct vfio_device_file *df);
+long vfio_device_ioctl_bind_iommufd(struct vfio_device_file *df,
+				    struct vfio_device_bind_iommufd __user *arg);
 int vfio_cdev_init(struct class *device_class);
 void vfio_cdev_cleanup(void);
 #else
@@ -304,6 +307,16 @@ static inline int vfio_device_fops_cdev_open(struct inode *inode,
 	return 0;
 }
 
+static inline void vfio_device_cdev_close(struct vfio_device_file *df)
+{
+}
+
+static inline long vfio_device_ioctl_bind_iommufd(struct vfio_device_file *df,
+						  struct vfio_device_bind_iommufd __user *arg)
+{
+	return -EOPNOTSUPP;
+}
+
 static inline int vfio_cdev_init(struct class *device_class)
 {
 	return 0;
diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
index 58fc3bb768f2..375086c8803f 100644
--- a/drivers/vfio/vfio_main.c
+++ b/drivers/vfio/vfio_main.c
@@ -559,6 +559,8 @@ static int vfio_device_fops_release(struct inode *inode, struct file *filep)
 
 	if (df->group)
 		vfio_device_group_close(df);
+	else
+		vfio_device_cdev_close(df);
 
 	vfio_device_put_registration(device);
 
@@ -1132,6 +1134,9 @@ static long vfio_device_fops_unl_ioctl(struct file *filep,
 	struct vfio_device *device = df->device;
 	int ret;
 
+	if (cmd == VFIO_DEVICE_BIND_IOMMUFD)
+		return vfio_device_ioctl_bind_iommufd(df, (void __user *)arg);
+
 	/* Paired with smp_store_release() following vfio_device_open() */
 	if (!smp_load_acquire(&df->access_granted))
 		return -EINVAL;
diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index 61b801dfd40b..62b2f2497525 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -194,6 +194,43 @@ struct vfio_group_status {
 
 /* --------------- IOCTLs for DEVICE file descriptors --------------- */
 
+/*
+ * VFIO_DEVICE_BIND_IOMMUFD - _IOR(VFIO_TYPE, VFIO_BASE + 19,
+ *				   struct vfio_device_bind_iommufd)
+ *
+ * Bind a vfio_device to the specified iommufd.
+ *
+ * The user should provide a device cookie when calling this ioctl. The
+ * cookie is carried only in events (e.g. I/O faults) reported to userspace
+ * via iommufd. The user should use the devid returned by this ioctl to
+ * identify the target device in other ioctls (e.g. an IOMMU hardware
+ * information query via iommufd).
+ *
+ * User is not allowed to access the device before the binding operation
+ * is completed.
+ *
+ * Unbind is automatically conducted when device fd is closed.
+ *
+ * @argsz:	 user filled size of this data.
+ * @flags:	 reserved for future extension.
+ * @dev_cookie:	 a per device cookie provided by userspace.
+ * @iommufd:	 iommufd to bind. a negative value means noiommu.
+ * @out_devid:	 the device id generated by this bind. This field is valid
+ *		as long as the input @iommufd is valid. Otherwise, it is
+ *		meaningless.
+ *
+ * Return: 0 on success, -errno on failure.
+ */
+struct vfio_device_bind_iommufd {
+	__u32		argsz;
+	__u32		flags;
+	__aligned_u64	dev_cookie;
+	__s32		iommufd;
+	__u32		out_devid;
+};
+
+#define VFIO_DEVICE_BIND_IOMMUFD	_IO(VFIO_TYPE, VFIO_BASE + 19)
+
 /**
  * VFIO_DEVICE_GET_INFO - _IOR(VFIO_TYPE, VFIO_BASE + 7,
  *						struct vfio_device_info)
-- 
2.34.1



* [PATCH v8 22/24] vfio: Add VFIO_DEVICE_AT[DE]TACH_IOMMUFD_PT
  2023-03-27  9:40 [PATCH v8 00/24] Add vfio_device cdev for iommufd support Yi Liu
                   ` (20 preceding siblings ...)
  2023-03-27  9:40 ` [PATCH v8 21/24] vfio: Add VFIO_DEVICE_BIND_IOMMUFD Yi Liu
@ 2023-03-27  9:40 ` Yi Liu
  2023-03-29 21:19   ` Alex Williamson
  2023-03-27  9:40 ` [PATCH v8 23/24] vfio: Compile group optionally Yi Liu
                   ` (4 subsequent siblings)
  26 siblings, 1 reply; 54+ messages in thread
From: Yi Liu @ 2023-03-27  9:40 UTC (permalink / raw)
  To: alex.williamson, jgg, kevin.tian
  Cc: joro, robin.murphy, cohuck, eric.auger, nicolinc, kvm, mjrosato,
	chao.p.peng, yi.l.liu, yi.y.sun, peterx, jasowang,
	shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	intel-gvt-dev, intel-gfx, linux-s390, xudong.hao, yan.y.zhao,
	terrence.xu, yanting.jiang

This adds ioctls for userspace to attach a device cdev fd to, and detach
it from, an IOAS/hw_pagetable managed by iommufd.

    VFIO_DEVICE_ATTACH_IOMMUFD_PT: attach the vfio device to an IOAS or
				   hw_pagetable managed by iommufd. The
				   attach can be undone by
				   VFIO_DEVICE_DETACH_IOMMUFD_PT or by
				   closing the device fd.
    VFIO_DEVICE_DETACH_IOMMUFD_PT: detach the vfio device from the currently
				   attached IOAS or hw_pagetable managed by
				   iommufd.

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Tested-by: Terrence Xu <terrence.xu@intel.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Tested-by: Matthew Rosato <mjrosato@linux.ibm.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 drivers/vfio/device_cdev.c | 85 ++++++++++++++++++++++++++++++++++++++
 drivers/vfio/vfio.h        | 16 +++++++
 drivers/vfio/vfio_main.c   |  8 ++++
 include/uapi/linux/vfio.h  | 52 +++++++++++++++++++++++
 4 files changed, 161 insertions(+)
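
A note on the command encoding used in the uAPI hunk below: the header
comments spell the commands as _IOW(...), while the #defines use _IO(), as
other VFIO ioctls do. The following sketch, using the _IOC_* decoding macros
from <linux/ioctl.h>, shows what that means in practice. The SKETCH_* macros
and the two helper functions are local stand-ins invented for this note, so
that the example builds without the new header.

```c
#include <assert.h>
#include <linux/ioctl.h>

/* Same values as VFIO_TYPE and VFIO_BASE in include/uapi/linux/vfio.h. */
#define SKETCH_VFIO_TYPE	(';')
#define SKETCH_VFIO_BASE	100

/* Local stand-ins for the two commands added by this patch. */
#define SKETCH_ATTACH_PT	_IO(SKETCH_VFIO_TYPE, SKETCH_VFIO_BASE + 20)
#define SKETCH_DETACH_PT	_IO(SKETCH_VFIO_TYPE, SKETCH_VFIO_BASE + 21)

/* _IO() encodes no argument size; the handlers check argsz instead. */
static_assert(_IOC_SIZE(SKETCH_ATTACH_PT) == 0, "no size in the command");
static_assert(_IOC_SIZE(SKETCH_DETACH_PT) == 0, "no size in the command");

/* Hypothetical helper: true if cmd lies in the VFIO ioctl namespace. */
static int sketch_is_vfio_cmd(unsigned int cmd)
{
	return _IOC_TYPE(cmd) == SKETCH_VFIO_TYPE;
}

/* Hypothetical helper: command index relative to VFIO_BASE. */
static unsigned int sketch_vfio_index(unsigned int cmd)
{
	return _IOC_NR(cmd) - SKETCH_VFIO_BASE;
}
```

Because the size is not part of the command number, the struct can grow in
later revisions without changing the ioctl value, as long as the argsz check
is kept.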

diff --git a/drivers/vfio/device_cdev.c b/drivers/vfio/device_cdev.c
index 2b563bac50b9..b5de997bff6d 100644
--- a/drivers/vfio/device_cdev.c
+++ b/drivers/vfio/device_cdev.c
@@ -197,6 +197,91 @@ long vfio_device_ioctl_bind_iommufd(struct vfio_device_file *df,
 	return ret;
 }
 
+int vfio_ioctl_device_attach(struct vfio_device_file *df,
+			     struct vfio_device_attach_iommufd_pt __user *arg)
+{
+	struct vfio_device *device = df->device;
+	struct vfio_device_attach_iommufd_pt attach;
+	unsigned long minsz;
+	int ret;
+
+	static_assert(__same_type(arg->pt_id, attach.pt_id));
+
+	minsz = offsetofend(struct vfio_device_attach_iommufd_pt, pt_id);
+
+	if (copy_from_user(&attach, arg, minsz))
+		return -EFAULT;
+
+	if (attach.argsz < minsz || attach.flags)
+		return -EINVAL;
+
+	if (!device->ops->bind_iommufd)
+		return -ENODEV;
+
+	/* ATTACH only allowed for cdev fds */
+	if (df->group)
+		return -EINVAL;
+
+	mutex_lock(&device->dev_set->lock);
+	/* noiommufd mode doesn't allow attach */
+	if (!df->iommufd) {
+		ret = -EOPNOTSUPP;
+		goto out_unlock;
+	}
+
+	ret = device->ops->attach_ioas(device, &attach.pt_id);
+	if (ret)
+		goto out_unlock;
+
+	ret = copy_to_user(&arg->pt_id, &attach.pt_id,
+			   sizeof(attach.pt_id)) ? -EFAULT : 0;
+	if (ret)
+		goto out_detach;
+	mutex_unlock(&device->dev_set->lock);
+
+	return 0;
+
+out_detach:
+	device->ops->detach_ioas(device);
+out_unlock:
+	mutex_unlock(&device->dev_set->lock);
+	return ret;
+}
+
+int vfio_ioctl_device_detach(struct vfio_device_file *df,
+			     struct vfio_device_detach_iommufd_pt __user *arg)
+{
+	struct vfio_device *device = df->device;
+	struct vfio_device_detach_iommufd_pt detach;
+	unsigned long minsz;
+
+	minsz = offsetofend(struct vfio_device_detach_iommufd_pt, flags);
+
+	if (copy_from_user(&detach, arg, minsz))
+		return -EFAULT;
+
+	if (detach.argsz < minsz || detach.flags)
+		return -EINVAL;
+
+	if (!device->ops->bind_iommufd)
+		return -ENODEV;
+
+	/* DETACH only allowed for cdev fds */
+	if (df->group)
+		return -EINVAL;
+
+	mutex_lock(&device->dev_set->lock);
+	/* noiommufd mode doesn't support detach */
+	if (!df->iommufd) {
+		mutex_unlock(&device->dev_set->lock);
+		return -EOPNOTSUPP;
+	}
+	device->ops->detach_ioas(device);
+	mutex_unlock(&device->dev_set->lock);
+
+	return 0;
+}
+
 static char *vfio_device_devnode(const struct device *dev, umode_t *mode)
 {
 	return kasprintf(GFP_KERNEL, "vfio/devices/%s", dev_name(dev));
diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
index ace3d52b0928..c199e410db18 100644
--- a/drivers/vfio/vfio.h
+++ b/drivers/vfio/vfio.h
@@ -284,6 +284,10 @@ int vfio_device_fops_cdev_open(struct inode *inode, struct file *filep);
 void vfio_device_cdev_close(struct vfio_device_file *df);
 long vfio_device_ioctl_bind_iommufd(struct vfio_device_file *df,
 				    struct vfio_device_bind_iommufd __user *arg);
+int vfio_ioctl_device_attach(struct vfio_device_file *df,
+			     struct vfio_device_attach_iommufd_pt __user *arg);
+int vfio_ioctl_device_detach(struct vfio_device_file *df,
+			     struct vfio_device_detach_iommufd_pt __user *arg);
 int vfio_cdev_init(struct class *device_class);
 void vfio_cdev_cleanup(void);
 #else
@@ -317,6 +321,18 @@ static inline long vfio_device_ioctl_bind_iommufd(struct vfio_device_file *df,
 	return -EOPNOTSUPP;
 }
 
+static inline int vfio_ioctl_device_attach(struct vfio_device_file *df,
+					   struct vfio_device_attach_iommufd_pt __user *arg)
+{
+	return -EOPNOTSUPP;
+}
+
+static inline int vfio_ioctl_device_detach(struct vfio_device_file *df,
+					   struct vfio_device_detach_iommufd_pt __user *arg)
+{
+	return -EOPNOTSUPP;
+}
+
 static inline int vfio_cdev_init(struct class *device_class)
 {
 	return 0;
diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
index 375086c8803f..896d8bb49585 100644
--- a/drivers/vfio/vfio_main.c
+++ b/drivers/vfio/vfio_main.c
@@ -1150,6 +1150,14 @@ static long vfio_device_fops_unl_ioctl(struct file *filep,
 		ret = vfio_ioctl_device_feature(device, (void __user *)arg);
 		break;
 
+	case VFIO_DEVICE_ATTACH_IOMMUFD_PT:
+		ret = vfio_ioctl_device_attach(df, (void __user *)arg);
+		break;
+
+	case VFIO_DEVICE_DETACH_IOMMUFD_PT:
+		ret = vfio_ioctl_device_detach(df, (void __user *)arg);
+		break;
+
 	default:
 		if (unlikely(!device->ops->ioctl))
 			ret = -EINVAL;
diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index 62b2f2497525..bf6c97e759c7 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -231,6 +231,58 @@ struct vfio_device_bind_iommufd {
 
 #define VFIO_DEVICE_BIND_IOMMUFD	_IO(VFIO_TYPE, VFIO_BASE + 19)
 
+/*
+ * VFIO_DEVICE_ATTACH_IOMMUFD_PT - _IOW(VFIO_TYPE, VFIO_BASE + 20,
+ *					struct vfio_device_attach_iommufd_pt)
+ *
+ * Attach a vfio device to an iommufd address space specified by IOAS
+ * id or hw_pagetable (hwpt) id.
+ *
+ * Available only after a device has been bound to iommufd via
+ * VFIO_DEVICE_BIND_IOMMUFD.
+ *
+ * Undo by VFIO_DEVICE_DETACH_IOMMUFD_PT or device fd close.
+ *
+ * @argsz:	user filled size of this data.
+ * @flags:	must be 0.
+ * @pt_id:	Input the target id which can represent an ioas or a hwpt
+ *		allocated via iommufd subsystem.
+ *		Output the attached hwpt id which could be the specified
+ *		hwpt itself or a hwpt automatically created for the
+ *		specified ioas by kernel during the attachment.
+ *
+ * Return: 0 on success, -errno on failure.
+ */
+struct vfio_device_attach_iommufd_pt {
+	__u32	argsz;
+	__u32	flags;
+	__u32	pt_id;
+};
+
+#define VFIO_DEVICE_ATTACH_IOMMUFD_PT		_IO(VFIO_TYPE, VFIO_BASE + 20)
+
+/*
+ * VFIO_DEVICE_DETACH_IOMMUFD_PT - _IOW(VFIO_TYPE, VFIO_BASE + 21,
+ *					struct vfio_device_detach_iommufd_pt)
+ *
+ * Detach a vfio device from the iommufd address space it has been
+ * attached to. Afterwards, the device should be in a blocking DMA state.
+ *
+ * Available only after a device has been bound to iommufd via
+ * VFIO_DEVICE_BIND_IOMMUFD.
+ *
+ * @argsz:	user filled size of this data.
+ * @flags:	must be 0.
+ *
+ * Return: 0 on success, -errno on failure.
+ */
+struct vfio_device_detach_iommufd_pt {
+	__u32	argsz;
+	__u32	flags;
+};
+
+#define VFIO_DEVICE_DETACH_IOMMUFD_PT		_IO(VFIO_TYPE, VFIO_BASE + 21)
+
 /**
  * VFIO_DEVICE_GET_INFO - _IOR(VFIO_TYPE, VFIO_BASE + 7,
  *						struct vfio_device_info)
-- 
2.34.1



* [PATCH v8 23/24] vfio: Compile group optionally
  2023-03-27  9:40 [PATCH v8 00/24] Add vfio_device cdev for iommufd support Yi Liu
                   ` (21 preceding siblings ...)
  2023-03-27  9:40 ` [PATCH v8 22/24] vfio: Add VFIO_DEVICE_AT[DE]TACH_IOMMUFD_PT Yi Liu
@ 2023-03-27  9:40 ` Yi Liu
  2023-03-29 21:51   ` Alex Williamson
  2023-03-27  9:40 ` [PATCH v8 24/24] docs: vfio: Add vfio device cdev description Yi Liu
                   ` (3 subsequent siblings)
  26 siblings, 1 reply; 54+ messages in thread
From: Yi Liu @ 2023-03-27  9:40 UTC (permalink / raw)
  To: alex.williamson, jgg, kevin.tian
  Cc: joro, robin.murphy, cohuck, eric.auger, nicolinc, kvm, mjrosato,
	chao.p.peng, yi.l.liu, yi.y.sun, peterx, jasowang,
	shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	intel-gvt-dev, intel-gfx, linux-s390, xudong.hao, yan.y.zhao,
	terrence.xu, yanting.jiang

Group code is not needed for the vfio device cdev, so with the cdev
introduced, the group infrastructure can be compiled out when only the
cdev is needed.

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Tested-by: Terrence Xu <terrence.xu@intel.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 drivers/iommu/iommufd/Kconfig |   4 +-
 drivers/vfio/Kconfig          |  16 ++++-
 drivers/vfio/Makefile         |   2 +-
 drivers/vfio/vfio.h           | 111 ++++++++++++++++++++++++++++++++--
 include/linux/vfio.h          |  13 +++-
 5 files changed, 134 insertions(+), 12 deletions(-)
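
The !CONFIG_VFIO_GROUP variant of vfio_device_set_noiommu() in the vfio.h
hunk below decides noiommu from three inputs. As a rough userspace model of
that decision (struct noiommu_inputs and sketch_device_is_noiommu() are
hypothetical names, and the booleans merely stand in for kernel state):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical inputs standing in for kernel state. */
struct noiommu_inputs {
	bool config_noiommu;	/* CONFIG_VFIO_NOIOMMU enabled at build time */
	bool param_noiommu;	/* the vfio_noiommu flag is set */
	bool has_iommu_group;	/* iommu_group_get() returned a group */
};

/* Models the decision only; the real code also drops the group reference. */
static bool sketch_device_is_noiommu(const struct noiommu_inputs *in)
{
	assert(in != NULL);

	/* noiommu is off unless both the build option and the flag allow it. */
	if (!in->config_noiommu || !in->param_noiommu)
		return false;

	/* Only a device without an IOMMU group is treated as noiommu. */
	return !in->has_iommu_group;
}
```

In other words, a device that has an IOMMU group is never marked noiommu,
regardless of the flag.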

diff --git a/drivers/iommu/iommufd/Kconfig b/drivers/iommu/iommufd/Kconfig
index ada693ea51a7..1946eed1826a 100644
--- a/drivers/iommu/iommufd/Kconfig
+++ b/drivers/iommu/iommufd/Kconfig
@@ -14,8 +14,8 @@ config IOMMUFD
 if IOMMUFD
 config IOMMUFD_VFIO_CONTAINER
 	bool "IOMMUFD provides the VFIO container /dev/vfio/vfio"
-	depends on VFIO && !VFIO_CONTAINER
-	default VFIO && !VFIO_CONTAINER
+	depends on VFIO && VFIO_GROUP && !VFIO_CONTAINER
+	default VFIO && VFIO_GROUP && !VFIO_CONTAINER
 	help
 	  IOMMUFD will provide /dev/vfio/vfio instead of VFIO. This relies on
 	  IOMMUFD providing compatibility emulation to give the same ioctls.
diff --git a/drivers/vfio/Kconfig b/drivers/vfio/Kconfig
index e2105b4dac2d..0942a19601a2 100644
--- a/drivers/vfio/Kconfig
+++ b/drivers/vfio/Kconfig
@@ -4,7 +4,9 @@ menuconfig VFIO
 	select IOMMU_API
 	depends on IOMMUFD || !IOMMUFD
 	select INTERVAL_TREE
-	select VFIO_CONTAINER if IOMMUFD=n
+	select VFIO_GROUP if SPAPR_TCE_IOMMU || !IOMMUFD
+	select VFIO_DEVICE_CDEV if !VFIO_GROUP
+	select VFIO_CONTAINER if IOMMUFD=n && VFIO_GROUP
 	help
 	  VFIO provides a framework for secure userspace device drivers.
 	  See Documentation/driver-api/vfio.rst for more details.
@@ -15,6 +17,7 @@ if VFIO
 config VFIO_DEVICE_CDEV
 	bool "Support for the VFIO cdev /dev/vfio/devices/vfioX"
 	depends on IOMMUFD
+	default !VFIO_GROUP
 	help
 	  The VFIO device cdev is another way for userspace to get device
 	  access. Userspace gets device fd by opening device cdev under
@@ -23,9 +26,20 @@ config VFIO_DEVICE_CDEV
 
 	  If you don't know what to do here, say N.
 
+config VFIO_GROUP
+	bool "Support for the VFIO group /dev/vfio/$group_id"
+	default y
+	help
+	   VFIO group support provides the traditional model for accessing
+	   devices through VFIO and is used by the majority of userspace
+	   applications and drivers making use of VFIO.
+
+	   If you don't know what to do here, say Y.
+
 config VFIO_CONTAINER
 	bool "Support for the VFIO container /dev/vfio/vfio"
 	select VFIO_IOMMU_TYPE1 if MMU && (X86 || S390 || ARM || ARM64)
+	depends on VFIO_GROUP
 	default y
 	help
 	  The VFIO container is the classic interface to VFIO for establishing
diff --git a/drivers/vfio/Makefile b/drivers/vfio/Makefile
index 245394aeb94b..57c3515af606 100644
--- a/drivers/vfio/Makefile
+++ b/drivers/vfio/Makefile
@@ -2,9 +2,9 @@
 obj-$(CONFIG_VFIO) += vfio.o
 
 vfio-y += vfio_main.o \
-	  group.o \
 	  iova_bitmap.o
 vfio-$(CONFIG_VFIO_DEVICE_CDEV) += device_cdev.o
+vfio-$(CONFIG_VFIO_GROUP) += group.o
 vfio-$(CONFIG_IOMMUFD) += iommufd.o
 vfio-$(CONFIG_VFIO_CONTAINER) += container.o
 vfio-$(CONFIG_VFIO_VIRQFD) += virqfd.o
diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
index c199e410db18..9c7a238ec8dd 100644
--- a/drivers/vfio/vfio.h
+++ b/drivers/vfio/vfio.h
@@ -36,6 +36,12 @@ vfio_allocate_device_file(struct vfio_device *device);
 
 extern const struct file_operations vfio_device_fops;
 
+#ifdef CONFIG_VFIO_NOIOMMU
+extern bool vfio_noiommu __read_mostly;
+#else
+enum { vfio_noiommu = false };
+#endif
+
 enum vfio_group_type {
 	/*
 	 * Physical device with IOMMU backing.
@@ -60,6 +66,7 @@ enum vfio_group_type {
 	VFIO_NO_IOMMU,
 };
 
+#if IS_ENABLED(CONFIG_VFIO_GROUP)
 struct vfio_group {
 	struct device 			dev;
 	struct cdev			cdev;
@@ -113,6 +120,104 @@ static inline void vfio_device_set_noiommu(struct vfio_device *device)
 	device->noiommu = IS_ENABLED(CONFIG_VFIO_NOIOMMU) &&
 			  device->group->type == VFIO_NO_IOMMU;
 }
+#else
+struct vfio_group;
+
+static inline int vfio_device_block_group(struct vfio_device *device)
+{
+	return 0;
+}
+
+static inline void vfio_device_unblock_group(struct vfio_device *device)
+{
+}
+
+static inline int vfio_device_set_group(struct vfio_device *device,
+					enum vfio_group_type type)
+{
+	return 0;
+}
+
+static inline void vfio_device_remove_group(struct vfio_device *device)
+{
+}
+
+static inline void vfio_device_group_register(struct vfio_device *device)
+{
+}
+
+static inline void vfio_device_group_unregister(struct vfio_device *device)
+{
+}
+
+static inline bool vfio_device_group_uses_container(struct vfio_device_file *df)
+{
+	return false;
+}
+
+static inline int vfio_device_group_use_iommu(struct vfio_device *device)
+{
+	return -EOPNOTSUPP;
+}
+
+static inline void vfio_device_group_unuse_iommu(struct vfio_device *device)
+{
+}
+
+static inline void vfio_device_group_close(struct vfio_device_file *df)
+{
+}
+
+static inline struct vfio_group *vfio_group_from_file(struct file *file)
+{
+	return NULL;
+}
+
+static inline bool vfio_group_has_dev(struct vfio_group *group,
+				      struct vfio_device *device)
+{
+	return false;
+}
+
+static inline bool vfio_group_enforced_coherent(struct vfio_group *group)
+{
+	return true;
+}
+
+static inline void vfio_group_set_kvm(struct vfio_group *group, struct kvm *kvm)
+{
+}
+
+static inline bool vfio_device_has_container(struct vfio_device *device)
+{
+	return false;
+}
+
+static inline int __init vfio_group_init(void)
+{
+	return 0;
+}
+
+static inline void vfio_group_cleanup(void)
+{
+}
+
+static inline void vfio_device_set_noiommu(struct vfio_device *device)
+{
+	struct iommu_group *iommu_group;
+
+	device->noiommu = false;
+
+	if (!IS_ENABLED(CONFIG_VFIO_NOIOMMU) || !vfio_noiommu)
+		return;
+
+	iommu_group = iommu_group_get(device->dev);
+	if (iommu_group)
+		iommu_group_put(iommu_group);
+	else
+		device->noiommu = true;
+}
+#endif /* CONFIG_VFIO_GROUP */
 
 #if IS_ENABLED(CONFIG_VFIO_CONTAINER)
 /**
@@ -356,12 +461,6 @@ static inline void vfio_virqfd_exit(void)
 }
 #endif
 
-#ifdef CONFIG_VFIO_NOIOMMU
-extern bool vfio_noiommu __read_mostly;
-#else
-enum { vfio_noiommu = false };
-#endif
-
 #ifdef CONFIG_HAVE_KVM
 void _vfio_device_get_kvm_safe(struct vfio_device *device, struct kvm *kvm);
 void vfio_device_put_kvm(struct vfio_device *device);
diff --git a/include/linux/vfio.h b/include/linux/vfio.h
index 8719ec2adbbb..1367605d617c 100644
--- a/include/linux/vfio.h
+++ b/include/linux/vfio.h
@@ -43,7 +43,11 @@ struct vfio_device {
 	 */
 	const struct vfio_migration_ops *mig_ops;
 	const struct vfio_log_ops *log_ops;
+#if IS_ENABLED(CONFIG_VFIO_GROUP)
 	struct vfio_group *group;
+	struct list_head group_next;
+	struct list_head iommu_entry;
+#endif
 	struct vfio_device_set *dev_set;
 	struct list_head dev_set_list;
 	unsigned int migration_flags;
@@ -58,8 +62,6 @@ struct vfio_device {
 	refcount_t refcount;	/* user count on registered device*/
 	unsigned int open_count;
 	struct completion comp;
-	struct list_head group_next;
-	struct list_head iommu_entry;
 	struct iommufd_access *iommufd_access;
 	void (*put_kvm)(struct kvm *kvm);
 #if IS_ENABLED(CONFIG_IOMMUFD)
@@ -270,7 +272,14 @@ int vfio_mig_get_next_state(struct vfio_device *device,
 /*
  * External user API
  */
+#if IS_ENABLED(CONFIG_VFIO_GROUP)
 struct iommu_group *vfio_file_iommu_group(struct file *file);
+#else
+static inline struct iommu_group *vfio_file_iommu_group(struct file *file)
+{
+	return NULL;
+}
+#endif
 bool vfio_file_is_valid(struct file *file);
 bool vfio_file_enforced_coherent(struct file *file);
 void vfio_file_set_kvm(struct file *file, struct kvm *kvm);
-- 
2.34.1



* [PATCH v8 24/24] docs: vfio: Add vfio device cdev description
  2023-03-27  9:40 [PATCH v8 00/24] Add vfio_device cdev for iommufd support Yi Liu
                   ` (22 preceding siblings ...)
  2023-03-27  9:40 ` [PATCH v8 23/24] vfio: Compile group optionally Yi Liu
@ 2023-03-27  9:40 ` Yi Liu
  2023-03-29 22:47   ` Alex Williamson
  2023-03-27 17:48 ` [PATCH v8 00/24] Add vfio_device cdev for iommufd support Nicolin Chen
                   ` (2 subsequent siblings)
  26 siblings, 1 reply; 54+ messages in thread
From: Yi Liu @ 2023-03-27  9:40 UTC (permalink / raw)
  To: alex.williamson, jgg, kevin.tian
  Cc: joro, robin.murphy, cohuck, eric.auger, nicolinc, kvm, mjrosato,
	chao.p.peng, yi.l.liu, yi.y.sun, peterx, jasowang,
	shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	intel-gvt-dev, intel-gfx, linux-s390, xudong.hao, yan.y.zhao,
	terrence.xu, yanting.jiang

This gives notes for userspace applications on device cdev usage.

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 Documentation/driver-api/vfio.rst | 127 ++++++++++++++++++++++++++++++
 1 file changed, 127 insertions(+)

diff --git a/Documentation/driver-api/vfio.rst b/Documentation/driver-api/vfio.rst
index 363e12c90b87..77408788b98d 100644
--- a/Documentation/driver-api/vfio.rst
+++ b/Documentation/driver-api/vfio.rst
@@ -239,6 +239,125 @@ group and can access them as follows::
 	/* Gratuitous device reset and go... */
 	ioctl(device, VFIO_DEVICE_RESET);
 
+IOMMUFD and vfio_iommu_type1
+----------------------------
+
+IOMMUFD is the new user API to manage I/O page tables from userspace.
+It intends to be the portal of delivering advanced userspace DMA
+features (nested translation [5], PASID [6], etc.) while being backward
+compatible with the vfio_iommu_type1 driver.  Eventually vfio_iommu_type1
+will be deprecated.
+
+With the backward compatibility, no change is required for legacy VFIO
+drivers or applications to connect a VFIO device to IOMMUFD.
+
+	When CONFIG_IOMMUFD_VFIO_CONTAINER=n, VFIO container still provides
+	/dev/vfio/vfio which connects to vfio_iommu_type1.  To disable VFIO
+	container and vfio_iommu_type1, the administrator could symlink
+	/dev/vfio/vfio to /dev/iommu to enable VFIO container emulation
+	in IOMMUFD.
+
+	When CONFIG_IOMMUFD_VFIO_CONTAINER=y, IOMMUFD directly provides
+	/dev/vfio/vfio while the VFIO container and vfio_iommu_type1 are
+	explicitly disabled.
+
+VFIO Device cdev
+----------------
+
+Traditionally, a user acquires a device fd via VFIO_GROUP_GET_DEVICE_FD
+in a VFIO group.
+
+With CONFIG_VFIO_DEVICE_CDEV=y the user can now acquire a device fd
+by directly opening a character device /dev/vfio/devices/vfioX where
+"X" is the number allocated uniquely by VFIO for registered devices.
+For noiommu devices, the character device is named with a "noiommu-"
+prefix, e.g. /dev/vfio/devices/noiommu-vfioX.
+
+The cdev only works with IOMMUFD.  Both VFIO drivers and applications
+must adapt to the new cdev security model which requires using
+VFIO_DEVICE_BIND_IOMMUFD to claim DMA ownership before starting to
+actually use the device.  Once BIND succeeds then a VFIO device can
+be fully accessed by the user.
+
+VFIO device cdev doesn't rely on VFIO group/container/iommu drivers.
+Hence those modules can be fully compiled out in an environment
+where no legacy VFIO application exists.
+
+SPAPR does not support IOMMUFD yet, so it cannot support device
+cdev either.
+
+Device cdev Example
+-------------------
+
+Assume user wants to access PCI device 0000:6a:01.0::
+
+	$ ls /sys/bus/pci/devices/0000:6a:01.0/vfio-dev/
+	vfio0
+
+This device is therefore represented as vfio0.  The user can verify
+its existence::
+
+	$ ls -l /dev/vfio/devices/vfio0
+	crw------- 1 root root 511, 0 Feb 16 01:22 /dev/vfio/devices/vfio0
+	$ cat /sys/bus/pci/devices/0000:6a:01.0/vfio-dev/vfio0/dev
+	511:0
+	$ ls -l /dev/char/511\:0
+	lrwxrwxrwx 1 root root 21 Feb 16 01:22 /dev/char/511:0 -> ../vfio/devices/vfio0
+
+Then provide the user with access to the device if unprivileged
+operation is desired::
+
+	$ chown user:user /dev/vfio/devices/vfio0
+
+Finally the user could get cdev fd by::
+
+	cdev_fd = open("/dev/vfio/devices/vfio0", O_RDWR);
+
+An opened cdev_fd doesn't give the user any access to the device
+other than binding the cdev_fd to an iommufd.  After the bind, the
+device is fully accessible, including attaching it to an
+IOMMUFD IOAS/HWPT to enable userspace DMA::
+
+	struct vfio_device_bind_iommufd bind = {
+		.argsz = sizeof(bind),
+		.flags = 0,
+	};
+	struct iommu_ioas_alloc alloc_data  = {
+		.size = sizeof(alloc_data),
+		.flags = 0,
+	};
+	struct vfio_device_attach_iommufd_pt attach_data = {
+		.argsz = sizeof(attach_data),
+		.flags = 0,
+	};
+	struct iommu_ioas_map map = {
+		.size = sizeof(map),
+		.flags = IOMMU_IOAS_MAP_READABLE |
+			 IOMMU_IOAS_MAP_WRITEABLE |
+			 IOMMU_IOAS_MAP_FIXED_IOVA,
+		.__reserved = 0,
+	};
+
+	iommufd = open("/dev/iommu", O_RDWR);
+
+	bind.iommufd = iommufd; // negative value means vfio-noiommu mode
+	ioctl(cdev_fd, VFIO_DEVICE_BIND_IOMMUFD, &bind);
+
+	ioctl(iommufd, IOMMU_IOAS_ALLOC, &alloc_data);
+	attach_data.pt_id = alloc_data.out_ioas_id;
+	ioctl(cdev_fd, VFIO_DEVICE_ATTACH_IOMMUFD_PT, &attach_data);
+
+	/* Allocate some space and setup a DMA mapping */
+	map.user_va = (uint64_t)(uintptr_t)mmap(NULL, 1024 * 1024,
+				    PROT_READ | PROT_WRITE,
+				    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
+	map.iova = 0; /* 1MB starting at 0x0 from device view */
+	map.length = 1024 * 1024;
+	map.ioas_id = alloc_data.out_ioas_id;
+
+	ioctl(iommufd, IOMMU_IOAS_MAP, &map);
+
+	/* Other device operations as stated in "VFIO Usage Example" */
+
 VFIO User API
 -------------------------------------------------------------------------------
 
@@ -566,3 +685,11 @@ This implementation has some specifics:
 				\-0d.1
 
 	00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev 90)
+
+.. [5] Nested translation is an IOMMU feature which supports two-stage
+   address translation.  This improves the address translation efficiency
+   in IOMMU virtualization.
+
+.. [6] PASID stands for Process Address Space ID, introduced by PCI
+   Express.  It is a prerequisite for Shared Virtual Addressing (SVA)
+   and Scalable I/O Virtualization (Scalable IOV).
-- 
2.34.1
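
A side note on the buffer allocation in the documentation example above:
a minimal sketch, assuming only standard mmap(), of how the buffer could
be allocated with an error check before its address is stored in
map.user_va.  The helper name alloc_dma_buffer() is hypothetical and is
not part of this series.

```c
#define _DEFAULT_SOURCE
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>

/*
 * Hypothetical helper (not part of the patch): allocate an anonymous,
 * page-aligned buffer and return its address as the 64-bit value that
 * a field such as map.user_va expects.  Returns 0 on failure.
 */
static uint64_t alloc_dma_buffer(size_t len)
{
	void *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (buf == MAP_FAILED) {
		perror("mmap");
		return 0;
	}
	/* Go through uintptr_t so the conversion is well defined on 32-bit. */
	return (uint64_t)(uintptr_t)buf;
}
```

The returned value would then be assigned to map.user_va before issuing
IOMMU_IOAS_MAP, with a zero return treated as an allocation failure.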


^ permalink raw reply related	[flat|nested] 54+ messages in thread

* Re: [PATCH v8 00/24] Add vfio_device cdev for iommufd support
  2023-03-27  9:40 [PATCH v8 00/24] Add vfio_device cdev for iommufd support Yi Liu
                   ` (23 preceding siblings ...)
  2023-03-27  9:40 ` [PATCH v8 24/24] docs: vfio: Add vfio device cdev description Yi Liu
@ 2023-03-27 17:48 ` Nicolin Chen
  2023-03-31  3:10 ` Jiang, Yanting
  2023-03-31  5:01 ` Jiang, Yanting
  26 siblings, 0 replies; 54+ messages in thread
From: Nicolin Chen @ 2023-03-27 17:48 UTC (permalink / raw)
  To: Yi Liu
  Cc: alex.williamson, jgg, kevin.tian, joro, robin.murphy, cohuck,
	eric.auger, kvm, mjrosato, chao.p.peng, yi.y.sun, peterx,
	jasowang, shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	intel-gvt-dev, intel-gfx, linux-s390, xudong.hao, yan.y.zhao,
	terrence.xu, yanting.jiang

On Mon, Mar 27, 2023 at 02:40:23AM -0700, Yi Liu wrote:
> Existing VFIO provides group-centric user APIs for userspace. Userspace
> opens the /dev/vfio/$group_id first before getting device fd and hence
> getting access to device. This is not the desired model for iommufd. Per
> the conclusion of community discussion[1], iommufd provides device-centric
> kAPIs and requires its consumer (like VFIO) to be device-centric user
> APIs. Such user APIs are used to associate device with iommufd and also
> the I/O address spaces managed by the iommufd.
> 
> This series first introduces a per device file structure to be prepared
> for further enhancement and refactors the kvm-vfio code to be prepared
> for accepting device file from userspace. After this, adds a mechanism for
> blocking device access before iommufd bind. Then refactors the vfio to be
> able to handle cdev path (e.g. iommufd binding, no-iommufd, [de]attach ioas).
> This refactor includes making the device_open exclusive between group and
> cdev path, only allow single device open in cdev path; vfio-iommufd code is
> also refactored to support cdev. e.g. split the vfio_iommufd_bind() into
> two steps. Eventually, adds the cdev support for vfio device and the new
> ioctls, then makes group infrastructure optional as it is not needed when
> vfio device cdev is compiled.
> 
> This series is based on some preparation works done to vfio emulated devices[2]
> and vfio pci hot reset enhancements[3].
> 
> This series is a prerequisite for iommu nesting for vfio device[4] [5].
> 
> The complete code can be found in below branch, simple tests done to the
> legacy group path and the cdev path. Draft QEMU branch can be found at[6]
> 
> https://github.com/yiliu1765/iommufd/tree/vfio_device_cdev_v8
> (config CONFIG_IOMMUFD=y CONFIG_VFIO_DEVICE_CDEV=y)
> 
> base-commit: 1d412cdf6cd17c347b5398416a60518671e13d37
> 
> [1] https://lore.kernel.org/kvm/BN9PR11MB5433B1E4AE5B0480369F97178C189@BN9PR11MB5433.namprd11.prod.outlook.com/
> [2] https://lore.kernel.org/kvm/20230327093351.44505-1-yi.l.liu@intel.com/
> [3] https://lore.kernel.org/kvm/20230327093458.44939-1-yi.l.liu@intel.com/
> [4] https://lore.kernel.org/linux-iommu/20230309080910.607396-1-yi.l.liu@intel.com/
> [5] https://lore.kernel.org/linux-iommu/20230309082207.612346-1-yi.l.liu@intel.com/
> [6] https://github.com/yiliu1765/qemu/tree/iommufd_rfcv3 (it is based on Eric's
>     QEMU iommufd rfcv3 (https://lore.kernel.org/kvm/20230131205305.2726330-1-eric.auger@redhat.com/)
>     plus commits to align with vfio_device_cdev v8)
> 
> Change log:
> 
> v8:
>  - Add patch 18 to determine noiommu device at vfio_device registration (Jason)
>  - Add patch 19 to name noiommu device with "noiommu-" prefix to be par with
>    group path
>  - Add r-b from Kevin
>  - Add t-b from Terrence

This runs well with iommufd selftest on x86 and QEMU sanity on
ARM64, applying nesting series on top of this series:
https://github.com/nicolinc/iommufd/commits/wip/iommufd_nesting-03272023

Tested-by: Nicolin Chen <nicolinc@nvidia.com>

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v8 16/24] iommufd/device: Add iommufd_access_detach() API
  2023-03-27  9:40 ` [PATCH v8 16/24] iommufd/device: Add iommufd_access_detach() API Yi Liu
@ 2023-03-28  2:23   ` Jon Pan-Doh
  2023-03-28 15:54     ` Nicolin Chen
  0 siblings, 1 reply; 54+ messages in thread
From: Jon Pan-Doh @ 2023-03-28  2:23 UTC (permalink / raw)
  To: yi.l.liu
  Cc: alex.williamson, chao.p.peng, cohuck, eric.auger, intel-gfx,
	intel-gvt-dev, jasowang, jgg, joro, kevin.tian, kvm, linux-s390,
	lulu, mjrosato, nicolinc, peterx, robin.murphy,
	shameerali.kolothum.thodi, suravee.suthikulpanit, terrence.xu,
	xudong.hao, yan.y.zhao, yanting.jiang, yi.y.sun, Jon Pan-Doh

On 2023/3/27 02:40, Yi Liu wrote:
> diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h
> index 2e6e8e217cce..ec2ce3ef187d 100644
> --- a/drivers/iommu/iommufd/iommufd_private.h
> +++ b/drivers/iommu/iommufd/iommufd_private.h
> @@ -263,6 +263,8 @@ struct iommufd_access {
>  	struct iommufd_object obj;
>  	struct iommufd_ctx *ictx;
>  	struct iommufd_ioas *ioas;
> +	struct iommufd_ioas *ioas_unpin;
> +	struct mutex ioas_lock;
>  	const struct iommufd_access_ops *ops;
>  	void *data;
>  	unsigned long iova_alignment;

I think you may need to initialize ioas_lock. I got lockdep warnings running
iommufd selftests against this patch. Those went away when I added mutex_init().

---
 drivers/iommu/iommufd/device.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c
index f0522d80919d..0eaae60f3537 100644
--- a/drivers/iommu/iommufd/device.c
+++ b/drivers/iommu/iommufd/device.c
@@ -474,6 +474,7 @@ iommufd_access_create(struct iommufd_ctx *ictx,
 	iommufd_ctx_get(ictx);
 	iommufd_object_finalize(ictx, &access->obj);
 	*id = access->obj.id;
+	mutex_init(&access->ioas_lock);
 	return access;
 }
 EXPORT_SYMBOL_NS_GPL(iommufd_access_create, IOMMUFD);
-- 
2.40.0.348.gf938b09366-goog


^ permalink raw reply related	[flat|nested] 54+ messages in thread

* RE: [PATCH v8 18/24] vfio: Determine noiommu in vfio_device registration
  2023-03-27  9:40 ` [PATCH v8 18/24] vfio: Determine noiommu in vfio_device registration Yi Liu
@ 2023-03-28  6:36   ` Tian, Kevin
  0 siblings, 0 replies; 54+ messages in thread
From: Tian, Kevin @ 2023-03-28  6:36 UTC (permalink / raw)
  To: Liu, Yi L, alex.williamson, jgg
  Cc: joro, robin.murphy, cohuck, eric.auger, nicolinc, kvm, mjrosato,
	chao.p.peng, yi.y.sun, peterx, jasowang,
	shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	intel-gvt-dev, intel-gfx, linux-s390, Hao, Xudong, Zhao, Yan Y,
	Xu, Terrence, Jiang, Yanting

> From: Liu, Yi L <yi.l.liu@intel.com>
> Sent: Monday, March 27, 2023 5:41 PM
> 
> This adds a noiommu flag in vfio_device, hence caller of the
> vfio_device_is_noiommu() just refers to the flag for noiommu
> check.
> 
> Signed-off-by: Yi Liu <yi.l.liu@intel.com>

Reviewed-by: Kevin Tian <kevin.tian@intel.com>

^ permalink raw reply	[flat|nested] 54+ messages in thread

* RE: [PATCH v8 19/24] vfio: Name noiommu vfio_device with "noiommu-" prefix
  2023-03-27  9:40 ` [PATCH v8 19/24] vfio: Name noiommu vfio_device with "noiommu-" prefix Yi Liu
@ 2023-03-28  6:37   ` Tian, Kevin
  0 siblings, 0 replies; 54+ messages in thread
From: Tian, Kevin @ 2023-03-28  6:37 UTC (permalink / raw)
  To: Liu, Yi L, alex.williamson, jgg
  Cc: joro, robin.murphy, cohuck, eric.auger, nicolinc, kvm, mjrosato,
	chao.p.peng, yi.y.sun, peterx, jasowang,
	shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	intel-gvt-dev, intel-gfx, linux-s390, Hao, Xudong, Zhao, Yan Y,
	Xu, Terrence, Jiang, Yanting

> From: Liu, Yi L <yi.l.liu@intel.com>
> Sent: Monday, March 27, 2023 5:41 PM
> 
> For noiommu device, vfio core names the cdev node with prefix "noiommu-".
> 
> Signed-off-by: Yi Liu <yi.l.liu@intel.com>

Reviewed-by: Kevin Tian <kevin.tian@intel.com>

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v8 16/24] iommufd/device: Add iommufd_access_detach() API
  2023-03-28  2:23   ` Jon Pan-Doh
@ 2023-03-28 15:54     ` Nicolin Chen
  2023-03-29  2:24       ` Liu, Yi L
  0 siblings, 1 reply; 54+ messages in thread
From: Nicolin Chen @ 2023-03-28 15:54 UTC (permalink / raw)
  To: yi.l.liu, Jon Pan-Doh
  Cc: alex.williamson, chao.p.peng, cohuck, eric.auger, intel-gfx,
	intel-gvt-dev, jasowang, jgg, joro, kevin.tian, kvm, linux-s390,
	lulu, mjrosato, peterx, robin.murphy, shameerali.kolothum.thodi,
	suravee.suthikulpanit, terrence.xu, xudong.hao, yan.y.zhao,
	yanting.jiang, yi.y.sun

On Mon, Mar 27, 2023 at 07:23:57PM -0700, Jon Pan-Doh wrote:
> On 2023/3/27 02:40, Yi Liu wrote:
> > diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h
> > index 2e6e8e217cce..ec2ce3ef187d 100644
> > --- a/drivers/iommu/iommufd/iommufd_private.h
> > +++ b/drivers/iommu/iommufd/iommufd_private.h
> > @@ -263,6 +263,8 @@ struct iommufd_access {
> >       struct iommufd_object obj;
> >       struct iommufd_ctx *ictx;
> >       struct iommufd_ioas *ioas;
> > +     struct iommufd_ioas *ioas_unpin;
> > +     struct mutex ioas_lock;
> >       const struct iommufd_access_ops *ops;
> >       void *data;
> >       unsigned long iova_alignment;
> 
> I think you may need to initialize ioas_lock. I got lockdep warnings running
> iommufd selftests against this patch. Those went away when I added mutex_init().
> 
> ---
>  drivers/iommu/iommufd/device.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c
> index f0522d80919d..0eaae60f3537 100644
> --- a/drivers/iommu/iommufd/device.c
> +++ b/drivers/iommu/iommufd/device.c
> @@ -474,6 +474,7 @@ iommufd_access_create(struct iommufd_ctx *ictx,
>         iommufd_ctx_get(ictx);
>         iommufd_object_finalize(ictx, &access->obj);
>         *id = access->obj.id;
> +       mutex_init(&access->ioas_lock);
>         return access;
>  }
>  EXPORT_SYMBOL_NS_GPL(iommufd_access_create, IOMMUFD);
> --
> 2.40.0.348.gf938b09366-goog

Yes... I think I lost that when splitting the changes.

Yi, can you help add this in the next version? 

Thanks!
Nicolin

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v8 08/24] vfio: Block device access via device fd until device is opened
  2023-03-27  9:40 ` [PATCH v8 08/24] vfio: Block device access via device fd until device is opened Yi Liu
@ 2023-03-28 21:33   ` Alex Williamson
  2023-03-29  2:23     ` Liu, Yi L
  0 siblings, 1 reply; 54+ messages in thread
From: Alex Williamson @ 2023-03-28 21:33 UTC (permalink / raw)
  To: Yi Liu
  Cc: jgg, kevin.tian, joro, robin.murphy, cohuck, eric.auger,
	nicolinc, kvm, mjrosato, chao.p.peng, yi.y.sun, peterx, jasowang,
	shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	intel-gvt-dev, intel-gfx, linux-s390, xudong.hao, yan.y.zhao,
	terrence.xu, yanting.jiang

On Mon, 27 Mar 2023 02:40:31 -0700
Yi Liu <yi.l.liu@intel.com> wrote:

> Allow the vfio_device file to be in a state where the device FD is
> opened but the device cannot be used by userspace (i.e. its .open_device()
> hasn't been called). This inbetween state is not used when the device
> FD is spawned from the group FD, however when we create the device FD
> directly by opening a cdev it will be opened in the blocked state.
> 
> The reason for the inbetween state is that userspace only gets a FD but
> doesn't gain access permission until binding the FD to an iommufd. So in
> the blocked state, only the bind operation is allowed. Completing bind
> will allow user to further access the device.
> 
> This is implemented by adding a flag in struct vfio_device_file to mark
> the blocked state and using a simple smp_load_acquire() to obtain the
> flag value and serialize all the device setup with the thread accessing
> this device.
> 
> Following this lockless scheme, it can safely handle the device FD
> unbound->bound but it cannot handle bound->unbound. To allow this we'd
> need to add a lock on all the vfio ioctls which seems costly. So once
> device FD is bound, it remains bound until the FD is closed.
> 
> Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
> Reviewed-by: Kevin Tian <kevin.tian@intel.com>
> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
> Tested-by: Terrence Xu <terrence.xu@intel.com>
> Tested-by: Nicolin Chen <nicolinc@nvidia.com>
> Tested-by: Matthew Rosato <mjrosato@linux.ibm.com>
> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> ---
>  drivers/vfio/group.c     | 11 ++++++++++-
>  drivers/vfio/vfio.h      |  1 +
>  drivers/vfio/vfio_main.c | 41 ++++++++++++++++++++++++++++++++++------
>  3 files changed, 46 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/vfio/group.c b/drivers/vfio/group.c
> index 9a7b2765eef6..4f267ae7bebc 100644
> --- a/drivers/vfio/group.c
> +++ b/drivers/vfio/group.c
> @@ -194,9 +194,18 @@ static int vfio_device_group_open(struct vfio_device_file *df)
>  	df->iommufd = device->group->iommufd;
>  
>  	ret = vfio_device_open(df);
> -	if (ret)
> +	if (ret) {
>  		df->iommufd = NULL;
> +		goto out_put_kvm;
> +	}
> +
> +	/*
> +	 * Paired with smp_load_acquire() in vfio_device_fops::ioctl/
> +	 * read/write/mmap
> +	 */
> +	smp_store_release(&df->access_granted, true);
>  
> +out_put_kvm:
>  	if (device->open_count == 0)
>  		vfio_device_put_kvm(device);
>  
> diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
> index cffc08f5a6f1..854f2c97cb9a 100644
> --- a/drivers/vfio/vfio.h
> +++ b/drivers/vfio/vfio.h
> @@ -18,6 +18,7 @@ struct vfio_container;
>  
>  struct vfio_device_file {
>  	struct vfio_device *device;
> +	bool access_granted;
>  	spinlock_t kvm_ref_lock; /* protect kvm field */
>  	struct kvm *kvm;
>  	struct iommufd_ctx *iommufd; /* protected by struct vfio_device_set::lock */
> diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
> index 2ea6cb6d03c7..b515bbda4c74 100644
> --- a/drivers/vfio/vfio_main.c
> +++ b/drivers/vfio/vfio_main.c
> @@ -1114,6 +1114,10 @@ static long vfio_device_fops_unl_ioctl(struct file *filep,
>  	struct vfio_device *device = df->device;
>  	int ret;
>  
> +	/* Paired with smp_store_release() following vfio_device_open() */
> +	if (!smp_load_acquire(&df->access_granted))
> +		return -EINVAL;
> +
>  	ret = vfio_device_pm_runtime_get(device);
>  	if (ret)
>  		return ret;
> @@ -1141,6 +1145,10 @@ static ssize_t vfio_device_fops_read(struct file *filep, char __user *buf,
>  	struct vfio_device_file *df = filep->private_data;
>  	struct vfio_device *device = df->device;
>  
> +	/* Paired with smp_store_release() following vfio_device_open() */
> +	if (!smp_load_acquire(&df->access_granted))
> +		return -EINVAL;
> +
>  	if (unlikely(!device->ops->read))
>  		return -EINVAL;
>  
> @@ -1154,6 +1162,10 @@ static ssize_t vfio_device_fops_write(struct file *filep,
>  	struct vfio_device_file *df = filep->private_data;
>  	struct vfio_device *device = df->device;
>  
> +	/* Paired with smp_store_release() following vfio_device_open() */
> +	if (!smp_load_acquire(&df->access_granted))
> +		return -EINVAL;
> +
>  	if (unlikely(!device->ops->write))
>  		return -EINVAL;
>  
> @@ -1165,6 +1177,10 @@ static int vfio_device_fops_mmap(struct file *filep, struct vm_area_struct *vma)
>  	struct vfio_device_file *df = filep->private_data;
>  	struct vfio_device *device = df->device;
>  
> +	/* Paired with smp_store_release() following vfio_device_open() */
> +	if (!smp_load_acquire(&df->access_granted))
> +		return -EINVAL;
> +
>  	if (unlikely(!device->ops->mmap))
>  		return -EINVAL;
>  
> @@ -1201,6 +1217,24 @@ bool vfio_file_is_valid(struct file *file)
>  }
>  EXPORT_SYMBOL_GPL(vfio_file_is_valid);
>  
> +/*
> + * Return true if the input file is a vfio device file and has opened
> + * the input device. Otherwise, return false.
> + */
> +static bool vfio_file_has_device_access(struct file *file,
> +					struct vfio_device *device)
> +{
> +	struct vfio_device *vdev = vfio_device_from_file(file);
> +	struct vfio_device_file *df;
> +
> +	if (!vdev || vdev != device)
> +		return false;
> +
> +	df = file->private_data;
> +
> +	return READ_ONCE(df->access_granted);

Why did we change from smp_load_acquire() to READ_ONCE() here?  Thanks,

Alex

> +}
> +
>  /**
>   * vfio_file_has_dev - True if the VFIO file is a handle for device
>   * @file: VFIO file to check
> @@ -1211,17 +1245,12 @@ EXPORT_SYMBOL_GPL(vfio_file_is_valid);
>  bool vfio_file_has_dev(struct file *file, struct vfio_device *device)
>  {
>  	struct vfio_group *group;
> -	struct vfio_device *vdev;
>  
>  	group = vfio_group_from_file(file);
>  	if (group)
>  		return vfio_group_has_dev(group, device);
>  
> -	vdev = vfio_device_from_file(file);
> -	if (vdev)
> -		return vdev == device;
> -
> -	return false;
> +	return vfio_file_has_device_access(file, device);
>  }
>  EXPORT_SYMBOL_GPL(vfio_file_has_dev);
>  


^ permalink raw reply	[flat|nested] 54+ messages in thread

* RE: [PATCH v8 08/24] vfio: Block device access via device fd until device is opened
  2023-03-28 21:33   ` Alex Williamson
@ 2023-03-29  2:23     ` Liu, Yi L
  0 siblings, 0 replies; 54+ messages in thread
From: Liu, Yi L @ 2023-03-29  2:23 UTC (permalink / raw)
  To: Alex Williamson
  Cc: jgg, Tian, Kevin, joro, robin.murphy, cohuck, eric.auger,
	nicolinc, kvm, mjrosato, chao.p.peng, yi.y.sun, peterx, jasowang,
	shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	intel-gvt-dev, intel-gfx, linux-s390, Hao, Xudong, Zhao, Yan Y,
	Xu, Terrence, Jiang, Yanting

> From: Alex Williamson <alex.williamson@redhat.com>
> Sent: Wednesday, March 29, 2023 5:34 AM
> 
> On Mon, 27 Mar 2023 02:40:31 -0700
> Yi Liu <yi.l.liu@intel.com> wrote:
> 
> > Allow the vfio_device file to be in a state where the device FD is
> > opened but the device cannot be used by userspace (i.e. its .open_device()
> > hasn't been called). This inbetween state is not used when the device
> > FD is spawned from the group FD, however when we create the device FD
> > directly by opening a cdev it will be opened in the blocked state.
> >
> > The reason for the inbetween state is that userspace only gets a FD but
> > doesn't gain access permission until binding the FD to an iommufd. So in
> > the blocked state, only the bind operation is allowed. Completing bind
> > will allow user to further access the device.
> >
> > This is implemented by adding a flag in struct vfio_device_file to mark
> > the blocked state and using a simple smp_load_acquire() to obtain the
> > flag value and serialize all the device setup with the thread accessing
> > this device.
> >
> > Following this lockless scheme, it can safely handle the device FD
> > unbound->bound but it cannot handle bound->unbound. To allow this
> we'd
> > need to add a lock on all the vfio ioctls which seems costly. So once
> > device FD is bound, it remains bound until the FD is closed.
> >
> > Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
> > Reviewed-by: Kevin Tian <kevin.tian@intel.com>
> > Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
> > Tested-by: Terrence Xu <terrence.xu@intel.com>
> > Tested-by: Nicolin Chen <nicolinc@nvidia.com>
> > Tested-by: Matthew Rosato <mjrosato@linux.ibm.com>
> > Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> > ---
> >  drivers/vfio/group.c     | 11 ++++++++++-
> >  drivers/vfio/vfio.h      |  1 +
> >  drivers/vfio/vfio_main.c | 41 ++++++++++++++++++++++++++++++++++------
> >  3 files changed, 46 insertions(+), 7 deletions(-)
> >
> > diff --git a/drivers/vfio/group.c b/drivers/vfio/group.c
> > index 9a7b2765eef6..4f267ae7bebc 100644
> > --- a/drivers/vfio/group.c
> > +++ b/drivers/vfio/group.c
> > @@ -194,9 +194,18 @@ static int vfio_device_group_open(struct vfio_device_file *df)
> >  	df->iommufd = device->group->iommufd;
> >
> >  	ret = vfio_device_open(df);
> > -	if (ret)
> > +	if (ret) {
> >  		df->iommufd = NULL;
> > +		goto out_put_kvm;
> > +	}
> > +
> > +	/*
> > +	 * Paired with smp_load_acquire() in vfio_device_fops::ioctl/
> > +	 * read/write/mmap
> > +	 */
> > +	smp_store_release(&df->access_granted, true);
> >
> > +out_put_kvm:
> >  	if (device->open_count == 0)
> >  		vfio_device_put_kvm(device);
> >
> > diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
> > index cffc08f5a6f1..854f2c97cb9a 100644
> > --- a/drivers/vfio/vfio.h
> > +++ b/drivers/vfio/vfio.h
> > @@ -18,6 +18,7 @@ struct vfio_container;
> >
> >  struct vfio_device_file {
> >  	struct vfio_device *device;
> > +	bool access_granted;
> >  	spinlock_t kvm_ref_lock; /* protect kvm field */
> >  	struct kvm *kvm;
> >  	struct iommufd_ctx *iommufd; /* protected by struct vfio_device_set::lock */
> > diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
> > index 2ea6cb6d03c7..b515bbda4c74 100644
> > --- a/drivers/vfio/vfio_main.c
> > +++ b/drivers/vfio/vfio_main.c
> > @@ -1114,6 +1114,10 @@ static long vfio_device_fops_unl_ioctl(struct file *filep,
> >  	struct vfio_device *device = df->device;
> >  	int ret;
> >
> > +	/* Paired with smp_store_release() following vfio_device_open() */
> > +	if (!smp_load_acquire(&df->access_granted))
> > +		return -EINVAL;
> > +
> >  	ret = vfio_device_pm_runtime_get(device);
> >  	if (ret)
> >  		return ret;
> > @@ -1141,6 +1145,10 @@ static ssize_t vfio_device_fops_read(struct file *filep, char __user *buf,
> >  	struct vfio_device_file *df = filep->private_data;
> >  	struct vfio_device *device = df->device;
> >
> > +	/* Paired with smp_store_release() following vfio_device_open() */
> > +	if (!smp_load_acquire(&df->access_granted))
> > +		return -EINVAL;
> > +
> >  	if (unlikely(!device->ops->read))
> >  		return -EINVAL;
> >
> > @@ -1154,6 +1162,10 @@ static ssize_t vfio_device_fops_write(struct file *filep,
> >  	struct vfio_device_file *df = filep->private_data;
> >  	struct vfio_device *device = df->device;
> >
> > +	/* Paired with smp_store_release() following vfio_device_open() */
> > +	if (!smp_load_acquire(&df->access_granted))
> > +		return -EINVAL;
> > +
> >  	if (unlikely(!device->ops->write))
> >  		return -EINVAL;
> >
> > @@ -1165,6 +1177,10 @@ static int vfio_device_fops_mmap(struct file *filep, struct vm_area_struct *vma)
> >  	struct vfio_device_file *df = filep->private_data;
> >  	struct vfio_device *device = df->device;
> >
> > +	/* Paired with smp_store_release() following vfio_device_open() */
> > +	if (!smp_load_acquire(&df->access_granted))
> > +		return -EINVAL;
> > +
> >  	if (unlikely(!device->ops->mmap))
> >  		return -EINVAL;
> >
> > @@ -1201,6 +1217,24 @@ bool vfio_file_is_valid(struct file *file)
> >  }
> >  EXPORT_SYMBOL_GPL(vfio_file_is_valid);
> >
> > +/*
> > + * Return true if the input file is a vfio device file and has opened
> > + * the input device. Otherwise, return false.
> > + */
> > +static bool vfio_file_has_device_access(struct file *file,
> > +					struct vfio_device *device)
> > +{
> > +	struct vfio_device *vdev = vfio_device_from_file(file);
> > +	struct vfio_device_file *df;
> > +
> > +	if (!vdev || vdev != device)
> > +		return false;
> > +
> > +	df = file->private_data;
> > +
> > +	return READ_ONCE(df->access_granted);
> 
> Why did we change from smp_load_acquire() to READ_ONCE() here?  Thanks,

It should still use smp_load_acquire(). When this helper was added, it was
mainly called with dev_set->lock held, which is functionally OK because the
writer of df->access_granted also holds dev_set->lock. However, as a helper
in the vfio core, it should use smp_load_acquire(); otherwise it would need
a lockdep assertion to check that dev_set->lock is held.
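
For illustration, the pairing can be sketched in userspace with C11
atomics. This is only an analogy, under the assumption that
memory_order_release/memory_order_acquire stand in for
smp_store_release()/smp_load_acquire(); struct device_file,
grant_access() and read_state() are hypothetical names, not vfio code.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for struct vfio_device_file. */
struct device_file {
	int setup_state;            /* plain data written during "open" */
	atomic_bool access_granted; /* publication flag */
};

/* Writer: finish all setup first, then publish with release semantics. */
static void grant_access(struct device_file *df, int state)
{
	df->setup_state = state;
	atomic_store_explicit(&df->access_granted, true,
			      memory_order_release);
}

/*
 * Reader: an acquire load that observes true also observes every write
 * made before the release store.  Returns -1 while access is blocked.
 */
static int read_state(struct device_file *df)
{
	if (!atomic_load_explicit(&df->access_granted,
				  memory_order_acquire))
		return -1;
	return df->setup_state;
}
```

A relaxed load of the flag would not give the reader that ordering
guarantee on its own, which is why the helper either needs the acquire
load or has to rely on a lock shared with the writer.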

Thanks,
Yi Liu

> 
> > +}
> > +
> >  /**
> >   * vfio_file_has_dev - True if the VFIO file is a handle for device
> >   * @file: VFIO file to check
> > @@ -1211,17 +1245,12 @@ EXPORT_SYMBOL_GPL(vfio_file_is_valid);
> >  bool vfio_file_has_dev(struct file *file, struct vfio_device *device)
> >  {
> >  	struct vfio_group *group;
> > -	struct vfio_device *vdev;
> >
> >  	group = vfio_group_from_file(file);
> >  	if (group)
> >  		return vfio_group_has_dev(group, device);
> >
> > -	vdev = vfio_device_from_file(file);
> > -	if (vdev)
> > -		return vdev == device;
> > -
> > -	return false;
> > +	return vfio_file_has_device_access(file, device);
> >  }
> >  EXPORT_SYMBOL_GPL(vfio_file_has_dev);
> >


^ permalink raw reply	[flat|nested] 54+ messages in thread

* RE: [PATCH v8 16/24] iommufd/device: Add iommufd_access_detach() API
  2023-03-28 15:54     ` Nicolin Chen
@ 2023-03-29  2:24       ` Liu, Yi L
  0 siblings, 0 replies; 54+ messages in thread
From: Liu, Yi L @ 2023-03-29  2:24 UTC (permalink / raw)
  To: Nicolin Chen, Jon Pan-Doh
  Cc: alex.williamson, chao.p.peng, cohuck, eric.auger, intel-gfx,
	intel-gvt-dev, jasowang, jgg, joro, Tian, Kevin, kvm, linux-s390,
	lulu, mjrosato, peterx, robin.murphy, shameerali.kolothum.thodi,
	suravee.suthikulpanit, Xu, Terrence, Hao, Xudong, Zhao, Yan Y,
	Jiang, Yanting, yi.y.sun

> From: Nicolin Chen <nicolinc@nvidia.com>
> Sent: Tuesday, March 28, 2023 11:55 PM
> 
> On Mon, Mar 27, 2023 at 07:23:57PM -0700, Jon Pan-Doh wrote:
> > On 2023/3/27 02:40, Yi Liu wrote:
> > > diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h
> > > index 2e6e8e217cce..ec2ce3ef187d 100644
> > > --- a/drivers/iommu/iommufd/iommufd_private.h
> > > +++ b/drivers/iommu/iommufd/iommufd_private.h
> > > @@ -263,6 +263,8 @@ struct iommufd_access {
> > >       struct iommufd_object obj;
> > >       struct iommufd_ctx *ictx;
> > >       struct iommufd_ioas *ioas;
> > > +     struct iommufd_ioas *ioas_unpin;
> > > +     struct mutex ioas_lock;
> > >       const struct iommufd_access_ops *ops;
> > >       void *data;
> > >       unsigned long iova_alignment;
> >
> > I think you may need to initialize ioas_lock. I got lockdep warnings running
> > > iommufd selftests against this patch. Those went away when I added mutex_init().
> >
> > ---
> >  drivers/iommu/iommufd/device.c | 1 +
> >  1 file changed, 1 insertion(+)
> >
> > > diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c
> > index f0522d80919d..0eaae60f3537 100644
> > --- a/drivers/iommu/iommufd/device.c
> > +++ b/drivers/iommu/iommufd/device.c
> > @@ -474,6 +474,7 @@ iommufd_access_create(struct iommufd_ctx *ictx,
> >         iommufd_ctx_get(ictx);
> >         iommufd_object_finalize(ictx, &access->obj);
> >         *id = access->obj.id;
> > +       mutex_init(&access->ioas_lock);
> >         return access;
> >  }
> >  EXPORT_SYMBOL_NS_GPL(iommufd_access_create, IOMMUFD);
> > --
> > 2.40.0.348.gf938b09366-goog
> 
> Yes... I think I lost that when splitting the changes.
> 
> Yi, can you help add this in the next version?

Sure. 😊

Regards,
Yi Liu

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v8 20/24] vfio: Add cdev for vfio_device
  2023-03-27  9:40 ` [PATCH v8 20/24] vfio: Add cdev for vfio_device Yi Liu
@ 2023-03-29 19:57   ` Alex Williamson
  2023-03-30  5:35     ` Liu, Yi L
  0 siblings, 1 reply; 54+ messages in thread
From: Alex Williamson @ 2023-03-29 19:57 UTC (permalink / raw)
  To: Yi Liu
  Cc: jgg, kevin.tian, joro, robin.murphy, cohuck, eric.auger,
	nicolinc, kvm, mjrosato, chao.p.peng, yi.y.sun, peterx, jasowang,
	shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	intel-gvt-dev, intel-gfx, linux-s390, xudong.hao, yan.y.zhao,
	terrence.xu, yanting.jiang

On Mon, 27 Mar 2023 02:40:43 -0700
Yi Liu <yi.l.liu@intel.com> wrote:

> This allows user to directly open a vfio device w/o using the legacy
> container/group interface, as a prerequisite for supporting new iommu
> features like nested translation.
> 
> The device fd opened in this manner doesn't have the capability to access
> the device as the fops open() doesn't open the device until the successful
> BIND_IOMMUFD, which will be added in the next patch.
> 
> With this patch, devices registered to vfio core have both group and device
> interface created.
> 
> - group interface : /dev/vfio/$groupID
> - device interface: /dev/vfio/devices/vfioX - normal device
> 		    /dev/vfio/devices/noiommu-vfioX - noiommu device
> 		    ("X" is the minor number and is unique across devices)
> 
> Given a vfio device the user can identify the matching vfioX by checking
> the sysfs path of the device. Take PCI device (0000:6a:01.0) for example,
> /sys/bus/pci/devices/0000\:6a\:01.0/vfio-dev/vfio0/dev contains the
> major:minor of the matching vfioX.
> 
> Userspace then opens the /dev/vfio/devices/vfioX and checks with fstat
> that the major:minor matches.
> 
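
A minimal sketch of that userspace check, assuming the sysfs "dev"
attribute has already been read into a string such as "511:0\n". The
helper names parse_sysfs_dev() and fd_matches_dev() are hypothetical and
only illustrate the fstat comparison described above.

```c
#define _DEFAULT_SOURCE
#include <stdbool.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/sysmacros.h>
#include <sys/types.h>

/*
 * Hypothetical helper: parse the "major:minor" text read from the sysfs
 * "dev" attribute into a dev_t.  Returns false on malformed input.
 */
static bool parse_sysfs_dev(const char *text, dev_t *out)
{
	unsigned int maj, min;

	if (sscanf(text, "%u:%u", &maj, &min) != 2)
		return false;
	*out = makedev(maj, min);
	return true;
}

/* True if fd refers to a character device with the expected dev_t. */
static bool fd_matches_dev(int fd, dev_t expected)
{
	struct stat st;

	if (fstat(fd, &st) < 0)
		return false;
	return S_ISCHR(st.st_mode) && st.st_rdev == expected;
}
```

Userspace would call fd_matches_dev() on the fd returned by open() of
/dev/vfio/devices/vfioX and refuse to continue if it returns false.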
> The vfio_device cdev logic in this patch:
> *) __vfio_register_dev() path ends up doing cdev_device_add() for each
>    vfio_device if VFIO_DEVICE_CDEV configured.
> *) vfio_unregister_group_dev() path does cdev_device_del();
> 
> Reviewed-by: Kevin Tian <kevin.tian@intel.com>
> Tested-by: Terrence Xu <terrence.xu@intel.com>
> Tested-by: Nicolin Chen <nicolinc@nvidia.com>
> Tested-by: Matthew Rosato <mjrosato@linux.ibm.com>
> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> ---
>  drivers/vfio/Kconfig       | 11 +++++++
>  drivers/vfio/Makefile      |  1 +
>  drivers/vfio/device_cdev.c | 62 ++++++++++++++++++++++++++++++++++++++
>  drivers/vfio/vfio.h        | 46 ++++++++++++++++++++++++++++
>  drivers/vfio/vfio_main.c   | 26 +++++++++++-----
>  include/linux/vfio.h       |  4 +++
>  6 files changed, 143 insertions(+), 7 deletions(-)
>  create mode 100644 drivers/vfio/device_cdev.c
> 
> diff --git a/drivers/vfio/Kconfig b/drivers/vfio/Kconfig
> index 89e06c981e43..e2105b4dac2d 100644
> --- a/drivers/vfio/Kconfig
> +++ b/drivers/vfio/Kconfig
> @@ -12,6 +12,17 @@ menuconfig VFIO
>  	  If you don't know what to do here, say N.
>  
>  if VFIO
> +config VFIO_DEVICE_CDEV
> +	bool "Support for the VFIO cdev /dev/vfio/devices/vfioX"
> +	depends on IOMMUFD
> +	help
> +	  The VFIO device cdev is another way for userspace to get device
> +	  access. Userspace gets device fd by opening device cdev under
> +	  /dev/vfio/devices/vfioX, and then bind the device fd with an iommufd
> +	  to set up secure DMA context for device access.
> +
> +	  If you don't know what to do here, say N.
> +
>  config VFIO_CONTAINER
>  	bool "Support for the VFIO container /dev/vfio/vfio"
>  	select VFIO_IOMMU_TYPE1 if MMU && (X86 || S390 || ARM || ARM64)
> diff --git a/drivers/vfio/Makefile b/drivers/vfio/Makefile
> index 70e7dcb302ef..245394aeb94b 100644
> --- a/drivers/vfio/Makefile
> +++ b/drivers/vfio/Makefile
> @@ -4,6 +4,7 @@ obj-$(CONFIG_VFIO) += vfio.o
>  vfio-y += vfio_main.o \
>  	  group.o \
>  	  iova_bitmap.o
> +vfio-$(CONFIG_VFIO_DEVICE_CDEV) += device_cdev.o
>  vfio-$(CONFIG_IOMMUFD) += iommufd.o
>  vfio-$(CONFIG_VFIO_CONTAINER) += container.o
>  vfio-$(CONFIG_VFIO_VIRQFD) += virqfd.o
> diff --git a/drivers/vfio/device_cdev.c b/drivers/vfio/device_cdev.c
> new file mode 100644
> index 000000000000..1c640016a824
> --- /dev/null
> +++ b/drivers/vfio/device_cdev.c
> @@ -0,0 +1,62 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * Copyright (c) 2023 Intel Corporation.
> + */
> +#include <linux/vfio.h>
> +
> +#include "vfio.h"
> +
> +static dev_t device_devt;
> +
> +void vfio_init_device_cdev(struct vfio_device *device)
> +{
> +	device->device.devt = MKDEV(MAJOR(device_devt), device->index);
> +	cdev_init(&device->cdev, &vfio_device_fops);
> +	device->cdev.owner = THIS_MODULE;
> +}
> +
> +/*
> + * device access via the fd opened by this function is blocked until
> + * .open_device() is called successfully during BIND_IOMMUFD.
> + */
> +int vfio_device_fops_cdev_open(struct inode *inode, struct file *filep)
> +{
> +	struct vfio_device *device = container_of(inode->i_cdev,
> +						  struct vfio_device, cdev);
> +	struct vfio_device_file *df;
> +	int ret;
> +
> +	if (!vfio_device_try_get_registration(device))
> +		return -ENODEV;
> +
> +	df = vfio_allocate_device_file(device);
> +	if (IS_ERR(df)) {
> +		ret = PTR_ERR(df);
> +		goto err_put_registration;
> +	}
> +
> +	filep->private_data = df;
> +
> +	return 0;
> +
> +err_put_registration:
> +	vfio_device_put_registration(device);
> +	return ret;
> +}
> +
> +static char *vfio_device_devnode(const struct device *dev, umode_t *mode)
> +{
> +	return kasprintf(GFP_KERNEL, "vfio/devices/%s", dev_name(dev));
> +}
> +
> +int vfio_cdev_init(struct class *device_class)
> +{
> +	device_class->devnode = vfio_device_devnode;
> +	return alloc_chrdev_region(&device_devt, 0,
> +				   MINORMASK + 1, "vfio-dev");
> +}
> +
> +void vfio_cdev_cleanup(void)
> +{
> +	unregister_chrdev_region(device_devt, MINORMASK + 1);
> +}
> diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
> index 41dfc9d5205a..3a8fd0e32f59 100644
> --- a/drivers/vfio/vfio.h
> +++ b/drivers/vfio/vfio.h
> @@ -268,6 +268,52 @@ static inline void vfio_iommufd_unbind(struct vfio_device_file *df)
>  }
>  #endif
>  
> +#if IS_ENABLED(CONFIG_VFIO_DEVICE_CDEV)
> +static inline int vfio_device_add(struct vfio_device *device)
> +{
> +	return cdev_device_add(&device->cdev, &device->device);
> +}
> +
> +static inline void vfio_device_del(struct vfio_device *device)
> +{
> +	cdev_device_del(&device->cdev, &device->device);
> +}
> +
> +void vfio_init_device_cdev(struct vfio_device *device);
> +int vfio_device_fops_cdev_open(struct inode *inode, struct file *filep);
> +int vfio_cdev_init(struct class *device_class);
> +void vfio_cdev_cleanup(void);
> +#else
> +static inline int vfio_device_add(struct vfio_device *device)
> +{
> +	return device_add(&device->device);
> +}
> +
> +static inline void vfio_device_del(struct vfio_device *device)
> +{
> +	device_del(&device->device);
> +}
> +
> +static inline void vfio_init_device_cdev(struct vfio_device *device)
> +{
> +}
> +
> +static inline int vfio_device_fops_cdev_open(struct inode *inode,
> +					     struct file *filep)
> +{
> +	return 0;
> +}
> +
> +static inline int vfio_cdev_init(struct class *device_class)
> +{
> +	return 0;
> +}
> +
> +static inline void vfio_cdev_cleanup(void)
> +{
> +}
> +#endif /* CONFIG_VFIO_DEVICE_CDEV */
> +
>  #if IS_ENABLED(CONFIG_VFIO_VIRQFD)
>  int __init vfio_virqfd_init(void);
>  void vfio_virqfd_exit(void);
> diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
> index 8e96aab27029..58fc3bb768f2 100644
> --- a/drivers/vfio/vfio_main.c
> +++ b/drivers/vfio/vfio_main.c
> @@ -242,6 +242,7 @@ static int vfio_init_device(struct vfio_device *device, struct device *dev,
>  	device->device.release = vfio_device_release;
>  	device->device.class = vfio.device_class;
>  	device->device.parent = device->dev;
> +	vfio_init_device_cdev(device);
>  	return 0;
>  
>  out_uninit:
> @@ -280,7 +281,7 @@ static int __vfio_register_dev(struct vfio_device *device,
>  	if (ret)
>  		goto err_out;
>  
> -	ret = device_add(&device->device);
> +	ret = vfio_device_add(device);
>  	if (ret)
>  		goto err_out;
>  
> @@ -320,6 +321,12 @@ void vfio_unregister_group_dev(struct vfio_device *device)
>  	bool interrupted = false;
>  	long rc;
>  
> +	/* Prevent new device opened in the group path */
> +	vfio_device_group_unregister(device);
> +
> +	/* Prevent new device opened in the cdev path */
> +	vfio_device_del(device);
> +
>  	vfio_device_put_registration(device);
>  	rc = try_wait_for_completion(&device->comp);
>  	while (rc <= 0) {
> @@ -343,11 +350,6 @@ void vfio_unregister_group_dev(struct vfio_device *device)
>  		}
>  	}
>  
> -	vfio_device_group_unregister(device);
> -
> -	/* Balances device_add in register path */
> -	device_del(&device->device);
> -

Why were these relocated?  And additionally why was the comment
regarding the balance operations dropped?  The move seems unrelated to
the patch here, so if it's actually advisable for some reason, it
should be a separate patch.  Thanks,

Alex

>  	/* Balances vfio_device_set_group in register path */
>  	vfio_device_remove_group(device);
>  }
> @@ -555,7 +557,8 @@ static int vfio_device_fops_release(struct inode *inode, struct file *filep)
>  	struct vfio_device_file *df = filep->private_data;
>  	struct vfio_device *device = df->device;
>  
> -	vfio_device_group_close(df);
> +	if (df->group)
> +		vfio_device_group_close(df);
>  
>  	vfio_device_put_registration(device);
>  
> @@ -1204,6 +1207,7 @@ static int vfio_device_fops_mmap(struct file *filep, struct vm_area_struct *vma)
>  
>  const struct file_operations vfio_device_fops = {
>  	.owner		= THIS_MODULE,
> +	.open		= vfio_device_fops_cdev_open,
>  	.release	= vfio_device_fops_release,
>  	.read		= vfio_device_fops_read,
>  	.write		= vfio_device_fops_write,
> @@ -1590,9 +1594,16 @@ static int __init vfio_init(void)
>  		goto err_dev_class;
>  	}
>  
> +	ret = vfio_cdev_init(vfio.device_class);
> +	if (ret)
> +		goto err_alloc_dev_chrdev;
> +
>  	pr_info(DRIVER_DESC " version: " DRIVER_VERSION "\n");
>  	return 0;
>  
> +err_alloc_dev_chrdev:
> +	class_destroy(vfio.device_class);
> +	vfio.device_class = NULL;
>  err_dev_class:
>  	vfio_virqfd_exit();
>  err_virqfd:
> @@ -1603,6 +1614,7 @@ static int __init vfio_init(void)
>  static void __exit vfio_cleanup(void)
>  {
>  	ida_destroy(&vfio.device_ida);
> +	vfio_cdev_cleanup();
>  	class_destroy(vfio.device_class);
>  	vfio.device_class = NULL;
>  	vfio_virqfd_exit();
> diff --git a/include/linux/vfio.h b/include/linux/vfio.h
> index 5c06af04ed9e..8719ec2adbbb 100644
> --- a/include/linux/vfio.h
> +++ b/include/linux/vfio.h
> @@ -13,6 +13,7 @@
>  #include <linux/mm.h>
>  #include <linux/workqueue.h>
>  #include <linux/poll.h>
> +#include <linux/cdev.h>
>  #include <uapi/linux/vfio.h>
>  #include <linux/iova_bitmap.h>
>  
> @@ -51,6 +52,9 @@ struct vfio_device {
>  	/* Members below here are private, not for driver use */
>  	unsigned int index;
>  	struct device device;	/* device.kref covers object life circle */
> +#if IS_ENABLED(CONFIG_VFIO_DEVICE_CDEV)
> +	struct cdev cdev;
> +#endif
>  	refcount_t refcount;	/* user count on registered device*/
>  	unsigned int open_count;
>  	struct completion comp;



* Re: [PATCH v8 21/24] vfio: Add VFIO_DEVICE_BIND_IOMMUFD
  2023-03-27  9:40 ` [PATCH v8 21/24] vfio: Add VFIO_DEVICE_BIND_IOMMUFD Yi Liu
@ 2023-03-29 21:00   ` Alex Williamson
  2023-03-29 23:22     ` Jason Gunthorpe
  2023-03-30  7:09     ` Liu, Yi L
  0 siblings, 2 replies; 54+ messages in thread
From: Alex Williamson @ 2023-03-29 21:00 UTC (permalink / raw)
  To: Yi Liu
  Cc: jgg, kevin.tian, joro, robin.murphy, cohuck, eric.auger,
	nicolinc, kvm, mjrosato, chao.p.peng, yi.y.sun, peterx, jasowang,
	shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	intel-gvt-dev, intel-gfx, linux-s390, xudong.hao, yan.y.zhao,
	terrence.xu, yanting.jiang

On Mon, 27 Mar 2023 02:40:44 -0700
Yi Liu <yi.l.liu@intel.com> wrote:

> This adds ioctl for userspace to bind device cdev fd to iommufd.
> 
>     VFIO_DEVICE_BIND_IOMMUFD: bind device to an iommufd, hence gain DMA
> 			      control provided by the iommufd. open_device
> 			      op is called after bind_iommufd op.
> 			      VFIO no iommu mode is indicated by passing
> 			      a negative iommufd value.
> 
> Reviewed-by: Kevin Tian <kevin.tian@intel.com>
> Tested-by: Terrence Xu <terrence.xu@intel.com>
> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> ---
>  drivers/vfio/device_cdev.c | 153 +++++++++++++++++++++++++++++++++++++
>  drivers/vfio/vfio.h        |  13 ++++
>  drivers/vfio/vfio_main.c   |   5 ++
>  include/uapi/linux/vfio.h  |  37 +++++++++
>  4 files changed, 208 insertions(+)
> 
> diff --git a/drivers/vfio/device_cdev.c b/drivers/vfio/device_cdev.c
> index 1c640016a824..2b563bac50b9 100644
> --- a/drivers/vfio/device_cdev.c
> +++ b/drivers/vfio/device_cdev.c
> @@ -3,6 +3,7 @@
>   * Copyright (c) 2023 Intel Corporation.
>   */
>  #include <linux/vfio.h>
> +#include <linux/iommufd.h>
>  
>  #include "vfio.h"
>  
> @@ -44,6 +45,158 @@ int vfio_device_fops_cdev_open(struct inode *inode, struct file *filep)
>  	return ret;
>  }
>  
> +static void vfio_device_get_kvm_safe(struct vfio_device_file *df)
> +{
> +	spin_lock(&df->kvm_ref_lock);
> +	if (df->kvm)
> +		_vfio_device_get_kvm_safe(df->device, df->kvm);
> +	spin_unlock(&df->kvm_ref_lock);
> +}
> +
> +void vfio_device_cdev_close(struct vfio_device_file *df)
> +{
> +	struct vfio_device *device = df->device;
> +
> +	/*
> +	 * As df->access_granted writer is under dev_set->lock as well,
> +	 * so this read no need to use smp_load_acquire() to pair with

Nit, "no need to use" -> "does not need to use"

> +	 * smp_store_release() in the caller of vfio_device_open().
> +	 */
> +	if (!df->access_granted)
> +		return;
> +

Isn't the lock we're acquiring below the one that we claim to have in
the comment above to make the non-smp_load_acquire() test safe?

> +	mutex_lock(&device->dev_set->lock);
> +	vfio_device_close(df);
> +	vfio_device_put_kvm(device);
> +	if (df->iommufd)
> +		iommufd_ctx_put(df->iommufd);
> +	mutex_unlock(&device->dev_set->lock);
> +	vfio_device_unblock_group(device);
> +}
> +
> +static int vfio_device_cdev_enable_noiommu(struct vfio_device *device)
> +{
> +	if (!capable(CAP_SYS_RAWIO))
> +		return -EPERM;
> +
> +	if (!device->noiommu)
> +		return -EINVAL;
> +
> +	return 0;
> +}

This is testing, not enabling, i.e. a naming nit.

> +
> +static struct iommufd_ctx *vfio_get_iommufd_from_fd(int fd)
> +{
> +	struct fd f;
> +	struct iommufd_ctx *iommufd;
> +
> +	f = fdget(fd);
> +	if (!f.file)
> +		return ERR_PTR(-EBADF);
> +
> +	iommufd = iommufd_ctx_from_file(f.file);
> +
> +	fdput(f);
> +	return iommufd;
> +}
> +
> +long vfio_device_ioctl_bind_iommufd(struct vfio_device_file *df,
> +				    struct vfio_device_bind_iommufd __user *arg)
> +{
> +	struct vfio_device *device = df->device;
> +	struct vfio_device_bind_iommufd bind;
> +	struct iommufd_ctx *iommufd = NULL;
> +	unsigned long minsz;
> +	int ret;
> +
> +	static_assert(__same_type(arg->out_devid, bind.out_devid));

They're the same field in the same structure, how could they be
otherwise?
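
A minimal userspace sketch of why the assertion cannot fire, assuming GCC
or Clang.  same_type() is a local stand-in for the kernel's __same_type(),
and struct bind_args is a hypothetical mirror of the uAPI structure quoted
further down; neither name comes from the series:

```c
#include <assert.h>
#include <stdint.h>

/* Userspace stand-in for the kernel's __same_type() (GCC/Clang builtin). */
#define same_type(a, b) \
	__builtin_types_compatible_p(__typeof__(a), __typeof__(b))

/* Hypothetical mirror of struct vfio_device_bind_iommufd from the patch. */
struct bind_args {
	uint32_t argsz;
	uint32_t flags;
	uint64_t dev_cookie;
	int32_t  iommufd;
	uint32_t out_devid;
};

/* Same shape as the patch: a pointer and a local of one struct type. */
int check_same_field(struct bind_args *arg)
{
	struct bind_args bind;

	/* Tautology: both sides name the same member of the same struct. */
	static_assert(same_type(arg->out_devid, bind.out_devid),
		      "cannot fire");

	/* A check that can fail compares against an independent type. */
	static_assert(same_type(bind.out_devid, uint32_t),
		      "out_devid is expected to be a 32-bit unsigned value");

	return same_type(arg->out_devid, bind.out_devid);
}
```

The second assertion is only an illustration of a comparison that carries
information; whether the series intended something like it is a guess.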

> +
> +	minsz = offsetofend(struct vfio_device_bind_iommufd, out_devid);
> +
> +	if (copy_from_user(&bind, arg, minsz))
> +		return -EFAULT;
> +
> +	if (bind.argsz < minsz || bind.flags)
> +		return -EINVAL;
> +
> +	if (!device->ops->bind_iommufd)
> +		return -ENODEV;

This test seems beyond normal paranoia, since we already test this in
__vfio_register_dev().

> +
> +	/* BIND_IOMMUFD only allowed for cdev fds */
> +	if (df->group)
> +		return -EINVAL;
> +
> +	ret = vfio_device_block_group(device);
> +	if (ret)
> +		return ret;
> +
> +	mutex_lock(&device->dev_set->lock);
> +	/* one device cannot be bound twice */
> +	if (df->access_granted) {
> +		ret = -EINVAL;
> +		goto out_unlock;
> +	}
> +
> +	/* iommufd < 0 means noiommu mode */
> +	if (bind.iommufd < 0) {
> +		ret = vfio_device_cdev_enable_noiommu(device);
> +		if (ret)
> +			goto out_unlock;
> +	} else {
> +		iommufd = vfio_get_iommufd_from_fd(bind.iommufd);
> +		if (IS_ERR(iommufd)) {
> +			ret = PTR_ERR(iommufd);
> +			goto out_unlock;
> +		}
> +	}
> +
> +	/*
> +	 * Before the device open, get the KVM pointer currently
> +	 * associated with the device file (if there is) and obtain
> +	 * a reference.  This reference is held until device closed.
> +	 * Save the pointer in the device for use by drivers.
> +	 */
> +	vfio_device_get_kvm_safe(df);
> +
> +	df->iommufd = iommufd;
> +	ret = vfio_device_open(df);
> +	if (ret)
> +		goto out_put_kvm;
> +
> +	if (df->iommufd)
> +		bind.out_devid = df->devid;

How about only setting df->iommufd in the non-noiommu case above so
it's not confusing that it was just set 4 lines previous.  That also
allows the iommufd pointer to be scoped within that branch and not
require initialization.  It might make sense to declare:

	bool is_noiommu = (bind.iommufd < 0);

and use it consistently rather than alternating testing between
bind.iommufd and df->iommufd.
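
A rough userspace model of that restructuring, to show the branch shape
only.  struct mock_ctx, struct mock_df and mock_bind() are hypothetical
stand-ins, and ctx_for_fd replaces the real fd lookup:

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for struct iommufd_ctx and struct vfio_device_file. */
struct mock_ctx {
	int refs;
};

struct mock_df {
	struct mock_ctx *iommufd;
	uint32_t devid;
};

/*
 * Models only the branch structure of the bind path.  ctx_for_fd plays the
 * role of the context looked up from the fd (NULL meaning a bad fd).
 */
int mock_bind(struct mock_df *df, int32_t fd, struct mock_ctx *ctx_for_fd,
	      uint32_t *out_devid)
{
	bool is_noiommu = fd < 0;

	if (!is_noiommu) {
		/* The context pointer lives only in the branch that uses it. */
		struct mock_ctx *iommufd = ctx_for_fd;

		if (!iommufd)
			return -EBADF;
		iommufd->refs++;
		df->iommufd = iommufd;
	}

	/* Later checks test the same boolean instead of df->iommufd. */
	if (!is_noiommu)
		*out_devid = df->devid;

	return 0;
}
```

With this shape df->iommufd is never written in the noiommu case, so the
error path would not need to reset it for that case either.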

> +
> +	ret = copy_to_user(&arg->out_devid, &bind.out_devid,
> +			   sizeof(bind.out_devid)) ? -EFAULT : 0;

In the noiommu case, this copies back the input value, shouldn't it be
some known invalid value?  Seems confusing.
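
One way to picture it, as a sketch only.  MOCK_INVALID_DEVID is a
hypothetical sentinel; the series does not define an invalid device id,
so the value chosen here is an assumption:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical sentinel; nothing in the series reserves this value. */
#define MOCK_INVALID_DEVID UINT32_MAX

/* Value to copy back to userspace, never the caller's own input. */
uint32_t mock_out_devid(bool is_noiommu, uint32_t bound_devid)
{
	return is_noiommu ? MOCK_INVALID_DEVID : bound_devid;
}
```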

> +	if (ret)
> +		goto out_close_device;
> +
> +	if (bind.iommufd < 0)
> +		dev_warn(device->dev, "device is bound to vfio-noiommu by user "
> +			 "(%s:%d)\n", current->comm, task_pid_nr(current));
> +
> +	/*
> +	 * Paired with smp_load_acquire() in vfio_device_fops::ioctl/
> +	 * read/write/mmap
> +	 */
> +	smp_store_release(&df->access_granted, true);
> +	mutex_unlock(&device->dev_set->lock);
> +
> +	return 0;
> +
> +out_close_device:
> +	vfio_device_close(df);
> +out_put_kvm:
> +	df->iommufd = NULL;
> +	vfio_device_put_kvm(device);
> +	if (iommufd)
> +		iommufd_ctx_put(iommufd);
> +out_unlock:
> +	mutex_unlock(&device->dev_set->lock);
> +	vfio_device_unblock_group(device);
> +	return ret;
> +}
> +
>  static char *vfio_device_devnode(const struct device *dev, umode_t *mode)
>  {
>  	return kasprintf(GFP_KERNEL, "vfio/devices/%s", dev_name(dev));
> diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
> index 3a8fd0e32f59..ace3d52b0928 100644
> --- a/drivers/vfio/vfio.h
> +++ b/drivers/vfio/vfio.h
> @@ -281,6 +281,9 @@ static inline void vfio_device_del(struct vfio_device *device)
>  
>  void vfio_init_device_cdev(struct vfio_device *device);
>  int vfio_device_fops_cdev_open(struct inode *inode, struct file *filep);
> +void vfio_device_cdev_close(struct vfio_device_file *df);
> +long vfio_device_ioctl_bind_iommufd(struct vfio_device_file *df,
> +				    struct vfio_device_bind_iommufd __user *arg);
>  int vfio_cdev_init(struct class *device_class);
>  void vfio_cdev_cleanup(void);
>  #else
> @@ -304,6 +307,16 @@ static inline int vfio_device_fops_cdev_open(struct inode *inode,
>  	return 0;
>  }
>  
> +static inline void vfio_device_cdev_close(struct vfio_device_file *df)
> +{
> +}
> +
> +static inline long vfio_device_ioctl_bind_iommufd(struct vfio_device_file *df,
> +						  struct vfio_device_bind_iommufd __user *arg)
> +{
> +	return -EOPNOTSUPP;
> +}
> +
>  static inline int vfio_cdev_init(struct class *device_class)
>  {
>  	return 0;
> diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
> index 58fc3bb768f2..375086c8803f 100644
> --- a/drivers/vfio/vfio_main.c
> +++ b/drivers/vfio/vfio_main.c
> @@ -559,6 +559,8 @@ static int vfio_device_fops_release(struct inode *inode, struct file *filep)
>  
>  	if (df->group)
>  		vfio_device_group_close(df);
> +	else
> +		vfio_device_cdev_close(df);
>  
>  	vfio_device_put_registration(device);
>  
> @@ -1132,6 +1134,9 @@ static long vfio_device_fops_unl_ioctl(struct file *filep,
>  	struct vfio_device *device = df->device;
>  	int ret;
>  
> +	if (cmd == VFIO_DEVICE_BIND_IOMMUFD)
> +		return vfio_device_ioctl_bind_iommufd(df, (void __user *)arg);
> +
>  	/* Paired with smp_store_release() following vfio_device_open() */
>  	if (!smp_load_acquire(&df->access_granted))
>  		return -EINVAL;
> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> index 61b801dfd40b..62b2f2497525 100644
> --- a/include/uapi/linux/vfio.h
> +++ b/include/uapi/linux/vfio.h
> @@ -194,6 +194,43 @@ struct vfio_group_status {
>  
>  /* --------------- IOCTLs for DEVICE file descriptors --------------- */
>  
> +/*
> + * VFIO_DEVICE_BIND_IOMMUFD - _IOR(VFIO_TYPE, VFIO_BASE + 19,
> + *				   struct vfio_device_bind_iommufd)
> + *
> + * Bind a vfio_device to the specified iommufd.
> + *
> + * The user should provide a device cookie when calling this ioctl. The
> + * cookie is carried only in event e.g. I/O fault reported to userspace
> + * via iommufd. The user should use devid returned by this ioctl to mark
> + * the target device in other ioctls (e.g. iommu hardware infomration query
> + * via iommufd, and etc.).

AFAICT, the whole concept of this dev_cookie is a fantasy.  It only
exists in this series in these comments and the structure below.  It's
not even defined whether it needs to be unique within an iommufd
context, and clearly nothing here validates that.  There's not enough
implementation for it to exist in this series.  Maybe dev_cookie could
be appended to the end of the structure later, indicated with a flag,
once it has some meaning.
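
A userspace sketch of that argsz/flags extension pattern, under the
assumption that the cookie moves to the tail of the structure.  struct
bind_v2, BIND_FLAG_DEV_COOKIE and parse_bind() are hypothetical names, and
memcpy() from a length-checked buffer stands in for copy_from_user():

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BIND_FLAG_DEV_COOKIE (1u << 0)	/* hypothetical flag bit */

/* Hypothetical layout: the optional field sits after the base fields. */
struct bind_v2 {
	uint32_t argsz;
	uint32_t flags;
	int32_t  iommufd;
	uint32_t out_devid;
	uint64_t dev_cookie;	/* meaningful only with BIND_FLAG_DEV_COOKIE */
};

#define mock_offsetofend(type, member) \
	(offsetof(type, member) + sizeof(((type *)0)->member))

int parse_bind(const void *user, size_t user_len, struct bind_v2 *out)
{
	size_t minsz = mock_offsetofend(struct bind_v2, out_devid);

	memset(out, 0, sizeof(*out));
	if (user_len < minsz)
		return -EFAULT;
	memcpy(out, user, minsz);

	if (out->argsz < minsz || out->argsz > user_len)
		return -EINVAL;
	if (out->flags & ~BIND_FLAG_DEV_COOKIE)
		return -EINVAL;

	if (out->flags & BIND_FLAG_DEV_COOKIE) {
		/* The flag promises the larger size, so argsz must cover it. */
		if (out->argsz < mock_offsetofend(struct bind_v2, dev_cookie))
			return -EINVAL;
		memcpy(&out->dev_cookie,
		       (const char *)user + offsetof(struct bind_v2, dev_cookie),
		       sizeof(out->dev_cookie));
	}

	return 0;
}
```

Callers that never set the flag keep passing the smaller size, so the
field costs nothing until it has defined semantics.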

> + *
> + * User is not allowed to access the device before the binding operation
> + * is completed.

s/not allowed to access/restricted from accessing/

> + *
> + * Unbind is automatically conducted when device fd is closed.
> + *
> + * @argsz:	 user filled size of this data.
> + * @flags:	 reserved for future extension.
> + * @dev_cookie:	 a per device cookie provided by userspace.
> + * @iommufd:	 iommufd to bind. a negative value means noiommu.

"Use a negative value for no-iommu, where supported", or better, should
we define this explicitly as -1?  Or why not use a flag bit to specify
no-iommu?  Maybe minsz would then only extend through flags for the
noiommu use case.
Thanks,

Alex
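
A sketch of the flag-bit variant, for illustration.  The flag name
MOCK_BIND_FLAG_NOIOMMU and the helper mock_bind_mode() are hypothetical,
as is the choice of -EBADF for a negative fd without the flag:

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define MOCK_BIND_FLAG_NOIOMMU (1u << 0)	/* hypothetical flag bit */

/* Returns 0 and sets *noiommu, or a negative errno for rejected input. */
int mock_bind_mode(uint32_t flags, int32_t iommufd, bool *noiommu)
{
	if (flags & ~MOCK_BIND_FLAG_NOIOMMU)
		return -EINVAL;

	if (flags & MOCK_BIND_FLAG_NOIOMMU) {
		/* Nothing past flags is consulted in this mode. */
		*noiommu = true;
		return 0;
	}

	/* Without the flag, a negative fd is just a bad descriptor. */
	if (iommufd < 0)
		return -EBADF;

	*noiommu = false;
	return 0;
}
```

The mode is then explicit in the request rather than encoded in the sign
of the fd field.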

> + * @out_devid:	 the device id generated by this bind. This field is valid
> + *		as long as the input @iommufd is valid. Otherwise, it is
> + *		meaningless.
> + *
> + * Return: 0 on success, -errno on failure.
> + */
> +struct vfio_device_bind_iommufd {
> +	__u32		argsz;
> +	__u32		flags;
> +	__aligned_u64	dev_cookie;
> +	__s32		iommufd;
> +	__u32		out_devid;
> +};
> +
> +#define VFIO_DEVICE_BIND_IOMMUFD	_IO(VFIO_TYPE, VFIO_BASE + 19)
> +
>  /**
>   * VFIO_DEVICE_GET_INFO - _IOR(VFIO_TYPE, VFIO_BASE + 7,
>   *						struct vfio_device_info)



* Re: [PATCH v8 22/24] vfio: Add VFIO_DEVICE_AT[DE]TACH_IOMMUFD_PT
  2023-03-27  9:40 ` [PATCH v8 22/24] vfio: Add VFIO_DEVICE_AT[DE]TACH_IOMMUFD_PT Yi Liu
@ 2023-03-29 21:19   ` Alex Williamson
  2023-03-30 13:02     ` Liu, Yi L
  0 siblings, 1 reply; 54+ messages in thread
From: Alex Williamson @ 2023-03-29 21:19 UTC (permalink / raw)
  To: Yi Liu
  Cc: jgg, kevin.tian, joro, robin.murphy, cohuck, eric.auger,
	nicolinc, kvm, mjrosato, chao.p.peng, yi.y.sun, peterx, jasowang,
	shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	intel-gvt-dev, intel-gfx, linux-s390, xudong.hao, yan.y.zhao,
	terrence.xu, yanting.jiang


Nit on the subject: AT[DE]TACH reads as ATDETACH.  [AT|DE]TACH?

On Mon, 27 Mar 2023 02:40:45 -0700
Yi Liu <yi.l.liu@intel.com> wrote:

> This adds ioctl for userspace to attach device cdev fd to and detach
> from IOAS/hw_pagetable managed by iommufd.
> 
>     VFIO_DEVICE_ATTACH_IOMMUFD_PT: attach vfio device to IOAS, hw_pagetable
> 				   managed by iommufd. Attach can be
> 				   undo by VFIO_DEVICE_DETACH_IOMMUFD_PT
> 				   or device fd close.
>     VFIO_DEVICE_DETACH_IOMMUFD_PT: detach vfio device from the current attached
> 				   IOAS or hw_pagetable managed by iommufd.
> 
> Reviewed-by: Kevin Tian <kevin.tian@intel.com>
> Tested-by: Terrence Xu <terrence.xu@intel.com>
> Tested-by: Nicolin Chen <nicolinc@nvidia.com>
> Tested-by: Matthew Rosato <mjrosato@linux.ibm.com>
> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> ---
>  drivers/vfio/device_cdev.c | 85 ++++++++++++++++++++++++++++++++++++++
>  drivers/vfio/vfio.h        | 16 +++++++
>  drivers/vfio/vfio_main.c   |  8 ++++
>  include/uapi/linux/vfio.h  | 52 +++++++++++++++++++++++
>  4 files changed, 161 insertions(+)
> 
> diff --git a/drivers/vfio/device_cdev.c b/drivers/vfio/device_cdev.c
> index 2b563bac50b9..b5de997bff6d 100644
> --- a/drivers/vfio/device_cdev.c
> +++ b/drivers/vfio/device_cdev.c
> @@ -197,6 +197,91 @@ long vfio_device_ioctl_bind_iommufd(struct vfio_device_file *df,
>  	return ret;
>  }
>  
> +int vfio_ioctl_device_attach(struct vfio_device_file *df,
> +			     struct vfio_device_attach_iommufd_pt __user *arg)
> +{
> +	struct vfio_device *device = df->device;
> +	struct vfio_device_attach_iommufd_pt attach;
> +	unsigned long minsz;
> +	int ret;
> +
> +	static_assert(__same_type(arg->pt_id, attach.pt_id));

Same comment as previous, given these are the same field of the same
structure, I don't understand how this could ever assert.

> +
> +	minsz = offsetofend(struct vfio_device_attach_iommufd_pt, pt_id);
> +
> +	if (copy_from_user(&attach, arg, minsz))
> +		return -EFAULT;
> +
> +	if (attach.argsz < minsz || attach.flags)
> +		return -EINVAL;
> +
> +	if (!device->ops->bind_iommufd)
> +		return -ENODEV;

Same as previous, we already require this to enable cdev support, so
this seems ultra paranoid.

> +
> +	/* ATTACH only allowed for cdev fds */
> +	if (df->group)
> +		return -EINVAL;
> +
> +	mutex_lock(&device->dev_set->lock);
> +	/* noiommufd mode doesn't allow attach */
> +	if (!df->iommufd) {
> +		ret = -EOPNOTSUPP;
> +		goto out_unlock;
> +	}
> +
> +	ret = device->ops->attach_ioas(device, &attach.pt_id);
> +	if (ret)
> +		goto out_unlock;
> +
> +	ret = copy_to_user(&arg->pt_id, &attach.pt_id,
> +			   sizeof(attach.pt_id)) ? -EFAULT : 0;
> +	if (ret)
> +		goto out_detach;
> +	mutex_unlock(&device->dev_set->lock);
> +
> +	return 0;
> +
> +out_detach:
> +	device->ops->detach_ioas(device);
> +out_unlock:
> +	mutex_unlock(&device->dev_set->lock);
> +	return ret;
> +}
> +
> +int vfio_ioctl_device_detach(struct vfio_device_file *df,
> +			     struct vfio_device_detach_iommufd_pt __user *arg)
> +{
> +	struct vfio_device *device = df->device;
> +	struct vfio_device_detach_iommufd_pt detach;
> +	unsigned long minsz;
> +
> +	minsz = offsetofend(struct vfio_device_detach_iommufd_pt, flags);
> +
> +	if (copy_from_user(&detach, arg, minsz))
> +		return -EFAULT;
> +
> +	if (detach.argsz < minsz || detach.flags)
> +		return -EINVAL;
> +
> +	if (!device->ops->bind_iommufd)
> +		return -ENODEV;
> +
> +	/* DETACH only allowed for cdev fds */
> +	if (df->group)
> +		return -EINVAL;
> +
> +	mutex_lock(&device->dev_set->lock);
> +	/* noiommufd mode doesn't support detach */
> +	if (!df->iommufd) {
> +		mutex_unlock(&device->dev_set->lock);
> +		return -EOPNOTSUPP;
> +	}
> +	device->ops->detach_ioas(device);
> +	mutex_unlock(&device->dev_set->lock);
> +
> +	return 0;
> +}

For both of these, can't a user trigger the
WARN_ON(!vdev->iommufd_device) if they call either of these before
VFIO_DEVICE_BIND_IOMMUFD?  Thanks,

Alex
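
For reference, a userspace model of the dispatch order visible in the
patch 21 hunk of vfio_device_fops_unl_ioctl(): BIND is handled first, then
the access_granted gate, then the switch that this patch extends with the
attach and detach cases.  Whether that gate is enough to keep the WARN_ON
unreachable is the open question; the sketch only shows the ordering.  All
names are hypothetical, and C11 atomics stand in for smp_store_release()
and smp_load_acquire():

```c
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

enum mock_cmd { CMD_BIND, CMD_ATTACH, CMD_DETACH };

/* Hypothetical stand-in for struct vfio_device_file. */
struct mock_file {
	atomic_bool access_granted;
	bool attached;
};

int mock_ioctl(struct mock_file *df, enum mock_cmd cmd)
{
	/* BIND is dispatched before the gate, as in the quoted hunk. */
	if (cmd == CMD_BIND) {
		if (atomic_load_explicit(&df->access_granted,
					 memory_order_acquire))
			return -EINVAL;	/* models "cannot be bound twice" */
		atomic_store_explicit(&df->access_granted, true,
				      memory_order_release);
		return 0;
	}

	/* Every other command is refused until a bind has succeeded. */
	if (!atomic_load_explicit(&df->access_granted, memory_order_acquire))
		return -EINVAL;

	switch (cmd) {
	case CMD_ATTACH:
		df->attached = true;
		return 0;
	case CMD_DETACH:
		df->attached = false;
		return 0;
	default:
		return -ENOTTY;
	}
}
```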

> +
>  static char *vfio_device_devnode(const struct device *dev, umode_t *mode)
>  {
>  	return kasprintf(GFP_KERNEL, "vfio/devices/%s", dev_name(dev));
> diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
> index ace3d52b0928..c199e410db18 100644
> --- a/drivers/vfio/vfio.h
> +++ b/drivers/vfio/vfio.h
> @@ -284,6 +284,10 @@ int vfio_device_fops_cdev_open(struct inode *inode, struct file *filep);
>  void vfio_device_cdev_close(struct vfio_device_file *df);
>  long vfio_device_ioctl_bind_iommufd(struct vfio_device_file *df,
>  				    struct vfio_device_bind_iommufd __user *arg);
> +int vfio_ioctl_device_attach(struct vfio_device_file *df,
> +			     struct vfio_device_attach_iommufd_pt __user *arg);
> +int vfio_ioctl_device_detach(struct vfio_device_file *df,
> +			     struct vfio_device_detach_iommufd_pt __user *arg);
>  int vfio_cdev_init(struct class *device_class);
>  void vfio_cdev_cleanup(void);
>  #else
> @@ -317,6 +321,18 @@ static inline long vfio_device_ioctl_bind_iommufd(struct vfio_device_file *df,
>  	return -EOPNOTSUPP;
>  }
>  
> +static inline int vfio_ioctl_device_attach(struct vfio_device_file *df,
> +					   struct vfio_device_attach_iommufd_pt __user *arg)
> +{
> +	return -EOPNOTSUPP;
> +}
> +
> +static inline int vfio_ioctl_device_detach(struct vfio_device_file *df,
> +					   struct vfio_device_detach_iommufd_pt __user *arg)
> +{
> +	return -EOPNOTSUPP;
> +}
> +
>  static inline int vfio_cdev_init(struct class *device_class)
>  {
>  	return 0;
> diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
> index 375086c8803f..896d8bb49585 100644
> --- a/drivers/vfio/vfio_main.c
> +++ b/drivers/vfio/vfio_main.c
> @@ -1150,6 +1150,14 @@ static long vfio_device_fops_unl_ioctl(struct file *filep,
>  		ret = vfio_ioctl_device_feature(device, (void __user *)arg);
>  		break;
>  
> +	case VFIO_DEVICE_ATTACH_IOMMUFD_PT:
> +		ret = vfio_ioctl_device_attach(df, (void __user *)arg);
> +		break;
> +
> +	case VFIO_DEVICE_DETACH_IOMMUFD_PT:
> +		ret = vfio_ioctl_device_detach(df, (void __user *)arg);
> +		break;
> +
>  	default:
>  		if (unlikely(!device->ops->ioctl))
>  			ret = -EINVAL;
> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> index 62b2f2497525..bf6c97e759c7 100644
> --- a/include/uapi/linux/vfio.h
> +++ b/include/uapi/linux/vfio.h
> @@ -231,6 +231,58 @@ struct vfio_device_bind_iommufd {
>  
>  #define VFIO_DEVICE_BIND_IOMMUFD	_IO(VFIO_TYPE, VFIO_BASE + 19)
>  
> +/*
> + * VFIO_DEVICE_ATTACH_IOMMUFD_PT - _IOW(VFIO_TYPE, VFIO_BASE + 20,
> + *					struct vfio_device_attach_iommufd_pt)
> + *
> + * Attach a vfio device to an iommufd address space specified by IOAS
> + * id or hw_pagetable (hwpt) id.
> + *
> + * Available only after a device has been bound to iommufd via
> + * VFIO_DEVICE_BIND_IOMMUFD
> + *
> + * Undo by VFIO_DEVICE_DETACH_IOMMUFD_PT or device fd close.
> + *
> + * @argsz:	user filled size of this data.
> + * @flags:	must be 0.
> + * @pt_id:	Input the target id which can represent an ioas or a hwpt
> + *		allocated via iommufd subsystem.
> + *		Output the attached hwpt id which could be the specified
> + *		hwpt itself or a hwpt automatically created for the
> + *		specified ioas by kernel during the attachment.
> + *
> + * Return: 0 on success, -errno on failure.
> + */
> +struct vfio_device_attach_iommufd_pt {
> +	__u32	argsz;
> +	__u32	flags;
> +	__u32	pt_id;
> +};
> +
> +#define VFIO_DEVICE_ATTACH_IOMMUFD_PT		_IO(VFIO_TYPE, VFIO_BASE + 20)
> +
> +/*
> + * VFIO_DEVICE_DETACH_IOMMUFD_PT - _IOW(VFIO_TYPE, VFIO_BASE + 21,
> + *					struct vfio_device_detach_iommufd_pt)
> + *
> + * Detach a vfio device from the iommufd address space it has been
> + * attached to. After it, device should be in a blocking DMA state.
> + *
> + * Available only after a device has been bound to iommufd via
> + * VFIO_DEVICE_BIND_IOMMUFD.
> + *
> + * @argsz:	user filled size of this data.
> + * @flags:	must be 0.
> + *
> + * Return: 0 on success, -errno on failure.
> + */
> +struct vfio_device_detach_iommufd_pt {
> +	__u32	argsz;
> +	__u32	flags;
> +};
> +
> +#define VFIO_DEVICE_DETACH_IOMMUFD_PT		_IO(VFIO_TYPE, VFIO_BASE + 21)
> +
>  /**
>   * VFIO_DEVICE_GET_INFO - _IOR(VFIO_TYPE, VFIO_BASE + 7,
>   *						struct vfio_device_info)



* Re: [PATCH v8 23/24] vfio: Compile group optionally
  2023-03-27  9:40 ` [PATCH v8 23/24] vfio: Compile group optionally Yi Liu
@ 2023-03-29 21:51   ` Alex Williamson
  2023-03-30 13:06     ` Liu, Yi L
  0 siblings, 1 reply; 54+ messages in thread
From: Alex Williamson @ 2023-03-29 21:51 UTC (permalink / raw)
  To: Yi Liu
  Cc: jgg, kevin.tian, joro, robin.murphy, cohuck, eric.auger,
	nicolinc, kvm, mjrosato, chao.p.peng, yi.y.sun, peterx, jasowang,
	shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	intel-gvt-dev, intel-gfx, linux-s390, xudong.hao, yan.y.zhao,
	terrence.xu, yanting.jiang

On Mon, 27 Mar 2023 02:40:46 -0700
Yi Liu <yi.l.liu@intel.com> wrote:

> group code is not needed for vfio device cdev, so with vfio device cdev
> introduced, the group infrastructures can be compiled out if only cdev
> is needed.
> 
> Reviewed-by: Kevin Tian <kevin.tian@intel.com>
> Tested-by: Terrence Xu <terrence.xu@intel.com>
> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> ---
>  drivers/iommu/iommufd/Kconfig |   4 +-
>  drivers/vfio/Kconfig          |  16 ++++-
>  drivers/vfio/Makefile         |   2 +-
>  drivers/vfio/vfio.h           | 111 ++++++++++++++++++++++++++++++++--
>  include/linux/vfio.h          |  13 +++-
>  5 files changed, 134 insertions(+), 12 deletions(-)
> 
> diff --git a/drivers/iommu/iommufd/Kconfig b/drivers/iommu/iommufd/Kconfig
> index ada693ea51a7..1946eed1826a 100644
> --- a/drivers/iommu/iommufd/Kconfig
> +++ b/drivers/iommu/iommufd/Kconfig
> @@ -14,8 +14,8 @@ config IOMMUFD
>  if IOMMUFD
>  config IOMMUFD_VFIO_CONTAINER
>  	bool "IOMMUFD provides the VFIO container /dev/vfio/vfio"
> -	depends on VFIO && !VFIO_CONTAINER
> -	default VFIO && !VFIO_CONTAINER
> +	depends on VFIO && VFIO_GROUP && !VFIO_CONTAINER
> +	default VFIO && VFIO_GROUP && !VFIO_CONTAINER

Shouldn't these simply replace VFIO with VFIO_GROUP since VFIO_GROUP
necessarily depends on VFIO?

>  	help
>  	  IOMMUFD will provide /dev/vfio/vfio instead of VFIO. This relies on
>  	  IOMMUFD providing compatibility emulation to give the same ioctls.
> diff --git a/drivers/vfio/Kconfig b/drivers/vfio/Kconfig
> index e2105b4dac2d..0942a19601a2 100644
> --- a/drivers/vfio/Kconfig
> +++ b/drivers/vfio/Kconfig
> @@ -4,7 +4,9 @@ menuconfig VFIO
>  	select IOMMU_API
>  	depends on IOMMUFD || !IOMMUFD
>  	select INTERVAL_TREE
> -	select VFIO_CONTAINER if IOMMUFD=n
> +	select VFIO_GROUP if SPAPR_TCE_IOMMU || !IOMMUFD

This needs to be IOMMUFD=n or else VFIO_GROUP cannot be unset when
IOMMUFD=m

> +	select VFIO_DEVICE_CDEV if !VFIO_GROUP
> +	select VFIO_CONTAINER if IOMMUFD=n && VFIO_GROUP

The fact that CONTAINER depends on GROUP seems to be sufficient that we
don't need GROUP here.  Thanks,

Alex

>  	help
>  	  VFIO provides a framework for secure userspace device drivers.
>  	  See Documentation/driver-api/vfio.rst for more details.
> @@ -15,6 +17,7 @@ if VFIO
>  config VFIO_DEVICE_CDEV
>  	bool "Support for the VFIO cdev /dev/vfio/devices/vfioX"
>  	depends on IOMMUFD
> +	default !VFIO_GROUP
>  	help
>  	  The VFIO device cdev is another way for userspace to get device
>  	  access. Userspace gets device fd by opening device cdev under
> @@ -23,9 +26,20 @@ config VFIO_DEVICE_CDEV
>  
>  	  If you don't know what to do here, say N.
>  
> +config VFIO_GROUP
> +	bool "Support for the VFIO group /dev/vfio/$group_id"
> +	default y
> +	help
> +	   VFIO group support provides the traditional model for accessing
> +	   devices through VFIO and is used by the majority of userspace
> +	   applications and drivers making use of VFIO.
> +
> +	   If you don't know what to do here, say Y.
> +
>  config VFIO_CONTAINER
>  	bool "Support for the VFIO container /dev/vfio/vfio"
>  	select VFIO_IOMMU_TYPE1 if MMU && (X86 || S390 || ARM || ARM64)
> +	depends on VFIO_GROUP
>  	default y
>  	help
>  	  The VFIO container is the classic interface to VFIO for establishing
> diff --git a/drivers/vfio/Makefile b/drivers/vfio/Makefile
> index 245394aeb94b..57c3515af606 100644
> --- a/drivers/vfio/Makefile
> +++ b/drivers/vfio/Makefile
> @@ -2,9 +2,9 @@
>  obj-$(CONFIG_VFIO) += vfio.o
>  
>  vfio-y += vfio_main.o \
> -	  group.o \
>  	  iova_bitmap.o
>  vfio-$(CONFIG_VFIO_DEVICE_CDEV) += device_cdev.o
> +vfio-$(CONFIG_VFIO_GROUP) += group.o
>  vfio-$(CONFIG_IOMMUFD) += iommufd.o
>  vfio-$(CONFIG_VFIO_CONTAINER) += container.o
>  vfio-$(CONFIG_VFIO_VIRQFD) += virqfd.o
> diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
> index c199e410db18..9c7a238ec8dd 100644
> --- a/drivers/vfio/vfio.h
> +++ b/drivers/vfio/vfio.h
> @@ -36,6 +36,12 @@ vfio_allocate_device_file(struct vfio_device *device);
>  
>  extern const struct file_operations vfio_device_fops;
>  
> +#ifdef CONFIG_VFIO_NOIOMMU
> +extern bool vfio_noiommu __read_mostly;
> +#else
> +enum { vfio_noiommu = false };
> +#endif
> +
>  enum vfio_group_type {
>  	/*
>  	 * Physical device with IOMMU backing.
> @@ -60,6 +66,7 @@ enum vfio_group_type {
>  	VFIO_NO_IOMMU,
>  };
>  
> +#if IS_ENABLED(CONFIG_VFIO_GROUP)
>  struct vfio_group {
>  	struct device 			dev;
>  	struct cdev			cdev;
> @@ -113,6 +120,104 @@ static inline void vfio_device_set_noiommu(struct vfio_device *device)
>  	device->noiommu = IS_ENABLED(CONFIG_VFIO_NOIOMMU) &&
>  			  device->group->type == VFIO_NO_IOMMU;
>  }
> +#else
> +struct vfio_group;
> +
> +static inline int vfio_device_block_group(struct vfio_device *device)
> +{
> +	return 0;
> +}
> +
> +static inline void vfio_device_unblock_group(struct vfio_device *device)
> +{
> +}
> +
> +static inline int vfio_device_set_group(struct vfio_device *device,
> +					enum vfio_group_type type)
> +{
> +	return 0;
> +}
> +
> +static inline void vfio_device_remove_group(struct vfio_device *device)
> +{
> +}
> +
> +static inline void vfio_device_group_register(struct vfio_device *device)
> +{
> +}
> +
> +static inline void vfio_device_group_unregister(struct vfio_device *device)
> +{
> +}
> +
> +static inline bool vfio_device_group_uses_container(struct vfio_device_file *df)
> +{
> +	return false;
> +}
> +
> +static inline int vfio_device_group_use_iommu(struct vfio_device *device)
> +{
> +	return -EOPNOTSUPP;
> +}
> +
> +static inline void vfio_device_group_unuse_iommu(struct vfio_device *device)
> +{
> +}
> +
> +static inline void vfio_device_group_close(struct vfio_device_file *df)
> +{
> +}
> +
> +static inline struct vfio_group *vfio_group_from_file(struct file *file)
> +{
> +	return NULL;
> +}
> +
> +static inline bool vfio_group_has_dev(struct vfio_group *group,
> +				      struct vfio_device *device)
> +{
> +	return false;
> +}
> +
> +static inline bool vfio_group_enforced_coherent(struct vfio_group *group)
> +{
> +	return true;
> +}
> +
> +static inline void vfio_group_set_kvm(struct vfio_group *group, struct kvm *kvm)
> +{
> +}
> +
> +static inline bool vfio_device_has_container(struct vfio_device *device)
> +{
> +	return false;
> +}
> +
> +static inline int __init vfio_group_init(void)
> +{
> +	return 0;
> +}
> +
> +static inline void vfio_group_cleanup(void)
> +{
> +}
> +
> +static inline void vfio_device_set_noiommu(struct vfio_device *device)
> +{
> +	struct iommu_group *iommu_group;
> +
> +	device->noiommu = false;
> +
> +	if (!IS_ENABLED(CONFIG_VFIO_NOIOMMU) || !vfio_noiommu)
> +		return;
> +
> +	iommu_group = iommu_group_get(device->dev);
> +	if (iommu_group)
> +		iommu_group_put(iommu_group);
> +	else
> +		device->noiommu = true;
> +}
> +#endif /* CONFIG_VFIO_GROUP */
>  
>  #if IS_ENABLED(CONFIG_VFIO_CONTAINER)
>  /**
> @@ -356,12 +461,6 @@ static inline void vfio_virqfd_exit(void)
>  }
>  #endif
>  
> -#ifdef CONFIG_VFIO_NOIOMMU
> -extern bool vfio_noiommu __read_mostly;
> -#else
> -enum { vfio_noiommu = false };
> -#endif
> -
>  #ifdef CONFIG_HAVE_KVM
>  void _vfio_device_get_kvm_safe(struct vfio_device *device, struct kvm *kvm);
>  void vfio_device_put_kvm(struct vfio_device *device);
> diff --git a/include/linux/vfio.h b/include/linux/vfio.h
> index 8719ec2adbbb..1367605d617c 100644
> --- a/include/linux/vfio.h
> +++ b/include/linux/vfio.h
> @@ -43,7 +43,11 @@ struct vfio_device {
>  	 */
>  	const struct vfio_migration_ops *mig_ops;
>  	const struct vfio_log_ops *log_ops;
> +#if IS_ENABLED(CONFIG_VFIO_GROUP)
>  	struct vfio_group *group;
> +	struct list_head group_next;
> +	struct list_head iommu_entry;
> +#endif
>  	struct vfio_device_set *dev_set;
>  	struct list_head dev_set_list;
>  	unsigned int migration_flags;
> @@ -58,8 +62,6 @@ struct vfio_device {
>  	refcount_t refcount;	/* user count on registered device*/
>  	unsigned int open_count;
>  	struct completion comp;
> -	struct list_head group_next;
> -	struct list_head iommu_entry;
>  	struct iommufd_access *iommufd_access;
>  	void (*put_kvm)(struct kvm *kvm);
>  #if IS_ENABLED(CONFIG_IOMMUFD)
> @@ -270,7 +272,14 @@ int vfio_mig_get_next_state(struct vfio_device *device,
>  /*
>   * External user API
>   */
> +#if IS_ENABLED(CONFIG_VFIO_GROUP)
>  struct iommu_group *vfio_file_iommu_group(struct file *file);
> +#else
> +static inline struct iommu_group *vfio_file_iommu_group(struct file *file)
> +{
> +	return NULL;
> +}
> +#endif
>  bool vfio_file_is_valid(struct file *file);
>  bool vfio_file_enforced_coherent(struct file *file);
>  void vfio_file_set_kvm(struct file *file, struct kvm *kvm);



* Re: [PATCH v8 24/24] docs: vfio: Add vfio device cdev description
  2023-03-27  9:40 ` [PATCH v8 24/24] docs: vfio: Add vfio device cdev description Yi Liu
@ 2023-03-29 22:47   ` Alex Williamson
  2023-03-29 22:57     ` Alex Williamson
  2023-03-30 13:11     ` Liu, Yi L
  0 siblings, 2 replies; 54+ messages in thread
From: Alex Williamson @ 2023-03-29 22:47 UTC (permalink / raw)
  To: Yi Liu
  Cc: jgg, kevin.tian, joro, robin.murphy, cohuck, eric.auger,
	nicolinc, kvm, mjrosato, chao.p.peng, yi.y.sun, peterx, jasowang,
	shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	intel-gvt-dev, intel-gfx, linux-s390, xudong.hao, yan.y.zhao,
	terrence.xu, yanting.jiang

On Mon, 27 Mar 2023 02:40:47 -0700
Yi Liu <yi.l.liu@intel.com> wrote:

> This gives notes for userspace applications on device cdev usage.
> 
> Reviewed-by: Kevin Tian <kevin.tian@intel.com>
> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> ---
>  Documentation/driver-api/vfio.rst | 127 ++++++++++++++++++++++++++++++
>  1 file changed, 127 insertions(+)
> 
> diff --git a/Documentation/driver-api/vfio.rst b/Documentation/driver-api/vfio.rst
> index 363e12c90b87..77408788b98d 100644
> --- a/Documentation/driver-api/vfio.rst
> +++ b/Documentation/driver-api/vfio.rst
> @@ -239,6 +239,125 @@ group and can access them as follows::
>  	/* Gratuitous device reset and go... */
>  	ioctl(device, VFIO_DEVICE_RESET);
>  
> +IOMMUFD and vfio_iommu_type1
> +----------------------------
> +
> +IOMMUFD is the new user API to manage I/O page tables from userspace.
> +It intends to be the portal of delivering advanced userspace DMA
> +features (nested translation [5], PASID [6], etc.) while being backward
> +compatible with the vfio_iommu_type1 driver.  Eventually vfio_iommu_type1
> +will be deprecated.

"... while also providing a backwards compatibility interface for
existing VFIO_TYPE1v2_IOMMU use cases.  Eventually the vfio_iommu_type1
driver, as well as the legacy vfio container and group model, are
intended to be deprecated."

> +
> +With the backward compatibility, no change is required for legacy VFIO
> +drivers or applications to connect a VFIO device to IOMMUFD.
> +
> +	When CONFIG_IOMMUFD_VFIO_CONTAINER=n, VFIO container still provides
> +	/dev/vfio/vfio which connects to vfio_iommu_type1.  To disable VFIO
> +	container and vfio_iommu_type1, the administrator could symbol link
> +	/dev/vfio/vfio to /dev/iommu to enable VFIO container emulation
> +	in IOMMUFD.
> +
> +	When CONFIG_IOMMUFD_VFIO_CONTAINER=y, IOMMUFD directly provides
> +	/dev/vfio/vfio while the VFIO container and vfio_iommu_type1 are
> +	explicitly disabled.
> +

"The IOMMUFD backwards compatibility interface can be enabled two ways.
In the first method, the kernel can be configured with
CONFIG_IOMMUFD_VFIO_CONTAINER, in which case the IOMMUFD subsystem
transparently provides the entire infrastructure for the VFIO
container and IOMMU backend interfaces.  The compatibility mode can
also be accessed if the VFIO container interface, i.e. /dev/vfio/vfio, is
simply symlinked to /dev/iommu.  Note that at the time of writing, the
compatibility mode is not entirely feature complete relative to
VFIO_TYPE1v2_IOMMU (e.g. DMA mapping MMIO) and does not attempt to
provide compatibility to the VFIO_SPAPR_TCE_IOMMU interface.  Therefore
it is not generally advisable at this time to switch from native VFIO
implementations to the IOMMUFD compatibility interfaces.

Long term, VFIO users should migrate to device access through the cdev
interface described below, and native access through the IOMMUFD
provided interfaces."

Thanks,
Alex

> +VFIO Device cdev
> +----------------
> +
> +Traditionally user acquires a device fd via VFIO_GROUP_GET_DEVICE_FD
> +in a VFIO group.
> +
> +With CONFIG_VFIO_DEVICE_CDEV=y the user can now acquire a device fd
> +by directly opening a character device /dev/vfio/devices/vfioX where
> +"X" is the number allocated uniquely by VFIO for registered devices.
> +For noiommu devices, the character device would be named with "noiommu-"
> +prefix. e.g. /dev/vfio/devices/noiommu-vfioX.
> +
> +The cdev only works with IOMMUFD.  Both VFIO drivers and applications
> +must adapt to the new cdev security model which requires using
> +VFIO_DEVICE_BIND_IOMMUFD to claim DMA ownership before starting to
> +actually use the device.  Once BIND succeeds then a VFIO device can
> +be fully accessed by the user.
> +
> +VFIO device cdev doesn't rely on VFIO group/container/iommu drivers.
> +Hence those modules can be fully compiled out in an environment
> +where no legacy VFIO application exists.
> +
> +So far SPAPR does not support IOMMUFD yet.  So it cannot support device
> +cdev neither.
> +
> +Device cdev Example
> +-------------------
> +
> +Assume user wants to access PCI device 0000:6a:01.0::
> +
> +	$ ls /sys/bus/pci/devices/0000:6a:01.0/vfio-dev/
> +	vfio0
> +
> +This device is therefore represented as vfio0.  The user can verify
> +its existence::
> +
> +	$ ls -l /dev/vfio/devices/vfio0
> +	crw------- 1 root root 511, 0 Feb 16 01:22 /dev/vfio/devices/vfio0
> +	$ cat /sys/bus/pci/devices/0000:6a:01.0/vfio-dev/vfio0/dev
> +	511:0
> +	$ ls -l /dev/char/511\:0
> +	lrwxrwxrwx 1 root root 21 Feb 16 01:22 /dev/char/511:0 -> ../vfio/devices/vfio0
> +
> +Then provide the user with access to the device if unprivileged
> +operation is desired::
> +
> +	$ chown user:user /dev/vfio/devices/vfio0
> +
> +Finally the user could get cdev fd by::
> +
> +	cdev_fd = open("/dev/vfio/devices/vfio0", O_RDWR);
> +
> +An opened cdev_fd doesn't give the user any permission of accessing
> +the device except binding the cdev_fd to an iommufd.  After that point
> +then the device is fully accessible including attaching it to an
> +IOMMUFD IOAS/HWPT to enable userspace DMA::
> +
> +	struct vfio_device_bind_iommufd bind = {
> +		.argsz = sizeof(bind),
> +		.flags = 0,
> +	};
> +	struct iommu_ioas_alloc alloc_data  = {
> +		.size = sizeof(alloc_data),
> +		.flags = 0,
> +	};
> +	struct vfio_device_attach_iommufd_pt attach_data = {
> +		.argsz = sizeof(attach_data),
> +		.flags = 0,
> +	};
> +	struct iommu_ioas_map map = {
> +		.size = sizeof(map),
> +		.flags = IOMMU_IOAS_MAP_READABLE |
> +			 IOMMU_IOAS_MAP_WRITEABLE |
> +			 IOMMU_IOAS_MAP_FIXED_IOVA,
> +		.__reserved = 0,
> +	};
> +
> +	iommufd = open("/dev/iommu", O_RDWR);
> +
> +	bind.iommufd = iommufd; // negative value means vfio-noiommu mode
> +	ioctl(cdev_fd, VFIO_DEVICE_BIND_IOMMUFD, &bind);
> +
> +	ioctl(iommufd, IOMMU_IOAS_ALLOC, &alloc_data);
> +	attach_data.pt_id = alloc_data.out_ioas_id;
> +	ioctl(cdev_fd, VFIO_DEVICE_ATTACH_IOMMUFD_PT, &attach_data);
> +
> +	/* Allocate some space and setup a DMA mapping */
> +	map.user_va = (int64_t)mmap(0, 1024 * 1024, PROT_READ | PROT_WRITE,
> +				    MAP_PRIVATE | MAP_ANONYMOUS, 0, 0);
> +	map.iova = 0; /* 1MB starting at 0x0 from device view */
> +	map.length = 1024 * 1024;
> +	map.ioas_id = alloc_data.out_ioas_id;;
> +
> +	ioctl(iommufd, IOMMU_IOAS_MAP, &map);
> +
> +	/* Other device operations as stated in "VFIO Usage Example" */
> +
>  VFIO User API
>  -------------------------------------------------------------------------------
>  
> @@ -566,3 +685,11 @@ This implementation has some specifics:
>  				\-0d.1
>  
>  	00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev 90)
> +
> +.. [5] Nested translation is an IOMMU feature which supports two stage
> +   address translations.  This improves the address translation efficiency
> +   in IOMMU virtualization.
> +
> +.. [6] PASID stands for Process Address Space ID, introduced by PCI
> +   Express.  It is a prerequisite for Shared Virtual Addressing (SVA)
> +   and Scalable I/O Virtualization (Scalable IOV).



* Re: [PATCH v8 24/24] docs: vfio: Add vfio device cdev description
  2023-03-29 22:47   ` Alex Williamson
@ 2023-03-29 22:57     ` Alex Williamson
  2023-03-30 13:11     ` Liu, Yi L
  1 sibling, 0 replies; 54+ messages in thread
From: Alex Williamson @ 2023-03-29 22:57 UTC (permalink / raw)
  To: Yi Liu
  Cc: jgg, kevin.tian, joro, robin.murphy, cohuck, eric.auger,
	nicolinc, kvm, mjrosato, chao.p.peng, yi.y.sun, peterx, jasowang,
	shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	intel-gvt-dev, intel-gfx, linux-s390, xudong.hao, yan.y.zhao,
	terrence.xu, yanting.jiang

On Wed, 29 Mar 2023 16:47:49 -0600
Alex Williamson <alex.williamson@redhat.com> wrote:

> On Mon, 27 Mar 2023 02:40:47 -0700
> Yi Liu <yi.l.liu@intel.com> wrote:
> 
> > This gives notes for userspace applications on device cdev usage.
> > 
> > Reviewed-by: Kevin Tian <kevin.tian@intel.com>
> > Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> > ---
> >  Documentation/driver-api/vfio.rst | 127 ++++++++++++++++++++++++++++++
> >  1 file changed, 127 insertions(+)
> > 
> > diff --git a/Documentation/driver-api/vfio.rst b/Documentation/driver-api/vfio.rst
> > index 363e12c90b87..77408788b98d 100644
> > --- a/Documentation/driver-api/vfio.rst
> > +++ b/Documentation/driver-api/vfio.rst
> > @@ -239,6 +239,125 @@ group and can access them as follows::
> >  	/* Gratuitous device reset and go... */
> >  	ioctl(device, VFIO_DEVICE_RESET);
> >  
> > +IOMMUFD and vfio_iommu_type1
> > +----------------------------
> > +
> > +IOMMUFD is the new user API to manage I/O page tables from userspace.
> > +It intends to be the portal of delivering advanced userspace DMA
> > +features (nested translation [5], PASID [6], etc.) while being backward
> > +compatible with the vfio_iommu_type1 driver.  Eventually vfio_iommu_type1
> > +will be deprecated.  
> 
> "... while also providing a backwards compatibility interface for
> existing VFIO_TYPE1v2_IOMMU use cases.  Eventually the vfio_iommu_type1
> driver, as well as the legacy vfio container and group model, are
> intended to be deprecated."
> 
> > +
> > +With the backward compatibility, no change is required for legacy VFIO
> > +drivers or applications to connect a VFIO device to IOMMUFD.
> > +
> > +	When CONFIG_IOMMUFD_VFIO_CONTAINER=n, VFIO container still provides
> > +	/dev/vfio/vfio which connects to vfio_iommu_type1.  To disable VFIO
> > +	container and vfio_iommu_type1, the administrator could symbol link
> > +	/dev/vfio/vfio to /dev/iommu to enable VFIO container emulation
> > +	in IOMMUFD.
> > +
> > +	When CONFIG_IOMMUFD_VFIO_CONTAINER=y, IOMMUFD directly provides
> > +	/dev/vfio/vfio while the VFIO container and vfio_iommu_type1 are
> > +	explicitly disabled.
> > +  
> 
> "The IOMMUFD backwards compatibility interface can be enabled two ways.
> In the first method, the kernel can be configured with
> CONFIG_IOMMUFD_VFIO_CONTAINER, in which case the IOMMUFD subsystem
> transparently provides the entire infrastructure for the VFIO
> container and IOMMU backend interfaces.  The compatibility mode can
> also be accessed if the VFIO container interface, i.e. /dev/vfio/vfio, is
> simply symlinked to /dev/iommu.  Note that at the time of writing, the
> compatibility mode is not entirely feature complete relative to
> VFIO_TYPE1v2_IOMMU (e.g. DMA mapping MMIO) and does not attempt to
> provide compatibility to the VFIO_SPAPR_TCE_IOMMU interface.  Therefore
> it is not generally advisable at this time to switch from native VFIO
> implementations to the IOMMUFD compatibility interfaces.
> 
> Long term, VFIO users should migrate to device access through the cdev
> interface described below, and native access through the IOMMUFD
> provided interfaces."
> 
> Thanks,
> Alex
> 
> > +VFIO Device cdev
> > +----------------
> > +
> > +Traditionally user acquires a device fd via VFIO_GROUP_GET_DEVICE_FD
> > +in a VFIO group.
> > +
> > +With CONFIG_VFIO_DEVICE_CDEV=y the user can now acquire a device fd
> > +by directly opening a character device /dev/vfio/devices/vfioX where
> > +"X" is the number allocated uniquely by VFIO for registered devices.
> > +For noiommu devices, the character device would be named with "noiommu-"
> > +prefix. e.g. /dev/vfio/devices/noiommu-vfioX.
> > +
> > +The cdev only works with IOMMUFD.  Both VFIO drivers and applications
> > +must adapt to the new cdev security model which requires using
> > +VFIO_DEVICE_BIND_IOMMUFD to claim DMA ownership before starting to
> > +actually use the device.  Once BIND succeeds then a VFIO device can
> > +be fully accessed by the user.
> > +
> > +VFIO device cdev doesn't rely on VFIO group/container/iommu drivers.
> > +Hence those modules can be fully compiled out in an environment
> > +where no legacy VFIO application exists.
> > +
> > +So far SPAPR does not support IOMMUFD yet.  So it cannot support device
> > +cdev neither.
> > +
> > +Device cdev Example
> > +-------------------
> > +
> > +Assume user wants to access PCI device 0000:6a:01.0::
> > +
> > +	$ ls /sys/bus/pci/devices/0000:6a:01.0/vfio-dev/
> > +	vfio0
> > +
> > +This device is therefore represented as vfio0.  The user can verify
> > +its existence::
> > +
> > +	$ ls -l /dev/vfio/devices/vfio0
> > +	crw------- 1 root root 511, 0 Feb 16 01:22 /dev/vfio/devices/vfio0
> > +	$ cat /sys/bus/pci/devices/0000:6a:01.0/vfio-dev/vfio0/dev
> > +	511:0
> > +	$ ls -l /dev/char/511\:0
> > +	lrwxrwxrwx 1 root root 21 Feb 16 01:22 /dev/char/511:0 -> ../vfio/devices/vfio0
> > +
> > +Then provide the user with access to the device if unprivileged
> > +operation is desired::
> > +
> > +	$ chown user:user /dev/vfio/devices/vfio0
> > +
> > +Finally the user could get cdev fd by::
> > +
> > +	cdev_fd = open("/dev/vfio/devices/vfio0", O_RDWR);
> > +
> > +An opened cdev_fd doesn't give the user any permission of accessing
> > +the device except binding the cdev_fd to an iommufd.  After that point
> > +then the device is fully accessible including attaching it to an
> > +IOMMUFD IOAS/HWPT to enable userspace DMA::
> > +
> > +	struct vfio_device_bind_iommufd bind = {
> > +		.argsz = sizeof(bind),
> > +		.flags = 0,
> > +	};
> > +	struct iommu_ioas_alloc alloc_data  = {
> > +		.size = sizeof(alloc_data),
> > +		.flags = 0,
> > +	};
> > +	struct vfio_device_attach_iommufd_pt attach_data = {
> > +		.argsz = sizeof(attach_data),
> > +		.flags = 0,
> > +	};
> > +	struct iommu_ioas_map map = {
> > +		.size = sizeof(map),
> > +		.flags = IOMMU_IOAS_MAP_READABLE |
> > +			 IOMMU_IOAS_MAP_WRITEABLE |
> > +			 IOMMU_IOAS_MAP_FIXED_IOVA,
> > +		.__reserved = 0,
> > +	};
> > +
> > +	iommufd = open("/dev/iommu", O_RDWR);
> > +
> > +	bind.iommufd = iommufd; // negative value means vfio-noiommu mode
> > +	ioctl(cdev_fd, VFIO_DEVICE_BIND_IOMMUFD, &bind);


Even here the dev_cookie is nowhere to be found!  Thanks,

Alex

> > +
> > +	ioctl(iommufd, IOMMU_IOAS_ALLOC, &alloc_data);
> > +	attach_data.pt_id = alloc_data.out_ioas_id;
> > +	ioctl(cdev_fd, VFIO_DEVICE_ATTACH_IOMMUFD_PT, &attach_data);
> > +
> > +	/* Allocate some space and setup a DMA mapping */
> > +	map.user_va = (int64_t)mmap(0, 1024 * 1024, PROT_READ | PROT_WRITE,
> > +				    MAP_PRIVATE | MAP_ANONYMOUS, 0, 0);
> > +	map.iova = 0; /* 1MB starting at 0x0 from device view */
> > +	map.length = 1024 * 1024;
> > +	map.ioas_id = alloc_data.out_ioas_id;;
> > +
> > +	ioctl(iommufd, IOMMU_IOAS_MAP, &map);
> > +
> > +	/* Other device operations as stated in "VFIO Usage Example" */
> > +
> >  VFIO User API
> >  -------------------------------------------------------------------------------
> >  
> > @@ -566,3 +685,11 @@ This implementation has some specifics:
> >  				\-0d.1
> >  
> >  	00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev 90)
> > +
> > +.. [5] Nested translation is an IOMMU feature which supports two stage
> > +   address translations.  This improves the address translation efficiency
> > +   in IOMMU virtualization.
> > +
> > +.. [6] PASID stands for Process Address Space ID, introduced by PCI
> > +   Express.  It is a prerequisite for Shared Virtual Addressing (SVA)
> > +   and Scalable I/O Virtualization (Scalable IOV).  
> 



* Re: [PATCH v8 21/24] vfio: Add VFIO_DEVICE_BIND_IOMMUFD
  2023-03-29 21:00   ` Alex Williamson
@ 2023-03-29 23:22     ` Jason Gunthorpe
  2023-03-30 12:52       ` Liu, Yi L
  2023-03-30  7:09     ` Liu, Yi L
  1 sibling, 1 reply; 54+ messages in thread
From: Jason Gunthorpe @ 2023-03-29 23:22 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Yi Liu, kevin.tian, joro, robin.murphy, cohuck, eric.auger,
	nicolinc, kvm, mjrosato, chao.p.peng, yi.y.sun, peterx, jasowang,
	shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	intel-gvt-dev, intel-gfx, linux-s390, xudong.hao, yan.y.zhao,
	terrence.xu, yanting.jiang

On Wed, Mar 29, 2023 at 03:00:55PM -0600, Alex Williamson wrote:

> > + * The user should provide a device cookie when calling this ioctl. The
> > + * cookie is carried only in event e.g. I/O fault reported to userspace
> > + * via iommufd. The user should use devid returned by this ioctl to mark
> > + * the target device in other ioctls (e.g. iommu hardware infomration query
> > + * via iommufd, and etc.).
> 
> AFAICT, the whole concept of this dev_cookie is a fantasy.  It only
> exists in this series in these comments and the structure below.  It's
> not even defined whether it needs to be unique within an iommufd
> context, and clearly nothing here validates that.  There's not enough
> implementation for it to exist in this series.  Maybe dev_cookie is
> appended to the end of the structure and indicated with a flag when it
> has some meaning.

Yes, I've asked for this to be punted to the PRI series enough times
already, why does it keep coming back ??

> > + * @argsz:	 user filled size of this data.
> > + * @flags:	 reserved for future extension.
> > + * @dev_cookie:	 a per device cookie provided by userspace.
> > + * @iommufd:	 iommufd to bind. a negative value means noiommu.
> 
> "Use a negative value for no-iommu, where supported", or better, should
> we define this explicitly as -1, or why not use a flag bit to specify
> no-iommu?  Maybe minsz is only through flags for the noiommu use case.

I was happy enough for this to be defined as -1. We could give it a
formal sounding constant too
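
As a rough sketch of what "defined as -1 with a named constant" could look like: the constant name and the helper below are hypothetical and do not appear in the series. Rejecting every other negative value is also an assumption of this sketch; the thread only settles on -1 as the no-iommu value.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical name; the thread only agrees on the value -1. */
#define VFIO_BIND_NOIOMMU_FD (-1)

/* A valid file descriptor is never negative, so the sentinel cannot collide. */
static_assert(VFIO_BIND_NOIOMMU_FD < 0, "sentinel must not be a valid fd");

enum bind_mode { BIND_INVALID, BIND_NOIOMMU, BIND_IOMMUFD };

/* Classify the iommufd field of a bind request. */
static enum bind_mode classify_bind_fd(int32_t iommufd)
{
	if (iommufd == VFIO_BIND_NOIOMMU_FD)
		return BIND_NOIOMMU;
	if (iommufd < 0)
		return BIND_INVALID;	/* assumption: other negatives are errors */
	return BIND_IOMMUFD;
}
```

Pinning the sentinel to one value keeps the remaining negative numbers free for later use, which "any negative value means noiommu" would not.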

Jason


* RE: [PATCH v8 20/24] vfio: Add cdev for vfio_device
  2023-03-29 19:57   ` Alex Williamson
@ 2023-03-30  5:35     ` Liu, Yi L
  0 siblings, 0 replies; 54+ messages in thread
From: Liu, Yi L @ 2023-03-30  5:35 UTC (permalink / raw)
  To: Alex Williamson
  Cc: jgg, Tian, Kevin, joro, robin.murphy, cohuck, eric.auger,
	nicolinc, kvm, mjrosato, chao.p.peng, yi.y.sun, peterx, jasowang,
	shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	intel-gvt-dev, intel-gfx, linux-s390, Hao, Xudong, Zhao, Yan Y,
	Xu, Terrence, Jiang, Yanting

> From: Alex Williamson <alex.williamson@redhat.com>
> Sent: Thursday, March 30, 2023 3:57 AM
> 
> On Mon, 27 Mar 2023 02:40:43 -0700
> Yi Liu <yi.l.liu@intel.com> wrote:
> 
[...]
> > +/*
> > + * device access via the fd opened by this function is blocked until
> > + * .open_device() is called successfully during BIND_IOMMUFD.
> > + */
> > +int vfio_device_fops_cdev_open(struct inode *inode, struct file *filep)
> > +{
> > +	struct vfio_device *device = container_of(inode->i_cdev,
> > +						  struct vfio_device, cdev);
> > +	struct vfio_device_file *df;
> > +	int ret;
> > +
> > +	if (!vfio_device_try_get_registration(device))
> > +		return -ENODEV;
> > +
> > +	df = vfio_allocate_device_file(device);
> > +	if (IS_ERR(df)) {
> > +		ret = PTR_ERR(df);
> > +		goto err_put_registration;
> > +	}
> > +
> > +	filep->private_data = df;
> > +
> > +	return 0;
> > +
> > +err_put_registration:
> > +	vfio_device_put_registration(device);
> > +	return ret;
> > +}
> > +
[...]
> > diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
> > index 8e96aab27029..58fc3bb768f2 100644
> > --- a/drivers/vfio/vfio_main.c
> > +++ b/drivers/vfio/vfio_main.c
> > @@ -242,6 +242,7 @@ static int vfio_init_device(struct vfio_device *device, struct device *dev,
> >  	device->device.release = vfio_device_release;
> >  	device->device.class = vfio.device_class;
> >  	device->device.parent = device->dev;
> > +	vfio_init_device_cdev(device);
> >  	return 0;
> >
> >  out_uninit:
> > @@ -280,7 +281,7 @@ static int __vfio_register_dev(struct vfio_device *device,
> >  	if (ret)
> >  		goto err_out;
> >
> > -	ret = device_add(&device->device);
> > +	ret = vfio_device_add(device);
> >  	if (ret)
> >  		goto err_out;
> >
> > @@ -320,6 +321,12 @@ void vfio_unregister_group_dev(struct vfio_device *device)
> >  	bool interrupted = false;
> >  	long rc;
> >
> > +	/* Prevent new device opened in the group path */
> > +	vfio_device_group_unregister(device);
> > +
> > +	/* Prevent new device opened in the cdev path */
> > +	vfio_device_del(device);
> > +
> >  	vfio_device_put_registration(device);
> >  	rc = try_wait_for_completion(&device->comp);
> >  	while (rc <= 0) {
> > @@ -343,11 +350,6 @@ void vfio_unregister_group_dev(struct vfio_device *device)
> >  		}
> >  	}
> >
> > -	vfio_device_group_unregister(device);
> > -
> > -	/* Balances device_add in register path */
> > -	device_del(&device->device);
> > -
> 
> Why were these relocated?  And additionally why was the comment
> regarding the balance operations dropped?  The move seems unrelated to
> the patch here, so if it's actually advisable for some reason, it
> should be a separate patch.  Thanks,

The relocation prevents new opens of the device, each of which would
increase device->refcount. If the user keeps opening the device, the
refcount may keep increasing, and vfio_unregister_group_dev() may get
stuck waiting here. This is rare, but possible.

Calling vfio_device_group_unregister() removes the device from
group->device_list, so the user can no longer open the device via
VFIO_GROUP_GET_DEVICE_FD and cannot increase device->refcount that way.
I agree with you, this should be done in a separate patch.

The same reasoning applies to relocating device_del(&device->device): the
user may keep opening the cdev to increase device->refcount, and then the
unregister path would be stuck as well. But this relocation needs to be
done in this patch, since the user cannot do that without cdev.

Lastly, the balance comment should be kept even though the sequence
is no longer strictly mirrored. I will keep the comment.
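
The ordering argument can be shown with a small single-threaded toy model. This is only a sketch: the struct and helper names are invented for this note and are not the kernel code, and "visible" stands in for both the group device list and the cdev being reachable.

```c
#include <assert.h>
#include <stdbool.h>

struct toy_device {
	bool visible;		/* can a new open still find the device? */
	unsigned int refcount;	/* registration reference plus one per open */
};

static void toy_register(struct toy_device *d)
{
	d->visible = true;
	d->refcount = 1;
}

/* Models an open via either path; it needs a visible, live device. */
static bool toy_open(struct toy_device *d)
{
	if (!d->visible || d->refcount == 0)
		return false;
	d->refcount++;
	return true;
}

static void toy_close(struct toy_device *d)
{
	assert(d->refcount > 0);
	d->refcount--;
}

/* Models the group unregister / device_del step: no new opens after this. */
static void toy_hide(struct toy_device *d)
{
	d->visible = false;
}

/* Models dropping the registration reference before waiting. */
static void toy_put_registration(struct toy_device *d)
{
	assert(d->refcount > 0);
	d->refcount--;
}

/* The unregister wait can only finish once every reference is gone. */
static bool toy_unregister_done(const struct toy_device *d)
{
	return d->refcount == 0;
}
```

If toy_put_registration() runs while the device is still visible and one open is held, a further toy_open() succeeds and the count can keep growing. Calling toy_hide() first means only the existing opens remain, so the wait finishes as soon as they are closed.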

> Alex
> 
> >  	/* Balances vfio_device_set_group in register path */
> >  	vfio_device_remove_group(device);
> >  }
> > @@ -555,7 +557,8 @@ static int vfio_device_fops_release(struct inode *inode, struct file *filep)
> >  	struct vfio_device_file *df = filep->private_data;
> >  	struct vfio_device *device = df->device;
> >
> > -	vfio_device_group_close(df);
> > +	if (df->group)
> > +		vfio_device_group_close(df);
> >
> >  	vfio_device_put_registration(device);
> >

Thanks,
Yi Liu


* RE: [PATCH v8 21/24] vfio: Add VFIO_DEVICE_BIND_IOMMUFD
  2023-03-29 21:00   ` Alex Williamson
  2023-03-29 23:22     ` Jason Gunthorpe
@ 2023-03-30  7:09     ` Liu, Yi L
  2023-03-30 11:52       ` Jason Gunthorpe
  1 sibling, 1 reply; 54+ messages in thread
From: Liu, Yi L @ 2023-03-30  7:09 UTC (permalink / raw)
  To: Alex Williamson
  Cc: jgg, Tian, Kevin, joro, robin.murphy, cohuck, eric.auger,
	nicolinc, kvm, mjrosato, chao.p.peng, yi.y.sun, peterx, jasowang,
	shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	intel-gvt-dev, intel-gfx, linux-s390, Hao, Xudong, Zhao, Yan Y,
	Xu, Terrence, Jiang, Yanting

> From: Alex Williamson <alex.williamson@redhat.com>
> Sent: Thursday, March 30, 2023 5:01 AM
> 
> On Mon, 27 Mar 2023 02:40:44 -0700
> Yi Liu <yi.l.liu@intel.com> wrote:
> 
> > This adds ioctl for userspace to bind device cdev fd to iommufd.
> >
> >     VFIO_DEVICE_BIND_IOMMUFD: bind device to an iommufd, hence gain DMA
> > 			      control provided by the iommufd. open_device
> > 			      op is called after bind_iommufd op.
> > 			      VFIO no iommu mode is indicated by passing
> > 			      a negative iommufd value.
> >
> > Reviewed-by: Kevin Tian <kevin.tian@intel.com>
> > Tested-by: Terrence Xu <terrence.xu@intel.com>
> > Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> > ---
> >  drivers/vfio/device_cdev.c | 153 +++++++++++++++++++++++++++++++++++++
> >  drivers/vfio/vfio.h        |  13 ++++
> >  drivers/vfio/vfio_main.c   |   5 ++
> >  include/uapi/linux/vfio.h  |  37 +++++++++
> >  4 files changed, 208 insertions(+)
> >
> > diff --git a/drivers/vfio/device_cdev.c b/drivers/vfio/device_cdev.c
> > index 1c640016a824..2b563bac50b9 100644
> > --- a/drivers/vfio/device_cdev.c
> > +++ b/drivers/vfio/device_cdev.c
> > @@ -3,6 +3,7 @@
> >   * Copyright (c) 2023 Intel Corporation.
> >   */
> >  #include <linux/vfio.h>
> > +#include <linux/iommufd.h>
> >
> >  #include "vfio.h"
> >
> > @@ -44,6 +45,158 @@ int vfio_device_fops_cdev_open(struct inode *inode, struct file *filep)
> >  	return ret;
> >  }
> >
> > +static void vfio_device_get_kvm_safe(struct vfio_device_file *df)
> > +{
> > +	spin_lock(&df->kvm_ref_lock);
> > +	if (df->kvm)
> > +		_vfio_device_get_kvm_safe(df->device, df->kvm);
> > +	spin_unlock(&df->kvm_ref_lock);
> > +}
> > +
> > +void vfio_device_cdev_close(struct vfio_device_file *df)
> > +{
> > +	struct vfio_device *device = df->device;
> > +
> > +	/*
> > +	 * As df->access_granted writer is under dev_set->lock as well,
> > +	 * so this read no need to use smp_load_acquire() to pair with
> 
> Nit, "no need to use" -> "does not need to use"

got it.

> 
> > +	 * smp_store_release() in the caller of vfio_device_open().
> > +	 */
> > +	if (!df->access_granted)
> > +		return;
> > +
> 
> Isn't the lock we're acquiring below the one that we claim to have in
> the comment above to make the non-smp_load_acquire() test safe?

The comment may not be accurate enough. The test without
smp_load_acquire() and without the lock follows the two remarks below
from v4 and v5.

https://lore.kernel.org/kvm/Y%2FYRx7jLuyEoLxZg@nvidia.com/
https://lore.kernel.org/kvm/Y%2F0CV1K0YNHA+olf@nvidia.com/

Perhaps the comment should be:

"At the time of close, nothing else can be changing this flag
  concurrently, so testing df->access_granted without the lock or
  smp_load_acquire() is ok."
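
For readers less familiar with the pairing being discussed, here is a userspace sketch using C11 atomics as a stand-in for smp_store_release() and smp_load_acquire(). The struct and function names are invented for illustration, and the mapping to C11 memory orders is an analogy, not the kernel implementation.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_file {
	int device_state;		/* set up before access is granted */
	atomic_bool access_granted;
};

/* Bind path: finish the setup, then publish the flag with release order. */
static void toy_grant(struct toy_file *f, int state)
{
	f->device_state = state;
	atomic_store_explicit(&f->access_granted, true, memory_order_release);
}

/* Concurrent user path: the acquire load pairs with the release store,
 * so a reader that sees the flag also sees device_state. */
static bool toy_try_use(struct toy_file *f, int *state)
{
	assert(state != NULL);
	if (!atomic_load_explicit(&f->access_granted, memory_order_acquire))
		return false;
	*state = f->device_state;
	return true;
}

/* Close path: assumed to run when nothing else can still change the flag,
 * so a relaxed read is enough to decide whether cleanup is needed. */
static bool toy_needs_close(struct toy_file *f)
{
	return atomic_load_explicit(&f->access_granted, memory_order_relaxed);
}
```

The acquire in toy_try_use() matters because it can race with toy_grant(). The close path relies on there being no such race, which is what the proposed comment tries to say.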

> > +	mutex_lock(&device->dev_set->lock);
> > +	vfio_device_close(df);
> > +	vfio_device_put_kvm(device);
> > +	if (df->iommufd)
> > +		iommufd_ctx_put(df->iommufd);
> > +	mutex_unlock(&device->dev_set->lock);
> > +	vfio_device_unblock_group(device);
> > +}
> > +
> > +static int vfio_device_cdev_enable_noiommu(struct vfio_device *device)
> > +{
> > +	if (!capable(CAP_SYS_RAWIO))
> > +		return -EPERM;
> > +
> > +	if (!device->noiommu)
> > +		return -EINVAL;
> > +
> > +	return 0;
> > +}
> 
> This is testing, not enabling. ie. naming nit.

how about probe_noiommu or test_noiommu?

> 
> > +
> > +static struct iommufd_ctx *vfio_get_iommufd_from_fd(int fd)
> > +{
> > +	struct fd f;
> > +	struct iommufd_ctx *iommufd;
> > +
> > +	f = fdget(fd);
> > +	if (!f.file)
> > +		return ERR_PTR(-EBADF);
> > +
> > +	iommufd = iommufd_ctx_from_file(f.file);
> > +
> > +	fdput(f);
> > +	return iommufd;
> > +}
> > +
> > +long vfio_device_ioctl_bind_iommufd(struct vfio_device_file *df,
> > +				    struct vfio_device_bind_iommufd __user *arg)
> > +{
> > +	struct vfio_device *device = df->device;
> > +	struct vfio_device_bind_iommufd bind;
> > +	struct iommufd_ctx *iommufd = NULL;
> > +	unsigned long minsz;
> > +	int ret;
> > +
> > +	static_assert(__same_type(arg->out_devid, bind.out_devid));
> 
> They're the same field in the same structure, how could they be
> otherwise?

@Jason, should I remove this check?

> > +
> > +	minsz = offsetofend(struct vfio_device_bind_iommufd, out_devid);
> > +
> > +	if (copy_from_user(&bind, arg, minsz))
> > +		return -EFAULT;
> > +
> > +	if (bind.argsz < minsz || bind.flags)
> > +		return -EINVAL;
> > +
> > +	if (!device->ops->bind_iommufd)
> > +		return -ENODEV;
> 
> This test seems beyond normal paranoia since we test in
> __vfio_register_dev()

Yes. The whole C file depends on VFIO_DEVICE_CDEV, which
depends on IOMMUFD, and if IOMMUFD is enabled,
__vfio_register_dev() already checks this callback.

> 
> > +
> > +	/* BIND_IOMMUFD only allowed for cdev fds */
> > +	if (df->group)
> > +		return -EINVAL;
> > +
> > +	ret = vfio_device_block_group(device);
> > +	if (ret)
> > +		return ret;
> > +
> > +	mutex_lock(&device->dev_set->lock);
> > +	/* one device cannot be bound twice */
> > +	if (df->access_granted) {
> > +		ret = -EINVAL;
> > +		goto out_unlock;
> > +	}
> > +
> > +	/* iommufd < 0 means noiommu mode */
> > +	if (bind.iommufd < 0) {
> > +		ret = vfio_device_cdev_enable_noiommu(device);
> > +		if (ret)
> > +			goto out_unlock;
> > +	} else {
> > +		iommufd = vfio_get_iommufd_from_fd(bind.iommufd);
> > +		if (IS_ERR(iommufd)) {
> > +			ret = PTR_ERR(iommufd);
> > +			goto out_unlock;
> > +		}
> > +	}
> > +
> > +	/*
> > +	 * Before the device open, get the KVM pointer currently
> > +	 * associated with the device file (if there is) and obtain
> > +	 * a reference.  This reference is held until device closed.
> > +	 * Save the pointer in the device for use by drivers.
> > +	 */
> > +	vfio_device_get_kvm_safe(df);
> > +
> > +	df->iommufd = iommufd;
> > +	ret = vfio_device_open(df);
> > +	if (ret)
> > +		goto out_put_kvm;
> > +
> > +	if (df->iommufd)
> > +		bind.out_devid = df->devid;
> 
> How about only setting df->iommufd in the non-noiommu case above so
> it's not confusing that it was just set 4 lines previous.  That also
> allows the iommufd pointer to be scoped within that branch and not
> require initialization.  It might make sense to declare:
> 
> 	bool is_noiommu = (bind.iommufd < 0);
> 
> and use it consistently rather than alternating testing between
> bind.iommufd and df->iommufd.

sure.

> > +
> > +	ret = copy_to_user(&arg->out_devid, &bind.out_devid,
> > +			   sizeof(bind.out_devid)) ? -EFAULT : 0;
> 
> In the noiommu case, this copies back the input value, shouldn't it be
> some known invalid value?  Seems confusing.

Maybe only do the copy for the non-noiommu case?
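
A minimal sketch of that idea, combined with the is_noiommu suggestion
above. It uses made-up userspace stand-ins (struct demo_bind,
demo_bind_finish) rather than the real handler: out_devid is written
back only when a real iommufd was given, so a noiommu caller never sees
its own input echoed back as if it were a result.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, trimmed mirror of the bind argument; not the uAPI struct. */
struct demo_bind {
	int32_t  iommufd;
	uint32_t out_devid;
};

/*
 * Finish a bind. 'devid' stands in for the id the bind produced.
 * Returns true if out_devid was written back to the caller's struct.
 */
bool demo_bind_finish(struct demo_bind *user, uint32_t devid)
{
	bool is_noiommu = user->iommufd < 0;

	if (is_noiommu)
		return false;	/* leave out_devid untouched */

	user->out_devid = devid;
	return true;
}
```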

> > +	if (ret)
> > +		goto out_close_device;
> > +
> > +	if (bind.iommufd < 0)
> > +		dev_warn(device->dev, "device is bound to vfio-noiommu by user "
> > +			 "(%s:%d)\n", current->comm, task_pid_nr(current));
> > +
> > +	/*
> > +	 * Paired with smp_load_acquire() in vfio_device_fops::ioctl/
> > +	 * read/write/mmap
> > +	 */
> > +	smp_store_release(&df->access_granted, true);
> > +	mutex_unlock(&device->dev_set->lock);
> > +
> > +	return 0;
> > +
> > +out_close_device:
> > +	vfio_device_close(df);
> > +out_put_kvm:
> > +	df->iommufd = NULL;
> > +	vfio_device_put_kvm(device);
> > +	if (iommufd)
> > +		iommufd_ctx_put(iommufd);
> > +out_unlock:
> > +	mutex_unlock(&device->dev_set->lock);
> > +	vfio_device_unblock_group(device);
> > +	return ret;
> > +}
> > +
> >  static char *vfio_device_devnode(const struct device *dev, umode_t *mode)
> >  {
> >  	return kasprintf(GFP_KERNEL, "vfio/devices/%s", dev_name(dev));
> > diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
> > index 3a8fd0e32f59..ace3d52b0928 100644
> > --- a/drivers/vfio/vfio.h
> > +++ b/drivers/vfio/vfio.h
> > @@ -281,6 +281,9 @@ static inline void vfio_device_del(struct vfio_device *device)
> >
> >  void vfio_init_device_cdev(struct vfio_device *device);
> >  int vfio_device_fops_cdev_open(struct inode *inode, struct file *filep);
> > +void vfio_device_cdev_close(struct vfio_device_file *df);
> > +long vfio_device_ioctl_bind_iommufd(struct vfio_device_file *df,
> > +				    struct vfio_device_bind_iommufd __user *arg);
> >  int vfio_cdev_init(struct class *device_class);
> >  void vfio_cdev_cleanup(void);
> >  #else
> > @@ -304,6 +307,16 @@ static inline int vfio_device_fops_cdev_open(struct inode *inode,
> >  	return 0;
> >  }
> >
> > +static inline void vfio_device_cdev_close(struct vfio_device_file *df)
> > +{
> > +}
> > +
> > +static inline long vfio_device_ioctl_bind_iommufd(struct vfio_device_file *df,
> > +						  struct vfio_device_bind_iommufd __user *arg)
> > +{
> > +	return -EOPNOTSUPP;
> > +}
> > +
> >  static inline int vfio_cdev_init(struct class *device_class)
> >  {
> >  	return 0;
> > diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
> > index 58fc3bb768f2..375086c8803f 100644
> > --- a/drivers/vfio/vfio_main.c
> > +++ b/drivers/vfio/vfio_main.c
> > @@ -559,6 +559,8 @@ static int vfio_device_fops_release(struct inode *inode, struct file *filep)
> >
> >  	if (df->group)
> >  		vfio_device_group_close(df);
> > +	else
> > +		vfio_device_cdev_close(df);
> >
> >  	vfio_device_put_registration(device);
> >
> > @@ -1132,6 +1134,9 @@ static long vfio_device_fops_unl_ioctl(struct file *filep,
> >  	struct vfio_device *device = df->device;
> >  	int ret;
> >
> > +	if (cmd == VFIO_DEVICE_BIND_IOMMUFD)
> > +		return vfio_device_ioctl_bind_iommufd(df, (void __user *)arg);
> > +
> >  	/* Paired with smp_store_release() following vfio_device_open() */
> >  	if (!smp_load_acquire(&df->access_granted))
> >  		return -EINVAL;
> > diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> > index 61b801dfd40b..62b2f2497525 100644
> > --- a/include/uapi/linux/vfio.h
> > +++ b/include/uapi/linux/vfio.h
> > @@ -194,6 +194,43 @@ struct vfio_group_status {
> >
> >  /* --------------- IOCTLs for DEVICE file descriptors --------------- */
> >
> > +/*
> > + * VFIO_DEVICE_BIND_IOMMUFD - _IOR(VFIO_TYPE, VFIO_BASE + 19,
> > + *				   struct vfio_device_bind_iommufd)
> > + *
> > + * Bind a vfio_device to the specified iommufd.
> > + *
> > + * The user should provide a device cookie when calling this ioctl. The
> > + * cookie is carried only in event e.g. I/O fault reported to userspace
> > + * via iommufd. The user should use devid returned by this ioctl to mark
> > + * the target device in other ioctls (e.g. iommu hardware infomration query
> > + * via iommufd, and etc.).
> 
> AFAICT, the whole concept of this dev_cookie is a fantasy.  It only
> exists in this series in these comments and the structure below.  It's
> not even defined whether it needs to be unique within an iommufd
> context, and clearly nothing here validates that.  There's not enough
> implementation for it to exist in this series.  Maybe dev_cookie is
> appended to the end of the structure and indicated with a flag when it
> has some meaning.

sorry, I should have deleted it. ☹

> 
> > + *
> > + * User is not allowed to access the device before the binding operation
> > + * is completed.
> 
> s/not allowed to access/restricted from accessing/

got it.

> 
> > + *
> > + * Unbind is automatically conducted when device fd is closed.
> > + *
> > + * @argsz:	 user filled size of this data.
> > + * @flags:	 reserved for future extension.
> > + * @dev_cookie:	 a per device cookie provided by userspace.
> > + * @iommufd:	 iommufd to bind. a negative value means noiommu.
> 
> "Use a negative value for no-iommu, where supported", or better, should
> we define this explicitly as -1, or why not use a flag bit to specify
> no-iommu?  Maybe minsz is only through flags for the noiommu use case.
> Thanks,

I don't have a preference here. Maybe using -1 can save a flag bit for
future extension.

> 
> > + * @out_devid:	 the device id generated by this bind. This field is valid
> > + *		as long as the input @iommufd is valid. Otherwise, it is
> > + *		meaningless.
> > + *
> > + * Return: 0 on success, -errno on failure.
> > + */
> > +struct vfio_device_bind_iommufd {
> > +	__u32		argsz;
> > +	__u32		flags;
> > +	__aligned_u64	dev_cookie;
> > +	__s32		iommufd;
> > +	__u32		out_devid;
> > +};
> > +
> > +#define VFIO_DEVICE_BIND_IOMMUFD	_IO(VFIO_TYPE, VFIO_BASE + 19)
> > +
> >  /**
> >   * VFIO_DEVICE_GET_INFO - _IOR(VFIO_TYPE, VFIO_BASE + 7,
> >   *						struct vfio_device_info)

Regards,
Yi Liu

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v8 21/24] vfio: Add VFIO_DEVICE_BIND_IOMMUFD
  2023-03-30  7:09     ` Liu, Yi L
@ 2023-03-30 11:52       ` Jason Gunthorpe
  2023-03-30 12:53         ` Liu, Yi L
  0 siblings, 1 reply; 54+ messages in thread
From: Jason Gunthorpe @ 2023-03-30 11:52 UTC (permalink / raw)
  To: Liu, Yi L
  Cc: Alex Williamson, Tian, Kevin, joro, robin.murphy, cohuck,
	eric.auger, nicolinc, kvm, mjrosato, chao.p.peng, yi.y.sun,
	peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, intel-gvt-dev, intel-gfx, linux-s390, Hao,
	Xudong, Zhao, Yan Y, Xu, Terrence, Jiang, Yanting

On Thu, Mar 30, 2023 at 07:09:31AM +0000, Liu, Yi L wrote:

> > > +long vfio_device_ioctl_bind_iommufd(struct vfio_device_file *df,
> > > +				    struct vfio_device_bind_iommufd __user *arg)
> > > +{
> > > +	struct vfio_device *device = df->device;
> > > +	struct vfio_device_bind_iommufd bind;
> > > +	struct iommufd_ctx *iommufd = NULL;
> > > +	unsigned long minsz;
> > > +	int ret;
> > > +
> > > +	static_assert(__same_type(arg->out_devid, bind.out_devid));
> > 
> > They're the same field in the same structure, how could they be
> > otherwise?
> 
> @Jason, should I remove this check?

Yes, it was from something that looked very different from this

Jason

^ permalink raw reply	[flat|nested] 54+ messages in thread

* RE: [PATCH v8 21/24] vfio: Add VFIO_DEVICE_BIND_IOMMUFD
  2023-03-29 23:22     ` Jason Gunthorpe
@ 2023-03-30 12:52       ` Liu, Yi L
  2023-03-30 12:59         ` Jason Gunthorpe
  0 siblings, 1 reply; 54+ messages in thread
From: Liu, Yi L @ 2023-03-30 12:52 UTC (permalink / raw)
  To: Jason Gunthorpe, Alex Williamson
  Cc: Tian, Kevin, joro, robin.murphy, cohuck, eric.auger, nicolinc,
	kvm, mjrosato, chao.p.peng, yi.y.sun, peterx, jasowang,
	shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	intel-gvt-dev, intel-gfx, linux-s390, Hao, Xudong, Zhao, Yan Y,
	Xu, Terrence, Jiang, Yanting

> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Thursday, March 30, 2023 7:22 AM
> 
> On Wed, Mar 29, 2023 at 03:00:55PM -0600, Alex Williamson wrote:
> 
> > > + * The user should provide a device cookie when calling this ioctl. The
> > > + * cookie is carried only in event e.g. I/O fault reported to userspace
> > > + * via iommufd. The user should use devid returned by this ioctl to mark
> > > + * the target device in other ioctls (e.g. iommu hardware infomration query
> > > + * via iommufd, and etc.).
> >
> > AFAICT, the whole concept of this dev_cookie is a fantasy.  It only
> > exists in this series in these comments and the structure below.  It's
> > not even defined whether it needs to be unique within an iommufd
> > context, and clearly nothing here validates that.  There's not enough
> > implementation for it to exist in this series.  Maybe dev_cookie is
> > appended to the end of the structure and indicated with a flag when it
> > has some meaning.
> 
> Yes, I've asked for this to be punted to the PRI series enough times
> already, why does it keep coming back ??

Yes, I promise to remove it in the next version.

> > > + * @argsz:	 user filled size of this data.
> > > + * @flags:	 reserved for future extension.
> > > + * @dev_cookie:	 a per device cookie provided by userspace.
> > > + * @iommufd:	 iommufd to bind. a negative value means noiommu.
> >
> > "Use a negative value for no-iommu, where supported", or better, should
> > we define this explicitly as -1, or why not use a flag bit to specify
> > no-iommu?  Maybe minsz is only through flags for the noiommu use case.
> 
> I was happy enough for this to be defined as -1. We could give it a
> formal sounding constant too

are you suggesting having something like "#define VFIO_NOIOMMU_FD	-1"?

Regards,
Yi Liu

^ permalink raw reply	[flat|nested] 54+ messages in thread

* RE: [PATCH v8 21/24] vfio: Add VFIO_DEVICE_BIND_IOMMUFD
  2023-03-30 11:52       ` Jason Gunthorpe
@ 2023-03-30 12:53         ` Liu, Yi L
  0 siblings, 0 replies; 54+ messages in thread
From: Liu, Yi L @ 2023-03-30 12:53 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Alex Williamson, Tian, Kevin, joro, robin.murphy, cohuck,
	eric.auger, nicolinc, kvm, mjrosato, chao.p.peng, yi.y.sun,
	peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, intel-gvt-dev, intel-gfx, linux-s390, Hao,
	Xudong, Zhao, Yan Y, Xu, Terrence, Jiang, Yanting

> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Thursday, March 30, 2023 7:52 PM
> 
> On Thu, Mar 30, 2023 at 07:09:31AM +0000, Liu, Yi L wrote:
> 
> > > > +long vfio_device_ioctl_bind_iommufd(struct vfio_device_file *df,
> > > > +				    struct vfio_device_bind_iommufd __user *arg)
> > > > +{
> > > > +	struct vfio_device *device = df->device;
> > > > +	struct vfio_device_bind_iommufd bind;
> > > > +	struct iommufd_ctx *iommufd = NULL;
> > > > +	unsigned long minsz;
> > > > +	int ret;
> > > > +
> > > > +	static_assert(__same_type(arg->out_devid, bind.out_devid));
> > >
> > > They're the same field in the same structure, how could they be
> > > otherwise?
> >
> > @Jason, should I remove this check?
> 
> Yes, it was from something that looked very different from this

OK, I'll remove it here and in the next patch. 😊

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v8 21/24] vfio: Add VFIO_DEVICE_BIND_IOMMUFD
  2023-03-30 12:52       ` Liu, Yi L
@ 2023-03-30 12:59         ` Jason Gunthorpe
  0 siblings, 0 replies; 54+ messages in thread
From: Jason Gunthorpe @ 2023-03-30 12:59 UTC (permalink / raw)
  To: Liu, Yi L
  Cc: Alex Williamson, Tian, Kevin, joro, robin.murphy, cohuck,
	eric.auger, nicolinc, kvm, mjrosato, chao.p.peng, yi.y.sun,
	peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, intel-gvt-dev, intel-gfx, linux-s390, Hao,
	Xudong, Zhao, Yan Y, Xu, Terrence, Jiang, Yanting

On Thu, Mar 30, 2023 at 12:52:39PM +0000, Liu, Yi L wrote:

> > > "Use a negative value for no-iommu, where supported", or better, should
> > > we define this explicitly as -1, or why not use a flag bit to specify
> > > no-iommu?  Maybe minsz is only through flags for the noiommu use case.
> > 
> > I was happy enough for this to be defined as -1. We could give it a
> > formal sounding constant too
> 
> are you suggesting having something like "#define VFIO_NOIOMMU_FD	-1"?

Yeah something like that
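
For illustration only, userspace could then fill the bind argument along
these lines. The constant name is the one floated above and may change;
struct demo_bind_iommufd is a hypothetical mirror of the proposed uAPI
struct with dev_cookie already dropped, so its layout is an assumption
and not the final ABI.

```c
#include <stdint.h>
#include <string.h>

/* Name as floated in this thread; not a settled uAPI constant. */
#define VFIO_NOIOMMU_FD	(-1)

/* Hypothetical mirror of the proposed struct, without dev_cookie. */
struct demo_bind_iommufd {
	uint32_t argsz;
	uint32_t flags;
	int32_t  iommufd;
	uint32_t out_devid;
};

/* Prepare the argument for a bind; pass VFIO_NOIOMMU_FD for noiommu mode. */
void demo_bind_init(struct demo_bind_iommufd *bind, int32_t iommufd)
{
	memset(bind, 0, sizeof(*bind));	/* flags must be 0 */
	bind->argsz = sizeof(*bind);
	bind->iommufd = iommufd;
}
```

A named constant keeps the "negative means noiommu" convention in one
place, so callers do not have to pass a bare -1.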

Jason

^ permalink raw reply	[flat|nested] 54+ messages in thread

* RE: [PATCH v8 22/24] vfio: Add VFIO_DEVICE_AT[DE]TACH_IOMMUFD_PT
  2023-03-29 21:19   ` Alex Williamson
@ 2023-03-30 13:02     ` Liu, Yi L
  0 siblings, 0 replies; 54+ messages in thread
From: Liu, Yi L @ 2023-03-30 13:02 UTC (permalink / raw)
  To: Alex Williamson
  Cc: jgg, Tian, Kevin, joro, robin.murphy, cohuck, eric.auger,
	nicolinc, kvm, mjrosato, chao.p.peng, yi.y.sun, peterx, jasowang,
	shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	intel-gvt-dev, intel-gfx, linux-s390, Hao, Xudong, Zhao, Yan Y,
	Xu, Terrence, Jiang, Yanting

> From: Alex Williamson <alex.williamson@redhat.com>
> Sent: Thursday, March 30, 2023 5:19 AM
> 
> 
> Nit, ATDETACH?  [AT|DE]TACH?

[AT|DE]TACH 😊

> On Mon, 27 Mar 2023 02:40:45 -0700
> Yi Liu <yi.l.liu@intel.com> wrote:
> 
> > This adds ioctl for userspace to attach device cdev fd to and detach
> > from IOAS/hw_pagetable managed by iommufd.
> >
> >     VFIO_DEVICE_ATTACH_IOMMUFD_PT: attach vfio device to IOAS, hw_pagetable
> > 				   managed by iommufd. Attach can be
> > 				   undo by VFIO_DEVICE_DETACH_IOMMUFD_PT
> > 				   or device fd close.
> >     VFIO_DEVICE_DETACH_IOMMUFD_PT: detach vfio device from the current attached
> > 				   IOAS or hw_pagetable managed by iommufd.
> >
> > Reviewed-by: Kevin Tian <kevin.tian@intel.com>
> > Tested-by: Terrence Xu <terrence.xu@intel.com>
> > Tested-by: Nicolin Chen <nicolinc@nvidia.com>
> > Tested-by: Matthew Rosato <mjrosato@linux.ibm.com>
> > Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> > ---
> >  drivers/vfio/device_cdev.c | 85 ++++++++++++++++++++++++++++++++++++++
> >  drivers/vfio/vfio.h        | 16 +++++++
> >  drivers/vfio/vfio_main.c   |  8 ++++
> >  include/uapi/linux/vfio.h  | 52 +++++++++++++++++++++++
> >  4 files changed, 161 insertions(+)
> >
> > diff --git a/drivers/vfio/device_cdev.c b/drivers/vfio/device_cdev.c
> > index 2b563bac50b9..b5de997bff6d 100644
> > --- a/drivers/vfio/device_cdev.c
> > +++ b/drivers/vfio/device_cdev.c
> > @@ -197,6 +197,91 @@ long vfio_device_ioctl_bind_iommufd(struct vfio_device_file *df,
> >  	return ret;
> >  }
> >
> > +int vfio_ioctl_device_attach(struct vfio_device_file *df,
> > +			     struct vfio_device_attach_iommufd_pt __user *arg)
> > +{
> > +	struct vfio_device *device = df->device;
> > +	struct vfio_device_attach_iommufd_pt attach;
> > +	unsigned long minsz;
> > +	int ret;
> > +
> > +	static_assert(__same_type(arg->pt_id, attach.pt_id));
> 
> Same comment as previous, given these are the same field of the same
> structure, I don't understand how this could ever assert.

yes, will remove it.

> 
> > +
> > +	minsz = offsetofend(struct vfio_device_attach_iommufd_pt, pt_id);
> > +
> > +	if (copy_from_user(&attach, arg, minsz))
> > +		return -EFAULT;
> > +
> > +	if (attach.argsz < minsz || attach.flags)
> > +		return -EINVAL;
> > +
> > +	if (!device->ops->bind_iommufd)
> > +		return -ENODEV;
> 
> Same as previous, we already require this to enable cdev support, so
> this seems ultra paranoid.

will remove it.

> 
> > +
> > +	/* ATTACH only allowed for cdev fds */
> > +	if (df->group)
> > +		return -EINVAL;
> > +
> > +	mutex_lock(&device->dev_set->lock);
> > +	/* noiommufd mode doesn't allow attach */
> > +	if (!df->iommufd) {
> > +		ret = -EOPNOTSUPP;
> > +		goto out_unlock;
> > +	}
> > +
> > +	ret = device->ops->attach_ioas(device, &attach.pt_id);
> > +	if (ret)
> > +		goto out_unlock;
> > +
> > +	ret = copy_to_user(&arg->pt_id, &attach.pt_id,
> > +			   sizeof(attach.pt_id)) ? -EFAULT : 0;
> > +	if (ret)
> > +		goto out_detach;
> > +	mutex_unlock(&device->dev_set->lock);
> > +
> > +	return 0;
> > +
> > +out_detach:
> > +	device->ops->detach_ioas(device);
> > +out_unlock:
> > +	mutex_unlock(&device->dev_set->lock);
> > +	return ret;
> > +}
> > +
> > +int vfio_ioctl_device_detach(struct vfio_device_file *df,
> > +			     struct vfio_device_detach_iommufd_pt __user *arg)
> > +{
> > +	struct vfio_device *device = df->device;
> > +	struct vfio_device_detach_iommufd_pt detach;
> > +	unsigned long minsz;
> > +
> > +	minsz = offsetofend(struct vfio_device_detach_iommufd_pt, flags);
> > +
> > +	if (copy_from_user(&detach, arg, minsz))
> > +		return -EFAULT;
> > +
> > +	if (detach.argsz < minsz || detach.flags)
> > +		return -EINVAL;
> > +
> > +	if (!device->ops->bind_iommufd)
> > +		return -ENODEV;
> > +
> > +	/* DETACH only allowed for cdev fds */
> > +	if (df->group)
> > +		return -EINVAL;
> > +
> > +	mutex_lock(&device->dev_set->lock);
> > +	/* noiommufd mode doesn't support detach */
> > +	if (!df->iommufd) {
> > +		mutex_unlock(&device->dev_set->lock);
> > +		return -EOPNOTSUPP;
> > +	}
> > +	device->ops->detach_ioas(device);
> > +	mutex_unlock(&device->dev_set->lock);
> > +
> > +	return 0;
> > +}
> 
> For both of these, can't a user trigger the
> WARN_ON(!vdev->iommufd_device) if they call either of these before
> VFIO_DEVICE_BIND_IOMMUFD?  Thanks,

If VFIO_DEVICE_BIND_IOMMUFD has not been done, ATTACH/DETACH are
blocked, so they won't trigger the WARN_ON in the attach_ioas and
detach_ioas callbacks.
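
A simplified model of that gating, assuming (as the vfio_main.c hunk in
the BIND_IOMMUFD patch suggests) that BIND is dispatched before the
access_granted check and every other command after it. The types and
command names below are invented for the sketch and do not model the
noiommu case.

```c
#include <errno.h>
#include <stdbool.h>

/* Invented command set standing in for the real ioctl numbers. */
enum gate_cmd { GATE_BIND, GATE_ATTACH, GATE_DETACH };

/* Invented stand-in for the per-fd state. */
struct gate_file {
	bool access_granted;	/* set once bind has succeeded */
	bool attached;
};

int gate_ioctl(struct gate_file *df, enum gate_cmd cmd)
{
	/* BIND is the only command handled before the gate. */
	if (cmd == GATE_BIND) {
		if (df->access_granted)
			return -EINVAL;	/* cannot bind twice */
		df->access_granted = true;
		return 0;
	}

	/* Everything else is refused until bind has completed. */
	if (!df->access_granted)
		return -EINVAL;

	switch (cmd) {
	case GATE_ATTACH:
		df->attached = true;
		return 0;
	case GATE_DETACH:
		df->attached = false;
		return 0;
	default:
		return -ENOTTY;
	}
}
```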

Regards,
Yi Liu
> 
> > +
> >  static char *vfio_device_devnode(const struct device *dev, umode_t *mode)
> >  {
> >  	return kasprintf(GFP_KERNEL, "vfio/devices/%s", dev_name(dev));
> > diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
> > index ace3d52b0928..c199e410db18 100644
> > --- a/drivers/vfio/vfio.h
> > +++ b/drivers/vfio/vfio.h
> > @@ -284,6 +284,10 @@ int vfio_device_fops_cdev_open(struct inode *inode, struct file *filep);
> >  void vfio_device_cdev_close(struct vfio_device_file *df);
> >  long vfio_device_ioctl_bind_iommufd(struct vfio_device_file *df,
> >  				    struct vfio_device_bind_iommufd __user *arg);
> > +int vfio_ioctl_device_attach(struct vfio_device_file *df,
> > +			     struct vfio_device_attach_iommufd_pt __user *arg);
> > +int vfio_ioctl_device_detach(struct vfio_device_file *df,
> > +			     struct vfio_device_detach_iommufd_pt __user *arg);
> >  int vfio_cdev_init(struct class *device_class);
> >  void vfio_cdev_cleanup(void);
> >  #else
> > @@ -317,6 +321,18 @@ static inline long vfio_device_ioctl_bind_iommufd(struct vfio_device_file *df,
> >  	return -EOPNOTSUPP;
> >  }
> >
> > +static inline int vfio_ioctl_device_attach(struct vfio_device_file *df,
> > +					   struct vfio_device_attach_iommufd_pt __user *arg)
> > +{
> > +	return -EOPNOTSUPP;
> > +}
> > +
> > +static inline int vfio_ioctl_device_detach(struct vfio_device_file *df,
> > +					   struct vfio_device_detach_iommufd_pt __user *arg)
> > +{
> > +	return -EOPNOTSUPP;
> > +}
> > +
> >  static inline int vfio_cdev_init(struct class *device_class)
> >  {
> >  	return 0;
> > diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
> > index 375086c8803f..896d8bb49585 100644
> > --- a/drivers/vfio/vfio_main.c
> > +++ b/drivers/vfio/vfio_main.c
> > @@ -1150,6 +1150,14 @@ static long vfio_device_fops_unl_ioctl(struct file *filep,
> >  		ret = vfio_ioctl_device_feature(device, (void __user *)arg);
> >  		break;
> >
> > +	case VFIO_DEVICE_ATTACH_IOMMUFD_PT:
> > +		ret = vfio_ioctl_device_attach(df, (void __user *)arg);
> > +		break;
> > +
> > +	case VFIO_DEVICE_DETACH_IOMMUFD_PT:
> > +		ret = vfio_ioctl_device_detach(df, (void __user *)arg);
> > +		break;
> > +
> >  	default:
> >  		if (unlikely(!device->ops->ioctl))
> >  			ret = -EINVAL;
> > diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> > index 62b2f2497525..bf6c97e759c7 100644
> > --- a/include/uapi/linux/vfio.h
> > +++ b/include/uapi/linux/vfio.h
> > @@ -231,6 +231,58 @@ struct vfio_device_bind_iommufd {
> >
> >  #define VFIO_DEVICE_BIND_IOMMUFD	_IO(VFIO_TYPE, VFIO_BASE + 19)
> >
> > +/*
> > + * VFIO_DEVICE_ATTACH_IOMMUFD_PT - _IOW(VFIO_TYPE, VFIO_BASE + 20,
> > + *					struct vfio_device_attach_iommufd_pt)
> > + *
> > + * Attach a vfio device to an iommufd address space specified by IOAS
> > + * id or hw_pagetable (hwpt) id.
> > + *
> > + * Available only after a device has been bound to iommufd via
> > + * VFIO_DEVICE_BIND_IOMMUFD
> > + *
> > + * Undo by VFIO_DEVICE_DETACH_IOMMUFD_PT or device fd close.
> > + *
> > + * @argsz:	user filled size of this data.
> > + * @flags:	must be 0.
> > + * @pt_id:	Input the target id which can represent an ioas or a hwpt
> > + *		allocated via iommufd subsystem.
> > + *		Output the attached hwpt id which could be the specified
> > + *		hwpt itself or a hwpt automatically created for the
> > + *		specified ioas by kernel during the attachment.
> > + *
> > + * Return: 0 on success, -errno on failure.
> > + */
> > +struct vfio_device_attach_iommufd_pt {
> > +	__u32	argsz;
> > +	__u32	flags;
> > +	__u32	pt_id;
> > +};
> > +
> > +#define VFIO_DEVICE_ATTACH_IOMMUFD_PT		_IO(VFIO_TYPE, VFIO_BASE + 20)
> > +
> > +/*
> > + * VFIO_DEVICE_DETACH_IOMMUFD_PT - _IOW(VFIO_TYPE, VFIO_BASE + 21,
> > + *					struct vfio_device_detach_iommufd_pt)
> > + *
> > + * Detach a vfio device from the iommufd address space it has been
> > + * attached to. After it, device should be in a blocking DMA state.
> > + *
> > + * Available only after a device has been bound to iommufd via
> > + * VFIO_DEVICE_BIND_IOMMUFD.
> > + *
> > + * @argsz:	user filled size of this data.
> > + * @flags:	must be 0.
> > + *
> > + * Return: 0 on success, -errno on failure.
> > + */
> > +struct vfio_device_detach_iommufd_pt {
> > +	__u32	argsz;
> > +	__u32	flags;
> > +};
> > +
> > +#define VFIO_DEVICE_DETACH_IOMMUFD_PT		_IO(VFIO_TYPE, VFIO_BASE + 21)
> > +
> >  /**
> >   * VFIO_DEVICE_GET_INFO - _IOR(VFIO_TYPE, VFIO_BASE + 7,
> >   *						struct vfio_device_info)


^ permalink raw reply	[flat|nested] 54+ messages in thread

* RE: [PATCH v8 23/24] vfio: Compile group optionally
  2023-03-29 21:51   ` Alex Williamson
@ 2023-03-30 13:06     ` Liu, Yi L
  0 siblings, 0 replies; 54+ messages in thread
From: Liu, Yi L @ 2023-03-30 13:06 UTC (permalink / raw)
  To: Alex Williamson
  Cc: jgg, Tian, Kevin, joro, robin.murphy, cohuck, eric.auger,
	nicolinc, kvm, mjrosato, chao.p.peng, yi.y.sun, peterx, jasowang,
	shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	intel-gvt-dev, intel-gfx, linux-s390, Hao, Xudong, Zhao, Yan Y,
	Xu, Terrence, Jiang, Yanting

> From: Alex Williamson <alex.williamson@redhat.com>
> Sent: Thursday, March 30, 2023 5:51 AM
> 
> On Mon, 27 Mar 2023 02:40:46 -0700
> Yi Liu <yi.l.liu@intel.com> wrote:
> 
> > group code is not needed for vfio device cdev, so with vfio device cdev
> > introduced, the group infrastructures can be compiled out if only cdev
> > is needed.
> >
> > Reviewed-by: Kevin Tian <kevin.tian@intel.com>
> > Tested-by: Terrence Xu <terrence.xu@intel.com>
> > Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> > ---
> >  drivers/iommu/iommufd/Kconfig |   4 +-
> >  drivers/vfio/Kconfig          |  16 ++++-
> >  drivers/vfio/Makefile         |   2 +-
> >  drivers/vfio/vfio.h           | 111 ++++++++++++++++++++++++++++++++--
> >  include/linux/vfio.h          |  13 +++-
> >  5 files changed, 134 insertions(+), 12 deletions(-)
> >
> > diff --git a/drivers/iommu/iommufd/Kconfig b/drivers/iommu/iommufd/Kconfig
> > index ada693ea51a7..1946eed1826a 100644
> > --- a/drivers/iommu/iommufd/Kconfig
> > +++ b/drivers/iommu/iommufd/Kconfig
> > @@ -14,8 +14,8 @@ config IOMMUFD
> >  if IOMMUFD
> >  config IOMMUFD_VFIO_CONTAINER
> >  	bool "IOMMUFD provides the VFIO container /dev/vfio/vfio"
> > -	depends on VFIO && !VFIO_CONTAINER
> > -	default VFIO && !VFIO_CONTAINER
> > +	depends on VFIO && VFIO_GROUP && !VFIO_CONTAINER
> > +	default VFIO && VFIO_GROUP && !VFIO_CONTAINER
> 
> Shouldn't these simply replace VFIO with VFIO_GROUP since VFIO_GROUP
> necessarily depends on VFIO?

looks so.

> 
> >  	help
> >  	  IOMMUFD will provide /dev/vfio/vfio instead of VFIO. This relies on
> >  	  IOMMUFD providing compatibility emulation to give the same ioctls.
> > diff --git a/drivers/vfio/Kconfig b/drivers/vfio/Kconfig
> > index e2105b4dac2d..0942a19601a2 100644
> > --- a/drivers/vfio/Kconfig
> > +++ b/drivers/vfio/Kconfig
> > @@ -4,7 +4,9 @@ menuconfig VFIO
> >  	select IOMMU_API
> >  	depends on IOMMUFD || !IOMMUFD
> >  	select INTERVAL_TREE
> > -	select VFIO_CONTAINER if IOMMUFD=n
> > +	select VFIO_GROUP if SPAPR_TCE_IOMMU || !IOMMUFD
> 
> This needs to be IOMMUFD=n or else VFIO_GROUP cannot be unset when
> IOMMUFD=m

Yes. By the way, does that mean the "depends on IOMMUFD || !IOMMUFD"
line should also use IOMMUFD=n?
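
For reference, the documented Kconfig tristate rules (n=0, m=1, y=2;
'!' is 2 minus the value, '||' is the maximum, and '=' yields y or n)
can be modelled in C as below. The helper names are made up. Under these
rules "IOMMUFD || !IOMMUFD" evaluates to m when IOMMUFD=m, which caps
VFIO at m and looks like the intended effect of that idiom, whereas a
select condition of "!IOMMUFD" is also m (that is, not n) for IOMMUFD=m,
which would explain why IOMMUFD=n is wanted on the select line.

```c
/* Kconfig tristate values as documented: n=0, m=1, y=2. */
enum tri { TRI_N = 0, TRI_M = 1, TRI_Y = 2 };

/* '!' <expr> evaluates to 2 - <expr>. */
enum tri tri_not(enum tri a)
{
	return (enum tri)(2 - a);
}

/* <expr> '||' <expr> evaluates to the maximum of the two. */
enum tri tri_or(enum tri a, enum tri b)
{
	return a > b ? a : b;
}

/* <symbol> '=' <symbol> evaluates to y if equal, otherwise n. */
enum tri tri_eq(enum tri a, enum tri b)
{
	return a == b ? TRI_Y : TRI_N;
}
```

With IOMMUFD=m, tri_not() gives m while tri_eq(IOMMUFD, TRI_N) gives n,
which is the difference between the two spellings.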

> 
> > +	select VFIO_DEVICE_CDEV if !VFIO_GROUP
> > +	select VFIO_CONTAINER if IOMMUFD=n && VFIO_GROUP
> 
> The fact that CONTAINER depends on GROUP seems to be sufficient that we
> don't need GROUP here.  Thanks,

Right. I added VFIO_GROUP because at one point I saw the container code
being compiled when IOMMUFD=n and VFIO_GROUP=n. That fails to build since
the container code refers to device->group->xxx. But this should already
be fixed by selecting VFIO_GROUP when IOMMUFD=n.

Thanks,
Yi Liu

> 
> Alex
> 
> >  	help
> >  	  VFIO provides a framework for secure userspace device drivers.
> >  	  See Documentation/driver-api/vfio.rst for more details.
> > @@ -15,6 +17,7 @@ if VFIO
> >  config VFIO_DEVICE_CDEV
> >  	bool "Support for the VFIO cdev /dev/vfio/devices/vfioX"
> >  	depends on IOMMUFD
> > +	default !VFIO_GROUP
> >  	help
> >  	  The VFIO device cdev is another way for userspace to get device
> >  	  access. Userspace gets device fd by opening device cdev under
> > @@ -23,9 +26,20 @@ config VFIO_DEVICE_CDEV
> >
> >  	  If you don't know what to do here, say N.
> >
> > +config VFIO_GROUP
> > +	bool "Support for the VFIO group /dev/vfio/$group_id"
> > +	default y
> > +	help
> > +	   VFIO group support provides the traditional model for accessing
> > +	   devices through VFIO and is used by the majority of userspace
> > +	   applications and drivers making use of VFIO.
> > +
> > +	   If you don't know what to do here, say Y.
> > +
> >  config VFIO_CONTAINER
> >  	bool "Support for the VFIO container /dev/vfio/vfio"
> >  	select VFIO_IOMMU_TYPE1 if MMU && (X86 || S390 || ARM || ARM64)
> > +	depends on VFIO_GROUP
> >  	default y
> >  	help
> >  	  The VFIO container is the classic interface to VFIO for establishing
> > diff --git a/drivers/vfio/Makefile b/drivers/vfio/Makefile
> > index 245394aeb94b..57c3515af606 100644
> > --- a/drivers/vfio/Makefile
> > +++ b/drivers/vfio/Makefile
> > @@ -2,9 +2,9 @@
> >  obj-$(CONFIG_VFIO) += vfio.o
> >
> >  vfio-y += vfio_main.o \
> > -	  group.o \
> >  	  iova_bitmap.o
> >  vfio-$(CONFIG_VFIO_DEVICE_CDEV) += device_cdev.o
> > +vfio-$(CONFIG_VFIO_GROUP) += group.o
> >  vfio-$(CONFIG_IOMMUFD) += iommufd.o
> >  vfio-$(CONFIG_VFIO_CONTAINER) += container.o
> >  vfio-$(CONFIG_VFIO_VIRQFD) += virqfd.o
> > diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
> > index c199e410db18..9c7a238ec8dd 100644
> > --- a/drivers/vfio/vfio.h
> > +++ b/drivers/vfio/vfio.h
> > @@ -36,6 +36,12 @@ vfio_allocate_device_file(struct vfio_device *device);
> >
> >  extern const struct file_operations vfio_device_fops;
> >
> > +#ifdef CONFIG_VFIO_NOIOMMU
> > +extern bool vfio_noiommu __read_mostly;
> > +#else
> > +enum { vfio_noiommu = false };
> > +#endif
> > +
> >  enum vfio_group_type {
> >  	/*
> >  	 * Physical device with IOMMU backing.
> > @@ -60,6 +66,7 @@ enum vfio_group_type {
> >  	VFIO_NO_IOMMU,
> >  };
> >
> > +#if IS_ENABLED(CONFIG_VFIO_GROUP)
> >  struct vfio_group {
> >  	struct device 			dev;
> >  	struct cdev			cdev;
> > @@ -113,6 +120,104 @@ static inline void vfio_device_set_noiommu(struct vfio_device *device)
> >  	device->noiommu = IS_ENABLED(CONFIG_VFIO_NOIOMMU) &&
> >  			  device->group->type == VFIO_NO_IOMMU;
> >  }
> > +#else
> > +struct vfio_group;
> > +
> > +static inline int vfio_device_block_group(struct vfio_device *device)
> > +{
> > +	return 0;
> > +}
> > +
> > +static inline void vfio_device_unblock_group(struct vfio_device *device)
> > +{
> > +}
> > +
> > +static inline int vfio_device_set_group(struct vfio_device *device,
> > +					enum vfio_group_type type)
> > +{
> > +	return 0;
> > +}
> > +
> > +static inline void vfio_device_remove_group(struct vfio_device *device)
> > +{
> > +}
> > +
> > +static inline void vfio_device_group_register(struct vfio_device *device)
> > +{
> > +}
> > +
> > +static inline void vfio_device_group_unregister(struct vfio_device *device)
> > +{
> > +}
> > +
> > +static inline bool vfio_device_group_uses_container(struct vfio_device_file *df)
> > +{
> > +	return false;
> > +}
> > +
> > +static inline int vfio_device_group_use_iommu(struct vfio_device *device)
> > +{
> > +	return -EOPNOTSUPP;
> > +}
> > +
> > +static inline void vfio_device_group_unuse_iommu(struct vfio_device *device)
> > +{
> > +}
> > +
> > +static inline void vfio_device_group_close(struct vfio_device_file *df)
> > +{
> > +}
> > +
> > +static inline struct vfio_group *vfio_group_from_file(struct file *file)
> > +{
> > +	return NULL;
> > +}
> > +
> > +static inline bool vfio_group_has_dev(struct vfio_group *group,
> > +				      struct vfio_device *device)
> > +{
> > +	return false;
> > +}
> > +
> > +static inline bool vfio_group_enforced_coherent(struct vfio_group *group)
> > +{
> > +	return true;
> > +}
> > +
> > +static inline void vfio_group_set_kvm(struct vfio_group *group, struct kvm *kvm)
> > +{
> > +}
> > +
> > +static inline bool vfio_device_has_container(struct vfio_device *device)
> > +{
> > +	return false;
> > +}
> > +
> > +static inline int __init vfio_group_init(void)
> > +{
> > +	return 0;
> > +}
> > +
> > +static inline void vfio_group_cleanup(void)
> > +{
> > +}
> > +
> > +static inline void vfio_device_set_noiommu(struct vfio_device *device)
> > +{
> > +	struct iommu_group *iommu_group;
> > +
> > +	device->noiommu = false;
> > +
> > +	if (!IS_ENABLED(CONFIG_VFIO_NOIOMMU) || !vfio_noiommu)
> > +		return;
> > +
> > +	iommu_group = iommu_group_get(device->dev);
> > +	if (iommu_group)
> > +		iommu_group_put(iommu_group);
> > +	else
> > +		device->noiommu = true;
> > +}
> > +#endif /* CONFIG_VFIO_GROUP */
> >
> >  #if IS_ENABLED(CONFIG_VFIO_CONTAINER)
> >  /**
> > @@ -356,12 +461,6 @@ static inline void vfio_virqfd_exit(void)
> >  }
> >  #endif
> >
> > -#ifdef CONFIG_VFIO_NOIOMMU
> > -extern bool vfio_noiommu __read_mostly;
> > -#else
> > -enum { vfio_noiommu = false };
> > -#endif
> > -
> >  #ifdef CONFIG_HAVE_KVM
> >  void _vfio_device_get_kvm_safe(struct vfio_device *device, struct kvm *kvm);
> >  void vfio_device_put_kvm(struct vfio_device *device);
> > diff --git a/include/linux/vfio.h b/include/linux/vfio.h
> > index 8719ec2adbbb..1367605d617c 100644
> > --- a/include/linux/vfio.h
> > +++ b/include/linux/vfio.h
> > @@ -43,7 +43,11 @@ struct vfio_device {
> >  	 */
> >  	const struct vfio_migration_ops *mig_ops;
> >  	const struct vfio_log_ops *log_ops;
> > +#if IS_ENABLED(CONFIG_VFIO_GROUP)
> >  	struct vfio_group *group;
> > +	struct list_head group_next;
> > +	struct list_head iommu_entry;
> > +#endif
> >  	struct vfio_device_set *dev_set;
> >  	struct list_head dev_set_list;
> >  	unsigned int migration_flags;
> > @@ -58,8 +62,6 @@ struct vfio_device {
> >  	refcount_t refcount;	/* user count on registered device*/
> >  	unsigned int open_count;
> >  	struct completion comp;
> > -	struct list_head group_next;
> > -	struct list_head iommu_entry;
> >  	struct iommufd_access *iommufd_access;
> >  	void (*put_kvm)(struct kvm *kvm);
> >  #if IS_ENABLED(CONFIG_IOMMUFD)
> > @@ -270,7 +272,14 @@ int vfio_mig_get_next_state(struct vfio_device *device,
> >  /*
> >   * External user API
> >   */
> > +#if IS_ENABLED(CONFIG_VFIO_GROUP)
> >  struct iommu_group *vfio_file_iommu_group(struct file *file);
> > +#else
> > +static inline struct iommu_group *vfio_file_iommu_group(struct file *file)
> > +{
> > +	return NULL;
> > +}
> > +#endif
> >  bool vfio_file_is_valid(struct file *file);
> >  bool vfio_file_enforced_coherent(struct file *file);
> >  void vfio_file_set_kvm(struct file *file, struct kvm *kvm);
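
The noiommu decision in the !CONFIG_VFIO_GROUP branch quoted above can be
modelled outside the kernel.  This is only a sketch under stated
assumptions: struct demo_device, struct demo_iommu_group and
demo_set_noiommu() are made-up names, and the noiommu_enabled argument
stands in for the IS_ENABLED(CONFIG_VFIO_NOIOMMU) && vfio_noiommu check.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; none of these names exist in the kernel. */
struct demo_iommu_group {
	int id;
};

struct demo_device {
	struct demo_iommu_group *iommu_group;	/* NULL: no IOMMU backing */
	bool noiommu;
};

/*
 * Model of the quoted non-group variant: a device is flagged noiommu
 * only when noiommu mode is enabled and the device has no iommu group.
 */
static void demo_set_noiommu(struct demo_device *dev, bool noiommu_enabled)
{
	dev->noiommu = false;

	if (!noiommu_enabled)
		return;

	if (!dev->iommu_group)
		dev->noiommu = true;
}
```

With noiommu mode enabled, a device without an iommu group ends up with
noiommu set, while a device that has a group never does.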



* RE: [PATCH v8 24/24] docs: vfio: Add vfio device cdev description
  2023-03-29 22:47   ` Alex Williamson
  2023-03-29 22:57     ` Alex Williamson
@ 2023-03-30 13:11     ` Liu, Yi L
  1 sibling, 0 replies; 54+ messages in thread
From: Liu, Yi L @ 2023-03-30 13:11 UTC (permalink / raw)
  To: Alex Williamson
  Cc: jgg, Tian, Kevin, joro, robin.murphy, cohuck, eric.auger,
	nicolinc, kvm, mjrosato, chao.p.peng, yi.y.sun, peterx, jasowang,
	shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	intel-gvt-dev, intel-gfx, linux-s390, Hao, Xudong, Zhao, Yan Y,
	Xu, Terrence, Jiang, Yanting

> From: Alex Williamson <alex.williamson@redhat.com>
> Sent: Thursday, March 30, 2023 6:48 AM
> 
> On Mon, 27 Mar 2023 02:40:47 -0700
> Yi Liu <yi.l.liu@intel.com> wrote:
> 
> > This gives notes for userspace applications on device cdev usage.
> >
> > Reviewed-by: Kevin Tian <kevin.tian@intel.com>
> > Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> > ---
> >  Documentation/driver-api/vfio.rst | 127 ++++++++++++++++++++++++++++++
> >  1 file changed, 127 insertions(+)
> >
> > diff --git a/Documentation/driver-api/vfio.rst b/Documentation/driver-api/vfio.rst
> > index 363e12c90b87..77408788b98d 100644
> > --- a/Documentation/driver-api/vfio.rst
> > +++ b/Documentation/driver-api/vfio.rst
> > @@ -239,6 +239,125 @@ group and can access them as follows::
> >  	/* Gratuitous device reset and go... */
> >  	ioctl(device, VFIO_DEVICE_RESET);
> >
> > +IOMMUFD and vfio_iommu_type1
> > +----------------------------
> > +
> > +IOMMUFD is the new user API to manage I/O page tables from userspace.
> > +It intends to be the portal of delivering advanced userspace DMA
> > +features (nested translation [5], PASID [6], etc.) while being backward
> > +compatible with the vfio_iommu_type1 driver.  Eventually vfio_iommu_type1
> > +will be deprecated.
> 
> "... while also providing a backwards compatibility interface for
> existing VFIO_TYPE1v2_IOMMU use cases.  Eventually the vfio_iommu_type1
> driver, as well as the legacy vfio container and group model is
> intended to be deprecated."

Only TYPE1v2, or both v1 and v2?

> 
> > +
> > +With the backward compatibility, no change is required for legacy VFIO
> > +drivers or applications to connect a VFIO device to IOMMUFD.
> > +
> > +	When CONFIG_IOMMUFD_VFIO_CONTAINER=n, VFIO container still provides
> > +	/dev/vfio/vfio which connects to vfio_iommu_type1.  To disable VFIO
> > +	container and vfio_iommu_type1, the administrator could symbol link
> > +	/dev/vfio/vfio to /dev/iommu to enable VFIO container emulation
> > +	in IOMMUFD.
> > +
> > +	When CONFIG_IOMMUFD_VFIO_CONTAINER=y, IOMMUFD directly provides
> > +	/dev/vfio/vfio while the VFIO container and vfio_iommu_type1 are
> > +	explicitly disabled.
> > +
> 
> "The IOMMUFD backwards compatibility interface can be enabled two ways.
> In the first method, the kernel can be configured with
> CONFIG_IOMMUFD_VFIO_CONTAINER, in which case the IOMMUFD subsystem
> transparently provides the entire infrastructure for the the VFIO

Remove the duplicated "the".

> container and IOMMU backend interfaces.  The compatibility mode can
> also be accessed if the VFIO container interface, ie. /dev/vfio/vfio is
> simply symlink'd to /dev/iommu.  Note that at the time of writing, the
> compatibility mode is not entirely feature complete relative to
> VFIO_TYPE1v2_IOMMU (ex. DMA mapping MMIO) and does not attempt to
> provide compatibility to the VFIO_SPAPR_TCE_IOMMU interface.  Therefore
> it is not generally advisable at this time to switch from native VFIO
> implementations to the IOMMUFD compatibility interfaces.
> 
> Long term, VFIO users should migrate to device access through the cdev
> interface described below, and native access through the IOMMUFD
> provided interfaces."
> 
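
The symlink method described above can be checked from userspace.  A
minimal sketch, assuming a helper named demo_symlink_is() (a made-up
name, not part of any VFIO or IOMMUFD API) and a purely textual
comparison of the stored link target:

```c
#include <errno.h>
#include <string.h>
#include <unistd.h>

/*
 * Return 1 if link_path is a symlink whose stored target equals target,
 * 0 if it is not a symlink or points elsewhere, -1 on any other error.
 * The comparison is textual; relative targets are not resolved.
 */
static int demo_symlink_is(const char *link_path, const char *target)
{
	char buf[4096];
	ssize_t n = readlink(link_path, buf, sizeof(buf) - 1);

	if (n < 0)
		return errno == EINVAL ? 0 : -1;	/* EINVAL: not a symlink */
	buf[n] = '\0';
	return strcmp(buf, target) == 0;
}
```

Calling demo_symlink_is("/dev/vfio/vfio", "/dev/iommu") would then
return 1 only on a system where the administrator has set up the
symlink; it says nothing about CONFIG_IOMMUFD_VFIO_CONTAINER.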

Will apply the above suggestion. Thanks!

Regards,
Yi Liu
> 
> > +VFIO Device cdev
> > +----------------
> > +
> > +Traditionally user acquires a device fd via VFIO_GROUP_GET_DEVICE_FD
> > +in a VFIO group.
> > +
> > +With CONFIG_VFIO_DEVICE_CDEV=y the user can now acquire a device fd
> > +by directly opening a character device /dev/vfio/devices/vfioX where
> > +"X" is the number allocated uniquely by VFIO for registered devices.
> > +For noiommu devices, the character device would be named with "noiommu-"
> > +prefix. e.g. /dev/vfio/devices/noiommu-vfioX.
> > +
> > +The cdev only works with IOMMUFD.  Both VFIO drivers and applications
> > +must adapt to the new cdev security model which requires using
> > +VFIO_DEVICE_BIND_IOMMUFD to claim DMA ownership before starting to
> > +actually use the device.  Once BIND succeeds then a VFIO device can
> > +be fully accessed by the user.
> > +
> > +VFIO device cdev doesn't rely on VFIO group/container/iommu drivers.
> > +Hence those modules can be fully compiled out in an environment
> > +where no legacy VFIO application exists.
> > +
> > +So far SPAPR does not support IOMMUFD yet.  So it cannot support device
> > +cdev neither.
> > +
> > +Device cdev Example
> > +-------------------
> > +
> > +Assume user wants to access PCI device 0000:6a:01.0::
> > +
> > +	$ ls /sys/bus/pci/devices/0000:6a:01.0/vfio-dev/
> > +	vfio0
> > +
> > +This device is therefore represented as vfio0.  The user can verify
> > +its existence::
> > +
> > +	$ ls -l /dev/vfio/devices/vfio0
> > +	crw------- 1 root root 511, 0 Feb 16 01:22 /dev/vfio/devices/vfio0
> > +	$ cat /sys/bus/pci/devices/0000:6a:01.0/vfio-dev/vfio0/dev
> > +	511:0
> > +	$ ls -l /dev/char/511\:0
> > +	lrwxrwxrwx 1 root root 21 Feb 16 01:22 /dev/char/511:0 -> ../vfio/devices/vfio0
> > +
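
The sysfs lookup shown in the shell steps above could also be done
programmatically.  The following is a sketch only: both function names
are hypothetical, and the name check assumes nothing beyond the
"vfioX" and "noiommu-vfioX" forms mentioned in the quoted text.

```c
#include <ctype.h>
#include <dirent.h>
#include <stdio.h>
#include <string.h>

/* Accept "vfio<N>" or "noiommu-vfio<N>" and build the cdev path in out. */
static int demo_cdev_path_from_name(const char *name, char *out, size_t outlen)
{
	const char *p = name;
	int n;

	if (strncmp(p, "noiommu-", 8) == 0)
		p += 8;
	if (strncmp(p, "vfio", 4) != 0)
		return -1;
	p += 4;
	if (*p == '\0')
		return -1;		/* at least one digit is required */
	for (; *p; p++)
		if (!isdigit((unsigned char)*p))
			return -1;

	n = snprintf(out, outlen, "/dev/vfio/devices/%s", name);
	return (n >= 0 && (size_t)n < outlen) ? 0 : -1;
}

/*
 * Scan <sysfs_dev_dir>/vfio-dev and return the cdev path for the first
 * entry with a recognised name.  Returns 0 on success, -1 otherwise.
 */
static int demo_vfio_cdev_path(const char *sysfs_dev_dir, char *out, size_t outlen)
{
	char dir[512];
	struct dirent *de;
	DIR *d;
	int ret = -1;
	int n = snprintf(dir, sizeof(dir), "%s/vfio-dev", sysfs_dev_dir);

	if (n < 0 || (size_t)n >= sizeof(dir))
		return -1;

	d = opendir(dir);
	if (!d)
		return -1;

	while (ret != 0 && (de = readdir(d)) != NULL)
		ret = demo_cdev_path_from_name(de->d_name, out, outlen);

	closedir(d);
	return ret;
}
```

For the device in this example, the call would be
demo_vfio_cdev_path("/sys/bus/pci/devices/0000:6a:01.0", buf, sizeof(buf)).
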
> > +Then provide the user with access to the device if unprivileged
> > +operation is desired::
> > +
> > +	$ chown user:user /dev/vfio/devices/vfio0
> > +
> > +Finally the user could get cdev fd by::
> > +
> > +	cdev_fd = open("/dev/vfio/devices/vfio0", O_RDWR);
> > +
> > +An opened cdev_fd doesn't give the user any permission of accessing
> > +the device except binding the cdev_fd to an iommufd.  After that point
> > +then the device is fully accessible including attaching it to an
> > +IOMMUFD IOAS/HWPT to enable userspace DMA::
> > +
> > +	struct vfio_device_bind_iommufd bind = {
> > +		.argsz = sizeof(bind),
> > +		.flags = 0,
> > +	};
> > +	struct iommu_ioas_alloc alloc_data  = {
> > +		.size = sizeof(alloc_data),
> > +		.flags = 0,
> > +	};
> > +	struct vfio_device_attach_iommufd_pt attach_data = {
> > +		.argsz = sizeof(attach_data),
> > +		.flags = 0,
> > +	};
> > +	struct iommu_ioas_map map = {
> > +		.size = sizeof(map),
> > +		.flags = IOMMU_IOAS_MAP_READABLE |
> > +			 IOMMU_IOAS_MAP_WRITEABLE |
> > +			 IOMMU_IOAS_MAP_FIXED_IOVA,
> > +		.__reserved = 0,
> > +	};
> > +
> > +	iommufd = open("/dev/iommu", O_RDWR);
> > +
> > +	bind.iommufd = iommufd; // negative value means vfio-noiommu mode
> > +	ioctl(cdev_fd, VFIO_DEVICE_BIND_IOMMUFD, &bind);
> > +
> > +	ioctl(iommufd, IOMMU_IOAS_ALLOC, &alloc_data);
> > +	attach_data.pt_id = alloc_data.out_ioas_id;
> > +	ioctl(cdev_fd, VFIO_DEVICE_ATTACH_IOMMUFD_PT, &attach_data);
> > +
> > +	/* Allocate some space and setup a DMA mapping */
> > +	map.user_va = (int64_t)mmap(0, 1024 * 1024, PROT_READ | PROT_WRITE,
> > +				    MAP_PRIVATE | MAP_ANONYMOUS, 0, 0);
> > +	map.iova = 0; /* 1MB starting at 0x0 from device view */
> > +	map.length = 1024 * 1024;
> > +	map.ioas_id = alloc_data.out_ioas_id;
> > +
> > +	ioctl(iommufd, IOMMU_IOAS_MAP, &map);
> > +
> > +	/* Other device operations as stated in "VFIO Usage Example" */
> > +
> >  VFIO User API
> >  -------------------------------------------------------------------------------
> >
> > @@ -566,3 +685,11 @@ This implementation has some specifics:
> >  				\-0d.1
> >
> >  	00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev 90)
> > +
> > +.. [5] Nested translation is an IOMMU feature which supports two stage
> > +   address translations.  This improves the address translation efficiency
> > +   in IOMMU virtualization.
> > +
> > +.. [6] PASID stands for Process Address Space ID, introduced by PCI
> > +   Express.  It is a prerequisite for Shared Virtual Addressing (SVA)
> > +   and Scalable I/O Virtualization (Scalable IOV).



* Re: [PATCH v8 03/24] vfio: Remove vfio_file_is_group()
  2023-03-27  9:40 ` [PATCH v8 03/24] vfio: Remove vfio_file_is_group() Yi Liu
@ 2023-03-30 23:50   ` Jason Gunthorpe
  0 siblings, 0 replies; 54+ messages in thread
From: Jason Gunthorpe @ 2023-03-30 23:50 UTC (permalink / raw)
  To: Yi Liu
  Cc: alex.williamson, kevin.tian, joro, robin.murphy, cohuck,
	eric.auger, nicolinc, kvm, mjrosato, chao.p.peng, yi.y.sun,
	peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, intel-gvt-dev, intel-gfx, linux-s390,
	xudong.hao, yan.y.zhao, terrence.xu, yanting.jiang

On Mon, Mar 27, 2023 at 02:40:26AM -0700, Yi Liu wrote:
> since no user of vfio_file_is_group() now.
> 
> Reviewed-by: Kevin Tian <kevin.tian@intel.com>
> Tested-by: Terrence Xu <terrence.xu@intel.com>
> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> ---
>  drivers/vfio/group.c | 10 ----------
>  include/linux/vfio.h |  1 -
>  2 files changed, 11 deletions(-)

Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>

Jason


* Re: [PATCH v8 10/24] vfio: Make vfio_device_open() single open for device cdev path
  2023-03-27  9:40 ` [PATCH v8 10/24] vfio: Make vfio_device_open() single open for device cdev path Yi Liu
@ 2023-03-30 23:52   ` Jason Gunthorpe
  0 siblings, 0 replies; 54+ messages in thread
From: Jason Gunthorpe @ 2023-03-30 23:52 UTC (permalink / raw)
  To: Yi Liu
  Cc: alex.williamson, kevin.tian, joro, robin.murphy, cohuck,
	eric.auger, nicolinc, kvm, mjrosato, chao.p.peng, yi.y.sun,
	peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, intel-gvt-dev, intel-gfx, linux-s390,
	xudong.hao, yan.y.zhao, terrence.xu, yanting.jiang

On Mon, Mar 27, 2023 at 02:40:33AM -0700, Yi Liu wrote:
> VFIO group has historically allowed multi-open of the device FD. This
> was made secure because the "open" was executed via an ioctl to the
> group FD which is itself only single open.
> 
> However, no known use of multiple device FDs today. It is kind of a
> strange thing to do because new device FDs can naturally be created
> via dup().
> 
> When we implement the new device uAPI (only used in cdev path) there is
> no natural way to allow the device itself from being multi-opened in a
> secure manner. Without the group FD we cannot prove the security context
> of the opener.
> 
> Thus, when moving to the new uAPI we block the ability of opening
> a device multiple times. Given old group path still allows it we store
> a vfio_group pointer in struct vfio_device_file to differentiate.
> 
> Reviewed-by: Kevin Tian <kevin.tian@intel.com>
> Tested-by: Terrence Xu <terrence.xu@intel.com>
> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> ---
>  drivers/vfio/group.c     | 2 ++
>  drivers/vfio/vfio.h      | 2 ++
>  drivers/vfio/vfio_main.c | 7 +++++++
>  3 files changed, 11 insertions(+)
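
The rule in the commit message can be modelled in a few lines.  This is
a sketch, not the patch itself: the types and the via_group flag are
hypothetical (the commit message says a vfio_group pointer in struct
vfio_device_file is used to differentiate), and -EINVAL is an assumed
error code.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model types; the field names are illustrative only. */
struct sketch_device {
	unsigned int open_count;
};

struct sketch_device_file {
	struct sketch_device *device;
	bool via_group;		/* true: opened through the group path */
};

/*
 * Group path: repeated opens are allowed.  cdev path: only the first
 * open succeeds.  Exclusion between the two paths is not modelled.
 */
static int sketch_device_open(struct sketch_device_file *df)
{
	struct sketch_device *dev = df->device;

	if (dev->open_count != 0 && !df->via_group)
		return -EINVAL;		/* assumed error code */

	dev->open_count++;
	return 0;
}
```

Two opens through the group path both succeed, whereas the second open
through the cdev path is refused and open_count stays at 1.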

Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>

Jason


* Re: [PATCH v8 11/24] vfio: Make vfio_device_first_open() to accept NULL iommufd for noiommu
  2023-03-27  9:40 ` [PATCH v8 11/24] vfio: Make vfio_device_first_open() to accept NULL iommufd for noiommu Yi Liu
@ 2023-03-30 23:56   ` Jason Gunthorpe
  0 siblings, 0 replies; 54+ messages in thread
From: Jason Gunthorpe @ 2023-03-30 23:56 UTC (permalink / raw)
  To: Yi Liu
  Cc: alex.williamson, kevin.tian, joro, robin.murphy, cohuck,
	eric.auger, nicolinc, kvm, mjrosato, chao.p.peng, yi.y.sun,
	peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, intel-gvt-dev, intel-gfx, linux-s390,
	xudong.hao, yan.y.zhao, terrence.xu, yanting.jiang

On Mon, Mar 27, 2023 at 02:40:34AM -0700, Yi Liu wrote:
> vfio_device_first_open() requires the caller to provide either a valid
> iommufd (the group path in iommufd compat mode) or a valid container
> (the group path in legacy container mode). As preparation for noiommu
> support in device cdev path it's extended to allow both being NULL. The
> caller is expected to verify noiommu permission before passing NULL
> to this function.
> 
> Reviewed-by: Kevin Tian <kevin.tian@intel.com>
> Tested-by: Terrence Xu <terrence.xu@intel.com>
> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> ---
>  drivers/vfio/group.c     |  8 ++++++++
>  drivers/vfio/vfio.h      |  1 +
>  drivers/vfio/vfio_main.c | 12 ++++++++----
>  3 files changed, 17 insertions(+), 4 deletions(-)
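
The three cases described in the commit message (iommufd, container,
or neither) can be pictured as below.  This is a sketch with made-up
names, and rejecting the case where both pointers are set is an
assumption of the sketch rather than something the commit message
states.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical names; this only models the choice of backend. */
enum demo_backend {
	DEMO_BACKEND_IOMMUFD,
	DEMO_BACKEND_CONTAINER,
	DEMO_BACKEND_NOIOMMU,
};

/*
 * Pick a backend from the two optional pointers.  Both NULL means
 * noiommu; the caller must already have verified noiommu permission.
 */
static int demo_first_open_backend(const void *iommufd, const void *container,
				   enum demo_backend *out)
{
	if (iommufd && container)
		return -EINVAL;		/* assumption: at most one backend */

	if (iommufd)
		*out = DEMO_BACKEND_IOMMUFD;
	else if (container)
		*out = DEMO_BACKEND_CONTAINER;
	else
		*out = DEMO_BACKEND_NOIOMMU;
	return 0;
}
```

Note that the permission check is deliberately absent here, matching
the commit message's statement that the caller verifies it first.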

Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>

Jason


* RE: [PATCH v8 00/24] Add vfio_device cdev for iommufd support
  2023-03-27  9:40 [PATCH v8 00/24] Add vfio_device cdev for iommufd support Yi Liu
                   ` (24 preceding siblings ...)
  2023-03-27 17:48 ` [PATCH v8 00/24] Add vfio_device cdev for iommufd support Nicolin Chen
@ 2023-03-31  3:10 ` Jiang, Yanting
  2023-03-31  5:01 ` Jiang, Yanting
  26 siblings, 0 replies; 54+ messages in thread
From: Jiang, Yanting @ 2023-03-31  3:10 UTC (permalink / raw)
  To: Liu, Yi L, alex.williamson, jgg, Tian, Kevin
  Cc: joro, robin.murphy, cohuck, eric.auger, nicolinc, kvm, mjrosato,
	chao.p.peng, yi.y.sun, peterx, jasowang,
	shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	intel-gvt-dev, intel-gfx, linux-s390, Hao, Xudong, Zhao, Yan Y,
	Xu, Terrence

> On 2023/3/27 02:40, Yi Liu wrote:
> Existing VFIO provides group-centric user APIs for userspace. Userspace opens
> the /dev/vfio/$group_id first before getting device fd and hence getting access
> to device. This is not the desired model for iommufd. Per the conclusion of
> community discussion[1], iommufd provides device-centric kAPIs and requires its
> consumer (like VFIO) to be device-centric user APIs. Such user APIs are used to
> associate device with iommufd and also the I/O address spaces managed by the
> iommufd.
> 
> This series first introduces a per device file structure to be prepared for further
> enhancement and refactors the kvm-vfio code to be prepared for accepting
> device file from userspace. After this, adds a mechanism for blocking device
> access before iommufd bind. Then refactors the vfio to be able to handle cdev
> path (e.g. iommufd binding, no-iommufd, [de]attach ioas).
> This refactor includes making the device_open exclusive between group and
> cdev path, only allow single device open in cdev path; vfio-iommufd code is also
> refactored to support cdev. e.g. split the vfio_iommufd_bind() into two steps.
> Eventually, adds the cdev support for vfio device and the new ioctls, then makes
> group infrastructure optional as it is not needed when vfio device cdev is
> compiled.
> 
> This series is based on some preparation works done to vfio emulated devices[2]
> and vfio pci hot reset enhancements[3].
> 
> This series is a prerequisite for iommu nesting for vfio device[4] [5].
> 
> The complete code can be found in below branch, simple tests done to the
> legacy group path and the cdev path. Draft QEMU branch can be found at[6]
> 
> https://github.com/yiliu1765/iommufd/tree/vfio_device_cdev_v8
> (config CONFIG_IOMMUFD=y CONFIG_VFIO_DEVICE_CDEV=y)
> 

Tested NIC passthrough on Intel platform with above branch (commit id: 9464af85d280511639f8a3e27b6c2a2c5535fa4c).
The result looks good, hence:
Tested by: Jiang, Yanting <yanting.jiang@intel.com>

Thanks,
Yanting



* RE: [PATCH v8 00/24] Add vfio_device cdev for iommufd support
  2023-03-27  9:40 [PATCH v8 00/24] Add vfio_device cdev for iommufd support Yi Liu
                   ` (25 preceding siblings ...)
  2023-03-31  3:10 ` Jiang, Yanting
@ 2023-03-31  5:01 ` Jiang, Yanting
  26 siblings, 0 replies; 54+ messages in thread
From: Jiang, Yanting @ 2023-03-31  5:01 UTC (permalink / raw)
  To: Liu, Yi L, alex.williamson, jgg, Tian, Kevin
  Cc: joro, robin.murphy, cohuck, eric.auger, nicolinc, kvm, mjrosato,
	chao.p.peng, yi.y.sun, peterx, jasowang,
	shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	intel-gvt-dev, intel-gfx, linux-s390, Hao, Xudong, Zhao, Yan Y,
	Xu, Terrence

> Existing VFIO provides group-centric user APIs for userspace. Userspace opens
> the /dev/vfio/$group_id first before getting device fd and hence getting access
> to device. This is not the desired model for iommufd. Per the conclusion of
> community discussion[1], iommufd provides device-centric kAPIs and requires its
> consumer (like VFIO) to be device-centric user APIs. Such user APIs are used to
> associate device with iommufd and also the I/O address spaces managed by the
> iommufd.
> 
> This series first introduces a per device file structure to be prepared for further
> enhancement and refactors the kvm-vfio code to be prepared for accepting
> device file from userspace. After this, adds a mechanism for blocking device
> access before iommufd bind. Then refactors the vfio to be able to handle cdev
> path (e.g. iommufd binding, no-iommufd, [de]attach ioas).
> This refactor includes making the device_open exclusive between group and
> cdev path, only allow single device open in cdev path; vfio-iommufd code is also
> refactored to support cdev. e.g. split the vfio_iommufd_bind() into two steps.
> Eventually, adds the cdev support for vfio device and the new ioctls, then makes
> group infrastructure optional as it is not needed when vfio device cdev is
> compiled.
> 
> This series is based on some preparation works done to vfio emulated devices[2]
> and vfio pci hot reset enhancements[3].
> 
> This series is a prerequisite for iommu nesting for vfio device[4] [5].
> 
> The complete code can be found in below branch, simple tests done to the
> legacy group path and the cdev path. Draft QEMU branch can be found at[6]
> 
> https://github.com/yiliu1765/iommufd/tree/vfio_device_cdev_v8
> (config CONFIG_IOMMUFD=y CONFIG_VFIO_DEVICE_CDEV=y)

Tested-by: Yanting Jiang <yanting.jiang@intel.com>

Thanks,
Yanting


Thread overview: 54+ messages
2023-03-27  9:40 [PATCH v8 00/24] Add vfio_device cdev for iommufd support Yi Liu
2023-03-27  9:40 ` [PATCH v8 01/24] vfio: Allocate per device file structure Yi Liu
2023-03-27  9:40 ` [PATCH v8 02/24] vfio: Refine vfio file kAPIs for KVM Yi Liu
2023-03-27  9:40 ` [PATCH v8 03/24] vfio: Remove vfio_file_is_group() Yi Liu
2023-03-30 23:50   ` Jason Gunthorpe
2023-03-27  9:40 ` [PATCH v8 04/24] vfio: Accept vfio device file in the KVM facing kAPI Yi Liu
2023-03-27  9:40 ` [PATCH v8 05/24] kvm/vfio: Rename kvm_vfio_group to prepare for accepting vfio device fd Yi Liu
2023-03-27  9:40 ` [PATCH v8 06/24] kvm/vfio: Accept vfio device file from userspace Yi Liu
2023-03-27  9:40 ` [PATCH v8 07/24] vfio: Pass struct vfio_device_file * to vfio_device_open/close() Yi Liu
2023-03-27  9:40 ` [PATCH v8 08/24] vfio: Block device access via device fd until device is opened Yi Liu
2023-03-28 21:33   ` Alex Williamson
2023-03-29  2:23     ` Liu, Yi L
2023-03-27  9:40 ` [PATCH v8 09/24] vfio: Add cdev_device_open_cnt to vfio_group Yi Liu
2023-03-27  9:40 ` [PATCH v8 10/24] vfio: Make vfio_device_open() single open for device cdev path Yi Liu
2023-03-30 23:52   ` Jason Gunthorpe
2023-03-27  9:40 ` [PATCH v8 11/24] vfio: Make vfio_device_first_open() to accept NULL iommufd for noiommu Yi Liu
2023-03-30 23:56   ` Jason Gunthorpe
2023-03-27  9:40 ` [PATCH v8 12/24] vfio-iommufd: Move noiommu support out of vfio_iommufd_bind() Yi Liu
2023-03-27  9:40 ` [PATCH v8 13/24] vfio-iommufd: Split bind/attach into two steps Yi Liu
2023-03-27  9:40 ` [PATCH v8 14/24] vfio: Record devid in vfio_device_file Yi Liu
2023-03-27  9:40 ` [PATCH v8 15/24] vfio-iommufd: Add detach_ioas support for physical VFIO devices Yi Liu
2023-03-27  9:40 ` [PATCH v8 16/24] iommufd/device: Add iommufd_access_detach() API Yi Liu
2023-03-28  2:23   ` Jon Pan-Doh
2023-03-28 15:54     ` Nicolin Chen
2023-03-29  2:24       ` Liu, Yi L
2023-03-27  9:40 ` [PATCH v8 17/24] vfio-iommufd: Add detach_ioas support for emulated VFIO devices Yi Liu
2023-03-27  9:40 ` [PATCH v8 18/24] vfio: Determine noiommu in vfio_device registration Yi Liu
2023-03-28  6:36   ` Tian, Kevin
2023-03-27  9:40 ` [PATCH v8 19/24] vfio: Name noiommu vfio_device with "noiommu-" prefix Yi Liu
2023-03-28  6:37   ` Tian, Kevin
2023-03-27  9:40 ` [PATCH v8 20/24] vfio: Add cdev for vfio_device Yi Liu
2023-03-29 19:57   ` Alex Williamson
2023-03-30  5:35     ` Liu, Yi L
2023-03-27  9:40 ` [PATCH v8 21/24] vfio: Add VFIO_DEVICE_BIND_IOMMUFD Yi Liu
2023-03-29 21:00   ` Alex Williamson
2023-03-29 23:22     ` Jason Gunthorpe
2023-03-30 12:52       ` Liu, Yi L
2023-03-30 12:59         ` Jason Gunthorpe
2023-03-30  7:09     ` Liu, Yi L
2023-03-30 11:52       ` Jason Gunthorpe
2023-03-30 12:53         ` Liu, Yi L
2023-03-27  9:40 ` [PATCH v8 22/24] vfio: Add VFIO_DEVICE_AT[DE]TACH_IOMMUFD_PT Yi Liu
2023-03-29 21:19   ` Alex Williamson
2023-03-30 13:02     ` Liu, Yi L
2023-03-27  9:40 ` [PATCH v8 23/24] vfio: Compile group optionally Yi Liu
2023-03-29 21:51   ` Alex Williamson
2023-03-30 13:06     ` Liu, Yi L
2023-03-27  9:40 ` [PATCH v8 24/24] docs: vfio: Add vfio device cdev description Yi Liu
2023-03-29 22:47   ` Alex Williamson
2023-03-29 22:57     ` Alex Williamson
2023-03-30 13:11     ` Liu, Yi L
2023-03-27 17:48 ` [PATCH v8 00/24] Add vfio_device cdev for iommufd support Nicolin Chen
2023-03-31  3:10 ` Jiang, Yanting
2023-03-31  5:01 ` Jiang, Yanting
