* [PATCH v12 00/24] Add vfio_device cdev for iommufd support
From: Yi Liu @ 2023-06-02 12:16 UTC
  To: alex.williamson, jgg, kevin.tian
  Cc: joro, robin.murphy, cohuck, eric.auger, nicolinc, kvm, mjrosato,
	chao.p.peng, yi.l.liu, yi.y.sun, peterx, jasowang,
	shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	intel-gvt-dev, intel-gfx, linux-s390, xudong.hao, yan.y.zhao,
	terrence.xu, yanting.jiang, zhenzhong.duan, clegoate

Existing VFIO provides group-centric user APIs. Userspace must open
/dev/vfio/$group_id before it can get a device fd and hence access to the
device. This is not the desired model for iommufd. Per the conclusion of
the community discussion[1], iommufd provides device-centric kAPIs and
requires its consumers (like VFIO) to provide device-centric user APIs.
Such user APIs are used to associate a device with an iommufd and with the
I/O address spaces managed by that iommufd.

This series first introduces a per-device file structure to prepare for
further enhancements, and refactors the kvm-vfio code to prepare for
accepting a device file from userspace. It then adds a mechanism for
blocking device access before iommufd bind, and refactors vfio to handle
the cdev path (e.g. iommufd binding, no-iommufd, [de]attach ioas). This
refactor makes device_open exclusive between the group path and the cdev
path and allows only a single device open in the cdev path; the
vfio-iommufd code is also refactored to support cdev, e.g. by splitting
vfio_iommufd_bind() into two steps. Finally, the series adds cdev support
for vfio devices along with the new ioctls, and makes the group
infrastructure optional since it is not needed when vfio device cdev is
compiled.
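
The open rules described above (no device access before bind, group and
cdev opens mutually exclusive, a single open in the cdev path) can be
illustrated with a small userspace model. This is only a sketch: every
model_* name is hypothetical, the structures are not the kernel's, and the
error codes are arbitrary choices made for the illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical userspace model; not the kernel's data structures. */
struct model_device {
	unsigned int group_open_count;	/* opens done via the group path */
	bool cdev_opened;		/* cdev path: at most one open */
};

struct model_file {
	struct model_device *dev;
	bool access_granted;		/* stays false until open/bind succeeds */
};

/* Group path: may be opened many times, but never alongside a cdev open. */
int model_group_open(struct model_file *f)
{
	if (f->dev->cdev_opened)
		return -EBUSY;
	f->dev->group_open_count++;
	f->access_granted = true;
	return 0;
}

/* Cdev path: binding is what opens the device, and only one open is allowed. */
int model_cdev_bind(struct model_file *f)
{
	if (f->dev->group_open_count || f->dev->cdev_opened)
		return -EBUSY;
	f->dev->cdev_opened = true;
	f->access_granted = true;
	return 0;
}

/* Any device access is refused until this file has been granted access. */
int model_device_ioctl(const struct model_file *f)
{
	return f->access_granted ? 0 : -EINVAL;
}
```

In this model a file that has not been bound yet fails model_device_ioctl(),
and once one file has bound through the cdev path, further cdev binds and
group opens on the same device are refused.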

This series is based on some preparation work done for vfio emulated devices[2]
and vfio pci hot reset enhancements[3].

This series is a prerequisite for iommu nesting for vfio device[4] [5].

The complete code can be found in the branch below. Simple tests were done
on the legacy group path and the cdev path. A draft QEMU branch can be found
at [6]. However, the noiommu mode was only tested with some hacks in the
kernel and QEMU to check whether QEMU can boot with noiommu devices.

https://github.com/yiliu1765/iommufd/tree/vfio_device_cdev_v12
(config CONFIG_IOMMUFD=y CONFIG_VFIO_DEVICE_CDEV=y)

base-commit: 0948fa29d62eca627a19d5b1534262a6d93d4181

[1] https://lore.kernel.org/kvm/BN9PR11MB5433B1E4AE5B0480369F97178C189@BN9PR11MB5433.namprd11.prod.outlook.com/
[2] https://lore.kernel.org/kvm/20230327093351.44505-1-yi.l.liu@intel.com/ - merged
[3] https://lore.kernel.org/kvm/20230602121515.79374-1-yi.l.liu@intel.com/
[4] https://lore.kernel.org/linux-iommu/20230511143844.22693-1-yi.l.liu@intel.com/
[5] https://lore.kernel.org/linux-iommu/20230511145110.27707-1-yi.l.liu@intel.com/#t
[6] https://github.com/yiliu1765/qemu/tree/iommufd_rfcv4.mig.reset.v4_var3

Change log:

v12:
 - Rename vfio_device_xx() to be vfio_df_xx() if the object is vfio_device_file (Alex)
 - Refine patch 10 of v11 (Alex)
 - Add new device ioctls from offset 18 (Alex)
 - Add a patch to check group->type for noiommu test, no need to check
   CONFIG_VFIO_NOIOMMU (Alex)
 - Refine the logic of vfio_device_set_noiommu() per Alex's suggestion. The noiommu
   taint is moved to __vfio_register_dev(), also add a check on group type before
   calling vfio_device_set_noiommu() as only physical device can be noiommu device.
 - Drop noiommu support for cdev, patch 16 of v11 is dropped, the related changes
   are in patch 17 - 24 of this series.

v11: https://lore.kernel.org/kvm/20230513132827.39066-1-yi.l.liu@intel.com/
 - Add back the noiommu determination at vfio device registration patch and
   put it prior to compiling vfio_group code optionally as compiling vfio_group
   optionally is the major reason for it.
 - Fix a typo related to SPAPR (Cédric Le Goater)
 - Add t-b from Shameerali Kolothum Thodi, tested on HiSilicon D06(ARM64) platform
   with a NIC pass-through

v10: https://lore.kernel.org/kvm/20230426150321.454465-1-yi.l.liu@intel.com/
 - Drop patch 03 of v9 as vfio_file_is_group() is still needed by pci hot reset path
 - Drop 11 of v9 per the change of noiommu support
 - Move patch 18 of v9 to hot-reset series [3]
 - vfio_file_has_device_access() is dropped as no usage now (hot-reset does not accept
   device fd, hence no need for this helper)
 - Minor change to patch 02, mainly make it back to patch v2 of v6 which is before
   splitting hot-reset series
 - Minor change in 10 and 11 due to rebase
 - Functional changes in patch 19, 20 and 21 per the latest noiommu support
   policy. noiommu device can be bound to valid iommufd now, this is different
   from the prior policy in which noiommu device is not allowed to be bound to
   valid iommufd. So please pay more attention to these three patches; the
   previous r-b and t-b are dropped for them.

v9: https://lore.kernel.org/kvm/20230401151833.124749-1-yi.l.liu@intel.com/
 - Use smp_load_acquire() in vfio_file_has_device_access() for df->access_granted (Alex)
 - Fix lock init in patch 16 of v8 (Jon Pan-Doh)
 - Split patch 20 of v8 (Alex)
 - Refine noiommu logic in BIND_IOMMUFD (Alex)
 - Remove dev_cookie in BIND_IOMMUFD ioctl (Alex, Jason)
 - Remove static_assert in ATTACH/DETACH ioctl handling (Alex)
 - Remove device->ops->bind_iommufd presence check in BIND_IOMMUFD/ATTACH/DETACH handling (Alex)
 - Remove VFIO dependency for VFIO_CONTAINER as VFIO_GROUP should imply it (Alex)
 - Improve the documentation per suggestions from Alex on patch 24 of v8 (Alex)
 - Remove WARN_ON(df->group) in vfio_device_group_uses_container() of patch 11
 - Add r-b from Kevin to patch 18/19 of v8
 - Add r-b from Jason to patch 03/10/11 of v8
 - Add t-b from Yanting Jiang and Nicolin Chen

v8: https://lore.kernel.org/kvm/20230327094047.47215-1-yi.l.liu@intel.com/
 - Add patch 18 to determine noiommu device at vfio_device registration (Jason)
 - Add patch 19 to name noiommu device with "noiommu-" prefix to be on par with
   group path
 - Add r-b from Kevin
 - Add t-b from Terrence

v7: https://lore.kernel.org/kvm/20230316125534.17216-1-yi.l.liu@intel.com/
 - Split the vfio-pci hot reset changes to be separate patch series (Jason, Kevin)
 - More polish on no-iommufd support (patch 11 - 13) in cdev path (Kevin)
 - iommufd_access_detach() in patch 16 is added by Nic for emulated devices (Kevin, Jason)

v6: https://lore.kernel.org/kvm/20230308132903.465159-1-yi.l.liu@intel.com/#t
 - Add r-b from Jason on patch 01 - 08 and 13 in v5
 - Based on the prerequisite mini-series which makes vfio emulated devices
   be prepared to cdev (Jason)
 - Add the approach to pass a set of device fds to do hot reset ownership
   check, while the zero-length array approach is also kept. (Jason, Kevin, Alex)
 - Drop patch 10 of v5, it is reworked by patch 13 and 17 in v6 (Jason)
 - Store vfio_group pointer in vfio_device_file to check if user is using
   legacy vfio container (Jason)
 - Drop the is_cdev_device flag (introduced in patch 14 of v5) as the group
   pointer stored in vfio_device_file can cover it.
 - Add iommu_group check in the cdev no-iommu path patch 24 (Kevin)
 - Add t-b from Terrence, Nicolin and Matthew (thanks for the help, some patches
   are new in this version, so I just added t-b to the patches that are also
   in v5 and no big change, for others would add in this version).

v5: https://lore.kernel.org/kvm/20230227111135.61728-1-yi.l.liu@intel.com/
 - Add r-b from Kevin on patch 08, 13, 14, 15 and 17.
 - Rename patch 02 to limit the change for KVM facing kAPIs. The vfio pci
   hot reset path only accepts group file until patch 09. (Kevin)
 - Update comment around smp_load_acquire(&df->access_granted) (Yan)
 - Adopt Jason's suggestion on the vfio pci hot reset path, passing zero-length
   fd array to indicate using bound iommufd_ctx as ownership check. (Jason, Kevin)
 - Direct read df->access_granted value in vfio_device_cdev_close() (Kevin, Yan, Jason)
 - Wrap the iommufd get/put into a helper to refine the error path of
   vfio_device_ioctl_bind_iommufd(). (Yan)

v4: https://lore.kernel.org/kvm/20230221034812.138051-1-yi.l.liu@intel.com/
 - Add r-b from Kevin on patch 09/10
 - Add a line in devices/vfio.rst to emphasize user should add group/device to
   KVM prior to invoke open_device op which may be called in the VFIO_GROUP_GET_DEVICE_FD
   or VFIO_DEVICE_BIND_IOMMUFD ioctl.
 - Modify VFIO_GROUP/VFIO_DEVICE_CDEV Kconfig dependency (Alex)
 - Select VFIO_GROUP for SPAPR (Jason)
 - Check device fully-opened in PCI hotreset path for device fd (Jason)
 - Set df->access_granted in the caller of vfio_device_open() since
   the caller may fail in other operations, but df->access_granted
   does not allow a true to false change. So it should be set only when
   the open path is really done successfully. (Yan, Kevin)
 - Fix missing iommufd_ctx_put() in the cdev path (Yan)
 - Fix an issue found in testing exclusion between group and cdev path.
   vfio_device_cdev_close() should check df->access_granted before heading
   to other operations.
 - Update vfio.rst for iommufd/cdev

v3: https://lore.kernel.org/kvm/20230213151348.56451-1-yi.l.liu@intel.com/
 - Add r-b from Kevin on patch 03, 06, 07, 08.
 - Refine the group and cdev path exclusion. Remove vfio_device:single_open;
   add vfio_group::cdev_device_open_cnt to achieve exclusion between group
   path and cdev path (Kevin, Jason)
 - Fix a bug in the error handling path (Yan Zhao)
 - Address misc remarks from Kevin

v2: https://lore.kernel.org/kvm/20230206090532.95598-1-yi.l.liu@intel.com/
 - Add r-b from Kevin and Eric on patch 01 02 04.
 - "Split kvm/vfio: Provide struct kvm_device_ops::release() instead of ::destroy()"
   from this series and got applied. (Alex, Kevin, Jason, Matthew)
 - Add kvm_ref_lock to protect vfio_device_file->kvm instead of reusing
   dev_set->lock as dead-lock is observed with vfio-ap which would try to
   acquire kvm_lock. This is opposite lock order with kvm_device_release()
   which holds kvm_lock first and then hold dev_set->lock. (Kevin)
 - Use a separate ioctl for detaching IOAS. (Alex)
 - Rename vfio_device_file::single_open to be is_cdev_device (Kevin, Alex)
 - Move the vfio device cdev code into device_cdev.c and add a VFIO_DEVICE_CDEV
   kconfig for it. (Kevin, Jason)

v1: https://lore.kernel.org/kvm/20230117134942.101112-1-yi.l.liu@intel.com/
 - Fix the circular refcount between kvm struct and device file reference. (JasonG)
 - Address comments from KevinT
 - Kept the ioctl for detach; this is subject to Alex's taste
   (https://lore.kernel.org/kvm/BN9PR11MB5276BE9F4B0613EE859317028CFF9@BN9PR11MB5276.namprd11.prod.outlook.com/)

rfc: https://lore.kernel.org/kvm/20221219084718.9342-1-yi.l.liu@intel.com/

Thanks,
	Yi Liu

Nicolin Chen (1):
  iommufd/device: Add iommufd_access_detach() API

Yi Liu (23):
  vfio: Allocate per device file structure
  vfio: Refine vfio file kAPIs for KVM
  vfio: Accept vfio device file in the KVM facing kAPI
  kvm/vfio: Prepare for accepting vfio device fd
  kvm/vfio: Accept vfio device file from userspace
  vfio: Pass struct vfio_device_file * to vfio_device_open/close()
  vfio: Block device access via device fd until device is opened
  vfio: Add cdev_device_open_cnt to vfio_group
  vfio: Make vfio_df_open() single open for device cdev path
  vfio-iommufd: Move noiommu compat validation out of
    vfio_iommufd_bind()
  vfio-iommufd: Split bind/attach into two steps
  vfio: Record devid in vfio_device_file
  vfio-iommufd: Add detach_ioas support for physical VFIO devices
  vfio-iommufd: Add detach_ioas support for emulated VFIO devices
  vfio: Move vfio_device_group_unregister() to be the first operation in
    unregister
  vfio: Add cdev for vfio_device
  vfio: Add VFIO_DEVICE_BIND_IOMMUFD
  vfio: Add VFIO_DEVICE_[AT|DE]TACH_IOMMUFD_PT
  vfio: Only check group->type for noiommu test
  vfio: Determine noiommu device in __vfio_register_dev()
  vfio: Remove vfio_device_is_noiommu()
  vfio: Compile vfio_group infrastructure optionally
  docs: vfio: Add vfio device cdev description

 Documentation/driver-api/vfio.rst             | 140 +++++++++-
 Documentation/virt/kvm/devices/vfio.rst       |  47 ++--
 drivers/gpu/drm/i915/gvt/kvmgt.c              |   1 +
 drivers/iommu/iommufd/Kconfig                 |   4 +-
 drivers/iommu/iommufd/device.c                |  76 +++++-
 drivers/iommu/iommufd/iommufd_private.h       |   2 +
 drivers/s390/cio/vfio_ccw_ops.c               |   1 +
 drivers/s390/crypto/vfio_ap_ops.c             |   1 +
 drivers/vfio/Kconfig                          |  27 ++
 drivers/vfio/Makefile                         |   3 +-
 drivers/vfio/device_cdev.c                    | 251 ++++++++++++++++++
 drivers/vfio/fsl-mc/vfio_fsl_mc.c             |   1 +
 drivers/vfio/group.c                          | 174 +++++++-----
 drivers/vfio/iommufd.c                        |  96 ++++---
 .../vfio/pci/hisilicon/hisi_acc_vfio_pci.c    |   2 +
 drivers/vfio/pci/mlx5/main.c                  |   1 +
 drivers/vfio/pci/vfio_pci.c                   |   1 +
 drivers/vfio/platform/vfio_amba.c             |   1 +
 drivers/vfio/platform/vfio_platform.c         |   1 +
 drivers/vfio/vfio.h                           | 218 +++++++++++++--
 drivers/vfio/vfio_main.c                      | 244 +++++++++++++++--
 include/linux/iommufd.h                       |   1 +
 include/linux/vfio.h                          |  45 +++-
 include/uapi/linux/kvm.h                      |  13 +-
 include/uapi/linux/vfio.h                     |  69 +++++
 samples/vfio-mdev/mbochs.c                    |   1 +
 samples/vfio-mdev/mdpy.c                      |   1 +
 samples/vfio-mdev/mtty.c                      |   1 +
 virt/kvm/vfio.c                               | 137 +++++-----
 29 files changed, 1317 insertions(+), 243 deletions(-)
 create mode 100644 drivers/vfio/device_cdev.c

-- 
2.34.1


* [PATCH v12 01/24] vfio: Allocate per device file structure
From: Yi Liu @ 2023-06-02 12:16 UTC
  To: alex.williamson, jgg, kevin.tian
  Cc: joro, robin.murphy, cohuck, eric.auger, nicolinc, kvm, mjrosato,
	chao.p.peng, yi.l.liu, yi.y.sun, peterx, jasowang,
	shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	intel-gvt-dev, intel-gfx, linux-s390, xudong.hao, yan.y.zhao,
	terrence.xu, yanting.jiang, zhenzhong.duan, clegoate

This is preparation for adding vfio device cdev support. vfio device
cdev requires:
1) A per device file memory to store the kvm pointer set by KVM. It will
   be propagated to vfio_device:kvm after the device cdev file is bound
   to an iommufd.
2) A mechanism to block device access through device cdev fd before it
   is bound to an iommufd.

To address the above requirements, this adds a per device file structure
named vfio_device_file. For now, it only wraps a struct vfio_device
pointer. Other fields will be added to this per file structure in future
commits.
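
As a rough illustration of the wrapper pattern, the following userspace
sketch mirrors the shape of the change: each open allocates its own wrapper,
stores it as the file's private data, and frees it on release. All demo_*
names are hypothetical stand-ins, and calloc()/free() stand in for
kzalloc()/kfree(); it is not the kernel code.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct vfio_device. */
struct demo_device {
	int open_count;
};

/* Per-open-file wrapper; for now it only carries the device pointer. */
struct demo_device_file {
	struct demo_device *device;
};

/* Hypothetical stand-in for struct file. */
struct demo_file {
	void *private_data;
};

/* Allocate a zeroed wrapper and point it at the device. */
struct demo_device_file *demo_allocate_device_file(struct demo_device *device)
{
	struct demo_device_file *df = calloc(1, sizeof(*df));

	if (!df)
		return NULL;
	df->device = device;
	return df;
}

/* Open: the wrapper, not the device, becomes the file's private data. */
int demo_open(struct demo_file *filep, struct demo_device *device)
{
	struct demo_device_file *df = demo_allocate_device_file(device);

	if (!df)
		return -1;
	device->open_count++;
	filep->private_data = df;
	return 0;
}

/* Release: reach the device through the wrapper, then free the wrapper. */
void demo_release(struct demo_file *filep)
{
	struct demo_device_file *df = filep->private_data;

	df->device->open_count--;
	free(df);
	filep->private_data = NULL;
}
```

Two opens of the same device get two distinct wrappers that both point at
the same device, which is what lets per-file state be added later without
touching the device structure.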

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Tested-by: Terrence Xu <terrence.xu@intel.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Tested-by: Matthew Rosato <mjrosato@linux.ibm.com>
Tested-by: Yanting Jiang <yanting.jiang@intel.com>
Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 drivers/vfio/group.c     | 13 +++++++++++--
 drivers/vfio/vfio.h      |  6 ++++++
 drivers/vfio/vfio_main.c | 31 ++++++++++++++++++++++++++-----
 3 files changed, 43 insertions(+), 7 deletions(-)

diff --git a/drivers/vfio/group.c b/drivers/vfio/group.c
index fc75c1000d74..fbba9fc15e57 100644
--- a/drivers/vfio/group.c
+++ b/drivers/vfio/group.c
@@ -218,19 +218,26 @@ void vfio_device_group_close(struct vfio_device *device)
 
 static struct file *vfio_device_open_file(struct vfio_device *device)
 {
+	struct vfio_device_file *df;
 	struct file *filep;
 	int ret;
 
+	df = vfio_allocate_device_file(device);
+	if (IS_ERR(df)) {
+		ret = PTR_ERR(df);
+		goto err_out;
+	}
+
 	ret = vfio_device_group_open(device);
 	if (ret)
-		goto err_out;
+		goto err_free;
 
 	/*
 	 * We can't use anon_inode_getfd() because we need to modify
 	 * the f_mode flags directly to allow more than just ioctls
 	 */
 	filep = anon_inode_getfile("[vfio-device]", &vfio_device_fops,
-				   device, O_RDWR);
+				   df, O_RDWR);
 	if (IS_ERR(filep)) {
 		ret = PTR_ERR(filep);
 		goto err_close_device;
@@ -254,6 +261,8 @@ static struct file *vfio_device_open_file(struct vfio_device *device)
 
 err_close_device:
 	vfio_device_group_close(device);
+err_free:
+	kfree(df);
 err_out:
 	return ERR_PTR(ret);
 }
diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
index 7b19c621e0e6..87d3dd6b9ef9 100644
--- a/drivers/vfio/vfio.h
+++ b/drivers/vfio/vfio.h
@@ -16,11 +16,17 @@ struct iommufd_ctx;
 struct iommu_group;
 struct vfio_container;
 
+struct vfio_device_file {
+	struct vfio_device *device;
+};
+
 void vfio_device_put_registration(struct vfio_device *device);
 bool vfio_device_try_get_registration(struct vfio_device *device);
 int vfio_device_open(struct vfio_device *device, struct iommufd_ctx *iommufd);
 void vfio_device_close(struct vfio_device *device,
 		       struct iommufd_ctx *iommufd);
+struct vfio_device_file *
+vfio_allocate_device_file(struct vfio_device *device);
 
 extern const struct file_operations vfio_device_fops;
 
diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
index ab4f3a794f78..39c1158ffef0 100644
--- a/drivers/vfio/vfio_main.c
+++ b/drivers/vfio/vfio_main.c
@@ -419,6 +419,20 @@ static bool vfio_assert_device_open(struct vfio_device *device)
 	return !WARN_ON_ONCE(!READ_ONCE(device->open_count));
 }
 
+struct vfio_device_file *
+vfio_allocate_device_file(struct vfio_device *device)
+{
+	struct vfio_device_file *df;
+
+	df = kzalloc(sizeof(*df), GFP_KERNEL_ACCOUNT);
+	if (!df)
+		return ERR_PTR(-ENOMEM);
+
+	df->device = device;
+
+	return df;
+}
+
 static int vfio_device_first_open(struct vfio_device *device,
 				  struct iommufd_ctx *iommufd)
 {
@@ -532,12 +546,15 @@ static inline void vfio_device_pm_runtime_put(struct vfio_device *device)
  */
 static int vfio_device_fops_release(struct inode *inode, struct file *filep)
 {
-	struct vfio_device *device = filep->private_data;
+	struct vfio_device_file *df = filep->private_data;
+	struct vfio_device *device = df->device;
 
 	vfio_device_group_close(device);
 
 	vfio_device_put_registration(device);
 
+	kfree(df);
+
 	return 0;
 }
 
@@ -1102,7 +1119,8 @@ static int vfio_ioctl_device_feature(struct vfio_device *device,
 static long vfio_device_fops_unl_ioctl(struct file *filep,
 				       unsigned int cmd, unsigned long arg)
 {
-	struct vfio_device *device = filep->private_data;
+	struct vfio_device_file *df = filep->private_data;
+	struct vfio_device *device = df->device;
 	int ret;
 
 	ret = vfio_device_pm_runtime_get(device);
@@ -1129,7 +1147,8 @@ static long vfio_device_fops_unl_ioctl(struct file *filep,
 static ssize_t vfio_device_fops_read(struct file *filep, char __user *buf,
 				     size_t count, loff_t *ppos)
 {
-	struct vfio_device *device = filep->private_data;
+	struct vfio_device_file *df = filep->private_data;
+	struct vfio_device *device = df->device;
 
 	if (unlikely(!device->ops->read))
 		return -EINVAL;
@@ -1141,7 +1160,8 @@ static ssize_t vfio_device_fops_write(struct file *filep,
 				      const char __user *buf,
 				      size_t count, loff_t *ppos)
 {
-	struct vfio_device *device = filep->private_data;
+	struct vfio_device_file *df = filep->private_data;
+	struct vfio_device *device = df->device;
 
 	if (unlikely(!device->ops->write))
 		return -EINVAL;
@@ -1151,7 +1171,8 @@ static ssize_t vfio_device_fops_write(struct file *filep,
 
 static int vfio_device_fops_mmap(struct file *filep, struct vm_area_struct *vma)
 {
-	struct vfio_device *device = filep->private_data;
+	struct vfio_device_file *df = filep->private_data;
+	struct vfio_device *device = df->device;
 
 	if (unlikely(!device->ops->mmap))
 		return -EINVAL;
-- 
2.34.1
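
The per-file wrapper and its error unwinding in the diff above can be
sketched in userspace C. This is only an illustration under stated
assumptions: calloc()/free() stand in for kzalloc()/kfree(), NULL stands in
for ERR_PTR(), and struct device, device_open() and its "fail" switch are
hypothetical stand-ins, not the VFIO definitions.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for struct vfio_device / struct vfio_device_file. */
struct device { int open_count; };
struct device_file { struct device *device; };

/* Allocate the per-file wrapper and point it at the device. */
static struct device_file *allocate_device_file(struct device *dev)
{
	struct device_file *df = calloc(1, sizeof(*df));

	if (!df)
		return NULL;
	df->device = dev;
	return df;
}

/* Hypothetical open step; "fail" forces the error path for illustration. */
static int device_open(struct device *dev, int fail)
{
	if (fail)
		return -1;
	dev->open_count++;
	return 0;
}

/* Allocate first, then open; unwind in reverse order on failure. */
static struct device_file *open_file(struct device *dev, int fail)
{
	struct device_file *df;

	assert(dev != NULL);
	df = allocate_device_file(dev);
	if (!df)
		goto err_out;
	if (device_open(dev, fail))
		goto err_free;
	return df;

err_free:
	free(df);
err_out:
	return NULL;
}

/* Release mirrors open: close the device, then free the wrapper. */
static void release_file(struct device_file *df)
{
	df->device->open_count--;
	free(df);
}
```

Allocating the wrapper before opening the device means a failed allocation
needs no device cleanup, which is the same ordering the err_free/err_out
labels express in the patch.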



* [PATCH v12 02/24] vfio: Refine vfio file kAPIs for KVM
  2023-06-02 12:16 ` [Intel-gfx] " Yi Liu
@ 2023-06-02 12:16   ` Yi Liu
  -1 siblings, 0 replies; 180+ messages in thread
From: Yi Liu @ 2023-06-02 12:16 UTC (permalink / raw)
  To: alex.williamson, jgg, kevin.tian
  Cc: joro, robin.murphy, cohuck, eric.auger, nicolinc, kvm, mjrosato,
	chao.p.peng, yi.l.liu, yi.y.sun, peterx, jasowang,
	shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	intel-gvt-dev, intel-gfx, linux-s390, xudong.hao, yan.y.zhao,
	terrence.xu, yanting.jiang, zhenzhong.duan, clegoate

This prepares the kAPIs below to accept both vfio group files and vfio
device files, instead of only vfio group files.

  bool vfio_file_enforced_coherent(struct file *file);
  void vfio_file_set_kvm(struct file *file, struct kvm *kvm);
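
The file-type check that this refactor centralizes can be sketched in
userspace C. This is a minimal illustration, assuming simplified stand-ins
for struct file and struct file_operations; group_fops, other_fops and
struct group are hypothetical names, not the kernel definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins for the kernel types. */
struct file_operations { const char *name; };
struct file {
	const struct file_operations *f_op;
	void *private_data;
};
struct group { int id; };

static const struct file_operations group_fops = { .name = "group" };
static const struct file_operations other_fops = { .name = "other" };

/*
 * Return the group behind the file, or NULL if the file was not opened
 * with group_fops. Callers branch on the result instead of repeating
 * the f_op comparison themselves.
 */
static struct group *group_from_file(struct file *file)
{
	assert(file != NULL);
	if (file->f_op != &group_fops)
		return NULL;
	return file->private_data;
}

/* A file is valid here if it resolves to a group. */
static bool file_is_valid(struct file *file)
{
	return group_from_file(file) != NULL;
}
```

Comparing f_op against a known operations table is what makes the cast of
private_data safe: the pointer is only interpreted once the file type is
established.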

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Tested-by: Terrence Xu <terrence.xu@intel.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Tested-by: Matthew Rosato <mjrosato@linux.ibm.com>
Tested-by: Yanting Jiang <yanting.jiang@intel.com>
Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 drivers/vfio/group.c     | 53 +++++++++++++---------------------------
 drivers/vfio/vfio.h      |  3 +++
 drivers/vfio/vfio_main.c | 49 +++++++++++++++++++++++++++++++++++++
 include/linux/vfio.h     |  1 +
 virt/kvm/vfio.c          | 10 ++++----
 5 files changed, 75 insertions(+), 41 deletions(-)

diff --git a/drivers/vfio/group.c b/drivers/vfio/group.c
index fbba9fc15e57..b56e19d2a02d 100644
--- a/drivers/vfio/group.c
+++ b/drivers/vfio/group.c
@@ -754,6 +754,15 @@ bool vfio_device_has_container(struct vfio_device *device)
 	return device->group->container;
 }
 
+struct vfio_group *vfio_group_from_file(struct file *file)
+{
+	struct vfio_group *group = file->private_data;
+
+	if (file->f_op != &vfio_group_fops)
+		return NULL;
+	return group;
+}
+
 /**
  * vfio_file_iommu_group - Return the struct iommu_group for the vfio group file
  * @file: VFIO group file
@@ -764,13 +773,13 @@ bool vfio_device_has_container(struct vfio_device *device)
  */
 struct iommu_group *vfio_file_iommu_group(struct file *file)
 {
-	struct vfio_group *group = file->private_data;
+	struct vfio_group *group = vfio_group_from_file(file);
 	struct iommu_group *iommu_group = NULL;
 
 	if (!IS_ENABLED(CONFIG_SPAPR_TCE_IOMMU))
 		return NULL;
 
-	if (!vfio_file_is_group(file))
+	if (!group)
 		return NULL;
 
 	mutex_lock(&group->group_lock);
@@ -784,33 +793,20 @@ struct iommu_group *vfio_file_iommu_group(struct file *file)
 EXPORT_SYMBOL_GPL(vfio_file_iommu_group);
 
 /**
- * vfio_file_is_group - True if the file is usable with VFIO aPIS
+ * vfio_file_is_group - True if the file is a vfio group file
  * @file: VFIO group file
  */
 bool vfio_file_is_group(struct file *file)
 {
-	return file->f_op == &vfio_group_fops;
+	return vfio_group_from_file(file);
 }
 EXPORT_SYMBOL_GPL(vfio_file_is_group);
 
-/**
- * vfio_file_enforced_coherent - True if the DMA associated with the VFIO file
- *        is always CPU cache coherent
- * @file: VFIO group file
- *
- * Enforced coherency means that the IOMMU ignores things like the PCIe no-snoop
- * bit in DMA transactions. A return of false indicates that the user has
- * rights to access additional instructions such as wbinvd on x86.
- */
-bool vfio_file_enforced_coherent(struct file *file)
+bool vfio_group_enforced_coherent(struct vfio_group *group)
 {
-	struct vfio_group *group = file->private_data;
 	struct vfio_device *device;
 	bool ret = true;
 
-	if (!vfio_file_is_group(file))
-		return true;
-
 	/*
 	 * If the device does not have IOMMU_CAP_ENFORCE_CACHE_COHERENCY then
 	 * any domain later attached to it will also not support it. If the cap
@@ -828,28 +824,13 @@ bool vfio_file_enforced_coherent(struct file *file)
 	mutex_unlock(&group->device_lock);
 	return ret;
 }
-EXPORT_SYMBOL_GPL(vfio_file_enforced_coherent);
 
-/**
- * vfio_file_set_kvm - Link a kvm with VFIO drivers
- * @file: VFIO group file
- * @kvm: KVM to link
- *
- * When a VFIO device is first opened the KVM will be available in
- * device->kvm if one was associated with the group.
- */
-void vfio_file_set_kvm(struct file *file, struct kvm *kvm)
+void vfio_group_set_kvm(struct vfio_group *group, struct kvm *kvm)
 {
-	struct vfio_group *group = file->private_data;
-
-	if (!vfio_file_is_group(file))
-		return;
-
 	spin_lock(&group->kvm_ref_lock);
 	group->kvm = kvm;
 	spin_unlock(&group->kvm_ref_lock);
 }
-EXPORT_SYMBOL_GPL(vfio_file_set_kvm);
 
 /**
  * vfio_file_has_dev - True if the VFIO file is a handle for device
@@ -860,9 +841,9 @@ EXPORT_SYMBOL_GPL(vfio_file_set_kvm);
  */
 bool vfio_file_has_dev(struct file *file, struct vfio_device *device)
 {
-	struct vfio_group *group = file->private_data;
+	struct vfio_group *group = vfio_group_from_file(file);
 
-	if (!vfio_file_is_group(file))
+	if (!group)
 		return false;
 
 	return group == device->group;
diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
index 87d3dd6b9ef9..b1e327a85a32 100644
--- a/drivers/vfio/vfio.h
+++ b/drivers/vfio/vfio.h
@@ -90,6 +90,9 @@ void vfio_device_group_unregister(struct vfio_device *device);
 int vfio_device_group_use_iommu(struct vfio_device *device);
 void vfio_device_group_unuse_iommu(struct vfio_device *device);
 void vfio_device_group_close(struct vfio_device *device);
+struct vfio_group *vfio_group_from_file(struct file *file);
+bool vfio_group_enforced_coherent(struct vfio_group *group);
+void vfio_group_set_kvm(struct vfio_group *group, struct kvm *kvm);
 bool vfio_device_has_container(struct vfio_device *device);
 int __init vfio_group_init(void);
 void vfio_group_cleanup(void);
diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
index 39c1158ffef0..4665791aa2eb 100644
--- a/drivers/vfio/vfio_main.c
+++ b/drivers/vfio/vfio_main.c
@@ -1190,6 +1190,55 @@ const struct file_operations vfio_device_fops = {
 	.mmap		= vfio_device_fops_mmap,
 };
 
+/**
+ * vfio_file_is_valid - True if the file is valid vfio file
+ * @file: VFIO group file or VFIO device file
+ */
+bool vfio_file_is_valid(struct file *file)
+{
+	return vfio_group_from_file(file);
+}
+EXPORT_SYMBOL_GPL(vfio_file_is_valid);
+
+/**
+ * vfio_file_enforced_coherent - True if the DMA associated with the VFIO file
+ *        is always CPU cache coherent
+ * @file: VFIO group file or VFIO device file
+ *
+ * Enforced coherency means that the IOMMU ignores things like the PCIe no-snoop
+ * bit in DMA transactions. A return of false indicates that the user has
+ * rights to access additional instructions such as wbinvd on x86.
+ */
+bool vfio_file_enforced_coherent(struct file *file)
+{
+	struct vfio_group *group;
+
+	group = vfio_group_from_file(file);
+	if (group)
+		return vfio_group_enforced_coherent(group);
+
+	return true;
+}
+EXPORT_SYMBOL_GPL(vfio_file_enforced_coherent);
+
+/**
+ * vfio_file_set_kvm - Link a kvm with VFIO drivers
+ * @file: VFIO group file or VFIO device file
+ * @kvm: KVM to link
+ *
+ * When a VFIO device is first opened the KVM will be available in
+ * device->kvm if one was associated with the file.
+ */
+void vfio_file_set_kvm(struct file *file, struct kvm *kvm)
+{
+	struct vfio_group *group;
+
+	group = vfio_group_from_file(file);
+	if (group)
+		vfio_group_set_kvm(group, kvm);
+}
+EXPORT_SYMBOL_GPL(vfio_file_set_kvm);
+
 /*
  * Sub-module support
  */
diff --git a/include/linux/vfio.h b/include/linux/vfio.h
index 382a7b119c7c..974f8bcf917a 100644
--- a/include/linux/vfio.h
+++ b/include/linux/vfio.h
@@ -274,6 +274,7 @@ int vfio_mig_get_next_state(struct vfio_device *device,
  */
 struct iommu_group *vfio_file_iommu_group(struct file *file);
 bool vfio_file_is_group(struct file *file);
+bool vfio_file_is_valid(struct file *file);
 bool vfio_file_enforced_coherent(struct file *file);
 void vfio_file_set_kvm(struct file *file, struct kvm *kvm);
 bool vfio_file_has_dev(struct file *file, struct vfio_device *device);
diff --git a/virt/kvm/vfio.c b/virt/kvm/vfio.c
index 9584eb57e0ed..b33c7b8488b3 100644
--- a/virt/kvm/vfio.c
+++ b/virt/kvm/vfio.c
@@ -64,18 +64,18 @@ static bool kvm_vfio_file_enforced_coherent(struct file *file)
 	return ret;
 }
 
-static bool kvm_vfio_file_is_group(struct file *file)
+static bool kvm_vfio_file_is_valid(struct file *file)
 {
 	bool (*fn)(struct file *file);
 	bool ret;
 
-	fn = symbol_get(vfio_file_is_group);
+	fn = symbol_get(vfio_file_is_valid);
 	if (!fn)
 		return false;
 
 	ret = fn(file);
 
-	symbol_put(vfio_file_is_group);
+	symbol_put(vfio_file_is_valid);
 
 	return ret;
 }
@@ -154,8 +154,8 @@ static int kvm_vfio_group_add(struct kvm_device *dev, unsigned int fd)
 	if (!filp)
 		return -EBADF;
 
-	/* Ensure the FD is a vfio group FD.*/
-	if (!kvm_vfio_file_is_group(filp)) {
+	/* Ensure the FD is a vfio FD. */
+	if (!kvm_vfio_file_is_valid(filp)) {
 		ret = -EINVAL;
 		goto err_fput;
 	}
-- 
2.34.1



* [PATCH v12 03/24] vfio: Accept vfio device file in the KVM facing kAPI
  2023-06-02 12:16 ` [Intel-gfx] " Yi Liu
@ 2023-06-02 12:16   ` Yi Liu
  -1 siblings, 0 replies; 180+ messages in thread
From: Yi Liu @ 2023-06-02 12:16 UTC (permalink / raw)
  To: alex.williamson, jgg, kevin.tian
  Cc: joro, robin.murphy, cohuck, eric.auger, nicolinc, kvm, mjrosato,
	chao.p.peng, yi.l.liu, yi.y.sun, peterx, jasowang,
	shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	intel-gvt-dev, intel-gfx, linux-s390, xudong.hao, yan.y.zhao,
	terrence.xu, yanting.jiang, zhenzhong.duan, clegoate

This makes the vfio file kAPIs accept vfio device files, which is also
preparation for vfio device cdev support.

When kvm is set through a vfio device file, the kvm pointer is stored in
struct vfio_device_file, and kvm_ref_lock protects both setting the pointer
and its use within VFIO. The pointer is propagated to vfio_device after the
device file is bound to an iommufd in the cdev path.
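
The record-then-propagate flow described above can be sketched in userspace
C. This is only an illustration under stated assumptions: a pthread mutex
stands in for the kernel spinlock_t, and struct kvm, struct device and
device_file_bind() are hypothetical stand-ins. The real propagation step
belongs to a later patch in the series and is not shown in this message.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

struct kvm { int vm_id; };           /* hypothetical stand-in */
struct device { struct kvm *kvm; };  /* stand-in for struct vfio_device */

struct device_file {
	struct device *device;
	pthread_mutex_t kvm_ref_lock; /* protects kvm; spinlock_t in the kernel */
	struct kvm *kvm;
};

static void device_file_init(struct device_file *df, struct device *dev)
{
	assert(df != NULL && dev != NULL);
	df->device = dev;
	df->kvm = NULL;
	pthread_mutex_init(&df->kvm_ref_lock, NULL);
}

/* Record the kvm pointer on the file only; the device is not touched yet. */
static void device_file_set_kvm(struct device_file *df, struct kvm *kvm)
{
	pthread_mutex_lock(&df->kvm_ref_lock);
	df->kvm = kvm;
	pthread_mutex_unlock(&df->kvm_ref_lock);
}

/* Hypothetical bind step: copy the recorded pointer to the device. */
static void device_file_bind(struct device_file *df)
{
	pthread_mutex_lock(&df->kvm_ref_lock);
	df->device->kvm = df->kvm;
	pthread_mutex_unlock(&df->kvm_ref_lock);
}
```

Keeping the pointer on the file until bind means a file that never gets
bound never exposes the kvm to the device.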

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Tested-by: Terrence Xu <terrence.xu@intel.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Tested-by: Matthew Rosato <mjrosato@linux.ibm.com>
Tested-by: Yanting Jiang <yanting.jiang@intel.com>
Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 drivers/vfio/vfio.h      |  2 ++
 drivers/vfio/vfio_main.c | 36 +++++++++++++++++++++++++++++++++++-
 2 files changed, 37 insertions(+), 1 deletion(-)

diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
index b1e327a85a32..69e1a0692b06 100644
--- a/drivers/vfio/vfio.h
+++ b/drivers/vfio/vfio.h
@@ -18,6 +18,8 @@ struct vfio_container;
 
 struct vfio_device_file {
 	struct vfio_device *device;
+	spinlock_t kvm_ref_lock; /* protect kvm field */
+	struct kvm *kvm;
 };
 
 void vfio_device_put_registration(struct vfio_device *device);
diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
index 4665791aa2eb..8ef9210ad2aa 100644
--- a/drivers/vfio/vfio_main.c
+++ b/drivers/vfio/vfio_main.c
@@ -429,6 +429,7 @@ vfio_allocate_device_file(struct vfio_device *device)
 		return ERR_PTR(-ENOMEM);
 
 	df->device = device;
+	spin_lock_init(&df->kvm_ref_lock);
 
 	return df;
 }
@@ -1190,13 +1191,23 @@ const struct file_operations vfio_device_fops = {
 	.mmap		= vfio_device_fops_mmap,
 };
 
+static struct vfio_device *vfio_device_from_file(struct file *file)
+{
+	struct vfio_device_file *df = file->private_data;
+
+	if (file->f_op != &vfio_device_fops)
+		return NULL;
+	return df->device;
+}
+
 /**
  * vfio_file_is_valid - True if the file is valid vfio file
  * @file: VFIO group file or VFIO device file
  */
 bool vfio_file_is_valid(struct file *file)
 {
-	return vfio_group_from_file(file);
+	return vfio_group_from_file(file) ||
+	       vfio_device_from_file(file);
 }
 EXPORT_SYMBOL_GPL(vfio_file_is_valid);
 
@@ -1211,16 +1222,36 @@ EXPORT_SYMBOL_GPL(vfio_file_is_valid);
  */
 bool vfio_file_enforced_coherent(struct file *file)
 {
+	struct vfio_device *device;
 	struct vfio_group *group;
 
 	group = vfio_group_from_file(file);
 	if (group)
 		return vfio_group_enforced_coherent(group);
 
+	device = vfio_device_from_file(file);
+	if (device)
+		return device_iommu_capable(device->dev,
+					    IOMMU_CAP_ENFORCE_CACHE_COHERENCY);
+
 	return true;
 }
 EXPORT_SYMBOL_GPL(vfio_file_enforced_coherent);
 
+static void vfio_device_file_set_kvm(struct file *file, struct kvm *kvm)
+{
+	struct vfio_device_file *df = file->private_data;
+
+	/*
+	 * The kvm is first recorded in the vfio_device_file, and will
+	 * be propagated to vfio_device::kvm when the file is bound to
+	 * iommufd successfully in the vfio device cdev path.
+	 */
+	spin_lock(&df->kvm_ref_lock);
+	df->kvm = kvm;
+	spin_unlock(&df->kvm_ref_lock);
+}
+
 /**
  * vfio_file_set_kvm - Link a kvm with VFIO drivers
  * @file: VFIO group file or VFIO device file
@@ -1236,6 +1267,9 @@ void vfio_file_set_kvm(struct file *file, struct kvm *kvm)
 	group = vfio_group_from_file(file);
 	if (group)
 		vfio_group_set_kvm(group, kvm);
+
+	if (vfio_device_from_file(file))
+		vfio_device_file_set_kvm(file, kvm);
 }
 EXPORT_SYMBOL_GPL(vfio_file_set_kvm);
 
-- 
2.34.1



* [PATCH v12 04/24] kvm/vfio: Prepare for accepting vfio device fd
  2023-06-02 12:16 ` [Intel-gfx] " Yi Liu
@ 2023-06-02 12:16   ` Yi Liu
  -1 siblings, 0 replies; 180+ messages in thread
From: Yi Liu @ 2023-06-02 12:16 UTC (permalink / raw)
  To: alex.williamson, jgg, kevin.tian
  Cc: joro, robin.murphy, cohuck, eric.auger, nicolinc, kvm, mjrosato,
	chao.p.peng, yi.l.liu, yi.y.sun, peterx, jasowang,
	shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	intel-gvt-dev, intel-gfx, linux-s390, xudong.hao, yan.y.zhao,
	terrence.xu, yanting.jiang, zhenzhong.duan, clegoate

Rename the kvm_vfio_group related helpers and structures to kvm_vfio_file
to prepare for accepting a vfio device fd. No functional change is intended.
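
As a rough illustration of what the renamed list is used for: each entry
on file_list may later stand for either a group file or a device file, and
kvm_vfio_update_coherency() walks the list and stops at the first entry
that is not enforced coherent. The sketch below is a userspace model only;
struct model_vfio_file and model_any_noncoherent() are hypothetical names,
and the bool field stands in for kvm_vfio_file_enforced_coherent().

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for struct kvm_vfio_file. */
struct model_vfio_file {
	/* Stands in for the result of kvm_vfio_file_enforced_coherent(). */
	bool enforced_coherent;
};

/*
 * Model of the list walk in kvm_vfio_update_coherency(): retest every
 * tracked file and bail as soon as one noncoherent entry is found.
 */
static bool model_any_noncoherent(const struct model_vfio_file *files,
				  size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (!files[i].enforced_coherent)
			return true;
	}
	return false;
}
```

An empty list reports "coherent", which matches the kernel code starting
from noncoherent = false before the walk.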

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Tested-by: Terrence Xu <terrence.xu@intel.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Tested-by: Matthew Rosato <mjrosato@linux.ibm.com>
Tested-by: Yanting Jiang <yanting.jiang@intel.com>
Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 virt/kvm/vfio.c | 115 ++++++++++++++++++++++++------------------------
 1 file changed, 58 insertions(+), 57 deletions(-)

diff --git a/virt/kvm/vfio.c b/virt/kvm/vfio.c
index b33c7b8488b3..8f7fa07e8170 100644
--- a/virt/kvm/vfio.c
+++ b/virt/kvm/vfio.c
@@ -21,7 +21,7 @@
 #include <asm/kvm_ppc.h>
 #endif
 
-struct kvm_vfio_group {
+struct kvm_vfio_file {
 	struct list_head node;
 	struct file *file;
 #ifdef CONFIG_SPAPR_TCE_IOMMU
@@ -30,7 +30,7 @@ struct kvm_vfio_group {
 };
 
 struct kvm_vfio {
-	struct list_head group_list;
+	struct list_head file_list;
 	struct mutex lock;
 	bool noncoherent;
 };
@@ -98,34 +98,35 @@ static struct iommu_group *kvm_vfio_file_iommu_group(struct file *file)
 }
 
 static void kvm_spapr_tce_release_vfio_group(struct kvm *kvm,
-					     struct kvm_vfio_group *kvg)
+					     struct kvm_vfio_file *kvf)
 {
-	if (WARN_ON_ONCE(!kvg->iommu_group))
+	if (WARN_ON_ONCE(!kvf->iommu_group))
 		return;
 
-	kvm_spapr_tce_release_iommu_group(kvm, kvg->iommu_group);
-	iommu_group_put(kvg->iommu_group);
-	kvg->iommu_group = NULL;
+	kvm_spapr_tce_release_iommu_group(kvm, kvf->iommu_group);
+	iommu_group_put(kvf->iommu_group);
+	kvf->iommu_group = NULL;
 }
 #endif
 
 /*
- * Groups can use the same or different IOMMU domains.  If the same then
- * adding a new group may change the coherency of groups we've previously
- * been told about.  We don't want to care about any of that so we retest
- * each group and bail as soon as we find one that's noncoherent.  This
- * means we only ever [un]register_noncoherent_dma once for the whole device.
+ * Groups/devices can use the same or different IOMMU domains. If the same
+ * then adding a new group/device may change the coherency of groups/devices
+ * we've previously been told about. We don't want to care about any of
+ * that so we retest each group/device and bail as soon as we find one that's
+ * noncoherent.  This means we only ever [un]register_noncoherent_dma once
+ * for the whole device.
  */
 static void kvm_vfio_update_coherency(struct kvm_device *dev)
 {
 	struct kvm_vfio *kv = dev->private;
 	bool noncoherent = false;
-	struct kvm_vfio_group *kvg;
+	struct kvm_vfio_file *kvf;
 
 	mutex_lock(&kv->lock);
 
-	list_for_each_entry(kvg, &kv->group_list, node) {
-		if (!kvm_vfio_file_enforced_coherent(kvg->file)) {
+	list_for_each_entry(kvf, &kv->file_list, node) {
+		if (!kvm_vfio_file_enforced_coherent(kvf->file)) {
 			noncoherent = true;
 			break;
 		}
@@ -143,10 +144,10 @@ static void kvm_vfio_update_coherency(struct kvm_device *dev)
 	mutex_unlock(&kv->lock);
 }
 
-static int kvm_vfio_group_add(struct kvm_device *dev, unsigned int fd)
+static int kvm_vfio_file_add(struct kvm_device *dev, unsigned int fd)
 {
 	struct kvm_vfio *kv = dev->private;
-	struct kvm_vfio_group *kvg;
+	struct kvm_vfio_file *kvf;
 	struct file *filp;
 	int ret;
 
@@ -162,27 +163,27 @@ static int kvm_vfio_group_add(struct kvm_device *dev, unsigned int fd)
 
 	mutex_lock(&kv->lock);
 
-	list_for_each_entry(kvg, &kv->group_list, node) {
-		if (kvg->file == filp) {
+	list_for_each_entry(kvf, &kv->file_list, node) {
+		if (kvf->file == filp) {
 			ret = -EEXIST;
 			goto err_unlock;
 		}
 	}
 
-	kvg = kzalloc(sizeof(*kvg), GFP_KERNEL_ACCOUNT);
-	if (!kvg) {
+	kvf = kzalloc(sizeof(*kvf), GFP_KERNEL_ACCOUNT);
+	if (!kvf) {
 		ret = -ENOMEM;
 		goto err_unlock;
 	}
 
-	kvg->file = filp;
-	list_add_tail(&kvg->node, &kv->group_list);
+	kvf->file = filp;
+	list_add_tail(&kvf->node, &kv->file_list);
 
 	kvm_arch_start_assignment(dev->kvm);
 
 	mutex_unlock(&kv->lock);
 
-	kvm_vfio_file_set_kvm(kvg->file, dev->kvm);
+	kvm_vfio_file_set_kvm(kvf->file, dev->kvm);
 	kvm_vfio_update_coherency(dev);
 
 	return 0;
@@ -193,10 +194,10 @@ static int kvm_vfio_group_add(struct kvm_device *dev, unsigned int fd)
 	return ret;
 }
 
-static int kvm_vfio_group_del(struct kvm_device *dev, unsigned int fd)
+static int kvm_vfio_file_del(struct kvm_device *dev, unsigned int fd)
 {
 	struct kvm_vfio *kv = dev->private;
-	struct kvm_vfio_group *kvg;
+	struct kvm_vfio_file *kvf;
 	struct fd f;
 	int ret;
 
@@ -208,18 +209,18 @@ static int kvm_vfio_group_del(struct kvm_device *dev, unsigned int fd)
 
 	mutex_lock(&kv->lock);
 
-	list_for_each_entry(kvg, &kv->group_list, node) {
-		if (kvg->file != f.file)
+	list_for_each_entry(kvf, &kv->file_list, node) {
+		if (kvf->file != f.file)
 			continue;
 
-		list_del(&kvg->node);
+		list_del(&kvf->node);
 		kvm_arch_end_assignment(dev->kvm);
 #ifdef CONFIG_SPAPR_TCE_IOMMU
-		kvm_spapr_tce_release_vfio_group(dev->kvm, kvg);
+		kvm_spapr_tce_release_vfio_group(dev->kvm, kvf);
 #endif
-		kvm_vfio_file_set_kvm(kvg->file, NULL);
-		fput(kvg->file);
-		kfree(kvg);
+		kvm_vfio_file_set_kvm(kvf->file, NULL);
+		fput(kvf->file);
+		kfree(kvf);
 		ret = 0;
 		break;
 	}
@@ -234,12 +235,12 @@ static int kvm_vfio_group_del(struct kvm_device *dev, unsigned int fd)
 }
 
 #ifdef CONFIG_SPAPR_TCE_IOMMU
-static int kvm_vfio_group_set_spapr_tce(struct kvm_device *dev,
-					void __user *arg)
+static int kvm_vfio_file_set_spapr_tce(struct kvm_device *dev,
+				       void __user *arg)
 {
 	struct kvm_vfio_spapr_tce param;
 	struct kvm_vfio *kv = dev->private;
-	struct kvm_vfio_group *kvg;
+	struct kvm_vfio_file *kvf;
 	struct fd f;
 	int ret;
 
@@ -254,20 +255,20 @@ static int kvm_vfio_group_set_spapr_tce(struct kvm_device *dev,
 
 	mutex_lock(&kv->lock);
 
-	list_for_each_entry(kvg, &kv->group_list, node) {
-		if (kvg->file != f.file)
+	list_for_each_entry(kvf, &kv->file_list, node) {
+		if (kvf->file != f.file)
 			continue;
 
-		if (!kvg->iommu_group) {
-			kvg->iommu_group = kvm_vfio_file_iommu_group(kvg->file);
-			if (WARN_ON_ONCE(!kvg->iommu_group)) {
+		if (!kvf->iommu_group) {
+			kvf->iommu_group = kvm_vfio_file_iommu_group(kvf->file);
+			if (WARN_ON_ONCE(!kvf->iommu_group)) {
 				ret = -EIO;
 				goto err_fdput;
 			}
 		}
 
 		ret = kvm_spapr_tce_attach_iommu_group(dev->kvm, param.tablefd,
-						       kvg->iommu_group);
+						       kvf->iommu_group);
 		break;
 	}
 
@@ -278,8 +279,8 @@ static int kvm_vfio_group_set_spapr_tce(struct kvm_device *dev,
 }
 #endif
 
-static int kvm_vfio_set_group(struct kvm_device *dev, long attr,
-			      void __user *arg)
+static int kvm_vfio_set_file(struct kvm_device *dev, long attr,
+			     void __user *arg)
 {
 	int32_t __user *argp = arg;
 	int32_t fd;
@@ -288,16 +289,16 @@ static int kvm_vfio_set_group(struct kvm_device *dev, long attr,
 	case KVM_DEV_VFIO_GROUP_ADD:
 		if (get_user(fd, argp))
 			return -EFAULT;
-		return kvm_vfio_group_add(dev, fd);
+		return kvm_vfio_file_add(dev, fd);
 
 	case KVM_DEV_VFIO_GROUP_DEL:
 		if (get_user(fd, argp))
 			return -EFAULT;
-		return kvm_vfio_group_del(dev, fd);
+		return kvm_vfio_file_del(dev, fd);
 
 #ifdef CONFIG_SPAPR_TCE_IOMMU
 	case KVM_DEV_VFIO_GROUP_SET_SPAPR_TCE:
-		return kvm_vfio_group_set_spapr_tce(dev, arg);
+		return kvm_vfio_file_set_spapr_tce(dev, arg);
 #endif
 	}
 
@@ -309,8 +310,8 @@ static int kvm_vfio_set_attr(struct kvm_device *dev,
 {
 	switch (attr->group) {
 	case KVM_DEV_VFIO_GROUP:
-		return kvm_vfio_set_group(dev, attr->attr,
-					  u64_to_user_ptr(attr->addr));
+		return kvm_vfio_set_file(dev, attr->attr,
+					 u64_to_user_ptr(attr->addr));
 	}
 
 	return -ENXIO;
@@ -339,16 +340,16 @@ static int kvm_vfio_has_attr(struct kvm_device *dev,
 static void kvm_vfio_release(struct kvm_device *dev)
 {
 	struct kvm_vfio *kv = dev->private;
-	struct kvm_vfio_group *kvg, *tmp;
+	struct kvm_vfio_file *kvf, *tmp;
 
-	list_for_each_entry_safe(kvg, tmp, &kv->group_list, node) {
+	list_for_each_entry_safe(kvf, tmp, &kv->file_list, node) {
 #ifdef CONFIG_SPAPR_TCE_IOMMU
-		kvm_spapr_tce_release_vfio_group(dev->kvm, kvg);
+		kvm_spapr_tce_release_vfio_group(dev->kvm, kvf);
 #endif
-		kvm_vfio_file_set_kvm(kvg->file, NULL);
-		fput(kvg->file);
-		list_del(&kvg->node);
-		kfree(kvg);
+		kvm_vfio_file_set_kvm(kvf->file, NULL);
+		fput(kvf->file);
+		list_del(&kvf->node);
+		kfree(kvf);
 		kvm_arch_end_assignment(dev->kvm);
 	}
 
@@ -382,7 +383,7 @@ static int kvm_vfio_create(struct kvm_device *dev, u32 type)
 	if (!kv)
 		return -ENOMEM;
 
-	INIT_LIST_HEAD(&kv->group_list);
+	INIT_LIST_HEAD(&kv->file_list);
 	mutex_init(&kv->lock);
 
 	dev->private = kv;
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 180+ messages in thread

* [PATCH v12 05/24] kvm/vfio: Accept vfio device file from userspace
  2023-06-02 12:16 ` [Intel-gfx] " Yi Liu
@ 2023-06-02 12:16   ` Yi Liu
  -1 siblings, 0 replies; 180+ messages in thread
From: Yi Liu @ 2023-06-02 12:16 UTC (permalink / raw)
  To: alex.williamson, jgg, kevin.tian
  Cc: joro, robin.murphy, cohuck, eric.auger, nicolinc, kvm, mjrosato,
	chao.p.peng, yi.l.liu, yi.y.sun, peterx, jasowang,
	shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	intel-gvt-dev, intel-gfx, linux-s390, xudong.hao, yan.y.zhao,
	terrence.xu, yanting.jiang, zhenzhong.duan, clegoate

This defines KVM_DEV_VFIO_FILE* and makes the corresponding
KVM_DEV_VFIO_GROUP* names aliases of them, so old userspace that uses
KVM_DEV_VFIO_GROUP* keeps working.
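
A minimal sketch of why the aliasing keeps old userspace working, assuming
only the values shown in the uapi hunk of this patch: because the GROUP
names expand to the FILE values, an attr built with either name reaches the
same case label. model_set_file() and enum model_op are hypothetical
stand-ins for the switch in kvm_vfio_set_file(), not kernel code, and the
-ENXIO fallback is an assumption about the unmatched case.

```c
#include <errno.h>

/* Values copied from the include/uapi/linux/kvm.h hunk in this patch. */
#define KVM_DEV_VFIO_FILE		1
#define KVM_DEV_VFIO_FILE_ADD		1
#define KVM_DEV_VFIO_FILE_DEL		2
#define KVM_DEV_VFIO_GROUP		KVM_DEV_VFIO_FILE
#define KVM_DEV_VFIO_GROUP_ADD		KVM_DEV_VFIO_FILE_ADD
#define KVM_DEV_VFIO_GROUP_DEL		KVM_DEV_VFIO_FILE_DEL

/* The aliases are compile-time identical to the new names. */
_Static_assert(KVM_DEV_VFIO_GROUP == KVM_DEV_VFIO_FILE, "group alias");
_Static_assert(KVM_DEV_VFIO_GROUP_ADD == KVM_DEV_VFIO_FILE_ADD, "add alias");
_Static_assert(KVM_DEV_VFIO_GROUP_DEL == KVM_DEV_VFIO_FILE_DEL, "del alias");

/* Hypothetical result type, for illustration only. */
enum model_op { MODEL_OP_ADD, MODEL_OP_DEL };

/* Hypothetical model of the attr switch in kvm_vfio_set_file(). */
static int model_set_file(long attr, enum model_op *op)
{
	switch (attr) {
	case KVM_DEV_VFIO_FILE_ADD:
		*op = MODEL_OP_ADD;
		return 0;
	case KVM_DEV_VFIO_FILE_DEL:
		*op = MODEL_OP_DEL;
		return 0;
	}
	return -ENXIO;	/* assumed fallback for unknown attributes */
}
```

Calling model_set_file(KVM_DEV_VFIO_GROUP_ADD, &op) and
model_set_file(KVM_DEV_VFIO_FILE_ADD, &op) gives the same result, which is
the compatibility property the aliases are meant to provide.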

Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Tested-by: Terrence Xu <terrence.xu@intel.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Tested-by: Matthew Rosato <mjrosato@linux.ibm.com>
Tested-by: Yanting Jiang <yanting.jiang@intel.com>
Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 Documentation/virt/kvm/devices/vfio.rst | 47 ++++++++++++++++---------
 include/uapi/linux/kvm.h                | 13 +++++--
 virt/kvm/vfio.c                         | 12 +++----
 3 files changed, 47 insertions(+), 25 deletions(-)

diff --git a/Documentation/virt/kvm/devices/vfio.rst b/Documentation/virt/kvm/devices/vfio.rst
index 08b544212638..c549143bb891 100644
--- a/Documentation/virt/kvm/devices/vfio.rst
+++ b/Documentation/virt/kvm/devices/vfio.rst
@@ -9,22 +9,34 @@ Device types supported:
   - KVM_DEV_TYPE_VFIO
 
 Only one VFIO instance may be created per VM.  The created device
-tracks VFIO groups in use by the VM and features of those groups
-important to the correctness and acceleration of the VM.  As groups
-are enabled and disabled for use by the VM, KVM should be updated
-about their presence.  When registered with KVM, a reference to the
-VFIO-group is held by KVM.
+tracks VFIO files (group or device) in use by the VM and features
+of those groups/devices important to the correctness and acceleration
+of the VM.  As groups/devices are enabled and disabled for use by the
+VM, KVM should be updated about their presence.  When registered with
+KVM, a reference to the VFIO file is held by KVM.
 
 Groups:
-  KVM_DEV_VFIO_GROUP
-
-KVM_DEV_VFIO_GROUP attributes:
-  KVM_DEV_VFIO_GROUP_ADD: Add a VFIO group to VFIO-KVM device tracking
-	kvm_device_attr.addr points to an int32_t file descriptor
-	for the VFIO group.
-  KVM_DEV_VFIO_GROUP_DEL: Remove a VFIO group from VFIO-KVM device tracking
-	kvm_device_attr.addr points to an int32_t file descriptor
-	for the VFIO group.
+  KVM_DEV_VFIO_FILE
+	alias: KVM_DEV_VFIO_GROUP
+
+KVM_DEV_VFIO_FILE attributes:
+  KVM_DEV_VFIO_FILE_ADD: Add a VFIO file (group/device) to VFIO-KVM device
+	tracking
+
+	kvm_device_attr.addr points to an int32_t file descriptor for the
+	VFIO file.
+
+  KVM_DEV_VFIO_FILE_DEL: Remove a VFIO file (group/device) from VFIO-KVM
+	device tracking
+
+	kvm_device_attr.addr points to an int32_t file descriptor for the
+	VFIO file.
+
+KVM_DEV_VFIO_GROUP (legacy kvm device group restricted to the handling of VFIO group fd):
+  KVM_DEV_VFIO_GROUP_ADD: same as KVM_DEV_VFIO_FILE_ADD for group fd only
+
+  KVM_DEV_VFIO_GROUP_DEL: same as KVM_DEV_VFIO_FILE_DEL for group fd only
+
   KVM_DEV_VFIO_GROUP_SET_SPAPR_TCE: attaches a guest visible TCE table
 	allocated by sPAPR KVM.
 	kvm_device_attr.addr points to a struct::
@@ -40,7 +52,10 @@ KVM_DEV_VFIO_GROUP attributes:
 	- @tablefd is a file descriptor for a TCE table allocated via
 	  KVM_CREATE_SPAPR_TCE.
 
-The GROUP_ADD operation above should be invoked prior to accessing the
+The FILE/GROUP_ADD operation above should be invoked prior to accessing the
 device file descriptor via VFIO_GROUP_GET_DEVICE_FD in order to support
 drivers which require a kvm pointer to be set in their .open_device()
-callback.
+callback.  It is the same for device file descriptor via character device
+open which gets device access via VFIO_DEVICE_BIND_IOMMUFD.  For such file
+descriptors, FILE_ADD should be invoked before VFIO_DEVICE_BIND_IOMMUFD
+to support the drivers mentioned in prior sentence as well.
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 737318b1c1d9..0423af6161e1 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1416,9 +1416,16 @@ struct kvm_device_attr {
 	__u64	addr;		/* userspace address of attr data */
 };
 
-#define  KVM_DEV_VFIO_GROUP			1
-#define   KVM_DEV_VFIO_GROUP_ADD			1
-#define   KVM_DEV_VFIO_GROUP_DEL			2
+#define  KVM_DEV_VFIO_FILE			1
+
+#define   KVM_DEV_VFIO_FILE_ADD			1
+#define   KVM_DEV_VFIO_FILE_DEL			2
+
+/* KVM_DEV_VFIO_GROUP aliases are for compile time uapi compatibility */
+#define  KVM_DEV_VFIO_GROUP	KVM_DEV_VFIO_FILE
+
+#define   KVM_DEV_VFIO_GROUP_ADD	KVM_DEV_VFIO_FILE_ADD
+#define   KVM_DEV_VFIO_GROUP_DEL	KVM_DEV_VFIO_FILE_DEL
 #define   KVM_DEV_VFIO_GROUP_SET_SPAPR_TCE		3
 
 enum kvm_device_type {
diff --git a/virt/kvm/vfio.c b/virt/kvm/vfio.c
index 8f7fa07e8170..07cb5f44b2a2 100644
--- a/virt/kvm/vfio.c
+++ b/virt/kvm/vfio.c
@@ -286,12 +286,12 @@ static int kvm_vfio_set_file(struct kvm_device *dev, long attr,
 	int32_t fd;
 
 	switch (attr) {
-	case KVM_DEV_VFIO_GROUP_ADD:
+	case KVM_DEV_VFIO_FILE_ADD:
 		if (get_user(fd, argp))
 			return -EFAULT;
 		return kvm_vfio_file_add(dev, fd);
 
-	case KVM_DEV_VFIO_GROUP_DEL:
+	case KVM_DEV_VFIO_FILE_DEL:
 		if (get_user(fd, argp))
 			return -EFAULT;
 		return kvm_vfio_file_del(dev, fd);
@@ -309,7 +309,7 @@ static int kvm_vfio_set_attr(struct kvm_device *dev,
 			     struct kvm_device_attr *attr)
 {
 	switch (attr->group) {
-	case KVM_DEV_VFIO_GROUP:
+	case KVM_DEV_VFIO_FILE:
 		return kvm_vfio_set_file(dev, attr->attr,
 					 u64_to_user_ptr(attr->addr));
 	}
@@ -321,10 +321,10 @@ static int kvm_vfio_has_attr(struct kvm_device *dev,
 			     struct kvm_device_attr *attr)
 {
 	switch (attr->group) {
-	case KVM_DEV_VFIO_GROUP:
+	case KVM_DEV_VFIO_FILE:
 		switch (attr->attr) {
-		case KVM_DEV_VFIO_GROUP_ADD:
-		case KVM_DEV_VFIO_GROUP_DEL:
+		case KVM_DEV_VFIO_FILE_ADD:
+		case KVM_DEV_VFIO_FILE_DEL:
 #ifdef CONFIG_SPAPR_TCE_IOMMU
 		case KVM_DEV_VFIO_GROUP_SET_SPAPR_TCE:
 #endif
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 180+ messages in thread

* [Intel-gfx] [PATCH v12 05/24] kvm/vfio: Accept vfio device file from userspace
@ 2023-06-02 12:16   ` Yi Liu
  0 siblings, 0 replies; 180+ messages in thread
From: Yi Liu @ 2023-06-02 12:16 UTC (permalink / raw)
  To: alex.williamson, jgg, kevin.tian
  Cc: mjrosato, jasowang, xudong.hao, zhenzhong.duan, peterx,
	terrence.xu, chao.p.peng, linux-s390, yi.l.liu, kvm, lulu,
	yanting.jiang, joro, nicolinc, yan.y.zhao, intel-gfx, eric.auger,
	intel-gvt-dev, yi.y.sun, clegoate, cohuck,
	shameerali.kolothum.thodi, suravee.suthikulpanit, robin.murphy

This defines KVM_DEV_VFIO_FILE* and makes the corresponding
KVM_DEV_VFIO_GROUP* names aliases of them, so old userspace that uses
KVM_DEV_VFIO_GROUP* keeps working.

Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Tested-by: Terrence Xu <terrence.xu@intel.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Tested-by: Matthew Rosato <mjrosato@linux.ibm.com>
Tested-by: Yanting Jiang <yanting.jiang@intel.com>
Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 Documentation/virt/kvm/devices/vfio.rst | 47 ++++++++++++++++---------
 include/uapi/linux/kvm.h                | 13 +++++--
 virt/kvm/vfio.c                         | 12 +++----
 3 files changed, 47 insertions(+), 25 deletions(-)

diff --git a/Documentation/virt/kvm/devices/vfio.rst b/Documentation/virt/kvm/devices/vfio.rst
index 08b544212638..c549143bb891 100644
--- a/Documentation/virt/kvm/devices/vfio.rst
+++ b/Documentation/virt/kvm/devices/vfio.rst
@@ -9,22 +9,34 @@ Device types supported:
   - KVM_DEV_TYPE_VFIO
 
 Only one VFIO instance may be created per VM.  The created device
-tracks VFIO groups in use by the VM and features of those groups
-important to the correctness and acceleration of the VM.  As groups
-are enabled and disabled for use by the VM, KVM should be updated
-about their presence.  When registered with KVM, a reference to the
-VFIO-group is held by KVM.
+tracks VFIO files (group or device) in use by the VM and features
+of those groups/devices important to the correctness and acceleration
+of the VM.  As groups/devices are enabled and disabled for use by the
+VM, KVM should be updated about their presence.  When registered with
+KVM, a reference to the VFIO file is held by KVM.
 
 Groups:
-  KVM_DEV_VFIO_GROUP
-
-KVM_DEV_VFIO_GROUP attributes:
-  KVM_DEV_VFIO_GROUP_ADD: Add a VFIO group to VFIO-KVM device tracking
-	kvm_device_attr.addr points to an int32_t file descriptor
-	for the VFIO group.
-  KVM_DEV_VFIO_GROUP_DEL: Remove a VFIO group from VFIO-KVM device tracking
-	kvm_device_attr.addr points to an int32_t file descriptor
-	for the VFIO group.
+  KVM_DEV_VFIO_FILE
+	alias: KVM_DEV_VFIO_GROUP
+
+KVM_DEV_VFIO_FILE attributes:
+  KVM_DEV_VFIO_FILE_ADD: Add a VFIO file (group/device) to VFIO-KVM device
+	tracking
+
+	kvm_device_attr.addr points to an int32_t file descriptor for the
+	VFIO file.
+
+  KVM_DEV_VFIO_FILE_DEL: Remove a VFIO file (group/device) from VFIO-KVM
+	device tracking
+
+	kvm_device_attr.addr points to an int32_t file descriptor for the
+	VFIO file.
+
+KVM_DEV_VFIO_GROUP (legacy kvm device group restricted to the handling of VFIO group fd):
+  KVM_DEV_VFIO_GROUP_ADD: same as KVM_DEV_VFIO_FILE_ADD for group fd only
+
+  KVM_DEV_VFIO_GROUP_DEL: same as KVM_DEV_VFIO_FILE_DEL for group fd only
+
   KVM_DEV_VFIO_GROUP_SET_SPAPR_TCE: attaches a guest visible TCE table
 	allocated by sPAPR KVM.
 	kvm_device_attr.addr points to a struct::
@@ -40,7 +52,10 @@ KVM_DEV_VFIO_GROUP attributes:
 	- @tablefd is a file descriptor for a TCE table allocated via
 	  KVM_CREATE_SPAPR_TCE.
 
-The GROUP_ADD operation above should be invoked prior to accessing the
+The FILE/GROUP_ADD operation above should be invoked prior to accessing the
 device file descriptor via VFIO_GROUP_GET_DEVICE_FD in order to support
 drivers which require a kvm pointer to be set in their .open_device()
-callback.
+callback.  The same applies to device file descriptors obtained by opening
+the VFIO character device, which gain device access through
+VFIO_DEVICE_BIND_IOMMUFD.  For such file descriptors, FILE_ADD should be
+invoked before VFIO_DEVICE_BIND_IOMMUFD to support those drivers as well.
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 737318b1c1d9..0423af6161e1 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1416,9 +1416,16 @@ struct kvm_device_attr {
 	__u64	addr;		/* userspace address of attr data */
 };
 
-#define  KVM_DEV_VFIO_GROUP			1
-#define   KVM_DEV_VFIO_GROUP_ADD			1
-#define   KVM_DEV_VFIO_GROUP_DEL			2
+#define  KVM_DEV_VFIO_FILE			1
+
+#define   KVM_DEV_VFIO_FILE_ADD			1
+#define   KVM_DEV_VFIO_FILE_DEL			2
+
+/* KVM_DEV_VFIO_GROUP aliases are for compile time uapi compatibility */
+#define  KVM_DEV_VFIO_GROUP	KVM_DEV_VFIO_FILE
+
+#define   KVM_DEV_VFIO_GROUP_ADD	KVM_DEV_VFIO_FILE_ADD
+#define   KVM_DEV_VFIO_GROUP_DEL	KVM_DEV_VFIO_FILE_DEL
 #define   KVM_DEV_VFIO_GROUP_SET_SPAPR_TCE		3
 
 enum kvm_device_type {
diff --git a/virt/kvm/vfio.c b/virt/kvm/vfio.c
index 8f7fa07e8170..07cb5f44b2a2 100644
--- a/virt/kvm/vfio.c
+++ b/virt/kvm/vfio.c
@@ -286,12 +286,12 @@ static int kvm_vfio_set_file(struct kvm_device *dev, long attr,
 	int32_t fd;
 
 	switch (attr) {
-	case KVM_DEV_VFIO_GROUP_ADD:
+	case KVM_DEV_VFIO_FILE_ADD:
 		if (get_user(fd, argp))
 			return -EFAULT;
 		return kvm_vfio_file_add(dev, fd);
 
-	case KVM_DEV_VFIO_GROUP_DEL:
+	case KVM_DEV_VFIO_FILE_DEL:
 		if (get_user(fd, argp))
 			return -EFAULT;
 		return kvm_vfio_file_del(dev, fd);
@@ -309,7 +309,7 @@ static int kvm_vfio_set_attr(struct kvm_device *dev,
 			     struct kvm_device_attr *attr)
 {
 	switch (attr->group) {
-	case KVM_DEV_VFIO_GROUP:
+	case KVM_DEV_VFIO_FILE:
 		return kvm_vfio_set_file(dev, attr->attr,
 					 u64_to_user_ptr(attr->addr));
 	}
@@ -321,10 +321,10 @@ static int kvm_vfio_has_attr(struct kvm_device *dev,
 			     struct kvm_device_attr *attr)
 {
 	switch (attr->group) {
-	case KVM_DEV_VFIO_GROUP:
+	case KVM_DEV_VFIO_FILE:
 		switch (attr->attr) {
-		case KVM_DEV_VFIO_GROUP_ADD:
-		case KVM_DEV_VFIO_GROUP_DEL:
+		case KVM_DEV_VFIO_FILE_ADD:
+		case KVM_DEV_VFIO_FILE_DEL:
 #ifdef CONFIG_SPAPR_TCE_IOMMU
 		case KVM_DEV_VFIO_GROUP_SET_SPAPR_TCE:
 #endif
-- 
2.34.1
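
As a rough userspace illustration of the documentation hunk in this patch,
the sketch below registers a VFIO file descriptor (group or device) with the
KVM VFIO pseudo device. The helper name kvm_vfio_register_file() is
hypothetical and not part of the series; the fallback macro values are copied
from the include/uapi/linux/kvm.h hunk for headers that predate it.

```c
#include <errno.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/* Fallbacks for uapi headers that predate this patch (values from the hunk). */
#ifndef KVM_DEV_VFIO_FILE
#define KVM_DEV_VFIO_FILE	1
#define KVM_DEV_VFIO_FILE_ADD	1
#define KVM_DEV_VFIO_FILE_DEL	2
#endif

/*
 * Hypothetical helper: add a VFIO group or device fd to KVM VFIO tracking.
 * kvm_vfio_dev_fd is assumed to come from KVM_CREATE_DEVICE with
 * KVM_DEV_TYPE_VFIO. Returns 0 on success or -errno on failure.
 */
static int kvm_vfio_register_file(int kvm_vfio_dev_fd, int32_t vfio_fd)
{
	struct kvm_device_attr attr = {
		.group = KVM_DEV_VFIO_FILE,
		.attr = KVM_DEV_VFIO_FILE_ADD,
		/* addr points to an int32_t file descriptor, per the docs */
		.addr = (uint64_t)(uintptr_t)&vfio_fd,
	};

	if (ioctl(kvm_vfio_dev_fd, KVM_SET_DEVICE_ATTR, &attr) < 0)
		return -errno;
	return 0;
}
```

Per the documentation text, a caller on the cdev path would invoke such a
helper before issuing VFIO_DEVICE_BIND_IOMMUFD on the device fd, so that
drivers needing a kvm pointer in .open_device() can find it.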


^ permalink raw reply related	[flat|nested] 180+ messages in thread

* [PATCH v12 06/24] vfio: Pass struct vfio_device_file * to vfio_device_open/close()
  2023-06-02 12:16 ` [Intel-gfx] " Yi Liu
@ 2023-06-02 12:16   ` Yi Liu
  -1 siblings, 0 replies; 180+ messages in thread
From: Yi Liu @ 2023-06-02 12:16 UTC (permalink / raw)
  To: alex.williamson, jgg, kevin.tian
  Cc: joro, robin.murphy, cohuck, eric.auger, nicolinc, kvm, mjrosato,
	chao.p.peng, yi.l.liu, yi.y.sun, peterx, jasowang,
	shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	intel-gvt-dev, intel-gfx, linux-s390, xudong.hao, yan.y.zhao,
	terrence.xu, yanting.jiang, zhenzhong.duan, clegoate

This avoids passing too many parameters through multiple functions. To
reflect the new input parameter, rename the functions to vfio_df_open/close().

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Tested-by: Terrence Xu <terrence.xu@intel.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Tested-by: Matthew Rosato <mjrosato@linux.ibm.com>
Tested-by: Yanting Jiang <yanting.jiang@intel.com>
Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
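The shape of this refactor can be sketched in a small userspace model. It is
only an illustration under simplifying assumptions: the model_* names and the
fail_first_open knob are hypothetical, locking is omitted, and the structures
are stand-ins rather than the kernel definitions. It shows the context being
stored in the per-file structure before the open and cleared again on failure.

```c
#include <stddef.h>

/* Simplified stand-ins for the kernel structures; not the real definitions. */
struct model_device {
	unsigned int open_count;
	int fail_first_open;	/* test knob: make the first open fail */
	const void *bound_ctx;	/* context seen by the first open */
};

struct model_device_file {
	struct model_device *device;
	const void *iommufd;	/* stands in for struct iommufd_ctx * */
};

static int model_first_open(struct model_device_file *df)
{
	if (df->device->fail_first_open)
		return -1;
	/* The context now travels inside df instead of as an argument. */
	df->device->bound_ctx = df->iommufd;
	return 0;
}

static int model_df_open(struct model_device_file *df)
{
	struct model_device *device = df->device;
	int ret = 0;

	device->open_count++;
	if (device->open_count == 1) {
		ret = model_first_open(df);
		if (ret)
			device->open_count--;	/* roll back on failure */
	}
	return ret;
}

static void model_df_close(struct model_device_file *df)
{
	struct model_device *device = df->device;

	if (device->open_count == 1)
		device->bound_ctx = NULL;	/* last close */
	device->open_count--;
}

/* Group path: publish the context in df before opening, undo on failure. */
static int model_df_group_open(struct model_device_file *df,
			       const void *group_iommufd)
{
	int ret;

	df->iommufd = group_iommufd;
	ret = model_df_open(df);
	if (ret)
		df->iommufd = NULL;
	return ret;
}
```

With this layout, helpers deeper in the call chain need only the single df
pointer, which is the simplification the commit message describes.
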
 drivers/vfio/group.c     | 20 ++++++++++++++------
 drivers/vfio/vfio.h      |  8 ++++----
 drivers/vfio/vfio_main.c | 25 +++++++++++++++----------
 3 files changed, 33 insertions(+), 20 deletions(-)

diff --git a/drivers/vfio/group.c b/drivers/vfio/group.c
index b56e19d2a02d..caf53716ddb2 100644
--- a/drivers/vfio/group.c
+++ b/drivers/vfio/group.c
@@ -169,8 +169,9 @@ static void vfio_device_group_get_kvm_safe(struct vfio_device *device)
 	spin_unlock(&device->group->kvm_ref_lock);
 }
 
-static int vfio_device_group_open(struct vfio_device *device)
+static int vfio_df_group_open(struct vfio_device_file *df)
 {
+	struct vfio_device *device = df->device;
 	int ret;
 
 	mutex_lock(&device->group->group_lock);
@@ -190,7 +191,11 @@ static int vfio_device_group_open(struct vfio_device *device)
 	if (device->open_count == 0)
 		vfio_device_group_get_kvm_safe(device);
 
-	ret = vfio_device_open(device, device->group->iommufd);
+	df->iommufd = device->group->iommufd;
+
+	ret = vfio_df_open(df);
+	if (ret)
+		df->iommufd = NULL;
 
 	if (device->open_count == 0)
 		vfio_device_put_kvm(device);
@@ -202,12 +207,15 @@ static int vfio_device_group_open(struct vfio_device *device)
 	return ret;
 }
 
-void vfio_device_group_close(struct vfio_device *device)
+void vfio_df_group_close(struct vfio_device_file *df)
 {
+	struct vfio_device *device = df->device;
+
 	mutex_lock(&device->group->group_lock);
 	mutex_lock(&device->dev_set->lock);
 
-	vfio_device_close(device, device->group->iommufd);
+	vfio_df_close(df);
+	df->iommufd = NULL;
 
 	if (device->open_count == 0)
 		vfio_device_put_kvm(device);
@@ -228,7 +236,7 @@ static struct file *vfio_device_open_file(struct vfio_device *device)
 		goto err_out;
 	}
 
-	ret = vfio_device_group_open(device);
+	ret = vfio_df_group_open(df);
 	if (ret)
 		goto err_free;
 
@@ -260,7 +268,7 @@ static struct file *vfio_device_open_file(struct vfio_device *device)
 	return filep;
 
 err_close_device:
-	vfio_device_group_close(device);
+	vfio_df_group_close(df);
 err_free:
 	kfree(df);
 err_out:
diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
index 69e1a0692b06..f9eb52eb9ed7 100644
--- a/drivers/vfio/vfio.h
+++ b/drivers/vfio/vfio.h
@@ -20,13 +20,13 @@ struct vfio_device_file {
 	struct vfio_device *device;
 	spinlock_t kvm_ref_lock; /* protect kvm field */
 	struct kvm *kvm;
+	struct iommufd_ctx *iommufd; /* protected by struct vfio_device_set::lock */
 };
 
 void vfio_device_put_registration(struct vfio_device *device);
 bool vfio_device_try_get_registration(struct vfio_device *device);
-int vfio_device_open(struct vfio_device *device, struct iommufd_ctx *iommufd);
-void vfio_device_close(struct vfio_device *device,
-		       struct iommufd_ctx *iommufd);
+int vfio_df_open(struct vfio_device_file *df);
+void vfio_df_close(struct vfio_device_file *df);
 struct vfio_device_file *
 vfio_allocate_device_file(struct vfio_device *device);
 
@@ -91,7 +91,7 @@ void vfio_device_group_register(struct vfio_device *device);
 void vfio_device_group_unregister(struct vfio_device *device);
 int vfio_device_group_use_iommu(struct vfio_device *device);
 void vfio_device_group_unuse_iommu(struct vfio_device *device);
-void vfio_device_group_close(struct vfio_device *device);
+void vfio_df_group_close(struct vfio_device_file *df);
 struct vfio_group *vfio_group_from_file(struct file *file);
 bool vfio_group_enforced_coherent(struct vfio_group *group);
 void vfio_group_set_kvm(struct vfio_group *group, struct kvm *kvm);
diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
index 8ef9210ad2aa..a3c5817fc545 100644
--- a/drivers/vfio/vfio_main.c
+++ b/drivers/vfio/vfio_main.c
@@ -434,9 +434,10 @@ vfio_allocate_device_file(struct vfio_device *device)
 	return df;
 }
 
-static int vfio_device_first_open(struct vfio_device *device,
-				  struct iommufd_ctx *iommufd)
+static int vfio_device_first_open(struct vfio_device_file *df)
 {
+	struct vfio_device *device = df->device;
+	struct iommufd_ctx *iommufd = df->iommufd;
 	int ret;
 
 	lockdep_assert_held(&device->dev_set->lock);
@@ -468,9 +469,11 @@ static int vfio_device_first_open(struct vfio_device *device,
 	return ret;
 }
 
-static void vfio_device_last_close(struct vfio_device *device,
-				   struct iommufd_ctx *iommufd)
+static void vfio_device_last_close(struct vfio_device_file *df)
 {
+	struct vfio_device *device = df->device;
+	struct iommufd_ctx *iommufd = df->iommufd;
+
 	lockdep_assert_held(&device->dev_set->lock);
 
 	if (device->ops->close_device)
@@ -482,15 +485,16 @@ static void vfio_device_last_close(struct vfio_device *device,
 	module_put(device->dev->driver->owner);
 }
 
-int vfio_device_open(struct vfio_device *device, struct iommufd_ctx *iommufd)
+int vfio_df_open(struct vfio_device_file *df)
 {
+	struct vfio_device *device = df->device;
 	int ret = 0;
 
 	lockdep_assert_held(&device->dev_set->lock);
 
 	device->open_count++;
 	if (device->open_count == 1) {
-		ret = vfio_device_first_open(device, iommufd);
+		ret = vfio_device_first_open(df);
 		if (ret)
 			device->open_count--;
 	}
@@ -498,14 +502,15 @@ int vfio_device_open(struct vfio_device *device, struct iommufd_ctx *iommufd)
 	return ret;
 }
 
-void vfio_device_close(struct vfio_device *device,
-		       struct iommufd_ctx *iommufd)
+void vfio_df_close(struct vfio_device_file *df)
 {
+	struct vfio_device *device = df->device;
+
 	lockdep_assert_held(&device->dev_set->lock);
 
 	vfio_assert_device_open(device);
 	if (device->open_count == 1)
-		vfio_device_last_close(device, iommufd);
+		vfio_device_last_close(df);
 	device->open_count--;
 }
 
@@ -550,7 +555,7 @@ static int vfio_device_fops_release(struct inode *inode, struct file *filep)
 	struct vfio_device_file *df = filep->private_data;
 	struct vfio_device *device = df->device;
 
-	vfio_device_group_close(device);
+	vfio_df_group_close(df);
 
 	vfio_device_put_registration(device);
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 180+ messages in thread

* [PATCH v12 07/24] vfio: Block device access via device fd until device is opened
  2023-06-02 12:16 ` [Intel-gfx] " Yi Liu
@ 2023-06-02 12:16   ` Yi Liu
  -1 siblings, 0 replies; 180+ messages in thread
From: Yi Liu @ 2023-06-02 12:16 UTC (permalink / raw)
  To: alex.williamson, jgg, kevin.tian
  Cc: joro, robin.murphy, cohuck, eric.auger, nicolinc, kvm, mjrosato,
	chao.p.peng, yi.l.liu, yi.y.sun, peterx, jasowang,
	shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	intel-gvt-dev, intel-gfx, linux-s390, xudong.hao, yan.y.zhao,
	terrence.xu, yanting.jiang, zhenzhong.duan, clegoate

Allow the vfio_device file to be in a state where the device FD is
opened but the device cannot be used by userspace (i.e. its .open_device()
hasn't been called). This in-between state is not used when the device
FD is spawned from the group FD; however, when the device FD is created
directly by opening a cdev, it is opened in the blocked state.

The reason for the in-between state is that userspace only gets an FD but
doesn't gain access permission until it binds the FD to an iommufd. So in
the blocked state, only the bind operation is allowed. Completing the bind
allows the user to access the device further.

This is implemented by adding a flag in struct vfio_device_file to mark
the blocked state and using a simple smp_load_acquire() to obtain the
flag value and serialize all the device setup with the thread accessing
this device.

This lockless scheme can safely handle the device FD going from unbound
to bound, but it cannot handle bound to unbound. Supporting that would
require a lock on all the vfio ioctls, which seems costly. So once a
device FD is bound, it remains bound until the FD is closed.

Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Tested-by: Terrence Xu <terrence.xu@intel.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Tested-by: Matthew Rosato <mjrosato@linux.ibm.com>
Tested-by: Yanting Jiang <yanting.jiang@intel.com>
Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
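The release/acquire pairing described above can be illustrated with a
userspace analogue. This is a sketch only: it uses C11 atomics in place of
the kernel's smp_store_release()/smp_load_acquire(), and the gate_* names
and device_state field are hypothetical.

```c
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

struct gate_file {
	int device_state;		/* plain data set up before access is granted */
	atomic_bool access_granted;	/* false means the blocked state */
};

/* Writer side: finish all setup, then publish the flag with release order. */
static void gate_grant_access(struct gate_file *df, int state)
{
	df->device_state = state;
	atomic_store_explicit(&df->access_granted, true, memory_order_release);
}

/*
 * Reader side: the acquire load pairs with the release store, so a reader
 * that observes the flag as true also observes the completed setup.
 */
static int gate_read_state(struct gate_file *df, int *out)
{
	if (!atomic_load_explicit(&df->access_granted, memory_order_acquire))
		return -EINVAL;
	*out = df->device_state;
	return 0;
}
```

The flag only ever moves from false to true here, which matches the
one-way unbound to bound transition the commit message allows.
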
 drivers/vfio/group.c     | 11 ++++++++++-
 drivers/vfio/vfio.h      |  1 +
 drivers/vfio/vfio_main.c | 16 ++++++++++++++++
 3 files changed, 27 insertions(+), 1 deletion(-)

diff --git a/drivers/vfio/group.c b/drivers/vfio/group.c
index caf53716ddb2..088dd34c8931 100644
--- a/drivers/vfio/group.c
+++ b/drivers/vfio/group.c
@@ -194,9 +194,18 @@ static int vfio_df_group_open(struct vfio_device_file *df)
 	df->iommufd = device->group->iommufd;
 
 	ret = vfio_df_open(df);
-	if (ret)
+	if (ret) {
 		df->iommufd = NULL;
+		goto out_put_kvm;
+	}
+
+	/*
+	 * Paired with smp_load_acquire() in vfio_device_fops::ioctl/
+	 * read/write/mmap and vfio_file_has_device_access()
+	 */
+	smp_store_release(&df->access_granted, true);
 
+out_put_kvm:
 	if (device->open_count == 0)
 		vfio_device_put_kvm(device);
 
diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
index f9eb52eb9ed7..fdf2fc73f880 100644
--- a/drivers/vfio/vfio.h
+++ b/drivers/vfio/vfio.h
@@ -18,6 +18,7 @@ struct vfio_container;
 
 struct vfio_device_file {
 	struct vfio_device *device;
+	bool access_granted;
 	spinlock_t kvm_ref_lock; /* protect kvm field */
 	struct kvm *kvm;
 	struct iommufd_ctx *iommufd; /* protected by struct vfio_device_set::lock */
diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
index a3c5817fc545..4c8b7713dc3d 100644
--- a/drivers/vfio/vfio_main.c
+++ b/drivers/vfio/vfio_main.c
@@ -1129,6 +1129,10 @@ static long vfio_device_fops_unl_ioctl(struct file *filep,
 	struct vfio_device *device = df->device;
 	int ret;
 
+	/* Paired with smp_store_release() following vfio_df_open() */
+	if (!smp_load_acquire(&df->access_granted))
+		return -EINVAL;
+
 	ret = vfio_device_pm_runtime_get(device);
 	if (ret)
 		return ret;
@@ -1156,6 +1160,10 @@ static ssize_t vfio_device_fops_read(struct file *filep, char __user *buf,
 	struct vfio_device_file *df = filep->private_data;
 	struct vfio_device *device = df->device;
 
+	/* Paired with smp_store_release() following vfio_df_open() */
+	if (!smp_load_acquire(&df->access_granted))
+		return -EINVAL;
+
 	if (unlikely(!device->ops->read))
 		return -EINVAL;
 
@@ -1169,6 +1177,10 @@ static ssize_t vfio_device_fops_write(struct file *filep,
 	struct vfio_device_file *df = filep->private_data;
 	struct vfio_device *device = df->device;
 
+	/* Paired with smp_store_release() following vfio_df_open() */
+	if (!smp_load_acquire(&df->access_granted))
+		return -EINVAL;
+
 	if (unlikely(!device->ops->write))
 		return -EINVAL;
 
@@ -1180,6 +1192,10 @@ static int vfio_device_fops_mmap(struct file *filep, struct vm_area_struct *vma)
 	struct vfio_device_file *df = filep->private_data;
 	struct vfio_device *device = df->device;
 
+	/* Paired with smp_store_release() following vfio_df_open() */
+	if (!smp_load_acquire(&df->access_granted))
+		return -EINVAL;
+
 	if (unlikely(!device->ops->mmap))
 		return -EINVAL;
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 180+ messages in thread

* [PATCH v12 08/24] vfio: Add cdev_device_open_cnt to vfio_group
  2023-06-02 12:16 ` [Intel-gfx] " Yi Liu
@ 2023-06-02 12:16   ` Yi Liu
  -1 siblings, 0 replies; 180+ messages in thread
From: Yi Liu @ 2023-06-02 12:16 UTC (permalink / raw)
  To: alex.williamson, jgg, kevin.tian
  Cc: joro, robin.murphy, cohuck, eric.auger, nicolinc, kvm, mjrosato,
	chao.p.peng, yi.l.liu, yi.y.sun, peterx, jasowang,
	shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	intel-gvt-dev, intel-gfx, linux-s390, xudong.hao, yan.y.zhao,
	terrence.xu, yanting.jiang, zhenzhong.duan, clegoate

Add cdev_device_open_cnt to count the devices that are opened via the
cdev path. The cdev path increments and decrements this count, and the
group path checks it to achieve exclusion with the cdev path. With this,
only one path (group path or cdev path) will claim DMA ownership. This
avoids scenarios in which devices within the same group are opened via
different paths.

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Tested-by: Terrence Xu <terrence.xu@intel.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Tested-by: Matthew Rosato <mjrosato@linux.ibm.com>
Tested-by: Yanting Jiang <yanting.jiang@intel.com>
Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
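The mutual exclusion between the two paths can be modelled in userspace as
follows. This is a simplified sketch, not the kernel code: the excl_* names
are hypothetical, a pthread mutex stands in for group_lock, and opened_file
is reduced to a boolean.

```c
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

struct excl_group {
	pthread_mutex_t lock;		/* stands in for group_lock */
	bool opened_file;		/* group fd is currently open */
	unsigned int cdev_device_open_cnt;
};

/* cdev path: refuse if the group fd is open, otherwise count this device. */
static int excl_block_group(struct excl_group *g)
{
	int ret = 0;

	pthread_mutex_lock(&g->lock);
	if (g->opened_file)
		ret = -EBUSY;
	else
		g->cdev_device_open_cnt++;
	pthread_mutex_unlock(&g->lock);
	return ret;
}

static void excl_unblock_group(struct excl_group *g)
{
	pthread_mutex_lock(&g->lock);
	g->cdev_device_open_cnt--;
	pthread_mutex_unlock(&g->lock);
}

/* group path: refuse if any device is open via cdev or the group is open. */
static int excl_group_open(struct excl_group *g)
{
	int ret = 0;

	pthread_mutex_lock(&g->lock);
	if (g->cdev_device_open_cnt || g->opened_file)
		ret = -EBUSY;
	else
		g->opened_file = true;
	pthread_mutex_unlock(&g->lock);
	return ret;
}
```

Because both checks run under the same lock, whichever path claims the group
first causes the other to fail with -EBUSY until the first is released.
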
 drivers/vfio/group.c | 33 +++++++++++++++++++++++++++++++++
 drivers/vfio/vfio.h  |  3 +++
 2 files changed, 36 insertions(+)

diff --git a/drivers/vfio/group.c b/drivers/vfio/group.c
index 088dd34c8931..2751d61689c4 100644
--- a/drivers/vfio/group.c
+++ b/drivers/vfio/group.c
@@ -383,6 +383,33 @@ static long vfio_group_fops_unl_ioctl(struct file *filep,
 	}
 }
 
+int vfio_device_block_group(struct vfio_device *device)
+{
+	struct vfio_group *group = device->group;
+	int ret = 0;
+
+	mutex_lock(&group->group_lock);
+	if (group->opened_file) {
+		ret = -EBUSY;
+		goto out_unlock;
+	}
+
+	group->cdev_device_open_cnt++;
+
+out_unlock:
+	mutex_unlock(&group->group_lock);
+	return ret;
+}
+
+void vfio_device_unblock_group(struct vfio_device *device)
+{
+	struct vfio_group *group = device->group;
+
+	mutex_lock(&group->group_lock);
+	group->cdev_device_open_cnt--;
+	mutex_unlock(&group->group_lock);
+}
+
 static int vfio_group_fops_open(struct inode *inode, struct file *filep)
 {
 	struct vfio_group *group =
@@ -405,6 +432,11 @@ static int vfio_group_fops_open(struct inode *inode, struct file *filep)
 		goto out_unlock;
 	}
 
+	if (group->cdev_device_open_cnt) {
+		ret = -EBUSY;
+		goto out_unlock;
+	}
+
 	/*
 	 * Do we need multiple instances of the group open?  Seems not.
 	 */
@@ -479,6 +511,7 @@ static void vfio_group_release(struct device *dev)
 	mutex_destroy(&group->device_lock);
 	mutex_destroy(&group->group_lock);
 	WARN_ON(group->iommu_group);
+	WARN_ON(group->cdev_device_open_cnt);
 	ida_free(&vfio.group_ida, MINOR(group->dev.devt));
 	kfree(group);
 }
diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
index fdf2fc73f880..de17bdd16df5 100644
--- a/drivers/vfio/vfio.h
+++ b/drivers/vfio/vfio.h
@@ -83,8 +83,11 @@ struct vfio_group {
 	struct blocking_notifier_head	notifier;
 	struct iommufd_ctx		*iommufd;
 	spinlock_t			kvm_ref_lock;
+	unsigned int			cdev_device_open_cnt;
 };
 
+int vfio_device_block_group(struct vfio_device *device);
+void vfio_device_unblock_group(struct vfio_device *device);
 int vfio_device_set_group(struct vfio_device *device,
 			  enum vfio_group_type type);
 void vfio_device_remove_group(struct vfio_device *device);
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 180+ messages in thread

* [Intel-gfx] [PATCH v12 08/24] vfio: Add cdev_device_open_cnt to vfio_group
@ 2023-06-02 12:16   ` Yi Liu
  0 siblings, 0 replies; 180+ messages in thread
From: Yi Liu @ 2023-06-02 12:16 UTC (permalink / raw)
  To: alex.williamson, jgg, kevin.tian
  Cc: mjrosato, jasowang, xudong.hao, zhenzhong.duan, peterx,
	terrence.xu, chao.p.peng, linux-s390, yi.l.liu, kvm, lulu,
	yanting.jiang, joro, nicolinc, yan.y.zhao, intel-gfx, eric.auger,
	intel-gvt-dev, yi.y.sun, clegoate, cohuck,
	shameerali.kolothum.thodi, suravee.suthikulpanit, robin.murphy

This is for counting the devices that are opened via the cdev path. This
count is increased and decreased by the cdev path. The group path checks
it to achieve exclusion with the cdev path. With this, only one path
(group path or cdev path) will claim DMA ownership. This avoids scenarios
in which devices within the same group may be opened via different paths.
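
The exclusion described above can be sketched as a small userspace
model. This is only an illustration under stated assumptions: the
model_* names and struct group_model are hypothetical, the group_lock
mutex that the real code takes is omitted, and the single-open check
on the group file is inferred from the surrounding diff context.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical userspace model of the group state; not the kernel struct. */
struct group_model {
	bool opened_file;                  /* group path holds the group open */
	unsigned int cdev_device_open_cnt; /* devices opened via the cdev path */
};

/* cdev path: claim the group, refusing if the group path already owns it. */
int model_block_group(struct group_model *g)
{
	if (g->opened_file)
		return -EBUSY;
	g->cdev_device_open_cnt++;
	return 0;
}

/* cdev path: drop the claim taken by model_block_group(). */
void model_unblock_group(struct group_model *g)
{
	assert(g->cdev_device_open_cnt > 0);
	g->cdev_device_open_cnt--;
}

/* group path: refuse to open while any cdev open is outstanding. */
int model_group_open(struct group_model *g)
{
	if (g->cdev_device_open_cnt)
		return -EBUSY;
	if (g->opened_file)	/* the group file itself is single open */
		return -EBUSY;
	g->opened_file = true;
	return 0;
}
```

Whichever path gets in first wins, and the other path sees -EBUSY
until the first one has fully let go of the group.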

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Tested-by: Terrence Xu <terrence.xu@intel.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Tested-by: Matthew Rosato <mjrosato@linux.ibm.com>
Tested-by: Yanting Jiang <yanting.jiang@intel.com>
Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 drivers/vfio/group.c | 33 +++++++++++++++++++++++++++++++++
 drivers/vfio/vfio.h  |  3 +++
 2 files changed, 36 insertions(+)

diff --git a/drivers/vfio/group.c b/drivers/vfio/group.c
index 088dd34c8931..2751d61689c4 100644
--- a/drivers/vfio/group.c
+++ b/drivers/vfio/group.c
@@ -383,6 +383,33 @@ static long vfio_group_fops_unl_ioctl(struct file *filep,
 	}
 }
 
+int vfio_device_block_group(struct vfio_device *device)
+{
+	struct vfio_group *group = device->group;
+	int ret = 0;
+
+	mutex_lock(&group->group_lock);
+	if (group->opened_file) {
+		ret = -EBUSY;
+		goto out_unlock;
+	}
+
+	group->cdev_device_open_cnt++;
+
+out_unlock:
+	mutex_unlock(&group->group_lock);
+	return ret;
+}
+
+void vfio_device_unblock_group(struct vfio_device *device)
+{
+	struct vfio_group *group = device->group;
+
+	mutex_lock(&group->group_lock);
+	group->cdev_device_open_cnt--;
+	mutex_unlock(&group->group_lock);
+}
+
 static int vfio_group_fops_open(struct inode *inode, struct file *filep)
 {
 	struct vfio_group *group =
@@ -405,6 +432,11 @@ static int vfio_group_fops_open(struct inode *inode, struct file *filep)
 		goto out_unlock;
 	}
 
+	if (group->cdev_device_open_cnt) {
+		ret = -EBUSY;
+		goto out_unlock;
+	}
+
 	/*
 	 * Do we need multiple instances of the group open?  Seems not.
 	 */
@@ -479,6 +511,7 @@ static void vfio_group_release(struct device *dev)
 	mutex_destroy(&group->device_lock);
 	mutex_destroy(&group->group_lock);
 	WARN_ON(group->iommu_group);
+	WARN_ON(group->cdev_device_open_cnt);
 	ida_free(&vfio.group_ida, MINOR(group->dev.devt));
 	kfree(group);
 }
diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
index fdf2fc73f880..de17bdd16df5 100644
--- a/drivers/vfio/vfio.h
+++ b/drivers/vfio/vfio.h
@@ -83,8 +83,11 @@ struct vfio_group {
 	struct blocking_notifier_head	notifier;
 	struct iommufd_ctx		*iommufd;
 	spinlock_t			kvm_ref_lock;
+	unsigned int			cdev_device_open_cnt;
 };
 
+int vfio_device_block_group(struct vfio_device *device);
+void vfio_device_unblock_group(struct vfio_device *device);
 int vfio_device_set_group(struct vfio_device *device,
 			  enum vfio_group_type type);
 void vfio_device_remove_group(struct vfio_device *device);
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 180+ messages in thread

* [PATCH v12 09/24] vfio: Make vfio_df_open() single open for device cdev path
  2023-06-02 12:16 ` [Intel-gfx] " Yi Liu
@ 2023-06-02 12:16   ` Yi Liu
  -1 siblings, 0 replies; 180+ messages in thread
From: Yi Liu @ 2023-06-02 12:16 UTC (permalink / raw)
  To: alex.williamson, jgg, kevin.tian
  Cc: joro, robin.murphy, cohuck, eric.auger, nicolinc, kvm, mjrosato,
	chao.p.peng, yi.l.liu, yi.y.sun, peterx, jasowang,
	shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	intel-gvt-dev, intel-gfx, linux-s390, xudong.hao, yan.y.zhao,
	terrence.xu, yanting.jiang, zhenzhong.duan, clegoate

VFIO group has historically allowed multi-open of the device FD. This
was made secure because the "open" was executed via an ioctl to the
group FD which is itself only single open.

However, there is no known use of multiple device FDs today. It is kind
of a strange thing to do because new device FDs can naturally be created
via dup().

When we implement the new device uAPI (only used in cdev path) there is
no natural way to allow the device itself to be multi-opened in a
secure manner. Without the group FD we cannot prove the security context
of the opener.

Thus, when moving to the new uAPI we block the ability to open a device
multiple times. Since the old group path still allows it, we store a
vfio_group pointer in struct vfio_device_file to differentiate the two
paths.
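
A minimal userspace sketch of the rule, assuming hypothetical model_*
types that stand in for struct vfio_device and struct
vfio_device_file. Only the open_count/group check mirrors the patch;
first-open setup and locking are left out.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for struct vfio_device. */
struct device_model {
	unsigned int open_count;
};

/* Hypothetical stand-in for struct vfio_device_file. */
struct device_file_model {
	struct device_model *device;
	const void *group;	/* non-NULL only when opened via the group path */
};

/* Only the group path may open an already-open device again. */
int model_df_open(struct device_file_model *df)
{
	struct device_model *device = df->device;

	assert(device != NULL);

	if (device->open_count != 0 && !df->group)
		return -EINVAL;

	device->open_count++;
	return 0;
}
```

A second cdev open of the same device fails with -EINVAL and leaves
open_count untouched, while repeated group-path opens keep counting up.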

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Tested-by: Terrence Xu <terrence.xu@intel.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Tested-by: Yanting Jiang <yanting.jiang@intel.com>
Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 drivers/vfio/group.c     | 2 ++
 drivers/vfio/vfio.h      | 2 ++
 drivers/vfio/vfio_main.c | 7 +++++++
 3 files changed, 11 insertions(+)

diff --git a/drivers/vfio/group.c b/drivers/vfio/group.c
index 2751d61689c4..4e6277191eb4 100644
--- a/drivers/vfio/group.c
+++ b/drivers/vfio/group.c
@@ -245,6 +245,8 @@ static struct file *vfio_device_open_file(struct vfio_device *device)
 		goto err_out;
 	}
 
+	df->group = device->group;
+
 	ret = vfio_df_group_open(df);
 	if (ret)
 		goto err_free;
diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
index de17bdd16df5..86e45ba18768 100644
--- a/drivers/vfio/vfio.h
+++ b/drivers/vfio/vfio.h
@@ -18,6 +18,8 @@ struct vfio_container;
 
 struct vfio_device_file {
 	struct vfio_device *device;
+	struct vfio_group *group;
+
 	bool access_granted;
 	spinlock_t kvm_ref_lock; /* protect kvm field */
 	struct kvm *kvm;
diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
index 4c8b7713dc3d..01db017a0c3b 100644
--- a/drivers/vfio/vfio_main.c
+++ b/drivers/vfio/vfio_main.c
@@ -492,6 +492,13 @@ int vfio_df_open(struct vfio_device_file *df)
 
 	lockdep_assert_held(&device->dev_set->lock);
 
+	/*
+	 * Only the group path allows the device to be opened multiple
+	 * times.  The device cdev path doesn't have a secure way for it.
+	 */
+	if (device->open_count != 0 && !df->group)
+		return -EINVAL;
+
 	device->open_count++;
 	if (device->open_count == 1) {
 		ret = vfio_device_first_open(df);
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 180+ messages in thread

* [PATCH v12 10/24] vfio-iommufd: Move noiommu compat validation out of vfio_iommufd_bind()
  2023-06-02 12:16 ` [Intel-gfx] " Yi Liu
@ 2023-06-02 12:16   ` Yi Liu
  -1 siblings, 0 replies; 180+ messages in thread
From: Yi Liu @ 2023-06-02 12:16 UTC (permalink / raw)
  To: alex.williamson, jgg, kevin.tian
  Cc: joro, robin.murphy, cohuck, eric.auger, nicolinc, kvm, mjrosato,
	chao.p.peng, yi.l.liu, yi.y.sun, peterx, jasowang,
	shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	intel-gvt-dev, intel-gfx, linux-s390, xudong.hao, yan.y.zhao,
	terrence.xu, yanting.jiang, zhenzhong.duan, clegoate

Move the noiommu compat validation logic out of vfio_iommufd_bind() and
into vfio_df_group_open(). This is more consistent with what will be
done in the vfio device cdev path.
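
The validation itself reduces to a two-input decision. The sketch
below is a userspace model with hypothetical names; the two booleans
stand in for capable(CAP_SYS_RAWIO) and
vfio_iommufd_device_has_compat_ioas() respectively.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/*
 * A noiommu device may proceed only if the caller has CAP_SYS_RAWIO and
 * no compat IOAS has been assigned, i.e. the user has done nothing that
 * implies they expected translation to exist.
 */
int model_noiommu_compat_check(bool has_cap_sys_rawio, bool has_compat_ioas)
{
	if (!has_cap_sys_rawio || has_compat_ioas)
		return -EPERM;
	return 0;
}

/* Walk through the interesting input combinations. */
void model_noiommu_self_check(void)
{
	assert(model_noiommu_compat_check(true, false) == 0);
	assert(model_noiommu_compat_check(false, false) == -EPERM);
	assert(model_noiommu_compat_check(true, true) == -EPERM);
}
```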

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Tested-by: Terrence Xu <terrence.xu@intel.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Tested-by: Yanting Jiang <yanting.jiang@intel.com>
Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 drivers/vfio/group.c   | 13 +++++++++++++
 drivers/vfio/iommufd.c | 22 ++++++++--------------
 drivers/vfio/vfio.h    |  9 +++++++++
 3 files changed, 30 insertions(+), 14 deletions(-)

diff --git a/drivers/vfio/group.c b/drivers/vfio/group.c
index 4e6277191eb4..b8b77daf7aa6 100644
--- a/drivers/vfio/group.c
+++ b/drivers/vfio/group.c
@@ -192,6 +192,19 @@ static int vfio_df_group_open(struct vfio_device_file *df)
 		vfio_device_group_get_kvm_safe(device);
 
 	df->iommufd = device->group->iommufd;
+	if (df->iommufd && vfio_device_is_noiommu(device) && device->open_count == 0) {
+		/*
+		 * Require no compat ioas to be assigned to proceed.  The basic
+		 * statement is that the user cannot have done something that
+		 * implies they expected translation to exist
+		 */
+		if (!capable(CAP_SYS_RAWIO) ||
+		    vfio_iommufd_device_has_compat_ioas(device, df->iommufd))
+			ret = -EPERM;
+		else
+			ret = 0;
+		goto out_put_kvm;
+	}
 
 	ret = vfio_df_open(df);
 	if (ret) {
diff --git a/drivers/vfio/iommufd.c b/drivers/vfio/iommufd.c
index a04f3a493437..21237f5d0ffc 100644
--- a/drivers/vfio/iommufd.c
+++ b/drivers/vfio/iommufd.c
@@ -10,6 +10,14 @@
 MODULE_IMPORT_NS(IOMMUFD);
 MODULE_IMPORT_NS(IOMMUFD_VFIO);
 
+bool vfio_iommufd_device_has_compat_ioas(struct vfio_device *vdev,
+					 struct iommufd_ctx *ictx)
+{
+	u32 ioas_id;
+
+	return !iommufd_vfio_compat_ioas_get_id(ictx, &ioas_id);
+}
+
 int vfio_iommufd_bind(struct vfio_device *vdev, struct iommufd_ctx *ictx)
 {
 	u32 ioas_id;
@@ -18,20 +26,6 @@ int vfio_iommufd_bind(struct vfio_device *vdev, struct iommufd_ctx *ictx)
 
 	lockdep_assert_held(&vdev->dev_set->lock);
 
-	if (vfio_device_is_noiommu(vdev)) {
-		if (!capable(CAP_SYS_RAWIO))
-			return -EPERM;
-
-		/*
-		 * Require no compat ioas to be assigned to proceed. The basic
-		 * statement is that the user cannot have done something that
-		 * implies they expected translation to exist
-		 */
-		if (!iommufd_vfio_compat_ioas_get_id(ictx, &ioas_id))
-			return -EPERM;
-		return 0;
-	}
-
 	ret = vdev->ops->bind_iommufd(vdev, ictx, &device_id);
 	if (ret)
 		return ret;
diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
index 86e45ba18768..76181d208bc1 100644
--- a/drivers/vfio/vfio.h
+++ b/drivers/vfio/vfio.h
@@ -234,9 +234,18 @@ static inline void vfio_container_cleanup(void)
 #endif
 
 #if IS_ENABLED(CONFIG_IOMMUFD)
+bool vfio_iommufd_device_has_compat_ioas(struct vfio_device *vdev,
+					 struct iommufd_ctx *ictx);
 int vfio_iommufd_bind(struct vfio_device *device, struct iommufd_ctx *ictx);
 void vfio_iommufd_unbind(struct vfio_device *device);
 #else
+static inline bool
+vfio_iommufd_device_has_compat_ioas(struct vfio_device *vdev,
+				    struct iommufd_ctx *ictx)
+{
+	return false;
+}
+
 static inline int vfio_iommufd_bind(struct vfio_device *device,
 				    struct iommufd_ctx *ictx)
 {
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 180+ messages in thread

* [PATCH v12 11/24] vfio-iommufd: Split bind/attach into two steps
  2023-06-02 12:16 ` [Intel-gfx] " Yi Liu
@ 2023-06-02 12:16   ` Yi Liu
  -1 siblings, 0 replies; 180+ messages in thread
From: Yi Liu @ 2023-06-02 12:16 UTC (permalink / raw)
  To: alex.williamson, jgg, kevin.tian
  Cc: joro, robin.murphy, cohuck, eric.auger, nicolinc, kvm, mjrosato,
	chao.p.peng, yi.l.liu, yi.y.sun, peterx, jasowang,
	shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	intel-gvt-dev, intel-gfx, linux-s390, xudong.hao, yan.y.zhao,
	terrence.xu, yanting.jiang, zhenzhong.duan, clegoate

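Split vfio_iommufd_bind() so that it only binds the device to iommufd,
and add vfio_iommufd_compat_attach_ioas() to attach the compat IOAS as
a separate step in the group path. This aligns the bind/attach logic
with the coming vfio device cdev support.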
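
The resulting ordering and error unwinding can be sketched in
userspace as follows. All model_* names are hypothetical, attach_err
is an injected failure used only for illustration, and locking and the
KVM reference handling of the real function are omitted.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical device state; not the kernel structure. */
struct dev_model {
	unsigned int open_count;
	bool bound;
	bool attached;
	bool noiommu;
	int attach_err;	/* injected attach failure, 0 means succeed */
};

/* Step 1: the first open binds the device to iommufd. */
int model_df_open(struct dev_model *d)
{
	d->open_count++;
	if (d->open_count == 1)
		d->bound = true;
	return 0;
}

/* The last close undoes the bind, and with it any attachment. */
void model_df_close(struct dev_model *d)
{
	assert(d->open_count > 0);
	if (d->open_count == 1) {
		d->attached = false;
		d->bound = false;
	}
	d->open_count--;
}

/* Step 2: attach the compat IOAS; noiommu devices skip the attach. */
int model_compat_attach_ioas(struct dev_model *d)
{
	if (d->noiommu)
		return 0;
	if (d->attach_err)
		return d->attach_err;
	d->attached = true;
	return 0;
}

/* Group path: open (bind) first, then attach, closing again on failure. */
int model_group_open(struct dev_model *d)
{
	int ret = model_df_open(d);

	if (ret)
		return ret;

	if (d->open_count == 1) {
		ret = model_compat_attach_ioas(d);
		if (ret) {
			model_df_close(d);
			return ret;
		}
	}
	return 0;
}
```

Because the attach step now runs after the open has completed, a
failed attach is unwound by closing the device rather than by an
unbind inside the bind helper.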

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Tested-by: Terrence Xu <terrence.xu@intel.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Tested-by: Yanting Jiang <yanting.jiang@intel.com>
Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 drivers/vfio/group.c   | 17 +++++++++++++----
 drivers/vfio/iommufd.c | 35 +++++++++++++++++------------------
 drivers/vfio/vfio.h    |  9 +++++++++
 3 files changed, 39 insertions(+), 22 deletions(-)

diff --git a/drivers/vfio/group.c b/drivers/vfio/group.c
index b8b77daf7aa6..41a09a2df690 100644
--- a/drivers/vfio/group.c
+++ b/drivers/vfio/group.c
@@ -207,9 +207,13 @@ static int vfio_df_group_open(struct vfio_device_file *df)
 	}
 
 	ret = vfio_df_open(df);
-	if (ret) {
-		df->iommufd = NULL;
+	if (ret)
 		goto out_put_kvm;
+
+	if (df->iommufd && device->open_count == 1) {
+		ret = vfio_iommufd_compat_attach_ioas(device, df->iommufd);
+		if (ret)
+			goto out_close_device;
 	}
 
 	/*
@@ -218,12 +222,17 @@ static int vfio_df_group_open(struct vfio_device_file *df)
 	 */
 	smp_store_release(&df->access_granted, true);
 
+	mutex_unlock(&device->dev_set->lock);
+	mutex_unlock(&device->group->group_lock);
+	return 0;
+
+out_close_device:
+	vfio_df_close(df);
 out_put_kvm:
+	df->iommufd = NULL;
 	if (device->open_count == 0)
 		vfio_device_put_kvm(device);
-
 	mutex_unlock(&device->dev_set->lock);
-
 out_unlock:
 	mutex_unlock(&device->group->group_lock);
 	return ret;
diff --git a/drivers/vfio/iommufd.c b/drivers/vfio/iommufd.c
index 21237f5d0ffc..b30f9aaae6e7 100644
--- a/drivers/vfio/iommufd.c
+++ b/drivers/vfio/iommufd.c
@@ -20,33 +20,32 @@ bool vfio_iommufd_device_has_compat_ioas(struct vfio_device *vdev,
 
 int vfio_iommufd_bind(struct vfio_device *vdev, struct iommufd_ctx *ictx)
 {
-	u32 ioas_id;
 	u32 device_id;
+
+	lockdep_assert_held(&vdev->dev_set->lock);
+
+	/* The legacy path has no way to return the device id */
+	return vdev->ops->bind_iommufd(vdev, ictx, &device_id);
+}
+
+int vfio_iommufd_compat_attach_ioas(struct vfio_device *vdev,
+				    struct iommufd_ctx *ictx)
+{
+	u32 ioas_id;
 	int ret;
 
 	lockdep_assert_held(&vdev->dev_set->lock);
 
-	ret = vdev->ops->bind_iommufd(vdev, ictx, &device_id);
-	if (ret)
-		return ret;
+	/* compat noiommu does not need to do ioas attach */
+	if (vfio_device_is_noiommu(vdev))
+		return 0;
 
 	ret = iommufd_vfio_compat_ioas_get_id(ictx, &ioas_id);
 	if (ret)
-		goto err_unbind;
-	ret = vdev->ops->attach_ioas(vdev, &ioas_id);
-	if (ret)
-		goto err_unbind;
-
-	/*
-	 * The legacy path has no way to return the device id or the selected
-	 * pt_id
-	 */
-	return 0;
+		return ret;
 
-err_unbind:
-	if (vdev->ops->unbind_iommufd)
-		vdev->ops->unbind_iommufd(vdev);
-	return ret;
+	/* The legacy path has no way to return the selected pt_id */
+	return vdev->ops->attach_ioas(vdev, &ioas_id);
 }
 
 void vfio_iommufd_unbind(struct vfio_device *vdev)
diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
index 76181d208bc1..bb7a375315bb 100644
--- a/drivers/vfio/vfio.h
+++ b/drivers/vfio/vfio.h
@@ -238,6 +238,8 @@ bool vfio_iommufd_device_has_compat_ioas(struct vfio_device *vdev,
 					 struct iommufd_ctx *ictx);
 int vfio_iommufd_bind(struct vfio_device *device, struct iommufd_ctx *ictx);
 void vfio_iommufd_unbind(struct vfio_device *device);
+int vfio_iommufd_compat_attach_ioas(struct vfio_device *device,
+				    struct iommufd_ctx *ictx);
 #else
 static inline bool
 vfio_iommufd_device_has_compat_ioas(struct vfio_device *vdev,
@@ -255,6 +257,13 @@ static inline int vfio_iommufd_bind(struct vfio_device *device,
 static inline void vfio_iommufd_unbind(struct vfio_device *device)
 {
 }
+
+static inline int
+vfio_iommufd_compat_attach_ioas(struct vfio_device *device,
+				struct iommufd_ctx *ictx)
+{
+	return -EOPNOTSUPP;
+}
 #endif
 
 #if IS_ENABLED(CONFIG_VFIO_VIRQFD)
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 180+ messages in thread

* [PATCH v12 12/24] vfio: Record devid in vfio_device_file
  2023-06-02 12:16 ` [Intel-gfx] " Yi Liu
@ 2023-06-02 12:16   ` Yi Liu
  -1 siblings, 0 replies; 180+ messages in thread
From: Yi Liu @ 2023-06-02 12:16 UTC (permalink / raw)
  To: alex.williamson, jgg, kevin.tian
  Cc: joro, robin.murphy, cohuck, eric.auger, nicolinc, kvm, mjrosato,
	chao.p.peng, yi.l.liu, yi.y.sun, peterx, jasowang,
	shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	intel-gvt-dev, intel-gfx, linux-s390, xudong.hao, yan.y.zhao,
	terrence.xu, yanting.jiang, zhenzhong.duan, clegoate

.bind_iommufd() generates an ID to represent this binding, which
userspace needs for later use. Store devid in vfio_device_file to avoid
passing the pointer around in multiple places.
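
As a rough userspace illustration of the idea, the sketch below lets a
stand-in bind callback report its ID through an out-pointer that the
wrapper aims at a field of the per-file structure. The model_* names
and the next_id parameter are hypothetical and exist only to make the
example self-contained.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for struct vfio_device_file. */
struct df_model {
	bool iommufd_valid;
	uint32_t devid;	/* only meaningful when iommufd_valid is true */
};

/* Stand-in for ops->bind_iommufd(): reports the new ID via out_device_id. */
int model_bind_op(uint32_t next_id, uint32_t *out_device_id)
{
	*out_device_id = next_id;
	return 0;
}

/* The wrapper stores the ID in the file, so callers pass no extra pointer. */
int model_df_iommufd_bind(struct df_model *df, uint32_t next_id)
{
	assert(df->iommufd_valid);
	return model_bind_op(next_id, &df->devid);
}
```

Any later code that holds the file structure can read devid directly
instead of threading a separate u32 pointer through each call.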

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Tested-by: Terrence Xu <terrence.xu@intel.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Tested-by: Yanting Jiang <yanting.jiang@intel.com>
Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 drivers/vfio/iommufd.c   | 12 +++++++-----
 drivers/vfio/vfio.h      | 10 +++++-----
 drivers/vfio/vfio_main.c |  6 +++---
 3 files changed, 15 insertions(+), 13 deletions(-)

diff --git a/drivers/vfio/iommufd.c b/drivers/vfio/iommufd.c
index b30f9aaae6e7..2ce4d4382565 100644
--- a/drivers/vfio/iommufd.c
+++ b/drivers/vfio/iommufd.c
@@ -18,14 +18,14 @@ bool vfio_iommufd_device_has_compat_ioas(struct vfio_device *vdev,
 	return !iommufd_vfio_compat_ioas_get_id(ictx, &ioas_id);
 }
 
-int vfio_iommufd_bind(struct vfio_device *vdev, struct iommufd_ctx *ictx)
+int vfio_df_iommufd_bind(struct vfio_device_file *df)
 {
-	u32 device_id;
+	struct vfio_device *vdev = df->device;
+	struct iommufd_ctx *ictx = df->iommufd;
 
 	lockdep_assert_held(&vdev->dev_set->lock);
 
-	/* The legacy path has no way to return the device id */
-	return vdev->ops->bind_iommufd(vdev, ictx, &device_id);
+	return vdev->ops->bind_iommufd(vdev, ictx, &df->devid);
 }
 
 int vfio_iommufd_compat_attach_ioas(struct vfio_device *vdev,
@@ -48,8 +48,10 @@ int vfio_iommufd_compat_attach_ioas(struct vfio_device *vdev,
 	return vdev->ops->attach_ioas(vdev, &ioas_id);
 }
 
-void vfio_iommufd_unbind(struct vfio_device *vdev)
+void vfio_df_iommufd_unbind(struct vfio_device_file *df)
 {
+	struct vfio_device *vdev = df->device;
+
 	lockdep_assert_held(&vdev->dev_set->lock);
 
 	if (vfio_device_is_noiommu(vdev))
diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
index bb7a375315bb..b491a0cdbe62 100644
--- a/drivers/vfio/vfio.h
+++ b/drivers/vfio/vfio.h
@@ -24,6 +24,7 @@ struct vfio_device_file {
 	spinlock_t kvm_ref_lock; /* protect kvm field */
 	struct kvm *kvm;
 	struct iommufd_ctx *iommufd; /* protected by struct vfio_device_set::lock */
+	u32 devid; /* only valid when iommufd is valid */
 };
 
 void vfio_device_put_registration(struct vfio_device *device);
@@ -236,8 +237,8 @@ static inline void vfio_container_cleanup(void)
 #if IS_ENABLED(CONFIG_IOMMUFD)
 bool vfio_iommufd_device_has_compat_ioas(struct vfio_device *vdev,
 					 struct iommufd_ctx *ictx);
-int vfio_iommufd_bind(struct vfio_device *device, struct iommufd_ctx *ictx);
-void vfio_iommufd_unbind(struct vfio_device *device);
+int vfio_df_iommufd_bind(struct vfio_device_file *df);
+void vfio_df_iommufd_unbind(struct vfio_device_file *df);
 int vfio_iommufd_compat_attach_ioas(struct vfio_device *device,
 				    struct iommufd_ctx *ictx);
 #else
@@ -248,13 +249,12 @@ vfio_iommufd_device_has_compat_ioas(struct vfio_device *vdev,
 	return false;
 }
 
-static inline int vfio_iommufd_bind(struct vfio_device *device,
-				    struct iommufd_ctx *ictx)
+static inline int vfio_df_iommufd_bind(struct vfio_device_file *fd)
 {
 	return -EOPNOTSUPP;
 }
 
-static inline void vfio_iommufd_unbind(struct vfio_device *device)
+static inline void vfio_df_iommufd_unbind(struct vfio_device_file *df)
 {
 }
 
diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
index 01db017a0c3b..019498115621 100644
--- a/drivers/vfio/vfio_main.c
+++ b/drivers/vfio/vfio_main.c
@@ -446,7 +446,7 @@ static int vfio_device_first_open(struct vfio_device_file *df)
 		return -ENODEV;
 
 	if (iommufd)
-		ret = vfio_iommufd_bind(device, iommufd);
+		ret = vfio_df_iommufd_bind(df);
 	else
 		ret = vfio_device_group_use_iommu(device);
 	if (ret)
@@ -461,7 +461,7 @@ static int vfio_device_first_open(struct vfio_device_file *df)
 
 err_unuse_iommu:
 	if (iommufd)
-		vfio_iommufd_unbind(device);
+		vfio_df_iommufd_unbind(df);
 	else
 		vfio_device_group_unuse_iommu(device);
 err_module_put:
@@ -479,7 +479,7 @@ static void vfio_device_last_close(struct vfio_device_file *df)
 	if (device->ops->close_device)
 		device->ops->close_device(device);
 	if (iommufd)
-		vfio_iommufd_unbind(device);
+		vfio_df_iommufd_unbind(df);
 	else
 		vfio_device_group_unuse_iommu(device);
 	module_put(device->dev->driver->owner);
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 180+ messages in thread

* [PATCH v12 13/24] vfio-iommufd: Add detach_ioas support for physical VFIO devices
  2023-06-02 12:16 ` [Intel-gfx] " Yi Liu
@ 2023-06-02 12:16   ` Yi Liu
  -1 siblings, 0 replies; 180+ messages in thread
From: Yi Liu @ 2023-06-02 12:16 UTC (permalink / raw)
  To: alex.williamson, jgg, kevin.tian
  Cc: joro, robin.murphy, cohuck, eric.auger, nicolinc, kvm, mjrosato,
	chao.p.peng, yi.l.liu, yi.y.sun, peterx, jasowang,
	shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	intel-gvt-dev, intel-gfx, linux-s390, xudong.hao, yan.y.zhao,
	terrence.xu, yanting.jiang, zhenzhong.duan, clegoate

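Add a detach_ioas callback to vfio_device_ops, implemented for physical
devices by vfio_iommufd_physical_detach_ioas(). This prepares for adding
the DETACH ioctl for physical VFIO devices.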

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Tested-by: Terrence Xu <terrence.xu@intel.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Tested-by: Matthew Rosato <mjrosato@linux.ibm.com>
Tested-by: Yanting Jiang <yanting.jiang@intel.com>
Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 Documentation/driver-api/vfio.rst             |  8 +++++---
 drivers/vfio/fsl-mc/vfio_fsl_mc.c             |  1 +
 drivers/vfio/iommufd.c                        | 20 +++++++++++++++++++
 .../vfio/pci/hisilicon/hisi_acc_vfio_pci.c    |  2 ++
 drivers/vfio/pci/mlx5/main.c                  |  1 +
 drivers/vfio/pci/vfio_pci.c                   |  1 +
 drivers/vfio/platform/vfio_amba.c             |  1 +
 drivers/vfio/platform/vfio_platform.c         |  1 +
 drivers/vfio/vfio_main.c                      |  3 ++-
 include/linux/vfio.h                          |  8 +++++++-
 10 files changed, 41 insertions(+), 5 deletions(-)

diff --git a/Documentation/driver-api/vfio.rst b/Documentation/driver-api/vfio.rst
index 68abc089d6dd..363e12c90b87 100644
--- a/Documentation/driver-api/vfio.rst
+++ b/Documentation/driver-api/vfio.rst
@@ -279,6 +279,7 @@ similar to a file operations structure::
 					struct iommufd_ctx *ictx, u32 *out_device_id);
 		void	(*unbind_iommufd)(struct vfio_device *vdev);
 		int	(*attach_ioas)(struct vfio_device *vdev, u32 *pt_id);
+		void	(*detach_ioas)(struct vfio_device *vdev);
 		int	(*open_device)(struct vfio_device *vdev);
 		void	(*close_device)(struct vfio_device *vdev);
 		ssize_t	(*read)(struct vfio_device *vdev, char __user *buf,
@@ -315,9 +316,10 @@ container_of().
 	- The [un]bind_iommufd callbacks are issued when the device is bound to
 	  and unbound from iommufd.
 
-	- The attach_ioas callback is issued when the device is attached to an
-	  IOAS managed by the bound iommufd. The attached IOAS is automatically
-	  detached when the device is unbound from iommufd.
+	- The [de]attach_ioas callback is issued when the device is attached to
+	  and detached from an IOAS managed by the bound iommufd. However, the
+	  attached IOAS can also be automatically detached when the device is
+	  unbound from iommufd.
 
 	- The read/write/mmap callbacks implement the device region access defined
 	  by the device's own VFIO_DEVICE_GET_REGION_INFO ioctl.
diff --git a/drivers/vfio/fsl-mc/vfio_fsl_mc.c b/drivers/vfio/fsl-mc/vfio_fsl_mc.c
index c89a047a4cd8..d540cf683d93 100644
--- a/drivers/vfio/fsl-mc/vfio_fsl_mc.c
+++ b/drivers/vfio/fsl-mc/vfio_fsl_mc.c
@@ -594,6 +594,7 @@ static const struct vfio_device_ops vfio_fsl_mc_ops = {
 	.bind_iommufd	= vfio_iommufd_physical_bind,
 	.unbind_iommufd	= vfio_iommufd_physical_unbind,
 	.attach_ioas	= vfio_iommufd_physical_attach_ioas,
+	.detach_ioas	= vfio_iommufd_physical_detach_ioas,
 };
 
 static struct fsl_mc_driver vfio_fsl_mc_driver = {
diff --git a/drivers/vfio/iommufd.c b/drivers/vfio/iommufd.c
index 2ce4d4382565..ae96260912d8 100644
--- a/drivers/vfio/iommufd.c
+++ b/drivers/vfio/iommufd.c
@@ -145,6 +145,14 @@ int vfio_iommufd_physical_attach_ioas(struct vfio_device *vdev, u32 *pt_id)
 {
 	int rc;
 
+	lockdep_assert_held(&vdev->dev_set->lock);
+
+	if (WARN_ON(!vdev->iommufd_device))
+		return -EINVAL;
+
+	if (vdev->iommufd_attached)
+		return -EBUSY;
+
 	rc = iommufd_device_attach(vdev->iommufd_device, pt_id);
 	if (rc)
 		return rc;
@@ -153,6 +161,18 @@ int vfio_iommufd_physical_attach_ioas(struct vfio_device *vdev, u32 *pt_id)
 }
 EXPORT_SYMBOL_GPL(vfio_iommufd_physical_attach_ioas);
 
+void vfio_iommufd_physical_detach_ioas(struct vfio_device *vdev)
+{
+	lockdep_assert_held(&vdev->dev_set->lock);
+
+	if (WARN_ON(!vdev->iommufd_device) || !vdev->iommufd_attached)
+		return;
+
+	iommufd_device_detach(vdev->iommufd_device);
+	vdev->iommufd_attached = false;
+}
+EXPORT_SYMBOL_GPL(vfio_iommufd_physical_detach_ioas);
+
 /*
  * The emulated standard ops mean that vfio_device is going to use the
  * "mdev path" and will call vfio_pin_pages()/vfio_dma_rw(). Drivers using this
diff --git a/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c b/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
index a117eaf21c14..b2f9778c8366 100644
--- a/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
+++ b/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
@@ -1373,6 +1373,7 @@ static const struct vfio_device_ops hisi_acc_vfio_pci_migrn_ops = {
 	.bind_iommufd = vfio_iommufd_physical_bind,
 	.unbind_iommufd = vfio_iommufd_physical_unbind,
 	.attach_ioas = vfio_iommufd_physical_attach_ioas,
+	.detach_ioas = vfio_iommufd_physical_detach_ioas,
 };
 
 static const struct vfio_device_ops hisi_acc_vfio_pci_ops = {
@@ -1391,6 +1392,7 @@ static const struct vfio_device_ops hisi_acc_vfio_pci_ops = {
 	.bind_iommufd = vfio_iommufd_physical_bind,
 	.unbind_iommufd = vfio_iommufd_physical_unbind,
 	.attach_ioas = vfio_iommufd_physical_attach_ioas,
+	.detach_ioas = vfio_iommufd_physical_detach_ioas,
 };
 
 static int hisi_acc_vfio_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
diff --git a/drivers/vfio/pci/mlx5/main.c b/drivers/vfio/pci/mlx5/main.c
index d95fd382814c..42ec574a8622 100644
--- a/drivers/vfio/pci/mlx5/main.c
+++ b/drivers/vfio/pci/mlx5/main.c
@@ -1320,6 +1320,7 @@ static const struct vfio_device_ops mlx5vf_pci_ops = {
 	.bind_iommufd = vfio_iommufd_physical_bind,
 	.unbind_iommufd = vfio_iommufd_physical_unbind,
 	.attach_ioas = vfio_iommufd_physical_attach_ioas,
+	.detach_ioas = vfio_iommufd_physical_detach_ioas,
 };
 
 static int mlx5vf_pci_probe(struct pci_dev *pdev,
diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
index 29091ee2e984..cb5b7f865d58 100644
--- a/drivers/vfio/pci/vfio_pci.c
+++ b/drivers/vfio/pci/vfio_pci.c
@@ -141,6 +141,7 @@ static const struct vfio_device_ops vfio_pci_ops = {
 	.bind_iommufd	= vfio_iommufd_physical_bind,
 	.unbind_iommufd	= vfio_iommufd_physical_unbind,
 	.attach_ioas	= vfio_iommufd_physical_attach_ioas,
+	.detach_ioas	= vfio_iommufd_physical_detach_ioas,
 };
 
 static int vfio_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
diff --git a/drivers/vfio/platform/vfio_amba.c b/drivers/vfio/platform/vfio_amba.c
index 83fe54015595..6464b3939ebc 100644
--- a/drivers/vfio/platform/vfio_amba.c
+++ b/drivers/vfio/platform/vfio_amba.c
@@ -119,6 +119,7 @@ static const struct vfio_device_ops vfio_amba_ops = {
 	.bind_iommufd	= vfio_iommufd_physical_bind,
 	.unbind_iommufd	= vfio_iommufd_physical_unbind,
 	.attach_ioas	= vfio_iommufd_physical_attach_ioas,
+	.detach_ioas	= vfio_iommufd_physical_detach_ioas,
 };
 
 static const struct amba_id pl330_ids[] = {
diff --git a/drivers/vfio/platform/vfio_platform.c b/drivers/vfio/platform/vfio_platform.c
index 22a1efca32a8..8cf22fa65baa 100644
--- a/drivers/vfio/platform/vfio_platform.c
+++ b/drivers/vfio/platform/vfio_platform.c
@@ -108,6 +108,7 @@ static const struct vfio_device_ops vfio_platform_ops = {
 	.bind_iommufd	= vfio_iommufd_physical_bind,
 	.unbind_iommufd	= vfio_iommufd_physical_unbind,
 	.attach_ioas	= vfio_iommufd_physical_attach_ioas,
+	.detach_ioas	= vfio_iommufd_physical_detach_ioas,
 };
 
 static struct platform_driver vfio_platform_driver = {
diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
index 019498115621..df4f3e37268d 100644
--- a/drivers/vfio/vfio_main.c
+++ b/drivers/vfio/vfio_main.c
@@ -273,7 +273,8 @@ static int __vfio_register_dev(struct vfio_device *device,
 	if (WARN_ON(IS_ENABLED(CONFIG_IOMMUFD) &&
 		    (!device->ops->bind_iommufd ||
 		     !device->ops->unbind_iommufd ||
-		     !device->ops->attach_ioas)))
+		     !device->ops->attach_ioas ||
+		     !device->ops->detach_ioas)))
 		return -EINVAL;
 
 	/*
diff --git a/include/linux/vfio.h b/include/linux/vfio.h
index 974f8bcf917a..e1232d47e553 100644
--- a/include/linux/vfio.h
+++ b/include/linux/vfio.h
@@ -73,7 +73,9 @@ struct vfio_device {
  * @bind_iommufd: Called when binding the device to an iommufd
  * @unbind_iommufd: Opposite of bind_iommufd
  * @attach_ioas: Called when attaching device to an IOAS/HWPT managed by the
- *		 bound iommufd. Undo in unbind_iommufd.
+ *		 bound iommufd. Undo in unbind_iommufd if @detach_ioas is not
+ *		 called.
+ * @detach_ioas: Opposite of attach_ioas
  * @open_device: Called when the first file descriptor is opened for this device
  * @close_device: Opposite of open_device
  * @read: Perform read(2) on device file descriptor
@@ -97,6 +99,7 @@ struct vfio_device_ops {
 				struct iommufd_ctx *ictx, u32 *out_device_id);
 	void	(*unbind_iommufd)(struct vfio_device *vdev);
 	int	(*attach_ioas)(struct vfio_device *vdev, u32 *pt_id);
+	void	(*detach_ioas)(struct vfio_device *vdev);
 	int	(*open_device)(struct vfio_device *vdev);
 	void	(*close_device)(struct vfio_device *vdev);
 	ssize_t	(*read)(struct vfio_device *vdev, char __user *buf,
@@ -121,6 +124,7 @@ int vfio_iommufd_physical_bind(struct vfio_device *vdev,
 			       struct iommufd_ctx *ictx, u32 *out_device_id);
 void vfio_iommufd_physical_unbind(struct vfio_device *vdev);
 int vfio_iommufd_physical_attach_ioas(struct vfio_device *vdev, u32 *pt_id);
+void vfio_iommufd_physical_detach_ioas(struct vfio_device *vdev);
 int vfio_iommufd_emulated_bind(struct vfio_device *vdev,
 			       struct iommufd_ctx *ictx, u32 *out_device_id);
 void vfio_iommufd_emulated_unbind(struct vfio_device *vdev);
@@ -146,6 +150,8 @@ vfio_iommufd_device_hot_reset_devid(struct vfio_device *vdev,
 	((void (*)(struct vfio_device *vdev)) NULL)
 #define vfio_iommufd_physical_attach_ioas \
 	((int (*)(struct vfio_device *vdev, u32 *pt_id)) NULL)
+#define vfio_iommufd_physical_detach_ioas \
+	((void (*)(struct vfio_device *vdev)) NULL)
 #define vfio_iommufd_emulated_bind                                      \
 	((int (*)(struct vfio_device *vdev, struct iommufd_ctx *ictx,   \
 		  u32 *out_device_id)) NULL)
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 180+ messages in thread
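
The attach/detach rules added to drivers/vfio/iommufd.c in this patch can
be summarized with a small userspace model. This is a sketch under stated
assumptions: the model_* names are hypothetical, the dev_set lock and the
iommufd_device pointer are reduced to booleans, and the model sets the
attached flag on a successful attach, which the hunk context implies but
does not show. The kernel code additionally WARNs when the device is not
bound; the model only returns.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the fields of struct vfio_device used here. */
struct model_vdev {
	bool lock_held; /* stand-in for holding vdev->dev_set->lock */
	bool bound;     /* stand-in for vdev->iommufd_device != NULL */
	bool attached;  /* stand-in for vdev->iommufd_attached */
	uint32_t pt_id; /* page table ID recorded by the model only */
};

/* Attach: needs a bound device and refuses a second attach. */
static int model_attach_ioas(struct model_vdev *vdev, const uint32_t *pt_id)
{
	assert(vdev->lock_held); /* models lockdep_assert_held() */

	if (!vdev->bound)
		return -EINVAL;
	if (vdev->attached)
		return -EBUSY;

	vdev->pt_id = *pt_id;
	vdev->attached = true;
	return 0;
}

/* Detach: a no-op unless the device is bound and currently attached. */
static void model_detach_ioas(struct model_vdev *vdev)
{
	assert(vdev->lock_held);

	if (!vdev->bound || !vdev->attached)
		return;

	vdev->attached = false;
}
```

In this model a repeated detach is harmless, and a new attach succeeds only
after the previous attachment has been dropped.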

* [PATCH v12 14/24] iommufd/device: Add iommufd_access_detach() API
  2023-06-02 12:16 ` [Intel-gfx] " Yi Liu
@ 2023-06-02 12:16   ` Yi Liu
  -1 siblings, 0 replies; 180+ messages in thread
From: Yi Liu @ 2023-06-02 12:16 UTC (permalink / raw)
  To: alex.williamson, jgg, kevin.tian
  Cc: joro, robin.murphy, cohuck, eric.auger, nicolinc, kvm, mjrosato,
	chao.p.peng, yi.l.liu, yi.y.sun, peterx, jasowang,
	shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	intel-gvt-dev, intel-gfx, linux-s390, xudong.hao, yan.y.zhao,
	terrence.xu, yanting.jiang, zhenzhong.duan, clegoate

From: Nicolin Chen <nicolinc@nvidia.com>

Previously, detach was performed only by the destroy() routine, which
vfio_iommufd_emulated_unbind() calls when the device is closed. By the
time the call trace reached that detach step, all mappings in the iopt
had already been cleaned up.

A detach uAPI is now needed. This requires not only a new
iommufd_access_detach() API but also an access->ops->unmap() call as
cleanup. Add both.

However, leaving this unprotected could race with the pin_/unpin_pages()
calls, which reference access->ioas->iopt. Add an ioas_lock to protect
those iopt references.

Also, to allow the iommufd_access_unpin_pages() callback to happen via
this unmap() call, add an ioas_unpin pointer so that the unpin routine
is not affected by setting access->ioas to NULL.
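
The interplay of ioas, ioas_unpin and ioas_lock described above can be
sketched as a single-threaded userspace model. All model_* names are
hypothetical, the mutex is reduced to a flag that only checks lock/unlock
pairing, and pinning is reduced to a counter; this illustrates the ordering
and is not the iommufd code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

struct model_ioas {
	int pinned; /* number of ranges currently pinned */
};

struct model_access {
	bool locked;                   /* stand-in for ioas_lock */
	struct model_ioas *ioas;       /* gates new pin requests */
	struct model_ioas *ioas_unpin; /* still valid for unpin during detach */
	int held;                      /* pins taken through this access */
	void (*unmap)(struct model_access *access);
};

static void model_lock(struct model_access *a)
{
	assert(!a->locked);
	a->locked = true;
}

static void model_unlock(struct model_access *a)
{
	assert(a->locked);
	a->locked = false;
}

static int model_attach(struct model_access *a, struct model_ioas *ioas)
{
	model_lock(a);
	if (a->ioas) {
		model_unlock(a);
		return -EINVAL;
	}
	a->ioas = ioas;
	a->ioas_unpin = ioas;
	model_unlock(a);
	return 0;
}

static int model_pin(struct model_access *a)
{
	model_lock(a);
	if (!a->ioas) { /* detached, or detach in progress */
		model_unlock(a);
		return -ENOENT;
	}
	a->ioas->pinned++;
	a->held++;
	model_unlock(a);
	return 0;
}

static void model_unpin(struct model_access *a)
{
	model_lock(a);
	if (a->ioas_unpin && a->held > 0) {
		a->ioas_unpin->pinned--;
		a->held--;
	}
	model_unlock(a);
}

static void model_detach(struct model_access *a)
{
	model_lock(a);
	if (a->ioas) {
		a->ioas = NULL; /* block any further pins */
		if (a->unmap) {
			model_unlock(a); /* the callback takes the lock itself */
			a->unmap(a);
			model_lock(a);
		}
	}
	a->ioas_unpin = NULL; /* unpin is no longer possible after this */
	model_unlock(a);
}

/* Toy unmap callback (assumption): release every pin this access holds. */
static void model_unmap_all(struct model_access *a)
{
	int n = a->held;

	while (n-- > 0)
		model_unpin(a);
}
```

Keeping a separate ioas_unpin pointer lets the unmap callback release pins
after ioas has already been cleared, while new pin requests fail with
-ENOENT.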

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Tested-by: Terrence Xu <terrence.xu@intel.com>
Tested-by: Yanting Jiang <yanting.jiang@intel.com>
Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 drivers/iommu/iommufd/device.c          | 76 +++++++++++++++++++++++--
 drivers/iommu/iommufd/iommufd_private.h |  2 +
 include/linux/iommufd.h                 |  1 +
 3 files changed, 74 insertions(+), 5 deletions(-)

diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c
index 96d4281bfa7c..6b4ff635c15e 100644
--- a/drivers/iommu/iommufd/device.c
+++ b/drivers/iommu/iommufd/device.c
@@ -486,6 +486,7 @@ iommufd_access_create(struct iommufd_ctx *ictx,
 	iommufd_ctx_get(ictx);
 	iommufd_object_finalize(ictx, &access->obj);
 	*id = access->obj.id;
+	mutex_init(&access->ioas_lock);
 	return access;
 }
 EXPORT_SYMBOL_NS_GPL(iommufd_access_create, IOMMUFD);
@@ -505,26 +506,66 @@ void iommufd_access_destroy(struct iommufd_access *access)
 }
 EXPORT_SYMBOL_NS_GPL(iommufd_access_destroy, IOMMUFD);
 
+static void __iommufd_access_detach(struct iommufd_access *access)
+{
+	struct iommufd_ioas *cur_ioas = access->ioas;
+
+	lockdep_assert_held(&access->ioas_lock);
+	/*
+	 * Set ioas to NULL to block any further iommufd_access_pin_pages().
+	 * iommufd_access_unpin_pages() can continue using access->ioas_unpin.
+	 */
+	access->ioas = NULL;
+
+	if (access->ops->unmap) {
+		mutex_unlock(&access->ioas_lock);
+		access->ops->unmap(access->data, 0, ULONG_MAX);
+		mutex_lock(&access->ioas_lock);
+	}
+	iopt_remove_access(&cur_ioas->iopt, access);
+	refcount_dec(&cur_ioas->obj.users);
+}
+
+void iommufd_access_detach(struct iommufd_access *access)
+{
+	mutex_lock(&access->ioas_lock);
+	if (WARN_ON(!access->ioas))
+		goto out;
+	__iommufd_access_detach(access);
+out:
+	access->ioas_unpin = NULL;
+	mutex_unlock(&access->ioas_lock);
+}
+EXPORT_SYMBOL_NS_GPL(iommufd_access_detach, IOMMUFD);
+
 int iommufd_access_attach(struct iommufd_access *access, u32 ioas_id)
 {
 	struct iommufd_ioas *new_ioas;
 	int rc = 0;
 
-	if (access->ioas)
+	mutex_lock(&access->ioas_lock);
+	if (access->ioas) {
+		mutex_unlock(&access->ioas_lock);
 		return -EINVAL;
+	}
 
 	new_ioas = iommufd_get_ioas(access->ictx, ioas_id);
-	if (IS_ERR(new_ioas))
+	if (IS_ERR(new_ioas)) {
+		mutex_unlock(&access->ioas_lock);
 		return PTR_ERR(new_ioas);
+	}
 
 	rc = iopt_add_access(&new_ioas->iopt, access);
 	if (rc) {
+		mutex_unlock(&access->ioas_lock);
 		iommufd_put_object(&new_ioas->obj);
 		return rc;
 	}
 	iommufd_ref_to_users(&new_ioas->obj);
 
 	access->ioas = new_ioas;
+	access->ioas_unpin = new_ioas;
+	mutex_unlock(&access->ioas_lock);
 	return 0;
 }
 EXPORT_SYMBOL_NS_GPL(iommufd_access_attach, IOMMUFD);
@@ -579,8 +620,8 @@ void iommufd_access_notify_unmap(struct io_pagetable *iopt, unsigned long iova,
 void iommufd_access_unpin_pages(struct iommufd_access *access,
 				unsigned long iova, unsigned long length)
 {
-	struct io_pagetable *iopt = &access->ioas->iopt;
 	struct iopt_area_contig_iter iter;
+	struct io_pagetable *iopt;
 	unsigned long last_iova;
 	struct iopt_area *area;
 
@@ -588,6 +629,13 @@ void iommufd_access_unpin_pages(struct iommufd_access *access,
 	    WARN_ON(check_add_overflow(iova, length - 1, &last_iova)))
 		return;
 
+	mutex_lock(&access->ioas_lock);
+	if (!access->ioas_unpin) {
+		mutex_unlock(&access->ioas_lock);
+		return;
+	}
+	iopt = &access->ioas_unpin->iopt;
+
 	down_read(&iopt->iova_rwsem);
 	iopt_for_each_contig_area(&iter, area, iopt, iova, last_iova)
 		iopt_area_remove_access(
@@ -597,6 +645,7 @@ void iommufd_access_unpin_pages(struct iommufd_access *access,
 				min(last_iova, iopt_area_last_iova(area))));
 	up_read(&iopt->iova_rwsem);
 	WARN_ON(!iopt_area_contig_done(&iter));
+	mutex_unlock(&access->ioas_lock);
 }
 EXPORT_SYMBOL_NS_GPL(iommufd_access_unpin_pages, IOMMUFD);
 
@@ -642,8 +691,8 @@ int iommufd_access_pin_pages(struct iommufd_access *access, unsigned long iova,
 			     unsigned long length, struct page **out_pages,
 			     unsigned int flags)
 {
-	struct io_pagetable *iopt = &access->ioas->iopt;
 	struct iopt_area_contig_iter iter;
+	struct io_pagetable *iopt;
 	unsigned long last_iova;
 	struct iopt_area *area;
 	int rc;
@@ -658,6 +707,13 @@ int iommufd_access_pin_pages(struct iommufd_access *access, unsigned long iova,
 	if (check_add_overflow(iova, length - 1, &last_iova))
 		return -EOVERFLOW;
 
+	mutex_lock(&access->ioas_lock);
+	if (!access->ioas) {
+		mutex_unlock(&access->ioas_lock);
+		return -ENOENT;
+	}
+	iopt = &access->ioas->iopt;
+
 	down_read(&iopt->iova_rwsem);
 	iopt_for_each_contig_area(&iter, area, iopt, iova, last_iova) {
 		unsigned long last = min(last_iova, iopt_area_last_iova(area));
@@ -688,6 +744,7 @@ int iommufd_access_pin_pages(struct iommufd_access *access, unsigned long iova,
 	}
 
 	up_read(&iopt->iova_rwsem);
+	mutex_unlock(&access->ioas_lock);
 	return 0;
 
 err_remove:
@@ -702,6 +759,7 @@ int iommufd_access_pin_pages(struct iommufd_access *access, unsigned long iova,
 						  iopt_area_last_iova(area))));
 	}
 	up_read(&iopt->iova_rwsem);
+	mutex_unlock(&access->ioas_lock);
 	return rc;
 }
 EXPORT_SYMBOL_NS_GPL(iommufd_access_pin_pages, IOMMUFD);
@@ -721,8 +779,8 @@ EXPORT_SYMBOL_NS_GPL(iommufd_access_pin_pages, IOMMUFD);
 int iommufd_access_rw(struct iommufd_access *access, unsigned long iova,
 		      void *data, size_t length, unsigned int flags)
 {
-	struct io_pagetable *iopt = &access->ioas->iopt;
 	struct iopt_area_contig_iter iter;
+	struct io_pagetable *iopt;
 	struct iopt_area *area;
 	unsigned long last_iova;
 	int rc;
@@ -732,6 +790,13 @@ int iommufd_access_rw(struct iommufd_access *access, unsigned long iova,
 	if (check_add_overflow(iova, length - 1, &last_iova))
 		return -EOVERFLOW;
 
+	mutex_lock(&access->ioas_lock);
+	if (!access->ioas) {
+		mutex_unlock(&access->ioas_lock);
+		return -ENOENT;
+	}
+	iopt = &access->ioas->iopt;
+
 	down_read(&iopt->iova_rwsem);
 	iopt_for_each_contig_area(&iter, area, iopt, iova, last_iova) {
 		unsigned long last = min(last_iova, iopt_area_last_iova(area));
@@ -758,6 +823,7 @@ int iommufd_access_rw(struct iommufd_access *access, unsigned long iova,
 		rc = -ENOENT;
 err_out:
 	up_read(&iopt->iova_rwsem);
+	mutex_unlock(&access->ioas_lock);
 	return rc;
 }
 EXPORT_SYMBOL_NS_GPL(iommufd_access_rw, IOMMUFD);
diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h
index b38e67d1988b..3dcaf86aab97 100644
--- a/drivers/iommu/iommufd/iommufd_private.h
+++ b/drivers/iommu/iommufd/iommufd_private.h
@@ -285,6 +285,8 @@ struct iommufd_access {
 	struct iommufd_object obj;
 	struct iommufd_ctx *ictx;
 	struct iommufd_ioas *ioas;
+	struct iommufd_ioas *ioas_unpin;
+	struct mutex ioas_lock;
 	const struct iommufd_access_ops *ops;
 	void *data;
 	unsigned long iova_alignment;
diff --git a/include/linux/iommufd.h b/include/linux/iommufd.h
index 33933b0f95fc..c8508daf9bd9 100644
--- a/include/linux/iommufd.h
+++ b/include/linux/iommufd.h
@@ -48,6 +48,7 @@ iommufd_access_create(struct iommufd_ctx *ictx,
 		      const struct iommufd_access_ops *ops, void *data, u32 *id);
 void iommufd_access_destroy(struct iommufd_access *access);
 int iommufd_access_attach(struct iommufd_access *access, u32 ioas_id);
+void iommufd_access_detach(struct iommufd_access *access);
 
 void iommufd_ctx_get(struct iommufd_ctx *ictx);
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 180+ messages in thread

* [Intel-gfx] [PATCH v12 14/24] iommufd/device: Add iommufd_access_detach() API
@ 2023-06-02 12:16   ` Yi Liu
  0 siblings, 0 replies; 180+ messages in thread
From: Yi Liu @ 2023-06-02 12:16 UTC (permalink / raw)
  To: alex.williamson, jgg, kevin.tian
  Cc: mjrosato, jasowang, xudong.hao, zhenzhong.duan, peterx,
	terrence.xu, chao.p.peng, linux-s390, yi.l.liu, kvm, lulu,
	yanting.jiang, joro, nicolinc, yan.y.zhao, intel-gfx, eric.auger,
	intel-gvt-dev, yi.y.sun, clegoate, cohuck,
	shameerali.kolothum.thodi, suravee.suthikulpanit, robin.murphy

From: Nicolin Chen <nicolinc@nvidia.com>

Previously, the detach routine was performed only by destroy(), which
vfio_iommufd_emulated_unbind() calls when the device runs close(). By the
time the call trace reached the detach() routine, all the mappings in the
iopt had already been cleaned up.

Now a detach uAPI is needed. That requires not only a new
iommufd_access_detach() API but also an access->ops->unmap() call as
cleanup. So add one.

However, leaving that unprotected could race with the pin_/unpin_pages()
calls, which dereference access->ioas->iopt. So add an ioas_lock to
protect the code that references the iopt.

Also, to allow the iommufd_access_unpin_pages() callback to happen via
this unmap() call, add an ioas_unpin pointer, so the unpin routine won't
be affected by the "access->ioas = NULL" trick.
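
The locking scheme described above can be modelled in userspace as a
minimal sketch. This is not the kernel code: it assumes pthread mutexes in
place of the kernel mutex, and every model_* name is hypothetical and
exists only for illustration. It shows the three points made above: new
pins are refused once ioas is NULL, the lock is dropped around the unmap
callback, and unpin keeps working through ioas_unpin until detach
finishes.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for an IOAS; only the user count is modelled. */
struct model_ioas {
	int users;
};

/* Hypothetical stand-in for struct iommufd_access. */
struct model_access {
	pthread_mutex_t ioas_lock;
	struct model_ioas *ioas;       /* NULL refuses any new pin */
	struct model_ioas *ioas_unpin; /* stays set while unmap() runs */
	void (*unmap)(struct model_access *access);
	int pinned;
};

static void model_attach(struct model_access *a, struct model_ioas *ioas)
{
	pthread_mutex_lock(&a->ioas_lock);
	ioas->users++;
	a->ioas = ioas;
	a->ioas_unpin = ioas;
	pthread_mutex_unlock(&a->ioas_lock);
}

/* Returns 0 on success, -1 if the access is detached or detaching. */
static int model_pin(struct model_access *a)
{
	int rc = -1;

	pthread_mutex_lock(&a->ioas_lock);
	if (a->ioas) {
		a->pinned++;
		rc = 0;
	}
	pthread_mutex_unlock(&a->ioas_lock);
	return rc;
}

/* Returns 1 if a pin was released, 0 if there was nothing to release. */
static int model_unpin(struct model_access *a)
{
	int released = 0;

	pthread_mutex_lock(&a->ioas_lock);
	if (a->ioas_unpin && a->pinned > 0) {
		a->pinned--;
		released = 1;
	}
	pthread_mutex_unlock(&a->ioas_lock);
	return released;
}

static void model_detach(struct model_access *a)
{
	struct model_ioas *cur;

	pthread_mutex_lock(&a->ioas_lock);
	cur = a->ioas;
	if (cur) {
		a->ioas = NULL; /* block further pins */
		if (a->unmap) {
			/* Drop the lock so the callback can call model_unpin(). */
			pthread_mutex_unlock(&a->ioas_lock);
			a->unmap(a);
			pthread_mutex_lock(&a->ioas_lock);
		}
		cur->users--;
	}
	a->ioas_unpin = NULL;
	pthread_mutex_unlock(&a->ioas_lock);
}

/* Example unmap callback: release every outstanding pin. */
static void model_unmap_cb(struct model_access *a)
{
	while (model_unpin(a))
		;
}
```

In this model, a pin attempted after model_detach() has cleared ioas
fails, while the callback invoked from within detach can still release
its pins because ioas_unpin is cleared only at the very end.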

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Tested-by: Terrence Xu <terrence.xu@intel.com>
Tested-by: Yanting Jiang <yanting.jiang@intel.com>
Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 drivers/iommu/iommufd/device.c          | 76 +++++++++++++++++++++++--
 drivers/iommu/iommufd/iommufd_private.h |  2 +
 include/linux/iommufd.h                 |  1 +
 3 files changed, 74 insertions(+), 5 deletions(-)

diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c
index 96d4281bfa7c..6b4ff635c15e 100644
--- a/drivers/iommu/iommufd/device.c
+++ b/drivers/iommu/iommufd/device.c
@@ -486,6 +486,7 @@ iommufd_access_create(struct iommufd_ctx *ictx,
 	iommufd_ctx_get(ictx);
 	iommufd_object_finalize(ictx, &access->obj);
 	*id = access->obj.id;
+	mutex_init(&access->ioas_lock);
 	return access;
 }
 EXPORT_SYMBOL_NS_GPL(iommufd_access_create, IOMMUFD);
@@ -505,26 +506,66 @@ void iommufd_access_destroy(struct iommufd_access *access)
 }
 EXPORT_SYMBOL_NS_GPL(iommufd_access_destroy, IOMMUFD);
 
+static void __iommufd_access_detach(struct iommufd_access *access)
+{
+	struct iommufd_ioas *cur_ioas = access->ioas;
+
+	lockdep_assert_held(&access->ioas_lock);
+	/*
+	 * Set ioas to NULL to block any further iommufd_access_pin_pages().
+	 * iommufd_access_unpin_pages() can continue using access->ioas_unpin.
+	 */
+	access->ioas = NULL;
+
+	if (access->ops->unmap) {
+		mutex_unlock(&access->ioas_lock);
+		access->ops->unmap(access->data, 0, ULONG_MAX);
+		mutex_lock(&access->ioas_lock);
+	}
+	iopt_remove_access(&cur_ioas->iopt, access);
+	refcount_dec(&cur_ioas->obj.users);
+}
+
+void iommufd_access_detach(struct iommufd_access *access)
+{
+	mutex_lock(&access->ioas_lock);
+	if (WARN_ON(!access->ioas))
+		goto out;
+	__iommufd_access_detach(access);
+out:
+	access->ioas_unpin = NULL;
+	mutex_unlock(&access->ioas_lock);
+}
+EXPORT_SYMBOL_NS_GPL(iommufd_access_detach, IOMMUFD);
+
 int iommufd_access_attach(struct iommufd_access *access, u32 ioas_id)
 {
 	struct iommufd_ioas *new_ioas;
 	int rc = 0;
 
-	if (access->ioas)
+	mutex_lock(&access->ioas_lock);
+	if (access->ioas) {
+		mutex_unlock(&access->ioas_lock);
 		return -EINVAL;
+	}
 
 	new_ioas = iommufd_get_ioas(access->ictx, ioas_id);
-	if (IS_ERR(new_ioas))
+	if (IS_ERR(new_ioas)) {
+		mutex_unlock(&access->ioas_lock);
 		return PTR_ERR(new_ioas);
+	}
 
 	rc = iopt_add_access(&new_ioas->iopt, access);
 	if (rc) {
+		mutex_unlock(&access->ioas_lock);
 		iommufd_put_object(&new_ioas->obj);
 		return rc;
 	}
 	iommufd_ref_to_users(&new_ioas->obj);
 
 	access->ioas = new_ioas;
+	access->ioas_unpin = new_ioas;
+	mutex_unlock(&access->ioas_lock);
 	return 0;
 }
 EXPORT_SYMBOL_NS_GPL(iommufd_access_attach, IOMMUFD);
@@ -579,8 +620,8 @@ void iommufd_access_notify_unmap(struct io_pagetable *iopt, unsigned long iova,
 void iommufd_access_unpin_pages(struct iommufd_access *access,
 				unsigned long iova, unsigned long length)
 {
-	struct io_pagetable *iopt = &access->ioas->iopt;
 	struct iopt_area_contig_iter iter;
+	struct io_pagetable *iopt;
 	unsigned long last_iova;
 	struct iopt_area *area;
 
@@ -588,6 +629,13 @@ void iommufd_access_unpin_pages(struct iommufd_access *access,
 	    WARN_ON(check_add_overflow(iova, length - 1, &last_iova)))
 		return;
 
+	mutex_lock(&access->ioas_lock);
+	if (!access->ioas_unpin) {
+		mutex_unlock(&access->ioas_lock);
+		return;
+	}
+	iopt = &access->ioas_unpin->iopt;
+
 	down_read(&iopt->iova_rwsem);
 	iopt_for_each_contig_area(&iter, area, iopt, iova, last_iova)
 		iopt_area_remove_access(
@@ -597,6 +645,7 @@ void iommufd_access_unpin_pages(struct iommufd_access *access,
 				min(last_iova, iopt_area_last_iova(area))));
 	up_read(&iopt->iova_rwsem);
 	WARN_ON(!iopt_area_contig_done(&iter));
+	mutex_unlock(&access->ioas_lock);
 }
 EXPORT_SYMBOL_NS_GPL(iommufd_access_unpin_pages, IOMMUFD);
 
@@ -642,8 +691,8 @@ int iommufd_access_pin_pages(struct iommufd_access *access, unsigned long iova,
 			     unsigned long length, struct page **out_pages,
 			     unsigned int flags)
 {
-	struct io_pagetable *iopt = &access->ioas->iopt;
 	struct iopt_area_contig_iter iter;
+	struct io_pagetable *iopt;
 	unsigned long last_iova;
 	struct iopt_area *area;
 	int rc;
@@ -658,6 +707,13 @@ int iommufd_access_pin_pages(struct iommufd_access *access, unsigned long iova,
 	if (check_add_overflow(iova, length - 1, &last_iova))
 		return -EOVERFLOW;
 
+	mutex_lock(&access->ioas_lock);
+	if (!access->ioas) {
+		mutex_unlock(&access->ioas_lock);
+		return -ENOENT;
+	}
+	iopt = &access->ioas->iopt;
+
 	down_read(&iopt->iova_rwsem);
 	iopt_for_each_contig_area(&iter, area, iopt, iova, last_iova) {
 		unsigned long last = min(last_iova, iopt_area_last_iova(area));
@@ -688,6 +744,7 @@ int iommufd_access_pin_pages(struct iommufd_access *access, unsigned long iova,
 	}
 
 	up_read(&iopt->iova_rwsem);
+	mutex_unlock(&access->ioas_lock);
 	return 0;
 
 err_remove:
@@ -702,6 +759,7 @@ int iommufd_access_pin_pages(struct iommufd_access *access, unsigned long iova,
 						  iopt_area_last_iova(area))));
 	}
 	up_read(&iopt->iova_rwsem);
+	mutex_unlock(&access->ioas_lock);
 	return rc;
 }
 EXPORT_SYMBOL_NS_GPL(iommufd_access_pin_pages, IOMMUFD);
@@ -721,8 +779,8 @@ EXPORT_SYMBOL_NS_GPL(iommufd_access_pin_pages, IOMMUFD);
 int iommufd_access_rw(struct iommufd_access *access, unsigned long iova,
 		      void *data, size_t length, unsigned int flags)
 {
-	struct io_pagetable *iopt = &access->ioas->iopt;
 	struct iopt_area_contig_iter iter;
+	struct io_pagetable *iopt;
 	struct iopt_area *area;
 	unsigned long last_iova;
 	int rc;
@@ -732,6 +790,13 @@ int iommufd_access_rw(struct iommufd_access *access, unsigned long iova,
 	if (check_add_overflow(iova, length - 1, &last_iova))
 		return -EOVERFLOW;
 
+	mutex_lock(&access->ioas_lock);
+	if (!access->ioas) {
+		mutex_unlock(&access->ioas_lock);
+		return -ENOENT;
+	}
+	iopt = &access->ioas->iopt;
+
 	down_read(&iopt->iova_rwsem);
 	iopt_for_each_contig_area(&iter, area, iopt, iova, last_iova) {
 		unsigned long last = min(last_iova, iopt_area_last_iova(area));
@@ -758,6 +823,7 @@ int iommufd_access_rw(struct iommufd_access *access, unsigned long iova,
 		rc = -ENOENT;
 err_out:
 	up_read(&iopt->iova_rwsem);
+	mutex_unlock(&access->ioas_lock);
 	return rc;
 }
 EXPORT_SYMBOL_NS_GPL(iommufd_access_rw, IOMMUFD);
diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h
index b38e67d1988b..3dcaf86aab97 100644
--- a/drivers/iommu/iommufd/iommufd_private.h
+++ b/drivers/iommu/iommufd/iommufd_private.h
@@ -285,6 +285,8 @@ struct iommufd_access {
 	struct iommufd_object obj;
 	struct iommufd_ctx *ictx;
 	struct iommufd_ioas *ioas;
+	struct iommufd_ioas *ioas_unpin;
+	struct mutex ioas_lock;
 	const struct iommufd_access_ops *ops;
 	void *data;
 	unsigned long iova_alignment;
diff --git a/include/linux/iommufd.h b/include/linux/iommufd.h
index 33933b0f95fc..c8508daf9bd9 100644
--- a/include/linux/iommufd.h
+++ b/include/linux/iommufd.h
@@ -48,6 +48,7 @@ iommufd_access_create(struct iommufd_ctx *ictx,
 		      const struct iommufd_access_ops *ops, void *data, u32 *id);
 void iommufd_access_destroy(struct iommufd_access *access);
 int iommufd_access_attach(struct iommufd_access *access, u32 ioas_id);
+void iommufd_access_detach(struct iommufd_access *access);
 
 void iommufd_ctx_get(struct iommufd_ctx *ictx);
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 180+ messages in thread

* [PATCH v12 15/24] vfio-iommufd: Add detach_ioas support for emulated VFIO devices
  2023-06-02 12:16 ` [Intel-gfx] " Yi Liu
@ 2023-06-02 12:16   ` Yi Liu
  -1 siblings, 0 replies; 180+ messages in thread
From: Yi Liu @ 2023-06-02 12:16 UTC (permalink / raw)
  To: alex.williamson, jgg, kevin.tian
  Cc: joro, robin.murphy, cohuck, eric.auger, nicolinc, kvm, mjrosato,
	chao.p.peng, yi.l.liu, yi.y.sun, peterx, jasowang,
	shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	intel-gvt-dev, intel-gfx, linux-s390, xudong.hao, yan.y.zhao,
	terrence.xu, yanting.jiang, zhenzhong.duan, clegoate

This prepares for adding the DETACH ioctl for emulated VFIO devices.

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Tested-by: Terrence Xu <terrence.xu@intel.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Tested-by: Matthew Rosato <mjrosato@linux.ibm.com>
Tested-by: Yanting Jiang <yanting.jiang@intel.com>
Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 drivers/gpu/drm/i915/gvt/kvmgt.c  |  1 +
 drivers/s390/cio/vfio_ccw_ops.c   |  1 +
 drivers/s390/crypto/vfio_ap_ops.c |  1 +
 drivers/vfio/iommufd.c            | 13 +++++++++++++
 include/linux/vfio.h              |  3 +++
 samples/vfio-mdev/mbochs.c        |  1 +
 samples/vfio-mdev/mdpy.c          |  1 +
 samples/vfio-mdev/mtty.c          |  1 +
 8 files changed, 22 insertions(+)

diff --git a/drivers/gpu/drm/i915/gvt/kvmgt.c b/drivers/gpu/drm/i915/gvt/kvmgt.c
index de675d799c7d..9cd9e9da60dd 100644
--- a/drivers/gpu/drm/i915/gvt/kvmgt.c
+++ b/drivers/gpu/drm/i915/gvt/kvmgt.c
@@ -1474,6 +1474,7 @@ static const struct vfio_device_ops intel_vgpu_dev_ops = {
 	.bind_iommufd	= vfio_iommufd_emulated_bind,
 	.unbind_iommufd = vfio_iommufd_emulated_unbind,
 	.attach_ioas	= vfio_iommufd_emulated_attach_ioas,
+	.detach_ioas	= vfio_iommufd_emulated_detach_ioas,
 };
 
 static int intel_vgpu_probe(struct mdev_device *mdev)
diff --git a/drivers/s390/cio/vfio_ccw_ops.c b/drivers/s390/cio/vfio_ccw_ops.c
index 5b53b94f13c7..cba4971618ff 100644
--- a/drivers/s390/cio/vfio_ccw_ops.c
+++ b/drivers/s390/cio/vfio_ccw_ops.c
@@ -632,6 +632,7 @@ static const struct vfio_device_ops vfio_ccw_dev_ops = {
 	.bind_iommufd = vfio_iommufd_emulated_bind,
 	.unbind_iommufd = vfio_iommufd_emulated_unbind,
 	.attach_ioas = vfio_iommufd_emulated_attach_ioas,
+	.detach_ioas = vfio_iommufd_emulated_detach_ioas,
 };
 
 struct mdev_driver vfio_ccw_mdev_driver = {
diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
index cfbcb864ab63..50d0293eeef3 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -1844,6 +1844,7 @@ static const struct vfio_device_ops vfio_ap_matrix_dev_ops = {
 	.bind_iommufd = vfio_iommufd_emulated_bind,
 	.unbind_iommufd = vfio_iommufd_emulated_unbind,
 	.attach_ioas = vfio_iommufd_emulated_attach_ioas,
+	.detach_ioas = vfio_iommufd_emulated_detach_ioas,
 };
 
 static struct mdev_driver vfio_ap_matrix_driver = {
diff --git a/drivers/vfio/iommufd.c b/drivers/vfio/iommufd.c
index ae96260912d8..a59ed4f881aa 100644
--- a/drivers/vfio/iommufd.c
+++ b/drivers/vfio/iommufd.c
@@ -236,3 +236,16 @@ int vfio_iommufd_emulated_attach_ioas(struct vfio_device *vdev, u32 *pt_id)
 	return 0;
 }
 EXPORT_SYMBOL_GPL(vfio_iommufd_emulated_attach_ioas);
+
+void vfio_iommufd_emulated_detach_ioas(struct vfio_device *vdev)
+{
+	lockdep_assert_held(&vdev->dev_set->lock);
+
+	if (WARN_ON(!vdev->iommufd_access) ||
+	    !vdev->iommufd_attached)
+		return;
+
+	iommufd_access_detach(vdev->iommufd_access);
+	vdev->iommufd_attached = false;
+}
+EXPORT_SYMBOL_GPL(vfio_iommufd_emulated_detach_ioas);
diff --git a/include/linux/vfio.h b/include/linux/vfio.h
index e1232d47e553..bdb30efa37a9 100644
--- a/include/linux/vfio.h
+++ b/include/linux/vfio.h
@@ -129,6 +129,7 @@ int vfio_iommufd_emulated_bind(struct vfio_device *vdev,
 			       struct iommufd_ctx *ictx, u32 *out_device_id);
 void vfio_iommufd_emulated_unbind(struct vfio_device *vdev);
 int vfio_iommufd_emulated_attach_ioas(struct vfio_device *vdev, u32 *pt_id);
+void vfio_iommufd_emulated_detach_ioas(struct vfio_device *vdev);
 #else
 static inline struct iommufd_ctx *
 vfio_iommufd_device_ictx(struct vfio_device *vdev)
@@ -159,6 +160,8 @@ vfio_iommufd_device_hot_reset_devid(struct vfio_device *vdev,
 	((void (*)(struct vfio_device *vdev)) NULL)
 #define vfio_iommufd_emulated_attach_ioas \
 	((int (*)(struct vfio_device *vdev, u32 *pt_id)) NULL)
+#define vfio_iommufd_emulated_detach_ioas \
+	((void (*)(struct vfio_device *vdev)) NULL)
 #endif
 
 static inline bool vfio_device_cdev_opened(struct vfio_device *device)
diff --git a/samples/vfio-mdev/mbochs.c b/samples/vfio-mdev/mbochs.c
index c6c6b5d26670..3764d1911b51 100644
--- a/samples/vfio-mdev/mbochs.c
+++ b/samples/vfio-mdev/mbochs.c
@@ -1377,6 +1377,7 @@ static const struct vfio_device_ops mbochs_dev_ops = {
 	.bind_iommufd	= vfio_iommufd_emulated_bind,
 	.unbind_iommufd	= vfio_iommufd_emulated_unbind,
 	.attach_ioas	= vfio_iommufd_emulated_attach_ioas,
+	.detach_ioas	= vfio_iommufd_emulated_detach_ioas,
 };
 
 static struct mdev_driver mbochs_driver = {
diff --git a/samples/vfio-mdev/mdpy.c b/samples/vfio-mdev/mdpy.c
index a62ea11e20ec..064e1c0a7aa8 100644
--- a/samples/vfio-mdev/mdpy.c
+++ b/samples/vfio-mdev/mdpy.c
@@ -666,6 +666,7 @@ static const struct vfio_device_ops mdpy_dev_ops = {
 	.bind_iommufd	= vfio_iommufd_emulated_bind,
 	.unbind_iommufd	= vfio_iommufd_emulated_unbind,
 	.attach_ioas	= vfio_iommufd_emulated_attach_ioas,
+	.detach_ioas	= vfio_iommufd_emulated_detach_ioas,
 };
 
 static struct mdev_driver mdpy_driver = {
diff --git a/samples/vfio-mdev/mtty.c b/samples/vfio-mdev/mtty.c
index a60801fb8660..5af00387c519 100644
--- a/samples/vfio-mdev/mtty.c
+++ b/samples/vfio-mdev/mtty.c
@@ -1272,6 +1272,7 @@ static const struct vfio_device_ops mtty_dev_ops = {
 	.bind_iommufd	= vfio_iommufd_emulated_bind,
 	.unbind_iommufd	= vfio_iommufd_emulated_unbind,
 	.attach_ioas	= vfio_iommufd_emulated_attach_ioas,
+	.detach_ioas	= vfio_iommufd_emulated_detach_ioas,
 };
 
 static struct mdev_driver mtty_driver = {
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 180+ messages in thread

* [PATCH v12 16/24] vfio: Move vfio_device_group_unregister() to be the first operation in unregister
  2023-06-02 12:16 ` [Intel-gfx] " Yi Liu
@ 2023-06-02 12:16   ` Yi Liu
  -1 siblings, 0 replies; 180+ messages in thread
From: Yi Liu @ 2023-06-02 12:16 UTC (permalink / raw)
  To: alex.williamson, jgg, kevin.tian
  Cc: joro, robin.murphy, cohuck, eric.auger, nicolinc, kvm, mjrosato,
	chao.p.peng, yi.l.liu, yi.y.sun, peterx, jasowang,
	shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	intel-gvt-dev, intel-gfx, linux-s390, xudong.hao, yan.y.zhao,
	terrence.xu, yanting.jiang, zhenzhong.duan, clegoate

This prevents userspace from endlessly incrementing the vfio_device
refcount, which would keep vfio_unregister_group_dev() blocked.

Tested-by: Yanting Jiang <yanting.jiang@intel.com>
Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
Tested-by: Terrence Xu <terrence.xu@intel.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 drivers/vfio/vfio_main.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
index df4f3e37268d..f00ba7603351 100644
--- a/drivers/vfio/vfio_main.c
+++ b/drivers/vfio/vfio_main.c
@@ -332,6 +332,12 @@ void vfio_unregister_group_dev(struct vfio_device *device)
 	bool interrupted = false;
 	long rc;
 
+	/*
+	 * Prevent new device opened by userspace via the
+	 * VFIO_GROUP_GET_DEVICE_FD in the group path.
+	 */
+	vfio_device_group_unregister(device);
+
 	vfio_device_put_registration(device);
 	rc = try_wait_for_completion(&device->comp);
 	while (rc <= 0) {
@@ -355,8 +361,6 @@ void vfio_unregister_group_dev(struct vfio_device *device)
 		}
 	}
 
-	vfio_device_group_unregister(device);
-
 	/* Balances device_add in register path */
 	device_del(&device->device);
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 180+ messages in thread

* [PATCH v12 17/24] vfio: Add cdev for vfio_device
  2023-06-02 12:16 ` [Intel-gfx] " Yi Liu
@ 2023-06-02 12:16   ` Yi Liu
  -1 siblings, 0 replies; 180+ messages in thread
From: Yi Liu @ 2023-06-02 12:16 UTC (permalink / raw)
  To: alex.williamson, jgg, kevin.tian
  Cc: joro, robin.murphy, cohuck, eric.auger, nicolinc, kvm, mjrosato,
	chao.p.peng, yi.l.liu, yi.y.sun, peterx, jasowang,
	shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	intel-gvt-dev, intel-gfx, linux-s390, xudong.hao, yan.y.zhao,
	terrence.xu, yanting.jiang, zhenzhong.duan, clegoate

This allows userspace to open a vfio device directly, without using the
legacy container/group interface, as a prerequisite for supporting new
iommu features like nested translation.

The device fd opened in this manner cannot access the device, because the
fops open() does not open the device until a successful BIND_IOMMUFD,
which will be added in the next patch.

With this patch, devices registered to the vfio core have both the group
and device interfaces created.

- group interface : /dev/vfio/$groupID
- device interface: /dev/vfio/devices/vfioX - normal device
		    ("X" is the minor number and is unique across devices)

Given a vfio device, the user can identify the matching vfioX by checking
the sysfs path of the device. Taking PCI device 0000:6a:01.0 as an example,
/sys/bus/pci/devices/0000\:6a\:01.0/vfio-dev/vfio0/dev contains the
major:minor of the matching vfioX.

Userspace then opens the /dev/vfio/devices/vfioX and checks with fstat
that the major:minor matches.
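
The lookup and check described above might look like the following
userspace sketch. The helper names (parse_devt, read_sysfs_devt,
open_vfio_cdev_checked) are hypothetical and not part of this series; the
sketch only assumes that the sysfs "dev" attribute holds text of the form
"major:minor", as stated above.

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/sysmacros.h>
#include <unistd.h>

/* Parse "major:minor" text as found in a sysfs "dev" attribute. */
static int parse_devt(const char *text, unsigned int *maj, unsigned int *min)
{
	return sscanf(text, "%u:%u", maj, min) == 2 ? 0 : -1;
}

/* Read the expected major:minor from the sysfs "dev" attribute. */
static int read_sysfs_devt(const char *sysfs_dev_path,
			   unsigned int *maj, unsigned int *min)
{
	char buf[32];
	int rc = -1;
	FILE *f = fopen(sysfs_dev_path, "r");

	if (!f)
		return -1;
	if (fgets(buf, sizeof(buf), f))
		rc = parse_devt(buf, maj, min);
	fclose(f);
	return rc;
}

/*
 * Open the device cdev and confirm with fstat() that it is the character
 * device sysfs advertises. Returns the fd on success, -1 on any failure.
 */
static int open_vfio_cdev_checked(const char *cdev_path,
				  const char *sysfs_dev_path)
{
	unsigned int maj, min;
	struct stat st;
	int fd;

	if (read_sysfs_devt(sysfs_dev_path, &maj, &min))
		return -1;

	fd = open(cdev_path, O_RDWR);
	if (fd < 0)
		return -1;

	if (fstat(fd, &st) || !S_ISCHR(st.st_mode) ||
	    major(st.st_rdev) != maj || minor(st.st_rdev) != min) {
		close(fd);
		return -1;
	}
	return fd;
}
```

For the PCI device in the example above, a caller would pass
"/dev/vfio/devices/vfio0" as cdev_path and
"/sys/bus/pci/devices/0000:6a:01.0/vfio-dev/vfio0/dev" as sysfs_dev_path.
The returned fd still cannot access the device until BIND_IOMMUFD
succeeds.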

The vfio_device cdev logic in this patch:
*) The __vfio_register_dev() path ends up doing cdev_device_add() for each
   vfio_device if VFIO_DEVICE_CDEV is configured.
*) The vfio_unregister_group_dev() path does cdev_device_del().

The device interface does not support noiommu devices; noiommu users
should use the legacy group interface.

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Tested-by: Terrence Xu <terrence.xu@intel.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Tested-by: Matthew Rosato <mjrosato@linux.ibm.com>
Tested-by: Yanting Jiang <yanting.jiang@intel.com>
Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 drivers/vfio/Kconfig       | 12 ++++++++
 drivers/vfio/Makefile      |  1 +
 drivers/vfio/device_cdev.c | 62 ++++++++++++++++++++++++++++++++++++++
 drivers/vfio/vfio.h        | 54 +++++++++++++++++++++++++++++++++
 drivers/vfio/vfio_main.c   | 23 +++++++++++---
 include/linux/vfio.h       |  4 +++
 6 files changed, 151 insertions(+), 5 deletions(-)
 create mode 100644 drivers/vfio/device_cdev.c

diff --git a/drivers/vfio/Kconfig b/drivers/vfio/Kconfig
index 89e06c981e43..1cab8e4729de 100644
--- a/drivers/vfio/Kconfig
+++ b/drivers/vfio/Kconfig
@@ -12,6 +12,18 @@ menuconfig VFIO
 	  If you don't know what to do here, say N.
 
 if VFIO
+config VFIO_DEVICE_CDEV
+	bool "Support for the VFIO cdev /dev/vfio/devices/vfioX"
+	depends on IOMMUFD
+	help
+	  The VFIO device cdev is another way for userspace to get device
+	  access. Userspace gets device fd by opening device cdev under
+	  /dev/vfio/devices/vfioX, and then bind the device fd with an iommufd
+	  to set up secure DMA context for device access.  This interface does
+	  not support noiommu.
+
+	  If you don't know what to do here, say N.
+
 config VFIO_CONTAINER
 	bool "Support for the VFIO container /dev/vfio/vfio"
 	select VFIO_IOMMU_TYPE1 if MMU && (X86 || S390 || ARM || ARM64)
diff --git a/drivers/vfio/Makefile b/drivers/vfio/Makefile
index 70e7dcb302ef..245394aeb94b 100644
--- a/drivers/vfio/Makefile
+++ b/drivers/vfio/Makefile
@@ -4,6 +4,7 @@ obj-$(CONFIG_VFIO) += vfio.o
 vfio-y += vfio_main.o \
 	  group.o \
 	  iova_bitmap.o
+vfio-$(CONFIG_VFIO_DEVICE_CDEV) += device_cdev.o
 vfio-$(CONFIG_IOMMUFD) += iommufd.o
 vfio-$(CONFIG_VFIO_CONTAINER) += container.o
 vfio-$(CONFIG_VFIO_VIRQFD) += virqfd.o
diff --git a/drivers/vfio/device_cdev.c b/drivers/vfio/device_cdev.c
new file mode 100644
index 000000000000..1c640016a824
--- /dev/null
+++ b/drivers/vfio/device_cdev.c
@@ -0,0 +1,62 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (c) 2023 Intel Corporation.
+ */
+#include <linux/vfio.h>
+
+#include "vfio.h"
+
+static dev_t device_devt;
+
+void vfio_init_device_cdev(struct vfio_device *device)
+{
+	device->device.devt = MKDEV(MAJOR(device_devt), device->index);
+	cdev_init(&device->cdev, &vfio_device_fops);
+	device->cdev.owner = THIS_MODULE;
+}
+
+/*
+ * device access via the fd opened by this function is blocked until
+ * .open_device() is called successfully during BIND_IOMMUFD.
+ */
+int vfio_device_fops_cdev_open(struct inode *inode, struct file *filep)
+{
+	struct vfio_device *device = container_of(inode->i_cdev,
+						  struct vfio_device, cdev);
+	struct vfio_device_file *df;
+	int ret;
+
+	if (!vfio_device_try_get_registration(device))
+		return -ENODEV;
+
+	df = vfio_allocate_device_file(device);
+	if (IS_ERR(df)) {
+		ret = PTR_ERR(df);
+		goto err_put_registration;
+	}
+
+	filep->private_data = df;
+
+	return 0;
+
+err_put_registration:
+	vfio_device_put_registration(device);
+	return ret;
+}
+
+static char *vfio_device_devnode(const struct device *dev, umode_t *mode)
+{
+	return kasprintf(GFP_KERNEL, "vfio/devices/%s", dev_name(dev));
+}
+
+int vfio_cdev_init(struct class *device_class)
+{
+	device_class->devnode = vfio_device_devnode;
+	return alloc_chrdev_region(&device_devt, 0,
+				   MINORMASK + 1, "vfio-dev");
+}
+
+void vfio_cdev_cleanup(void)
+{
+	unregister_chrdev_region(device_devt, MINORMASK + 1);
+}
diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
index b491a0cdbe62..d12b5b524bfc 100644
--- a/drivers/vfio/vfio.h
+++ b/drivers/vfio/vfio.h
@@ -266,6 +266,60 @@ vfio_iommufd_compat_attach_ioas(struct vfio_device *device,
 }
 #endif
 
+#if IS_ENABLED(CONFIG_VFIO_DEVICE_CDEV)
+void vfio_init_device_cdev(struct vfio_device *device);
+
+static inline int vfio_device_add(struct vfio_device *device)
+{
+	/* cdev does not support noiommu device */
+	if (vfio_device_is_noiommu(device))
+		return device_add(&device->device);
+	vfio_init_device_cdev(device);
+	return cdev_device_add(&device->cdev, &device->device);
+}
+
+static inline void vfio_device_del(struct vfio_device *device)
+{
+	if (vfio_device_is_noiommu(device))
+		device_del(&device->device);
+	else
+		cdev_device_del(&device->cdev, &device->device);
+}
+
+int vfio_device_fops_cdev_open(struct inode *inode, struct file *filep);
+int vfio_cdev_init(struct class *device_class);
+void vfio_cdev_cleanup(void);
+#else
+static inline void vfio_init_device_cdev(struct vfio_device *device)
+{
+}
+
+static inline int vfio_device_add(struct vfio_device *device)
+{
+	return device_add(&device->device);
+}
+
+static inline void vfio_device_del(struct vfio_device *device)
+{
+	device_del(&device->device);
+}
+
+static inline int vfio_device_fops_cdev_open(struct inode *inode,
+					     struct file *filep)
+{
+	return 0;
+}
+
+static inline int vfio_cdev_init(struct class *device_class)
+{
+	return 0;
+}
+
+static inline void vfio_cdev_cleanup(void)
+{
+}
+#endif /* CONFIG_VFIO_DEVICE_CDEV */
+
 #if IS_ENABLED(CONFIG_VFIO_VIRQFD)
 int __init vfio_virqfd_init(void);
 void vfio_virqfd_exit(void);
diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
index f00ba7603351..ef55af75f459 100644
--- a/drivers/vfio/vfio_main.c
+++ b/drivers/vfio/vfio_main.c
@@ -292,7 +292,7 @@ static int __vfio_register_dev(struct vfio_device *device,
 	if (ret)
 		return ret;
 
-	ret = device_add(&device->device);
+	ret = vfio_device_add(device);
 	if (ret)
 		goto err_out;
 
@@ -338,6 +338,12 @@ void vfio_unregister_group_dev(struct vfio_device *device)
 	 */
 	vfio_device_group_unregister(device);
 
+	/*
+	 * Balances vfio_device_add() in the register path and also prevents
+	 * userspace from opening the device anew through the cdev path.
+	 */
+	vfio_device_del(device);
+
 	vfio_device_put_registration(device);
 	rc = try_wait_for_completion(&device->comp);
 	while (rc <= 0) {
@@ -361,9 +367,6 @@ void vfio_unregister_group_dev(struct vfio_device *device)
 		}
 	}
 
-	/* Balances device_add in register path */
-	device_del(&device->device);
-
 	/* Balances vfio_device_set_group in register path */
 	vfio_device_remove_group(device);
 }
@@ -567,7 +570,8 @@ static int vfio_device_fops_release(struct inode *inode, struct file *filep)
 	struct vfio_device_file *df = filep->private_data;
 	struct vfio_device *device = df->device;
 
-	vfio_df_group_close(df);
+	if (df->group)
+		vfio_df_group_close(df);
 
 	vfio_device_put_registration(device);
 
@@ -1216,6 +1220,7 @@ static int vfio_device_fops_mmap(struct file *filep, struct vm_area_struct *vma)
 
 const struct file_operations vfio_device_fops = {
 	.owner		= THIS_MODULE,
+	.open		= vfio_device_fops_cdev_open,
 	.release	= vfio_device_fops_release,
 	.read		= vfio_device_fops_read,
 	.write		= vfio_device_fops_write,
@@ -1567,9 +1572,16 @@ static int __init vfio_init(void)
 		goto err_dev_class;
 	}
 
+	ret = vfio_cdev_init(vfio.device_class);
+	if (ret)
+		goto err_alloc_dev_chrdev;
+
 	pr_info(DRIVER_DESC " version: " DRIVER_VERSION "\n");
 	return 0;
 
+err_alloc_dev_chrdev:
+	class_destroy(vfio.device_class);
+	vfio.device_class = NULL;
 err_dev_class:
 	vfio_virqfd_exit();
 err_virqfd:
@@ -1580,6 +1592,7 @@ static int __init vfio_init(void)
 static void __exit vfio_cleanup(void)
 {
 	ida_destroy(&vfio.device_ida);
+	vfio_cdev_cleanup();
 	class_destroy(vfio.device_class);
 	vfio.device_class = NULL;
 	vfio_virqfd_exit();
diff --git a/include/linux/vfio.h b/include/linux/vfio.h
index bdb30efa37a9..83cc5dc28b7a 100644
--- a/include/linux/vfio.h
+++ b/include/linux/vfio.h
@@ -13,6 +13,7 @@
 #include <linux/mm.h>
 #include <linux/workqueue.h>
 #include <linux/poll.h>
+#include <linux/cdev.h>
 #include <uapi/linux/vfio.h>
 #include <linux/iova_bitmap.h>
 
@@ -51,6 +52,9 @@ struct vfio_device {
 	/* Members below here are private, not for driver use */
 	unsigned int index;
 	struct device device;	/* device.kref covers object life circle */
+#if IS_ENABLED(CONFIG_VFIO_DEVICE_CDEV)
+	struct cdev cdev;
+#endif
 	refcount_t refcount;	/* user count on registered device*/
 	unsigned int open_count;
 	struct completion comp;
-- 
2.34.1
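
A note for readers of this patch: the commit message suggests that userspace
confirm it opened the right node by comparing the major:minor stored in the
sysfs "dev" attribute with what fstat() reports for the opened fd. Below is a
minimal userspace sketch of that check. The helper names are hypothetical and
not part of this series, and the caller is assumed to supply the sysfs path
and an already-open fd.

```c
#include <stdio.h>
#include <sys/stat.h>
#include <sys/sysmacros.h>
#include <sys/types.h>

/*
 * Parse the "major:minor" text found in a sysfs "dev" attribute.
 * Returns 0 on success, -1 if the text is malformed.
 */
int example_parse_dev_numbers(const char *text, unsigned int *maj,
			      unsigned int *min)
{
	return sscanf(text, "%u:%u", maj, min) == 2 ? 0 : -1;
}

/*
 * Returns 1 if fd refers to a character device whose major:minor equals
 * the pair stored in sysfs_dev_path, 0 if it does not, and -1 on error.
 */
int example_cdev_fd_matches(const char *sysfs_dev_path, int fd)
{
	unsigned int maj, min;
	struct stat st;
	char buf[32];
	FILE *f;

	f = fopen(sysfs_dev_path, "r");
	if (!f)
		return -1;
	if (!fgets(buf, sizeof(buf), f)) {
		fclose(f);
		return -1;
	}
	fclose(f);

	if (example_parse_dev_numbers(buf, &maj, &min) || fstat(fd, &st))
		return -1;
	if (!S_ISCHR(st.st_mode))
		return 0;

	return major(st.st_rdev) == maj && minor(st.st_rdev) == min;
}
```

For the PCI example in the commit message, the first argument would be the
.../vfio-dev/vfio0/dev attribute of the device and the second an fd obtained
from open("/dev/vfio/devices/vfio0", O_RDWR).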



* [PATCH v12 18/24] vfio: Add VFIO_DEVICE_BIND_IOMMUFD
  2023-06-02 12:16 ` [Intel-gfx] " Yi Liu
@ 2023-06-02 12:16   ` Yi Liu
  -1 siblings, 0 replies; 180+ messages in thread
From: Yi Liu @ 2023-06-02 12:16 UTC (permalink / raw)
  To: alex.williamson, jgg, kevin.tian
  Cc: joro, robin.murphy, cohuck, eric.auger, nicolinc, kvm, mjrosato,
	chao.p.peng, yi.l.liu, yi.y.sun, peterx, jasowang,
	shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	intel-gvt-dev, intel-gfx, linux-s390, xudong.hao, yan.y.zhao,
	terrence.xu, yanting.jiang, zhenzhong.duan, clegoate

This adds an ioctl for userspace to bind a device cdev fd to an iommufd.

    VFIO_DEVICE_BIND_IOMMUFD: bind the device to an iommufd and hence gain
			      the DMA control provided by that iommufd. The
			      open_device op is called after the
			      bind_iommufd op.
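
From the userspace side, the flow described above might look like the sketch
below. It assumes an iommufd that was already opened (typically from
/dev/iommu) and mirrors struct vfio_device_bind_iommufd locally so that it
builds against headers that predate this patch. The example_* names are
hypothetical and only for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Local mirror of struct vfio_device_bind_iommufd from this patch. */
struct example_bind_iommufd {
	uint32_t argsz;
	uint32_t flags;
	int32_t  iommufd;
	uint32_t out_devid;
};

static_assert(sizeof(struct example_bind_iommufd) == 16, "unexpected layout");

/* VFIO_TYPE is ';' and VFIO_BASE is 100 in include/uapi/linux/vfio.h. */
#define EXAMPLE_VFIO_DEVICE_BIND_IOMMUFD _IO(';', 100 + 18)

/*
 * Open the device cdev and bind it to an already-open iommufd.
 * Returns the device fd on success (with *devid filled in) or -errno.
 */
int example_bind_device(const char *cdev_path, int iommufd, uint32_t *devid)
{
	struct example_bind_iommufd bind;
	int fd, err;

	fd = open(cdev_path, O_RDWR);
	if (fd < 0)
		return -errno;

	memset(&bind, 0, sizeof(bind));
	bind.argsz = sizeof(bind);	/* flags stays 0, as required */
	bind.iommufd = iommufd;

	if (ioctl(fd, EXAMPLE_VFIO_DEVICE_BIND_IOMMUFD, &bind)) {
		err = errno;
		close(fd);
		return -err;
	}

	*devid = bind.out_devid;
	return fd;
}
```

Until the bind succeeds, other ioctls on the returned fd are expected to fail,
since the device is only opened as part of BIND_IOMMUFD. Closing the fd undoes
the bind.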

Tested-by: Yanting Jiang <yanting.jiang@intel.com>
Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
Tested-by: Terrence Xu <terrence.xu@intel.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 drivers/vfio/device_cdev.c | 123 +++++++++++++++++++++++++++++++++++++
 drivers/vfio/vfio.h        |  13 ++++
 drivers/vfio/vfio_main.c   |   5 ++
 include/linux/vfio.h       |   3 +-
 include/uapi/linux/vfio.h  |  27 ++++++++
 5 files changed, 170 insertions(+), 1 deletion(-)
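
The bind handler in this patch follows the usual VFIO argument check: compute
minsz up to the last field the kernel understands, then reject a short argsz,
non-zero flags, or a negative iommufd. The sketch below is a userspace model
of that check only, under the assumption that the struct has already been
copied in. EXAMPLE_OFFSETOFEND stands in for the kernel's offsetofend(), and
all example_* names are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Local mirror of struct vfio_device_bind_iommufd from this patch. */
struct example_bind_args {
	uint32_t argsz;
	uint32_t flags;
	int32_t  iommufd;
	uint32_t out_devid;
};

static_assert(sizeof(struct example_bind_args) == 16, "unexpected layout");

/* Userspace stand-in for the kernel's offsetofend() helper. */
#define EXAMPLE_OFFSETOFEND(type, member) \
	(offsetof(type, member) + sizeof(((type *)0)->member))

/* Returns 0 if the arguments pass the checks, -EINVAL otherwise. */
int example_validate_bind(const struct example_bind_args *bind)
{
	const size_t minsz = EXAMPLE_OFFSETOFEND(struct example_bind_args,
						 out_devid);

	if (bind->argsz < minsz || bind->flags || bind->iommufd < 0)
		return -EINVAL;
	return 0;
}
```

A larger argsz is accepted, which is what lets the structure grow in later
kernels without breaking older userspace.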

diff --git a/drivers/vfio/device_cdev.c b/drivers/vfio/device_cdev.c
index 1c640016a824..a4498ddbe774 100644
--- a/drivers/vfio/device_cdev.c
+++ b/drivers/vfio/device_cdev.c
@@ -3,6 +3,7 @@
  * Copyright (c) 2023 Intel Corporation.
  */
 #include <linux/vfio.h>
+#include <linux/iommufd.h>
 
 #include "vfio.h"
 
@@ -44,6 +45,128 @@ int vfio_device_fops_cdev_open(struct inode *inode, struct file *filep)
 	return ret;
 }
 
+static void vfio_device_get_kvm_safe(struct vfio_device_file *df)
+{
+	spin_lock(&df->kvm_ref_lock);
+	if (df->kvm)
+		_vfio_device_get_kvm_safe(df->device, df->kvm);
+	spin_unlock(&df->kvm_ref_lock);
+}
+
+void vfio_df_cdev_close(struct vfio_device_file *df)
+{
+	struct vfio_device *device = df->device;
+
+	/*
+	 * At close time nothing else can be changing this flag, so it is
+	 * safe to read df->access_granted without the lock and without
+	 * smp_load_acquire().
+	 */
+	if (!df->access_granted)
+		return;
+
+	mutex_lock(&device->dev_set->lock);
+	vfio_df_close(df);
+	vfio_device_put_kvm(device);
+	iommufd_ctx_put(df->iommufd);
+	device->cdev_opened = false;
+	mutex_unlock(&device->dev_set->lock);
+	vfio_device_unblock_group(device);
+}
+
+static struct iommufd_ctx *vfio_get_iommufd_from_fd(int fd)
+{
+	struct iommufd_ctx *iommufd;
+	struct fd f;
+
+	f = fdget(fd);
+	if (!f.file)
+		return ERR_PTR(-EBADF);
+
+	iommufd = iommufd_ctx_from_file(f.file);
+
+	fdput(f);
+	return iommufd;
+}
+
+long vfio_df_ioctl_bind_iommufd(struct vfio_device_file *df,
+				struct vfio_device_bind_iommufd __user *arg)
+{
+	struct vfio_device *device = df->device;
+	struct vfio_device_bind_iommufd bind;
+	unsigned long minsz;
+	int ret;
+
+	static_assert(__same_type(arg->out_devid, df->devid));
+
+	minsz = offsetofend(struct vfio_device_bind_iommufd, out_devid);
+
+	if (copy_from_user(&bind, arg, minsz))
+		return -EFAULT;
+
+	if (bind.argsz < minsz || bind.flags || bind.iommufd < 0)
+		return -EINVAL;
+
+	/* BIND_IOMMUFD only allowed for cdev fds */
+	if (df->group)
+		return -EINVAL;
+
+	ret = vfio_device_block_group(device);
+	if (ret)
+		return ret;
+
+	mutex_lock(&device->dev_set->lock);
+	/* one device cannot be bound twice */
+	if (df->access_granted) {
+		ret = -EINVAL;
+		goto out_unlock;
+	}
+
+	df->iommufd = vfio_get_iommufd_from_fd(bind.iommufd);
+	if (IS_ERR(df->iommufd)) {
+		ret = PTR_ERR(df->iommufd);
+		df->iommufd = NULL;
+		goto out_unlock;
+	}
+
+	/*
+	 * Before opening the device, get the KVM pointer currently
+	 * associated with the device file (if there is one) and obtain
+	 * a reference.  This reference is held until the device is closed.
+	 * Save the pointer in the device for use by drivers.
+	 */
+	vfio_device_get_kvm_safe(df);
+
+	ret = vfio_df_open(df);
+	if (ret)
+		goto out_put_kvm;
+
+	ret = copy_to_user(&arg->out_devid, &df->devid,
+			   sizeof(df->devid)) ? -EFAULT : 0;
+	if (ret)
+		goto out_close_device;
+
+	/*
+	 * Paired with smp_load_acquire() in vfio_device_fops::ioctl/
+	 * read/write/mmap
+	 */
+	smp_store_release(&df->access_granted, true);
+	device->cdev_opened = true;
+	mutex_unlock(&device->dev_set->lock);
+	return 0;
+
+out_close_device:
+	vfio_df_close(df);
+out_put_kvm:
+	vfio_device_put_kvm(device);
+	iommufd_ctx_put(df->iommufd);
+	df->iommufd = NULL;
+out_unlock:
+	mutex_unlock(&device->dev_set->lock);
+	vfio_device_unblock_group(device);
+	return ret;
+}
+
 static char *vfio_device_devnode(const struct device *dev, umode_t *mode)
 {
 	return kasprintf(GFP_KERNEL, "vfio/devices/%s", dev_name(dev));
diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
index d12b5b524bfc..42de40d2cd4d 100644
--- a/drivers/vfio/vfio.h
+++ b/drivers/vfio/vfio.h
@@ -287,6 +287,9 @@ static inline void vfio_device_del(struct vfio_device *device)
 }
 
 int vfio_device_fops_cdev_open(struct inode *inode, struct file *filep);
+void vfio_df_cdev_close(struct vfio_device_file *df);
+long vfio_df_ioctl_bind_iommufd(struct vfio_device_file *df,
+				struct vfio_device_bind_iommufd __user *arg);
 int vfio_cdev_init(struct class *device_class);
 void vfio_cdev_cleanup(void);
 #else
@@ -310,6 +313,16 @@ static inline int vfio_device_fops_cdev_open(struct inode *inode,
 	return 0;
 }
 
+static inline void vfio_df_cdev_close(struct vfio_device_file *df)
+{
+}
+
+static inline long vfio_df_ioctl_bind_iommufd(struct vfio_device_file *df,
+					      struct vfio_device_bind_iommufd __user *arg)
+{
+	return -EOPNOTSUPP;
+}
+
 static inline int vfio_cdev_init(struct class *device_class)
 {
 	return 0;
diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
index ef55af75f459..9ba4d420eda2 100644
--- a/drivers/vfio/vfio_main.c
+++ b/drivers/vfio/vfio_main.c
@@ -572,6 +572,8 @@ static int vfio_device_fops_release(struct inode *inode, struct file *filep)
 
 	if (df->group)
 		vfio_df_group_close(df);
+	else
+		vfio_df_cdev_close(df);
 
 	vfio_device_put_registration(device);
 
@@ -1145,6 +1147,9 @@ static long vfio_device_fops_unl_ioctl(struct file *filep,
 	struct vfio_device *device = df->device;
 	int ret;
 
+	if (cmd == VFIO_DEVICE_BIND_IOMMUFD)
+		return vfio_df_ioctl_bind_iommufd(df, (void __user *)arg);
+
 	/* Paired with smp_store_release() following vfio_df_open() */
 	if (!smp_load_acquire(&df->access_granted))
 		return -EINVAL;
diff --git a/include/linux/vfio.h b/include/linux/vfio.h
index 83cc5dc28b7a..e80a8ac86e46 100644
--- a/include/linux/vfio.h
+++ b/include/linux/vfio.h
@@ -66,6 +66,7 @@ struct vfio_device {
 	struct iommufd_device *iommufd_device;
 	bool iommufd_attached;
 #endif
+	bool cdev_opened:1;
 };
 
 /**
@@ -170,7 +171,7 @@ vfio_iommufd_device_hot_reset_devid(struct vfio_device *vdev,
 
 static inline bool vfio_device_cdev_opened(struct vfio_device *device)
 {
-	return false;
+	return device->cdev_opened;
 }
 
 /**
diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index f753124e1c82..7296012b7f36 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -194,6 +194,33 @@ struct vfio_group_status {
 
 /* --------------- IOCTLs for DEVICE file descriptors --------------- */
 
+/*
+ * VFIO_DEVICE_BIND_IOMMUFD - _IOR(VFIO_TYPE, VFIO_BASE + 18,
+ *				   struct vfio_device_bind_iommufd)
+ * @argsz:	 User filled size of this data.
+ * @flags:	 Must be 0.
+ * @iommufd:	 iommufd to bind.
+ * @out_devid:	 The device id generated by this bind. devid is a handle for
+ *		 this device/iommufd bond and can be used in IOMMUFD commands.
+ *
+ * Bind a vfio_device to the specified iommufd.
+ *
+ * User is restricted from accessing the device before the binding operation
+ * is completed.
+ *
+ * Unbind is automatically conducted when device fd is closed.
+ *
+ * Return: 0 on success, -errno on failure.
+ */
+struct vfio_device_bind_iommufd {
+	__u32		argsz;
+	__u32		flags;
+	__s32		iommufd;
+	__u32		out_devid;
+};
+
+#define VFIO_DEVICE_BIND_IOMMUFD	_IO(VFIO_TYPE, VFIO_BASE + 18)
+
 /**
  * VFIO_DEVICE_GET_INFO - _IOR(VFIO_TYPE, VFIO_BASE + 7,
  *						struct vfio_device_info)
-- 
2.34.1
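
The access_granted handling in this patch is a release/acquire gate: the bind
path finishes all setup and then publishes the flag with smp_store_release(),
while the ioctl/read/write/mmap paths test it with smp_load_acquire() before
touching the device. The sketch below models the same pairing with C11
atomics in userspace. It is an analogy under that assumption, not kernel
code, and the example_* names are hypothetical.

```c
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

struct example_device_file {
	int devid;			/* set up before access is granted */
	atomic_bool access_granted;	/* gate checked by every operation */
};

/* Writer side: finish the setup, then open the gate (store-release). */
void example_grant_access(struct example_device_file *df, int devid)
{
	df->devid = devid;
	atomic_store_explicit(&df->access_granted, true,
			      memory_order_release);
}

/* Reader side: refuse the operation unless the gate is open (load-acquire). */
int example_read_devid(struct example_device_file *df, int *devid)
{
	if (!atomic_load_explicit(&df->access_granted, memory_order_acquire))
		return -EINVAL;

	/* The acquire above guarantees the devid store is visible here. */
	*devid = df->devid;
	return 0;
}
```

A reader that observes the flag as true is therefore guaranteed to also see
everything the writer set up before the release store.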



* [PATCH v12 19/24] vfio: Add VFIO_DEVICE_[AT|DE]TACH_IOMMUFD_PT
  2023-06-02 12:16 ` [Intel-gfx] " Yi Liu
@ 2023-06-02 12:16   ` Yi Liu
  -1 siblings, 0 replies; 180+ messages in thread
From: Yi Liu @ 2023-06-02 12:16 UTC (permalink / raw)
  To: alex.williamson, jgg, kevin.tian
  Cc: joro, robin.murphy, cohuck, eric.auger, nicolinc, kvm, mjrosato,
	chao.p.peng, yi.l.liu, yi.y.sun, peterx, jasowang,
	shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	intel-gvt-dev, intel-gfx, linux-s390, xudong.hao, yan.y.zhao,
	terrence.xu, yanting.jiang, zhenzhong.duan, clegoate

This adds ioctls for userspace to attach a device cdev fd to, and detach
it from, an IOAS or hw_pagetable managed by iommufd.

    VFIO_DEVICE_ATTACH_IOMMUFD_PT: attach the vfio device to an IOAS or
				   hw_pagetable managed by iommufd. The
				   attach can be undone by
				   VFIO_DEVICE_DETACH_IOMMUFD_PT or by
				   closing the device fd.
    VFIO_DEVICE_DETACH_IOMMUFD_PT: detach the vfio device from the currently
				   attached IOAS or hw_pagetable managed by
				   iommufd.

noiommu devices do not support [AT|DE]TACH; both ioctls fail if userspace
invokes them on such a device.

Tested-by: Yanting Jiang <yanting.jiang@intel.com>
Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
Tested-by: Terrence Xu <terrence.xu@intel.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 drivers/vfio/device_cdev.c | 66 ++++++++++++++++++++++++++++++++++++++
 drivers/vfio/vfio.h        | 16 +++++++++
 drivers/vfio/vfio_main.c   |  8 +++++
 include/uapi/linux/vfio.h  | 42 ++++++++++++++++++++++++
 4 files changed, 132 insertions(+)
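
For illustration only (not part of the patch): a minimal userspace sketch of
the bind, attach, detach sequence described above. It assumes the device fd
was opened from the cdev node and that the IOAS id was allocated on the same
iommufd beforehand. The struct layouts and ioctl numbers mirror the uAPI in
this series; the VFIO_TYPE and VFIO_BASE values are assumed to match
include/uapi/linux/vfio.h. All SKETCH_/sketch_ names are hypothetical and
exist only so the example builds against headers that predate the series.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <sys/ioctl.h>

/* Assumed to match VFIO_TYPE / VFIO_BASE in include/uapi/linux/vfio.h. */
#define SKETCH_VFIO_TYPE (';')
#define SKETCH_VFIO_BASE 100

/* Local mirrors of the structs added by this series (hypothetical names). */
struct sketch_bind_iommufd {
	uint32_t argsz;
	uint32_t flags;
	int32_t  iommufd;
	uint32_t out_devid;
};

struct sketch_attach_pt {
	uint32_t argsz;
	uint32_t flags;
	uint32_t pt_id;
};

struct sketch_detach_pt {
	uint32_t argsz;
	uint32_t flags;
};

#define SKETCH_BIND_IOMMUFD _IO(SKETCH_VFIO_TYPE, SKETCH_VFIO_BASE + 18)
#define SKETCH_ATTACH_PT    _IO(SKETCH_VFIO_TYPE, SKETCH_VFIO_BASE + 19)
#define SKETCH_DETACH_PT    _IO(SKETCH_VFIO_TYPE, SKETCH_VFIO_BASE + 20)

/*
 * Bind device_fd to iommufd, then attach it to ioas_id.
 * Returns 0 on success or -errno from the failing ioctl.  On success,
 * *devid is the id reported by the bind and *pt_id is the id written
 * back by the attach.
 */
static int sketch_bind_and_attach(int device_fd, int iommufd, uint32_t ioas_id,
				  uint32_t *devid, uint32_t *pt_id)
{
	struct sketch_bind_iommufd bind = {
		.argsz = sizeof(bind),
		.flags = 0,		/* must be 0 */
		.iommufd = iommufd,
	};
	struct sketch_attach_pt attach = {
		.argsz = sizeof(attach),
		.flags = 0,		/* must be 0 */
		.pt_id = ioas_id,
	};

	assert(devid && pt_id);

	if (ioctl(device_fd, SKETCH_BIND_IOMMUFD, &bind) < 0)
		return -errno;
	if (ioctl(device_fd, SKETCH_ATTACH_PT, &attach) < 0)
		return -errno;

	*devid = bind.out_devid;
	*pt_id = attach.pt_id;
	return 0;
}

/* Detach device_fd from whatever it is currently attached to. */
static int sketch_detach(int device_fd)
{
	struct sketch_detach_pt detach = {
		.argsz = sizeof(detach),
		.flags = 0,		/* must be 0 */
	};

	return ioctl(device_fd, SKETCH_DETACH_PT, &detach) < 0 ? -errno : 0;
}
```

With this patch, calling either attach or detach on a device fd obtained
through the group path is expected to fail with -EINVAL, since both handlers
reject fds that have df->group set.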

diff --git a/drivers/vfio/device_cdev.c b/drivers/vfio/device_cdev.c
index a4498ddbe774..6e1d499ee160 100644
--- a/drivers/vfio/device_cdev.c
+++ b/drivers/vfio/device_cdev.c
@@ -167,6 +167,72 @@ long vfio_df_ioctl_bind_iommufd(struct vfio_device_file *df,
 	return ret;
 }
 
+int vfio_df_ioctl_attach_pt(struct vfio_device_file *df,
+			    struct vfio_device_attach_iommufd_pt __user *arg)
+{
+	struct vfio_device *device = df->device;
+	struct vfio_device_attach_iommufd_pt attach;
+	unsigned long minsz;
+	int ret;
+
+	minsz = offsetofend(struct vfio_device_attach_iommufd_pt, pt_id);
+
+	if (copy_from_user(&attach, arg, minsz))
+		return -EFAULT;
+
+	if (attach.argsz < minsz || attach.flags)
+		return -EINVAL;
+
+	/* ATTACH only allowed for cdev fds */
+	if (df->group)
+		return -EINVAL;
+
+	mutex_lock(&device->dev_set->lock);
+	ret = device->ops->attach_ioas(device, &attach.pt_id);
+	if (ret)
+		goto out_unlock;
+
+	ret = copy_to_user(&arg->pt_id, &attach.pt_id,
+			   sizeof(attach.pt_id)) ? -EFAULT : 0;
+	if (ret)
+		goto out_detach;
+	mutex_unlock(&device->dev_set->lock);
+
+	return 0;
+
+out_detach:
+	device->ops->detach_ioas(device);
+out_unlock:
+	mutex_unlock(&device->dev_set->lock);
+	return ret;
+}
+
+int vfio_df_ioctl_detach_pt(struct vfio_device_file *df,
+			    struct vfio_device_detach_iommufd_pt __user *arg)
+{
+	struct vfio_device *device = df->device;
+	struct vfio_device_detach_iommufd_pt detach;
+	unsigned long minsz;
+
+	minsz = offsetofend(struct vfio_device_detach_iommufd_pt, flags);
+
+	if (copy_from_user(&detach, arg, minsz))
+		return -EFAULT;
+
+	if (detach.argsz < minsz || detach.flags)
+		return -EINVAL;
+
+	/* DETACH only allowed for cdev fds */
+	if (df->group)
+		return -EINVAL;
+
+	mutex_lock(&device->dev_set->lock);
+	device->ops->detach_ioas(device);
+	mutex_unlock(&device->dev_set->lock);
+
+	return 0;
+}
+
 static char *vfio_device_devnode(const struct device *dev, umode_t *mode)
 {
 	return kasprintf(GFP_KERNEL, "vfio/devices/%s", dev_name(dev));
diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
index 42de40d2cd4d..5835c74e97ce 100644
--- a/drivers/vfio/vfio.h
+++ b/drivers/vfio/vfio.h
@@ -290,6 +290,10 @@ int vfio_device_fops_cdev_open(struct inode *inode, struct file *filep);
 void vfio_df_cdev_close(struct vfio_device_file *df);
 long vfio_df_ioctl_bind_iommufd(struct vfio_device_file *df,
 				struct vfio_device_bind_iommufd __user *arg);
+int vfio_df_ioctl_attach_pt(struct vfio_device_file *df,
+			    struct vfio_device_attach_iommufd_pt __user *arg);
+int vfio_df_ioctl_detach_pt(struct vfio_device_file *df,
+			    struct vfio_device_detach_iommufd_pt __user *arg);
 int vfio_cdev_init(struct class *device_class);
 void vfio_cdev_cleanup(void);
 #else
@@ -323,6 +327,18 @@ static inline long vfio_df_ioctl_bind_iommufd(struct vfio_device_file *df,
 	return -EOPNOTSUPP;
 }
 
+static inline int vfio_df_ioctl_attach_pt(struct vfio_device_file *df,
+					  struct vfio_device_attach_iommufd_pt __user *arg)
+{
+	return -EOPNOTSUPP;
+}
+
+static inline int vfio_df_ioctl_detach_pt(struct vfio_device_file *df,
+					  struct vfio_device_detach_iommufd_pt __user *arg)
+{
+	return -EOPNOTSUPP;
+}
+
 static inline int vfio_cdev_init(struct class *device_class)
 {
 	return 0;
diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
index 9ba4d420eda2..6d8f9b0f3637 100644
--- a/drivers/vfio/vfio_main.c
+++ b/drivers/vfio/vfio_main.c
@@ -1163,6 +1163,14 @@ static long vfio_device_fops_unl_ioctl(struct file *filep,
 		ret = vfio_ioctl_device_feature(device, (void __user *)arg);
 		break;
 
+	case VFIO_DEVICE_ATTACH_IOMMUFD_PT:
+		ret = vfio_df_ioctl_attach_pt(df, (void __user *)arg);
+		break;
+
+	case VFIO_DEVICE_DETACH_IOMMUFD_PT:
+		ret = vfio_df_ioctl_detach_pt(df, (void __user *)arg);
+		break;
+
 	default:
 		if (unlikely(!device->ops->ioctl))
 			ret = -EINVAL;
diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index 7296012b7f36..355deb852e78 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -221,6 +221,48 @@ struct vfio_device_bind_iommufd {
 
 #define VFIO_DEVICE_BIND_IOMMUFD	_IO(VFIO_TYPE, VFIO_BASE + 18)
 
+/*
+ * VFIO_DEVICE_ATTACH_IOMMUFD_PT - _IOW(VFIO_TYPE, VFIO_BASE + 19,
+ *					struct vfio_device_attach_iommufd_pt)
+ * @argsz:	User filled size of this data.
+ * @flags:	Must be 0.
+ * @pt_id:	Input the target id which can represent an ioas or a hwpt
+ *		allocated via iommufd subsystem.
+ *		Output the input ioas id or the attached hwpt id which could
+ *		be the specified hwpt itself or a hwpt automatically created
+ *		for the specified ioas by kernel during the attachment.
+ *
+ * Associate the device with an address space within the bound iommufd.
+ * Undo by VFIO_DEVICE_DETACH_IOMMUFD_PT or device fd close.
+ *
+ * Return: 0 on success, -errno on failure.
+ */
+struct vfio_device_attach_iommufd_pt {
+	__u32	argsz;
+	__u32	flags;
+	__u32	pt_id;
+};
+
+#define VFIO_DEVICE_ATTACH_IOMMUFD_PT		_IO(VFIO_TYPE, VFIO_BASE + 19)
+
+/*
+ * VFIO_DEVICE_DETACH_IOMMUFD_PT - _IOW(VFIO_TYPE, VFIO_BASE + 20,
+ *					struct vfio_device_detach_iommufd_pt)
+ * @argsz:	User filled size of this data.
+ * @flags:	Must be 0.
+ *
+ * Remove the association of the device and its current associated address
+ * space.  After it, the device should be in a blocking DMA state.
+ *
+ * Return: 0 on success, -errno on failure.
+ */
+struct vfio_device_detach_iommufd_pt {
+	__u32	argsz;
+	__u32	flags;
+};
+
+#define VFIO_DEVICE_DETACH_IOMMUFD_PT		_IO(VFIO_TYPE, VFIO_BASE + 20)
+
 /**
  * VFIO_DEVICE_GET_INFO - _IOR(VFIO_TYPE, VFIO_BASE + 7,
  *						struct vfio_device_info)
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 180+ messages in thread

* [PATCH v12 20/24] vfio: Only check group->type for noiommu test
  2023-06-02 12:16 ` [Intel-gfx] " Yi Liu
@ 2023-06-02 12:16   ` Yi Liu
  -1 siblings, 0 replies; 180+ messages in thread
From: Yi Liu @ 2023-06-02 12:16 UTC (permalink / raw)
  To: alex.williamson, jgg, kevin.tian
  Cc: joro, robin.murphy, cohuck, eric.auger, nicolinc, kvm, mjrosato,
	chao.p.peng, yi.l.liu, yi.y.sun, peterx, jasowang,
	shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	intel-gvt-dev, intel-gfx, linux-s390, xudong.hao, yan.y.zhao,
	terrence.xu, yanting.jiang, zhenzhong.duan, clegoate

group->type can be VFIO_NO_IOMMU only when the vfio_noiommu option is true,
and the vfio_noiommu option can only be true if CONFIG_VFIO_NOIOMMU is
enabled. So checking group->type alone is enough when testing for noiommu.

Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 drivers/vfio/group.c | 3 +--
 drivers/vfio/vfio.h  | 3 +--
 2 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/drivers/vfio/group.c b/drivers/vfio/group.c
index 41a09a2df690..653b62f93474 100644
--- a/drivers/vfio/group.c
+++ b/drivers/vfio/group.c
@@ -133,8 +133,7 @@ static int vfio_group_ioctl_set_container(struct vfio_group *group,
 
 	iommufd = iommufd_ctx_from_file(f.file);
 	if (!IS_ERR(iommufd)) {
-		if (IS_ENABLED(CONFIG_VFIO_NOIOMMU) &&
-		    group->type == VFIO_NO_IOMMU)
+		if (group->type == VFIO_NO_IOMMU)
 			ret = iommufd_vfio_compat_set_no_iommu(iommufd);
 		else
 			ret = iommufd_vfio_compat_ioas_create(iommufd);
diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
index 5835c74e97ce..1b89e8bc8571 100644
--- a/drivers/vfio/vfio.h
+++ b/drivers/vfio/vfio.h
@@ -108,8 +108,7 @@ void vfio_group_cleanup(void);
 
 static inline bool vfio_device_is_noiommu(struct vfio_device *vdev)
 {
-	return IS_ENABLED(CONFIG_VFIO_NOIOMMU) &&
-	       vdev->group->type == VFIO_NO_IOMMU;
+	return vdev->group->type == VFIO_NO_IOMMU;
 }
 
 #if IS_ENABLED(CONFIG_VFIO_CONTAINER)
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 180+ messages in thread

* [PATCH v12 21/24] vfio: Determine noiommu device in __vfio_register_dev()
  2023-06-02 12:16 ` [Intel-gfx] " Yi Liu
@ 2023-06-02 12:16   ` Yi Liu
  -1 siblings, 0 replies; 180+ messages in thread
From: Yi Liu @ 2023-06-02 12:16 UTC (permalink / raw)
  To: alex.williamson, jgg, kevin.tian
  Cc: joro, robin.murphy, cohuck, eric.auger, nicolinc, kvm, mjrosato,
	chao.p.peng, yi.l.liu, yi.y.sun, peterx, jasowang,
	shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	intel-gvt-dev, intel-gfx, linux-s390, xudong.hao, yan.y.zhao,
	terrence.xu, yanting.jiang, zhenzhong.duan, clegoate

This moves the noiommu device determination and the noiommu taint out of
vfio_group_find_or_alloc(). Whether a device is noiommu is now determined in
__vfio_register_dev() and the result is stored in the vfio_device->noiommu
flag; the noiommu taint is added at the end of __vfio_register_dev().

This also prepares for compiling out the vfio_group infrastructure, as it
makes noiommu detection and the taint common to the cdev path and the group
path, even though the cdev path does not support noiommu.

Suggested-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 drivers/vfio/group.c     | 15 ---------------
 drivers/vfio/vfio_main.c | 31 ++++++++++++++++++++++++++++++-
 include/linux/vfio.h     |  1 +
 3 files changed, 31 insertions(+), 16 deletions(-)
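
For illustration only (not part of the patch): a userspace model of the
registration flow after this change. It covers the three steps the patch
touches: the noiommu decision for physical devices, the choice of group
type, and the taint at the end of a successful registration. The struct and
all sketch_/SKETCH_ names are hypothetical; the real code operates on
struct vfio_device and calls iommu_group_get() and add_taint().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

enum sketch_group_type {
	SKETCH_IOMMU,		/* physical device */
	SKETCH_EMULATED_IOMMU,	/* emulated device */
	SKETCH_NO_IOMMU,
};

struct sketch_device {
	bool has_iommu_group;	/* models iommu_group_get() != NULL */
	bool noiommu;		/* models vfio_device->noiommu */
	bool tainted;		/* models the add_taint() side effect */
	enum sketch_group_type group_type;
};

/* Mirrors the decision made by vfio_device_set_noiommu() in the patch. */
static int sketch_set_noiommu(struct sketch_device *dev, bool noiommu_opt)
{
	if (!dev->has_iommu_group && !noiommu_opt)
		return -EINVAL;

	dev->noiommu = !dev->has_iommu_group;
	return 0;
}

/* Models the ordering of the steps in __vfio_register_dev(). */
static int sketch_register(struct sketch_device *dev,
			   enum sketch_group_type type, bool noiommu_opt)
{
	int ret;

	assert(dev != NULL);
	dev->noiommu = false;
	dev->tainted = false;

	/* Only physical devices can be noiommu devices. */
	if (type == SKETCH_IOMMU) {
		ret = sketch_set_noiommu(dev, noiommu_opt);
		if (ret)
			return ret;
	}

	dev->group_type = dev->noiommu ? SKETCH_NO_IOMMU : type;

	/* The taint is applied last, once registration has succeeded. */
	if (dev->noiommu)
		dev->tainted = true;
	return 0;
}
```

In this model an emulated device never reaches the noiommu decision, so it
registers with its own group type regardless of the option.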

diff --git a/drivers/vfio/group.c b/drivers/vfio/group.c
index 653b62f93474..64cdd0ea8825 100644
--- a/drivers/vfio/group.c
+++ b/drivers/vfio/group.c
@@ -668,21 +668,6 @@ static struct vfio_group *vfio_group_find_or_alloc(struct device *dev)
 	struct vfio_group *group;
 
 	iommu_group = iommu_group_get(dev);
-	if (!iommu_group && vfio_noiommu) {
-		/*
-		 * With noiommu enabled, create an IOMMU group for devices that
-		 * don't already have one, implying no IOMMU hardware/driver
-		 * exists.  Taint the kernel because we're about to give a DMA
-		 * capable device to a user without IOMMU protection.
-		 */
-		group = vfio_noiommu_group_alloc(dev, VFIO_NO_IOMMU);
-		if (!IS_ERR(group)) {
-			add_taint(TAINT_USER, LOCKDEP_STILL_OK);
-			dev_warn(dev, "Adding kernel taint for vfio-noiommu group on device\n");
-		}
-		return group;
-	}
-
 	if (!iommu_group)
 		return ERR_PTR(-EINVAL);
 
diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
index 6d8f9b0f3637..00a699b9f76b 100644
--- a/drivers/vfio/vfio_main.c
+++ b/drivers/vfio/vfio_main.c
@@ -265,6 +265,18 @@ static int vfio_init_device(struct vfio_device *device, struct device *dev,
 	return ret;
 }
 
+static int vfio_device_set_noiommu(struct vfio_device *device)
+{
+	struct iommu_group *iommu_group = iommu_group_get(device->dev);
+
+	if (!iommu_group && !vfio_noiommu)
+		return -EINVAL;
+
+	device->noiommu = !iommu_group;
+	iommu_group_put(iommu_group); /* Accepts NULL */
+	return 0;
+}
+
 static int __vfio_register_dev(struct vfio_device *device,
 			       enum vfio_group_type type)
 {
@@ -277,6 +289,13 @@ static int __vfio_register_dev(struct vfio_device *device,
 		     !device->ops->detach_ioas)))
 		return -EINVAL;
 
+	/* Only physical devices can be noiommu device */
+	if (type == VFIO_IOMMU) {
+		ret = vfio_device_set_noiommu(device);
+		if (ret)
+			return ret;
+	}
+
 	/*
 	 * If the driver doesn't specify a set then the device is added to a
 	 * singleton set just for itself.
@@ -288,7 +307,8 @@ static int __vfio_register_dev(struct vfio_device *device,
 	if (ret)
 		return ret;
 
-	ret = vfio_device_set_group(device, type);
+	ret = vfio_device_set_group(device,
+				    device->noiommu ? VFIO_NO_IOMMU : type);
 	if (ret)
 		return ret;
 
@@ -301,6 +321,15 @@ static int __vfio_register_dev(struct vfio_device *device,
 
 	vfio_device_group_register(device);
 
+	if (device->noiommu) {
+		/*
+		 * noiommu devices have no IOMMU hardware/driver.  Taint the
+		 * kernel because we're about to give a DMA capable device to
+		 * a user without IOMMU protection.
+		 */
+		add_taint(TAINT_USER, LOCKDEP_STILL_OK);
+		dev_warn(device->dev, "Adding kernel taint for vfio-noiommu on device\n");
+	}
 	return 0;
 err_out:
 	vfio_device_remove_group(device);
diff --git a/include/linux/vfio.h b/include/linux/vfio.h
index e80a8ac86e46..183e620009e7 100644
--- a/include/linux/vfio.h
+++ b/include/linux/vfio.h
@@ -67,6 +67,7 @@ struct vfio_device {
 	bool iommufd_attached;
 #endif
 	bool cdev_opened:1;
+	bool noiommu:1;
 };
 
 /**
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 180+ messages in thread

* [PATCH v12 22/24] vfio: Remove vfio_device_is_noiommu()
  2023-06-02 12:16 ` [Intel-gfx] " Yi Liu
@ 2023-06-02 12:16   ` Yi Liu
  -1 siblings, 0 replies; 180+ messages in thread
From: Yi Liu @ 2023-06-02 12:16 UTC (permalink / raw)
  To: alex.williamson, jgg, kevin.tian
  Cc: joro, robin.murphy, cohuck, eric.auger, nicolinc, kvm, mjrosato,
	chao.p.peng, yi.l.liu, yi.y.sun, peterx, jasowang,
	shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	intel-gvt-dev, intel-gfx, linux-s390, xudong.hao, yan.y.zhao,
	terrence.xu, yanting.jiang, zhenzhong.duan, clegoate

This converts the noiommu tests to use the vfio_device->noiommu flag. With
this change, vfio_device_is_noiommu() is no longer needed and is removed.

Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 drivers/vfio/group.c   | 2 +-
 drivers/vfio/iommufd.c | 4 ++--
 drivers/vfio/vfio.h    | 9 ++-------
 3 files changed, 5 insertions(+), 10 deletions(-)

diff --git a/drivers/vfio/group.c b/drivers/vfio/group.c
index 64cdd0ea8825..08d37811507e 100644
--- a/drivers/vfio/group.c
+++ b/drivers/vfio/group.c
@@ -191,7 +191,7 @@ static int vfio_df_group_open(struct vfio_device_file *df)
 		vfio_device_group_get_kvm_safe(device);
 
 	df->iommufd = device->group->iommufd;
-	if (df->iommufd && vfio_device_is_noiommu(device) && device->open_count == 0) {
+	if (df->iommufd && device->noiommu && device->open_count == 0) {
 		/*
 		 * Require no compat ioas to be assigned to proceed.  The basic
 		 * statement is that the user cannot have done something that
diff --git a/drivers/vfio/iommufd.c b/drivers/vfio/iommufd.c
index a59ed4f881aa..fac8ca74ec85 100644
--- a/drivers/vfio/iommufd.c
+++ b/drivers/vfio/iommufd.c
@@ -37,7 +37,7 @@ int vfio_iommufd_compat_attach_ioas(struct vfio_device *vdev,
 	lockdep_assert_held(&vdev->dev_set->lock);
 
 	/* compat noiommu does not need to do ioas attach */
-	if (vfio_device_is_noiommu(vdev))
+	if (vdev->noiommu)
 		return 0;
 
 	ret = iommufd_vfio_compat_ioas_get_id(ictx, &ioas_id);
@@ -54,7 +54,7 @@ void vfio_df_iommufd_unbind(struct vfio_device_file *df)
 
 	lockdep_assert_held(&vdev->dev_set->lock);
 
-	if (vfio_device_is_noiommu(vdev))
+	if (vdev->noiommu)
 		return;
 
 	if (vdev->ops->unbind_iommufd)
diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
index 1b89e8bc8571..b138b8334fe0 100644
--- a/drivers/vfio/vfio.h
+++ b/drivers/vfio/vfio.h
@@ -106,11 +106,6 @@ bool vfio_device_has_container(struct vfio_device *device);
 int __init vfio_group_init(void);
 void vfio_group_cleanup(void);
 
-static inline bool vfio_device_is_noiommu(struct vfio_device *vdev)
-{
-	return vdev->group->type == VFIO_NO_IOMMU;
-}
-
 #if IS_ENABLED(CONFIG_VFIO_CONTAINER)
 /**
  * struct vfio_iommu_driver_ops - VFIO IOMMU driver callbacks
@@ -271,7 +266,7 @@ void vfio_init_device_cdev(struct vfio_device *device);
 static inline int vfio_device_add(struct vfio_device *device)
 {
 	/* cdev does not support noiommu device */
-	if (vfio_device_is_noiommu(device))
+	if (device->noiommu)
 		return device_add(&device->device);
 	vfio_init_device_cdev(device);
 	return cdev_device_add(&device->cdev, &device->device);
@@ -279,7 +274,7 @@ static inline int vfio_device_add(struct vfio_device *device)
 
 static inline void vfio_device_del(struct vfio_device *device)
 {
-	if (vfio_device_is_noiommu(device))
+	if (device->noiommu)
 		device_del(&device->device);
 	else
 		cdev_device_del(&device->cdev, &device->device);
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 180+ messages in thread

* [PATCH v12 23/24] vfio: Compile vfio_group infrastructure optionally
  2023-06-02 12:16 ` [Intel-gfx] " Yi Liu
@ 2023-06-02 12:16   ` Yi Liu
  -1 siblings, 0 replies; 180+ messages in thread
From: Yi Liu @ 2023-06-02 12:16 UTC (permalink / raw)
  To: alex.williamson, jgg, kevin.tian
  Cc: joro, robin.murphy, cohuck, eric.auger, nicolinc, kvm, mjrosato,
	chao.p.peng, yi.l.liu, yi.y.sun, peterx, jasowang,
	shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	intel-gvt-dev, intel-gfx, linux-s390, xudong.hao, yan.y.zhao,
	terrence.xu, yanting.jiang, zhenzhong.duan, clegoate

vfio_group is not needed for vfio device cdev, so with vfio device cdev
introduced, the vfio_group infrastructure can be compiled out if only
cdev is needed.

Tested-by: Yanting Jiang <yanting.jiang@intel.com>
Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
Tested-by: Terrence Xu <terrence.xu@intel.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
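Note (illustration only, not part of the patch): the #else branches this
patch adds to drivers/vfio/vfio.h follow the usual kernel pattern of
static inline stubs, so callers need no #ifdef of their own.  A minimal
userspace sketch of that pattern follows.  DEMO_GROUP, struct demo_device
and the demo_*() helpers are hypothetical names standing in for
CONFIG_VFIO_GROUP, struct vfio_device and the vfio_device_*() helpers;
the stub return values mirror the ones in the diff.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical build switch standing in for CONFIG_VFIO_GROUP. */
#ifndef DEMO_GROUP
#define DEMO_GROUP 0
#endif

/* Hypothetical stand-in for struct vfio_device. */
struct demo_device {
	bool has_container;
};

#if DEMO_GROUP
/* Group support built in: real logic (lives in group.o in the patch). */
static inline bool demo_has_container(struct demo_device *dev)
{
	return dev->has_container;
}

static inline int demo_group_use_iommu(struct demo_device *dev)
{
	(void)dev;
	return 0;
}
#else
/* Group support compiled out: stubs with fixed answers. */
static inline bool demo_has_container(struct demo_device *dev)
{
	(void)dev;
	return false;
}

static inline int demo_group_use_iommu(struct demo_device *dev)
{
	(void)dev;
	return -EOPNOTSUPP;
}
#endif

/* The caller is identical in both configurations. */
static int demo_open(struct demo_device *dev)
{
	if (demo_has_container(dev))
		return demo_group_use_iommu(dev);
	return 0;
}
```

With the switch off, demo_open() never reaches the -EOPNOTSUPP path
because the predicate stub already returns false, so the compiler is
free to drop the dead branch.
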
 drivers/iommu/iommufd/Kconfig |  4 +-
 drivers/vfio/Kconfig          | 15 +++++++
 drivers/vfio/Makefile         |  2 +-
 drivers/vfio/vfio.h           | 84 ++++++++++++++++++++++++++++++++---
 include/linux/vfio.h          | 25 +++++++++--
 5 files changed, 118 insertions(+), 12 deletions(-)

diff --git a/drivers/iommu/iommufd/Kconfig b/drivers/iommu/iommufd/Kconfig
index ada693ea51a7..99d4b075df49 100644
--- a/drivers/iommu/iommufd/Kconfig
+++ b/drivers/iommu/iommufd/Kconfig
@@ -14,8 +14,8 @@ config IOMMUFD
 if IOMMUFD
 config IOMMUFD_VFIO_CONTAINER
 	bool "IOMMUFD provides the VFIO container /dev/vfio/vfio"
-	depends on VFIO && !VFIO_CONTAINER
-	default VFIO && !VFIO_CONTAINER
+	depends on VFIO_GROUP && !VFIO_CONTAINER
+	default VFIO_GROUP && !VFIO_CONTAINER
 	help
 	  IOMMUFD will provide /dev/vfio/vfio instead of VFIO. This relies on
 	  IOMMUFD providing compatibility emulation to give the same ioctls.
diff --git a/drivers/vfio/Kconfig b/drivers/vfio/Kconfig
index 1cab8e4729de..35ab8ab87688 100644
--- a/drivers/vfio/Kconfig
+++ b/drivers/vfio/Kconfig
@@ -4,6 +4,8 @@ menuconfig VFIO
 	select IOMMU_API
 	depends on IOMMUFD || !IOMMUFD
 	select INTERVAL_TREE
+	select VFIO_GROUP if SPAPR_TCE_IOMMU || IOMMUFD=n
+	select VFIO_DEVICE_CDEV if !VFIO_GROUP
 	select VFIO_CONTAINER if IOMMUFD=n
 	help
 	  VFIO provides a framework for secure userspace device drivers.
@@ -15,6 +17,7 @@ if VFIO
 config VFIO_DEVICE_CDEV
 	bool "Support for the VFIO cdev /dev/vfio/devices/vfioX"
 	depends on IOMMUFD
+	default !VFIO_GROUP
 	help
 	  The VFIO device cdev is another way for userspace to get device
 	  access. Userspace gets device fd by opening device cdev under
@@ -24,9 +27,20 @@ config VFIO_DEVICE_CDEV
 
 	  If you don't know what to do here, say N.
 
+config VFIO_GROUP
+	bool "Support for the VFIO group /dev/vfio/$group_id"
+	default y
+	help
+	  VFIO group support provides the traditional model for accessing
+	  devices through VFIO and is used by the majority of userspace
+	  applications and drivers making use of VFIO.
+
+	  If you don't know what to do here, say Y.
+
 config VFIO_CONTAINER
 	bool "Support for the VFIO container /dev/vfio/vfio"
 	select VFIO_IOMMU_TYPE1 if MMU && (X86 || S390 || ARM || ARM64)
+	depends on VFIO_GROUP
 	default y
 	help
 	  The VFIO container is the classic interface to VFIO for establishing
@@ -48,6 +62,7 @@ endif
 
 config VFIO_NOIOMMU
 	bool "VFIO No-IOMMU support"
+	depends on VFIO_GROUP
 	help
 	  VFIO is built on the ability to isolate devices using the IOMMU.
 	  Only with an IOMMU can userspace access to DMA capable devices be
diff --git a/drivers/vfio/Makefile b/drivers/vfio/Makefile
index 245394aeb94b..57c3515af606 100644
--- a/drivers/vfio/Makefile
+++ b/drivers/vfio/Makefile
@@ -2,9 +2,9 @@
 obj-$(CONFIG_VFIO) += vfio.o
 
 vfio-y += vfio_main.o \
-	  group.o \
 	  iova_bitmap.o
 vfio-$(CONFIG_VFIO_DEVICE_CDEV) += device_cdev.o
+vfio-$(CONFIG_VFIO_GROUP) += group.o
 vfio-$(CONFIG_IOMMUFD) += iommufd.o
 vfio-$(CONFIG_VFIO_CONTAINER) += container.o
 vfio-$(CONFIG_VFIO_VIRQFD) += virqfd.o
diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
index b138b8334fe0..64bc9121b3ff 100644
--- a/drivers/vfio/vfio.h
+++ b/drivers/vfio/vfio.h
@@ -36,6 +36,12 @@ vfio_allocate_device_file(struct vfio_device *device);
 
 extern const struct file_operations vfio_device_fops;
 
+#ifdef CONFIG_VFIO_NOIOMMU
+extern bool vfio_noiommu __read_mostly;
+#else
+enum { vfio_noiommu = false };
+#endif
+
 enum vfio_group_type {
 	/*
 	 * Physical device with IOMMU backing.
@@ -60,6 +66,7 @@ enum vfio_group_type {
 	VFIO_NO_IOMMU,
 };
 
+#if IS_ENABLED(CONFIG_VFIO_GROUP)
 struct vfio_group {
 	struct device 			dev;
 	struct cdev			cdev;
@@ -105,6 +112,77 @@ void vfio_group_set_kvm(struct vfio_group *group, struct kvm *kvm);
 bool vfio_device_has_container(struct vfio_device *device);
 int __init vfio_group_init(void);
 void vfio_group_cleanup(void);
+#else
+struct vfio_group;
+
+static inline int vfio_device_block_group(struct vfio_device *device)
+{
+	return 0;
+}
+
+static inline void vfio_device_unblock_group(struct vfio_device *device)
+{
+}
+
+static inline int vfio_device_set_group(struct vfio_device *device,
+					enum vfio_group_type type)
+{
+	return 0;
+}
+
+static inline void vfio_device_remove_group(struct vfio_device *device)
+{
+}
+
+static inline void vfio_device_group_register(struct vfio_device *device)
+{
+}
+
+static inline void vfio_device_group_unregister(struct vfio_device *device)
+{
+}
+
+static inline int vfio_device_group_use_iommu(struct vfio_device *device)
+{
+	return -EOPNOTSUPP;
+}
+
+static inline void vfio_device_group_unuse_iommu(struct vfio_device *device)
+{
+}
+
+static inline void vfio_df_group_close(struct vfio_device_file *df)
+{
+}
+
+static inline struct vfio_group *vfio_group_from_file(struct file *file)
+{
+	return NULL;
+}
+
+static inline bool vfio_group_enforced_coherent(struct vfio_group *group)
+{
+	return true;
+}
+
+static inline void vfio_group_set_kvm(struct vfio_group *group, struct kvm *kvm)
+{
+}
+
+static inline bool vfio_device_has_container(struct vfio_device *device)
+{
+	return false;
+}
+
+static inline int __init vfio_group_init(void)
+{
+	return 0;
+}
+
+static inline void vfio_group_cleanup(void)
+{
+}
+#endif /* CONFIG_VFIO_GROUP */
 
 #if IS_ENABLED(CONFIG_VFIO_CONTAINER)
 /**
@@ -356,12 +434,6 @@ static inline void vfio_virqfd_exit(void)
 }
 #endif
 
-#ifdef CONFIG_VFIO_NOIOMMU
-extern bool vfio_noiommu __read_mostly;
-#else
-enum { vfio_noiommu = false };
-#endif
-
 #ifdef CONFIG_HAVE_KVM
 void _vfio_device_get_kvm_safe(struct vfio_device *device, struct kvm *kvm);
 void vfio_device_put_kvm(struct vfio_device *device);
diff --git a/include/linux/vfio.h b/include/linux/vfio.h
index 183e620009e7..c1d80b3c964e 100644
--- a/include/linux/vfio.h
+++ b/include/linux/vfio.h
@@ -43,7 +43,11 @@ struct vfio_device {
 	 */
 	const struct vfio_migration_ops *mig_ops;
 	const struct vfio_log_ops *log_ops;
+#if IS_ENABLED(CONFIG_VFIO_GROUP)
 	struct vfio_group *group;
+	struct list_head group_next;
+	struct list_head iommu_entry;
+#endif
 	struct vfio_device_set *dev_set;
 	struct list_head dev_set_list;
 	unsigned int migration_flags;
@@ -58,8 +62,6 @@ struct vfio_device {
 	refcount_t refcount;	/* user count on registered device*/
 	unsigned int open_count;
 	struct completion comp;
-	struct list_head group_next;
-	struct list_head iommu_entry;
 	struct iommufd_access *iommufd_access;
 	void (*put_kvm)(struct kvm *kvm);
 #if IS_ENABLED(CONFIG_IOMMUFD)
@@ -287,12 +289,29 @@ int vfio_mig_get_next_state(struct vfio_device *device,
 /*
  * External user API
  */
+#if IS_ENABLED(CONFIG_VFIO_GROUP)
 struct iommu_group *vfio_file_iommu_group(struct file *file);
 bool vfio_file_is_group(struct file *file);
+bool vfio_file_has_dev(struct file *file, struct vfio_device *device);
+#else
+static inline struct iommu_group *vfio_file_iommu_group(struct file *file)
+{
+	return NULL;
+}
+
+static inline bool vfio_file_is_group(struct file *file)
+{
+	return false;
+}
+
+static inline bool vfio_file_has_dev(struct file *file, struct vfio_device *device)
+{
+	return false;
+}
+#endif
 bool vfio_file_is_valid(struct file *file);
 bool vfio_file_enforced_coherent(struct file *file);
 void vfio_file_set_kvm(struct file *file, struct kvm *kvm);
-bool vfio_file_has_dev(struct file *file, struct vfio_device *device);
 
 #define VFIO_PIN_PAGES_MAX_ENTRIES	(PAGE_SIZE/sizeof(unsigned long))
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 180+ messages in thread

* [PATCH v12 24/24] docs: vfio: Add vfio device cdev description
  2023-06-02 12:16 ` [Intel-gfx] " Yi Liu
@ 2023-06-02 12:16   ` Yi Liu
  -1 siblings, 0 replies; 180+ messages in thread
From: Yi Liu @ 2023-06-02 12:16 UTC (permalink / raw)
  To: alex.williamson, jgg, kevin.tian
  Cc: joro, robin.murphy, cohuck, eric.auger, nicolinc, kvm, mjrosato,
	chao.p.peng, yi.l.liu, yi.y.sun, peterx, jasowang,
	shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	intel-gvt-dev, intel-gfx, linux-s390, xudong.hao, yan.y.zhao,
	terrence.xu, yanting.jiang, zhenzhong.duan, clegoate

This gives notes for userspace applications on device cdev usage.

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
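Note (illustration only, not part of the patch): the "Device cdev
Example" section added by this patch finds the vfioX name by listing
/sys/bus/pci/devices/<BDF>/vfio-dev/ by hand.  A minimal sketch of the
same lookup in C follows.  vfio_cdev_path() is a hypothetical helper
name, and the sysfs_root parameter exists only so the helper can be
pointed at a fake tree; it would normally be "/sys".

```c
#include <dirent.h>
#include <stdio.h>
#include <string.h>

/*
 * Find the vfioX entry for a PCI device and build its cdev path.
 * Returns 0 on success, -1 if the directory is missing, holds no
 * vfioX entry, or the output buffer is too small.
 */
static int vfio_cdev_path(const char *sysfs_root, const char *bdf,
			  char *out, size_t outlen)
{
	char dirpath[512];
	struct dirent *de;
	DIR *dir;
	int ret = -1;
	int n;

	n = snprintf(dirpath, sizeof(dirpath),
		     "%s/bus/pci/devices/%s/vfio-dev", sysfs_root, bdf);
	if (n < 0 || (size_t)n >= sizeof(dirpath))
		return -1;

	dir = opendir(dirpath);
	if (!dir)
		return -1;	/* e.g. device not bound to a vfio driver */

	while ((de = readdir(dir)) != NULL) {
		/* Skip "." and ".."; take the first "vfioX" entry. */
		if (strncmp(de->d_name, "vfio", 4) != 0)
			continue;
		n = snprintf(out, outlen, "/dev/vfio/devices/%s", de->d_name);
		if (n >= 0 && (size_t)n < outlen)
			ret = 0;
		break;
	}
	closedir(dir);
	return ret;
}
```

A caller would pass the result straight to open(path, O_RDWR), as the
documentation example does with the hard-coded vfio0 path.
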
 Documentation/driver-api/vfio.rst | 132 ++++++++++++++++++++++++++++++
 1 file changed, 132 insertions(+)

diff --git a/Documentation/driver-api/vfio.rst b/Documentation/driver-api/vfio.rst
index 363e12c90b87..f00c9b86bda0 100644
--- a/Documentation/driver-api/vfio.rst
+++ b/Documentation/driver-api/vfio.rst
@@ -239,6 +239,130 @@ group and can access them as follows::
 	/* Gratuitous device reset and go... */
 	ioctl(device, VFIO_DEVICE_RESET);
 
+IOMMUFD and vfio_iommu_type1
+----------------------------
+
+IOMMUFD is the new user API to manage I/O page tables from userspace.
+It is intended to be the portal for delivering advanced userspace DMA
+features (nested translation [5]_, PASID [6]_, etc.) while also providing
+a backwards compatibility interface for existing VFIO_TYPE1v2_IOMMU use
+cases.  Eventually the vfio_iommu_type1 driver, as well as the legacy
+vfio container and group model, is intended to be deprecated.
+
+The IOMMUFD backwards compatibility interface can be enabled in two ways.
+In the first method, the kernel can be configured with
+CONFIG_IOMMUFD_VFIO_CONTAINER, in which case the IOMMUFD subsystem
+transparently provides the entire infrastructure for the VFIO
+container and IOMMU backend interfaces.  The compatibility mode can
+also be accessed if the VFIO container interface, i.e. /dev/vfio/vfio, is
+simply symlink'd to /dev/iommu.  Note that at the time of writing, the
+compatibility mode is not entirely feature complete relative to
+VFIO_TYPE1v2_IOMMU (e.g. DMA mapping MMIO) and does not attempt to
+provide compatibility to the VFIO_SPAPR_TCE_IOMMU interface.  Therefore
+it is not generally advisable at this time to switch from native VFIO
+implementations to the IOMMUFD compatibility interfaces.
+
+Long term, VFIO users should migrate to device access through the cdev
+interface described below, and native access through the IOMMUFD
+provided interfaces.
+
+VFIO Device cdev
+----------------
+
+Traditionally, the user acquires a device fd via VFIO_GROUP_GET_DEVICE_FD
+in a VFIO group.
+
+With CONFIG_VFIO_DEVICE_CDEV=y the user can now acquire a device fd
+by directly opening a character device /dev/vfio/devices/vfioX where
+"X" is the number allocated uniquely by VFIO for registered devices.
+The cdev interface does not support noiommu, so the user should use the
+legacy group interface if noiommu is needed.
+
+The cdev only works with IOMMUFD.  Both VFIO drivers and applications
+must adapt to the new cdev security model which requires using
+VFIO_DEVICE_BIND_IOMMUFD to claim DMA ownership before starting to
+actually use the device.  Once BIND succeeds, the VFIO device can
+be fully accessed by the user.
+
+VFIO device cdev doesn't rely on VFIO group/container/iommu drivers.
+Hence those modules can be fully compiled out in an environment
+where no legacy VFIO application exists.
+
+SPAPR does not support IOMMUFD yet, so it cannot support device
+cdev either.
+
+Device cdev Example
+-------------------
+
+Assume the user wants to access PCI device 0000:6a:01.0::
+
+	$ ls /sys/bus/pci/devices/0000:6a:01.0/vfio-dev/
+	vfio0
+
+This device is therefore represented as vfio0.  The user can verify
+its existence::
+
+	$ ls -l /dev/vfio/devices/vfio0
+	crw------- 1 root root 511, 0 Feb 16 01:22 /dev/vfio/devices/vfio0
+	$ cat /sys/bus/pci/devices/0000:6a:01.0/vfio-dev/vfio0/dev
+	511:0
+	$ ls -l /dev/char/511\:0
+	lrwxrwxrwx 1 root root 21 Feb 16 01:22 /dev/char/511:0 -> ../vfio/devices/vfio0
+
+Then provide the user with access to the device if unprivileged
+operation is desired::
+
+	$ chown user:user /dev/vfio/devices/vfio0
+
+Finally, the user can get the cdev fd by::
+
+	cdev_fd = open("/dev/vfio/devices/vfio0", O_RDWR);
+
+An opened cdev_fd doesn't give the user any permission to access
+the device except to bind the cdev_fd to an iommufd.  After that point
+the device is fully accessible, including attaching it to an
+IOMMUFD IOAS/HWPT to enable userspace DMA::
+
+	struct vfio_device_bind_iommufd bind = {
+		.argsz = sizeof(bind),
+		.flags = 0,
+	};
+	struct iommu_ioas_alloc alloc_data = {
+		.size = sizeof(alloc_data),
+		.flags = 0,
+	};
+	struct vfio_device_attach_iommufd_pt attach_data = {
+		.argsz = sizeof(attach_data),
+		.flags = 0,
+	};
+	struct iommu_ioas_map map = {
+		.size = sizeof(map),
+		.flags = IOMMU_IOAS_MAP_READABLE |
+			 IOMMU_IOAS_MAP_WRITEABLE |
+			 IOMMU_IOAS_MAP_FIXED_IOVA,
+		.__reserved = 0,
+	};
+
+	iommufd = open("/dev/iommu", O_RDWR);
+
+	bind.iommufd = iommufd;
+	ioctl(cdev_fd, VFIO_DEVICE_BIND_IOMMUFD, &bind);
+
+	ioctl(iommufd, IOMMU_IOAS_ALLOC, &alloc_data);
+	attach_data.pt_id = alloc_data.out_ioas_id;
+	ioctl(cdev_fd, VFIO_DEVICE_ATTACH_IOMMUFD_PT, &attach_data);
+
+	/* Allocate some space and setup a DMA mapping */
+	map.user_va = (int64_t)mmap(0, 1024 * 1024, PROT_READ | PROT_WRITE,
+				    MAP_PRIVATE | MAP_ANONYMOUS, 0, 0);
+	map.iova = 0; /* 1MB starting at 0x0 from device view */
+	map.length = 1024 * 1024;
+	map.ioas_id = alloc_data.out_ioas_id;
+
+	ioctl(iommufd, IOMMU_IOAS_MAP, &map);
+
+	/* Other device operations as stated in "VFIO Usage Example" */
+
 VFIO User API
 -------------------------------------------------------------------------------
 
@@ -566,3 +690,11 @@ This implementation has some specifics:
 				\-0d.1
 
 	00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev 90)
+
+.. [5] Nested translation is an IOMMU feature which supports two-stage
+   address translation.  This improves the address translation efficiency
+   in IOMMU virtualization.
+
+.. [6] PASID stands for Process Address Space ID, introduced by PCI
+   Express.  It is a prerequisite for Shared Virtual Addressing (SVA)
+   and Scalable I/O Virtualization (Scalable IOV).
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 180+ messages in thread

* [Intel-gfx] [PATCH v12 24/24] docs: vfio: Add vfio device cdev description
@ 2023-06-02 12:16   ` Yi Liu
  0 siblings, 0 replies; 180+ messages in thread
From: Yi Liu @ 2023-06-02 12:16 UTC (permalink / raw)
  To: alex.williamson, jgg, kevin.tian
  Cc: mjrosato, jasowang, xudong.hao, zhenzhong.duan, peterx,
	terrence.xu, chao.p.peng, linux-s390, yi.l.liu, kvm, lulu,
	yanting.jiang, joro, nicolinc, yan.y.zhao, intel-gfx, eric.auger,
	intel-gvt-dev, yi.y.sun, clegoate, cohuck,
	shameerali.kolothum.thodi, suravee.suthikulpanit, robin.murphy

This gives notes for userspace applications on device cdev usage.

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 Documentation/driver-api/vfio.rst | 132 ++++++++++++++++++++++++++++++
 1 file changed, 132 insertions(+)

diff --git a/Documentation/driver-api/vfio.rst b/Documentation/driver-api/vfio.rst
index 363e12c90b87..f00c9b86bda0 100644
--- a/Documentation/driver-api/vfio.rst
+++ b/Documentation/driver-api/vfio.rst
@@ -239,6 +239,130 @@ group and can access them as follows::
 	/* Gratuitous device reset and go... */
 	ioctl(device, VFIO_DEVICE_RESET);
 
+IOMMUFD and vfio_iommu_type1
+----------------------------
+
+IOMMUFD is the new user API to manage I/O page tables from userspace.
+It intends to be the portal of delivering advanced userspace DMA
+features (nested translation [5]_, PASID [6]_, etc.) while also providing
+a backwards compatibility interface for existing VFIO_TYPE1v2_IOMMU use
+cases.  Eventually the vfio_iommu_type1 driver, as well as the legacy
+vfio container and group model is intended to be deprecated.
+
+The IOMMUFD backwards compatibility interface can be enabled two ways.
+In the first method, the kernel can be configured with
+CONFIG_IOMMUFD_VFIO_CONTAINER, in which case the IOMMUFD subsystem
+transparently provides the entire infrastructure for the VFIO
+container and IOMMU backend interfaces.  The compatibility mode can
+also be accessed if the VFIO container interface, ie. /dev/vfio/vfio is
+simply symlink'd to /dev/iommu.  Note that at the time of writing, the
+compatibility mode is not entirely feature complete relative to
+VFIO_TYPE1v2_IOMMU (ex. DMA mapping MMIO) and does not attempt to
+provide compatibility to the VFIO_SPAPR_TCE_IOMMU interface.  Therefore
+it is not generally advisable at this time to switch from native VFIO
+implementations to the IOMMUFD compatibility interfaces.
+
+Long term, VFIO users should migrate to device access through the cdev
+interface described below, and native access through the IOMMUFD
+provided interfaces.
+
+VFIO Device cdev
+----------------
+
+Traditionally user acquires a device fd via VFIO_GROUP_GET_DEVICE_FD
+in a VFIO group.
+
+With CONFIG_VFIO_DEVICE_CDEV=y the user can now acquire a device fd
+by directly opening a character device /dev/vfio/devices/vfioX where
+"X" is the number allocated uniquely by VFIO for registered devices.
+cdev interface does not support noiommu, so user should use the legacy
+group interface if noiommu is needed.
+
+The cdev only works with IOMMUFD.  Both VFIO drivers and applications
+must adapt to the new cdev security model which requires using
+VFIO_DEVICE_BIND_IOMMUFD to claim DMA ownership before starting to
+actually use the device.  Once BIND succeeds then a VFIO device can
+be fully accessed by the user.
+
+VFIO device cdev doesn't rely on VFIO group/container/iommu drivers.
+Hence those modules can be fully compiled out in an environment
+where no legacy VFIO application exists.
+
+So far SPAPR does not support IOMMUFD yet.  So it cannot support device
+cdev neither.
+
+Device cdev Example
+-------------------
+
+Assume user wants to access PCI device 0000:6a:01.0::
+
+	$ ls /sys/bus/pci/devices/0000:6a:01.0/vfio-dev/
+	vfio0
+
+This device is therefore represented as vfio0.  The user can verify
+its existence::
+
+	$ ls -l /dev/vfio/devices/vfio0
+	crw------- 1 root root 511, 0 Feb 16 01:22 /dev/vfio/devices/vfio0
+	$ cat /sys/bus/pci/devices/0000:6a:01.0/vfio-dev/vfio0/dev
+	511:0
+	$ ls -l /dev/char/511\:0
+	lrwxrwxrwx 1 root root 21 Feb 16 01:22 /dev/char/511:0 -> ../vfio/devices/vfio0
+
+Then provide the user with access to the device if unprivileged
+operation is desired::
+
+	$ chown user:user /dev/vfio/devices/vfio0
+
+Finally the user could get cdev fd by::
+
+	cdev_fd = open("/dev/vfio/devices/vfio0", O_RDWR);
+
+An opened cdev_fd doesn't give the user any permission of accessing
+the device except binding the cdev_fd to an iommufd.  After that point
+then the device is fully accessible including attaching it to an
+IOMMUFD IOAS/HWPT to enable userspace DMA::
+
+	struct vfio_device_bind_iommufd bind = {
+		.argsz = sizeof(bind),
+		.flags = 0,
+	};
+	struct iommu_ioas_alloc alloc_data  = {
+		.size = sizeof(alloc_data),
+		.flags = 0,
+	};
+	struct vfio_device_attach_iommufd_pt attach_data = {
+		.argsz = sizeof(attach_data),
+		.flags = 0,
+	};
+	struct iommu_ioas_map map = {
+		.size = sizeof(map),
+		.flags = IOMMU_IOAS_MAP_READABLE |
+			 IOMMU_IOAS_MAP_WRITEABLE |
+			 IOMMU_IOAS_MAP_FIXED_IOVA,
+		.__reserved = 0,
+	};
+
+	iommufd = open("/dev/iommu", O_RDWR);
+
+	bind.iommufd = iommufd;
+	ioctl(cdev_fd, VFIO_DEVICE_BIND_IOMMUFD, &bind);
+
+	ioctl(iommufd, IOMMU_IOAS_ALLOC, &alloc_data);
+	attach_data.pt_id = alloc_data.out_ioas_id;
+	ioctl(cdev_fd, VFIO_DEVICE_ATTACH_IOMMUFD_PT, &attach_data);
+
+	/* Allocate some space and setup a DMA mapping */
+	map.user_va = (int64_t)mmap(NULL, 1024 * 1024, PROT_READ | PROT_WRITE,
+				    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
+	map.iova = 0; /* 1MB starting at 0x0 from device view */
+	map.length = 1024 * 1024;
+	map.ioas_id = alloc_data.out_ioas_id;
+
+	ioctl(iommufd, IOMMU_IOAS_MAP, &map);
+
+	/* Other device operations as stated in "VFIO Usage Example" */
+
 VFIO User API
 -------------------------------------------------------------------------------
 
@@ -566,3 +690,11 @@ This implementation has some specifics:
 				\-0d.1
 
 	00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev 90)
+
+.. [5] Nested translation is an IOMMU feature which supports two-stage
+   address translation.  This improves the address translation efficiency
+   in IOMMU virtualization.
+
+.. [6] PASID stands for Process Address Space ID, introduced by PCI
+   Express.  It is a prerequisite for Shared Virtual Addressing (SVA)
+   and Scalable I/O Virtualization (Scalable IOV).
-- 
2.34.1
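
The sysfs lookup in the "Device cdev Example" above (listing vfio-dev/
under the PCI device and then opening /dev/vfio/devices/vfioX) could be
scripted along the following lines.  This is only a minimal sketch and
not part of the series: vfio_cdev_path() and its sysfs_root/dev_root
parameters are hypothetical names chosen for illustration, and it
assumes the device exposes a single vfioX entry as in the example.

```c
#include <dirent.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not part of this series): find the "vfioX" entry
 * under <sysfs_root>/<bdf>/vfio-dev/ and write "<dev_root>/vfioX" to out.
 * Returns 0 on success, -1 if no entry is found or a buffer is too small.
 */
static int vfio_cdev_path(const char *sysfs_root, const char *dev_root,
			  const char *bdf, char *out, size_t len)
{
	char dir[512];
	struct dirent *de;
	DIR *d;
	int ret = -1;
	int n;

	n = snprintf(dir, sizeof(dir), "%s/%s/vfio-dev", sysfs_root, bdf);
	if (n < 0 || (size_t)n >= sizeof(dir))
		return -1;

	d = opendir(dir);
	if (!d)
		return -1;

	while ((de = readdir(d)) != NULL) {
		/* Skip ".", ".." and anything not named vfio<N> */
		if (strncmp(de->d_name, "vfio", 4) != 0)
			continue;

		/* Only report success if the full path fits in out */
		n = snprintf(out, len, "%s/%s", dev_root, de->d_name);
		if (n >= 0 && (size_t)n < len)
			ret = 0;
		break;
	}

	closedir(d);
	return ret;
}
```

On the system from the example, calling it with "/sys/bus/pci/devices",
"/dev/vfio/devices" and "0000:6a:01.0" would be expected to produce the
path that is then passed to open().  Taking the two roots as parameters
is just a convenience so the helper can be exercised against a fake
directory tree.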


^ permalink raw reply related	[flat|nested] 180+ messages in thread

* [Intel-gfx] ✗ Fi.CI.BUILD: failure for Add vfio_device cdev for iommufd support (rev15)
  2023-06-02 12:16 ` [Intel-gfx] " Yi Liu
                   ` (24 preceding siblings ...)
  (?)
@ 2023-06-02 16:19 ` Patchwork
  -1 siblings, 0 replies; 180+ messages in thread
From: Patchwork @ 2023-06-02 16:19 UTC (permalink / raw)
  To: Yi Liu; +Cc: intel-gfx

== Series Details ==

Series: Add vfio_device cdev for iommufd support (rev15)
URL   : https://patchwork.freedesktop.org/series/113696/
State : failure

== Summary ==

Error: patch https://patchwork.freedesktop.org/api/1.0/series/113696/revisions/15/mbox/ not applied
Applying: vfio: Allocate per device file structure
Applying: vfio: Refine vfio file kAPIs for KVM
Applying: vfio: Accept vfio device file in the KVM facing kAPI
Applying: kvm/vfio: Prepare for accepting vfio device fd
Applying: kvm/vfio: Accept vfio device file from userspace
Applying: vfio: Pass struct vfio_device_file * to vfio_device_open/close()
Applying: vfio: Block device access via device fd until device is opened
Applying: vfio: Add cdev_device_open_cnt to vfio_group
Applying: vfio: Make vfio_df_open() single open for device cdev path
Applying: vfio-iommufd: Move noiommu compat validation out of vfio_iommufd_bind()
Applying: vfio-iommufd: Split bind/attach into two steps
Applying: vfio: Record devid in vfio_device_file
Applying: vfio-iommufd: Add detach_ioas support for physical VFIO devices
Applying: iommufd/device: Add iommufd_access_detach() API
Applying: vfio-iommufd: Add detach_ioas support for emulated VFIO devices
error: sha1 information is lacking or useless (drivers/vfio/iommufd.c).
error: could not build fake ancestor
hint: Use 'git am --show-current-patch=diff' to see the failed patch
Patch failed at 0015 vfio-iommufd: Add detach_ioas support for emulated VFIO devices
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".
Build failed, no error log produced



^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCH v12 00/24] Add vfio_device cdev for iommufd support
  2023-06-02 12:16 ` [Intel-gfx] " Yi Liu
@ 2023-06-07  8:27   ` Nicolin Chen
  -1 siblings, 0 replies; 180+ messages in thread
From: Nicolin Chen @ 2023-06-07  8:27 UTC (permalink / raw)
  To: Yi Liu
  Cc: alex.williamson, jgg, kevin.tian, joro, robin.murphy, cohuck,
	eric.auger, kvm, mjrosato, chao.p.peng, yi.y.sun, peterx,
	jasowang, shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	intel-gvt-dev, intel-gfx, linux-s390, xudong.hao, yan.y.zhao,
	terrence.xu, yanting.jiang, zhenzhong.duan, clegoate

On Fri, Jun 02, 2023 at 05:16:29AM -0700, Yi Liu wrote:

> Existing VFIO provides group-centric user APIs for userspace. Userspace
> opens the /dev/vfio/$group_id first before getting device fd and hence
> getting access to device. This is not the desired model for iommufd. Per
> the conclusion of community discussion[1], iommufd provides device-centric
> kAPIs and requires its consumer (like VFIO) to be device-centric user
> APIs. Such user APIs are used to associate device with iommufd and also
> the I/O address spaces managed by the iommufd.
> 
> This series first introduces a per device file structure to be prepared
> for further enhancement and refactors the kvm-vfio code to be prepared
> for accepting device file from userspace. After this, adds a mechanism for
> blocking device access before iommufd bind. Then refactors the vfio to be
> able to handle cdev path (e.g. iommufd binding, no-iommufd, [de]attach ioas).
> This refactor includes making the device_open exclusive between the group
> and the cdev path, only allow single device open in cdev path; vfio-iommufd
> code is also refactored to support cdev. e.g. split the vfio_iommufd_bind()
> into two steps. Eventually, adds the cdev support for vfio device and the
> new ioctls, then makes group infrastructure optional as it is not needed
> when vfio device cdev is compiled.
> 
> This series is based on some preparation works done to vfio emulated devices[2]
> and vfio pci hot reset enhancements[3].
> 
> This series is a prerequisite for iommu nesting for vfio device[4] [5].
> 
> The complete code can be found in below branch, simple tests done to the
> legacy group path and the cdev path. Draft QEMU branch can be found at[6]
> However, the noiommu mode test is only done with some hacks in kernel and
> qemu to check if qemu can boot with noiommu devices.
> 
> https://github.com/yiliu1765/iommufd/tree/vfio_device_cdev_v12
> (config CONFIG_IOMMUFD=y CONFIG_VFIO_DEVICE_CDEV=y)
 
Rebased our nesting branch, and tested with an updated QEMU
branch on ARM64 (SMMUv3):
https://github.com/nicolinc/iommufd/commits/wip/iommufd_nesting-06052023-cdev-v12-nic
https://github.com/nicolinc/qemu/commits/wip/iommufd_nesting-06062023

Tested-by: Nicolin Chen <nicolinc@nvidia.com>

Thanks
Nicolin

^ permalink raw reply	[flat|nested] 180+ messages in thread

* RE: [PATCH v12 00/24] Add vfio_device cdev for iommufd support
  2023-06-02 12:16 ` [Intel-gfx] " Yi Liu
@ 2023-06-08  6:58   ` Jiang, Yanting
  -1 siblings, 0 replies; 180+ messages in thread
From: Jiang, Yanting @ 2023-06-08  6:58 UTC (permalink / raw)
  To: Liu, Yi L, alex.williamson, jgg, Tian, Kevin
  Cc: joro, robin.murphy, cohuck, eric.auger, nicolinc, kvm, mjrosato,
	chao.p.peng, yi.y.sun, peterx, jasowang,
	shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	intel-gvt-dev, intel-gfx, linux-s390, Hao, Xudong, Zhao, Yan Y,
	Xu, Terrence, Duan, Zhenzhong, clegoate

> Subject: [PATCH v12 00/24] Add vfio_device cdev for iommufd support
> 
> Existing VFIO provides group-centric user APIs for userspace. Userspace opens
> the /dev/vfio/$group_id first before getting device fd and hence getting access
> to device. This is not the desired model for iommufd. Per the conclusion of
> community discussion[1], iommufd provides device-centric kAPIs and requires its
> consumer (like VFIO) to be device-centric user APIs. Such user APIs are used to
> associate device with iommufd and also the I/O address spaces managed by the
> iommufd.
> 
> This series first introduces a per device file structure to be prepared for further
> enhancement and refactors the kvm-vfio code to be prepared for accepting
> device file from userspace. After this, adds a mechanism for blocking device
> access before iommufd bind. Then refactors the vfio to be able to handle cdev
> path (e.g. iommufd binding, no-iommufd, [de]attach ioas).
> This refactor includes making the device_open exclusive between the group and
> the cdev path, only allow single device open in cdev path; vfio-iommufd code is
> also refactored to support cdev. e.g. split the vfio_iommufd_bind() into two
> steps. Eventually, adds the cdev support for vfio device and the new ioctls, then
> makes group infrastructure optional as it is not needed when vfio device cdev is
> compiled.
> 
> This series is based on some preparation works done to vfio emulated devices[2]
> and vfio pci hot reset enhancements[3].
> 
> This series is a prerequisite for iommu nesting for vfio device[4] [5].
> 
> The complete code can be found in below branch, simple tests done to the
> legacy group path and the cdev path. Draft QEMU branch can be found at[6]
> However, the noiommu mode test is only done with some hacks in kernel and
> qemu to check if qemu can boot with noiommu devices.
> 
> https://github.com/yiliu1765/iommufd/tree/vfio_device_cdev_v12
> (config CONFIG_IOMMUFD=y CONFIG_VFIO_DEVICE_CDEV=y)
> 
> base-commit: 0948fa29d62eca627a19d5b1534262a6d93d4181
> 

Tested NIC passthrough on Intel platform.
Result looks good hence,
Tested-by: Yanting Jiang <yanting.jiang@intel.com>

Thanks,
Yanting


^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCH v12 00/24] Add vfio_device cdev for iommufd support
  2023-06-02 12:16 ` [Intel-gfx] " Yi Liu
@ 2023-06-09 16:47   ` Matthew Rosato
  -1 siblings, 0 replies; 180+ messages in thread
From: Matthew Rosato @ 2023-06-09 16:47 UTC (permalink / raw)
  To: Yi Liu, alex.williamson, jgg, kevin.tian
  Cc: joro, robin.murphy, cohuck, eric.auger, nicolinc, kvm,
	chao.p.peng, yi.y.sun, peterx, jasowang,
	shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	intel-gvt-dev, intel-gfx, linux-s390, xudong.hao, yan.y.zhao,
	terrence.xu, yanting.jiang, zhenzhong.duan, clegoate

On 6/2/23 8:16 AM, Yi Liu wrote:
> Existing VFIO provides group-centric user APIs for userspace. Userspace
> opens the /dev/vfio/$group_id first before getting device fd and hence
> getting access to device. This is not the desired model for iommufd. Per
> the conclusion of community discussion[1], iommufd provides device-centric
> kAPIs and requires its consumer (like VFIO) to be device-centric user
> APIs. Such user APIs are used to associate device with iommufd and also
> the I/O address spaces managed by the iommufd.
> 
> This series first introduces a per device file structure to be prepared
> for further enhancement and refactors the kvm-vfio code to be prepared
> for accepting device file from userspace. After this, adds a mechanism for
> blocking device access before iommufd bind. Then refactors the vfio to be
> able to handle cdev path (e.g. iommufd binding, no-iommufd, [de]attach ioas).
> This refactor includes making the device_open exclusive between the group
> and the cdev path, only allow single device open in cdev path; vfio-iommufd
> code is also refactored to support cdev. e.g. split the vfio_iommufd_bind()
> into two steps. Eventually, adds the cdev support for vfio device and the
> new ioctls, then makes group infrastructure optional as it is not needed
> when vfio device cdev is compiled.
> 
> This series is based on some preparation works done to vfio emulated devices[2]
> and vfio pci hot reset enhancements[3].
> 
> This series is a prerequisite for iommu nesting for vfio device[4] [5].
> 
> The complete code can be found in below branch, simple tests done to the
> legacy group path and the cdev path. Draft QEMU branch can be found at[6]
> However, the noiommu mode test is only done with some hacks in kernel and
> qemu to check if qemu can boot with noiommu devices.
> 
> https://github.com/yiliu1765/iommufd/tree/vfio_device_cdev_v12
> (config CONFIG_IOMMUFD=y CONFIG_VFIO_DEVICE_CDEV=y)
> 
> base-commit: 0948fa29d62eca627a19d5b1534262a6d93d4181
> 

Hi Yi,

I gave a tested-by some time ago, and have been running with various versions in between -- but there have been enough changes that by now the testing seems worth reaffirming.

So, on this version (along with the QEMU test counterpart) I have tested the following on s390:

1) default vfio container testing using vfio-pci, vfio-ap, vfio-ccw
2) iommufd vfio compat testing using vfio-pci, vfio-ap, vfio-ccw (via group)
3) iommufd vfio compat testing using vfio-pci (via cdev)
4) iommufd + s390 nesting WIP kernel+QEMU series (built on top of intel and SMMUv3 nesting series) using vfio-pci


Tested-by: Matthew Rosato <mjrosato@linux.ibm.com>


Thanks,
Matt



^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [Intel-gfx] [PATCH v12 06/24] vfio: Pass struct vfio_device_file * to vfio_device_open/close()
  2023-06-02 12:16   ` [Intel-gfx] " Yi Liu
@ 2023-06-12 21:52     ` Alex Williamson
  -1 siblings, 0 replies; 180+ messages in thread
From: Alex Williamson @ 2023-06-12 21:52 UTC (permalink / raw)
  To: Yi Liu
  Cc: mjrosato, jasowang, xudong.hao, zhenzhong.duan, peterx,
	terrence.xu, chao.p.peng, linux-s390, kvm, lulu, yanting.jiang,
	joro, nicolinc, jgg, kevin.tian, yan.y.zhao, intel-gfx,
	eric.auger, intel-gvt-dev, yi.y.sun, clegoate, cohuck,
	shameerali.kolothum.thodi, suravee.suthikulpanit, robin.murphy

On Fri,  2 Jun 2023 05:16:35 -0700
Yi Liu <yi.l.liu@intel.com> wrote:

> This avoids passing too many parameters in multiple functions. Per the
> input parameter change, rename the function to be vfio_df_open/close().
> 
> Reviewed-by: Kevin Tian <kevin.tian@intel.com>
> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
> Reviewed-by: Eric Auger <eric.auger@redhat.com>
> Tested-by: Terrence Xu <terrence.xu@intel.com>
> Tested-by: Nicolin Chen <nicolinc@nvidia.com>
> Tested-by: Matthew Rosato <mjrosato@linux.ibm.com>
> Tested-by: Yanting Jiang <yanting.jiang@intel.com>
> Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> ---
>  drivers/vfio/group.c     | 20 ++++++++++++++------
>  drivers/vfio/vfio.h      |  8 ++++----
>  drivers/vfio/vfio_main.c | 25 +++++++++++++++----------
>  3 files changed, 33 insertions(+), 20 deletions(-)
> 
> diff --git a/drivers/vfio/group.c b/drivers/vfio/group.c
> index b56e19d2a02d..caf53716ddb2 100644
> --- a/drivers/vfio/group.c
> +++ b/drivers/vfio/group.c
> @@ -169,8 +169,9 @@ static void vfio_device_group_get_kvm_safe(struct vfio_device *device)
>  	spin_unlock(&device->group->kvm_ref_lock);
>  }
>  
> -static int vfio_device_group_open(struct vfio_device *device)
> +static int vfio_df_group_open(struct vfio_device_file *df)
>  {
> +	struct vfio_device *device = df->device;
>  	int ret;
>  
>  	mutex_lock(&device->group->group_lock);
> @@ -190,7 +191,11 @@ static int vfio_device_group_open(struct vfio_device *device)
>  	if (device->open_count == 0)
>  		vfio_device_group_get_kvm_safe(device);
>  
> -	ret = vfio_device_open(device, device->group->iommufd);
> +	df->iommufd = device->group->iommufd;
> +
> +	ret = vfio_df_open(df);
> +	if (ret)
> +		df->iommufd = NULL;
>  
>  	if (device->open_count == 0)
>  		vfio_device_put_kvm(device);
> @@ -202,12 +207,15 @@ static int vfio_device_group_open(struct vfio_device *device)
>  	return ret;
>  }
>  
> -void vfio_device_group_close(struct vfio_device *device)
> +void vfio_df_group_close(struct vfio_device_file *df)
>  {
> +	struct vfio_device *device = df->device;
> +
>  	mutex_lock(&device->group->group_lock);
>  	mutex_lock(&device->dev_set->lock);
>  
> -	vfio_device_close(device, device->group->iommufd);
> +	vfio_df_close(df);
> +	df->iommufd = NULL;
>  
>  	if (device->open_count == 0)
>  		vfio_device_put_kvm(device);
> @@ -228,7 +236,7 @@ static struct file *vfio_device_open_file(struct vfio_device *device)
>  		goto err_out;
>  	}
>  
> -	ret = vfio_device_group_open(device);
> +	ret = vfio_df_group_open(df);
>  	if (ret)
>  		goto err_free;
>  
> @@ -260,7 +268,7 @@ static struct file *vfio_device_open_file(struct vfio_device *device)
>  	return filep;
>  
>  err_close_device:
> -	vfio_device_group_close(device);
> +	vfio_df_group_close(df);
>  err_free:
>  	kfree(df);
>  err_out:
> diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
> index 69e1a0692b06..f9eb52eb9ed7 100644
> --- a/drivers/vfio/vfio.h
> +++ b/drivers/vfio/vfio.h
> @@ -20,13 +20,13 @@ struct vfio_device_file {
>  	struct vfio_device *device;
>  	spinlock_t kvm_ref_lock; /* protect kvm field */
>  	struct kvm *kvm;
> +	struct iommufd_ctx *iommufd; /* protected by struct vfio_device_set::lock */
>  };
>  
>  void vfio_device_put_registration(struct vfio_device *device);
>  bool vfio_device_try_get_registration(struct vfio_device *device);
> -int vfio_device_open(struct vfio_device *device, struct iommufd_ctx *iommufd);
> -void vfio_device_close(struct vfio_device *device,
> -		       struct iommufd_ctx *iommufd);
> +int vfio_df_open(struct vfio_device_file *df);
> +void vfio_df_close(struct vfio_device_file *df);
>  struct vfio_device_file *
>  vfio_allocate_device_file(struct vfio_device *device);
>  
> @@ -91,7 +91,7 @@ void vfio_device_group_register(struct vfio_device *device);
>  void vfio_device_group_unregister(struct vfio_device *device);
>  int vfio_device_group_use_iommu(struct vfio_device *device);
>  void vfio_device_group_unuse_iommu(struct vfio_device *device);
> -void vfio_device_group_close(struct vfio_device *device);
> +void vfio_df_group_close(struct vfio_device_file *df);
>  struct vfio_group *vfio_group_from_file(struct file *file);
>  bool vfio_group_enforced_coherent(struct vfio_group *group);
>  void vfio_group_set_kvm(struct vfio_group *group, struct kvm *kvm);
> diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
> index 8ef9210ad2aa..a3c5817fc545 100644
> --- a/drivers/vfio/vfio_main.c
> +++ b/drivers/vfio/vfio_main.c
> @@ -434,9 +434,10 @@ vfio_allocate_device_file(struct vfio_device *device)
>  	return df;
>  }
>  
> -static int vfio_device_first_open(struct vfio_device *device,
> -				  struct iommufd_ctx *iommufd)
> +static int vfio_device_first_open(struct vfio_device_file *df)
>  {
> +	struct vfio_device *device = df->device;
> +	struct iommufd_ctx *iommufd = df->iommufd;
>  	int ret;
>  
>  	lockdep_assert_held(&device->dev_set->lock);
> @@ -468,9 +469,11 @@ static int vfio_device_first_open(struct vfio_device *device,
>  	return ret;
>  }
>  
> -static void vfio_device_last_close(struct vfio_device *device,
> -				   struct iommufd_ctx *iommufd)
> +static void vfio_device_last_close(struct vfio_device_file *df)

Shouldn't these now be vfio_df_... functions too?  Thanks,

Alex

>  {
> +	struct vfio_device *device = df->device;
> +	struct iommufd_ctx *iommufd = df->iommufd;
> +
>  	lockdep_assert_held(&device->dev_set->lock);
>  
>  	if (device->ops->close_device)
> @@ -482,15 +485,16 @@ static void vfio_device_last_close(struct vfio_device *device,
>  	module_put(device->dev->driver->owner);
>  }
>  
> -int vfio_device_open(struct vfio_device *device, struct iommufd_ctx *iommufd)
> +int vfio_df_open(struct vfio_device_file *df)
>  {
> +	struct vfio_device *device = df->device;
>  	int ret = 0;
>  
>  	lockdep_assert_held(&device->dev_set->lock);
>  
>  	device->open_count++;
>  	if (device->open_count == 1) {
> -		ret = vfio_device_first_open(device, iommufd);
> +		ret = vfio_device_first_open(df);
>  		if (ret)
>  			device->open_count--;
>  	}
> @@ -498,14 +502,15 @@ int vfio_device_open(struct vfio_device *device, struct iommufd_ctx *iommufd)
>  	return ret;
>  }
>  
> -void vfio_device_close(struct vfio_device *device,
> -		       struct iommufd_ctx *iommufd)
> +void vfio_df_close(struct vfio_device_file *df)
>  {
> +	struct vfio_device *device = df->device;
> +
>  	lockdep_assert_held(&device->dev_set->lock);
>  
>  	vfio_assert_device_open(device);
>  	if (device->open_count == 1)
> -		vfio_device_last_close(device, iommufd);
> +		vfio_device_last_close(df);
>  	device->open_count--;
>  }
>  
> @@ -550,7 +555,7 @@ static int vfio_device_fops_release(struct inode *inode, struct file *filep)
>  	struct vfio_device_file *df = filep->private_data;
>  	struct vfio_device *device = df->device;
>  
> -	vfio_device_group_close(device);
> +	vfio_df_group_close(df);
>  
>  	vfio_device_put_registration(device);
>  


^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCH v12 06/24] vfio: Pass struct vfio_device_file * to vfio_device_open/close()
@ 2023-06-12 21:52     ` Alex Williamson
  0 siblings, 0 replies; 180+ messages in thread
From: Alex Williamson @ 2023-06-12 21:52 UTC (permalink / raw)
  To: Yi Liu
  Cc: jgg, kevin.tian, joro, robin.murphy, cohuck, eric.auger,
	nicolinc, kvm, mjrosato, chao.p.peng, yi.y.sun, peterx, jasowang,
	shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	intel-gvt-dev, intel-gfx, linux-s390, xudong.hao, yan.y.zhao,
	terrence.xu, yanting.jiang, zhenzhong.duan, clegoate

On Fri,  2 Jun 2023 05:16:35 -0700
Yi Liu <yi.l.liu@intel.com> wrote:

> This avoids passing too much parameters in multiple functions. Per the
> input parameter change, rename the function to be vfio_df_open/close().
> 
> Reviewed-by: Kevin Tian <kevin.tian@intel.com>
> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
> Reviewed-by: Eric Auger <eric.auger@redhat.com>
> Tested-by: Terrence Xu <terrence.xu@intel.com>
> Tested-by: Nicolin Chen <nicolinc@nvidia.com>
> Tested-by: Matthew Rosato <mjrosato@linux.ibm.com>
> Tested-by: Yanting Jiang <yanting.jiang@intel.com>
> Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> ---
>  drivers/vfio/group.c     | 20 ++++++++++++++------
>  drivers/vfio/vfio.h      |  8 ++++----
>  drivers/vfio/vfio_main.c | 25 +++++++++++++++----------
>  3 files changed, 33 insertions(+), 20 deletions(-)
> 
> diff --git a/drivers/vfio/group.c b/drivers/vfio/group.c
> index b56e19d2a02d..caf53716ddb2 100644
> --- a/drivers/vfio/group.c
> +++ b/drivers/vfio/group.c
> @@ -169,8 +169,9 @@ static void vfio_device_group_get_kvm_safe(struct vfio_device *device)
>  	spin_unlock(&device->group->kvm_ref_lock);
>  }
>  
> -static int vfio_device_group_open(struct vfio_device *device)
> +static int vfio_df_group_open(struct vfio_device_file *df)
>  {
> +	struct vfio_device *device = df->device;
>  	int ret;
>  
>  	mutex_lock(&device->group->group_lock);
> @@ -190,7 +191,11 @@ static int vfio_device_group_open(struct vfio_device *device)
>  	if (device->open_count == 0)
>  		vfio_device_group_get_kvm_safe(device);
>  
> -	ret = vfio_device_open(device, device->group->iommufd);
> +	df->iommufd = device->group->iommufd;
> +
> +	ret = vfio_df_open(df);
> +	if (ret)
> +		df->iommufd = NULL;
>  
>  	if (device->open_count == 0)
>  		vfio_device_put_kvm(device);
> @@ -202,12 +207,15 @@ static int vfio_device_group_open(struct vfio_device *device)
>  	return ret;
>  }
>  
> -void vfio_device_group_close(struct vfio_device *device)
> +void vfio_df_group_close(struct vfio_device_file *df)
>  {
> +	struct vfio_device *device = df->device;
> +
>  	mutex_lock(&device->group->group_lock);
>  	mutex_lock(&device->dev_set->lock);
>  
> -	vfio_device_close(device, device->group->iommufd);
> +	vfio_df_close(df);
> +	df->iommufd = NULL;
>  
>  	if (device->open_count == 0)
>  		vfio_device_put_kvm(device);
> @@ -228,7 +236,7 @@ static struct file *vfio_device_open_file(struct vfio_device *device)
>  		goto err_out;
>  	}
>  
> -	ret = vfio_device_group_open(device);
> +	ret = vfio_df_group_open(df);
>  	if (ret)
>  		goto err_free;
>  
> @@ -260,7 +268,7 @@ static struct file *vfio_device_open_file(struct vfio_device *device)
>  	return filep;
>  
>  err_close_device:
> -	vfio_device_group_close(device);
> +	vfio_df_group_close(df);
>  err_free:
>  	kfree(df);
>  err_out:
> diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
> index 69e1a0692b06..f9eb52eb9ed7 100644
> --- a/drivers/vfio/vfio.h
> +++ b/drivers/vfio/vfio.h
> @@ -20,13 +20,13 @@ struct vfio_device_file {
>  	struct vfio_device *device;
>  	spinlock_t kvm_ref_lock; /* protect kvm field */
>  	struct kvm *kvm;
> +	struct iommufd_ctx *iommufd; /* protected by struct vfio_device_set::lock */
>  };
>  
>  void vfio_device_put_registration(struct vfio_device *device);
>  bool vfio_device_try_get_registration(struct vfio_device *device);
> -int vfio_device_open(struct vfio_device *device, struct iommufd_ctx *iommufd);
> -void vfio_device_close(struct vfio_device *device,
> -		       struct iommufd_ctx *iommufd);
> +int vfio_df_open(struct vfio_device_file *df);
> +void vfio_df_close(struct vfio_device_file *df);
>  struct vfio_device_file *
>  vfio_allocate_device_file(struct vfio_device *device);
>  
> @@ -91,7 +91,7 @@ void vfio_device_group_register(struct vfio_device *device);
>  void vfio_device_group_unregister(struct vfio_device *device);
>  int vfio_device_group_use_iommu(struct vfio_device *device);
>  void vfio_device_group_unuse_iommu(struct vfio_device *device);
> -void vfio_device_group_close(struct vfio_device *device);
> +void vfio_df_group_close(struct vfio_device_file *df);
>  struct vfio_group *vfio_group_from_file(struct file *file);
>  bool vfio_group_enforced_coherent(struct vfio_group *group);
>  void vfio_group_set_kvm(struct vfio_group *group, struct kvm *kvm);
> diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
> index 8ef9210ad2aa..a3c5817fc545 100644
> --- a/drivers/vfio/vfio_main.c
> +++ b/drivers/vfio/vfio_main.c
> @@ -434,9 +434,10 @@ vfio_allocate_device_file(struct vfio_device *device)
>  	return df;
>  }
>  
> -static int vfio_device_first_open(struct vfio_device *device,
> -				  struct iommufd_ctx *iommufd)
> +static int vfio_device_first_open(struct vfio_device_file *df)
>  {
> +	struct vfio_device *device = df->device;
> +	struct iommufd_ctx *iommufd = df->iommufd;
>  	int ret;
>  
>  	lockdep_assert_held(&device->dev_set->lock);
> @@ -468,9 +469,11 @@ static int vfio_device_first_open(struct vfio_device *device,
>  	return ret;
>  }
>  
> -static void vfio_device_last_close(struct vfio_device *device,
> -				   struct iommufd_ctx *iommufd)
> +static void vfio_device_last_close(struct vfio_device_file *df)

Shouldn't these now be vfio_df_... functions too?  Thanks,

Alex
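
For readers following the naming question, the shape of the refactor can be
reduced to a small userspace sketch. Everything prefixed demo_ is a
hypothetical stand-in, not the kernel definition; the sketch only assumes
what the quoted diff shows: the per-open context travels inside the device
file, and the first-open/last-close helpers take that same object, which is
why a consistent df prefix on all four functions would read naturally.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct vfio_device and struct vfio_device_file. */
struct demo_device {
	int open_count;
	int first_open_calls;
	int last_close_calls;
};

struct demo_device_file {
	struct demo_device *device;
	void *iommufd;	/* per-open context, formerly a separate argument */
};

/* Runs once, when the open count goes from 0 to 1. */
static int demo_df_first_open(struct demo_device_file *df)
{
	df->device->first_open_calls++;
	return 0;
}

/* Runs once, when the open count is about to go from 1 to 0. */
static void demo_df_last_close(struct demo_device_file *df)
{
	df->device->last_close_calls++;
}

static int demo_df_open(struct demo_device_file *df)
{
	struct demo_device *device = df->device;
	int ret = 0;

	device->open_count++;
	if (device->open_count == 1) {
		ret = demo_df_first_open(df);
		if (ret)
			device->open_count--;	/* undo on failure */
	}
	return ret;
}

static void demo_df_close(struct demo_device_file *df)
{
	struct demo_device *device = df->device;

	assert(device->open_count > 0);
	if (device->open_count == 1)
		demo_df_last_close(df);
	device->open_count--;
}
```

The sketch leaves out locking; in the patch the callers hold dev_set->lock
around these calls.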

>  {
> +	struct vfio_device *device = df->device;
> +	struct iommufd_ctx *iommufd = df->iommufd;
> +
>  	lockdep_assert_held(&device->dev_set->lock);
>  
>  	if (device->ops->close_device)
> @@ -482,15 +485,16 @@ static void vfio_device_last_close(struct vfio_device *device,
>  	module_put(device->dev->driver->owner);
>  }
>  
> -int vfio_device_open(struct vfio_device *device, struct iommufd_ctx *iommufd)
> +int vfio_df_open(struct vfio_device_file *df)
>  {
> +	struct vfio_device *device = df->device;
>  	int ret = 0;
>  
>  	lockdep_assert_held(&device->dev_set->lock);
>  
>  	device->open_count++;
>  	if (device->open_count == 1) {
> -		ret = vfio_device_first_open(device, iommufd);
> +		ret = vfio_device_first_open(df);
>  		if (ret)
>  			device->open_count--;
>  	}
> @@ -498,14 +502,15 @@ int vfio_device_open(struct vfio_device *device, struct iommufd_ctx *iommufd)
>  	return ret;
>  }
>  
> -void vfio_device_close(struct vfio_device *device,
> -		       struct iommufd_ctx *iommufd)
> +void vfio_df_close(struct vfio_device_file *df)
>  {
> +	struct vfio_device *device = df->device;
> +
>  	lockdep_assert_held(&device->dev_set->lock);
>  
>  	vfio_assert_device_open(device);
>  	if (device->open_count == 1)
> -		vfio_device_last_close(device, iommufd);
> +		vfio_device_last_close(df);
>  	device->open_count--;
>  }
>  
> @@ -550,7 +555,7 @@ static int vfio_device_fops_release(struct inode *inode, struct file *filep)
>  	struct vfio_device_file *df = filep->private_data;
>  	struct vfio_device *device = df->device;
>  
> -	vfio_device_group_close(device);
> +	vfio_df_group_close(df);
>  
>  	vfio_device_put_registration(device);
>  


^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCH v12 07/24] vfio: Block device access via device fd until device is opened
  2023-06-02 12:16   ` [Intel-gfx] " Yi Liu
@ 2023-06-12 21:52     ` Alex Williamson
  -1 siblings, 0 replies; 180+ messages in thread
From: Alex Williamson @ 2023-06-12 21:52 UTC (permalink / raw)
  To: Yi Liu
  Cc: jgg, kevin.tian, joro, robin.murphy, cohuck, eric.auger,
	nicolinc, kvm, mjrosato, chao.p.peng, yi.y.sun, peterx, jasowang,
	shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	intel-gvt-dev, intel-gfx, linux-s390, xudong.hao, yan.y.zhao,
	terrence.xu, yanting.jiang, zhenzhong.duan, clegoate

On Fri,  2 Jun 2023 05:16:36 -0700
Yi Liu <yi.l.liu@intel.com> wrote:

> Allow the vfio_device file to be in a state where the device FD is
> opened but the device cannot be used by userspace (i.e. its .open_device()
> hasn't been called). This in-between state is not used when the device
> FD is spawned from the group FD; however, when we create the device FD
> directly by opening a cdev it will be opened in the blocked state.
> 
> The reason for the in-between state is that userspace only gets an FD but
> doesn't gain access permission until binding the FD to an iommufd. So in
> the blocked state, only the bind operation is allowed. Completing bind
> will allow the user to further access the device.
> 
> This is implemented by adding a flag in struct vfio_device_file to mark
> the blocked state and using a simple smp_load_acquire() to obtain the
> flag value and serialize all the device setup with the thread accessing
> this device.
> 
> Following this lockless scheme, it can safely handle the device FD
> unbound->bound but it cannot handle bound->unbound. To allow this we'd
> need to add a lock on all the vfio ioctls which seems costly. So once
> device FD is bound, it remains bound until the FD is closed.
> 
> Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
> Reviewed-by: Kevin Tian <kevin.tian@intel.com>
> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
> Reviewed-by: Eric Auger <eric.auger@redhat.com>
> Tested-by: Terrence Xu <terrence.xu@intel.com>
> Tested-by: Nicolin Chen <nicolinc@nvidia.com>
> Tested-by: Matthew Rosato <mjrosato@linux.ibm.com>
> Tested-by: Yanting Jiang <yanting.jiang@intel.com>
> Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> ---
>  drivers/vfio/group.c     | 11 ++++++++++-
>  drivers/vfio/vfio.h      |  1 +
>  drivers/vfio/vfio_main.c | 16 ++++++++++++++++
>  3 files changed, 27 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/vfio/group.c b/drivers/vfio/group.c
> index caf53716ddb2..088dd34c8931 100644
> --- a/drivers/vfio/group.c
> +++ b/drivers/vfio/group.c
> @@ -194,9 +194,18 @@ static int vfio_df_group_open(struct vfio_device_file *df)
>  	df->iommufd = device->group->iommufd;
>  
>  	ret = vfio_df_open(df);
> -	if (ret)
> +	if (ret) {
>  		df->iommufd = NULL;
> +		goto out_put_kvm;
> +	}
> +
> +	/*
> +	 * Paired with smp_load_acquire() in vfio_device_fops::ioctl/
> +	 * read/write/mmap and vfio_file_has_device_access()
> +	 */
> +	smp_store_release(&df->access_granted, true);
>  
> +out_put_kvm:
>  	if (device->open_count == 0)
>  		vfio_device_put_kvm(device);
>  
> diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
> index f9eb52eb9ed7..fdf2fc73f880 100644
> --- a/drivers/vfio/vfio.h
> +++ b/drivers/vfio/vfio.h
> @@ -18,6 +18,7 @@ struct vfio_container;
>  
>  struct vfio_device_file {
>  	struct vfio_device *device;
> +	bool access_granted;

Should we make this a more strongly defined data type and later move
devid (u32) here to partially fill the hole created?

I think this is being placed towards the front of the data structure
for cache line locality given this is a hot path for file operations.
But bool types have an implementation dependent size, making them
difficult to pack.  Also there will be a tendency to want to make this
a bit field, which is probably not compatible with the smp lockless
operations being used here.  We might get in front of these issues if
we just define it as a u8 now.  Thanks,

Alex
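
To illustrate the point about data types, here is a userspace analogue using
C11 atomics in place of smp_store_release()/smp_load_acquire(). It is only a
sketch: the demo_ names and the setup_state field are hypothetical, and C11
atomics are not what the kernel uses. It shows a fixed-width flag that is
published with release semantics and checked with acquire semantics. A
bit-field could not be used this way because C does not allow taking the
address of a bit-field.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for struct vfio_device_file. */
struct demo_device_file {
	void *device;
	_Atomic uint8_t access_granted;	/* fixed-width flag, not a bit-field */
	int setup_state;		/* written before the flag is published */
};

/* One byte on common GCC/Clang targets; the C standard does not promise it. */
static_assert(sizeof(((struct demo_device_file *)0)->access_granted) == 1,
	      "flag is expected to occupy a single byte");

/* Writer: finish all setup, then publish the flag with release ordering. */
static void demo_grant_access(struct demo_device_file *df)
{
	df->setup_state = 42;
	atomic_store_explicit(&df->access_granted, 1, memory_order_release);
}

/* Reader: only touch setup_state after observing the flag with acquire. */
static int demo_read_state(struct demo_device_file *df, int *out)
{
	if (!atomic_load_explicit(&df->access_granted, memory_order_acquire))
		return -EINVAL;
	*out = df->setup_state;
	return 0;
}
```

A reader that sees the flag set is guaranteed to also see setup_state as
written before the release store, which is the pairing the patch comments
describe.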

>  	spinlock_t kvm_ref_lock; /* protect kvm field */
>  	struct kvm *kvm;
>  	struct iommufd_ctx *iommufd; /* protected by struct vfio_device_set::lock */
> diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
> index a3c5817fc545..4c8b7713dc3d 100644
> --- a/drivers/vfio/vfio_main.c
> +++ b/drivers/vfio/vfio_main.c
> @@ -1129,6 +1129,10 @@ static long vfio_device_fops_unl_ioctl(struct file *filep,
>  	struct vfio_device *device = df->device;
>  	int ret;
>  
> +	/* Paired with smp_store_release() following vfio_df_open() */
> +	if (!smp_load_acquire(&df->access_granted))
> +		return -EINVAL;
> +
>  	ret = vfio_device_pm_runtime_get(device);
>  	if (ret)
>  		return ret;
> @@ -1156,6 +1160,10 @@ static ssize_t vfio_device_fops_read(struct file *filep, char __user *buf,
>  	struct vfio_device_file *df = filep->private_data;
>  	struct vfio_device *device = df->device;
>  
> +	/* Paired with smp_store_release() following vfio_df_open() */
> +	if (!smp_load_acquire(&df->access_granted))
> +		return -EINVAL;
> +
>  	if (unlikely(!device->ops->read))
>  		return -EINVAL;
>  
> @@ -1169,6 +1177,10 @@ static ssize_t vfio_device_fops_write(struct file *filep,
>  	struct vfio_device_file *df = filep->private_data;
>  	struct vfio_device *device = df->device;
>  
> +	/* Paired with smp_store_release() following vfio_df_open() */
> +	if (!smp_load_acquire(&df->access_granted))
> +		return -EINVAL;
> +
>  	if (unlikely(!device->ops->write))
>  		return -EINVAL;
>  
> @@ -1180,6 +1192,10 @@ static int vfio_device_fops_mmap(struct file *filep, struct vm_area_struct *vma)
>  	struct vfio_device_file *df = filep->private_data;
>  	struct vfio_device *device = df->device;
>  
> +	/* Paired with smp_store_release() following vfio_df_open() */
> +	if (!smp_load_acquire(&df->access_granted))
> +		return -EINVAL;
> +
>  	if (unlikely(!device->ops->mmap))
>  		return -EINVAL;
>  


^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [Intel-gfx] [PATCH v12 18/24] vfio: Add VFIO_DEVICE_BIND_IOMMUFD
  2023-06-02 12:16   ` [Intel-gfx] " Yi Liu
@ 2023-06-12 22:27     ` Alex Williamson
  -1 siblings, 0 replies; 180+ messages in thread
From: Alex Williamson @ 2023-06-12 22:27 UTC (permalink / raw)
  To: Yi Liu
  Cc: mjrosato, jasowang, xudong.hao, zhenzhong.duan, peterx,
	terrence.xu, chao.p.peng, linux-s390, kvm, lulu, yanting.jiang,
	joro, nicolinc, jgg, kevin.tian, yan.y.zhao, intel-gfx,
	eric.auger, intel-gvt-dev, yi.y.sun, clegoate, cohuck,
	shameerali.kolothum.thodi, suravee.suthikulpanit, robin.murphy

On Fri,  2 Jun 2023 05:16:47 -0700
Yi Liu <yi.l.liu@intel.com> wrote:

> This adds ioctl for userspace to bind device cdev fd to iommufd.
> 
>     VFIO_DEVICE_BIND_IOMMUFD: bind device to an iommufd, hence gain DMA
> 			      control provided by the iommufd. open_device
> 			      op is called after bind_iommufd op.
> 
> Tested-by: Yanting Jiang <yanting.jiang@intel.com>
> Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
> Tested-by: Terrence Xu <terrence.xu@intel.com>
> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> ---
>  drivers/vfio/device_cdev.c | 123 +++++++++++++++++++++++++++++++++++++
>  drivers/vfio/vfio.h        |  13 ++++
>  drivers/vfio/vfio_main.c   |   5 ++
>  include/linux/vfio.h       |   3 +-
>  include/uapi/linux/vfio.h  |  27 ++++++++
>  5 files changed, 170 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/vfio/device_cdev.c b/drivers/vfio/device_cdev.c
> index 1c640016a824..a4498ddbe774 100644
> --- a/drivers/vfio/device_cdev.c
> +++ b/drivers/vfio/device_cdev.c
> @@ -3,6 +3,7 @@
>   * Copyright (c) 2023 Intel Corporation.
>   */
>  #include <linux/vfio.h>
> +#include <linux/iommufd.h>
>  
>  #include "vfio.h"
>  
> @@ -44,6 +45,128 @@ int vfio_device_fops_cdev_open(struct inode *inode, struct file *filep)
>  	return ret;
>  }
>  
> +static void vfio_device_get_kvm_safe(struct vfio_device_file *df)
> +{
> +	spin_lock(&df->kvm_ref_lock);
> +	if (df->kvm)
> +		_vfio_device_get_kvm_safe(df->device, df->kvm);
> +	spin_unlock(&df->kvm_ref_lock);
> +}
> +
> +void vfio_df_cdev_close(struct vfio_device_file *df)
> +{
> +	struct vfio_device *device = df->device;
> +
> +	/*
> +	 * In the time of close, there is no contention with another one
> +	 * changing this flag.  So read df->access_granted without lock
> +	 * and no smp_load_acquire() is ok.
> +	 */
> +	if (!df->access_granted)
> +		return;
> +
> +	mutex_lock(&device->dev_set->lock);
> +	vfio_df_close(df);
> +	vfio_device_put_kvm(device);
> +	iommufd_ctx_put(df->iommufd);
> +	device->cdev_opened = false;
> +	mutex_unlock(&device->dev_set->lock);
> +	vfio_device_unblock_group(device);
> +}
> +
> +static struct iommufd_ctx *vfio_get_iommufd_from_fd(int fd)
> +{
> +	struct iommufd_ctx *iommufd;
> +	struct fd f;
> +
> +	f = fdget(fd);
> +	if (!f.file)
> +		return ERR_PTR(-EBADF);
> +
> +	iommufd = iommufd_ctx_from_file(f.file);
> +
> +	fdput(f);
> +	return iommufd;
> +}
> +
> +long vfio_df_ioctl_bind_iommufd(struct vfio_device_file *df,
> +				struct vfio_device_bind_iommufd __user *arg)
> +{
> +	struct vfio_device *device = df->device;
> +	struct vfio_device_bind_iommufd bind;
> +	unsigned long minsz;
> +	int ret;
> +
> +	static_assert(__same_type(arg->out_devid, df->devid));
> +
> +	minsz = offsetofend(struct vfio_device_bind_iommufd, out_devid);
> +
> +	if (copy_from_user(&bind, arg, minsz))
> +		return -EFAULT;
> +
> +	if (bind.argsz < minsz || bind.flags || bind.iommufd < 0)
> +		return -EINVAL;
> +
> +	/* BIND_IOMMUFD only allowed for cdev fds */
> +	if (df->group)
> +		return -EINVAL;
> +
> +	ret = vfio_device_block_group(device);
> +	if (ret)
> +		return ret;
> +
> +	mutex_lock(&device->dev_set->lock);
> +	/* one device cannot be bound twice */
> +	if (df->access_granted) {
> +		ret = -EINVAL;
> +		goto out_unlock;
> +	}
> +
> +	df->iommufd = vfio_get_iommufd_from_fd(bind.iommufd);
> +	if (IS_ERR(df->iommufd)) {
> +		ret = PTR_ERR(df->iommufd);
> +		df->iommufd = NULL;
> +		goto out_unlock;
> +	}
> +
> +	/*
> +	 * Before the device open, get the KVM pointer currently
> +	 * associated with the device file (if there is) and obtain
> +	 * a reference.  This reference is held until device closed.
> +	 * Save the pointer in the device for use by drivers.
> +	 */
> +	vfio_device_get_kvm_safe(df);
> +
> +	ret = vfio_df_open(df);
> +	if (ret)
> +		goto out_put_kvm;
> +
> +	ret = copy_to_user(&arg->out_devid, &df->devid,
> +			   sizeof(df->devid)) ? -EFAULT : 0;
> +	if (ret)
> +		goto out_close_device;
> +
> +	/*
> +	 * Paired with smp_load_acquire() in vfio_device_fops::ioctl/
> +	 * read/write/mmap
> +	 */
> +	smp_store_release(&df->access_granted, true);
> +	device->cdev_opened = true;
> +	mutex_unlock(&device->dev_set->lock);
> +	return 0;
> +
> +out_close_device:
> +	vfio_df_close(df);
> +out_put_kvm:
> +	vfio_device_put_kvm(device);
> +	iommufd_ctx_put(df->iommufd);
> +	df->iommufd = NULL;
> +out_unlock:
> +	mutex_unlock(&device->dev_set->lock);
> +	vfio_device_unblock_group(device);
> +	return ret;
> +}
> +
>  static char *vfio_device_devnode(const struct device *dev, umode_t *mode)
>  {
>  	return kasprintf(GFP_KERNEL, "vfio/devices/%s", dev_name(dev));
> diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
> index d12b5b524bfc..42de40d2cd4d 100644
> --- a/drivers/vfio/vfio.h
> +++ b/drivers/vfio/vfio.h
> @@ -287,6 +287,9 @@ static inline void vfio_device_del(struct vfio_device *device)
>  }
>  
>  int vfio_device_fops_cdev_open(struct inode *inode, struct file *filep);
> +void vfio_df_cdev_close(struct vfio_device_file *df);
> +long vfio_df_ioctl_bind_iommufd(struct vfio_device_file *df,
> +				struct vfio_device_bind_iommufd __user *arg);
>  int vfio_cdev_init(struct class *device_class);
>  void vfio_cdev_cleanup(void);
>  #else
> @@ -310,6 +313,16 @@ static inline int vfio_device_fops_cdev_open(struct inode *inode,
>  	return 0;
>  }
>  
> +static inline void vfio_df_cdev_close(struct vfio_device_file *df)
> +{
> +}
> +
> +static inline long vfio_df_ioctl_bind_iommufd(struct vfio_device_file *df,
> +					      struct vfio_device_bind_iommufd __user *arg)
> +{
> +	return -EOPNOTSUPP;
> +}
> +
>  static inline int vfio_cdev_init(struct class *device_class)
>  {
>  	return 0;
> diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
> index ef55af75f459..9ba4d420eda2 100644
> --- a/drivers/vfio/vfio_main.c
> +++ b/drivers/vfio/vfio_main.c
> @@ -572,6 +572,8 @@ static int vfio_device_fops_release(struct inode *inode, struct file *filep)
>  
>  	if (df->group)
>  		vfio_df_group_close(df);
> +	else
> +		vfio_df_cdev_close(df);
>  
>  	vfio_device_put_registration(device);
>  
> @@ -1145,6 +1147,9 @@ static long vfio_device_fops_unl_ioctl(struct file *filep,
>  	struct vfio_device *device = df->device;
>  	int ret;
>  
> +	if (cmd == VFIO_DEVICE_BIND_IOMMUFD)
> +		return vfio_df_ioctl_bind_iommufd(df, (void __user *)arg);
> +
>  	/* Paired with smp_store_release() following vfio_df_open() */
>  	if (!smp_load_acquire(&df->access_granted))
>  		return -EINVAL;
> diff --git a/include/linux/vfio.h b/include/linux/vfio.h
> index 83cc5dc28b7a..e80a8ac86e46 100644
> --- a/include/linux/vfio.h
> +++ b/include/linux/vfio.h
> @@ -66,6 +66,7 @@ struct vfio_device {
>  	struct iommufd_device *iommufd_device;
>  	bool iommufd_attached;
>  #endif
> +	bool cdev_opened:1;

Perhaps a more strongly defined data type here as well and roll
iommufd_attached into the same bit field scheme.
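
As a rough sketch of what that could look like (hypothetical struct, only the
two flag names come from the patch), both flags can share one byte when they
are declared as single-bit fields of a fixed-width type. Bit-fields of type
uint8_t are implementation-defined in ISO C; GCC and Clang accept them, which
is what this assumes.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout; not the real struct vfio_device. */
struct demo_flags {
	uint8_t iommufd_attached:1;
	uint8_t cdev_opened:1;
};

/* With GCC/Clang both flags pack into a single byte. */
static_assert(sizeof(struct demo_flags) == 1, "flags should share one byte");

static void demo_set_cdev_opened(struct demo_flags *f, int opened)
{
	f->cdev_opened = opened ? 1 : 0;
}

static void demo_set_iommufd_attached(struct demo_flags *f, int attached)
{
	f->iommufd_attached = attached ? 1 : 0;
}
```

Since neighbouring bit-fields share a storage unit, updates to either flag
need to be serialized by the same lock; this only suits fields that are not
accessed with the lockless acquire/release scheme.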

>  };
>  
>  /**
> @@ -170,7 +171,7 @@ vfio_iommufd_device_hot_reset_devid(struct vfio_device *vdev,
>  
>  static inline bool vfio_device_cdev_opened(struct vfio_device *device)
>  {
> -	return false;
> +	return device->cdev_opened;
>  }
>  
>  /**
> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> index f753124e1c82..7296012b7f36 100644
> --- a/include/uapi/linux/vfio.h
> +++ b/include/uapi/linux/vfio.h
> @@ -194,6 +194,33 @@ struct vfio_group_status {
>  
>  /* --------------- IOCTLs for DEVICE file descriptors --------------- */
>  
> +/*
> + * VFIO_DEVICE_BIND_IOMMUFD - _IOR(VFIO_TYPE, VFIO_BASE + 18,
> + *				   struct vfio_device_bind_iommufd)
> + * @argsz:	 User filled size of this data.
> + * @flags:	 Must be 0.
> + * @iommufd:	 iommufd to bind.
> + * @out_devid:	 The device id generated by this bind. devid is a handle for
> + *		 this device/iommufd bond and can be used in IOMMUFD commands.
> + *
> + * Bind a vfio_device to the specified iommufd.
> + *
> + * User is restricted from accessing the device before the binding operation
> + * is completed.
> + *
> + * Unbind is automatically conducted when device fd is closed.
> + *
> + * Return: 0 on success, -errno on failure.
> + */
> +struct vfio_device_bind_iommufd {
> +	__u32		argsz;
> +	__u32		flags;
> +	__s32		iommufd;
> +	__u32		out_devid;
> +};
> +
> +#define VFIO_DEVICE_BIND_IOMMUFD	_IO(VFIO_TYPE, VFIO_BASE + 18)
> +

Why are we still defining device ioctls 18-20 before existing device
ioctls?  18 should be defined after 17...  Thanks,

Alex
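
To make the ordering concern concrete, here is a small sketch. The DEMO_
macros mirror the VFIO_TYPE (';') and VFIO_BASE (100) values from the uapi
header, and the helper names are hypothetical. It builds command numbers with
_IO() and checks whether a list of offsets, taken in the order the
definitions appear in a header, is ascending.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/ioctl.h>

#define DEMO_VFIO_TYPE	(';')
#define DEMO_VFIO_BASE	100

/* Build the command number for a given offset from the base. */
static unsigned long demo_cmd(unsigned int offset)
{
	return _IO(DEMO_VFIO_TYPE, DEMO_VFIO_BASE + offset);
}

/* Return 1 if the offsets, in header order, are strictly ascending. */
static int demo_offsets_ascending(const unsigned int *offsets, size_t n)
{
	for (size_t i = 1; i < n; i++)
		if (offsets[i] <= offsets[i - 1])
			return 0;
	return 1;
}
```

With the placement in the quoted hunk, offset 18 is followed by offset 7
(VFIO_DEVICE_GET_INFO), so a check like this would flag it; placing the new
definition after offset 17 would not.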

>  /**
>   * VFIO_DEVICE_GET_INFO - _IOR(VFIO_TYPE, VFIO_BASE + 7,
>   *						struct vfio_device_info)


^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCH v12 18/24] vfio: Add VFIO_DEVICE_BIND_IOMMUFD
@ 2023-06-12 22:27     ` Alex Williamson
  0 siblings, 0 replies; 180+ messages in thread
From: Alex Williamson @ 2023-06-12 22:27 UTC (permalink / raw)
  To: Yi Liu
  Cc: jgg, kevin.tian, joro, robin.murphy, cohuck, eric.auger,
	nicolinc, kvm, mjrosato, chao.p.peng, yi.y.sun, peterx, jasowang,
	shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	intel-gvt-dev, intel-gfx, linux-s390, xudong.hao, yan.y.zhao,
	terrence.xu, yanting.jiang, zhenzhong.duan, clegoate

On Fri,  2 Jun 2023 05:16:47 -0700
Yi Liu <yi.l.liu@intel.com> wrote:

> This adds ioctl for userspace to bind device cdev fd to iommufd.
> 
>     VFIO_DEVICE_BIND_IOMMUFD: bind device to an iommufd, hence gain DMA
> 			      control provided by the iommufd. open_device
> 			      op is called after bind_iommufd op.
> 
> Tested-by: Yanting Jiang <yanting.jiang@intel.com>
> Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
> Tested-by: Terrence Xu <terrence.xu@intel.com>
> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> ---
>  drivers/vfio/device_cdev.c | 123 +++++++++++++++++++++++++++++++++++++
>  drivers/vfio/vfio.h        |  13 ++++
>  drivers/vfio/vfio_main.c   |   5 ++
>  include/linux/vfio.h       |   3 +-
>  include/uapi/linux/vfio.h  |  27 ++++++++
>  5 files changed, 170 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/vfio/device_cdev.c b/drivers/vfio/device_cdev.c
> index 1c640016a824..a4498ddbe774 100644
> --- a/drivers/vfio/device_cdev.c
> +++ b/drivers/vfio/device_cdev.c
> @@ -3,6 +3,7 @@
>   * Copyright (c) 2023 Intel Corporation.
>   */
>  #include <linux/vfio.h>
> +#include <linux/iommufd.h>
>  
>  #include "vfio.h"
>  
> @@ -44,6 +45,128 @@ int vfio_device_fops_cdev_open(struct inode *inode, struct file *filep)
>  	return ret;
>  }
>  
> +static void vfio_device_get_kvm_safe(struct vfio_device_file *df)
> +{
> +	spin_lock(&df->kvm_ref_lock);
> +	if (df->kvm)
> +		_vfio_device_get_kvm_safe(df->device, df->kvm);
> +	spin_unlock(&df->kvm_ref_lock);
> +}
> +
> +void vfio_df_cdev_close(struct vfio_device_file *df)
> +{
> +	struct vfio_device *device = df->device;
> +
> +	/*
> +	 * In the time of close, there is no contention with another one
> +	 * changing this flag.  So read df->access_granted without lock
> +	 * and no smp_load_acquire() is ok.
> +	 */
> +	if (!df->access_granted)
> +		return;
> +
> +	mutex_lock(&device->dev_set->lock);
> +	vfio_df_close(df);
> +	vfio_device_put_kvm(device);
> +	iommufd_ctx_put(df->iommufd);
> +	device->cdev_opened = false;
> +	mutex_unlock(&device->dev_set->lock);
> +	vfio_device_unblock_group(device);
> +}
> +
> +static struct iommufd_ctx *vfio_get_iommufd_from_fd(int fd)
> +{
> +	struct iommufd_ctx *iommufd;
> +	struct fd f;
> +
> +	f = fdget(fd);
> +	if (!f.file)
> +		return ERR_PTR(-EBADF);
> +
> +	iommufd = iommufd_ctx_from_file(f.file);
> +
> +	fdput(f);
> +	return iommufd;
> +}
> +
> +long vfio_df_ioctl_bind_iommufd(struct vfio_device_file *df,
> +				struct vfio_device_bind_iommufd __user *arg)
> +{
> +	struct vfio_device *device = df->device;
> +	struct vfio_device_bind_iommufd bind;
> +	unsigned long minsz;
> +	int ret;
> +
> +	static_assert(__same_type(arg->out_devid, df->devid));
> +
> +	minsz = offsetofend(struct vfio_device_bind_iommufd, out_devid);
> +
> +	if (copy_from_user(&bind, arg, minsz))
> +		return -EFAULT;
> +
> +	if (bind.argsz < minsz || bind.flags || bind.iommufd < 0)
> +		return -EINVAL;
> +
> +	/* BIND_IOMMUFD only allowed for cdev fds */
> +	if (df->group)
> +		return -EINVAL;
> +
> +	ret = vfio_device_block_group(device);
> +	if (ret)
> +		return ret;
> +
> +	mutex_lock(&device->dev_set->lock);
> +	/* one device cannot be bound twice */
> +	if (df->access_granted) {
> +		ret = -EINVAL;
> +		goto out_unlock;
> +	}
> +
> +	df->iommufd = vfio_get_iommufd_from_fd(bind.iommufd);
> +	if (IS_ERR(df->iommufd)) {
> +		ret = PTR_ERR(df->iommufd);
> +		df->iommufd = NULL;
> +		goto out_unlock;
> +	}
> +
> +	/*
> +	 * Before the device open, get the KVM pointer currently
> +	 * associated with the device file (if there is) and obtain
> +	 * a reference.  This reference is held until device closed.
> +	 * Save the pointer in the device for use by drivers.
> +	 */
> +	vfio_device_get_kvm_safe(df);
> +
> +	ret = vfio_df_open(df);
> +	if (ret)
> +		goto out_put_kvm;
> +
> +	ret = copy_to_user(&arg->out_devid, &df->devid,
> +			   sizeof(df->devid)) ? -EFAULT : 0;
> +	if (ret)
> +		goto out_close_device;
> +
> +	/*
> +	 * Paired with smp_load_acquire() in vfio_device_fops::ioctl/
> +	 * read/write/mmap
> +	 */
> +	smp_store_release(&df->access_granted, true);
> +	device->cdev_opened = true;
> +	mutex_unlock(&device->dev_set->lock);
> +	return 0;
> +
> +out_close_device:
> +	vfio_df_close(df);
> +out_put_kvm:
> +	vfio_device_put_kvm(device);
> +	iommufd_ctx_put(df->iommufd);
> +	df->iommufd = NULL;
> +out_unlock:
> +	mutex_unlock(&device->dev_set->lock);
> +	vfio_device_unblock_group(device);
> +	return ret;
> +}
> +
>  static char *vfio_device_devnode(const struct device *dev, umode_t *mode)
>  {
>  	return kasprintf(GFP_KERNEL, "vfio/devices/%s", dev_name(dev));
> diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
> index d12b5b524bfc..42de40d2cd4d 100644
> --- a/drivers/vfio/vfio.h
> +++ b/drivers/vfio/vfio.h
> @@ -287,6 +287,9 @@ static inline void vfio_device_del(struct vfio_device *device)
>  }
>  
>  int vfio_device_fops_cdev_open(struct inode *inode, struct file *filep);
> +void vfio_df_cdev_close(struct vfio_device_file *df);
> +long vfio_df_ioctl_bind_iommufd(struct vfio_device_file *df,
> +				struct vfio_device_bind_iommufd __user *arg);
>  int vfio_cdev_init(struct class *device_class);
>  void vfio_cdev_cleanup(void);
>  #else
> @@ -310,6 +313,16 @@ static inline int vfio_device_fops_cdev_open(struct inode *inode,
>  	return 0;
>  }
>  
> +static inline void vfio_df_cdev_close(struct vfio_device_file *df)
> +{
> +}
> +
> +static inline long vfio_df_ioctl_bind_iommufd(struct vfio_device_file *df,
> +					      struct vfio_device_bind_iommufd __user *arg)
> +{
> +	return -EOPNOTSUPP;
> +}
> +
>  static inline int vfio_cdev_init(struct class *device_class)
>  {
>  	return 0;
> diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
> index ef55af75f459..9ba4d420eda2 100644
> --- a/drivers/vfio/vfio_main.c
> +++ b/drivers/vfio/vfio_main.c
> @@ -572,6 +572,8 @@ static int vfio_device_fops_release(struct inode *inode, struct file *filep)
>  
>  	if (df->group)
>  		vfio_df_group_close(df);
> +	else
> +		vfio_df_cdev_close(df);
>  
>  	vfio_device_put_registration(device);
>  
> @@ -1145,6 +1147,9 @@ static long vfio_device_fops_unl_ioctl(struct file *filep,
>  	struct vfio_device *device = df->device;
>  	int ret;
>  
> +	if (cmd == VFIO_DEVICE_BIND_IOMMUFD)
> +		return vfio_df_ioctl_bind_iommufd(df, (void __user *)arg);
> +
>  	/* Paired with smp_store_release() following vfio_df_open() */
>  	if (!smp_load_acquire(&df->access_granted))
>  		return -EINVAL;
> diff --git a/include/linux/vfio.h b/include/linux/vfio.h
> index 83cc5dc28b7a..e80a8ac86e46 100644
> --- a/include/linux/vfio.h
> +++ b/include/linux/vfio.h
> @@ -66,6 +66,7 @@ struct vfio_device {
>  	struct iommufd_device *iommufd_device;
>  	bool iommufd_attached;
>  #endif
> +	bool cdev_opened:1;

Perhaps use a more strongly defined data type here as well, and roll
iommufd_attached into the same bit-field scheme.
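
One reading of that suggestion, as a minimal user-space sketch: the struct
and helper names below are hypothetical stand-ins (not the real
struct vfio_device layout), and uint8_t stands in for the kernel's u8.
It only illustrates giving both flags the same fixed-width bit-field type
so they share a storage unit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the flags as posted: one plain bool, one bool bit-field. */
struct flags_mixed {
	bool iommufd_attached;		/* plain bool, takes a whole byte */
	bool cdev_opened:1;		/* bool bit-field, as in the patch */
};

/* Hypothetical alternative: both flags as bit-fields of one fixed-width type. */
struct flags_packed {
	uint8_t iommufd_attached:1;	/* kernel code would spell this u8 */
	uint8_t cdev_opened:1;
};

/* Accessor mirroring the shape of vfio_device_cdev_opened(). */
static inline bool flags_packed_cdev_opened(const struct flags_packed *f)
{
	return f->cdev_opened;
}
```

Bit-field layout is implementation-defined in C, so the exact sizes depend
on the compiler and ABI; the sketch only shows the declaration style.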

>  };
>  
>  /**
> @@ -170,7 +171,7 @@ vfio_iommufd_device_hot_reset_devid(struct vfio_device *vdev,
>  
>  static inline bool vfio_device_cdev_opened(struct vfio_device *device)
>  {
> -	return false;
> +	return device->cdev_opened;
>  }
>  
>  /**
> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> index f753124e1c82..7296012b7f36 100644
> --- a/include/uapi/linux/vfio.h
> +++ b/include/uapi/linux/vfio.h
> @@ -194,6 +194,33 @@ struct vfio_group_status {
>  
>  /* --------------- IOCTLs for DEVICE file descriptors --------------- */
>  
> +/*
> + * VFIO_DEVICE_BIND_IOMMUFD - _IOR(VFIO_TYPE, VFIO_BASE + 18,
> + *				   struct vfio_device_bind_iommufd)
> + * @argsz:	 User filled size of this data.
> + * @flags:	 Must be 0.
> + * @iommufd:	 iommufd to bind.
> + * @out_devid:	 The device id generated by this bind. devid is a handle for
> + *		 this device/iommufd bond and can be used in IOMMUFD commands.
> + *
> + * Bind a vfio_device to the specified iommufd.
> + *
> + * User is restricted from accessing the device before the binding operation
> + * is completed.
> + *
> + * Unbind is automatically conducted when device fd is closed.
> + *
> + * Return: 0 on success, -errno on failure.
> + */
> +struct vfio_device_bind_iommufd {
> +	__u32		argsz;
> +	__u32		flags;
> +	__s32		iommufd;
> +	__u32		out_devid;
> +};
> +
> +#define VFIO_DEVICE_BIND_IOMMUFD	_IO(VFIO_TYPE, VFIO_BASE + 18)
> +

Why are we still defining device ioctls 18-20 before existing device
ioctls?  18 should be defined after 17...  Thanks,

Alex

>  /**
>   * VFIO_DEVICE_GET_INFO - _IOR(VFIO_TYPE, VFIO_BASE + 7,
>   *						struct vfio_device_info)


* Re: [PATCH v12 20/24] vfio: Only check group->type for noiommu test
  2023-06-02 12:16   ` [Intel-gfx] " Yi Liu
@ 2023-06-12 22:37     ` Alex Williamson
  -1 siblings, 0 replies; 180+ messages in thread
From: Alex Williamson @ 2023-06-12 22:37 UTC (permalink / raw)
  To: Yi Liu
  Cc: jgg, kevin.tian, joro, robin.murphy, cohuck, eric.auger,
	nicolinc, kvm, mjrosato, chao.p.peng, yi.y.sun, peterx, jasowang,
	shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	intel-gvt-dev, intel-gfx, linux-s390, xudong.hao, yan.y.zhao,
	terrence.xu, yanting.jiang, zhenzhong.duan, clegoate

On Fri,  2 Jun 2023 05:16:49 -0700
Yi Liu <yi.l.liu@intel.com> wrote:

> group->type can be VFIO_NO_IOMMU only when vfio_noiommu option is true.
> And vfio_noiommu option can only be true if CONFIG_VFIO_NOIOMMU is enabled.
> So checking group->type is enough when testing noiommu.
> 
> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> ---
>  drivers/vfio/group.c | 3 +--
>  drivers/vfio/vfio.h  | 3 +--
>  2 files changed, 2 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/vfio/group.c b/drivers/vfio/group.c
> index 41a09a2df690..653b62f93474 100644
> --- a/drivers/vfio/group.c
> +++ b/drivers/vfio/group.c
> @@ -133,8 +133,7 @@ static int vfio_group_ioctl_set_container(struct vfio_group *group,
>  
>  	iommufd = iommufd_ctx_from_file(f.file);
>  	if (!IS_ERR(iommufd)) {
> -		if (IS_ENABLED(CONFIG_VFIO_NOIOMMU) &&
> -		    group->type == VFIO_NO_IOMMU)
> +		if (group->type == VFIO_NO_IOMMU)
>  			ret = iommufd_vfio_compat_set_no_iommu(iommufd);
>  		else
>  			ret = iommufd_vfio_compat_ioas_create(iommufd);
> diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
> index 5835c74e97ce..1b89e8bc8571 100644
> --- a/drivers/vfio/vfio.h
> +++ b/drivers/vfio/vfio.h
> @@ -108,8 +108,7 @@ void vfio_group_cleanup(void);
>  
>  static inline bool vfio_device_is_noiommu(struct vfio_device *vdev)
>  {
> -	return IS_ENABLED(CONFIG_VFIO_NOIOMMU) &&
> -	       vdev->group->type == VFIO_NO_IOMMU;
> +	return vdev->group->type == VFIO_NO_IOMMU;
>  }
>  
>  #if IS_ENABLED(CONFIG_VFIO_CONTAINER)

This patch should be dropped.  It's logically correct, but ignores that
the config option can be determined at compile time and therefore the
code can be better optimized based on that test.  I think there was a
specific case where I questioned it, but this drops an otherwise valid
compiler optimization.  Thanks,

Alex
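
A minimal user-space sketch of the optimization being referred to, under
stated assumptions: DEMO_NOIOMMU_ENABLED is a hypothetical macro standing in
for IS_ENABLED(CONFIG_VFIO_NOIOMMU), and the demo_* types and functions are
invented for illustration.  When the constant is 0, the && short-circuits at
compile time, so the compiler can discard the runtime comparison and the
whole noiommu-only branch.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Stand-in for IS_ENABLED(CONFIG_VFIO_NOIOMMU).  In the kernel the value
 * comes from Kconfig; here it is a plain macro, assumed 0 (option off).
 */
#ifndef DEMO_NOIOMMU_ENABLED
#define DEMO_NOIOMMU_ENABLED 0
#endif

enum demo_group_type { DEMO_IOMMU, DEMO_EMULATED_IOMMU, DEMO_NO_IOMMU };

struct demo_group {
	enum demo_group_type type;
};

/* Constant test first: with the option off this folds to "return false". */
static inline bool demo_is_noiommu(const struct demo_group *group)
{
	return DEMO_NOIOMMU_ENABLED && group->type == DEMO_NO_IOMMU;
}

/* Counts how often the noiommu-only path was taken. */
static int demo_noiommu_calls;

static int demo_set_container(const struct demo_group *group)
{
	if (demo_is_noiommu(group)) {
		/* Unreachable when the option is off; eligible for removal. */
		demo_noiommu_calls++;
		return 1;
	}
	return 0;
}
```

Dropping the constant from the test keeps the result the same only because
the type can never be the noiommu one when the option is off, but the
compiler can no longer prove that and has to keep the branch.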


* Re: [Intel-gfx] [PATCH v12 21/24] vfio: Determine noiommu device in __vfio_register_dev()
  2023-06-02 12:16   ` [Intel-gfx] " Yi Liu
@ 2023-06-12 22:42     ` Alex Williamson
  -1 siblings, 0 replies; 180+ messages in thread
From: Alex Williamson @ 2023-06-12 22:42 UTC (permalink / raw)
  To: Yi Liu
  Cc: mjrosato, jasowang, xudong.hao, zhenzhong.duan, peterx,
	terrence.xu, chao.p.peng, linux-s390, kvm, lulu, yanting.jiang,
	joro, nicolinc, jgg, kevin.tian, yan.y.zhao, intel-gfx,
	eric.auger, intel-gvt-dev, yi.y.sun, clegoate, cohuck,
	shameerali.kolothum.thodi, suravee.suthikulpanit, robin.murphy

On Fri,  2 Jun 2023 05:16:50 -0700
Yi Liu <yi.l.liu@intel.com> wrote:

> This moves the noiommu device determination and noiommu taint out of
> vfio_group_find_or_alloc(). noiommu device is determined in
> __vfio_register_dev() and result is stored in flag vfio_device->noiommu,
> the noiommu taint is added in the end of __vfio_register_dev().
> 
> This is also a preparation for compiling out vfio_group infrastructure
> as it makes the noiommu detection and taint common between the cdev path
> and group path though cdev path does not support noiommu.

Does this really still make sense?  The motivation for the change is
really not clear without cdev support for noiommu.  Thanks,

Alex
 
> Suggested-by: Alex Williamson <alex.williamson@redhat.com>
> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> ---
>  drivers/vfio/group.c     | 15 ---------------
>  drivers/vfio/vfio_main.c | 31 ++++++++++++++++++++++++++++++-
>  include/linux/vfio.h     |  1 +
>  3 files changed, 31 insertions(+), 16 deletions(-)
> 
> diff --git a/drivers/vfio/group.c b/drivers/vfio/group.c
> index 653b62f93474..64cdd0ea8825 100644
> --- a/drivers/vfio/group.c
> +++ b/drivers/vfio/group.c
> @@ -668,21 +668,6 @@ static struct vfio_group *vfio_group_find_or_alloc(struct device *dev)
>  	struct vfio_group *group;
>  
>  	iommu_group = iommu_group_get(dev);
> -	if (!iommu_group && vfio_noiommu) {
> -		/*
> -		 * With noiommu enabled, create an IOMMU group for devices that
> -		 * don't already have one, implying no IOMMU hardware/driver
> -		 * exists.  Taint the kernel because we're about to give a DMA
> -		 * capable device to a user without IOMMU protection.
> -		 */
> -		group = vfio_noiommu_group_alloc(dev, VFIO_NO_IOMMU);
> -		if (!IS_ERR(group)) {
> -			add_taint(TAINT_USER, LOCKDEP_STILL_OK);
> -			dev_warn(dev, "Adding kernel taint for vfio-noiommu group on device\n");
> -		}
> -		return group;
> -	}
> -
>  	if (!iommu_group)
>  		return ERR_PTR(-EINVAL);
>  
> diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
> index 6d8f9b0f3637..00a699b9f76b 100644
> --- a/drivers/vfio/vfio_main.c
> +++ b/drivers/vfio/vfio_main.c
> @@ -265,6 +265,18 @@ static int vfio_init_device(struct vfio_device *device, struct device *dev,
>  	return ret;
>  }
>  
> +static int vfio_device_set_noiommu(struct vfio_device *device)
> +{
> +	struct iommu_group *iommu_group = iommu_group_get(device->dev);
> +
> +	if (!iommu_group && !vfio_noiommu)
> +		return -EINVAL;
> +
> +	device->noiommu = !iommu_group;
> +	iommu_group_put(iommu_group); /* Accepts NULL */
> +	return 0;
> +}
> +
>  static int __vfio_register_dev(struct vfio_device *device,
>  			       enum vfio_group_type type)
>  {
> @@ -277,6 +289,13 @@ static int __vfio_register_dev(struct vfio_device *device,
>  		     !device->ops->detach_ioas)))
>  		return -EINVAL;
>  
> +	/* Only physical devices can be noiommu device */
> +	if (type == VFIO_IOMMU) {
> +		ret = vfio_device_set_noiommu(device);
> +		if (ret)
> +			return ret;
> +	}
> +
>  	/*
>  	 * If the driver doesn't specify a set then the device is added to a
>  	 * singleton set just for itself.
> @@ -288,7 +307,8 @@ static int __vfio_register_dev(struct vfio_device *device,
>  	if (ret)
>  		return ret;
>  
> -	ret = vfio_device_set_group(device, type);
> +	ret = vfio_device_set_group(device,
> +				    device->noiommu ? VFIO_NO_IOMMU : type);
>  	if (ret)
>  		return ret;
>  
> @@ -301,6 +321,15 @@ static int __vfio_register_dev(struct vfio_device *device,
>  
>  	vfio_device_group_register(device);
>  
> +	if (device->noiommu) {
> +		/*
> +		 * noiommu devices have no IOMMU hardware/driver.  Taint the
> +		 * kernel because we're about to give a DMA capable device to
> +		 * a user without IOMMU protection.
> +		 */
> +		add_taint(TAINT_USER, LOCKDEP_STILL_OK);
> +		dev_warn(device->dev, "Adding kernel taint for vfio-noiommu on device\n");
> +	}
>  	return 0;
>  err_out:
>  	vfio_device_remove_group(device);
> diff --git a/include/linux/vfio.h b/include/linux/vfio.h
> index e80a8ac86e46..183e620009e7 100644
> --- a/include/linux/vfio.h
> +++ b/include/linux/vfio.h
> @@ -67,6 +67,7 @@ struct vfio_device {
>  	bool iommufd_attached;
>  #endif
>  	bool cdev_opened:1;
> +	bool noiommu:1;
>  };
>  
>  /**


* Re: [Intel-gfx] [PATCH v12 22/24] vfio: Remove vfio_device_is_noiommu()
  2023-06-02 12:16   ` [Intel-gfx] " Yi Liu
@ 2023-06-12 22:46     ` Alex Williamson
  -1 siblings, 0 replies; 180+ messages in thread
From: Alex Williamson @ 2023-06-12 22:46 UTC (permalink / raw)
  To: Yi Liu
  Cc: mjrosato, jasowang, xudong.hao, zhenzhong.duan, peterx,
	terrence.xu, chao.p.peng, linux-s390, kvm, lulu, yanting.jiang,
	joro, nicolinc, jgg, kevin.tian, yan.y.zhao, intel-gfx,
	eric.auger, intel-gvt-dev, yi.y.sun, clegoate, cohuck,
	shameerali.kolothum.thodi, suravee.suthikulpanit, robin.murphy

On Fri,  2 Jun 2023 05:16:51 -0700
Yi Liu <yi.l.liu@intel.com> wrote:

> This converts noiommu test to use vfio_device->noiommu flag. Per this
> change, vfio_device_is_noiommu() is removed.
> 
> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> ---
>  drivers/vfio/group.c   | 2 +-
>  drivers/vfio/iommufd.c | 4 ++--
>  drivers/vfio/vfio.h    | 9 ++-------
>  3 files changed, 5 insertions(+), 10 deletions(-)

Drop this as well.  You can see here all the code paths that wouldn't
have even been compiled with CONFIG_VFIO_NOIOMMU unset.  Thanks,

Alex
 
> diff --git a/drivers/vfio/group.c b/drivers/vfio/group.c
> index 64cdd0ea8825..08d37811507e 100644
> --- a/drivers/vfio/group.c
> +++ b/drivers/vfio/group.c
> @@ -191,7 +191,7 @@ static int vfio_df_group_open(struct vfio_device_file *df)
>  		vfio_device_group_get_kvm_safe(device);
>  
>  	df->iommufd = device->group->iommufd;
> -	if (df->iommufd && vfio_device_is_noiommu(device) && device->open_count == 0) {
> +	if (df->iommufd && device->noiommu && device->open_count == 0) {
>  		/*
>  		 * Require no compat ioas to be assigned to proceed.  The basic
>  		 * statement is that the user cannot have done something that
> diff --git a/drivers/vfio/iommufd.c b/drivers/vfio/iommufd.c
> index a59ed4f881aa..fac8ca74ec85 100644
> --- a/drivers/vfio/iommufd.c
> +++ b/drivers/vfio/iommufd.c
> @@ -37,7 +37,7 @@ int vfio_iommufd_compat_attach_ioas(struct vfio_device *vdev,
>  	lockdep_assert_held(&vdev->dev_set->lock);
>  
>  	/* compat noiommu does not need to do ioas attach */
> -	if (vfio_device_is_noiommu(vdev))
> +	if (vdev->noiommu)
>  		return 0;
>  
>  	ret = iommufd_vfio_compat_ioas_get_id(ictx, &ioas_id);
> @@ -54,7 +54,7 @@ void vfio_df_iommufd_unbind(struct vfio_device_file *df)
>  
>  	lockdep_assert_held(&vdev->dev_set->lock);
>  
> -	if (vfio_device_is_noiommu(vdev))
> +	if (vdev->noiommu)
>  		return;
>  
>  	if (vdev->ops->unbind_iommufd)
> diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
> index 1b89e8bc8571..b138b8334fe0 100644
> --- a/drivers/vfio/vfio.h
> +++ b/drivers/vfio/vfio.h
> @@ -106,11 +106,6 @@ bool vfio_device_has_container(struct vfio_device *device);
>  int __init vfio_group_init(void);
>  void vfio_group_cleanup(void);
>  
> -static inline bool vfio_device_is_noiommu(struct vfio_device *vdev)
> -{
> -	return vdev->group->type == VFIO_NO_IOMMU;
> -}
> -
>  #if IS_ENABLED(CONFIG_VFIO_CONTAINER)
>  /**
>   * struct vfio_iommu_driver_ops - VFIO IOMMU driver callbacks
> @@ -271,7 +266,7 @@ void vfio_init_device_cdev(struct vfio_device *device);
>  static inline int vfio_device_add(struct vfio_device *device)
>  {
>  	/* cdev does not support noiommu device */
> -	if (vfio_device_is_noiommu(device))
> +	if (device->noiommu)
>  		return device_add(&device->device);
>  	vfio_init_device_cdev(device);
>  	return cdev_device_add(&device->cdev, &device->device);
> @@ -279,7 +274,7 @@ static inline int vfio_device_add(struct vfio_device *device)
>  
>  static inline void vfio_device_del(struct vfio_device *device)
>  {
> -	if (vfio_device_is_noiommu(device))
> +	if (device->noiommu)
>  		device_del(&device->device);
>  	else
>  		cdev_device_del(&device->cdev, &device->device);


* Re: [Intel-gfx] [PATCH v12 24/24] docs: vfio: Add vfio device cdev description
  2023-06-02 12:16   ` [Intel-gfx] " Yi Liu
@ 2023-06-12 23:06     ` Alex Williamson
  -1 siblings, 0 replies; 180+ messages in thread
From: Alex Williamson @ 2023-06-12 23:06 UTC (permalink / raw)
  To: Yi Liu
  Cc: mjrosato, jasowang, xudong.hao, zhenzhong.duan, peterx,
	terrence.xu, chao.p.peng, linux-s390, kvm, lulu, yanting.jiang,
	joro, nicolinc, jgg, kevin.tian, yan.y.zhao, intel-gfx,
	eric.auger, intel-gvt-dev, yi.y.sun, clegoate, cohuck,
	shameerali.kolothum.thodi, suravee.suthikulpanit, robin.murphy

On Fri,  2 Jun 2023 05:16:53 -0700
Yi Liu <yi.l.liu@intel.com> wrote:

> This gives notes for userspace applications on device cdev usage.
> 
> Reviewed-by: Kevin Tian <kevin.tian@intel.com>
> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> ---
>  Documentation/driver-api/vfio.rst | 132 ++++++++++++++++++++++++++++++
>  1 file changed, 132 insertions(+)
> 
> diff --git a/Documentation/driver-api/vfio.rst b/Documentation/driver-api/vfio.rst
> index 363e12c90b87..f00c9b86bda0 100644
> --- a/Documentation/driver-api/vfio.rst
> +++ b/Documentation/driver-api/vfio.rst
> @@ -239,6 +239,130 @@ group and can access them as follows::
>  	/* Gratuitous device reset and go... */
>  	ioctl(device, VFIO_DEVICE_RESET);
>  
> +IOMMUFD and vfio_iommu_type1
> +----------------------------
> +
> +IOMMUFD is the new user API to manage I/O page tables from userspace.
> +It intends to be the portal of delivering advanced userspace DMA
> +features (nested translation [5]_, PASID [6]_, etc.) while also providing
> +a backwards compatibility interface for existing VFIO_TYPE1v2_IOMMU use
> +cases.  Eventually the vfio_iommu_type1 driver, as well as the legacy
> +vfio container and group model is intended to be deprecated.
> +
> +The IOMMUFD backwards compatibility interface can be enabled two ways.
> +In the first method, the kernel can be configured with
> +CONFIG_IOMMUFD_VFIO_CONTAINER, in which case the IOMMUFD subsystem
> +transparently provides the entire infrastructure for the VFIO
> +container and IOMMU backend interfaces.  The compatibility mode can
> +also be accessed if the VFIO container interface, ie. /dev/vfio/vfio is
> +simply symlink'd to /dev/iommu.  Note that at the time of writing, the
> +compatibility mode is not entirely feature complete relative to
> +VFIO_TYPE1v2_IOMMU (ex. DMA mapping MMIO) and does not attempt to
> +provide compatibility to the VFIO_SPAPR_TCE_IOMMU interface.  Therefore
> +it is not generally advisable at this time to switch from native VFIO
> +implementations to the IOMMUFD compatibility interfaces.
> +
> +Long term, VFIO users should migrate to device access through the cdev
> +interface described below, and native access through the IOMMUFD
> +provided interfaces.
> +
> +VFIO Device cdev
> +----------------
> +
> +Traditionally user acquires a device fd via VFIO_GROUP_GET_DEVICE_FD
> +in a VFIO group.
> +
> +With CONFIG_VFIO_DEVICE_CDEV=y the user can now acquire a device fd
> +by directly opening a character device /dev/vfio/devices/vfioX where
> +"X" is the number allocated uniquely by VFIO for registered devices.
> +cdev interface does not support noiommu, so user should use the legacy
> +group interface if noiommu is needed.
> +
> +The cdev only works with IOMMUFD.  Both VFIO drivers and applications
> +must adapt to the new cdev security model which requires using
> +VFIO_DEVICE_BIND_IOMMUFD to claim DMA ownership before starting to
> +actually use the device.  Once BIND succeeds then a VFIO device can
> +be fully accessed by the user.
> +
> +VFIO device cdev doesn't rely on VFIO group/container/iommu drivers.
> +Hence those modules can be fully compiled out in an environment
> +where no legacy VFIO application exists.
> +
> +So far SPAPR does not support IOMMUFD yet.  So it cannot support device
> +cdev neither.

s/neither/either/

Unless I missed it, we've not described that vfio device cdev access is
still bound by IOMMU group semantics, i.e. there can be only one DMA owner
per group.  That's a pretty common failure point for multi-function
consumer device use cases, so the why, where, and how it fails should
be well covered.
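
To make the failure mode concrete, here is a toy user-space model of the
"one DMA owner per group" rule.  It is a sketch only: none of the toy_*
names exist in the kernel, the owner cookie stands in for whatever
identifies the claiming context (for cdev, the bound iommufd), and -EBUSY
is a placeholder rather than the errno the kernel actually returns.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Toy model only: these types do not exist in the kernel. */
struct toy_iommu_group {
	const void *owner;		/* cookie of the current DMA owner */
	unsigned int owner_cnt;		/* devices claimed under that owner */
};

struct toy_device {
	struct toy_iommu_group *group;
};

/* Claim DMA ownership of the device's whole group for @owner. */
static int toy_claim_dma_owner(struct toy_device *dev, const void *owner)
{
	struct toy_iommu_group *group = dev->group;

	/* A second, different owner is refused while the group is claimed. */
	if (group->owner_cnt && group->owner != owner)
		return -EBUSY;		/* placeholder errno */

	group->owner = owner;
	group->owner_cnt++;
	return 0;
}

/* Drop one claim; the group becomes free when the last claim goes away. */
static void toy_release_dma_owner(struct toy_device *dev)
{
	struct toy_iommu_group *group = dev->group;

	if (group->owner_cnt && --group->owner_cnt == 0)
		group->owner = NULL;
}
```

In this model, two functions of one multi-function device that share a
group can both be claimed through the same owner, but claiming the second
function through a different owner fails until the first is released.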

In general there's been a lot of cross collaboration to get the series
this far.  I see an abundance of Tested-by, but unfortunately not a lot
of Reviewed-by beyond about the first 1/3rd of the series.  Thanks,

Alex

> +
> +Device cdev Example
> +-------------------
> +
> +Assume user wants to access PCI device 0000:6a:01.0::
> +
> +	$ ls /sys/bus/pci/devices/0000:6a:01.0/vfio-dev/
> +	vfio0
> +
> +This device is therefore represented as vfio0.  The user can verify
> +its existence::
> +
> +	$ ls -l /dev/vfio/devices/vfio0
> +	crw------- 1 root root 511, 0 Feb 16 01:22 /dev/vfio/devices/vfio0
> +	$ cat /sys/bus/pci/devices/0000:6a:01.0/vfio-dev/vfio0/dev
> +	511:0
> +	$ ls -l /dev/char/511\:0
> +	lrwxrwxrwx 1 root root 21 Feb 16 01:22 /dev/char/511:0 -> ../vfio/devices/vfio0
> +
> +Then provide the user with access to the device if unprivileged
> +operation is desired::
> +
> +	$ chown user:user /dev/vfio/devices/vfio0
> +
> +Finally the user could get cdev fd by::
> +
> +	cdev_fd = open("/dev/vfio/devices/vfio0", O_RDWR);
> +
> +An opened cdev_fd doesn't give the user any permission of accessing
> +the device except binding the cdev_fd to an iommufd.  After that point
> +then the device is fully accessible including attaching it to an
> +IOMMUFD IOAS/HWPT to enable userspace DMA::
> +
> +	struct vfio_device_bind_iommufd bind = {
> +		.argsz = sizeof(bind),
> +		.flags = 0,
> +	};
> +	struct iommu_ioas_alloc alloc_data  = {
> +		.size = sizeof(alloc_data),
> +		.flags = 0,
> +	};
> +	struct vfio_device_attach_iommufd_pt attach_data = {
> +		.argsz = sizeof(attach_data),
> +		.flags = 0,
> +	};
> +	struct iommu_ioas_map map = {
> +		.size = sizeof(map),
> +		.flags = IOMMU_IOAS_MAP_READABLE |
> +			 IOMMU_IOAS_MAP_WRITEABLE |
> +			 IOMMU_IOAS_MAP_FIXED_IOVA,
> +		.__reserved = 0,
> +	};
> +
> +	iommufd = open("/dev/iommu", O_RDWR);
> +
> +	bind.iommufd = iommufd;
> +	ioctl(cdev_fd, VFIO_DEVICE_BIND_IOMMUFD, &bind);
> +
> +	ioctl(iommufd, IOMMU_IOAS_ALLOC, &alloc_data);
> +	attach_data.pt_id = alloc_data.out_ioas_id;
> +	ioctl(cdev_fd, VFIO_DEVICE_ATTACH_IOMMUFD_PT, &attach_data);
> +
> +	/* Allocate some space and setup a DMA mapping */
> +	map.user_va = (int64_t)mmap(0, 1024 * 1024, PROT_READ | PROT_WRITE,
> +				    MAP_PRIVATE | MAP_ANONYMOUS, 0, 0);
> +	map.iova = 0; /* 1MB starting at 0x0 from device view */
> +	map.length = 1024 * 1024;
> +	map.ioas_id = alloc_data.out_ioas_id;
> +
> +	ioctl(iommufd, IOMMU_IOAS_MAP, &map);
> +
> +	/* Other device operations as stated in "VFIO Usage Example" */
> +
>  VFIO User API
>  -------------------------------------------------------------------------------
>  
> @@ -566,3 +690,11 @@ This implementation has some specifics:
>  				\-0d.1
>  
>  	00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev 90)
> +
> +.. [5] Nested translation is an IOMMU feature which supports two stage
> +   address translations.  This improves the address translation efficiency
> +   in IOMMU virtualization.
> +
> +.. [6] PASID stands for Process Address Space ID, introduced by PCI
> +   Express.  It is a prerequisite for Shared Virtual Addressing (SVA)
> +   and Scalable I/O Virtualization (Scalable IOV).


^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCH v12 24/24] docs: vfio: Add vfio device cdev description
@ 2023-06-12 23:06     ` Alex Williamson
  0 siblings, 0 replies; 180+ messages in thread
From: Alex Williamson @ 2023-06-12 23:06 UTC (permalink / raw)
  To: Yi Liu
  Cc: jgg, kevin.tian, joro, robin.murphy, cohuck, eric.auger,
	nicolinc, kvm, mjrosato, chao.p.peng, yi.y.sun, peterx, jasowang,
	shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	intel-gvt-dev, intel-gfx, linux-s390, xudong.hao, yan.y.zhao,
	terrence.xu, yanting.jiang, zhenzhong.duan, clegoate

On Fri,  2 Jun 2023 05:16:53 -0700
Yi Liu <yi.l.liu@intel.com> wrote:

> This gives notes for userspace applications on device cdev usage.
> 
> Reviewed-by: Kevin Tian <kevin.tian@intel.com>
> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> ---
>  Documentation/driver-api/vfio.rst | 132 ++++++++++++++++++++++++++++++
>  1 file changed, 132 insertions(+)
> 
> diff --git a/Documentation/driver-api/vfio.rst b/Documentation/driver-api/vfio.rst
> index 363e12c90b87..f00c9b86bda0 100644
> --- a/Documentation/driver-api/vfio.rst
> +++ b/Documentation/driver-api/vfio.rst
> @@ -239,6 +239,130 @@ group and can access them as follows::
>  	/* Gratuitous device reset and go... */
>  	ioctl(device, VFIO_DEVICE_RESET);
>  
> +IOMMUFD and vfio_iommu_type1
> +----------------------------
> +
> +IOMMUFD is the new user API to manage I/O page tables from userspace.
> +It intends to be the portal of delivering advanced userspace DMA
> +features (nested translation [5]_, PASID [6]_, etc.) while also providing
> +a backwards compatibility interface for existing VFIO_TYPE1v2_IOMMU use
> +cases.  Eventually the vfio_iommu_type1 driver, as well as the legacy
> +vfio container and group model is intended to be deprecated.
> +
> +The IOMMUFD backwards compatibility interface can be enabled two ways.
> +In the first method, the kernel can be configured with
> +CONFIG_IOMMUFD_VFIO_CONTAINER, in which case the IOMMUFD subsystem
> +transparently provides the entire infrastructure for the VFIO
> +container and IOMMU backend interfaces.  The compatibility mode can
> +also be accessed if the VFIO container interface, ie. /dev/vfio/vfio is
> +simply symlink'd to /dev/iommu.  Note that at the time of writing, the
> +compatibility mode is not entirely feature complete relative to
> +VFIO_TYPE1v2_IOMMU (ex. DMA mapping MMIO) and does not attempt to
> +provide compatibility to the VFIO_SPAPR_TCE_IOMMU interface.  Therefore
> +it is not generally advisable at this time to switch from native VFIO
> +implementations to the IOMMUFD compatibility interfaces.
> +
> +Long term, VFIO users should migrate to device access through the cdev
> +interface described below, and native access through the IOMMUFD
> +provided interfaces.
> +
> +VFIO Device cdev
> +----------------
> +
> +Traditionally user acquires a device fd via VFIO_GROUP_GET_DEVICE_FD
> +in a VFIO group.
> +
> +With CONFIG_VFIO_DEVICE_CDEV=y the user can now acquire a device fd
> +by directly opening a character device /dev/vfio/devices/vfioX where
> +"X" is the number allocated uniquely by VFIO for registered devices.
> +cdev interface does not support noiommu, so user should use the legacy
> +group interface if noiommu is needed.
> +
> +The cdev only works with IOMMUFD.  Both VFIO drivers and applications
> +must adapt to the new cdev security model which requires using
> +VFIO_DEVICE_BIND_IOMMUFD to claim DMA ownership before starting to
> +actually use the device.  Once BIND succeeds then a VFIO device can
> +be fully accessed by the user.
> +
> +VFIO device cdev doesn't rely on VFIO group/container/iommu drivers.
> +Hence those modules can be fully compiled out in an environment
> +where no legacy VFIO application exists.
> +
> +So far SPAPR does not support IOMMUFD yet.  So it cannot support device
> +cdev neither.

s/neither/either/

Unless I missed it, we've not described that vfio device cdev access is
still bound by IOMMU group semantics, ie. there can be one DMA owner
for the group.  That's a pretty common failure point for multi-function
consumer device use cases, so the why, where, and how it fails should
be well covered.
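
To make the group constraint concrete, here is a minimal userspace
sketch (not part of this patch; the helper names are made up for
illustration) that builds the iommu_group/devices sysfs path for a PCI
device and counts how many devices share its group:

```c
#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <string.h>

/*
 * Build the sysfs path listing every device in the same IOMMU group as
 * the PCI device @bdf.  Returns 0 on success, -1 if @buf is too small.
 */
static int iommu_group_devices_path(const char *bdf, char *buf, size_t len)
{
	int n;

	assert(bdf && buf);
	n = snprintf(buf, len, "/sys/bus/pci/devices/%s/iommu_group/devices", bdf);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Count entries in @dir, skipping "." and "..".  Returns -1 on error. */
static int count_dir_entries(const char *dir)
{
	struct dirent *de;
	int count = 0;
	DIR *d = opendir(dir);

	if (!d)
		return -1;
	while ((de = readdir(d)) != NULL) {
		if (strcmp(de->d_name, ".") && strcmp(de->d_name, ".."))
			count++;
	}
	closedir(d);
	return count;
}
```

A count above one means the device shares its group with other
functions.  In that case I'd expect the bind to fail for as long as
another member of the group is still attached to a host driver, which is
the multi-function failure case mentioned above.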

In general there's been a lot of cross collaboration to get the series
this far.  I see an abundance of Tested-by, but unfortunately not a lot
of Reviewed-by beyond about the first 1/3rd of the series.  Thanks,

Alex

> +
> +Device cdev Example
> +-------------------
> +
> +Assume user wants to access PCI device 0000:6a:01.0::
> +
> +	$ ls /sys/bus/pci/devices/0000:6a:01.0/vfio-dev/
> +	vfio0
> +
> +This device is therefore represented as vfio0.  The user can verify
> +its existence::
> +
> +	$ ls -l /dev/vfio/devices/vfio0
> +	crw------- 1 root root 511, 0 Feb 16 01:22 /dev/vfio/devices/vfio0
> +	$ cat /sys/bus/pci/devices/0000:6a:01.0/vfio-dev/vfio0/dev
> +	511:0
> +	$ ls -l /dev/char/511\:0
> +	lrwxrwxrwx 1 root root 21 Feb 16 01:22 /dev/char/511:0 -> ../vfio/devices/vfio0
> +
> +Then provide the user with access to the device if unprivileged
> +operation is desired::
> +
> +	$ chown user:user /dev/vfio/devices/vfio0
> +
> +Finally the user could get cdev fd by::
> +
> +	cdev_fd = open("/dev/vfio/devices/vfio0", O_RDWR);
> +
> +An opened cdev_fd doesn't give the user any permission of accessing
> +the device except binding the cdev_fd to an iommufd.  After that point
> +then the device is fully accessible including attaching it to an
> +IOMMUFD IOAS/HWPT to enable userspace DMA::
> +
> +	struct vfio_device_bind_iommufd bind = {
> +		.argsz = sizeof(bind),
> +		.flags = 0,
> +	};
> +	struct iommu_ioas_alloc alloc_data  = {
> +		.size = sizeof(alloc_data),
> +		.flags = 0,
> +	};
> +	struct vfio_device_attach_iommufd_pt attach_data = {
> +		.argsz = sizeof(attach_data),
> +		.flags = 0,
> +	};
> +	struct iommu_ioas_map map = {
> +		.size = sizeof(map),
> +		.flags = IOMMU_IOAS_MAP_READABLE |
> +			 IOMMU_IOAS_MAP_WRITEABLE |
> +			 IOMMU_IOAS_MAP_FIXED_IOVA,
> +		.__reserved = 0,
> +	};
> +
> +	iommufd = open("/dev/iommu", O_RDWR);
> +
> +	bind.iommufd = iommufd;
> +	ioctl(cdev_fd, VFIO_DEVICE_BIND_IOMMUFD, &bind);
> +
> +	ioctl(iommufd, IOMMU_IOAS_ALLOC, &alloc_data);
> +	attach_data.pt_id = alloc_data.out_ioas_id;
> +	ioctl(cdev_fd, VFIO_DEVICE_ATTACH_IOMMUFD_PT, &attach_data);
> +
> +	/* Allocate some space and setup a DMA mapping */
> +	map.user_va = (int64_t)mmap(0, 1024 * 1024, PROT_READ | PROT_WRITE,
> +				    MAP_PRIVATE | MAP_ANONYMOUS, 0, 0);
> +	map.iova = 0; /* 1MB starting at 0x0 from device view */
> +	map.length = 1024 * 1024;
> +	map.ioas_id = alloc_data.out_ioas_id;
> +
> +	ioctl(iommufd, IOMMU_IOAS_MAP, &map);
> +
> +	/* Other device operations as stated in "VFIO Usage Example" */
> +
>  VFIO User API
>  -------------------------------------------------------------------------------
>  
> @@ -566,3 +690,11 @@ This implementation has some specifics:
>  				\-0d.1
>  
>  	00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev 90)
> +
> +.. [5] Nested translation is an IOMMU feature which supports two stage
> +   address translations.  This improves the address translation efficiency
> +   in IOMMU virtualization.
> +
> +.. [6] PASID stands for Process Address Space ID, introduced by PCI
> +   Express.  It is a prerequisite for Shared Virtual Addressing (SVA)
> +   and Scalable I/O Virtualization (Scalable IOV).


^ permalink raw reply	[flat|nested] 180+ messages in thread

* RE: [PATCH v12 06/24] vfio: Pass struct vfio_device_file * to vfio_device_open/close()
  2023-06-12 21:52     ` Alex Williamson
@ 2023-06-13  5:24       ` Liu, Yi L
  -1 siblings, 0 replies; 180+ messages in thread
From: Liu, Yi L @ 2023-06-13  5:24 UTC (permalink / raw)
  To: Alex Williamson
  Cc: jgg, Tian, Kevin, joro, robin.murphy, cohuck, eric.auger,
	nicolinc, kvm, mjrosato, chao.p.peng, yi.y.sun, peterx, jasowang,
	shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	intel-gvt-dev, intel-gfx, linux-s390, Hao, Xudong, Zhao, Yan Y,
	Xu, Terrence, Jiang, Yanting, Duan, Zhenzhong, clegoate

> From: Alex Williamson <alex.williamson@redhat.com>
> Sent: Tuesday, June 13, 2023 5:52 AM
> 
> On Fri,  2 Jun 2023 05:16:35 -0700
> Yi Liu <yi.l.liu@intel.com> wrote:
> 
> > This avoids passing too many parameters in multiple functions. Per the
> > input parameter change, rename the function to be vfio_df_open/close().
> >
> > Reviewed-by: Kevin Tian <kevin.tian@intel.com>
> > Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
> > Reviewed-by: Eric Auger <eric.auger@redhat.com>
> > Tested-by: Terrence Xu <terrence.xu@intel.com>
> > Tested-by: Nicolin Chen <nicolinc@nvidia.com>
> > Tested-by: Matthew Rosato <mjrosato@linux.ibm.com>
> > Tested-by: Yanting Jiang <yanting.jiang@intel.com>
> > Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
> > Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> > ---
> >  drivers/vfio/group.c     | 20 ++++++++++++++------
> >  drivers/vfio/vfio.h      |  8 ++++----
> >  drivers/vfio/vfio_main.c | 25 +++++++++++++++----------
> >  3 files changed, 33 insertions(+), 20 deletions(-)
> >
> > diff --git a/drivers/vfio/group.c b/drivers/vfio/group.c
> > index b56e19d2a02d..caf53716ddb2 100644
> > --- a/drivers/vfio/group.c
> > +++ b/drivers/vfio/group.c
> > @@ -169,8 +169,9 @@ static void vfio_device_group_get_kvm_safe(struct vfio_device *device)
> >  	spin_unlock(&device->group->kvm_ref_lock);
> >  }
> >
> > -static int vfio_device_group_open(struct vfio_device *device)
> > +static int vfio_df_group_open(struct vfio_device_file *df)
> >  {
> > +	struct vfio_device *device = df->device;
> >  	int ret;
> >
> >  	mutex_lock(&device->group->group_lock);
> > @@ -190,7 +191,11 @@ static int vfio_device_group_open(struct vfio_device *device)
> >  	if (device->open_count == 0)
> >  		vfio_device_group_get_kvm_safe(device);
> >
> > -	ret = vfio_device_open(device, device->group->iommufd);
> > +	df->iommufd = device->group->iommufd;
> > +
> > +	ret = vfio_df_open(df);
> > +	if (ret)
> > +		df->iommufd = NULL;
> >
> >  	if (device->open_count == 0)
> >  		vfio_device_put_kvm(device);
> > @@ -202,12 +207,15 @@ static int vfio_device_group_open(struct vfio_device *device)
> >  	return ret;
> >  }
> >
> > -void vfio_device_group_close(struct vfio_device *device)
> > +void vfio_df_group_close(struct vfio_device_file *df)
> >  {
> > +	struct vfio_device *device = df->device;
> > +
> >  	mutex_lock(&device->group->group_lock);
> >  	mutex_lock(&device->dev_set->lock);
> >
> > -	vfio_device_close(device, device->group->iommufd);
> > +	vfio_df_close(df);
> > +	df->iommufd = NULL;
> >
> >  	if (device->open_count == 0)
> >  		vfio_device_put_kvm(device);
> > @@ -228,7 +236,7 @@ static struct file *vfio_device_open_file(struct vfio_device *device)
> >  		goto err_out;
> >  	}
> >
> > -	ret = vfio_device_group_open(device);
> > +	ret = vfio_df_group_open(df);
> >  	if (ret)
> >  		goto err_free;
> >
> > @@ -260,7 +268,7 @@ static struct file *vfio_device_open_file(struct vfio_device *device)
> >  	return filep;
> >
> >  err_close_device:
> > -	vfio_device_group_close(device);
> > +	vfio_df_group_close(df);
> >  err_free:
> >  	kfree(df);
> >  err_out:
> > diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
> > index 69e1a0692b06..f9eb52eb9ed7 100644
> > --- a/drivers/vfio/vfio.h
> > +++ b/drivers/vfio/vfio.h
> > @@ -20,13 +20,13 @@ struct vfio_device_file {
> >  	struct vfio_device *device;
> >  	spinlock_t kvm_ref_lock; /* protect kvm field */
> >  	struct kvm *kvm;
> > +	struct iommufd_ctx *iommufd; /* protected by struct vfio_device_set::lock */
> >  };
> >
> >  void vfio_device_put_registration(struct vfio_device *device);
> >  bool vfio_device_try_get_registration(struct vfio_device *device);
> > -int vfio_device_open(struct vfio_device *device, struct iommufd_ctx *iommufd);
> > -void vfio_device_close(struct vfio_device *device,
> > -		       struct iommufd_ctx *iommufd);
> > +int vfio_df_open(struct vfio_device_file *df);
> > +void vfio_df_close(struct vfio_device_file *df);
> >  struct vfio_device_file *
> >  vfio_allocate_device_file(struct vfio_device *device);
> >
> > @@ -91,7 +91,7 @@ void vfio_device_group_register(struct vfio_device *device);
> >  void vfio_device_group_unregister(struct vfio_device *device);
> >  int vfio_device_group_use_iommu(struct vfio_device *device);
> >  void vfio_device_group_unuse_iommu(struct vfio_device *device);
> > -void vfio_device_group_close(struct vfio_device *device);
> > +void vfio_df_group_close(struct vfio_device_file *df);
> >  struct vfio_group *vfio_group_from_file(struct file *file);
> >  bool vfio_group_enforced_coherent(struct vfio_group *group);
> >  void vfio_group_set_kvm(struct vfio_group *group, struct kvm *kvm);
> > diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
> > index 8ef9210ad2aa..a3c5817fc545 100644
> > --- a/drivers/vfio/vfio_main.c
> > +++ b/drivers/vfio/vfio_main.c
> > @@ -434,9 +434,10 @@ vfio_allocate_device_file(struct vfio_device *device)
> >  	return df;
> >  }
> >
> > -static int vfio_device_first_open(struct vfio_device *device,
> > -				  struct iommufd_ctx *iommufd)
> > +static int vfio_device_first_open(struct vfio_device_file *df)
> >  {
> > +	struct vfio_device *device = df->device;
> > +	struct iommufd_ctx *iommufd = df->iommufd;
> >  	int ret;
> >
> >  	lockdep_assert_held(&device->dev_set->lock);
> > @@ -468,9 +469,11 @@ static int vfio_device_first_open(struct vfio_device *device,
> >  	return ret;
> >  }
> >
> > -static void vfio_device_last_close(struct vfio_device *device,
> > -				   struct iommufd_ctx *iommufd)
> > +static void vfio_device_last_close(struct vfio_device_file *df)
> 
> Shouldn't these now be vfio_df_... functions too?  Thanks,

Yes, vfio_device_first_open() and vfio_device_last_close() should also
be renamed to vfio_df_...() now that they take a struct vfio_device_file.

Regards,
Yi Liu

> 
> >  {
> > +	struct vfio_device *device = df->device;
> > +	struct iommufd_ctx *iommufd = df->iommufd;
> > +
> >  	lockdep_assert_held(&device->dev_set->lock);
> >
> >  	if (device->ops->close_device)
> > @@ -482,15 +485,16 @@ static void vfio_device_last_close(struct vfio_device *device,
> >  	module_put(device->dev->driver->owner);
> >  }
> >
> > -int vfio_device_open(struct vfio_device *device, struct iommufd_ctx *iommufd)
> > +int vfio_df_open(struct vfio_device_file *df)
> >  {
> > +	struct vfio_device *device = df->device;
> >  	int ret = 0;
> >
> >  	lockdep_assert_held(&device->dev_set->lock);
> >
> >  	device->open_count++;
> >  	if (device->open_count == 1) {
> > -		ret = vfio_device_first_open(device, iommufd);
> > +		ret = vfio_device_first_open(df);
> >  		if (ret)
> >  			device->open_count--;
> >  	}
> > @@ -498,14 +502,15 @@ int vfio_device_open(struct vfio_device *device, struct iommufd_ctx *iommufd)
> >  	return ret;
> >  }
> >
> > -void vfio_device_close(struct vfio_device *device,
> > -		       struct iommufd_ctx *iommufd)
> > +void vfio_df_close(struct vfio_device_file *df)
> >  {
> > +	struct vfio_device *device = df->device;
> > +
> >  	lockdep_assert_held(&device->dev_set->lock);
> >
> >  	vfio_assert_device_open(device);
> >  	if (device->open_count == 1)
> > -		vfio_device_last_close(device, iommufd);
> > +		vfio_device_last_close(df);
> >  	device->open_count--;
> >  }
> >
> > @@ -550,7 +555,7 @@ static int vfio_device_fops_release(struct inode *inode, struct file *filep)
> >  	struct vfio_device_file *df = filep->private_data;
> >  	struct vfio_device *device = df->device;
> >
> > -	vfio_device_group_close(device);
> > +	vfio_df_group_close(df);
> >
> >  	vfio_device_put_registration(device);
> >


^ permalink raw reply	[flat|nested] 180+ messages in thread

* RE: [PATCH v12 07/24] vfio: Block device access via device fd until device is opened
  2023-06-12 21:52     ` [Intel-gfx] " Alex Williamson
@ 2023-06-13  5:46       ` Liu, Yi L
  -1 siblings, 0 replies; 180+ messages in thread
From: Liu, Yi L @ 2023-06-13  5:46 UTC (permalink / raw)
  To: Alex Williamson
  Cc: jgg, Tian, Kevin, joro, robin.murphy, cohuck, eric.auger,
	nicolinc, kvm, mjrosato, chao.p.peng, yi.y.sun, peterx, jasowang,
	shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	intel-gvt-dev, intel-gfx, linux-s390, Hao, Xudong, Zhao, Yan Y,
	Xu, Terrence, Jiang, Yanting, Duan, Zhenzhong, clegoate

> From: Alex Williamson <alex.williamson@redhat.com>
> Sent: Tuesday, June 13, 2023 5:52 AM
> 
> On Fri,  2 Jun 2023 05:16:36 -0700
> Yi Liu <yi.l.liu@intel.com> wrote:
> 
> > Allow the vfio_device file to be in a state where the device FD is
> > opened but the device cannot be used by userspace (i.e. its .open_device()
> > hasn't been called). This inbetween state is not used when the device
> > FD is spawned from the group FD, however when we create the device FD
> > directly by opening a cdev it will be opened in the blocked state.
> >
> > The reason for the inbetween state is that userspace only gets a FD but
> > doesn't gain access permission until binding the FD to an iommufd. So in
> > the blocked state, only the bind operation is allowed. Completing bind
> > will allow user to further access the device.
> >
> > This is implemented by adding a flag in struct vfio_device_file to mark
> > the blocked state and using a simple smp_load_acquire() to obtain the
> > flag value and serialize all the device setup with the thread accessing
> > this device.
> >
> > Following this lockless scheme, it can safely handle the device FD
> > unbound->bound but it cannot handle bound->unbound. To allow this we'd
> > need to add a lock on all the vfio ioctls which seems costly. So once
> > device FD is bound, it remains bound until the FD is closed.
> >
> > Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
> > Reviewed-by: Kevin Tian <kevin.tian@intel.com>
> > Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
> > Reviewed-by: Eric Auger <eric.auger@redhat.com>
> > Tested-by: Terrence Xu <terrence.xu@intel.com>
> > Tested-by: Nicolin Chen <nicolinc@nvidia.com>
> > Tested-by: Matthew Rosato <mjrosato@linux.ibm.com>
> > Tested-by: Yanting Jiang <yanting.jiang@intel.com>
> > Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
> > Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> > ---
> >  drivers/vfio/group.c     | 11 ++++++++++-
> >  drivers/vfio/vfio.h      |  1 +
> >  drivers/vfio/vfio_main.c | 16 ++++++++++++++++
> >  3 files changed, 27 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/vfio/group.c b/drivers/vfio/group.c
> > index caf53716ddb2..088dd34c8931 100644
> > --- a/drivers/vfio/group.c
> > +++ b/drivers/vfio/group.c
> > @@ -194,9 +194,18 @@ static int vfio_df_group_open(struct vfio_device_file *df)
> >  	df->iommufd = device->group->iommufd;
> >
> >  	ret = vfio_df_open(df);
> > -	if (ret)
> > +	if (ret) {
> >  		df->iommufd = NULL;
> > +		goto out_put_kvm;
> > +	}
> > +
> > +	/*
> > +	 * Paired with smp_load_acquire() in vfio_device_fops::ioctl/
> > +	 * read/write/mmap and vfio_file_has_device_access()
> > +	 */
> > +	smp_store_release(&df->access_granted, true);
> >
> > +out_put_kvm:
> >  	if (device->open_count == 0)
> >  		vfio_device_put_kvm(device);
> >
> > diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
> > index f9eb52eb9ed7..fdf2fc73f880 100644
> > --- a/drivers/vfio/vfio.h
> > +++ b/drivers/vfio/vfio.h
> > @@ -18,6 +18,7 @@ struct vfio_container;
> >
> >  struct vfio_device_file {
> >  	struct vfio_device *device;
> > +	bool access_granted;
> 
> Should we make this a more strongly defined data type and later move
> devid (u32) here to partially fill the hole created?

Before answering, let me describe how I ordered the fields of this
structure, to check whether it follows common practice. The first two
fields are static, so they come first. access_granted is accessed
locklessly, while the other fields are protected by locks, so I tried
to keep each lock close to the fields it protects. That is why devid
sits after iommufd: both are protected by the same lock.

struct vfio_device_file {
        struct vfio_device *device;
        struct vfio_group *group;

        bool access_granted;
        spinlock_t kvm_ref_lock; /* protect kvm field */
        struct kvm *kvm;
        struct iommufd_ctx *iommufd; /* protected by struct vfio_device_set::lock */
        u32 devid; /* only valid when iommufd is valid */
};
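
To see where the padding lands, a small userspace sketch may help. It
is only an illustration under assumptions: the pointer fields are plain
void pointers, demo_spinlock_t is a 4-byte stand-in (the real spinlock_t
size depends on the kernel config), and all names are hypothetical. It
compares the order above with one where devid sits next to a u8 flag:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for spinlock_t; its real size depends on the kernel config. */
typedef struct { uint32_t raw; } demo_spinlock_t;

/* The fixed-width flag is exactly one byte, unlike an unspecified bool. */
static_assert(sizeof(uint8_t) == 1, "u8 flag is exactly one byte");

/* Field order as described above, flag stored as a u8. */
struct layout_a {
	void *device;
	void *group;
	uint8_t access_granted;
	demo_spinlock_t kvm_ref_lock;
	void *kvm;
	void *iommufd;
	uint32_t devid;
};

/* Same fields with devid moved up next to the flag. */
struct layout_b {
	void *device;
	void *group;
	uint8_t access_granted;
	uint32_t devid;
	demo_spinlock_t kvm_ref_lock;
	void *kvm;
	void *iommufd;
};

/* Padding bytes between the flag and the field that follows it. */
static size_t hole_after_flag_a(void)
{
	return offsetof(struct layout_a, kvm_ref_lock) -
	       (offsetof(struct layout_a, access_granted) + sizeof(uint8_t));
}

static size_t hole_after_flag_b(void)
{
	return offsetof(struct layout_b, devid) -
	       (offsetof(struct layout_b, access_granted) + sizeof(uint8_t));
}
```

With a 4-byte lock stand-in both orders should come out the same size,
since the lock itself already lands right after the hole. The placement
of devid only starts to matter once the lock is larger and 8-byte
aligned (e.g. with lock debugging enabled), which widens the hole behind
the flag.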

> 
> I think this is being placed towards the front of the data structure
> for cache line locality given this is a hot path for file operations.
> But bool types have an implementation dependent size, making them
> difficult to pack.  Also there will be a tendency to want to make this
> a bit field, which is probably not compatible with the smp lockless
> operations being used here.  We might get in front of these issues if
> we just define it as a u8 now.  Thanks,

I don't quite get why a bit field would be incompatible with smp
lockless operations. Could you elaborate a bit? And should I define
access_granted as u8 or "u8:1"?
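
For context on the question, here is a userspace sketch of the pairing
being discussed, using C11 atomics as an analogue of
smp_store_release()/smp_load_acquire(). It is an assumption-laden
illustration, not the kernel code: struct demo_file and both helpers are
hypothetical. The flag is a whole addressable byte; a bit-field has no
address to hand to such helpers, and neighbouring bit-fields share one
storage unit, so updating one is a read-modify-write of the others.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdint.h>

/* Userspace stand-in for struct vfio_device_file; names are illustrative. */
struct demo_file {
	int device_state;               /* set up before access is granted */
	_Atomic uint8_t access_granted; /* full byte, not a bit-field */
};

/*
 * Publisher: finish the setup, then release-store the flag, analogous
 * to smp_store_release() following vfio_df_open().
 */
static void demo_grant_access(struct demo_file *df, int state)
{
	assert(df);
	df->device_state = state;
	atomic_store_explicit(&df->access_granted, 1, memory_order_release);
}

/*
 * Consumer: acquire-load the flag before touching the device state,
 * analogous to the smp_load_acquire() check in the file operations.
 */
static int demo_read_state(struct demo_file *df, int *out)
{
	assert(df && out);
	if (!atomic_load_explicit(&df->access_granted, memory_order_acquire))
		return -EINVAL;
	*out = df->device_state;
	return 0;
}
```

A reader that observes the flag set is guaranteed to also observe the
state written before the release store, which is the property the
blocked state relies on.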

Regards,
Yi Liu

> 
> >  	spinlock_t kvm_ref_lock; /* protect kvm field */
> >  	struct kvm *kvm;
> >  	struct iommufd_ctx *iommufd; /* protected by struct vfio_device_set::lock */
> > diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
> > index a3c5817fc545..4c8b7713dc3d 100644
> > --- a/drivers/vfio/vfio_main.c
> > +++ b/drivers/vfio/vfio_main.c
> > @@ -1129,6 +1129,10 @@ static long vfio_device_fops_unl_ioctl(struct file *filep,
> >  	struct vfio_device *device = df->device;
> >  	int ret;
> >
> > +	/* Paired with smp_store_release() following vfio_df_open() */
> > +	if (!smp_load_acquire(&df->access_granted))
> > +		return -EINVAL;
> > +
> >  	ret = vfio_device_pm_runtime_get(device);
> >  	if (ret)
> >  		return ret;
> > @@ -1156,6 +1160,10 @@ static ssize_t vfio_device_fops_read(struct file *filep, char __user *buf,
> >  	struct vfio_device_file *df = filep->private_data;
> >  	struct vfio_device *device = df->device;
> >
> > +	/* Paired with smp_store_release() following vfio_df_open() */
> > +	if (!smp_load_acquire(&df->access_granted))
> > +		return -EINVAL;
> > +
> >  	if (unlikely(!device->ops->read))
> >  		return -EINVAL;
> >
> > @@ -1169,6 +1177,10 @@ static ssize_t vfio_device_fops_write(struct file *filep,
> >  	struct vfio_device_file *df = filep->private_data;
> >  	struct vfio_device *device = df->device;
> >
> > +	/* Paired with smp_store_release() following vfio_df_open() */
> > +	if (!smp_load_acquire(&df->access_granted))
> > +		return -EINVAL;
> > +
> >  	if (unlikely(!device->ops->write))
> >  		return -EINVAL;
> >
> > @@ -1180,6 +1192,10 @@ static int vfio_device_fops_mmap(struct file *filep, struct vm_area_struct *vma)
> >  	struct vfio_device_file *df = filep->private_data;
> >  	struct vfio_device *device = df->device;
> >
> > +	/* Paired with smp_store_release() following vfio_df_open() */
> > +	if (!smp_load_acquire(&df->access_granted))
> > +		return -EINVAL;
> > +
> >  	if (unlikely(!device->ops->mmap))
> >  		return -EINVAL;
> >


* RE: [PATCH v12 18/24] vfio: Add VFIO_DEVICE_BIND_IOMMUFD
  2023-06-12 22:27     ` Alex Williamson
@ 2023-06-13  5:48       ` Liu, Yi L
  -1 siblings, 0 replies; 180+ messages in thread
From: Liu, Yi L @ 2023-06-13  5:48 UTC (permalink / raw)
  To: Alex Williamson
  Cc: jgg, Tian, Kevin, joro, robin.murphy, cohuck, eric.auger,
	nicolinc, kvm, mjrosato, chao.p.peng, yi.y.sun, peterx, jasowang,
	shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	intel-gvt-dev, intel-gfx, linux-s390, Hao, Xudong, Zhao, Yan Y,
	Xu, Terrence, Jiang, Yanting, Duan, Zhenzhong, clegoate

> From: Alex Williamson <alex.williamson@redhat.com>
> Sent: Tuesday, June 13, 2023 6:27 AM
> 
> On Fri,  2 Jun 2023 05:16:47 -0700
> Yi Liu <yi.l.liu@intel.com> wrote:
> 
> > This adds ioctl for userspace to bind device cdev fd to iommufd.
> >
> >     VFIO_DEVICE_BIND_IOMMUFD: bind device to an iommufd, hence gain DMA
> > 			      control provided by the iommufd. open_device
> > 			      op is called after bind_iommufd op.
> >
> > Tested-by: Yanting Jiang <yanting.jiang@intel.com>
> > Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
> > Tested-by: Terrence Xu <terrence.xu@intel.com>
> > Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> > ---
> >  drivers/vfio/device_cdev.c | 123 +++++++++++++++++++++++++++++++++++++
> >  drivers/vfio/vfio.h        |  13 ++++
> >  drivers/vfio/vfio_main.c   |   5 ++
> >  include/linux/vfio.h       |   3 +-
> >  include/uapi/linux/vfio.h  |  27 ++++++++
> >  5 files changed, 170 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/vfio/device_cdev.c b/drivers/vfio/device_cdev.c
> > index 1c640016a824..a4498ddbe774 100644
> > --- a/drivers/vfio/device_cdev.c
> > +++ b/drivers/vfio/device_cdev.c
> > @@ -3,6 +3,7 @@
> >   * Copyright (c) 2023 Intel Corporation.
> >   */
> >  #include <linux/vfio.h>
> > +#include <linux/iommufd.h>
> >
> >  #include "vfio.h"
> >
> > @@ -44,6 +45,128 @@ int vfio_device_fops_cdev_open(struct inode *inode, struct file *filep)
> >  	return ret;
> >  }
> >
> > +static void vfio_device_get_kvm_safe(struct vfio_device_file *df)
> > +{
> > +	spin_lock(&df->kvm_ref_lock);
> > +	if (df->kvm)
> > +		_vfio_device_get_kvm_safe(df->device, df->kvm);
> > +	spin_unlock(&df->kvm_ref_lock);
> > +}
> > +
> > +void vfio_df_cdev_close(struct vfio_device_file *df)
> > +{
> > +	struct vfio_device *device = df->device;
> > +
> > +	/*
> > +	 * In the time of close, there is no contention with another one
> > +	 * changing this flag.  So read df->access_granted without lock
> > +	 * and no smp_load_acquire() is ok.
> > +	 */
> > +	if (!df->access_granted)
> > +		return;
> > +
> > +	mutex_lock(&device->dev_set->lock);
> > +	vfio_df_close(df);
> > +	vfio_device_put_kvm(device);
> > +	iommufd_ctx_put(df->iommufd);
> > +	device->cdev_opened = false;
> > +	mutex_unlock(&device->dev_set->lock);
> > +	vfio_device_unblock_group(device);
> > +}
> > +
> > +static struct iommufd_ctx *vfio_get_iommufd_from_fd(int fd)
> > +{
> > +	struct iommufd_ctx *iommufd;
> > +	struct fd f;
> > +
> > +	f = fdget(fd);
> > +	if (!f.file)
> > +		return ERR_PTR(-EBADF);
> > +
> > +	iommufd = iommufd_ctx_from_file(f.file);
> > +
> > +	fdput(f);
> > +	return iommufd;
> > +}
> > +
> > +long vfio_df_ioctl_bind_iommufd(struct vfio_device_file *df,
> > +				struct vfio_device_bind_iommufd __user *arg)
> > +{
> > +	struct vfio_device *device = df->device;
> > +	struct vfio_device_bind_iommufd bind;
> > +	unsigned long minsz;
> > +	int ret;
> > +
> > +	static_assert(__same_type(arg->out_devid, df->devid));
> > +
> > +	minsz = offsetofend(struct vfio_device_bind_iommufd, out_devid);
> > +
> > +	if (copy_from_user(&bind, arg, minsz))
> > +		return -EFAULT;
> > +
> > +	if (bind.argsz < minsz || bind.flags || bind.iommufd < 0)
> > +		return -EINVAL;
> > +
> > +	/* BIND_IOMMUFD only allowed for cdev fds */
> > +	if (df->group)
> > +		return -EINVAL;
> > +
> > +	ret = vfio_device_block_group(device);
> > +	if (ret)
> > +		return ret;
> > +
> > +	mutex_lock(&device->dev_set->lock);
> > +	/* one device cannot be bound twice */
> > +	if (df->access_granted) {
> > +		ret = -EINVAL;
> > +		goto out_unlock;
> > +	}
> > +
> > +	df->iommufd = vfio_get_iommufd_from_fd(bind.iommufd);
> > +	if (IS_ERR(df->iommufd)) {
> > +		ret = PTR_ERR(df->iommufd);
> > +		df->iommufd = NULL;
> > +		goto out_unlock;
> > +	}
> > +
> > +	/*
> > +	 * Before the device open, get the KVM pointer currently
> > +	 * associated with the device file (if there is) and obtain
> > +	 * a reference.  This reference is held until device closed.
> > +	 * Save the pointer in the device for use by drivers.
> > +	 */
> > +	vfio_device_get_kvm_safe(df);
> > +
> > +	ret = vfio_df_open(df);
> > +	if (ret)
> > +		goto out_put_kvm;
> > +
> > +	ret = copy_to_user(&arg->out_devid, &df->devid,
> > +			   sizeof(df->devid)) ? -EFAULT : 0;
> > +	if (ret)
> > +		goto out_close_device;
> > +
> > +	/*
> > +	 * Paired with smp_load_acquire() in vfio_device_fops::ioctl/
> > +	 * read/write/mmap
> > +	 */
> > +	smp_store_release(&df->access_granted, true);
> > +	device->cdev_opened = true;
> > +	mutex_unlock(&device->dev_set->lock);
> > +	return 0;
> > +
> > +out_close_device:
> > +	vfio_df_close(df);
> > +out_put_kvm:
> > +	vfio_device_put_kvm(device);
> > +	iommufd_ctx_put(df->iommufd);
> > +	df->iommufd = NULL;
> > +out_unlock:
> > +	mutex_unlock(&device->dev_set->lock);
> > +	vfio_device_unblock_group(device);
> > +	return ret;
> > +}
> > +
> >  static char *vfio_device_devnode(const struct device *dev, umode_t *mode)
> >  {
> >  	return kasprintf(GFP_KERNEL, "vfio/devices/%s", dev_name(dev));
> > diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
> > index d12b5b524bfc..42de40d2cd4d 100644
> > --- a/drivers/vfio/vfio.h
> > +++ b/drivers/vfio/vfio.h
> > @@ -287,6 +287,9 @@ static inline void vfio_device_del(struct vfio_device *device)
> >  }
> >
> >  int vfio_device_fops_cdev_open(struct inode *inode, struct file *filep);
> > +void vfio_df_cdev_close(struct vfio_device_file *df);
> > +long vfio_df_ioctl_bind_iommufd(struct vfio_device_file *df,
> > +				struct vfio_device_bind_iommufd __user *arg);
> >  int vfio_cdev_init(struct class *device_class);
> >  void vfio_cdev_cleanup(void);
> >  #else
> > @@ -310,6 +313,16 @@ static inline int vfio_device_fops_cdev_open(struct inode *inode,
> >  	return 0;
> >  }
> >
> > +static inline void vfio_df_cdev_close(struct vfio_device_file *df)
> > +{
> > +}
> > +
> > +static inline long vfio_df_ioctl_bind_iommufd(struct vfio_device_file *df,
> > +					      struct vfio_device_bind_iommufd __user *arg)
> > +{
> > +	return -EOPNOTSUPP;
> > +}
> > +
> >  static inline int vfio_cdev_init(struct class *device_class)
> >  {
> >  	return 0;
> > diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
> > index ef55af75f459..9ba4d420eda2 100644
> > --- a/drivers/vfio/vfio_main.c
> > +++ b/drivers/vfio/vfio_main.c
> > @@ -572,6 +572,8 @@ static int vfio_device_fops_release(struct inode *inode, struct file *filep)
> >
> >  	if (df->group)
> >  		vfio_df_group_close(df);
> > +	else
> > +		vfio_df_cdev_close(df);
> >
> >  	vfio_device_put_registration(device);
> >
> > @@ -1145,6 +1147,9 @@ static long vfio_device_fops_unl_ioctl(struct file *filep,
> >  	struct vfio_device *device = df->device;
> >  	int ret;
> >
> > +	if (cmd == VFIO_DEVICE_BIND_IOMMUFD)
> > +		return vfio_df_ioctl_bind_iommufd(df, (void __user *)arg);
> > +
> >  	/* Paired with smp_store_release() following vfio_df_open() */
> >  	if (!smp_load_acquire(&df->access_granted))
> >  		return -EINVAL;
> > diff --git a/include/linux/vfio.h b/include/linux/vfio.h
> > index 83cc5dc28b7a..e80a8ac86e46 100644
> > --- a/include/linux/vfio.h
> > +++ b/include/linux/vfio.h
> > @@ -66,6 +66,7 @@ struct vfio_device {
> >  	struct iommufd_device *iommufd_device;
> >  	bool iommufd_attached;
> >  #endif
> > +	bool cdev_opened:1;
> 
> Perhaps a more strongly defined data type here as well and roll
> iommufd_attached into the same bit field scheme.

OK, then iommufd_attached needs to be always defined, i.e. moved out
of the #if block.
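
A minimal sketch of what such a bit field scheme could look like,
assuming a fixed-width one-byte base type. The struct and helper names
here are hypothetical and show only the two flags, not the rest of
struct vfio_device. Unlike access_granted, cdev_opened is only written
under dev_set->lock in the quoted patch, so a bit field needs no
acquire/release pairing there.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical subset of struct vfio_device: the two flags only. */
struct vfio_device_flags_sketch {
	uint8_t iommufd_attached:1;	/* always present, no #if guard */
	uint8_t cdev_opened:1;
};

/* Counterpart of vfio_device_cdev_opened() for the sketch struct. */
static int cdev_opened(const struct vfio_device_flags_sketch *f)
{
	return f->cdev_opened;
}

/* Both flags share one byte with GCC/Clang; other compilers may differ. */
static_assert(sizeof(struct vfio_device_flags_sketch) == 1,
	      "flags should pack into one byte");
```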

> 
> >  };
> >
> >  /**
> > @@ -170,7 +171,7 @@ vfio_iommufd_device_hot_reset_devid(struct vfio_device *vdev,
> >
> >  static inline bool vfio_device_cdev_opened(struct vfio_device *device)
> >  {
> > -	return false;
> > +	return device->cdev_opened;
> >  }
> >
> >  /**
> > diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> > index f753124e1c82..7296012b7f36 100644
> > --- a/include/uapi/linux/vfio.h
> > +++ b/include/uapi/linux/vfio.h
> > @@ -194,6 +194,33 @@ struct vfio_group_status {
> >
> >  /* --------------- IOCTLs for DEVICE file descriptors --------------- */
> >
> > +/*
> > + * VFIO_DEVICE_BIND_IOMMUFD - _IOR(VFIO_TYPE, VFIO_BASE + 18,
> > + *				   struct vfio_device_bind_iommufd)
> > + * @argsz:	 User filled size of this data.
> > + * @flags:	 Must be 0.
> > + * @iommufd:	 iommufd to bind.
> > + * @out_devid:	 The device id generated by this bind. devid is a handle for
> > + *		 this device/iommufd bond and can be used in IOMMUFD commands.
> > + *
> > + * Bind a vfio_device to the specified iommufd.
> > + *
> > + * User is restricted from accessing the device before the binding operation
> > + * is completed.
> > + *
> > + * Unbind is automatically conducted when device fd is closed.
> > + *
> > + * Return: 0 on success, -errno on failure.
> > + */
> > +struct vfio_device_bind_iommufd {
> > +	__u32		argsz;
> > +	__u32		flags;
> > +	__s32		iommufd;
> > +	__u32		out_devid;
> > +};
> > +
> > +#define VFIO_DEVICE_BIND_IOMMUFD	_IO(VFIO_TYPE, VFIO_BASE + 18)
> > +
> 
> Why are we still defining device ioctls 18-20 before existing device
> ioctls?  18 should be defined after 17...  Thanks,

Yes. I put it here because it is supposed to be the first ioctl that can
be issued on a cdev fd. But you are right, the ioctls should be ordered
by offset.
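
For context, a userspace sketch of how the bind ioctl quoted above
might be issued. The struct is a local mirror of the uAPI struct in the
patch, and the SKETCH_* names and bind_to_iommufd() are hypothetical.
The ioctl number follows the patch (VFIO_BASE + 18) with VFIO_TYPE and
VFIO_BASE taken from <linux/vfio.h>. It only illustrates the calling
convention and is not a tested client.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>

/* Local mirror of struct vfio_device_bind_iommufd from the quoted patch. */
struct bind_iommufd_sketch {
	uint32_t argsz;
	uint32_t flags;
	int32_t  iommufd;
	uint32_t out_devid;
};

static_assert(sizeof(struct bind_iommufd_sketch) == 16,
	      "four 32-bit fields, no padding");

/* VFIO_TYPE and VFIO_BASE as in <linux/vfio.h>; offset 18 as in the patch. */
#define SKETCH_VFIO_TYPE	(';')
#define SKETCH_VFIO_BASE	100
#define SKETCH_BIND_IOMMUFD	_IO(SKETCH_VFIO_TYPE, SKETCH_VFIO_BASE + 18)

/* Returns 0 and fills *devid on success, -errno on failure. */
static int bind_to_iommufd(int device_fd, int iommufd, uint32_t *devid)
{
	struct bind_iommufd_sketch bind;

	memset(&bind, 0, sizeof(bind));
	bind.argsz = sizeof(bind);	/* flags must stay 0 */
	bind.iommufd = iommufd;

	if (ioctl(device_fd, SKETCH_BIND_IOMMUFD, &bind) < 0)
		return -errno;

	*devid = bind.out_devid;
	return 0;
}
```

Here device_fd would come from opening a node under /dev/vfio/devices/
(the exact node name depends on the device) and iommufd from opening
/dev/iommu. Until the bind succeeds, the other ioctls, read, write and
mmap on the device fd are expected to fail with -EINVAL.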

Regards,
Yi Liu

> Alex
> 
> >  /**
> >   * VFIO_DEVICE_GET_INFO - _IOR(VFIO_TYPE, VFIO_BASE + 7,
> >   *						struct vfio_device_info)


* RE: [PATCH v12 21/24] vfio: Determine noiommu device in __vfio_register_dev()
  2023-06-12 22:42     ` Alex Williamson
@ 2023-06-13  5:53       ` Liu, Yi L
  -1 siblings, 0 replies; 180+ messages in thread
From: Liu, Yi L @ 2023-06-13  5:53 UTC (permalink / raw)
  To: Alex Williamson
  Cc: jgg, Tian, Kevin, joro, robin.murphy, cohuck, eric.auger,
	nicolinc, kvm, mjrosato, chao.p.peng, yi.y.sun, peterx, jasowang,
	shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	intel-gvt-dev, intel-gfx, linux-s390, Hao, Xudong, Zhao, Yan Y,
	Xu, Terrence, Jiang, Yanting, Duan, Zhenzhong, clegoate

> From: Alex Williamson <alex.williamson@redhat.com>
> Sent: Tuesday, June 13, 2023 6:42 AM
> 
> On Fri,  2 Jun 2023 05:16:50 -0700
> Yi Liu <yi.l.liu@intel.com> wrote:
> 
> > This moves the noiommu device determination and noiommu taint out of
> > vfio_group_find_or_alloc(). noiommu device is determined in
> > __vfio_register_dev() and result is stored in flag vfio_device->noiommu,
> > the noiommu taint is added in the end of __vfio_register_dev().
> >
> > This is also a preparation for compiling out vfio_group infrastructure
> > as it makes the noiommu detection and taint common between the cdev path
> > and group path though cdev path does not support noiommu.
> 
> Does this really still make sense?  The motivation for the change is
> really not clear without cdev support for noiommu.  Thanks,

I think it still makes sense. When CONFIG_VFIO_GROUP==n, the kernel
only supports the cdev interface. If a device is noiommu, vfio should
fail its registration, so the noiommu determination is still needed.
But I admit the taint could stay in the group code.
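
To make the argument concrete, a small userspace model of the decision.
The struct, the group_enabled parameter and the -EINVAL returned for
the cdev-only case are assumptions for illustration. Only set_noiommu()
mirrors the vfio_device_set_noiommu() logic in the quoted patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model; the real code queries iommu_group_get(). */
struct dev_sketch {
	bool has_iommu_group;
	bool noiommu;
};

/* Mirrors the decision in the quoted vfio_device_set_noiommu(). */
static int set_noiommu(struct dev_sketch *dev, bool vfio_noiommu)
{
	if (!dev->has_iommu_group && !vfio_noiommu)
		return -EINVAL;
	dev->noiommu = !dev->has_iommu_group;
	return 0;
}

/* Registration as argued above: a cdev-only kernel rejects noiommu devices. */
static int register_dev(struct dev_sketch *dev, bool vfio_noiommu,
			bool group_enabled)
{
	int ret = set_noiommu(dev, vfio_noiommu);

	if (ret)
		return ret;
	if (dev->noiommu && !group_enabled)
		return -EINVAL;	/* error code is an assumption */
	return 0;
}

static void self_check(void)
{
	struct dev_sketch d = { .has_iommu_group = false };

	/* noiommu device on a kernel without group support: rejected */
	assert(register_dev(&d, true, false) == -EINVAL);
}
```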

Regards,
Yi Liu

> Alex
> 
> > Suggested-by: Alex Williamson <alex.williamson@redhat.com>
> > Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> > ---
> >  drivers/vfio/group.c     | 15 ---------------
> >  drivers/vfio/vfio_main.c | 31 ++++++++++++++++++++++++++++++-
> >  include/linux/vfio.h     |  1 +
> >  3 files changed, 31 insertions(+), 16 deletions(-)
> >
> > diff --git a/drivers/vfio/group.c b/drivers/vfio/group.c
> > index 653b62f93474..64cdd0ea8825 100644
> > --- a/drivers/vfio/group.c
> > +++ b/drivers/vfio/group.c
> > @@ -668,21 +668,6 @@ static struct vfio_group *vfio_group_find_or_alloc(struct device *dev)
> >  	struct vfio_group *group;
> >
> >  	iommu_group = iommu_group_get(dev);
> > -	if (!iommu_group && vfio_noiommu) {
> > -		/*
> > -		 * With noiommu enabled, create an IOMMU group for devices that
> > -		 * don't already have one, implying no IOMMU hardware/driver
> > -		 * exists.  Taint the kernel because we're about to give a DMA
> > -		 * capable device to a user without IOMMU protection.
> > -		 */
> > -		group = vfio_noiommu_group_alloc(dev, VFIO_NO_IOMMU);
> > -		if (!IS_ERR(group)) {
> > -			add_taint(TAINT_USER, LOCKDEP_STILL_OK);
> > -			dev_warn(dev, "Adding kernel taint for vfio-noiommu group on device\n");
> > -		}
> > -		return group;
> > -	}
> > -
> >  	if (!iommu_group)
> >  		return ERR_PTR(-EINVAL);
> >
> > diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
> > index 6d8f9b0f3637..00a699b9f76b 100644
> > --- a/drivers/vfio/vfio_main.c
> > +++ b/drivers/vfio/vfio_main.c
> > @@ -265,6 +265,18 @@ static int vfio_init_device(struct vfio_device *device, struct device *dev,
> >  	return ret;
> >  }
> >
> > +static int vfio_device_set_noiommu(struct vfio_device *device)
> > +{
> > +	struct iommu_group *iommu_group = iommu_group_get(device->dev);
> > +
> > +	if (!iommu_group && !vfio_noiommu)
> > +		return -EINVAL;
> > +
> > +	device->noiommu = !iommu_group;
> > +	iommu_group_put(iommu_group); /* Accepts NULL */
> > +	return 0;
> > +}
> > +
> >  static int __vfio_register_dev(struct vfio_device *device,
> >  			       enum vfio_group_type type)
> >  {
> > @@ -277,6 +289,13 @@ static int __vfio_register_dev(struct vfio_device *device,
> >  		     !device->ops->detach_ioas)))
> >  		return -EINVAL;
> >
> > +	/* Only physical devices can be noiommu device */
> > +	if (type == VFIO_IOMMU) {
> > +		ret = vfio_device_set_noiommu(device);
> > +		if (ret)
> > +			return ret;
> > +	}
> > +
> >  	/*
> >  	 * If the driver doesn't specify a set then the device is added to a
> >  	 * singleton set just for itself.
> > @@ -288,7 +307,8 @@ static int __vfio_register_dev(struct vfio_device *device,
> >  	if (ret)
> >  		return ret;
> >
> > -	ret = vfio_device_set_group(device, type);
> > +	ret = vfio_device_set_group(device,
> > +				    device->noiommu ? VFIO_NO_IOMMU : type);
> >  	if (ret)
> >  		return ret;
> >
> > @@ -301,6 +321,15 @@ static int __vfio_register_dev(struct vfio_device *device,
> >
> >  	vfio_device_group_register(device);
> >
> > +	if (device->noiommu) {
> > +		/*
> > +		 * noiommu devices have no IOMMU hardware/driver.  Taint the
> > +		 * kernel because we're about to give a DMA capable device to
> > +		 * a user without IOMMU protection.
> > +		 */
> > +		add_taint(TAINT_USER, LOCKDEP_STILL_OK);
> > +		dev_warn(device->dev, "Adding kernel taint for vfio-noiommu on device\n");
> > +	}
> >  	return 0;
> >  err_out:
> >  	vfio_device_remove_group(device);
> > diff --git a/include/linux/vfio.h b/include/linux/vfio.h
> > index e80a8ac86e46..183e620009e7 100644
> > --- a/include/linux/vfio.h
> > +++ b/include/linux/vfio.h
> > @@ -67,6 +67,7 @@ struct vfio_device {
> >  	bool iommufd_attached;
> >  #endif
> >  	bool cdev_opened:1;
> > +	bool noiommu:1;
> >  };
> >
> >  /**


^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [Intel-gfx] [PATCH v12 21/24] vfio: Determine noiommu device in __vfio_register_dev()
@ 2023-06-13  5:53       ` Liu, Yi L
  0 siblings, 0 replies; 180+ messages in thread
From: Liu, Yi L @ 2023-06-13  5:53 UTC (permalink / raw)
  To: Alex Williamson
  Cc: mjrosato, jasowang, Hao, Xudong, Duan,  Zhenzhong, peterx, Xu,
	Terrence, chao.p.peng, linux-s390, kvm, lulu, Jiang, Yanting,
	joro, nicolinc, jgg, Tian, Kevin, Zhao, Yan Y, intel-gfx,
	eric.auger, intel-gvt-dev, yi.y.sun, clegoate, cohuck,
	shameerali.kolothum.thodi, suravee.suthikulpanit, robin.murphy

> From: Alex Williamson <alex.williamson@redhat.com>
> Sent: Tuesday, June 13, 2023 6:42 AM
> 
> On Fri,  2 Jun 2023 05:16:50 -0700
> Yi Liu <yi.l.liu@intel.com> wrote:
> 
> > This moves the noiommu device determination and noiommu taint out of
> > vfio_group_find_or_alloc(). noiommu device is determined in
> > __vfio_register_dev() and result is stored in flag vfio_device->noiommu,
> > the noiommu taint is added in the end of __vfio_register_dev().
> >
> > This is also a preparation for compiling out vfio_group infrastructure
> > as it makes the noiommu detection and taint common between the cdev path
> > and group path though cdev path does not support noiommu.
> 
> Does this really still make sense?  The motivation for the change is
> really not clear without cdev support for noiommu.  Thanks,

I think it still makes sense. When CONFIG_VFIO_GROUP==n, the kernel
only supports the cdev interface. If there is a noiommu device, vfio
should fail its registration, so the noiommu determination is still
needed. But I admit the taint could stay in the group code.
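
For illustration only, a minimal userspace sketch of that determination.
It is not the kernel code: every demo_ name is hypothetical, the
noiommu_option argument stands in for the vfio_noiommu module option,
and group_api_enabled stands in for CONFIG_VFIO_GROUP. It also ignores
the restriction that only physical (VFIO_IOMMU type) devices are checked.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the bits of struct vfio_device used here. */
struct demo_device {
	bool has_iommu_group;	/* models iommu_group_get() != NULL */
	bool noiommu;		/* result of the determination */
};

/* Mirrors the shape of vfio_device_set_noiommu() in the quoted patch. */
static int demo_set_noiommu(struct demo_device *dev, bool noiommu_option)
{
	/* No IOMMU group and noiommu mode not enabled: refuse the device. */
	if (!dev->has_iommu_group && !noiommu_option)
		return -EINVAL;

	dev->noiommu = !dev->has_iommu_group;
	return 0;
}

/*
 * Assumption taken from the reply above: with only the cdev interface
 * available, a noiommu device cannot be served, so registration fails.
 */
static int demo_register(struct demo_device *dev, bool noiommu_option,
			 bool group_api_enabled)
{
	int ret = demo_set_noiommu(dev, noiommu_option);

	if (ret)
		return ret;
	if (dev->noiommu && !group_api_enabled)
		return -EINVAL;
	return 0;
}
```

The point of the sketch is that the determination is useful even when
the group code is compiled out, because it is what lets registration
reject a device that the cdev path cannot serve.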

Regards,
Yi Liu

> Alex
> 
> > Suggested-by: Alex Williamson <alex.williamson@redhat.com>
> > Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> > ---
> >  drivers/vfio/group.c     | 15 ---------------
> >  drivers/vfio/vfio_main.c | 31 ++++++++++++++++++++++++++++++-
> >  include/linux/vfio.h     |  1 +
> >  3 files changed, 31 insertions(+), 16 deletions(-)
> >
> > diff --git a/drivers/vfio/group.c b/drivers/vfio/group.c
> > index 653b62f93474..64cdd0ea8825 100644
> > --- a/drivers/vfio/group.c
> > +++ b/drivers/vfio/group.c
> > @@ -668,21 +668,6 @@ static struct vfio_group *vfio_group_find_or_alloc(struct device *dev)
> >  	struct vfio_group *group;
> >
> >  	iommu_group = iommu_group_get(dev);
> > -	if (!iommu_group && vfio_noiommu) {
> > -		/*
> > -		 * With noiommu enabled, create an IOMMU group for devices that
> > -		 * don't already have one, implying no IOMMU hardware/driver
> > -		 * exists.  Taint the kernel because we're about to give a DMA
> > -		 * capable device to a user without IOMMU protection.
> > -		 */
> > -		group = vfio_noiommu_group_alloc(dev, VFIO_NO_IOMMU);
> > -		if (!IS_ERR(group)) {
> > -			add_taint(TAINT_USER, LOCKDEP_STILL_OK);
> > -			dev_warn(dev, "Adding kernel taint for vfio-noiommu group on device\n");
> > -		}
> > -		return group;
> > -	}
> > -
> >  	if (!iommu_group)
> >  		return ERR_PTR(-EINVAL);
> >
> > diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
> > index 6d8f9b0f3637..00a699b9f76b 100644
> > --- a/drivers/vfio/vfio_main.c
> > +++ b/drivers/vfio/vfio_main.c
> > @@ -265,6 +265,18 @@ static int vfio_init_device(struct vfio_device *device, struct device *dev,
> >  	return ret;
> >  }
> >
> > +static int vfio_device_set_noiommu(struct vfio_device *device)
> > +{
> > +	struct iommu_group *iommu_group = iommu_group_get(device->dev);
> > +
> > +	if (!iommu_group && !vfio_noiommu)
> > +		return -EINVAL;
> > +
> > +	device->noiommu = !iommu_group;
> > +	iommu_group_put(iommu_group); /* Accepts NULL */
> > +	return 0;
> > +}
> > +
> >  static int __vfio_register_dev(struct vfio_device *device,
> >  			       enum vfio_group_type type)
> >  {
> > @@ -277,6 +289,13 @@ static int __vfio_register_dev(struct vfio_device *device,
> >  		     !device->ops->detach_ioas)))
> >  		return -EINVAL;
> >
> > +	/* Only physical devices can be noiommu device */
> > +	if (type == VFIO_IOMMU) {
> > +		ret = vfio_device_set_noiommu(device);
> > +		if (ret)
> > +			return ret;
> > +	}
> > +
> >  	/*
> >  	 * If the driver doesn't specify a set then the device is added to a
> >  	 * singleton set just for itself.
> > @@ -288,7 +307,8 @@ static int __vfio_register_dev(struct vfio_device *device,
> >  	if (ret)
> >  		return ret;
> >
> > -	ret = vfio_device_set_group(device, type);
> > +	ret = vfio_device_set_group(device,
> > +				    device->noiommu ? VFIO_NO_IOMMU : type);
> >  	if (ret)
> >  		return ret;
> >
> > @@ -301,6 +321,15 @@ static int __vfio_register_dev(struct vfio_device *device,
> >
> >  	vfio_device_group_register(device);
> >
> > +	if (device->noiommu) {
> > +		/*
> > +		 * noiommu devices have no IOMMU hardware/driver.  Taint the
> > +		 * kernel because we're about to give a DMA capable device to
> > +		 * a user without IOMMU protection.
> > +		 */
> > +		add_taint(TAINT_USER, LOCKDEP_STILL_OK);
> > +		dev_warn(device->dev, "Adding kernel taint for vfio-noiommu on device\n");
> > +	}
> >  	return 0;
> >  err_out:
> >  	vfio_device_remove_group(device);
> > diff --git a/include/linux/vfio.h b/include/linux/vfio.h
> > index e80a8ac86e46..183e620009e7 100644
> > --- a/include/linux/vfio.h
> > +++ b/include/linux/vfio.h
> > @@ -67,6 +67,7 @@ struct vfio_device {
> >  	bool iommufd_attached;
> >  #endif
> >  	bool cdev_opened:1;
> > +	bool noiommu:1;
> >  };
> >
> >  /**


^ permalink raw reply	[flat|nested] 180+ messages in thread

* RE: [PATCH v12 20/24] vfio: Only check group->type for noiommu test
  2023-06-12 22:37     ` [Intel-gfx] " Alex Williamson
@ 2023-06-13  9:20       ` Liu, Yi L
  -1 siblings, 0 replies; 180+ messages in thread
From: Liu, Yi L @ 2023-06-13  9:20 UTC (permalink / raw)
  To: Alex Williamson
  Cc: jgg, Tian, Kevin, joro, robin.murphy, cohuck, eric.auger,
	nicolinc, kvm, mjrosato, chao.p.peng, yi.y.sun, peterx, jasowang,
	shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	intel-gvt-dev, intel-gfx, linux-s390, Hao, Xudong, Zhao, Yan Y,
	Xu, Terrence, Jiang, Yanting, Duan, Zhenzhong, clegoate

> From: Alex Williamson <alex.williamson@redhat.com>
> Sent: Tuesday, June 13, 2023 6:38 AM
> On Fri,  2 Jun 2023 05:16:49 -0700
> Yi Liu <yi.l.liu@intel.com> wrote:
> 
> > group->type can be VFIO_NO_IOMMU only when vfio_noiommu option is true.
> > And vfio_noiommu option can only be true if CONFIG_VFIO_NOIOMMU is enabled.
> > So checking group->type is enough when testing noiommu.
> >
> > Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> > ---
> >  drivers/vfio/group.c | 3 +--
> >  drivers/vfio/vfio.h  | 3 +--
> >  2 files changed, 2 insertions(+), 4 deletions(-)
> >
> > diff --git a/drivers/vfio/group.c b/drivers/vfio/group.c
> > index 41a09a2df690..653b62f93474 100644
> > --- a/drivers/vfio/group.c
> > +++ b/drivers/vfio/group.c
> > @@ -133,8 +133,7 @@ static int vfio_group_ioctl_set_container(struct vfio_group *group,
> >
> >  	iommufd = iommufd_ctx_from_file(f.file);
> >  	if (!IS_ERR(iommufd)) {
> > -		if (IS_ENABLED(CONFIG_VFIO_NOIOMMU) &&
> > -		    group->type == VFIO_NO_IOMMU)
> > +		if (group->type == VFIO_NO_IOMMU)
> >  			ret = iommufd_vfio_compat_set_no_iommu(iommufd);
> >  		else
> >  			ret = iommufd_vfio_compat_ioas_create(iommufd);
> > diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
> > index 5835c74e97ce..1b89e8bc8571 100644
> > --- a/drivers/vfio/vfio.h
> > +++ b/drivers/vfio/vfio.h
> > @@ -108,8 +108,7 @@ void vfio_group_cleanup(void);
> >
> >  static inline bool vfio_device_is_noiommu(struct vfio_device *vdev)
> >  {
> > -	return IS_ENABLED(CONFIG_VFIO_NOIOMMU) &&
> > -	       vdev->group->type == VFIO_NO_IOMMU;
> > +	return vdev->group->type == VFIO_NO_IOMMU;
> >  }
> >
> >  #if IS_ENABLED(CONFIG_VFIO_CONTAINER)
> 
> This patch should be dropped.  It's logically correct, but ignores that
> the config option can be determined at compile time and therefore the
> code can be better optimized based on that test.  I think there was a
> specific case where I questioned it, but this drops an otherwise valid
> compiler optimization.  Thanks,

Yes. In v11, you mentioned the compiler optimization and the fact that
vfio_noiommu can only be true when CONFIG_VFIO_NOIOMMU is enabled. I'm
OK with dropping this patch to keep the compiler optimization.
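
To make the optimization concrete, here is a small standalone sketch.
It is not kernel code: DEMO_NOIOMMU_ENABLED is a hypothetical macro
modeling IS_ENABLED(CONFIG_VFIO_NOIOMMU) with the option turned off, and
the demo_ names are invented for illustration.

```c
#include <stdbool.h>

/* Hypothetical compile-time switch; 0 models the config option being off. */
#define DEMO_NOIOMMU_ENABLED 0

enum demo_group_type { DEMO_IOMMU, DEMO_EMULATED_IOMMU, DEMO_NO_IOMMU };

/*
 * With the constant in front, the whole expression is known to be false
 * at compile time when the option is off, so the compiler is free to
 * drop the comparison and any noiommu-only branches in callers.
 */
static inline bool demo_is_noiommu(enum demo_group_type type)
{
	return DEMO_NOIOMMU_ENABLED && type == DEMO_NO_IOMMU;
}

/*
 * Without the constant, the test must be evaluated at run time even in
 * builds where the type can never actually be DEMO_NO_IOMMU.
 */
static inline bool demo_is_noiommu_runtime_only(enum demo_group_type type)
{
	return type == DEMO_NO_IOMMU;
}
```

Both helpers give the same answer for every value that can occur in
practice; the difference is only in how much code the compiler can
discard when the option is disabled.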

Regards,
Yi Liu

^ permalink raw reply	[flat|nested] 180+ messages in thread

* RE: [PATCH v12 24/24] docs: vfio: Add vfio device cdev description
  2023-06-12 23:06     ` Alex Williamson
@ 2023-06-13 12:01       ` Liu, Yi L
  -1 siblings, 0 replies; 180+ messages in thread
From: Liu, Yi L @ 2023-06-13 12:01 UTC (permalink / raw)
  To: Alex Williamson
  Cc: jgg, Tian, Kevin, joro, robin.murphy, cohuck, eric.auger,
	nicolinc, kvm, mjrosato, chao.p.peng, yi.y.sun, peterx, jasowang,
	shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	intel-gvt-dev, intel-gfx, linux-s390, Hao, Xudong, Zhao, Yan Y,
	Xu, Terrence, Jiang, Yanting, Duan, Zhenzhong, clegoate

> From: Alex Williamson <alex.williamson@redhat.com>
> Sent: Tuesday, June 13, 2023 7:06 AM
> 
> On Fri,  2 Jun 2023 05:16:53 -0700
> Yi Liu <yi.l.liu@intel.com> wrote:
> 
> > This gives notes for userspace applications on device cdev usage.
> >
> > Reviewed-by: Kevin Tian <kevin.tian@intel.com>
> > Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> > ---
> >  Documentation/driver-api/vfio.rst | 132 ++++++++++++++++++++++++++++++
> >  1 file changed, 132 insertions(+)
> >
> > diff --git a/Documentation/driver-api/vfio.rst b/Documentation/driver-api/vfio.rst
> > index 363e12c90b87..f00c9b86bda0 100644
> > --- a/Documentation/driver-api/vfio.rst
> > +++ b/Documentation/driver-api/vfio.rst
> > @@ -239,6 +239,130 @@ group and can access them as follows::
> >  	/* Gratuitous device reset and go... */
> >  	ioctl(device, VFIO_DEVICE_RESET);
> >
> > +IOMMUFD and vfio_iommu_type1
> > +----------------------------
> > +
> > +IOMMUFD is the new user API to manage I/O page tables from userspace.
> > +It intends to be the portal of delivering advanced userspace DMA
> > +features (nested translation [5]_, PASID [6]_, etc.) while also providing
> > +a backwards compatibility interface for existing VFIO_TYPE1v2_IOMMU use
> > +cases.  Eventually the vfio_iommu_type1 driver, as well as the legacy
> > +vfio container and group model is intended to be deprecated.
> > +
> > +The IOMMUFD backwards compatibility interface can be enabled two ways.
> > +In the first method, the kernel can be configured with
> > +CONFIG_IOMMUFD_VFIO_CONTAINER, in which case the IOMMUFD subsystem
> > +transparently provides the entire infrastructure for the VFIO
> > +container and IOMMU backend interfaces.  The compatibility mode can
> > +also be accessed if the VFIO container interface, ie. /dev/vfio/vfio is
> > +simply symlink'd to /dev/iommu.  Note that at the time of writing, the
> > +compatibility mode is not entirely feature complete relative to
> > +VFIO_TYPE1v2_IOMMU (ex. DMA mapping MMIO) and does not attempt to
> > +provide compatibility to the VFIO_SPAPR_TCE_IOMMU interface.  Therefore
> > +it is not generally advisable at this time to switch from native VFIO
> > +implementations to the IOMMUFD compatibility interfaces.
> > +
> > +Long term, VFIO users should migrate to device access through the cdev
> > +interface described below, and native access through the IOMMUFD
> > +provided interfaces.
> > +
> > +VFIO Device cdev
> > +----------------
> > +
> > +Traditionally user acquires a device fd via VFIO_GROUP_GET_DEVICE_FD
> > +in a VFIO group.
> > +
> > +With CONFIG_VFIO_DEVICE_CDEV=y the user can now acquire a device fd
> > +by directly opening a character device /dev/vfio/devices/vfioX where
> > +"X" is the number allocated uniquely by VFIO for registered devices.
> > +cdev interface does not support noiommu, so user should use the legacy
> > +group interface if noiommu is needed.
> > +
> > +The cdev only works with IOMMUFD.  Both VFIO drivers and applications
> > +must adapt to the new cdev security model which requires using
> > +VFIO_DEVICE_BIND_IOMMUFD to claim DMA ownership before starting to
> > +actually use the device.  Once BIND succeeds then a VFIO device can
> > +be fully accessed by the user.
> > +
> > +VFIO device cdev doesn't rely on VFIO group/container/iommu drivers.
> > +Hence those modules can be fully compiled out in an environment
> > +where no legacy VFIO application exists.
> > +
> > +So far SPAPR does not support IOMMUFD yet.  So it cannot support device
> > +cdev neither.
> 
> s/neither/either/

Got it.

> 
> Unless I missed it, we've not described that vfio device cdev access is
> still bound by IOMMU group semantics, ie. there can be one DMA owner
> for the group.  That's a pretty common failure point for multi-function
> consumer device use cases, so the why, where, and how it fails should
> be well covered.

Yes, this needs to be documented. How about the wording below:

VFIO device cdev access is still bound by IOMMU group semantics, i.e.
there can be only one DMA owner for the group.  Devices belonging to the
same group cannot be bound to multiple iommufd_ctx.  A user who tries to
bind such devices to different iommufds will fail in
VFIO_DEVICE_BIND_IOMMUFD, which is the starting point for getting full
access to the device.
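
A toy model of that rule may help show where the failure happens. This
is a sketch under stated assumptions, not the kernel implementation: all
demo_ names are hypothetical, and the -EPERM return value is chosen only
for illustration, since the wording above does not say which errno
VFIO_DEVICE_BIND_IOMMUFD reports.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for an iommufd context and an IOMMU group. */
struct demo_ctx { int id; };

struct demo_group {
	const struct demo_ctx *owner;	/* current DMA owner, if any */
	unsigned int owner_cnt;		/* devices bound through that owner */
};

struct demo_device { struct demo_group *group; };

/* Models the bind step: one DMA owner per group, shared by its devices. */
static int demo_bind(struct demo_device *dev, const struct demo_ctx *ctx)
{
	struct demo_group *group = dev->group;

	/* A second context may not claim a group that is already owned. */
	if (group->owner_cnt && group->owner != ctx)
		return -EPERM;	/* illustrative error code only */

	group->owner = ctx;
	group->owner_cnt++;
	return 0;
}
```

In this model, two functions of a multi-function device that share a
group can both be bound through the same context, but the second bind
fails as soon as a different context is used.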

> 
> In general there's been a lot of cross collaboration to get the series
> this far.  I see an abundance of Tested-by, but unfortunately not a lot
> of Reviewed-by beyond about the first 1/3rd of the series.  Thanks,

Yeah. The remaining two-thirds has had back-and-forth changes since v8.

Regards,
Yi Liu

> Alex
> 
> > +
> > +Device cdev Example
> > +-------------------
> > +
> > +Assume user wants to access PCI device 0000:6a:01.0::
> > +
> > +	$ ls /sys/bus/pci/devices/0000:6a:01.0/vfio-dev/
> > +	vfio0
> > +
> > +This device is therefore represented as vfio0.  The user can verify
> > +its existence::
> > +
> > +	$ ls -l /dev/vfio/devices/vfio0
> > +	crw------- 1 root root 511, 0 Feb 16 01:22 /dev/vfio/devices/vfio0
> > +	$ cat /sys/bus/pci/devices/0000:6a:01.0/vfio-dev/vfio0/dev
> > +	511:0
> > +	$ ls -l /dev/char/511\:0
> > +	lrwxrwxrwx 1 root root 21 Feb 16 01:22 /dev/char/511:0 -> ../vfio/devices/vfio0
> > +
> > +Then provide the user with access to the device if unprivileged
> > +operation is desired::
> > +
> > +	$ chown user:user /dev/vfio/devices/vfio0
> > +
> > +Finally the user could get cdev fd by::
> > +
> > +	cdev_fd = open("/dev/vfio/devices/vfio0", O_RDWR);
> > +
> > +An opened cdev_fd doesn't give the user any permission of accessing
> > +the device except binding the cdev_fd to an iommufd.  After that point
> > +then the device is fully accessible including attaching it to an
> > +IOMMUFD IOAS/HWPT to enable userspace DMA::
> > +
> > +	struct vfio_device_bind_iommufd bind = {
> > +		.argsz = sizeof(bind),
> > +		.flags = 0,
> > +	};
> > +	struct iommu_ioas_alloc alloc_data  = {
> > +		.size = sizeof(alloc_data),
> > +		.flags = 0,
> > +	};
> > +	struct vfio_device_attach_iommufd_pt attach_data = {
> > +		.argsz = sizeof(attach_data),
> > +		.flags = 0,
> > +	};
> > +	struct iommu_ioas_map map = {
> > +		.size = sizeof(map),
> > +		.flags = IOMMU_IOAS_MAP_READABLE |
> > +			 IOMMU_IOAS_MAP_WRITEABLE |
> > +			 IOMMU_IOAS_MAP_FIXED_IOVA,
> > +		.__reserved = 0,
> > +	};
> > +
> > +	iommufd = open("/dev/iommu", O_RDWR);
> > +
> > +	bind.iommufd = iommufd;
> > +	ioctl(cdev_fd, VFIO_DEVICE_BIND_IOMMUFD, &bind);
> > +
> > +	ioctl(iommufd, IOMMU_IOAS_ALLOC, &alloc_data);
> > +	attach_data.pt_id = alloc_data.out_ioas_id;
> > +	ioctl(cdev_fd, VFIO_DEVICE_ATTACH_IOMMUFD_PT, &attach_data);
> > +
> > +	/* Allocate some space and setup a DMA mapping */
> > +	map.user_va = (int64_t)mmap(0, 1024 * 1024, PROT_READ | PROT_WRITE,
> > +				    MAP_PRIVATE | MAP_ANONYMOUS, 0, 0);
> > +	map.iova = 0; /* 1MB starting at 0x0 from device view */
> > +	map.length = 1024 * 1024;
> > +	map.ioas_id = alloc_data.out_ioas_id;
> > +
> > +	ioctl(iommufd, IOMMU_IOAS_MAP, &map);
> > +
> > +	/* Other device operations as stated in "VFIO Usage Example" */
> > +
> >  VFIO User API
> >  -------------------------------------------------------------------------------
> >
> > @@ -566,3 +690,11 @@ This implementation has some specifics:
> >  				\-0d.1
> >
> >  	00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev 90)
> > +
> > +.. [5] Nested translation is an IOMMU feature which supports two stage
> > +   address translations.  This improves the address translation efficiency
> > +   in IOMMU virtualization.
> > +
> > +.. [6] PASID stands for Process Address Space ID, introduced by PCI
> > +   Express.  It is a prerequisite for Shared Virtual Addressing (SVA)
> > +   and Scalable I/O Virtualization (Scalable IOV).


^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [Intel-gfx] [PATCH v12 07/24] vfio: Block device access via device fd until device is opened
  2023-06-13  5:46       ` [Intel-gfx] " Liu, Yi L
@ 2023-06-13 14:16         ` Alex Williamson
  -1 siblings, 0 replies; 180+ messages in thread
From: Alex Williamson @ 2023-06-13 14:16 UTC (permalink / raw)
  To: Liu, Yi L
  Cc: mjrosato, jasowang, Hao, Xudong, Duan, Zhenzhong, peterx, Xu,
	Terrence, chao.p.peng, linux-s390, kvm, lulu, Jiang, Yanting,
	joro, nicolinc, jgg, Tian, Kevin, Zhao,  Yan Y, intel-gfx,
	eric.auger, intel-gvt-dev, yi.y.sun, clegoate, cohuck,
	shameerali.kolothum.thodi, suravee.suthikulpanit, robin.murphy

On Tue, 13 Jun 2023 05:46:32 +0000
"Liu, Yi L" <yi.l.liu@intel.com> wrote:

> > From: Alex Williamson <alex.williamson@redhat.com>
> > Sent: Tuesday, June 13, 2023 5:52 AM
> > 
> > On Fri,  2 Jun 2023 05:16:36 -0700
> > Yi Liu <yi.l.liu@intel.com> wrote:
> >   
> > > Allow the vfio_device file to be in a state where the device FD is
> > > opened but the device cannot be used by userspace (i.e. its .open_device()
> > > hasn't been called). This inbetween state is not used when the device
> > > FD is spawned from the group FD, however when we create the device FD
> > > directly by opening a cdev it will be opened in the blocked state.
> > >
> > > The reason for the inbetween state is that userspace only gets a FD but
> > > doesn't gain access permission until binding the FD to an iommufd. So in
> > > the blocked state, only the bind operation is allowed. Completing bind
> > > will allow user to further access the device.
> > >
> > > This is implemented by adding a flag in struct vfio_device_file to mark
> > > the blocked state and using a simple smp_load_acquire() to obtain the
> > > flag value and serialize all the device setup with the thread accessing
> > > this device.
> > >
> > > Following this lockless scheme, it can safely handle the device FD
> > > unbound->bound but it cannot handle bound->unbound. To allow this we'd
> > > need to add a lock on all the vfio ioctls which seems costly. So once
> > > device FD is bound, it remains bound until the FD is closed.
> > >
> > > Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
> > > Reviewed-by: Kevin Tian <kevin.tian@intel.com>
> > > Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
> > > Reviewed-by: Eric Auger <eric.auger@redhat.com>
> > > Tested-by: Terrence Xu <terrence.xu@intel.com>
> > > Tested-by: Nicolin Chen <nicolinc@nvidia.com>
> > > Tested-by: Matthew Rosato <mjrosato@linux.ibm.com>
> > > Tested-by: Yanting Jiang <yanting.jiang@intel.com>
> > > Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
> > > Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> > > ---
> > >  drivers/vfio/group.c     | 11 ++++++++++-
> > >  drivers/vfio/vfio.h      |  1 +
> > >  drivers/vfio/vfio_main.c | 16 ++++++++++++++++
> > >  3 files changed, 27 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/drivers/vfio/group.c b/drivers/vfio/group.c
> > > index caf53716ddb2..088dd34c8931 100644
> > > --- a/drivers/vfio/group.c
> > > +++ b/drivers/vfio/group.c
> > > @@ -194,9 +194,18 @@ static int vfio_df_group_open(struct vfio_device_file *df)
> > >  	df->iommufd = device->group->iommufd;
> > >
> > >  	ret = vfio_df_open(df);
> > > -	if (ret)
> > > +	if (ret) {
> > >  		df->iommufd = NULL;
> > > +		goto out_put_kvm;
> > > +	}
> > > +
> > > +	/*
> > > +	 * Paired with smp_load_acquire() in vfio_device_fops::ioctl/
> > > +	 * read/write/mmap and vfio_file_has_device_access()
> > > +	 */
> > > +	smp_store_release(&df->access_granted, true);
> > >
> > > +out_put_kvm:
> > >  	if (device->open_count == 0)
> > >  		vfio_device_put_kvm(device);
> > >
> > > diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
> > > index f9eb52eb9ed7..fdf2fc73f880 100644
> > > --- a/drivers/vfio/vfio.h
> > > +++ b/drivers/vfio/vfio.h
> > > @@ -18,6 +18,7 @@ struct vfio_container;
> > >
> > >  struct vfio_device_file {
> > >  	struct vfio_device *device;
> > > +	bool access_granted;  
> > 
> > Should we make this a more strongly defined data type and later move
> > devid (u32) here to partially fill the hole created?  
> 
> Before your question, let me describe how I place the fields
> of this structure to see if it is common practice. The first two
> fields are static, so they are in the beginning. The access_granted
> is lockless and other fields are protected by locks. So I tried to
> put the lock and the fields it protects closely. So this is why I put
> devid behind iommufd as both are protected by the same lock.

I think the primary considerations are locality and compactness.  Hot
path data should be within the first cache line of the structure,
related data should share a cache line, and we should use the space
efficiently.  What you describe seems largely an aesthetic concern,
which was not evident to me by the segmentation alone.
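
The compactness point can be sketched with userspace stand-ins. This is
only an illustration under stated assumptions: the struct names and the
helper df_saved_bytes() are hypothetical, the spinlock is left out for
brevity, and plain pointers stand in for the kernel types. It shows how
placing a u32 directly after a one-byte flag reuses padding that would
otherwise sit in front of the next pointer.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in: devid placed last, as in the quoted layout. */
struct df_loose {
	void *device;
	uint8_t access_granted;	/* followed by padding up to pointer alignment */
	void *kvm;
	void *iommufd;
	uint32_t devid;		/* may be followed by tail padding */
};

/* Hypothetical stand-in: devid moved next to the flag to fill the hole. */
struct df_compact {
	void *device;
	uint8_t access_granted;
	uint32_t devid;		/* lands inside what was padding above */
	void *kvm;
	void *iommufd;
};

/* Bytes saved by the reordering on the ABI this is compiled for. */
static size_t df_saved_bytes(void)
{
	return sizeof(struct df_loose) - sizeof(struct df_compact);
}
```

On a typical LP64 target the two layouts should come out at 40 and 32
bytes; on 32-bit targets they may be the same size. In a kernel tree,
pahole on the built object reports the real holes, lock included.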
 
> struct vfio_device_file {
>         struct vfio_device *device;
>         struct vfio_group *group;
> 
>         bool access_granted;
>         spinlock_t kvm_ref_lock; /* protect kvm field */
>         struct kvm *kvm;
>         struct iommufd_ctx *iommufd; /* protected by struct vfio_device_set::lock */
>         u32 devid; /* only valid when iommufd is valid */
> };
> 
> > 
> > I think this is being placed towards the front of the data structure
> > for cache line locality given this is a hot path for file operations.
> > But bool types have an implementation dependent size, making them
> > difficult to pack.  Also there will be a tendency to want to make this
> > a bit field, which is probably not compatible with the smp lockless
> > operations being used here.  We might get in front of these issues if
> > we just define it as a u8 now.  Thanks,  
> 
> Not quite get why bit field is going to be incompatible with smp
> lockless operations. Could you elaborate a bit? And should I define
> the access_granted as u8 or "u8:1"?

Perhaps FUD on my part, but load-acquire type operations have specific
semantics and it's not clear to me that they interact with compiler
generated bit operations.  Thanks,

Alex
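
One concrete reason, not spelled out in the thread, is that C does not
allow taking the address of a bit-field, while acquire/release helpers
operate on a pointer to a whole object. The sketch below is a userspace
analogue using C11 <stdatomic.h> rather than the kernel's
smp_store_release()/smp_load_acquire(); the struct and function names
are hypothetical and only mirror the access_granted gating described in
the commit message.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace stand-in for struct vfio_device_file. */
struct demo_device_file {
	int setup_state;		/* plain data published by the flag */
	_Atomic uint8_t access_granted;	/* whole byte, so it has an address */
};

/* Writer: finish device setup first, then publish with release order. */
static void demo_grant_access(struct demo_device_file *df, int state)
{
	df->setup_state = state;
	atomic_store_explicit(&df->access_granted, 1, memory_order_release);
}

/*
 * Reader: returns false while the file is still blocked.  When it
 * returns true, the acquire load guarantees setup_state is visible.
 */
static bool demo_try_access(struct demo_device_file *df, int *state)
{
	if (!atomic_load_explicit(&df->access_granted, memory_order_acquire))
		return false;
	*state = df->setup_state;
	return true;
}
```

Declaring the flag as "uint8_t access_granted:1" would make the
&df->access_granted expressions above a compile error, which is
consistent with keeping it a full u8 as suggested.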

> > >  	spinlock_t kvm_ref_lock; /* protect kvm field */
> > >  	struct kvm *kvm;
> > >  	struct iommufd_ctx *iommufd; /* protected by struct vfio_device_set::lock */
> > > diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
> > > index a3c5817fc545..4c8b7713dc3d 100644
> > > --- a/drivers/vfio/vfio_main.c
> > > +++ b/drivers/vfio/vfio_main.c
> > > @@ -1129,6 +1129,10 @@ static long vfio_device_fops_unl_ioctl(struct file *filep,
> > >  	struct vfio_device *device = df->device;
> > >  	int ret;
> > >
> > > +	/* Paired with smp_store_release() following vfio_df_open() */
> > > +	if (!smp_load_acquire(&df->access_granted))
> > > +		return -EINVAL;
> > > +
> > >  	ret = vfio_device_pm_runtime_get(device);
> > >  	if (ret)
> > >  		return ret;
> > > @@ -1156,6 +1160,10 @@ static ssize_t vfio_device_fops_read(struct file *filep, char __user *buf,
> > >  	struct vfio_device_file *df = filep->private_data;
> > >  	struct vfio_device *device = df->device;
> > >
> > > +	/* Paired with smp_store_release() following vfio_df_open() */
> > > +	if (!smp_load_acquire(&df->access_granted))
> > > +		return -EINVAL;
> > > +
> > >  	if (unlikely(!device->ops->read))
> > >  		return -EINVAL;
> > >
> > > @@ -1169,6 +1177,10 @@ static ssize_t vfio_device_fops_write(struct file *filep,
> > >  	struct vfio_device_file *df = filep->private_data;
> > >  	struct vfio_device *device = df->device;
> > >
> > > +	/* Paired with smp_store_release() following vfio_df_open() */
> > > +	if (!smp_load_acquire(&df->access_granted))
> > > +		return -EINVAL;
> > > +
> > >  	if (unlikely(!device->ops->write))
> > >  		return -EINVAL;
> > >
> > > @@ -1180,6 +1192,10 @@ static int vfio_device_fops_mmap(struct file *filep, struct vm_area_struct *vma)
> > >  	struct vfio_device_file *df = filep->private_data;
> > >  	struct vfio_device *device = df->device;
> > >
> > > +	/* Paired with smp_store_release() following vfio_df_open() */
> > > +	if (!smp_load_acquire(&df->access_granted))
> > > +		return -EINVAL;
> > > +
> > >  	if (unlikely(!device->ops->mmap))
> > >  		return -EINVAL;
> > >  
> 


^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [Intel-gfx] [PATCH v12 18/24] vfio: Add VFIO_DEVICE_BIND_IOMMUFD
  2023-06-13  5:48       ` [Intel-gfx] " Liu, Yi L
@ 2023-06-13 14:18         ` Alex Williamson
  -1 siblings, 0 replies; 180+ messages in thread
From: Alex Williamson @ 2023-06-13 14:18 UTC (permalink / raw)
  To: Liu, Yi L
  Cc: mjrosato, jasowang, Hao, Xudong, Duan, Zhenzhong, peterx, Xu,
	Terrence, chao.p.peng, linux-s390, kvm, lulu, Jiang, Yanting,
	joro, nicolinc, jgg, Tian, Kevin, Zhao,  Yan Y, intel-gfx,
	eric.auger, intel-gvt-dev, yi.y.sun, clegoate, cohuck,
	shameerali.kolothum.thodi, suravee.suthikulpanit, robin.murphy

On Tue, 13 Jun 2023 05:48:46 +0000
"Liu, Yi L" <yi.l.liu@intel.com> wrote:

> > From: Alex Williamson <alex.williamson@redhat.com>
> > Sent: Tuesday, June 13, 2023 6:27 AM
> > 
> > On Fri,  2 Jun 2023 05:16:47 -0700
> > Yi Liu <yi.l.liu@intel.com> wrote:
> >   
> > > This adds ioctl for userspace to bind device cdev fd to iommufd.
> > >
> > >     VFIO_DEVICE_BIND_IOMMUFD: bind device to an iommufd, hence gain DMA
> > > 			      control provided by the iommufd. open_device
> > > 			      op is called after bind_iommufd op.
> > >
> > > Tested-by: Yanting Jiang <yanting.jiang@intel.com>
> > > Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
> > > Tested-by: Terrence Xu <terrence.xu@intel.com>
> > > Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> > > ---
> > >  drivers/vfio/device_cdev.c | 123 +++++++++++++++++++++++++++++++++++++
> > >  drivers/vfio/vfio.h        |  13 ++++
> > >  drivers/vfio/vfio_main.c   |   5 ++
> > >  include/linux/vfio.h       |   3 +-
> > >  include/uapi/linux/vfio.h  |  27 ++++++++
> > >  5 files changed, 170 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/drivers/vfio/device_cdev.c b/drivers/vfio/device_cdev.c
> > > index 1c640016a824..a4498ddbe774 100644
> > > --- a/drivers/vfio/device_cdev.c
> > > +++ b/drivers/vfio/device_cdev.c
> > > @@ -3,6 +3,7 @@
> > >   * Copyright (c) 2023 Intel Corporation.
> > >   */
> > >  #include <linux/vfio.h>
> > > +#include <linux/iommufd.h>
> > >
> > >  #include "vfio.h"
> > >
> > > @@ -44,6 +45,128 @@ int vfio_device_fops_cdev_open(struct inode *inode, struct file *filep)
> > >  	return ret;
> > >  }
> > >
> > > +static void vfio_device_get_kvm_safe(struct vfio_device_file *df)
> > > +{
> > > +	spin_lock(&df->kvm_ref_lock);
> > > +	if (df->kvm)
> > > +		_vfio_device_get_kvm_safe(df->device, df->kvm);
> > > +	spin_unlock(&df->kvm_ref_lock);
> > > +}
> > > +
> > > +void vfio_df_cdev_close(struct vfio_device_file *df)
> > > +{
> > > +	struct vfio_device *device = df->device;
> > > +
> > > +	/*
> > > +	 * In the time of close, there is no contention with another one
> > > +	 * changing this flag.  So read df->access_granted without lock
> > > +	 * and no smp_load_acquire() is ok.
> > > +	 */
> > > +	if (!df->access_granted)
> > > +		return;
> > > +
> > > +	mutex_lock(&device->dev_set->lock);
> > > +	vfio_df_close(df);
> > > +	vfio_device_put_kvm(device);
> > > +	iommufd_ctx_put(df->iommufd);
> > > +	device->cdev_opened = false;
> > > +	mutex_unlock(&device->dev_set->lock);
> > > +	vfio_device_unblock_group(device);
> > > +}
> > > +
> > > +static struct iommufd_ctx *vfio_get_iommufd_from_fd(int fd)
> > > +{
> > > +	struct iommufd_ctx *iommufd;
> > > +	struct fd f;
> > > +
> > > +	f = fdget(fd);
> > > +	if (!f.file)
> > > +		return ERR_PTR(-EBADF);
> > > +
> > > +	iommufd = iommufd_ctx_from_file(f.file);
> > > +
> > > +	fdput(f);
> > > +	return iommufd;
> > > +}
> > > +
> > > +long vfio_df_ioctl_bind_iommufd(struct vfio_device_file *df,
> > > +				struct vfio_device_bind_iommufd __user *arg)
> > > +{
> > > +	struct vfio_device *device = df->device;
> > > +	struct vfio_device_bind_iommufd bind;
> > > +	unsigned long minsz;
> > > +	int ret;
> > > +
> > > +	static_assert(__same_type(arg->out_devid, df->devid));
> > > +
> > > +	minsz = offsetofend(struct vfio_device_bind_iommufd, out_devid);
> > > +
> > > +	if (copy_from_user(&bind, arg, minsz))
> > > +		return -EFAULT;
> > > +
> > > +	if (bind.argsz < minsz || bind.flags || bind.iommufd < 0)
> > > +		return -EINVAL;
> > > +
> > > +	/* BIND_IOMMUFD only allowed for cdev fds */
> > > +	if (df->group)
> > > +		return -EINVAL;
> > > +
> > > +	ret = vfio_device_block_group(device);
> > > +	if (ret)
> > > +		return ret;
> > > +
> > > +	mutex_lock(&device->dev_set->lock);
> > > +	/* one device cannot be bound twice */
> > > +	if (df->access_granted) {
> > > +		ret = -EINVAL;
> > > +		goto out_unlock;
> > > +	}
> > > +
> > > +	df->iommufd = vfio_get_iommufd_from_fd(bind.iommufd);
> > > +	if (IS_ERR(df->iommufd)) {
> > > +		ret = PTR_ERR(df->iommufd);
> > > +		df->iommufd = NULL;
> > > +		goto out_unlock;
> > > +	}
> > > +
> > > +	/*
> > > +	 * Before the device open, get the KVM pointer currently
> > > +	 * associated with the device file (if there is) and obtain
> > > +	 * a reference.  This reference is held until device closed.
> > > +	 * Save the pointer in the device for use by drivers.
> > > +	 */
> > > +	vfio_device_get_kvm_safe(df);
> > > +
> > > +	ret = vfio_df_open(df);
> > > +	if (ret)
> > > +		goto out_put_kvm;
> > > +
> > > +	ret = copy_to_user(&arg->out_devid, &df->devid,
> > > +			   sizeof(df->devid)) ? -EFAULT : 0;
> > > +	if (ret)
> > > +		goto out_close_device;
> > > +
> > > +	/*
> > > +	 * Paired with smp_load_acquire() in vfio_device_fops::ioctl/
> > > +	 * read/write/mmap
> > > +	 */
> > > +	smp_store_release(&df->access_granted, true);
> > > +	device->cdev_opened = true;
> > > +	mutex_unlock(&device->dev_set->lock);
> > > +	return 0;
> > > +
> > > +out_close_device:
> > > +	vfio_df_close(df);
> > > +out_put_kvm:
> > > +	vfio_device_put_kvm(device);
> > > +	iommufd_ctx_put(df->iommufd);
> > > +	df->iommufd = NULL;
> > > +out_unlock:
> > > +	mutex_unlock(&device->dev_set->lock);
> > > +	vfio_device_unblock_group(device);
> > > +	return ret;
> > > +}
> > > +
> > >  static char *vfio_device_devnode(const struct device *dev, umode_t *mode)
> > >  {
> > >  	return kasprintf(GFP_KERNEL, "vfio/devices/%s", dev_name(dev));
> > > diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
> > > index d12b5b524bfc..42de40d2cd4d 100644
> > > --- a/drivers/vfio/vfio.h
> > > +++ b/drivers/vfio/vfio.h
> > > @@ -287,6 +287,9 @@ static inline void vfio_device_del(struct vfio_device *device)
> > >  }
> > >
> > >  int vfio_device_fops_cdev_open(struct inode *inode, struct file *filep);
> > > +void vfio_df_cdev_close(struct vfio_device_file *df);
> > > +long vfio_df_ioctl_bind_iommufd(struct vfio_device_file *df,
> > > +				struct vfio_device_bind_iommufd __user *arg);
> > >  int vfio_cdev_init(struct class *device_class);
> > >  void vfio_cdev_cleanup(void);
> > >  #else
> > > @@ -310,6 +313,16 @@ static inline int vfio_device_fops_cdev_open(struct inode *inode,
> > >  	return 0;
> > >  }
> > >
> > > +static inline void vfio_df_cdev_close(struct vfio_device_file *df)
> > > +{
> > > +}
> > > +
> > > +static inline long vfio_df_ioctl_bind_iommufd(struct vfio_device_file *df,
> > > +					      struct vfio_device_bind_iommufd __user  
> > *arg)  
> > > +{
> > > +	return -EOPNOTSUPP;
> > > +}
> > > +
> > >  static inline int vfio_cdev_init(struct class *device_class)
> > >  {
> > >  	return 0;
> > > diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
> > > index ef55af75f459..9ba4d420eda2 100644
> > > --- a/drivers/vfio/vfio_main.c
> > > +++ b/drivers/vfio/vfio_main.c
> > > @@ -572,6 +572,8 @@ static int vfio_device_fops_release(struct inode *inode, struct file *filep)
> > >
> > >  	if (df->group)
> > >  		vfio_df_group_close(df);
> > > +	else
> > > +		vfio_df_cdev_close(df);
> > >
> > >  	vfio_device_put_registration(device);
> > >
> > > @@ -1145,6 +1147,9 @@ static long vfio_device_fops_unl_ioctl(struct file *filep,
> > >  	struct vfio_device *device = df->device;
> > >  	int ret;
> > >
> > > +	if (cmd == VFIO_DEVICE_BIND_IOMMUFD)
> > > +		return vfio_df_ioctl_bind_iommufd(df, (void __user *)arg);
> > > +
> > >  	/* Paired with smp_store_release() following vfio_df_open() */
> > >  	if (!smp_load_acquire(&df->access_granted))
> > >  		return -EINVAL;
> > > diff --git a/include/linux/vfio.h b/include/linux/vfio.h
> > > index 83cc5dc28b7a..e80a8ac86e46 100644
> > > --- a/include/linux/vfio.h
> > > +++ b/include/linux/vfio.h
> > > @@ -66,6 +66,7 @@ struct vfio_device {
> > >  	struct iommufd_device *iommufd_device;
> > >  	bool iommufd_attached;
> > >  #endif
> > > +	bool cdev_opened:1;  
> > 
> > Perhaps a more strongly defined data type here as well and roll
> > iommufd_attached into the same bit field scheme.  
> 
> Ok, then needs to make iommufd_attached always defined.

That does not follow.  Thanks,

Alex
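
One possible reading of the reply above, offered as an assumption since
the thread does not elaborate: a bit-field member can sit under the same
preprocessor guard as before, so folding iommufd_attached into a u8
bit-field does not by itself require it to be always defined. The sketch
below uses hypothetical names (DEMO_HAVE_IOMMUFD, struct demo_device) in
place of the real config symbol and struct vfio_device. Bit-fields of
type uint8_t are implementation-defined in ISO C but accepted by GCC and
Clang.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel config guard around the field. */
#ifndef DEMO_HAVE_IOMMUFD
#define DEMO_HAVE_IOMMUFD 1
#endif

/* Hypothetical stand-in for the flag portion of struct vfio_device. */
struct demo_device {
#if DEMO_HAVE_IOMMUFD
	uint8_t iommufd_attached:1;	/* still conditionally compiled */
#endif
	uint8_t cdev_opened:1;		/* always present */
};

static bool demo_is_cdev_opened(const struct demo_device *dev)
{
	return dev->cdev_opened;
}

/* Accessor that compiles either way; a no-op when the field is absent. */
static void demo_set_attached(struct demo_device *dev, bool on)
{
#if DEMO_HAVE_IOMMUFD
	dev->iommufd_attached = on;
#else
	(void)dev;
	(void)on;
#endif
}
```

Both flags here are assumed to be written under the same lock, as
neighbouring bit-fields share a storage unit and cannot be updated
independently without one.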


^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [Intel-gfx] [PATCH v12 21/24] vfio: Determine noiommu device in __vfio_register_dev()
  2023-06-13  5:53       ` [Intel-gfx] " Liu, Yi L
@ 2023-06-13 14:19         ` Alex Williamson
  -1 siblings, 0 replies; 180+ messages in thread
From: Alex Williamson @ 2023-06-13 14:19 UTC (permalink / raw)
  To: Liu, Yi L
  Cc: mjrosato, jasowang, Hao, Xudong, Duan, Zhenzhong, peterx, Xu,
	Terrence, chao.p.peng, linux-s390, kvm, lulu, Jiang, Yanting,
	joro, nicolinc, jgg, Tian, Kevin, Zhao,  Yan Y, intel-gfx,
	eric.auger, intel-gvt-dev, yi.y.sun, clegoate, cohuck,
	shameerali.kolothum.thodi, suravee.suthikulpanit, robin.murphy

On Tue, 13 Jun 2023 05:53:42 +0000
"Liu, Yi L" <yi.l.liu@intel.com> wrote:

> > From: Alex Williamson <alex.williamson@redhat.com>
> > Sent: Tuesday, June 13, 2023 6:42 AM
> > 
> > On Fri,  2 Jun 2023 05:16:50 -0700
> > Yi Liu <yi.l.liu@intel.com> wrote:
> >   
> > > This moves the noiommu device determination and noiommu taint out of
> > > vfio_group_find_or_alloc(). noiommu device is determined in
> > > __vfio_register_dev() and result is stored in flag vfio_device->noiommu,
> > > the noiommu taint is added in the end of __vfio_register_dev().
> > >
> > > This is also a preparation for compiling out vfio_group infrastructure
> > > as it makes the noiommu detection and taint common between the cdev path
> > > and group path though cdev path does not support noiommu.  
> > 
> > Does this really still make sense?  The motivation for the change is
> > really not clear without cdev support for noiommu.  Thanks,  
> 
> I think it still makes sense. When CONFIG_VFIO_GROUP==n, the kernel
> only supports cdev interface. If there is noiommu device, vfio should
> fail the registration. So, the noiommu determination is still needed. But
> I'd admit the taint might still be in the group code.

How is there going to be a noiommu device when VFIO_GROUP is unset?
Thanks,

Alex


> > > Suggested-by: Alex Williamson <alex.williamson@redhat.com>
> > > Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> > > ---
> > >  drivers/vfio/group.c     | 15 ---------------
> > >  drivers/vfio/vfio_main.c | 31 ++++++++++++++++++++++++++++++-
> > >  include/linux/vfio.h     |  1 +
> > >  3 files changed, 31 insertions(+), 16 deletions(-)
> > >
> > > diff --git a/drivers/vfio/group.c b/drivers/vfio/group.c
> > > index 653b62f93474..64cdd0ea8825 100644
> > > --- a/drivers/vfio/group.c
> > > +++ b/drivers/vfio/group.c
> > > @@ -668,21 +668,6 @@ static struct vfio_group *vfio_group_find_or_alloc(struct device *dev)
> > >  	struct vfio_group *group;
> > >
> > >  	iommu_group = iommu_group_get(dev);
> > > -	if (!iommu_group && vfio_noiommu) {
> > > -		/*
> > > -		 * With noiommu enabled, create an IOMMU group for devices that
> > > -		 * don't already have one, implying no IOMMU hardware/driver
> > > -		 * exists.  Taint the kernel because we're about to give a DMA
> > > -		 * capable device to a user without IOMMU protection.
> > > -		 */
> > > -		group = vfio_noiommu_group_alloc(dev, VFIO_NO_IOMMU);
> > > -		if (!IS_ERR(group)) {
> > > -			add_taint(TAINT_USER, LOCKDEP_STILL_OK);
> > > -			dev_warn(dev, "Adding kernel taint for vfio-noiommu group on device\n");
> > > -		}
> > > -		return group;
> > > -	}
> > > -
> > >  	if (!iommu_group)
> > >  		return ERR_PTR(-EINVAL);
> > >
> > > diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
> > > index 6d8f9b0f3637..00a699b9f76b 100644
> > > --- a/drivers/vfio/vfio_main.c
> > > +++ b/drivers/vfio/vfio_main.c
> > > @@ -265,6 +265,18 @@ static int vfio_init_device(struct vfio_device *device, struct device *dev,
> > >  	return ret;
> > >  }
> > >
> > > +static int vfio_device_set_noiommu(struct vfio_device *device)
> > > +{
> > > +	struct iommu_group *iommu_group = iommu_group_get(device->dev);
> > > +
> > > +	if (!iommu_group && !vfio_noiommu)
> > > +		return -EINVAL;
> > > +
> > > +	device->noiommu = !iommu_group;
> > > +	iommu_group_put(iommu_group); /* Accepts NULL */
> > > +	return 0;
> > > +}
> > > +
> > >  static int __vfio_register_dev(struct vfio_device *device,
> > >  			       enum vfio_group_type type)
> > >  {
> > > @@ -277,6 +289,13 @@ static int __vfio_register_dev(struct vfio_device *device,
> > >  		     !device->ops->detach_ioas)))
> > >  		return -EINVAL;
> > >
> > > +	/* Only physical devices can be noiommu device */
> > > +	if (type == VFIO_IOMMU) {
> > > +		ret = vfio_device_set_noiommu(device);
> > > +		if (ret)
> > > +			return ret;
> > > +	}
> > > +
> > >  	/*
> > >  	 * If the driver doesn't specify a set then the device is added to a
> > >  	 * singleton set just for itself.
> > > @@ -288,7 +307,8 @@ static int __vfio_register_dev(struct vfio_device *device,
> > >  	if (ret)
> > >  		return ret;
> > >
> > > -	ret = vfio_device_set_group(device, type);
> > > +	ret = vfio_device_set_group(device,
> > > +				    device->noiommu ? VFIO_NO_IOMMU : type);
> > >  	if (ret)
> > >  		return ret;
> > >
> > > @@ -301,6 +321,15 @@ static int __vfio_register_dev(struct vfio_device *device,
> > >
> > >  	vfio_device_group_register(device);
> > >
> > > +	if (device->noiommu) {
> > > +		/*
> > > +		 * noiommu devices have no IOMMU hardware/driver.  Taint the
> > > +		 * kernel because we're about to give a DMA capable device to
> > > +		 * a user without IOMMU protection.
> > > +		 */
> > > +		add_taint(TAINT_USER, LOCKDEP_STILL_OK);
> > > +		dev_warn(device->dev, "Adding kernel taint for vfio-noiommu on device\n");
> > > +	}
> > >  	return 0;
> > >  err_out:
> > >  	vfio_device_remove_group(device);
> > > diff --git a/include/linux/vfio.h b/include/linux/vfio.h
> > > index e80a8ac86e46..183e620009e7 100644
> > > --- a/include/linux/vfio.h
> > > +++ b/include/linux/vfio.h
> > > @@ -67,6 +67,7 @@ struct vfio_device {
> > >  	bool iommufd_attached;
> > >  #endif
> > >  	bool cdev_opened:1;
> > > +	bool noiommu:1;
> > >  };
> > >
> > >  /**  
> 


^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [Intel-gfx] [PATCH v12 24/24] docs: vfio: Add vfio device cdev description
  2023-06-13 12:01       ` [Intel-gfx] " Liu, Yi L
@ 2023-06-13 14:24         ` Alex Williamson
  -1 siblings, 0 replies; 180+ messages in thread
From: Alex Williamson @ 2023-06-13 14:24 UTC (permalink / raw)
  To: Liu, Yi L
  Cc: mjrosato, jasowang, Hao, Xudong, Duan, Zhenzhong, peterx, Xu,
	Terrence, chao.p.peng, linux-s390, kvm, lulu, Jiang, Yanting,
	joro, nicolinc, jgg, Tian, Kevin, Zhao,  Yan Y, intel-gfx,
	eric.auger, intel-gvt-dev, yi.y.sun, clegoate, cohuck,
	shameerali.kolothum.thodi, suravee.suthikulpanit, robin.murphy

On Tue, 13 Jun 2023 12:01:51 +0000
"Liu, Yi L" <yi.l.liu@intel.com> wrote:

> > From: Alex Williamson <alex.williamson@redhat.com>
> > Sent: Tuesday, June 13, 2023 7:06 AM
> > 
> > On Fri,  2 Jun 2023 05:16:53 -0700
> > Yi Liu <yi.l.liu@intel.com> wrote:
> >   
> > > This gives notes for userspace applications on device cdev usage.
> > >
> > > Reviewed-by: Kevin Tian <kevin.tian@intel.com>
> > > Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> > > ---
> > >  Documentation/driver-api/vfio.rst | 132 ++++++++++++++++++++++++++++++
> > >  1 file changed, 132 insertions(+)
> > >
> > > diff --git a/Documentation/driver-api/vfio.rst b/Documentation/driver-api/vfio.rst
> > > index 363e12c90b87..f00c9b86bda0 100644
> > > --- a/Documentation/driver-api/vfio.rst
> > > +++ b/Documentation/driver-api/vfio.rst
> > > @@ -239,6 +239,130 @@ group and can access them as follows::
> > >  	/* Gratuitous device reset and go... */
> > >  	ioctl(device, VFIO_DEVICE_RESET);
> > >
> > > +IOMMUFD and vfio_iommu_type1
> > > +----------------------------
> > > +
> > > +IOMMUFD is the new user API to manage I/O page tables from userspace.
> > > +It intends to be the portal of delivering advanced userspace DMA
> > > +features (nested translation [5]_, PASID [6]_, etc.) while also providing
> > > +a backwards compatibility interface for existing VFIO_TYPE1v2_IOMMU use
> > > +cases.  Eventually the vfio_iommu_type1 driver, as well as the legacy
> > > +vfio container and group model is intended to be deprecated.
> > > +
> > > +The IOMMUFD backwards compatibility interface can be enabled two ways.
> > > +In the first method, the kernel can be configured with
> > > +CONFIG_IOMMUFD_VFIO_CONTAINER, in which case the IOMMUFD subsystem
> > > +transparently provides the entire infrastructure for the VFIO
> > > +container and IOMMU backend interfaces.  The compatibility mode can
> > > +also be accessed if the VFIO container interface, ie. /dev/vfio/vfio is
> > > +simply symlink'd to /dev/iommu.  Note that at the time of writing, the
> > > +compatibility mode is not entirely feature complete relative to
> > > +VFIO_TYPE1v2_IOMMU (ex. DMA mapping MMIO) and does not attempt to
> > > +provide compatibility to the VFIO_SPAPR_TCE_IOMMU interface.  Therefore
> > > +it is not generally advisable at this time to switch from native VFIO
> > > +implementations to the IOMMUFD compatibility interfaces.
> > > +
> > > +Long term, VFIO users should migrate to device access through the cdev
> > > +interface described below, and native access through the IOMMUFD
> > > +provided interfaces.
> > > +
> > > +VFIO Device cdev
> > > +----------------
> > > +
> > > +Traditionally user acquires a device fd via VFIO_GROUP_GET_DEVICE_FD
> > > +in a VFIO group.
> > > +
> > > +With CONFIG_VFIO_DEVICE_CDEV=y the user can now acquire a device fd
> > > +by directly opening a character device /dev/vfio/devices/vfioX where
> > > +"X" is the number allocated uniquely by VFIO for registered devices.
> > > +cdev interface does not support noiommu, so user should use the legacy
> > > +group interface if noiommu is needed.
> > > +
> > > +The cdev only works with IOMMUFD.  Both VFIO drivers and applications
> > > +must adapt to the new cdev security model which requires using
> > > +VFIO_DEVICE_BIND_IOMMUFD to claim DMA ownership before starting to
> > > +actually use the device.  Once BIND succeeds then a VFIO device can
> > > +be fully accessed by the user.
> > > +
> > > +VFIO device cdev doesn't rely on VFIO group/container/iommu drivers.
> > > +Hence those modules can be fully compiled out in an environment
> > > +where no legacy VFIO application exists.
> > > +
> > > +So far SPAPR does not support IOMMUFD yet.  So it cannot support device
> > > +cdev neither.  
> > 
> > s/neither/either/  
> 
> Got it.
> 
> > 
> > Unless I missed it, we've not described that vfio device cdev access is
> > still bound by IOMMU group semantics, ie. there can be one DMA owner
> > for the group.  That's a pretty common failure point for multi-function
> > consumer device use cases, so the why, where, and how it fails should
> > be well covered.  
> 
> Yes. this needs to be documented. How about below words:
> 
> vfio device cdev access is still bound by IOMMU group semantics, ie. there
> can be only one DMA owner for the group.  Devices belonging to the same
> group can not be bound to multiple iommufd_ctx.

... or shared between native kernel and vfio drivers.


>  The users that try to bind
> such device to different iommufd shall be failed in VFIO_DEVICE_BIND_IOMMUFD
> which is the start point to get full access for the device.

"A violation of this ownership requirement will fail at the
VFIO_DEVICE_BIND_IOMMUFD ioctl, which gates full device access."

Thanks,
Alex
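
The single-DMA-owner rule discussed above can be sketched as a small userspace model. Everything here is hypothetical (the model_* names, the opaque owner cookie, and the -EPERM return value are assumptions for illustration); it only shows why a second bind of a same-group device to a different iommufd context would be refused at bind time.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical model of an IOMMU group: one owner, reference counted. */
struct model_iommu_group {
	const void *owner;	/* opaque cookie, e.g. an iommufd context */
	unsigned int owner_cnt;
};

/* Claim ownership for one device of the group on behalf of 'owner'. */
static int model_claim_dma_owner(struct model_iommu_group *group,
				 const void *owner)
{
	assert(group != NULL && owner != NULL);
	if (group->owner_cnt && group->owner != owner)
		return -EPERM;	/* illustrative error code, an assumption */
	group->owner = owner;
	group->owner_cnt++;
	return 0;
}

/* Drop one claim; the group becomes free when the count reaches zero. */
static void model_release_dma_owner(struct model_iommu_group *group)
{
	assert(group != NULL && group->owner_cnt > 0);
	if (--group->owner_cnt == 0)
		group->owner = NULL;
}
```

In this model, two functions of one multi-function device that share a group can both be claimed by the same context, while a claim from a second context fails until every earlier claim has been released.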


^ permalink raw reply	[flat|nested] 180+ messages in thread

* RE: [PATCH v12 18/24] vfio: Add VFIO_DEVICE_BIND_IOMMUFD
  2023-06-13 14:18         ` Alex Williamson
@ 2023-06-13 14:28           ` Liu, Yi L
  -1 siblings, 0 replies; 180+ messages in thread
From: Liu, Yi L @ 2023-06-13 14:28 UTC (permalink / raw)
  To: Alex Williamson
  Cc: jgg, Tian, Kevin, joro, robin.murphy, cohuck, eric.auger,
	nicolinc, kvm, mjrosato, chao.p.peng, yi.y.sun, peterx, jasowang,
	shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	intel-gvt-dev, intel-gfx, linux-s390, Hao, Xudong, Zhao, Yan Y,
	Xu, Terrence, Jiang, Yanting, Duan, Zhenzhong, clegoate

> From: Alex Williamson <alex.williamson@redhat.com>
> Sent: Tuesday, June 13, 2023 10:18 PM

> > > > diff --git a/include/linux/vfio.h b/include/linux/vfio.h
> > > > index 83cc5dc28b7a..e80a8ac86e46 100644
> > > > --- a/include/linux/vfio.h
> > > > +++ b/include/linux/vfio.h
> > > > @@ -66,6 +66,7 @@ struct vfio_device {
> > > >  	struct iommufd_device *iommufd_device;
> > > >  	bool iommufd_attached;
> > > >  #endif
> > > > +	bool cdev_opened:1;
> > >
> > > Perhaps a more strongly defined data type here as well and roll
> > > iommufd_attached into the same bit field scheme.
> >
> > Ok, then needs to make iommufd_attached always defined.
> 
> That does not follow.  Thanks,

Well, I meant that iommufd_attached is currently defined only when
CONFIG_IOMMUFD is enabled. To roll it into the same bit field scheme
as cdev_opened, this needs to change.

Regards,
Yi Liu
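
One possible reading of the review comment, as a sketch only: a bit-field member can sit inside a conditionally compiled block just like any other member, so both flags could share a bit-field scheme without making iommufd_attached unconditional. The struct, macro, and helper names below are hypothetical; unsigned int is used for portability, whereas kernel code would more likely use a fixed-width type such as u8.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: set to 1 to mimic a build with CONFIG_IOMMUFD enabled. */
#define MODEL_CONFIG_IOMMUFD 1

/* Hypothetical, reduced model of the tail of struct vfio_device. */
struct model_vfio_device {
#if MODEL_CONFIG_IOMMUFD
	void *iommufd_device;		/* stands in for struct iommufd_device * */
	unsigned int iommufd_attached:1;	/* only exists with IOMMUFD */
#endif
	unsigned int cdev_opened:1;	/* always present */
};

/* Hypothetical helper: record that the device was opened via its cdev. */
static void model_mark_cdev_opened(struct model_vfio_device *dev)
{
	assert(dev != NULL);
	dev->cdev_opened = 1;
}
```

Setting MODEL_CONFIG_IOMMUFD to 0 drops both IOMMUFD members and leaves cdev_opened as the only bit-field, which is the case the "always defined" concern was about.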

^ permalink raw reply	[flat|nested] 180+ messages in thread

* RE: [PATCH v12 21/24] vfio: Determine noiommu device in __vfio_register_dev()
  2023-06-13 14:19         ` Alex Williamson
@ 2023-06-13 14:33           ` Liu, Yi L
  -1 siblings, 0 replies; 180+ messages in thread
From: Liu, Yi L @ 2023-06-13 14:33 UTC (permalink / raw)
  To: Alex Williamson
  Cc: jgg, Tian, Kevin, joro, robin.murphy, cohuck, eric.auger,
	nicolinc, kvm, mjrosato, chao.p.peng, yi.y.sun, peterx, jasowang,
	shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	intel-gvt-dev, intel-gfx, linux-s390, Hao, Xudong, Zhao, Yan Y,
	Xu, Terrence, Jiang, Yanting, Duan, Zhenzhong, clegoate

> From: Alex Williamson <alex.williamson@redhat.com>
> Sent: Tuesday, June 13, 2023 10:19 PM
> 
> On Tue, 13 Jun 2023 05:53:42 +0000
> "Liu, Yi L" <yi.l.liu@intel.com> wrote:
> 
> > > From: Alex Williamson <alex.williamson@redhat.com>
> > > Sent: Tuesday, June 13, 2023 6:42 AM
> > >
> > > On Fri,  2 Jun 2023 05:16:50 -0700
> > > Yi Liu <yi.l.liu@intel.com> wrote:
> > >
> > > > This moves the noiommu device determination and noiommu taint out of
> > > > vfio_group_find_or_alloc(). noiommu device is determined in
> > > > __vfio_register_dev() and result is stored in flag vfio_device->noiommu,
> > > > the noiommu taint is added in the end of __vfio_register_dev().
> > > >
> > > > This is also a preparation for compiling out vfio_group infrastructure
> > > > as it makes the noiommu detection and taint common between the cdev path
> > > > and group path though cdev path does not support noiommu.
> > >
> > > Does this really still make sense?  The motivation for the change is
> > > really not clear without cdev support for noiommu.  Thanks,
> >
> > I think it still makes sense. When CONFIG_VFIO_GROUP==n, the kernel
> > only supports cdev interface. If there is noiommu device, vfio should
> > fail the registration. So, the noiommu determination is still needed. But
> > I'd admit the taint might still be in the group code.
> 
> How is there going to be a noiommu device when VFIO_GROUP is unset?

Consider booting a kernel with the IOMMU disabled: none of the devices
are protected by an IOMMU, so I suppose they are noiommu devices. If the
user wants to bind them to vfio, the kernel should have VFIO_GROUP;
otherwise, registration needs to fail.

Regards,
Yi Liu

> Thanks,
> 
> Alex
> 
> 
> > > > Suggested-by: Alex Williamson <alex.williamson@redhat.com>
> > > > Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> > > > ---
> > > >  drivers/vfio/group.c     | 15 ---------------
> > > >  drivers/vfio/vfio_main.c | 31 ++++++++++++++++++++++++++++++-
> > > >  include/linux/vfio.h     |  1 +
> > > >  3 files changed, 31 insertions(+), 16 deletions(-)
> > > >
> > > > diff --git a/drivers/vfio/group.c b/drivers/vfio/group.c
> > > > index 653b62f93474..64cdd0ea8825 100644
> > > > --- a/drivers/vfio/group.c
> > > > +++ b/drivers/vfio/group.c
> > > > @@ -668,21 +668,6 @@ static struct vfio_group *vfio_group_find_or_alloc(struct device *dev)
> > > >  	struct vfio_group *group;
> > > >
> > > >  	iommu_group = iommu_group_get(dev);
> > > > -	if (!iommu_group && vfio_noiommu) {
> > > > -		/*
> > > > -		 * With noiommu enabled, create an IOMMU group for devices that
> > > > -		 * don't already have one, implying no IOMMU hardware/driver
> > > > -		 * exists.  Taint the kernel because we're about to give a DMA
> > > > -		 * capable device to a user without IOMMU protection.
> > > > -		 */
> > > > -		group = vfio_noiommu_group_alloc(dev, VFIO_NO_IOMMU);
> > > > -		if (!IS_ERR(group)) {
> > > > -			add_taint(TAINT_USER, LOCKDEP_STILL_OK);
> > > > -			dev_warn(dev, "Adding kernel taint for vfio-noiommu group on device\n");
> > > > -		}
> > > > -		return group;
> > > > -	}
> > > > -
> > > >  	if (!iommu_group)
> > > >  		return ERR_PTR(-EINVAL);
> > > >
> > > > diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
> > > > index 6d8f9b0f3637..00a699b9f76b 100644
> > > > --- a/drivers/vfio/vfio_main.c
> > > > +++ b/drivers/vfio/vfio_main.c
> > > > @@ -265,6 +265,18 @@ static int vfio_init_device(struct vfio_device *device, struct device *dev,
> > > >  	return ret;
> > > >  }
> > > >
> > > > +static int vfio_device_set_noiommu(struct vfio_device *device)
> > > > +{
> > > > +	struct iommu_group *iommu_group = iommu_group_get(device->dev);
> > > > +
> > > > +	if (!iommu_group && !vfio_noiommu)
> > > > +		return -EINVAL;
> > > > +
> > > > +	device->noiommu = !iommu_group;
> > > > +	iommu_group_put(iommu_group); /* Accepts NULL */
> > > > +	return 0;
> > > > +}
> > > > +
> > > >  static int __vfio_register_dev(struct vfio_device *device,
> > > >  			       enum vfio_group_type type)
> > > >  {
> > > > @@ -277,6 +289,13 @@ static int __vfio_register_dev(struct vfio_device *device,
> > > >  		     !device->ops->detach_ioas)))
> > > >  		return -EINVAL;
> > > >
> > > > +	/* Only physical devices can be noiommu device */
> > > > +	if (type == VFIO_IOMMU) {
> > > > +		ret = vfio_device_set_noiommu(device);
> > > > +		if (ret)
> > > > +			return ret;
> > > > +	}
> > > > +
> > > >  	/*
> > > >  	 * If the driver doesn't specify a set then the device is added to a
> > > >  	 * singleton set just for itself.
> > > > @@ -288,7 +307,8 @@ static int __vfio_register_dev(struct vfio_device *device,
> > > >  	if (ret)
> > > >  		return ret;
> > > >
> > > > -	ret = vfio_device_set_group(device, type);
> > > > +	ret = vfio_device_set_group(device,
> > > > +				    device->noiommu ? VFIO_NO_IOMMU : type);
> > > >  	if (ret)
> > > >  		return ret;
> > > >
> > > > @@ -301,6 +321,15 @@ static int __vfio_register_dev(struct vfio_device *device,
> > > >
> > > >  	vfio_device_group_register(device);
> > > >
> > > > +	if (device->noiommu) {
> > > > +		/*
> > > > +		 * noiommu devices have no IOMMU hardware/driver.  Taint the
> > > > +		 * kernel because we're about to give a DMA capable device to
> > > > +		 * a user without IOMMU protection.
> > > > +		 */
> > > > +		add_taint(TAINT_USER, LOCKDEP_STILL_OK);
> > > > +		dev_warn(device->dev, "Adding kernel taint for vfio-noiommu on device\n");
> > > > +	}
> > > >  	return 0;
> > > >  err_out:
> > > >  	vfio_device_remove_group(device);
> > > > diff --git a/include/linux/vfio.h b/include/linux/vfio.h
> > > > index e80a8ac86e46..183e620009e7 100644
> > > > --- a/include/linux/vfio.h
> > > > +++ b/include/linux/vfio.h
> > > > @@ -67,6 +67,7 @@ struct vfio_device {
> > > >  	bool iommufd_attached;
> > > >  #endif
> > > >  	bool cdev_opened:1;
> > > > +	bool noiommu:1;
> > > >  };
> > > >
> > > >  /**
> >


^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [Intel-gfx] [PATCH v12 21/24] vfio: Determine noiommu device in __vfio_register_dev()
@ 2023-06-13 14:33           ` Liu, Yi L
  0 siblings, 0 replies; 180+ messages in thread
From: Liu, Yi L @ 2023-06-13 14:33 UTC (permalink / raw)
  To: Alex Williamson
  Cc: mjrosato, jasowang, Hao, Xudong, Duan,  Zhenzhong, peterx, Xu,
	Terrence, chao.p.peng, linux-s390, kvm, lulu, Jiang, Yanting,
	joro, nicolinc, jgg, Tian, Kevin, Zhao, Yan Y, intel-gfx,
	eric.auger, intel-gvt-dev, yi.y.sun, clegoate, cohuck,
	shameerali.kolothum.thodi, suravee.suthikulpanit, robin.murphy

> From: Alex Williamson <alex.williamson@redhat.com>
> Sent: Tuesday, June 13, 2023 10:19 PM
> 
> On Tue, 13 Jun 2023 05:53:42 +0000
> "Liu, Yi L" <yi.l.liu@intel.com> wrote:
> 
> > > From: Alex Williamson <alex.williamson@redhat.com>
> > > Sent: Tuesday, June 13, 2023 6:42 AM
> > >
> > > On Fri,  2 Jun 2023 05:16:50 -0700
> > > Yi Liu <yi.l.liu@intel.com> wrote:
> > >
> > > > This moves the noiommu device determination and noiommu taint out of
> > > > vfio_group_find_or_alloc(). noiommu device is determined in
> > > > __vfio_register_dev() and result is stored in flag vfio_device->noiommu,
> > > > the noiommu taint is added in the end of __vfio_register_dev().
> > > >
> > > > This is also a preparation for compiling out vfio_group infrastructure
> > > > as it makes the noiommu detection and taint common between the cdev path
> > > > and group path though cdev path does not support noiommu.
> > >
> > > Does this really still make sense?  The motivation for the change is
> > > really not clear without cdev support for noiommu.  Thanks,
> >
> > I think it still makes sense. When CONFIG_VFIO_GROUP==n, the kernel
> > only supports the cdev interface. If there is a noiommu device, vfio
> > should fail the registration. So, the noiommu determination is still
> > needed. But I'd admit the taint might still belong in the group code.
> 
> How is there going to be a noiommu device when VFIO_GROUP is unset?

How about booting a kernel with the IOMMU disabled? Then none of the
devices are protected by an IOMMU, and I suppose they are all noiommu
devices. If the user wants to bind them to vfio, the kernel should have
VFIO_GROUP. Otherwise, the registration needs to fail.

Regards,
Yi Liu

> Thanks,
> 
> Alex
> 
> 
> > > > Suggested-by: Alex Williamson <alex.williamson@redhat.com>
> > > > Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> > > > ---
> > > >  drivers/vfio/group.c     | 15 ---------------
> > > >  drivers/vfio/vfio_main.c | 31 ++++++++++++++++++++++++++++++-
> > > >  include/linux/vfio.h     |  1 +
> > > >  3 files changed, 31 insertions(+), 16 deletions(-)
> > > >
> > > > diff --git a/drivers/vfio/group.c b/drivers/vfio/group.c
> > > > index 653b62f93474..64cdd0ea8825 100644
> > > > --- a/drivers/vfio/group.c
> > > > +++ b/drivers/vfio/group.c
> > > > @@ -668,21 +668,6 @@ static struct vfio_group *vfio_group_find_or_alloc(struct device *dev)
> > > >  	struct vfio_group *group;
> > > >
> > > >  	iommu_group = iommu_group_get(dev);
> > > > -	if (!iommu_group && vfio_noiommu) {
> > > > -		/*
> > > > -		 * With noiommu enabled, create an IOMMU group for devices that
> > > > -		 * don't already have one, implying no IOMMU hardware/driver
> > > > -		 * exists.  Taint the kernel because we're about to give a DMA
> > > > -		 * capable device to a user without IOMMU protection.
> > > > -		 */
> > > > -		group = vfio_noiommu_group_alloc(dev, VFIO_NO_IOMMU);
> > > > -		if (!IS_ERR(group)) {
> > > > -			add_taint(TAINT_USER, LOCKDEP_STILL_OK);
> > > > -			dev_warn(dev, "Adding kernel taint for vfio-noiommu group on device\n");
> > > > -		}
> > > > -		return group;
> > > > -	}
> > > > -
> > > >  	if (!iommu_group)
> > > >  		return ERR_PTR(-EINVAL);
> > > >
> > > > diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
> > > > index 6d8f9b0f3637..00a699b9f76b 100644
> > > > --- a/drivers/vfio/vfio_main.c
> > > > +++ b/drivers/vfio/vfio_main.c
> > > > @@ -265,6 +265,18 @@ static int vfio_init_device(struct vfio_device *device, struct device *dev,
> > > >  	return ret;
> > > >  }
> > > >
> > > > +static int vfio_device_set_noiommu(struct vfio_device *device)
> > > > +{
> > > > +	struct iommu_group *iommu_group = iommu_group_get(device->dev);
> > > > +
> > > > +	if (!iommu_group && !vfio_noiommu)
> > > > +		return -EINVAL;
> > > > +
> > > > +	device->noiommu = !iommu_group;
> > > > +	iommu_group_put(iommu_group); /* Accepts NULL */
> > > > +	return 0;
> > > > +}
> > > > +
> > > >  static int __vfio_register_dev(struct vfio_device *device,
> > > >  			       enum vfio_group_type type)
> > > >  {
> > > > @@ -277,6 +289,13 @@ static int __vfio_register_dev(struct vfio_device *device,
> > > >  		     !device->ops->detach_ioas)))
> > > >  		return -EINVAL;
> > > >
> > > > +	/* Only physical devices can be noiommu device */
> > > > +	if (type == VFIO_IOMMU) {
> > > > +		ret = vfio_device_set_noiommu(device);
> > > > +		if (ret)
> > > > +			return ret;
> > > > +	}
> > > > +
> > > >  	/*
> > > >  	 * If the driver doesn't specify a set then the device is added to a
> > > >  	 * singleton set just for itself.
> > > > @@ -288,7 +307,8 @@ static int __vfio_register_dev(struct vfio_device *device,
> > > >  	if (ret)
> > > >  		return ret;
> > > >
> > > > -	ret = vfio_device_set_group(device, type);
> > > > +	ret = vfio_device_set_group(device,
> > > > +				    device->noiommu ? VFIO_NO_IOMMU : type);
> > > >  	if (ret)
> > > >  		return ret;
> > > >
> > > > @@ -301,6 +321,15 @@ static int __vfio_register_dev(struct vfio_device *device,
> > > >
> > > >  	vfio_device_group_register(device);
> > > >
> > > > +	if (device->noiommu) {
> > > > +		/*
> > > > +		 * noiommu devices have no IOMMU hardware/driver.  Taint the
> > > > +		 * kernel because we're about to give a DMA capable device to
> > > > +		 * a user without IOMMU protection.
> > > > +		 */
> > > > +		add_taint(TAINT_USER, LOCKDEP_STILL_OK);
> > > > +		dev_warn(device->dev, "Adding kernel taint for vfio-noiommu on device\n");
> > > > +	}
> > > >  	return 0;
> > > >  err_out:
> > > >  	vfio_device_remove_group(device);
> > > > diff --git a/include/linux/vfio.h b/include/linux/vfio.h
> > > > index e80a8ac86e46..183e620009e7 100644
> > > > --- a/include/linux/vfio.h
> > > > +++ b/include/linux/vfio.h
> > > > @@ -67,6 +67,7 @@ struct vfio_device {
> > > >  	bool iommufd_attached;
> > > >  #endif
> > > >  	bool cdev_opened:1;
> > > > +	bool noiommu:1;
> > > >  };
> > > >
> > > >  /**
> >


^ permalink raw reply	[flat|nested] 180+ messages in thread

* RE: [PATCH v12 07/24] vfio: Block device access via device fd until device is opened
  2023-06-13 14:16         ` Alex Williamson
@ 2023-06-13 14:36           ` Liu, Yi L
  -1 siblings, 0 replies; 180+ messages in thread
From: Liu, Yi L @ 2023-06-13 14:36 UTC (permalink / raw)
  To: Alex Williamson
  Cc: jgg, Tian, Kevin, joro, robin.murphy, cohuck, eric.auger,
	nicolinc, kvm, mjrosato, chao.p.peng, yi.y.sun, peterx, jasowang,
	shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	intel-gvt-dev, intel-gfx, linux-s390, Hao, Xudong, Zhao, Yan Y,
	Xu, Terrence, Jiang, Yanting, Duan, Zhenzhong, clegoate

> From: Alex Williamson <alex.williamson@redhat.com>
> Sent: Tuesday, June 13, 2023 10:17 PM
> 
> On Tue, 13 Jun 2023 05:46:32 +0000
> "Liu, Yi L" <yi.l.liu@intel.com> wrote:
> 
> > > From: Alex Williamson <alex.williamson@redhat.com>
> > > Sent: Tuesday, June 13, 2023 5:52 AM
> > >
> > > On Fri,  2 Jun 2023 05:16:36 -0700
> > > Yi Liu <yi.l.liu@intel.com> wrote:
> > >
> > > > Allow the vfio_device file to be in a state where the device FD is
> > > > opened but the device cannot be used by userspace (i.e. its .open_device()
> > > > hasn't been called). This inbetween state is not used when the device
> > > > FD is spawned from the group FD, however when we create the device FD
> > > > directly by opening a cdev it will be opened in the blocked state.
> > > >
> > > > The reason for the inbetween state is that userspace only gets a FD but
> > > > doesn't gain access permission until binding the FD to an iommufd. So in
> > > > the blocked state, only the bind operation is allowed. Completing bind
> > > > will allow user to further access the device.
> > > >
> > > > This is implemented by adding a flag in struct vfio_device_file to mark
> > > > the blocked state and using a simple smp_load_acquire() to obtain the
> > > > flag value and serialize all the device setup with the thread accessing
> > > > this device.
> > > >
> > > > Following this lockless scheme, it can safely handle the device FD
> > > > unbound->bound but it cannot handle bound->unbound. To allow this we'd
> > > > need to add a lock on all the vfio ioctls which seems costly. So once
> > > > device FD is bound, it remains bound until the FD is closed.
> > > >
> > > > Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
> > > > Reviewed-by: Kevin Tian <kevin.tian@intel.com>
> > > > Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
> > > > Reviewed-by: Eric Auger <eric.auger@redhat.com>
> > > > Tested-by: Terrence Xu <terrence.xu@intel.com>
> > > > Tested-by: Nicolin Chen <nicolinc@nvidia.com>
> > > > Tested-by: Matthew Rosato <mjrosato@linux.ibm.com>
> > > > Tested-by: Yanting Jiang <yanting.jiang@intel.com>
> > > > Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
> > > > Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> > > > ---
> > > >  drivers/vfio/group.c     | 11 ++++++++++-
> > > >  drivers/vfio/vfio.h      |  1 +
> > > >  drivers/vfio/vfio_main.c | 16 ++++++++++++++++
> > > >  3 files changed, 27 insertions(+), 1 deletion(-)
> > > >
> > > > diff --git a/drivers/vfio/group.c b/drivers/vfio/group.c
> > > > index caf53716ddb2..088dd34c8931 100644
> > > > --- a/drivers/vfio/group.c
> > > > +++ b/drivers/vfio/group.c
> > > > @@ -194,9 +194,18 @@ static int vfio_df_group_open(struct vfio_device_file *df)
> > > >  	df->iommufd = device->group->iommufd;
> > > >
> > > >  	ret = vfio_df_open(df);
> > > > -	if (ret)
> > > > +	if (ret) {
> > > >  		df->iommufd = NULL;
> > > > +		goto out_put_kvm;
> > > > +	}
> > > > +
> > > > +	/*
> > > > +	 * Paired with smp_load_acquire() in vfio_device_fops::ioctl/
> > > > +	 * read/write/mmap and vfio_file_has_device_access()
> > > > +	 */
> > > > +	smp_store_release(&df->access_granted, true);
> > > >
> > > > +out_put_kvm:
> > > >  	if (device->open_count == 0)
> > > >  		vfio_device_put_kvm(device);
> > > >
> > > > diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
> > > > index f9eb52eb9ed7..fdf2fc73f880 100644
> > > > --- a/drivers/vfio/vfio.h
> > > > +++ b/drivers/vfio/vfio.h
> > > > @@ -18,6 +18,7 @@ struct vfio_container;
> > > >
> > > >  struct vfio_device_file {
> > > >  	struct vfio_device *device;
> > > > +	bool access_granted;
> > >
> > > Should we make this a more strongly defined data type and later move
> > > devid (u32) here to partially fill the hole created?
> >
> > Before your question, let me describe how I place the fields
> > of this structure to see if it is common practice. The first two
> > fields are static, so they are in the beginning. The access_granted
> > is lockless and other fields are protected by locks. So I tried to
> > put the lock and the fields it protects closely. So this is why I put
> > devid behind iommufd as both are protected by the same lock.
> 
> I think the primary considerations are locality and compactness.  Hot
> paths data should be within the first cache line of the structure,
> related data should share a cache line, and we should use the space
> efficiently.  What you describe seems largely an aesthetic concern,
> which was not evident to me by the segmentation alone.

Sure.

> 
> > struct vfio_device_file {
> >         struct vfio_device *device;
> >         struct vfio_group *group;
> >
> >         bool access_granted;
> >         spinlock_t kvm_ref_lock; /* protect kvm field */
> >         struct kvm *kvm;
> >         struct iommufd_ctx *iommufd; /* protected by struct vfio_device_set::lock */
> >         u32 devid; /* only valid when iommufd is valid */
> > };
> >
> > >
> > > I think this is being placed towards the front of the data structure
> > > for cache line locality given this is a hot path for file operations.
> > > But bool types have an implementation dependent size, making them
> > > difficult to pack.  Also there will be a tendency to want to make this
> > > a bit field, which is probably not compatible with the smp lockless
> > > operations being used here.  We might get in front of these issues if
> > > we just define it as a u8 now.  Thanks,
> >
> > I don't quite get why a bit field would be incompatible with smp
> > lockless operations. Could you elaborate a bit? And should I define
> > access_granted as u8 or "u8:1"?
> 
> Perhaps FUD on my part, but load-acquire type operations have specific
> semantics and it's not clear to me that they interact with compiler
> generated bit operations.  Thanks,

I see. How about the layout below?

struct vfio_device_file {
        struct vfio_device *device;
        struct vfio_group *group;
        u8 access_granted;
        u32 devid; /* only valid when iommufd is valid */
        spinlock_t kvm_ref_lock; /* protect kvm field */
        struct kvm *kvm;
        struct iommufd_ctx *iommufd; /* protected by struct vfio_device_set::lock */
};

Regards,
Yi Liu

> Alex
> 
> > > >  	spinlock_t kvm_ref_lock; /* protect kvm field */
> > > >  	struct kvm *kvm;
> > > >  	struct iommufd_ctx *iommufd; /* protected by struct vfio_device_set::lock */
> > > > diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
> > > > index a3c5817fc545..4c8b7713dc3d 100644
> > > > --- a/drivers/vfio/vfio_main.c
> > > > +++ b/drivers/vfio/vfio_main.c
> > > > @@ -1129,6 +1129,10 @@ static long vfio_device_fops_unl_ioctl(struct file *filep,
> > > >  	struct vfio_device *device = df->device;
> > > >  	int ret;
> > > >
> > > > +	/* Paired with smp_store_release() following vfio_df_open() */
> > > > +	if (!smp_load_acquire(&df->access_granted))
> > > > +		return -EINVAL;
> > > > +
> > > >  	ret = vfio_device_pm_runtime_get(device);
> > > >  	if (ret)
> > > >  		return ret;
> > > > @@ -1156,6 +1160,10 @@ static ssize_t vfio_device_fops_read(struct file *filep, char __user *buf,
> > > >  	struct vfio_device_file *df = filep->private_data;
> > > >  	struct vfio_device *device = df->device;
> > > >
> > > > +	/* Paired with smp_store_release() following vfio_df_open() */
> > > > +	if (!smp_load_acquire(&df->access_granted))
> > > > +		return -EINVAL;
> > > > +
> > > >  	if (unlikely(!device->ops->read))
> > > >  		return -EINVAL;
> > > >
> > > > @@ -1169,6 +1177,10 @@ static ssize_t vfio_device_fops_write(struct file *filep,
> > > >  	struct vfio_device_file *df = filep->private_data;
> > > >  	struct vfio_device *device = df->device;
> > > >
> > > > +	/* Paired with smp_store_release() following vfio_df_open() */
> > > > +	if (!smp_load_acquire(&df->access_granted))
> > > > +		return -EINVAL;
> > > > +
> > > >  	if (unlikely(!device->ops->write))
> > > >  		return -EINVAL;
> > > >
> > > > @@ -1180,6 +1192,10 @@ static int vfio_device_fops_mmap(struct file *filep, struct vm_area_struct *vma)
> > > >  	struct vfio_device_file *df = filep->private_data;
> > > >  	struct vfio_device *device = df->device;
> > > >
> > > > +	/* Paired with smp_store_release() following vfio_df_open() */
> > > > +	if (!smp_load_acquire(&df->access_granted))
> > > > +		return -EINVAL;
> > > > +
> > > >  	if (unlikely(!device->ops->mmap))
> > > >  		return -EINVAL;
> > > >
> >


^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [Intel-gfx] [PATCH v12 18/24] vfio: Add VFIO_DEVICE_BIND_IOMMUFD
  2023-06-13 14:28           ` [Intel-gfx] " Liu, Yi L
@ 2023-06-13 14:39             ` Alex Williamson
  -1 siblings, 0 replies; 180+ messages in thread
From: Alex Williamson @ 2023-06-13 14:39 UTC (permalink / raw)
  To: Liu, Yi L
  Cc: mjrosato, jasowang, Hao, Xudong, Duan, Zhenzhong, peterx, Xu,
	Terrence, chao.p.peng, linux-s390, kvm, lulu, Jiang, Yanting,
	joro, nicolinc, jgg, Tian, Kevin, Zhao,  Yan Y, intel-gfx,
	eric.auger, intel-gvt-dev, yi.y.sun, clegoate, cohuck,
	shameerali.kolothum.thodi, suravee.suthikulpanit, robin.murphy

On Tue, 13 Jun 2023 14:28:43 +0000
"Liu, Yi L" <yi.l.liu@intel.com> wrote:

> > From: Alex Williamson <alex.williamson@redhat.com>
> > Sent: Tuesday, June 13, 2023 10:18 PM  
> 
> > > > > diff --git a/include/linux/vfio.h b/include/linux/vfio.h
> > > > > index 83cc5dc28b7a..e80a8ac86e46 100644
> > > > > --- a/include/linux/vfio.h
> > > > > +++ b/include/linux/vfio.h
> > > > > @@ -66,6 +66,7 @@ struct vfio_device {
> > > > >  	struct iommufd_device *iommufd_device;
> > > > >  	bool iommufd_attached;
> > > > >  #endif
> > > > > +	bool cdev_opened:1;  
> > > >
> > > > Perhaps a more strongly defined data type here as well and roll
> > > > iommufd_attached into the same bit field scheme.  
> > >
> > > Ok, then needs to make iommufd_attached always defined.  
> > 
> > That does not follow.  Thanks,  
> 
> Well, I meant that iommufd_attached is currently defined only when
> CONFIG_IOMMUFD is enabled. To roll it in with cdev_opened, this needs
> to change.

Understood, but I don't think it's true.  If defined we use one more
bit of the bit field, which is a consideration when we approach filling
it, but we're not using bit-shift operations to address these bits, so
why does it matter if one has compiler conditional usage?  Thanks,

Alex


^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [Intel-gfx] [PATCH v12 07/24] vfio: Block device access via device fd until device is opened
  2023-06-13 14:36           ` [Intel-gfx] " Liu, Yi L
@ 2023-06-13 14:42             ` Alex Williamson
  -1 siblings, 0 replies; 180+ messages in thread
From: Alex Williamson @ 2023-06-13 14:42 UTC (permalink / raw)
  To: Liu, Yi L
  Cc: mjrosato, jasowang, Hao, Xudong, Duan, Zhenzhong, peterx, Xu,
	Terrence, chao.p.peng, linux-s390, kvm, lulu, Jiang, Yanting,
	joro, nicolinc, jgg, Tian, Kevin, Zhao,  Yan Y, intel-gfx,
	eric.auger, intel-gvt-dev, yi.y.sun, clegoate, cohuck,
	shameerali.kolothum.thodi, suravee.suthikulpanit, robin.murphy

On Tue, 13 Jun 2023 14:36:14 +0000
"Liu, Yi L" <yi.l.liu@intel.com> wrote:

> > From: Alex Williamson <alex.williamson@redhat.com>
> > Sent: Tuesday, June 13, 2023 10:17 PM
> > 
> > On Tue, 13 Jun 2023 05:46:32 +0000
> > "Liu, Yi L" <yi.l.liu@intel.com> wrote:
> >   
> > > > From: Alex Williamson <alex.williamson@redhat.com>
> > > > Sent: Tuesday, June 13, 2023 5:52 AM
> > > >
> > > > On Fri,  2 Jun 2023 05:16:36 -0700
> > > > Yi Liu <yi.l.liu@intel.com> wrote:
> > > >  
> > > > > Allow the vfio_device file to be in a state where the device FD is
> > > > > opened but the device cannot be used by userspace (i.e. its .open_device()
> > > > > hasn't been called). This inbetween state is not used when the device
> > > > > FD is spawned from the group FD, however when we create the device FD
> > > > > directly by opening a cdev it will be opened in the blocked state.
> > > > >
> > > > > The reason for the inbetween state is that userspace only gets a FD but
> > > > > doesn't gain access permission until binding the FD to an iommufd. So in
> > > > > the blocked state, only the bind operation is allowed. Completing bind
> > > > > will allow user to further access the device.
> > > > >
> > > > > This is implemented by adding a flag in struct vfio_device_file to mark
> > > > > the blocked state and using a simple smp_load_acquire() to obtain the
> > > > > flag value and serialize all the device setup with the thread accessing
> > > > > this device.
> > > > >
> > > > > Following this lockless scheme, it can safely handle the device FD
> > > > > unbound->bound but it cannot handle bound->unbound. To allow this we'd
> > > > > need to add a lock on all the vfio ioctls which seems costly. So once
> > > > > device FD is bound, it remains bound until the FD is closed.
> > > > >
> > > > > Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
> > > > > Reviewed-by: Kevin Tian <kevin.tian@intel.com>
> > > > > Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
> > > > > Reviewed-by: Eric Auger <eric.auger@redhat.com>
> > > > > Tested-by: Terrence Xu <terrence.xu@intel.com>
> > > > > Tested-by: Nicolin Chen <nicolinc@nvidia.com>
> > > > > Tested-by: Matthew Rosato <mjrosato@linux.ibm.com>
> > > > > Tested-by: Yanting Jiang <yanting.jiang@intel.com>
> > > > > Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
> > > > > Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> > > > > ---
> > > > >  drivers/vfio/group.c     | 11 ++++++++++-
> > > > >  drivers/vfio/vfio.h      |  1 +
> > > > >  drivers/vfio/vfio_main.c | 16 ++++++++++++++++
> > > > >  3 files changed, 27 insertions(+), 1 deletion(-)
> > > > >
> > > > > diff --git a/drivers/vfio/group.c b/drivers/vfio/group.c
> > > > > index caf53716ddb2..088dd34c8931 100644
> > > > > --- a/drivers/vfio/group.c
> > > > > +++ b/drivers/vfio/group.c
> > > > > @@ -194,9 +194,18 @@ static int vfio_df_group_open(struct vfio_device_file *df)
> > > > >  	df->iommufd = device->group->iommufd;
> > > > >
> > > > >  	ret = vfio_df_open(df);
> > > > > -	if (ret)
> > > > > +	if (ret) {
> > > > >  		df->iommufd = NULL;
> > > > > +		goto out_put_kvm;
> > > > > +	}
> > > > > +
> > > > > +	/*
> > > > > +	 * Paired with smp_load_acquire() in vfio_device_fops::ioctl/
> > > > > +	 * read/write/mmap and vfio_file_has_device_access()
> > > > > +	 */
> > > > > +	smp_store_release(&df->access_granted, true);
> > > > >
> > > > > +out_put_kvm:
> > > > >  	if (device->open_count == 0)
> > > > >  		vfio_device_put_kvm(device);
> > > > >
> > > > > diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
> > > > > index f9eb52eb9ed7..fdf2fc73f880 100644
> > > > > --- a/drivers/vfio/vfio.h
> > > > > +++ b/drivers/vfio/vfio.h
> > > > > @@ -18,6 +18,7 @@ struct vfio_container;
> > > > >
> > > > >  struct vfio_device_file {
> > > > >  	struct vfio_device *device;
> > > > > +	bool access_granted;  
> > > >
> > > > Should we make this a more strongly defined data type and later move
> > > > devid (u32) here to partially fill the hole created?  
> > >
> > > Before your question, let me describe how I place the fields
> > > of this structure to see if it is common practice. The first two
> > > fields are static, so they are in the beginning. The access_granted
> > > is lockless and other fields are protected by locks. So I tried to
> > > put the lock and the fields it protects closely. So this is why I put
> > > devid behind iommufd as both are protected by the same lock.  
> > 
> > I think the primary considerations are locality and compactness.  Hot
> > paths data should be within the first cache line of the structure,
> > related data should share a cache line, and we should use the space
> > efficiently.  What you describe seems largely an aesthetic concern,
> > which was not evident to me by the segmentation alone.  
> 
> Sure.
> 
> >   
> > > struct vfio_device_file {
> > >         struct vfio_device *device;
> > >         struct vfio_group *group;
> > >
> > >         bool access_granted;
> > >         spinlock_t kvm_ref_lock; /* protect kvm field */
> > >         struct kvm *kvm;
> > >         struct iommufd_ctx *iommufd; /* protected by struct vfio_device_set::lock */
> > >         u32 devid; /* only valid when iommufd is valid */
> > > };
> > >  
> > > >
> > > > I think this is being placed towards the front of the data structure
> > > > for cache line locality given this is a hot path for file operations.
> > > > But bool types have an implementation dependent size, making them
> > > > difficult to pack.  Also there will be a tendency to want to make this
> > > > a bit field, which is probably not compatible with the smp lockless
> > > > operations being used here.  We might get in front of these issues if
> > > > we just define it as a u8 now.  Thanks,  
> > >
> > > > I don't quite get why a bit field would be incompatible with smp
> > > > lockless operations. Could you elaborate a bit? And should I define
> > > > access_granted as u8 or "u8:1"?
> > 
> > Perhaps FUD on my part, but load-acquire type operations have specific
> > > semantics and it's not clear to me that they interact with compiler
> > generated bit operations.  Thanks,  
> 
> I see. How about below? 
> 
> struct vfio_device_file {
>         struct vfio_device *device;
>         struct vfio_group *group;
>         u8 access_granted;
>         u32 devid; /* only valid when iommufd is valid */
>         spinlock_t kvm_ref_lock; /* protect kvm field */
>         struct kvm *kvm;
>         struct iommufd_ctx *iommufd; /* protected by struct vfio_device_set::lock */
> };

Yep, that's essentially what I was suggesting.  Thanks,

Alex
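
[Illustration, not part of the patch] The release/acquire pairing described
in the commit message can be sketched in userspace with C11 atomics standing
in for smp_store_release() and smp_load_acquire(). The struct and function
names below are hypothetical. The flag is a whole uint8_t rather than a
bit-field, in line with the u8 suggestion: C does not allow taking the
address of a bit-field, so it cannot be handed to an atomic operation.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct vfio_device_file; not the kernel type. */
struct demo_device_file {
	int setup_state;                /* plain data published by the opener */
	_Atomic uint8_t access_granted; /* whole byte, not a bit-field */
};

/* Opener side: finish setup first, then publish the flag with release order. */
static void demo_grant_access(struct demo_device_file *df, int state)
{
	assert(df != NULL);
	df->setup_state = state;
	atomic_store_explicit(&df->access_granted, 1, memory_order_release);
}

/*
 * File-operation side: the acquire load pairs with the release store above,
 * so a reader that sees the flag set also sees the completed setup.
 * Returns false while the file is still in the blocked state.
 */
static bool demo_try_access(struct demo_device_file *df, int *state)
{
	assert(df != NULL && state != NULL);
	if (!atomic_load_explicit(&df->access_granted, memory_order_acquire))
		return false;
	*state = df->setup_state;
	return true;
}
```

As in the patch, the sketch only handles the unbound->bound transition;
nothing here would make clearing the flag safe against concurrent readers.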


^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [Intel-gfx] [PATCH v12 18/24] vfio: Add VFIO_DEVICE_BIND_IOMMUFD
  2023-06-13 14:39             ` Alex Williamson
@ 2023-06-13 14:42               ` Liu, Yi L
  -1 siblings, 0 replies; 180+ messages in thread
From: Liu, Yi L @ 2023-06-13 14:42 UTC (permalink / raw)
  To: Alex Williamson
  Cc: mjrosato, jasowang, Hao, Xudong, Duan,  Zhenzhong, peterx, Xu,
	Terrence, chao.p.peng, linux-s390, kvm, lulu, Jiang, Yanting,
	joro, nicolinc, jgg, Tian, Kevin, Zhao, Yan Y, intel-gfx,
	eric.auger, intel-gvt-dev, yi.y.sun, clegoate, cohuck,
	shameerali.kolothum.thodi, suravee.suthikulpanit, robin.murphy

> From: Alex Williamson <alex.williamson@redhat.com>
> Sent: Tuesday, June 13, 2023 10:40 PM
> 
> On Tue, 13 Jun 2023 14:28:43 +0000
> "Liu, Yi L" <yi.l.liu@intel.com> wrote:
> 
> > > From: Alex Williamson <alex.williamson@redhat.com>
> > > Sent: Tuesday, June 13, 2023 10:18 PM
> >
> > > > > > diff --git a/include/linux/vfio.h b/include/linux/vfio.h
> > > > > > index 83cc5dc28b7a..e80a8ac86e46 100644
> > > > > > --- a/include/linux/vfio.h
> > > > > > +++ b/include/linux/vfio.h
> > > > > > @@ -66,6 +66,7 @@ struct vfio_device {
> > > > > >  	struct iommufd_device *iommufd_device;
> > > > > >  	bool iommufd_attached;
> > > > > >  #endif
> > > > > > +	bool cdev_opened:1;
> > > > >
> > > > > Perhaps a more strongly defined data type here as well and roll
> > > > > iommufd_attached into the same bit field scheme.
> > > >
> > > > Ok, then needs to make iommufd_attached always defined.
> > >
> > > That does not follow.  Thanks,
> >
> > Well, I meant that iommufd_attached is currently defined only when
> > CONFIG_IOMMUFD is enabled. To roll it in with cdev_opened, that needs
> > to change.
> 
> Understood, but I don't think it's true.  If defined we use one more
> bit of the bit field, which is a consideration when we approach filling
> it, but we're not using bit-shift operations to address these bits, so
> why does it matter if one has compiler conditional usage?  Thanks,

Aha, I see. So you are suggesting something like the below, is that right?

#if IS_ENABLED(CONFIG_IOMMUFD)
	struct iommufd_device *iommufd_device;
	u8 iommufd_attached:1;
#endif
	u8 cdev_opened:1;

Regards,
Yi Liu
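
[Illustration, not part of the patch] The point about conditional
compilation can be checked with a small userspace sketch. DEMO_IOMMUFD is a
hypothetical stand-in for IS_ENABLED(CONFIG_IOMMUFD), and the struct is not
the real struct vfio_device. Because bit-field members are accessed by name
and the compiler assigns their positions, the accessors for cdev_opened are
the same whether or not the conditional member is present.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical switch standing in for IS_ENABLED(CONFIG_IOMMUFD). */
#ifndef DEMO_IOMMUFD
#define DEMO_IOMMUFD 1
#endif

/*
 * uint8_t bit-fields are a compiler extension accepted by GCC and Clang,
 * mirroring the kernel's "u8 name:1" usage.
 */
struct demo_device {
#if DEMO_IOMMUFD
	void *iommufd_device;
	uint8_t iommufd_attached:1;
#endif
	uint8_t cdev_opened:1;
};

/* Accessors never mention bit positions, only member names. */
static void demo_mark_cdev_opened(struct demo_device *d)
{
	assert(d != NULL);
	d->cdev_opened = 1;
}

static int demo_is_cdev_opened(const struct demo_device *d)
{
	assert(d != NULL);
	return d->cdev_opened;
}
```

Building with -DDEMO_IOMMUFD=0 drops the conditional members and the
accessors compile unchanged.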

^ permalink raw reply	[flat|nested] 180+ messages in thread

* RE: [PATCH v12 07/24] vfio: Block device access via device fd until device is opened
  2023-06-13 14:42             ` Alex Williamson
@ 2023-06-13 14:44               ` Liu, Yi L
  -1 siblings, 0 replies; 180+ messages in thread
From: Liu, Yi L @ 2023-06-13 14:44 UTC (permalink / raw)
  To: Alex Williamson
  Cc: jgg, Tian, Kevin, joro, robin.murphy, cohuck, eric.auger,
	nicolinc, kvm, mjrosato, chao.p.peng, yi.y.sun, peterx, jasowang,
	shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	intel-gvt-dev, intel-gfx, linux-s390, Hao, Xudong, Zhao, Yan Y,
	Xu, Terrence, Jiang, Yanting, Duan, Zhenzhong, clegoate

> From: Alex Williamson <alex.williamson@redhat.com>
> Sent: Tuesday, June 13, 2023 10:42 PM
> 
> On Tue, 13 Jun 2023 14:36:14 +0000
> "Liu, Yi L" <yi.l.liu@intel.com> wrote:

> > > > > >
> > > > > > diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
> > > > > > index f9eb52eb9ed7..fdf2fc73f880 100644
> > > > > > --- a/drivers/vfio/vfio.h
> > > > > > +++ b/drivers/vfio/vfio.h
> > > > > > @@ -18,6 +18,7 @@ struct vfio_container;
> > > > > >
> > > > > >  struct vfio_device_file {
> > > > > >  	struct vfio_device *device;
> > > > > > +	bool access_granted;
> > > > >
> > > > > Should we make this a more strongly defined data type and later move
> > > > > devid (u32) here to partially fill the hole created?
> > > >
> > > > Before your question, let me describe how I place the fields
> > > > of this structure to see if it is common practice. The first two
> > > > fields are static, so they are in the beginning. The access_granted
> > > > is lockless and other fields are protected by locks. So I tried to
> > > > put the lock and the fields it protects closely. So this is why I put
> > > > devid behind iommufd as both are protected by the same lock.
> > >
> > > I think the primary considerations are locality and compactness.  Hot
> > > paths data should be within the first cache line of the structure,
> > > related data should share a cache line, and we should use the space
> > > efficiently.  What you describe seems largely an aesthetic concern,
> > > which was not evident to me by the segmentation alone.
> >
> > Sure.
> >
> > >
> > > > struct vfio_device_file {
> > > >         struct vfio_device *device;
> > > >         struct vfio_group *group;
> > > >
> > > >         bool access_granted;
> > > >         spinlock_t kvm_ref_lock; /* protect kvm field */
> > > >         struct kvm *kvm;
> > > >         struct iommufd_ctx *iommufd; /* protected by struct vfio_device_set::lock */
> > > >         u32 devid; /* only valid when iommufd is valid */
> > > > };
> > > >
> > > > >
> > > > > I think this is being placed towards the front of the data structure
> > > > > for cache line locality given this is a hot path for file operations.
> > > > > But bool types have an implementation dependent size, making them
> > > > > difficult to pack.  Also there will be a tendency to want to make this
> > > > > a bit field, which is probably not compatible with the smp lockless
> > > > > operations being used here.  We might get in front of these issues if
> > > > > we just define it as a u8 now.  Thanks,
> > > >
> > > > > I don't quite get why a bit field would be incompatible with smp
> > > > > lockless operations. Could you elaborate a bit? And should I define
> > > > > access_granted as u8 or "u8:1"?
> > >
> > > Perhaps FUD on my part, but load-acquire type operations have specific
> > > > semantics and it's not clear to me that they interact with compiler
> > > generated bit operations.  Thanks,
> >
> > I see. How about below?
> >
> > struct vfio_device_file {
> >         struct vfio_device *device;
> >         struct vfio_group *group;
> >         u8 access_granted;
> >         u32 devid; /* only valid when iommufd is valid */
> >         spinlock_t kvm_ref_lock; /* protect kvm field */
> >         struct kvm *kvm;
> >         struct iommufd_ctx *iommufd; /* protected by struct vfio_device_set::lock */
> > };
> 
> Yep, that's essentially what I was suggesting.  Thanks,

Got it. 😊

Regards,
Yi Liu
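
[Illustration, not part of the patch] The agreed field order can be examined
in userspace with offsetof(). This is only a sketch: the pointer members are
void * stand-ins, and demo_lock_t assumes a 4-byte spinlock_t, which depends
on the kernel configuration. It shows devid sitting right behind the u8 flag,
leaving a hole of at most three bytes, with both fields near the front of
the structure.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t demo_lock_t; /* assumption: stand-in for a 4-byte spinlock_t */

/* Mirrors the proposed field order; types are userspace stand-ins. */
struct demo_layout {
	void *device;
	void *group;
	uint8_t access_granted;
	uint32_t devid;
	demo_lock_t kvm_ref_lock;
	void *kvm;
	void *iommufd;
};

/* Bytes of padding left between access_granted and devid. */
static size_t demo_hole_after_flag(void)
{
	return offsetof(struct demo_layout, devid)
	     - offsetof(struct demo_layout, access_granted)
	     - sizeof(uint8_t);
}

/* Nonzero if the flag and devid both end within the first `line` bytes. */
static int demo_hot_fields_within(size_t line)
{
	assert(line > 0);
	return offsetof(struct demo_layout, devid) + sizeof(uint32_t) <= line;
}
```

For the real structure, pahole on a built kernel is the authoritative way to
inspect holes and cache line placement.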

^ permalink raw reply	[flat|nested] 180+ messages in thread

* RE: [PATCH v12 24/24] docs: vfio: Add vfio device cdev description
  2023-06-13 14:24         ` Alex Williamson
@ 2023-06-13 14:48           ` Liu, Yi L
  -1 siblings, 0 replies; 180+ messages in thread
From: Liu, Yi L @ 2023-06-13 14:48 UTC (permalink / raw)
  To: Alex Williamson
  Cc: jgg, Tian, Kevin, joro, robin.murphy, cohuck, eric.auger,
	nicolinc, kvm, mjrosato, chao.p.peng, yi.y.sun, peterx, jasowang,
	shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	intel-gvt-dev, intel-gfx, linux-s390, Hao, Xudong, Zhao, Yan Y,
	Xu, Terrence, Jiang, Yanting, Duan, Zhenzhong, clegoate

> From: Alex Williamson <alex.williamson@redhat.com>
> Sent: Tuesday, June 13, 2023 10:24 PM
> 
> On Tue, 13 Jun 2023 12:01:51 +0000
> "Liu, Yi L" <yi.l.liu@intel.com> wrote:
> 
> > > From: Alex Williamson <alex.williamson@redhat.com>
> > > Sent: Tuesday, June 13, 2023 7:06 AM
> > >
> > > On Fri,  2 Jun 2023 05:16:53 -0700
> > > Yi Liu <yi.l.liu@intel.com> wrote:
> > >
> > > > This gives notes for userspace applications on device cdev usage.
> > > >
> > > > Reviewed-by: Kevin Tian <kevin.tian@intel.com>
> > > > Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> > > > ---
> > > >  Documentation/driver-api/vfio.rst | 132 ++++++++++++++++++++++++++++++
> > > >  1 file changed, 132 insertions(+)
> > > >
> > > > diff --git a/Documentation/driver-api/vfio.rst b/Documentation/driver-api/vfio.rst
> > > > index 363e12c90b87..f00c9b86bda0 100644
> > > > --- a/Documentation/driver-api/vfio.rst
> > > > +++ b/Documentation/driver-api/vfio.rst
> > > > @@ -239,6 +239,130 @@ group and can access them as follows::
> > > >  	/* Gratuitous device reset and go... */
> > > >  	ioctl(device, VFIO_DEVICE_RESET);
> > > >
> > > > +IOMMUFD and vfio_iommu_type1
> > > > +----------------------------
> > > > +
> > > > +IOMMUFD is the new user API to manage I/O page tables from userspace.
> > > > +It intends to be the portal of delivering advanced userspace DMA
> > > > +features (nested translation [5]_, PASID [6]_, etc.) while also providing
> > > > +a backwards compatibility interface for existing VFIO_TYPE1v2_IOMMU use
> > > > +cases.  Eventually the vfio_iommu_type1 driver, as well as the legacy
> > > > +vfio container and group model is intended to be deprecated.
> > > > +
> > > > +The IOMMUFD backwards compatibility interface can be enabled two ways.
> > > > +In the first method, the kernel can be configured with
> > > > +CONFIG_IOMMUFD_VFIO_CONTAINER, in which case the IOMMUFD subsystem
> > > > +transparently provides the entire infrastructure for the VFIO
> > > > +container and IOMMU backend interfaces.  The compatibility mode can
> > > > +also be accessed if the VFIO container interface, ie. /dev/vfio/vfio is
> > > > +simply symlink'd to /dev/iommu.  Note that at the time of writing, the
> > > > +compatibility mode is not entirely feature complete relative to
> > > > +VFIO_TYPE1v2_IOMMU (ex. DMA mapping MMIO) and does not attempt to
> > > > +provide compatibility to the VFIO_SPAPR_TCE_IOMMU interface.  Therefore
> > > > +it is not generally advisable at this time to switch from native VFIO
> > > > +implementations to the IOMMUFD compatibility interfaces.
> > > > +
> > > > +Long term, VFIO users should migrate to device access through the cdev
> > > > +interface described below, and native access through the IOMMUFD
> > > > +provided interfaces.
> > > > +
> > > > +VFIO Device cdev
> > > > +----------------
> > > > +
> > > > +Traditionally user acquires a device fd via VFIO_GROUP_GET_DEVICE_FD
> > > > +in a VFIO group.
> > > > +
> > > > +With CONFIG_VFIO_DEVICE_CDEV=y the user can now acquire a device fd
> > > > +by directly opening a character device /dev/vfio/devices/vfioX where
> > > > +"X" is the number allocated uniquely by VFIO for registered devices.
> > > > +cdev interface does not support noiommu, so user should use the legacy
> > > > +group interface if noiommu is needed.
> > > > +
> > > > +The cdev only works with IOMMUFD.  Both VFIO drivers and applications
> > > > +must adapt to the new cdev security model which requires using
> > > > +VFIO_DEVICE_BIND_IOMMUFD to claim DMA ownership before starting to
> > > > +actually use the device.  Once BIND succeeds then a VFIO device can
> > > > +be fully accessed by the user.
> > > > +
> > > > +VFIO device cdev doesn't rely on VFIO group/container/iommu drivers.
> > > > +Hence those modules can be fully compiled out in an environment
> > > > +where no legacy VFIO application exists.
> > > > +
> > > > +So far SPAPR does not support IOMMUFD yet.  So it cannot support device
> > > > +cdev neither.
> > >
> > > s/neither/either/
> >
> > Got it.
> >
> > >
> > > Unless I missed it, we've not described that vfio device cdev access is
> > > still bound by IOMMU group semantics, ie. there can be one DMA owner
> > > for the group.  That's a pretty common failure point for multi-function
> > > consumer device use cases, so the why, where, and how it fails should
> > > be well covered.
> >
> > Yes, this needs to be documented. How about the wording below:
> >
> > vfio device cdev access is still bound by IOMMU group semantics, ie. there
> > can be only one DMA owner for the group.  Devices belonging to the same
> > group can not be bound to multiple iommufd_ctx.
> 
> ... or shared between native kernel and vfio drivers.

I suppose you mean the devices in one group being bound to different
drivers, right?

> 
> >  The users that try to bind
> > such device to different iommufd shall be failed in VFIO_DEVICE_BIND_IOMMUFD
> > which is the start point to get full access for the device.
> 
> "A violation of this ownership requirement will fail at the
> VFIO_DEVICE_BIND_IOMMUFD ioctl, which gates full device access."

Got it.

Regards,
Yi Liu
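
[Illustration, not part of the patch] From the application side, the
ownership rule discussed above surfaces as an error from the bind ioctl. The
sketch below assumes the uapi as it appears in <linux/vfio.h> on kernels
that carry cdev support (struct vfio_device_bind_iommufd with argsz, flags,
iommufd and out_devid); those names may differ from this v12 revision. The
helper name demo_bind_cdev is hypothetical, and the code falls back to
-ENOTSUP when built against headers without the ioctl.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/vfio.h>

/*
 * Open a vfio device cdev (e.g. /dev/vfio/devices/vfioX) and bind it to an
 * already opened iommufd (an fd for /dev/iommu).
 * Returns the device fd on success, or a negative errno value on failure.
 * A DMA ownership conflict within the IOMMU group is reported by the ioctl.
 */
static int demo_bind_cdev(const char *cdev_path, int iommufd,
			  uint32_t *out_devid)
{
	assert(cdev_path != NULL && out_devid != NULL);
#ifdef VFIO_DEVICE_BIND_IOMMUFD
	struct vfio_device_bind_iommufd bind;
	int devfd = open(cdev_path, O_RDWR);

	if (devfd < 0)
		return -errno;

	memset(&bind, 0, sizeof(bind));
	bind.argsz = sizeof(bind);
	bind.iommufd = iommufd;

	/* Until this succeeds, the device fd stays in the blocked state. */
	if (ioctl(devfd, VFIO_DEVICE_BIND_IOMMUFD, &bind) < 0) {
		int err = errno;

		close(devfd);
		return -err;
	}

	*out_devid = bind.out_devid;
	return devfd;
#else
	(void)iommufd;
	return -ENOTSUP; /* headers predate the cdev uapi */
#endif
}
```

A caller would open /dev/iommu first, pass that fd in, and treat a negative
return as the point where the group ownership requirement was not met or the
device was otherwise unavailable.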

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [Intel-gfx] [PATCH v12 24/24] docs: vfio: Add vfio device cdev description
@ 2023-06-13 14:48           ` Liu, Yi L
  0 siblings, 0 replies; 180+ messages in thread
From: Liu, Yi L @ 2023-06-13 14:48 UTC (permalink / raw)
  To: Alex Williamson
  Cc: mjrosato, jasowang, Hao, Xudong, Duan,  Zhenzhong, peterx, Xu,
	Terrence, chao.p.peng, linux-s390, kvm, lulu, Jiang, Yanting,
	joro, nicolinc, jgg, Tian, Kevin, Zhao, Yan Y, intel-gfx,
	eric.auger, intel-gvt-dev, yi.y.sun, clegoate, cohuck,
	shameerali.kolothum.thodi, suravee.suthikulpanit, robin.murphy

> From: Alex Williamson <alex.williamson@redhat.com>
> Sent: Tuesday, June 13, 2023 10:24 PM
> 
> On Tue, 13 Jun 2023 12:01:51 +0000
> "Liu, Yi L" <yi.l.liu@intel.com> wrote:
> 
> > > From: Alex Williamson <alex.williamson@redhat.com>
> > > Sent: Tuesday, June 13, 2023 7:06 AM
> > >
> > > On Fri,  2 Jun 2023 05:16:53 -0700
> > > Yi Liu <yi.l.liu@intel.com> wrote:
> > >
> > > > This gives notes for userspace applications on device cdev usage.
> > > >
> > > > Reviewed-by: Kevin Tian <kevin.tian@intel.com>
> > > > Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> > > > ---
> > > >  Documentation/driver-api/vfio.rst | 132 ++++++++++++++++++++++++++++++
> > > >  1 file changed, 132 insertions(+)
> > > >
> > > > diff --git a/Documentation/driver-api/vfio.rst b/Documentation/driver-api/vfio.rst
> > > > index 363e12c90b87..f00c9b86bda0 100644
> > > > --- a/Documentation/driver-api/vfio.rst
> > > > +++ b/Documentation/driver-api/vfio.rst
> > > > @@ -239,6 +239,130 @@ group and can access them as follows::
> > > >  	/* Gratuitous device reset and go... */
> > > >  	ioctl(device, VFIO_DEVICE_RESET);
> > > >
> > > > +IOMMUFD and vfio_iommu_type1
> > > > +----------------------------
> > > > +
> > > > +IOMMUFD is the new user API to manage I/O page tables from userspace.
> > > > +It intends to be the portal of delivering advanced userspace DMA
> > > > +features (nested translation [5]_, PASID [6]_, etc.) while also providing
> > > > +a backwards compatibility interface for existing VFIO_TYPE1v2_IOMMU use
> > > > +cases.  Eventually the vfio_iommu_type1 driver, as well as the legacy
> > > > +vfio container and group model is intended to be deprecated.
> > > > +
> > > > +The IOMMUFD backwards compatibility interface can be enabled two ways.
> > > > +In the first method, the kernel can be configured with
> > > > +CONFIG_IOMMUFD_VFIO_CONTAINER, in which case the IOMMUFD subsystem
> > > > +transparently provides the entire infrastructure for the VFIO
> > > > +container and IOMMU backend interfaces.  The compatibility mode can
> > > > +also be accessed if the VFIO container interface, ie. /dev/vfio/vfio is
> > > > +simply symlink'd to /dev/iommu.  Note that at the time of writing, the
> > > > +compatibility mode is not entirely feature complete relative to
> > > > +VFIO_TYPE1v2_IOMMU (ex. DMA mapping MMIO) and does not attempt to
> > > > +provide compatibility to the VFIO_SPAPR_TCE_IOMMU interface.  Therefore
> > > > +it is not generally advisable at this time to switch from native VFIO
> > > > +implementations to the IOMMUFD compatibility interfaces.
> > > > +
> > > > +Long term, VFIO users should migrate to device access through the cdev
> > > > +interface described below, and native access through the IOMMUFD
> > > > +provided interfaces.
> > > > +
> > > > +VFIO Device cdev
> > > > +----------------
> > > > +
> > > > +Traditionally user acquires a device fd via VFIO_GROUP_GET_DEVICE_FD
> > > > +in a VFIO group.
> > > > +
> > > > +With CONFIG_VFIO_DEVICE_CDEV=y the user can now acquire a device fd
> > > > +by directly opening a character device /dev/vfio/devices/vfioX where
> > > > +"X" is the number allocated uniquely by VFIO for registered devices.
> > > > +cdev interface does not support noiommu, so user should use the legacy
> > > > +group interface if noiommu is needed.
> > > > +
> > > > +The cdev only works with IOMMUFD.  Both VFIO drivers and applications
> > > > +must adapt to the new cdev security model which requires using
> > > > +VFIO_DEVICE_BIND_IOMMUFD to claim DMA ownership before starting to
> > > > +actually use the device.  Once BIND succeeds then a VFIO device can
> > > > +be fully accessed by the user.
> > > > +
> > > > +VFIO device cdev doesn't rely on VFIO group/container/iommu drivers.
> > > > +Hence those modules can be fully compiled out in an environment
> > > > +where no legacy VFIO application exists.
> > > > +
> > > > +So far SPAPR does not support IOMMUFD yet.  So it cannot support device
> > > > +cdev neither.
> > >
> > > s/neither/either/
> >
> > Got it.
> >
> > >
> > > Unless I missed it, we've not described that vfio device cdev access is
> > > still bound by IOMMU group semantics, ie. there can be one DMA owner
> > > for the group.  That's a pretty common failure point for multi-function
> > > consumer device use cases, so the why, where, and how it fails should
> > > be well covered.
> >
> > Yes. this needs to be documented. How about below words:
> >
> > vfio device cdev access is still bound by IOMMU group semantics, ie. there
> > can be only one DMA owner for the group.  Devices belonging to the same
> > group can not be bound to multiple iommufd_ctx.
> 
> ... or shared between native kernel and vfio drivers.

I suppose you mean devices in one group being bound to different
drivers, right?

> 
> >  The users that try to bind
> > such device to different iommufd shall be failed in VFIO_DEVICE_BIND_IOMMUFD
> > which is the start point to get full access for the device.
> 
> "A violation of this ownership requirement will fail at the
> VFIO_DEVICE_BIND_IOMMUFD ioctl, which gates full device access."

Got it.
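
For illustration only, a minimal userspace sketch of where that failure
would surface.  It assumes the uAPI as it appears in mainline
<linux/vfio.h> (struct vfio_device_bind_iommufd and the ioctl number);
the layout in this revision of the series may differ.  The fallback
definitions only exist so the sketch builds against older headers, and
the helper name vfio_cdev_bind() is hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/vfio.h>

/*
 * ASSUMPTION: fallback for uAPI headers that predate the cdev support.
 * Mirrors the mainline definition; not taken from this patch revision.
 */
#ifndef VFIO_DEVICE_BIND_IOMMUFD
struct vfio_device_bind_iommufd {
	__u32 argsz;
	__u32 flags;
	__s32 iommufd;
	__u32 out_devid;
};
#define VFIO_DEVICE_BIND_IOMMUFD _IO(VFIO_TYPE, VFIO_BASE + 18)
#endif

/*
 * Hypothetical helper: open a vfio device cdev and bind it to an
 * already opened iommufd.  Returns the device fd on success, or a
 * negative errno value.  A DMA ownership conflict within the IOMMU
 * group is expected to show up as a failure of the bind ioctl.
 */
int vfio_cdev_bind(const char *cdev_path, int iommufd, uint32_t *out_devid)
{
	struct vfio_device_bind_iommufd bind;
	int device_fd, err;

	assert(cdev_path && out_devid);

	device_fd = open(cdev_path, O_RDWR);
	if (device_fd < 0)
		return -errno;

	memset(&bind, 0, sizeof(bind));
	bind.argsz = sizeof(bind);
	bind.iommufd = iommufd;

	if (ioctl(device_fd, VFIO_DEVICE_BIND_IOMMUFD, &bind) < 0) {
		err = errno;
		close(device_fd);
		return -err;
	}

	*out_devid = bind.out_devid;
	return device_fd;
}
```

A caller would open /dev/iommu first and pass that fd in.  The device
is only fully accessible once this helper returns a valid fd.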

Regards,
Yi Liu

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [Intel-gfx] [PATCH v12 21/24] vfio: Determine noiommu device in __vfio_register_dev()
  2023-06-13 14:33           ` [Intel-gfx] " Liu, Yi L
@ 2023-06-13 14:48             ` Alex Williamson
  -1 siblings, 0 replies; 180+ messages in thread
From: Alex Williamson @ 2023-06-13 14:48 UTC (permalink / raw)
  To: Liu, Yi L
  Cc: mjrosato, jasowang, Hao, Xudong, Duan, Zhenzhong, peterx, Xu,
	Terrence, chao.p.peng, linux-s390, kvm, lulu, Jiang, Yanting,
	joro, nicolinc, jgg, Tian, Kevin, Zhao,  Yan Y, intel-gfx,
	eric.auger, intel-gvt-dev, yi.y.sun, clegoate, cohuck,
	shameerali.kolothum.thodi, suravee.suthikulpanit, robin.murphy

On Tue, 13 Jun 2023 14:33:01 +0000
"Liu, Yi L" <yi.l.liu@intel.com> wrote:

> > From: Alex Williamson <alex.williamson@redhat.com>
> > Sent: Tuesday, June 13, 2023 10:19 PM
> > 
> > On Tue, 13 Jun 2023 05:53:42 +0000
> > "Liu, Yi L" <yi.l.liu@intel.com> wrote:
> >   
> > > > From: Alex Williamson <alex.williamson@redhat.com>
> > > > Sent: Tuesday, June 13, 2023 6:42 AM
> > > >
> > > > On Fri,  2 Jun 2023 05:16:50 -0700
> > > > Yi Liu <yi.l.liu@intel.com> wrote:
> > > >  
> > > > > This moves the noiommu device determination and noiommu taint out of
> > > > > vfio_group_find_or_alloc(). noiommu device is determined in
> > > > > __vfio_register_dev() and result is stored in flag vfio_device->noiommu,
> > > > > the noiommu taint is added in the end of __vfio_register_dev().
> > > > >
> > > > > This is also a preparation for compiling out vfio_group infrastructure
> > > > > as it makes the noiommu detection and taint common between the cdev path
> > > > > and group path though cdev path does not support noiommu.  
> > > >
> > > > Does this really still make sense?  The motivation for the change is
> > > > really not clear without cdev support for noiommu.  Thanks,  
> > >
> > > I think it still makes sense. When CONFIG_VFIO_GROUP==n, the kernel
> > > only supports cdev interface. If there is noiommu device, vfio should
> > > fail the registration. So, the noiommu determination is still needed. But
> > > I'd admit the taint might still be in the group code.  
> > 
> > How is there going to be a noiommu device when VFIO_GROUP is unset?  
> 
> How about booting a kernel with iommu disabled, then all the devices
> are not protected by iommu. I suppose they are noiommu devices. If
> user wants to bind them to vfio, the kernel should have VFIO_GROUP.
> Otherwise, needs to fail.

"noiommu" is a vfio designation of a device, it must be created by
vfio.  There can certainly be devices which are not IOMMU backed, but
without vfio designating them as noiommu devices, which is only done
via the legacy and compat paths, there's no such thing as a noiommu
device.  Devices without an IOMMU are simply out of scope for cdev;
there should never be a vfio cdev entry created for them.  Thanks,

Alex


^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCH v12 18/24] vfio: Add VFIO_DEVICE_BIND_IOMMUFD
  2023-06-13 14:42               ` Liu, Yi L
@ 2023-06-13 14:59                 ` Alex Williamson
  -1 siblings, 0 replies; 180+ messages in thread
From: Alex Williamson @ 2023-06-13 14:59 UTC (permalink / raw)
  To: Liu, Yi L
  Cc: jgg, Tian, Kevin, joro, robin.murphy, cohuck, eric.auger,
	nicolinc, kvm, mjrosato, chao.p.peng, yi.y.sun, peterx, jasowang,
	shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	intel-gvt-dev, intel-gfx, linux-s390, Hao, Xudong, Zhao, Yan Y,
	Xu, Terrence, Jiang, Yanting, Duan, Zhenzhong, clegoate

On Tue, 13 Jun 2023 14:42:46 +0000
"Liu, Yi L" <yi.l.liu@intel.com> wrote:

> > From: Alex Williamson <alex.williamson@redhat.com>
> > Sent: Tuesday, June 13, 2023 10:40 PM
> > 
> > On Tue, 13 Jun 2023 14:28:43 +0000
> > "Liu, Yi L" <yi.l.liu@intel.com> wrote:
> >   
> > > > From: Alex Williamson <alex.williamson@redhat.com>
> > > > Sent: Tuesday, June 13, 2023 10:18 PM  
> > >  
> > > > > > > diff --git a/include/linux/vfio.h b/include/linux/vfio.h
> > > > > > > index 83cc5dc28b7a..e80a8ac86e46 100644
> > > > > > > --- a/include/linux/vfio.h
> > > > > > > +++ b/include/linux/vfio.h
> > > > > > > @@ -66,6 +66,7 @@ struct vfio_device {
> > > > > > >  	struct iommufd_device *iommufd_device;
> > > > > > >  	bool iommufd_attached;
> > > > > > >  #endif
> > > > > > > +	bool cdev_opened:1;  
> > > > > >
> > > > > > Perhaps a more strongly defined data type here as well and roll
> > > > > > iommufd_attached into the same bit field scheme.  
> > > > >
> > > > > Ok, then needs to make iommufd_attached always defined.  
> > > >
> > > > That does not follow.  Thanks,  
> > >
> > > Well, I meant the iommufd_attached now is defined only when
> > > CONFIG_IOMMUFD is enabled. To roll it with cdev_opened, needs
> > > to change this.  
> > 
> > Understood, but I don't think it's true.  If defined we use one more
> > bit of the bit field, which is a consideration when we approach filling
> > it, but we're not using bit-shift operations to address these bits, so
> > why does it matter if one has compiler conditional usage?  Thanks,  
> 
> Aha, I see. So you are suggesting something like the below. Is it?
> 
> #if IS_ENABLED(CONFIG_IOMMUFD)
> 	struct iommufd_device *iommufd_device;
> 	u8 iommufd_attached:1;
> #endif
> 	u8 cdev_opened:1;


Precisely.  Thanks,

Alex
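
A small standalone sketch of the point above, not the kernel code:
named bit-field members are accessed by name, so compiling one of them
out only changes how many bits are in use.  DEMO_HAVE_IOMMUFD stands in
for IS_ENABLED(CONFIG_IOMMUFD), uint8_t stands in for the kernel's u8,
and all demo_* names are hypothetical.  Bit-fields of type uint8_t are
implementation-defined in ISO C; GCC and Clang accept them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for IS_ENABLED(CONFIG_IOMMUFD); set to 0 to compile it out. */
#define DEMO_HAVE_IOMMUFD 1

struct demo_device {
#if DEMO_HAVE_IOMMUFD
	void *iommufd_device;
	uint8_t iommufd_attached:1;	/* only present with IOMMUFD */
#endif
	uint8_t cdev_opened:1;		/* always present */
};

/* Access is by member name, so no bit position is hard-coded here. */
void demo_mark_cdev_opened(struct demo_device *dev)
{
	assert(dev != NULL);
	dev->cdev_opened = 1;
}

int demo_is_cdev_opened(const struct demo_device *dev)
{
	assert(dev != NULL);
	return dev->cdev_opened;
}
```

Flipping DEMO_HAVE_IOMMUFD to 0 leaves both helpers unchanged, which is
why the conditional member does not need to become unconditional.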


^ permalink raw reply	[flat|nested] 180+ messages in thread

* RE: [PATCH v12 21/24] vfio: Determine noiommu device in __vfio_register_dev()
  2023-06-13 14:48             ` Alex Williamson
@ 2023-06-13 15:01               ` Liu, Yi L
  -1 siblings, 0 replies; 180+ messages in thread
From: Liu, Yi L @ 2023-06-13 15:01 UTC (permalink / raw)
  To: Alex Williamson
  Cc: jgg, Tian, Kevin, joro, robin.murphy, cohuck, eric.auger,
	nicolinc, kvm, mjrosato, chao.p.peng, yi.y.sun, peterx, jasowang,
	shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	intel-gvt-dev, intel-gfx, linux-s390, Hao, Xudong, Zhao, Yan Y,
	Xu, Terrence, Jiang, Yanting, Duan, Zhenzhong, clegoate

> From: Alex Williamson <alex.williamson@redhat.com>
> Sent: Tuesday, June 13, 2023 10:48 PM
> 
> On Tue, 13 Jun 2023 14:33:01 +0000
> "Liu, Yi L" <yi.l.liu@intel.com> wrote:
> 
> > > From: Alex Williamson <alex.williamson@redhat.com>
> > > Sent: Tuesday, June 13, 2023 10:19 PM
> > >
> > > On Tue, 13 Jun 2023 05:53:42 +0000
> > > "Liu, Yi L" <yi.l.liu@intel.com> wrote:
> > >
> > > > > From: Alex Williamson <alex.williamson@redhat.com>
> > > > > Sent: Tuesday, June 13, 2023 6:42 AM
> > > > >
> > > > > On Fri,  2 Jun 2023 05:16:50 -0700
> > > > > Yi Liu <yi.l.liu@intel.com> wrote:
> > > > >
> > > > > > This moves the noiommu device determination and noiommu taint out of
> > > > > > vfio_group_find_or_alloc(). noiommu device is determined in
> > > > > > __vfio_register_dev() and result is stored in flag vfio_device->noiommu,
> > > > > > the noiommu taint is added in the end of __vfio_register_dev().
> > > > > >
> > > > > > This is also a preparation for compiling out vfio_group infrastructure
> > > > > > as it makes the noiommu detection and taint common between the cdev path
> > > > > > and group path though cdev path does not support noiommu.
> > > > >
> > > > > Does this really still make sense?  The motivation for the change is
> > > > > really not clear without cdev support for noiommu.  Thanks,
> > > >
> > > > I think it still makes sense. When CONFIG_VFIO_GROUP==n, the kernel
> > > > only supports cdev interface. If there is noiommu device, vfio should
> > > > fail the registration. So, the noiommu determination is still needed. But
> > > > I'd admit the taint might still be in the group code.
> > >
> > > How is there going to be a noiommu device when VFIO_GROUP is unset?
> >
> > How about booting a kernel with iommu disabled, then all the devices
> > are not protected by iommu. I suppose they are noiommu devices. If
> > user wants to bind them to vfio, the kernel should have VFIO_GROUP.
> > Otherwise, needs to fail.
> 
> "noiommu" is a vfio designation of a device, it must be created by
> vfio.  

Sure.

> There can certainly be devices which are not IOMMU backed, but
> without vfio designating them as noiommu devices, which is only done
> via the legacy and compat paths, there's no such thing as a noiommu
> device. 

Yes.

> Devices without an IOMMU are simply out of scope for cdev,
> there should never be a vfio cdev entry created for them.  Thanks,

Actually, this is what I want to solve. I need to check whether a device
is IOMMU backed and, based on that, either avoid creating a cdev entry
for it in the coming cdev support or fail registration if VFIO_GROUP is
unset.

If this patch is not good, I can use a vfio_device_is_noiommu() written
like below when VFIO_GROUP is unset. What is your opinion?

static inline bool vfio_device_is_noiommu(struct vfio_device *vdev)
{
	struct iommu_group *iommu_group;

	iommu_group = iommu_group_get(vdev->dev);
	iommu_group_put(iommu_group); /* Accepts NULL */
	return !iommu_group;
}

Regards,
Yi Liu



^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [Intel-gfx] [PATCH v12 24/24] docs: vfio: Add vfio device cdev description
  2023-06-13 14:48           ` [Intel-gfx] " Liu, Yi L
@ 2023-06-13 15:04             ` Alex Williamson
  -1 siblings, 0 replies; 180+ messages in thread
From: Alex Williamson @ 2023-06-13 15:04 UTC (permalink / raw)
  To: Liu, Yi L
  Cc: mjrosato, jasowang, Hao, Xudong, Duan, Zhenzhong, peterx, Xu,
	Terrence, chao.p.peng, linux-s390, kvm, lulu, Jiang, Yanting,
	joro, nicolinc, jgg, Tian, Kevin, Zhao,  Yan Y, intel-gfx,
	eric.auger, intel-gvt-dev, yi.y.sun, clegoate, cohuck,
	shameerali.kolothum.thodi, suravee.suthikulpanit, robin.murphy

On Tue, 13 Jun 2023 14:48:02 +0000
"Liu, Yi L" <yi.l.liu@intel.com> wrote:

> > From: Alex Williamson <alex.williamson@redhat.com>
> > Sent: Tuesday, June 13, 2023 10:24 PM
> > 
> > On Tue, 13 Jun 2023 12:01:51 +0000
> > "Liu, Yi L" <yi.l.liu@intel.com> wrote:
> >   
> > > > From: Alex Williamson <alex.williamson@redhat.com>
> > > > Sent: Tuesday, June 13, 2023 7:06 AM
> > > >
> > > > On Fri,  2 Jun 2023 05:16:53 -0700
> > > > Yi Liu <yi.l.liu@intel.com> wrote:
> > > >  
> > > > > This gives notes for userspace applications on device cdev usage.
> > > > >
> > > > > Reviewed-by: Kevin Tian <kevin.tian@intel.com>
> > > > > Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> > > > > ---
> > > > >  Documentation/driver-api/vfio.rst | 132 ++++++++++++++++++++++++++++++
> > > > >  1 file changed, 132 insertions(+)
> > > > >
> > > > > diff --git a/Documentation/driver-api/vfio.rst b/Documentation/driver-api/vfio.rst
> > > > > index 363e12c90b87..f00c9b86bda0 100644
> > > > > --- a/Documentation/driver-api/vfio.rst
> > > > > +++ b/Documentation/driver-api/vfio.rst
> > > > > @@ -239,6 +239,130 @@ group and can access them as follows::
> > > > >  	/* Gratuitous device reset and go... */
> > > > >  	ioctl(device, VFIO_DEVICE_RESET);
> > > > >
> > > > > +IOMMUFD and vfio_iommu_type1
> > > > > +----------------------------
> > > > > +
> > > > > +IOMMUFD is the new user API to manage I/O page tables from userspace.
> > > > > +It intends to be the portal of delivering advanced userspace DMA
> > > > > +features (nested translation [5]_, PASID [6]_, etc.) while also providing
> > > > > +a backwards compatibility interface for existing VFIO_TYPE1v2_IOMMU use
> > > > > +cases.  Eventually the vfio_iommu_type1 driver, as well as the legacy
> > > > > +vfio container and group model is intended to be deprecated.
> > > > > +
> > > > > +The IOMMUFD backwards compatibility interface can be enabled two ways.
> > > > > +In the first method, the kernel can be configured with
> > > > > +CONFIG_IOMMUFD_VFIO_CONTAINER, in which case the IOMMUFD subsystem
> > > > > +transparently provides the entire infrastructure for the VFIO
> > > > > +container and IOMMU backend interfaces.  The compatibility mode can
> > > > > +also be accessed if the VFIO container interface, ie. /dev/vfio/vfio is
> > > > > +simply symlink'd to /dev/iommu.  Note that at the time of writing, the
> > > > > +compatibility mode is not entirely feature complete relative to
> > > > > +VFIO_TYPE1v2_IOMMU (ex. DMA mapping MMIO) and does not attempt to
> > > > > +provide compatibility to the VFIO_SPAPR_TCE_IOMMU interface.  Therefore
> > > > > +it is not generally advisable at this time to switch from native VFIO
> > > > > +implementations to the IOMMUFD compatibility interfaces.
> > > > > +
> > > > > +Long term, VFIO users should migrate to device access through the cdev
> > > > > +interface described below, and native access through the IOMMUFD
> > > > > +provided interfaces.
> > > > > +
> > > > > +VFIO Device cdev
> > > > > +----------------
> > > > > +
> > > > > +Traditionally user acquires a device fd via VFIO_GROUP_GET_DEVICE_FD
> > > > > +in a VFIO group.
> > > > > +
> > > > > +With CONFIG_VFIO_DEVICE_CDEV=y the user can now acquire a device fd
> > > > > +by directly opening a character device /dev/vfio/devices/vfioX where
> > > > > +"X" is the number allocated uniquely by VFIO for registered devices.
> > > > > +cdev interface does not support noiommu, so user should use the legacy
> > > > > +group interface if noiommu is needed.
> > > > > +
> > > > > +The cdev only works with IOMMUFD.  Both VFIO drivers and applications
> > > > > +must adapt to the new cdev security model which requires using
> > > > > +VFIO_DEVICE_BIND_IOMMUFD to claim DMA ownership before starting to
> > > > > +actually use the device.  Once BIND succeeds then a VFIO device can
> > > > > +be fully accessed by the user.
> > > > > +
> > > > > +VFIO device cdev doesn't rely on VFIO group/container/iommu drivers.
> > > > > +Hence those modules can be fully compiled out in an environment
> > > > > +where no legacy VFIO application exists.
> > > > > +
> > > > > +So far SPAPR does not support IOMMUFD yet.  So it cannot support device
> > > > > +cdev neither.  
> > > >
> > > > s/neither/either/  
> > >
> > > Got it.
> > >  
> > > >
> > > > Unless I missed it, we've not described that vfio device cdev access is
> > > > still bound by IOMMU group semantics, ie. there can be one DMA owner
> > > > for the group.  That's a pretty common failure point for multi-function
> > > > consumer device use cases, so the why, where, and how it fails should
> > > > be well covered.  
> > >
> > > Yes. this needs to be documented. How about below words:
> > >
> > > vfio device cdev access is still bound by IOMMU group semantics, ie. there
> > > can be only one DMA owner for the group.  Devices belonging to the same
> > > group can not be bound to multiple iommufd_ctx.  
> > 
> > ... or shared between native kernel and vfio drivers.  
> 
> I suppose you mean the devices in one group are bound to different
> drivers. right?

Essentially, but we need to be careful that we're developing multiple
vfio drivers for a given bus now, which is why I try to distinguish
between the two sets of drivers.  Thanks,

Alex
 
> > >  The users that try to bind
> > > such device to different iommufd shall be failed in VFIO_DEVICE_BIND_IOMMUFD
> > > which is the start point to get full access for the device.  
> > 
> > "A violation of this ownership requirement will fail at the
> > VFIO_DEVICE_BIND_IOMMUFD ioctl, which gates full device access."  
> 
> Got it.
> 
> Regards,
> Yi Liu
> 


^ permalink raw reply	[flat|nested] 180+ messages in thread

* RE: [PATCH v12 24/24] docs: vfio: Add vfio device cdev description
  2023-06-13 15:04             ` Alex Williamson
@ 2023-06-13 15:11               ` Liu, Yi L
  -1 siblings, 0 replies; 180+ messages in thread
From: Liu, Yi L @ 2023-06-13 15:11 UTC (permalink / raw)
  To: Alex Williamson
  Cc: jgg, Tian, Kevin, joro, robin.murphy, cohuck, eric.auger,
	nicolinc, kvm, mjrosato, chao.p.peng, yi.y.sun, peterx, jasowang,
	shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	intel-gvt-dev, intel-gfx, linux-s390, Hao, Xudong, Zhao, Yan Y,
	Xu, Terrence, Jiang, Yanting, Duan, Zhenzhong, clegoate

> From: Alex Williamson <alex.williamson@redhat.com>
> Sent: Tuesday, June 13, 2023 11:04 PM
> 
> > > >
> > > > >
> > > > > Unless I missed it, we've not described that vfio device cdev access is
> > > > > still bound by IOMMU group semantics, ie. there can be one DMA owner
> > > > > for the group.  That's a pretty common failure point for multi-function
> > > > > consumer device use cases, so the why, where, and how it fails should
> > > > > be well covered.
> > > >
> > > > Yes. this needs to be documented. How about below words:
> > > >
> > > > vfio device cdev access is still bound by IOMMU group semantics, ie. there
> > > > can be only one DMA owner for the group.  Devices belonging to the same
> > > > group can not be bound to multiple iommufd_ctx.
> > >
> > > ... or shared between native kernel and vfio drivers.
> >
> > I suppose you mean the devices in one group are bound to different
> > drivers. right?
> 
> Essentially, but we need to be careful that we're developing multiple
> vfio drivers for a given bus now, which is why I try to distinguish
> between the two sets of drivers.  Thanks,

Indeed, there is a set of vfio drivers.  Can even pci-stub be considered
part of this set?  Perhaps it is more precise to say: "... or shared between
drivers that set the struct pci_driver::driver_managed_dma flag and drivers
that do not."
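
The ownership rule being discussed can be modelled roughly as below.  This
is a userspace toy under stated assumptions, not kernel code: every toy_*
name is hypothetical, the -1 return value is illustrative only, and
"no driver" is treated as acceptable as mentioned elsewhere in the thread.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: these types are hypothetical, not the kernel's. */
struct toy_device {
	bool has_driver;          /* false models the "no driver" case */
	bool driver_managed_dma;  /* driver manages its own DMA address space */
};

struct toy_group {
	const struct toy_device *devs;
	size_t ndevs;
	const void *owner;        /* current DMA owner, NULL if unowned */
};

/*
 * A group can be handed to userspace only if no device in it is bound
 * to a driver that relies on the kernel DMA API (flag not set).
 */
static bool toy_group_viable(const struct toy_group *g)
{
	for (size_t i = 0; i < g->ndevs; i++) {
		const struct toy_device *d = &g->devs[i];

		if (d->has_driver && !d->driver_managed_dma)
			return false;
	}
	return true;
}

/* Returns 0 on success, -1 on an ownership violation. */
static int toy_group_claim(struct toy_group *g, const void *owner)
{
	if (!toy_group_viable(g))
		return -1;
	if (g->owner && g->owner != owner)
		return -1;	/* a second, different owner is refused */
	g->owner = owner;
	return 0;
}
```

In this model the owner pointer plays the role of the iommufd context: two
devices of one group may be claimed by the same context, but a second
context is refused, which is the point at which the bind step would fail.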

Regards,
Yi Liu

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [Intel-gfx] [PATCH v12 21/24] vfio: Determine noiommu device in __vfio_register_dev()
  2023-06-13 15:01               ` [Intel-gfx] " Liu, Yi L
  (?)
@ 2023-06-13 15:13               ` Alex Williamson
  2023-06-13 17:15                 ` Alex Williamson
  -1 siblings, 1 reply; 180+ messages in thread
From: Alex Williamson @ 2023-06-13 15:13 UTC (permalink / raw)
  To: Liu, Yi L
  Cc: Yanting, mjrosato, jasowang, peterx,   <lulu@redhat.com>, ,
	suravee.suthikulpanit, chao.p.peng, kvm, joro, Zhenzhong,
	 <zhenzhong.duan@intel.com>,   ,
	clegoate, Yan, nicolinc, jgg,
	     <intel-gvt-dev@lists.freedesktop.org>,  ,
	intel-gfx, linux-s390, ,
	Tian,  Kevin, Xudong,   <suravee.suthikulpanit@amd.com>, ,
	intel-gvt-dev, ,  <intel-gfx@lists.freedesktop.org>,   ,
	linux-s390, Terrence, yi.y.sun, eric.auger, cohuck, clegoate,
	robin.murphy,
	shameerali.kolothum.thodi@huawei.com"         
	<shameerali.kolothum.thodi@huawei.com>, ,
	lulu

On Tue, 13 Jun 2023 15:01:35 +0000
"Liu, Yi L" <yi.l.liu@intel.com> wrote:

> > From: Alex Williamson <alex.williamson@redhat.com>
> > Sent: Tuesday, June 13, 2023 10:48 PM
> > 
> > On Tue, 13 Jun 2023 14:33:01 +0000
> > "Liu, Yi L" <yi.l.liu@intel.com> wrote:
> >   
> > > > From: Alex Williamson <alex.williamson@redhat.com>
> > > > Sent: Tuesday, June 13, 2023 10:19 PM
> > > >
> > > > On Tue, 13 Jun 2023 05:53:42 +0000
> > > > "Liu, Yi L" <yi.l.liu@intel.com> wrote:
> > > >  
> > > > > > From: Alex Williamson <alex.williamson@redhat.com>
> > > > > > Sent: Tuesday, June 13, 2023 6:42 AM
> > > > > >
> > > > > > On Fri,  2 Jun 2023 05:16:50 -0700
> > > > > > Yi Liu <yi.l.liu@intel.com> wrote:
> > > > > >  
> > > > > > > This moves the noiommu device determination and noiommu taint out of
> > > > > > > vfio_group_find_or_alloc(). noiommu device is determined in
> > > > > > > __vfio_register_dev() and result is stored in flag vfio_device->noiommu,
> > > > > > > the noiommu taint is added in the end of __vfio_register_dev().
> > > > > > >
> > > > > > > This is also a preparation for compiling out vfio_group infrastructure
> > > > > > > as it makes the noiommu detection and taint common between the cdev path
> > > > > > > and group path though cdev path does not support noiommu.  
> > > > > >
> > > > > > Does this really still make sense?  The motivation for the change is
> > > > > > really not clear without cdev support for noiommu.  Thanks,  
> > > > >
> > > > > I think it still makes sense. When CONFIG_VFIO_GROUP==n, the kernel
> > > > > only supports cdev interface. If there is noiommu device, vfio should
> > > > > fail the registration. So, the noiommu determination is still needed. But
> > > > > I'd admit the taint might still be in the group code.  
> > > >
> > > > How is there going to be a noiommu device when VFIO_GROUP is unset?  
> > >
> > > How about booting a kernel with iommu disabled, then all the devices
> > > are not protected by iommu. I suppose they are noiommu devices. If
> > > user wants to bound them to vfio, the kernel should have VFIO_GROUP.
> > > Otherwise, needs to fail.  
> > 
> > "noiommu" is a vfio designation of a device, it must be created by
> > vfio.    
> 
> Sure.
> 
> > There can certainly be devices which are not IOMMU backed, but
> > without vfio designating them as noiommu devices, which is only done
> > via the legacy and compat paths, there's no such thing as a noiommu
> > device.   
> 
> Yes.
> 
> > Devices without an IOMMU are simply out of scope for cdev,
> > there should never be a vfio cdev entry created for them.  Thanks,  
> 
> Actually, this is what I want to solve. I need to check if a device is
> IOMMU backed or not, and based on this info to prevent creating
> cdev entry for them in the coming cdev support or may need to
> fail registration if VFIO_GROUP is unset.
> 
> If this patch is not good. I can use the vfio_device_is_noiommu()
> written like below when VFIO_GROUP is unset. What about your
> opinion?
> 
> static inline bool vfio_device_is_noiommu(struct vfio_device *vdev)
> {
> 	struct iommu_group *iommu_group;
> 
> 	iommu_group = iommu_group_get(vdev->dev);
> 	iommu_group_put(iommu_group); /* Accepts NULL */
> 	return !iommu_group;
> }


No, please do not confuse the issue.  As we agreed above "noiommu"
means a specific thing, it's a device without IOMMU backing that vfio
has artificially included in the environment.  If we don't have
VFIO_NOIOMMU then there's no such thing as a "noiommu" device.

You can certainly use an iommu_group test to decide if a device should
be represented, but there absolutely should never be a vfio_device
created without IOMMU backing and without VFIO_NOIOMMU.  Thanks,

Alex


^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [Intel-gfx] [PATCH v12 21/24] vfio: Determine noiommu device in __vfio_register_dev()
  2023-06-13 15:13               ` Alex Williamson
@ 2023-06-13 17:15                 ` Alex Williamson
  2023-06-13 17:35                     ` [Intel-gfx] " Jason Gunthorpe
  0 siblings, 1 reply; 180+ messages in thread
From: Alex Williamson @ 2023-06-13 17:15 UTC (permalink / raw)
  To: Liu, Yi L
  Cc: Yanting, mjrosato, jasowang, peterx,   <lulu@redhat.com>, ,
	suravee.suthikulpanit, chao.p.peng, kvm, joro, Zhenzhong,
	 <zhenzhong.duan@intel.com>,   ,
	clegoate, Yan, nicolinc, jgg,
	     <intel-gvt-dev@lists.freedesktop.org>,  ,
	intel-gfx, linux-s390, ,
	Tian,  Kevin, Xudong,   <suravee.suthikulpanit@amd.com>, ,
	intel-gvt-dev, ,  <intel-gfx@lists.freedesktop.org>,   ,
	linux-s390, Terrence, yi.y.sun, eric.auger, cohuck, clegoate,
	robin.murphy,
	shameerali.kolothum.thodi@huawei.com"         
	<shameerali.kolothum.thodi@huawei.com>, ,
	lulu

[Sorry for breaking threading, replying to my own message id with reply
 content from Yi since the Cc list got broken]

On Tue, 13 Jun 2023 15:28:06 +0000
"Liu, Yi L" <yi.l.liu@intel.com> wrote:

> > From: Alex Williamson <alex.williamson@redhat.com>
> > Sent: Tuesday, June 13, 2023 11:13 PM
> > 
> > On Tue, 13 Jun 2023 15:01:35 +0000
> > "Liu, Yi L" <yi.l.liu@intel.com> wrote:
> >   
> > > > From: Alex Williamson <alex.williamson@redhat.com>
> > > > Sent: Tuesday, June 13, 2023 10:48 PM
> > > >
> > > > On Tue, 13 Jun 2023 14:33:01 +0000
> > > > "Liu, Yi L" <yi.l.liu@intel.com> wrote:
> > > >  
> > > > > > From: Alex Williamson <alex.williamson@redhat.com>
> > > > > > Sent: Tuesday, June 13, 2023 10:19 PM
> > > > > >
> > > > > > On Tue, 13 Jun 2023 05:53:42 +0000
> > > > > > "Liu, Yi L" <yi.l.liu@intel.com> wrote:
> > > > > >  
> > > > > > > > From: Alex Williamson <alex.williamson@redhat.com>
> > > > > > > > Sent: Tuesday, June 13, 2023 6:42 AM
> > > > > > > >
> > > > > > > > On Fri,  2 Jun 2023 05:16:50 -0700
> > > > > > > > Yi Liu <yi.l.liu@intel.com> wrote:
> > > > > > > >  
> > > > > > > > > This moves the noiommu device determination and noiommu taint out of
> > > > > > > > > vfio_group_find_or_alloc(). noiommu device is determined in
> > > > > > > > > __vfio_register_dev() and result is stored in flag vfio_device->noiommu,
> > > > > > > > > the noiommu taint is added in the end of __vfio_register_dev().
> > > > > > > > >
> > > > > > > > > This is also a preparation for compiling out vfio_group infrastructure
> > > > > > > > > as it makes the noiommu detection and taint common between the cdev  
> > path  
> > > > > > > > > and group path though cdev path does not support noiommu.  
> > > > > > > >
> > > > > > > > Does this really still make sense?  The motivation for the change is
> > > > > > > > really not clear without cdev support for noiommu.  Thanks,  
> > > > > > >
> > > > > > > I think it still makes sense. When CONFIG_VFIO_GROUP==n, the kernel
> > > > > > > only supports cdev interface. If there is noiommu device, vfio should
> > > > > > > fail the registration. So, the noiommu determination is still needed. But
> > > > > > > I'd admit the taint might still be in the group code.  
> > > > > >
> > > > > > How is there going to be a noiommu device when VFIO_GROUP is unset?  
> > > > >
> > > > > How about booting a kernel with iommu disabled, then all the devices
> > > > > are not protected by iommu. I suppose they are noiommu devices. If
> > > > > user wants to bound them to vfio, the kernel should have VFIO_GROUP.
> > > > > Otherwise, needs to fail.  
> > > >
> > > > "noiommu" is a vfio designation of a device, it must be created by
> > > > vfio.  
> > >
> > > Sure.
> > >  
> > > > There can certainly be devices which are not IOMMU backed, but
> > > > without vfio designating them as noiommu devices, which is only done
> > > > via the legacy and compat paths, there's no such thing as a noiommu
> > > > device.  
> > >
> > > Yes.
> > >  
> > > > Devices without an IOMMU are simply out of scope for cdev,
> > > > there should never be a vfio cdev entry created for them.  Thanks,  
> > >
> > > Actually, this is what I want to solve. I need to check if a device is
> > > IOMMU backed or not, and based on this info to prevent creating
> > > cdev entry for them in the coming cdev support or may need to
> > > fail registration if VFIO_GROUP is unset.
> > >
> > > If this patch is not good. I can use the vfio_device_is_noiommu()
> > > written like below when VFIO_GROUP is unset. What about your
> > > opinion?
> > >
> > > static inline bool vfio_device_is_noiommu(struct vfio_device *vdev)
> > > {
> > > 	struct iommu_group *iommu_group;
> > >
> > > 	iommu_group = iommu_group_get(vdev->dev);
> > > 	iommu_group_put(iommu_group); /* Accepts NULL */
> > > 	return !iommu_group;
> > > }  
> > 
> > 
> > No, please do not confuse the issue.  As we agreed above "noiommu"
> > means a specific thing, it's a device without IOMMU backing that vfio
> > has artificially included in the environment.  If we don't have
> > VFIO_NOIOMMU then there's no such thing as a "noiommu" device.
> > 
> > You can certainly use an iommu_group test to decide if a device should
> > be represented, but there absolutely should never be a vfio_device
> > created without IOMMU backing and without VFIO_NOIOMMU.  Thanks,  
> 
> Hmmm. So your suggestion is to fail the vfio_alloc_device() if the input
> device is not IOMMU backed. right? But at this point, we don't know if
> the caller is trying to allocate vfio_device for a physical device or an
> emulated device. For emulated devices, cdev entry can always be created.
> Is it? I think the iommu_group test should be done for only physical devices
> in the register time.
> 
> Can I have an iommu_backed flag to store the iommu_group test result
> and check it when trying to create/remove cdev entry?

Ok, let me rephrase: the probe function needs to fail for a physical
(VFIO_IOMMU) device when VFIO_NOIOMMU is not configured and
vfio_noiommu is not enabled.  In that case there should never be a vfio
group or cdev device file created, and the vfio_device should never be
fully registered.  I overreacted a bit in saying that we should never
have a vfio_device at all; we clearly need one leading up to determining
whether we can proceed.

If we renamed your function above to vfio_device_has_iommu_group(),
couldn't we just wrap device_add like below instead to not have cdev
setup for a noiommu device, generate an error for a physical device w/o
IOMMU backing, and otherwise setup the cdev device?

static inline int vfio_device_add(struct vfio_device *device, enum vfio_group_type type)
{
#if IS_ENABLED(CONFIG_VFIO_GROUP)
	if (device->group->type == VFIO_NO_IOMMU)
		return device_add(&device->device);
#else
	if (type == VFIO_IOMMU && !vfio_device_has_iommu_group(device))
		return -EINVAL;
#endif
	vfio_init_device_cdev(device);
	return cdev_device_add(&device->cdev, &device->device);
}

static inline void vfio_device_del(struct vfio_device *device)
{
#if IS_ENABLED(CONFIG_VFIO_GROUP)
	if (device->group->type == VFIO_NO_IOMMU)
		return device_del(&device->device);
#endif
	cdev_device_del(&device->cdev, &device->device);
}
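
The three outcomes of the wrapper above (plain device for noiommu, error
for a physical device without IOMMU backing, cdev otherwise) can be checked
in isolation with a small userspace model.  This is only a sketch: the
toy_* names are hypothetical, and the group_enabled parameter stands in
for IS_ENABLED(CONFIG_VFIO_GROUP).

```c
#include <errno.h>
#include <stdbool.h>

/* What the wrapper would end up doing; names are hypothetical. */
enum toy_action {
	TOY_NONE,          /* registration refused */
	TOY_PLAIN_DEVICE,  /* device_add() only, no cdev node */
	TOY_CDEV_DEVICE,   /* cdev_device_add(), cdev node created */
};

struct toy_vfio_device {
	bool physical;         /* models type == VFIO_IOMMU */
	bool has_iommu_group;  /* models vfio_device_has_iommu_group() */
	bool noiommu;          /* models group->type == VFIO_NO_IOMMU */
};

static int toy_device_add(const struct toy_vfio_device *d, bool group_enabled,
			  enum toy_action *action)
{
	*action = TOY_NONE;

	if (group_enabled) {
		/* noiommu exists only via the group path: skip the cdev. */
		if (d->noiommu) {
			*action = TOY_PLAIN_DEVICE;
			return 0;
		}
	} else if (d->physical && !d->has_iommu_group) {
		/* cdev-only build: no IOMMU backing means no device. */
		return -EINVAL;
	}

	*action = TOY_CDEV_DEVICE;
	return 0;
}
```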

I think this is the only extent to which noiommu needs to be a factor
here: skip cdev setup for a noiommu device.  Thanks,

Alex


^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCH v12 07/24] vfio: Block device access via device fd until device is opened
  2023-06-13 14:16         ` Alex Williamson
@ 2023-06-13 17:19           ` Jason Gunthorpe
  -1 siblings, 0 replies; 180+ messages in thread
From: Jason Gunthorpe @ 2023-06-13 17:19 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Liu, Yi L, Tian, Kevin, joro, robin.murphy, cohuck, eric.auger,
	nicolinc, kvm, mjrosato, chao.p.peng, yi.y.sun, peterx, jasowang,
	shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	intel-gvt-dev, intel-gfx, linux-s390, Hao, Xudong, Zhao, Yan Y,
	Xu, Terrence, Jiang, Yanting, Duan, Zhenzhong, clegoate

On Tue, Jun 13, 2023 at 08:16:47AM -0600, Alex Williamson wrote:

> > Not quite get why bit field is going to be incompatible with smp
> > lockless operations. Could you elaborate a bit? And should I define
> > the access_granted as u8 or "u8:1"?
> 
> Perhaps FUD on my part, but load-acquire type operations have specific
> semantics and it's not clear to me that they interact with compiler
> generated bit operations.  Thanks,

They won't compile if you target bit ops, you can't take the address
of a bitfield.
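
A minimal userspace sketch of the difference, assuming C11 atomics as a
stand-in for the kernel's smp_store_release()/smp_load_acquire(); the
struct and the toy_* helpers are hypothetical and only illustrate the
addressability point:

```c
#include <stdatomic.h>
#include <stdbool.h>

struct toy_device_file {
	/*
	 * A whole byte has an address, so it can be the target of
	 * acquire/release operations.
	 */
	_Atomic unsigned char access_granted;

	/*
	 * A bit-field has no address: "&df->other_flag" is rejected by
	 * the compiler, so it cannot be handed to a load-acquire helper.
	 */
	unsigned int other_flag : 1;
};

/* Publish the flag with release semantics. */
static void toy_grant_access(struct toy_device_file *df)
{
	atomic_store_explicit(&df->access_granted, 1, memory_order_release);
}

/* Observe the flag with acquire semantics. */
static bool toy_access_granted(struct toy_device_file *df)
{
	return atomic_load_explicit(&df->access_granted, memory_order_acquire);
}
```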

Jason

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [Intel-gfx] [PATCH v12 24/24] docs: vfio: Add vfio device cdev description
  2023-06-13 15:11               ` [Intel-gfx] " Liu, Yi L
@ 2023-06-13 17:30                 ` Alex Williamson
  -1 siblings, 0 replies; 180+ messages in thread
From: Alex Williamson @ 2023-06-13 17:30 UTC (permalink / raw)
  To: Liu, Yi L
  Cc: mjrosato, jasowang, Hao, Xudong, Duan, Zhenzhong, peterx, Xu,
	Terrence, chao.p.peng, linux-s390, kvm, lulu, Jiang, Yanting,
	joro, nicolinc, jgg, Tian, Kevin, Zhao,  Yan Y, intel-gfx,
	eric.auger, intel-gvt-dev, yi.y.sun, clegoate, cohuck,
	shameerali.kolothum.thodi, suravee.suthikulpanit, robin.murphy

On Tue, 13 Jun 2023 15:11:06 +0000
"Liu, Yi L" <yi.l.liu@intel.com> wrote:

> > From: Alex Williamson <alex.williamson@redhat.com>
> > Sent: Tuesday, June 13, 2023 11:04 PM
> >   
> > > > >  
> > > > > >
> > > > > > Unless I missed it, we've not described that vfio device cdev access is
> > > > > > still bound by IOMMU group semantics, ie. there can be one DMA owner
> > > > > > for the group.  That's a pretty common failure point for multi-function
> > > > > > consumer device use cases, so the why, where, and how it fails should
> > > > > > be well covered.  
> > > > >
> > > > > Yes. this needs to be documented. How about below words:
> > > > >
> > > > > vfio device cdev access is still bound by IOMMU group semantics, ie. there
> > > > > can be only one DMA owner for the group.  Devices belonging to the same
> > > > > group can not be bound to multiple iommufd_ctx.  
> > > >
> > > > ... or shared between native kernel and vfio drivers.  
> > >
> > > I suppose you mean the devices in one group are bound to different
> > > drivers. right?  
> > 
> > Essentially, but we need to be careful that we're developing multiple
> > vfio drivers for a given bus now, which is why I try to distinguish
> > between the two sets of drivers.  Thanks,  
> 
> Indeed. There are a set of vfio drivers. Even pci-stub can be considered
> in this set? Perhaps, it is more precise to say : or shared between drivers
> that set the struct pci_driver::driver_managed_dma flag and the drivers
> that do not.

Yeah, I wish there was a less technical way to describe this.  This is
essentially why we have the VIABLE flag on VFIO_GROUP_GET_STATUS in the
legacy interface, which is what QEMU uses to generate the warning
specific to binding all devices to vfio bus drivers.

Technically there are some exceptions, like pci-stub or "no driver" that
can be used to prevent direct access to devices within the group, but
except for that narrow use case a vfio driver is generally recommended,
and is currently required for certain things like the dev_set test
during hot-reset.

If we want to be accurate without being too pedantic, perhaps it would
be something like "vfio bus driver or other driver supporting the
driver_managed_dma flag".  Note the flag is supported for several
drivers other than pci_driver.  Thanks,

Alex


^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [Intel-gfx] [PATCH v12 07/24] vfio: Block device access via device fd until device is opened
  2023-06-13 17:19           ` [Intel-gfx] " Jason Gunthorpe
@ 2023-06-13 17:31             ` Alex Williamson
  -1 siblings, 0 replies; 180+ messages in thread
From: Alex Williamson @ 2023-06-13 17:31 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: mjrosato, jasowang, Hao, Xudong, Duan, Zhenzhong, peterx, Xu,
	Terrence, chao.p.peng, linux-s390, Liu, Yi L, kvm, lulu, Jiang,
	Yanting, joro, nicolinc, Tian, Kevin, Zhao, Yan Y, intel-gfx,
	eric.auger, intel-gvt-dev, yi.y.sun, clegoate, cohuck,
	shameerali.kolothum.thodi, suravee.suthikulpanit, robin.murphy

On Tue, 13 Jun 2023 14:19:17 -0300
Jason Gunthorpe <jgg@nvidia.com> wrote:

> On Tue, Jun 13, 2023 at 08:16:47AM -0600, Alex Williamson wrote:
> 
> > > Not quite get why bit field is going to be incompatible with smp
> > > lockless operations. Could you elaborate a bit? And should I define
> > > the access_granted as u8 or "u8:1"?  
> > 
> > Perhaps FUD on my part, but load-acquire type operations have specific
> > semantics and it's not clear to me that they interact with compiler
> > generated bit operations.  Thanks,  
> 
> They won't compile if you target bit ops, you can't take the address
> of a bitfield.

Yup, that's what I was assuming but was too lazy to prove it.  Thanks,

Alex


^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCH v12 21/24] vfio: Determine noiommu device in __vfio_register_dev()
  2023-06-13 17:15                 ` Alex Williamson
@ 2023-06-13 17:35                     ` Jason Gunthorpe
  0 siblings, 0 replies; 180+ messages in thread
From: Jason Gunthorpe @ 2023-06-13 17:35 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Liu, Yi L, Tian, Kevin, joro, robin.murphy, cohuck, eric.auger,
	nicolinc, kvm, mjrosato, chao.p.peng, yi.y.sun, peterx, jasowang,
	shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	intel-gvt-dev, intel-gfx, linux-s390, Hao, Xudong, Zhao, Yan Y,
	Xu, Terrence, Jiang, Yanting, Duan, Zhenzhong, clegoate

On Tue, Jun 13, 2023 at 11:15:11AM -0600, Alex Williamson wrote:
> [Sorry for breaking threading, replying to my own message id with reply
>  content from Yi since the Cc list got broken]

Yikes it is really busted, I think I fixed it?

> If we renamed your function above to vfio_device_has_iommu_group(),
> couldn't we just wrap device_add like below instead to not have cdev
> setup for a noiommu device, generate an error for a physical device w/o
> IOMMU backing, and otherwise setup the cdev device?
> 
> static inline int vfio_device_add(struct vfio_device *device, enum vfio_group_type type)
> {
> #if IS_ENABLED(CONFIG_VFIO_GROUP)
> 	if (device->group->type == VFIO_NO_IOMMU)
> 		return device_add(&device->device);

vfio_device_is_noiommu() embeds the IS_ENABLED

> #else
> 	if (type == VFIO_IOMMU && !vfio_device_has_iommu_group(device))
> 		return -EINVAL;
> #endif

The required test is this one from the group code:

 	if (!device_iommu_capable(dev, IOMMU_CAP_CACHE_COHERENCY)) {

We could lift it out of the group code and call it from vfio_main.c like:

if (type == VFIO_IOMMU && !vfio_device_is_noiommu(vdev) &&
    !device_iommu_capable(dev, IOMMU_CAP_CACHE_COHERENCY))
	FAIL
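
Written out as a standalone predicate, the check might look like the
sketch below.  The toy_* names are hypothetical, the three booleans stand
in for the kernel calls named above, and -EINVAL is an assumption since
the pseudocode only says FAIL.

```c
#include <errno.h>
#include <stdbool.h>

struct toy_reg_request {
	bool physical;        /* models type == VFIO_IOMMU */
	bool noiommu;         /* models vfio_device_is_noiommu() */
	bool cache_coherent;  /* models device_iommu_capable(dev,
			       * IOMMU_CAP_CACHE_COHERENCY) */
};

/* Returns 0 if registration may proceed, a negative errno otherwise. */
static int toy_check_registration(const struct toy_reg_request *r)
{
	if (r->physical && !r->noiommu && !r->cache_coherent)
		return -EINVAL;	/* assumed errno, see lead-in */
	return 0;
}
```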

Jason

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [Intel-gfx] [PATCH v12 21/24] vfio: Determine noiommu device in __vfio_register_dev()
  2023-06-13 17:35                     ` [Intel-gfx] " Jason Gunthorpe
@ 2023-06-13 20:10                       ` Alex Williamson
  -1 siblings, 0 replies; 180+ messages in thread
From: Alex Williamson @ 2023-06-13 20:10 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: mjrosato, jasowang, Hao, Xudong, Duan, Zhenzhong, peterx, Xu,
	Terrence, chao.p.peng, linux-s390, Liu, Yi L, kvm, lulu, Jiang,
	Yanting, joro, nicolinc, Tian, Kevin, Zhao, Yan Y, intel-gfx,
	eric.auger, intel-gvt-dev, yi.y.sun, clegoate, cohuck,
	shameerali.kolothum.thodi, suravee.suthikulpanit, robin.murphy

On Tue, 13 Jun 2023 14:35:09 -0300
Jason Gunthorpe <jgg@nvidia.com> wrote:

> On Tue, Jun 13, 2023 at 11:15:11AM -0600, Alex Williamson wrote:
> > [Sorry for breaking threading, replying to my own message id with reply
> >  content from Yi since the Cc list got broken]  
> 
> Yikes it is really busted, I think I fixed it?
> 
> > If we renamed your function above to vfio_device_has_iommu_group(),
> > couldn't we just wrap device_add like below instead to not have cdev
> > setup for a noiommu device, generate an error for a physical device w/o
> > IOMMU backing, and otherwise setup the cdev device?
> > 
> > static inline int vfio_device_add(struct vfio_device *device, enum vfio_group_type type)
> > {
> > #if IS_ENABLED(CONFIG_VFIO_GROUP)
> > 	if (device->group->type == VFIO_NO_IOMMU)
> > 		return device_add(&device->device);  
> 
> vfio_device_is_noiommu() embeds the IS_ENABLED

But patch 23/ makes the definition of struct vfio_group conditional on
CONFIG_VFIO_GROUP, so while CONFIG_VFIO_NOIOMMU depends on
CONFIG_VFIO_GROUP and the result could be determined, I think the
compiler is still unhappy about the undefined reference.  We'd need a
!CONFIG_VFIO_GROUP stub for the function.
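
A minimal user-space sketch of the stub pattern suggested here, not the
kernel code itself. MODEL_VFIO_GROUP is a hypothetical stand-in for
CONFIG_VFIO_GROUP, and the struct layouts are simplified assumptions. The
helper touches device->group only in the branch where struct vfio_group is
defined; the other branch provides a stub that returns false, so callers
compile either way.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for CONFIG_VFIO_GROUP; build with -DMODEL_VFIO_GROUP=1 to model =y. */
#ifndef MODEL_VFIO_GROUP
#define MODEL_VFIO_GROUP 0
#endif

enum vfio_group_type { VFIO_IOMMU, VFIO_EMULATED_IOMMU, VFIO_NO_IOMMU };

#if MODEL_VFIO_GROUP
/* Simplified model: the group type is only reachable when groups are built in. */
struct vfio_group { enum vfio_group_type type; };
struct vfio_device { struct vfio_group *group; };

static inline bool vfio_device_is_noiommu(const struct vfio_device *vdev)
{
	assert(vdev != NULL);
	return vdev->group && vdev->group->type == VFIO_NO_IOMMU;
}
#else
/* Without group support there is no struct vfio_group to dereference. */
struct vfio_device { int placeholder; };

static inline bool vfio_device_is_noiommu(const struct vfio_device *vdev)
{
	assert(vdev != NULL);
	return false;	/* stub: noiommu depends on group support */
}
#endif
```

With the stub in place, a caller such as a vfio_device_add() wrapper needs
no #if of its own around the noiommu test.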

> > #else
> > 	if (type == VFIO_IOMMU && !vfio_device_has_iommu_group(device))
> > 		return -EINVAL;
> > #endif  
> 
> The required test is this one from the group code:
> 
>  	if (!device_iommu_capable(dev, IOMMU_CAP_CACHE_COHERENCY)) {
> 
> We could lift it out of the group code and call it from vfio_main.c like:
> 
> if (type == VFIO_IOMMU && !vfio_device_is_noiommu(vdev) && !device_iommu_capable(dev,
>      IOMMU_CAP_CACHE_COHERENCY))
>    FAIL

Ack.  Thanks,

Alex


^ permalink raw reply	[flat|nested] 180+ messages in thread

* RE: [PATCH v12 21/24] vfio: Determine noiommu device in __vfio_register_dev()
  2023-06-13 20:10                       ` Alex Williamson
@ 2023-06-14  3:24                         ` Liu, Yi L
  -1 siblings, 0 replies; 180+ messages in thread
From: Liu, Yi L @ 2023-06-14  3:24 UTC (permalink / raw)
  To: Alex Williamson, Jason Gunthorpe
  Cc: Tian, Kevin, joro, robin.murphy, cohuck, eric.auger, nicolinc,
	kvm, mjrosato, chao.p.peng, yi.y.sun, peterx, jasowang,
	shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	intel-gvt-dev, intel-gfx, linux-s390, Hao, Xudong, Zhao, Yan Y,
	Xu, Terrence, Jiang, Yanting, Duan, Zhenzhong, clegoate

> From: Alex Williamson <alex.williamson@redhat.com>
> Sent: Wednesday, June 14, 2023 4:11 AM
> 
> On Tue, 13 Jun 2023 14:35:09 -0300
> Jason Gunthorpe <jgg@nvidia.com> wrote:
> 
> > On Tue, Jun 13, 2023 at 11:15:11AM -0600, Alex Williamson wrote:
> > > [Sorry for breaking threading, replying to my own message id with reply
> > >  content from Yi since the Cc list got broken]
> >
> > Yikes it is really busted, I think I fixed it?
> >
> > > If we renamed your function above to vfio_device_has_iommu_group(),
> > > couldn't we just wrap device_add like below instead to not have cdev
> > > setup for a noiommu device, generate an error for a physical device w/o
> > > IOMMU backing, and otherwise setup the cdev device?
> > >
> > > static inline int vfio_device_add(struct vfio_device *device, enum vfio_group_type type)
> > > {
> > > #if IS_ENABLED(CONFIG_VFIO_GROUP)
> > > 	if (device->group->type == VFIO_NO_IOMMU)
> > > 		return device_add(&device->device);
> >
> > vfio_device_is_noiommu() embeds the IS_ENABLED
> 
> But patch 23/ makes the definition of struct vfio_group conditional on
> CONFIG_VFIO_GROUP, so while CONFIG_VFIO_NOIOMMU depends on
> CONFIG_VFIO_GROUP and the result could be determined, I think the
> compiler is still unhappy about the undefined reference.  We'd need a
> !CONFIG_VFIO_GROUP stub for the function.
> 
> > > #else
> > > 	if (type == VFIO_IOMMU && !vfio_device_has_iommu_group(device))
> > > 		return -EINVAL;
> > > #endif
> >
> > The required test is this one from the group code:
> >
> >  	if (!device_iommu_capable(dev, IOMMU_CAP_CACHE_COHERENCY)) {
> >
> > We could lift it out of the group code and call it from vfio_main.c like:
> >
> > if (type == VFIO_IOMMU && !vfio_device_is_noiommu(vdev) && !device_iommu_capable(dev,
> >      IOMMU_CAP_CACHE_COHERENCY))
> >    FAIL
> 
> Ack.  Thanks,

So, what I got is:

1) Add the check below to __vfio_register_dev() to fail physical devices that
    don't have IOMMU protection.

	/*
	 * A noiommu device is a special type supported by the group interface.
	 * This type represents physical devices that are not IOMMU backed.
	 */
	if (type == VFIO_IOMMU && !vfio_device_is_noiommu(device) &&
	    !vfio_device_has_iommu_group(device))
		return -EINVAL; /* or maybe -EOPNOTSUPP? */

Nit: this requires a vfio_device_is_noiommu() stub that returns false when
CONFIG_VFIO_GROUP is unset.

2) Use the functions below to add and delete the device:

#if IS_ENABLED(CONFIG_VFIO_DEVICE_CDEV)
static inline int vfio_device_add(struct vfio_device *device)
{
	if (vfio_device_is_noiommu(device))
		return device_add(&device->device);
	vfio_init_device_cdev(device);
	return cdev_device_add(&device->cdev, &device->device);
}

static inline void vfio_device_del(struct vfio_device *device)
{
	if (vfio_device_is_noiommu(device))
		device_del(&device->device);
	else
		cdev_device_del(&device->cdev, &device->device);
}
#else
blabla
#endif

Regards,
Yi Liu

^ permalink raw reply	[flat|nested] 180+ messages in thread

* RE: [PATCH v12 21/24] vfio: Determine noiommu device in __vfio_register_dev()
  2023-06-14  3:24                         ` [Intel-gfx] " Liu, Yi L
@ 2023-06-14  5:42                           ` Tian, Kevin
  -1 siblings, 0 replies; 180+ messages in thread
From: Tian, Kevin @ 2023-06-14  5:42 UTC (permalink / raw)
  To: Liu, Yi L, Alex Williamson, Jason Gunthorpe
  Cc: joro, robin.murphy, cohuck, eric.auger, nicolinc, kvm, mjrosato,
	chao.p.peng, yi.y.sun, peterx, jasowang,
	shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	intel-gvt-dev, intel-gfx, linux-s390, Hao, Xudong, Zhao, Yan Y,
	Xu, Terrence, Jiang, Yanting, Duan, Zhenzhong, clegoate

> From: Liu, Yi L <yi.l.liu@intel.com>
> Sent: Wednesday, June 14, 2023 11:24 AM
> 
> > From: Alex Williamson <alex.williamson@redhat.com>
> > Sent: Wednesday, June 14, 2023 4:11 AM
> >
> > On Tue, 13 Jun 2023 14:35:09 -0300
> > Jason Gunthorpe <jgg@nvidia.com> wrote:
> >
> > > On Tue, Jun 13, 2023 at 11:15:11AM -0600, Alex Williamson wrote:
> > > > [Sorry for breaking threading, replying to my own message id with reply
> > > >  content from Yi since the Cc list got broken]
> > >
> > > Yikes it is really busted, I think I fixed it?
> > >
> > > > If we renamed your function above to vfio_device_has_iommu_group(),
> > > > couldn't we just wrap device_add like below instead to not have cdev
> > > > setup for a noiommu device, generate an error for a physical device w/o
> > > > IOMMU backing, and otherwise setup the cdev device?
> > > >
> > > > static inline int vfio_device_add(struct vfio_device *device, enum vfio_group_type type)
> > > > {
> > > > #if IS_ENABLED(CONFIG_VFIO_GROUP)
> > > > 	if (device->group->type == VFIO_NO_IOMMU)
> > > > 		return device_add(&device->device);
> > >
> > > vfio_device_is_noiommu() embeds the IS_ENABLED
> >
> > But patch 23/ makes the definition of struct vfio_group conditional on
> > CONFIG_VFIO_GROUP, so while CONFIG_VFIO_NOIOMMU depends on
> > CONFIG_VFIO_GROUP and the result could be determined, I think the
> > compiler is still unhappy about the undefined reference.  We'd need a
> > !CONFIG_VFIO_GROUP stub for the function.
> >
> > > > #else
> > > > 	if (type == VFIO_IOMMU && !vfio_device_has_iommu_group(device))
> > > > 		return -EINVAL;
> > > > #endif
> > >
> > > The required test is this one from the group code:
> > >
> > >  	if (!device_iommu_capable(dev, IOMMU_CAP_CACHE_COHERENCY)) {
> > >
> > > We could lift it out of the group code and call it from vfio_main.c like:
> > >
> > > if (type == VFIO_IOMMU && !vfio_device_is_noiommu(vdev) && !device_iommu_capable(dev,
> > >      IOMMU_CAP_CACHE_COHERENCY))
> > >    FAIL
> >
> > Ack.  Thanks,
> 
> So, what I got is:
> 
> 1) Add the check below to __vfio_register_dev() to fail physical devices that
>     don't have IOMMU protection.
> 
> 	/*
> 	 * A noiommu device is a special type supported by the group interface.
> 	 * This type represents physical devices that are not IOMMU backed.
> 	 */
> 	if (type == VFIO_IOMMU && !vfio_device_is_noiommu(device) &&
> 	    !vfio_device_has_iommu_group(device))
> 		return -EINVAL; /* or maybe -EOPNOTSUPP? */
> 
> Nit: this requires a vfio_device_is_noiommu() stub that returns false when
> CONFIG_VFIO_GROUP is unset.

device_iommu_capable(dev, IOMMU_CAP_CACHE_COHERENCY) is valid only for
devices with iommu groups, so that check already covers the group
verification indirectly.

With that, I think Jason's suggestion is to lift that test into vfio_main.c:

int vfio_register_group_dev(struct vfio_device *device)
{
	/*
	 * VFIO always sets IOMMU_CACHE because we offer no way for userspace to
	 * restore cache coherency. It has to be checked here because it is only
	 * valid for cases where we are using iommu groups.
	 */
	if (type == VFIO_IOMMU && !vfio_device_is_noiommu(device) &&
	    !device_iommu_capable(device->dev, IOMMU_CAP_CACHE_COHERENCY))
		return -EINVAL;

	return __vfio_register_dev(device, VFIO_IOMMU);
}
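
The combined registration check can be modeled in user space as a pure
predicate. This is only a sketch: struct model_device, its fields, and
model_register_check() are hypothetical names, and it assumes, as stated
above, that the cache-coherency capability query reports false for a device
that has no IOMMU behind it.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

enum vfio_group_type { VFIO_IOMMU, VFIO_EMULATED_IOMMU, VFIO_NO_IOMMU };

/* Hypothetical model of the facts the check depends on. */
struct model_device {
	bool has_iommu;		/* an IOMMU sits in front of the device */
	bool cache_coherent;	/* that IOMMU can enforce cache coherency */
	bool noiommu;		/* device was placed in a noiommu group */
};

/* Models device_iommu_capable(dev, IOMMU_CAP_CACHE_COHERENCY). */
static bool model_cache_coherency_capable(const struct model_device *dev)
{
	return dev->has_iommu && dev->cache_coherent;
}

/* Returns 0 if registration may proceed, -EINVAL otherwise. */
static int model_register_check(const struct model_device *dev,
				enum vfio_group_type type)
{
	assert(dev != NULL);
	if (type == VFIO_IOMMU && !dev->noiommu &&
	    !model_cache_coherency_capable(dev))
		return -EINVAL;
	return 0;
}
```

Under this model a physical device with neither an IOMMU nor a noiommu group
is rejected by the same test that rejects a non-coherent IOMMU, so no
separate group lookup is needed.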

> 
> 2) Use the functions below to add and delete the device:
> 
> #if IS_ENABLED(CONFIG_VFIO_DEVICE_CDEV)
> static inline int vfio_device_add(struct vfio_device *device)
> {
> 	if (vfio_device_is_noiommu(device))
> 		return device_add(&device->device);
> 	vfio_init_device_cdev(device);
> 	return cdev_device_add(&device->cdev, &device->device);
> }
> 
> static inline void vfio_device_del(struct vfio_device *device)
> {
> 	if (vfio_device_is_noiommu(device))
> 		device_del(&device->device);
> 	else
> 		cdev_device_del(&device->cdev, &device->device);
> }

Correct

^ permalink raw reply	[flat|nested] 180+ messages in thread

* RE: [PATCH v12 21/24] vfio: Determine noiommu device in __vfio_register_dev()
  2023-06-14  5:42                           ` [Intel-gfx] " Tian, Kevin
@ 2023-06-14  6:14                             ` Liu, Yi L
  -1 siblings, 0 replies; 180+ messages in thread
From: Liu, Yi L @ 2023-06-14  6:14 UTC (permalink / raw)
  To: Tian, Kevin, Alex Williamson, Jason Gunthorpe
  Cc: joro, robin.murphy, cohuck, eric.auger, nicolinc, kvm, mjrosato,
	chao.p.peng, yi.y.sun, peterx, jasowang,
	shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	intel-gvt-dev, intel-gfx, linux-s390, Hao, Xudong, Zhao, Yan Y,
	Xu, Terrence, Jiang, Yanting, Duan, Zhenzhong, clegoate

> From: Tian, Kevin <kevin.tian@intel.com>
> Sent: Wednesday, June 14, 2023 1:42 PM
> 
> > From: Liu, Yi L <yi.l.liu@intel.com>
> > Sent: Wednesday, June 14, 2023 11:24 AM
> >
> > > From: Alex Williamson <alex.williamson@redhat.com>
> > > Sent: Wednesday, June 14, 2023 4:11 AM
> > >
> > > On Tue, 13 Jun 2023 14:35:09 -0300
> > > Jason Gunthorpe <jgg@nvidia.com> wrote:
> > >
> > > > On Tue, Jun 13, 2023 at 11:15:11AM -0600, Alex Williamson wrote:
> > > > > [Sorry for breaking threading, replying to my own message id with reply
> > > > >  content from Yi since the Cc list got broken]
> > > >
> > > > Yikes it is really busted, I think I fixed it?
> > > >
> > > > > If we renamed your function above to vfio_device_has_iommu_group(),
> > > > > couldn't we just wrap device_add like below instead to not have cdev
> > > > > setup for a noiommu device, generate an error for a physical device w/o
> > > > > IOMMU backing, and otherwise setup the cdev device?
> > > > >
> > > > > static inline int vfio_device_add(struct vfio_device *device, enum vfio_group_type type)
> > > > > {
> > > > > #if IS_ENABLED(CONFIG_VFIO_GROUP)
> > > > > 	if (device->group->type == VFIO_NO_IOMMU)
> > > > > 		return device_add(&device->device);
> > > >
> > > > vfio_device_is_noiommu() embeds the IS_ENABLED
> > >
> > > But patch 23/ makes the definition of struct vfio_group conditional on
> > > CONFIG_VFIO_GROUP, so while CONFIG_VFIO_NOIOMMU depends on
> > > CONFIG_VFIO_GROUP and the result could be determined, I think the
> > > compiler is still unhappy about the undefined reference.  We'd need a
> > > !CONFIG_VFIO_GROUP stub for the function.
> > >
> > > > > #else
> > > > > 	if (type == VFIO_IOMMU && !vfio_device_has_iommu_group(device))
> > > > > 		return -EINVAL;
> > > > > #endif
> > > >
> > > > The required test is this one from the group code:
> > > >
> > > >  	if (!device_iommu_capable(dev, IOMMU_CAP_CACHE_COHERENCY)) {
> > > >
> > > > We could lift it out of the group code and call it from vfio_main.c like:
> > > >
> > > > if (type == VFIO_IOMMU && !vfio_device_is_noiommu(vdev) && !device_iommu_capable(dev,
> > > >      IOMMU_CAP_CACHE_COHERENCY))
> > > >    FAIL
> > >
> > > Ack.  Thanks,
> >
> > So, what I got is:
> >
> > 1) Add the check below to __vfio_register_dev() to fail physical devices that
> >     don't have IOMMU protection.
> >
> > 	/*
> > 	 * A noiommu device is a special type supported by the group interface.
> > 	 * This type represents physical devices that are not IOMMU backed.
> > 	 */
> > 	if (type == VFIO_IOMMU && !vfio_device_is_noiommu(device) &&
> > 	    !vfio_device_has_iommu_group(device))
> > 		return -EINVAL; /* or maybe -EOPNOTSUPP? */
> >
> > Nit: this requires a vfio_device_is_noiommu() stub that returns false when
> > CONFIG_VFIO_GROUP is unset.
> 
> device_iommu_capable(dev, IOMMU_CAP_CACHE_COHERENCY) is valid only for
> devices with iommu groups, so that check already covers the group
> verification indirectly.

Okay. This IOMMU_CAP_CACHE_COHERENCY check is missing from the cdev
path.

> With that, I think Jason's suggestion is to lift that test into vfio_main.c:
> 
> int vfio_register_group_dev(struct vfio_device *device)
> {
> 	/*
> 	 * VFIO always sets IOMMU_CACHE because we offer no way for userspace to
> 	 * restore cache coherency. It has to be checked here because it is only
> 	 * valid for cases where we are using iommu groups.
> 	 */
> 	if (type == VFIO_IOMMU && !vfio_device_is_noiommu(device) &&
> 	    !device_iommu_capable(device->dev, IOMMU_CAP_CACHE_COHERENCY))
> 		return -EINVAL;

vfio_device_is_noiommu() needs to be called after vfio_device_set_group();
otherwise it always returns false. So the check still needs to be made in
__vfio_register_dev().
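
The ordering constraint can be illustrated with a small user-space model.
The names struct model_vfio_device, model_is_noiommu() and model_set_group()
are hypothetical, and the sketch assumes the helper reports false while no
group has been attached to the device.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum vfio_group_type { VFIO_IOMMU, VFIO_EMULATED_IOMMU, VFIO_NO_IOMMU };

struct model_vfio_group { enum vfio_group_type type; };
struct model_vfio_device { struct model_vfio_group *group; };

/* Models vfio_device_is_noiommu(): false until a group is attached. */
static bool model_is_noiommu(const struct model_vfio_device *vdev)
{
	assert(vdev != NULL);
	return vdev->group && vdev->group->type == VFIO_NO_IOMMU;
}

/* Models vfio_device_set_group(): attaches the group chosen for the device. */
static void model_set_group(struct model_vfio_device *vdev,
			    struct model_vfio_group *group)
{
	vdev->group = group;
}
```

Querying the helper before the group is attached would classify every device
as IOMMU backed, which is why the check has to run after the group
assignment.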

> 	return __vfio_register_dev(device, VFIO_IOMMU);
> }
> 
> >
> > 2) Use the functions below to add and delete the device:
> >
> > #if IS_ENABLED(CONFIG_VFIO_DEVICE_CDEV)
> > static inline int vfio_device_add(struct vfio_device *device)
> > {
> > 	if (vfio_device_is_noiommu(device))
> > 		return device_add(&device->device);
> > 	vfio_init_device_cdev(device);
> > 	return cdev_device_add(&device->cdev, &device->device);
> > }
> >
> > static inline void vfio_device_del(struct vfio_device *device)
> > {
> > 	if (vfio_device_is_noiommu(device))
> > 		device_del(&device->device);
> > 	else
> > 		cdev_device_del(&device->cdev, &device->device);
> > }
> 
> Correct

Regards,
Yi Liu

^ permalink raw reply	[flat|nested] 180+ messages in thread

* RE: [PATCH v12 21/24] vfio: Determine noiommu device in __vfio_register_dev()
  2023-06-14  6:14                             ` [Intel-gfx] " Liu, Yi L
@ 2023-06-14  6:20                               ` Tian, Kevin
  -1 siblings, 0 replies; 180+ messages in thread
From: Tian, Kevin @ 2023-06-14  6:20 UTC (permalink / raw)
  To: Liu, Yi L, Alex Williamson, Jason Gunthorpe
  Cc: joro, robin.murphy, cohuck, eric.auger, nicolinc, kvm, mjrosato,
	chao.p.peng, yi.y.sun, peterx, jasowang,
	shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	intel-gvt-dev, intel-gfx, linux-s390, Hao, Xudong, Zhao, Yan Y,
	Xu, Terrence, Jiang, Yanting, Duan, Zhenzhong, clegoate

> From: Liu, Yi L <yi.l.liu@intel.com>
> Sent: Wednesday, June 14, 2023 2:14 PM
> 
> 
> > With that I think Jason's suggestion is to lift that test into main.c:
> >
> > int vfio_register_group_dev(struct vfio_device *device)
> > {
> > 	/*
> > 	 * VFIO always sets IOMMU_CACHE because we offer no way for
> userspace to
> > 	 * restore cache coherency. It has to be checked here because it is
> only
> > 	 * valid for cases where we are using iommu groups.
> > 	 */
> > 	if (type == VFIO_IOMMU && !vfio_device_is_noiommu(device) &&
> > 	    !device_iommu_capable(dev, IOMMU_CAP_CACHE_COHERENCY))
> > 		return ERR_PTR(-EINVAL);
> 
> vfio_device_is_noiommu() needs to be called after vfio_device_set_group();
> otherwise it is always false. So it still has to be called from
> __vfio_register_dev().

yes

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [Intel-gfx] [PATCH v12 21/24] vfio: Determine noiommu device in __vfio_register_dev()
  2023-06-14  6:20                               ` [Intel-gfx] " Tian, Kevin
@ 2023-06-14 12:23                                 ` Jason Gunthorpe
  -1 siblings, 0 replies; 180+ messages in thread
From: Jason Gunthorpe @ 2023-06-14 12:23 UTC (permalink / raw)
  To: Tian, Kevin
  Cc: mjrosato, jasowang, Hao, Xudong, Duan, Zhenzhong, peterx, Xu,
	Terrence, chao.p.peng, linux-s390, Liu, Yi L, kvm, lulu, Jiang,
	Yanting, joro, nicolinc, Zhao, Yan Y, intel-gfx, eric.auger,
	intel-gvt-dev, yi.y.sun, clegoate, cohuck,
	shameerali.kolothum.thodi, suravee.suthikulpanit, robin.murphy

On Wed, Jun 14, 2023 at 06:20:10AM +0000, Tian, Kevin wrote:
> > From: Liu, Yi L <yi.l.liu@intel.com>
> > Sent: Wednesday, June 14, 2023 2:14 PM
> > 
> > 
> > > With that I think Jason's suggestion is to lift that test into main.c:
> > >
> > > int vfio_register_group_dev(struct vfio_device *device)
> > > {
> > > 	/*
> > > 	 * VFIO always sets IOMMU_CACHE because we offer no way for
> > userspace to
> > > 	 * restore cache coherency. It has to be checked here because it is
> > only
> > > 	 * valid for cases where we are using iommu groups.
> > > 	 */
> > > 	if (type == VFIO_IOMMU && !vfio_device_is_noiommu(device) &&
> > > 	    !device_iommu_capable(dev, IOMMU_CAP_CACHE_COHERENCY))
> > > 		return ERR_PTR(-EINVAL);
> > 
> > vfio_device_is_noiommu() needs to be called after vfio_device_set_group();
> > otherwise it is always false. So it still has to be called from
> > __vfio_register_dev().
> 
> yes

Right, but it needs to be in vfio_main.c, not in the group.c - so
another patch should be added to move it.

I prefer the idea that vfio_device_is_noiommu() works in all the
kconfig scenarios rather than adding #ifdefs.

Jason

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [Intel-gfx] [PATCH v12 21/24] vfio: Determine noiommu device in __vfio_register_dev()
  2023-06-14 12:23                                 ` Jason Gunthorpe
@ 2023-06-14 13:12                                   ` Liu, Yi L
  -1 siblings, 0 replies; 180+ messages in thread
From: Liu, Yi L @ 2023-06-14 13:12 UTC (permalink / raw)
  To: Jason Gunthorpe, Tian, Kevin
  Cc: mjrosato, jasowang, Hao, Xudong, Duan,  Zhenzhong, peterx, Xu,
	Terrence, chao.p.peng, linux-s390, kvm, lulu, Jiang, Yanting,
	joro, nicolinc, Zhao, Yan Y, intel-gfx, eric.auger,
	intel-gvt-dev, yi.y.sun, clegoate, cohuck,
	shameerali.kolothum.thodi, suravee.suthikulpanit, robin.murphy

> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Wednesday, June 14, 2023 8:23 PM
> On Wed, Jun 14, 2023 at 06:20:10AM +0000, Tian, Kevin wrote:
> > > From: Liu, Yi L <yi.l.liu@intel.com>
> > > Sent: Wednesday, June 14, 2023 2:14 PM
> > >
> > >
> > > > With that I think Jason's suggestion is to lift that test into main.c:
> > > >
> > > > int vfio_register_group_dev(struct vfio_device *device)
> > > > {
> > > > 	/*
> > > > 	 * VFIO always sets IOMMU_CACHE because we offer no way for
> > > userspace to
> > > > 	 * restore cache coherency. It has to be checked here because it is
> > > only
> > > > 	 * valid for cases where we are using iommu groups.
> > > > 	 */
> > > > 	if (type == VFIO_IOMMU && !vfio_device_is_noiommu(device) &&
> > > > 	    !device_iommu_capable(dev, IOMMU_CAP_CACHE_COHERENCY))
> > > > 		return ERR_PTR(-EINVAL);
> > >
> > > vfio_device_is_noiommu() needs to be called after vfio_device_set_group();
> > > otherwise it is always false. So it still has to be called from
> > > __vfio_register_dev().
> >
> > yes
> 
> Right, but it needs to be in vfio_main.c, not in the group.c - so
> another patch should be added to move it.

I've got a patch as below to move it.

From 306e71325d255eef34a1c44312bf9cdc8c302faa Mon Sep 17 00:00:00 2001
From: Yi Liu <yi.l.liu@intel.com>
Date: Wed, 14 Jun 2023 00:37:52 -0700
Subject: [PATCH] vfio: Move the IOMMU_CAP_CACHE_COHERENCY check in
 __vfio_register_dev()

The IOMMU_CAP_CACHE_COHERENCY check only applies to the physical devices
that are IOMMU-backed. This change prepares for compiling the vfio_group
infrastructure optionally as cdev does not support the physical devices
that are not IOMMU-backed. This check helps to fail the device registration
for such devices when the vfio_group infrastructure is compiled out.

Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 drivers/vfio/group.c     | 10 ----------
 drivers/vfio/vfio_main.c | 11 +++++++++++
 2 files changed, 11 insertions(+), 10 deletions(-)

diff --git a/drivers/vfio/group.c b/drivers/vfio/group.c
index 41a09a2df690..c2e0128323a7 100644
--- a/drivers/vfio/group.c
+++ b/drivers/vfio/group.c
@@ -687,16 +687,6 @@ static struct vfio_group *vfio_group_find_or_alloc(struct device *dev)
 	if (!iommu_group)
 		return ERR_PTR(-EINVAL);
 
-	/*
-	 * VFIO always sets IOMMU_CACHE because we offer no way for userspace to
-	 * restore cache coherency. It has to be checked here because it is only
-	 * valid for cases where we are using iommu groups.
-	 */
-	if (!device_iommu_capable(dev, IOMMU_CAP_CACHE_COHERENCY)) {
-		iommu_group_put(iommu_group);
-		return ERR_PTR(-EINVAL);
-	}
-
 	mutex_lock(&vfio.group_lock);
 	group = vfio_group_find_from_iommu(iommu_group);
 	if (group) {
diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
index 51c80eb32af6..ffb4585b7f0e 100644
--- a/drivers/vfio/vfio_main.c
+++ b/drivers/vfio/vfio_main.c
@@ -292,6 +292,17 @@ static int __vfio_register_dev(struct vfio_device *device,
 	if (ret)
 		return ret;
 
+	/*
+	 * VFIO always sets IOMMU_CACHE because we offer no way for userspace to
+	 * restore cache coherency. It has to be checked here because it is only
+	 * valid for cases where we are using iommu groups.
+	 */
+	if (type == VFIO_IOMMU && !vfio_device_is_noiommu(device) &&
+	    !device_iommu_capable(device->dev, IOMMU_CAP_CACHE_COHERENCY)) {
+		ret = -EINVAL;
+		goto err_out;
+	}
+
 	ret = vfio_device_add(device);
 	if (ret)
 		goto err_out;
-- 
2.34.1

> I prefer the idea that vfio_device_is_noiommu() works in all the
> kconfig scenarios rather than adding #ifdefs.

But struct vfio_group is only forward-declared when CONFIG_VFIO_GROUP is
unset, so its members cannot be accessed. In what I have now, the stub
function always returns false in that configuration.

#if IS_ENABLED(CONFIG_VFIO_GROUP)
struct vfio_group {
	...;
};

static inline bool vfio_device_is_noiommu(struct vfio_device *vdev)
{
        return IS_ENABLED(CONFIG_VFIO_NOIOMMU) &&
               vdev->group->type == VFIO_NO_IOMMU;
}
#else
struct vfio_group;

static inline bool vfio_device_is_noiommu(struct vfio_device *vdev)
{
        return false;
}
#endif

Regards,
Yi Liu


^ permalink raw reply related	[flat|nested] 180+ messages in thread

* [Intel-gfx] ✗ Fi.CI.BUILD: failure for Add vfio_device cdev for iommufd support (rev16)
  2023-06-02 12:16 ` [Intel-gfx] " Yi Liu
                   ` (28 preceding siblings ...)
  (?)
@ 2023-06-14 15:47 ` Patchwork
  -1 siblings, 0 replies; 180+ messages in thread
From: Patchwork @ 2023-06-14 15:47 UTC (permalink / raw)
  To: Liu, Yi L; +Cc: intel-gfx

== Series Details ==

Series: Add vfio_device cdev for iommufd support (rev16)
URL   : https://patchwork.freedesktop.org/series/113696/
State : failure

== Summary ==

Error: patch https://patchwork.freedesktop.org/api/1.0/series/113696/revisions/16/mbox/ not applied
Applying: vfio: Allocate per device file structure
Applying: vfio: Refine vfio file kAPIs for KVM
Applying: vfio: Accept vfio device file in the KVM facing kAPI
Applying: kvm/vfio: Prepare for accepting vfio device fd
Applying: kvm/vfio: Accept vfio device file from userspace
Applying: vfio: Pass struct vfio_device_file * to vfio_device_open/close()
Applying: vfio: Block device access via device fd until device is opened
Applying: vfio: Add cdev_device_open_cnt to vfio_group
Applying: vfio: Make vfio_df_open() single open for device cdev path
Applying: vfio-iommufd: Move noiommu compat validation out of vfio_iommufd_bind()
Applying: vfio-iommufd: Split bind/attach into two steps
Applying: vfio: Record devid in vfio_device_file
Applying: vfio-iommufd: Add detach_ioas support for physical VFIO devices
Applying: iommufd/device: Add iommufd_access_detach() API
Applying: vfio-iommufd: Add detach_ioas support for emulated VFIO devices
error: sha1 information is lacking or useless (drivers/vfio/iommufd.c).
error: could not build fake ancestor
hint: Use 'git am --show-current-patch=diff' to see the failed patch
Patch failed at 0015 vfio-iommufd: Add detach_ioas support for emulated VFIO devices
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".
Build failed, no error log produced



^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCH v12 21/24] vfio: Determine noiommu device in __vfio_register_dev()
  2023-06-14 13:12                                   ` Liu, Yi L
@ 2023-06-14 17:30                                     ` Jason Gunthorpe
  -1 siblings, 0 replies; 180+ messages in thread
From: Jason Gunthorpe @ 2023-06-14 17:30 UTC (permalink / raw)
  To: Liu, Yi L
  Cc: Tian, Kevin, Alex Williamson, joro, robin.murphy, cohuck,
	eric.auger, nicolinc, kvm, mjrosato, chao.p.peng, yi.y.sun,
	peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, intel-gvt-dev, intel-gfx, linux-s390, Hao,
	Xudong, Zhao, Yan Y, Xu, Terrence, Jiang, Yanting, Duan,
	Zhenzhong, clegoate

On Wed, Jun 14, 2023 at 01:12:50PM +0000, Liu, Yi L wrote:
> diff --git a/drivers/vfio/group.c b/drivers/vfio/group.c
> index 41a09a2df690..c2e0128323a7 100644
> --- a/drivers/vfio/group.c
> +++ b/drivers/vfio/group.c
> @@ -687,16 +687,6 @@ static struct vfio_group *vfio_group_find_or_alloc(struct device *dev)
>  	if (!iommu_group)
>  		return ERR_PTR(-EINVAL);
>  
> -	/*
> -	 * VFIO always sets IOMMU_CACHE because we offer no way for userspace to
> -	 * restore cache coherency. It has to be checked here because it is only
> -	 * valid for cases where we are using iommu groups.
> -	 */
> -	if (!device_iommu_capable(dev, IOMMU_CAP_CACHE_COHERENCY)) {
> -		iommu_group_put(iommu_group);
> -		return ERR_PTR(-EINVAL);
> -	}
> -
>  	mutex_lock(&vfio.group_lock);
>  	group = vfio_group_find_from_iommu(iommu_group);
>  	if (group) {
> diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
> index 51c80eb32af6..ffb4585b7f0e 100644
> --- a/drivers/vfio/vfio_main.c
> +++ b/drivers/vfio/vfio_main.c
> @@ -292,6 +292,17 @@ static int __vfio_register_dev(struct vfio_device *device,
>  	if (ret)
>  		return ret;
>  
> +	/*
> +	 * VFIO always sets IOMMU_CACHE because we offer no way for userspace to
> +	 * restore cache coherency. It has to be checked here because it is only
> +	 * valid for cases where we are using iommu groups.
> +	 */
> +	if (type == VFIO_IOMMU && !vfio_device_is_noiommu(device) &&
> +	    !device_iommu_capable(device->dev, IOMMU_CAP_CACHE_COHERENCY)) {
> +		ret = -EINVAL;
> +		goto err_out;
> +	}
> +
>  	ret = vfio_device_add(device);
>  	if (ret)
>  		goto err_out;

Yes that looks right
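
For readers following the control flow of the quoted hunk, a rough userspace
model may help. Everything below is hypothetical: struct mock_vdev, its fields
and mock_register() are illustrative names, the fallback to a no-IOMMU group
is a simplifying assumption, and the unwind step only models what err_out is
assumed to do.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical userspace model; field names are illustrative only. */
struct mock_vdev {
	bool has_iommu_group;	/* device sits behind an IOMMU */
	bool cache_coherent;	/* models IOMMU_CAP_CACHE_COHERENCY */
	bool group_set;		/* models the effect of vfio_device_set_group() */
	bool group_noiommu;	/* models group type == VFIO_NO_IOMMU */
	bool added;		/* models the effect of vfio_device_add() */
};

static bool mock_is_noiommu(const struct mock_vdev *v)
{
	return v->group_set && v->group_noiommu;
}

/* type_is_iommu models "type == VFIO_IOMMU" (a physical device). */
static int mock_register(struct mock_vdev *v, bool type_is_iommu)
{
	int ret;

	/*
	 * Step 1: group assignment. Assumption: a physical device without
	 * an IOMMU group falls back to a no-IOMMU group.
	 */
	v->group_set = true;
	v->group_noiommu = type_is_iommu && !v->has_iommu_group;

	/* Step 2: the relocated check, made only once the group is known. */
	if (type_is_iommu && !mock_is_noiommu(v) && !v->cache_coherent) {
		ret = -EINVAL;
		goto err_out;
	}

	/* Step 3: make the device visible. */
	v->added = true;
	return 0;

err_out:
	/* Unwind step 1 before reporting the error. */
	v->group_set = false;
	v->group_noiommu = false;
	return ret;
}
```

In this model only an IOMMU-backed physical device without cache coherency is
rejected; noiommu and emulated devices skip the capability test.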

> 
> > I prefer the idea that vfio_device_is_noiommu() works in all the
> > kconfig scenarios rather than adding #ifdefs.
> 
> But struct vfio_group is only forward-declared when CONFIG_VFIO_GROUP is
> unset, so its members cannot be accessed. In what I have now, the stub
> function always returns false in that configuration.

It seems fine, you could also put the ifdef inside the stub
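
A possible shape for that variant, sketched as a standalone file rather than
as kernel code. The DEMO_* macros are hypothetical 0/1 switches standing in
for the Kconfig symbols, and the demo_* types are cut-down stand-ins:

```c
#include <stdbool.h>

/*
 * Hypothetical build-time switches. In the kernel these would come from
 * Kconfig; here they are plain 0/1 macros so the file builds on its own.
 */
#ifndef DEMO_VFIO_GROUP
#define DEMO_VFIO_GROUP 1
#endif
#ifndef DEMO_VFIO_NOIOMMU
#define DEMO_VFIO_NOIOMMU 1
#endif

enum demo_group_type { DEMO_IOMMU, DEMO_EMULATED_IOMMU, DEMO_NO_IOMMU };

#if DEMO_VFIO_GROUP
struct demo_group {
	enum demo_group_type type;
};
#else
struct demo_group;	/* incomplete type: members cannot be accessed */
#endif

struct demo_device {
	struct demo_group *group;
};

/* One definition for every configuration; the #if lives in the body. */
static inline bool demo_device_is_noiommu(struct demo_device *vdev)
{
#if DEMO_VFIO_GROUP
	return DEMO_VFIO_NOIOMMU && vdev->group->type == DEMO_NO_IOMMU;
#else
	(void)vdev;	/* no group infrastructure, so never noiommu */
	return false;
#endif
}
```

Either layout gives callers the same answer; this one just keeps a single
prototype, so the signature cannot drift between the two configurations.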

Jason

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCH v12 10/24] vfio-iommufd: Move noiommu compat validation out of vfio_iommufd_bind()
  2023-06-02 12:16   ` [Intel-gfx] " Yi Liu
@ 2023-06-22 17:59     ` Jason Gunthorpe
  -1 siblings, 0 replies; 180+ messages in thread
From: Jason Gunthorpe @ 2023-06-22 17:59 UTC (permalink / raw)
  To: Yi Liu
  Cc: alex.williamson, kevin.tian, joro, robin.murphy, cohuck,
	eric.auger, nicolinc, kvm, mjrosato, chao.p.peng, yi.y.sun,
	peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, intel-gvt-dev, intel-gfx, linux-s390,
	xudong.hao, yan.y.zhao, terrence.xu, yanting.jiang,
	zhenzhong.duan, clegoate

On Fri, Jun 02, 2023 at 05:16:39AM -0700, Yi Liu wrote:
> This moves the noiommu compat validation logic into vfio_df_group_open().
> This is more consistent with what will be done in vfio device cdev path.
> 
> Reviewed-by: Kevin Tian <kevin.tian@intel.com>
> Tested-by: Terrence Xu <terrence.xu@intel.com>
> Tested-by: Nicolin Chen <nicolinc@nvidia.com>
> Tested-by: Yanting Jiang <yanting.jiang@intel.com>
> Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> ---
>  drivers/vfio/group.c   | 13 +++++++++++++
>  drivers/vfio/iommufd.c | 22 ++++++++--------------
>  drivers/vfio/vfio.h    |  9 +++++++++
>  3 files changed, 30 insertions(+), 14 deletions(-)

Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>

Jason

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCH v12 11/24] vfio-iommufd: Split bind/attach into two steps
  2023-06-02 12:16   ` [Intel-gfx] " Yi Liu
@ 2023-06-22 17:59     ` Jason Gunthorpe
  -1 siblings, 0 replies; 180+ messages in thread
From: Jason Gunthorpe @ 2023-06-22 17:59 UTC (permalink / raw)
  To: Yi Liu
  Cc: alex.williamson, kevin.tian, joro, robin.murphy, cohuck,
	eric.auger, nicolinc, kvm, mjrosato, chao.p.peng, yi.y.sun,
	peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, intel-gvt-dev, intel-gfx, linux-s390,
	xudong.hao, yan.y.zhao, terrence.xu, yanting.jiang,
	zhenzhong.duan, clegoate

On Fri, Jun 02, 2023 at 05:16:40AM -0700, Yi Liu wrote:
> This aligns the bind/attach logic with the coming vfio device cdev support.
> 
> Reviewed-by: Kevin Tian <kevin.tian@intel.com>
> Tested-by: Terrence Xu <terrence.xu@intel.com>
> Tested-by: Nicolin Chen <nicolinc@nvidia.com>
> Tested-by: Yanting Jiang <yanting.jiang@intel.com>
> Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> ---
>  drivers/vfio/group.c   | 17 +++++++++++++----
>  drivers/vfio/iommufd.c | 35 +++++++++++++++++------------------
>  drivers/vfio/vfio.h    |  9 +++++++++
>  3 files changed, 39 insertions(+), 22 deletions(-)

Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>

Jason

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [Intel-gfx] [PATCH v12 12/24] vfio: Record devid in vfio_device_file
  2023-06-02 12:16   ` [Intel-gfx] " Yi Liu
@ 2023-06-22 18:00     ` Jason Gunthorpe
  -1 siblings, 0 replies; 180+ messages in thread
From: Jason Gunthorpe @ 2023-06-22 18:00 UTC (permalink / raw)
  To: Yi Liu
  Cc: mjrosato, jasowang, xudong.hao, zhenzhong.duan, peterx,
	terrence.xu, chao.p.peng, linux-s390, kvm, lulu, yanting.jiang,
	joro, nicolinc, kevin.tian, yan.y.zhao, intel-gfx, eric.auger,
	intel-gvt-dev, yi.y.sun, clegoate, cohuck,
	shameerali.kolothum.thodi, suravee.suthikulpanit, robin.murphy

On Fri, Jun 02, 2023 at 05:16:41AM -0700, Yi Liu wrote:
> .bind_iommufd() will generate an ID to represent this bond, which is
> needed by userspace for further usage. Store devid in vfio_device_file
> to avoid passing the pointer in multiple places.
> 
> Reviewed-by: Kevin Tian <kevin.tian@intel.com>
> Tested-by: Terrence Xu <terrence.xu@intel.com>
> Tested-by: Nicolin Chen <nicolinc@nvidia.com>
> Tested-by: Yanting Jiang <yanting.jiang@intel.com>
> Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> ---
>  drivers/vfio/iommufd.c   | 12 +++++++-----
>  drivers/vfio/vfio.h      | 10 +++++-----
>  drivers/vfio/vfio_main.c |  6 +++---
>  3 files changed, 15 insertions(+), 13 deletions(-)

Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>

Jason

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCH v12 13/24] vfio-iommufd: Add detach_ioas support for physical VFIO devices
  2023-06-02 12:16   ` [Intel-gfx] " Yi Liu
@ 2023-06-23 14:04     ` Jason Gunthorpe
  -1 siblings, 0 replies; 180+ messages in thread
From: Jason Gunthorpe @ 2023-06-23 14:04 UTC (permalink / raw)
  To: Yi Liu
  Cc: alex.williamson, kevin.tian, joro, robin.murphy, cohuck,
	eric.auger, nicolinc, kvm, mjrosato, chao.p.peng, yi.y.sun,
	peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, intel-gvt-dev, intel-gfx, linux-s390,
	xudong.hao, yan.y.zhao, terrence.xu, yanting.jiang,
	zhenzhong.duan, clegoate

On Fri, Jun 02, 2023 at 05:16:42AM -0700, Yi Liu wrote:
> This prepares for adding DETACH ioctl for physical VFIO devices.
> 
> Reviewed-by: Kevin Tian <kevin.tian@intel.com>
> Tested-by: Terrence Xu <terrence.xu@intel.com>
> Tested-by: Nicolin Chen <nicolinc@nvidia.com>
> Tested-by: Matthew Rosato <mjrosato@linux.ibm.com>
> Tested-by: Yanting Jiang <yanting.jiang@intel.com>
> Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> ---
>  Documentation/driver-api/vfio.rst             |  8 +++++---
>  drivers/vfio/fsl-mc/vfio_fsl_mc.c             |  1 +
>  drivers/vfio/iommufd.c                        | 20 +++++++++++++++++++
>  .../vfio/pci/hisilicon/hisi_acc_vfio_pci.c    |  2 ++
>  drivers/vfio/pci/mlx5/main.c                  |  1 +
>  drivers/vfio/pci/vfio_pci.c                   |  1 +
>  drivers/vfio/platform/vfio_amba.c             |  1 +
>  drivers/vfio/platform/vfio_platform.c         |  1 +
>  drivers/vfio/vfio_main.c                      |  3 ++-
>  include/linux/vfio.h                          |  8 +++++++-
>  10 files changed, 41 insertions(+), 5 deletions(-)

Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>

Jason

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCH v12 14/24] iommufd/device: Add iommufd_access_detach() API
  2023-06-02 12:16   ` [Intel-gfx] " Yi Liu
@ 2023-06-23 14:15     ` Jason Gunthorpe
  -1 siblings, 0 replies; 180+ messages in thread
From: Jason Gunthorpe @ 2023-06-23 14:15 UTC (permalink / raw)
  To: Yi Liu
  Cc: alex.williamson, kevin.tian, joro, robin.murphy, cohuck,
	eric.auger, nicolinc, kvm, mjrosato, chao.p.peng, yi.y.sun,
	peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, intel-gvt-dev, intel-gfx, linux-s390,
	xudong.hao, yan.y.zhao, terrence.xu, yanting.jiang,
	zhenzhong.duan, clegoate

On Fri, Jun 02, 2023 at 05:16:43AM -0700, Yi Liu wrote:
> From: Nicolin Chen <nicolinc@nvidia.com>
> 
> Previously, the detach routine is only done by the destroy(). And it was
> called by vfio_iommufd_emulated_unbind() when the device runs close(), so
> all the mappings in iopt were cleaned in that setup, when the call trace
> reaches this detach() routine.
> 
> Now, there's a need of a detach uAPI, meaning that it does not only need
> a new iommufd_access_detach() API, but also requires access->ops->unmap()
> call as a cleanup. So add one.
> 
> However, leaving that unprotected can introduce some potential of a race
> condition during the pin_/unpin_pages() call, where access->ioas->iopt is
> getting referenced. So, add an ioas_lock to protect the context of iopt
> referencings.
> 
> Also, to allow the iommufd_access_unpin_pages() callback to happen via
> this unmap() call, add an ioas_unpin pointer, so the unpin routine won't
> be affected by the "access->ioas = NULL" trick.
> 
> Reviewed-by: Kevin Tian <kevin.tian@intel.com>
> Tested-by: Terrence Xu <terrence.xu@intel.com>
> Tested-by: Yanting Jiang <yanting.jiang@intel.com>
> Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> ---
>  drivers/iommu/iommufd/device.c          | 76 +++++++++++++++++++++++--
>  drivers/iommu/iommufd/iommufd_private.h |  2 +
>  include/linux/iommufd.h                 |  1 +
>  3 files changed, 74 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c
> index 96d4281bfa7c..6b4ff635c15e 100644
> --- a/drivers/iommu/iommufd/device.c
> +++ b/drivers/iommu/iommufd/device.c
> @@ -486,6 +486,7 @@ iommufd_access_create(struct iommufd_ctx *ictx,
>  	iommufd_ctx_get(ictx);
>  	iommufd_object_finalize(ictx, &access->obj);
>  	*id = access->obj.id;
> +	mutex_init(&access->ioas_lock);
>  	return access;
>  }
>  EXPORT_SYMBOL_NS_GPL(iommufd_access_create, IOMMUFD);
> @@ -505,26 +506,66 @@ void iommufd_access_destroy(struct iommufd_access *access)
>  }
>  EXPORT_SYMBOL_NS_GPL(iommufd_access_destroy, IOMMUFD);
>  
> +static void __iommufd_access_detach(struct iommufd_access *access)
> +{
> +	struct iommufd_ioas *cur_ioas = access->ioas;
> +
> +	lockdep_assert_held(&access->ioas_lock);
> +	/*
> +	 * Set ioas to NULL to block any further iommufd_access_pin_pages().
> +	 * iommufd_access_unpin_pages() can continue using access->ioas_unpin.
> +	 */
> +	access->ioas = NULL;
> +
> +	if (access->ops->unmap) {
> +		mutex_unlock(&access->ioas_lock);
> +		access->ops->unmap(access->data, 0, ULONG_MAX);
> +		mutex_lock(&access->ioas_lock);
> +	}
> +	iopt_remove_access(&cur_ioas->iopt, access);
> +	refcount_dec(&cur_ioas->obj.users);
> +}
> +
> +void iommufd_access_detach(struct iommufd_access *access)
> +{
> +	mutex_lock(&access->ioas_lock);
> +	if (WARN_ON(!access->ioas))
> +		goto out;
> +	__iommufd_access_detach(access);
> +out:
> +	access->ioas_unpin = NULL;
> +	mutex_unlock(&access->ioas_lock);
> +}
> +EXPORT_SYMBOL_NS_GPL(iommufd_access_detach, IOMMUFD);

There is not really any benefit to making this two functions.

> int iommufd_access_attach(struct iommufd_access *access, u32 ioas_id)
> {
[..]
> 	if (access->ioas) {

if (access->ioas || access->ioas_unpin) {

But I wonder if it should be a WARN_ON? Does VFIO protect against
the userspace racing detach and attach, or do we expect to do it here?

> @@ -579,8 +620,8 @@ void iommufd_access_notify_unmap(struct io_pagetable *iopt, unsigned long iova,
>  void iommufd_access_unpin_pages(struct iommufd_access *access,
>  				unsigned long iova, unsigned long length)
>  {
> -	struct io_pagetable *iopt = &access->ioas->iopt;
>  	struct iopt_area_contig_iter iter;
> +	struct io_pagetable *iopt;
>  	unsigned long last_iova;
>  	struct iopt_area *area;
>  
> @@ -588,6 +629,13 @@ void iommufd_access_unpin_pages(struct iommufd_access *access,
>  	    WARN_ON(check_add_overflow(iova, length - 1, &last_iova)))
>  		return;
>  
> +	mutex_lock(&access->ioas_lock);
> +	if (!access->ioas_unpin) {

This should be a WARN_ON(); the driver has done something wrong if we
call this after the access has been detached.
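
For illustration, the ioas/ioas_unpin scheme can be modelled in userspace
roughly as below. This is only a sketch under stated assumptions: pthread
mutexes stand in for the kernel mutex, every model_* name is hypothetical,
and a false return from model_unpin() marks the spot where the WARN_ON()
would fire.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; none of these are kernel types. */
struct model_ioas {
	int pinned;
};

struct model_access {
	pthread_mutex_t ioas_lock;
	struct model_ioas *ioas;       /* used by pin; NULL blocks new pins */
	struct model_ioas *ioas_unpin; /* used by unpin; outlives ioas during detach */
	void (*unmap)(struct model_access *access);
};

static void model_access_init(struct model_access *access,
			      struct model_ioas *ioas,
			      void (*unmap)(struct model_access *access))
{
	pthread_mutex_init(&access->ioas_lock, NULL);
	access->ioas = ioas;
	access->ioas_unpin = ioas;
	access->unmap = unmap;
}

/* Returns 0 on success, -1 if the access is detached or detaching. */
static int model_pin(struct model_access *access)
{
	int ret = -1;

	pthread_mutex_lock(&access->ioas_lock);
	if (access->ioas) {
		access->ioas->pinned++;
		ret = 0;
	}
	pthread_mutex_unlock(&access->ioas_lock);
	return ret;
}

/* Returns false for an unpin after detach, i.e. the suggested WARN_ON() case. */
static bool model_unpin(struct model_access *access)
{
	bool ok = false;

	pthread_mutex_lock(&access->ioas_lock);
	if (access->ioas_unpin) {
		access->ioas_unpin->pinned--;
		ok = true;
	}
	pthread_mutex_unlock(&access->ioas_lock);
	return ok;
}

static void model_detach(struct model_access *access)
{
	pthread_mutex_lock(&access->ioas_lock);
	if (access->ioas) {
		access->ioas = NULL; /* block further pins */
		if (access->unmap) {
			/* Drop the lock so the callback can call model_unpin(). */
			pthread_mutex_unlock(&access->ioas_lock);
			access->unmap(access);
			pthread_mutex_lock(&access->ioas_lock);
		}
	}
	access->ioas_unpin = NULL;
	pthread_mutex_unlock(&access->ioas_lock);
}

/*
 * Example unmap callback for a single-threaded model: releases every page
 * the "driver" still holds. ioas_unpin is still valid at this point.
 */
static void model_unmap_all(struct model_access *access)
{
	assert(access->ioas_unpin != NULL);
	while (access->ioas_unpin->pinned > 0)
		model_unpin(access);
}
```

In this model a pin attempted after model_detach() fails, while the unpin
calls made from the unmap callback during detach still succeed.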

Jason

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCH v12 15/24] vfio-iommufd: Add detach_ioas support for emulated VFIO devices
  2023-06-02 12:16   ` [Intel-gfx] " Yi Liu
@ 2023-06-23 14:16     ` Jason Gunthorpe
  -1 siblings, 0 replies; 180+ messages in thread
From: Jason Gunthorpe @ 2023-06-23 14:16 UTC (permalink / raw)
  To: Yi Liu
  Cc: alex.williamson, kevin.tian, joro, robin.murphy, cohuck,
	eric.auger, nicolinc, kvm, mjrosato, chao.p.peng, yi.y.sun,
	peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, intel-gvt-dev, intel-gfx, linux-s390,
	xudong.hao, yan.y.zhao, terrence.xu, yanting.jiang,
	zhenzhong.duan, clegoate

On Fri, Jun 02, 2023 at 05:16:44AM -0700, Yi Liu wrote:
> This prepares for adding DETACH ioctl for emulated VFIO devices.
> 
> Reviewed-by: Kevin Tian <kevin.tian@intel.com>
> Tested-by: Terrence Xu <terrence.xu@intel.com>
> Tested-by: Nicolin Chen <nicolinc@nvidia.com>
> Tested-by: Matthew Rosato <mjrosato@linux.ibm.com>
> Tested-by: Yanting Jiang <yanting.jiang@intel.com>
> Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> ---
>  drivers/gpu/drm/i915/gvt/kvmgt.c  |  1 +
>  drivers/s390/cio/vfio_ccw_ops.c   |  1 +
>  drivers/s390/crypto/vfio_ap_ops.c |  1 +
>  drivers/vfio/iommufd.c            | 13 +++++++++++++
>  include/linux/vfio.h              |  3 +++
>  samples/vfio-mdev/mbochs.c        |  1 +
>  samples/vfio-mdev/mdpy.c          |  1 +
>  samples/vfio-mdev/mtty.c          |  1 +
>  8 files changed, 22 insertions(+)

Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>

Jason

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCH v12 16/24] vfio: Move vfio_device_group_unregister() to be the first operation in unregister
  2023-06-02 12:16   ` [Intel-gfx] " Yi Liu
@ 2023-06-23 14:22     ` Jason Gunthorpe
  -1 siblings, 0 replies; 180+ messages in thread
From: Jason Gunthorpe @ 2023-06-23 14:22 UTC (permalink / raw)
  To: Yi Liu
  Cc: alex.williamson, kevin.tian, joro, robin.murphy, cohuck,
	eric.auger, nicolinc, kvm, mjrosato, chao.p.peng, yi.y.sun,
	peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, intel-gvt-dev, intel-gfx, linux-s390,
	xudong.hao, yan.y.zhao, terrence.xu, yanting.jiang,
	zhenzhong.duan, clegoate

On Fri, Jun 02, 2023 at 05:16:45AM -0700, Yi Liu wrote:
> This avoids endless vfio_device refcount increasement by userspace,
> which would keep blocking the vfio_unregister_group_dev().
> 
> Tested-by: Yanting Jiang <yanting.jiang@intel.com>
> Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
> Tested-by: Terrence Xu <terrence.xu@intel.com>
> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> ---
>  drivers/vfio/vfio_main.c | 8 ++++++--
>  1 file changed, 6 insertions(+), 2 deletions(-)

Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>

It looks OK, at least I couldn't find a reason why the group list
would need to continue to be valid while we are waiting for the
registration lock to release.

Jason

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCH v12 17/24] vfio: Add cdev for vfio_device
  2023-06-02 12:16   ` [Intel-gfx] " Yi Liu
@ 2023-06-23 15:58     ` Jason Gunthorpe
  -1 siblings, 0 replies; 180+ messages in thread
From: Jason Gunthorpe @ 2023-06-23 15:58 UTC (permalink / raw)
  To: Yi Liu
  Cc: alex.williamson, kevin.tian, joro, robin.murphy, cohuck,
	eric.auger, nicolinc, kvm, mjrosato, chao.p.peng, yi.y.sun,
	peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, intel-gvt-dev, intel-gfx, linux-s390,
	xudong.hao, yan.y.zhao, terrence.xu, yanting.jiang,
	zhenzhong.duan, clegoate

On Fri, Jun 02, 2023 at 05:16:46AM -0700, Yi Liu wrote:
> This allows user to directly open a vfio device w/o using the legacy
> container/group interface, as a prerequisite for supporting new iommu
> features like nested translation.
> 
> The device fd opened in this manner doesn't have the capability to access
> the device as the fops open() doesn't open the device until the successful
> BIND_IOMMUFD which be added in next patch.
> 
> With this patch, devices registered to vfio core have both group and device
> interface created.
> 
> - group interface : /dev/vfio/$groupID
> - device interface: /dev/vfio/devices/vfioX - normal device
> 		    ("X" is the minor number and is unique across devices)
> 
> Given a vfio device the user can identify the matching vfioX by checking
> the sysfs path of the device. Take PCI device (0000:6a:01.0) for example,
> /sys/bus/pci/devices/0000\:6a\:01.0/vfio-dev/vfio0/dev contains the
> major:minor of the matching vfioX.
> 
> Userspace then opens the /dev/vfio/devices/vfioX and checks with fstat
> that the major:minor matches.
> 
> The vfio_device cdev logic in this patch:
> *) __vfio_register_dev() path ends up doing cdev_device_add() for each
>    vfio_device if VFIO_DEVICE_CDEV configured.
> *) vfio_unregister_group_dev() path does cdev_device_del();
> 
> device interface does not support noiommu devices, noiommu users should
> use the legacy group interface.
> 
> Reviewed-by: Kevin Tian <kevin.tian@intel.com>
> Tested-by: Terrence Xu <terrence.xu@intel.com>
> Tested-by: Nicolin Chen <nicolinc@nvidia.com>
> Tested-by: Matthew Rosato <mjrosato@linux.ibm.com>
> Tested-by: Yanting Jiang <yanting.jiang@intel.com>
> Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> ---
>  drivers/vfio/Kconfig       | 12 ++++++++
>  drivers/vfio/Makefile      |  1 +
>  drivers/vfio/device_cdev.c | 62 ++++++++++++++++++++++++++++++++++++++
>  drivers/vfio/vfio.h        | 54 +++++++++++++++++++++++++++++++++
>  drivers/vfio/vfio_main.c   | 23 +++++++++++---
>  include/linux/vfio.h       |  4 +++
>  6 files changed, 151 insertions(+), 5 deletions(-)
>  create mode 100644 drivers/vfio/device_cdev.c

Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
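
The lookup described in the commit message (read major:minor from the sysfs
"dev" attribute, open the matching vfioX, confirm with fstat) might look
roughly like this from userspace. It is a minimal sketch, not taken from the
patch: parse_dev_attr() and open_vfio_cdev_checked() are hypothetical helper
names, and the caller is assumed to supply both paths.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/sysmacros.h>
#include <sys/types.h>
#include <unistd.h>

/* Parse the "major:minor" text of a sysfs "dev" attribute. Returns 0 on success. */
static int parse_dev_attr(const char *text, unsigned int *maj, unsigned int *min)
{
	return sscanf(text, "%u:%u", maj, min) == 2 ? 0 : -1;
}

/*
 * Open cdev_path and verify with fstat() that it is a character device
 * whose numbers match the contents of sysfs_dev_path.
 * Returns an open fd on success, or -1 on any failure.
 */
static int open_vfio_cdev_checked(const char *sysfs_dev_path,
				  const char *cdev_path)
{
	char buf[32] = "";
	unsigned int maj, min;
	struct stat st;
	FILE *f;
	int fd;

	f = fopen(sysfs_dev_path, "r");
	if (!f)
		return -1;
	if (!fgets(buf, sizeof(buf), f)) {
		fclose(f);
		return -1;
	}
	fclose(f);
	if (parse_dev_attr(buf, &maj, &min))
		return -1;

	fd = open(cdev_path, O_RDWR);
	if (fd < 0)
		return -1;
	if (fstat(fd, &st) || !S_ISCHR(st.st_mode) ||
	    major(st.st_rdev) != maj || minor(st.st_rdev) != min) {
		close(fd);
		return -1;
	}
	return fd;
}
```

With the example device from the commit message, the two arguments would be
/sys/bus/pci/devices/0000:6a:01.0/vfio-dev/vfio0/dev and
/dev/vfio/devices/vfio0.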

> +/*
> + * device access via the fd opened by this function is blocked until
> + * .open_device() is called successfully during BIND_IOMMUFD.
> + */
> +int vfio_device_fops_cdev_open(struct inode *inode, struct file *filep)
> +{
> +	struct vfio_device *device = container_of(inode->i_cdev,
> +						  struct vfio_device, cdev);
> +	struct vfio_device_file *df;
> +	int ret;
> +

Add the comment

 /* Paired with the put in vfio_device_fops_release() */
> +	if (!vfio_device_try_get_registration(device))
> +		return -ENODEV;


> @@ -338,6 +338,12 @@ void vfio_unregister_group_dev(struct vfio_device *device)
>  	 */
>  	vfio_device_group_unregister(device);
>  
> +	/*
> +	 * Balances vfio_device_add() in register path, also prevents
> +	 * new device opened by userspace in the cdev path.
> +	 */
> +	vfio_device_del(device);
> +
>  	vfio_device_put_registration(device);
>  	rc = try_wait_for_completion(&device->comp);
>  	while (rc <= 0) {
> @@ -361,9 +367,6 @@ void vfio_unregister_group_dev(struct vfio_device *device)
>  		}
>  	}
>  
> -	/* Balances device_add in register path */
> -	device_del(&device->device);
> -

This looks OK from what I can tell, but it might deserve its own patch,
as was done for the other code movement.

Jason

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCH v12 18/24] vfio: Add VFIO_DEVICE_BIND_IOMMUFD
  2023-06-02 12:16   ` [Intel-gfx] " Yi Liu
@ 2023-06-23 16:15     ` Jason Gunthorpe
  -1 siblings, 0 replies; 180+ messages in thread
From: Jason Gunthorpe @ 2023-06-23 16:15 UTC (permalink / raw)
  To: Yi Liu
  Cc: alex.williamson, kevin.tian, joro, robin.murphy, cohuck,
	eric.auger, nicolinc, kvm, mjrosato, chao.p.peng, yi.y.sun,
	peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, intel-gvt-dev, intel-gfx, linux-s390,
	xudong.hao, yan.y.zhao, terrence.xu, yanting.jiang,
	zhenzhong.duan, clegoate

On Fri, Jun 02, 2023 at 05:16:47AM -0700, Yi Liu wrote:
> This adds ioctl for userspace to bind device cdev fd to iommufd.
> 
>     VFIO_DEVICE_BIND_IOMMUFD: bind device to an iommufd, hence gain DMA
> 			      control provided by the iommufd. open_device
> 			      op is called after bind_iommufd op.
> 
> Tested-by: Yanting Jiang <yanting.jiang@intel.com>
> Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
> Tested-by: Terrence Xu <terrence.xu@intel.com>
> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> ---
>  drivers/vfio/device_cdev.c | 123 +++++++++++++++++++++++++++++++++++++
>  drivers/vfio/vfio.h        |  13 ++++
>  drivers/vfio/vfio_main.c   |   5 ++
>  include/linux/vfio.h       |   3 +-
>  include/uapi/linux/vfio.h  |  27 ++++++++
>  5 files changed, 170 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/vfio/device_cdev.c b/drivers/vfio/device_cdev.c
> index 1c640016a824..a4498ddbe774 100644
> --- a/drivers/vfio/device_cdev.c
> +++ b/drivers/vfio/device_cdev.c
> @@ -3,6 +3,7 @@
>   * Copyright (c) 2023 Intel Corporation.
>   */
>  #include <linux/vfio.h>
> +#include <linux/iommufd.h>
>  
>  #include "vfio.h"
>  
> @@ -44,6 +45,128 @@ int vfio_device_fops_cdev_open(struct inode *inode, struct file *filep)
>  	return ret;
>  }
>  
> +static void vfio_device_get_kvm_safe(struct vfio_device_file *df)
> +{
> +	spin_lock(&df->kvm_ref_lock);
> +	if (df->kvm)
> +		_vfio_device_get_kvm_safe(df->device, df->kvm);
> +	spin_unlock(&df->kvm_ref_lock);
> +}

I'm surprised symbol_get() can be called while holding a spinlock, but it
sure looks like it can.

Also, moving the NULL check on kvm into _vfio_device_get_kvm_safe() would
save a few lines.

Also, it shouldn't be called _vfio_device...

> +void vfio_df_cdev_close(struct vfio_device_file *df)
> +{
> +	struct vfio_device *device = df->device;
> +
> +	/*
> +	 * In the time of close, there is no contention with another one
> +	 * changing this flag.  So read df->access_granted without lock
> +	 * and no smp_load_acquire() is ok.
> +	 */
> +	if (!df->access_granted)
> +		return;
> +
> +	mutex_lock(&device->dev_set->lock);
> +	vfio_df_close(df);
> +	vfio_device_put_kvm(device);
> +	iommufd_ctx_put(df->iommufd);
> +	device->cdev_opened = false;
> +	mutex_unlock(&device->dev_set->lock);
> +	vfio_device_unblock_group(device);
> +}

Let's call this vfio_df_unbind_iommufd() and put it near
vfio_df_ioctl_bind_iommufd().

> static struct iommufd_ctx *vfio_get_iommufd_from_fd(int fd)

This can probably be an iommufd function:
  iommufd_ctx_from_fd(int fd)

> +long vfio_df_ioctl_bind_iommufd(struct vfio_device_file *df,
> +				struct vfio_device_bind_iommufd __user *arg)
> +{
> +	ret = copy_to_user(&arg->out_devid, &df->devid,
> +			   sizeof(df->devid)) ? -EFAULT : 0;
> +	if (ret)
> +		goto out_close_device;
> +
> +	/*
> +	 * Paired with smp_load_acquire() in vfio_device_fops::ioctl/
> +	 * read/write/mmap
> +	 */
> +	smp_store_release(&df->access_granted, true);
> +	device->cdev_opened = true;

Move the cdev_opened assignment up above the smp_store_release(), just for
consistency.
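
As a rough illustration of the release/acquire pairing and the suggested
ordering, here is a userspace model using C11 atomics in place of
smp_store_release()/smp_load_acquire(). All names are hypothetical, and
cdev_opened (which lives in struct vfio_device in the patch) is folded into
the same struct purely for brevity.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model of the per-file state; not the kernel's vfio_device_file. */
struct model_df {
	unsigned int devid;         /* written before access is granted */
	bool cdev_opened;           /* plain field, set before the release store */
	atomic_bool access_granted; /* published last */
};

static void model_df_init(struct model_df *df)
{
	df->devid = 0;
	df->cdev_opened = false;
	atomic_init(&df->access_granted, false);
}

/* Bind side: fill in all state first, then publish with release semantics. */
static void model_bind(struct model_df *df, unsigned int devid)
{
	df->devid = devid;
	df->cdev_opened = true;
	atomic_store_explicit(&df->access_granted, true, memory_order_release);
}

/*
 * ioctl/read/write/mmap side: acquire-load the flag before touching any
 * state written by the bind side. Returns 0 on success, -1 if not bound.
 */
static int model_read_devid(struct model_df *df, unsigned int *devid)
{
	if (!atomic_load_explicit(&df->access_granted, memory_order_acquire))
		return -1;
	*devid = df->devid;
	return 0;
}
```

Keeping every plain store ahead of the release store means a reader that
observes access_granted as true also observes everything written before it.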

Jason

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [Intel-gfx] [PATCH v12 18/24] vfio: Add VFIO_DEVICE_BIND_IOMMUFD
@ 2023-06-23 16:15     ` Jason Gunthorpe
  0 siblings, 0 replies; 180+ messages in thread
From: Jason Gunthorpe @ 2023-06-23 16:15 UTC (permalink / raw)
  To: Yi Liu
  Cc: mjrosato, jasowang, xudong.hao, zhenzhong.duan, peterx,
	terrence.xu, chao.p.peng, linux-s390, kvm, lulu, yanting.jiang,
	joro, nicolinc, kevin.tian, yan.y.zhao, intel-gfx, eric.auger,
	intel-gvt-dev, yi.y.sun, clegoate, cohuck,
	shameerali.kolothum.thodi, suravee.suthikulpanit, robin.murphy

On Fri, Jun 02, 2023 at 05:16:47AM -0700, Yi Liu wrote:
> This adds ioctl for userspace to bind device cdev fd to iommufd.
> 
>     VFIO_DEVICE_BIND_IOMMUFD: bind device to an iommufd, hence gain DMA
> 			      control provided by the iommufd. open_device
> 			      op is called after bind_iommufd op.
> 
> Tested-by: Yanting Jiang <yanting.jiang@intel.com>
> Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
> Tested-by: Terrence Xu <terrence.xu@intel.com>
> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> ---
>  drivers/vfio/device_cdev.c | 123 +++++++++++++++++++++++++++++++++++++
>  drivers/vfio/vfio.h        |  13 ++++
>  drivers/vfio/vfio_main.c   |   5 ++
>  include/linux/vfio.h       |   3 +-
>  include/uapi/linux/vfio.h  |  27 ++++++++
>  5 files changed, 170 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/vfio/device_cdev.c b/drivers/vfio/device_cdev.c
> index 1c640016a824..a4498ddbe774 100644
> --- a/drivers/vfio/device_cdev.c
> +++ b/drivers/vfio/device_cdev.c
> @@ -3,6 +3,7 @@
>   * Copyright (c) 2023 Intel Corporation.
>   */
>  #include <linux/vfio.h>
> +#include <linux/iommufd.h>
>  
>  #include "vfio.h"
>  
> @@ -44,6 +45,128 @@ int vfio_device_fops_cdev_open(struct inode *inode, struct file *filep)
>  	return ret;
>  }
>  
> +static void vfio_device_get_kvm_safe(struct vfio_device_file *df)
> +{
> +	spin_lock(&df->kvm_ref_lock);
> +	if (df->kvm)
> +		_vfio_device_get_kvm_safe(df->device, df->kvm);
> +	spin_unlock(&df->kvm_ref_lock);
> +}

I'm surprised symbol_get() can be called from a spinlock, but it sure
looks like it can..

Also moving the if kvm is null test into _vfio_device_get_kvm_safe()
will save a few lines.

Also shouldn't be called _vfio_device...

> +void vfio_df_cdev_close(struct vfio_device_file *df)
> +{
> +	struct vfio_device *device = df->device;
> +
> +	/*
> +	 * In the time of close, there is no contention with another one
> +	 * changing this flag.  So read df->access_granted without lock
> +	 * and no smp_load_acquire() is ok.
> +	 */
> +	if (!df->access_granted)
> +		return;
> +
> +	mutex_lock(&device->dev_set->lock);
> +	vfio_df_close(df);
> +	vfio_device_put_kvm(device);
> +	iommufd_ctx_put(df->iommufd);
> +	device->cdev_opened = false;
> +	mutex_unlock(&device->dev_set->lock);
> +	vfio_device_unblock_group(device);
> +}

Let's call this vfio_df_unbind_iommufd() and put it near
vfio_df_ioctl_bind_iommufd()

> static struct iommufd_ctx *vfio_get_iommufd_from_fd(int fd)

This can probably be an iommufd function:
  iommufd_ctx_from_fd(int fd)

> +long vfio_df_ioctl_bind_iommufd(struct vfio_device_file *df,
> +				struct vfio_device_bind_iommufd __user *arg)
> +{
> +	ret = copy_to_user(&arg->out_devid, &df->devid,
> +			   sizeof(df->devid)) ? -EFAULT : 0;
> +	if (ret)
> +		goto out_close_device;
> +
> +	/*
> +	 * Paired with smp_load_acquire() in vfio_device_fops::ioctl/
> +	 * read/write/mmap
> +	 */
> +	smp_store_release(&df->access_granted, true);
> +	device->cdev_opened = true;

Move the cdev_opened up above the release just for consistency.
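
For readers unfamiliar with the pairing mentioned in the quoted comment, here is a
minimal userspace sketch of the same ordering idea. It assumes C11 atomics as a
stand-in for smp_store_release()/smp_load_acquire(); struct demo_file,
demo_publish() and demo_can_access() are hypothetical names, not part of the
patch. All plain state is written first and the flag is published last:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-file state; not the kernel struct. */
struct demo_file {
	bool cdev_opened;           /* plain field, written before publication */
	atomic_bool access_granted; /* publication flag read by other paths */
};

/* Writer: update plain state first, then publish with release semantics. */
static void demo_publish(struct demo_file *f)
{
	f->cdev_opened = true;
	atomic_store_explicit(&f->access_granted, true, memory_order_release);
}

/*
 * Reader: the acquire load pairs with the release store above, so a reader
 * that observes access_granted == true also observes cdev_opened == true.
 * Returns 1 if access is granted, 0 otherwise.
 */
static int demo_can_access(struct demo_file *f)
{
	if (!atomic_load_explicit(&f->access_granted, memory_order_acquire))
		return 0;
	return f->cdev_opened ? 1 : 0;
}
```

In the patch itself cdev_opened appears to be covered by dev_set->lock, which
would explain why the reordering is requested only for consistency.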

Jason

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCH v12 19/24] vfio: Add VFIO_DEVICE_[AT|DE]TACH_IOMMUFD_PT
  2023-06-02 12:16   ` [Intel-gfx] " Yi Liu
@ 2023-06-23 16:21     ` Jason Gunthorpe
  -1 siblings, 0 replies; 180+ messages in thread
From: Jason Gunthorpe @ 2023-06-23 16:21 UTC (permalink / raw)
  To: Yi Liu
  Cc: alex.williamson, kevin.tian, joro, robin.murphy, cohuck,
	eric.auger, nicolinc, kvm, mjrosato, chao.p.peng, yi.y.sun,
	peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, intel-gvt-dev, intel-gfx, linux-s390,
	xudong.hao, yan.y.zhao, terrence.xu, yanting.jiang,
	zhenzhong.duan, clegoate

On Fri, Jun 02, 2023 at 05:16:48AM -0700, Yi Liu wrote:
> This adds ioctl for userspace to attach device cdev fd to and detach
> from IOAS/hw_pagetable managed by iommufd.
> 
>     VFIO_DEVICE_ATTACH_IOMMUFD_PT: attach vfio device to IOAS, hw_pagetable
> 				   managed by iommufd. Attach can be
> 				   undo by VFIO_DEVICE_DETACH_IOMMUFD_PT
> 				   or device fd close.
>     VFIO_DEVICE_DETACH_IOMMUFD_PT: detach vfio device from the current attached
> 				   IOAS or hw_pagetable managed by iommufd.
> 
> noiommu devices do not support [AT|DE]TACH, if user invokes the two ioctls
> on such devices, shall fail.

Stale comment

> Tested-by: Yanting Jiang <yanting.jiang@intel.com>
> Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
> Tested-by: Terrence Xu <terrence.xu@intel.com>
> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> ---
>  drivers/vfio/device_cdev.c | 66 ++++++++++++++++++++++++++++++++++++++
>  drivers/vfio/vfio.h        | 16 +++++++++
>  drivers/vfio/vfio_main.c   |  8 +++++
>  include/uapi/linux/vfio.h  | 42 ++++++++++++++++++++++++
>  4 files changed, 132 insertions(+)
> 
> diff --git a/drivers/vfio/device_cdev.c b/drivers/vfio/device_cdev.c
> index a4498ddbe774..6e1d499ee160 100644
> --- a/drivers/vfio/device_cdev.c
> +++ b/drivers/vfio/device_cdev.c
> @@ -167,6 +167,72 @@ long vfio_df_ioctl_bind_iommufd(struct vfio_device_file *df,
>  	return ret;
>  }
>  
> +int vfio_df_ioctl_attach_pt(struct vfio_device_file *df,
> +			    struct vfio_device_attach_iommufd_pt __user *arg)
> +{
> +	struct vfio_device *device = df->device;
> +	struct vfio_device_attach_iommufd_pt attach;
> +	unsigned long minsz;
> +	int ret;
> +
> +	minsz = offsetofend(struct vfio_device_attach_iommufd_pt, pt_id);
> +
> +	if (copy_from_user(&attach, arg, minsz))
> +		return -EFAULT;
> +
> +	if (attach.argsz < minsz || attach.flags)
> +		return -EINVAL;
> +
> +	/* ATTACH only allowed for cdev fds */
> +	if (df->group)
> +		return -EINVAL;

I feel like vfio_device_fops_unl_ioctl() should do these group tests
for the whole lot

@@ -1187,19 +1187,24 @@ static long vfio_device_fops_unl_ioctl(struct file *filep,
        if (ret)
                return ret;
 
+       /* cdev only ioctls */
+       if (IS_ENABLED(CONFIG_VFIO_DEVICE_CDEV) && !df->group) {
+               switch (cmd) {
+               case VFIO_DEVICE_ATTACH_IOMMUFD_PT:
+                       ret = vfio_df_ioctl_attach_pt(df, (void __user *)arg);
+                       goto out;
+
+               case VFIO_DEVICE_DETACH_IOMMUFD_PT:
+                       ret = vfio_df_ioctl_detach_pt(df, (void __user *)arg);
+                       goto out;
+               }
+       }
+
        switch (cmd) {
        case VFIO_DEVICE_FEATURE:
                ret = vfio_ioctl_device_feature(device, (void __user *)arg);
                break;
 
-       case VFIO_DEVICE_ATTACH_IOMMUFD_PT:
-               ret = vfio_df_ioctl_attach_pt(df, (void __user *)arg);
-               break;
-
-       case VFIO_DEVICE_DETACH_IOMMUFD_PT:
-               ret = vfio_df_ioctl_detach_pt(df, (void __user *)arg);
-               break;
-
        default:
                if (unlikely(!device->ops->ioctl))
                        ret = -EINVAL;

And also make a local var for void __user * to avoid the repeated
casts.

Also this construction avoids the stub static inlines since the
IS_ENABLED will compile out the call. Just use a normal function
prototype outside any ifdef.
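
The shape of this suggestion can be sketched in plain userspace C. This is only
an illustration under assumptions: DEMO_CDEV_ENABLED stands in for
IS_ENABLED(CONFIG_VFIO_DEVICE_CDEV), and every demo_* name and return value is
hypothetical. It shows the single group test in front of the cdev-only
commands and the single local pointer replacing the repeated casts:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical compile-time switch standing in for IS_ENABLED(...). */
#define DEMO_CDEV_ENABLED 1

enum demo_cmd { DEMO_FEATURE, DEMO_ATTACH_PT, DEMO_DETACH_PT };

struct demo_df {
	bool has_group; /* true for fds obtained through the group path */
};

/* Hypothetical handlers; distinct return values make dispatch observable. */
static long demo_attach_pt(struct demo_df *df, void *uarg)
{
	(void)df; (void)uarg;
	return 1;
}

static long demo_detach_pt(struct demo_df *df, void *uarg)
{
	(void)df; (void)uarg;
	return 2;
}

static long demo_feature(void *uarg)
{
	(void)uarg;
	return 3;
}

static long demo_ioctl(struct demo_df *df, enum demo_cmd cmd, unsigned long arg)
{
	void *uarg = (void *)arg; /* cast once, reuse below */

	/* cdev-only commands: one group test covers the whole set */
	if (DEMO_CDEV_ENABLED && !df->has_group) {
		switch (cmd) {
		case DEMO_ATTACH_PT:
			return demo_attach_pt(df, uarg);
		case DEMO_DETACH_PT:
			return demo_detach_pt(df, uarg);
		default:
			break;
		}
	}

	/* commands valid on both paths */
	switch (cmd) {
	case DEMO_FEATURE:
		return demo_feature(uarg);
	default:
		return -EINVAL;
	}
}
```

With the switch set to 0 the first block becomes dead code, which is the
property the review comment relies on to drop the stub static inlines.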

> +
> +	mutex_lock(&device->dev_set->lock);
> +	ret = device->ops->attach_ioas(device, &attach.pt_id);
> +	if (ret)
> +		goto out_unlock;
> +
> +	ret = copy_to_user(&arg->pt_id, &attach.pt_id,
> +			   sizeof(attach.pt_id)) ? -EFAULT : 0;
> +	if (ret)
> +		goto out_detach;

Don't use the ?:

if (copy_to_user()..) {
    ret = -EFAULT;
    goto out_detach;
}
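
A self-contained userspace sketch of this error-handling shape follows. It is
an assumption-laden illustration, not the patch: demo_copy_out() mimics only
the "returns the number of bytes not copied" contract of copy_to_user(), and
demo_attach_ioas(), demo_detach_ioas() and demo_ioctl_attach() are hypothetical
names.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for copy_to_user(): returns bytes NOT copied. */
static size_t demo_copy_out(void *dst, const void *src, size_t len)
{
	if (!dst)
		return len; /* simulate a faulting user pointer */
	memcpy(dst, src, len);
	return 0;
}

static int demo_attached; /* 1 while the hypothetical attach is in effect */

static int demo_attach_ioas(unsigned int *pt_id)
{
	demo_attached = 1;
	*pt_id += 1; /* pretend the attach rewrote the id */
	return 0;
}

static void demo_detach_ioas(void)
{
	demo_attached = 0;
}

static int demo_ioctl_attach(unsigned int *user_pt_id, unsigned int pt_id)
{
	int ret;

	ret = demo_attach_ioas(&pt_id);
	if (ret)
		return ret;

	/* explicit branch instead of "ret = copy(...) ? -EFAULT : 0" */
	if (demo_copy_out(user_pt_id, &pt_id, sizeof(pt_id))) {
		ret = -EFAULT;
		goto out_detach;
	}
	return 0;

out_detach:
	demo_detach_ioas();
	return ret;
}
```

The failure path undoes the attach before returning, matching the out_detach
label in the quoted hunk.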

Jason

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCH v12 23/24] vfio: Compile vfio_group infrastructure optionally
  2023-06-02 12:16   ` [Intel-gfx] " Yi Liu
@ 2023-06-23 16:35     ` Jason Gunthorpe
  -1 siblings, 0 replies; 180+ messages in thread
From: Jason Gunthorpe @ 2023-06-23 16:35 UTC (permalink / raw)
  To: Yi Liu
  Cc: alex.williamson, kevin.tian, joro, robin.murphy, cohuck,
	eric.auger, nicolinc, kvm, mjrosato, chao.p.peng, yi.y.sun,
	peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, intel-gvt-dev, intel-gfx, linux-s390,
	xudong.hao, yan.y.zhao, terrence.xu, yanting.jiang,
	zhenzhong.duan, clegoate

On Fri, Jun 02, 2023 at 05:16:52AM -0700, Yi Liu wrote:
> vfio_group is not needed for vfio device cdev, so with vfio device cdev
> introduced, the vfio_group infrastructures can be compiled out if only
> cdev is needed.
> 
> Tested-by: Yanting Jiang <yanting.jiang@intel.com>
> Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
> Tested-by: Terrence Xu <terrence.xu@intel.com>
> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> ---
>  drivers/iommu/iommufd/Kconfig |  4 +-
>  drivers/vfio/Kconfig          | 15 +++++++
>  drivers/vfio/Makefile         |  2 +-
>  drivers/vfio/vfio.h           | 84 ++++++++++++++++++++++++++++++++---
>  include/linux/vfio.h          | 25 +++++++++--
>  5 files changed, 118 insertions(+), 12 deletions(-)

Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>

Jason

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCH v12 14/24] iommufd/device: Add iommufd_access_detach() API
  2023-06-23 14:15     ` [Intel-gfx] " Jason Gunthorpe
@ 2023-06-25 18:26       ` Nicolin Chen
  -1 siblings, 0 replies; 180+ messages in thread
From: Nicolin Chen @ 2023-06-25 18:26 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Yi Liu, alex.williamson, kevin.tian, joro, robin.murphy, cohuck,
	eric.auger, kvm, mjrosato, chao.p.peng, yi.y.sun, peterx,
	jasowang, shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	intel-gvt-dev, intel-gfx, linux-s390, xudong.hao, yan.y.zhao,
	terrence.xu, yanting.jiang, zhenzhong.duan, clegoate

On Fri, Jun 23, 2023 at 11:15:40AM -0300, Jason Gunthorpe wrote:

> > +static void __iommufd_access_detach(struct iommufd_access *access)
> > +{
> > +	struct iommufd_ioas *cur_ioas = access->ioas;
> > +
> > +	lockdep_assert_held(&access->ioas_lock);
> > +	/*
> > +	 * Set ioas to NULL to block any further iommufd_access_pin_pages().
> > +	 * iommufd_access_unpin_pages() can continue using access->ioas_unpin.
> > +	 */
> > +	access->ioas = NULL;
> > +
> > +	if (access->ops->unmap) {
> > +		mutex_unlock(&access->ioas_lock);
> > +		access->ops->unmap(access->data, 0, ULONG_MAX);
> > +		mutex_lock(&access->ioas_lock);
> > +	}
> > +	iopt_remove_access(&cur_ioas->iopt, access);
> > +	refcount_dec(&cur_ioas->obj.users);
> > +}
> > +
> > +void iommufd_access_detach(struct iommufd_access *access)
> > +{
> > +	mutex_lock(&access->ioas_lock);
> > +	if (WARN_ON(!access->ioas))
> > +		goto out;
> > +	__iommufd_access_detach(access);
> > +out:
> > +	access->ioas_unpin = NULL;
> > +	mutex_unlock(&access->ioas_lock);
> > +}
> > +EXPORT_SYMBOL_NS_GPL(iommufd_access_detach, IOMMUFD);
> 
> There is not really any benefit to make this two functions

__iommufd_access_detach() will be used by replace() in the
following series. Still, let's merge the two here, and I'll add
__iommufd_access_detach() back in the replace series.

> > int iommufd_access_attach(struct iommufd_access *access, u32 ioas_id)
> > {
> [..]
> > 	if (access->ioas) {
> 
> if (access->ioas || access->ioas_unpin) {

Ack.

> But I wonder if it should be a WARN_ON? Does VFIO protect against
> the userspace racing detach and attach, or do we expect to do it here?

VFIO has a vdev->iommufd_attached flag to prevent a double call
of this function. And detach and attach there also have a mutex
protection. So it should be a WARN_ON here, I think.
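
The state rules discussed in this reply can be modelled in a few lines of
userspace C. This is a single-threaded sketch under assumptions: the ioas_lock
and the unmap callback are omitted, a counter stands in for WARN_ON(), and the
demo_* names and return codes are hypothetical rather than the iommufd API.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical model of the two pointers tracked per access. */
struct demo_access {
	void *ioas;       /* gates new pin requests */
	void *ioas_unpin; /* stays valid until detach has finished */
	int warnings;     /* counts conditions the kernel would WARN_ON() */
};

static int demo_access_attach(struct demo_access *a, void *ioas)
{
	/* attaching while attached, or while a detach is incomplete, is a bug */
	if (a->ioas || a->ioas_unpin) {
		a->warnings++;
		return -EINVAL;
	}
	a->ioas = ioas;
	a->ioas_unpin = ioas;
	return 0;
}

static void demo_access_detach(struct demo_access *a)
{
	if (!a->ioas)
		a->warnings++; /* detach without a prior attach */
	a->ioas = NULL;        /* block further pins first */
	/* the real code would invoke the unmap callback at this point */
	a->ioas_unpin = NULL;  /* only now may unpin stop working */
}

static int demo_access_unpin(struct demo_access *a)
{
	/* calling this before attach or after detach is a driver bug */
	if (!a->ioas_unpin) {
		a->warnings++;
		return -EINVAL;
	}
	return 0;
}
```

The model makes the asymmetry visible: pinning is cut off as soon as ioas is
cleared, while unpinning keeps working until ioas_unpin is cleared.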

> > @@ -579,8 +620,8 @@ void iommufd_access_notify_unmap(struct io_pagetable *iopt, unsigned long iova,
> >  void iommufd_access_unpin_pages(struct iommufd_access *access,
> >  				unsigned long iova, unsigned long length)
> >  {
> > -	struct io_pagetable *iopt = &access->ioas->iopt;
> >  	struct iopt_area_contig_iter iter;
> > +	struct io_pagetable *iopt;
> >  	unsigned long last_iova;
> >  	struct iopt_area *area;
> >  
> > @@ -588,6 +629,13 @@ void iommufd_access_unpin_pages(struct iommufd_access *access,
> >  	    WARN_ON(check_add_overflow(iova, length - 1, &last_iova)))
> >  		return;
> >  
> > +	mutex_lock(&access->ioas_lock);
> > +	if (!access->ioas_unpin) {
> 
> This should be WARN_ON(), the driver has done something wrong if we
> call this after the access has been detached.

Ack. Also adding a line of comments for that:
+       /*
+        * The driver must be doing something wrong if it calls this before an
+        * iommufd_access_attach() or after an iommufd_access_detach().
+        */
+       if (WARN_ON(!access->ioas_unpin)) {

Thanks
Nic

^ permalink raw reply	[flat|nested] 180+ messages in thread

* RE: [PATCH v12 18/24] vfio: Add VFIO_DEVICE_BIND_IOMMUFD
  2023-06-23 16:15     ` [Intel-gfx] " Jason Gunthorpe
@ 2023-06-26  8:34       ` Liu, Yi L
  -1 siblings, 0 replies; 180+ messages in thread
From: Liu, Yi L @ 2023-06-26  8:34 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: alex.williamson, Tian, Kevin, joro, robin.murphy, cohuck,
	eric.auger, nicolinc, kvm, mjrosato, chao.p.peng, yi.y.sun,
	peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, intel-gvt-dev, intel-gfx, linux-s390, Hao,
	Xudong, Zhao, Yan Y, Xu, Terrence, Jiang, Yanting, Duan,
	Zhenzhong, clegoate

> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Saturday, June 24, 2023 12:15 AM

> >  }
> >
> > +static void vfio_device_get_kvm_safe(struct vfio_device_file *df)
> > +{
> > +	spin_lock(&df->kvm_ref_lock);
> > +	if (df->kvm)
> > +		_vfio_device_get_kvm_safe(df->device, df->kvm);
> > +	spin_unlock(&df->kvm_ref_lock);
> > +}
> 
> I'm surprised symbol_get() can be called from a spinlock, but it sure
> looks like it can..
> 
> Also moving the if kvm is null test into _vfio_device_get_kvm_safe()
> will save a few lines.
> 
> Also shouldn't be called _vfio_device...

Ah, any suggestion on the naming? How about vfio_device_get_kvm_safe_locked()?

Regards,
Yi Liu

^ permalink raw reply	[flat|nested] 180+ messages in thread

* Re: [PATCH v12 18/24] vfio: Add VFIO_DEVICE_BIND_IOMMUFD
  2023-06-26  8:34       ` [Intel-gfx] " Liu, Yi L
@ 2023-06-26 12:56         ` Jason Gunthorpe
  -1 siblings, 0 replies; 180+ messages in thread
From: Jason Gunthorpe @ 2023-06-26 12:56 UTC (permalink / raw)
  To: Liu, Yi L
  Cc: alex.williamson, Tian, Kevin, joro, robin.murphy, cohuck,
	eric.auger, nicolinc, kvm, mjrosato, chao.p.peng, yi.y.sun,
	peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, intel-gvt-dev, intel-gfx, linux-s390, Hao,
	Xudong, Zhao, Yan Y, Xu, Terrence, Jiang, Yanting, Duan,
	Zhenzhong, clegoate

On Mon, Jun 26, 2023 at 08:34:26AM +0000, Liu, Yi L wrote:
> > From: Jason Gunthorpe <jgg@nvidia.com>
> > Sent: Saturday, June 24, 2023 12:15 AM
> 
> > >  }
> > >
> > > +static void vfio_device_get_kvm_safe(struct vfio_device_file *df)
> > > +{
> > > +	spin_lock(&df->kvm_ref_lock);
> > > +	if (df->kvm)
> > > +		_vfio_device_get_kvm_safe(df->device, df->kvm);
> > > +	spin_unlock(&df->kvm_ref_lock);
> > > +}
> > 
> > I'm surprised symbol_get() can be called from a spinlock, but it sure
> > looks like it can..
> > 
> > Also moving the if kvm is null test into _vfio_device_get_kvm_safe()
> > will save a few lines.
> > 
> > Also shouldn't be called _vfio_device...
> 
> Ah, any suggestion on the naming? How about vfio_device_get_kvm_safe_locked()?

I thought you were using _df_ now for these functions?

Jason

^ permalink raw reply	[flat|nested] 180+ messages in thread

* RE: [PATCH v12 18/24] vfio: Add VFIO_DEVICE_BIND_IOMMUFD
  2023-06-26 12:56         ` [Intel-gfx] " Jason Gunthorpe
@ 2023-06-26 13:35           ` Liu, Yi L
  -1 siblings, 0 replies; 180+ messages in thread
From: Liu, Yi L @ 2023-06-26 13:35 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: alex.williamson, Tian, Kevin, joro, robin.murphy, cohuck,
	eric.auger, nicolinc, kvm, mjrosato, chao.p.peng, yi.y.sun,
	peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, intel-gvt-dev, intel-gfx, linux-s390, Hao,
	Xudong, Zhao, Yan Y, Xu, Terrence, Jiang, Yanting, Duan,
	Zhenzhong, clegoate

> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Monday, June 26, 2023 8:56 PM
> 
> On Mon, Jun 26, 2023 at 08:34:26AM +0000, Liu, Yi L wrote:
> > > From: Jason Gunthorpe <jgg@nvidia.com>
> > > Sent: Saturday, June 24, 2023 12:15 AM
> >
> > > >  }
> > > >
> > > > +static void vfio_device_get_kvm_safe(struct vfio_device_file *df)
> > > > +{
> > > > +	spin_lock(&df->kvm_ref_lock);
> > > > +	if (df->kvm)
> > > > +		_vfio_device_get_kvm_safe(df->device, df->kvm);
> > > > +	spin_unlock(&df->kvm_ref_lock);
> > > > +}
> > >
> > > I'm surprised symbol_get() can be called from a spinlock, but it sure
> > > looks like it can..
> > >
> > > Also moving the if kvm is null test into _vfio_device_get_kvm_safe()
> > > will save a few lines.
> > >
> > > Also shouldn't be called _vfio_device...
> >
> > Ah, any suggestion on the naming? How about vfio_device_get_kvm_safe_locked()?
> 
> I thought you were using _df_ now for these functions?
> 

I see. Your point is passing df to _vfio_device_get_kvm_safe() and
test the df->kvm within it.  Hence rename it to be _df_. I think group
path should be ok with this change as well. Let me make it.
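
A rough userspace sketch of that shape, with the NULL test living inside the
helper so callers need no branch of their own. Everything here is hypothetical
(the demo_* types, the users counter standing in for a kvm reference), and the
spinlock is left out for brevity:

```c
#include <assert.h>
#include <stddef.h>

struct demo_kvm {
	int users; /* stand-in for a reference count */
};

struct demo_device {
	struct demo_kvm *kvm; /* pointer handed to the driver */
};

struct demo_df {
	struct demo_device *device;
	struct demo_kvm *kvm; /* set by userspace before open, may be NULL */
};

/*
 * Caller is assumed to hold the lock protecting df->kvm (omitted here).
 * The NULL test sits inside the helper, so every caller stays branch-free.
 */
static void demo_df_get_kvm_safe(struct demo_df *df)
{
	if (!df->kvm)
		return;
	df->kvm->users++;
	df->device->kvm = df->kvm;
}
```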

Regards,
Yi Liu

^ permalink raw reply	[flat|nested] 180+ messages in thread

* RE: [PATCH v12 18/24] vfio: Add VFIO_DEVICE_BIND_IOMMUFD
  2023-06-26 13:35           ` [Intel-gfx] " Liu, Yi L
@ 2023-06-26 14:51             ` Liu, Yi L
  -1 siblings, 0 replies; 180+ messages in thread
From: Liu, Yi L @ 2023-06-26 14:51 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: alex.williamson, Tian, Kevin, joro, robin.murphy, cohuck,
	eric.auger, nicolinc, kvm, mjrosato, chao.p.peng, yi.y.sun,
	peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, intel-gvt-dev, intel-gfx, linux-s390, Hao,
	Xudong, Zhao, Yan Y, Xu, Terrence, Jiang, Yanting, Duan,
	Zhenzhong, clegoate

> From: Liu, Yi L <yi.l.liu@intel.com>
> Sent: Monday, June 26, 2023 9:35 PM
> 
> > From: Jason Gunthorpe <jgg@nvidia.com>
> > Sent: Monday, June 26, 2023 8:56 PM
> >
> > On Mon, Jun 26, 2023 at 08:34:26AM +0000, Liu, Yi L wrote:
> > > > From: Jason Gunthorpe <jgg@nvidia.com>
> > > > Sent: Saturday, June 24, 2023 12:15 AM
> > >
> > > > >  }
> > > > >
> > > > > +static void vfio_device_get_kvm_safe(struct vfio_device_file *df)
> > > > > +{
> > > > > +	spin_lock(&df->kvm_ref_lock);
> > > > > +	if (df->kvm)
> > > > > +		_vfio_device_get_kvm_safe(df->device, df->kvm);
> > > > > +	spin_unlock(&df->kvm_ref_lock);
> > > > > +}
> > > >
> > > > I'm surprised symbol_get() can be called from a spinlock, but it sure
> > > > looks like it can..
> > > >
> > > > Also moving the if kvm is null test into _vfio_device_get_kvm_safe()
> > > > will save a few lines.
> > > >
> > > > Also shouldn't be called _vfio_device...
> > >
> > > Ah, any suggestion on the naming? How about vfio_device_get_kvm_safe_locked()?
> >
> > I thought you were using _df_ now for these functions?
> >
> 
> I see. Your point is passing df to _vfio_device_get_kvm_safe() and
> test the df->kvm within it.  Hence rename it to be _df_. I think group
> path should be ok with this change as well. Let me make it.

Passing vfio_device_file to _vfio_device_get_kvm_safe() requires the
changes below in the group path. If the goal is only to test for a NULL kvm
inside _vfio_device_get_kvm_safe(), it may be simpler to keep the current
parameters and just move the NULL kvm test into this function. Is that right?

diff --git a/drivers/vfio/group.c b/drivers/vfio/group.c
index 41a09a2df690..c2e880c15c44 100644
--- a/drivers/vfio/group.c
+++ b/drivers/vfio/group.c
@@ -157,15 +157,15 @@ static int vfio_group_ioctl_set_container(struct vfio_group *group,
 	return ret;
 }
 
-static void vfio_device_group_get_kvm_safe(struct vfio_device *device)
+static void vfio_device_group_get_kvm_safe(struct vfio_device_file *df)
 {
-	spin_lock(&device->group->kvm_ref_lock);
-	if (!device->group->kvm)
-		goto unlock;
-
-	_vfio_device_get_kvm_safe(device, device->group->kvm);
+	struct vfio_device *device = df->device;
 
-unlock:
+	spin_lock(&device->group->kvm_ref_lock);
+	spin_lock(&df->kvm_ref_lock);
+	df->kvm = device->group->kvm;
+	_vfio_df_get_kvm_safe(df);
+	spin_unlock(&df->kvm_ref_lock);
 	spin_unlock(&device->group->kvm_ref_lock);
 }
 
@@ -189,7 +189,7 @@ static int vfio_df_group_open(struct vfio_device_file *df)
 	 * the pointer in the device for use by drivers.
 	 */
 	if (device->open_count == 0)
-		vfio_device_group_get_kvm_safe(device);
+		vfio_device_group_get_kvm_safe(df);
 
 	df->iommufd = device->group->iommufd;
 	if (df->iommufd && vfio_device_is_noiommu(device) && device->open_count == 0) {
diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
index fb8f2fac3d23..066766d43bdc 100644
--- a/drivers/vfio/vfio.h
+++ b/drivers/vfio/vfio.h
@@ -340,11 +340,10 @@ enum { vfio_noiommu = false };
 #endif
 
 #ifdef CONFIG_HAVE_KVM
-void _vfio_device_get_kvm_safe(struct vfio_device *device, struct kvm *kvm);
+void _vfio_df_get_kvm_safe(struct vfio_device_file *df);
 void vfio_device_put_kvm(struct vfio_device *device);
 #else
-static inline void _vfio_device_get_kvm_safe(struct vfio_device *device,
-					     struct kvm *kvm)
+static inline void _vfio_df_get_kvm_safe(struct vfio_device_file *df)
 {
 }
 
diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
index 8a9ebcc6980b..4e6ea2943d28 100644
--- a/drivers/vfio/vfio_main.c
+++ b/drivers/vfio/vfio_main.c
@@ -373,14 +373,22 @@ void vfio_unregister_group_dev(struct vfio_device *device)
 EXPORT_SYMBOL_GPL(vfio_unregister_group_dev);
 
 #ifdef CONFIG_HAVE_KVM
-void _vfio_device_get_kvm_safe(struct vfio_device *device, struct kvm *kvm)
+void _vfio_df_get_kvm_safe(struct vfio_device_file *df)
 {
+	struct vfio_device *device = df->device;
 	void (*pfn)(struct kvm *kvm);
 	bool (*fn)(struct kvm *kvm);
+	struct kvm *kvm;
 	bool ret;
 
+	lockdep_assert_held(&df->kvm_ref_lock);
 	lockdep_assert_held(&device->dev_set->lock);
 
+	kvm = df->kvm;
+
+	if (!kvm)
+		return;
+
 	pfn = symbol_get(kvm_put_kvm);
 	if (WARN_ON(!pfn))
 		return;

* Re: [PATCH v12 18/24] vfio: Add VFIO_DEVICE_BIND_IOMMUFD
  2023-06-26 14:51             ` [Intel-gfx] " Liu, Yi L
@ 2023-06-28 14:34               ` Jason Gunthorpe
  -1 siblings, 0 replies; 180+ messages in thread
From: Jason Gunthorpe @ 2023-06-28 14:34 UTC (permalink / raw)
  To: Liu, Yi L
  Cc: alex.williamson, Tian, Kevin, joro, robin.murphy, cohuck,
	eric.auger, nicolinc, kvm, mjrosato, chao.p.peng, yi.y.sun,
	peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, intel-gvt-dev, intel-gfx, linux-s390, Hao,
	Xudong, Zhao, Yan Y, Xu, Terrence, Jiang, Yanting, Duan,
	Zhenzhong, clegoate

On Mon, Jun 26, 2023 at 02:51:29PM +0000, Liu, Yi L wrote:
> > > >
> > > > Ah, any suggestion on the naming? How about vfio_device_get_kvm_safe_locked()?
> > >
> > > I thought you were using _df_ now for these functions?
> > >
> > 
> > I see. Your point is to pass df to _vfio_device_get_kvm_safe() and
> > test df->kvm within it, hence the rename to _df_. I think the group
> > path should be OK with this change as well. Let me make it.
> 
> Passing vfio_device_file to _vfio_device_get_kvm_safe() requires the
> changes below in the group path. If the goal is only to test for a NULL kvm
> inside _vfio_device_get_kvm_safe(), it may be simpler to keep the current
> parameters and just move the NULL kvm test into this function. Is that right?

This does seem a bit nicer, yes

> diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
> index 8a9ebcc6980b..4e6ea2943d28 100644
> --- a/drivers/vfio/vfio_main.c
> +++ b/drivers/vfio/vfio_main.c
> @@ -373,14 +373,22 @@ void vfio_unregister_group_dev(struct vfio_device *device)
>  EXPORT_SYMBOL_GPL(vfio_unregister_group_dev);
>  
>  #ifdef CONFIG_HAVE_KVM
> -void _vfio_device_get_kvm_safe(struct vfio_device *device, struct kvm *kvm)
> +void _vfio_df_get_kvm_safe(struct vfio_device_file *df)

But still avoid the leading _ here

Jason

* RE: [PATCH v12 18/24] vfio: Add VFIO_DEVICE_BIND_IOMMUFD
  2023-06-28 14:34               ` [Intel-gfx] " Jason Gunthorpe
@ 2023-06-28 14:41                 ` Liu, Yi L
  -1 siblings, 0 replies; 180+ messages in thread
From: Liu, Yi L @ 2023-06-28 14:41 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: alex.williamson, Tian, Kevin, joro, robin.murphy, cohuck,
	eric.auger, nicolinc, kvm, mjrosato, chao.p.peng, yi.y.sun,
	peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, intel-gvt-dev, intel-gfx, linux-s390, Hao,
	Xudong, Zhao, Yan Y, Xu, Terrence, Jiang, Yanting, Duan,
	Zhenzhong, clegoate

> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Wednesday, June 28, 2023 10:34 PM
> 
> On Mon, Jun 26, 2023 at 02:51:29PM +0000, Liu, Yi L wrote:
> > > > >
> > > > > > Ah, any suggestion on the naming? How about vfio_device_get_kvm_safe_locked()?
> > > >
> > > > I thought you were using _df_ now for these functions?
> > > >
> > >
> > > I see. Your point is to pass df to _vfio_device_get_kvm_safe() and
> > > test df->kvm within it, hence the rename to _df_. I think the group
> > > path should be OK with this change as well. Let me make it.
> >
> > Passing vfio_device_file to _vfio_device_get_kvm_safe() requires the
> > changes below in the group path. If the goal is only to test for a NULL kvm
> > inside _vfio_device_get_kvm_safe(), it may be simpler to keep the current
> > parameters and just move the NULL kvm test into this function. Is that right?
> 
> This does seem a bit nicer, yes
> 
> > diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
> > index 8a9ebcc6980b..4e6ea2943d28 100644
> > --- a/drivers/vfio/vfio_main.c
> > +++ b/drivers/vfio/vfio_main.c
> > @@ -373,14 +373,22 @@ void vfio_unregister_group_dev(struct vfio_device *device)
> >  EXPORT_SYMBOL_GPL(vfio_unregister_group_dev);
> >
> >  #ifdef CONFIG_HAVE_KVM
> > -void _vfio_device_get_kvm_safe(struct vfio_device *device, struct kvm *kvm)
> > +void _vfio_df_get_kvm_safe(struct vfio_device_file *df)
> 
> But still avoid the leading _ here

OK, I'll move the kvm pointer test into _vfio_device_get_kvm_safe()
and also rename it to vfio_device_get_kvm_safe().

Regards,
Yi Liu
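
For readers following the thread, the outcome agreed above (keep the
(device, kvm) parameters, move the NULL test into the helper, drop the
leading underscore) could look roughly like the userspace sketch below.
It is not the posted patch. The struct layouts, the dev_set_locked flag,
mock_kvm_get_safe() and the wrapper name vfio_df_get_kvm_safe() are
assumptions made up for illustration, and the symbol_get() handling and
spinlocks of the real code are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins, not the kernel definitions. */
struct kvm {
	int users;			/* simplified reference count */
};

struct vfio_device {
	struct kvm *kvm;		/* set only once a reference is held */
	bool dev_set_locked;		/* stand-in for device->dev_set->lock */
};

struct vfio_device_file {
	struct vfio_device *device;
	struct kvm *kvm;
};

/* Mock of a "get unless already dead" helper: fails once users hit zero. */
static bool mock_kvm_get_safe(struct kvm *kvm)
{
	if (kvm->users == 0)
		return false;
	kvm->users++;
	return true;
}

/*
 * Same parameters as before and no leading underscore. The NULL test
 * lives here, so callers can pass their kvm pointer unconditionally.
 */
static void vfio_device_get_kvm_safe(struct vfio_device *device,
				     struct kvm *kvm)
{
	assert(device->dev_set_locked);	/* mimics lockdep_assert_held() */

	if (!kvm)
		return;

	if (!mock_kvm_get_safe(kvm))
		return;

	device->kvm = kvm;
}

/*
 * Hypothetical cdev-path caller. Kernel code would hold df->kvm_ref_lock
 * around this call; the lock is omitted in this sketch.
 */
static void vfio_df_get_kvm_safe(struct vfio_device_file *df)
{
	vfio_device_get_kvm_safe(df->device, df->kvm);
}
```

With the test inside the helper, the group path can keep passing
device->group->kvm and the cdev path can pass df->kvm, and neither
caller needs its own "if (kvm)" branch.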

Thread overview: 180+ messages
2023-06-02 12:16 [PATCH v12 00/24] Add vfio_device cdev for iommufd support Yi Liu
2023-06-02 12:16 ` [Intel-gfx] " Yi Liu
2023-06-02 12:16 ` [PATCH v12 01/24] vfio: Allocate per device file structure Yi Liu
2023-06-02 12:16   ` [Intel-gfx] " Yi Liu
2023-06-02 12:16 ` [PATCH v12 02/24] vfio: Refine vfio file kAPIs for KVM Yi Liu
2023-06-02 12:16   ` [Intel-gfx] " Yi Liu
2023-06-02 12:16 ` [PATCH v12 03/24] vfio: Accept vfio device file in the KVM facing kAPI Yi Liu
2023-06-02 12:16   ` [Intel-gfx] " Yi Liu
2023-06-02 12:16 ` [PATCH v12 04/24] kvm/vfio: Prepare for accepting vfio device fd Yi Liu
2023-06-02 12:16   ` [Intel-gfx] " Yi Liu
2023-06-02 12:16 ` [PATCH v12 05/24] kvm/vfio: Accept vfio device file from userspace Yi Liu
2023-06-02 12:16   ` [Intel-gfx] " Yi Liu
2023-06-02 12:16 ` [PATCH v12 06/24] vfio: Pass struct vfio_device_file * to vfio_device_open/close() Yi Liu
2023-06-02 12:16   ` [Intel-gfx] " Yi Liu
2023-06-12 21:52   ` Alex Williamson
2023-06-12 21:52     ` Alex Williamson
2023-06-13  5:24     ` Liu, Yi L
2023-06-13  5:24       ` [Intel-gfx] " Liu, Yi L
2023-06-02 12:16 ` [PATCH v12 07/24] vfio: Block device access via device fd until device is opened Yi Liu
2023-06-02 12:16   ` [Intel-gfx] " Yi Liu
2023-06-12 21:52   ` Alex Williamson
2023-06-12 21:52     ` [Intel-gfx] " Alex Williamson
2023-06-13  5:46     ` Liu, Yi L
2023-06-13  5:46       ` [Intel-gfx] " Liu, Yi L
2023-06-13 14:16       ` Alex Williamson
2023-06-13 14:16         ` Alex Williamson
2023-06-13 14:36         ` Liu, Yi L
2023-06-13 14:36           ` [Intel-gfx] " Liu, Yi L
2023-06-13 14:42           ` Alex Williamson
2023-06-13 14:42             ` Alex Williamson
2023-06-13 14:44             ` Liu, Yi L
2023-06-13 14:44               ` [Intel-gfx] " Liu, Yi L
2023-06-13 17:19         ` Jason Gunthorpe
2023-06-13 17:19           ` [Intel-gfx] " Jason Gunthorpe
2023-06-13 17:31           ` Alex Williamson
2023-06-13 17:31             ` Alex Williamson
2023-06-02 12:16 ` [PATCH v12 08/24] vfio: Add cdev_device_open_cnt to vfio_group Yi Liu
2023-06-02 12:16   ` [Intel-gfx] " Yi Liu
2023-06-02 12:16 ` [PATCH v12 09/24] vfio: Make vfio_df_open() single open for device cdev path Yi Liu
2023-06-02 12:16   ` [Intel-gfx] " Yi Liu
2023-06-02 12:16 ` [PATCH v12 10/24] vfio-iommufd: Move noiommu compat validation out of vfio_iommufd_bind() Yi Liu
2023-06-02 12:16   ` [Intel-gfx] " Yi Liu
2023-06-22 17:59   ` Jason Gunthorpe
2023-06-22 17:59     ` [Intel-gfx] " Jason Gunthorpe
2023-06-02 12:16 ` [PATCH v12 11/24] vfio-iommufd: Split bind/attach into two steps Yi Liu
2023-06-02 12:16   ` [Intel-gfx] " Yi Liu
2023-06-22 17:59   ` Jason Gunthorpe
2023-06-22 17:59     ` [Intel-gfx] " Jason Gunthorpe
2023-06-02 12:16 ` [PATCH v12 12/24] vfio: Record devid in vfio_device_file Yi Liu
2023-06-02 12:16   ` [Intel-gfx] " Yi Liu
2023-06-22 18:00   ` Jason Gunthorpe
2023-06-22 18:00     ` Jason Gunthorpe
2023-06-02 12:16 ` [PATCH v12 13/24] vfio-iommufd: Add detach_ioas support for physical VFIO devices Yi Liu
2023-06-02 12:16   ` [Intel-gfx] " Yi Liu
2023-06-23 14:04   ` Jason Gunthorpe
2023-06-23 14:04     ` [Intel-gfx] " Jason Gunthorpe
2023-06-02 12:16 ` [PATCH v12 14/24] iommufd/device: Add iommufd_access_detach() API Yi Liu
2023-06-02 12:16   ` [Intel-gfx] " Yi Liu
2023-06-23 14:15   ` Jason Gunthorpe
2023-06-23 14:15     ` [Intel-gfx] " Jason Gunthorpe
2023-06-25 18:26     ` Nicolin Chen
2023-06-25 18:26       ` [Intel-gfx] " Nicolin Chen
2023-06-02 12:16 ` [PATCH v12 15/24] vfio-iommufd: Add detach_ioas support for emulated VFIO devices Yi Liu
2023-06-02 12:16   ` [Intel-gfx] " Yi Liu
2023-06-23 14:16   ` Jason Gunthorpe
2023-06-23 14:16     ` [Intel-gfx] " Jason Gunthorpe
2023-06-02 12:16 ` [PATCH v12 16/24] vfio: Move vfio_device_group_unregister() to be the first operation in unregister Yi Liu
2023-06-02 12:16   ` [Intel-gfx] " Yi Liu
2023-06-23 14:22   ` Jason Gunthorpe
2023-06-23 14:22     ` [Intel-gfx] " Jason Gunthorpe
2023-06-02 12:16 ` [PATCH v12 17/24] vfio: Add cdev for vfio_device Yi Liu
2023-06-02 12:16   ` [Intel-gfx] " Yi Liu
2023-06-23 15:58   ` Jason Gunthorpe
2023-06-23 15:58     ` [Intel-gfx] " Jason Gunthorpe
2023-06-02 12:16 ` [PATCH v12 18/24] vfio: Add VFIO_DEVICE_BIND_IOMMUFD Yi Liu
2023-06-02 12:16   ` [Intel-gfx] " Yi Liu
2023-06-12 22:27   ` Alex Williamson
2023-06-12 22:27     ` Alex Williamson
2023-06-13  5:48     ` Liu, Yi L
2023-06-13  5:48       ` [Intel-gfx] " Liu, Yi L
2023-06-13 14:18       ` Alex Williamson
2023-06-13 14:18         ` Alex Williamson
2023-06-13 14:28         ` Liu, Yi L
2023-06-13 14:28           ` [Intel-gfx] " Liu, Yi L
2023-06-13 14:39           ` Alex Williamson
2023-06-13 14:39             ` Alex Williamson
2023-06-13 14:42             ` [Intel-gfx] " Liu, Yi L
2023-06-13 14:42               ` Liu, Yi L
2023-06-13 14:59               ` Alex Williamson
2023-06-13 14:59                 ` [Intel-gfx] " Alex Williamson
2023-06-23 16:15   ` Jason Gunthorpe
2023-06-23 16:15     ` [Intel-gfx] " Jason Gunthorpe
2023-06-26  8:34     ` Liu, Yi L
2023-06-26  8:34       ` [Intel-gfx] " Liu, Yi L
2023-06-26 12:56       ` Jason Gunthorpe
2023-06-26 12:56         ` [Intel-gfx] " Jason Gunthorpe
2023-06-26 13:35         ` Liu, Yi L
2023-06-26 13:35           ` [Intel-gfx] " Liu, Yi L
2023-06-26 14:51           ` Liu, Yi L
2023-06-26 14:51             ` [Intel-gfx] " Liu, Yi L
2023-06-28 14:34             ` Jason Gunthorpe
2023-06-28 14:34               ` [Intel-gfx] " Jason Gunthorpe
2023-06-28 14:41               ` Liu, Yi L
2023-06-28 14:41                 ` [Intel-gfx] " Liu, Yi L
2023-06-02 12:16 ` [PATCH v12 19/24] vfio: Add VFIO_DEVICE_[AT|DE]TACH_IOMMUFD_PT Yi Liu
2023-06-02 12:16   ` [Intel-gfx] " Yi Liu
2023-06-23 16:21   ` Jason Gunthorpe
2023-06-23 16:21     ` [Intel-gfx] " Jason Gunthorpe
2023-06-02 12:16 ` [PATCH v12 20/24] vfio: Only check group->type for noiommu test Yi Liu
2023-06-02 12:16   ` [Intel-gfx] " Yi Liu
2023-06-12 22:37   ` Alex Williamson
2023-06-12 22:37     ` [Intel-gfx] " Alex Williamson
2023-06-13  9:20     ` Liu, Yi L
2023-06-13  9:20       ` [Intel-gfx] " Liu, Yi L
2023-06-02 12:16 ` [PATCH v12 21/24] vfio: Determine noiommu device in __vfio_register_dev() Yi Liu
2023-06-02 12:16   ` [Intel-gfx] " Yi Liu
2023-06-12 22:42   ` Alex Williamson
2023-06-12 22:42     ` Alex Williamson
2023-06-13  5:53     ` Liu, Yi L
2023-06-13  5:53       ` [Intel-gfx] " Liu, Yi L
2023-06-13 14:19       ` Alex Williamson
2023-06-13 14:19         ` Alex Williamson
2023-06-13 14:33         ` Liu, Yi L
2023-06-13 14:33           ` [Intel-gfx] " Liu, Yi L
2023-06-13 14:48           ` Alex Williamson
2023-06-13 14:48             ` Alex Williamson
2023-06-13 15:01             ` Liu, Yi L
2023-06-13 15:01               ` [Intel-gfx] " Liu, Yi L
2023-06-13 15:13               ` Alex Williamson
2023-06-13 17:15                 ` Alex Williamson
2023-06-13 17:35                   ` Jason Gunthorpe
2023-06-13 17:35                     ` [Intel-gfx] " Jason Gunthorpe
2023-06-13 20:10                     ` Alex Williamson
2023-06-13 20:10                       ` Alex Williamson
2023-06-14  3:24                       ` Liu, Yi L
2023-06-14  3:24                         ` [Intel-gfx] " Liu, Yi L
2023-06-14  5:42                         ` Tian, Kevin
2023-06-14  5:42                           ` [Intel-gfx] " Tian, Kevin
2023-06-14  6:14                           ` Liu, Yi L
2023-06-14  6:14                             ` [Intel-gfx] " Liu, Yi L
2023-06-14  6:20                             ` Tian, Kevin
2023-06-14  6:20                               ` [Intel-gfx] " Tian, Kevin
2023-06-14 12:23                               ` Jason Gunthorpe
2023-06-14 12:23                                 ` Jason Gunthorpe
2023-06-14 13:12                                 ` [Intel-gfx] " Liu, Yi L
2023-06-14 13:12                                   ` Liu, Yi L
2023-06-14 17:30                                   ` Jason Gunthorpe
2023-06-14 17:30                                     ` [Intel-gfx] " Jason Gunthorpe
2023-06-02 12:16 ` [PATCH v12 22/24] vfio: Remove vfio_device_is_noiommu() Yi Liu
2023-06-02 12:16   ` [Intel-gfx] " Yi Liu
2023-06-12 22:46   ` Alex Williamson
2023-06-12 22:46     ` Alex Williamson
2023-06-02 12:16 ` [PATCH v12 23/24] vfio: Compile vfio_group infrastructure optionally Yi Liu
2023-06-02 12:16   ` [Intel-gfx] " Yi Liu
2023-06-23 16:35   ` Jason Gunthorpe
2023-06-23 16:35     ` [Intel-gfx] " Jason Gunthorpe
2023-06-02 12:16 ` [PATCH v12 24/24] docs: vfio: Add vfio device cdev description Yi Liu
2023-06-02 12:16   ` [Intel-gfx] " Yi Liu
2023-06-12 23:06   ` Alex Williamson
2023-06-12 23:06     ` Alex Williamson
2023-06-13 12:01     ` Liu, Yi L
2023-06-13 12:01       ` [Intel-gfx] " Liu, Yi L
2023-06-13 14:24       ` Alex Williamson
2023-06-13 14:24         ` Alex Williamson
2023-06-13 14:48         ` Liu, Yi L
2023-06-13 14:48           ` [Intel-gfx] " Liu, Yi L
2023-06-13 15:04           ` Alex Williamson
2023-06-13 15:04             ` Alex Williamson
2023-06-13 15:11             ` Liu, Yi L
2023-06-13 15:11               ` [Intel-gfx] " Liu, Yi L
2023-06-13 17:30               ` Alex Williamson
2023-06-13 17:30                 ` Alex Williamson
2023-06-02 16:19 ` [Intel-gfx] ✗ Fi.CI.BUILD: failure for Add vfio_device cdev for iommufd support (rev15) Patchwork
2023-06-07  8:27 ` [PATCH v12 00/24] Add vfio_device cdev for iommufd support Nicolin Chen
2023-06-07  8:27   ` [Intel-gfx] " Nicolin Chen
2023-06-08  6:58 ` Jiang, Yanting
2023-06-08  6:58   ` [Intel-gfx] " Jiang, Yanting
2023-06-09 16:47 ` Matthew Rosato
2023-06-09 16:47   ` [Intel-gfx] " Matthew Rosato
2023-06-14 15:47 ` [Intel-gfx] ✗ Fi.CI.BUILD: failure for Add vfio_device cdev for iommufd support (rev16) Patchwork
