linux-s390.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2 00/10] Introduce new methods for verifying ownership in vfio PCI hot reset
@ 2023-03-27  9:34 Yi Liu
  2023-03-27  9:34 ` [PATCH v2 01/10] vfio/pci: Update comment around group_fd get in vfio_pci_ioctl_pci_hot_reset() Yi Liu
                   ` (12 more replies)
  0 siblings, 13 replies; 56+ messages in thread
From: Yi Liu @ 2023-03-27  9:34 UTC (permalink / raw)
  To: alex.williamson, jgg, kevin.tian
  Cc: joro, robin.murphy, cohuck, eric.auger, nicolinc, kvm, mjrosato,
	chao.p.peng, yi.l.liu, yi.y.sun, peterx, jasowang,
	shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	intel-gvt-dev, intel-gfx, linux-s390, xudong.hao, yan.y.zhao,
	terrence.xu, yanting.jiang

VFIO_DEVICE_PCI_HOT_RESET requires user to pass an array of group fds
to prove that it owns all devices affected by resetting the calling
device. This series introduces several extensions to allow the ownership
check better aligned with iommufd and coming vfio device cdev support.

First, resetting an unopened device is always safe given nobody is using
it. So relax the check to allow such devices not covered by group fd
array. [1]

When iommufd is used we can simply verify that all affected devices are
bound to a same iommufd then no need for the user to provide extra fd
information. This is enabled by the user passing a zero-length fd array
and moving forward this should be the preferred way for hot reset. [2]

However the iommufd method has difficulty working with noiommu devices
since those devices don't have a valid iommufd, unless the noiommu device
is in a singleton dev_set hence no ownership check is required. [3]

For noiommu backward compatibility a 3rd method is introduced by allowing
the user to pass an array of device fds to prove ownership. [4]

As suggested by Jason [5], we have this series to introduce the above
stuffs to the vfio PCI hot reset. Per the dicussion in [6], this series
also adds a new _INFO ioctl to get hot reset scope for given device.

[1] https://lore.kernel.org/kvm/Y%2FdobS6gdSkxnPH7@nvidia.com/
[2] https://lore.kernel.org/kvm/Y%2FZOOClu8nXy2toX@nvidia.com/#t
[3] https://lore.kernel.org/kvm/ZACX+Np%2FIY7ygqL5@nvidia.com/
[4] https://lore.kernel.org/kvm/DS0PR11MB7529BE88460582BD599DC1F7C3B19@DS0PR11MB7529.namprd11.prod.outlook.com/#t
[5] https://lore.kernel.org/kvm/ZAcvzvhkt9QhCmdi@nvidia.com/
[6] https://lore.kernel.org/kvm/ZBoYgNq60eDpV9Un@nvidia.com/

Change log:

v2:
 - Split the patch 03 of v1 to be 03, 04 and 05 of v2 (Jaon)
 - Add r-b from Kevin and Jason
 - Add patch 10 to introduce a new _INFO ioctl for the usage of device
   fd passing usage in cdev path (Jason, Alex)

v1: https://lore.kernel.org/kvm/20230316124156.12064-1-yi.l.liu@intel.com/

Regards,
	Yi Liu

Yi Liu (10):
  vfio/pci: Update comment around group_fd get in
    vfio_pci_ioctl_pci_hot_reset()
  vfio/pci: Only check ownership of opened devices in hot reset
  vfio/pci: Move the existing hot reset logic to be a helper
  vfio-iommufd: Add helper to retrieve iommufd_ctx and devid for
    vfio_device
  vfio/pci: Allow passing zero-length fd array in
    VFIO_DEVICE_PCI_HOT_RESET
  vfio: Refine vfio file kAPIs for vfio PCI hot reset
  vfio: Accpet device file from vfio PCI hot reset path
  vfio/pci: Renaming for accepting device fd in hot reset path
  vfio/pci: Accept device fd in VFIO_DEVICE_PCI_HOT_RESET ioctl
  vfio/pci: Add VFIO_DEVICE_GET_PCI_HOT_RESET_GROUP_INFO

 drivers/iommu/iommufd/device.c   |  12 ++
 drivers/vfio/group.c             |  32 ++--
 drivers/vfio/iommufd.c           |  16 ++
 drivers/vfio/pci/vfio_pci_core.c | 244 ++++++++++++++++++++++++-------
 drivers/vfio/vfio.h              |   2 +
 drivers/vfio/vfio_main.c         |  44 ++++++
 include/linux/iommufd.h          |   3 +
 include/linux/vfio.h             |  14 ++
 include/uapi/linux/vfio.h        |  65 +++++++-
 9 files changed, 364 insertions(+), 68 deletions(-)

-- 
2.34.1


^ permalink raw reply	[flat|nested] 56+ messages in thread

* [PATCH v2 01/10] vfio/pci: Update comment around group_fd get in vfio_pci_ioctl_pci_hot_reset()
  2023-03-27  9:34 [PATCH v2 00/10] Introduce new methods for verifying ownership in vfio PCI hot reset Yi Liu
@ 2023-03-27  9:34 ` Yi Liu
  2023-03-27  9:34 ` [PATCH v2 02/10] vfio/pci: Only check ownership of opened devices in hot reset Yi Liu
                   ` (11 subsequent siblings)
  12 siblings, 0 replies; 56+ messages in thread
From: Yi Liu @ 2023-03-27  9:34 UTC (permalink / raw)
  To: alex.williamson, jgg, kevin.tian
  Cc: joro, robin.murphy, cohuck, eric.auger, nicolinc, kvm, mjrosato,
	chao.p.peng, yi.l.liu, yi.y.sun, peterx, jasowang,
	shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	intel-gvt-dev, intel-gfx, linux-s390, xudong.hao, yan.y.zhao,
	terrence.xu, yanting.jiang

this suits more on what the code does.

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 drivers/vfio/pci/vfio_pci_core.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
index a5ab416cf476..65bbef562268 100644
--- a/drivers/vfio/pci/vfio_pci_core.c
+++ b/drivers/vfio/pci/vfio_pci_core.c
@@ -1308,9 +1308,8 @@ static int vfio_pci_ioctl_pci_hot_reset(struct vfio_pci_core_device *vdev,
 	}
 
 	/*
-	 * For each group_fd, get the group through the vfio external user
-	 * interface and store the group and iommu ID.  This ensures the group
-	 * is held across the reset.
+	 * Get the group file for each fd to ensure the group held across
+	 * the reset
 	 */
 	for (file_idx = 0; file_idx < hdr.count; file_idx++) {
 		struct file *file = fget(group_fds[file_idx]);
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [PATCH v2 02/10] vfio/pci: Only check ownership of opened devices in hot reset
  2023-03-27  9:34 [PATCH v2 00/10] Introduce new methods for verifying ownership in vfio PCI hot reset Yi Liu
  2023-03-27  9:34 ` [PATCH v2 01/10] vfio/pci: Update comment around group_fd get in vfio_pci_ioctl_pci_hot_reset() Yi Liu
@ 2023-03-27  9:34 ` Yi Liu
  2023-03-27  9:34 ` [PATCH v2 03/10] vfio/pci: Move the existing hot reset logic to be a helper Yi Liu
                   ` (10 subsequent siblings)
  12 siblings, 0 replies; 56+ messages in thread
From: Yi Liu @ 2023-03-27  9:34 UTC (permalink / raw)
  To: alex.williamson, jgg, kevin.tian
  Cc: joro, robin.murphy, cohuck, eric.auger, nicolinc, kvm, mjrosato,
	chao.p.peng, yi.l.liu, yi.y.sun, peterx, jasowang,
	shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	intel-gvt-dev, intel-gfx, linux-s390, xudong.hao, yan.y.zhao,
	terrence.xu, yanting.jiang

If the affected device is not opened by any user, it's safe to reset it
given it's not in use.

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 drivers/vfio/pci/vfio_pci_core.c | 14 +++++++++++---
 include/uapi/linux/vfio.h        |  8 ++++++++
 2 files changed, 19 insertions(+), 3 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
index 65bbef562268..5d745c9abf05 100644
--- a/drivers/vfio/pci/vfio_pci_core.c
+++ b/drivers/vfio/pci/vfio_pci_core.c
@@ -2429,10 +2429,18 @@ static int vfio_pci_dev_set_hot_reset(struct vfio_device_set *dev_set,
 
 	list_for_each_entry(cur_vma, &dev_set->device_list, vdev.dev_set_list) {
 		/*
-		 * Test whether all the affected devices are contained by the
-		 * set of groups provided by the user.
+		 * Test whether all the affected devices can be reset by the
+		 * user.
+		 *
+		 * Resetting an unused device (not opened) is safe, because
+		 * dev_set->lock is held in hot reset path so this device
+		 * cannot race being opened by another user simultaneously.
+		 *
+		 * Otherwise all opened devices in the dev_set must be
+		 * contained by the set of groups provided by the user.
 		 */
-		if (!vfio_dev_in_groups(cur_vma, groups)) {
+		if (cur_vma->vdev.open_count &&
+		    !vfio_dev_in_groups(cur_vma, groups)) {
 			ret = -EINVAL;
 			goto err_undo;
 		}
diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index 0552e8dcf0cb..f96e5689cffc 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -673,6 +673,14 @@ struct vfio_pci_hot_reset_info {
  * VFIO_DEVICE_PCI_HOT_RESET - _IOW(VFIO_TYPE, VFIO_BASE + 13,
  *				    struct vfio_pci_hot_reset)
  *
+ * Userspace requests hot reset for the devices it uses.  Due to the
+ * underlying topology, multiple devices can be affected in the reset
+ * while some might be opened by another user.  To avoid interference
+ * the calling user must ensure all affected devices, if opened, are
+ * owned by itself.
+ *
+ * The ownership is proved by an array of group fds.
+ *
  * Return: 0 on success, -errno on failure.
  */
 struct vfio_pci_hot_reset {
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [PATCH v2 03/10] vfio/pci: Move the existing hot reset logic to be a helper
  2023-03-27  9:34 [PATCH v2 00/10] Introduce new methods for verifying ownership in vfio PCI hot reset Yi Liu
  2023-03-27  9:34 ` [PATCH v2 01/10] vfio/pci: Update comment around group_fd get in vfio_pci_ioctl_pci_hot_reset() Yi Liu
  2023-03-27  9:34 ` [PATCH v2 02/10] vfio/pci: Only check ownership of opened devices in hot reset Yi Liu
@ 2023-03-27  9:34 ` Yi Liu
  2023-03-30 23:39   ` Jason Gunthorpe
  2023-03-30 23:44   ` Jason Gunthorpe
  2023-03-27  9:34 ` [PATCH v2 04/10] vfio-iommufd: Add helper to retrieve iommufd_ctx and devid for vfio_device Yi Liu
                   ` (9 subsequent siblings)
  12 siblings, 2 replies; 56+ messages in thread
From: Yi Liu @ 2023-03-27  9:34 UTC (permalink / raw)
  To: alex.williamson, jgg, kevin.tian
  Cc: joro, robin.murphy, cohuck, eric.auger, nicolinc, kvm, mjrosato,
	chao.p.peng, yi.l.liu, yi.y.sun, peterx, jasowang,
	shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	intel-gvt-dev, intel-gfx, linux-s390, xudong.hao, yan.y.zhao,
	terrence.xu, yanting.jiang

This prepares to add another method for hot reset. The major hot reset logic
are moved to vfio_pci_ioctl_pci_hot_reset_groups().

No functional change is intended.

Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 drivers/vfio/pci/vfio_pci_core.c | 56 +++++++++++++++++++-------------
 1 file changed, 33 insertions(+), 23 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
index 5d745c9abf05..3696b8e58445 100644
--- a/drivers/vfio/pci/vfio_pci_core.c
+++ b/drivers/vfio/pci/vfio_pci_core.c
@@ -1255,29 +1255,17 @@ static int vfio_pci_ioctl_get_pci_hot_reset_info(
 	return ret;
 }
 
-static int vfio_pci_ioctl_pci_hot_reset(struct vfio_pci_core_device *vdev,
-					struct vfio_pci_hot_reset __user *arg)
+static int
+vfio_pci_ioctl_pci_hot_reset_groups(struct vfio_pci_core_device *vdev,
+				    struct vfio_pci_hot_reset *hdr,
+				    bool slot,
+				    struct vfio_pci_hot_reset __user *arg)
 {
-	unsigned long minsz = offsetofend(struct vfio_pci_hot_reset, count);
-	struct vfio_pci_hot_reset hdr;
 	int32_t *group_fds;
 	struct file **files;
 	struct vfio_pci_group_info info;
-	bool slot = false;
 	int file_idx, count = 0, ret = 0;
 
-	if (copy_from_user(&hdr, arg, minsz))
-		return -EFAULT;
-
-	if (hdr.argsz < minsz || hdr.flags)
-		return -EINVAL;
-
-	/* Can we do a slot or bus reset or neither? */
-	if (!pci_probe_reset_slot(vdev->pdev->slot))
-		slot = true;
-	else if (pci_probe_reset_bus(vdev->pdev->bus))
-		return -ENODEV;
-
 	/*
 	 * We can't let userspace give us an arbitrarily large buffer to copy,
 	 * so verify how many we think there could be.  Note groups can have
@@ -1289,11 +1277,11 @@ static int vfio_pci_ioctl_pci_hot_reset(struct vfio_pci_core_device *vdev,
 		return ret;
 
 	/* Somewhere between 1 and count is OK */
-	if (!hdr.count || hdr.count > count)
+	if (!hdr->count || hdr->count > count)
 		return -EINVAL;
 
-	group_fds = kcalloc(hdr.count, sizeof(*group_fds), GFP_KERNEL);
-	files = kcalloc(hdr.count, sizeof(*files), GFP_KERNEL);
+	group_fds = kcalloc(hdr->count, sizeof(*group_fds), GFP_KERNEL);
+	files = kcalloc(hdr->count, sizeof(*files), GFP_KERNEL);
 	if (!group_fds || !files) {
 		kfree(group_fds);
 		kfree(files);
@@ -1301,7 +1289,7 @@ static int vfio_pci_ioctl_pci_hot_reset(struct vfio_pci_core_device *vdev,
 	}
 
 	if (copy_from_user(group_fds, arg->group_fds,
-			   hdr.count * sizeof(*group_fds))) {
+			   hdr->count * sizeof(*group_fds))) {
 		kfree(group_fds);
 		kfree(files);
 		return -EFAULT;
@@ -1311,7 +1299,7 @@ static int vfio_pci_ioctl_pci_hot_reset(struct vfio_pci_core_device *vdev,
 	 * Get the group file for each fd to ensure the group held across
 	 * the reset
 	 */
-	for (file_idx = 0; file_idx < hdr.count; file_idx++) {
+	for (file_idx = 0; file_idx < hdr->count; file_idx++) {
 		struct file *file = fget(group_fds[file_idx]);
 
 		if (!file) {
@@ -1335,7 +1323,7 @@ static int vfio_pci_ioctl_pci_hot_reset(struct vfio_pci_core_device *vdev,
 	if (ret)
 		goto hot_reset_release;
 
-	info.count = hdr.count;
+	info.count = hdr->count;
 	info.files = files;
 
 	ret = vfio_pci_dev_set_hot_reset(vdev->vdev.dev_set, &info);
@@ -1348,6 +1336,28 @@ static int vfio_pci_ioctl_pci_hot_reset(struct vfio_pci_core_device *vdev,
 	return ret;
 }
 
+static int vfio_pci_ioctl_pci_hot_reset(struct vfio_pci_core_device *vdev,
+					struct vfio_pci_hot_reset __user *arg)
+{
+	unsigned long minsz = offsetofend(struct vfio_pci_hot_reset, count);
+	struct vfio_pci_hot_reset hdr;
+	bool slot = false;
+
+	if (copy_from_user(&hdr, arg, minsz))
+		return -EFAULT;
+
+	if (hdr.argsz < minsz || hdr.flags)
+		return -EINVAL;
+
+	/* Can we do a slot or bus reset or neither? */
+	if (!pci_probe_reset_slot(vdev->pdev->slot))
+		slot = true;
+	else if (pci_probe_reset_bus(vdev->pdev->bus))
+		return -ENODEV;
+
+	return vfio_pci_ioctl_pci_hot_reset_groups(vdev, &hdr, slot, arg);
+}
+
 static int vfio_pci_ioctl_ioeventfd(struct vfio_pci_core_device *vdev,
 				    struct vfio_device_ioeventfd __user *arg)
 {
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [PATCH v2 04/10] vfio-iommufd: Add helper to retrieve iommufd_ctx and devid for vfio_device
  2023-03-27  9:34 [PATCH v2 00/10] Introduce new methods for verifying ownership in vfio PCI hot reset Yi Liu
                   ` (2 preceding siblings ...)
  2023-03-27  9:34 ` [PATCH v2 03/10] vfio/pci: Move the existing hot reset logic to be a helper Yi Liu
@ 2023-03-27  9:34 ` Yi Liu
  2023-03-30 23:44   ` Jason Gunthorpe
  2023-03-27  9:34 ` [PATCH v2 05/10] vfio/pci: Allow passing zero-length fd array in VFIO_DEVICE_PCI_HOT_RESET Yi Liu
                   ` (8 subsequent siblings)
  12 siblings, 1 reply; 56+ messages in thread
From: Yi Liu @ 2023-03-27  9:34 UTC (permalink / raw)
  To: alex.williamson, jgg, kevin.tian
  Cc: joro, robin.murphy, cohuck, eric.auger, nicolinc, kvm, mjrosato,
	chao.p.peng, yi.l.liu, yi.y.sun, peterx, jasowang,
	shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	intel-gvt-dev, intel-gfx, linux-s390, xudong.hao, yan.y.zhao,
	terrence.xu, yanting.jiang

This is needed by the vfio pci driver to report affected devices in the
hot reset for a given device.

Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 drivers/iommu/iommufd/device.c | 12 ++++++++++++
 drivers/vfio/iommufd.c         | 16 ++++++++++++++++
 include/linux/iommufd.h        |  3 +++
 include/linux/vfio.h           | 13 +++++++++++++
 4 files changed, 44 insertions(+)

diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c
index 25115d401d8f..04a57aa1ae2c 100644
--- a/drivers/iommu/iommufd/device.c
+++ b/drivers/iommu/iommufd/device.c
@@ -131,6 +131,18 @@ void iommufd_device_unbind(struct iommufd_device *idev)
 }
 EXPORT_SYMBOL_NS_GPL(iommufd_device_unbind, IOMMUFD);
 
+struct iommufd_ctx *iommufd_device_to_ictx(struct iommufd_device *idev)
+{
+	return idev->ictx;
+}
+EXPORT_SYMBOL_NS_GPL(iommufd_device_to_ictx, IOMMUFD);
+
+u32 iommufd_device_to_id(struct iommufd_device *idev)
+{
+	return idev->obj.id;
+}
+EXPORT_SYMBOL_NS_GPL(iommufd_device_to_id, IOMMUFD);
+
 static int iommufd_device_setup_msi(struct iommufd_device *idev,
 				    struct iommufd_hw_pagetable *hwpt,
 				    phys_addr_t sw_msi_start)
diff --git a/drivers/vfio/iommufd.c b/drivers/vfio/iommufd.c
index 88b00c501015..44088049dbb1 100644
--- a/drivers/vfio/iommufd.c
+++ b/drivers/vfio/iommufd.c
@@ -66,6 +66,22 @@ void vfio_iommufd_unbind(struct vfio_device *vdev)
 		vdev->ops->unbind_iommufd(vdev);
 }
 
+struct iommufd_ctx *vfio_iommufd_physical_ictx(struct vfio_device *vdev)
+{
+	if (!vdev->iommufd_device)
+		return NULL;
+	return iommufd_device_to_ictx(vdev->iommufd_device);
+}
+EXPORT_SYMBOL_GPL(vfio_iommufd_physical_ictx);
+
+int vfio_iommufd_physical_devid(struct vfio_device *vdev, u32 *id)
+{
+	if (!vdev->iommufd_device)
+		return -EINVAL;
+	*id = iommufd_device_to_id(vdev->iommufd_device);
+	return 0;
+}
+EXPORT_SYMBOL_GPL(vfio_iommufd_physical_devid);
 /*
  * The physical standard ops mean that the iommufd_device is bound to the
  * physical device vdev->dev that was provided to vfio_init_group_dev(). Drivers
diff --git a/include/linux/iommufd.h b/include/linux/iommufd.h
index 1129a36a74c4..ac96df406833 100644
--- a/include/linux/iommufd.h
+++ b/include/linux/iommufd.h
@@ -24,6 +24,9 @@ void iommufd_device_unbind(struct iommufd_device *idev);
 int iommufd_device_attach(struct iommufd_device *idev, u32 *pt_id);
 void iommufd_device_detach(struct iommufd_device *idev);
 
+struct iommufd_ctx *iommufd_device_to_ictx(struct iommufd_device *idev);
+u32 iommufd_device_to_id(struct iommufd_device *idev);
+
 struct iommufd_access_ops {
 	u8 needs_pin_pages : 1;
 	void (*unmap)(void *data, unsigned long iova, unsigned long length);
diff --git a/include/linux/vfio.h b/include/linux/vfio.h
index 3188d8a374bd..e54bef5489a0 100644
--- a/include/linux/vfio.h
+++ b/include/linux/vfio.h
@@ -113,6 +113,8 @@ struct vfio_device_ops {
 };
 
 #if IS_ENABLED(CONFIG_IOMMUFD)
+struct iommufd_ctx *vfio_iommufd_physical_ictx(struct vfio_device *vdev);
+int vfio_iommufd_physical_devid(struct vfio_device *vdev, u32 *id);
 int vfio_iommufd_physical_bind(struct vfio_device *vdev,
 			       struct iommufd_ctx *ictx, u32 *out_device_id);
 void vfio_iommufd_physical_unbind(struct vfio_device *vdev);
@@ -122,6 +124,17 @@ int vfio_iommufd_emulated_bind(struct vfio_device *vdev,
 void vfio_iommufd_emulated_unbind(struct vfio_device *vdev);
 int vfio_iommufd_emulated_attach_ioas(struct vfio_device *vdev, u32 *pt_id);
 #else
+static inline struct iommufd_ctx *
+vfio_iommufd_physical_ictx(struct vfio_device *vdev)
+{
+	return NULL;
+}
+
+static inline int vfio_iommufd_physical_devid(struct vfio_device *vdev, u32 *id)
+{
+	return -EOPNOTSUPP;
+}
+
 #define vfio_iommufd_physical_bind                                      \
 	((int (*)(struct vfio_device *vdev, struct iommufd_ctx *ictx,   \
 		  u32 *out_device_id)) NULL)
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [PATCH v2 05/10] vfio/pci: Allow passing zero-length fd array in VFIO_DEVICE_PCI_HOT_RESET
  2023-03-27  9:34 [PATCH v2 00/10] Introduce new methods for verifying ownership in vfio PCI hot reset Yi Liu
                   ` (3 preceding siblings ...)
  2023-03-27  9:34 ` [PATCH v2 04/10] vfio-iommufd: Add helper to retrieve iommufd_ctx and devid for vfio_device Yi Liu
@ 2023-03-27  9:34 ` Yi Liu
  2023-03-30 23:47   ` Jason Gunthorpe
  2023-03-27  9:34 ` [PATCH v2 06/10] vfio: Refine vfio file kAPIs for vfio PCI hot reset Yi Liu
                   ` (7 subsequent siblings)
  12 siblings, 1 reply; 56+ messages in thread
From: Yi Liu @ 2023-03-27  9:34 UTC (permalink / raw)
  To: alex.williamson, jgg, kevin.tian
  Cc: joro, robin.murphy, cohuck, eric.auger, nicolinc, kvm, mjrosato,
	chao.p.peng, yi.l.liu, yi.y.sun, peterx, jasowang,
	shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	intel-gvt-dev, intel-gfx, linux-s390, xudong.hao, yan.y.zhao,
	terrence.xu, yanting.jiang

as an alternative method for ownership check when iommufd is used. In
this case all opened devices in the affected dev_set are verified to
be bound to a same valid iommufd value to allow reset. It's simpler
and faster as user does not need to pass a set of fds and kernel no
need to search the device within the given fds.

a device in noiommu mode doesn't have a valid iommufd, so this method
should not be used in a dev_set which contains multiple devices and one
of them is in noiommu. The only allowed noiommu scenario is that the
calling device is noiommu and it's in a singleton dev_set.

Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 drivers/vfio/pci/vfio_pci_core.c | 42 +++++++++++++++++++++++++++-----
 include/uapi/linux/vfio.h        |  9 ++++++-
 2 files changed, 44 insertions(+), 7 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
index 3696b8e58445..b68fcba67a4b 100644
--- a/drivers/vfio/pci/vfio_pci_core.c
+++ b/drivers/vfio/pci/vfio_pci_core.c
@@ -180,7 +180,8 @@ static void vfio_pci_probe_mmaps(struct vfio_pci_core_device *vdev)
 struct vfio_pci_group_info;
 static void vfio_pci_dev_set_try_reset(struct vfio_device_set *dev_set);
 static int vfio_pci_dev_set_hot_reset(struct vfio_device_set *dev_set,
-				      struct vfio_pci_group_info *groups);
+				      struct vfio_pci_group_info *groups,
+				      struct iommufd_ctx *iommufd_ctx);
 
 /*
  * INTx masking requires the ability to disable INTx signaling via PCI_COMMAND
@@ -1277,7 +1278,7 @@ vfio_pci_ioctl_pci_hot_reset_groups(struct vfio_pci_core_device *vdev,
 		return ret;
 
 	/* Somewhere between 1 and count is OK */
-	if (!hdr->count || hdr->count > count)
+	if (hdr->count > count)
 		return -EINVAL;
 
 	group_fds = kcalloc(hdr->count, sizeof(*group_fds), GFP_KERNEL);
@@ -1326,7 +1327,7 @@ vfio_pci_ioctl_pci_hot_reset_groups(struct vfio_pci_core_device *vdev,
 	info.count = hdr->count;
 	info.files = files;
 
-	ret = vfio_pci_dev_set_hot_reset(vdev->vdev.dev_set, &info);
+	ret = vfio_pci_dev_set_hot_reset(vdev->vdev.dev_set, &info, NULL);
 
 hot_reset_release:
 	for (file_idx--; file_idx >= 0; file_idx--)
@@ -1341,6 +1342,7 @@ static int vfio_pci_ioctl_pci_hot_reset(struct vfio_pci_core_device *vdev,
 {
 	unsigned long minsz = offsetofend(struct vfio_pci_hot_reset, count);
 	struct vfio_pci_hot_reset hdr;
+	struct iommufd_ctx *iommufd;
 	bool slot = false;
 
 	if (copy_from_user(&hdr, arg, minsz))
@@ -1355,7 +1357,12 @@ static int vfio_pci_ioctl_pci_hot_reset(struct vfio_pci_core_device *vdev,
 	else if (pci_probe_reset_bus(vdev->pdev->bus))
 		return -ENODEV;
 
-	return vfio_pci_ioctl_pci_hot_reset_groups(vdev, &hdr, slot, arg);
+	if (hdr.count)
+		return vfio_pci_ioctl_pci_hot_reset_groups(vdev, &hdr, slot, arg);
+
+	iommufd = vfio_iommufd_physical_ictx(&vdev->vdev);
+
+	return vfio_pci_dev_set_hot_reset(vdev->vdev.dev_set, NULL, iommufd);
 }
 
 static int vfio_pci_ioctl_ioeventfd(struct vfio_pci_core_device *vdev,
@@ -2327,6 +2334,9 @@ static bool vfio_dev_in_groups(struct vfio_pci_core_device *vdev,
 {
 	unsigned int i;
 
+	if (!groups)
+		return false;
+
 	for (i = 0; i < groups->count; i++)
 		if (vfio_file_has_dev(groups->files[i], &vdev->vdev))
 			return true;
@@ -2402,13 +2412,25 @@ static int vfio_pci_dev_set_pm_runtime_get(struct vfio_device_set *dev_set)
 	return ret;
 }
 
+static bool vfio_dev_in_iommufd_ctx(struct vfio_pci_core_device *vdev,
+				    struct iommufd_ctx *iommufd_ctx)
+{
+	struct iommufd_ctx *iommufd = vfio_iommufd_physical_ictx(&vdev->vdev);
+
+	if (!iommufd)
+		return false;
+
+	return iommufd == iommufd_ctx;
+}
+
 /*
  * We need to get memory_lock for each device, but devices can share mmap_lock,
  * therefore we need to zap and hold the vma_lock for each device, and only then
  * get each memory_lock.
  */
 static int vfio_pci_dev_set_hot_reset(struct vfio_device_set *dev_set,
-				      struct vfio_pci_group_info *groups)
+				      struct vfio_pci_group_info *groups,
+				      struct iommufd_ctx *iommufd_ctx)
 {
 	struct vfio_pci_core_device *cur_mem;
 	struct vfio_pci_core_device *cur_vma;
@@ -2448,9 +2470,17 @@ static int vfio_pci_dev_set_hot_reset(struct vfio_device_set *dev_set,
 		 *
 		 * Otherwise all opened devices in the dev_set must be
 		 * contained by the set of groups provided by the user.
+		 *
+		 * If user provides a zero-length array, then all the
+		 * opened devices must be bound to a same iommufd_ctx.
+		 *
+		 * If all above checks are failed, reset is allowed only if
+		 * the calling device is in a singleton dev_set.
 		 */
 		if (cur_vma->vdev.open_count &&
-		    !vfio_dev_in_groups(cur_vma, groups)) {
+		    !vfio_dev_in_groups(cur_vma, groups) &&
+		    !vfio_dev_in_iommufd_ctx(cur_vma, iommufd_ctx) &&
+		    (dev_set->device_count > 1)) {
 			ret = -EINVAL;
 			goto err_undo;
 		}
diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index f96e5689cffc..17aa5d09db41 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -679,7 +679,14 @@ struct vfio_pci_hot_reset_info {
  * the calling user must ensure all affected devices, if opened, are
  * owned by itself.
  *
- * The ownership is proved by an array of group fds.
+ * The ownership can be proved by:
+ *   - An array of group fds
+ *   - A zero-length array
+ *
+ * In the last case all affected devices which are opened by this user
+ * must have been bound to a same iommufd. If the calling device is in
+ * noiommu mode (no valid iommufd) then it can be reset only if the reset
+ * doesn't affect other devices.
  *
  * Return: 0 on success, -errno on failure.
  */
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [PATCH v2 06/10] vfio: Refine vfio file kAPIs for vfio PCI hot reset
  2023-03-27  9:34 [PATCH v2 00/10] Introduce new methods for verifying ownership in vfio PCI hot reset Yi Liu
                   ` (4 preceding siblings ...)
  2023-03-27  9:34 ` [PATCH v2 05/10] vfio/pci: Allow passing zero-length fd array in VFIO_DEVICE_PCI_HOT_RESET Yi Liu
@ 2023-03-27  9:34 ` Yi Liu
  2023-03-30 23:48   ` Jason Gunthorpe
  2023-03-27  9:34 ` [PATCH v2 07/10] vfio: Accpet device file from vfio PCI hot reset path Yi Liu
                   ` (6 subsequent siblings)
  12 siblings, 1 reply; 56+ messages in thread
From: Yi Liu @ 2023-03-27  9:34 UTC (permalink / raw)
  To: alex.williamson, jgg, kevin.tian
  Cc: joro, robin.murphy, cohuck, eric.auger, nicolinc, kvm, mjrosato,
	chao.p.peng, yi.l.liu, yi.y.sun, peterx, jasowang,
	shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	intel-gvt-dev, intel-gfx, linux-s390, xudong.hao, yan.y.zhao,
	terrence.xu, yanting.jiang

This prepares vfio core to accept vfio device file from the vfio PCI
hot reset path. vfio_file_is_group() is still kept for KVM usage.

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 drivers/vfio/group.c             | 32 ++++++++++++++------------------
 drivers/vfio/pci/vfio_pci_core.c |  4 ++--
 drivers/vfio/vfio.h              |  2 ++
 drivers/vfio/vfio_main.c         | 29 +++++++++++++++++++++++++++++
 include/linux/vfio.h             |  1 +
 5 files changed, 48 insertions(+), 20 deletions(-)

diff --git a/drivers/vfio/group.c b/drivers/vfio/group.c
index 27d5ba7cf9dc..d0c95d033605 100644
--- a/drivers/vfio/group.c
+++ b/drivers/vfio/group.c
@@ -745,6 +745,15 @@ bool vfio_device_has_container(struct vfio_device *device)
 	return device->group->container;
 }
 
+struct vfio_group *vfio_group_from_file(struct file *file)
+{
+	struct vfio_group *group = file->private_data;
+
+	if (file->f_op != &vfio_group_fops)
+		return NULL;
+	return group;
+}
+
 /**
  * vfio_file_iommu_group - Return the struct iommu_group for the vfio group file
  * @file: VFIO group file
@@ -755,13 +764,13 @@ bool vfio_device_has_container(struct vfio_device *device)
  */
 struct iommu_group *vfio_file_iommu_group(struct file *file)
 {
-	struct vfio_group *group = file->private_data;
+	struct vfio_group *group = vfio_group_from_file(file);
 	struct iommu_group *iommu_group = NULL;
 
 	if (!IS_ENABLED(CONFIG_SPAPR_TCE_IOMMU))
 		return NULL;
 
-	if (!vfio_file_is_group(file))
+	if (!group)
 		return NULL;
 
 	mutex_lock(&group->group_lock);
@@ -775,12 +784,12 @@ struct iommu_group *vfio_file_iommu_group(struct file *file)
 EXPORT_SYMBOL_GPL(vfio_file_iommu_group);
 
 /**
- * vfio_file_is_group - True if the file is usable with VFIO aPIS
+ * vfio_file_is_group - True if the file is a vfio group file
  * @file: VFIO group file
  */
 bool vfio_file_is_group(struct file *file)
 {
-	return file->f_op == &vfio_group_fops;
+	return vfio_group_from_file(file);
 }
 EXPORT_SYMBOL_GPL(vfio_file_is_group);
 
@@ -842,23 +851,10 @@ void vfio_file_set_kvm(struct file *file, struct kvm *kvm)
 }
 EXPORT_SYMBOL_GPL(vfio_file_set_kvm);
 
-/**
- * vfio_file_has_dev - True if the VFIO file is a handle for device
- * @file: VFIO file to check
- * @device: Device that must be part of the file
- *
- * Returns true if given file has permission to manipulate the given device.
- */
-bool vfio_file_has_dev(struct file *file, struct vfio_device *device)
+bool vfio_group_has_dev(struct vfio_group *group, struct vfio_device *device)
 {
-	struct vfio_group *group = file->private_data;
-
-	if (!vfio_file_is_group(file))
-		return false;
-
 	return group == device->group;
 }
-EXPORT_SYMBOL_GPL(vfio_file_has_dev);
 
 static char *vfio_devnode(const struct device *dev, umode_t *mode)
 {
diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
index b68fcba67a4b..2a510b71edcb 100644
--- a/drivers/vfio/pci/vfio_pci_core.c
+++ b/drivers/vfio/pci/vfio_pci_core.c
@@ -1308,8 +1308,8 @@ vfio_pci_ioctl_pci_hot_reset_groups(struct vfio_pci_core_device *vdev,
 			break;
 		}
 
-		/* Ensure the FD is a vfio group FD.*/
-		if (!vfio_file_is_group(file)) {
+		/* Ensure the FD is a vfio FD. vfio group or vfio device */
+		if (!vfio_file_is_valid(file)) {
 			fput(file);
 			ret = -EINVAL;
 			break;
diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
index 7b19c621e0e6..c0aeea24fbd6 100644
--- a/drivers/vfio/vfio.h
+++ b/drivers/vfio/vfio.h
@@ -84,6 +84,8 @@ void vfio_device_group_unregister(struct vfio_device *device);
 int vfio_device_group_use_iommu(struct vfio_device *device);
 void vfio_device_group_unuse_iommu(struct vfio_device *device);
 void vfio_device_group_close(struct vfio_device *device);
+struct vfio_group *vfio_group_from_file(struct file *file);
+bool vfio_group_has_dev(struct vfio_group *group, struct vfio_device *device);
 bool vfio_device_has_container(struct vfio_device *device);
 int __init vfio_group_init(void);
 void vfio_group_cleanup(void);
diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
index 89497c933490..fe7446805afd 100644
--- a/drivers/vfio/vfio_main.c
+++ b/drivers/vfio/vfio_main.c
@@ -1154,6 +1154,35 @@ const struct file_operations vfio_device_fops = {
 	.mmap		= vfio_device_fops_mmap,
 };
 
+/**
+ * vfio_file_is_valid - True if the file is valid vfio file
+ * @file: VFIO group file or VFIO device file
+ */
+bool vfio_file_is_valid(struct file *file)
+{
+	return vfio_group_from_file(file);
+}
+EXPORT_SYMBOL_GPL(vfio_file_is_valid);
+
+/**
+ * vfio_file_has_dev - True if the VFIO file is a handle for device
+ * @file: VFIO file to check
+ * @device: Device that must be part of the file
+ *
+ * Returns true if given file has permission to manipulate the given device.
+ */
+bool vfio_file_has_dev(struct file *file, struct vfio_device *device)
+{
+	struct vfio_group *group;
+
+	group = vfio_group_from_file(file);
+	if (!group)
+		return false;
+
+	return vfio_group_has_dev(group, device);
+}
+EXPORT_SYMBOL_GPL(vfio_file_has_dev);
+
 /*
  * Sub-module support
  */
diff --git a/include/linux/vfio.h b/include/linux/vfio.h
index e54bef5489a0..79c47733ae0d 100644
--- a/include/linux/vfio.h
+++ b/include/linux/vfio.h
@@ -258,6 +258,7 @@ int vfio_mig_get_next_state(struct vfio_device *device,
  */
 struct iommu_group *vfio_file_iommu_group(struct file *file);
 bool vfio_file_is_group(struct file *file);
+bool vfio_file_is_valid(struct file *file);
 bool vfio_file_enforced_coherent(struct file *file);
 void vfio_file_set_kvm(struct file *file, struct kvm *kvm);
 bool vfio_file_has_dev(struct file *file, struct vfio_device *device);
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [PATCH v2 07/10] vfio: Accpet device file from vfio PCI hot reset path
  2023-03-27  9:34 [PATCH v2 00/10] Introduce new methods for verifying ownership in vfio PCI hot reset Yi Liu
                   ` (5 preceding siblings ...)
  2023-03-27  9:34 ` [PATCH v2 06/10] vfio: Refine vfio file kAPIs for vfio PCI hot reset Yi Liu
@ 2023-03-27  9:34 ` Yi Liu
  2023-03-30 23:49   ` Jason Gunthorpe
  2023-03-27  9:34 ` [PATCH v2 08/10] vfio/pci: Renaming for accepting device fd in " Yi Liu
                   ` (5 subsequent siblings)
  12 siblings, 1 reply; 56+ messages in thread
From: Yi Liu @ 2023-03-27  9:34 UTC (permalink / raw)
  To: alex.williamson, jgg, kevin.tian
  Cc: joro, robin.murphy, cohuck, eric.auger, nicolinc, kvm, mjrosato,
	chao.p.peng, yi.l.liu, yi.y.sun, peterx, jasowang,
	shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	intel-gvt-dev, intel-gfx, linux-s390, xudong.hao, yan.y.zhao,
	terrence.xu, yanting.jiang

This extends both vfio_file_is_valid() and vfio_file_has_dev() to accept
device file from the vfio PCI hot reset.

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 drivers/vfio/vfio_main.c | 23 +++++++++++++++++++----
 1 file changed, 19 insertions(+), 4 deletions(-)

diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
index fe7446805afd..ebbb6b91a498 100644
--- a/drivers/vfio/vfio_main.c
+++ b/drivers/vfio/vfio_main.c
@@ -1154,13 +1154,23 @@ const struct file_operations vfio_device_fops = {
 	.mmap		= vfio_device_fops_mmap,
 };
 
+static struct vfio_device *vfio_device_from_file(struct file *file)
+{
+	struct vfio_device *device = file->private_data;
+
+	if (file->f_op != &vfio_device_fops)
+		return NULL;
+	return device;
+}
+
 /**
  * vfio_file_is_valid - True if the file is valid vfio file
  * @file: VFIO group file or VFIO device file
  */
 bool vfio_file_is_valid(struct file *file)
 {
-	return vfio_group_from_file(file);
+	return vfio_group_from_file(file) ||
+	       vfio_device_from_file(file);
 }
 EXPORT_SYMBOL_GPL(vfio_file_is_valid);
 
@@ -1174,12 +1184,17 @@ EXPORT_SYMBOL_GPL(vfio_file_is_valid);
 bool vfio_file_has_dev(struct file *file, struct vfio_device *device)
 {
 	struct vfio_group *group;
+	struct vfio_device *vdev;
 
 	group = vfio_group_from_file(file);
-	if (!group)
-		return false;
+	if (group)
+		return vfio_group_has_dev(group, device);
+
+	vdev = vfio_device_from_file(file);
+	if (vdev)
+		return vdev == device;
 
-	return vfio_group_has_dev(group, device);
+	return false;
 }
 EXPORT_SYMBOL_GPL(vfio_file_has_dev);
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [PATCH v2 08/10] vfio/pci: Renaming for accepting device fd in hot reset path
  2023-03-27  9:34 [PATCH v2 00/10] Introduce new methods for verifying ownership in vfio PCI hot reset Yi Liu
                   ` (6 preceding siblings ...)
  2023-03-27  9:34 ` [PATCH v2 07/10] vfio: Accpet device file from vfio PCI hot reset path Yi Liu
@ 2023-03-27  9:34 ` Yi Liu
  2023-03-27  9:34 ` [PATCH v2 09/10] vfio/pci: Accept device fd in VFIO_DEVICE_PCI_HOT_RESET ioctl Yi Liu
                   ` (4 subsequent siblings)
  12 siblings, 0 replies; 56+ messages in thread
From: Yi Liu @ 2023-03-27  9:34 UTC (permalink / raw)
  To: alex.williamson, jgg, kevin.tian
  Cc: joro, robin.murphy, cohuck, eric.auger, nicolinc, kvm, mjrosato,
	chao.p.peng, yi.l.liu, yi.y.sun, peterx, jasowang,
	shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	intel-gvt-dev, intel-gfx, linux-s390, xudong.hao, yan.y.zhao,
	terrence.xu, yanting.jiang

No functional change is intended.

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 drivers/vfio/pci/vfio_pci_core.c | 52 ++++++++++++++++----------------
 1 file changed, 26 insertions(+), 26 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
index 2a510b71edcb..da6325008872 100644
--- a/drivers/vfio/pci/vfio_pci_core.c
+++ b/drivers/vfio/pci/vfio_pci_core.c
@@ -177,10 +177,10 @@ static void vfio_pci_probe_mmaps(struct vfio_pci_core_device *vdev)
 	}
 }
 
-struct vfio_pci_group_info;
+struct vfio_pci_file_info;
 static void vfio_pci_dev_set_try_reset(struct vfio_device_set *dev_set);
 static int vfio_pci_dev_set_hot_reset(struct vfio_device_set *dev_set,
-				      struct vfio_pci_group_info *groups,
+				      struct vfio_pci_file_info *info,
 				      struct iommufd_ctx *iommufd_ctx);
 
 /*
@@ -800,7 +800,7 @@ static int vfio_pci_fill_devs(struct pci_dev *pdev, void *data)
 	return 0;
 }
 
-struct vfio_pci_group_info {
+struct vfio_pci_file_info {
 	int count;
 	struct file **files;
 };
@@ -1257,14 +1257,14 @@ static int vfio_pci_ioctl_get_pci_hot_reset_info(
 }
 
 static int
-vfio_pci_ioctl_pci_hot_reset_groups(struct vfio_pci_core_device *vdev,
-				    struct vfio_pci_hot_reset *hdr,
-				    bool slot,
-				    struct vfio_pci_hot_reset __user *arg)
+vfio_pci_ioctl_pci_hot_reset_files(struct vfio_pci_core_device *vdev,
+				   struct vfio_pci_hot_reset *hdr,
+				   bool slot,
+				   struct vfio_pci_hot_reset __user *arg)
 {
-	int32_t *group_fds;
+	int32_t *fds;
 	struct file **files;
-	struct vfio_pci_group_info info;
+	struct vfio_pci_file_info info;
 	int file_idx, count = 0, ret = 0;
 
 	/*
@@ -1281,17 +1281,17 @@ vfio_pci_ioctl_pci_hot_reset_groups(struct vfio_pci_core_device *vdev,
 	if (hdr->count > count)
 		return -EINVAL;
 
-	group_fds = kcalloc(hdr->count, sizeof(*group_fds), GFP_KERNEL);
+	fds = kcalloc(hdr->count, sizeof(*fds), GFP_KERNEL);
 	files = kcalloc(hdr->count, sizeof(*files), GFP_KERNEL);
-	if (!group_fds || !files) {
-		kfree(group_fds);
+	if (!fds || !files) {
+		kfree(fds);
 		kfree(files);
 		return -ENOMEM;
 	}
 
-	if (copy_from_user(group_fds, arg->group_fds,
-			   hdr->count * sizeof(*group_fds))) {
-		kfree(group_fds);
+	if (copy_from_user(fds, arg->group_fds,
+			   hdr->count * sizeof(*fds))) {
+		kfree(fds);
 		kfree(files);
 		return -EFAULT;
 	}
@@ -1301,7 +1301,7 @@ vfio_pci_ioctl_pci_hot_reset_groups(struct vfio_pci_core_device *vdev,
 	 * the reset
 	 */
 	for (file_idx = 0; file_idx < hdr->count; file_idx++) {
-		struct file *file = fget(group_fds[file_idx]);
+		struct file *file = fget(fds[file_idx]);
 
 		if (!file) {
 			ret = -EBADF;
@@ -1318,9 +1318,9 @@ vfio_pci_ioctl_pci_hot_reset_groups(struct vfio_pci_core_device *vdev,
 		files[file_idx] = file;
 	}
 
-	kfree(group_fds);
+	kfree(fds);
 
-	/* release reference to groups on error */
+	/* release reference to fds on error */
 	if (ret)
 		goto hot_reset_release;
 
@@ -1358,7 +1358,7 @@ static int vfio_pci_ioctl_pci_hot_reset(struct vfio_pci_core_device *vdev,
 		return -ENODEV;
 
 	if (hdr.count)
-		return vfio_pci_ioctl_pci_hot_reset_groups(vdev, &hdr, slot, arg);
+		return vfio_pci_ioctl_pci_hot_reset_files(vdev, &hdr, slot, arg);
 
 	iommufd = vfio_iommufd_physical_ictx(&vdev->vdev);
 
@@ -2329,16 +2329,16 @@ const struct pci_error_handlers vfio_pci_core_err_handlers = {
 };
 EXPORT_SYMBOL_GPL(vfio_pci_core_err_handlers);
 
-static bool vfio_dev_in_groups(struct vfio_pci_core_device *vdev,
-			       struct vfio_pci_group_info *groups)
+static bool vfio_dev_in_files(struct vfio_pci_core_device *vdev,
+			      struct vfio_pci_file_info *info)
 {
 	unsigned int i;
 
-	if (!groups)
+	if (!info)
 		return false;
 
-	for (i = 0; i < groups->count; i++)
-		if (vfio_file_has_dev(groups->files[i], &vdev->vdev))
+	for (i = 0; i < info->count; i++)
+		if (vfio_file_has_dev(info->files[i], &vdev->vdev))
 			return true;
 	return false;
 }
@@ -2429,7 +2429,7 @@ static bool vfio_dev_in_iommufd_ctx(struct vfio_pci_core_device *vdev,
  * get each memory_lock.
  */
 static int vfio_pci_dev_set_hot_reset(struct vfio_device_set *dev_set,
-				      struct vfio_pci_group_info *groups,
+				      struct vfio_pci_file_info *info,
 				      struct iommufd_ctx *iommufd_ctx)
 {
 	struct vfio_pci_core_device *cur_mem;
@@ -2478,7 +2478,7 @@ static int vfio_pci_dev_set_hot_reset(struct vfio_device_set *dev_set,
 		 * the calling device is in a singleton dev_set.
 		 */
 		if (cur_vma->vdev.open_count &&
-		    !vfio_dev_in_groups(cur_vma, groups) &&
+		    !vfio_dev_in_files(cur_vma, info) &&
 		    !vfio_dev_in_iommufd_ctx(cur_vma, iommufd_ctx) &&
 		    (dev_set->device_count > 1)) {
 			ret = -EINVAL;
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [PATCH v2 09/10] vfio/pci: Accept device fd in VFIO_DEVICE_PCI_HOT_RESET ioctl
  2023-03-27  9:34 [PATCH v2 00/10] Introduce new methods for verifying ownership in vfio PCI hot reset Yi Liu
                   ` (7 preceding siblings ...)
  2023-03-27  9:34 ` [PATCH v2 08/10] vfio/pci: Renaming for accepting device fd in " Yi Liu
@ 2023-03-27  9:34 ` Yi Liu
  2023-03-30 23:50   ` Jason Gunthorpe
  2023-03-27  9:34 ` [PATCH v2 10/10] vfio/pci: Add VFIO_DEVICE_GET_PCI_HOT_RESET_GROUP_INFO Yi Liu
                   ` (3 subsequent siblings)
  12 siblings, 1 reply; 56+ messages in thread
From: Yi Liu @ 2023-03-27  9:34 UTC (permalink / raw)
  To: alex.williamson, jgg, kevin.tian
  Cc: joro, robin.murphy, cohuck, eric.auger, nicolinc, kvm, mjrosato,
	chao.p.peng, yi.l.liu, yi.y.sun, peterx, jasowang,
	shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	intel-gvt-dev, intel-gfx, linux-s390, xudong.hao, yan.y.zhao,
	terrence.xu, yanting.jiang

Now user can also provide an array of device fds as a 3rd method to verify
the reset ownership. It's not useful at this point when the device fds are
acquired via group fds. But it's necessary when moving to device cdev which
allows the user to directly acquire device fds by skipping group. In that
case this method can be used as a last resort when the preferred iommufd
verification doesn't work, e.g. in noiommu usages.

Clarify it in uAPI.

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 drivers/vfio/pci/vfio_pci_core.c | 9 +++++----
 include/uapi/linux/vfio.h        | 3 ++-
 2 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
index da6325008872..19f5b075d70a 100644
--- a/drivers/vfio/pci/vfio_pci_core.c
+++ b/drivers/vfio/pci/vfio_pci_core.c
@@ -1289,7 +1289,7 @@ vfio_pci_ioctl_pci_hot_reset_files(struct vfio_pci_core_device *vdev,
 		return -ENOMEM;
 	}
 
-	if (copy_from_user(fds, arg->group_fds,
+	if (copy_from_user(fds, arg->fds,
 			   hdr->count * sizeof(*fds))) {
 		kfree(fds);
 		kfree(files);
@@ -1297,8 +1297,8 @@ vfio_pci_ioctl_pci_hot_reset_files(struct vfio_pci_core_device *vdev,
 	}
 
 	/*
-	 * Get the group file for each fd to ensure the group held across
-	 * the reset
+	 * Get the file for each fd to ensure the group/device file
+	 * is held across the reset
 	 */
 	for (file_idx = 0; file_idx < hdr->count; file_idx++) {
 		struct file *file = fget(fds[file_idx]);
@@ -2469,7 +2469,8 @@ static int vfio_pci_dev_set_hot_reset(struct vfio_device_set *dev_set,
 		 * cannot race being opened by another user simultaneously.
 		 *
 		 * Otherwise all opened devices in the dev_set must be
-		 * contained by the set of groups provided by the user.
+		 * contained by the set of groups/devices provided by
+		 * the user.
 		 *
 		 * If user provides a zero-length array, then all the
 		 * opened devices must be bound to a same iommufd_ctx.
diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index 17aa5d09db41..25432ef213ee 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -681,6 +681,7 @@ struct vfio_pci_hot_reset_info {
  *
  * The ownership can be proved by:
  *   - An array of group fds
+ *   - An array of device fds
  *   - A zero-length array
  *
  * In the last case all affected devices which are opened by this user
@@ -694,7 +695,7 @@ struct vfio_pci_hot_reset {
 	__u32	argsz;
 	__u32	flags;
 	__u32	count;
-	__s32	group_fds[];
+	__s32	fds[];
 };
 
 #define VFIO_DEVICE_PCI_HOT_RESET	_IO(VFIO_TYPE, VFIO_BASE + 13)
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [PATCH v2 10/10] vfio/pci: Add VFIO_DEVICE_GET_PCI_HOT_RESET_GROUP_INFO
  2023-03-27  9:34 [PATCH v2 00/10] Introduce new methods for verifying ownership in vfio PCI hot reset Yi Liu
                   ` (8 preceding siblings ...)
  2023-03-27  9:34 ` [PATCH v2 09/10] vfio/pci: Accept device fd in VFIO_DEVICE_PCI_HOT_RESET ioctl Yi Liu
@ 2023-03-27  9:34 ` Yi Liu
  2023-03-27 19:26   ` Alex Williamson
  2023-03-28 12:40   ` Jason Gunthorpe
  2023-03-31  3:14 ` [PATCH v2 00/10] Introduce new methods for verifying ownership in vfio PCI hot reset Jiang, Yanting
                   ` (2 subsequent siblings)
  12 siblings, 2 replies; 56+ messages in thread
From: Yi Liu @ 2023-03-27  9:34 UTC (permalink / raw)
  To: alex.williamson, jgg, kevin.tian
  Cc: joro, robin.murphy, cohuck, eric.auger, nicolinc, kvm, mjrosato,
	chao.p.peng, yi.l.liu, yi.y.sun, peterx, jasowang,
	shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	intel-gvt-dev, intel-gfx, linux-s390, xudong.hao, yan.y.zhao,
	terrence.xu, yanting.jiang

This is a preparation for vfio device cdev as cdev gives userspace the
capability to open device cdev fd and management stack (e.g. libvirt)
could pass the device fd to the actual user (e.g. QEMU). As a result,
the actual user has no idea about the device's bus:devfn information.
This is a problem when user uses VFIO_DEVICE_GET_PCI_HOT_RESET_INFO to
know the hot reset affected device scope as this ioctl returns bus:devfn
info. For the fd passing usage, the acutal user cannot map the bus:devfn
to the devices it has opened via the fd passed from management stack. So
a new ioctl is required.

This new ioctl reports the list of iommufd dev_id that is opened by the
user. If there is affected device that is not bound to vfio driver or
opened by another user, this command shall fail with -EPERM. For the
noiommu mode in the vfio device cdev path, this shall fail as no dev_id
would be generated, hence nothing to report.

This ioctl is useless to the users that open vfio group as such users
have no idea about the iommufd dev_id and it can use the existing
VFIO_DEVICE_GET_PCI_HOT_RESET_INFO. The user that uses the traditional
mode vfio group/container would be failed if invoking this ioctl. But
the user that uses the iommufd compat mode vfio group/container shall
succeed. This is harmless as long as user cannot make use of it and
should use VFIO_DEVICE_GET_PCI_HOT_RESET_INFO.

Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 drivers/vfio/pci/vfio_pci_core.c | 98 ++++++++++++++++++++++++++++++++
 include/uapi/linux/vfio.h        | 47 +++++++++++++++
 2 files changed, 145 insertions(+)

diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
index 19f5b075d70a..45edf4e9b98b 100644
--- a/drivers/vfio/pci/vfio_pci_core.c
+++ b/drivers/vfio/pci/vfio_pci_core.c
@@ -1181,6 +1181,102 @@ static int vfio_pci_ioctl_reset(struct vfio_pci_core_device *vdev,
 	return ret;
 }
 
+static struct pci_dev *
+vfio_pci_dev_set_resettable(struct vfio_device_set *dev_set);
+
+static int vfio_pci_ioctl_get_pci_hot_reset_group_info(
+	struct vfio_pci_core_device *vdev,
+	struct vfio_pci_hot_reset_group_info __user *arg)
+{
+	unsigned long minsz =
+		offsetofend(struct vfio_pci_hot_reset_group_info, count);
+	struct vfio_pci_hot_reset_group_info hdr;
+	struct iommufd_ctx *iommufd, *cur_iommufd;
+	u32 count = 0, index = 0, *devices = NULL;
+	struct vfio_pci_core_device *cur;
+	bool slot = false;
+	int ret = 0;
+
+	if (copy_from_user(&hdr, arg, minsz))
+		return -EFAULT;
+
+	if (hdr.argsz < minsz)
+		return -EINVAL;
+
+	hdr.flags = 0;
+
+	/* Can we do a slot or bus reset or neither? */
+	if (!pci_probe_reset_slot(vdev->pdev->slot))
+		slot = true;
+	else if (pci_probe_reset_bus(vdev->pdev->bus))
+		return -ENODEV;
+
+	mutex_lock(&vdev->vdev.dev_set->lock);
+	if (!vfio_pci_dev_set_resettable(vdev->vdev.dev_set)) {
+		ret = -EPERM;
+		goto out_unlock;
+	}
+
+	iommufd = vfio_iommufd_physical_ictx(&vdev->vdev);
+	if (!iommufd) {
+		ret = -EPERM;
+		goto out_unlock;
+	}
+
+	/* How many devices are affected? */
+	ret = vfio_pci_for_each_slot_or_bus(vdev->pdev, vfio_pci_count_devs,
+					    &count, slot);
+	if (ret)
+		goto out_unlock;
+
+	WARN_ON(!count); /* Should always be at least one */
+
+	/*
+	 * If there's enough space, fill it now, otherwise return -ENOSPC and
+	 * the number of devices affected.
+	 */
+	if (hdr.argsz < sizeof(hdr) + (count * sizeof(*devices))) {
+		ret = -ENOSPC;
+		hdr.count = count;
+		goto reset_info_exit;
+	}
+
+	devices = kcalloc(count, sizeof(*devices), GFP_KERNEL);
+	if (!devices) {
+		ret = -ENOMEM;
+		goto reset_info_exit;
+	}
+
+	list_for_each_entry(cur, &vdev->vdev.dev_set->device_list, vdev.dev_set_list) {
+		cur_iommufd = vfio_iommufd_physical_ictx(&cur->vdev);
+		if (cur->vdev.open_count) {
+			if (cur_iommufd != iommufd) {
+				ret = -EPERM;
+				break;
+			}
+			ret = vfio_iommufd_physical_devid(&cur->vdev, &devices[index]);
+			if (ret)
+				break;
+			index++;
+		}
+	}
+
+reset_info_exit:
+	if (copy_to_user(arg, &hdr, minsz))
+		ret = -EFAULT;
+
+	if (!ret) {
+		if (copy_to_user(&arg->devices, devices,
+				 hdr.count * sizeof(*devices)))
+			ret = -EFAULT;
+	}
+
+	kfree(devices);
+out_unlock:
+	mutex_unlock(&vdev->vdev.dev_set->lock);
+	return ret;
+}
+
 static int vfio_pci_ioctl_get_pci_hot_reset_info(
 	struct vfio_pci_core_device *vdev,
 	struct vfio_pci_hot_reset_info __user *arg)
@@ -1404,6 +1500,8 @@ long vfio_pci_core_ioctl(struct vfio_device *core_vdev, unsigned int cmd,
 		return vfio_pci_ioctl_get_irq_info(vdev, uarg);
 	case VFIO_DEVICE_GET_PCI_HOT_RESET_INFO:
 		return vfio_pci_ioctl_get_pci_hot_reset_info(vdev, uarg);
+	case VFIO_DEVICE_GET_PCI_HOT_RESET_GROUP_INFO:
+		return vfio_pci_ioctl_get_pci_hot_reset_group_info(vdev, uarg);
 	case VFIO_DEVICE_GET_REGION_INFO:
 		return vfio_pci_ioctl_get_region_info(vdev, uarg);
 	case VFIO_DEVICE_IOEVENTFD:
diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index 25432ef213ee..61b801dfd40b 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -669,6 +669,53 @@ struct vfio_pci_hot_reset_info {
 
 #define VFIO_DEVICE_GET_PCI_HOT_RESET_INFO	_IO(VFIO_TYPE, VFIO_BASE + 12)
 
+/**
+ * VFIO_DEVICE_GET_PCI_HOT_RESET_GROUP_INFO - _IOWR(VFIO_TYPE, VFIO_BASE + 12,
+ *						    struct vfio_pci_hot_reset_group_info)
+ *
+ * This is used in the vfio device cdev mode.  It returns the list of
+ * affected devices (represented by iommufd dev_id) when hot reset is
+ * issued on the current device with which this ioctl is invoked.  It
+ * only includes the devices that are opened by the current user in the
+ * time of this command is invoked.  This list may change when user opens
+ * new device or close opened device, hence user should re-invoke it
+ * after open/close devices.  This command has no guarantee on the result
+ * of VFIO_DEVICE_PCI_HOT_RESET since the not-opened affected device can
+ * be by other users in the window between the two ioctls.  If the affected
+ * devices are opened by multiple users, the VFIO_DEVICE_PCI_HOT_RESET
+ * shall fail, detail can check the description of VFIO_DEVICE_PCI_HOT_RESET.
+ *
+ * For the users that open vfio group/container, this ioctl is useless as
+ * they have no idea about the iommufd dev_id returned by this ioctl.  For
+ * the users of the traditional mode vfio group/container, this ioctl will
+ * fail as this mode does not use iommufd hence no dev_id to report back.
+ * For the users of the iommufd compat mode vfio group/container, this ioctl
+ * would succeed as this mode uses iommufd as container fd.  But such users
+ * still have no idea about the iommufd dev_id as the dev_id is only stored
+ * in kernel in this mode.  For the users of the vfio group/container, the
+ * VFIO_DEVICE_GET_PCI_HOT_RESET_INFO should be used to know the hot reset
+ * affected devices.
+ *
+ * Return: 0 on success, -errno on failure:
+ *	-enospc = insufficient buffer;
+ *	-enodev = unsupported for device;
+ *	-eperm = no permission for device, this error comes:
+ *		 - when there are affected devices that are opened but
+ *		   not bound to the same iommufd with the current device
+ *		   with which this ioctl is invoked,
+ *		 - there are affected devices that are not bound to vfio
+ *		   driver yet.
+ *		 - no valid iommufd is bound (e.g. noiommu mode)
+ */
+struct vfio_pci_hot_reset_group_info {
+	__u32	argsz;
+	__u32	flags;
+	__u32	count;
+	__u32	devices[];
+};
+
+#define VFIO_DEVICE_GET_PCI_HOT_RESET_GROUP_INFO	_IO(VFIO_TYPE, VFIO_BASE + 18)
+
 /**
  * VFIO_DEVICE_PCI_HOT_RESET - _IOW(VFIO_TYPE, VFIO_BASE + 13,
  *				    struct vfio_pci_hot_reset)
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 56+ messages in thread

* Re: [PATCH v2 10/10] vfio/pci: Add VFIO_DEVICE_GET_PCI_HOT_RESET_GROUP_INFO
  2023-03-27  9:34 ` [PATCH v2 10/10] vfio/pci: Add VFIO_DEVICE_GET_PCI_HOT_RESET_GROUP_INFO Yi Liu
@ 2023-03-27 19:26   ` Alex Williamson
  2023-03-27 20:40     ` Alex Williamson
  2023-03-28  3:32     ` Liu, Yi L
  2023-03-28 12:40   ` Jason Gunthorpe
  1 sibling, 2 replies; 56+ messages in thread
From: Alex Williamson @ 2023-03-27 19:26 UTC (permalink / raw)
  To: Yi Liu
  Cc: jgg, kevin.tian, joro, robin.murphy, cohuck, eric.auger,
	nicolinc, kvm, mjrosato, chao.p.peng, yi.y.sun, peterx, jasowang,
	shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	intel-gvt-dev, intel-gfx, linux-s390, xudong.hao, yan.y.zhao,
	terrence.xu, yanting.jiang

On Mon, 27 Mar 2023 02:34:58 -0700
Yi Liu <yi.l.liu@intel.com> wrote:

> This is a preparation for vfio device cdev as cdev gives userspace the
> capability to open device cdev fd and management stack (e.g. libvirt)
> could pass the device fd to the actual user (e.g. QEMU). As a result,
> the actual user has no idea about the device's bus:devfn information.
> This is a problem when user uses VFIO_DEVICE_GET_PCI_HOT_RESET_INFO to
> know the hot reset affected device scope as this ioctl returns bus:devfn
> info. For the fd passing usage, the acutal user cannot map the bus:devfn
> to the devices it has opened via the fd passed from management stack. So
> a new ioctl is required.
> 
> This new ioctl reports the list of iommufd dev_id that is opened by the
> user. If there is affected device that is not bound to vfio driver or
> opened by another user, this command shall fail with -EPERM. For the
> noiommu mode in the vfio device cdev path, this shall fail as no dev_id
> would be generated, hence nothing to report.
> 
> This ioctl is useless to the users that open vfio group as such users
> have no idea about the iommufd dev_id and it can use the existing
> VFIO_DEVICE_GET_PCI_HOT_RESET_INFO. The user that uses the traditional
> mode vfio group/container would be failed if invoking this ioctl. But
> the user that uses the iommufd compat mode vfio group/container shall
> succeed. This is harmless as long as user cannot make use of it and
> should use VFIO_DEVICE_GET_PCI_HOT_RESET_INFO.


So VFIO_DEVICE_GET_PCI_HOT_RESET_INFO reports a group and bdf, but
VFIO_DEVICE_GET_PCI_HOT_RESET_*GROUP*_INFO is meant for the non-group,
cdev use case and returns a dev_id rather than a group???

Additionally, VFIO_DEVICE_GET_PCI_HOT_RESET_INFO has a flags arg that
isn't used, why do we need a new ioctl vs defining
VFIO_PCI_HOT_RESET_FLAG_IOMMUFD_DEV_ID.  In fact, we could define
vfio_dependent_device as:

struct vfio_pci_dependent_device {
	union {
	        __u32   group_id;
		__u32	dev_id;
	}
        __u16   segment;
        __u8    bus;
        __u8    devfn;
};

If the user calls with the above flag, dev_id is valid, otherwise
group_id.  Perhaps segment:buus:devfn could still be filled in with a
NULL/invalid dev_id if the user doesn't have permissions for the device
so that debugging from userspace isn't so opaque.  Thanks,

Alex
 
> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> ---
>  drivers/vfio/pci/vfio_pci_core.c | 98 ++++++++++++++++++++++++++++++++
>  include/uapi/linux/vfio.h        | 47 +++++++++++++++
>  2 files changed, 145 insertions(+)
> 
> diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
> index 19f5b075d70a..45edf4e9b98b 100644
> --- a/drivers/vfio/pci/vfio_pci_core.c
> +++ b/drivers/vfio/pci/vfio_pci_core.c
> @@ -1181,6 +1181,102 @@ static int vfio_pci_ioctl_reset(struct vfio_pci_core_device *vdev,
>  	return ret;
>  }
>  
> +static struct pci_dev *
> +vfio_pci_dev_set_resettable(struct vfio_device_set *dev_set);
> +
> +static int vfio_pci_ioctl_get_pci_hot_reset_group_info(
> +	struct vfio_pci_core_device *vdev,
> +	struct vfio_pci_hot_reset_group_info __user *arg)
> +{
> +	unsigned long minsz =
> +		offsetofend(struct vfio_pci_hot_reset_group_info, count);
> +	struct vfio_pci_hot_reset_group_info hdr;
> +	struct iommufd_ctx *iommufd, *cur_iommufd;
> +	u32 count = 0, index = 0, *devices = NULL;
> +	struct vfio_pci_core_device *cur;
> +	bool slot = false;
> +	int ret = 0;
> +
> +	if (copy_from_user(&hdr, arg, minsz))
> +		return -EFAULT;
> +
> +	if (hdr.argsz < minsz)
> +		return -EINVAL;
> +
> +	hdr.flags = 0;
> +
> +	/* Can we do a slot or bus reset or neither? */
> +	if (!pci_probe_reset_slot(vdev->pdev->slot))
> +		slot = true;
> +	else if (pci_probe_reset_bus(vdev->pdev->bus))
> +		return -ENODEV;
> +
> +	mutex_lock(&vdev->vdev.dev_set->lock);
> +	if (!vfio_pci_dev_set_resettable(vdev->vdev.dev_set)) {
> +		ret = -EPERM;
> +		goto out_unlock;
> +	}
> +
> +	iommufd = vfio_iommufd_physical_ictx(&vdev->vdev);
> +	if (!iommufd) {
> +		ret = -EPERM;
> +		goto out_unlock;
> +	}
> +
> +	/* How many devices are affected? */
> +	ret = vfio_pci_for_each_slot_or_bus(vdev->pdev, vfio_pci_count_devs,
> +					    &count, slot);
> +	if (ret)
> +		goto out_unlock;
> +
> +	WARN_ON(!count); /* Should always be at least one */
> +
> +	/*
> +	 * If there's enough space, fill it now, otherwise return -ENOSPC and
> +	 * the number of devices affected.
> +	 */
> +	if (hdr.argsz < sizeof(hdr) + (count * sizeof(*devices))) {
> +		ret = -ENOSPC;
> +		hdr.count = count;
> +		goto reset_info_exit;
> +	}
> +
> +	devices = kcalloc(count, sizeof(*devices), GFP_KERNEL);
> +	if (!devices) {
> +		ret = -ENOMEM;
> +		goto reset_info_exit;
> +	}
> +
> +	list_for_each_entry(cur, &vdev->vdev.dev_set->device_list, vdev.dev_set_list) {
> +		cur_iommufd = vfio_iommufd_physical_ictx(&cur->vdev);
> +		if (cur->vdev.open_count) {
> +			if (cur_iommufd != iommufd) {
> +				ret = -EPERM;
> +				break;
> +			}
> +			ret = vfio_iommufd_physical_devid(&cur->vdev, &devices[index]);
> +			if (ret)
> +				break;
> +			index++;
> +		}
> +	}
> +
> +reset_info_exit:
> +	if (copy_to_user(arg, &hdr, minsz))
> +		ret = -EFAULT;
> +
> +	if (!ret) {
> +		if (copy_to_user(&arg->devices, devices,
> +				 hdr.count * sizeof(*devices)))
> +			ret = -EFAULT;
> +	}
> +
> +	kfree(devices);
> +out_unlock:
> +	mutex_unlock(&vdev->vdev.dev_set->lock);
> +	return ret;
> +}
> +
>  static int vfio_pci_ioctl_get_pci_hot_reset_info(
>  	struct vfio_pci_core_device *vdev,
>  	struct vfio_pci_hot_reset_info __user *arg)
> @@ -1404,6 +1500,8 @@ long vfio_pci_core_ioctl(struct vfio_device *core_vdev, unsigned int cmd,
>  		return vfio_pci_ioctl_get_irq_info(vdev, uarg);
>  	case VFIO_DEVICE_GET_PCI_HOT_RESET_INFO:
>  		return vfio_pci_ioctl_get_pci_hot_reset_info(vdev, uarg);
> +	case VFIO_DEVICE_GET_PCI_HOT_RESET_GROUP_INFO:
> +		return vfio_pci_ioctl_get_pci_hot_reset_group_info(vdev, uarg);
>  	case VFIO_DEVICE_GET_REGION_INFO:
>  		return vfio_pci_ioctl_get_region_info(vdev, uarg);
>  	case VFIO_DEVICE_IOEVENTFD:
> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> index 25432ef213ee..61b801dfd40b 100644
> --- a/include/uapi/linux/vfio.h
> +++ b/include/uapi/linux/vfio.h
> @@ -669,6 +669,53 @@ struct vfio_pci_hot_reset_info {
>  
>  #define VFIO_DEVICE_GET_PCI_HOT_RESET_INFO	_IO(VFIO_TYPE, VFIO_BASE + 12)
>  
> +/**
> + * VFIO_DEVICE_GET_PCI_HOT_RESET_GROUP_INFO - _IOWR(VFIO_TYPE, VFIO_BASE + 12,
> + *						    struct vfio_pci_hot_reset_group_info)
> + *
> + * This is used in the vfio device cdev mode.  It returns the list of
> + * affected devices (represented by iommufd dev_id) when hot reset is
> + * issued on the current device with which this ioctl is invoked.  It
> + * only includes the devices that are opened by the current user in the
> + * time of this command is invoked.  This list may change when user opens
> + * new device or close opened device, hence user should re-invoke it
> + * after open/close devices.  This command has no guarantee on the result
> + * of VFIO_DEVICE_PCI_HOT_RESET since the not-opened affected device can
> + * be by other users in the window between the two ioctls.  If the affected
> + * devices are opened by multiple users, the VFIO_DEVICE_PCI_HOT_RESET
> + * shall fail, detail can check the description of VFIO_DEVICE_PCI_HOT_RESET.
> + *
> + * For the users that open vfio group/container, this ioctl is useless as
> + * they have no idea about the iommufd dev_id returned by this ioctl.  For
> + * the users of the traditional mode vfio group/container, this ioctl will
> + * fail as this mode does not use iommufd hence no dev_id to report back.
> + * For the users of the iommufd compat mode vfio group/container, this ioctl
> + * would succeed as this mode uses iommufd as container fd.  But such users
> + * still have no idea about the iommufd dev_id as the dev_id is only stored
> + * in kernel in this mode.  For the users of the vfio group/container, the
> + * VFIO_DEVICE_GET_PCI_HOT_RESET_INFO should be used to know the hot reset
> + * affected devices.
> + *
> + * Return: 0 on success, -errno on failure:
> + *	-enospc = insufficient buffer;
> + *	-enodev = unsupported for device;
> + *	-eperm = no permission for device, this error comes:
> + *		 - when there are affected devices that are opened but
> + *		   not bound to the same iommufd with the current device
> + *		   with which this ioctl is invoked,
> + *		 - there are affected devices that are not bound to vfio
> + *		   driver yet.
> + *		 - no valid iommufd is bound (e.g. noiommu mode)
> + */
> +struct vfio_pci_hot_reset_group_info {
> +	__u32	argsz;
> +	__u32	flags;
> +	__u32	count;
> +	__u32	devices[];
> +};
> +
> +#define VFIO_DEVICE_GET_PCI_HOT_RESET_GROUP_INFO	_IO(VFIO_TYPE, VFIO_BASE + 18)
> +
>  /**
>   * VFIO_DEVICE_PCI_HOT_RESET - _IOW(VFIO_TYPE, VFIO_BASE + 13,
>   *				    struct vfio_pci_hot_reset)


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v2 10/10] vfio/pci: Add VFIO_DEVICE_GET_PCI_HOT_RESET_GROUP_INFO
  2023-03-27 19:26   ` Alex Williamson
@ 2023-03-27 20:40     ` Alex Williamson
  2023-03-28  3:45       ` Liu, Yi L
  2023-03-28  3:32     ` Liu, Yi L
  1 sibling, 1 reply; 56+ messages in thread
From: Alex Williamson @ 2023-03-27 20:40 UTC (permalink / raw)
  To: Yi Liu
  Cc: jgg, kevin.tian, joro, robin.murphy, cohuck, eric.auger,
	nicolinc, kvm, mjrosato, chao.p.peng, yi.y.sun, peterx, jasowang,
	shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	intel-gvt-dev, intel-gfx, linux-s390, xudong.hao, yan.y.zhao,
	terrence.xu, yanting.jiang

On Mon, 27 Mar 2023 13:26:19 -0600
Alex Williamson <alex.williamson@redhat.com> wrote:

> On Mon, 27 Mar 2023 02:34:58 -0700
> Yi Liu <yi.l.liu@intel.com> wrote:
> 
> > This is a preparation for vfio device cdev as cdev gives userspace the
> > capability to open device cdev fd and management stack (e.g. libvirt)
> > could pass the device fd to the actual user (e.g. QEMU). As a result,
> > the actual user has no idea about the device's bus:devfn information.
> > This is a problem when user uses VFIO_DEVICE_GET_PCI_HOT_RESET_INFO to
> > know the hot reset affected device scope as this ioctl returns bus:devfn
> > info. For the fd passing usage, the acutal user cannot map the bus:devfn
> > to the devices it has opened via the fd passed from management stack. So
> > a new ioctl is required.
> > 
> > This new ioctl reports the list of iommufd dev_id that is opened by the
> > user. If there is affected device that is not bound to vfio driver or
> > opened by another user, this command shall fail with -EPERM. For the
> > noiommu mode in the vfio device cdev path, this shall fail as no dev_id
> > would be generated, hence nothing to report.
> > 
> > This ioctl is useless to the users that open vfio group as such users
> > have no idea about the iommufd dev_id and it can use the existing
> > VFIO_DEVICE_GET_PCI_HOT_RESET_INFO. The user that uses the traditional
> > mode vfio group/container would be failed if invoking this ioctl. But
> > the user that uses the iommufd compat mode vfio group/container shall
> > succeed. This is harmless as long as user cannot make use of it and
> > should use VFIO_DEVICE_GET_PCI_HOT_RESET_INFO.  
> 
> 
> So VFIO_DEVICE_GET_PCI_HOT_RESET_INFO reports a group and bdf, but
> VFIO_DEVICE_GET_PCI_HOT_RESET_*GROUP*_INFO is meant for the non-group,
> cdev use case and returns a dev_id rather than a group???
> 
> Additionally, VFIO_DEVICE_GET_PCI_HOT_RESET_INFO has a flags arg that
> isn't used, why do we need a new ioctl vs defining
> VFIO_PCI_HOT_RESET_FLAG_IOMMUFD_DEV_ID.  In fact, we could define
> vfio_dependent_device as:
> 
> struct vfio_pci_dependent_device {
> 	union {
> 	        __u32   group_id;
> 		__u32	dev_id;
> 	}
>         __u16   segment;
>         __u8    bus;
>         __u8    devfn;
> };
> 
> If the user calls with the above flag, dev_id is valid, otherwise
> group_id.  Perhaps segment:buus:devfn could still be filled in with a
> NULL/invalid dev_id if the user doesn't have permissions for the device
> so that debugging from userspace isn't so opaque.  Thanks,
> 
> Alex
>  
> > Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> > ---
> >  drivers/vfio/pci/vfio_pci_core.c | 98 ++++++++++++++++++++++++++++++++
> >  include/uapi/linux/vfio.h        | 47 +++++++++++++++
> >  2 files changed, 145 insertions(+)
> > 
> > diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
> > index 19f5b075d70a..45edf4e9b98b 100644
> > --- a/drivers/vfio/pci/vfio_pci_core.c
> > +++ b/drivers/vfio/pci/vfio_pci_core.c
> > @@ -1181,6 +1181,102 @@ static int vfio_pci_ioctl_reset(struct vfio_pci_core_device *vdev,
> >  	return ret;
> >  }
> >  
> > +static struct pci_dev *
> > +vfio_pci_dev_set_resettable(struct vfio_device_set *dev_set);
> > +
> > +static int vfio_pci_ioctl_get_pci_hot_reset_group_info(
> > +	struct vfio_pci_core_device *vdev,
> > +	struct vfio_pci_hot_reset_group_info __user *arg)
> > +{
> > +	unsigned long minsz =
> > +		offsetofend(struct vfio_pci_hot_reset_group_info, count);
> > +	struct vfio_pci_hot_reset_group_info hdr;
> > +	struct iommufd_ctx *iommufd, *cur_iommufd;
> > +	u32 count = 0, index = 0, *devices = NULL;
> > +	struct vfio_pci_core_device *cur;
> > +	bool slot = false;
> > +	int ret = 0;
> > +
> > +	if (copy_from_user(&hdr, arg, minsz))
> > +		return -EFAULT;
> > +
> > +	if (hdr.argsz < minsz)
> > +		return -EINVAL;
> > +
> > +	hdr.flags = 0;
> > +
> > +	/* Can we do a slot or bus reset or neither? */
> > +	if (!pci_probe_reset_slot(vdev->pdev->slot))
> > +		slot = true;
> > +	else if (pci_probe_reset_bus(vdev->pdev->bus))
> > +		return -ENODEV;
> > +
> > +	mutex_lock(&vdev->vdev.dev_set->lock);
> > +	if (!vfio_pci_dev_set_resettable(vdev->vdev.dev_set)) {
> > +		ret = -EPERM;
> > +		goto out_unlock;
> > +	}
> > +
> > +	iommufd = vfio_iommufd_physical_ictx(&vdev->vdev);
> > +	if (!iommufd) {
> > +		ret = -EPERM;
> > +		goto out_unlock;
> > +	}
> > +
> > +	/* How many devices are affected? */
> > +	ret = vfio_pci_for_each_slot_or_bus(vdev->pdev, vfio_pci_count_devs,
> > +					    &count, slot);
> > +	if (ret)
> > +		goto out_unlock;
> > +
> > +	WARN_ON(!count); /* Should always be at least one */
> > +
> > +	/*
> > +	 * If there's enough space, fill it now, otherwise return -ENOSPC and
> > +	 * the number of devices affected.
> > +	 */
> > +	if (hdr.argsz < sizeof(hdr) + (count * sizeof(*devices))) {
> > +		ret = -ENOSPC;
> > +		hdr.count = count;
> > +		goto reset_info_exit;
> > +	}
> > +
> > +	devices = kcalloc(count, sizeof(*devices), GFP_KERNEL);
> > +	if (!devices) {
> > +		ret = -ENOMEM;
> > +		goto reset_info_exit;
> > +	}
> > +
> > +	list_for_each_entry(cur, &vdev->vdev.dev_set->device_list, vdev.dev_set_list) {
> > +		cur_iommufd = vfio_iommufd_physical_ictx(&cur->vdev);
> > +		if (cur->vdev.open_count) {
> > +			if (cur_iommufd != iommufd) {
> > +				ret = -EPERM;
> > +				break;
> > +			}
> > +			ret = vfio_iommufd_physical_devid(&cur->vdev, &devices[index]);
> > +			if (ret)
> > +				break;
> > +			index++;
> > +		}
> > +	}

This also makes use of vfio_pci_for_each_slot_or_bus() to iterate
affected devices, but then still assumes those affected devices are
necessarily represented in the dev_set.  For example, a two-function
device with ACS isolation can have function 0 bound to vfio and
function 1 bound to a native host driver.  The above code requires the
user to pass a buffer large enough for both functions, but only
function 0 is part of the dev_set, so function 1 is not reported as
affected, nor does it generate an error.

It looks like we also fail to set hdr.count except in the error path
above, so the below copy_to_user() copies a user specified range beyond
the end the allocated devices buffer out to userspace!  Thanks,

Alex

> > +
> > +reset_info_exit:
> > +	if (copy_to_user(arg, &hdr, minsz))
> > +		ret = -EFAULT;
> > +
> > +	if (!ret) {
> > +		if (copy_to_user(&arg->devices, devices,
> > +				 hdr.count * sizeof(*devices)))
> > +			ret = -EFAULT;
> > +	}
> > +
> > +	kfree(devices);
> > +out_unlock:
> > +	mutex_unlock(&vdev->vdev.dev_set->lock);
> > +	return ret;
> > +}
> > +
> >  static int vfio_pci_ioctl_get_pci_hot_reset_info(
> >  	struct vfio_pci_core_device *vdev,
> >  	struct vfio_pci_hot_reset_info __user *arg)
> > @@ -1404,6 +1500,8 @@ long vfio_pci_core_ioctl(struct vfio_device *core_vdev, unsigned int cmd,
> >  		return vfio_pci_ioctl_get_irq_info(vdev, uarg);
> >  	case VFIO_DEVICE_GET_PCI_HOT_RESET_INFO:
> >  		return vfio_pci_ioctl_get_pci_hot_reset_info(vdev, uarg);
> > +	case VFIO_DEVICE_GET_PCI_HOT_RESET_GROUP_INFO:
> > +		return vfio_pci_ioctl_get_pci_hot_reset_group_info(vdev, uarg);
> >  	case VFIO_DEVICE_GET_REGION_INFO:
> >  		return vfio_pci_ioctl_get_region_info(vdev, uarg);
> >  	case VFIO_DEVICE_IOEVENTFD:
> > diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> > index 25432ef213ee..61b801dfd40b 100644
> > --- a/include/uapi/linux/vfio.h
> > +++ b/include/uapi/linux/vfio.h
> > @@ -669,6 +669,53 @@ struct vfio_pci_hot_reset_info {
> >  
> >  #define VFIO_DEVICE_GET_PCI_HOT_RESET_INFO	_IO(VFIO_TYPE, VFIO_BASE + 12)
> >  
> > +/**
> > + * VFIO_DEVICE_GET_PCI_HOT_RESET_GROUP_INFO - _IOWR(VFIO_TYPE, VFIO_BASE + 12,
> > + *						    struct vfio_pci_hot_reset_group_info)
> > + *
> > + * This is used in the vfio device cdev mode.  It returns the list of
> > + * affected devices (represented by iommufd dev_id) when hot reset is
> > + * issued on the current device with which this ioctl is invoked.  It
> > + * only includes the devices that are opened by the current user in the
> > + * time of this command is invoked.  This list may change when user opens
> > + * new device or close opened device, hence user should re-invoke it
> > + * after open/close devices.  This command has no guarantee on the result
> > + * of VFIO_DEVICE_PCI_HOT_RESET since the not-opened affected device can
> > + * be by other users in the window between the two ioctls.  If the affected
> > + * devices are opened by multiple users, the VFIO_DEVICE_PCI_HOT_RESET
> > + * shall fail, detail can check the description of VFIO_DEVICE_PCI_HOT_RESET.
> > + *
> > + * For the users that open vfio group/container, this ioctl is useless as
> > + * they have no idea about the iommufd dev_id returned by this ioctl.  For
> > + * the users of the traditional mode vfio group/container, this ioctl will
> > + * fail as this mode does not use iommufd hence no dev_id to report back.
> > + * For the users of the iommufd compat mode vfio group/container, this ioctl
> > + * would succeed as this mode uses iommufd as container fd.  But such users
> > + * still have no idea about the iommufd dev_id as the dev_id is only stored
> > + * in kernel in this mode.  For the users of the vfio group/container, the
> > + * VFIO_DEVICE_GET_PCI_HOT_RESET_INFO should be used to know the hot reset
> > + * affected devices.
> > + *
> > + * Return: 0 on success, -errno on failure:
> > + *	-enospc = insufficient buffer;
> > + *	-enodev = unsupported for device;
> > + *	-eperm = no permission for device, this error comes:
> > + *		 - when there are affected devices that are opened but
> > + *		   not bound to the same iommufd with the current device
> > + *		   with which this ioctl is invoked,
> > + *		 - there are affected devices that are not bound to vfio
> > + *		   driver yet.
> > + *		 - no valid iommufd is bound (e.g. noiommu mode)
> > + */
> > +struct vfio_pci_hot_reset_group_info {
> > +	__u32	argsz;
> > +	__u32	flags;
> > +	__u32	count;
> > +	__u32	devices[];
> > +};
> > +
> > +#define VFIO_DEVICE_GET_PCI_HOT_RESET_GROUP_INFO	_IO(VFIO_TYPE, VFIO_BASE + 18)
> > +
> >  /**
> >   * VFIO_DEVICE_PCI_HOT_RESET - _IOW(VFIO_TYPE, VFIO_BASE + 13,
> >   *				    struct vfio_pci_hot_reset)  
> 


^ permalink raw reply	[flat|nested] 56+ messages in thread

* RE: [PATCH v2 10/10] vfio/pci: Add VFIO_DEVICE_GET_PCI_HOT_RESET_GROUP_INFO
  2023-03-27 19:26   ` Alex Williamson
  2023-03-27 20:40     ` Alex Williamson
@ 2023-03-28  3:32     ` Liu, Yi L
  2023-03-28  6:19       ` Tian, Kevin
  1 sibling, 1 reply; 56+ messages in thread
From: Liu, Yi L @ 2023-03-28  3:32 UTC (permalink / raw)
  To: Alex Williamson
  Cc: jgg, Tian, Kevin, joro, robin.murphy, cohuck, eric.auger,
	nicolinc, kvm, mjrosato, chao.p.peng, yi.y.sun, peterx, jasowang,
	shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	intel-gvt-dev, intel-gfx, linux-s390, Hao, Xudong, Zhao, Yan Y,
	Xu, Terrence, Jiang, Yanting

> From: Alex Williamson <alex.williamson@redhat.com>
> Sent: Tuesday, March 28, 2023 3:26 AM
> 
> On Mon, 27 Mar 2023 02:34:58 -0700
> Yi Liu <yi.l.liu@intel.com> wrote:
> 
> > This is a preparation for vfio device cdev as cdev gives userspace the
> > capability to open device cdev fd and management stack (e.g. libvirt)
> > could pass the device fd to the actual user (e.g. QEMU). As a result,
> > the actual user has no idea about the device's bus:devfn information.
> > This is a problem when user uses VFIO_DEVICE_GET_PCI_HOT_RESET_INFO to
> > know the hot reset affected device scope as this ioctl returns bus:devfn
> > info. For the fd passing usage, the acutal user cannot map the bus:devfn
> > to the devices it has opened via the fd passed from management stack. So
> > a new ioctl is required.
> >
> > This new ioctl reports the list of iommufd dev_id that is opened by the
> > user. If there is affected device that is not bound to vfio driver or
> > opened by another user, this command shall fail with -EPERM. For the
> > noiommu mode in the vfio device cdev path, this shall fail as no dev_id
> > would be generated, hence nothing to report.
> >
> > This ioctl is useless to the users that open vfio group as such users
> > have no idea about the iommufd dev_id and it can use the existing
> > VFIO_DEVICE_GET_PCI_HOT_RESET_INFO. The user that uses the traditional
> > mode vfio group/container would be failed if invoking this ioctl. But
> > the user that uses the iommufd compat mode vfio group/container shall
> > succeed. This is harmless as long as user cannot make use of it and
> > should use VFIO_DEVICE_GET_PCI_HOT_RESET_INFO.
> 
> 
> So VFIO_DEVICE_GET_PCI_HOT_RESET_INFO reports a group and bdf, but
> VFIO_DEVICE_GET_PCI_HOT_RESET_*GROUP*_INFO is meant for the non-
> group,
> cdev use case and returns a dev_id rather than a group???

Yes, this is the meaning, but poor naming here ☹ I also struggled on it.
Perhaps your below Suggestion makes more sense. Introducing a flag and
reuse the existing _INFO ioctl.

> Additionally, VFIO_DEVICE_GET_PCI_HOT_RESET_INFO has a flags arg that
> isn't used, why do we need a new ioctl vs defining
> VFIO_PCI_HOT_RESET_FLAG_IOMMUFD_DEV_ID.

Sure. I can follow this suggestion. BTW. I have a doubt here. This new flag
is set by user. What if in the future kernel has new extensions and needs
to report something new to the user and add new flags to tell user? Such
flag is set by kernel. Then the flags field may have two kinds of flags (some
set by user while some set by kernel). Will it mess up the flags space?

>  In fact, we could define vfio_dependent_device as:
> 
> struct vfio_pci_dependent_device {
> 	union {
> 	        __u32   group_id;
> 		__u32	dev_id;
> 	}
>         __u16   segment;
>         __u8    bus;
>         __u8    devfn;
> };
> 
> If the user calls with the above flag, dev_id is valid, otherwise
> group_id.  Perhaps segment:buus:devfn could still be filled in with a
> NULL/invalid dev_id if the user doesn't have permissions for the device
> so that debugging from userspace isn't so opaque.  Thanks,

Also, have one question here. Should the invalid dev_id be defined in
the vfio uapi or iommufd uapi? Maybe the latter one since dev_id is
generated by iommufd subsystem.

Regards,
Yi Liu
> 
> > Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> > ---
> >  drivers/vfio/pci/vfio_pci_core.c | 98
> ++++++++++++++++++++++++++++++++
> >  include/uapi/linux/vfio.h        | 47 +++++++++++++++
> >  2 files changed, 145 insertions(+)
> >
> > diff --git a/drivers/vfio/pci/vfio_pci_core.c
> b/drivers/vfio/pci/vfio_pci_core.c
> > index 19f5b075d70a..45edf4e9b98b 100644
> > --- a/drivers/vfio/pci/vfio_pci_core.c
> > +++ b/drivers/vfio/pci/vfio_pci_core.c
> > @@ -1181,6 +1181,102 @@ static int vfio_pci_ioctl_reset(struct
> vfio_pci_core_device *vdev,
> >  	return ret;
> >  }
> >
> > +static struct pci_dev *
> > +vfio_pci_dev_set_resettable(struct vfio_device_set *dev_set);
> > +
> > +static int vfio_pci_ioctl_get_pci_hot_reset_group_info(
> > +	struct vfio_pci_core_device *vdev,
> > +	struct vfio_pci_hot_reset_group_info __user *arg)
> > +{
> > +	unsigned long minsz =
> > +		offsetofend(struct vfio_pci_hot_reset_group_info, count);
> > +	struct vfio_pci_hot_reset_group_info hdr;
> > +	struct iommufd_ctx *iommufd, *cur_iommufd;
> > +	u32 count = 0, index = 0, *devices = NULL;
> > +	struct vfio_pci_core_device *cur;
> > +	bool slot = false;
> > +	int ret = 0;
> > +
> > +	if (copy_from_user(&hdr, arg, minsz))
> > +		return -EFAULT;
> > +
> > +	if (hdr.argsz < minsz)
> > +		return -EINVAL;
> > +
> > +	hdr.flags = 0;
> > +
> > +	/* Can we do a slot or bus reset or neither? */
> > +	if (!pci_probe_reset_slot(vdev->pdev->slot))
> > +		slot = true;
> > +	else if (pci_probe_reset_bus(vdev->pdev->bus))
> > +		return -ENODEV;
> > +
> > +	mutex_lock(&vdev->vdev.dev_set->lock);
> > +	if (!vfio_pci_dev_set_resettable(vdev->vdev.dev_set)) {
> > +		ret = -EPERM;
> > +		goto out_unlock;
> > +	}
> > +
> > +	iommufd = vfio_iommufd_physical_ictx(&vdev->vdev);
> > +	if (!iommufd) {
> > +		ret = -EPERM;
> > +		goto out_unlock;
> > +	}
> > +
> > +	/* How many devices are affected? */
> > +	ret = vfio_pci_for_each_slot_or_bus(vdev->pdev,
> vfio_pci_count_devs,
> > +					    &count, slot);
> > +	if (ret)
> > +		goto out_unlock;
> > +
> > +	WARN_ON(!count); /* Should always be at least one */
> > +
> > +	/*
> > +	 * If there's enough space, fill it now, otherwise return -ENOSPC and
> > +	 * the number of devices affected.
> > +	 */
> > +	if (hdr.argsz < sizeof(hdr) + (count * sizeof(*devices))) {
> > +		ret = -ENOSPC;
> > +		hdr.count = count;
> > +		goto reset_info_exit;
> > +	}
> > +
> > +	devices = kcalloc(count, sizeof(*devices), GFP_KERNEL);
> > +	if (!devices) {
> > +		ret = -ENOMEM;
> > +		goto reset_info_exit;
> > +	}
> > +
> > +	list_for_each_entry(cur, &vdev->vdev.dev_set->device_list,
> vdev.dev_set_list) {
> > +		cur_iommufd = vfio_iommufd_physical_ictx(&cur->vdev);
> > +		if (cur->vdev.open_count) {
> > +			if (cur_iommufd != iommufd) {
> > +				ret = -EPERM;
> > +				break;
> > +			}
> > +			ret = vfio_iommufd_physical_devid(&cur->vdev,
> &devices[index]);
> > +			if (ret)
> > +				break;
> > +			index++;
> > +		}
> > +	}
> > +
> > +reset_info_exit:
> > +	if (copy_to_user(arg, &hdr, minsz))
> > +		ret = -EFAULT;
> > +
> > +	if (!ret) {
> > +		if (copy_to_user(&arg->devices, devices,
> > +				 hdr.count * sizeof(*devices)))
> > +			ret = -EFAULT;
> > +	}
> > +
> > +	kfree(devices);
> > +out_unlock:
> > +	mutex_unlock(&vdev->vdev.dev_set->lock);
> > +	return ret;
> > +}
> > +
> >  static int vfio_pci_ioctl_get_pci_hot_reset_info(
> >  	struct vfio_pci_core_device *vdev,
> >  	struct vfio_pci_hot_reset_info __user *arg)
> > @@ -1404,6 +1500,8 @@ long vfio_pci_core_ioctl(struct vfio_device
> *core_vdev, unsigned int cmd,
> >  		return vfio_pci_ioctl_get_irq_info(vdev, uarg);
> >  	case VFIO_DEVICE_GET_PCI_HOT_RESET_INFO:
> >  		return vfio_pci_ioctl_get_pci_hot_reset_info(vdev, uarg);
> > +	case VFIO_DEVICE_GET_PCI_HOT_RESET_GROUP_INFO:
> > +		return vfio_pci_ioctl_get_pci_hot_reset_group_info(vdev,
> uarg);
> >  	case VFIO_DEVICE_GET_REGION_INFO:
> >  		return vfio_pci_ioctl_get_region_info(vdev, uarg);
> >  	case VFIO_DEVICE_IOEVENTFD:
> > diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> > index 25432ef213ee..61b801dfd40b 100644
> > --- a/include/uapi/linux/vfio.h
> > +++ b/include/uapi/linux/vfio.h
> > @@ -669,6 +669,53 @@ struct vfio_pci_hot_reset_info {
> >
> >  #define VFIO_DEVICE_GET_PCI_HOT_RESET_INFO	_IO(VFIO_TYPE,
> VFIO_BASE + 12)
> >
> > +/**
> > + * VFIO_DEVICE_GET_PCI_HOT_RESET_GROUP_INFO - _IOWR(VFIO_TYPE,
> VFIO_BASE + 12,
> > + *						    struct
> vfio_pci_hot_reset_group_info)
> > + *
> > + * This is used in the vfio device cdev mode.  It returns the list of
> > + * affected devices (represented by iommufd dev_id) when hot reset is
> > + * issued on the current device with which this ioctl is invoked.  It
> > + * only includes the devices that are opened by the current user in the
> > + * time of this command is invoked.  This list may change when user
> opens
> > + * new device or close opened device, hence user should re-invoke it
> > + * after open/close devices.  This command has no guarantee on the
> result
> > + * of VFIO_DEVICE_PCI_HOT_RESET since the not-opened affected device
> can
> > + * be by other users in the window between the two ioctls.  If the
> affected
> > + * devices are opened by multiple users, the
> VFIO_DEVICE_PCI_HOT_RESET
> > + * shall fail, detail can check the description of
> VFIO_DEVICE_PCI_HOT_RESET.
> > + *
> > + * For the users that open vfio group/container, this ioctl is useless as
> > + * they have no idea about the iommufd dev_id returned by this ioctl.
> For
> > + * the users of the traditional mode vfio group/container, this ioctl will
> > + * fail as this mode does not use iommufd hence no dev_id to report
> back.
> > + * For the users of the iommufd compat mode vfio group/container, this
> ioctl
> > + * would succeed as this mode uses iommufd as container fd.  But such
> users
> > + * still have no idea about the iommufd dev_id as the dev_id is only
> stored
> > + * in kernel in this mode.  For the users of the vfio group/container, the
> > + * VFIO_DEVICE_GET_PCI_HOT_RESET_INFO should be used to know the
> hot reset
> > + * affected devices.
> > + *
> > + * Return: 0 on success, -errno on failure:
> > + *	-enospc = insufficient buffer;
> > + *	-enodev = unsupported for device;
> > + *	-eperm = no permission for device, this error comes:
> > + *		 - when there are affected devices that are opened but
> > + *		   not bound to the same iommufd with the current device
> > + *		   with which this ioctl is invoked,
> > + *		 - there are affected devices that are not bound to vfio
> > + *		   driver yet.
> > + *		 - no valid iommufd is bound (e.g. noiommu mode)
> > + */
> > +struct vfio_pci_hot_reset_group_info {
> > +	__u32	argsz;
> > +	__u32	flags;
> > +	__u32	count;
> > +	__u32	devices[];
> > +};
> > +
> > +#define VFIO_DEVICE_GET_PCI_HOT_RESET_GROUP_INFO
> 	_IO(VFIO_TYPE, VFIO_BASE + 18)
> > +
> >  /**
> >   * VFIO_DEVICE_PCI_HOT_RESET - _IOW(VFIO_TYPE, VFIO_BASE + 13,
> >   *				    struct vfio_pci_hot_reset)


^ permalink raw reply	[flat|nested] 56+ messages in thread

* RE: [PATCH v2 10/10] vfio/pci: Add VFIO_DEVICE_GET_PCI_HOT_RESET_GROUP_INFO
  2023-03-27 20:40     ` Alex Williamson
@ 2023-03-28  3:45       ` Liu, Yi L
  0 siblings, 0 replies; 56+ messages in thread
From: Liu, Yi L @ 2023-03-28  3:45 UTC (permalink / raw)
  To: Alex Williamson
  Cc: jgg, Tian, Kevin, joro, robin.murphy, cohuck, eric.auger,
	nicolinc, kvm, mjrosato, chao.p.peng, yi.y.sun, peterx, jasowang,
	shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	intel-gvt-dev, intel-gfx, linux-s390, Hao, Xudong, Zhao, Yan Y,
	Xu, Terrence, Jiang, Yanting

> From: Alex Williamson <alex.williamson@redhat.com>
> Sent: Tuesday, March 28, 2023 4:41 AM
> 
> On Mon, 27 Mar 2023 13:26:19 -0600
> Alex Williamson <alex.williamson@redhat.com> wrote:
> 
[...]
> > > diff --git a/drivers/vfio/pci/vfio_pci_core.c
> b/drivers/vfio/pci/vfio_pci_core.c
> > > index 19f5b075d70a..45edf4e9b98b 100644
> > > --- a/drivers/vfio/pci/vfio_pci_core.c
> > > +++ b/drivers/vfio/pci/vfio_pci_core.c
> > > @@ -1181,6 +1181,102 @@ static int vfio_pci_ioctl_reset(struct
> vfio_pci_core_device *vdev,
> > >  	return ret;
> > >  }
> > >
> > > +static struct pci_dev *
> > > +vfio_pci_dev_set_resettable(struct vfio_device_set *dev_set);
> > > +
> > > +static int vfio_pci_ioctl_get_pci_hot_reset_group_info(
> > > +	struct vfio_pci_core_device *vdev,
> > > +	struct vfio_pci_hot_reset_group_info __user *arg)
> > > +{
> > > +	unsigned long minsz =
> > > +		offsetofend(struct vfio_pci_hot_reset_group_info, count);
> > > +	struct vfio_pci_hot_reset_group_info hdr;
> > > +	struct iommufd_ctx *iommufd, *cur_iommufd;
> > > +	u32 count = 0, index = 0, *devices = NULL;
> > > +	struct vfio_pci_core_device *cur;
> > > +	bool slot = false;
> > > +	int ret = 0;
> > > +
> > > +	if (copy_from_user(&hdr, arg, minsz))
> > > +		return -EFAULT;
> > > +
> > > +	if (hdr.argsz < minsz)
> > > +		return -EINVAL;
> > > +
> > > +	hdr.flags = 0;
> > > +
> > > +	/* Can we do a slot or bus reset or neither? */
> > > +	if (!pci_probe_reset_slot(vdev->pdev->slot))
> > > +		slot = true;
> > > +	else if (pci_probe_reset_bus(vdev->pdev->bus))
> > > +		return -ENODEV;
> > > +
> > > +	mutex_lock(&vdev->vdev.dev_set->lock);
> > > +	if (!vfio_pci_dev_set_resettable(vdev->vdev.dev_set)) {

[1]

> > > +		ret = -EPERM;
> > > +		goto out_unlock;
> > > +	}
> > > +
> > > +	iommufd = vfio_iommufd_physical_ictx(&vdev->vdev);
> > > +	if (!iommufd) {
> > > +		ret = -EPERM;
> > > +		goto out_unlock;
> > > +	}
> > > +
> > > +	/* How many devices are affected? */
> > > +	ret = vfio_pci_for_each_slot_or_bus(vdev->pdev,
> vfio_pci_count_devs,
> > > +					    &count, slot);
> > > +	if (ret)
> > > +		goto out_unlock;
> > > +
> > > +	WARN_ON(!count); /* Should always be at least one */
> > > +
> > > +	/*
> > > +	 * If there's enough space, fill it now, otherwise return -ENOSPC and
> > > +	 * the number of devices affected.
> > > +	 */
> > > +	if (hdr.argsz < sizeof(hdr) + (count * sizeof(*devices))) {
> > > +		ret = -ENOSPC;
> > > +		hdr.count = count;
> > > +		goto reset_info_exit;
> > > +	}
> > > +
> > > +	devices = kcalloc(count, sizeof(*devices), GFP_KERNEL);
> > > +	if (!devices) {
> > > +		ret = -ENOMEM;
> > > +		goto reset_info_exit;
> > > +	}
> > > +
> > > +	list_for_each_entry(cur, &vdev->vdev.dev_set->device_list,
> vdev.dev_set_list) {
> > > +		cur_iommufd = vfio_iommufd_physical_ictx(&cur->vdev);
> > > +		if (cur->vdev.open_count) {
> > > +			if (cur_iommufd != iommufd) {
> > > +				ret = -EPERM;
> > > +				break;
> > > +			}
> > > +			ret = vfio_iommufd_physical_devid(&cur->vdev,
> &devices[index]);
> > > +			if (ret)
> > > +				break;
> > > +			index++;
> > > +		}
> > > +	}
> 
> This also makes use of vfio_pci_for_each_slot_or_bus() to iterate
> affected devices, but then still assumes those affected devices are
> necessarily represented in the dev_set.  For example, a two-function
> device with ACS isolation can have function 0 bound to vfio and
> function 1 bound to a native host driver.  The above code requires the
> user to pass a buffer large enough for both functions, but only
> function 0 is part of the dev_set, so function 1 is not reported as
> affected, nor does it generate an error.

The vfio_pci_dev_set_resettable() is used at [1] to check if all the affected
devices are in the dev_set. If not, then it fails at the first place. So in the
following code, looping the devices in the dev_set is equivalent to looping
devices with vfio_pci_for_each_slot_or_bus(). I need to loop dev_set as
dev_set has the vfio_device which is more convenient to get dev_id.

> 
> It looks like we also fail to set hdr.count except in the error path
> above, so the below copy_to_user() copies a user specified range beyond
> the end the allocated devices buffer out to userspace!  Thanks,

Oops, yes, it is. My test only has one device affected, so it does not hit
the problem. ☹ 

Thanks,
Yi Liu

> Alex
> 
> > > +
> > > +reset_info_exit:
> > > +	if (copy_to_user(arg, &hdr, minsz))
> > > +		ret = -EFAULT;
> > > +
> > > +	if (!ret) {
> > > +		if (copy_to_user(&arg->devices, devices,
> > > +				 hdr.count * sizeof(*devices)))
> > > +			ret = -EFAULT;
> > > +	}
> > > +
> > > +	kfree(devices);
> > > +out_unlock:
> > > +	mutex_unlock(&vdev->vdev.dev_set->lock);
> > > +	return ret;
> > > +}
> > > +
> > >  static int vfio_pci_ioctl_get_pci_hot_reset_info(
> > >  	struct vfio_pci_core_device *vdev,
> > >  	struct vfio_pci_hot_reset_info __user *arg)
> > > @@ -1404,6 +1500,8 @@ long vfio_pci_core_ioctl(struct vfio_device
> *core_vdev, unsigned int cmd,
> > >  		return vfio_pci_ioctl_get_irq_info(vdev, uarg);
> > >  	case VFIO_DEVICE_GET_PCI_HOT_RESET_INFO:
> > >  		return vfio_pci_ioctl_get_pci_hot_reset_info(vdev, uarg);
> > > +	case VFIO_DEVICE_GET_PCI_HOT_RESET_GROUP_INFO:
> > > +		return vfio_pci_ioctl_get_pci_hot_reset_group_info(vdev,
> uarg);
> > >  	case VFIO_DEVICE_GET_REGION_INFO:
> > >  		return vfio_pci_ioctl_get_region_info(vdev, uarg);
> > >  	case VFIO_DEVICE_IOEVENTFD:
> > > diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> > > index 25432ef213ee..61b801dfd40b 100644
> > > --- a/include/uapi/linux/vfio.h
> > > +++ b/include/uapi/linux/vfio.h
> > > @@ -669,6 +669,53 @@ struct vfio_pci_hot_reset_info {
> > >
> > >  #define VFIO_DEVICE_GET_PCI_HOT_RESET_INFO	_IO(VFIO_TYPE,
> VFIO_BASE + 12)
> > >
> > > +/**
> > > + * VFIO_DEVICE_GET_PCI_HOT_RESET_GROUP_INFO -
> _IOWR(VFIO_TYPE, VFIO_BASE + 12,
> > > + *						    struct
> vfio_pci_hot_reset_group_info)
> > > + *
> > > + * This is used in the vfio device cdev mode.  It returns the list of
> > > + * affected devices (represented by iommufd dev_id) when hot reset is
> > > + * issued on the current device with which this ioctl is invoked.  It
> > > + * only includes the devices that are opened by the current user in the
> > > + * time of this command is invoked.  This list may change when user
> opens
> > > + * new device or close opened device, hence user should re-invoke it
> > > + * after open/close devices.  This command has no guarantee on the
> result
> > > + * of VFIO_DEVICE_PCI_HOT_RESET since the not-opened affected
> device can
> > > + * be by other users in the window between the two ioctls.  If the
> affected
> > > + * devices are opened by multiple users, the
> VFIO_DEVICE_PCI_HOT_RESET
> > > + * shall fail, detail can check the description of
> VFIO_DEVICE_PCI_HOT_RESET.
> > > + *
> > > + * For the users that open vfio group/container, this ioctl is useless as
> > > + * they have no idea about the iommufd dev_id returned by this ioctl.
> For
> > > + * the users of the traditional mode vfio group/container, this ioctl will
> > > + * fail as this mode does not use iommufd hence no dev_id to report
> back.
> > > + * For the users of the iommufd compat mode vfio group/container,
> this ioctl
> > > + * would succeed as this mode uses iommufd as container fd.  But such
> users
> > > + * still have no idea about the iommufd dev_id as the dev_id is only
> stored
> > > + * in kernel in this mode.  For the users of the vfio group/container, the
> > > + * VFIO_DEVICE_GET_PCI_HOT_RESET_INFO should be used to know
> the hot reset
> > > + * affected devices.
> > > + *
> > > + * Return: 0 on success, -errno on failure:
> > > + *	-enospc = insufficient buffer;
> > > + *	-enodev = unsupported for device;
> > > + *	-eperm = no permission for device, this error comes:
> > > + *		 - when there are affected devices that are opened but
> > > + *		   not bound to the same iommufd with the current device
> > > + *		   with which this ioctl is invoked,
> > > + *		 - there are affected devices that are not bound to vfio
> > > + *		   driver yet.
> > > + *		 - no valid iommufd is bound (e.g. noiommu mode)
> > > + */
> > > +struct vfio_pci_hot_reset_group_info {
> > > +	__u32	argsz;
> > > +	__u32	flags;
> > > +	__u32	count;
> > > +	__u32	devices[];
> > > +};
> > > +
> > > +#define VFIO_DEVICE_GET_PCI_HOT_RESET_GROUP_INFO
> 	_IO(VFIO_TYPE, VFIO_BASE + 18)
> > > +
> > >  /**
> > >   * VFIO_DEVICE_PCI_HOT_RESET - _IOW(VFIO_TYPE, VFIO_BASE + 13,
> > >   *				    struct vfio_pci_hot_reset)
> >


^ permalink raw reply	[flat|nested] 56+ messages in thread

* RE: [PATCH v2 10/10] vfio/pci: Add VFIO_DEVICE_GET_PCI_HOT_RESET_GROUP_INFO
  2023-03-28  3:32     ` Liu, Yi L
@ 2023-03-28  6:19       ` Tian, Kevin
  2023-03-28 14:25         ` Alex Williamson
  0 siblings, 1 reply; 56+ messages in thread
From: Tian, Kevin @ 2023-03-28  6:19 UTC (permalink / raw)
  To: Liu, Yi L, Alex Williamson
  Cc: jgg, joro, robin.murphy, cohuck, eric.auger, nicolinc, kvm,
	mjrosato, chao.p.peng, yi.y.sun, peterx, jasowang,
	shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	intel-gvt-dev, intel-gfx, linux-s390, Hao, Xudong, Zhao, Yan Y,
	Xu, Terrence, Jiang, Yanting

> From: Liu, Yi L <yi.l.liu@intel.com>
> Sent: Tuesday, March 28, 2023 11:32 AM
> 
> > From: Alex Williamson <alex.williamson@redhat.com>
> > Sent: Tuesday, March 28, 2023 3:26 AM
> >
> > Additionally, VFIO_DEVICE_GET_PCI_HOT_RESET_INFO has a flags arg that
> > isn't used, why do we need a new ioctl vs defining
> > VFIO_PCI_HOT_RESET_FLAG_IOMMUFD_DEV_ID.
> 
> Sure. I can follow this suggestion. BTW. I have a doubt here. This new flag
> is set by user. What if in the future kernel has new extensions and needs
> to report something new to the user and add new flags to tell user? Such
> flag is set by kernel. Then the flags field may have two kinds of flags (some
> set by user while some set by kernel). Will it mess up the flags space?
> 

flags in a GET_INFO ioctl is for output.

if user needs to use flags as input to select different type of info then it should
be split into multiple GET_INFO cmds.

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v2 10/10] vfio/pci: Add VFIO_DEVICE_GET_PCI_HOT_RESET_GROUP_INFO
  2023-03-27  9:34 ` [PATCH v2 10/10] vfio/pci: Add VFIO_DEVICE_GET_PCI_HOT_RESET_GROUP_INFO Yi Liu
  2023-03-27 19:26   ` Alex Williamson
@ 2023-03-28 12:40   ` Jason Gunthorpe
  2023-03-28 14:45     ` Liu, Yi L
  1 sibling, 1 reply; 56+ messages in thread
From: Jason Gunthorpe @ 2023-03-28 12:40 UTC (permalink / raw)
  To: Yi Liu
  Cc: alex.williamson, kevin.tian, joro, robin.murphy, cohuck,
	eric.auger, nicolinc, kvm, mjrosato, chao.p.peng, yi.y.sun,
	peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, intel-gvt-dev, intel-gfx, linux-s390,
	xudong.hao, yan.y.zhao, terrence.xu, yanting.jiang

On Mon, Mar 27, 2023 at 02:34:58AM -0700, Yi Liu wrote:

> +	devices = kcalloc(count, sizeof(*devices), GFP_KERNEL);
> +	if (!devices) {
> +		ret = -ENOMEM;
> +		goto reset_info_exit;
> +	}

This doesn't need to be so complicated

> +	list_for_each_entry(cur, &vdev->vdev.dev_set->device_list, vdev.dev_set_list) {
> +		cur_iommufd = vfio_iommufd_physical_ictx(&cur->vdev);
> +		if (cur->vdev.open_count) {
> +			if (cur_iommufd != iommufd) {
> +				ret = -EPERM;
> +				break;
> +			}
> +			ret = vfio_iommufd_physical_devid(&cur->vdev, &devices[index]);


u32 device;

if (index >= hdr.count)
   return -ENOSPC;

ret = vfio_iommufd_physical_devid(&cur->vdev, &devices);
...

if (put_user(&arg->devices[index], device))
   -EFAULT

index++;

Jason

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v2 10/10] vfio/pci: Add VFIO_DEVICE_GET_PCI_HOT_RESET_GROUP_INFO
  2023-03-28  6:19       ` Tian, Kevin
@ 2023-03-28 14:25         ` Alex Williamson
  2023-03-28 14:38           ` Liu, Yi L
  0 siblings, 1 reply; 56+ messages in thread
From: Alex Williamson @ 2023-03-28 14:25 UTC (permalink / raw)
  To: Tian, Kevin
  Cc: Liu, Yi L, jgg, joro, robin.murphy, cohuck, eric.auger, nicolinc,
	kvm, mjrosato, chao.p.peng, yi.y.sun, peterx, jasowang,
	shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	intel-gvt-dev, intel-gfx, linux-s390, Hao, Xudong, Zhao, Yan Y,
	Xu, Terrence, Jiang, Yanting

On Tue, 28 Mar 2023 06:19:06 +0000
"Tian, Kevin" <kevin.tian@intel.com> wrote:

> > From: Liu, Yi L <yi.l.liu@intel.com>
> > Sent: Tuesday, March 28, 2023 11:32 AM
> >   
> > > From: Alex Williamson <alex.williamson@redhat.com>
> > > Sent: Tuesday, March 28, 2023 3:26 AM
> > >
> > > Additionally, VFIO_DEVICE_GET_PCI_HOT_RESET_INFO has a flags arg that
> > > isn't used, why do we need a new ioctl vs defining
> > > VFIO_PCI_HOT_RESET_FLAG_IOMMUFD_DEV_ID.  
> > 
> > Sure. I can follow this suggestion. BTW. I have a doubt here. This new flag
> > is set by user. What if in the future kernel has new extensions and needs
> > to report something new to the user and add new flags to tell user? Such
> > flag is set by kernel. Then the flags field may have two kinds of flags (some
> > set by user while some set by kernel). Will it mess up the flags space?
> >   
> 
> flags in a GET_INFO ioctl is for output.
> 
> if user needs to use flags as input to select different type of info then it should
> be split into multiple GET_INFO cmds.

I don't know that that's actually a rule, however we don't currently
test flags is zero for input, so in this case I think we are stuck with
it only being for output.

Alternatively, should VFIO_DEVICE_GET_PCI_HOT_RESET_INFO automatically
return the dev_id variant of the output and set a flag to indicate this
is the case when called on a device fd opened as a cdev?  Thanks,

Alex


^ permalink raw reply	[flat|nested] 56+ messages in thread

* RE: [PATCH v2 10/10] vfio/pci: Add VFIO_DEVICE_GET_PCI_HOT_RESET_GROUP_INFO
  2023-03-28 14:25         ` Alex Williamson
@ 2023-03-28 14:38           ` Liu, Yi L
  2023-03-28 14:46             ` Alex Williamson
  0 siblings, 1 reply; 56+ messages in thread
From: Liu, Yi L @ 2023-03-28 14:38 UTC (permalink / raw)
  To: Alex Williamson, Tian, Kevin
  Cc: jgg, joro, robin.murphy, cohuck, eric.auger, nicolinc, kvm,
	mjrosato, chao.p.peng, yi.y.sun, peterx, jasowang,
	shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	intel-gvt-dev, intel-gfx, linux-s390, Hao, Xudong, Zhao, Yan Y,
	Xu, Terrence, Jiang, Yanting

> From: Alex Williamson <alex.williamson@redhat.com>
> Sent: Tuesday, March 28, 2023 10:26 PM
> 
> On Tue, 28 Mar 2023 06:19:06 +0000
> "Tian, Kevin" <kevin.tian@intel.com> wrote:
> 
> > > From: Liu, Yi L <yi.l.liu@intel.com>
> > > Sent: Tuesday, March 28, 2023 11:32 AM
> > >
> > > > From: Alex Williamson <alex.williamson@redhat.com>
> > > > Sent: Tuesday, March 28, 2023 3:26 AM
> > > >
> > > > Additionally, VFIO_DEVICE_GET_PCI_HOT_RESET_INFO has a flags arg
> that
> > > > isn't used, why do we need a new ioctl vs defining
> > > > VFIO_PCI_HOT_RESET_FLAG_IOMMUFD_DEV_ID.
> > >
> > > Sure. I can follow this suggestion. BTW. I have a doubt here. This new
> flag
> > > is set by user. What if in the future kernel has new extensions and needs
> > > to report something new to the user and add new flags to tell user? Such
> > > flag is set by kernel. Then the flags field may have two kinds of flags
> (some
> > > set by user while some set by kernel). Will it mess up the flags space?
> > >
> >
> > flags in a GET_INFO ioctl is for output.
> >
> > if user needs to use flags as input to select different type of info then it
> should
> > be split into multiple GET_INFO cmds.
> 
> I don't know that that's actually a rule, however we don't currently
> test flags is zero for input, so in this case I think we are stuck with
> it only being for output.
> 
> Alternatively, should VFIO_DEVICE_GET_PCI_HOT_RESET_INFO
> automatically
> return the dev_id variant of the output and set a flag to indicate this
> is the case when called on a device fd opened as a cdev?  Thanks,

Personally I prefer that user asks for dev_id info explicitly. The major reason
that we return dev_id is that the group/bdf info is not enough for the device
fd passing case. But if qemu opens device by itself, the group/bdf info is still
enough. So a device opened as a cdev doesn't mean it should return dev_id,
it depends on if user has the bdf knowledge.

Regards,
Yi Liu

^ permalink raw reply	[flat|nested] 56+ messages in thread

* RE: [PATCH v2 10/10] vfio/pci: Add VFIO_DEVICE_GET_PCI_HOT_RESET_GROUP_INFO
  2023-03-28 12:40   ` Jason Gunthorpe
@ 2023-03-28 14:45     ` Liu, Yi L
  0 siblings, 0 replies; 56+ messages in thread
From: Liu, Yi L @ 2023-03-28 14:45 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: alex.williamson, Tian, Kevin, joro, robin.murphy, cohuck,
	eric.auger, nicolinc, kvm, mjrosato, chao.p.peng, yi.y.sun,
	peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, intel-gvt-dev, intel-gfx, linux-s390, Hao,
	Xudong, Zhao, Yan Y, Xu, Terrence, Jiang, Yanting

> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Tuesday, March 28, 2023 8:40 PM
> 
> On Mon, Mar 27, 2023 at 02:34:58AM -0700, Yi Liu wrote:
> 
> > +	devices = kcalloc(count, sizeof(*devices), GFP_KERNEL);
> > +	if (!devices) {
> > +		ret = -ENOMEM;
> > +		goto reset_info_exit;
> > +	}
> 
> This doesn't need to be so complicated
> 
> > +	list_for_each_entry(cur, &vdev->vdev.dev_set->device_list,
> vdev.dev_set_list) {
> > +		cur_iommufd = vfio_iommufd_physical_ictx(&cur->vdev);
> > +		if (cur->vdev.open_count) {
> > +			if (cur_iommufd != iommufd) {
> > +				ret = -EPERM;
> > +				break;
> > +			}
> > +			ret = vfio_iommufd_physical_devid(&cur->vdev,
> &devices[index]);
> 
> 
> u32 device;
> 
> if (index >= hdr.count)
>    return -ENOSPC;
> 
> ret = vfio_iommufd_physical_devid(&cur->vdev, &devices);

Ok, so the whole point is that if  cur->vdev->iommufd_ctx==null, then the
vfio_iommufd_physical_devid() shall fail as well.

> ...
> 
> if (put_user(&arg->devices[index], device))

Will modify it. let's close the "separate ioctl" v.s. "reuse ioctl + new flag" open with
Alex first.

Thanks,
Yi Liu

>    -EFAULT
> 
> index++;
> 
> Jason

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v2 10/10] vfio/pci: Add VFIO_DEVICE_GET_PCI_HOT_RESET_GROUP_INFO
  2023-03-28 14:38           ` Liu, Yi L
@ 2023-03-28 14:46             ` Alex Williamson
  2023-03-28 15:00               ` Liu, Yi L
  0 siblings, 1 reply; 56+ messages in thread
From: Alex Williamson @ 2023-03-28 14:46 UTC (permalink / raw)
  To: Liu, Yi L
  Cc: Tian, Kevin, jgg, joro, robin.murphy, cohuck, eric.auger,
	nicolinc, kvm, mjrosato, chao.p.peng, yi.y.sun, peterx, jasowang,
	shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	intel-gvt-dev, intel-gfx, linux-s390, Hao, Xudong, Zhao, Yan Y,
	Xu, Terrence, Jiang, Yanting

On Tue, 28 Mar 2023 14:38:12 +0000
"Liu, Yi L" <yi.l.liu@intel.com> wrote:

> > From: Alex Williamson <alex.williamson@redhat.com>
> > Sent: Tuesday, March 28, 2023 10:26 PM
> > 
> > On Tue, 28 Mar 2023 06:19:06 +0000
> > "Tian, Kevin" <kevin.tian@intel.com> wrote:
> >   
> > > > From: Liu, Yi L <yi.l.liu@intel.com>
> > > > Sent: Tuesday, March 28, 2023 11:32 AM
> > > >  
> > > > > From: Alex Williamson <alex.williamson@redhat.com>
> > > > > Sent: Tuesday, March 28, 2023 3:26 AM
> > > > >
> > > > > Additionally, VFIO_DEVICE_GET_PCI_HOT_RESET_INFO has a flags arg  
> > that  
> > > > > isn't used, why do we need a new ioctl vs defining
> > > > > VFIO_PCI_HOT_RESET_FLAG_IOMMUFD_DEV_ID.  
> > > >
> > > > Sure. I can follow this suggestion. BTW. I have a doubt here. This new  
> > flag  
> > > > is set by user. What if in the future kernel has new extensions and needs
> > > > to report something new to the user and add new flags to tell user? Such
> > > > flag is set by kernel. Then the flags field may have two kinds of flags  
> > (some  
> > > > set by user while some set by kernel). Will it mess up the flags space?
> > > >  
> > >
> > > flags in a GET_INFO ioctl is for output.
> > >
> > > if user needs to use flags as input to select different type of info then it  
> > should  
> > > be split into multiple GET_INFO cmds.  
> > 
> > I don't know that that's actually a rule, however we don't currently
> > test flags is zero for input, so in this case I think we are stuck with
> > it only being for output.
> > 
> > Alternatively, should VFIO_DEVICE_GET_PCI_HOT_RESET_INFO
> > automatically
> > return the dev_id variant of the output and set a flag to indicate this
> > is the case when called on a device fd opened as a cdev?  Thanks,  
> 
> Personally I prefer that user asks for dev_id info explicitly. The major reason
> that we return dev_id is that the group/bdf info is not enough for the device
> fd passing case. But if qemu opens device by itself, the group/bdf info is still
> enough. So a device opened as a cdev doesn't mean it should return dev_id,
> it depends on if user has the bdf knowledge.

But if QEMU opens the cdev, vs getting it from the group, does it make
any sense to return a set of group-ids + bdf in the host-reset info?
I'm inclined to think the answer is no.

Per my previous suggestion, I think we should always return the bdf. We
can't know if the user is accessing through an fd they opened
themselves or were passed, but it allows an actually useful debugging
report if userspace can know "I can't perform a hot reset of this
device because my iommufd context doesn't know about device <bdf>", vs
an opaque -EPERM.  Therefore I think we're only discussing the data
conveyed in the u32, a group-id or dev_id.  Thanks,

Alex


^ permalink raw reply	[flat|nested] 56+ messages in thread

* RE: [PATCH v2 10/10] vfio/pci: Add VFIO_DEVICE_GET_PCI_HOT_RESET_GROUP_INFO
  2023-03-28 14:46             ` Alex Williamson
@ 2023-03-28 15:00               ` Liu, Yi L
  2023-03-28 15:18                 ` Alex Williamson
  0 siblings, 1 reply; 56+ messages in thread
From: Liu, Yi L @ 2023-03-28 15:00 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Tian, Kevin, jgg, joro, robin.murphy, cohuck, eric.auger,
	nicolinc, kvm, mjrosato, chao.p.peng, yi.y.sun, peterx, jasowang,
	shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	intel-gvt-dev, intel-gfx, linux-s390, Hao, Xudong, Zhao, Yan Y,
	Xu, Terrence, Jiang, Yanting

> From: Alex Williamson <alex.williamson@redhat.com>
> Sent: Tuesday, March 28, 2023 10:46 PM
> 
> On Tue, 28 Mar 2023 14:38:12 +0000
> "Liu, Yi L" <yi.l.liu@intel.com> wrote:
> 
> > > From: Alex Williamson <alex.williamson@redhat.com>
> > > Sent: Tuesday, March 28, 2023 10:26 PM
> > >
> > > On Tue, 28 Mar 2023 06:19:06 +0000
> > > "Tian, Kevin" <kevin.tian@intel.com> wrote:
> > >
> > > > > From: Liu, Yi L <yi.l.liu@intel.com>
> > > > > Sent: Tuesday, March 28, 2023 11:32 AM
> > > > >
> > > > > > From: Alex Williamson <alex.williamson@redhat.com>
> > > > > > Sent: Tuesday, March 28, 2023 3:26 AM
> > > > > >
> > > > > > Additionally, VFIO_DEVICE_GET_PCI_HOT_RESET_INFO has a flags
> arg
> > > that
> > > > > > isn't used, why do we need a new ioctl vs defining
> > > > > > VFIO_PCI_HOT_RESET_FLAG_IOMMUFD_DEV_ID.
> > > > >
> > > > > Sure. I can follow this suggestion. BTW. I have a doubt here. This
> new
> > > flag
> > > > > is set by user. What if in the future kernel has new extensions and
> needs
> > > > > to report something new to the user and add new flags to tell user?
> Such
> > > > > flag is set by kernel. Then the flags field may have two kinds of flags
> > > (some
> > > > > set by user while some set by kernel). Will it mess up the flags space?
> > > > >
> > > >
> > > > flags in a GET_INFO ioctl is for output.
> > > >
> > > > if user needs to use flags as input to select different type of info then it
> > > should
> > > > be split into multiple GET_INFO cmds.
> > >
> > > I don't know that that's actually a rule, however we don't currently
> > > test flags is zero for input, so in this case I think we are stuck with
> > > it only being for output.
> > >
> > > Alternatively, should VFIO_DEVICE_GET_PCI_HOT_RESET_INFO
> > > automatically
> > > return the dev_id variant of the output and set a flag to indicate this
> > > is the case when called on a device fd opened as a cdev?  Thanks,
> >
> > Personally I prefer that user asks for dev_id info explicitly. The major
> reason
> > that we return dev_id is that the group/bdf info is not enough for the
> device
> > fd passing case. But if qemu opens device by itself, the group/bdf info is
> still
> > enough. So a device opened as a cdev doesn't mean it should return
> dev_id,
> > it depends on if user has the bdf knowledge.
> 
> But if QEMU opens the cdev, vs getting it from the group, does it make
> any sense to return a set of group-ids + bdf in the host-reset info?
> I'm inclined to think the answer is no.
> 
> Per my previous suggestion, I think we should always return the bdf. We
> can't know if the user is accessing through an fd they opened
> themselves or were passed,

Oh, yes. I'm convinced by this reason since only cdev mode supports device fd
passing. So I'll reuse the existing _INFO and let kernel set a flag to mark the returned
info is dev_id+bdf.

A check. If the device that the _INFIO is invoked is opened via cdev, but there
are devices in the dev_set that are got via VFIO_GROUP_GET_DEVICE_FD, should
I fail it or allow it?

> but it allows an actually useful debugging
> report if userspace can know "I can't perform a hot reset of this
> device because my iommufd context doesn't know about device <bdf>", vs
> an opaque -EPERM.  Therefore I think we're only discussing the data
> conveyed in the u32, a group-id or dev_id.  Thanks,

Sure.

Regards,
Yi Liu



^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v2 10/10] vfio/pci: Add VFIO_DEVICE_GET_PCI_HOT_RESET_GROUP_INFO
  2023-03-28 15:00               ` Liu, Yi L
@ 2023-03-28 15:18                 ` Alex Williamson
  2023-03-28 15:45                   ` Liu, Yi L
  2023-03-28 16:29                   ` Jason Gunthorpe
  0 siblings, 2 replies; 56+ messages in thread
From: Alex Williamson @ 2023-03-28 15:18 UTC (permalink / raw)
  To: Liu, Yi L
  Cc: Tian, Kevin, jgg, joro, robin.murphy, cohuck, eric.auger,
	nicolinc, kvm, mjrosato, chao.p.peng, yi.y.sun, peterx, jasowang,
	shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	intel-gvt-dev, intel-gfx, linux-s390, Hao, Xudong, Zhao, Yan Y,
	Xu, Terrence, Jiang, Yanting

On Tue, 28 Mar 2023 15:00:42 +0000
"Liu, Yi L" <yi.l.liu@intel.com> wrote:

> > From: Alex Williamson <alex.williamson@redhat.com>
> > Sent: Tuesday, March 28, 2023 10:46 PM
> > 
> > On Tue, 28 Mar 2023 14:38:12 +0000
> > "Liu, Yi L" <yi.l.liu@intel.com> wrote:
> >   
> > > > From: Alex Williamson <alex.williamson@redhat.com>
> > > > Sent: Tuesday, March 28, 2023 10:26 PM
> > > >
> > > > On Tue, 28 Mar 2023 06:19:06 +0000
> > > > "Tian, Kevin" <kevin.tian@intel.com> wrote:
> > > >  
> > > > > > From: Liu, Yi L <yi.l.liu@intel.com>
> > > > > > Sent: Tuesday, March 28, 2023 11:32 AM
> > > > > >  
> > > > > > > From: Alex Williamson <alex.williamson@redhat.com>
> > > > > > > Sent: Tuesday, March 28, 2023 3:26 AM
> > > > > > >
> > > > > > > Additionally, VFIO_DEVICE_GET_PCI_HOT_RESET_INFO has a flags  
> > arg  
> > > > that  
> > > > > > > isn't used, why do we need a new ioctl vs defining
> > > > > > > VFIO_PCI_HOT_RESET_FLAG_IOMMUFD_DEV_ID.  
> > > > > >
> > > > > > Sure. I can follow this suggestion. BTW. I have a doubt here. This  
> > new  
> > > > flag  
> > > > > > is set by user. What if in the future kernel has new extensions and  
> > needs  
> > > > > > to report something new to the user and add new flags to tell user?  
> > Such  
> > > > > > flag is set by kernel. Then the flags field may have two kinds of flags  
> > > > (some  
> > > > > > set by user while some set by kernel). Will it mess up the flags space?
> > > > > >  
> > > > >
> > > > > flags in a GET_INFO ioctl is for output.
> > > > >
> > > > > if user needs to use flags as input to select different type of info then it  
> > > > should  
> > > > > be split into multiple GET_INFO cmds.  
> > > >
> > > > I don't know that that's actually a rule, however we don't currently
> > > > test flags is zero for input, so in this case I think we are stuck with
> > > > it only being for output.
> > > >
> > > > Alternatively, should VFIO_DEVICE_GET_PCI_HOT_RESET_INFO
> > > > automatically
> > > > return the dev_id variant of the output and set a flag to indicate this
> > > > is the case when called on a device fd opened as a cdev?  Thanks,  
> > >
> > > Personally I prefer that user asks for dev_id info explicitly. The major  
> > reason  
> > > that we return dev_id is that the group/bdf info is not enough for the  
> > device  
> > > fd passing case. But if qemu opens device by itself, the group/bdf info is  
> > still  
> > > enough. So a device opened as a cdev doesn't mean it should return  
> > dev_id,  
> > > it depends on if user has the bdf knowledge.  
> > 
> > But if QEMU opens the cdev, vs getting it from the group, does it make
> > any sense to return a set of group-ids + bdf in the host-reset info?
> > I'm inclined to think the answer is no.
> > 
> > Per my previous suggestion, I think we should always return the bdf. We
> > can't know if the user is accessing through an fd they opened
> > themselves or were passed,  
> 
> Oh, yes. I'm convinced by this reason since only cdev mode supports device fd
> passing. So I'll reuse the existing _INFO and let kernel set a flag to mark the returned
> info is dev_id+bdf.
> 
> A check. If the device that the _INFIO is invoked is opened via cdev, but there
> are devices in the dev_set that are got via VFIO_GROUP_GET_DEVICE_FD, should
> I fail it or allow it?

It's a niche case, but I think it needs to be allowed.  We'd still
report the bdf for those devices, but make use of the invalid/null
dev-id.  I think this empowers userspace that they could make the same
call on a group opened fd if necessary.  An alternative would be to
redefine the returned data structure entirely with a flag per entry
describing the output, but then I think we need to invent a kernel
policy of which gets reported, which seems overly complicated if our
goal is to phase out group usage.  Make sense, or will this bite us?
Thanks,

Alex


^ permalink raw reply	[flat|nested] 56+ messages in thread

* RE: [PATCH v2 10/10] vfio/pci: Add VFIO_DEVICE_GET_PCI_HOT_RESET_GROUP_INFO
  2023-03-28 15:18                 ` Alex Williamson
@ 2023-03-28 15:45                   ` Liu, Yi L
  2023-03-28 16:00                     ` Alex Williamson
  2023-03-28 16:29                   ` Jason Gunthorpe
  1 sibling, 1 reply; 56+ messages in thread
From: Liu, Yi L @ 2023-03-28 15:45 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Tian, Kevin, jgg, joro, robin.murphy, cohuck, eric.auger,
	nicolinc, kvm, mjrosato, chao.p.peng, yi.y.sun, peterx, jasowang,
	shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	intel-gvt-dev, intel-gfx, linux-s390, Hao, Xudong, Zhao, Yan Y,
	Xu, Terrence, Jiang, Yanting

> From: Alex Williamson <alex.williamson@redhat.com>
> Sent: Tuesday, March 28, 2023 11:18 PM
> 
> On Tue, 28 Mar 2023 15:00:42 +0000
> "Liu, Yi L" <yi.l.liu@intel.com> wrote:
> 
> > > From: Alex Williamson <alex.williamson@redhat.com>
> > > Sent: Tuesday, March 28, 2023 10:46 PM
> > >
> > > On Tue, 28 Mar 2023 14:38:12 +0000
> > > "Liu, Yi L" <yi.l.liu@intel.com> wrote:
> > >
> > > > > From: Alex Williamson <alex.williamson@redhat.com>
> > > > > Sent: Tuesday, March 28, 2023 10:26 PM
> > > > >
> > > > > On Tue, 28 Mar 2023 06:19:06 +0000
> > > > > "Tian, Kevin" <kevin.tian@intel.com> wrote:
> > > > >
> > > > > > > From: Liu, Yi L <yi.l.liu@intel.com>
> > > > > > > Sent: Tuesday, March 28, 2023 11:32 AM
> > > > > > >
> > > > > > > > From: Alex Williamson <alex.williamson@redhat.com>
> > > > > > > > Sent: Tuesday, March 28, 2023 3:26 AM
> > > > > > > >
> > > > > > > > Additionally, VFIO_DEVICE_GET_PCI_HOT_RESET_INFO has a
> flags
> > > arg
> > > > > that
> > > > > > > > isn't used, why do we need a new ioctl vs defining
> > > > > > > > VFIO_PCI_HOT_RESET_FLAG_IOMMUFD_DEV_ID.
> > > > > > >
> > > > > > > Sure. I can follow this suggestion. BTW. I have a doubt here. This
> > > new
> > > > > flag
> > > > > > > is set by user. What if in the future kernel has new extensions and
> > > needs
> > > > > > > to report something new to the user and add new flags to tell
> user?
> > > Such
> > > > > > > flag is set by kernel. Then the flags field may have two kinds of
> flags
> > > > > (some
> > > > > > > set by user while some set by kernel). Will it mess up the flags
> space?
> > > > > > >
> > > > > >
> > > > > > flags in a GET_INFO ioctl is for output.
> > > > > >
> > > > > > if user needs to use flags as input to select different type of info
> then it
> > > > > should
> > > > > > be split into multiple GET_INFO cmds.
> > > > >
> > > > > I don't know that that's actually a rule, however we don't currently
> > > > > test flags is zero for input, so in this case I think we are stuck with
> > > > > it only being for output.
> > > > >
> > > > > Alternatively, should VFIO_DEVICE_GET_PCI_HOT_RESET_INFO
> > > > > automatically
> > > > > return the dev_id variant of the output and set a flag to indicate this
> > > > > is the case when called on a device fd opened as a cdev?  Thanks,
> > > >
> > > > Personally I prefer that user asks for dev_id info explicitly. The major
> > > reason
> > > > that we return dev_id is that the group/bdf info is not enough for the
> > > device
> > > > fd passing case. But if qemu opens device by itself, the group/bdf info
> is
> > > still
> > > > enough. So a device opened as a cdev doesn't mean it should return
> > > dev_id,
> > > > it depends on if user has the bdf knowledge.
> > >
> > > But if QEMU opens the cdev, vs getting it from the group, does it make
> > > any sense to return a set of group-ids + bdf in the host-reset info?
> > > I'm inclined to think the answer is no.
> > >
> > > Per my previous suggestion, I think we should always return the bdf. We
> > > can't know if the user is accessing through an fd they opened
> > > themselves or were passed,
> >
> > Oh, yes. I'm convinced by this reason since only cdev mode supports
> device fd
> > passing. So I'll reuse the existing _INFO and let kernel set a flag to mark
> the returned
> > info is dev_id+bdf.
> >
> > A check. If the device that the _INFIO is invoked is opened via cdev, but
> there
> > are devices in the dev_set that are got via VFIO_GROUP_GET_DEVICE_FD,
> should
> > I fail it or allow it?
> 
> It's a niche case, but I think it needs to be allowed. 

I'm also wondering if it is common in the future. Actually, a user should be
preferred to either use the group or cdev, but not both. Otherwise, it looks
to be bad programming.:-)

Also, as an earlier remark from Jason. If there are affected devices that are
opened by other users, the new _INFO should fail with -EPERM. I know this
remark was for the new _INFO ioctl. But now, we are going to reuse the
existing _INFO, so I'd also want to check if we still need this policy? If yes,
then it is a problem to check the owner of the devices that are opened by
the group path.

https://lore.kernel.org/kvm/ZBsF950laMs2ldFc@nvidia.com/


> We'd still
> report the bdf for those devices, but make use of the invalid/null
> dev-id.  I think this empowers userspace that they could make the same
> call on a group opened fd if necessary.

For the devices opened via group path, it should have an entry that
includes invalid_dev_id+bdf. So user can map it to the device. But
there is no group_id, this may be fine since group is just a shortcut
to find the device. Is it?


> An alternative would be to
> redefine the returned data structure entirely with a flag per entry
> describing the output, but then I think we need to invent a kernel
> policy of which gets reported, which seems overly complicated if our
> goal is to phase out group usage.  Make sense, or will this bite us?

This does smell complicated. 😊 maybe minimum change is better as
the group is going to be phased out.

Regards,
Yi Liu

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v2 10/10] vfio/pci: Add VFIO_DEVICE_GET_PCI_HOT_RESET_GROUP_INFO
  2023-03-28 15:45                   ` Liu, Yi L
@ 2023-03-28 16:00                     ` Alex Williamson
  2023-03-29  3:13                       ` Liu, Yi L
  0 siblings, 1 reply; 56+ messages in thread
From: Alex Williamson @ 2023-03-28 16:00 UTC (permalink / raw)
  To: Liu, Yi L
  Cc: Tian, Kevin, jgg, joro, robin.murphy, cohuck, eric.auger,
	nicolinc, kvm, mjrosato, chao.p.peng, yi.y.sun, peterx, jasowang,
	shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	intel-gvt-dev, intel-gfx, linux-s390, Hao, Xudong, Zhao, Yan Y,
	Xu, Terrence, Jiang, Yanting

On Tue, 28 Mar 2023 15:45:38 +0000
"Liu, Yi L" <yi.l.liu@intel.com> wrote:

> > From: Alex Williamson <alex.williamson@redhat.com>
> > Sent: Tuesday, March 28, 2023 11:18 PM
> > 
> > On Tue, 28 Mar 2023 15:00:42 +0000
> > "Liu, Yi L" <yi.l.liu@intel.com> wrote:
> >   
> > > > From: Alex Williamson <alex.williamson@redhat.com>
> > > > Sent: Tuesday, March 28, 2023 10:46 PM
> > > >
> > > > On Tue, 28 Mar 2023 14:38:12 +0000
> > > > "Liu, Yi L" <yi.l.liu@intel.com> wrote:
> > > >  
> > > > > > From: Alex Williamson <alex.williamson@redhat.com>
> > > > > > Sent: Tuesday, March 28, 2023 10:26 PM
> > > > > >
> > > > > > On Tue, 28 Mar 2023 06:19:06 +0000
> > > > > > "Tian, Kevin" <kevin.tian@intel.com> wrote:
> > > > > >  
> > > > > > > > From: Liu, Yi L <yi.l.liu@intel.com>
> > > > > > > > Sent: Tuesday, March 28, 2023 11:32 AM
> > > > > > > >  
> > > > > > > > > From: Alex Williamson <alex.williamson@redhat.com>
> > > > > > > > > Sent: Tuesday, March 28, 2023 3:26 AM
> > > > > > > > >
> > > > > > > > > Additionally, VFIO_DEVICE_GET_PCI_HOT_RESET_INFO has a  
> > flags  
> > > > arg  
> > > > > > that  
> > > > > > > > > isn't used, why do we need a new ioctl vs defining
> > > > > > > > > VFIO_PCI_HOT_RESET_FLAG_IOMMUFD_DEV_ID.  
> > > > > > > >
> > > > > > > > Sure. I can follow this suggestion. BTW. I have a doubt here. This  
> > > > new  
> > > > > > flag  
> > > > > > > > is set by user. What if in the future kernel has new extensions and  
> > > > needs  
> > > > > > > > to report something new to the user and add new flags to tell  
> > user?  
> > > > Such  
> > > > > > > > flag is set by kernel. Then the flags field may have two kinds of  
> > flags  
> > > > > > (some  
> > > > > > > > set by user while some set by kernel). Will it mess up the flags  
> > space?  
> > > > > > > >  
> > > > > > >
> > > > > > > flags in a GET_INFO ioctl is for output.
> > > > > > >
> > > > > > > if user needs to use flags as input to select different type of info  
> > then it  
> > > > > > should  
> > > > > > > be split into multiple GET_INFO cmds.  
> > > > > >
> > > > > > I don't know that that's actually a rule, however we don't currently
> > > > > > test flags is zero for input, so in this case I think we are stuck with
> > > > > > it only being for output.
> > > > > >
> > > > > > Alternatively, should VFIO_DEVICE_GET_PCI_HOT_RESET_INFO
> > > > > > automatically
> > > > > > return the dev_id variant of the output and set a flag to indicate this
> > > > > > is the case when called on a device fd opened as a cdev?  Thanks,  
> > > > >
> > > > > Personally I prefer that user asks for dev_id info explicitly. The major  
> > > > reason  
> > > > > that we return dev_id is that the group/bdf info is not enough for the  
> > > > device  
> > > > > fd passing case. But if qemu opens device by itself, the group/bdf info  
> > is  
> > > > still  
> > > > > enough. So a device opened as a cdev doesn't mean it should return  
> > > > dev_id,  
> > > > > it depends on if user has the bdf knowledge.  
> > > >
> > > > But if QEMU opens the cdev, vs getting it from the group, does it make
> > > > any sense to return a set of group-ids + bdf in the host-reset info?
> > > > I'm inclined to think the answer is no.
> > > >
> > > > Per my previous suggestion, I think we should always return the bdf. We
> > > > can't know if the user is accessing through an fd they opened
> > > > themselves or were passed,  
> > >
> > > Oh, yes. I'm convinced by this reason since only cdev mode supports  
> > device fd  
> > > passing. So I'll reuse the existing _INFO and let kernel set a flag to mark  
> > the returned  
> > > info is dev_id+bdf.
> > >
> > > A check. If the device that the _INFIO is invoked is opened via cdev, but  
> > there  
> > > are devices in the dev_set that are got via VFIO_GROUP_GET_DEVICE_FD,  
> > should  
> > > I fail it or allow it?  
> > 
> > It's a niche case, but I think it needs to be allowed.   
> 
> I'm also wondering if it is common in the future. Actually, a user should be
> preferred to either use the group or cdev, but not both. Otherwise, it looks
> to be bad programming.:-)
> 
> Also, as an earlier remark from Jason. If there are affected devices that are
> opened by other users, the new _INFO should fail with -EPERM. I know this
> remark was for the new _INFO ioctl. But now, we are going to reuse the
> existing _INFO, so I'd also want to check if we still need this policy? If yes,
> then it is a problem to check the owner of the devices that are opened by
> the group path.
> 
> https://lore.kernel.org/kvm/ZBsF950laMs2ldFc@nvidia.com/

Personally I don't like the suggestion to fail with -EPERM if the user
doesn't own all the affected devices.  This isn't a "probe if I can do
a reset" ioctl, it's a "provide information about the devices affected
by a reset to know how to call the hot-reset ioctl".  We're returning
the bdf to the cdev version of this ioctl for exactly this debugging
purpose when the devices are not owned, that becomes useless if we give
up an return -EPERM if ownership doesn't align.

> > We'd still
> > report the bdf for those devices, but make use of the invalid/null
> > dev-id.  I think this empowers userspace that they could make the same
> > call on a group opened fd if necessary.  
> 
> For the devices opened via group path, it should have an entry that
> includes invalid_dev_id+bdf. So user can map it to the device. But
> there is no group_id, this may be fine since group is just a shortcut
> to find the device. Is it?

Yes, it could be argued that the group-id itself is superfluous, the
user can determine the group via the bdf, but it also aligns with the
hot-reset ioctl, which currently requires the group fd.  Thanks,

Alex


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v2 10/10] vfio/pci: Add VFIO_DEVICE_GET_PCI_HOT_RESET_GROUP_INFO
  2023-03-28 15:18                 ` Alex Williamson
  2023-03-28 15:45                   ` Liu, Yi L
@ 2023-03-28 16:29                   ` Jason Gunthorpe
  2023-03-28 19:09                     ` Alex Williamson
  1 sibling, 1 reply; 56+ messages in thread
From: Jason Gunthorpe @ 2023-03-28 16:29 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Liu, Yi L, Tian, Kevin, joro, robin.murphy, cohuck, eric.auger,
	nicolinc, kvm, mjrosato, chao.p.peng, yi.y.sun, peterx, jasowang,
	shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	intel-gvt-dev, intel-gfx, linux-s390, Hao, Xudong, Zhao, Yan Y,
	Xu, Terrence, Jiang, Yanting

On Tue, Mar 28, 2023 at 09:18:01AM -0600, Alex Williamson wrote:

> It's a niche case, but I think it needs to be allowed.  We'd still
> report the bdf for those devices, but make use of the invalid/null
> dev-id.

IDK, it makes the whole implementation much more complicated. Instead
of just copying the current dev_set to the output and calling
vfio_pci_dev_set_resettable() we need to do something more complex..

Keeping the current ioctl as-is means this IOCTL can be used to do any
debugging by getting the actual BDF list.

It means we can make the a new ioctl simple and just return the dev_id
array without these edge complications. I don't think merging two
different ioctls is helping make things simple..

It seems like it does what qemu wants: call the new IOCTL, if it
fails, call the old IOCTL and print out the BDF list to help debug and
then exit.

On success use the data in the new ioctl to generate the machine
configuration to pass the reset grouping into the VM.

When reset actually comes in just trigger it.

Jason

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v2 10/10] vfio/pci: Add VFIO_DEVICE_GET_PCI_HOT_RESET_GROUP_INFO
  2023-03-28 16:29                   ` Jason Gunthorpe
@ 2023-03-28 19:09                     ` Alex Williamson
  2023-03-28 19:22                       ` Jason Gunthorpe
  0 siblings, 1 reply; 56+ messages in thread
From: Alex Williamson @ 2023-03-28 19:09 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Liu, Yi L, Tian, Kevin, joro, robin.murphy, cohuck, eric.auger,
	nicolinc, kvm, mjrosato, chao.p.peng, yi.y.sun, peterx, jasowang,
	shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	intel-gvt-dev, intel-gfx, linux-s390, Hao, Xudong, Zhao, Yan Y,
	Xu, Terrence, Jiang, Yanting

On Tue, 28 Mar 2023 13:29:23 -0300
Jason Gunthorpe <jgg@nvidia.com> wrote:

> On Tue, Mar 28, 2023 at 09:18:01AM -0600, Alex Williamson wrote:
> 
> > It's a niche case, but I think it needs to be allowed.  We'd still
> > report the bdf for those devices, but make use of the invalid/null
> > dev-id.  
> 
> IDK, it makes the whole implementation much more complicated. Instead
> of just copying the current dev_set to the output and calling
> vfio_pci_dev_set_resettable() we need to do something more complex..
> 
> Keeping the current ioctl as-is means this IOCTL can be used to do any
> debugging by getting the actual BDF list.
> 
> It means we can make the a new ioctl simple and just return the dev_id
> array without these edge complications. I don't think merging two
> different ioctls is helping make things simple..

OTOH, we already have an ioctl that essentially "does the right thing",
we just want to swap out one return field for another.  So which is
more complicated, adding another ioctl that does not quite the same
thing but still needing to maintain the old ioctl for detailed
information, or making the old ioctl bi-modal to return the appropriate
information for the type of device used to access it?

> It seems like it does what qemu wants: call the new IOCTL, if it
> fails, call the old IOCTL and print out the BDF list to help debug and
> then exit.

Userspace is already dealing with a variable length array for the
return value, why would it ever want to repeat that process to get
debugging info.  Besides, wouldn't QEMU prefer the similarity of making
the same call for groups and cdev and simply keying on the data type of
one field?

> On success use the data in the new ioctl to generate the machine
> configuration to pass the reset grouping into the VM.
> 
> When reset actually comes in just trigger it.

"Just trigger it" is the same in either case.  It seems bold to play
the complexity argument when we already have a function that does 90%
the correct thing where we can share much of the implementation and
userspace code without duplicating, but still relying on a legacy
interface for debugging.  Thanks,

Alex


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v2 10/10] vfio/pci: Add VFIO_DEVICE_GET_PCI_HOT_RESET_GROUP_INFO
  2023-03-28 19:09                     ` Alex Williamson
@ 2023-03-28 19:22                       ` Jason Gunthorpe
  0 siblings, 0 replies; 56+ messages in thread
From: Jason Gunthorpe @ 2023-03-28 19:22 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Liu, Yi L, Tian, Kevin, joro, robin.murphy, cohuck, eric.auger,
	nicolinc, kvm, mjrosato, chao.p.peng, yi.y.sun, peterx, jasowang,
	shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	intel-gvt-dev, intel-gfx, linux-s390, Hao, Xudong, Zhao, Yan Y,
	Xu, Terrence, Jiang, Yanting

On Tue, Mar 28, 2023 at 01:09:49PM -0600, Alex Williamson wrote:

> "Just trigger it" is the same in either case.  It seems bold to play
> the complexity argument when we already have a function that does 90%
> the correct thing where we can share much of the implementation and
> userspace code without duplicating, but still relying on a legacy
> interface for debugging. 

It just doesn't seem like we are sharing a lot, this patch is a whole
new function and you are saying the unique implementation needs to be more
complex still.

Maybe the next version will be able to share more??

Like can we just patch the existing code that sets the group_id/dev_id
or somethiing???

Still, I'm not sure that qemu will share really share much here,
because if it runs in group mode then it has to parse the result in a
totally different way than if it runs in dev_id mode. The ioctl call
is only a line or two. I imagine qemu will end up with two different
functions that both return some kind of internal list of qemu devices
in the reset group.

Jason

^ permalink raw reply	[flat|nested] 56+ messages in thread

* RE: [PATCH v2 10/10] vfio/pci: Add VFIO_DEVICE_GET_PCI_HOT_RESET_GROUP_INFO
  2023-03-28 16:00                     ` Alex Williamson
@ 2023-03-29  3:13                       ` Liu, Yi L
  2023-03-29  9:41                         ` Tian, Kevin
  0 siblings, 1 reply; 56+ messages in thread
From: Liu, Yi L @ 2023-03-29  3:13 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Tian, Kevin, jgg, joro, robin.murphy, cohuck, eric.auger,
	nicolinc, kvm, mjrosato, chao.p.peng, yi.y.sun, peterx, jasowang,
	shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	intel-gvt-dev, intel-gfx, linux-s390, Hao, Xudong, Zhao, Yan Y,
	Xu, Terrence, Jiang, Yanting

> From: Alex Williamson <alex.williamson@redhat.com>
> Sent: Wednesday, March 29, 2023 12:00 AM
> 
[...]
> > > > A check. If the device that the _INFIO is invoked is opened via cdev,
> but
> > > there
> > > > are devices in the dev_set that are got via
> VFIO_GROUP_GET_DEVICE_FD,
> > > should
> > > > I fail it or allow it?
> > >
> > > It's a niche case, but I think it needs to be allowed.
> >
> > I'm also wondering if it is common in the future. Actually, a user should be
> > preferred to either use the group or cdev, but not both. Otherwise, it
> looks
> > to be bad programming.:-)
> >
> > Also, as an earlier remark from Jason. If there are affected devices that
> are
> > opened by other users, the new _INFO should fail with -EPERM. I know
> this
> > remark was for the new _INFO ioctl. But now, we are going to reuse the
> > existing _INFO, so I'd also want to check if we still need this policy? If yes,
> > then it is a problem to check the owner of the devices that are opened by
> > the group path.
> >
> > https://lore.kernel.org/kvm/ZBsF950laMs2ldFc@nvidia.com/
> 
> Personally I don't like the suggestion to fail with -EPERM if the user
> doesn't own all the affected devices.  This isn't a "probe if I can do
> a reset" ioctl, it's a "provide information about the devices affected
> by a reset to know how to call the hot-reset ioctl".  We're returning
> the bdf to the cdev version of this ioctl for exactly this debugging
> purpose when the devices are not owned, that becomes useless if we give
> up an return -EPERM if ownership doesn't align.

Jason's suggestion makes sense for returning the case of returning dev_id
as dev_id is local to iommufd. If there are devices in the same dev_set are
opened by multiple users, multiple iommufd would be used. Then the
dev_id would have overlap. e.g. a dev_set has three devices. Device A and
B are opened by the current user as cdev, dev_id #1 and #2 are generated.
While device C opened by another user as cdev, dev_id #n is generated for
it. If dev_id #n happens to be #1, then user gets two info entries that have
the same dev_id.

I know the existing _INFO does not have such policy since the group/bdf
info are unique in the system. But for the dev_id case, seems really necessary
to fail if the dev_set is not only opened by one user. From this angle, will it be
good to have two ioctls. Sorry for the back-forth here. ☹

> > > We'd still
> > > report the bdf for those devices, but make use of the invalid/null
> > > dev-id.  I think this empowers userspace that they could make the same
> > > call on a group opened fd if necessary.
> >
> > For the devices opened via group path, it should have an entry that
> > includes invalid_dev_id+bdf. So user can map it to the device. But
> > there is no group_id, this may be fine since group is just a shortcut
> > to find the device. Is it?
> 
> Yes, it could be argued that the group-id itself is superfluous, the
> user can determine the group via the bdf, but it also aligns with the
> hot-reset ioctl, which currently requires the group fd.  Thanks,

I see. For the existing _INFO and HOT_RESET ioctl, I have no doubt.
Both of them use the group as a short-cut.

Regards,
Yi Liu

^ permalink raw reply	[flat|nested] 56+ messages in thread

* RE: [PATCH v2 10/10] vfio/pci: Add VFIO_DEVICE_GET_PCI_HOT_RESET_GROUP_INFO
  2023-03-29  3:13                       ` Liu, Yi L
@ 2023-03-29  9:41                         ` Tian, Kevin
  2023-03-29 15:49                           ` Alex Williamson
  2023-03-29 15:50                           ` Jason Gunthorpe
  0 siblings, 2 replies; 56+ messages in thread
From: Tian, Kevin @ 2023-03-29  9:41 UTC (permalink / raw)
  To: Liu, Yi L, Alex Williamson
  Cc: jgg, joro, robin.murphy, cohuck, eric.auger, nicolinc, kvm,
	mjrosato, chao.p.peng, yi.y.sun, peterx, jasowang,
	shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	intel-gvt-dev, intel-gfx, linux-s390, Hao, Xudong, Zhao, Yan Y,
	Xu, Terrence, Jiang, Yanting

> From: Liu, Yi L <yi.l.liu@intel.com>
> Sent: Wednesday, March 29, 2023 11:14 AM
> 
> > From: Alex Williamson <alex.williamson@redhat.com>
> > Sent: Wednesday, March 29, 2023 12:00 AM
> >
> >
> > Personally I don't like the suggestion to fail with -EPERM if the user
> > doesn't own all the affected devices.  This isn't a "probe if I can do
> > a reset" ioctl, it's a "provide information about the devices affected
> > by a reset to know how to call the hot-reset ioctl".  We're returning
> > the bdf to the cdev version of this ioctl for exactly this debugging
> > purpose when the devices are not owned, that becomes useless if we give
> > up an return -EPERM if ownership doesn't align.
> 
> Jason's suggestion makes sense for returning the case of returning dev_id
> as dev_id is local to iommufd. If there are devices in the same dev_set are
> opened by multiple users, multiple iommufd would be used. Then the
> dev_id would have overlap. e.g. a dev_set has three devices. Device A and
> B are opened by the current user as cdev, dev_id #1 and #2 are generated.
> While device C opened by another user as cdev, dev_id #n is generated for
> it. If dev_id #n happens to be #1, then user gets two info entries that have
> the same dev_id.
> 

In Alex's proposal you'll set a invalid dev_id for device C so the user can
still get the info for diagnostic purpose instead of seeing an -EPERM error.

btw I found an open about fd pass scheme which may affect the choice here.

In concept even with cdev we still expect the userspace to maintain the
group knowledge so it won't inadvertently attempt to assign devices in
the same group to different IOAS's. It also needs such knowledge when
constructing guest topology.

with fd passed in Qemu has no way to associate the fd to a group.

We could extend bind_iommufd to return the group id or introduce a
new ioctl to query it per dev_id.

Once that is in place looks we don't need a new _INFO ioctl?

Thanks
Kevin

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v2 10/10] vfio/pci: Add VFIO_DEVICE_GET_PCI_HOT_RESET_GROUP_INFO
  2023-03-29  9:41                         ` Tian, Kevin
@ 2023-03-29 15:49                           ` Alex Williamson
  2023-03-29 15:57                             ` Jason Gunthorpe
  2023-03-30 12:48                             ` Liu, Yi L
  2023-03-29 15:50                           ` Jason Gunthorpe
  1 sibling, 2 replies; 56+ messages in thread
From: Alex Williamson @ 2023-03-29 15:49 UTC (permalink / raw)
  To: Tian, Kevin
  Cc: Liu, Yi L, jgg, joro, robin.murphy, cohuck, eric.auger, nicolinc,
	kvm, mjrosato, chao.p.peng, yi.y.sun, peterx, jasowang,
	shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	intel-gvt-dev, intel-gfx, linux-s390, Hao, Xudong, Zhao, Yan Y,
	Xu, Terrence, Jiang, Yanting

On Wed, 29 Mar 2023 09:41:26 +0000
"Tian, Kevin" <kevin.tian@intel.com> wrote:

> > From: Liu, Yi L <yi.l.liu@intel.com>
> > Sent: Wednesday, March 29, 2023 11:14 AM
> >   
> > > From: Alex Williamson <alex.williamson@redhat.com>
> > > Sent: Wednesday, March 29, 2023 12:00 AM
> > >
> > >
> > > Personally I don't like the suggestion to fail with -EPERM if the user
> > > doesn't own all the affected devices.  This isn't a "probe if I can do
> > > a reset" ioctl, it's a "provide information about the devices affected
> > > by a reset to know how to call the hot-reset ioctl".  We're returning
> > > the bdf to the cdev version of this ioctl for exactly this debugging
> > > purpose when the devices are not owned, that becomes useless if we give
> > > up an return -EPERM if ownership doesn't align.  
> > 
> > Jason's suggestion makes sense for returning the case of returning dev_id
> > as dev_id is local to iommufd. If there are devices in the same dev_set are
> > opened by multiple users, multiple iommufd would be used. Then the
> > dev_id would have overlap. e.g. a dev_set has three devices. Device A and
> > B are opened by the current user as cdev, dev_id #1 and #2 are generated.
> > While device C opened by another user as cdev, dev_id #n is generated for
> > it. If dev_id #n happens to be #1, then user gets two info entries that have
> > the same dev_id.
> >   
> 
> In Alex's proposal you'll set a invalid dev_id for device C so the user can
> still get the info for diagnostic purpose instead of seeing an -EPERM error.

Yes, we shouldn't be reporting dev_ids outside of the user's iommufd
context.

> btw I found an open about fd pass scheme which may affect the choice here.
> 
> In concept even with cdev we still expect the userspace to maintain the
> group knowledge so it won't inadvertently attempt to assign devices in
> the same group to different IOAS's. It also needs such knowledge when
> constructing guest topology.
> 
> with fd passed in Qemu has no way to associate the fd to a group.

Hmm, QEMU tries to get the group for the device address space in the
guest, so finding an existing group with a different address space
indeed allows QEMU to know of this conflict since the group is the
fundamental unit IOMMU context in the legacy vfio model.

> We could extend bind_iommufd to return the group id or introduce a
> new ioctl to query it per dev_id.

That would be ironic to go to all this trouble to remove groups from
the API only to have them show up here.  But with a cdev interface,
don't we break that model of conflating isolation and address-ability?

For example, devices within a group cannot be bound to separate
iommufds due to lack of isolation, which is handled via DMA ownership,
but barring DMA aliasing issues, due to conventional PCI buses or
quirks, cdev could allow devices within the same group to be managed by
separate IOAS's.  So the group information really isn't enough for
userspace to infer address space restrictions with cdev anyway.

Therefore aren't we expecting this to be denied at attach_ioas() and
QEMU shouldn't be making these sorts of assumptions for cdev anyway?
Thanks,

Alex


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v2 10/10] vfio/pci: Add VFIO_DEVICE_GET_PCI_HOT_RESET_GROUP_INFO
  2023-03-29  9:41                         ` Tian, Kevin
  2023-03-29 15:49                           ` Alex Williamson
@ 2023-03-29 15:50                           ` Jason Gunthorpe
  2023-03-30  1:10                             ` Tian, Kevin
  1 sibling, 1 reply; 56+ messages in thread
From: Jason Gunthorpe @ 2023-03-29 15:50 UTC (permalink / raw)
  To: Tian, Kevin
  Cc: Liu, Yi L, Alex Williamson, joro, robin.murphy, cohuck,
	eric.auger, nicolinc, kvm, mjrosato, chao.p.peng, yi.y.sun,
	peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, intel-gvt-dev, intel-gfx, linux-s390, Hao,
	Xudong, Zhao, Yan Y, Xu, Terrence, Jiang, Yanting

On Wed, Mar 29, 2023 at 09:41:26AM +0000, Tian, Kevin wrote:

> We could extend bind_iommufd to return the group id or introduce a
> new ioctl to query it per dev_id.

> Once that is in place looks we don't need a new _INFO ioctl?

The iommu_group and the reset group are different things

The issue is processing the BDF strings, not the group ID.

Probably we should have some way for iommufd to report the group_id
from the dev_id?

Jason

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v2 10/10] vfio/pci: Add VFIO_DEVICE_GET_PCI_HOT_RESET_GROUP_INFO
  2023-03-29 15:49                           ` Alex Williamson
@ 2023-03-29 15:57                             ` Jason Gunthorpe
  2023-03-30  1:17                               ` Tian, Kevin
  2023-03-30 12:48                             ` Liu, Yi L
  1 sibling, 1 reply; 56+ messages in thread
From: Jason Gunthorpe @ 2023-03-29 15:57 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Tian, Kevin, Liu, Yi L, joro, robin.murphy, cohuck, eric.auger,
	nicolinc, kvm, mjrosato, chao.p.peng, yi.y.sun, peterx, jasowang,
	shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	intel-gvt-dev, intel-gfx, linux-s390, Hao, Xudong, Zhao, Yan Y,
	Xu, Terrence, Jiang, Yanting

On Wed, Mar 29, 2023 at 09:49:44AM -0600, Alex Williamson wrote:
 
> > We could extend bind_iommufd to return the group id or introduce a
> > new ioctl to query it per dev_id.
> 
> That would be ironic to go to all this trouble to remove groups from
> the API only to have them show up here.

Groups always had to be part of the API for advanced cases like qemu -
the point was to make them a small side bit of information not front
and center in control of everything.

> For example, devices within a group cannot be bound to separate
> iommufds due to lack of isolation, which is handled via DMA ownership,
> but barring DMA aliasing issues, due to conventional PCI buses or
> quirks, cdev could allow devices within the same group to be managed by
> separate IOAS's.  

Maybe some future kernel could do this, the API allows it at least..

> So the group information really isn't enough for
> userspace to infer address space restrictions with cdev anyway.
> 
> Therefore aren't we expecting this to be denied at attach_ioas() and
> QEMU shouldn't be making these sorts of assumptions for cdev anyway?

I guess we could make an API specifically to report same-iommu_domina
information?

I was assuming qemu would use the group for now as I don't see a
likely future when we would relax that restriction.. So I was keeping
a "add it when we need it" attitude here.

Jason

^ permalink raw reply	[flat|nested] 56+ messages in thread

* RE: [PATCH v2 10/10] vfio/pci: Add VFIO_DEVICE_GET_PCI_HOT_RESET_GROUP_INFO
  2023-03-29 15:50                           ` Jason Gunthorpe
@ 2023-03-30  1:10                             ` Tian, Kevin
  2023-03-30  1:33                               ` Tian, Kevin
  0 siblings, 1 reply; 56+ messages in thread
From: Tian, Kevin @ 2023-03-30  1:10 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Liu, Yi L, Alex Williamson, joro, robin.murphy, cohuck,
	eric.auger, nicolinc, kvm, mjrosato, chao.p.peng, yi.y.sun,
	peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, intel-gvt-dev, intel-gfx, linux-s390, Hao,
	Xudong, Zhao, Yan Y, Xu, Terrence, Jiang, Yanting

> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Wednesday, March 29, 2023 11:50 PM
> 
> On Wed, Mar 29, 2023 at 09:41:26AM +0000, Tian, Kevin wrote:
> 
> > We could extend bind_iommufd to return the group id or introduce a
> > new ioctl to query it per dev_id.
> 
> > Once that is in place looks we don't need a new _INFO ioctl?
> 
> The iommu_group and the reset group are different things
> 
> The issue is processing the BDF strings, not the group ID.
> 
> Probably we should have some way for iommufd to report the group_id
> from the dev_id?
> 

Yes, that is my thought. Though iommu_group and reset group are
different things we could still leverage existing _INFO ioctl once there
is a way to associated dev_id to group_id.

^ permalink raw reply	[flat|nested] 56+ messages in thread

* RE: [PATCH v2 10/10] vfio/pci: Add VFIO_DEVICE_GET_PCI_HOT_RESET_GROUP_INFO
  2023-03-29 15:57                             ` Jason Gunthorpe
@ 2023-03-30  1:17                               ` Tian, Kevin
  2023-03-30 22:38                                 ` Jason Gunthorpe
  0 siblings, 1 reply; 56+ messages in thread
From: Tian, Kevin @ 2023-03-30  1:17 UTC (permalink / raw)
  To: Jason Gunthorpe, Alex Williamson
  Cc: Liu, Yi L, joro, robin.murphy, cohuck, eric.auger, nicolinc, kvm,
	mjrosato, chao.p.peng, yi.y.sun, peterx, jasowang,
	shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	intel-gvt-dev, intel-gfx, linux-s390, Hao, Xudong, Zhao, Yan Y,
	Xu, Terrence, Jiang, Yanting

> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Wednesday, March 29, 2023 11:58 PM
> 
> On Wed, Mar 29, 2023 at 09:49:44AM -0600, Alex Williamson wrote:
> 
> > > We could extend bind_iommufd to return the group id or introduce a
> > > new ioctl to query it per dev_id.
> >
> > That would be ironic to go to all this trouble to remove groups from
> > the API only to have them show up here.
> 
> Groups always had to be part of the API for advanced cases like qemu -
> the point was to make them a small side bit of information not front
> and center in control of everything.

Agree.

> 
> > For example, devices within a group cannot be bound to separate
> > iommufds due to lack of isolation, which is handled via DMA ownership,
> > but barring DMA aliasing issues, due to conventional PCI buses or
> > quirks, cdev could allow devices within the same group to be managed by
> > separate IOAS's.
> 
> Maybe some future kernel could do this, the API allows it at least..
> 
> > So the group information really isn't enough for
> > userspace to infer address space restrictions with cdev anyway.
> >
> > Therefore aren't we expecting this to be denied at attach_ioas() and
> > QEMU shouldn't be making these sorts of assumptions for cdev anyway?
> 
> I guess we could make an API specifically to report same-iommu_domina
> information?
> 
> I was assuming qemu would use the group for now as I don't see a
> likely future when we would relax that restriction.. So I was keeping
> a "add it when we need it" attitude here.
> 

IIRC we discussed this subgroup concept in the thread of reviewing my
high level design proposal 2yrs ago. The consensus at the moment was
that subgroup is architecturally allowed w/o DMA aliasing issues but
we're yet to see a real demand of relaxing current group restriction to
support it. Also with time moving newer platforms should have less
multi-devices group so the need of subgroup is further decreased.

So I'm also inclined to laying the existing group restriction with cdev
for now.

Then can we make a decision how this group_id might be reported?

In nesting series we'll have a GET_INFO ioctl per dev_id. It could be
extended to report group_id too.

Or alternatively just return it in BIND_IOMMUFD together with dev_id.

^ permalink raw reply	[flat|nested] 56+ messages in thread

* RE: [PATCH v2 10/10] vfio/pci: Add VFIO_DEVICE_GET_PCI_HOT_RESET_GROUP_INFO
  2023-03-30  1:10                             ` Tian, Kevin
@ 2023-03-30  1:33                               ` Tian, Kevin
  0 siblings, 0 replies; 56+ messages in thread
From: Tian, Kevin @ 2023-03-30  1:33 UTC (permalink / raw)
  To: Tian, Kevin, Jason Gunthorpe
  Cc: Liu, Yi L, Alex Williamson, joro, robin.murphy, cohuck,
	eric.auger, nicolinc, kvm, mjrosato, chao.p.peng, yi.y.sun,
	peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, intel-gvt-dev, intel-gfx, linux-s390, Hao,
	Xudong, Zhao, Yan Y, Xu, Terrence, Jiang, Yanting

> From: Tian, Kevin <kevin.tian@intel.com>
> Sent: Thursday, March 30, 2023 9:10 AM
> 
> > From: Jason Gunthorpe <jgg@nvidia.com>
> > Sent: Wednesday, March 29, 2023 11:50 PM
> >
> > On Wed, Mar 29, 2023 at 09:41:26AM +0000, Tian, Kevin wrote:
> >
> > > We could extend bind_iommufd to return the group id or introduce a
> > > new ioctl to query it per dev_id.
> >
> > > Once that is in place looks we don't need a new _INFO ioctl?
> >
> > The iommu_group and the reset group are different things
> >
> > The issue is processing the BDF strings, not the group ID.
> >
> > Probably we should have some way for iommufd to report the group_id
> > from the dev_id?
> >
> 
> Yes, that is my thought. Though iommu_group and reset group are
> different things we could still leverage existing _INFO ioctl once there
> is a way to associated dev_id to group_id.

Please ignore this comment. Yes they are different things so even if
a dev_id is in a group_id reported on a reset BDF string it doesn't mean
this dev_id is in the reset group.

Qemu can know that all affected devices are either owned by itself or
not used by other processes if dev_id's opened by itself can be
associated to all group_id's reported in the BDF strings. But it still lacks
of information to tell the reset dependency within those opened devices
within Qemu.

So we do need a new _INFO ioctl for cdev. :/

^ permalink raw reply	[flat|nested] 56+ messages in thread

* RE: [PATCH v2 10/10] vfio/pci: Add VFIO_DEVICE_GET_PCI_HOT_RESET_GROUP_INFO
  2023-03-29 15:49                           ` Alex Williamson
  2023-03-29 15:57                             ` Jason Gunthorpe
@ 2023-03-30 12:48                             ` Liu, Yi L
  2023-03-30 12:56                               ` Liu, Yi L
  2023-03-30 22:44                               ` Jason Gunthorpe
  1 sibling, 2 replies; 56+ messages in thread
From: Liu, Yi L @ 2023-03-30 12:48 UTC (permalink / raw)
  To: Alex Williamson, Tian, Kevin
  Cc: jgg, joro, robin.murphy, cohuck, eric.auger, nicolinc, kvm,
	mjrosato, chao.p.peng, yi.y.sun, peterx, jasowang,
	shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	intel-gvt-dev, intel-gfx, linux-s390, Hao, Xudong, Zhao, Yan Y,
	Xu, Terrence, Jiang, Yanting

> From: Alex Williamson <alex.williamson@redhat.com>
> Sent: Wednesday, March 29, 2023 11:50 PM
> 
> On Wed, 29 Mar 2023 09:41:26 +0000
> "Tian, Kevin" <kevin.tian@intel.com> wrote:
> 
> > > From: Liu, Yi L <yi.l.liu@intel.com>
> > > Sent: Wednesday, March 29, 2023 11:14 AM
> > >
> > > > From: Alex Williamson <alex.williamson@redhat.com>
> > > > Sent: Wednesday, March 29, 2023 12:00 AM
> > > >
> > > >
> > > > Personally I don't like the suggestion to fail with -EPERM if the user
> > > > doesn't own all the affected devices.  This isn't a "probe if I can do
> > > > a reset" ioctl, it's a "provide information about the devices affected
> > > > by a reset to know how to call the hot-reset ioctl".  We're returning
> > > > the bdf to the cdev version of this ioctl for exactly this debugging
> > > > purpose when the devices are not owned, that becomes useless if we give
> > > > up an return -EPERM if ownership doesn't align.
> > >
> > > Jason's suggestion makes sense for returning the case of returning dev_id
> > > as dev_id is local to iommufd. If there are devices in the same dev_set are
> > > opened by multiple users, multiple iommufd would be used. Then the
> > > dev_id would have overlap. e.g. a dev_set has three devices. Device A and
> > > B are opened by the current user as cdev, dev_id #1 and #2 are generated.
> > > While device C opened by another user as cdev, dev_id #n is generated for
> > > it. If dev_id #n happens to be #1, then user gets two info entries that have
> > > the same dev_id.
> > >
> >
> > In Alex's proposal you'll set a invalid dev_id for device C so the user can
> > still get the info for diagnostic purpose instead of seeing an -EPERM error.
> 
> Yes, we shouldn't be reporting dev_ids outside of the user's iommufd
> context.

Following Alex's suggestion, here are two commits to extend existing _INFO
to report dev_id.

From ad5c81366813c5effd707a0b5f5e797f5fdb3115 Mon Sep 17 00:00:00 2001
From: Yi Liu <yi.l.liu@intel.com>
Date: Thu, 30 Mar 2023 05:29:36 -0700
Subject: [PATCH] vfio: Mark cdev usage in vfio_device

There are users that needs to check if vfio_device is opened as cdev.
e.g. vfio-pci.

Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 drivers/vfio/device_cdev.c |  2 ++
 include/linux/vfio.h       | 14 ++++++++++++++
 2 files changed, 16 insertions(+)

diff --git a/drivers/vfio/device_cdev.c b/drivers/vfio/device_cdev.c
index b5de997bff6d..56f3bbe34680 100644
--- a/drivers/vfio/device_cdev.c
+++ b/drivers/vfio/device_cdev.c
@@ -66,6 +66,7 @@ void vfio_device_cdev_close(struct vfio_device_file *df)
 		return;
 
 	mutex_lock(&device->dev_set->lock);
+	device->cdev_opened = false;
 	vfio_device_close(df);
 	vfio_device_put_kvm(device);
 	if (df->iommufd)
@@ -180,6 +181,7 @@ long vfio_device_ioctl_bind_iommufd(struct vfio_device_file *df,
 	 * read/write/mmap
 	 */
 	smp_store_release(&df->access_granted, true);
+	device->cdev_opened = true;
 	mutex_unlock(&device->dev_set->lock);
 
 	return 0;
diff --git a/include/linux/vfio.h b/include/linux/vfio.h
index 1367605d617c..86efc1640940 100644
--- a/include/linux/vfio.h
+++ b/include/linux/vfio.h
@@ -58,6 +58,7 @@ struct vfio_device {
 	struct device device;	/* device.kref covers object life circle */
 #if IS_ENABLED(CONFIG_VFIO_DEVICE_CDEV)
 	struct cdev cdev;
+	bool cdev_opened;
 #endif
 	refcount_t refcount;	/* user count on registered device*/
 	unsigned int open_count;
@@ -167,6 +168,19 @@ static inline int vfio_iommufd_physical_devid(struct vfio_device *vdev, u32 *id)
 	((void (*)(struct vfio_device *vdev)) NULL)
 #endif
 
+#if IS_ENABLED(CONFIG_VFIO_DEVICE_CDEV)
+static inline bool vfio_device_cdev_opened(struct vfio_device *device)
+{
+	lockdep_assert_held(&device->dev_set->lock);
+	return device->cdev_opened;
+}
+#else
+static inline bool vfio_device_cdev_opened(struct vfio_device *device)
+{
+	return false;
+}
+#endif
+
 /**
  * @migration_set_state: Optional callback to change the migration state for
  *         devices that support migration. It's mandatory for
-- 
2.34.1


From f796cafd6c51e49adcf76352dc1daf14712c3a48 Mon Sep 17 00:00:00 2001
From: Yi Liu <yi.l.liu@intel.com>
Date: Thu, 30 Mar 2023 05:44:45 -0700
Subject: [PATCH] vfio/pci: Report dev_id in VFIO_DEVICE_GET_PCI_HOT_RESET_INFO

for the devices opened as cdev.

Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 drivers/vfio/pci/vfio_pci_core.c | 59 ++++++++++++++++++++++++++++----
 include/uapi/linux/vfio.h        |  6 +++-
 2 files changed, 57 insertions(+), 8 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
index 19f5b075d70a..49e0981037f7 100644
--- a/drivers/vfio/pci/vfio_pci_core.c
+++ b/drivers/vfio/pci/vfio_pci_core.c
@@ -767,6 +767,20 @@ static int vfio_pci_get_irq_count(struct vfio_pci_core_device *vdev, int irq_typ
 	return 0;
 }
 
+static struct vfio_device *
+vfio_pci_find_device_in_devset(struct vfio_device_set *dev_set,
+			       struct pci_dev *pdev)
+{
+	struct vfio_device *cur;
+
+	lockdep_assert_held(&dev_set->lock);
+
+	list_for_each_entry(cur, &dev_set->device_list, dev_set_list)
+		if (cur->dev == &pdev->dev)
+			return cur;
+	return NULL;
+}
+
 static int vfio_pci_count_devs(struct pci_dev *pdev, void *data)
 {
 	(*(int *)data)++;
@@ -776,13 +790,20 @@ static int vfio_pci_count_devs(struct pci_dev *pdev, void *data)
 struct vfio_pci_fill_info {
 	int max;
 	int cur;
+	bool require_devid;
+	struct iommufd_ctx *iommufd;
+	struct vfio_device_set *dev_set;
 	struct vfio_pci_dependent_device *devices;
 };
 
 static int vfio_pci_fill_devs(struct pci_dev *pdev, void *data)
 {
 	struct vfio_pci_fill_info *fill = data;
+	struct vfio_device_set *dev_set = fill->dev_set;
 	struct iommu_group *iommu_group;
+	struct vfio_device *vdev;
+
+	lockdep_assert_held(&dev_set->lock);
 
 	if (fill->cur == fill->max)
 		return -EAGAIN; /* Something changed, try again */
@@ -791,7 +812,24 @@ static int vfio_pci_fill_devs(struct pci_dev *pdev, void *data)
 	if (!iommu_group)
 		return -EPERM; /* Cannot reset non-isolated devices */
 
-	fill->devices[fill->cur].group_id = iommu_group_id(iommu_group);
+	/*
+	 * If dev_id is needed, fill in the dev_id field, otherwise
+	 * fill in group_id.
+	 */
+	if (fill->require_devid) {
+		/*
+		 * Report the devices that are opened as cdev and have
+		 * the same iommufd with the fill->iommufd.  Otherwise,
+		 * just fill in an IOMMUFD_INVALID_ID.
+		 */
+		vdev = vfio_pci_find_device_in_devset(dev_set, pdev);
+		if (vdev && !vfio_device_cdev_opened(vdev) &&
+		    fill->iommufd == vfio_iommufd_physical_ictx(vdev))
+			vfio_iommufd_physical_devid(vdev, &fill->devices[fill->cur].dev_id);
+		fill->devices[fill->cur].dev_id = IOMMUFD_INVALID_ID;
+	} else {
+		fill->devices[fill->cur].group_id = iommu_group_id(iommu_group);
+	}
 	fill->devices[fill->cur].segment = pci_domain_nr(pdev->bus);
 	fill->devices[fill->cur].bus = pdev->bus->number;
 	fill->devices[fill->cur].devfn = pdev->devfn;
@@ -1230,17 +1269,25 @@ static int vfio_pci_ioctl_get_pci_hot_reset_info(
 		return -ENOMEM;
 
 	fill.devices = devices;
+	fill.dev_set = vdev->vdev.dev_set;
 
+	mutex_lock(&vdev->vdev.dev_set->lock);
+	if (vfio_device_cdev_opened(&vdev->vdev))
+		fill.require_devid = true;
 	ret = vfio_pci_for_each_slot_or_bus(vdev->pdev, vfio_pci_fill_devs,
 					    &fill, slot);
+	mutex_unlock(&vdev->vdev.dev_set->lock);
 
 	/*
 	 * If a device was removed between counting and filling, we may come up
 	 * short of fill.max.  If a device was added, we'll have a return of
 	 * -EAGAIN above.
 	 */
-	if (!ret)
+	if (!ret) {
 		hdr.count = fill.cur;
+		if (fill.require_devid)
+			hdr.flags = VFIO_PCI_HOT_RESET_FLAG_IOMMUFD_DEV_ID;
+	}
 
 reset_info_exit:
 	if (copy_to_user(arg, &hdr, minsz))
@@ -2346,12 +2393,10 @@ static bool vfio_dev_in_files(struct vfio_pci_core_device *vdev,
 static int vfio_pci_is_device_in_set(struct pci_dev *pdev, void *data)
 {
 	struct vfio_device_set *dev_set = data;
-	struct vfio_device *cur;
 
-	list_for_each_entry(cur, &dev_set->device_list, dev_set_list)
-		if (cur->dev == &pdev->dev)
-			return 0;
-	return -EBUSY;
+	lockdep_assert_held(&dev_set->lock);
+
+	return vfio_pci_find_device_in_devset(dev_set, pdev) ? 0 : -EBUSY;
 }
 
 /*
diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index 53c72e26ecd3..3fcbc84d51ba 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -743,7 +743,10 @@ enum {
  *	-enospc = insufficient buffer, -enodev = unsupported for device.
  */
 struct vfio_pci_dependent_device {
-	__u32	group_id;
+	union {
+		__u32   group_id;
+		__u32	dev_id;
+	};
 	__u16	segment;
 	__u8	bus;
 	__u8	devfn; /* Use PCI_SLOT/PCI_FUNC */
@@ -752,6 +755,7 @@ struct vfio_pci_dependent_device {
 struct vfio_pci_hot_reset_info {
 	__u32	argsz;
 	__u32	flags;
+#define VFIO_PCI_HOT_RESET_FLAG_IOMMUFD_DEV_ID	(1 << 0)
 	__u32	count;
 	struct vfio_pci_dependent_device	devices[];
 };
-- 
2.34.1

Regards,
Yi Liu

^ permalink raw reply related	[flat|nested] 56+ messages in thread

* RE: [PATCH v2 10/10] vfio/pci: Add VFIO_DEVICE_GET_PCI_HOT_RESET_GROUP_INFO
  2023-03-30 12:48                             ` Liu, Yi L
@ 2023-03-30 12:56                               ` Liu, Yi L
  2023-03-30 22:44                               ` Jason Gunthorpe
  1 sibling, 0 replies; 56+ messages in thread
From: Liu, Yi L @ 2023-03-30 12:56 UTC (permalink / raw)
  To: Alex Williamson, Tian, Kevin
  Cc: jgg, joro, robin.murphy, cohuck, eric.auger, nicolinc, kvm,
	mjrosato, chao.p.peng, yi.y.sun, peterx, jasowang,
	shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	intel-gvt-dev, intel-gfx, linux-s390, Hao, Xudong, Zhao, Yan Y,
	Xu, Terrence, Jiang, Yanting

> From: Liu, Yi L <yi.l.liu@intel.com>
> Sent: Thursday, March 30, 2023 8:48 PM

>  	if (fill->cur == fill->max)
>  		return -EAGAIN; /* Something changed, try again */
> @@ -791,7 +812,24 @@ static int vfio_pci_fill_devs(struct pci_dev *pdev, void *data)
>  	if (!iommu_group)
>  		return -EPERM; /* Cannot reset non-isolated devices */
> 
> -	fill->devices[fill->cur].group_id = iommu_group_id(iommu_group);
> +	/*
> +	 * If dev_id is needed, fill in the dev_id field, otherwise
> +	 * fill in group_id.
> +	 */
> +	if (fill->require_devid) {
> +		/*
> +		 * Report the devices that are opened as cdev and have
> +		 * the same iommufd with the fill->iommufd.  Otherwise,
> +		 * just fill in an IOMMUFD_INVALID_ID.
> +		 */
> +		vdev = vfio_pci_find_device_in_devset(dev_set, pdev);
> +		if (vdev && !vfio_device_cdev_opened(vdev) &&

a typo..it should be
		if (vdev && vfio_device_cdev_opened(vdev) &&

> +		    fill->iommufd == vfio_iommufd_physical_ictx(vdev))
> +			vfio_iommufd_physical_devid(vdev, &fill->devices[fill-
> >cur].dev_id);
> +		fill->devices[fill->cur].dev_id = IOMMUFD_INVALID_ID;
> +	} else {
> +		fill->devices[fill->cur].group_id = iommu_group_id(iommu_group);
> +	}
>  	fill->devices[fill->cur].segment = pci_domain_nr(pdev->bus);
>  	fill->devices[fill->cur].bus = pdev->bus->number;
>  	fill->devices[fill->cur].devfn = pdev->devfn;

Regards,
Yi Liu

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v2 10/10] vfio/pci: Add VFIO_DEVICE_GET_PCI_HOT_RESET_GROUP_INFO
  2023-03-30  1:17                               ` Tian, Kevin
@ 2023-03-30 22:38                                 ` Jason Gunthorpe
  0 siblings, 0 replies; 56+ messages in thread
From: Jason Gunthorpe @ 2023-03-30 22:38 UTC (permalink / raw)
  To: Tian, Kevin
  Cc: Alex Williamson, Liu, Yi L, joro, robin.murphy, cohuck,
	eric.auger, nicolinc, kvm, mjrosato, chao.p.peng, yi.y.sun,
	peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, intel-gvt-dev, intel-gfx, linux-s390, Hao,
	Xudong, Zhao, Yan Y, Xu, Terrence, Jiang, Yanting

On Thu, Mar 30, 2023 at 01:17:03AM +0000, Tian, Kevin wrote:

> In nesting series we'll have a GET_INFO ioctl per dev_id. It could be
> extended to report group_id too.

I like the GET_INFO because it would work with VDPA, is it possible?

Jason

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v2 10/10] vfio/pci: Add VFIO_DEVICE_GET_PCI_HOT_RESET_GROUP_INFO
  2023-03-30 12:48                             ` Liu, Yi L
  2023-03-30 12:56                               ` Liu, Yi L
@ 2023-03-30 22:44                               ` Jason Gunthorpe
  2023-03-30 23:05                                 ` Alex Williamson
  1 sibling, 1 reply; 56+ messages in thread
From: Jason Gunthorpe @ 2023-03-30 22:44 UTC (permalink / raw)
  To: Liu, Yi L
  Cc: Alex Williamson, Tian, Kevin, joro, robin.murphy, cohuck,
	eric.auger, nicolinc, kvm, mjrosato, chao.p.peng, yi.y.sun,
	peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, intel-gvt-dev, intel-gfx, linux-s390, Hao,
	Xudong, Zhao, Yan Y, Xu, Terrence, Jiang, Yanting

On Thu, Mar 30, 2023 at 12:48:03PM +0000, Liu, Yi L wrote:
> +	/*
> +	 * If dev_id is needed, fill in the dev_id field, otherwise
> +	 * fill in group_id.
> +	 */
> +	if (fill->require_devid) {
> +		/*
> +		 * Report the devices that are opened as cdev and have
> +		 * the same iommufd with the fill->iommufd.  Otherwise,
> +		 * just fill in an IOMMUFD_INVALID_ID.
> +		 */
> +		vdev = vfio_pci_find_device_in_devset(dev_set, pdev);
> +		if (vdev && !vfio_device_cdev_opened(vdev) &&
> +		    fill->iommufd == vfio_iommufd_physical_ictx(vdev))
> +			vfio_iommufd_physical_devid(vdev, &fill->devices[fill->cur].dev_id);
> +		fill->devices[fill->cur].dev_id = IOMMUFD_INVALID_ID;

This needs an else?

I suggest to check for VFIO_PCI_HOT_RESET_FLAG_IOMMUFD_DEV_ID on input
as well. I know the old kernels don't enforce this but at least we
could start enforcing it going forward so that the group path would
reject it to catch userspace bugs.

May as well fix it up to fully validate the flags

Jason

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v2 10/10] vfio/pci: Add VFIO_DEVICE_GET_PCI_HOT_RESET_GROUP_INFO
  2023-03-30 22:44                               ` Jason Gunthorpe
@ 2023-03-30 23:05                                 ` Alex Williamson
  2023-03-30 23:18                                   ` Jason Gunthorpe
  0 siblings, 1 reply; 56+ messages in thread
From: Alex Williamson @ 2023-03-30 23:05 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Liu, Yi L, Tian, Kevin, joro, robin.murphy, cohuck, eric.auger,
	nicolinc, kvm, mjrosato, chao.p.peng, yi.y.sun, peterx, jasowang,
	shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	intel-gvt-dev, intel-gfx, linux-s390, Hao, Xudong, Zhao, Yan Y,
	Xu, Terrence, Jiang, Yanting

On Thu, 30 Mar 2023 19:44:55 -0300
Jason Gunthorpe <jgg@nvidia.com> wrote:

> On Thu, Mar 30, 2023 at 12:48:03PM +0000, Liu, Yi L wrote:
> > +	/*
> > +	 * If dev_id is needed, fill in the dev_id field, otherwise
> > +	 * fill in group_id.
> > +	 */
> > +	if (fill->require_devid) {
> > +		/*
> > +		 * Report the devices that are opened as cdev and have
> > +		 * the same iommufd with the fill->iommufd.  Otherwise,
> > +		 * just fill in an IOMMUFD_INVALID_ID.
> > +		 */
> > +		vdev = vfio_pci_find_device_in_devset(dev_set, pdev);
> > +		if (vdev && !vfio_device_cdev_opened(vdev) &&
> > +		    fill->iommufd == vfio_iommufd_physical_ictx(vdev))
> > +			vfio_iommufd_physical_devid(vdev, &fill->devices[fill->cur].dev_id);
> > +		fill->devices[fill->cur].dev_id = IOMMUFD_INVALID_ID;  
> 
> This needs an else?
> 
> I suggest to check for VFIO_PCI_HOT_RESET_FLAG_IOMMUFD_DEV_ID on input
> as well. I know the old kernels don't enforce this but at least we
> could start enforcing it going forward so that the group path would
> reject it to catch userspace bugs.
> 
> May as well fix it up to fully validate the flags

Is this under the guise of "if nobody complains it's ok, otherwise
revert" plan?  We report dev-id based on the nature of the device, not
the provided flags, so I'm not sure I follow how this protects the group
path, unless we've failed to clear the output flags on that path with
this change.  Thanks,

Alex



^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v2 10/10] vfio/pci: Add VFIO_DEVICE_GET_PCI_HOT_RESET_GROUP_INFO
  2023-03-30 23:05                                 ` Alex Williamson
@ 2023-03-30 23:18                                   ` Jason Gunthorpe
  0 siblings, 0 replies; 56+ messages in thread
From: Jason Gunthorpe @ 2023-03-30 23:18 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Liu, Yi L, Tian, Kevin, joro, robin.murphy, cohuck, eric.auger,
	nicolinc, kvm, mjrosato, chao.p.peng, yi.y.sun, peterx, jasowang,
	shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	intel-gvt-dev, intel-gfx, linux-s390, Hao, Xudong, Zhao, Yan Y,
	Xu, Terrence, Jiang, Yanting

On Thu, Mar 30, 2023 at 05:05:31PM -0600, Alex Williamson wrote:
> > I suggest to check for VFIO_PCI_HOT_RESET_FLAG_IOMMUFD_DEV_ID on input
> > as well. I know the old kernels don't enforce this but at least we
> > could start enforcing it going forward so that the group path would
> > reject it to catch userspace bugs.
> > 
> > May as well fix it up to fully validate the flags
> 
> Is this under the guise of "if nobody complains it's ok, otherwise
> revert" plan? 

Yah, assuming people don't pass in uninited structs.

Jason

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v2 03/10] vfio/pci: Move the existing hot reset logic to be a helper
  2023-03-27  9:34 ` [PATCH v2 03/10] vfio/pci: Move the existing hot reset logic to be a helper Yi Liu
@ 2023-03-30 23:39   ` Jason Gunthorpe
  2023-03-30 23:44   ` Jason Gunthorpe
  1 sibling, 0 replies; 56+ messages in thread
From: Jason Gunthorpe @ 2023-03-30 23:39 UTC (permalink / raw)
  To: Yi Liu
  Cc: alex.williamson, kevin.tian, joro, robin.murphy, cohuck,
	eric.auger, nicolinc, kvm, mjrosato, chao.p.peng, yi.y.sun,
	peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, intel-gvt-dev, intel-gfx, linux-s390,
	xudong.hao, yan.y.zhao, terrence.xu, yanting.jiang

On Mon, Mar 27, 2023 at 02:34:51AM -0700, Yi Liu wrote:
> This prepares to add another method for hot reset. The major hot reset logic
> are moved to vfio_pci_ioctl_pci_hot_reset_groups().
> 
> No functional change is intended.
> 
> Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> ---
>  drivers/vfio/pci/vfio_pci_core.c | 56 +++++++++++++++++++-------------
>  1 file changed, 33 insertions(+), 23 deletions(-)

Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>

Jason

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v2 03/10] vfio/pci: Move the existing hot reset logic to be a helper
  2023-03-27  9:34 ` [PATCH v2 03/10] vfio/pci: Move the existing hot reset logic to be a helper Yi Liu
  2023-03-30 23:39   ` Jason Gunthorpe
@ 2023-03-30 23:44   ` Jason Gunthorpe
  1 sibling, 0 replies; 56+ messages in thread
From: Jason Gunthorpe @ 2023-03-30 23:44 UTC (permalink / raw)
  To: Yi Liu
  Cc: alex.williamson, kevin.tian, joro, robin.murphy, cohuck,
	eric.auger, nicolinc, kvm, mjrosato, chao.p.peng, yi.y.sun,
	peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, intel-gvt-dev, intel-gfx, linux-s390,
	xudong.hao, yan.y.zhao, terrence.xu, yanting.jiang

On Mon, Mar 27, 2023 at 02:34:51AM -0700, Yi Liu wrote:
> This prepares to add another method for hot reset. The major hot reset logic
> are moved to vfio_pci_ioctl_pci_hot_reset_groups().
> 
> No functional change is intended.
> 
> Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> ---
>  drivers/vfio/pci/vfio_pci_core.c | 56 +++++++++++++++++++-------------
>  1 file changed, 33 insertions(+), 23 deletions(-)

Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>

Jason

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v2 04/10] vfio-iommufd: Add helper to retrieve iommufd_ctx and devid for vfio_device
  2023-03-27  9:34 ` [PATCH v2 04/10] vfio-iommufd: Add helper to retrieve iommufd_ctx and devid for vfio_device Yi Liu
@ 2023-03-30 23:44   ` Jason Gunthorpe
  0 siblings, 0 replies; 56+ messages in thread
From: Jason Gunthorpe @ 2023-03-30 23:44 UTC (permalink / raw)
  To: Yi Liu
  Cc: alex.williamson, kevin.tian, joro, robin.murphy, cohuck,
	eric.auger, nicolinc, kvm, mjrosato, chao.p.peng, yi.y.sun,
	peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, intel-gvt-dev, intel-gfx, linux-s390,
	xudong.hao, yan.y.zhao, terrence.xu, yanting.jiang

On Mon, Mar 27, 2023 at 02:34:52AM -0700, Yi Liu wrote:
> This is needed by the vfio pci driver to report affected devices in the
> hot reset for a given device.
> 
> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> ---
>  drivers/iommu/iommufd/device.c | 12 ++++++++++++
>  drivers/vfio/iommufd.c         | 16 ++++++++++++++++
>  include/linux/iommufd.h        |  3 +++
>  include/linux/vfio.h           | 13 +++++++++++++
>  4 files changed, 44 insertions(+)

Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>

Jason

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v2 05/10] vfio/pci: Allow passing zero-length fd array in VFIO_DEVICE_PCI_HOT_RESET
  2023-03-27  9:34 ` [PATCH v2 05/10] vfio/pci: Allow passing zero-length fd array in VFIO_DEVICE_PCI_HOT_RESET Yi Liu
@ 2023-03-30 23:47   ` Jason Gunthorpe
  0 siblings, 0 replies; 56+ messages in thread
From: Jason Gunthorpe @ 2023-03-30 23:47 UTC (permalink / raw)
  To: Yi Liu
  Cc: alex.williamson, kevin.tian, joro, robin.murphy, cohuck,
	eric.auger, nicolinc, kvm, mjrosato, chao.p.peng, yi.y.sun,
	peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, intel-gvt-dev, intel-gfx, linux-s390,
	xudong.hao, yan.y.zhao, terrence.xu, yanting.jiang

On Mon, Mar 27, 2023 at 02:34:53AM -0700, Yi Liu wrote:
> as an alternative method for ownership check when iommufd is used. In
> this case all opened devices in the affected dev_set are verified to
> be bound to a same valid iommufd value to allow reset. It's simpler
> and faster as user does not need to pass a set of fds and kernel no
> need to search the device within the given fds.
> 
> a device in noiommu mode doesn't have a valid iommufd, so this method
> should not be used in a dev_set which contains multiple devices and one
> of them is in noiommu. The only allowed noiommu scenario is that the
> calling device is noiommu and it's in a singleton dev_set.
> 
> Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> ---
>  drivers/vfio/pci/vfio_pci_core.c | 42 +++++++++++++++++++++++++++-----
>  include/uapi/linux/vfio.h        |  9 ++++++-
>  2 files changed, 44 insertions(+), 7 deletions(-)

Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>

Jason

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v2 06/10] vfio: Refine vfio file kAPIs for vfio PCI hot reset
  2023-03-27  9:34 ` [PATCH v2 06/10] vfio: Refine vfio file kAPIs for vfio PCI hot reset Yi Liu
@ 2023-03-30 23:48   ` Jason Gunthorpe
  0 siblings, 0 replies; 56+ messages in thread
From: Jason Gunthorpe @ 2023-03-30 23:48 UTC (permalink / raw)
  To: Yi Liu
  Cc: alex.williamson, kevin.tian, joro, robin.murphy, cohuck,
	eric.auger, nicolinc, kvm, mjrosato, chao.p.peng, yi.y.sun,
	peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, intel-gvt-dev, intel-gfx, linux-s390,
	xudong.hao, yan.y.zhao, terrence.xu, yanting.jiang

On Mon, Mar 27, 2023 at 02:34:54AM -0700, Yi Liu wrote:
> This prepares vfio core to accept vfio device file from the vfio PCI
> hot reset path. vfio_file_is_group() is still kept for KVM usage.
> 
> Reviewed-by: Kevin Tian <kevin.tian@intel.com>
> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> ---
>  drivers/vfio/group.c             | 32 ++++++++++++++------------------
>  drivers/vfio/pci/vfio_pci_core.c |  4 ++--
>  drivers/vfio/vfio.h              |  2 ++
>  drivers/vfio/vfio_main.c         | 29 +++++++++++++++++++++++++++++
>  include/linux/vfio.h             |  1 +
>  5 files changed, 48 insertions(+), 20 deletions(-)

Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>

Jason

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v2 07/10] vfio: Accpet device file from vfio PCI hot reset path
  2023-03-27  9:34 ` [PATCH v2 07/10] vfio: Accpet device file from vfio PCI hot reset path Yi Liu
@ 2023-03-30 23:49   ` Jason Gunthorpe
  0 siblings, 0 replies; 56+ messages in thread
From: Jason Gunthorpe @ 2023-03-30 23:49 UTC (permalink / raw)
  To: Yi Liu
  Cc: alex.williamson, kevin.tian, joro, robin.murphy, cohuck,
	eric.auger, nicolinc, kvm, mjrosato, chao.p.peng, yi.y.sun,
	peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, intel-gvt-dev, intel-gfx, linux-s390,
	xudong.hao, yan.y.zhao, terrence.xu, yanting.jiang

On Mon, Mar 27, 2023 at 02:34:55AM -0700, Yi Liu wrote:
> This extends both vfio_file_is_valid() and vfio_file_has_dev() to accept
> device file from the vfio PCI hot reset.
> 
> Reviewed-by: Kevin Tian <kevin.tian@intel.com>
> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> ---
>  drivers/vfio/vfio_main.c | 23 +++++++++++++++++++----
>  1 file changed, 19 insertions(+), 4 deletions(-)

Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>

Jason

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v2 09/10] vfio/pci: Accept device fd in VFIO_DEVICE_PCI_HOT_RESET ioctl
  2023-03-27  9:34 ` [PATCH v2 09/10] vfio/pci: Accept device fd in VFIO_DEVICE_PCI_HOT_RESET ioctl Yi Liu
@ 2023-03-30 23:50   ` Jason Gunthorpe
  0 siblings, 0 replies; 56+ messages in thread
From: Jason Gunthorpe @ 2023-03-30 23:50 UTC (permalink / raw)
  To: Yi Liu
  Cc: alex.williamson, kevin.tian, joro, robin.murphy, cohuck,
	eric.auger, nicolinc, kvm, mjrosato, chao.p.peng, yi.y.sun,
	peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, intel-gvt-dev, intel-gfx, linux-s390,
	xudong.hao, yan.y.zhao, terrence.xu, yanting.jiang

On Mon, Mar 27, 2023 at 02:34:57AM -0700, Yi Liu wrote:
> Now user can also provide an array of device fds as a 3rd method to verify
> the reset ownership. It's not useful at this point when the device fds are
> acquired via group fds. But it's necessary when moving to device cdev which
> allows the user to directly acquire device fds by skipping group. In that
> case this method can be used as a last resort when the preferred iommufd
> verification doesn't work, e.g. in noiommu usages.
> 
> Clarify it in uAPI.
> 
> Reviewed-by: Kevin Tian <kevin.tian@intel.com>
> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> ---
>  drivers/vfio/pci/vfio_pci_core.c | 9 +++++----
>  include/uapi/linux/vfio.h        | 3 ++-
>  2 files changed, 7 insertions(+), 5 deletions(-)

Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>

Jason

^ permalink raw reply	[flat|nested] 56+ messages in thread

* RE: [PATCH v2 00/10] Introduce new methods for verifying ownership in vfio PCI hot reset
  2023-03-27  9:34 [PATCH v2 00/10] Introduce new methods for verifying ownership in vfio PCI hot reset Yi Liu
                   ` (9 preceding siblings ...)
  2023-03-27  9:34 ` [PATCH v2 10/10] vfio/pci: Add VFIO_DEVICE_GET_PCI_HOT_RESET_GROUP_INFO Yi Liu
@ 2023-03-31  3:14 ` Jiang, Yanting
  2023-03-31 13:24   ` Alex Williamson
  2023-03-31  5:01 ` Jiang, Yanting
  2023-03-31 17:27 ` Xu, Terrence
  12 siblings, 1 reply; 56+ messages in thread
From: Jiang, Yanting @ 2023-03-31  3:14 UTC (permalink / raw)
  To: Liu, Yi L, alex.williamson, jgg, Tian, Kevin
  Cc: joro, robin.murphy, cohuck, eric.auger, nicolinc, kvm, mjrosato,
	chao.p.peng, yi.y.sun, peterx, jasowang,
	shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	intel-gvt-dev, intel-gfx, linux-s390, Hao, Xudong, Zhao, Yan Y,
	Xu, Terrence

> 
> VFIO_DEVICE_PCI_HOT_RESET requires user to pass an array of group fds to
> prove that it owns all devices affected by resetting the calling device. This series
> introduces several extensions to allow the ownership check better aligned with
> iommufd and coming vfio device cdev support.
> 
> First, resetting an unopened device is always safe given nobody is using it. So
> relax the check to allow such devices not covered by group fd array. [1]
> 
> When iommufd is used we can simply verify that all affected devices are bound
> to a same iommufd then no need for the user to provide extra fd information.
> This is enabled by the user passing a zero-length fd array and moving forward
> this should be the preferred way for hot reset. [2]
> 
> However the iommufd method has difficulty working with noiommu devices
> since those devices don't have a valid iommufd, unless the noiommu device is in
> a singleton dev_set hence no ownership check is required. [3]
> 
> For noiommu backward compatibility a 3rd method is introduced by allowing the
> user to pass an array of device fds to prove ownership. [4]
> 
> As suggested by Jason [5], we have this series to introduce the above stuffs to
> the vfio PCI hot reset. Per the dicussion in [6], this series also adds a new _INFO
> ioctl to get hot reset scope for given device.
> 
Tested NIC passthrough on Intel platform.
Result looks good hence, 
Tested by: Jiang, Yanting <yanting.jiang@intel.com>

Thanks,
Yanting

^ permalink raw reply	[flat|nested] 56+ messages in thread

* RE: [PATCH v2 00/10] Introduce new methods for verifying ownership in vfio PCI hot reset
  2023-03-27  9:34 [PATCH v2 00/10] Introduce new methods for verifying ownership in vfio PCI hot reset Yi Liu
                   ` (10 preceding siblings ...)
  2023-03-31  3:14 ` [PATCH v2 00/10] Introduce new methods for verifying ownership in vfio PCI hot reset Jiang, Yanting
@ 2023-03-31  5:01 ` Jiang, Yanting
  2023-03-31 17:27 ` Xu, Terrence
  12 siblings, 0 replies; 56+ messages in thread
From: Jiang, Yanting @ 2023-03-31  5:01 UTC (permalink / raw)
  To: Liu, Yi L, alex.williamson, jgg, Tian, Kevin
  Cc: joro, robin.murphy, cohuck, eric.auger, nicolinc, kvm, mjrosato,
	chao.p.peng, yi.y.sun, peterx, jasowang,
	shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	intel-gvt-dev, intel-gfx, linux-s390, Hao, Xudong, Zhao, Yan Y,
	Xu, Terrence

> VFIO_DEVICE_PCI_HOT_RESET requires user to pass an array of group fds to
> prove that it owns all devices affected by resetting the calling device. This series
> introduces several extensions to allow the ownership check better aligned with
> iommufd and coming vfio device cdev support.
> 
> First, resetting an unopened device is always safe given nobody is using it. So
> relax the check to allow such devices not covered by group fd array. [1]
> 
> When iommufd is used we can simply verify that all affected devices are bound
> to a same iommufd then no need for the user to provide extra fd information.
> This is enabled by the user passing a zero-length fd array and moving forward
> this should be the preferred way for hot reset. [2]
> 
> However the iommufd method has difficulty working with noiommu devices
> since those devices don't have a valid iommufd, unless the noiommu device is in
> a singleton dev_set hence no ownership check is required. [3]
> 
> For noiommu backward compatibility a 3rd method is introduced by allowing the
> user to pass an array of device fds to prove ownership. [4]
> 
> As suggested by Jason [5], we have this series to introduce the above stuffs to
> the vfio PCI hot reset. Per the dicussion in [6], this series also adds a new _INFO
> ioctl to get hot reset scope for given device.
> 

Tested-by: Yanting Jiang <yanting.jiang@intel.com>

Thanks,
Yanting

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v2 00/10] Introduce new methods for verifying ownership in vfio PCI hot reset
  2023-03-31  3:14 ` [PATCH v2 00/10] Introduce new methods for verifying ownership in vfio PCI hot reset Jiang, Yanting
@ 2023-03-31 13:24   ` Alex Williamson
  2023-04-03  2:04     ` Jiang, Yanting
  0 siblings, 1 reply; 56+ messages in thread
From: Alex Williamson @ 2023-03-31 13:24 UTC (permalink / raw)
  To: Jiang, Yanting
  Cc: Liu, Yi L, jgg, Tian, Kevin, joro, robin.murphy, cohuck,
	eric.auger, nicolinc, kvm, mjrosato, chao.p.peng, yi.y.sun,
	peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, intel-gvt-dev, intel-gfx, linux-s390, Hao,
	Xudong, Zhao, Yan Y, Xu, Terrence

On Fri, 31 Mar 2023 03:14:23 +0000
"Jiang, Yanting" <yanting.jiang@intel.com> wrote:

> > 
> > VFIO_DEVICE_PCI_HOT_RESET requires user to pass an array of group fds to
> > prove that it owns all devices affected by resetting the calling device. This series
> > introduces several extensions to allow the ownership check better aligned with
> > iommufd and coming vfio device cdev support.
> > 
> > First, resetting an unopened device is always safe given nobody is using it. So
> > relax the check to allow such devices not covered by group fd array. [1]
> > 
> > When iommufd is used we can simply verify that all affected devices are bound
> > to a same iommufd then no need for the user to provide extra fd information.
> > This is enabled by the user passing a zero-length fd array and moving forward
> > this should be the preferred way for hot reset. [2]
> > 
> > However the iommufd method has difficulty working with noiommu devices
> > since those devices don't have a valid iommufd, unless the noiommu device is in
> > a singleton dev_set hence no ownership check is required. [3]
> > 
> > For noiommu backward compatibility a 3rd method is introduced by allowing the
> > user to pass an array of device fds to prove ownership. [4]
> > 
> > As suggested by Jason [5], we have this series to introduce the above stuffs to
> > the vfio PCI hot reset. Per the dicussion in [6], this series also adds a new _INFO
> > ioctl to get hot reset scope for given device.
> >   
> Tested NIC passthrough on Intel platform.
> Result looks good hence, 
> Tested by: Jiang, Yanting <yanting.jiang@intel.com>

I'm not aware of any userspace that exercises this reset ioctl in cdev
mode.  Is this regression testing only?  Thanks,

Alex


^ permalink raw reply	[flat|nested] 56+ messages in thread

* RE: [PATCH v2 00/10] Introduce new methods for verifying ownership in vfio PCI hot reset
  2023-03-27  9:34 [PATCH v2 00/10] Introduce new methods for verifying ownership in vfio PCI hot reset Yi Liu
                   ` (11 preceding siblings ...)
  2023-03-31  5:01 ` Jiang, Yanting
@ 2023-03-31 17:27 ` Xu, Terrence
  2023-03-31 17:49   ` Alex Williamson
  12 siblings, 1 reply; 56+ messages in thread
From: Xu, Terrence @ 2023-03-31 17:27 UTC (permalink / raw)
  To: Liu, Yi L, alex.williamson, jgg, Tian, Kevin
  Cc: joro, robin.murphy, cohuck, eric.auger, nicolinc, kvm, mjrosato,
	chao.p.peng, yi.y.sun, peterx, jasowang,
	shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	intel-gvt-dev, intel-gfx, linux-s390, Hao, Xudong, Zhao, Yan Y,
	Jiang, Yanting


> -----Original Message-----
> From: Liu, Yi L <yi.l.liu@intel.com>
> Sent: Monday, March 27, 2023 5:35 PM
> 
> VFIO_DEVICE_PCI_HOT_RESET requires user to pass an array of group fds to
> prove that it owns all devices affected by resetting the calling device. This
> series introduces several extensions to allow the ownership check better
> aligned with iommufd and coming vfio device cdev support.
> 
> First, resetting an unopened device is always safe given nobody is using it. So
> relax the check to allow such devices not covered by group fd array. [1]
> 
> When iommufd is used we can simply verify that all affected devices are
> bound to a same iommufd then no need for the user to provide extra fd
> information. This is enabled by the user passing a zero-length fd array and
> moving forward this should be the preferred way for hot reset. [2]
> 
> However the iommufd method has difficulty working with noiommu devices
> since those devices don't have a valid iommufd, unless the noiommu device
> is in a singleton dev_set hence no ownership check is required. [3]
> 
> For noiommu backward compatibility a 3rd method is introduced by allowing
> the user to pass an array of device fds to prove ownership. [4]
> 
> As suggested by Jason [5], we have this series to introduce the above stuffs
> to the vfio PCI hot reset. Per the dicussion in [6], this series also adds a new
> _INFO ioctl to get hot reset scope for given device.
> 
> [1] https://lore.kernel.org/kvm/Y%2FdobS6gdSkxnPH7@nvidia.com/
> [2] https://lore.kernel.org/kvm/Y%2FZOOClu8nXy2toX@nvidia.com/#t
> [3] https://lore.kernel.org/kvm/ZACX+Np%2FIY7ygqL5@nvidia.com/
> [4]
> https://lore.kernel.org/kvm/DS0PR11MB7529BE88460582BD599DC1F7C3B19
> @DS0PR11MB7529.namprd11.prod.outlook.com/#t
> [5] https://lore.kernel.org/kvm/ZAcvzvhkt9QhCmdi@nvidia.com/
> [6] https://lore.kernel.org/kvm/ZBoYgNq60eDpV9Un@nvidia.com/
> 
> Change log:
> 
> v2:
>  - Split the patch 03 of v1 to be 03, 04 and 05 of v2 (Jaon)
>  - Add r-b from Kevin and Jason
>  - Add patch 10 to introduce a new _INFO ioctl for the usage of device
>    fd passing usage in cdev path (Jason, Alex)
> 
> v1: https://lore.kernel.org/kvm/20230316124156.12064-1-yi.l.liu@intel.com/
> 
> Regards,
> 	Yi Liu
> 
> Yi Liu (10):
>   vfio/pci: Update comment around group_fd get in
>     vfio_pci_ioctl_pci_hot_reset()
>   vfio/pci: Only check ownership of opened devices in hot reset
>   vfio/pci: Move the existing hot reset logic to be a helper
>   vfio-iommufd: Add helper to retrieve iommufd_ctx and devid for
>     vfio_device
>   vfio/pci: Allow passing zero-length fd array in
>     VFIO_DEVICE_PCI_HOT_RESET
>   vfio: Refine vfio file kAPIs for vfio PCI hot reset
>   vfio: Accpet device file from vfio PCI hot reset path
>   vfio/pci: Renaming for accepting device fd in hot reset path
>   vfio/pci: Accept device fd in VFIO_DEVICE_PCI_HOT_RESET ioctl
>   vfio/pci: Add VFIO_DEVICE_GET_PCI_HOT_RESET_GROUP_INFO
> 
>  drivers/iommu/iommufd/device.c   |  12 ++
>  drivers/vfio/group.c             |  32 ++--
>  drivers/vfio/iommufd.c           |  16 ++
>  drivers/vfio/pci/vfio_pci_core.c | 244 ++++++++++++++++++++++++-------
>  drivers/vfio/vfio.h              |   2 +
>  drivers/vfio/vfio_main.c         |  44 ++++++
>  include/linux/iommufd.h          |   3 +
>  include/linux/vfio.h             |  14 ++
>  include/uapi/linux/vfio.h        |  65 +++++++-
>  9 files changed, 364 insertions(+), 68 deletions(-)
> 
> --
> 2.34.1

Verified this series by "Intel GVT-g GPU device mediated passthrough".
Passed VFIO legacy mode / compat mode / cdev mode basic functionality and GPU force reset test.

Tested-by: Terrence Xu <terrence.xu@intel.com>

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v2 00/10] Introduce new methods for verifying ownership in vfio PCI hot reset
  2023-03-31 17:27 ` Xu, Terrence
@ 2023-03-31 17:49   ` Alex Williamson
  2023-04-01  9:15     ` Xu, Terrence
  0 siblings, 1 reply; 56+ messages in thread
From: Alex Williamson @ 2023-03-31 17:49 UTC (permalink / raw)
  To: Xu, Terrence
  Cc: Liu, Yi L, jgg, Tian, Kevin, joro, robin.murphy, cohuck,
	eric.auger, nicolinc, kvm, mjrosato, chao.p.peng, yi.y.sun,
	peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, intel-gvt-dev, intel-gfx, linux-s390, Hao,
	Xudong, Zhao, Yan Y, Jiang, Yanting

On Fri, 31 Mar 2023 17:27:27 +0000
"Xu, Terrence" <terrence.xu@intel.com> wrote:

> > -----Original Message-----
> > From: Liu, Yi L <yi.l.liu@intel.com>
> > Sent: Monday, March 27, 2023 5:35 PM
> > 
> > VFIO_DEVICE_PCI_HOT_RESET requires user to pass an array of group fds to
> > prove that it owns all devices affected by resetting the calling device. This
> > series introduces several extensions to allow the ownership check better
> > aligned with iommufd and coming vfio device cdev support.
> > 
> > First, resetting an unopened device is always safe given nobody is using it. So
> > relax the check to allow such devices not covered by group fd array. [1]
> > 
> > When iommufd is used we can simply verify that all affected devices are
> > bound to a same iommufd then no need for the user to provide extra fd
> > information. This is enabled by the user passing a zero-length fd array and
> > moving forward this should be the preferred way for hot reset. [2]
> > 
> > However the iommufd method has difficulty working with noiommu devices
> > since those devices don't have a valid iommufd, unless the noiommu device
> > is in a singleton dev_set hence no ownership check is required. [3]
> > 
> > For noiommu backward compatibility a 3rd method is introduced by allowing
> > the user to pass an array of device fds to prove ownership. [4]
> > 
> > As suggested by Jason [5], we have this series to introduce the above stuffs
> > to the vfio PCI hot reset. Per the dicussion in [6], this series also adds a new
> > _INFO ioctl to get hot reset scope for given device.
> > 
> > [1] https://lore.kernel.org/kvm/Y%2FdobS6gdSkxnPH7@nvidia.com/
> > [2] https://lore.kernel.org/kvm/Y%2FZOOClu8nXy2toX@nvidia.com/#t
> > [3] https://lore.kernel.org/kvm/ZACX+Np%2FIY7ygqL5@nvidia.com/
> > [4]
> > https://lore.kernel.org/kvm/DS0PR11MB7529BE88460582BD599DC1F7C3B19
> > @DS0PR11MB7529.namprd11.prod.outlook.com/#t
> > [5] https://lore.kernel.org/kvm/ZAcvzvhkt9QhCmdi@nvidia.com/
> > [6] https://lore.kernel.org/kvm/ZBoYgNq60eDpV9Un@nvidia.com/
> > 
> > Change log:
> > 
> > v2:
> >  - Split the patch 03 of v1 to be 03, 04 and 05 of v2 (Jaon)
> >  - Add r-b from Kevin and Jason
> >  - Add patch 10 to introduce a new _INFO ioctl for the usage of device
> >    fd passing usage in cdev path (Jason, Alex)
> > 
> > v1: https://lore.kernel.org/kvm/20230316124156.12064-1-yi.l.liu@intel.com/
> > 
> > Regards,
> > 	Yi Liu
> > 
> > Yi Liu (10):
> >   vfio/pci: Update comment around group_fd get in
> >     vfio_pci_ioctl_pci_hot_reset()
> >   vfio/pci: Only check ownership of opened devices in hot reset
> >   vfio/pci: Move the existing hot reset logic to be a helper
> >   vfio-iommufd: Add helper to retrieve iommufd_ctx and devid for
> >     vfio_device
> >   vfio/pci: Allow passing zero-length fd array in
> >     VFIO_DEVICE_PCI_HOT_RESET
> >   vfio: Refine vfio file kAPIs for vfio PCI hot reset
> >   vfio: Accpet device file from vfio PCI hot reset path
> >   vfio/pci: Renaming for accepting device fd in hot reset path
> >   vfio/pci: Accept device fd in VFIO_DEVICE_PCI_HOT_RESET ioctl
> >   vfio/pci: Add VFIO_DEVICE_GET_PCI_HOT_RESET_GROUP_INFO
> > 
> >  drivers/iommu/iommufd/device.c   |  12 ++
> >  drivers/vfio/group.c             |  32 ++--
> >  drivers/vfio/iommufd.c           |  16 ++
> >  drivers/vfio/pci/vfio_pci_core.c | 244 ++++++++++++++++++++++++-------
> >  drivers/vfio/vfio.h              |   2 +
> >  drivers/vfio/vfio_main.c         |  44 ++++++
> >  include/linux/iommufd.h          |   3 +
> >  include/linux/vfio.h             |  14 ++
> >  include/uapi/linux/vfio.h        |  65 +++++++-
> >  9 files changed, 364 insertions(+), 68 deletions(-)
> > 
> > --
> > 2.34.1  
> 
> Verified this series by "Intel GVT-g GPU device mediated passthrough".
> Passed VFIO legacy mode / compat mode / cdev mode basic functionality and GPU force reset test.
> 
> Tested-by: Terrence Xu <terrence.xu@intel.com>

Seems like only this "GPU force reset test" is relevant to the new
functionality of this series, GVT-g does not and has no reason to
support the HOT_RESET ioctls used here.  Can you provide more details
of the force-reset test?  What userspace driver is being used?  Thanks,

Alex


^ permalink raw reply	[flat|nested] 56+ messages in thread

* RE: [PATCH v2 00/10] Introduce new methods for verifying ownership in vfio PCI hot reset
  2023-03-31 17:49   ` Alex Williamson
@ 2023-04-01  9:15     ` Xu, Terrence
  0 siblings, 0 replies; 56+ messages in thread
From: Xu, Terrence @ 2023-04-01  9:15 UTC (permalink / raw)
  To: Alex Williamson
  Cc: mjrosato, jasowang, Hao, Xudong, peterx, chao.p.peng, linux-s390,
	Liu, Yi L, kvm, lulu, Jiang, Yanting, joro, nicolinc, jgg, Tian,
	Kevin, Zhao, Yan Y, intel-gfx, eric.auger, intel-gvt-dev,
	yi.y.sun, cohuck, shameerali.kolothum.thodi,
	suravee.suthikulpanit, robin.murphy


> -----Original Message-----
> From: intel-gvt-dev <intel-gvt-dev-bounces@lists.freedesktop.org> On
> Behalf Of Alex Williamson
> Sent: Saturday, April 1, 2023 1:50 AM
> 
> On Fri, 31 Mar 2023 17:27:27 +0000
> "Xu, Terrence" <terrence.xu@intel.com> wrote:
> 
> > > -----Original Message-----
> > > From: Liu, Yi L <yi.l.liu@intel.com>
> > > Sent: Monday, March 27, 2023 5:35 PM
> > >
> > > VFIO_DEVICE_PCI_HOT_RESET requires user to pass an array of group
> > > fds to prove that it owns all devices affected by resetting the
> > > calling device. This series introduces several extensions to allow
> > > the ownership check better aligned with iommufd and coming vfio device
> cdev support.
> > >
> > > First, resetting an unopened device is always safe given nobody is
> > > using it. So relax the check to allow such devices not covered by
> > > group fd array. [1]
> > >
> > > When iommufd is used we can simply verify that all affected devices
> > > are bound to a same iommufd then no need for the user to provide
> > > extra fd information. This is enabled by the user passing a
> > > zero-length fd array and moving forward this should be the preferred
> > > way for hot reset. [2]
> > >
> > > However the iommufd method has difficulty working with noiommu
> > > devices since those devices don't have a valid iommufd, unless the
> > > noiommu device is in a singleton dev_set hence no ownership check is
> > > required. [3]
> > >
> > > For noiommu backward compatibility a 3rd method is introduced by
> > > allowing the user to pass an array of device fds to prove ownership.
> > > [4]
> > >
> > > As suggested by Jason [5], we have this series to introduce the
> > > above stuffs to the vfio PCI hot reset. Per the dicussion in [6],
> > > this series also adds a new _INFO ioctl to get hot reset scope for given
> device.
> > >
> > > [1] https://lore.kernel.org/kvm/Y%2FdobS6gdSkxnPH7@nvidia.com/
> > > [2] https://lore.kernel.org/kvm/Y%2FZOOClu8nXy2toX@nvidia.com/#t
> > > [3] https://lore.kernel.org/kvm/ZACX+Np%2FIY7ygqL5@nvidia.com/
> > > [4]
> > >
> https://lore.kernel.org/kvm/DS0PR11MB7529BE88460582BD599DC1F7C3B19
> > > @DS0PR11MB7529.namprd11.prod.outlook.com/#t
> > > [5] https://lore.kernel.org/kvm/ZAcvzvhkt9QhCmdi@nvidia.com/
> > > [6] https://lore.kernel.org/kvm/ZBoYgNq60eDpV9Un@nvidia.com/
> > >
> > > Change log:
> > >
> > > v2:
> > >  - Split the patch 03 of v1 to be 03, 04 and 05 of v2 (Jaon)
> > >  - Add r-b from Kevin and Jason
> > >  - Add patch 10 to introduce a new _INFO ioctl for the usage of device
> > >    fd passing usage in cdev path (Jason, Alex)
> > >
> > > v1:
> > > https://lore.kernel.org/kvm/20230316124156.12064-1-yi.l.liu@intel.co
> > > m/
> > >
> > > Regards,
> > > 	Yi Liu
> > >
> > > Yi Liu (10):
> > >   vfio/pci: Update comment around group_fd get in
> > >     vfio_pci_ioctl_pci_hot_reset()
> > >   vfio/pci: Only check ownership of opened devices in hot reset
> > >   vfio/pci: Move the existing hot reset logic to be a helper
> > >   vfio-iommufd: Add helper to retrieve iommufd_ctx and devid for
> > >     vfio_device
> > >   vfio/pci: Allow passing zero-length fd array in
> > >     VFIO_DEVICE_PCI_HOT_RESET
> > >   vfio: Refine vfio file kAPIs for vfio PCI hot reset
> > >   vfio: Accpet device file from vfio PCI hot reset path
> > >   vfio/pci: Renaming for accepting device fd in hot reset path
> > >   vfio/pci: Accept device fd in VFIO_DEVICE_PCI_HOT_RESET ioctl
> > >   vfio/pci: Add VFIO_DEVICE_GET_PCI_HOT_RESET_GROUP_INFO
> > >
> > >  drivers/iommu/iommufd/device.c   |  12 ++
> > >  drivers/vfio/group.c             |  32 ++--
> > >  drivers/vfio/iommufd.c           |  16 ++
> > >  drivers/vfio/pci/vfio_pci_core.c | 244 ++++++++++++++++++++++++----
> ---
> > >  drivers/vfio/vfio.h              |   2 +
> > >  drivers/vfio/vfio_main.c         |  44 ++++++
> > >  include/linux/iommufd.h          |   3 +
> > >  include/linux/vfio.h             |  14 ++
> > >  include/uapi/linux/vfio.h        |  65 +++++++-
> > >  9 files changed, 364 insertions(+), 68 deletions(-)
> > >
> > > --
> > > 2.34.1
> >
> > Verified this series by "Intel GVT-g GPU device mediated passthrough".
> > Passed VFIO legacy mode / compat mode / cdev mode basic functionality
> and GPU force reset test.
> >
> > Tested-by: Terrence Xu <terrence.xu@intel.com>
> 
> Seems like only this "GPU force reset test" is relevant to the new
> functionality of this series, GVT-g does not and has no reason to support the
> HOT_RESET ioctls used here.  Can you provide more details of the force-reset
> test?  What userspace driver is being used?  Thanks,
> 
> Alex
Hi Alex, about the "GPU force reset test", I used the "i915_hangman" test from intel-gpu-tools, it is for GPU force hang / reset. 
It is an important regression test scenario for this patch series. 
To test the HOT_RESET ioctls itself, need to wait the corresponding Qemu changes from Yi.


^ permalink raw reply	[flat|nested] 56+ messages in thread

* RE: [PATCH v2 00/10] Introduce new methods for verifying ownership in vfio PCI hot reset
  2023-03-31 13:24   ` Alex Williamson
@ 2023-04-03  2:04     ` Jiang, Yanting
  0 siblings, 0 replies; 56+ messages in thread
From: Jiang, Yanting @ 2023-04-03  2:04 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Liu, Yi L, jgg, Tian, Kevin, joro, robin.murphy, cohuck,
	eric.auger, nicolinc, kvm, mjrosato, chao.p.peng, yi.y.sun,
	peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, intel-gvt-dev, intel-gfx, linux-s390, Hao,
	Xudong, Zhao, Yan Y, Xu, Terrence



> -----Original Message-----
> From: Alex Williamson <alex.williamson@redhat.com>
> Sent: Friday, March 31, 2023 9:25 PM
> To: Jiang, Yanting <yanting.jiang@intel.com>
> Cc: Liu, Yi L <yi.l.liu@intel.com>; jgg@nvidia.com; Tian, Kevin
> <kevin.tian@intel.com>; joro@8bytes.org; robin.murphy@arm.com;
> cohuck@redhat.com; eric.auger@redhat.com; nicolinc@nvidia.com;
> kvm@vger.kernel.org; mjrosato@linux.ibm.com; chao.p.peng@linux.intel.com;
> yi.y.sun@linux.intel.com; peterx@redhat.com; jasowang@redhat.com;
> shameerali.kolothum.thodi@huawei.com; lulu@redhat.com;
> suravee.suthikulpanit@amd.com; intel-gvt-dev@lists.freedesktop.org; intel-
> gfx@lists.freedesktop.org; linux-s390@vger.kernel.org; Hao, Xudong
> <xudong.hao@intel.com>; Zhao, Yan Y <yan.y.zhao@intel.com>; Xu, Terrence
> <terrence.xu@intel.com>
> Subject: Re: [PATCH v2 00/10] Introduce new methods for verifying ownership in
> vfio PCI hot reset
> 
> On Fri, 31 Mar 2023 03:14:23 +0000
> "Jiang, Yanting" <yanting.jiang@intel.com> wrote:
> 
> > >
> > > VFIO_DEVICE_PCI_HOT_RESET requires user to pass an array of group
> > > fds to prove that it owns all devices affected by resetting the
> > > calling device. This series introduces several extensions to allow
> > > the ownership check better aligned with iommufd and coming vfio device
> cdev support.
> > >
> > > First, resetting an unopened device is always safe given nobody is
> > > using it. So relax the check to allow such devices not covered by
> > > group fd array. [1]
> > >
> > > When iommufd is used we can simply verify that all affected devices
> > > are bound to a same iommufd then no need for the user to provide extra fd
> information.
> > > This is enabled by the user passing a zero-length fd array and
> > > moving forward this should be the preferred way for hot reset. [2]
> > >
> > > However the iommufd method has difficulty working with noiommu
> > > devices since those devices don't have a valid iommufd, unless the
> > > noiommu device is in a singleton dev_set hence no ownership check is
> > > required. [3]
> > >
> > > For noiommu backward compatibility a 3rd method is introduced by
> > > allowing the user to pass an array of device fds to prove ownership.
> > > [4]
> > >
> > > As suggested by Jason [5], we have this series to introduce the
> > > above stuffs to the vfio PCI hot reset. Per the dicussion in [6],
> > > this series also adds a new _INFO ioctl to get hot reset scope for given
> device.
> > >
> > Tested NIC passthrough on Intel platform.
> > Result looks good hence,
> > Tested by: Jiang, Yanting <yanting.jiang@intel.com>
> 
> I'm not aware of any userspace that exercises this reset ioctl in cdev mode.  Is
> this regression testing only?  Thanks,
> 
> Alex

Hi Alex, 

Yes, only regression testing and some negative testing for NIC passthrough with legacy vfio mode, vfio iommufd compat mode, and cdev mode.

Thanks,
Yanting



^ permalink raw reply	[flat|nested] 56+ messages in thread

end of thread, other threads:[~2023-04-03  2:04 UTC | newest]

Thread overview: 56+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-03-27  9:34 [PATCH v2 00/10] Introduce new methods for verifying ownership in vfio PCI hot reset Yi Liu
2023-03-27  9:34 ` [PATCH v2 01/10] vfio/pci: Update comment around group_fd get in vfio_pci_ioctl_pci_hot_reset() Yi Liu
2023-03-27  9:34 ` [PATCH v2 02/10] vfio/pci: Only check ownership of opened devices in hot reset Yi Liu
2023-03-27  9:34 ` [PATCH v2 03/10] vfio/pci: Move the existing hot reset logic to be a helper Yi Liu
2023-03-30 23:39   ` Jason Gunthorpe
2023-03-30 23:44   ` Jason Gunthorpe
2023-03-27  9:34 ` [PATCH v2 04/10] vfio-iommufd: Add helper to retrieve iommufd_ctx and devid for vfio_device Yi Liu
2023-03-30 23:44   ` Jason Gunthorpe
2023-03-27  9:34 ` [PATCH v2 05/10] vfio/pci: Allow passing zero-length fd array in VFIO_DEVICE_PCI_HOT_RESET Yi Liu
2023-03-30 23:47   ` Jason Gunthorpe
2023-03-27  9:34 ` [PATCH v2 06/10] vfio: Refine vfio file kAPIs for vfio PCI hot reset Yi Liu
2023-03-30 23:48   ` Jason Gunthorpe
2023-03-27  9:34 ` [PATCH v2 07/10] vfio: Accpet device file from vfio PCI hot reset path Yi Liu
2023-03-30 23:49   ` Jason Gunthorpe
2023-03-27  9:34 ` [PATCH v2 08/10] vfio/pci: Renaming for accepting device fd in " Yi Liu
2023-03-27  9:34 ` [PATCH v2 09/10] vfio/pci: Accept device fd in VFIO_DEVICE_PCI_HOT_RESET ioctl Yi Liu
2023-03-30 23:50   ` Jason Gunthorpe
2023-03-27  9:34 ` [PATCH v2 10/10] vfio/pci: Add VFIO_DEVICE_GET_PCI_HOT_RESET_GROUP_INFO Yi Liu
2023-03-27 19:26   ` Alex Williamson
2023-03-27 20:40     ` Alex Williamson
2023-03-28  3:45       ` Liu, Yi L
2023-03-28  3:32     ` Liu, Yi L
2023-03-28  6:19       ` Tian, Kevin
2023-03-28 14:25         ` Alex Williamson
2023-03-28 14:38           ` Liu, Yi L
2023-03-28 14:46             ` Alex Williamson
2023-03-28 15:00               ` Liu, Yi L
2023-03-28 15:18                 ` Alex Williamson
2023-03-28 15:45                   ` Liu, Yi L
2023-03-28 16:00                     ` Alex Williamson
2023-03-29  3:13                       ` Liu, Yi L
2023-03-29  9:41                         ` Tian, Kevin
2023-03-29 15:49                           ` Alex Williamson
2023-03-29 15:57                             ` Jason Gunthorpe
2023-03-30  1:17                               ` Tian, Kevin
2023-03-30 22:38                                 ` Jason Gunthorpe
2023-03-30 12:48                             ` Liu, Yi L
2023-03-30 12:56                               ` Liu, Yi L
2023-03-30 22:44                               ` Jason Gunthorpe
2023-03-30 23:05                                 ` Alex Williamson
2023-03-30 23:18                                   ` Jason Gunthorpe
2023-03-29 15:50                           ` Jason Gunthorpe
2023-03-30  1:10                             ` Tian, Kevin
2023-03-30  1:33                               ` Tian, Kevin
2023-03-28 16:29                   ` Jason Gunthorpe
2023-03-28 19:09                     ` Alex Williamson
2023-03-28 19:22                       ` Jason Gunthorpe
2023-03-28 12:40   ` Jason Gunthorpe
2023-03-28 14:45     ` Liu, Yi L
2023-03-31  3:14 ` [PATCH v2 00/10] Introduce new methods for verifying ownership in vfio PCI hot reset Jiang, Yanting
2023-03-31 13:24   ` Alex Williamson
2023-04-03  2:04     ` Jiang, Yanting
2023-03-31  5:01 ` Jiang, Yanting
2023-03-31 17:27 ` Xu, Terrence
2023-03-31 17:49   ` Alex Williamson
2023-04-01  9:15     ` Xu, Terrence

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).