* [PATCH v7 0/9] Enhance vfio PCI hot reset for vfio cdev device
@ 2023-06-02 12:15 ` Yi Liu
  0 siblings, 0 replies; 77+ messages in thread
From: Yi Liu @ 2023-06-02 12:15 UTC (permalink / raw)
  To: alex.williamson, jgg, kevin.tian
  Cc: joro, robin.murphy, cohuck, eric.auger, nicolinc, kvm, mjrosato,
	chao.p.peng, yi.l.liu, yi.y.sun, peterx, jasowang,
	shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	intel-gvt-dev, intel-gfx, linux-s390, xudong.hao, yan.y.zhao,
	terrence.xu, yanting.jiang, zhenzhong.duan, clegoate

VFIO_DEVICE_PCI_HOT_RESET requires the user to pass an array of group fds
to prove that it owns all devices affected by resetting the calling
device. For cdev devices, the user can instead rely on an iommufd-based
ownership checking model and invoke VFIO_DEVICE_PCI_HOT_RESET with a
zero-length fd array.

This series extends VFIO_DEVICE_GET_PCI_HOT_RESET_INFO to check ownership
and return the check result and the devid of the affected devices to the
user. It then extends VFIO_DEVICE_PCI_HOT_RESET to accept a zero-length fd
array for hot reset with cdev devices.

The new hot reset method and the updated _INFO ioctl were tested with the
following QEMU branch:

https://github.com/yiliu1765/qemu/tree/iommufd_rfcv4.mig.reset.v4_var3
(must be tested together with the cdev kernel)
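
As an illustration of the two request shapes described above, here is a
minimal userspace sketch in C. It assumes a local mirror of struct
vfio_pci_hot_reset (argsz, flags, count, group_fds[]) so that it builds
without kernel headers; hot_reset_req and hot_reset_req_alloc() are
hypothetical names used only for this example and are not part of the
series.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Local mirror of struct vfio_pci_hot_reset, an assumption made so the
 * sketch is self-contained. Real code would include <linux/vfio.h>.
 */
struct hot_reset_req {
	uint32_t argsz;
	uint32_t flags;
	uint32_t count;
	int32_t group_fds[];
};

/*
 * Hypothetical helper: build a request carrying nfds group fds.
 * nfds == 0 gives the zero-length form intended for cdev devices.
 */
static struct hot_reset_req *hot_reset_req_alloc(const int32_t *fds,
						 uint32_t nfds)
{
	size_t sz = sizeof(struct hot_reset_req) +
		    (size_t)nfds * sizeof(int32_t);
	struct hot_reset_req *req;

	assert(nfds == 0 || fds != NULL);

	req = calloc(1, sz);
	if (!req)
		return NULL;

	req->argsz = (uint32_t)sz;	/* header plus fd array */
	req->flags = 0;			/* no flags are defined for this ioctl */
	req->count = nfds;
	for (uint32_t i = 0; i < nfds; i++)
		req->group_fds[i] = fds[i];
	return req;
}
```

A caller would then pass the buffer to ioctl(device_fd,
VFIO_DEVICE_PCI_HOT_RESET, req). With this series applied, the
zero-length form is expected to succeed only for a cdev-opened device
whose affected devices are owned through the same iommufd.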

Change log:

v7:
 - Drop noiommu support (patch 01 of v6 is dropped)
 - Remove helpers to get devid and ictx for iommufd_access
 - Document the dev_set representative requirement in the
   VFIO_DEVICE_GET_PCI_HOT_RESET_INFO for the cdev opened device (Alex)
 - zero-length fd array approach is only for cdev opened device (Alex)

v6: https://lore.kernel.org/kvm/20230522115751.326947-1-yi.l.liu@intel.com/
 - Remove noiommu_access, reuse iommufd_access instead (Alex)
 - vfio_iommufd_physical_ictx -> vfio_iommufd_device_ictx
 - vfio_iommufd_physical_devid -> vfio_iommufd_device_hot_reset_devid
 - Refine logic in patch 9 and 10 of v5, no uapi change. (Alex)
 - Remove lockdep assert in vfio_pci_is_device_in_set (Cédric)
 - Add t-b from Terrence (Tested GVT-g / GVT-d VFIO legacy mode / compat mode
   / cdev mode, including negative tests. No regressions were introduced.)

v5: https://lore.kernel.org/kvm/20230513132136.15021-1-yi.l.liu@intel.com/
 - Drop patch 01 of v4 (Alex)
 - Create noiommu_access for noiommu devices (Jason)
 - Reserve all negative iommufd IDs, hence VFIO can encode negative
   values (Jason)
 - Make vfio_iommufd_physical_devid() return -EINVAL if it's not called
   with a physical device or a noiommu device.
 - Add vfio_find_device_in_devset() in vfio_main.c (Alex)
 - Add iommufd_ctx_has_group() to replace vfio_devset_iommufd_has_group().
   Reason: vfio_devset_iommufd_has_group() only loops over the devices
   within the given devset to check whether the iommufd owns an iommu_group,
   but an iommu_group can span multiple devsets. So failing to find the
   group in one devset doesn't mean the group is not owned by the iommufd.
   Either all the devsets need to be searched or an iommufd API is needed
   to check it. An iommufd API appears to make more sense.
 - Adopt suggestions from Alex on patch 08 and 09 of v4, refine the hot-reset
   uapi description and minor tweaks
 - Use bitfields for bool members (Alex)

v4: https://lore.kernel.org/kvm/20230426145419.450922-1-yi.l.liu@intel.com/
 - Rename the patch series subject
 - Patch 01 is moved from the cdev series
 - Patch 02, 06 are new per review comments in v3
 - Patch 03/04/05/07/08/09 are from v3 with updates

v3: https://lore.kernel.org/kvm/20230401144429.88673-1-yi.l.liu@intel.com/
 - Remove the new _INFO ioctl of v2, extend the existing _INFO ioctl to
   report devid (Alex)
 - Add r-b from Jason
 - Add t-b from Terrence Xu and Yanting Jiang (mainly regression test)

v2: https://lore.kernel.org/kvm/20230327093458.44939-1-yi.l.liu@intel.com/
 - Split the patch 03 of v1 to be 03, 04 and 05 of v2 (Jason)
 - Add r-b from Kevin and Jason
 - Add patch 10 to introduce a new _INFO ioctl for device fd passing in
   the cdev path (Jason, Alex)

v1: https://lore.kernel.org/kvm/20230316124156.12064-1-yi.l.liu@intel.com/

Regards,
	Yi Liu

Yi Liu (9):
  vfio/pci: Update comment around group_fd get in
    vfio_pci_ioctl_pci_hot_reset()
  vfio/pci: Move the existing hot reset logic to be a helper
  iommufd: Reserve all negative IDs in the iommufd xarray
  iommufd: Add iommufd_ctx_has_group()
  iommufd: Add helper to retrieve iommufd_ctx and devid
  vfio: Mark cdev usage in vfio_device
  vfio: Add helper to search vfio_device in a dev_set
  vfio/pci: Extend VFIO_DEVICE_GET_PCI_HOT_RESET_INFO for vfio device
    cdev
  vfio/pci: Allow passing zero-length fd array in
    VFIO_DEVICE_PCI_HOT_RESET

 drivers/iommu/iommufd/device.c   |  42 ++++++++
 drivers/iommu/iommufd/main.c     |   2 +-
 drivers/vfio/iommufd.c           |  49 +++++++++
 drivers/vfio/pci/vfio_pci_core.c | 170 ++++++++++++++++++++++---------
 drivers/vfio/vfio_main.c         |  15 +++
 include/linux/iommufd.h          |  11 ++
 include/linux/vfio.h             |  24 +++++
 include/uapi/linux/vfio.h        |  64 +++++++++++-
 8 files changed, 328 insertions(+), 49 deletions(-)

-- 
2.34.1


^ permalink raw reply	[flat|nested] 77+ messages in thread

* [PATCH v7 1/9] vfio/pci: Update comment around group_fd get in vfio_pci_ioctl_pci_hot_reset()
  2023-06-02 12:15 ` [Intel-gfx] " Yi Liu
@ 2023-06-02 12:15   ` Yi Liu
  -1 siblings, 0 replies; 77+ messages in thread
From: Yi Liu @ 2023-06-02 12:15 UTC (permalink / raw)
  To: alex.williamson, jgg, kevin.tian
  Cc: joro, robin.murphy, cohuck, eric.auger, nicolinc, kvm, mjrosato,
	chao.p.peng, yi.l.liu, yi.y.sun, peterx, jasowang,
	shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	intel-gvt-dev, intel-gfx, linux-s390, xudong.hao, yan.y.zhao,
	terrence.xu, yanting.jiang, zhenzhong.duan, clegoate

The updated comment better matches what the code does.

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Tested-by: Terrence Xu <terrence.xu@intel.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 drivers/vfio/pci/vfio_pci_core.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
index a5ab416cf476..f824de4dbf27 100644
--- a/drivers/vfio/pci/vfio_pci_core.c
+++ b/drivers/vfio/pci/vfio_pci_core.c
@@ -1308,9 +1308,8 @@ static int vfio_pci_ioctl_pci_hot_reset(struct vfio_pci_core_device *vdev,
 	}
 
 	/*
-	 * For each group_fd, get the group through the vfio external user
-	 * interface and store the group and iommu ID.  This ensures the group
-	 * is held across the reset.
+	 * Get the group file for each fd to ensure the group is held across
+	 * the reset
 	 */
 	for (file_idx = 0; file_idx < hdr.count; file_idx++) {
 		struct file *file = fget(group_fds[file_idx]);
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [PATCH v7 2/9] vfio/pci: Move the existing hot reset logic to be a helper
  2023-06-02 12:15 ` [Intel-gfx] " Yi Liu
@ 2023-06-02 12:15   ` Yi Liu
  -1 siblings, 0 replies; 77+ messages in thread
From: Yi Liu @ 2023-06-02 12:15 UTC (permalink / raw)
  To: alex.williamson, jgg, kevin.tian
  Cc: joro, robin.murphy, cohuck, eric.auger, nicolinc, kvm, mjrosato,
	chao.p.peng, yi.l.liu, yi.y.sun, peterx, jasowang,
	shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	intel-gvt-dev, intel-gfx, linux-s390, xudong.hao, yan.y.zhao,
	terrence.xu, yanting.jiang, zhenzhong.duan, clegoate

This prepares for adding another hot reset method. The main hot reset
logic is moved to vfio_pci_ioctl_pci_hot_reset_groups().

No functional change is intended.

Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Tested-by: Yanting Jiang <yanting.jiang@intel.com>
Tested-by: Terrence Xu <terrence.xu@intel.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 drivers/vfio/pci/vfio_pci_core.c | 55 +++++++++++++++++++-------------
 1 file changed, 32 insertions(+), 23 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
index f824de4dbf27..39e7823088e7 100644
--- a/drivers/vfio/pci/vfio_pci_core.c
+++ b/drivers/vfio/pci/vfio_pci_core.c
@@ -1255,29 +1255,16 @@ static int vfio_pci_ioctl_get_pci_hot_reset_info(
 	return ret;
 }
 
-static int vfio_pci_ioctl_pci_hot_reset(struct vfio_pci_core_device *vdev,
-					struct vfio_pci_hot_reset __user *arg)
+static int
+vfio_pci_ioctl_pci_hot_reset_groups(struct vfio_pci_core_device *vdev,
+				    int array_count, bool slot,
+				    struct vfio_pci_hot_reset __user *arg)
 {
-	unsigned long minsz = offsetofend(struct vfio_pci_hot_reset, count);
-	struct vfio_pci_hot_reset hdr;
 	int32_t *group_fds;
 	struct file **files;
 	struct vfio_pci_group_info info;
-	bool slot = false;
 	int file_idx, count = 0, ret = 0;
 
-	if (copy_from_user(&hdr, arg, minsz))
-		return -EFAULT;
-
-	if (hdr.argsz < minsz || hdr.flags)
-		return -EINVAL;
-
-	/* Can we do a slot or bus reset or neither? */
-	if (!pci_probe_reset_slot(vdev->pdev->slot))
-		slot = true;
-	else if (pci_probe_reset_bus(vdev->pdev->bus))
-		return -ENODEV;
-
 	/*
 	 * We can't let userspace give us an arbitrarily large buffer to copy,
 	 * so verify how many we think there could be.  Note groups can have
@@ -1289,11 +1276,11 @@ static int vfio_pci_ioctl_pci_hot_reset(struct vfio_pci_core_device *vdev,
 		return ret;
 
 	/* Somewhere between 1 and count is OK */
-	if (!hdr.count || hdr.count > count)
+	if (!array_count || array_count > count)
 		return -EINVAL;
 
-	group_fds = kcalloc(hdr.count, sizeof(*group_fds), GFP_KERNEL);
-	files = kcalloc(hdr.count, sizeof(*files), GFP_KERNEL);
+	group_fds = kcalloc(array_count, sizeof(*group_fds), GFP_KERNEL);
+	files = kcalloc(array_count, sizeof(*files), GFP_KERNEL);
 	if (!group_fds || !files) {
 		kfree(group_fds);
 		kfree(files);
@@ -1301,7 +1288,7 @@ static int vfio_pci_ioctl_pci_hot_reset(struct vfio_pci_core_device *vdev,
 	}
 
 	if (copy_from_user(group_fds, arg->group_fds,
-			   hdr.count * sizeof(*group_fds))) {
+			   array_count * sizeof(*group_fds))) {
 		kfree(group_fds);
 		kfree(files);
 		return -EFAULT;
@@ -1311,7 +1298,7 @@ static int vfio_pci_ioctl_pci_hot_reset(struct vfio_pci_core_device *vdev,
 	 * Get the group file for each fd to ensure the group is held across
 	 * the reset
 	 */
-	for (file_idx = 0; file_idx < hdr.count; file_idx++) {
+	for (file_idx = 0; file_idx < array_count; file_idx++) {
 		struct file *file = fget(group_fds[file_idx]);
 
 		if (!file) {
@@ -1335,7 +1322,7 @@ static int vfio_pci_ioctl_pci_hot_reset(struct vfio_pci_core_device *vdev,
 	if (ret)
 		goto hot_reset_release;
 
-	info.count = hdr.count;
+	info.count = array_count;
 	info.files = files;
 
 	ret = vfio_pci_dev_set_hot_reset(vdev->vdev.dev_set, &info);
@@ -1348,6 +1335,28 @@ static int vfio_pci_ioctl_pci_hot_reset(struct vfio_pci_core_device *vdev,
 	return ret;
 }
 
+static int vfio_pci_ioctl_pci_hot_reset(struct vfio_pci_core_device *vdev,
+					struct vfio_pci_hot_reset __user *arg)
+{
+	unsigned long minsz = offsetofend(struct vfio_pci_hot_reset, count);
+	struct vfio_pci_hot_reset hdr;
+	bool slot = false;
+
+	if (copy_from_user(&hdr, arg, minsz))
+		return -EFAULT;
+
+	if (hdr.argsz < minsz || hdr.flags)
+		return -EINVAL;
+
+	/* Can we do a slot or bus reset or neither? */
+	if (!pci_probe_reset_slot(vdev->pdev->slot))
+		slot = true;
+	else if (pci_probe_reset_bus(vdev->pdev->bus))
+		return -ENODEV;
+
+	return vfio_pci_ioctl_pci_hot_reset_groups(vdev, hdr.count, slot, arg);
+}
+
 static int vfio_pci_ioctl_ioeventfd(struct vfio_pci_core_device *vdev,
 				    struct vfio_device_ioeventfd __user *arg)
 {
-- 
2.34.1
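
The split made by this patch can be modelled in plain C outside the
kernel. The sketch below is only a model of the control flow, not the
kernel code: hdr_model, hot_reset_model() and hot_reset_groups_model()
are hypothetical names, slot_ok/bus_ok stand in for the results of
pci_probe_reset_slot()/pci_probe_reset_bus(), and max_groups stands in
for the group count the kernel derives itself.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Model of the fixed header of the hot reset argument. */
struct hdr_model {
	uint32_t argsz;
	uint32_t flags;
	uint32_t count;
};

/*
 * Model of the helper: only the bound check on the fd array is kept.
 * As in this patch, a count of zero is still rejected here.
 */
static int hot_reset_groups_model(uint32_t array_count, uint32_t max_groups)
{
	if (!array_count || array_count > max_groups)
		return -EINVAL;
	return 0;
}

/*
 * Model of the ioctl entry point: validate the header, decide between
 * slot and bus reset, then hand the count over to the helper.
 */
static int hot_reset_model(const struct hdr_model *hdr, bool slot_ok,
			   bool bus_ok, uint32_t max_groups)
{
	const uint32_t minsz = sizeof(*hdr);

	if (hdr->argsz < minsz || hdr->flags)
		return -EINVAL;

	/* Neither a slot reset nor a bus reset is possible. */
	if (!slot_ok && !bus_ok)
		return -ENODEV;

	return hot_reset_groups_model(hdr->count, max_groups);
}
```

Keeping the header validation in the entry point means a second reset
method can later be selected there without duplicating those checks.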


^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [PATCH v7 3/9] iommufd: Reserve all negative IDs in the iommufd xarray
  2023-06-02 12:15 ` [Intel-gfx] " Yi Liu
@ 2023-06-02 12:15   ` Yi Liu
  -1 siblings, 0 replies; 77+ messages in thread
From: Yi Liu @ 2023-06-02 12:15 UTC (permalink / raw)
  To: alex.williamson, jgg, kevin.tian
  Cc: joro, robin.murphy, cohuck, eric.auger, nicolinc, kvm, mjrosato,
	chao.p.peng, yi.l.liu, yi.y.sun, peterx, jasowang,
	shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	intel-gvt-dev, intel-gfx, linux-s390, xudong.hao, yan.y.zhao,
	terrence.xu, yanting.jiang, zhenzhong.duan, clegoate

With this reservation, IOMMUFD users can encode negative IDs for
specific purposes, e.g. VFIO needs two reserved values to tell userspace
that the returned ID is not valid but carries another meaning.

Tested-by: Terrence Xu <terrence.xu@intel.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 drivers/iommu/iommufd/main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/iommu/iommufd/main.c b/drivers/iommu/iommufd/main.c
index 3fbe636c3d8a..32ce7befc8dd 100644
--- a/drivers/iommu/iommufd/main.c
+++ b/drivers/iommu/iommufd/main.c
@@ -50,7 +50,7 @@ struct iommufd_object *_iommufd_object_alloc(struct iommufd_ctx *ictx,
 	 * before calling iommufd_object_finalize().
 	 */
 	rc = xa_alloc(&ictx->objects, &obj->id, XA_ZERO_ENTRY,
-		      xa_limit_32b, GFP_KERNEL_ACCOUNT);
+		      xa_limit_31b, GFP_KERNEL_ACCOUNT);
 	if (rc)
 		goto out_free;
 	return obj;
-- 
2.34.1
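
To make the effect of switching to xa_limit_31b concrete, the following
standalone C sketch shows why IDs allocated in the range 0..INT_MAX never
collide with negative values when read as a signed 32-bit integer. The
sentinel macros and helper names are hypothetical and chosen for this
example only; the values VFIO actually reserves are defined elsewhere in
the series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Upper bound of a 31-bit allocation range (0..INT_MAX). */
#define ID_MAX_31B ((uint32_t)INT32_MAX)

/* Hypothetical sentinels, for illustration only. */
#define EXAMPLE_ID_SENTINEL_A ((uint32_t)-1)
#define EXAMPLE_ID_SENTINEL_B ((uint32_t)-2)

/* True if the allocator could hand out this ID under a 31-bit limit. */
static bool id_is_allocatable(uint32_t id)
{
	return id <= ID_MAX_31B;
}

/* True if the ID is negative when viewed as a signed 32-bit value. */
static bool id_is_reserved(uint32_t id)
{
	return id > ID_MAX_31B;
}
```

Every 32-bit value falls into exactly one of the two classes, so a
consumer can tell a real object ID from an encoded special value with a
single comparison.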


^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [PATCH v7 4/9] iommufd: Add iommufd_ctx_has_group()
  2023-06-02 12:15 ` [Intel-gfx] " Yi Liu
@ 2023-06-02 12:15   ` Yi Liu
  -1 siblings, 0 replies; 77+ messages in thread
From: Yi Liu @ 2023-06-02 12:15 UTC (permalink / raw)
  To: alex.williamson, jgg, kevin.tian
  Cc: joro, robin.murphy, cohuck, eric.auger, nicolinc, kvm, mjrosato,
	chao.p.peng, yi.l.liu, yi.y.sun, peterx, jasowang,
	shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	intel-gvt-dev, intel-gfx, linux-s390, xudong.hao, yan.y.zhao,
	terrence.xu, yanting.jiang, zhenzhong.duan, clegoate

This adds a helper to check whether any device within the given
iommu_group has been bound to the iommufd_ctx. This is useful for checking
ownership of devices which have not been bound themselves but cannot be
bound to any other iommufd_ctx because their iommu_group has already been
bound.

Tested-by: Terrence Xu <terrence.xu@intel.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 drivers/iommu/iommufd/device.c | 30 ++++++++++++++++++++++++++++++
 include/linux/iommufd.h        |  8 ++++++++
 2 files changed, 38 insertions(+)

diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c
index 4f9b2142274c..4571344c8508 100644
--- a/drivers/iommu/iommufd/device.c
+++ b/drivers/iommu/iommufd/device.c
@@ -98,6 +98,36 @@ struct iommufd_device *iommufd_device_bind(struct iommufd_ctx *ictx,
 }
 EXPORT_SYMBOL_NS_GPL(iommufd_device_bind, IOMMUFD);
 
+/**
+ * iommufd_ctx_has_group - True if any device within the group is bound
+ *                         to the ictx
+ * @ictx: iommufd file descriptor
+ * @group: Pointer to a physical iommu_group struct
+ *
+ * True if any device within the group has been bound to this ictx, ex. via
+ * iommufd_device_bind(), therefore implying ictx ownership of the group.
+ */
+bool iommufd_ctx_has_group(struct iommufd_ctx *ictx, struct iommu_group *group)
+{
+	struct iommufd_object *obj;
+	unsigned long index;
+
+	if (!ictx || !group)
+		return false;
+
+	xa_lock(&ictx->objects);
+	xa_for_each(&ictx->objects, index, obj) {
+		if (obj->type == IOMMUFD_OBJ_DEVICE &&
+		    container_of(obj, struct iommufd_device, obj)->group == group) {
+			xa_unlock(&ictx->objects);
+			return true;
+		}
+	}
+	xa_unlock(&ictx->objects);
+	return false;
+}
+EXPORT_SYMBOL_NS_GPL(iommufd_ctx_has_group, IOMMUFD);
+
 /**
  * iommufd_device_unbind - Undo iommufd_device_bind()
  * @idev: Device returned by iommufd_device_bind()
diff --git a/include/linux/iommufd.h b/include/linux/iommufd.h
index 1129a36a74c4..33fe57e95e42 100644
--- a/include/linux/iommufd.h
+++ b/include/linux/iommufd.h
@@ -16,6 +16,7 @@ struct page;
 struct iommufd_ctx;
 struct iommufd_access;
 struct file;
+struct iommu_group;
 
 struct iommufd_device *iommufd_device_bind(struct iommufd_ctx *ictx,
 					   struct device *dev, u32 *id);
@@ -50,6 +51,7 @@ void iommufd_ctx_get(struct iommufd_ctx *ictx);
 #if IS_ENABLED(CONFIG_IOMMUFD)
 struct iommufd_ctx *iommufd_ctx_from_file(struct file *file);
 void iommufd_ctx_put(struct iommufd_ctx *ictx);
+bool iommufd_ctx_has_group(struct iommufd_ctx *ictx, struct iommu_group *group);
 
 int iommufd_access_pin_pages(struct iommufd_access *access, unsigned long iova,
 			     unsigned long length, struct page **out_pages,
@@ -71,6 +73,12 @@ static inline void iommufd_ctx_put(struct iommufd_ctx *ictx)
 {
 }
 
+static inline bool iommufd_ctx_has_group(struct iommufd_ctx *ictx,
+					 struct iommu_group *group)
+{
+	return false;
+}
+
 static inline int iommufd_access_pin_pages(struct iommufd_access *access,
 					   unsigned long iova,
 					   unsigned long length,
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [Intel-gfx] [PATCH v7 4/9] iommufd: Add iommufd_ctx_has_group()
@ 2023-06-02 12:15   ` Yi Liu
  0 siblings, 0 replies; 77+ messages in thread
From: Yi Liu @ 2023-06-02 12:15 UTC (permalink / raw)
  To: alex.williamson, jgg, kevin.tian
  Cc: mjrosato, jasowang, xudong.hao, zhenzhong.duan, peterx,
	terrence.xu, chao.p.peng, linux-s390, yi.l.liu, kvm, lulu,
	yanting.jiang, joro, nicolinc, yan.y.zhao, intel-gfx, eric.auger,
	intel-gvt-dev, yi.y.sun, clegoate, cohuck,
	shameerali.kolothum.thodi, suravee.suthikulpanit, robin.murphy

This adds the helper to check if any device within the given iommu_group
has been bound with the iommufd_ctx. This is helpful for the checking on
device ownership for the devices which have not been bound but cannot be
bound to any other iommufd_ctx as the iommu_group has been bound.

Tested-by: Terrence Xu <terrence.xu@intel.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 drivers/iommu/iommufd/device.c | 30 ++++++++++++++++++++++++++++++
 include/linux/iommufd.h        |  8 ++++++++
 2 files changed, 38 insertions(+)

diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c
index 4f9b2142274c..4571344c8508 100644
--- a/drivers/iommu/iommufd/device.c
+++ b/drivers/iommu/iommufd/device.c
@@ -98,6 +98,36 @@ struct iommufd_device *iommufd_device_bind(struct iommufd_ctx *ictx,
 }
 EXPORT_SYMBOL_NS_GPL(iommufd_device_bind, IOMMUFD);
 
+/**
+ * iommufd_ctx_has_group - True if any device within the group is bound
+ *                         to the ictx
+ * @ictx: iommufd file descriptor
+ * @group: Pointer to a physical iommu_group struct
+ *
+ * True if any device within the group has been bound to this ictx, ex. via
+ * iommufd_device_bind(), therefore implying ictx ownership of the group.
+ */
+bool iommufd_ctx_has_group(struct iommufd_ctx *ictx, struct iommu_group *group)
+{
+	struct iommufd_object *obj;
+	unsigned long index;
+
+	if (!ictx || !group)
+		return false;
+
+	xa_lock(&ictx->objects);
+	xa_for_each(&ictx->objects, index, obj) {
+		if (obj->type == IOMMUFD_OBJ_DEVICE &&
+		    container_of(obj, struct iommufd_device, obj)->group == group) {
+			xa_unlock(&ictx->objects);
+			return true;
+		}
+	}
+	xa_unlock(&ictx->objects);
+	return false;
+}
+EXPORT_SYMBOL_NS_GPL(iommufd_ctx_has_group, IOMMUFD);
+
 /**
  * iommufd_device_unbind - Undo iommufd_device_bind()
  * @idev: Device returned by iommufd_device_bind()
diff --git a/include/linux/iommufd.h b/include/linux/iommufd.h
index 1129a36a74c4..33fe57e95e42 100644
--- a/include/linux/iommufd.h
+++ b/include/linux/iommufd.h
@@ -16,6 +16,7 @@ struct page;
 struct iommufd_ctx;
 struct iommufd_access;
 struct file;
+struct iommu_group;
 
 struct iommufd_device *iommufd_device_bind(struct iommufd_ctx *ictx,
 					   struct device *dev, u32 *id);
@@ -50,6 +51,7 @@ void iommufd_ctx_get(struct iommufd_ctx *ictx);
 #if IS_ENABLED(CONFIG_IOMMUFD)
 struct iommufd_ctx *iommufd_ctx_from_file(struct file *file);
 void iommufd_ctx_put(struct iommufd_ctx *ictx);
+bool iommufd_ctx_has_group(struct iommufd_ctx *ictx, struct iommu_group *group);
 
 int iommufd_access_pin_pages(struct iommufd_access *access, unsigned long iova,
 			     unsigned long length, struct page **out_pages,
@@ -71,6 +73,12 @@ static inline void iommufd_ctx_put(struct iommufd_ctx *ictx)
 {
 }
 
+static inline bool iommufd_ctx_has_group(struct iommufd_ctx *ictx,
+					 struct iommu_group *group)
+{
+	return false;
+}
+
 static inline int iommufd_access_pin_pages(struct iommufd_access *access,
 					   unsigned long iova,
 					   unsigned long length,
-- 
2.34.1



* [PATCH v7 5/9] iommufd: Add helper to retrieve iommufd_ctx and devid
  2023-06-02 12:15 ` [Intel-gfx] " Yi Liu
@ 2023-06-02 12:15   ` Yi Liu
  -1 siblings, 0 replies; 77+ messages in thread
From: Yi Liu @ 2023-06-02 12:15 UTC (permalink / raw)
  To: alex.williamson, jgg, kevin.tian
  Cc: joro, robin.murphy, cohuck, eric.auger, nicolinc, kvm, mjrosato,
	chao.p.peng, yi.l.liu, yi.y.sun, peterx, jasowang,
	shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	intel-gvt-dev, intel-gfx, linux-s390, xudong.hao, yan.y.zhao,
	terrence.xu, yanting.jiang, zhenzhong.duan, clegoate

Add iommufd_device_to_ictx() and iommufd_device_to_id(). The vfio-pci
driver needs them to report the devices affected by a hot reset of a
given device.

Tested-by: Terrence Xu <terrence.xu@intel.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 drivers/iommu/iommufd/device.c | 12 ++++++++++++
 include/linux/iommufd.h        |  3 +++
 2 files changed, 15 insertions(+)

diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c
index 4571344c8508..96d4281bfa7c 100644
--- a/drivers/iommu/iommufd/device.c
+++ b/drivers/iommu/iommufd/device.c
@@ -146,6 +146,18 @@ void iommufd_device_unbind(struct iommufd_device *idev)
 }
 EXPORT_SYMBOL_NS_GPL(iommufd_device_unbind, IOMMUFD);
 
+struct iommufd_ctx *iommufd_device_to_ictx(struct iommufd_device *idev)
+{
+	return idev->ictx;
+}
+EXPORT_SYMBOL_NS_GPL(iommufd_device_to_ictx, IOMMUFD);
+
+u32 iommufd_device_to_id(struct iommufd_device *idev)
+{
+	return idev->obj.id;
+}
+EXPORT_SYMBOL_NS_GPL(iommufd_device_to_id, IOMMUFD);
+
 static int iommufd_device_setup_msi(struct iommufd_device *idev,
 				    struct iommufd_hw_pagetable *hwpt,
 				    phys_addr_t sw_msi_start)
diff --git a/include/linux/iommufd.h b/include/linux/iommufd.h
index 33fe57e95e42..33933b0f95fc 100644
--- a/include/linux/iommufd.h
+++ b/include/linux/iommufd.h
@@ -25,6 +25,9 @@ void iommufd_device_unbind(struct iommufd_device *idev);
 int iommufd_device_attach(struct iommufd_device *idev, u32 *pt_id);
 void iommufd_device_detach(struct iommufd_device *idev);
 
+struct iommufd_ctx *iommufd_device_to_ictx(struct iommufd_device *idev);
+u32 iommufd_device_to_id(struct iommufd_device *idev);
+
 struct iommufd_access_ops {
 	u8 needs_pin_pages : 1;
 	void (*unmap)(void *data, unsigned long iova, unsigned long length);
-- 
2.34.1



* [PATCH v7 6/9] vfio: Mark cdev usage in vfio_device
  2023-06-02 12:15 ` [Intel-gfx] " Yi Liu
@ 2023-06-02 12:15   ` Yi Liu
  -1 siblings, 0 replies; 77+ messages in thread
From: Yi Liu @ 2023-06-02 12:15 UTC (permalink / raw)
  To: alex.williamson, jgg, kevin.tian
  Cc: joro, robin.murphy, cohuck, eric.auger, nicolinc, kvm, mjrosato,
	chao.p.peng, yi.l.liu, yi.y.sun, peterx, jasowang,
	shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	intel-gvt-dev, intel-gfx, linux-s390, xudong.hao, yan.y.zhao,
	terrence.xu, yanting.jiang, zhenzhong.duan, clegoate

Add vfio_device_cdev_opened(), which lets the revised
VFIO_DEVICE_GET_PCI_HOT_RESET_INFO ioctl decide whether to report group_id
or devid. There is no cdev path yet, so the helper always returns false
for now.

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Tested-by: Terrence Xu <terrence.xu@intel.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 include/linux/vfio.h | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/include/linux/vfio.h b/include/linux/vfio.h
index 2c137ea94a3e..2a45853773a6 100644
--- a/include/linux/vfio.h
+++ b/include/linux/vfio.h
@@ -139,6 +139,11 @@ int vfio_iommufd_emulated_attach_ioas(struct vfio_device *vdev, u32 *pt_id);
 	((int (*)(struct vfio_device *vdev, u32 *pt_id)) NULL)
 #endif
 
+static inline bool vfio_device_cdev_opened(struct vfio_device *device)
+{
+	return false;
+}
+
 /**
  * struct vfio_migration_ops - VFIO bus device driver migration callbacks
  *
-- 
2.34.1



* [PATCH v7 7/9] vfio: Add helper to search vfio_device in a dev_set
  2023-06-02 12:15 ` [Intel-gfx] " Yi Liu
@ 2023-06-02 12:15   ` Yi Liu
  -1 siblings, 0 replies; 77+ messages in thread
From: Yi Liu @ 2023-06-02 12:15 UTC (permalink / raw)
  To: alex.williamson, jgg, kevin.tian
  Cc: joro, robin.murphy, cohuck, eric.auger, nicolinc, kvm, mjrosato,
	chao.p.peng, yi.l.liu, yi.y.sun, peterx, jasowang,
	shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	intel-gvt-dev, intel-gfx, linux-s390, xudong.hao, yan.y.zhao,
	terrence.xu, yanting.jiang, zhenzhong.duan, clegoate

Some drivers, e.g. vfio-pci, need to search for a vfio_device within a
given dev_set, so add a helper.

vfio_pci_is_device_in_set() has returned -EBUSY since commit a882c16a2b7e
("vfio/pci: Change vfio_pci_try_bus_reset() to use the dev_set"), which
was trying to preserve the return value of
vfio_pci_try_zap_and_vma_lock_cb(). -ENODEV makes more sense, so return
that instead.

Suggested-by: Alex Williamson <alex.williamson@redhat.com>
Tested-by: Terrence Xu <terrence.xu@intel.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
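Note, not part of the commit: a minimal userspace sketch of the lookup
and of how the vfio-pci callback maps its result to 0 or -ENODEV. It
assumes a plain singly linked list in place of the kernel list_head and
omits the dev_set lock. All model_* names are hypothetical and exist
only for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct vfio_device / struct vfio_device_set. */
struct model_vfio_device {
	const void *dev;			/* identity of the underlying device */
	struct model_vfio_device *next;
};

struct model_dev_set {
	struct model_vfio_device *head;
};

/* Return the entry whose ->dev matches @dev, or NULL if none does. */
static struct model_vfio_device *
model_find_device_in_devset(const struct model_dev_set *set, const void *dev)
{
	struct model_vfio_device *cur;

	for (cur = set->head; cur; cur = cur->next)
		if (cur->dev == dev)
			return cur;
	return NULL;
}

/* Callback-style wrapper: 0 if present, -ENODEV otherwise. */
static int model_is_device_in_set(const struct model_dev_set *set,
				  const void *dev)
{
	return model_find_device_in_devset(set, dev) ? 0 : -ENODEV;
}

/* Usage: a and b are in the set, c is not. */
static void model_devset_example(void)
{
	int a, b, c;	/* only their addresses are used, as identities */
	struct model_vfio_device second = { &b, NULL };
	struct model_vfio_device first = { &a, &second };
	struct model_dev_set set = { &first };

	assert(model_find_device_in_devset(&set, &b) == &second);
	assert(model_is_device_in_set(&set, &a) == 0);
	assert(model_is_device_in_set(&set, &c) == -ENODEV);
}
```

Returning the matching entry rather than a boolean lets a caller go on
to inspect the vfio_device it found, while the wrapper keeps the
0/-errno convention that a per-device callback expects.
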
 drivers/vfio/pci/vfio_pci_core.c |  6 +-----
 drivers/vfio/vfio_main.c         | 15 +++++++++++++++
 include/linux/vfio.h             |  3 +++
 3 files changed, 19 insertions(+), 5 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
index 39e7823088e7..3a2f67675036 100644
--- a/drivers/vfio/pci/vfio_pci_core.c
+++ b/drivers/vfio/pci/vfio_pci_core.c
@@ -2335,12 +2335,8 @@ static bool vfio_dev_in_groups(struct vfio_pci_core_device *vdev,
 static int vfio_pci_is_device_in_set(struct pci_dev *pdev, void *data)
 {
 	struct vfio_device_set *dev_set = data;
-	struct vfio_device *cur;
 
-	list_for_each_entry(cur, &dev_set->device_list, dev_set_list)
-		if (cur->dev == &pdev->dev)
-			return 0;
-	return -EBUSY;
+	return vfio_find_device_in_devset(dev_set, &pdev->dev) ? 0 : -ENODEV;
 }
 
 /*
diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
index f0ca33b2e1df..ab4f3a794f78 100644
--- a/drivers/vfio/vfio_main.c
+++ b/drivers/vfio/vfio_main.c
@@ -141,6 +141,21 @@ unsigned int vfio_device_set_open_count(struct vfio_device_set *dev_set)
 }
 EXPORT_SYMBOL_GPL(vfio_device_set_open_count);
 
+struct vfio_device *
+vfio_find_device_in_devset(struct vfio_device_set *dev_set,
+			   struct device *dev)
+{
+	struct vfio_device *cur;
+
+	lockdep_assert_held(&dev_set->lock);
+
+	list_for_each_entry(cur, &dev_set->device_list, dev_set_list)
+		if (cur->dev == dev)
+			return cur;
+	return NULL;
+}
+EXPORT_SYMBOL_GPL(vfio_find_device_in_devset);
+
 /*
  * Device objects - create, release, get, put, search
  */
diff --git a/include/linux/vfio.h b/include/linux/vfio.h
index 2a45853773a6..ee120d2d530b 100644
--- a/include/linux/vfio.h
+++ b/include/linux/vfio.h
@@ -244,6 +244,9 @@ void vfio_unregister_group_dev(struct vfio_device *device);
 
 int vfio_assign_device_set(struct vfio_device *device, void *set_id);
 unsigned int vfio_device_set_open_count(struct vfio_device_set *dev_set);
+struct vfio_device *
+vfio_find_device_in_devset(struct vfio_device_set *dev_set,
+			   struct device *dev);
 
 int vfio_mig_get_next_state(struct vfio_device *device,
 			    enum vfio_device_mig_state cur_fsm,
-- 
2.34.1



* [PATCH v7 8/9] vfio/pci: Extend VFIO_DEVICE_GET_PCI_HOT_RESET_INFO for vfio device cdev
  2023-06-02 12:15 ` [Intel-gfx] " Yi Liu
@ 2023-06-02 12:15   ` Yi Liu
  -1 siblings, 0 replies; 77+ messages in thread
From: Yi Liu @ 2023-06-02 12:15 UTC (permalink / raw)
  To: alex.williamson, jgg, kevin.tian
  Cc: joro, robin.murphy, cohuck, eric.auger, nicolinc, kvm, mjrosato,
	chao.p.peng, yi.l.liu, yi.y.sun, peterx, jasowang,
	shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	intel-gvt-dev, intel-gfx, linux-s390, xudong.hao, yan.y.zhao,
	terrence.xu, yanting.jiang, zhenzhong.duan, clegoate

This allows the VFIO_DEVICE_GET_PCI_HOT_RESET_INFO ioctl to use the
iommufd_ctx of the cdev device to check ownership of the other affected
devices.

When VFIO_DEVICE_GET_PCI_HOT_RESET_INFO is called on an IOMMUFD managed
device, the new flag VFIO_PCI_HOT_RESET_FLAG_DEV_ID is reported to indicate
that the values returned are IOMMUFD devids rather than the group IDs used
when accessing vfio devices through the conventional vfio group interface.
Additionally, the flag VFIO_PCI_HOT_RESET_FLAG_DEV_ID_OWNED is reported in
this mode if all of the devices affected by the hot-reset are owned, either
by virtue of being directly bound to the same iommufd context as the
calling device or implicitly via a shared IOMMU group.

Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Suggested-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
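Note, not part of the commit: a minimal userspace sketch of how the
reported flags follow from the per-device devid values, mirroring the
logic in vfio_pci_fill_devs() below. The flag and devid values are
copied from the uapi hunk of this patch; the MODEL_* and model_* names
are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Values copied from the uapi hunk in this patch. */
#define MODEL_FLAG_DEV_ID		(1u << 0)
#define MODEL_FLAG_DEV_ID_OWNED		(1u << 1)
#define MODEL_DEVID_OWNED		0u
#define MODEL_DEVID_NOT_OWNED		((uint32_t)-1)

/*
 * Start with both flags set, as done for a cdev-opened device, and clear
 * the OWNED flag as soon as one affected device is not owned.
 */
static uint32_t model_hot_reset_flags(const uint32_t *devids, size_t count)
{
	uint32_t flags = MODEL_FLAG_DEV_ID | MODEL_FLAG_DEV_ID_OWNED;
	size_t i;

	for (i = 0; i < count; i++)
		if (devids[i] == MODEL_DEVID_NOT_OWNED)
			flags &= ~MODEL_FLAG_DEV_ID_OWNED;
	return flags;
}

/* Usage: implicit ownership (devid 0) keeps OWNED; one -1 clears it. */
static void model_hot_reset_flags_example(void)
{
	const uint32_t all_owned[] = { 5, MODEL_DEVID_OWNED, 7 };
	const uint32_t one_foreign[] = { 5, MODEL_DEVID_NOT_OWNED };

	assert(model_hot_reset_flags(all_owned, 3) ==
	       (MODEL_FLAG_DEV_ID | MODEL_FLAG_DEV_ID_OWNED));
	assert(model_hot_reset_flags(one_foreign, 2) == MODEL_FLAG_DEV_ID);
}
```

A single device reported as VFIO_PCI_DEVID_NOT_OWNED is enough to drop
the OWNED flag for the whole set, which matches the requirement that
every affected device be owned before a hot reset is permitted.
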
 drivers/vfio/iommufd.c           | 49 +++++++++++++++++++++++++++++++
 drivers/vfio/pci/vfio_pci_core.c | 47 +++++++++++++++++++++++++-----
 include/linux/vfio.h             | 16 ++++++++++
 include/uapi/linux/vfio.h        | 50 +++++++++++++++++++++++++++++++-
 4 files changed, 154 insertions(+), 8 deletions(-)

diff --git a/drivers/vfio/iommufd.c b/drivers/vfio/iommufd.c
index 88b00c501015..a04f3a493437 100644
--- a/drivers/vfio/iommufd.c
+++ b/drivers/vfio/iommufd.c
@@ -66,6 +66,55 @@ void vfio_iommufd_unbind(struct vfio_device *vdev)
 		vdev->ops->unbind_iommufd(vdev);
 }
 
+struct iommufd_ctx *vfio_iommufd_device_ictx(struct vfio_device *vdev)
+{
+	if (vdev->iommufd_device)
+		return iommufd_device_to_ictx(vdev->iommufd_device);
+	return NULL;
+}
+EXPORT_SYMBOL_GPL(vfio_iommufd_device_ictx);
+
+static int vfio_iommufd_device_id(struct vfio_device *vdev)
+{
+	if (vdev->iommufd_device)
+		return iommufd_device_to_id(vdev->iommufd_device);
+	return -EINVAL;
+}
+
+/*
+ * Return devid for a device which is affected by hot-reset.
+ * - valid devid > 0 for the device that is bound to the input
+ *   iommufd_ctx.
+ * - devid == VFIO_PCI_DEVID_OWNED for the device that has not
+ *   been bound to any iommufd_ctx but other device within its
+ *   group has been bound to the input iommufd_ctx.
+ * - devid == VFIO_PCI_DEVID_NOT_OWNED for others. e.g. device
+ *   is bound to other iommufd_ctx etc.
+ */
+int vfio_iommufd_device_hot_reset_devid(struct vfio_device *vdev,
+					struct iommufd_ctx *ictx)
+{
+	struct iommu_group *group;
+	int devid;
+
+	if (vfio_iommufd_device_ictx(vdev) == ictx)
+		return vfio_iommufd_device_id(vdev);
+
+	group = iommu_group_get(vdev->dev);
+	if (!group)
+		return VFIO_PCI_DEVID_NOT_OWNED;
+
+	if (iommufd_ctx_has_group(ictx, group))
+		devid = VFIO_PCI_DEVID_OWNED;
+	else
+		devid = VFIO_PCI_DEVID_NOT_OWNED;
+
+	iommu_group_put(group);
+
+	return devid;
+}
+EXPORT_SYMBOL_GPL(vfio_iommufd_device_hot_reset_devid);
+
 /*
  * The physical standard ops mean that the iommufd_device is bound to the
  * physical device vdev->dev that was provided to vfio_init_group_dev(). Drivers
diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
index 3a2f67675036..a615a223cdef 100644
--- a/drivers/vfio/pci/vfio_pci_core.c
+++ b/drivers/vfio/pci/vfio_pci_core.c
@@ -27,6 +27,7 @@
 #include <linux/vgaarb.h>
 #include <linux/nospec.h>
 #include <linux/sched/mm.h>
+#include <linux/iommufd.h>
 #if IS_ENABLED(CONFIG_EEH)
 #include <asm/eeh.h>
 #endif
@@ -776,26 +777,49 @@ struct vfio_pci_fill_info {
 	int max;
 	int cur;
 	struct vfio_pci_dependent_device *devices;
+	struct vfio_device *vdev;
+	u32 flags;
 };
 
 static int vfio_pci_fill_devs(struct pci_dev *pdev, void *data)
 {
 	struct vfio_pci_fill_info *fill = data;
-	struct iommu_group *iommu_group;
 
 	if (fill->cur == fill->max)
 		return -EAGAIN; /* Something changed, try again */
 
-	iommu_group = iommu_group_get(&pdev->dev);
-	if (!iommu_group)
-		return -EPERM; /* Cannot reset non-isolated devices */
+	if (fill->flags & VFIO_PCI_HOT_RESET_FLAG_DEV_ID) {
+		struct iommufd_ctx *iommufd = vfio_iommufd_device_ictx(fill->vdev);
+		struct vfio_device_set *dev_set = fill->vdev->dev_set;
+		struct vfio_device *vdev;
 
-	fill->devices[fill->cur].group_id = iommu_group_id(iommu_group);
+		/*
+		 * hot-reset requires all affected devices be represented in
+		 * the dev_set.
+		 */
+		vdev = vfio_find_device_in_devset(dev_set, &pdev->dev);
+		if (!vdev)
+			fill->devices[fill->cur].devid = VFIO_PCI_DEVID_NOT_OWNED;
+		else
+			fill->devices[fill->cur].devid =
+				vfio_iommufd_device_hot_reset_devid(vdev, iommufd);
+		/* If devid is VFIO_PCI_DEVID_NOT_OWNED, clear owned flag. */
+		if (fill->devices[fill->cur].devid == VFIO_PCI_DEVID_NOT_OWNED)
+			fill->flags &= ~VFIO_PCI_HOT_RESET_FLAG_DEV_ID_OWNED;
+	} else {
+		struct iommu_group *iommu_group;
+
+		iommu_group = iommu_group_get(&pdev->dev);
+		if (!iommu_group)
+			return -EPERM; /* Cannot reset non-isolated devices */
+
+		fill->devices[fill->cur].group_id = iommu_group_id(iommu_group);
+		iommu_group_put(iommu_group);
+	}
 	fill->devices[fill->cur].segment = pci_domain_nr(pdev->bus);
 	fill->devices[fill->cur].bus = pdev->bus->number;
 	fill->devices[fill->cur].devfn = pdev->devfn;
 	fill->cur++;
-	iommu_group_put(iommu_group);
 	return 0;
 }
 
@@ -1229,17 +1253,26 @@ static int vfio_pci_ioctl_get_pci_hot_reset_info(
 		return -ENOMEM;
 
 	fill.devices = devices;
+	fill.vdev = &vdev->vdev;
+
+	if (vfio_device_cdev_opened(&vdev->vdev))
+		fill.flags |= VFIO_PCI_HOT_RESET_FLAG_DEV_ID |
+			     VFIO_PCI_HOT_RESET_FLAG_DEV_ID_OWNED;
 
+	mutex_lock(&vdev->vdev.dev_set->lock);
 	ret = vfio_pci_for_each_slot_or_bus(vdev->pdev, vfio_pci_fill_devs,
 					    &fill, slot);
+	mutex_unlock(&vdev->vdev.dev_set->lock);
 
 	/*
 	 * If a device was removed between counting and filling, we may come up
 	 * short of fill.max.  If a device was added, we'll have a return of
 	 * -EAGAIN above.
 	 */
-	if (!ret)
+	if (!ret) {
 		hdr.count = fill.cur;
+		hdr.flags = fill.flags;
+	}
 
 reset_info_exit:
 	if (copy_to_user(arg, &hdr, minsz))
diff --git a/include/linux/vfio.h b/include/linux/vfio.h
index ee120d2d530b..382a7b119c7c 100644
--- a/include/linux/vfio.h
+++ b/include/linux/vfio.h
@@ -114,6 +114,9 @@ struct vfio_device_ops {
 };
 
 #if IS_ENABLED(CONFIG_IOMMUFD)
+struct iommufd_ctx *vfio_iommufd_device_ictx(struct vfio_device *vdev);
+int vfio_iommufd_device_hot_reset_devid(struct vfio_device *vdev,
+					struct iommufd_ctx *ictx);
 int vfio_iommufd_physical_bind(struct vfio_device *vdev,
 			       struct iommufd_ctx *ictx, u32 *out_device_id);
 void vfio_iommufd_physical_unbind(struct vfio_device *vdev);
@@ -123,6 +126,19 @@ int vfio_iommufd_emulated_bind(struct vfio_device *vdev,
 void vfio_iommufd_emulated_unbind(struct vfio_device *vdev);
 int vfio_iommufd_emulated_attach_ioas(struct vfio_device *vdev, u32 *pt_id);
 #else
+static inline struct iommufd_ctx *
+vfio_iommufd_device_ictx(struct vfio_device *vdev)
+{
+	return NULL;
+}
+
+static inline int
+vfio_iommufd_device_hot_reset_devid(struct vfio_device *vdev,
+				    struct iommufd_ctx *ictx)
+{
+	return VFIO_PCI_DEVID_NOT_OWNED;
+}
+
 #define vfio_iommufd_physical_bind                                      \
 	((int (*)(struct vfio_device *vdev, struct iommufd_ctx *ictx,   \
 		  u32 *out_device_id)) NULL)
diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index 0552e8dcf0cb..70cc31e6b1ce 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -650,11 +650,57 @@ enum {
  * VFIO_DEVICE_GET_PCI_HOT_RESET_INFO - _IOWR(VFIO_TYPE, VFIO_BASE + 12,
  *					      struct vfio_pci_hot_reset_info)
  *
+ * This command is used to query the affected devices in the hot reset for
+ * a given device.
+ *
+ * This command always reports the segment, bus, and devfn information for
+ * each affected device, and reports either the group_id or the devid
+ * depending on how the calling device was opened.
+ *
+ *	- If the calling device is opened via the traditional group/container
+ *	  API, group_id is reported.  The user should check that it owns all
+ *	  the affected devices and provide a set of group fds to prove
+ *	  ownership in the VFIO_DEVICE_PCI_HOT_RESET ioctl.
+ *
+ *	- If the calling device is opened as a cdev, devid is reported.
+ *	  Flag VFIO_PCI_HOT_RESET_FLAG_DEV_ID is set to indicate this
+ *	  data type.  All the affected devices should be represented in
+ *	  the dev_set, ex. bound to a vfio driver, and also be owned by
+ *	  this interface which is determined by the following conditions:
+ *	  1) Has a valid devid within the iommufd_ctx of the calling device.
+ *	     Ownership cannot be determined across separate iommufd_ctx and
+ *	     the cdev calling conventions do not support a proof-of-ownership
+ *	     model as provided in the legacy group interface.  In this case
+ *	     valid devid with value greater than zero is provided in the return
+ *	     structure.
+ *	  2) Does not have a valid devid within the iommufd_ctx of the calling
+ *	     device, but belongs to the same IOMMU group as the calling device
+ *	     or another opened device that has a valid devid within the
+ *	     iommufd_ctx of the calling device.  This provides implicit ownership
+ *	     for devices within the same DMA isolation context.  In this case
+ *	     the devid value of VFIO_PCI_DEVID_OWNED is provided in the return
+ *	     structure.
+ *
+ *	  A devid value of VFIO_PCI_DEVID_NOT_OWNED is provided in the return
+ *	  structure for affected devices where device is NOT represented in the
+ *	  dev_set or ownership is not available.  Such devices prevent the use
+ *	  of VFIO_DEVICE_PCI_HOT_RESET ioctl outside of the proof-of-ownership
+ *	  calling conventions (ie. via legacy group accessed devices).  Flag
+ *	  VFIO_PCI_HOT_RESET_FLAG_DEV_ID_OWNED would be set when all the
+ *	  affected devices are represented in the dev_set and also owned by
+ *	  the user.  This flag is available only when
+ *	  flag VFIO_PCI_HOT_RESET_FLAG_DEV_ID is set, otherwise reserved.
+ *
  * Return: 0 on success, -errno on failure:
  *	-enospc = insufficient buffer, -enodev = unsupported for device.
  */
 struct vfio_pci_dependent_device {
-	__u32	group_id;
+	union {
+		__u32   group_id;
+		__u32	devid;
+#define VFIO_PCI_DEVID_OWNED		0
+#define VFIO_PCI_DEVID_NOT_OWNED	-1
+	};
 	__u16	segment;
 	__u8	bus;
 	__u8	devfn; /* Use PCI_SLOT/PCI_FUNC */
@@ -663,6 +709,8 @@ struct vfio_pci_dependent_device {
 struct vfio_pci_hot_reset_info {
 	__u32	argsz;
 	__u32	flags;
+#define VFIO_PCI_HOT_RESET_FLAG_DEV_ID		(1 << 0)
+#define VFIO_PCI_HOT_RESET_FLAG_DEV_ID_OWNED	(1 << 1)
 	__u32	count;
 	struct vfio_pci_dependent_device	devices[];
 };
-- 
2.34.1



* [Intel-gfx] [PATCH v7 8/9] vfio/pci: Extend VFIO_DEVICE_GET_PCI_HOT_RESET_INFO for vfio device cdev
@ 2023-06-02 12:15   ` Yi Liu
  0 siblings, 0 replies; 77+ messages in thread
From: Yi Liu @ 2023-06-02 12:15 UTC (permalink / raw)
  To: alex.williamson, jgg, kevin.tian
  Cc: mjrosato, jasowang, xudong.hao, zhenzhong.duan, peterx,
	terrence.xu, chao.p.peng, linux-s390, yi.l.liu, kvm, lulu,
	yanting.jiang, joro, nicolinc, yan.y.zhao, intel-gfx, eric.auger,
	intel-gvt-dev, yi.y.sun, clegoate, cohuck,
	shameerali.kolothum.thodi, suravee.suthikulpanit, robin.murphy

This allows the VFIO_DEVICE_GET_PCI_HOT_RESET_INFO ioctl to use the
iommufd_ctx of the cdev device to check ownership of the other affected
devices.

When VFIO_DEVICE_GET_PCI_HOT_RESET_INFO is called on an IOMMUFD managed
device, the new flag VFIO_PCI_HOT_RESET_FLAG_DEV_ID is reported to indicate
that the values returned are IOMMUFD devids rather than the group IDs used
when accessing vfio devices through the conventional vfio group interface.
Additionally, the flag VFIO_PCI_HOT_RESET_FLAG_DEV_ID_OWNED is reported in
this mode if all of the devices affected by the hot-reset are owned, either
by virtue of being directly bound to the same iommufd context as the
calling device or implicitly via a shared IOMMU group.

Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Suggested-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 drivers/vfio/iommufd.c           | 49 +++++++++++++++++++++++++++++++
 drivers/vfio/pci/vfio_pci_core.c | 47 +++++++++++++++++++++++++-----
 include/linux/vfio.h             | 16 ++++++++++
 include/uapi/linux/vfio.h        | 50 +++++++++++++++++++++++++++++++-
 4 files changed, 154 insertions(+), 8 deletions(-)

diff --git a/drivers/vfio/iommufd.c b/drivers/vfio/iommufd.c
index 88b00c501015..a04f3a493437 100644
--- a/drivers/vfio/iommufd.c
+++ b/drivers/vfio/iommufd.c
@@ -66,6 +66,55 @@ void vfio_iommufd_unbind(struct vfio_device *vdev)
 		vdev->ops->unbind_iommufd(vdev);
 }
 
+struct iommufd_ctx *vfio_iommufd_device_ictx(struct vfio_device *vdev)
+{
+	if (vdev->iommufd_device)
+		return iommufd_device_to_ictx(vdev->iommufd_device);
+	return NULL;
+}
+EXPORT_SYMBOL_GPL(vfio_iommufd_device_ictx);
+
+static int vfio_iommufd_device_id(struct vfio_device *vdev)
+{
+	if (vdev->iommufd_device)
+		return iommufd_device_to_id(vdev->iommufd_device);
+	return -EINVAL;
+}
+
+/*
+ * Return devid for a device which is affected by hot-reset.
+ * - valid devid > 0 for the device that is bound to the input
+ *   iommufd_ctx.
+ * - devid == VFIO_PCI_DEVID_OWNED for the device that has not
+ *   been bound to any iommufd_ctx but other device within its
+ *   group has been bound to the input iommufd_ctx.
+ * - devid == VFIO_PCI_DEVID_NOT_OWNED for others. e.g. device
+ *   is bound to other iommufd_ctx etc.
+ */
+int vfio_iommufd_device_hot_reset_devid(struct vfio_device *vdev,
+					struct iommufd_ctx *ictx)
+{
+	struct iommu_group *group;
+	int devid;
+
+	if (vfio_iommufd_device_ictx(vdev) == ictx)
+		return vfio_iommufd_device_id(vdev);
+
+	group = iommu_group_get(vdev->dev);
+	if (!group)
+		return VFIO_PCI_DEVID_NOT_OWNED;
+
+	if (iommufd_ctx_has_group(ictx, group))
+		devid = VFIO_PCI_DEVID_OWNED;
+	else
+		devid = VFIO_PCI_DEVID_NOT_OWNED;
+
+	iommu_group_put(group);
+
+	return devid;
+}
+EXPORT_SYMBOL_GPL(vfio_iommufd_device_hot_reset_devid);
+
 /*
  * The physical standard ops mean that the iommufd_device is bound to the
  * physical device vdev->dev that was provided to vfio_init_group_dev(). Drivers
diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
index 3a2f67675036..a615a223cdef 100644
--- a/drivers/vfio/pci/vfio_pci_core.c
+++ b/drivers/vfio/pci/vfio_pci_core.c
@@ -27,6 +27,7 @@
 #include <linux/vgaarb.h>
 #include <linux/nospec.h>
 #include <linux/sched/mm.h>
+#include <linux/iommufd.h>
 #if IS_ENABLED(CONFIG_EEH)
 #include <asm/eeh.h>
 #endif
@@ -776,26 +777,49 @@ struct vfio_pci_fill_info {
 	int max;
 	int cur;
 	struct vfio_pci_dependent_device *devices;
+	struct vfio_device *vdev;
+	u32 flags;
 };
 
 static int vfio_pci_fill_devs(struct pci_dev *pdev, void *data)
 {
 	struct vfio_pci_fill_info *fill = data;
-	struct iommu_group *iommu_group;
 
 	if (fill->cur == fill->max)
 		return -EAGAIN; /* Something changed, try again */
 
-	iommu_group = iommu_group_get(&pdev->dev);
-	if (!iommu_group)
-		return -EPERM; /* Cannot reset non-isolated devices */
+	if (fill->flags & VFIO_PCI_HOT_RESET_FLAG_DEV_ID) {
+		struct iommufd_ctx *iommufd = vfio_iommufd_device_ictx(fill->vdev);
+		struct vfio_device_set *dev_set = fill->vdev->dev_set;
+		struct vfio_device *vdev;
 
-	fill->devices[fill->cur].group_id = iommu_group_id(iommu_group);
+		/*
+		 * hot-reset requires all affected devices be represented in
+		 * the dev_set.
+		 */
+		vdev = vfio_find_device_in_devset(dev_set, &pdev->dev);
+		if (!vdev)
+			fill->devices[fill->cur].devid = VFIO_PCI_DEVID_NOT_OWNED;
+		else
+			fill->devices[fill->cur].devid =
+				vfio_iommufd_device_hot_reset_devid(vdev, iommufd);
+		/* If devid is VFIO_PCI_DEVID_NOT_OWNED, clear owned flag. */
+		if (fill->devices[fill->cur].devid == VFIO_PCI_DEVID_NOT_OWNED)
+			fill->flags &= ~VFIO_PCI_HOT_RESET_FLAG_DEV_ID_OWNED;
+	} else {
+		struct iommu_group *iommu_group;
+
+		iommu_group = iommu_group_get(&pdev->dev);
+		if (!iommu_group)
+			return -EPERM; /* Cannot reset non-isolated devices */
+
+		fill->devices[fill->cur].group_id = iommu_group_id(iommu_group);
+		iommu_group_put(iommu_group);
+	}
 	fill->devices[fill->cur].segment = pci_domain_nr(pdev->bus);
 	fill->devices[fill->cur].bus = pdev->bus->number;
 	fill->devices[fill->cur].devfn = pdev->devfn;
 	fill->cur++;
-	iommu_group_put(iommu_group);
 	return 0;
 }
 
@@ -1229,17 +1253,26 @@ static int vfio_pci_ioctl_get_pci_hot_reset_info(
 		return -ENOMEM;
 
 	fill.devices = devices;
+	fill.vdev = &vdev->vdev;
+
+	if (vfio_device_cdev_opened(&vdev->vdev))
+		fill.flags |= VFIO_PCI_HOT_RESET_FLAG_DEV_ID |
+			     VFIO_PCI_HOT_RESET_FLAG_DEV_ID_OWNED;
 
+	mutex_lock(&vdev->vdev.dev_set->lock);
 	ret = vfio_pci_for_each_slot_or_bus(vdev->pdev, vfio_pci_fill_devs,
 					    &fill, slot);
+	mutex_unlock(&vdev->vdev.dev_set->lock);
 
 	/*
 	 * If a device was removed between counting and filling, we may come up
 	 * short of fill.max.  If a device was added, we'll have a return of
 	 * -EAGAIN above.
 	 */
-	if (!ret)
+	if (!ret) {
 		hdr.count = fill.cur;
+		hdr.flags = fill.flags;
+	}
 
 reset_info_exit:
 	if (copy_to_user(arg, &hdr, minsz))
diff --git a/include/linux/vfio.h b/include/linux/vfio.h
index ee120d2d530b..382a7b119c7c 100644
--- a/include/linux/vfio.h
+++ b/include/linux/vfio.h
@@ -114,6 +114,9 @@ struct vfio_device_ops {
 };
 
 #if IS_ENABLED(CONFIG_IOMMUFD)
+struct iommufd_ctx *vfio_iommufd_device_ictx(struct vfio_device *vdev);
+int vfio_iommufd_device_hot_reset_devid(struct vfio_device *vdev,
+					struct iommufd_ctx *ictx);
 int vfio_iommufd_physical_bind(struct vfio_device *vdev,
 			       struct iommufd_ctx *ictx, u32 *out_device_id);
 void vfio_iommufd_physical_unbind(struct vfio_device *vdev);
@@ -123,6 +126,19 @@ int vfio_iommufd_emulated_bind(struct vfio_device *vdev,
 void vfio_iommufd_emulated_unbind(struct vfio_device *vdev);
 int vfio_iommufd_emulated_attach_ioas(struct vfio_device *vdev, u32 *pt_id);
 #else
+static inline struct iommufd_ctx *
+vfio_iommufd_device_ictx(struct vfio_device *vdev)
+{
+	return NULL;
+}
+
+static inline int
+vfio_iommufd_device_hot_reset_devid(struct vfio_device *vdev,
+				    struct iommufd_ctx *ictx)
+{
+	return VFIO_PCI_DEVID_NOT_OWNED;
+}
+
 #define vfio_iommufd_physical_bind                                      \
 	((int (*)(struct vfio_device *vdev, struct iommufd_ctx *ictx,   \
 		  u32 *out_device_id)) NULL)
diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index 0552e8dcf0cb..70cc31e6b1ce 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -650,11 +650,57 @@ enum {
  * VFIO_DEVICE_GET_PCI_HOT_RESET_INFO - _IOWR(VFIO_TYPE, VFIO_BASE + 12,
  *					      struct vfio_pci_hot_reset_info)
  *
+ * This command is used to query the affected devices in the hot reset for
+ * a given device.
+ *
+ * This command always reports the segment, bus, and devfn information for
+ * each affected device, and reports either the group_id or the devid
+ * depending on how the calling device is opened.
+ *
+ *	- If the calling device is opened via the traditional group/container
+ *	  API, group_id is reported.  The user should check that it owns all
+ *	  the affected devices and provide a set of group fds to prove
+ *	  ownership in the VFIO_DEVICE_PCI_HOT_RESET ioctl.
+ *
+ *	- If the calling device is opened as a cdev, devid is reported.
+ *	  Flag VFIO_PCI_HOT_RESET_FLAG_DEV_ID is set to indicate this
+ *	  data type.  All the affected devices should be represented in
+ *	  the dev_set, e.g. bound to a vfio driver, and also be owned by
+ *	  this interface, which is determined by the following conditions:
+ *	  1) Has a valid devid within the iommufd_ctx of the calling device.
+ *	     Ownership cannot be determined across separate iommufd_ctx and
+ *	     the cdev calling conventions do not support a proof-of-ownership
+ *	     model as provided in the legacy group interface.  In this case
+ *	     valid devid with value greater than zero is provided in the return
+ *	     structure.
+ *	  2) Does not have a valid devid within the iommufd_ctx of the calling
+ *	     device, but belongs to the same IOMMU group as the calling device
+ *	     or another opened device that has a valid devid within the
+ *	     iommufd_ctx of the calling device.  This provides implicit ownership
+ *	     for devices within the same DMA isolation context.  In this case
+ *	     the devid value of VFIO_PCI_DEVID_OWNED is provided in the return
+ *	     structure.
+ *
+ *	  A devid value of VFIO_PCI_DEVID_NOT_OWNED is provided in the return
+ *	  structure for affected devices where device is NOT represented in the
+ *	  dev_set or ownership is not available.  Such devices prevent the use
+ *	  of VFIO_DEVICE_PCI_HOT_RESET ioctl outside of the proof-of-ownership
+ *	  calling conventions (ie. via legacy group accessed devices).  Flag
+ *	  VFIO_PCI_HOT_RESET_FLAG_DEV_ID_OWNED would be set when all the
+ *	  affected devices are represented in the dev_set and also owned by
+ *	  the user.  This flag is available only when
+ *	  flag VFIO_PCI_HOT_RESET_FLAG_DEV_ID is set, otherwise reserved.
+ *
  * Return: 0 on success, -errno on failure:
  *	-enospc = insufficient buffer, -enodev = unsupported for device.
  */
 struct vfio_pci_dependent_device {
-	__u32	group_id;
+	union {
+		__u32   group_id;
+		__u32	devid;
+#define VFIO_PCI_DEVID_OWNED		0
+#define VFIO_PCI_DEVID_NOT_OWNED	-1
+	};
 	__u16	segment;
 	__u8	bus;
 	__u8	devfn; /* Use PCI_SLOT/PCI_FUNC */
@@ -663,6 +709,8 @@ struct vfio_pci_dependent_device {
 struct vfio_pci_hot_reset_info {
 	__u32	argsz;
 	__u32	flags;
+#define VFIO_PCI_HOT_RESET_FLAG_DEV_ID		(1 << 0)
+#define VFIO_PCI_HOT_RESET_FLAG_DEV_ID_OWNED	(1 << 1)
 	__u32	count;
 	struct vfio_pci_dependent_device	devices[];
 };
-- 
2.34.1
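
The devid values documented in this patch can be read from userspace roughly
as follows. This is only an illustrative sketch, not part of the patch: the
struct and the two DEVID constants are local mirrors of the uapi definitions
added here (a real program would take them from <linux/vfio.h> on a kernel
carrying this series), and count_not_owned() is a hypothetical helper name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local mirrors of VFIO_PCI_DEVID_OWNED / VFIO_PCI_DEVID_NOT_OWNED. */
#define DEVID_OWNED      0
#define DEVID_NOT_OWNED  (-1)

/* Local mirror of struct vfio_pci_dependent_device (devid view of the union). */
struct dependent_device {
	uint32_t devid;   /* shares storage with group_id in the real uapi */
	uint16_t segment;
	uint8_t  bus;
	uint8_t  devfn;
};

/*
 * Hypothetical helper: count the affected devices that the kernel reported
 * as not owned. Any non-zero result means the zero-length fd array reset
 * is not available to the caller.
 */
static size_t count_not_owned(const struct dependent_device *devs, size_t n)
{
	size_t i, blocked = 0;

	assert(devs != NULL || n == 0);

	for (i = 0; i < n; i++) {
		/* devid is unsigned in the uapi, so compare against -1 as u32. */
		if (devs[i].devid == (uint32_t)DEVID_NOT_OWNED)
			blocked++;
	}
	return blocked;
}
```

A devid greater than zero or equal to DEVID_OWNED does not block the reset;
only DEVID_NOT_OWNED entries do, which matches the condition under which the
kernel clears VFIO_PCI_HOT_RESET_FLAG_DEV_ID_OWNED in vfio_pci_fill_devs().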


^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [PATCH v7 9/9] vfio/pci: Allow passing zero-length fd array in VFIO_DEVICE_PCI_HOT_RESET
  2023-06-02 12:15 ` [Intel-gfx] " Yi Liu
@ 2023-06-02 12:15   ` Yi Liu
  -1 siblings, 0 replies; 77+ messages in thread
From: Yi Liu @ 2023-06-02 12:15 UTC (permalink / raw)
  To: alex.williamson, jgg, kevin.tian
  Cc: joro, robin.murphy, cohuck, eric.auger, nicolinc, kvm, mjrosato,
	chao.p.peng, yi.l.liu, yi.y.sun, peterx, jasowang,
	shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	intel-gvt-dev, intel-gfx, linux-s390, xudong.hao, yan.y.zhao,
	terrence.xu, yanting.jiang, zhenzhong.duan, clegoate

This is how the user invokes hot-reset for devices opened via the cdev
interface. The user should check the VFIO_PCI_HOT_RESET_FLAG_DEV_ID_OWNED
flag in the output of the VFIO_DEVICE_GET_PCI_HOT_RESET_INFO ioctl before
doing a hot-reset for cdev devices.
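
The userspace side of this flow might look like the sketch below. It is an
illustration under stated assumptions rather than code from this series: the
two flag macros are local mirrors of the uapi flags added for
struct vfio_pci_hot_reset_info, and hot_reset_fd_count() with its parameters
is a hypothetical helper. It returns the value to place in
vfio_pci_hot_reset.count, or -1 if the reset should not be attempted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local mirrors of the flags reported by VFIO_DEVICE_GET_PCI_HOT_RESET_INFO. */
#define HOT_RESET_FLAG_DEV_ID        (1u << 0)
#define HOT_RESET_FLAG_DEV_ID_OWNED  (1u << 1)

/*
 * Hypothetical helper.
 *   info_flags:  flags returned by the _INFO ioctl
 *   cdev_opened: true if the calling device was opened through the cdev path
 *   ngroups:     number of group fds the caller can supply (group path only)
 */
static int hot_reset_fd_count(uint32_t info_flags, bool cdev_opened, int ngroups)
{
	assert(ngroups >= 0);

	if (cdev_opened) {
		/* cdev path: devids must be reported and all of them owned. */
		if (!(info_flags & HOT_RESET_FLAG_DEV_ID))
			return -1;
		if (!(info_flags & HOT_RESET_FLAG_DEV_ID_OWNED))
			return -1;
		return 0; /* zero-length fd array */
	}

	/* group path: a non-empty array of group fds is required. */
	return ngroups > 0 ? ngroups : -1;
}
```

This mirrors the argument check in vfio_pci_ioctl_pci_hot_reset() below: a
non-zero count on a cdev opened device, or a zero count on a group opened
device, is rejected with -EINVAL.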

Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Tested-by: Yanting Jiang <yanting.jiang@intel.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 drivers/vfio/pci/vfio_pci_core.c | 61 ++++++++++++++++++++++++++------
 include/uapi/linux/vfio.h        | 14 ++++++++
 2 files changed, 64 insertions(+), 11 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
index a615a223cdef..b0eadafcbcf5 100644
--- a/drivers/vfio/pci/vfio_pci_core.c
+++ b/drivers/vfio/pci/vfio_pci_core.c
@@ -181,7 +181,8 @@ static void vfio_pci_probe_mmaps(struct vfio_pci_core_device *vdev)
 struct vfio_pci_group_info;
 static void vfio_pci_dev_set_try_reset(struct vfio_device_set *dev_set);
 static int vfio_pci_dev_set_hot_reset(struct vfio_device_set *dev_set,
-				      struct vfio_pci_group_info *groups);
+				      struct vfio_pci_group_info *groups,
+				      struct iommufd_ctx *iommufd_ctx);
 
 /*
  * INTx masking requires the ability to disable INTx signaling via PCI_COMMAND
@@ -1308,8 +1309,7 @@ vfio_pci_ioctl_pci_hot_reset_groups(struct vfio_pci_core_device *vdev,
 	if (ret)
 		return ret;
 
-	/* Somewhere between 1 and count is OK */
-	if (!array_count || array_count > count)
+	if (array_count > count)
 		return -EINVAL;
 
 	group_fds = kcalloc(array_count, sizeof(*group_fds), GFP_KERNEL);
@@ -1358,7 +1358,7 @@ vfio_pci_ioctl_pci_hot_reset_groups(struct vfio_pci_core_device *vdev,
 	info.count = array_count;
 	info.files = files;
 
-	ret = vfio_pci_dev_set_hot_reset(vdev->vdev.dev_set, &info);
+	ret = vfio_pci_dev_set_hot_reset(vdev->vdev.dev_set, &info, NULL);
 
 hot_reset_release:
 	for (file_idx--; file_idx >= 0; file_idx--)
@@ -1381,13 +1381,21 @@ static int vfio_pci_ioctl_pci_hot_reset(struct vfio_pci_core_device *vdev,
 	if (hdr.argsz < minsz || hdr.flags)
 		return -EINVAL;
 
+	/* zero-length array is only for cdev opened devices */
+	if (!!hdr.count == vfio_device_cdev_opened(&vdev->vdev))
+		return -EINVAL;
+
 	/* Can we do a slot or bus reset or neither? */
 	if (!pci_probe_reset_slot(vdev->pdev->slot))
 		slot = true;
 	else if (pci_probe_reset_bus(vdev->pdev->bus))
 		return -ENODEV;
 
-	return vfio_pci_ioctl_pci_hot_reset_groups(vdev, hdr.count, slot, arg);
+	if (hdr.count)
+		return vfio_pci_ioctl_pci_hot_reset_groups(vdev, hdr.count, slot, arg);
+
+	return vfio_pci_dev_set_hot_reset(vdev->vdev.dev_set, NULL,
+					  vfio_iommufd_device_ictx(&vdev->vdev));
 }
 
 static int vfio_pci_ioctl_ioeventfd(struct vfio_pci_core_device *vdev,
@@ -2354,13 +2362,16 @@ const struct pci_error_handlers vfio_pci_core_err_handlers = {
 };
 EXPORT_SYMBOL_GPL(vfio_pci_core_err_handlers);
 
-static bool vfio_dev_in_groups(struct vfio_pci_core_device *vdev,
+static bool vfio_dev_in_groups(struct vfio_device *vdev,
 			       struct vfio_pci_group_info *groups)
 {
 	unsigned int i;
 
+	if (!groups)
+		return false;
+
 	for (i = 0; i < groups->count; i++)
-		if (vfio_file_has_dev(groups->files[i], &vdev->vdev))
+		if (vfio_file_has_dev(groups->files[i], vdev))
 			return true;
 	return false;
 }
@@ -2436,7 +2447,8 @@ static int vfio_pci_dev_set_pm_runtime_get(struct vfio_device_set *dev_set)
  * get each memory_lock.
  */
 static int vfio_pci_dev_set_hot_reset(struct vfio_device_set *dev_set,
-				      struct vfio_pci_group_info *groups)
+				      struct vfio_pci_group_info *groups,
+				      struct iommufd_ctx *iommufd_ctx)
 {
 	struct vfio_pci_core_device *cur_mem;
 	struct vfio_pci_core_device *cur_vma;
@@ -2466,11 +2478,38 @@ static int vfio_pci_dev_set_hot_reset(struct vfio_device_set *dev_set,
 		goto err_unlock;
 
 	list_for_each_entry(cur_vma, &dev_set->device_list, vdev.dev_set_list) {
+		bool owned;
+
 		/*
-		 * Test whether all the affected devices are contained by the
-		 * set of groups provided by the user.
+		 * Test whether all the affected devices can be reset by the
+		 * user.
+		 *
+		 * If called from a group opened device and the user provides
+		 * a set of groups, all the devices in the dev_set should be
+		 * contained by the set of groups provided by the user.
+		 *
+		 * If called from a cdev opened device and the user provides
+		 * a zero-length array, all the devices in the dev_set must
+		 * be bound to the same iommufd_ctx as the input iommufd_ctx.
+		 * If there is any device that has not been bound to any
+		 * iommufd_ctx yet, check if its iommu_group has any device
+		 * bound to the input iommufd_ctx.  Such devices can be
+		 * considered owned by the input iommufd_ctx as the device
+		 * cannot be owned by another iommufd_ctx when its iommu_group
+		 * is owned.
+		 *
+		 * Otherwise, reset is not allowed.
 		 */
-		if (!vfio_dev_in_groups(cur_vma, groups)) {
+		if (iommufd_ctx) {
+			int devid = vfio_iommufd_device_hot_reset_devid(&cur_vma->vdev,
+									iommufd_ctx);
+
+			owned = (devid != VFIO_PCI_DEVID_NOT_OWNED);
+		} else {
+			owned = vfio_dev_in_groups(&cur_vma->vdev, groups);
+		}
+
+		if (!owned) {
 			ret = -EINVAL;
 			goto err_undo;
 		}
diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index 70cc31e6b1ce..f753124e1c82 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -690,6 +690,9 @@ enum {
  *	  affected devices are represented in the dev_set and also owned by
  *	  the user.  This flag is available only when
  *	  flag VFIO_PCI_HOT_RESET_FLAG_DEV_ID is set, otherwise reserved.
+ *	  When set, the user can invoke VFIO_DEVICE_PCI_HOT_RESET with a
+ *	  zero-length fd array on the calling device, as ownership is
+ *	  validated by iommufd_ctx.
  *
  * Return: 0 on success, -errno on failure:
  *	-enospc = insufficient buffer, -enodev = unsupported for device.
@@ -721,6 +724,17 @@ struct vfio_pci_hot_reset_info {
  * VFIO_DEVICE_PCI_HOT_RESET - _IOW(VFIO_TYPE, VFIO_BASE + 13,
  *				    struct vfio_pci_hot_reset)
  *
+ * Userspace requests hot reset for the devices it operates.  Due to the
+ * underlying topology, multiple devices can be affected in the reset
+ * while some might be opened by another user.  To avoid interference
+ * the calling user must ensure all affected devices are owned by itself.
+ *
+ * Following the ownership model described by
+ * VFIO_DEVICE_GET_PCI_HOT_RESET_INFO, cdev opened devices must provide a
+ * zero-length fd array and group opened devices must provide an array of
+ * group fds as proof of ownership.  Mixed access to devices between cdev
+ * and legacy groups is not supported by this interface.
+ *
  * Return: 0 on success, -errno on failure.
  */
 struct vfio_pci_hot_reset {
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for Enhance vfio PCI hot reset for vfio cdev device (rev5)
  2023-06-02 12:15 ` [Intel-gfx] " Yi Liu
                   ` (9 preceding siblings ...)
  (?)
@ 2023-06-02 15:14 ` Patchwork
  -1 siblings, 0 replies; 77+ messages in thread
From: Patchwork @ 2023-06-02 15:14 UTC (permalink / raw)
  To: Alex Williamson; +Cc: intel-gfx

== Series Details ==

Series: Enhance vfio PCI hot reset for vfio cdev device (rev5)
URL   : https://patchwork.freedesktop.org/series/116991/
State : warning

== Summary ==

Error: dim checkpatch failed
f87b519b0c61 vfio/pci: Update comment around group_fd get in vfio_pci_ioctl_pci_hot_reset()
7bb647905c96 vfio/pci: Move the existing hot reset logic to be a helper
-:6: WARNING:COMMIT_LOG_LONG_LINE: Possible unwrapped commit description (prefer a maximum 75 chars per line)
#6: 
This prepares to add another method for hot reset. The major hot reset logic

total: 0 errors, 1 warnings, 0 checks, 99 lines checked
7f5d0638a299 iommufd: Reserve all negative IDs in the iommufd xarray
f121ad27c9d3 iommufd: Add iommufd_ctx_has_group()
3bbd0b1fd6f9 iommufd: Add helper to retrieve iommufd_ctx and devid
f38953b3e72a vfio: Mark cdev usage in vfio_device
-:8: WARNING:COMMIT_LOG_LONG_LINE: Possible unwrapped commit description (prefer a maximum 75 chars per line)
#8: 
cdev path yet, so the vfio_device_cdev_opened() helper always returns false.

total: 0 errors, 1 warnings, 0 checks, 11 lines checked
caf12fc9c4de vfio: Add helper to search vfio_device in a dev_set
9425c8cc330c vfio/pci: Extend VFIO_DEVICE_GET_PCI_HOT_RESET_INFO for vfio device cdev
0b7683a235f1 vfio/pci: Allow passing zero-length fd array in VFIO_DEVICE_PCI_HOT_RESET



^ permalink raw reply	[flat|nested] 77+ messages in thread

* [Intel-gfx] ✓ Fi.CI.BAT: success for Enhance vfio PCI hot reset for vfio cdev device (rev5)
  2023-06-02 12:15 ` [Intel-gfx] " Yi Liu
                   ` (10 preceding siblings ...)
  (?)
@ 2023-06-02 15:29 ` Patchwork
  -1 siblings, 0 replies; 77+ messages in thread
From: Patchwork @ 2023-06-02 15:29 UTC (permalink / raw)
  To: Alex Williamson; +Cc: intel-gfx

== Series Details ==

Series: Enhance vfio PCI hot reset for vfio cdev device (rev5)
URL   : https://patchwork.freedesktop.org/series/116991/
State : success

== Summary ==

CI Bug Log - changes from CI_DRM_13220 -> Patchwork_116991v5
====================================================

Summary
-------

  **SUCCESS**

  No regressions found.

  External URL: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_116991v5/index.html

Participating hosts (37 -> 38)
------------------------------

  Additional (2): fi-kbl-soraka bat-dg1-5 
  Missing    (1): fi-snb-2520m 

Known issues
------------

  Here are the changes found in Patchwork_116991v5 that come from known issues:

### IGT changes ###

#### Issues hit ####

  * igt@dmabuf@all-tests@dma_fence:
    - fi-glk-j4005:       [PASS][1] -> [ABORT][2] ([i915#8144])
   [1]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_13220/fi-glk-j4005/igt@dmabuf@all-tests@dma_fence.html
   [2]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_116991v5/fi-glk-j4005/igt@dmabuf@all-tests@dma_fence.html

  * igt@dmabuf@all-tests@dma_fence_chain:
    - fi-glk-j4005:       [PASS][3] -> [DMESG-FAIL][4] ([i915#8144])
   [3]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_13220/fi-glk-j4005/igt@dmabuf@all-tests@dma_fence_chain.html
   [4]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_116991v5/fi-glk-j4005/igt@dmabuf@all-tests@dma_fence_chain.html

  * igt@gem_huc_copy@huc-copy:
    - fi-kbl-soraka:      NOTRUN -> [SKIP][5] ([fdo#109271] / [i915#2190])
   [5]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_116991v5/fi-kbl-soraka/igt@gem_huc_copy@huc-copy.html

  * igt@gem_lmem_swapping@basic:
    - fi-kbl-soraka:      NOTRUN -> [SKIP][6] ([fdo#109271] / [i915#4613]) +3 similar issues
   [6]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_116991v5/fi-kbl-soraka/igt@gem_lmem_swapping@basic.html

  * igt@gem_mmap@basic:
    - bat-dg1-5:          NOTRUN -> [SKIP][7] ([i915#4083])
   [7]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_116991v5/bat-dg1-5/igt@gem_mmap@basic.html

  * igt@gem_tiled_fence_blits@basic:
    - bat-dg1-5:          NOTRUN -> [SKIP][8] ([i915#4077]) +2 similar issues
   [8]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_116991v5/bat-dg1-5/igt@gem_tiled_fence_blits@basic.html

  * igt@gem_tiled_pread_basic:
    - bat-dg1-5:          NOTRUN -> [SKIP][9] ([i915#4079]) +1 similar issue
   [9]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_116991v5/bat-dg1-5/igt@gem_tiled_pread_basic.html

  * igt@i915_pm_backlight@basic-brightness:
    - bat-dg1-5:          NOTRUN -> [SKIP][10] ([i915#7561])
   [10]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_116991v5/bat-dg1-5/igt@i915_pm_backlight@basic-brightness.html

  * igt@i915_pm_rpm@basic-pci-d3-state:
    - fi-hsw-4770:        [PASS][11] -> [SKIP][12] ([fdo#109271])
   [11]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_13220/fi-hsw-4770/igt@i915_pm_rpm@basic-pci-d3-state.html
   [12]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_116991v5/fi-hsw-4770/igt@i915_pm_rpm@basic-pci-d3-state.html

  * igt@i915_pm_rpm@basic-rte:
    - fi-hsw-4770:        [PASS][13] -> [FAIL][14] ([i915#7364])
   [13]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_13220/fi-hsw-4770/igt@i915_pm_rpm@basic-rte.html
   [14]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_116991v5/fi-hsw-4770/igt@i915_pm_rpm@basic-rte.html

  * igt@i915_pm_rps@basic-api:
    - bat-dg1-5:          NOTRUN -> [SKIP][15] ([i915#6621])
   [15]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_116991v5/bat-dg1-5/igt@i915_pm_rps@basic-api.html

  * igt@i915_selftest@live@gt_heartbeat:
    - fi-kbl-soraka:      NOTRUN -> [DMESG-FAIL][16] ([i915#5334] / [i915#7872])
   [16]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_116991v5/fi-kbl-soraka/igt@i915_selftest@live@gt_heartbeat.html

  * igt@i915_selftest@live@gt_pm:
    - fi-kbl-soraka:      NOTRUN -> [DMESG-FAIL][17] ([i915#1886] / [i915#7913])
   [17]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_116991v5/fi-kbl-soraka/igt@i915_selftest@live@gt_pm.html

  * igt@i915_selftest@live@requests:
    - bat-rpls-1:         [PASS][18] -> [ABORT][19] ([i915#7911] / [i915#7920] / [i915#7982])
   [18]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_13220/bat-rpls-1/igt@i915_selftest@live@requests.html
   [19]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_116991v5/bat-rpls-1/igt@i915_selftest@live@requests.html
    - fi-kbl-soraka:      NOTRUN -> [ABORT][20] ([i915#7913])
   [20]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_116991v5/fi-kbl-soraka/igt@i915_selftest@live@requests.html

  * igt@i915_suspend@basic-s2idle-without-i915:
    - bat-rpls-2:         NOTRUN -> [ABORT][21] ([i915#6687])
   [21]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_116991v5/bat-rpls-2/igt@i915_suspend@basic-s2idle-without-i915.html

  * igt@kms_addfb_basic@basic-x-tiled-legacy:
    - bat-dg1-5:          NOTRUN -> [SKIP][22] ([i915#4212]) +7 similar issues
   [22]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_116991v5/bat-dg1-5/igt@kms_addfb_basic@basic-x-tiled-legacy.html

  * igt@kms_addfb_basic@basic-y-tiled-legacy:
    - bat-dg1-5:          NOTRUN -> [SKIP][23] ([i915#4215])
   [23]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_116991v5/bat-dg1-5/igt@kms_addfb_basic@basic-y-tiled-legacy.html

  * igt@kms_chamelium_frames@hdmi-crc-fast:
    - fi-kbl-soraka:      NOTRUN -> [SKIP][24] ([fdo#109271]) +14 similar issues
   [24]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_116991v5/fi-kbl-soraka/igt@kms_chamelium_frames@hdmi-crc-fast.html

  * igt@kms_chamelium_hpd@vga-hpd-fast:
    - bat-dg1-5:          NOTRUN -> [SKIP][25] ([i915#7828]) +8 similar issues
   [25]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_116991v5/bat-dg1-5/igt@kms_chamelium_hpd@vga-hpd-fast.html

  * igt@kms_cursor_legacy@basic-busy-flip-before-cursor-legacy:
    - bat-dg1-5:          NOTRUN -> [SKIP][26] ([i915#4103] / [i915#4213]) +1 similar issue
   [26]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_116991v5/bat-dg1-5/igt@kms_cursor_legacy@basic-busy-flip-before-cursor-legacy.html

  * igt@kms_force_connector_basic@force-load-detect:
    - bat-dg1-5:          NOTRUN -> [SKIP][27] ([fdo#109285])
   [27]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_116991v5/bat-dg1-5/igt@kms_force_connector_basic@force-load-detect.html

  * igt@kms_psr@sprite_plane_onoff:
    - bat-dg1-5:          NOTRUN -> [SKIP][28] ([i915#1072] / [i915#4078]) +3 similar issues
   [28]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_116991v5/bat-dg1-5/igt@kms_psr@sprite_plane_onoff.html

  * igt@kms_setmode@basic-clone-single-crtc:
    - bat-dg1-5:          NOTRUN -> [SKIP][29] ([i915#3555] / [i915#4579])
   [29]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_116991v5/bat-dg1-5/igt@kms_setmode@basic-clone-single-crtc.html
    - fi-kbl-soraka:      NOTRUN -> [SKIP][30] ([fdo#109271] / [i915#4579])
   [30]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_116991v5/fi-kbl-soraka/igt@kms_setmode@basic-clone-single-crtc.html

  * igt@prime_vgem@basic-fence-read:
    - bat-dg1-5:          NOTRUN -> [SKIP][31] ([i915#3708]) +3 similar issues
   [31]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_116991v5/bat-dg1-5/igt@prime_vgem@basic-fence-read.html

  * igt@prime_vgem@basic-gtt:
    - bat-dg1-5:          NOTRUN -> [SKIP][32] ([i915#3708] / [i915#4077]) +1 similar issue
   [32]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_116991v5/bat-dg1-5/igt@prime_vgem@basic-gtt.html

  
#### Possible fixes ####

  * igt@i915_selftest@live@reset:
    - bat-rpls-2:         [ABORT][33] ([i915#4983] / [i915#7461] / [i915#7913] / [i915#7981] / [i915#8347]) -> [PASS][34]
   [33]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_13220/bat-rpls-2/igt@i915_selftest@live@reset.html
   [34]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_116991v5/bat-rpls-2/igt@i915_selftest@live@reset.html

  * igt@i915_selftest@live@slpc:
    - {bat-mtlp-6}:       [DMESG-WARN][35] ([i915#6367]) -> [PASS][36]
   [35]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_13220/bat-mtlp-6/igt@i915_selftest@live@slpc.html
   [36]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_116991v5/bat-mtlp-6/igt@i915_selftest@live@slpc.html

  
  {name}: This element is suppressed. This means it is ignored when computing
          the status of the difference (SUCCESS, WARNING, or FAILURE).

  [fdo#109271]: https://bugs.freedesktop.org/show_bug.cgi?id=109271
  [fdo#109285]: https://bugs.freedesktop.org/show_bug.cgi?id=109285
  [i915#1072]: https://gitlab.freedesktop.org/drm/intel/issues/1072
  [i915#1886]: https://gitlab.freedesktop.org/drm/intel/issues/1886
  [i915#2190]: https://gitlab.freedesktop.org/drm/intel/issues/2190
  [i915#3555]: https://gitlab.freedesktop.org/drm/intel/issues/3555
  [i915#3708]: https://gitlab.freedesktop.org/drm/intel/issues/3708
  [i915#4077]: https://gitlab.freedesktop.org/drm/intel/issues/4077
  [i915#4078]: https://gitlab.freedesktop.org/drm/intel/issues/4078
  [i915#4079]: https://gitlab.freedesktop.org/drm/intel/issues/4079
  [i915#4083]: https://gitlab.freedesktop.org/drm/intel/issues/4083
  [i915#4103]: https://gitlab.freedesktop.org/drm/intel/issues/4103
  [i915#4212]: https://gitlab.freedesktop.org/drm/intel/issues/4212
  [i915#4213]: https://gitlab.freedesktop.org/drm/intel/issues/4213
  [i915#4215]: https://gitlab.freedesktop.org/drm/intel/issues/4215
  [i915#4423]: https://gitlab.freedesktop.org/drm/intel/issues/4423
  [i915#4579]: https://gitlab.freedesktop.org/drm/intel/issues/4579
  [i915#4613]: https://gitlab.freedesktop.org/drm/intel/issues/4613
  [i915#4983]: https://gitlab.freedesktop.org/drm/intel/issues/4983
  [i915#5334]: https://gitlab.freedesktop.org/drm/intel/issues/5334
  [i915#6367]: https://gitlab.freedesktop.org/drm/intel/issues/6367
  [i915#6621]: https://gitlab.freedesktop.org/drm/intel/issues/6621
  [i915#6687]: https://gitlab.freedesktop.org/drm/intel/issues/6687
  [i915#7364]: https://gitlab.freedesktop.org/drm/intel/issues/7364
  [i915#7461]: https://gitlab.freedesktop.org/drm/intel/issues/7461
  [i915#7561]: https://gitlab.freedesktop.org/drm/intel/issues/7561
  [i915#7828]: https://gitlab.freedesktop.org/drm/intel/issues/7828
  [i915#7872]: https://gitlab.freedesktop.org/drm/intel/issues/7872
  [i915#7911]: https://gitlab.freedesktop.org/drm/intel/issues/7911
  [i915#7913]: https://gitlab.freedesktop.org/drm/intel/issues/7913
  [i915#7920]: https://gitlab.freedesktop.org/drm/intel/issues/7920
  [i915#7953]: https://gitlab.freedesktop.org/drm/intel/issues/7953
  [i915#7981]: https://gitlab.freedesktop.org/drm/intel/issues/7981
  [i915#7982]: https://gitlab.freedesktop.org/drm/intel/issues/7982
  [i915#8144]: https://gitlab.freedesktop.org/drm/intel/issues/8144
  [i915#8347]: https://gitlab.freedesktop.org/drm/intel/issues/8347


Build changes
-------------

  * Linux: CI_DRM_13220 -> Patchwork_116991v5

  CI-20190529: 20190529
  CI_DRM_13220: 52299578d2150cbbdfe9c8958639a0feeb55a9a4 @ git://anongit.freedesktop.org/gfx-ci/linux
  IGT_7317: c902b72df45aa49faa38205bc5be3c748d33a3e0 @ https://gitlab.freedesktop.org/drm/igt-gpu-tools.git
  Patchwork_116991v5: 52299578d2150cbbdfe9c8958639a0feeb55a9a4 @ git://anongit.freedesktop.org/gfx-ci/linux


### Linux commits

15dc769d8405 vfio/pci: Allow passing zero-length fd array in VFIO_DEVICE_PCI_HOT_RESET
74bc9a8e285a vfio/pci: Extend VFIO_DEVICE_GET_PCI_HOT_RESET_INFO for vfio device cdev
515ee28a3274 vfio: Add helper to search vfio_device in a dev_set
93756090ccc7 vfio: Mark cdev usage in vfio_device
f88532f3a791 iommufd: Add helper to retrieve iommufd_ctx and devid
7e4a2e4b109f iommufd: Add iommufd_ctx_has_group()
25d7752140c9 iommufd: Reserve all negative IDs in the iommufd xarray
08448ae36826 vfio/pci: Move the existing hot reset logic to be a helper
7efe79bff3d9 vfio/pci: Update comment around group_fd get in vfio_pci_ioctl_pci_hot_reset()

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_116991v5/index.html


* [Intel-gfx] ✓ Fi.CI.IGT: success for Enhance vfio PCI hot reset for vfio cdev device (rev5)
  2023-06-02 12:15 ` [Intel-gfx] " Yi Liu
                   ` (11 preceding siblings ...)
  (?)
@ 2023-06-04 20:05 ` Patchwork
  -1 siblings, 0 replies; 77+ messages in thread
From: Patchwork @ 2023-06-04 20:05 UTC (permalink / raw)
  To: Alex Williamson; +Cc: intel-gfx

== Series Details ==

Series: Enhance vfio PCI hot reset for vfio cdev device (rev5)
URL   : https://patchwork.freedesktop.org/series/116991/
State : success

== Summary ==

CI Bug Log - changes from CI_DRM_13220_full -> Patchwork_116991v5_full
====================================================

Summary
-------

  **SUCCESS**

  No regressions found.

  

Participating hosts (7 -> 7)
------------------------------

  No changes in participating hosts

Known issues
------------

  Here are the changes found in Patchwork_116991v5_full that come from known issues:

### IGT changes ###

#### Issues hit ####

  * igt@gem_exec_fair@basic-pace@vcs0:
    - shard-glk:          [PASS][1] -> [FAIL][2] ([i915#2842])
   [1]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_13220/shard-glk6/igt@gem_exec_fair@basic-pace@vcs0.html
   [2]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_116991v5/shard-glk1/igt@gem_exec_fair@basic-pace@vcs0.html

  * igt@gem_spin_batch@spin-each:
    - shard-apl:          [PASS][3] -> [FAIL][4] ([i915#2898])
   [3]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_13220/shard-apl4/igt@gem_spin_batch@spin-each.html
   [4]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_116991v5/shard-apl3/igt@gem_spin_batch@spin-each.html

  * igt@kms_ccs@pipe-b-bad-aux-stride-y_tiled_gen12_mc_ccs:
    - shard-apl:          NOTRUN -> [SKIP][5] ([fdo#109271] / [i915#3886])
   [5]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_116991v5/shard-apl6/igt@kms_ccs@pipe-b-bad-aux-stride-y_tiled_gen12_mc_ccs.html

  * igt@kms_cursor_legacy@flip-vs-cursor-atomic-transitions:
    - shard-apl:          [PASS][6] -> [FAIL][7] ([i915#2346])
   [6]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_13220/shard-apl6/igt@kms_cursor_legacy@flip-vs-cursor-atomic-transitions.html
   [7]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_116991v5/shard-apl1/igt@kms_cursor_legacy@flip-vs-cursor-atomic-transitions.html

  * igt@kms_plane_scaling@plane-upscale-with-modifiers-factor-0-25@pipe-a-vga-1:
    - shard-snb:          NOTRUN -> [SKIP][8] ([fdo#109271]) +14 similar issues
   [8]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_116991v5/shard-snb4/igt@kms_plane_scaling@plane-upscale-with-modifiers-factor-0-25@pipe-a-vga-1.html

  * igt@kms_plane_scaling@planes-downscale-factor-0-5-unity-scaling@pipe-b-vga-1:
    - shard-snb:          NOTRUN -> [SKIP][9] ([fdo#109271] / [i915#4579]) +7 similar issues
   [9]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_116991v5/shard-snb4/igt@kms_plane_scaling@planes-downscale-factor-0-5-unity-scaling@pipe-b-vga-1.html

  * igt@kms_vrr@flip-suspend:
    - shard-apl:          NOTRUN -> [SKIP][10] ([fdo#109271] / [i915#4579]) +3 similar issues
   [10]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_116991v5/shard-apl6/igt@kms_vrr@flip-suspend.html

  * igt@v3d/v3d_wait_bo@unused-bo-1ns:
    - shard-apl:          NOTRUN -> [SKIP][11] ([fdo#109271]) +32 similar issues
   [11]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_116991v5/shard-apl6/igt@v3d/v3d_wait_bo@unused-bo-1ns.html

  
#### Possible fixes ####

  * igt@gem_eio@reset-stress:
    - {shard-dg1}:        [FAIL][12] ([i915#5784]) -> [PASS][13]
   [12]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_13220/shard-dg1-17/igt@gem_eio@reset-stress.html
   [13]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_116991v5/shard-dg1-15/igt@gem_eio@reset-stress.html

  * igt@gem_exec_fair@basic-none@bcs0:
    - {shard-rkl}:        [FAIL][14] ([i915#2842]) -> [PASS][15] +1 similar issue
   [14]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_13220/shard-rkl-2/igt@gem_exec_fair@basic-none@bcs0.html
   [15]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_116991v5/shard-rkl-6/igt@gem_exec_fair@basic-none@bcs0.html

  * igt@gem_exec_fair@basic-pace-solo@rcs0:
    - shard-glk:          [FAIL][16] ([i915#2842]) -> [PASS][17]
   [16]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_13220/shard-glk2/igt@gem_exec_fair@basic-pace-solo@rcs0.html
   [17]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_116991v5/shard-glk2/igt@gem_exec_fair@basic-pace-solo@rcs0.html

  * igt@i915_pm_rpm@modeset-non-lpsp-stress:
    - {shard-rkl}:        [SKIP][18] ([i915#1397]) -> [PASS][19] +1 similar issue
   [18]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_13220/shard-rkl-7/igt@i915_pm_rpm@modeset-non-lpsp-stress.html
   [19]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_116991v5/shard-rkl-4/igt@i915_pm_rpm@modeset-non-lpsp-stress.html

  * igt@i915_suspend@basic-s3-without-i915:
    - {shard-rkl}:        [FAIL][20] ([fdo#103375]) -> [PASS][21]
   [20]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_13220/shard-rkl-6/igt@i915_suspend@basic-s3-without-i915.html
   [21]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_116991v5/shard-rkl-7/igt@i915_suspend@basic-s3-without-i915.html

  * igt@kms_plane@plane-panning-bottom-right-suspend@pipe-b-planes:
    - shard-apl:          [ABORT][22] ([i915#180]) -> [PASS][23]
   [22]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_13220/shard-apl1/igt@kms_plane@plane-panning-bottom-right-suspend@pipe-b-planes.html
   [23]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_116991v5/shard-apl6/igt@kms_plane@plane-panning-bottom-right-suspend@pipe-b-planes.html

  
  {name}: This element is suppressed. This means it is ignored when computing
          the status of the difference (SUCCESS, WARNING, or FAILURE).

  [fdo#103375]: https://bugs.freedesktop.org/show_bug.cgi?id=103375
  [fdo#109271]: https://bugs.freedesktop.org/show_bug.cgi?id=109271
  [i915#1072]: https://gitlab.freedesktop.org/drm/intel/issues/1072
  [i915#1397]: https://gitlab.freedesktop.org/drm/intel/issues/1397
  [i915#180]: https://gitlab.freedesktop.org/drm/intel/issues/180
  [i915#2346]: https://gitlab.freedesktop.org/drm/intel/issues/2346
  [i915#2842]: https://gitlab.freedesktop.org/drm/intel/issues/2842
  [i915#2898]: https://gitlab.freedesktop.org/drm/intel/issues/2898
  [i915#3555]: https://gitlab.freedesktop.org/drm/intel/issues/3555
  [i915#3591]: https://gitlab.freedesktop.org/drm/intel/issues/3591
  [i915#3886]: https://gitlab.freedesktop.org/drm/intel/issues/3886
  [i915#4070]: https://gitlab.freedesktop.org/drm/intel/issues/4070
  [i915#4078]: https://gitlab.freedesktop.org/drm/intel/issues/4078
  [i915#4579]: https://gitlab.freedesktop.org/drm/intel/issues/4579
  [i915#4816]: https://gitlab.freedesktop.org/drm/intel/issues/4816
  [i915#5176]: https://gitlab.freedesktop.org/drm/intel/issues/5176
  [i915#5235]: https://gitlab.freedesktop.org/drm/intel/issues/5235
  [i915#5493]: https://gitlab.freedesktop.org/drm/intel/issues/5493
  [i915#5784]: https://gitlab.freedesktop.org/drm/intel/issues/5784
  [i915#6268]: https://gitlab.freedesktop.org/drm/intel/issues/6268
  [i915#7742]: https://gitlab.freedesktop.org/drm/intel/issues/7742
  [i915#7975]: https://gitlab.freedesktop.org/drm/intel/issues/7975
  [i915#8011]: https://gitlab.freedesktop.org/drm/intel/issues/8011
  [i915#8213]: https://gitlab.freedesktop.org/drm/intel/issues/8213
  [i915#8304]: https://gitlab.freedesktop.org/drm/intel/issues/8304


Build changes
-------------

  * Linux: CI_DRM_13220 -> Patchwork_116991v5

  CI-20190529: 20190529
  CI_DRM_13220: 52299578d2150cbbdfe9c8958639a0feeb55a9a4 @ git://anongit.freedesktop.org/gfx-ci/linux
  IGT_7317: c902b72df45aa49faa38205bc5be3c748d33a3e0 @ https://gitlab.freedesktop.org/drm/igt-gpu-tools.git
  Patchwork_116991v5: 52299578d2150cbbdfe9c8958639a0feeb55a9a4 @ git://anongit.freedesktop.org/gfx-ci/linux
  piglit_4509: fdc5a4ca11124ab8413c7988896eec4c97336694 @ git://anongit.freedesktop.org/piglit

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_116991v5/index.html


* RE: [PATCH v7 0/9] Enhance vfio PCI hot reset for vfio cdev device
  2023-06-02 12:15 ` [Intel-gfx] " Yi Liu
@ 2023-06-08  6:59   ` Jiang, Yanting
  -1 siblings, 0 replies; 77+ messages in thread
From: Jiang, Yanting @ 2023-06-08  6:59 UTC (permalink / raw)
  To: Liu, Yi L, alex.williamson, jgg, Tian, Kevin
  Cc: joro, robin.murphy, cohuck, eric.auger, nicolinc, kvm, mjrosato,
	chao.p.peng, yi.y.sun, peterx, jasowang,
	shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	intel-gvt-dev, intel-gfx, linux-s390, Hao, Xudong, Zhao, Yan Y,
	Xu, Terrence, Duan, Zhenzhong, clegoate

> Subject: [PATCH v7 0/9] Enhance vfio PCI hot reset for vfio cdev device
> 
> VFIO_DEVICE_PCI_HOT_RESET requires user to pass an array of group fds to
> prove that it owns all devices affected by resetting the calling device. While for
> cdev devices, user can use an iommufd-based ownership checking model and
> invoke VFIO_DEVICE_PCI_HOT_RESET with a zero-length fd array.
> 
> This series extends VFIO_DEVICE_GET_PCI_HOT_RESET_INFO to check
> ownership and return the check result and the devid of affected devices to user.
> In the end, extends the VFIO_DEVICE_PCI_HOT_RESET to accept zero-length fd
> array for hot-reset with cdev devices.
> 
> The new hot reset method and updated _INFO ioctl are tested with the below
> qemu:
> 
> https://github.com/yiliu1765/qemu/tree/iommufd_rfcv4.mig.reset.v4_var3
> (requires to test with the cdev kernel)
> 

Tested NIC passthrough on an Intel platform.
The results look good, hence:
Tested-by: Yanting Jiang <yanting.jiang@intel.com>

Thanks,
Yanting



* Re: [Intel-gfx] [PATCH v7 4/9] iommufd: Add iommufd_ctx_has_group()
  2023-06-02 12:15   ` [Intel-gfx] " Yi Liu
@ 2023-06-08 21:40     ` Alex Williamson
  -1 siblings, 0 replies; 77+ messages in thread
From: Alex Williamson @ 2023-06-08 21:40 UTC (permalink / raw)
  To: Yi Liu
  Cc: mjrosato, jasowang, xudong.hao, zhenzhong.duan, peterx,
	terrence.xu, chao.p.peng, linux-s390, kvm, lulu, yanting.jiang,
	joro, nicolinc, jgg, kevin.tian, yan.y.zhao, intel-gfx,
	eric.auger, intel-gvt-dev, yi.y.sun, clegoate, cohuck,
	shameerali.kolothum.thodi, suravee.suthikulpanit, robin.murphy

On Fri,  2 Jun 2023 05:15:10 -0700
Yi Liu <yi.l.liu@intel.com> wrote:

> This adds the helper to check if any device within the given iommu_group
> has been bound with the iommufd_ctx. This is helpful for the checking on
> device ownership for the devices which have not been bound but cannot be
> bound to any other iommufd_ctx as the iommu_group has been bound.
> 
> Tested-by: Terrence Xu <terrence.xu@intel.com>
> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> ---
>  drivers/iommu/iommufd/device.c | 30 ++++++++++++++++++++++++++++++
>  include/linux/iommufd.h        |  8 ++++++++
>  2 files changed, 38 insertions(+)
> 
> diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c
> index 4f9b2142274c..4571344c8508 100644
> --- a/drivers/iommu/iommufd/device.c
> +++ b/drivers/iommu/iommufd/device.c
> @@ -98,6 +98,36 @@ struct iommufd_device *iommufd_device_bind(struct iommufd_ctx *ictx,
>  }
>  EXPORT_SYMBOL_NS_GPL(iommufd_device_bind, IOMMUFD);
>  
> +/**
> + * iommufd_ctx_has_group - True if any device within the group is bound
> + *                         to the ictx
> + * @ictx: iommufd file descriptor
> + * @group: Pointer to a physical iommu_group struct
> + *
> + * True if any device within the group has been bound to this ictx, ex. via
> + * iommufd_device_bind(), therefore implying ictx ownership of the group.
> + */
> +bool iommufd_ctx_has_group(struct iommufd_ctx *ictx, struct iommu_group *group)
> +{
> +	struct iommufd_object *obj;
> +	unsigned long index;
> +
> +	if (!ictx || !group)
> +		return false;
> +
> +	xa_lock(&ictx->objects);
> +	xa_for_each(&ictx->objects, index, obj) {
> +		if (obj->type == IOMMUFD_OBJ_DEVICE &&
> +		    container_of(obj, struct iommufd_device, obj)->group == group) {
> +			xa_unlock(&ictx->objects);
> +			return true;
> +		}
> +	}
> +	xa_unlock(&ictx->objects);
> +	return false;
> +}
> +EXPORT_SYMBOL_NS_GPL(iommufd_ctx_has_group, IOMMUFD);
> +
>  /**
>   * iommufd_device_unbind - Undo iommufd_device_bind()
>   * @idev: Device returned by iommufd_device_bind()
> diff --git a/include/linux/iommufd.h b/include/linux/iommufd.h
> index 1129a36a74c4..33fe57e95e42 100644
> --- a/include/linux/iommufd.h
> +++ b/include/linux/iommufd.h
> @@ -16,6 +16,7 @@ struct page;
>  struct iommufd_ctx;
>  struct iommufd_access;
>  struct file;
> +struct iommu_group;
>  
>  struct iommufd_device *iommufd_device_bind(struct iommufd_ctx *ictx,
>  					   struct device *dev, u32 *id);
> @@ -50,6 +51,7 @@ void iommufd_ctx_get(struct iommufd_ctx *ictx);
>  #if IS_ENABLED(CONFIG_IOMMUFD)
>  struct iommufd_ctx *iommufd_ctx_from_file(struct file *file);
>  void iommufd_ctx_put(struct iommufd_ctx *ictx);
> +bool iommufd_ctx_has_group(struct iommufd_ctx *ictx, struct iommu_group *group);
>  
>  int iommufd_access_pin_pages(struct iommufd_access *access, unsigned long iova,
>  			     unsigned long length, struct page **out_pages,
> @@ -71,6 +73,12 @@ static inline void iommufd_ctx_put(struct iommufd_ctx *ictx)
>  {
>  }
>  
> +static inline bool iommufd_ctx_has_group(struct iommufd_ctx *ictx,
> +					 struct iommu_group *group)
> +{
> +	return false;
> +}
> +
>  static inline int iommufd_access_pin_pages(struct iommufd_access *access,
>  					   unsigned long iova,
>  					   unsigned long length,

It looks like the v12 cdev series no longer requires this stub?  We
haven't used this function except from iommufd specific code since v5.
Thanks,

Alex
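
For readers following the helper under discussion, the group-ownership scan can be modeled in plain userspace C. This is only a sketch: model_ctx, model_obj, and model_ctx_has_group() are hypothetical stand-ins for the kernel's iommufd_ctx, iommufd_object, and iommufd_ctx_has_group(); a fixed array replaces the xarray, an integer replaces the iommu_group pointer, and locking is omitted.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical object types, mirroring IOMMUFD_OBJ_DEVICE vs. anything else. */
enum model_obj_type { MODEL_OBJ_DEVICE, MODEL_OBJ_OTHER };

struct model_obj {
	enum model_obj_type type;
	int group_id;		/* stands in for the iommu_group pointer */
};

struct model_ctx {
	const struct model_obj *objs;	/* stands in for ictx->objects */
	size_t nr_objs;
};

/* True if any device object in the context belongs to the given group. */
bool model_ctx_has_group(const struct model_ctx *ctx, int group_id)
{
	size_t i;

	if (!ctx)
		return false;

	for (i = 0; i < ctx->nr_objs; i++) {
		if (ctx->objs[i].type == MODEL_OBJ_DEVICE &&
		    ctx->objs[i].group_id == group_id)
			return true;
	}
	return false;
}
```

Only device objects count toward ownership in this model, so a non-device object that happens to carry the same group number does not make the group "owned".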



* Re: [Intel-gfx] [PATCH v7 8/9] vfio/pci: Extend VFIO_DEVICE_GET_PCI_HOT_RESET_INFO for vfio device cdev
  2023-06-02 12:15   ` [Intel-gfx] " Yi Liu
@ 2023-06-08 22:26     ` Alex Williamson
  -1 siblings, 0 replies; 77+ messages in thread
From: Alex Williamson @ 2023-06-08 22:26 UTC (permalink / raw)
  To: Yi Liu
  Cc: mjrosato, jasowang, xudong.hao, zhenzhong.duan, peterx,
	terrence.xu, chao.p.peng, linux-s390, kvm, lulu, yanting.jiang,
	joro, nicolinc, jgg, kevin.tian, yan.y.zhao, intel-gfx,
	eric.auger, intel-gvt-dev, yi.y.sun, clegoate, cohuck,
	shameerali.kolothum.thodi, suravee.suthikulpanit, robin.murphy

On Fri,  2 Jun 2023 05:15:14 -0700
Yi Liu <yi.l.liu@intel.com> wrote:

> This allows the VFIO_DEVICE_GET_PCI_HOT_RESET_INFO ioctl to use the iommufd_ctx
> of the cdev device to check the ownership of the other affected devices.
> 
> When VFIO_DEVICE_GET_PCI_HOT_RESET_INFO is called on an IOMMUFD managed
> device, the new flag VFIO_PCI_HOT_RESET_FLAG_DEV_ID is reported to indicate
> the values returned are IOMMUFD devids rather than group IDs as used when
> accessing vfio devices through the conventional vfio group interface.
> Additionally the flag VFIO_PCI_HOT_RESET_FLAG_DEV_ID_OWNED will be reported
> in this mode if all of the devices affected by the hot-reset are owned by
> either virtue of being directly bound to the same iommufd context as the
> calling device, or implicitly owned via a shared IOMMU group.
> 
> Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
> Suggested-by: Alex Williamson <alex.williamson@redhat.com>
> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> ---
>  drivers/vfio/iommufd.c           | 49 +++++++++++++++++++++++++++++++
>  drivers/vfio/pci/vfio_pci_core.c | 47 +++++++++++++++++++++++++-----
>  include/linux/vfio.h             | 16 ++++++++++
>  include/uapi/linux/vfio.h        | 50 +++++++++++++++++++++++++++++++-
>  4 files changed, 154 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/vfio/iommufd.c b/drivers/vfio/iommufd.c
> index 88b00c501015..a04f3a493437 100644
> --- a/drivers/vfio/iommufd.c
> +++ b/drivers/vfio/iommufd.c
> @@ -66,6 +66,55 @@ void vfio_iommufd_unbind(struct vfio_device *vdev)
>  		vdev->ops->unbind_iommufd(vdev);
>  }
>  
> +struct iommufd_ctx *vfio_iommufd_device_ictx(struct vfio_device *vdev)
> +{
> +	if (vdev->iommufd_device)
> +		return iommufd_device_to_ictx(vdev->iommufd_device);
> +	return NULL;
> +}
> +EXPORT_SYMBOL_GPL(vfio_iommufd_device_ictx);
> +
> +static int vfio_iommufd_device_id(struct vfio_device *vdev)
> +{
> +	if (vdev->iommufd_device)
> +		return iommufd_device_to_id(vdev->iommufd_device);
> +	return -EINVAL;

If this is actually reachable, it allows us to return -EINVAL as a
devid in the reset-info ioctl, which is not a defined value.  Should
this return VFIO_PCI_DEVID_NOT_OWNED or do you want to catch the errno
value in the caller?  Thanks,

Alex
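
One possible reading of the second option (catching the errno in the caller) is sketched below as a userspace model, not as the patch's actual fix. The names model_vdev, model_device_ictx(), model_device_id(), and model_hot_reset_devid() are hypothetical, the group lookup is reduced to a precomputed flag, and the values 0 and -1 for the OWNED and NOT_OWNED markers are assumptions for illustration, since their definitions are not shown in this part of the thread.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_DEVID_OWNED	0	/* assumed value, for illustration */
#define MODEL_DEVID_NOT_OWNED	(-1)	/* assumed value, for illustration */

struct model_vdev {
	bool has_idev;		/* models vdev->iommufd_device != NULL */
	const void *ictx;	/* context the device is bound to */
	int id;			/* devid assigned at bind time */
	bool group_bound;	/* models the group lookup result for the queried context */
};

static const void *model_device_ictx(const struct model_vdev *v)
{
	return v->has_idev ? v->ictx : NULL;
}

static int model_device_id(const struct model_vdev *v)
{
	return v->has_idev ? v->id : -EINVAL;
}

int model_hot_reset_devid(const struct model_vdev *v, const void *ictx)
{
	if (model_device_ictx(v) == ictx) {
		int id = model_device_id(v);

		/* Catch the errno here so it never escapes as a devid. */
		return id < 0 ? MODEL_DEVID_NOT_OWNED : id;
	}

	return v->group_bound ? MODEL_DEVID_OWNED : MODEL_DEVID_NOT_OWNED;
}
```

In this model the only way to reach the -EINVAL path is an unbound device queried with a NULL context, and that case now collapses to the NOT_OWNED marker instead of leaking an errno to the caller.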

> +}
> +
> +/*
> + * Return devid for a device which is affected by hot-reset.
> + * - valid devid > 0 for the device that is bound to the input
> + *   iommufd_ctx.
> + * - devid == VFIO_PCI_DEVID_OWNED for the device that has not
> + *   been bound to any iommufd_ctx but other device within its
> + *   group has been bound to the input iommufd_ctx.
> + * - devid == VFIO_PCI_DEVID_NOT_OWNED for others. e.g. device
> + *   is bound to other iommufd_ctx etc.
> + */
> +int vfio_iommufd_device_hot_reset_devid(struct vfio_device *vdev,
> +					struct iommufd_ctx *ictx)
> +{
> +	struct iommu_group *group;
> +	int devid;
> +
> +	if (vfio_iommufd_device_ictx(vdev) == ictx)
> +		return vfio_iommufd_device_id(vdev);
> +
> +	group = iommu_group_get(vdev->dev);
> +	if (!group)
> +		return VFIO_PCI_DEVID_NOT_OWNED;
> +
> +	if (iommufd_ctx_has_group(ictx, group))
> +		devid = VFIO_PCI_DEVID_OWNED;
> +	else
> +		devid = VFIO_PCI_DEVID_NOT_OWNED;
> +
> +	iommu_group_put(group);
> +
> +	return devid;
> +}
> +EXPORT_SYMBOL_GPL(vfio_iommufd_device_hot_reset_devid);
> +
>  /*
>   * The physical standard ops mean that the iommufd_device is bound to the
>   * physical device vdev->dev that was provided to vfio_init_group_dev(). Drivers
> diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
> index 3a2f67675036..a615a223cdef 100644
> --- a/drivers/vfio/pci/vfio_pci_core.c
> +++ b/drivers/vfio/pci/vfio_pci_core.c
> @@ -27,6 +27,7 @@
>  #include <linux/vgaarb.h>
>  #include <linux/nospec.h>
>  #include <linux/sched/mm.h>
> +#include <linux/iommufd.h>
>  #if IS_ENABLED(CONFIG_EEH)
>  #include <asm/eeh.h>
>  #endif
> @@ -776,26 +777,49 @@ struct vfio_pci_fill_info {
>  	int max;
>  	int cur;
>  	struct vfio_pci_dependent_device *devices;
> +	struct vfio_device *vdev;
> +	u32 flags;
>  };
>  
>  static int vfio_pci_fill_devs(struct pci_dev *pdev, void *data)
>  {
>  	struct vfio_pci_fill_info *fill = data;
> -	struct iommu_group *iommu_group;
>  
>  	if (fill->cur == fill->max)
>  		return -EAGAIN; /* Something changed, try again */
>  
> -	iommu_group = iommu_group_get(&pdev->dev);
> -	if (!iommu_group)
> -		return -EPERM; /* Cannot reset non-isolated devices */
> +	if (fill->flags & VFIO_PCI_HOT_RESET_FLAG_DEV_ID) {
> +		struct iommufd_ctx *iommufd = vfio_iommufd_device_ictx(fill->vdev);
> +		struct vfio_device_set *dev_set = fill->vdev->dev_set;
> +		struct vfio_device *vdev;
>  
> -	fill->devices[fill->cur].group_id = iommu_group_id(iommu_group);
> +		/*
> +		 * hot-reset requires all affected devices be represented in
> +		 * the dev_set.
> +		 */
> +		vdev = vfio_find_device_in_devset(dev_set, &pdev->dev);
> +		if (!vdev)
> +			fill->devices[fill->cur].devid = VFIO_PCI_DEVID_NOT_OWNED;
> +		else
> +			fill->devices[fill->cur].devid =
> +				vfio_iommufd_device_hot_reset_devid(vdev, iommufd);
> +		/* If devid is VFIO_PCI_DEVID_NOT_OWNED, clear owned flag. */
> +		if (fill->devices[fill->cur].devid == VFIO_PCI_DEVID_NOT_OWNED)
> +			fill->flags &= ~VFIO_PCI_HOT_RESET_FLAG_DEV_ID_OWNED;
> +	} else {
> +		struct iommu_group *iommu_group;
> +
> +		iommu_group = iommu_group_get(&pdev->dev);
> +		if (!iommu_group)
> +			return -EPERM; /* Cannot reset non-isolated devices */
> +
> +		fill->devices[fill->cur].group_id = iommu_group_id(iommu_group);
> +		iommu_group_put(iommu_group);
> +	}
>  	fill->devices[fill->cur].segment = pci_domain_nr(pdev->bus);
>  	fill->devices[fill->cur].bus = pdev->bus->number;
>  	fill->devices[fill->cur].devfn = pdev->devfn;
>  	fill->cur++;
> -	iommu_group_put(iommu_group);
>  	return 0;
>  }
>  
> @@ -1229,17 +1253,26 @@ static int vfio_pci_ioctl_get_pci_hot_reset_info(
>  		return -ENOMEM;
>  
>  	fill.devices = devices;
> +	fill.vdev = &vdev->vdev;
> +
> +	if (vfio_device_cdev_opened(&vdev->vdev))
> +		fill.flags |= VFIO_PCI_HOT_RESET_FLAG_DEV_ID |
> +			     VFIO_PCI_HOT_RESET_FLAG_DEV_ID_OWNED;
>  
> +	mutex_lock(&vdev->vdev.dev_set->lock);
>  	ret = vfio_pci_for_each_slot_or_bus(vdev->pdev, vfio_pci_fill_devs,
>  					    &fill, slot);
> +	mutex_unlock(&vdev->vdev.dev_set->lock);
>  
>  	/*
>  	 * If a device was removed between counting and filling, we may come up
>  	 * short of fill.max.  If a device was added, we'll have a return of
>  	 * -EAGAIN above.
>  	 */
> -	if (!ret)
> +	if (!ret) {
>  		hdr.count = fill.cur;
> +		hdr.flags = fill.flags;
> +	}
>  
>  reset_info_exit:
>  	if (copy_to_user(arg, &hdr, minsz))
> diff --git a/include/linux/vfio.h b/include/linux/vfio.h
> index ee120d2d530b..382a7b119c7c 100644
> --- a/include/linux/vfio.h
> +++ b/include/linux/vfio.h
> @@ -114,6 +114,9 @@ struct vfio_device_ops {
>  };
>  
>  #if IS_ENABLED(CONFIG_IOMMUFD)
> +struct iommufd_ctx *vfio_iommufd_device_ictx(struct vfio_device *vdev);
> +int vfio_iommufd_device_hot_reset_devid(struct vfio_device *vdev,
> +					struct iommufd_ctx *ictx);
>  int vfio_iommufd_physical_bind(struct vfio_device *vdev,
>  			       struct iommufd_ctx *ictx, u32 *out_device_id);
>  void vfio_iommufd_physical_unbind(struct vfio_device *vdev);
> @@ -123,6 +126,19 @@ int vfio_iommufd_emulated_bind(struct vfio_device *vdev,
>  void vfio_iommufd_emulated_unbind(struct vfio_device *vdev);
>  int vfio_iommufd_emulated_attach_ioas(struct vfio_device *vdev, u32 *pt_id);
>  #else
> +static inline struct iommufd_ctx *
> +vfio_iommufd_device_ictx(struct vfio_device *vdev)
> +{
> +	return NULL;
> +}
> +
> +static inline int
> +vfio_iommufd_device_hot_reset_devid(struct vfio_device *vdev,
> +				    struct iommufd_ctx *ictx)
> +{
> +	return VFIO_PCI_DEVID_NOT_OWNED;
> +}
> +
>  #define vfio_iommufd_physical_bind                                      \
>  	((int (*)(struct vfio_device *vdev, struct iommufd_ctx *ictx,   \
>  		  u32 *out_device_id)) NULL)
> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> index 0552e8dcf0cb..70cc31e6b1ce 100644
> --- a/include/uapi/linux/vfio.h
> +++ b/include/uapi/linux/vfio.h
> @@ -650,11 +650,57 @@ enum {
>   * VFIO_DEVICE_GET_PCI_HOT_RESET_INFO - _IOWR(VFIO_TYPE, VFIO_BASE + 12,
>   *					      struct vfio_pci_hot_reset_info)
>   *
> + * This command is used to query the affected devices in the hot reset for
> + * a given device.
> + *
> + * This command always reports the segment, bus, and devfn information for
> + * each affected device, and selectively reports the group_id or devid per
> + * the way how the calling device is opened.
> + *
> + *	- If the calling device is opened via the traditional group/container
> + *	  API, group_id is reported.  User should check if it has owned all
> + *	  the affected devices and provides a set of group fds to prove the
> + *	  ownership in VFIO_DEVICE_PCI_HOT_RESET ioctl.
> + *
> + *	- If the calling device is opened as a cdev, devid is reported.
> + *	  Flag VFIO_PCI_HOT_RESET_FLAG_DEV_ID is set to indicate this
> + *	  data type.  All the affected devices should be represented in
> + *	  the dev_set, ex. bound to a vfio driver, and also be owned by
> + *	  this interface which is determined by the following conditions:
> + *	  1) Has a valid devid within the iommufd_ctx of the calling device.
> + *	     Ownership cannot be determined across separate iommufd_ctx and
> + *	     the cdev calling conventions do not support a proof-of-ownership
> + *	     model as provided in the legacy group interface.  In this case
> + *	     valid devid with value greater than zero is provided in the return
> + *	     structure.
> + *	  2) Does not have a valid devid within the iommufd_ctx of the calling
> + *	     device, but belongs to the same IOMMU group as the calling device
> + *	     or another opened device that has a valid devid within the
> + *	     iommufd_ctx of the calling device.  This provides implicit ownership
> + *	     for devices within the same DMA isolation context.  In this case
> + *	     the devid value of VFIO_PCI_DEVID_OWNED is provided in the return
> + *	     structure.
> + *
> + *	  A devid value of VFIO_PCI_DEVID_NOT_OWNED is provided in the return
> + *	  structure for affected devices where device is NOT represented in the
> + *	  dev_set or ownership is not available.  Such devices prevent the use
> + *	  of VFIO_DEVICE_PCI_HOT_RESET ioctl outside of the proof-of-ownership
> + *	  calling conventions (ie. via legacy group accessed devices).  Flag
> + *	  VFIO_PCI_HOT_RESET_FLAG_DEV_ID_OWNED would be set when all the
> + *	  affected devices are represented in the dev_set and also owned by
> + *	  the user.  This flag is available only when
> + *	  flag VFIO_PCI_HOT_RESET_FLAG_DEV_ID is set, otherwise reserved.
> + *
>   * Return: 0 on success, -errno on failure:
>   *	-enospc = insufficient buffer, -enodev = unsupported for device.
>   */
>  struct vfio_pci_dependent_device {
> -	__u32	group_id;
> +	union {
> +		__u32   group_id;
> +		__u32	devid;
> +#define VFIO_PCI_DEVID_OWNED		0
> +#define VFIO_PCI_DEVID_NOT_OWNED	-1
> +	};
>  	__u16	segment;
>  	__u8	bus;
>  	__u8	devfn; /* Use PCI_SLOT/PCI_FUNC */
> @@ -663,6 +709,8 @@ struct vfio_pci_dependent_device {
>  struct vfio_pci_hot_reset_info {
>  	__u32	argsz;
>  	__u32	flags;
> +#define VFIO_PCI_HOT_RESET_FLAG_DEV_ID		(1 << 0)
> +#define VFIO_PCI_HOT_RESET_FLAG_DEV_ID_OWNED	(1 << 1)
>  	__u32	count;
>  	struct vfio_pci_dependent_device	devices[];
>  };



* Re: [PATCH v7 8/9] vfio/pci: Extend VFIO_DEVICE_GET_PCI_HOT_RESET_INFO for vfio device cdev
@ 2023-06-08 22:26     ` Alex Williamson
  0 siblings, 0 replies; 77+ messages in thread
From: Alex Williamson @ 2023-06-08 22:26 UTC (permalink / raw)
  To: Yi Liu
  Cc: jgg, kevin.tian, joro, robin.murphy, cohuck, eric.auger,
	nicolinc, kvm, mjrosato, chao.p.peng, yi.y.sun, peterx, jasowang,
	shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	intel-gvt-dev, intel-gfx, linux-s390, xudong.hao, yan.y.zhao,
	terrence.xu, yanting.jiang, zhenzhong.duan, clegoate

On Fri,  2 Jun 2023 05:15:14 -0700
Yi Liu <yi.l.liu@intel.com> wrote:

> This allows VFIO_DEVICE_GET_PCI_HOT_RESET_INFO ioctl use the iommufd_ctx
> of the cdev device to check the ownership of the other affected devices.
> 
> When VFIO_DEVICE_GET_PCI_HOT_RESET_INFO is called on an IOMMUFD managed
> device, the new flag VFIO_PCI_HOT_RESET_FLAG_DEV_ID is reported to indicate
> the values returned are IOMMUFD devids rather than group IDs as used when
> accessing vfio devices through the conventional vfio group interface.
> Additionally the flag VFIO_PCI_HOT_RESET_FLAG_DEV_ID_OWNED will be reported
> in this mode if all of the devices affected by the hot-reset are owned by
> either virtue of being directly bound to the same iommufd context as the
> calling device, or implicitly owned via a shared IOMMU group.
> 
> Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
> Suggested-by: Alex Williamson <alex.williamson@redhat.com>
> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> ---
>  drivers/vfio/iommufd.c           | 49 +++++++++++++++++++++++++++++++
>  drivers/vfio/pci/vfio_pci_core.c | 47 +++++++++++++++++++++++++-----
>  include/linux/vfio.h             | 16 ++++++++++
>  include/uapi/linux/vfio.h        | 50 +++++++++++++++++++++++++++++++-
>  4 files changed, 154 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/vfio/iommufd.c b/drivers/vfio/iommufd.c
> index 88b00c501015..a04f3a493437 100644
> --- a/drivers/vfio/iommufd.c
> +++ b/drivers/vfio/iommufd.c
> @@ -66,6 +66,55 @@ void vfio_iommufd_unbind(struct vfio_device *vdev)
>  		vdev->ops->unbind_iommufd(vdev);
>  }
>  
> +struct iommufd_ctx *vfio_iommufd_device_ictx(struct vfio_device *vdev)
> +{
> +	if (vdev->iommufd_device)
> +		return iommufd_device_to_ictx(vdev->iommufd_device);
> +	return NULL;
> +}
> +EXPORT_SYMBOL_GPL(vfio_iommufd_device_ictx);
> +
> +static int vfio_iommufd_device_id(struct vfio_device *vdev)
> +{
> +	if (vdev->iommufd_device)
> +		return iommufd_device_to_id(vdev->iommufd_device);
> +	return -EINVAL;

If this is actually reachable, it allows us to return -EINVAL as a
devid in the reset-info ioctl, which is not a defined value.  Should
this return VFIO_PCI_DEVID_NOT_OWNED or do you want to catch the errno
value in the caller?  Thanks,

Alex
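
To make the second option concrete, here is a minimal sketch of catching
the errno in the caller so that only defined values reach userspace.  It
is not the patch's code: struct fake_vdev and both fake_* helpers are
hypothetical stand-ins for struct vfio_device and the two functions
quoted above, and the two macros are redefined locally with the values
from the uapi hunk of this patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Values as defined in the uapi hunk of this patch. */
#define VFIO_PCI_DEVID_OWNED		0
#define VFIO_PCI_DEVID_NOT_OWNED	-1

/* Hypothetical stand-in for struct vfio_device: models "bound or not". */
struct fake_vdev {
	int bound;	/* non-zero when an iommufd_device exists */
	int id;		/* devid within the iommufd_ctx, > 0 when bound */
};

/* Mirrors vfio_iommufd_device_id(): an errno when nothing is bound. */
static int fake_device_id(const struct fake_vdev *vdev)
{
	assert(vdev != NULL);
	return vdev->bound ? vdev->id : -EINVAL;
}

/* The caller maps any errno to NOT_OWNED instead of leaking it as a devid. */
static int fake_hot_reset_devid(const struct fake_vdev *vdev)
{
	int devid = fake_device_id(vdev);

	return devid < 0 ? VFIO_PCI_DEVID_NOT_OWNED : devid;
}
```

The alternative is to return VFIO_PCI_DEVID_NOT_OWNED directly from the
id helper, which avoids the extra check but hides the reason for the
failure from any other caller.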

> +}
> +
> +/*
> + * Return devid for a device which is affected by hot-reset.
> + * - valid devid > 0 for the device that is bound to the input
> + *   iommufd_ctx.
> + * - devid == VFIO_PCI_DEVID_OWNED for the device that has not
> + *   been bound to any iommufd_ctx but other device within its
> + *   group has been bound to the input iommufd_ctx.
> + * - devid == VFIO_PCI_DEVID_NOT_OWNED for others. e.g. device
> + *   is bound to other iommufd_ctx etc.
> + */
> +int vfio_iommufd_device_hot_reset_devid(struct vfio_device *vdev,
> +					struct iommufd_ctx *ictx)
> +{
> +	struct iommu_group *group;
> +	int devid;
> +
> +	if (vfio_iommufd_device_ictx(vdev) == ictx)
> +		return vfio_iommufd_device_id(vdev);
> +
> +	group = iommu_group_get(vdev->dev);
> +	if (!group)
> +		return VFIO_PCI_DEVID_NOT_OWNED;
> +
> +	if (iommufd_ctx_has_group(ictx, group))
> +		devid = VFIO_PCI_DEVID_OWNED;
> +	else
> +		devid = VFIO_PCI_DEVID_NOT_OWNED;
> +
> +	iommu_group_put(group);
> +
> +	return devid;
> +}
> +EXPORT_SYMBOL_GPL(vfio_iommufd_device_hot_reset_devid);
> +
>  /*
>   * The physical standard ops mean that the iommufd_device is bound to the
>   * physical device vdev->dev that was provided to vfio_init_group_dev(). Drivers
> diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
> index 3a2f67675036..a615a223cdef 100644
> --- a/drivers/vfio/pci/vfio_pci_core.c
> +++ b/drivers/vfio/pci/vfio_pci_core.c
> @@ -27,6 +27,7 @@
>  #include <linux/vgaarb.h>
>  #include <linux/nospec.h>
>  #include <linux/sched/mm.h>
> +#include <linux/iommufd.h>
>  #if IS_ENABLED(CONFIG_EEH)
>  #include <asm/eeh.h>
>  #endif
> @@ -776,26 +777,49 @@ struct vfio_pci_fill_info {
>  	int max;
>  	int cur;
>  	struct vfio_pci_dependent_device *devices;
> +	struct vfio_device *vdev;
> +	u32 flags;
>  };
>  
>  static int vfio_pci_fill_devs(struct pci_dev *pdev, void *data)
>  {
>  	struct vfio_pci_fill_info *fill = data;
> -	struct iommu_group *iommu_group;
>  
>  	if (fill->cur == fill->max)
>  		return -EAGAIN; /* Something changed, try again */
>  
> -	iommu_group = iommu_group_get(&pdev->dev);
> -	if (!iommu_group)
> -		return -EPERM; /* Cannot reset non-isolated devices */
> +	if (fill->flags & VFIO_PCI_HOT_RESET_FLAG_DEV_ID) {
> +		struct iommufd_ctx *iommufd = vfio_iommufd_device_ictx(fill->vdev);
> +		struct vfio_device_set *dev_set = fill->vdev->dev_set;
> +		struct vfio_device *vdev;
>  
> -	fill->devices[fill->cur].group_id = iommu_group_id(iommu_group);
> +		/*
> +		 * hot-reset requires all affected devices be represented in
> +		 * the dev_set.
> +		 */
> +		vdev = vfio_find_device_in_devset(dev_set, &pdev->dev);
> +		if (!vdev)
> +			fill->devices[fill->cur].devid = VFIO_PCI_DEVID_NOT_OWNED;
> +		else
> +			fill->devices[fill->cur].devid =
> +				vfio_iommufd_device_hot_reset_devid(vdev, iommufd);
> +		/* If devid is VFIO_PCI_DEVID_NOT_OWNED, clear owned flag. */
> +		if (fill->devices[fill->cur].devid == VFIO_PCI_DEVID_NOT_OWNED)
> +			fill->flags &= ~VFIO_PCI_HOT_RESET_FLAG_DEV_ID_OWNED;
> +	} else {
> +		struct iommu_group *iommu_group;
> +
> +		iommu_group = iommu_group_get(&pdev->dev);
> +		if (!iommu_group)
> +			return -EPERM; /* Cannot reset non-isolated devices */
> +
> +		fill->devices[fill->cur].group_id = iommu_group_id(iommu_group);
> +		iommu_group_put(iommu_group);
> +	}
>  	fill->devices[fill->cur].segment = pci_domain_nr(pdev->bus);
>  	fill->devices[fill->cur].bus = pdev->bus->number;
>  	fill->devices[fill->cur].devfn = pdev->devfn;
>  	fill->cur++;
> -	iommu_group_put(iommu_group);
>  	return 0;
>  }
>  
> @@ -1229,17 +1253,26 @@ static int vfio_pci_ioctl_get_pci_hot_reset_info(
>  		return -ENOMEM;
>  
>  	fill.devices = devices;
> +	fill.vdev = &vdev->vdev;
> +
> +	if (vfio_device_cdev_opened(&vdev->vdev))
> +		fill.flags |= VFIO_PCI_HOT_RESET_FLAG_DEV_ID |
> +			     VFIO_PCI_HOT_RESET_FLAG_DEV_ID_OWNED;
>  
> +	mutex_lock(&vdev->vdev.dev_set->lock);
>  	ret = vfio_pci_for_each_slot_or_bus(vdev->pdev, vfio_pci_fill_devs,
>  					    &fill, slot);
> +	mutex_unlock(&vdev->vdev.dev_set->lock);
>  
>  	/*
>  	 * If a device was removed between counting and filling, we may come up
>  	 * short of fill.max.  If a device was added, we'll have a return of
>  	 * -EAGAIN above.
>  	 */
> -	if (!ret)
> +	if (!ret) {
>  		hdr.count = fill.cur;
> +		hdr.flags = fill.flags;
> +	}
>  
>  reset_info_exit:
>  	if (copy_to_user(arg, &hdr, minsz))
> diff --git a/include/linux/vfio.h b/include/linux/vfio.h
> index ee120d2d530b..382a7b119c7c 100644
> --- a/include/linux/vfio.h
> +++ b/include/linux/vfio.h
> @@ -114,6 +114,9 @@ struct vfio_device_ops {
>  };
>  
>  #if IS_ENABLED(CONFIG_IOMMUFD)
> +struct iommufd_ctx *vfio_iommufd_device_ictx(struct vfio_device *vdev);
> +int vfio_iommufd_device_hot_reset_devid(struct vfio_device *vdev,
> +					struct iommufd_ctx *ictx);
>  int vfio_iommufd_physical_bind(struct vfio_device *vdev,
>  			       struct iommufd_ctx *ictx, u32 *out_device_id);
>  void vfio_iommufd_physical_unbind(struct vfio_device *vdev);
> @@ -123,6 +126,19 @@ int vfio_iommufd_emulated_bind(struct vfio_device *vdev,
>  void vfio_iommufd_emulated_unbind(struct vfio_device *vdev);
>  int vfio_iommufd_emulated_attach_ioas(struct vfio_device *vdev, u32 *pt_id);
>  #else
> +static inline struct iommufd_ctx *
> +vfio_iommufd_device_ictx(struct vfio_device *vdev)
> +{
> +	return NULL;
> +}
> +
> +static inline int
> +vfio_iommufd_device_hot_reset_devid(struct vfio_device *vdev,
> +				    struct iommufd_ctx *ictx)
> +{
> +	return VFIO_PCI_DEVID_NOT_OWNED;
> +}
> +
>  #define vfio_iommufd_physical_bind                                      \
>  	((int (*)(struct vfio_device *vdev, struct iommufd_ctx *ictx,   \
>  		  u32 *out_device_id)) NULL)
> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> index 0552e8dcf0cb..70cc31e6b1ce 100644
> --- a/include/uapi/linux/vfio.h
> +++ b/include/uapi/linux/vfio.h
> @@ -650,11 +650,57 @@ enum {
>   * VFIO_DEVICE_GET_PCI_HOT_RESET_INFO - _IOWR(VFIO_TYPE, VFIO_BASE + 12,
>   *					      struct vfio_pci_hot_reset_info)
>   *
> + * This command is used to query the affected devices in the hot reset for
> + * a given device.
> + *
> + * This command always reports the segment, bus, and devfn information for
> + * each affected device, and selectively reports the group_id or devid per
> + * the way how the calling device is opened.
> + *
> + *	- If the calling device is opened via the traditional group/container
> + *	  API, group_id is reported.  User should check if it has owned all
> + *	  the affected devices and provides a set of group fds to prove the
> + *	  ownership in VFIO_DEVICE_PCI_HOT_RESET ioctl.
> + *
> + *	- If the calling device is opened as a cdev, devid is reported.
> + *	  Flag VFIO_PCI_HOT_RESET_FLAG_DEV_ID is set to indicate this
> + *	  data type.  All the affected devices should be represented in
> + *	  the dev_set, ex. bound to a vfio driver, and also be owned by
> + *	  this interface which is determined by the following conditions:
> + *	  1) Has a valid devid within the iommufd_ctx of the calling device.
> + *	     Ownership cannot be determined across separate iommufd_ctx and
> + *	     the cdev calling conventions do not support a proof-of-ownership
> + *	     model as provided in the legacy group interface.  In this case
> + *	     valid devid with value greater than zero is provided in the return
> + *	     structure.
> + *	  2) Does not have a valid devid within the iommufd_ctx of the calling
> + *	     device, but belongs to the same IOMMU group as the calling device
> + *	     or another opened device that has a valid devid within the
> + *	     iommufd_ctx of the calling device.  This provides implicit ownership
> + *	     for devices within the same DMA isolation context.  In this case
> + *	     the devid value of VFIO_PCI_DEVID_OWNED is provided in the return
> + *	     structure.
> + *
> + *	  A devid value of VFIO_PCI_DEVID_NOT_OWNED is provided in the return
> + *	  structure for affected devices where device is NOT represented in the
> + *	  dev_set or ownership is not available.  Such devices prevent the use
> + *	  of VFIO_DEVICE_PCI_HOT_RESET ioctl outside of the proof-of-ownership
> + *	  calling conventions (ie. via legacy group accessed devices).  Flag
> + *	  VFIO_PCI_HOT_RESET_FLAG_DEV_ID_OWNED would be set when all the
> + *	  affected devices are represented in the dev_set and also owned by
> + *	  the user.  This flag is available only when
> + *	  flag VFIO_PCI_HOT_RESET_FLAG_DEV_ID is set, otherwise reserved.
> + *
>   * Return: 0 on success, -errno on failure:
>   *	-enospc = insufficient buffer, -enodev = unsupported for device.
>   */
>  struct vfio_pci_dependent_device {
> -	__u32	group_id;
> +	union {
> +		__u32   group_id;
> +		__u32	devid;
> +#define VFIO_PCI_DEVID_OWNED		0
> +#define VFIO_PCI_DEVID_NOT_OWNED	-1
> +	};
>  	__u16	segment;
>  	__u8	bus;
>  	__u8	devfn; /* Use PCI_SLOT/PCI_FUNC */
> @@ -663,6 +709,8 @@ struct vfio_pci_dependent_device {
>  struct vfio_pci_hot_reset_info {
>  	__u32	argsz;
>  	__u32	flags;
> +#define VFIO_PCI_HOT_RESET_FLAG_DEV_ID		(1 << 0)
> +#define VFIO_PCI_HOT_RESET_FLAG_DEV_ID_OWNED	(1 << 1)
>  	__u32	count;
>  	struct vfio_pci_dependent_device	devices[];
>  };
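
For readers following the uapi comment above, the three kinds of devid
value can be told apart in userspace roughly as below.  This is only an
illustrative sketch: the DEVID_* names, enum devid_kind, classify_devid()
and all_owned() are made up for this example, and the two constants
mirror VFIO_PCI_DEVID_OWNED (0) and VFIO_PCI_DEVID_NOT_OWNED (-1) from
the hunk.  Since devid is a __u32, the -1 is compared as an unsigned
value here.

```c
#include <stddef.h>
#include <stdint.h>

/* Local mirrors of the values in the quoted hunk; devid is a __u32. */
#define DEVID_OWNED	((uint32_t)0)	/* VFIO_PCI_DEVID_OWNED */
#define DEVID_NOT_OWNED	((uint32_t)-1)	/* VFIO_PCI_DEVID_NOT_OWNED */

enum devid_kind {
	DEVID_KIND_BOUND,	/* valid devid in the caller's iommufd_ctx */
	DEVID_KIND_IMPLICIT,	/* owned via a shared IOMMU group */
	DEVID_KIND_NOT_OWNED,	/* not in the dev_set, or not owned */
};

static enum devid_kind classify_devid(uint32_t devid)
{
	if (devid == DEVID_NOT_OWNED)
		return DEVID_KIND_NOT_OWNED;
	if (devid == DEVID_OWNED)
		return DEVID_KIND_IMPLICIT;
	return DEVID_KIND_BOUND;
}

/* Returns 1 when no reported devid is NOT_OWNED, 0 otherwise. */
static int all_owned(const uint32_t *devids, size_t count)
{
	size_t i;

	for (i = 0; i < count; i++)
		if (classify_devid(devids[i]) == DEVID_KIND_NOT_OWNED)
			return 0;
	return 1;
}
```

all_owned() recomputes from the per-device entries what the
VFIO_PCI_HOT_RESET_FLAG_DEV_ID_OWNED flag is described as summarizing;
real callers would presumably just test the flag.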



* Re: [Intel-gfx] [PATCH v7 9/9] vfio/pci: Allow passing zero-length fd array in VFIO_DEVICE_PCI_HOT_RESET
  2023-06-02 12:15   ` [Intel-gfx] " Yi Liu
@ 2023-06-08 22:30     ` Alex Williamson
  -1 siblings, 0 replies; 77+ messages in thread
From: Alex Williamson @ 2023-06-08 22:30 UTC (permalink / raw)
  To: Yi Liu
  Cc: mjrosato, jasowang, xudong.hao, zhenzhong.duan, peterx,
	terrence.xu, chao.p.peng, linux-s390, kvm, lulu, yanting.jiang,
	joro, nicolinc, jgg, kevin.tian, yan.y.zhao, intel-gfx,
	eric.auger, intel-gvt-dev, yi.y.sun, clegoate, cohuck,
	shameerali.kolothum.thodi, suravee.suthikulpanit, robin.murphy

On Fri,  2 Jun 2023 05:15:15 -0700
Yi Liu <yi.l.liu@intel.com> wrote:

> This is the way user to invoke hot-reset for the devices opened by cdev
> interface. User should check the flag VFIO_PCI_HOT_RESET_FLAG_DEV_ID_OWNED
> in the output of VFIO_DEVICE_GET_PCI_HOT_RESET_INFO ioctl before doing
> hot-reset for cdev devices.
> 
> Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
> Tested-by: Yanting Jiang <yanting.jiang@intel.com>
> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> ---
>  drivers/vfio/pci/vfio_pci_core.c | 61 ++++++++++++++++++++++++++------
>  include/uapi/linux/vfio.h        | 14 ++++++++
>  2 files changed, 64 insertions(+), 11 deletions(-)
> 
> diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
> index a615a223cdef..b0eadafcbcf5 100644
> --- a/drivers/vfio/pci/vfio_pci_core.c
> +++ b/drivers/vfio/pci/vfio_pci_core.c
> @@ -181,7 +181,8 @@ static void vfio_pci_probe_mmaps(struct vfio_pci_core_device *vdev)
>  struct vfio_pci_group_info;
>  static void vfio_pci_dev_set_try_reset(struct vfio_device_set *dev_set);
>  static int vfio_pci_dev_set_hot_reset(struct vfio_device_set *dev_set,
> -				      struct vfio_pci_group_info *groups);
> +				      struct vfio_pci_group_info *groups,
> +				      struct iommufd_ctx *iommufd_ctx);
>  
>  /*
>   * INTx masking requires the ability to disable INTx signaling via PCI_COMMAND
> @@ -1308,8 +1309,7 @@ vfio_pci_ioctl_pci_hot_reset_groups(struct vfio_pci_core_device *vdev,
>  	if (ret)
>  		return ret;
>  
> -	/* Somewhere between 1 and count is OK */
> -	if (!array_count || array_count > count)
> +	if (array_count > count)
>  		return -EINVAL;
>  
>  	group_fds = kcalloc(array_count, sizeof(*group_fds), GFP_KERNEL);
> @@ -1358,7 +1358,7 @@ vfio_pci_ioctl_pci_hot_reset_groups(struct vfio_pci_core_device *vdev,
>  	info.count = array_count;
>  	info.files = files;
>  
> -	ret = vfio_pci_dev_set_hot_reset(vdev->vdev.dev_set, &info);
> +	ret = vfio_pci_dev_set_hot_reset(vdev->vdev.dev_set, &info, NULL);
>  
>  hot_reset_release:
>  	for (file_idx--; file_idx >= 0; file_idx--)
> @@ -1381,13 +1381,21 @@ static int vfio_pci_ioctl_pci_hot_reset(struct vfio_pci_core_device *vdev,
>  	if (hdr.argsz < minsz || hdr.flags)
>  		return -EINVAL;
>  
> +	/* zero-length array is only for cdev opened devices */
> +	if (!!hdr.count == vfio_device_cdev_opened(&vdev->vdev))
> +		return -EINVAL;
> +
>  	/* Can we do a slot or bus reset or neither? */
>  	if (!pci_probe_reset_slot(vdev->pdev->slot))
>  		slot = true;
>  	else if (pci_probe_reset_bus(vdev->pdev->bus))
>  		return -ENODEV;
>  
> -	return vfio_pci_ioctl_pci_hot_reset_groups(vdev, hdr.count, slot, arg);
> +	if (hdr.count)
> +		return vfio_pci_ioctl_pci_hot_reset_groups(vdev, hdr.count, slot, arg);
> +
> +	return vfio_pci_dev_set_hot_reset(vdev->vdev.dev_set, NULL,
> +					  vfio_iommufd_device_ictx(&vdev->vdev));
>  }
>  
>  static int vfio_pci_ioctl_ioeventfd(struct vfio_pci_core_device *vdev,
> @@ -2354,13 +2362,16 @@ const struct pci_error_handlers vfio_pci_core_err_handlers = {
>  };
>  EXPORT_SYMBOL_GPL(vfio_pci_core_err_handlers);
>  
> -static bool vfio_dev_in_groups(struct vfio_pci_core_device *vdev,
> +static bool vfio_dev_in_groups(struct vfio_device *vdev,
>  			       struct vfio_pci_group_info *groups)
>  {
>  	unsigned int i;
>  
> +	if (!groups)
> +		return false;
> +
>  	for (i = 0; i < groups->count; i++)
> -		if (vfio_file_has_dev(groups->files[i], &vdev->vdev))
> +		if (vfio_file_has_dev(groups->files[i], vdev))
>  			return true;
>  	return false;
>  }
> @@ -2436,7 +2447,8 @@ static int vfio_pci_dev_set_pm_runtime_get(struct vfio_device_set *dev_set)
>   * get each memory_lock.
>   */
>  static int vfio_pci_dev_set_hot_reset(struct vfio_device_set *dev_set,
> -				      struct vfio_pci_group_info *groups)
> +				      struct vfio_pci_group_info *groups,
> +				      struct iommufd_ctx *iommufd_ctx)
>  {
>  	struct vfio_pci_core_device *cur_mem;
>  	struct vfio_pci_core_device *cur_vma;
> @@ -2466,11 +2478,38 @@ static int vfio_pci_dev_set_hot_reset(struct vfio_device_set *dev_set,
>  		goto err_unlock;
>  
>  	list_for_each_entry(cur_vma, &dev_set->device_list, vdev.dev_set_list) {
> +		bool owned;
> +
>  		/*
> -		 * Test whether all the affected devices are contained by the
> -		 * set of groups provided by the user.
> +		 * Test whether all the affected devices can be reset by the
> +		 * user.
> +		 *
> +		 * If called from a group opened device and the user provides
> +		 * a set of groups, all the devices in the dev_set should be
> +		 * contained by the set of groups provided by the user.
> +		 *
> +		 * If called from a cdev opened device and the user provides
> +		 * a zero-length array, all the devices in the dev_set must
> +		 * be bound to the same iommufd_ctx as the input iommufd_ctx.
> +		 * If there is any device that has not been bound to any
> +		 * iommufd_ctx yet, check if its iommu_group has any device
> +		 * bound to the input iommufd_ctx.  Such devices can be
> +		 * considered owned by the input iommufd_ctx as the device
> +		 * cannot be owned by another iommufd_ctx when its iommu_group
> +		 * is owned.
> +		 *
> +		 * Otherwise, reset is not allowed.
>  		 */
> -		if (!vfio_dev_in_groups(cur_vma, groups)) {
> +		if (iommufd_ctx) {
> +			int devid = vfio_iommufd_device_hot_reset_devid(&cur_vma->vdev,
> +									iommufd_ctx);
> +
> +			owned = (devid != VFIO_PCI_DEVID_NOT_OWNED);
> +		} else {
> +			owned = vfio_dev_in_groups(&cur_vma->vdev, groups);
> +		}
> +
> +		if (!owned) {
>  			ret = -EINVAL;
>  			goto err_undo;
>  		}
> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> index 70cc31e6b1ce..f753124e1c82 100644
> --- a/include/uapi/linux/vfio.h
> +++ b/include/uapi/linux/vfio.h
> @@ -690,6 +690,9 @@ enum {
>   *	  affected devices are represented in the dev_set and also owned by
>   *	  the user.  This flag is available only when
>   *	  flag VFIO_PCI_HOT_RESET_FLAG_DEV_ID is set, otherwise reserved.
> + *	  When set, user could invoke VFIO_DEVICE_PCI_HOT_RESET with a zero
> + *	  length fd array on the calling device as the ownership is validated
> + *	  by iommufd_ctx.
>   *
>   * Return: 0 on success, -errno on failure:
>   *	-enospc = insufficient buffer, -enodev = unsupported for device.
> @@ -721,6 +724,17 @@ struct vfio_pci_hot_reset_info {
>   * VFIO_DEVICE_PCI_HOT_RESET - _IOW(VFIO_TYPE, VFIO_BASE + 13,
>   *				    struct vfio_pci_hot_reset)
>   *
> + * Userspace requests hot reset for the devices it operates.  Due to the
> + * underlying topology, multiple devices can be affected in the reset
> + * while some might be opened by another user.  To avoid interference
> + * the calling user must ensure all affected devices are owned by itself.

This phrasing suggests to me that we're placing the responsibility on
the user to avoid resetting another user's devices.  Perhaps these
paragraphs could be replaced with:

  A PCI hot reset results in either a bus or slot reset which may affect
  other devices sharing the bus/slot.  The calling user must have
  ownership of the full set of affected devices as determined by the
  VFIO_DEVICE_GET_PCI_HOT_RESET_INFO ioctl.

  When called on a device file descriptor acquired through the vfio
  group interface, the user is required to provide proof of ownership
  of those affected devices via the group_fds array in struct
  vfio_pci_hot_reset.

  When called on a direct cdev opened vfio device, the flags field of
  struct vfio_pci_hot_reset_info reports the ownership status of the
  affected devices and this ioctl must be called with an empty group_fds
  array.  See above INFO ioctl definition for ownership requirements.

  Mixed usage of legacy groups and cdevs across the set of affected
  devices is not supported.

Other than this and a couple of other comments, the series looks ok to
me.  We still need acks from Jason for iommufd on 3-5.  Thanks,

Alex
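
As a rough illustration of the calling convention described in the
proposed text, a userspace caller could pick its reset path from the
flags returned by the INFO ioctl along these lines.  This is a sketch
under assumptions, not part of the series: enum reset_plan and
plan_hot_reset() are hypothetical, and the two HOT_RESET_FLAG_* macros
are local mirrors of the flag bits added in patch 8/9.

```c
#include <stdint.h>

/* Local mirrors of the flag bits added to struct vfio_pci_hot_reset_info. */
#define HOT_RESET_FLAG_DEV_ID		(1u << 0)
#define HOT_RESET_FLAG_DEV_ID_OWNED	(1u << 1)

enum reset_plan {
	RESET_WITH_GROUP_FDS,	/* group path: prove ownership via group fds */
	RESET_WITH_EMPTY_ARRAY,	/* cdev path: count == 0, no fds passed */
	RESET_NOT_POSSIBLE,	/* cdev path, some affected device not owned */
};

/* Decide how to call VFIO_DEVICE_PCI_HOT_RESET from the INFO flags. */
static enum reset_plan plan_hot_reset(uint32_t info_flags)
{
	/* Without DEV_ID the entries are group IDs: legacy group interface. */
	if (!(info_flags & HOT_RESET_FLAG_DEV_ID))
		return RESET_WITH_GROUP_FDS;

	/* cdev: the empty array is only expected to work if all are owned. */
	if (info_flags & HOT_RESET_FLAG_DEV_ID_OWNED)
		return RESET_WITH_EMPTY_ARRAY;

	return RESET_NOT_POSSIBLE;
}
```

The kernel still performs its own ownership check at reset time, so a
plan computed this way is advisory; the reset ioctl can fail with
-EINVAL if ownership changed after the INFO call.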

> + *
> + * As the ownership described by VFIO_DEVICE_GET_PCI_HOT_RESET_INFO, the
> + * cdev opened devices must exclusively provide a zero-length fd array and
> + * the group opened devices must exclusively use an array of group fds for
> + * proof of ownership.  Mixed access to devices between cdev and legacy
> + * groups are not supported by this interface.
> + *
>   * Return: 0 on success, -errno on failure.
>   */
>  struct vfio_pci_hot_reset {



* Re: [PATCH v7 9/9] vfio/pci: Allow passing zero-length fd array in VFIO_DEVICE_PCI_HOT_RESET
@ 2023-06-08 22:30     ` Alex Williamson
  0 siblings, 0 replies; 77+ messages in thread
From: Alex Williamson @ 2023-06-08 22:30 UTC (permalink / raw)
  To: Yi Liu
  Cc: jgg, kevin.tian, joro, robin.murphy, cohuck, eric.auger,
	nicolinc, kvm, mjrosato, chao.p.peng, yi.y.sun, peterx, jasowang,
	shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	intel-gvt-dev, intel-gfx, linux-s390, xudong.hao, yan.y.zhao,
	terrence.xu, yanting.jiang, zhenzhong.duan, clegoate

On Fri,  2 Jun 2023 05:15:15 -0700
Yi Liu <yi.l.liu@intel.com> wrote:

> This is the way user to invoke hot-reset for the devices opened by cdev
> interface. User should check the flag VFIO_PCI_HOT_RESET_FLAG_DEV_ID_OWNED
> in the output of VFIO_DEVICE_GET_PCI_HOT_RESET_INFO ioctl before doing
> hot-reset for cdev devices.
> 
> Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
> Tested-by: Yanting Jiang <yanting.jiang@intel.com>
> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> ---
>  drivers/vfio/pci/vfio_pci_core.c | 61 ++++++++++++++++++++++++++------
>  include/uapi/linux/vfio.h        | 14 ++++++++
>  2 files changed, 64 insertions(+), 11 deletions(-)
> 
> diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
> index a615a223cdef..b0eadafcbcf5 100644
> --- a/drivers/vfio/pci/vfio_pci_core.c
> +++ b/drivers/vfio/pci/vfio_pci_core.c
> @@ -181,7 +181,8 @@ static void vfio_pci_probe_mmaps(struct vfio_pci_core_device *vdev)
>  struct vfio_pci_group_info;
>  static void vfio_pci_dev_set_try_reset(struct vfio_device_set *dev_set);
>  static int vfio_pci_dev_set_hot_reset(struct vfio_device_set *dev_set,
> -				      struct vfio_pci_group_info *groups);
> +				      struct vfio_pci_group_info *groups,
> +				      struct iommufd_ctx *iommufd_ctx);
>  
>  /*
>   * INTx masking requires the ability to disable INTx signaling via PCI_COMMAND
> @@ -1308,8 +1309,7 @@ vfio_pci_ioctl_pci_hot_reset_groups(struct vfio_pci_core_device *vdev,
>  	if (ret)
>  		return ret;
>  
> -	/* Somewhere between 1 and count is OK */
> -	if (!array_count || array_count > count)
> +	if (array_count > count)
>  		return -EINVAL;
>  
>  	group_fds = kcalloc(array_count, sizeof(*group_fds), GFP_KERNEL);
> @@ -1358,7 +1358,7 @@ vfio_pci_ioctl_pci_hot_reset_groups(struct vfio_pci_core_device *vdev,
>  	info.count = array_count;
>  	info.files = files;
>  
> -	ret = vfio_pci_dev_set_hot_reset(vdev->vdev.dev_set, &info);
> +	ret = vfio_pci_dev_set_hot_reset(vdev->vdev.dev_set, &info, NULL);
>  
>  hot_reset_release:
>  	for (file_idx--; file_idx >= 0; file_idx--)
> @@ -1381,13 +1381,21 @@ static int vfio_pci_ioctl_pci_hot_reset(struct vfio_pci_core_device *vdev,
>  	if (hdr.argsz < minsz || hdr.flags)
>  		return -EINVAL;
>  
> +	/* zero-length array is only for cdev opened devices */
> +	if (!!hdr.count == vfio_device_cdev_opened(&vdev->vdev))
> +		return -EINVAL;
> +
>  	/* Can we do a slot or bus reset or neither? */
>  	if (!pci_probe_reset_slot(vdev->pdev->slot))
>  		slot = true;
>  	else if (pci_probe_reset_bus(vdev->pdev->bus))
>  		return -ENODEV;
>  
> -	return vfio_pci_ioctl_pci_hot_reset_groups(vdev, hdr.count, slot, arg);
> +	if (hdr.count)
> +		return vfio_pci_ioctl_pci_hot_reset_groups(vdev, hdr.count, slot, arg);
> +
> +	return vfio_pci_dev_set_hot_reset(vdev->vdev.dev_set, NULL,
> +					  vfio_iommufd_device_ictx(&vdev->vdev));
>  }
>  
>  static int vfio_pci_ioctl_ioeventfd(struct vfio_pci_core_device *vdev,
> @@ -2354,13 +2362,16 @@ const struct pci_error_handlers vfio_pci_core_err_handlers = {
>  };
>  EXPORT_SYMBOL_GPL(vfio_pci_core_err_handlers);
>  
> -static bool vfio_dev_in_groups(struct vfio_pci_core_device *vdev,
> +static bool vfio_dev_in_groups(struct vfio_device *vdev,
>  			       struct vfio_pci_group_info *groups)
>  {
>  	unsigned int i;
>  
> +	if (!groups)
> +		return false;
> +
>  	for (i = 0; i < groups->count; i++)
> -		if (vfio_file_has_dev(groups->files[i], &vdev->vdev))
> +		if (vfio_file_has_dev(groups->files[i], vdev))
>  			return true;
>  	return false;
>  }
> @@ -2436,7 +2447,8 @@ static int vfio_pci_dev_set_pm_runtime_get(struct vfio_device_set *dev_set)
>   * get each memory_lock.
>   */
>  static int vfio_pci_dev_set_hot_reset(struct vfio_device_set *dev_set,
> -				      struct vfio_pci_group_info *groups)
> +				      struct vfio_pci_group_info *groups,
> +				      struct iommufd_ctx *iommufd_ctx)
>  {
>  	struct vfio_pci_core_device *cur_mem;
>  	struct vfio_pci_core_device *cur_vma;
> @@ -2466,11 +2478,38 @@ static int vfio_pci_dev_set_hot_reset(struct vfio_device_set *dev_set,
>  		goto err_unlock;
>  
>  	list_for_each_entry(cur_vma, &dev_set->device_list, vdev.dev_set_list) {
> +		bool owned;
> +
>  		/*
> -		 * Test whether all the affected devices are contained by the
> -		 * set of groups provided by the user.
> +		 * Test whether all the affected devices can be reset by the
> +		 * user.
> +		 *
> +		 * If called from a group opened device and the user provides
> +		 * a set of groups, all the devices in the dev_set should be
> +		 * contained by the set of groups provided by the user.
> +		 *
> +		 * If called from a cdev opened device and the user provides
> +		 * a zero-length array, all the devices in the dev_set must
> +		 * be bound to the same iommufd_ctx as the input iommufd_ctx.
> +		 * If there is any device that has not been bound to any
> +		 * iommufd_ctx yet, check if its iommu_group has any device
> +		 * bound to the input iommufd_ctx.  Such devices can be
> +		 * considered owned by the input iommufd_ctx as the device
> +		 * cannot be owned by another iommufd_ctx when its iommu_group
> +		 * is owned.
> +		 *
> +		 * Otherwise, reset is not allowed.
>  		 */
> -		if (!vfio_dev_in_groups(cur_vma, groups)) {
> +		if (iommufd_ctx) {
> +			int devid = vfio_iommufd_device_hot_reset_devid(&cur_vma->vdev,
> +									iommufd_ctx);
> +
> +			owned = (devid != VFIO_PCI_DEVID_NOT_OWNED);
> +		} else {
> +			owned = vfio_dev_in_groups(&cur_vma->vdev, groups);
> +		}
> +
> +		if (!owned) {
>  			ret = -EINVAL;
>  			goto err_undo;
>  		}
> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> index 70cc31e6b1ce..f753124e1c82 100644
> --- a/include/uapi/linux/vfio.h
> +++ b/include/uapi/linux/vfio.h
> @@ -690,6 +690,9 @@ enum {
>   *	  affected devices are represented in the dev_set and also owned by
>   *	  the user.  This flag is available only when
>   *	  flag VFIO_PCI_HOT_RESET_FLAG_DEV_ID is set, otherwise reserved.
> + *	  When set, user could invoke VFIO_DEVICE_PCI_HOT_RESET with a zero
> + *	  length fd array on the calling device as the ownership is validated
> + *	  by iommufd_ctx.
>   *
>   * Return: 0 on success, -errno on failure:
>   *	-enospc = insufficient buffer, -enodev = unsupported for device.
> @@ -721,6 +724,17 @@ struct vfio_pci_hot_reset_info {
>   * VFIO_DEVICE_PCI_HOT_RESET - _IOW(VFIO_TYPE, VFIO_BASE + 13,
>   *				    struct vfio_pci_hot_reset)
>   *
> + * Userspace requests hot reset for the devices it operates.  Due to the
> + * underlying topology, multiple devices can be affected in the reset
> + * while some might be opened by another user.  To avoid interference
> + * the calling user must ensure all affected devices are owned by itself.

This phrasing suggests to me that we're placing the responsibility on
the user to avoid resetting another user's devices.  Perhaps these
paragraphs could be replaced with:

  A PCI hot reset results in either a bus or slot reset which may affect
  other devices sharing the bus/slot.  The calling user must have
  ownership of the full set of affected devices as determined by the
  VFIO_DEVICE_GET_PCI_HOT_RESET_INFO ioctl.

  When called on a device file descriptor acquired through the vfio
  group interface, the user is required to provide proof of ownership
  of those affected devices via the group_fds array in struct
  vfio_pci_hot_reset.

  When called on a direct cdev opened vfio device, the flags field of
  struct vfio_pci_hot_reset_info reports the ownership status of the
  affected devices and this ioctl must be called with an empty group_fds
  array.  See above INFO ioctl definition for ownership requirements.

  Mixed usage of legacy groups and cdevs across the set of affected
  devices is not supported.
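
To make the calling convention in the proposed text concrete, here is a
minimal model of the argument check, assuming it mirrors the
`!!hdr.count == vfio_device_cdev_opened()` test in the patch.
hot_reset_check_args() and its parameters are hypothetical names used
only for illustration; they are not kernel symbols.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Illustrative model only, not kernel code.
 *
 * fd_count    - number of group fds the user passed (hdr.count)
 * cdev_opened - true if the calling device was opened via the cdev path
 *
 * A group-opened device must pass a non-empty array; a cdev-opened
 * device must pass an empty one.  Every other combination is rejected.
 */
int hot_reset_check_args(uint32_t fd_count, bool cdev_opened)
{
	if ((fd_count != 0) == cdev_opened)
		return -EINVAL;
	return 0;
}
```

Under this model, (0, true) and (N > 0, false) are accepted, while
(0, false) and (N > 0, true) return -EINVAL.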

Other than this and the couple other comments, the series looks ok to
me.  We still need acks from Jason for iommufd on 3-5.  Thanks,

Alex

> + *
> + * As the ownership described by VFIO_DEVICE_GET_PCI_HOT_RESET_INFO, the
> + * cdev opened devices must exclusively provide a zero-length fd array and
> + * the group opened devices must exclusively use an array of group fds for
> + * proof of ownership.  Mixed access to devices between cdev and legacy
> + * groups are not supported by this interface.
> + *
>   * Return: 0 on success, -errno on failure.
>   */
>  struct vfio_pci_hot_reset {


^ permalink raw reply	[flat|nested] 77+ messages in thread

* RE: [PATCH v7 4/9] iommufd: Add iommufd_ctx_has_group()
  2023-06-08 21:40     ` Alex Williamson
@ 2023-06-08 23:44       ` Liu, Yi L
  -1 siblings, 0 replies; 77+ messages in thread
From: Liu, Yi L @ 2023-06-08 23:44 UTC (permalink / raw)
  To: Alex Williamson
  Cc: jgg, Tian, Kevin, joro, robin.murphy, cohuck, eric.auger,
	nicolinc, kvm, mjrosato, chao.p.peng, yi.y.sun, peterx, jasowang,
	shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	intel-gvt-dev, intel-gfx, linux-s390, Hao, Xudong, Zhao, Yan Y,
	Xu, Terrence, Jiang, Yanting, Duan, Zhenzhong, clegoate

> From: Alex Williamson <alex.williamson@redhat.com>
> Sent: Friday, June 9, 2023 5:41 AM
> 
> On Fri,  2 Jun 2023 05:15:10 -0700
> Yi Liu <yi.l.liu@intel.com> wrote:
> 
> > This adds the helper to check if any device within the given iommu_group
> > has been bound with the iommufd_ctx. This is helpful for the checking on
> > device ownership for the devices which have not been bound but cannot be
> > bound to any other iommufd_ctx as the iommu_group has been bound.
> >
> > Tested-by: Terrence Xu <terrence.xu@intel.com>
> > Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> > ---
> >  drivers/iommu/iommufd/device.c | 30 ++++++++++++++++++++++++++++++
> >  include/linux/iommufd.h        |  8 ++++++++
> >  2 files changed, 38 insertions(+)
> >
> > diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c
> > index 4f9b2142274c..4571344c8508 100644
> > --- a/drivers/iommu/iommufd/device.c
> > +++ b/drivers/iommu/iommufd/device.c
> > @@ -98,6 +98,36 @@ struct iommufd_device *iommufd_device_bind(struct iommufd_ctx *ictx,
> >  }
> >  EXPORT_SYMBOL_NS_GPL(iommufd_device_bind, IOMMUFD);
> >
> > +/**
> > + * iommufd_ctx_has_group - True if any device within the group is bound
> > + *                         to the ictx
> > + * @ictx: iommufd file descriptor
> > + * @group: Pointer to a physical iommu_group struct
> > + *
> > + * True if any device within the group has been bound to this ictx, ex. via
> > + * iommufd_device_bind(), therefore implying ictx ownership of the group.
> > + */
> > +bool iommufd_ctx_has_group(struct iommufd_ctx *ictx, struct iommu_group *group)
> > +{
> > +	struct iommufd_object *obj;
> > +	unsigned long index;
> > +
> > +	if (!ictx || !group)
> > +		return false;
> > +
> > +	xa_lock(&ictx->objects);
> > +	xa_for_each(&ictx->objects, index, obj) {
> > +		if (obj->type == IOMMUFD_OBJ_DEVICE &&
> > +		    container_of(obj, struct iommufd_device, obj)->group == group) {
> > +			xa_unlock(&ictx->objects);
> > +			return true;
> > +		}
> > +	}
> > +	xa_unlock(&ictx->objects);
> > +	return false;
> > +}
> > +EXPORT_SYMBOL_NS_GPL(iommufd_ctx_has_group, IOMMUFD);
> > +
> >  /**
> >   * iommufd_device_unbind - Undo iommufd_device_bind()
> >   * @idev: Device returned by iommufd_device_bind()
> > diff --git a/include/linux/iommufd.h b/include/linux/iommufd.h
> > index 1129a36a74c4..33fe57e95e42 100644
> > --- a/include/linux/iommufd.h
> > +++ b/include/linux/iommufd.h
> > @@ -16,6 +16,7 @@ struct page;
> >  struct iommufd_ctx;
> >  struct iommufd_access;
> >  struct file;
> > +struct iommu_group;
> >
> >  struct iommufd_device *iommufd_device_bind(struct iommufd_ctx *ictx,
> >  					   struct device *dev, u32 *id);
> > @@ -50,6 +51,7 @@ void iommufd_ctx_get(struct iommufd_ctx *ictx);
> >  #if IS_ENABLED(CONFIG_IOMMUFD)
> >  struct iommufd_ctx *iommufd_ctx_from_file(struct file *file);
> >  void iommufd_ctx_put(struct iommufd_ctx *ictx);
> > +bool iommufd_ctx_has_group(struct iommufd_ctx *ictx, struct iommu_group *group);
> >
> >  int iommufd_access_pin_pages(struct iommufd_access *access, unsigned long iova,
> >  			     unsigned long length, struct page **out_pages,
> > @@ -71,6 +73,12 @@ static inline void iommufd_ctx_put(struct iommufd_ctx *ictx)
> >  {
> >  }
> >
> > +static inline bool iommufd_ctx_has_group(struct iommufd_ctx *ictx,
> > +					 struct iommu_group *group)
> > +{
> > +	return false;
> > +}
> > +
> >  static inline int iommufd_access_pin_pages(struct iommufd_access *access,
> >  					   unsigned long iova,
> >  					   unsigned long length,
> 
> It looks like the v12 cdev series no longer requires this stub?  We
> haven't used this function except from iommufd specific code since v5.

You are right. It should be dropped.
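
For reference, the lookup performed by the real helper (as opposed to
the stub being dropped) can be modelled in plain C roughly as below.
This is only a sketch: the mock_* types are hypothetical stand-ins for
the iommufd objects, a flat array replaces the xarray, and the locking
done by xa_lock()/xa_unlock() is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the iommufd object types. */
enum mock_obj_type { MOCK_OBJ_DEVICE, MOCK_OBJ_OTHER };

struct mock_obj {
	enum mock_obj_type type;
	int group;		/* stands in for the struct iommu_group pointer */
};

struct mock_ctx {
	const struct mock_obj *objs;
	size_t nr_objs;
};

/* True if any device object in ctx belongs to the given group. */
bool mock_ctx_has_group(const struct mock_ctx *ctx, int group)
{
	size_t i;

	if (!ctx)
		return false;
	assert(ctx->objs != NULL || ctx->nr_objs == 0);

	for (i = 0; i < ctx->nr_objs; i++) {
		if (ctx->objs[i].type == MOCK_OBJ_DEVICE &&
		    ctx->objs[i].group == group)
			return true;
	}
	return false;
}
```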

Regards,
Yi Liu

^ permalink raw reply	[flat|nested] 77+ messages in thread

* RE: [PATCH v7 8/9] vfio/pci: Extend VFIO_DEVICE_GET_PCI_HOT_RESET_INFO for vfio device cdev
  2023-06-08 22:26     ` Alex Williamson
@ 2023-06-09  0:04       ` Liu, Yi L
  -1 siblings, 0 replies; 77+ messages in thread
From: Liu, Yi L @ 2023-06-09  0:04 UTC (permalink / raw)
  To: Alex Williamson
  Cc: jgg, Tian, Kevin, joro, robin.murphy, cohuck, eric.auger,
	nicolinc, kvm, mjrosato, chao.p.peng, yi.y.sun, peterx, jasowang,
	shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	intel-gvt-dev, intel-gfx, linux-s390, Hao, Xudong, Zhao, Yan Y,
	Xu, Terrence, Jiang, Yanting, Duan, Zhenzhong, clegoate

> From: Alex Williamson <alex.williamson@redhat.com>
> Sent: Friday, June 9, 2023 6:27 AM
> 
> On Fri,  2 Jun 2023 05:15:14 -0700
> Yi Liu <yi.l.liu@intel.com> wrote:
> 
> > This allows VFIO_DEVICE_GET_PCI_HOT_RESET_INFO ioctl use the iommufd_ctx
> > of the cdev device to check the ownership of the other affected devices.
> >
> > When VFIO_DEVICE_GET_PCI_HOT_RESET_INFO is called on an IOMMUFD managed
> > device, the new flag VFIO_PCI_HOT_RESET_FLAG_DEV_ID is reported to indicate
> > the values returned are IOMMUFD devids rather than group IDs as used when
> > accessing vfio devices through the conventional vfio group interface.
> > Additionally the flag VFIO_PCI_HOT_RESET_FLAG_DEV_ID_OWNED will be reported
> > in this mode if all of the devices affected by the hot-reset are owned by
> > either virtue of being directly bound to the same iommufd context as the
> > calling device, or implicitly owned via a shared IOMMU group.
> >
> > Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
> > Suggested-by: Alex Williamson <alex.williamson@redhat.com>
> > Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> > ---
> >  drivers/vfio/iommufd.c           | 49 +++++++++++++++++++++++++++++++
> >  drivers/vfio/pci/vfio_pci_core.c | 47 +++++++++++++++++++++++++-----
> >  include/linux/vfio.h             | 16 ++++++++++
> >  include/uapi/linux/vfio.h        | 50 +++++++++++++++++++++++++++++++-
> >  4 files changed, 154 insertions(+), 8 deletions(-)
> >
> > diff --git a/drivers/vfio/iommufd.c b/drivers/vfio/iommufd.c
> > index 88b00c501015..a04f3a493437 100644
> > --- a/drivers/vfio/iommufd.c
> > +++ b/drivers/vfio/iommufd.c
> > @@ -66,6 +66,55 @@ void vfio_iommufd_unbind(struct vfio_device *vdev)
> >  		vdev->ops->unbind_iommufd(vdev);
> >  }
> >
> > +struct iommufd_ctx *vfio_iommufd_device_ictx(struct vfio_device *vdev)
> > +{
> > +	if (vdev->iommufd_device)
> > +		return iommufd_device_to_ictx(vdev->iommufd_device);
> > +	return NULL;
> > +}
> > +EXPORT_SYMBOL_GPL(vfio_iommufd_device_ictx);
> > +
> > +static int vfio_iommufd_device_id(struct vfio_device *vdev)
> > +{
> > +	if (vdev->iommufd_device)
> > +		return iommufd_device_to_id(vdev->iommufd_device);
> > +	return -EINVAL;
> 
> If this is actually reachable, it allows us to return -EINVAL as a
> devid in the reset-info ioctl, which is not a defined value.  Should
> this return VFIO_PCI_DEVID_NOT_OWNED or do you want to catch the errno
> value in the caller?  Thanks,

This error can be reached if the user invokes _INFO or HOT_RESET on an
emulated device or on a physical device that has not been bound to
iommufd. Both should be considered not owned, so returning
VFIO_PCI_DEVID_NOT_OWNED makes more sense.
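
A rough sketch of the adjusted helper, written against mock types so it
compiles on its own. The mock_* names are hypothetical, and the value
-1 for DEVID_NOT_OWNED is an assumption here, since the macro's
definition is not shown in this mail.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed value; stands in for VFIO_PCI_DEVID_NOT_OWNED. */
#define DEVID_NOT_OWNED (-1)

/* Hypothetical stand-ins for the kernel structures. */
struct mock_iommufd_device {
	int id;
};

struct mock_vfio_device {
	struct mock_iommufd_device *iommufd_device;
};

/*
 * Return the devid of a bound device.  An emulated device, or a
 * physical device not yet bound to iommufd, has no iommufd_device and
 * is reported as not owned rather than as an errno value.
 */
int mock_vfio_iommufd_device_id(const struct mock_vfio_device *vdev)
{
	assert(vdev != NULL);

	if (vdev->iommufd_device)
		return vdev->iommufd_device->id;
	return DEVID_NOT_OWNED;
}
```

With this shape the reset-info ioctl never reports an errno as a devid.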

Regards,
Yi Liu


^ permalink raw reply	[flat|nested] 77+ messages in thread

* RE: [PATCH v7 9/9] vfio/pci: Allow passing zero-length fd array in VFIO_DEVICE_PCI_HOT_RESET
  2023-06-08 22:30     ` Alex Williamson
@ 2023-06-09  0:13       ` Liu, Yi L
  -1 siblings, 0 replies; 77+ messages in thread
From: Liu, Yi L @ 2023-06-09  0:13 UTC (permalink / raw)
  To: Alex Williamson
  Cc: jgg, Tian, Kevin, joro, robin.murphy, cohuck, eric.auger,
	nicolinc, kvm, mjrosato, chao.p.peng, yi.y.sun, peterx, jasowang,
	shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	intel-gvt-dev, intel-gfx, linux-s390, Hao, Xudong, Zhao, Yan Y,
	Xu, Terrence, Jiang, Yanting, Duan, Zhenzhong, clegoate

> From: Alex Williamson <alex.williamson@redhat.com>
> Sent: Friday, June 9, 2023 6:30 AM
> 
> On Fri,  2 Jun 2023 05:15:15 -0700
> Yi Liu <yi.l.liu@intel.com> wrote:
> 
> > This is the way user to invoke hot-reset for the devices opened by cdev
> > interface. User should check the flag VFIO_PCI_HOT_RESET_FLAG_DEV_ID_OWNED
> > in the output of VFIO_DEVICE_GET_PCI_HOT_RESET_INFO ioctl before doing
> > hot-reset for cdev devices.
> >
> > Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
> > Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> > Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
> > Tested-by: Yanting Jiang <yanting.jiang@intel.com>
> > Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> > ---
> >  drivers/vfio/pci/vfio_pci_core.c | 61 ++++++++++++++++++++++++++------
> >  include/uapi/linux/vfio.h        | 14 ++++++++
> >  2 files changed, 64 insertions(+), 11 deletions(-)
> >
> > diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
> > index a615a223cdef..b0eadafcbcf5 100644
> > --- a/drivers/vfio/pci/vfio_pci_core.c
> > +++ b/drivers/vfio/pci/vfio_pci_core.c
> > @@ -181,7 +181,8 @@ static void vfio_pci_probe_mmaps(struct vfio_pci_core_device *vdev)
> >  struct vfio_pci_group_info;
> >  static void vfio_pci_dev_set_try_reset(struct vfio_device_set *dev_set);
> >  static int vfio_pci_dev_set_hot_reset(struct vfio_device_set *dev_set,
> > -				      struct vfio_pci_group_info *groups);
> > +				      struct vfio_pci_group_info *groups,
> > +				      struct iommufd_ctx *iommufd_ctx);
> >
> >  /*
> >   * INTx masking requires the ability to disable INTx signaling via PCI_COMMAND
> > @@ -1308,8 +1309,7 @@ vfio_pci_ioctl_pci_hot_reset_groups(struct vfio_pci_core_device *vdev,
> >  	if (ret)
> >  		return ret;
> >
> > -	/* Somewhere between 1 and count is OK */
> > -	if (!array_count || array_count > count)
> > +	if (array_count > count)
> >  		return -EINVAL;
> >
> >  	group_fds = kcalloc(array_count, sizeof(*group_fds), GFP_KERNEL);
> > @@ -1358,7 +1358,7 @@ vfio_pci_ioctl_pci_hot_reset_groups(struct vfio_pci_core_device *vdev,
> >  	info.count = array_count;
> >  	info.files = files;
> >
> > -	ret = vfio_pci_dev_set_hot_reset(vdev->vdev.dev_set, &info);
> > +	ret = vfio_pci_dev_set_hot_reset(vdev->vdev.dev_set, &info, NULL);
> >
> >  hot_reset_release:
> >  	for (file_idx--; file_idx >= 0; file_idx--)
> > @@ -1381,13 +1381,21 @@ static int vfio_pci_ioctl_pci_hot_reset(struct vfio_pci_core_device *vdev,
> >  	if (hdr.argsz < minsz || hdr.flags)
> >  		return -EINVAL;
> >
> > +	/* zero-length array is only for cdev opened devices */
> > +	if (!!hdr.count == vfio_device_cdev_opened(&vdev->vdev))
> > +		return -EINVAL;
> > +
> >  	/* Can we do a slot or bus reset or neither? */
> >  	if (!pci_probe_reset_slot(vdev->pdev->slot))
> >  		slot = true;
> >  	else if (pci_probe_reset_bus(vdev->pdev->bus))
> >  		return -ENODEV;
> >
> > -	return vfio_pci_ioctl_pci_hot_reset_groups(vdev, hdr.count, slot, arg);
> > +	if (hdr.count)
> > +		return vfio_pci_ioctl_pci_hot_reset_groups(vdev, hdr.count, slot, arg);
> > +
> > +	return vfio_pci_dev_set_hot_reset(vdev->vdev.dev_set, NULL,
> > +					  vfio_iommufd_device_ictx(&vdev->vdev));
> >  }
> >
> >  static int vfio_pci_ioctl_ioeventfd(struct vfio_pci_core_device *vdev,
> > @@ -2354,13 +2362,16 @@ const struct pci_error_handlers vfio_pci_core_err_handlers = {
> >  };
> >  EXPORT_SYMBOL_GPL(vfio_pci_core_err_handlers);
> >
> > -static bool vfio_dev_in_groups(struct vfio_pci_core_device *vdev,
> > +static bool vfio_dev_in_groups(struct vfio_device *vdev,
> >  			       struct vfio_pci_group_info *groups)
> >  {
> >  	unsigned int i;
> >
> > +	if (!groups)
> > +		return false;
> > +
> >  	for (i = 0; i < groups->count; i++)
> > -		if (vfio_file_has_dev(groups->files[i], &vdev->vdev))
> > +		if (vfio_file_has_dev(groups->files[i], vdev))
> >  			return true;
> >  	return false;
> >  }
> > @@ -2436,7 +2447,8 @@ static int vfio_pci_dev_set_pm_runtime_get(struct vfio_device_set *dev_set)
> >   * get each memory_lock.
> >   */
> >  static int vfio_pci_dev_set_hot_reset(struct vfio_device_set *dev_set,
> > -				      struct vfio_pci_group_info *groups)
> > +				      struct vfio_pci_group_info *groups,
> > +				      struct iommufd_ctx *iommufd_ctx)
> >  {
> >  	struct vfio_pci_core_device *cur_mem;
> >  	struct vfio_pci_core_device *cur_vma;
> > @@ -2466,11 +2478,38 @@ static int vfio_pci_dev_set_hot_reset(struct vfio_device_set *dev_set,
> >  		goto err_unlock;
> >
> >  	list_for_each_entry(cur_vma, &dev_set->device_list, vdev.dev_set_list) {
> > +		bool owned;
> > +
> >  		/*
> > -		 * Test whether all the affected devices are contained by the
> > -		 * set of groups provided by the user.
> > +		 * Test whether all the affected devices can be reset by the
> > +		 * user.
> > +		 *
> > +		 * If called from a group opened device and the user provides
> > +		 * a set of groups, all the devices in the dev_set should be
> > +		 * contained by the set of groups provided by the user.
> > +		 *
> > +		 * If called from a cdev opened device and the user provides
> > +		 * a zero-length array, all the devices in the dev_set must
> > +		 * be bound to the same iommufd_ctx as the input iommufd_ctx.
> > +		 * If there is any device that has not been bound to any
> > +		 * iommufd_ctx yet, check if its iommu_group has any device
> > +		 * bound to the input iommufd_ctx.  Such devices can be
> > +		 * considered owned by the input iommufd_ctx as the device
> > +		 * cannot be owned by another iommufd_ctx when its iommu_group
> > +		 * is owned.
> > +		 *
> > +		 * Otherwise, reset is not allowed.
> >  		 */
> > -		if (!vfio_dev_in_groups(cur_vma, groups)) {
> > +		if (iommufd_ctx) {
> > +			int devid = vfio_iommufd_device_hot_reset_devid(&cur_vma->vdev,
> > +									iommufd_ctx);
> > +
> > +			owned = (devid != VFIO_PCI_DEVID_NOT_OWNED);
> > +		} else {
> > +			owned = vfio_dev_in_groups(&cur_vma->vdev, groups);
> > +		}
> > +
> > +		if (!owned) {
> >  			ret = -EINVAL;
> >  			goto err_undo;
> >  		}
> > diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> > index 70cc31e6b1ce..f753124e1c82 100644
> > --- a/include/uapi/linux/vfio.h
> > +++ b/include/uapi/linux/vfio.h
> > @@ -690,6 +690,9 @@ enum {
> >   *	  affected devices are represented in the dev_set and also owned by
> >   *	  the user.  This flag is available only when
> >   *	  flag VFIO_PCI_HOT_RESET_FLAG_DEV_ID is set, otherwise reserved.
> > + *	  When set, user could invoke VFIO_DEVICE_PCI_HOT_RESET with a zero
> > + *	  length fd array on the calling device as the ownership is validated
> > + *	  by iommufd_ctx.
> >   *
> >   * Return: 0 on success, -errno on failure:
> >   *	-enospc = insufficient buffer, -enodev = unsupported for device.
> > @@ -721,6 +724,17 @@ struct vfio_pci_hot_reset_info {
> >   * VFIO_DEVICE_PCI_HOT_RESET - _IOW(VFIO_TYPE, VFIO_BASE + 13,
> >   *				    struct vfio_pci_hot_reset)
> >   *
> > + * Userspace requests hot reset for the devices it operates.  Due to the
> > + * underlying topology, multiple devices can be affected in the reset
> > + * while some might be opened by another user.  To avoid interference
> > + * the calling user must ensure all affected devices are owned by itself.
> 
> This phrasing suggests to me that we're placing the responsibility on
> the user to avoid resetting another user's devices.

This responsibility is not new. Is it? 😊

> Perhaps these
> paragraphs could be replaced with:
> 
>   A PCI hot reset results in either a bus or slot reset which may affect
>   other devices sharing the bus/slot.  The calling user must have
>   ownership of the full set of affected devices as determined by the
>   VFIO_DEVICE_GET_PCI_HOT_RESET_INFO ioctl.
> 
>   When called on a device file descriptor acquired through the vfio
>   group interface, the user is required to provide proof of ownership
>   of those affected devices via the group_fds array in struct
>   vfio_pci_hot_reset.
> 
>   When called on a direct cdev opened vfio device, the flags field of
>   struct vfio_pci_hot_reset_info reports the ownership status of the
>   affected devices and this ioctl must be called with an empty group_fds
>   array.  See above INFO ioctl definition for ownership requirements.
> 
>   Mixed usage of legacy groups and cdevs across the set of affected
>   devices is not supported.

The above is better.
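
The ownership requirement that the cdev path relies on, as worded in
the code comment quoted earlier in this mail, can be modelled as
follows. This is a sketch under assumptions: the mock_dev layout,
CTX_NONE, and the function names are hypothetical, and integers stand
in for the iommufd_ctx and iommu_group pointers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define CTX_NONE 0	/* assumed marker: device not bound to any context */

struct mock_dev {
	int ctx;	/* context the device is bound to, or CTX_NONE */
	int group;	/* iommu group the device belongs to */
};

/* True if any device in the set is bound to ctx and sits in group. */
static bool ctx_has_group(const struct mock_dev *devs, size_t n,
			  int ctx, int group)
{
	for (size_t i = 0; i < n; i++)
		if (devs[i].ctx == ctx && devs[i].group == group)
			return true;
	return false;
}

/*
 * A device is owned by ctx if it is bound to ctx, or if it is not
 * bound at all but shares an iommu group with a device bound to ctx.
 */
bool dev_owned_by_ctx(const struct mock_dev *devs, size_t n,
		      size_t idx, int ctx)
{
	assert(idx < n && ctx != CTX_NONE);

	if (devs[idx].ctx == ctx)
		return true;
	if (devs[idx].ctx != CTX_NONE)
		return false;
	return ctx_has_group(devs, n, ctx, devs[idx].group);
}

/* Reset is allowed only if every device in the set is owned by ctx. */
bool dev_set_owned_by_ctx(const struct mock_dev *devs, size_t n, int ctx)
{
	for (size_t i = 0; i < n; i++)
		if (!dev_owned_by_ctx(devs, n, i, ctx))
			return false;
	return true;
}
```

An unbound device in a group that already has a device bound to the
caller's context counts as owned; an unbound device in an unrelated
group, or a device bound to a different context, blocks the reset.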

> Other than this and the couple other comments, the series looks ok to
> me.  We still need acks from Jason for iommufd on 3-5.  Thanks,

Thanks, perhaps one more version after getting feedback from Jason.

Regards,
Yi Liu

> Alex
> 
> > + *
> > + * As the ownership described by VFIO_DEVICE_GET_PCI_HOT_RESET_INFO, the
> > + * cdev opened devices must exclusively provide a zero-length fd array and
> > + * the group opened devices must exclusively use an array of group fds for
> > + * proof of ownership.  Mixed access to devices between cdev and legacy
> > + * groups are not supported by this interface.
> > + *
> >   * Return: 0 on success, -errno on failure.
> >   */
> >  struct vfio_pci_hot_reset {


^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [Intel-gfx] [PATCH v7 9/9] vfio/pci: Allow passing zero-length fd array in VFIO_DEVICE_PCI_HOT_RESET
@ 2023-06-09  0:13       ` Liu, Yi L
  0 siblings, 0 replies; 77+ messages in thread
From: Liu, Yi L @ 2023-06-09  0:13 UTC (permalink / raw)
  To: Alex Williamson
  Cc: mjrosato, jasowang, Hao, Xudong, Duan,  Zhenzhong, peterx, Xu,
	Terrence, chao.p.peng, linux-s390, kvm, lulu, Jiang, Yanting,
	joro, nicolinc, jgg, Tian, Kevin, Zhao, Yan Y, intel-gfx,
	eric.auger, intel-gvt-dev, yi.y.sun, clegoate, cohuck,
	shameerali.kolothum.thodi, suravee.suthikulpanit, robin.murphy

> From: Alex Williamson <alex.williamson@redhat.com>
> Sent: Friday, June 9, 2023 6:30 AM
> 
> On Fri,  2 Jun 2023 05:15:15 -0700
> Yi Liu <yi.l.liu@intel.com> wrote:
> 
> > This is the way user to invoke hot-reset for the devices opened by cdev
> > interface. User should check the flag VFIO_PCI_HOT_RESET_FLAG_DEV_ID_OWNED
> > in the output of VFIO_DEVICE_GET_PCI_HOT_RESET_INFO ioctl before doing
> > hot-reset for cdev devices.
> >
> > Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
> > Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> > Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
> > Tested-by: Yanting Jiang <yanting.jiang@intel.com>
> > Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> > ---
> >  drivers/vfio/pci/vfio_pci_core.c | 61 ++++++++++++++++++++++++++------
> >  include/uapi/linux/vfio.h        | 14 ++++++++
> >  2 files changed, 64 insertions(+), 11 deletions(-)
> >
> > diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
> > index a615a223cdef..b0eadafcbcf5 100644
> > --- a/drivers/vfio/pci/vfio_pci_core.c
> > +++ b/drivers/vfio/pci/vfio_pci_core.c
> > @@ -181,7 +181,8 @@ static void vfio_pci_probe_mmaps(struct vfio_pci_core_device *vdev)
> >  struct vfio_pci_group_info;
> >  static void vfio_pci_dev_set_try_reset(struct vfio_device_set *dev_set);
> >  static int vfio_pci_dev_set_hot_reset(struct vfio_device_set *dev_set,
> > -				      struct vfio_pci_group_info *groups);
> > +				      struct vfio_pci_group_info *groups,
> > +				      struct iommufd_ctx *iommufd_ctx);
> >
> >  /*
> >   * INTx masking requires the ability to disable INTx signaling via PCI_COMMAND
> > @@ -1308,8 +1309,7 @@ vfio_pci_ioctl_pci_hot_reset_groups(struct vfio_pci_core_device *vdev,
> >  	if (ret)
> >  		return ret;
> >
> > -	/* Somewhere between 1 and count is OK */
> > -	if (!array_count || array_count > count)
> > +	if (array_count > count)
> >  		return -EINVAL;
> >
> >  	group_fds = kcalloc(array_count, sizeof(*group_fds), GFP_KERNEL);
> > @@ -1358,7 +1358,7 @@ vfio_pci_ioctl_pci_hot_reset_groups(struct vfio_pci_core_device *vdev,
> >  	info.count = array_count;
> >  	info.files = files;
> >
> > -	ret = vfio_pci_dev_set_hot_reset(vdev->vdev.dev_set, &info);
> > +	ret = vfio_pci_dev_set_hot_reset(vdev->vdev.dev_set, &info, NULL);
> >
> >  hot_reset_release:
> >  	for (file_idx--; file_idx >= 0; file_idx--)
> > @@ -1381,13 +1381,21 @@ static int vfio_pci_ioctl_pci_hot_reset(struct vfio_pci_core_device *vdev,
> >  	if (hdr.argsz < minsz || hdr.flags)
> >  		return -EINVAL;
> >
> > +	/* zero-length array is only for cdev opened devices */
> > +	if (!!hdr.count == vfio_device_cdev_opened(&vdev->vdev))
> > +		return -EINVAL;
> > +
> >  	/* Can we do a slot or bus reset or neither? */
> >  	if (!pci_probe_reset_slot(vdev->pdev->slot))
> >  		slot = true;
> >  	else if (pci_probe_reset_bus(vdev->pdev->bus))
> >  		return -ENODEV;
> >
> > -	return vfio_pci_ioctl_pci_hot_reset_groups(vdev, hdr.count, slot, arg);
> > +	if (hdr.count)
> > +		return vfio_pci_ioctl_pci_hot_reset_groups(vdev, hdr.count, slot, arg);
> > +
> > +	return vfio_pci_dev_set_hot_reset(vdev->vdev.dev_set, NULL,
> > +					  vfio_iommufd_device_ictx(&vdev->vdev));
> >  }
> >
> >  static int vfio_pci_ioctl_ioeventfd(struct vfio_pci_core_device *vdev,
> > @@ -2354,13 +2362,16 @@ const struct pci_error_handlers vfio_pci_core_err_handlers = {
> >  };
> >  EXPORT_SYMBOL_GPL(vfio_pci_core_err_handlers);
> >
> > -static bool vfio_dev_in_groups(struct vfio_pci_core_device *vdev,
> > +static bool vfio_dev_in_groups(struct vfio_device *vdev,
> >  			       struct vfio_pci_group_info *groups)
> >  {
> >  	unsigned int i;
> >
> > +	if (!groups)
> > +		return false;
> > +
> >  	for (i = 0; i < groups->count; i++)
> > -		if (vfio_file_has_dev(groups->files[i], &vdev->vdev))
> > +		if (vfio_file_has_dev(groups->files[i], vdev))
> >  			return true;
> >  	return false;
> >  }
> > @@ -2436,7 +2447,8 @@ static int vfio_pci_dev_set_pm_runtime_get(struct vfio_device_set *dev_set)
> >   * get each memory_lock.
> >   */
> >  static int vfio_pci_dev_set_hot_reset(struct vfio_device_set *dev_set,
> > -				      struct vfio_pci_group_info *groups)
> > +				      struct vfio_pci_group_info *groups,
> > +				      struct iommufd_ctx *iommufd_ctx)
> >  {
> >  	struct vfio_pci_core_device *cur_mem;
> >  	struct vfio_pci_core_device *cur_vma;
> > @@ -2466,11 +2478,38 @@ static int vfio_pci_dev_set_hot_reset(struct vfio_device_set *dev_set,
> >  		goto err_unlock;
> >
> >  	list_for_each_entry(cur_vma, &dev_set->device_list, vdev.dev_set_list) {
> > +		bool owned;
> > +
> >  		/*
> > -		 * Test whether all the affected devices are contained by the
> > -		 * set of groups provided by the user.
> > +		 * Test whether all the affected devices can be reset by the
> > +		 * user.
> > +		 *
> > +		 * If called from a group opened device and the user provides
> > +		 * a set of groups, all the devices in the dev_set should be
> > +		 * contained by the set of groups provided by the user.
> > +		 *
> > +		 * If called from a cdev opened device and the user provides
> > +		 * a zero-length array, all the devices in the dev_set must
> > +		 * be bound to the same iommufd_ctx as the input iommufd_ctx.
> > +		 * If there is any device that has not been bound to any
> > +		 * iommufd_ctx yet, check if its iommu_group has any device
> > +		 * bound to the input iommufd_ctx.  Such devices can be
> > +		 * considered owned by the input iommufd_ctx as the device
> > +		 * cannot be owned by another iommufd_ctx when its iommu_group
> > +		 * is owned.
> > +		 *
> > +		 * Otherwise, reset is not allowed.
> >  		 */
> > -		if (!vfio_dev_in_groups(cur_vma, groups)) {
> > +		if (iommufd_ctx) {
> > +			int devid = vfio_iommufd_device_hot_reset_devid(&cur_vma->vdev,
> > +									iommufd_ctx);
> > +
> > +			owned = (devid != VFIO_PCI_DEVID_NOT_OWNED);
> > +		} else {
> > +			owned = vfio_dev_in_groups(&cur_vma->vdev, groups);
> > +		}
> > +
> > +		if (!owned) {
> >  			ret = -EINVAL;
> >  			goto err_undo;
> >  		}
> > diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> > index 70cc31e6b1ce..f753124e1c82 100644
> > --- a/include/uapi/linux/vfio.h
> > +++ b/include/uapi/linux/vfio.h
> > @@ -690,6 +690,9 @@ enum {
> >   *	  affected devices are represented in the dev_set and also owned by
> >   *	  the user.  This flag is available only when
> >   *	  flag VFIO_PCI_HOT_RESET_FLAG_DEV_ID is set, otherwise reserved.
> > + *	  When set, user could invoke VFIO_DEVICE_PCI_HOT_RESET with a zero
> > + *	  length fd array on the calling device as the ownership is validated
> > + *	  by iommufd_ctx.
> >   *
> >   * Return: 0 on success, -errno on failure:
> >   *	-enospc = insufficient buffer, -enodev = unsupported for device.
> > @@ -721,6 +724,17 @@ struct vfio_pci_hot_reset_info {
> >   * VFIO_DEVICE_PCI_HOT_RESET - _IOW(VFIO_TYPE, VFIO_BASE + 13,
> >   *				    struct vfio_pci_hot_reset)
> >   *
> > + * Userspace requests hot reset for the devices it operates.  Due to the
> > + * underlying topology, multiple devices can be affected in the reset
> > + * while some might be opened by another user.  To avoid interference
> > + * the calling user must ensure all affected devices are owned by itself.
> 
> This phrasing suggests to me that we're placing the responsibility on
> the user to avoid resetting another user's devices.

This responsibility is not new. Is it? 😊

> Perhaps these
> paragraphs could be replaced with:
> 
>   A PCI hot reset results in either a bus or slot reset which may affect
>   other devices sharing the bus/slot.  The calling user must have
>   ownership of the full set of affected devices as determined by the
>   VFIO_DEVICE_GET_PCI_HOT_RESET_INFO ioctl.
> 
>   When called on a device file descriptor acquired through the vfio
>   group interface, the user is required to provide proof of ownership
>   of those affected devices via the group_fds array in struct
>   vfio_pci_hot_reset.
> 
>   When called on a direct cdev opened vfio device, the flags field of
>   struct vfio_pci_hot_reset_info reports the ownership status of the
>   affected devices and this ioctl must be called with an empty group_fds
>   array.  See above INFO ioctl definition for ownership requirements.
> 
>   Mixed usage of legacy groups and cdevs across the set of affected
>   devices is not supported.

The above is better.
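
The rule in the proposed text matches the argument check the quoted patch
adds to vfio_pci_ioctl_pci_hot_reset(). A minimal standalone sketch of that
rule follows, assuming a plain userspace model: hot_reset_check_count() is a
hypothetical name, and its cdev_opened parameter stands in for the result of
vfio_device_cdev_opened(). It is not the kernel code itself.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical model of the hdr.count validation:
 *  - a cdev opened device must pass a zero-length fd array (count == 0)
 *  - a group opened device must pass at least one group fd (count != 0)
 * Any other combination is rejected with -EINVAL.
 */
static int hot_reset_check_count(uint32_t count, bool cdev_opened)
{
	/* !!count is 1 when an fd array is supplied, 0 when it is empty */
	if (!!count == cdev_opened)
		return -EINVAL;

	return 0;
}
```

Only the two "matching" combinations pass, which is one way to read the
"mixed usage is not supported" sentence: the open method of the calling
device alone decides which form of proof of ownership is accepted.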

> Other than this and the couple other comments, the series looks ok to
> me.  We still need acks from Jason for iommufd on 3-5.  Thanks,

Thanks, perhaps one more version after getting feedback from Jason.

Regards,
Yi Liu

> Alex
> 
> > + *
> > + * As the ownership described by VFIO_DEVICE_GET_PCI_HOT_RESET_INFO, the
> > + * cdev opened devices must exclusively provide a zero-length fd array and
> > + * the group opened devices must exclusively use an array of group fds for
> > + * proof of ownership.  Mixed access to devices between cdev and legacy
> > + * groups are not supported by this interface.
> > + *
> >   * Return: 0 on success, -errno on failure.
> >   */
> >  struct vfio_pci_hot_reset {


^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH v7 9/9] vfio/pci: Allow passing zero-length fd array in VFIO_DEVICE_PCI_HOT_RESET
  2023-06-09  0:13       ` [Intel-gfx] " Liu, Yi L
@ 2023-06-09 14:38         ` Jason Gunthorpe
  -1 siblings, 0 replies; 77+ messages in thread
From: Jason Gunthorpe @ 2023-06-09 14:38 UTC (permalink / raw)
  To: Liu, Yi L
  Cc: Alex Williamson, Tian, Kevin, joro, robin.murphy, cohuck,
	eric.auger, nicolinc, kvm, mjrosato, chao.p.peng, yi.y.sun,
	peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, intel-gvt-dev, intel-gfx, linux-s390, Hao,
	Xudong, Zhao, Yan Y, Xu, Terrence, Jiang, Yanting, Duan,
	Zhenzhong, clegoate

On Fri, Jun 09, 2023 at 12:13:58AM +0000, Liu, Yi L wrote:

> > Other than this and the couple other comments, the series looks ok to
> > me.  We still need acks from Jason for iommufd on 3-5.  Thanks,
> 
> Thanks, perhaps one more version after getting feedback from Jason.

Yes, perhaps I can get to it today.

Jason

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH v7 4/9] iommufd: Add iommufd_ctx_has_group()
  2023-06-02 12:15   ` [Intel-gfx] " Yi Liu
@ 2023-06-13 11:46     ` Jason Gunthorpe
  -1 siblings, 0 replies; 77+ messages in thread
From: Jason Gunthorpe @ 2023-06-13 11:46 UTC (permalink / raw)
  To: Yi Liu
  Cc: alex.williamson, kevin.tian, joro, robin.murphy, cohuck,
	eric.auger, nicolinc, kvm, mjrosato, chao.p.peng, yi.y.sun,
	peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, intel-gvt-dev, intel-gfx, linux-s390,
	xudong.hao, yan.y.zhao, terrence.xu, yanting.jiang,
	zhenzhong.duan, clegoate

On Fri, Jun 02, 2023 at 05:15:10AM -0700, Yi Liu wrote:
> Add a helper to check whether any device within the given iommu_group
> has been bound to the iommufd_ctx. This helps ownership checking for
> devices that have not been bound themselves but cannot be bound to any
> other iommufd_ctx because their iommu_group is already bound.
> 
> Tested-by: Terrence Xu <terrence.xu@intel.com>
> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> ---
>  drivers/iommu/iommufd/device.c | 30 ++++++++++++++++++++++++++++++
>  include/linux/iommufd.h        |  8 ++++++++
>  2 files changed, 38 insertions(+)

Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>

Jason

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH v7 3/9] iommufd: Reserve all negative IDs in the iommufd xarray
  2023-06-02 12:15   ` [Intel-gfx] " Yi Liu
@ 2023-06-13 11:46     ` Jason Gunthorpe
  -1 siblings, 0 replies; 77+ messages in thread
From: Jason Gunthorpe @ 2023-06-13 11:46 UTC (permalink / raw)
  To: Yi Liu
  Cc: alex.williamson, kevin.tian, joro, robin.murphy, cohuck,
	eric.auger, nicolinc, kvm, mjrosato, chao.p.peng, yi.y.sun,
	peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, intel-gvt-dev, intel-gfx, linux-s390,
	xudong.hao, yan.y.zhao, terrence.xu, yanting.jiang,
	zhenzhong.duan, clegoate

On Fri, Jun 02, 2023 at 05:15:09AM -0700, Yi Liu wrote:
> With this reservation, IOMMUFD users can encode negative IDs for
> specific purposes, e.g. VFIO needs two reserved values to tell userspace
> that the returned ID is not valid but has another meaning.
> 
> Tested-by: Terrence Xu <terrence.xu@intel.com>
> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> ---
>  drivers/iommu/iommufd/main.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>

Jason

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH v7 8/9] vfio/pci: Extend VFIO_DEVICE_GET_PCI_HOT_RESET_INFO for vfio device cdev
  2023-06-02 12:15   ` [Intel-gfx] " Yi Liu
@ 2023-06-13 11:46     ` Jason Gunthorpe
  -1 siblings, 0 replies; 77+ messages in thread
From: Jason Gunthorpe @ 2023-06-13 11:46 UTC (permalink / raw)
  To: Yi Liu
  Cc: alex.williamson, kevin.tian, joro, robin.murphy, cohuck,
	eric.auger, nicolinc, kvm, mjrosato, chao.p.peng, yi.y.sun,
	peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, intel-gvt-dev, intel-gfx, linux-s390,
	xudong.hao, yan.y.zhao, terrence.xu, yanting.jiang,
	zhenzhong.duan, clegoate

On Fri, Jun 02, 2023 at 05:15:14AM -0700, Yi Liu wrote:
> +/*
> + * Return devid for a device which is affected by hot-reset.
> + * - valid devid > 0 for the device that is bound to the input
> + *   iommufd_ctx.
> + * - devid == VFIO_PCI_DEVID_OWNED for the device that has not
> + *   been bound to any iommufd_ctx but other device within its
> + *   group has been bound to the input iommufd_ctx.
> + * - devid == VFIO_PCI_DEVID_NOT_OWNED for others. e.g. device
> + *   is bound to other iommufd_ctx etc.
> + */
> +int vfio_iommufd_device_hot_reset_devid(struct vfio_device *vdev,
> +					struct iommufd_ctx *ictx)
> +{
> +	struct iommu_group *group;
> +	int devid;
> +
> +	if (vfio_iommufd_device_ictx(vdev) == ictx)
> +		return vfio_iommufd_device_id(vdev);
> +
> +	group = iommu_group_get(vdev->dev);
> +	if (!group)
> +		return VFIO_PCI_DEVID_NOT_OWNED;
> +
> +	if (iommufd_ctx_has_group(ictx, group))
> +		devid = VFIO_PCI_DEVID_OWNED;
> +	else
> +		devid = VFIO_PCI_DEVID_NOT_OWNED;
> +
> +	iommu_group_put(group);
> +
> +	return devid;
> +}
> +EXPORT_SYMBOL_GPL(vfio_iommufd_device_hot_reset_devid);

This function really should not be in the core iommufd.c file - it is
a purely vfio-pci function - why did you have to place it here?

Jason

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH v7 5/9] iommufd: Add helper to retrieve iommufd_ctx and devid
  2023-06-02 12:15   ` [Intel-gfx] " Yi Liu
@ 2023-06-13 11:47     ` Jason Gunthorpe
  -1 siblings, 0 replies; 77+ messages in thread
From: Jason Gunthorpe @ 2023-06-13 11:47 UTC (permalink / raw)
  To: Yi Liu
  Cc: alex.williamson, kevin.tian, joro, robin.murphy, cohuck,
	eric.auger, nicolinc, kvm, mjrosato, chao.p.peng, yi.y.sun,
	peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, intel-gvt-dev, intel-gfx, linux-s390,
	xudong.hao, yan.y.zhao, terrence.xu, yanting.jiang,
	zhenzhong.duan, clegoate

On Fri, Jun 02, 2023 at 05:15:11AM -0700, Yi Liu wrote:
> This is needed by the vfio-pci driver to report affected devices in the
> hot-reset for a given device.
> 
> Tested-by: Terrence Xu <terrence.xu@intel.com>
> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> ---
>  drivers/iommu/iommufd/device.c | 12 ++++++++++++
>  include/linux/iommufd.h        |  3 +++
>  2 files changed, 15 insertions(+)

Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>

Jason

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [Intel-gfx] [PATCH v7 7/9] vfio: Add helper to search vfio_device in a dev_set
  2023-06-02 12:15   ` [Intel-gfx] " Yi Liu
@ 2023-06-13 11:47     ` Jason Gunthorpe
  -1 siblings, 0 replies; 77+ messages in thread
From: Jason Gunthorpe @ 2023-06-13 11:47 UTC (permalink / raw)
  To: Yi Liu
  Cc: mjrosato, jasowang, xudong.hao, zhenzhong.duan, peterx,
	terrence.xu, chao.p.peng, linux-s390, kvm, lulu, yanting.jiang,
	joro, nicolinc, kevin.tian, yan.y.zhao, intel-gfx, eric.auger,
	intel-gvt-dev, yi.y.sun, clegoate, cohuck,
	shameerali.kolothum.thodi, suravee.suthikulpanit, robin.murphy

On Fri, Jun 02, 2023 at 05:15:13AM -0700, Yi Liu wrote:
> Some drivers, e.g. vfio-pci, need to search for a vfio_device within a
> given dev_set, so add a helper.
> 
> vfio_pci_is_device_in_set() has returned -EBUSY since commit a882c16a2b7e
> ("vfio/pci: Change vfio_pci_try_bus_reset() to use the dev_set"), which
> was trying to preserve the return of vfio_pci_try_zap_and_vma_lock_cb().
> However, it makes more sense to return -ENODEV.
> 
> Suggested-by: Alex Williamson <alex.williamson@redhat.com>
> Tested-by: Terrence Xu <terrence.xu@intel.com>
> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> ---
>  drivers/vfio/pci/vfio_pci_core.c |  6 +-----
>  drivers/vfio/vfio_main.c         | 15 +++++++++++++++
>  include/linux/vfio.h             |  3 +++
>  3 files changed, 19 insertions(+), 5 deletions(-)

Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>

Jason

^ permalink raw reply	[flat|nested] 77+ messages in thread

* RE: [PATCH v7 8/9] vfio/pci: Extend VFIO_DEVICE_GET_PCI_HOT_RESET_INFO for vfio device cdev
  2023-06-13 11:46     ` [Intel-gfx] " Jason Gunthorpe
@ 2023-06-13 12:50       ` Liu, Yi L
  -1 siblings, 0 replies; 77+ messages in thread
From: Liu, Yi L @ 2023-06-13 12:50 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: alex.williamson, Tian, Kevin, joro, robin.murphy, cohuck,
	eric.auger, nicolinc, kvm, mjrosato, chao.p.peng, yi.y.sun,
	peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, intel-gvt-dev, intel-gfx, linux-s390, Hao,
	Xudong, Zhao, Yan Y, Xu, Terrence, Jiang, Yanting, Duan,
	Zhenzhong, clegoate

> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Tuesday, June 13, 2023 7:47 PM
> 
> On Fri, Jun 02, 2023 at 05:15:14AM -0700, Yi Liu wrote:
> > +/*
> > + * Return devid for a device which is affected by hot-reset.
> > + * - valid devid > 0 for the device that is bound to the input
> > + *   iommufd_ctx.
> > + * - devid == VFIO_PCI_DEVID_OWNED for the device that has not
> > + *   been bound to any iommufd_ctx but other device within its
> > + *   group has been bound to the input iommufd_ctx.
> > + * - devid == VFIO_PCI_DEVID_NOT_OWNED for others. e.g. device
> > + *   is bound to other iommufd_ctx etc.
> > + */
> > +int vfio_iommufd_device_hot_reset_devid(struct vfio_device *vdev,
> > +					struct iommufd_ctx *ictx)
> > +{
> > +	struct iommu_group *group;
> > +	int devid;
> > +
> > +	if (vfio_iommufd_device_ictx(vdev) == ictx)
> > +		return vfio_iommufd_device_id(vdev);
> > +
> > +	group = iommu_group_get(vdev->dev);
> > +	if (!group)
> > +		return VFIO_PCI_DEVID_NOT_OWNED;
> > +
> > +	if (iommufd_ctx_has_group(ictx, group))
> > +		devid = VFIO_PCI_DEVID_OWNED;
> > +	else
> > +		devid = VFIO_PCI_DEVID_NOT_OWNED;
> > +
> > +	iommu_group_put(group);
> > +
> > +	return devid;
> > +}
> > +EXPORT_SYMBOL_GPL(vfio_iommufd_device_hot_reset_devid);
> 
> This function really should not be in the core iommufd.c file - it is
> a purely vfio-pci function - why did you have to place it here?

Putting it here avoids calling iommufd_ctx_has_group() in vfio-pci,
which would require importing IOMMUFD_NS there. If that reason is not
strong enough, I can move it back to the vfio-pci code.

Regards,
Yi Liu


^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [Intel-gfx] [PATCH v7 8/9] vfio/pci: Extend VFIO_DEVICE_GET_PCI_HOT_RESET_INFO for vfio device cdev
  2023-06-13 12:50       ` [Intel-gfx] " Liu, Yi L
@ 2023-06-13 14:32         ` Alex Williamson
  -1 siblings, 0 replies; 77+ messages in thread
From: Alex Williamson @ 2023-06-13 14:32 UTC (permalink / raw)
  To: Liu, Yi L
  Cc: mjrosato, jasowang, Hao, Xudong, Duan, Zhenzhong, peterx, Xu,
	Terrence, chao.p.peng, linux-s390, kvm, lulu, Jiang, Yanting,
	joro, nicolinc, Jason Gunthorpe, Tian, Kevin, Zhao,  Yan Y,
	intel-gfx, eric.auger, intel-gvt-dev, yi.y.sun, clegoate, cohuck,
	shameerali.kolothum.thodi, suravee.suthikulpanit, robin.murphy

On Tue, 13 Jun 2023 12:50:43 +0000
"Liu, Yi L" <yi.l.liu@intel.com> wrote:

> > From: Jason Gunthorpe <jgg@nvidia.com>
> > Sent: Tuesday, June 13, 2023 7:47 PM
> > 
> > On Fri, Jun 02, 2023 at 05:15:14AM -0700, Yi Liu wrote:  
> > > +/*
> > > + * Return devid for a device which is affected by hot-reset.
> > > + * - valid devid > 0 for the device that is bound to the input
> > > + *   iommufd_ctx.
> > > + * - devid == VFIO_PCI_DEVID_OWNED for the device that has not
> > > + *   been bound to any iommufd_ctx but other device within its
> > > + *   group has been bound to the input iommufd_ctx.
> > > + * - devid == VFIO_PCI_DEVID_NOT_OWNED for others. e.g. device
> > > + *   is bound to other iommufd_ctx etc.
> > > + */
> > > +int vfio_iommufd_device_hot_reset_devid(struct vfio_device *vdev,
> > > +					struct iommufd_ctx *ictx)
> > > +{
> > > +	struct iommu_group *group;
> > > +	int devid;
> > > +
> > > +	if (vfio_iommufd_device_ictx(vdev) == ictx)
> > > +		return vfio_iommufd_device_id(vdev);
> > > +
> > > +	group = iommu_group_get(vdev->dev);
> > > +	if (!group)
> > > +		return VFIO_PCI_DEVID_NOT_OWNED;
> > > +
> > > +	if (iommufd_ctx_has_group(ictx, group))
> > > +		devid = VFIO_PCI_DEVID_OWNED;
> > > +	else
> > > +		devid = VFIO_PCI_DEVID_NOT_OWNED;
> > > +
> > > +	iommu_group_put(group);
> > > +
> > > +	return devid;
> > > +}
> > > +EXPORT_SYMBOL_GPL(vfio_iommufd_device_hot_reset_devid);  
> > 
> > This function really should not be in the core iommufd.c file - it is
> > a purely vfio-pci function - why did you have to place it here?  
> 
> Put it here can avoid calling iommufd_ctx_has_group() in vfio-pci,
> which requires to import IOMMUFD_NS. If this reason is not so
> strong I can move it back to vfio-pci code.

The PCI-isms here are the name of the function and the return value,
otherwise this is simply a "give me the devid for this device in this
context".  The function name is trivial to change and we can define the
internal errno usage such that the vfio-pci-core code can interpret the
correct uAPI value.  For example, -EXDEV ("Cross-device link") could
maybe be interpreted as owned and any other errno is not-owned, -ENODEV
maybe being the default.
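
As a rough illustration of the errno convention floated above, the
translation on the vfio-pci-core side might look like the sketch below.
Everything here is an assumption for discussion: hot_reset_uapi_devid() is a
hypothetical name, the two VFIO_PCI_DEVID_* values are defined locally for
the example and should be checked against include/uapi/linux/vfio.h, and the
-EXDEV/-ENODEV choice is only the suggestion made in this mail, not a
settled interface.

```c
#include <errno.h>

/* Assumed values, defined locally for illustration only */
#define VFIO_PCI_DEVID_OWNED		0
#define VFIO_PCI_DEVID_NOT_OWNED	(-1)

/*
 * Hypothetical helper: map the return of a generic "devid for this
 * device in this context" lookup to the value reported to userspace.
 *   ret > 0    device is bound to the context, ret is its devid
 *   -EXDEV     device is not bound, but its group is owned by the context
 *   otherwise  not owned (-ENODEV would be the usual case)
 */
static int hot_reset_uapi_devid(int ret)
{
	if (ret > 0)
		return ret;

	if (ret == -EXDEV)
		return VFIO_PCI_DEVID_OWNED;

	return VFIO_PCI_DEVID_NOT_OWNED;
}
```

With a mapping like this the generic lookup carries no PCI-specific return
values; only the caller knows about the uAPI encoding.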

Errno values are often contentious, are there other suggestions?
Thanks,

Alex


^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH v7 8/9] vfio/pci: Extend VFIO_DEVICE_GET_PCI_HOT_RESET_INFO for vfio device cdev
  2023-06-13 14:32         ` Alex Williamson
@ 2023-06-13 17:40           ` Jason Gunthorpe
  -1 siblings, 0 replies; 77+ messages in thread
From: Jason Gunthorpe @ 2023-06-13 17:40 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Liu, Yi L, Tian, Kevin, joro, robin.murphy, cohuck, eric.auger,
	nicolinc, kvm, mjrosato, chao.p.peng, yi.y.sun, peterx, jasowang,
	shameerali.kolothum.thodi, lulu, suravee.suthikulpanit,
	intel-gvt-dev, intel-gfx, linux-s390, Hao, Xudong, Zhao, Yan Y,
	Xu, Terrence, Jiang, Yanting, Duan, Zhenzhong, clegoate

On Tue, Jun 13, 2023 at 08:32:29AM -0600, Alex Williamson wrote:
> On Tue, 13 Jun 2023 12:50:43 +0000
> "Liu, Yi L" <yi.l.liu@intel.com> wrote:
> 
> > > From: Jason Gunthorpe <jgg@nvidia.com>
> > > Sent: Tuesday, June 13, 2023 7:47 PM
> > > 
> > > On Fri, Jun 02, 2023 at 05:15:14AM -0700, Yi Liu wrote:  
> > > > +/*
> > > > + * Return devid for a device which is affected by hot-reset.
> > > > + * - valid devid > 0 for the device that is bound to the input
> > > > + *   iommufd_ctx.
> > > > + * - devid == VFIO_PCI_DEVID_OWNED for the device that has not
> > > > + *   been bound to any iommufd_ctx but other device within its
> > > > + *   group has been bound to the input iommufd_ctx.
> > > > + * - devid == VFIO_PCI_DEVID_NOT_OWNED for others. e.g. device
> > > > + *   is bound to other iommufd_ctx etc.
> > > > + */
> > > > +int vfio_iommufd_device_hot_reset_devid(struct vfio_device *vdev,
> > > > +					struct iommufd_ctx *ictx)
> > > > +{
> > > > +	struct iommu_group *group;
> > > > +	int devid;
> > > > +
> > > > +	if (vfio_iommufd_device_ictx(vdev) == ictx)
> > > > +		return vfio_iommufd_device_id(vdev);
> > > > +
> > > > +	group = iommu_group_get(vdev->dev);
> > > > +	if (!group)
> > > > +		return VFIO_PCI_DEVID_NOT_OWNED;
> > > > +
> > > > +	if (iommufd_ctx_has_group(ictx, group))
> > > > +		devid = VFIO_PCI_DEVID_OWNED;
> > > > +	else
> > > > +		devid = VFIO_PCI_DEVID_NOT_OWNED;
> > > > +
> > > > +	iommu_group_put(group);
> > > > +
> > > > +	return devid;
> > > > +}
> > > > +EXPORT_SYMBOL_GPL(vfio_iommufd_device_hot_reset_devid);  
> > > 
> > > This function really should not be in the core iommufd.c file - it is
> > > a purely vfio-pci function - why did you have to place it here?  
> > 
> > Put it here can avoid calling iommufd_ctx_has_group() in vfio-pci,
> > which requires to import IOMMUFD_NS. If this reason is not so
> > strong I can move it back to vfio-pci code.
> 
> The PCI-isms here are the name of the function and the return value,
> otherwise this is simply a "give me the devid for this device in this
> context".  The function name is trivial to change and we can define the
> internal errno usage such that the vfio-pci-core code can interpret the
> correct uAPI value.  For example, -EXDEV ("Cross-device link") could
> maybe be interpreted as owned and any other errno is not-owned, -ENODEV
> maybe being the default.

Yeah, this approach seems logical

If the function is called

  vfio_iommufd_get_dev_id(struct vfio_device *vdev, struct iommufd_ctx *ictx)

Then maybe
  ENOENT = device is owned but there is no ID
  ENODEV = device is not owned

EXDEV is good too, nice symmetry with ENODEV - it doesn't really
matter since there is only one caller and there is no embedded errno
propagation.

Jason
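
A minimal userspace sketch of the errno convention discussed above, assuming
the ENOENT/ENODEV variant. mock_get_dev_id() and hot_reset_devid_from_ret()
are hypothetical names, and the two sentinel values are assumptions made for
illustration (the quoted comment only states that a valid devid is > 0):

```c
#include <assert.h>
#include <errno.h>

/* Assumed sentinel values for this sketch; not taken from the thread. */
#define VFIO_PCI_DEVID_OWNED		0
#define VFIO_PCI_DEVID_NOT_OWNED	-1

/*
 * Hypothetical stand-in for vfio_iommufd_get_dev_id(). The real helper would
 * take a vfio_device and an iommufd_ctx; plain flags model the three outcomes
 * here so the mapping can be exercised outside the kernel.
 */
static int mock_get_dev_id(int bound_to_ctx, int id, int group_in_ctx)
{
	if (bound_to_ctx)
		return id;		/* valid devid, > 0 */
	if (group_in_ctx)
		return -ENOENT;		/* owned, but no ID in this context */
	return -ENODEV;			/* not owned */
}

/* vfio-pci side: translate the internal return value into a uAPI devid. */
static int hot_reset_devid_from_ret(int ret)
{
	if (ret > 0)
		return ret;
	if (ret == -ENOENT)
		return VFIO_PCI_DEVID_OWNED;
	/* -ENODEV and anything unexpected are reported as not owned. */
	return VFIO_PCI_DEVID_NOT_OWNED;
}

/* One check per case named in the discussion. */
static void devid_mapping_example(void)
{
	assert(hot_reset_devid_from_ret(mock_get_dev_id(1, 7, 1)) == 7);
	assert(hot_reset_devid_from_ret(mock_get_dev_id(0, 0, 1)) ==
	       VFIO_PCI_DEVID_OWNED);
	assert(hot_reset_devid_from_ret(mock_get_dev_id(0, 0, 0)) ==
	       VFIO_PCI_DEVID_NOT_OWNED);
}
```

Treating every unrecognised errno as not-owned keeps the single caller
conservative: an unexpected failure can only clear the owned state, never
grant it.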

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH v7 6/9] vfio: Mark cdev usage in vfio_device
  2023-06-02 12:15   ` [Intel-gfx] " Yi Liu
@ 2023-06-13 17:56     ` Jason Gunthorpe
  -1 siblings, 0 replies; 77+ messages in thread
From: Jason Gunthorpe @ 2023-06-13 17:56 UTC (permalink / raw)
  To: Yi Liu
  Cc: alex.williamson, kevin.tian, joro, robin.murphy, cohuck,
	eric.auger, nicolinc, kvm, mjrosato, chao.p.peng, yi.y.sun,
	peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, intel-gvt-dev, intel-gfx, linux-s390,
	xudong.hao, yan.y.zhao, terrence.xu, yanting.jiang,
	zhenzhong.duan, clegoate

On Fri, Jun 02, 2023 at 05:15:12AM -0700, Yi Liu wrote:
> This can be used to differentiate whether to report group_id or devid in
> the revised VFIO_DEVICE_GET_PCI_HOT_RESET_INFO ioctl. At this moment, no
> cdev path yet, so the vfio_device_cdev_opened() helper always returns false.
> 
> Reviewed-by: Kevin Tian <kevin.tian@intel.com>
> Tested-by: Terrence Xu <terrence.xu@intel.com>
> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> ---
>  include/linux/vfio.h | 5 +++++
>  1 file changed, 5 insertions(+)
> 
> diff --git a/include/linux/vfio.h b/include/linux/vfio.h
> index 2c137ea94a3e..2a45853773a6 100644
> --- a/include/linux/vfio.h
> +++ b/include/linux/vfio.h
> @@ -139,6 +139,11 @@ int vfio_iommufd_emulated_attach_ioas(struct vfio_device *vdev, u32 *pt_id);
>  	((int (*)(struct vfio_device *vdev, u32 *pt_id)) NULL)
>  #endif
>  
> +static inline bool vfio_device_cdev_opened(struct vfio_device *device)
> +{
> +	return false;
> +}

This and the two hunks in the other two patches that use this function
should be folded into the cdev series, probably just flattened to one
patch

Jason

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH v7 9/9] vfio/pci: Allow passing zero-length fd array in VFIO_DEVICE_PCI_HOT_RESET
  2023-06-02 12:15   ` [Intel-gfx] " Yi Liu
@ 2023-06-13 18:09     ` Jason Gunthorpe
  -1 siblings, 0 replies; 77+ messages in thread
From: Jason Gunthorpe @ 2023-06-13 18:09 UTC (permalink / raw)
  To: Yi Liu
  Cc: alex.williamson, kevin.tian, joro, robin.murphy, cohuck,
	eric.auger, nicolinc, kvm, mjrosato, chao.p.peng, yi.y.sun,
	peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, intel-gvt-dev, intel-gfx, linux-s390,
	xudong.hao, yan.y.zhao, terrence.xu, yanting.jiang,
	zhenzhong.duan, clegoate

On Fri, Jun 02, 2023 at 05:15:15AM -0700, Yi Liu wrote:
> This is the way user to invoke hot-reset for the devices opened by cdev
> interface. User should check the flag VFIO_PCI_HOT_RESET_FLAG_DEV_ID_OWNED
> in the output of VFIO_DEVICE_GET_PCI_HOT_RESET_INFO ioctl before doing
> hot-reset for cdev devices.
> 
> Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
> Tested-by: Yanting Jiang <yanting.jiang@intel.com>
> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> ---
>  drivers/vfio/pci/vfio_pci_core.c | 61 ++++++++++++++++++++++++++------
>  include/uapi/linux/vfio.h        | 14 ++++++++
>  2 files changed, 64 insertions(+), 11 deletions(-)

This looks OK but it should be in the cdev series..

Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>

Jason
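
The userspace check described in the quoted commit message could look roughly
like the sketch below. It is not code from the series: the struct is a
trimmed-down stand-in for the header of struct vfio_pci_hot_reset_info, the
bit positions of the two flags are assumptions to be checked against
include/uapi/linux/vfio.h, and can_hot_reset_without_fds() is a hypothetical
helper name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit positions for this sketch only. */
#define VFIO_PCI_HOT_RESET_FLAG_DEV_ID		(1u << 0)
#define VFIO_PCI_HOT_RESET_FLAG_DEV_ID_OWNED	(1u << 1)

/* Stand-in for the fixed part of the _INFO ioctl output. */
struct hot_reset_info_hdr {
	uint32_t argsz;
	uint32_t flags;
	uint32_t count;
};

/*
 * Return true if the _INFO output says a zero-length fd array may be used
 * for the hot reset: devids (not group IDs) were reported and every
 * affected device is owned.
 */
static bool can_hot_reset_without_fds(const struct hot_reset_info_hdr *hdr)
{
	if (!(hdr->flags & VFIO_PCI_HOT_RESET_FLAG_DEV_ID))
		return false;	/* group IDs reported: use the group fd path */

	return (hdr->flags & VFIO_PCI_HOT_RESET_FLAG_DEV_ID_OWNED) != 0;
}

/* The OWNED flag only counts when the DEV_ID flag is also set. */
static void hot_reset_flag_example(void)
{
	struct hot_reset_info_hdr hdr = { .argsz = sizeof(hdr) };

	hdr.flags = VFIO_PCI_HOT_RESET_FLAG_DEV_ID;
	assert(!can_hot_reset_without_fds(&hdr));

	hdr.flags |= VFIO_PCI_HOT_RESET_FLAG_DEV_ID_OWNED;
	assert(can_hot_reset_without_fds(&hdr));
}
```

A caller would run this on the header returned by
VFIO_DEVICE_GET_PCI_HOT_RESET_INFO before issuing VFIO_DEVICE_PCI_HOT_RESET.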

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH v7 8/9] vfio/pci: Extend VFIO_DEVICE_GET_PCI_HOT_RESET_INFO for vfio device cdev
  2023-06-02 12:15   ` [Intel-gfx] " Yi Liu
@ 2023-06-13 18:23     ` Jason Gunthorpe
  -1 siblings, 0 replies; 77+ messages in thread
From: Jason Gunthorpe @ 2023-06-13 18:23 UTC (permalink / raw)
  To: Yi Liu
  Cc: alex.williamson, kevin.tian, joro, robin.murphy, cohuck,
	eric.auger, nicolinc, kvm, mjrosato, chao.p.peng, yi.y.sun,
	peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, intel-gvt-dev, intel-gfx, linux-s390,
	xudong.hao, yan.y.zhao, terrence.xu, yanting.jiang,
	zhenzhong.duan, clegoate

On Fri, Jun 02, 2023 at 05:15:14AM -0700, Yi Liu wrote:
> This allows VFIO_DEVICE_GET_PCI_HOT_RESET_INFO ioctl use the iommufd_ctx
> of the cdev device to check the ownership of the other affected devices.
> 
> When VFIO_DEVICE_GET_PCI_HOT_RESET_INFO is called on an IOMMUFD managed
> device, the new flag VFIO_PCI_HOT_RESET_FLAG_DEV_ID is reported to indicate
> the values returned are IOMMUFD devids rather than group IDs as used when
> accessing vfio devices through the conventional vfio group interface.
> Additionally the flag VFIO_PCI_HOT_RESET_FLAG_DEV_ID_OWNED will be reported
> in this mode if all of the devices affected by the hot-reset are owned by
> either virtue of being directly bound to the same iommufd context as the
> calling device, or implicitly owned via a shared IOMMU group.
> 
> Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
> Suggested-by: Alex Williamson <alex.williamson@redhat.com>
> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> ---
>  drivers/vfio/iommufd.c           | 49 +++++++++++++++++++++++++++++++
>  drivers/vfio/pci/vfio_pci_core.c | 47 +++++++++++++++++++++++++-----
>  include/linux/vfio.h             | 16 ++++++++++
>  include/uapi/linux/vfio.h        | 50 +++++++++++++++++++++++++++++++-
>  4 files changed, 154 insertions(+), 8 deletions(-)

This could use some more fiddling, like we could copy each
vfio_pci_dependent_device to user memory inside the loop instead of
allocating an array.

Add another patch with something like this in it:

diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
index b0eadafcbcf502..516e0fda74bec9 100644
--- a/drivers/vfio/pci/vfio_pci_core.c
+++ b/drivers/vfio/pci/vfio_pci_core.c
@@ -775,19 +775,23 @@ static int vfio_pci_count_devs(struct pci_dev *pdev, void *data)
 }
 
 struct vfio_pci_fill_info {
-	int max;
-	int cur;
-	struct vfio_pci_dependent_device *devices;
+	struct vfio_pci_dependent_device __user *devices;
+	struct vfio_pci_dependent_device __user *devices_end;
 	struct vfio_device *vdev;
 	u32 flags;
 };
 
 static int vfio_pci_fill_devs(struct pci_dev *pdev, void *data)
 {
+	struct vfio_pci_dependent_device info = {
+		.segment = pci_domain_nr(pdev->bus),
+		.bus = pdev->bus->number,
+		.devfn = pdev->devfn,
+	};
 	struct vfio_pci_fill_info *fill = data;
 
-	if (fill->cur == fill->max)
-		return -EAGAIN; /* Something changed, try again */
+	if (fill->devices >= fill->devices_end)
+		return -ENOSPC;
 
 	if (fill->flags & VFIO_PCI_HOT_RESET_FLAG_DEV_ID) {
 		struct iommufd_ctx *iommufd = vfio_iommufd_device_ictx(fill->vdev);
@@ -800,12 +804,12 @@ static int vfio_pci_fill_devs(struct pci_dev *pdev, void *data)
 		 */
 		vdev = vfio_find_device_in_devset(dev_set, &pdev->dev);
 		if (!vdev)
-			fill->devices[fill->cur].devid = VFIO_PCI_DEVID_NOT_OWNED;
+			info.devid = VFIO_PCI_DEVID_NOT_OWNED;
 		else
-			fill->devices[fill->cur].devid =
-				vfio_iommufd_device_hot_reset_devid(vdev, iommufd);
+			info.devid = vfio_iommufd_device_hot_reset_devid(
+				vdev, iommufd);
 		/* If devid is VFIO_PCI_DEVID_NOT_OWNED, clear owned flag. */
-		if (fill->devices[fill->cur].devid == VFIO_PCI_DEVID_NOT_OWNED)
+		if (info.devid == VFIO_PCI_DEVID_NOT_OWNED)
 			fill->flags &= ~VFIO_PCI_HOT_RESET_FLAG_DEV_ID_OWNED;
 	} else {
 		struct iommu_group *iommu_group;
@@ -814,13 +818,13 @@ static int vfio_pci_fill_devs(struct pci_dev *pdev, void *data)
 		if (!iommu_group)
 			return -EPERM; /* Cannot reset non-isolated devices */
 
-		fill->devices[fill->cur].group_id = iommu_group_id(iommu_group);
+		info.group_id = iommu_group_id(iommu_group);
 		iommu_group_put(iommu_group);
 	}
-	fill->devices[fill->cur].segment = pci_domain_nr(pdev->bus);
-	fill->devices[fill->cur].bus = pdev->bus->number;
-	fill->devices[fill->cur].devfn = pdev->devfn;
-	fill->cur++;
+
+	if (copy_to_user(fill->devices, &info, sizeof(info)))
+		return -EFAULT;
+	fill->devices++;
 	return 0;
 }
 
@@ -1212,8 +1216,7 @@ static int vfio_pci_ioctl_get_pci_hot_reset_info(
 	unsigned long minsz =
 		offsetofend(struct vfio_pci_hot_reset_info, count);
 	struct vfio_pci_hot_reset_info hdr;
-	struct vfio_pci_fill_info fill = { 0 };
-	struct vfio_pci_dependent_device *devices = NULL;
+	struct vfio_pci_fill_info fill = {};
 	bool slot = false;
 	int ret = 0;
 
@@ -1231,29 +1234,9 @@ static int vfio_pci_ioctl_get_pci_hot_reset_info(
 	else if (pci_probe_reset_bus(vdev->pdev->bus))
 		return -ENODEV;
 
-	/* How many devices are affected? */
-	ret = vfio_pci_for_each_slot_or_bus(vdev->pdev, vfio_pci_count_devs,
-					    &fill.max, slot);
-	if (ret)
-		return ret;
-
-	WARN_ON(!fill.max); /* Should always be at least one */
-
-	/*
-	 * If there's enough space, fill it now, otherwise return -ENOSPC and
-	 * the number of devices affected.
-	 */
-	if (hdr.argsz < sizeof(hdr) + (fill.max * sizeof(*devices))) {
-		ret = -ENOSPC;
-		hdr.count = fill.max;
-		goto reset_info_exit;
-	}
-
-	devices = kcalloc(fill.max, sizeof(*devices), GFP_KERNEL);
-	if (!devices)
-		return -ENOMEM;
-
-	fill.devices = devices;
+	fill.devices = arg->devices;
+	fill.devices_end = arg->devices +
+			   (hdr.argsz - sizeof(hdr)) / sizeof(arg->devices[0]);
 	fill.vdev = &vdev->vdev;
 
 	if (vfio_device_cdev_opened(&vdev->vdev))
@@ -1264,29 +1247,14 @@ static int vfio_pci_ioctl_get_pci_hot_reset_info(
 	ret = vfio_pci_for_each_slot_or_bus(vdev->pdev, vfio_pci_fill_devs,
 					    &fill, slot);
 	mutex_unlock(&vdev->vdev.dev_set->lock);
+	if (ret)
+		return ret;
 
-	/*
-	 * If a device was removed between counting and filling, we may come up
-	 * short of fill.max.  If a device was added, we'll have a return of
-	 * -EAGAIN above.
-	 */
-	if (!ret) {
-		hdr.count = fill.cur;
-		hdr.flags = fill.flags;
-	}
-
-reset_info_exit:
+	hdr.count = fill.devices - arg->devices;
+	hdr.flags = fill.flags;
 	if (copy_to_user(arg, &hdr, minsz))
 		ret = -EFAULT;
-
-	if (!ret) {
-		if (copy_to_user(&arg->devices, devices,
-				 hdr.count * sizeof(*devices)))
-			ret = -EFAULT;
-	}
-
-	kfree(devices);
 	return ret;
 }
 
 static int
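
The cursor/end-pointer pattern this suggestion relies on can be tried outside
the kernel with the sketch below. It is a simplified model, not the proposed
patch: struct dep_dev stands in for struct vfio_pci_dependent_device, memcpy()
stands in for copy_to_user(), and fill_one()/fill_all() are hypothetical names.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for struct vfio_pci_dependent_device. */
struct dep_dev {
	int devid;
	unsigned short segment;
	unsigned char bus;
	unsigned char devfn;
};

/* Cursor and one-past-the-end pointer into the caller's buffer. */
struct fill_info {
	struct dep_dev *devices;
	struct dep_dev *devices_end;
};

/* Per-device callback: emit one entry, or fail once the buffer is full. */
static int fill_one(struct fill_info *fill, const struct dep_dev *info)
{
	if (fill->devices >= fill->devices_end)
		return -ENOSPC;

	/* The kernel version would copy_to_user() and return -EFAULT on error. */
	memcpy(fill->devices, info, sizeof(*info));
	fill->devices++;
	return 0;
}

/* Walk n source entries; return the number written or a negative errno. */
static int fill_all(struct dep_dev *buf, size_t cap,
		    const struct dep_dev *src, size_t n)
{
	struct fill_info fill = { .devices = buf, .devices_end = buf + cap };
	size_t i;
	int ret;

	for (i = 0; i < n; i++) {
		ret = fill_one(&fill, &src[i]);
		if (ret)
			return ret;
	}
	/* The count falls out of the cursor position; no separate counter. */
	return (int)(fill.devices - buf);
}

/* Two entries fit in a two-slot buffer but not in a one-slot buffer. */
static void fill_example(void)
{
	struct dep_dev src[2] = { { .devid = 1 }, { .devid = 2, .devfn = 8 } };
	struct dep_dev buf[2];

	assert(fill_all(buf, 2, src, 2) == 2);
	assert(fill_all(buf, 1, src, 2) == -ENOSPC);
}
```

Because entries are written as they are discovered, no separate counting pass
or temporary array is needed, and a short buffer shows up as -ENOSPC at the
first entry that does not fit.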

^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [Intel-gfx] ✗ Fi.CI.BUILD: failure for Enhance vfio PCI hot reset for vfio cdev device (rev6)
  2023-06-02 12:15 ` [Intel-gfx] " Yi Liu
                   ` (13 preceding siblings ...)
  (?)
@ 2023-06-13 20:47 ` Patchwork
  -1 siblings, 0 replies; 77+ messages in thread
From: Patchwork @ 2023-06-13 20:47 UTC (permalink / raw)
  To: Jason Gunthorpe; +Cc: intel-gfx

== Series Details ==

Series: Enhance vfio PCI hot reset for vfio cdev device (rev6)
URL   : https://patchwork.freedesktop.org/series/116991/
State : failure

== Summary ==

Error: patch https://patchwork.freedesktop.org/api/1.0/series/116991/revisions/6/mbox/ not applied
Applying: vfio/pci: Update comment around group_fd get in vfio_pci_ioctl_pci_hot_reset()
Applying: vfio/pci: Move the existing hot reset logic to be a helper
Applying: iommufd: Reserve all negative IDs in the iommufd xarray
Applying: iommufd: Add iommufd_ctx_has_group()
Applying: iommufd: Add helper to retrieve iommufd_ctx and devid
Applying: vfio: Mark cdev usage in vfio_device
Applying: vfio: Add helper to search vfio_device in a dev_set
Applying: vfio/pci: Extend VFIO_DEVICE_GET_PCI_HOT_RESET_INFO for vfio device cdev
Using index info to reconstruct a base tree...
M	drivers/vfio/pci/vfio_pci_core.c
Falling back to patching base and 3-way merge...
Auto-merging drivers/vfio/pci/vfio_pci_core.c
CONFLICT (content): Merge conflict in drivers/vfio/pci/vfio_pci_core.c
error: Failed to merge in the changes.
hint: Use 'git am --show-current-patch=diff' to see the failed patch
Patch failed at 0008 vfio/pci: Extend VFIO_DEVICE_GET_PCI_HOT_RESET_INFO for vfio device cdev
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".
Build failed, no error log produced



^ permalink raw reply	[flat|nested] 77+ messages in thread

* RE: [PATCH v7 6/9] vfio: Mark cdev usage in vfio_device
  2023-06-13 17:56     ` [Intel-gfx] " Jason Gunthorpe
@ 2023-06-14  5:56       ` Liu, Yi L
  -1 siblings, 0 replies; 77+ messages in thread
From: Liu, Yi L @ 2023-06-14  5:56 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: alex.williamson, Tian, Kevin, joro, robin.murphy, cohuck,
	eric.auger, nicolinc, kvm, mjrosato, chao.p.peng, yi.y.sun,
	peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, intel-gvt-dev, intel-gfx, linux-s390, Hao,
	Xudong, Zhao, Yan Y, Xu, Terrence, Jiang, Yanting, Duan,
	Zhenzhong, clegoate

> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Wednesday, June 14, 2023 1:56 AM
> 
> On Fri, Jun 02, 2023 at 05:15:12AM -0700, Yi Liu wrote:
> > This can be used to differentiate whether to report group_id or devid in
> > the revised VFIO_DEVICE_GET_PCI_HOT_RESET_INFO ioctl. At this moment, no
> > cdev path yet, so the vfio_device_cdev_opened() helper always returns false.
> >
> > Reviewed-by: Kevin Tian <kevin.tian@intel.com>
> > Tested-by: Terrence Xu <terrence.xu@intel.com>
> > Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> > ---
> >  include/linux/vfio.h | 5 +++++
> >  1 file changed, 5 insertions(+)
> >
> > diff --git a/include/linux/vfio.h b/include/linux/vfio.h
> > index 2c137ea94a3e..2a45853773a6 100644
> > --- a/include/linux/vfio.h
> > +++ b/include/linux/vfio.h
> > @@ -139,6 +139,11 @@ int vfio_iommufd_emulated_attach_ioas(struct vfio_device *vdev, u32 *pt_id);
> >  	((int (*)(struct vfio_device *vdev, u32 *pt_id)) NULL)
> >  #endif
> >
> > +static inline bool vfio_device_cdev_opened(struct vfio_device *device)
> > +{
> > +	return false;
> > +}
> 
> This and the two hunks in the other two patches that use this function
> should be folded into the cdev series, probably just flattened to one
> patch

Hmm, I have a doubt about that rule. I think the reason for having this
sub-series is to avoid bumping the cdev series, so perhaps we can keep
this patch and patch 9 in this series? Otherwise, most of this series
would need to move into the cdev series.

Regards,
Yi Liu

^ permalink raw reply	[flat|nested] 77+ messages in thread

* RE: [PATCH v7 8/9] vfio/pci: Extend VFIO_DEVICE_GET_PCI_HOT_RESET_INFO for vfio device cdev
  2023-06-13 18:23     ` [Intel-gfx] " Jason Gunthorpe
@ 2023-06-14 10:35       ` Liu, Yi L
  -1 siblings, 0 replies; 77+ messages in thread
From: Liu, Yi L @ 2023-06-14 10:35 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: alex.williamson, Tian, Kevin, joro, robin.murphy, cohuck,
	eric.auger, nicolinc, kvm, mjrosato, chao.p.peng, yi.y.sun,
	peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, intel-gvt-dev, intel-gfx, linux-s390, Hao,
	Xudong, Zhao, Yan Y, Xu, Terrence, Jiang, Yanting, Duan,
	Zhenzhong, clegoate

> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Wednesday, June 14, 2023 2:23 AM
> 
> On Fri, Jun 02, 2023 at 05:15:14AM -0700, Yi Liu wrote:
> > This allows VFIO_DEVICE_GET_PCI_HOT_RESET_INFO ioctl use the iommufd_ctx
> > of the cdev device to check the ownership of the other affected devices.
> >
> > When VFIO_DEVICE_GET_PCI_HOT_RESET_INFO is called on an IOMMUFD managed
> > device, the new flag VFIO_PCI_HOT_RESET_FLAG_DEV_ID is reported to indicate
> > the values returned are IOMMUFD devids rather than group IDs as used when
> > accessing vfio devices through the conventional vfio group interface.
> > Additionally the flag VFIO_PCI_HOT_RESET_FLAG_DEV_ID_OWNED will be reported
> > in this mode if all of the devices affected by the hot-reset are owned by
> > either virtue of being directly bound to the same iommufd context as the
> > calling device, or implicitly owned via a shared IOMMU group.
> >
> > Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
> > Suggested-by: Alex Williamson <alex.williamson@redhat.com>
> > Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> > ---
> >  drivers/vfio/iommufd.c           | 49 +++++++++++++++++++++++++++++++
> >  drivers/vfio/pci/vfio_pci_core.c | 47 +++++++++++++++++++++++++-----
> >  include/linux/vfio.h             | 16 ++++++++++
> >  include/uapi/linux/vfio.h        | 50 +++++++++++++++++++++++++++++++-
> >  4 files changed, 154 insertions(+), 8 deletions(-)
> 
> This could use some more fiddling, like we could copy each
> vfio_pci_dependent_device to user memory inside the loop instead of
> allocating an array.

I understand the motivation, but I have some concerns. Please see my
comments inline.

> Add another patch with something like this in it:
> 
> diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
> index b0eadafcbcf502..516e0fda74bec9 100644
> --- a/drivers/vfio/pci/vfio_pci_core.c
> +++ b/drivers/vfio/pci/vfio_pci_core.c
> @@ -775,19 +775,23 @@ static int vfio_pci_count_devs(struct pci_dev *pdev, void *data)
>  }
> 
>  struct vfio_pci_fill_info {
> -	int max;
> -	int cur;
> -	struct vfio_pci_dependent_device *devices;
> +	struct vfio_pci_dependent_device __user *devices;
> +	struct vfio_pci_dependent_device __user *devices_end;
>  	struct vfio_device *vdev;
>  	u32 flags;
>  };
> 
>  static int vfio_pci_fill_devs(struct pci_dev *pdev, void *data)
>  {
> +	struct vfio_pci_dependent_device info = {
> +		.segment = pci_domain_nr(pdev->bus),
> +		.bus = pdev->bus->number,
> +		.devfn = pdev->devfn,
> +	};
>  	struct vfio_pci_fill_info *fill = data;
> 
> -	if (fill->cur == fill->max)
> -		return -EAGAIN; /* Something changed, try again */
> +	if (fill->devices_end >= fill->devices)
> +		return -ENOSPC;

This should be fill->devices_end <= fill->devices. Even with that corrected,
the new code no longer returns -EAGAIN. And if it returns -ENOSPC, the
expected size should be returned as well, but I don't see that. With the hunk
below [1] removed, there seems to be no way for the user to learn how much
memory is required.

> 
>  	if (fill->flags & VFIO_PCI_HOT_RESET_FLAG_DEV_ID) {
>  		struct iommufd_ctx *iommufd = vfio_iommufd_device_ictx(fill->vdev);
> @@ -800,12 +804,12 @@ static int vfio_pci_fill_devs(struct pci_dev *pdev, void *data)
>  		 */
>  		vdev = vfio_find_device_in_devset(dev_set, &pdev->dev);
>  		if (!vdev)
> -			fill->devices[fill->cur].devid = VFIO_PCI_DEVID_NOT_OWNED;
> +			info.devid = VFIO_PCI_DEVID_NOT_OWNED;
>  		else
> -			fill->devices[fill->cur].devid =
> -				vfio_iommufd_device_hot_reset_devid(vdev, iommufd);
> +			info.devid = vfio_iommufd_device_hot_reset_devid(
> +				vdev, iommufd);
>  		/* If devid is VFIO_PCI_DEVID_NOT_OWNED, clear owned flag. */
> -		if (fill->devices[fill->cur].devid == VFIO_PCI_DEVID_NOT_OWNED)
> +		if (info.devid == VFIO_PCI_DEVID_NOT_OWNED)
>  			fill->flags &= ~VFIO_PCI_HOT_RESET_FLAG_DEV_ID_OWNED;
>  	} else {
>  		struct iommu_group *iommu_group;
> @@ -814,13 +818,13 @@ static int vfio_pci_fill_devs(struct pci_dev *pdev, void *data)
>  		if (!iommu_group)
>  			return -EPERM; /* Cannot reset non-isolated devices */
> 
> -		fill->devices[fill->cur].group_id = iommu_group_id(iommu_group);
> +		info.group_id = iommu_group_id(iommu_group);
>  		iommu_group_put(iommu_group);
>  	}
> -	fill->devices[fill->cur].segment = pci_domain_nr(pdev->bus);
> -	fill->devices[fill->cur].bus = pdev->bus->number;
> -	fill->devices[fill->cur].devfn = pdev->devfn;
> -	fill->cur++;
> +
> +	if (copy_to_user(fill->devices, &info, sizeof(info)))
> +		return -EFAULT;
> +	fill->devices++;
>  	return 0;
>  }
> 
> @@ -1212,8 +1216,7 @@ static int vfio_pci_ioctl_get_pci_hot_reset_info(
>  	unsigned long minsz =
>  		offsetofend(struct vfio_pci_hot_reset_info, count);
>  	struct vfio_pci_hot_reset_info hdr;
> -	struct vfio_pci_fill_info fill = { 0 };
> -	struct vfio_pci_dependent_device *devices = NULL;
> +	struct vfio_pci_fill_info fill = {};
>  	bool slot = false;
>  	int ret = 0;
> 
> @@ -1231,29 +1234,9 @@ static int vfio_pci_ioctl_get_pci_hot_reset_info(
>  	else if (pci_probe_reset_bus(vdev->pdev->bus))
>  		return -ENODEV;
> 
> -	/* How many devices are affected? */
> -	ret = vfio_pci_for_each_slot_or_bus(vdev->pdev, vfio_pci_count_devs,
> -					    &fill.max, slot);
> -	if (ret)
> -		return ret;
> -
> -	WARN_ON(!fill.max); /* Should always be at least one */
> -
> -	/*
> -	 * If there's enough space, fill it now, otherwise return -ENOSPC and
> -	 * the number of devices affected.
> -	 */
> -	if (hdr.argsz < sizeof(hdr) + (fill.max * sizeof(*devices))) {
> -		ret = -ENOSPC;
> -		hdr.count = fill.max;
> -		goto reset_info_exit;
> -	}

[1] The loop in this hunk figures out how many devices are affected,
      and therefore how much memory is needed.

> -
> -	devices = kcalloc(fill.max, sizeof(*devices), GFP_KERNEL);
> -	if (!devices)
> -		return -ENOMEM;
> -
> -	fill.devices = devices;
> +	fill.devices = arg->devices;
> +	fill.devices_end = arg->devices +
> +			   (hdr.argsz - sizeof(hdr)) / sizeof(arg->devices[0]);
>  	fill.vdev = &vdev->vdev;
> 
>  	if (vfio_device_cdev_opened(&vdev->vdev))
> @@ -1264,29 +1247,14 @@ static int vfio_pci_ioctl_get_pci_hot_reset_info(
>  	ret = vfio_pci_for_each_slot_or_bus(vdev->pdev, vfio_pci_fill_devs,
>  					    &fill, slot);
>  	mutex_unlock(&vdev->vdev.dev_set->lock);
> +	if (ret)
> +		return ret;
> 
> -	/*
> -	 * If a device was removed between counting and filling, we may come up
> -	 * short of fill.max.  If a device was added, we'll have a return of
> -	 * -EAGAIN above.
> -	 */
> -	if (!ret) {
> -		hdr.count = fill.cur;
> -		hdr.flags = fill.flags;
> -	}

This mechanism is removed as well, although the race it handles may be rare.

> -
> -reset_info_exit:
> +	hdr.count = fill.devices - arg->devices;
> +	hdr.flags = fill.flags;
>  	if (copy_to_user(arg, &hdr, minsz))
>  		ret = -EFAULT;
> -
> -	if (!ret) {
> -		if (copy_to_user(&arg->devices, devices,
> -				 hdr.count * sizeof(*devices)))
> -			ret = -EFAULT;
> -	}
> -
> -	kfree(devices);
> -	return ret;
> +	return 0;

This should still return ret, since copy_to_user(arg, &hdr, minsz) can
fail.

>  }
> 
>  static int

It appears to me that there are subtle changes to the uapi here (-ENOSPC,
-EAGAIN). The uapi header doesn't document them, but per the comments in the
code the behavior does change. Maybe we can do this in a follow-up patch
instead of as part of this series.
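
For reference, the userspace side of the existing -ENOSPC behavior can be
sketched roughly as below. This is only a model under stated assumptions:
struct dep_dev, struct reset_info and mock_get_info() are hypothetical
stand-ins for the uapi structures and the ioctl, so that the sizing logic
can be read (and run) without a vfio device.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct vfio_pci_dependent_device. */
struct dep_dev {
	uint32_t id;
	uint16_t segment;
	uint8_t bus;
	uint8_t devfn;
};

/* Hypothetical stand-in for struct vfio_pci_hot_reset_info. */
struct reset_info {
	uint32_t argsz;
	uint32_t flags;
	uint32_t count;
	struct dep_dev devices[];
};

#define NR_AFFECTED 3	/* made-up number of devices affected by the reset */

/* Mock of the _INFO call: report count, fill only if argsz is large enough. */
static int mock_get_info(struct reset_info *info)
{
	uint32_t i;

	assert(info != NULL);
	if (info->argsz < sizeof(*info))
		return -EINVAL;

	info->count = NR_AFFECTED;
	if (info->argsz < sizeof(*info) + NR_AFFECTED * sizeof(struct dep_dev))
		return -ENOSPC;	/* count tells the caller what to allocate */

	for (i = 0; i < NR_AFFECTED; i++)
		info->devices[i] = (struct dep_dev){
			.id = i + 1, .bus = 1, .devfn = (uint8_t)(i * 8),
		};
	return 0;
}

/* Caller side: probe with a bare header, then retry with a sized buffer. */
static struct reset_info *query_reset_info(void)
{
	struct reset_info hdr = { .argsz = sizeof(hdr) };
	struct reset_info *info;
	size_t size;
	int ret;

	ret = mock_get_info(&hdr);
	if (ret && ret != -ENOSPC)
		return NULL;

	size = sizeof(*info) + hdr.count * sizeof(info->devices[0]);
	info = calloc(1, size);
	if (!info)
		return NULL;

	info->argsz = (uint32_t)size;
	if (mock_get_info(info)) {
		free(info);
		return NULL;
	}
	return info;	/* caller frees */
}
```

The point is that the caller depends on count being valid when -ENOSPC is
returned; without it there is nothing to size the second call with.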

Regards,
Yi Liu

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH v7 6/9] vfio: Mark cdev usage in vfio_device
  2023-06-14  5:56       ` [Intel-gfx] " Liu, Yi L
@ 2023-06-14 12:11         ` Jason Gunthorpe
  -1 siblings, 0 replies; 77+ messages in thread
From: Jason Gunthorpe @ 2023-06-14 12:11 UTC (permalink / raw)
  To: Liu, Yi L
  Cc: alex.williamson, Tian, Kevin, joro, robin.murphy, cohuck,
	eric.auger, nicolinc, kvm, mjrosato, chao.p.peng, yi.y.sun,
	peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, intel-gvt-dev, intel-gfx, linux-s390, Hao,
	Xudong, Zhao, Yan Y, Xu, Terrence, Jiang, Yanting, Duan,
	Zhenzhong, clegoate

On Wed, Jun 14, 2023 at 05:56:08AM +0000, Liu, Yi L wrote:
> > From: Jason Gunthorpe <jgg@nvidia.com>
> > Sent: Wednesday, June 14, 2023 1:56 AM
> > 
> > On Fri, Jun 02, 2023 at 05:15:12AM -0700, Yi Liu wrote:
> > > This can be used to differentiate whether to report group_id or devid in
> > > the revised VFIO_DEVICE_GET_PCI_HOT_RESET_INFO ioctl. At this moment, no
> > > cdev path yet, so the vfio_device_cdev_opened() helper always returns false.
> > >
> > > Reviewed-by: Kevin Tian <kevin.tian@intel.com>
> > > Tested-by: Terrence Xu <terrence.xu@intel.com>
> > > Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> > > ---
> > >  include/linux/vfio.h | 5 +++++
> > >  1 file changed, 5 insertions(+)
> > >
> > > diff --git a/include/linux/vfio.h b/include/linux/vfio.h
> > > index 2c137ea94a3e..2a45853773a6 100644
> > > --- a/include/linux/vfio.h
> > > +++ b/include/linux/vfio.h
> > > @@ -139,6 +139,11 @@ int vfio_iommufd_emulated_attach_ioas(struct vfio_device *vdev, u32 *pt_id);
> > >  	((int (*)(struct vfio_device *vdev, u32 *pt_id)) NULL)
> > >  #endif
> > >
> > > +static inline bool vfio_device_cdev_opened(struct vfio_device *device)
> > > +{
> > > +	return false;
> > > +}
> > 
> > This and the two hunks in the other two patches that use this function
> > should be folded into the cdev series, probably just flattened to one
> > patch
> 
> Hmm, I have a question about that rule. I think the reason for having this
> sub-series is to avoid bumping the cdev series to a new version. So perhaps
> we can still keep this patch and patch 9 in this series? Otherwise, most of
> this series would need to move into the cdev series.

Well, then Alex should apply them at the same time..

Jason

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH v7 8/9] vfio/pci: Extend VFIO_DEVICE_GET_PCI_HOT_RESET_INFO for vfio device cdev
  2023-06-14 10:35       ` [Intel-gfx] " Liu, Yi L
@ 2023-06-14 12:17         ` Jason Gunthorpe
  -1 siblings, 0 replies; 77+ messages in thread
From: Jason Gunthorpe @ 2023-06-14 12:17 UTC (permalink / raw)
  To: Liu, Yi L
  Cc: alex.williamson, Tian, Kevin, joro, robin.murphy, cohuck,
	eric.auger, nicolinc, kvm, mjrosato, chao.p.peng, yi.y.sun,
	peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, intel-gvt-dev, intel-gfx, linux-s390, Hao,
	Xudong, Zhao, Yan Y, Xu, Terrence, Jiang, Yanting, Duan,
	Zhenzhong, clegoate

On Wed, Jun 14, 2023 at 10:35:10AM +0000, Liu, Yi L wrote:

> > -	if (fill->cur == fill->max)
> > -		return -EAGAIN; /* Something changed, try again */
> > +	if (fill->devices_end >= fill->devices)
> > +		return -ENOSPC;
> 
> This should be fill->devices_end <= fill->devices. 

Yep

> Even with that corrected, the
> new code no longer returns -EAGAIN.

Right, there is no EAGAIN. If the caller didn't provide enough space
the previous version returned ENOSPC:

> > -	if (hdr.argsz < sizeof(hdr) + (fill.max * sizeof(*devices))) {
> > -		ret = -ENOSPC;
> > -		hdr.count = fill.max;
> > -		goto reset_info_exit;
> > -	}

-EAGAIN basically means the kernel malfunctioned internally, e.g. it
allocated too little space for the actual number of devices. That is no
longer possible in this version, so it should never return -EAGAIN.
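
To make the old failure mode concrete, here is a rough model of the
count-then-fill scheme being removed. It is only a sketch: the names are
hypothetical, plain u32 ids stand in for vfio_pci_dependent_device, and
n_at_count / at_fill simulate the bus contents seen by the counting pass
and by the fill pass respectively.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of the old vfio_pci_fill_info (max/cur/array). */
struct two_pass_fill {
	int max;	/* slots allocated after the counting pass */
	int cur;	/* slots used so far by the fill pass */
	uint32_t *ids;
};

/* Old per-device callback: a full array means the bus grew since counting. */
static int fill_one(struct two_pass_fill *fill, uint32_t id)
{
	if (fill->cur == fill->max)
		return -EAGAIN;	/* something changed, try again */
	fill->ids[fill->cur++] = id;
	return 0;
}

/*
 * n_at_count: number of devices seen by the counting pass.
 * at_fill/n_at_fill: devices present when the fill pass runs.
 * ids must have room for at least n_at_count entries.
 */
static int two_pass_walk(size_t n_at_count, const uint32_t *at_fill,
			 size_t n_at_fill, uint32_t *ids, int *stored)
{
	struct two_pass_fill fill = { .max = (int)n_at_count, .ids = ids };
	size_t i;

	assert(ids && stored);
	for (i = 0; i < n_at_fill; i++) {
		int ret = fill_one(&fill, at_fill[i]);

		if (ret)
			return ret;
	}
	*stored = fill.cur;	/* may be short of max if a device went away */
	return 0;
}
```

With one walk that writes straight into the caller's buffer there is no
separate allocation to outgrow, which is why -EAGAIN goes away.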

> And if it returns -ENOSPC, the expected
> size should be returned as well, but I don't see that. With the hunk
> below [1] removed, there seems to be no way for the user to learn how
> much memory is required.

Yes, I missed that; it should keep counting.

Something like this, then:

diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
index b0eadafcbcf502..05c064896a7a94 100644
--- a/drivers/vfio/pci/vfio_pci_core.c
+++ b/drivers/vfio/pci/vfio_pci_core.c
@@ -775,19 +775,25 @@ static int vfio_pci_count_devs(struct pci_dev *pdev, void *data)
 }
 
 struct vfio_pci_fill_info {
-	int max;
-	int cur;
-	struct vfio_pci_dependent_device *devices;
+	struct vfio_pci_dependent_device __user *devices;
+	struct vfio_pci_dependent_device __user *devices_end;
 	struct vfio_device *vdev;
+	u32 count;
 	u32 flags;
 };
 
 static int vfio_pci_fill_devs(struct pci_dev *pdev, void *data)
 {
+	struct vfio_pci_dependent_device info = {
+		.segment = pci_domain_nr(pdev->bus),
+		.bus = pdev->bus->number,
+		.devfn = pdev->devfn,
+	};
 	struct vfio_pci_fill_info *fill = data;
 
-	if (fill->cur == fill->max)
-		return -EAGAIN; /* Something changed, try again */
+	fill->count++;
+	if (fill->devices >= fill->devices_end)
+		return 0;
 
 	if (fill->flags & VFIO_PCI_HOT_RESET_FLAG_DEV_ID) {
 		struct iommufd_ctx *iommufd = vfio_iommufd_device_ictx(fill->vdev);
@@ -800,12 +806,12 @@ static int vfio_pci_fill_devs(struct pci_dev *pdev, void *data)
 		 */
 		vdev = vfio_find_device_in_devset(dev_set, &pdev->dev);
 		if (!vdev)
-			fill->devices[fill->cur].devid = VFIO_PCI_DEVID_NOT_OWNED;
+			info.devid = VFIO_PCI_DEVID_NOT_OWNED;
 		else
-			fill->devices[fill->cur].devid =
-				vfio_iommufd_device_hot_reset_devid(vdev, iommufd);
+			info.devid = vfio_iommufd_device_hot_reset_devid(
+				vdev, iommufd);
 		/* If devid is VFIO_PCI_DEVID_NOT_OWNED, clear owned flag. */
-		if (fill->devices[fill->cur].devid == VFIO_PCI_DEVID_NOT_OWNED)
+		if (info.devid == VFIO_PCI_DEVID_NOT_OWNED)
 			fill->flags &= ~VFIO_PCI_HOT_RESET_FLAG_DEV_ID_OWNED;
 	} else {
 		struct iommu_group *iommu_group;
@@ -814,13 +820,13 @@ static int vfio_pci_fill_devs(struct pci_dev *pdev, void *data)
 		if (!iommu_group)
 			return -EPERM; /* Cannot reset non-isolated devices */
 
-		fill->devices[fill->cur].group_id = iommu_group_id(iommu_group);
+		info.group_id = iommu_group_id(iommu_group);
 		iommu_group_put(iommu_group);
 	}
-	fill->devices[fill->cur].segment = pci_domain_nr(pdev->bus);
-	fill->devices[fill->cur].bus = pdev->bus->number;
-	fill->devices[fill->cur].devfn = pdev->devfn;
-	fill->cur++;
+
+	if (copy_to_user(fill->devices, &info, sizeof(info)))
+		return -EFAULT;
+	fill->devices++;
 	return 0;
 }
 
@@ -1212,8 +1218,7 @@ static int vfio_pci_ioctl_get_pci_hot_reset_info(
 	unsigned long minsz =
 		offsetofend(struct vfio_pci_hot_reset_info, count);
 	struct vfio_pci_hot_reset_info hdr;
-	struct vfio_pci_fill_info fill = { 0 };
-	struct vfio_pci_dependent_device *devices = NULL;
+	struct vfio_pci_fill_info fill = {};
 	bool slot = false;
 	int ret = 0;
 
@@ -1231,29 +1236,9 @@ static int vfio_pci_ioctl_get_pci_hot_reset_info(
 	else if (pci_probe_reset_bus(vdev->pdev->bus))
 		return -ENODEV;
 
-	/* How many devices are affected? */
-	ret = vfio_pci_for_each_slot_or_bus(vdev->pdev, vfio_pci_count_devs,
-					    &fill.max, slot);
-	if (ret)
-		return ret;
-
-	WARN_ON(!fill.max); /* Should always be at least one */
-
-	/*
-	 * If there's enough space, fill it now, otherwise return -ENOSPC and
-	 * the number of devices affected.
-	 */
-	if (hdr.argsz < sizeof(hdr) + (fill.max * sizeof(*devices))) {
-		ret = -ENOSPC;
-		hdr.count = fill.max;
-		goto reset_info_exit;
-	}
-
-	devices = kcalloc(fill.max, sizeof(*devices), GFP_KERNEL);
-	if (!devices)
-		return -ENOMEM;
-
-	fill.devices = devices;
+	fill.devices = arg->devices;
+	fill.devices_end = arg->devices +
+			   (hdr.argsz - sizeof(hdr)) / sizeof(arg->devices[0]);
 	fill.vdev = &vdev->vdev;
 
 	if (vfio_device_cdev_opened(&vdev->vdev))
@@ -1264,29 +1249,17 @@ static int vfio_pci_ioctl_get_pci_hot_reset_info(
 	ret = vfio_pci_for_each_slot_or_bus(vdev->pdev, vfio_pci_fill_devs,
 					    &fill, slot);
 	mutex_unlock(&vdev->vdev.dev_set->lock);
+	if (ret)
+		return ret;
 
-	/*
-	 * If a device was removed between counting and filling, we may come up
-	 * short of fill.max.  If a device was added, we'll have a return of
-	 * -EAGAIN above.
-	 */
-	if (!ret) {
-		hdr.count = fill.cur;
-		hdr.flags = fill.flags;
-	}
-
-reset_info_exit:
+	hdr.count = fill.count;
+	hdr.flags = fill.flags;
 	if (copy_to_user(arg, &hdr, minsz))
-		ret = -EFAULT;
+		return -EFAULT;
 
-	if (!ret) {
-		if (copy_to_user(&arg->devices, devices,
-				 hdr.count * sizeof(*devices)))
-			ret = -EFAULT;
-	}
-
-	kfree(devices);
-	return ret;
+	if (fill.count != fill.devices - arg->devices)
+		return -ENOSPC;
+	return 0;
 }
 
 static int
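
The single-pass behavior of the patch above, reduced to a standalone model
(a sketch only: struct dep_dev, struct fill_state and get_reset_info() are
hypothetical names, and a plain struct assignment stands in for
copy_to_user()):

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct vfio_pci_dependent_device. */
struct dep_dev {
	uint32_t id;
	uint16_t segment;
	uint8_t bus;
	uint8_t devfn;
};

/* Hypothetical cursor mirroring the reworked vfio_pci_fill_info. */
struct fill_state {
	struct dep_dev *next;	/* next free slot in the caller's array */
	struct dep_dev *end;	/* one past the last usable slot */
	uint32_t count;		/* devices visited, stored or not */
};

/* Per-device callback: always count, store only while space remains. */
static int fill_one(struct fill_state *fill, const struct dep_dev *dev)
{
	fill->count++;
	if (fill->next >= fill->end)
		return 0;	/* out of room: keep counting, skip the copy */
	*fill->next++ = *dev;
	return 0;
}

/*
 * Walk all affected devices once. On return *count holds the total number
 * of devices; the result is -ENOSPC if the array could not hold them all.
 */
static int get_reset_info(const struct dep_dev *devs, size_t ndevs,
			  struct dep_dev *out, size_t capacity,
			  uint32_t *count)
{
	struct fill_state fill = { .next = out, .end = out + capacity };
	size_t i;

	assert(out && count);
	for (i = 0; i < ndevs; i++) {
		int ret = fill_one(&fill, &devs[i]);

		if (ret)
			return ret;
	}

	*count = fill.count;
	if (fill.count != (uint32_t)(fill.next - out))
		return -ENOSPC;
	return 0;
}
```

A caller that gets -ENOSPC here still receives the full count, so it can
resize and retry, the same as before.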

^ permalink raw reply related	[flat|nested] 77+ messages in thread

* Re: [Intel-gfx] [PATCH v7 8/9] vfio/pci: Extend VFIO_DEVICE_GET_PCI_HOT_RESET_INFO for vfio device cdev
@ 2023-06-14 12:17         ` Jason Gunthorpe
  0 siblings, 0 replies; 77+ messages in thread
From: Jason Gunthorpe @ 2023-06-14 12:17 UTC (permalink / raw)
  To: Liu, Yi L
  Cc: mjrosato, jasowang, Hao, Xudong, Duan, Zhenzhong, peterx, Xu,
	Terrence, chao.p.peng, linux-s390, kvm, lulu, Jiang, Yanting,
	joro, nicolinc, Tian, Kevin, Zhao, Yan Y, intel-gfx, eric.auger,
	intel-gvt-dev, yi.y.sun, clegoate, cohuck,
	shameerali.kolothum.thodi, suravee.suthikulpanit, robin.murphy

On Wed, Jun 14, 2023 at 10:35:10AM +0000, Liu, Yi L wrote:

> > -	if (fill->cur == fill->max)
> > -		return -EAGAIN; /* Something changed, try again */
> > +	if (fill->devices_end >= fill->devices)
> > +		return -ENOSPC;
> 
> This should be fill->devices_end <= fill->devices. 

Yep

> Even with that corrected, the
> new code no longer returns -EAGAIN.

Right, there is no EAGAIN. If the caller didn't provide enough space
the previous version returned ENOSPC:

> > -	if (hdr.argsz < sizeof(hdr) + (fill.max * sizeof(*devices))) {
> > -		ret = -ENOSPC;
> > -		hdr.count = fill.max;
> > -		goto reset_info_exit;
> > -	}

-EAGAIN basically means the kernel malfunctioned internally, e.g. it
allocated too little space for the actual number of devices. That is no
longer possible in this version, so it should never return -EAGAIN.

> And if it returns -ENOSPC, the expected
> size should be returned as well, but I don't see that. With the hunk
> below [1] removed, there seems to be no way for the user to learn how
> much memory is required.

Yes, I missed that; it should keep counting.

Something like this, then:

diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
index b0eadafcbcf502..05c064896a7a94 100644
--- a/drivers/vfio/pci/vfio_pci_core.c
+++ b/drivers/vfio/pci/vfio_pci_core.c
@@ -775,19 +775,25 @@ static int vfio_pci_count_devs(struct pci_dev *pdev, void *data)
 }
 
 struct vfio_pci_fill_info {
-	int max;
-	int cur;
-	struct vfio_pci_dependent_device *devices;
+	struct vfio_pci_dependent_device __user *devices;
+	struct vfio_pci_dependent_device __user *devices_end;
 	struct vfio_device *vdev;
+	u32 count;
 	u32 flags;
 };
 
 static int vfio_pci_fill_devs(struct pci_dev *pdev, void *data)
 {
+	struct vfio_pci_dependent_device info = {
+		.segment = pci_domain_nr(pdev->bus),
+		.bus = pdev->bus->number,
+		.devfn = pdev->devfn,
+	};
 	struct vfio_pci_fill_info *fill = data;
 
-	if (fill->cur == fill->max)
-		return -EAGAIN; /* Something changed, try again */
+	fill->count++;
+	if (fill->devices >= fill->devices_end)
+		return 0;
 
 	if (fill->flags & VFIO_PCI_HOT_RESET_FLAG_DEV_ID) {
 		struct iommufd_ctx *iommufd = vfio_iommufd_device_ictx(fill->vdev);
@@ -800,12 +806,12 @@ static int vfio_pci_fill_devs(struct pci_dev *pdev, void *data)
 		 */
 		vdev = vfio_find_device_in_devset(dev_set, &pdev->dev);
 		if (!vdev)
-			fill->devices[fill->cur].devid = VFIO_PCI_DEVID_NOT_OWNED;
+			info.devid = VFIO_PCI_DEVID_NOT_OWNED;
 		else
-			fill->devices[fill->cur].devid =
-				vfio_iommufd_device_hot_reset_devid(vdev, iommufd);
+			info.devid = vfio_iommufd_device_hot_reset_devid(
+				vdev, iommufd);
 		/* If devid is VFIO_PCI_DEVID_NOT_OWNED, clear owned flag. */
-		if (fill->devices[fill->cur].devid == VFIO_PCI_DEVID_NOT_OWNED)
+		if (info.devid == VFIO_PCI_DEVID_NOT_OWNED)
 			fill->flags &= ~VFIO_PCI_HOT_RESET_FLAG_DEV_ID_OWNED;
 	} else {
 		struct iommu_group *iommu_group;
@@ -814,13 +820,13 @@ static int vfio_pci_fill_devs(struct pci_dev *pdev, void *data)
 		if (!iommu_group)
 			return -EPERM; /* Cannot reset non-isolated devices */
 
-		fill->devices[fill->cur].group_id = iommu_group_id(iommu_group);
+		info.group_id = iommu_group_id(iommu_group);
 		iommu_group_put(iommu_group);
 	}
-	fill->devices[fill->cur].segment = pci_domain_nr(pdev->bus);
-	fill->devices[fill->cur].bus = pdev->bus->number;
-	fill->devices[fill->cur].devfn = pdev->devfn;
-	fill->cur++;
+
+	if (copy_to_user(fill->devices, &info, sizeof(info)))
+		return -EFAULT;
+	fill->devices++;
 	return 0;
 }
 
@@ -1212,8 +1218,7 @@ static int vfio_pci_ioctl_get_pci_hot_reset_info(
 	unsigned long minsz =
 		offsetofend(struct vfio_pci_hot_reset_info, count);
 	struct vfio_pci_hot_reset_info hdr;
-	struct vfio_pci_fill_info fill = { 0 };
-	struct vfio_pci_dependent_device *devices = NULL;
+	struct vfio_pci_fill_info fill = {};
 	bool slot = false;
 	int ret = 0;
 
@@ -1231,29 +1236,9 @@ static int vfio_pci_ioctl_get_pci_hot_reset_info(
 	else if (pci_probe_reset_bus(vdev->pdev->bus))
 		return -ENODEV;
 
-	/* How many devices are affected? */
-	ret = vfio_pci_for_each_slot_or_bus(vdev->pdev, vfio_pci_count_devs,
-					    &fill.max, slot);
-	if (ret)
-		return ret;
-
-	WARN_ON(!fill.max); /* Should always be at least one */
-
-	/*
-	 * If there's enough space, fill it now, otherwise return -ENOSPC and
-	 * the number of devices affected.
-	 */
-	if (hdr.argsz < sizeof(hdr) + (fill.max * sizeof(*devices))) {
-		ret = -ENOSPC;
-		hdr.count = fill.max;
-		goto reset_info_exit;
-	}
-
-	devices = kcalloc(fill.max, sizeof(*devices), GFP_KERNEL);
-	if (!devices)
-		return -ENOMEM;
-
-	fill.devices = devices;
+	fill.devices = arg->devices;
+	fill.devices_end = arg->devices +
+			   (hdr.argsz - sizeof(hdr)) / sizeof(arg->devices[0]);
 	fill.vdev = &vdev->vdev;
 
 	if (vfio_device_cdev_opened(&vdev->vdev))
@@ -1264,29 +1249,17 @@ static int vfio_pci_ioctl_get_pci_hot_reset_info(
 	ret = vfio_pci_for_each_slot_or_bus(vdev->pdev, vfio_pci_fill_devs,
 					    &fill, slot);
 	mutex_unlock(&vdev->vdev.dev_set->lock);
+	if (ret)
+		return ret;
 
-	/*
-	 * If a device was removed between counting and filling, we may come up
-	 * short of fill.max.  If a device was added, we'll have a return of
-	 * -EAGAIN above.
-	 */
-	if (!ret) {
-		hdr.count = fill.cur;
-		hdr.flags = fill.flags;
-	}
-
-reset_info_exit:
+	hdr.count = fill.count;
+	hdr.flags = fill.flags;
 	if (copy_to_user(arg, &hdr, minsz))
-		ret = -EFAULT;
+		return -EFAULT;
 
-	if (!ret) {
-		if (copy_to_user(&arg->devices, devices,
-				 hdr.count * sizeof(*devices)))
-			ret = -EFAULT;
-	}
-
-	kfree(devices);
-	return ret;
+	if (fill.count != fill.devices - arg->devices)
+		return -ENOSPC;
+	return 0;
 }
 
 static int

^ permalink raw reply related	[flat|nested] 77+ messages in thread

* RE: [PATCH v7 8/9] vfio/pci: Extend VFIO_DEVICE_GET_PCI_HOT_RESET_INFO for vfio device cdev
  2023-06-14 12:17         ` [Intel-gfx] " Jason Gunthorpe
@ 2023-06-14 13:05           ` Liu, Yi L
  -1 siblings, 0 replies; 77+ messages in thread
From: Liu, Yi L @ 2023-06-14 13:05 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: alex.williamson, Tian, Kevin, joro, robin.murphy, cohuck,
	eric.auger, nicolinc, kvm, mjrosato, chao.p.peng, yi.y.sun,
	peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, intel-gvt-dev, intel-gfx, linux-s390, Hao,
	Xudong, Zhao, Yan Y, Xu, Terrence, Jiang, Yanting, Duan,
	Zhenzhong, clegoate

> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Wednesday, June 14, 2023 8:17 PM
> 
> On Wed, Jun 14, 2023 at 10:35:10AM +0000, Liu, Yi L wrote:
> 
> > > -	if (fill->cur == fill->max)
> > > -		return -EAGAIN; /* Something changed, try again */
> > > +	if (fill->devices_end >= fill->devices)
> > > +		return -ENOSPC;
> >
> > This should be fill->devices_end <= fill->devices.
> 
> Yep
> 
> > Even with that corrected, the
> > new code does not return -EAGAIN.
> 
> Right, there is no EAGAIN. If the caller didn't provide enough space
> the previous version returned ENOSPC:
> 
> > > -	if (hdr.argsz < sizeof(hdr) + (fill.max * sizeof(*devices))) {
> > > -		ret = -ENOSPC;
> > > -		hdr.count = fill.max;
> > > -		goto reset_info_exit;
> > > -	}
> 
> -EAGAIN basically means the kernel internally malfunctioned - eg it
> allocated too little space for the actual size of devices. That is no
> longer possible in this version so it should never return -EAGAIN.

I still have one doubt. Per my understanding, -EAGAIN is there to handle
devices newly plugged during the info reporting path, and I don’t think
holding the dev_set lock can prevent that. But maybe -ENOSPC is enough.
@Alex, what is your opinion?

> > And if it returns -ENOSPC, the expected
> > size should be returned, but I don't see that. As the hunk below[1] is removed,
> > there seems to be no way to know how much memory it requires.
> 
> Yes, I missed that, it should keep counting
> 
> Like this then
> 
> diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
> index b0eadafcbcf502..05c064896a7a94 100644
> --- a/drivers/vfio/pci/vfio_pci_core.c
> +++ b/drivers/vfio/pci/vfio_pci_core.c
> @@ -775,19 +775,25 @@ static int vfio_pci_count_devs(struct pci_dev *pdev, void *data)
>  }
> 
>  struct vfio_pci_fill_info {
> -	int max;
> -	int cur;
> -	struct vfio_pci_dependent_device *devices;
> +	struct vfio_pci_dependent_device __user *devices;
> +	struct vfio_pci_dependent_device __user *devices_end;
>  	struct vfio_device *vdev;
> +	u32 count;
>  	u32 flags;
>  };
> 
>  static int vfio_pci_fill_devs(struct pci_dev *pdev, void *data)
>  {
> +	struct vfio_pci_dependent_device info = {
> +		.segment = pci_domain_nr(pdev->bus),
> +		.bus = pdev->bus->number,
> +		.devfn = pdev->devfn,
> +	};
>  	struct vfio_pci_fill_info *fill = data;
> 
> -	if (fill->cur == fill->max)
> -		return -EAGAIN; /* Something changed, try again */
> +	fill->count++;
> +	if (fill->devices >= fill->devices_end)
> +		return 0;
> 
>  	if (fill->flags & VFIO_PCI_HOT_RESET_FLAG_DEV_ID) {
>  		struct iommufd_ctx *iommufd = vfio_iommufd_device_ictx(fill->vdev);
> @@ -800,12 +806,12 @@ static int vfio_pci_fill_devs(struct pci_dev *pdev, void *data)
>  		 */
>  		vdev = vfio_find_device_in_devset(dev_set, &pdev->dev);
>  		if (!vdev)
> -			fill->devices[fill->cur].devid = VFIO_PCI_DEVID_NOT_OWNED;
> +			info.devid = VFIO_PCI_DEVID_NOT_OWNED;
>  		else
> -			fill->devices[fill->cur].devid =
> -				vfio_iommufd_device_hot_reset_devid(vdev, iommufd);
> +			info.devid = vfio_iommufd_device_hot_reset_devid(
> +				vdev, iommufd);
>  		/* If devid is VFIO_PCI_DEVID_NOT_OWNED, clear owned flag. */
> -		if (fill->devices[fill->cur].devid == VFIO_PCI_DEVID_NOT_OWNED)
> +		if (info.devid == VFIO_PCI_DEVID_NOT_OWNED)
>  			fill->flags &= ~VFIO_PCI_HOT_RESET_FLAG_DEV_ID_OWNED;
>  	} else {
>  		struct iommu_group *iommu_group;
> @@ -814,13 +820,13 @@ static int vfio_pci_fill_devs(struct pci_dev *pdev, void *data)
>  		if (!iommu_group)
>  			return -EPERM; /* Cannot reset non-isolated devices */
> 
> -		fill->devices[fill->cur].group_id = iommu_group_id(iommu_group);
> +		info.group_id = iommu_group_id(iommu_group);
>  		iommu_group_put(iommu_group);
>  	}
> -	fill->devices[fill->cur].segment = pci_domain_nr(pdev->bus);
> -	fill->devices[fill->cur].bus = pdev->bus->number;
> -	fill->devices[fill->cur].devfn = pdev->devfn;
> -	fill->cur++;
> +
> +	if (copy_to_user(fill->devices, &info, sizeof(info)))
> +		return -EFAULT;
> +	fill->devices++;
>  	return 0;
>  }
> 
> @@ -1212,8 +1218,7 @@ static int vfio_pci_ioctl_get_pci_hot_reset_info(
>  	unsigned long minsz =
>  		offsetofend(struct vfio_pci_hot_reset_info, count);
>  	struct vfio_pci_hot_reset_info hdr;
> -	struct vfio_pci_fill_info fill = { 0 };
> -	struct vfio_pci_dependent_device *devices = NULL;
> +	struct vfio_pci_fill_info fill = {};
>  	bool slot = false;
>  	int ret = 0;
> 
> @@ -1231,29 +1236,9 @@ static int vfio_pci_ioctl_get_pci_hot_reset_info(
>  	else if (pci_probe_reset_bus(vdev->pdev->bus))
>  		return -ENODEV;
> 
> -	/* How many devices are affected? */
> -	ret = vfio_pci_for_each_slot_or_bus(vdev->pdev, vfio_pci_count_devs,
> -					    &fill.max, slot);
> -	if (ret)
> -		return ret;
> -
> -	WARN_ON(!fill.max); /* Should always be at least one */
> -
> -	/*
> -	 * If there's enough space, fill it now, otherwise return -ENOSPC and
> -	 * the number of devices affected.
> -	 */
> -	if (hdr.argsz < sizeof(hdr) + (fill.max * sizeof(*devices))) {
> -		ret = -ENOSPC;
> -		hdr.count = fill.max;
> -		goto reset_info_exit;
> -	}
> -
> -	devices = kcalloc(fill.max, sizeof(*devices), GFP_KERNEL);
> -	if (!devices)
> -		return -ENOMEM;
> -
> -	fill.devices = devices;
> +	fill.devices = arg->devices;
> +	fill.devices_end = arg->devices +
> +			   (hdr.argsz - sizeof(hdr)) / sizeof(arg->devices[0]);
>  	fill.vdev = &vdev->vdev;
> 
>  	if (vfio_device_cdev_opened(&vdev->vdev))
> @@ -1264,29 +1249,17 @@ static int vfio_pci_ioctl_get_pci_hot_reset_info(
>  	ret = vfio_pci_for_each_slot_or_bus(vdev->pdev, vfio_pci_fill_devs,
>  					    &fill, slot);
>  	mutex_unlock(&vdev->vdev.dev_set->lock);
> +	if (ret)
> +		return ret;
> 
> -	/*
> -	 * If a device was removed between counting and filling, we may come up
> -	 * short of fill.max.  If a device was added, we'll have a return of
> -	 * -EAGAIN above.
> -	 */
> -	if (!ret) {
> -		hdr.count = fill.cur;
> -		hdr.flags = fill.flags;
> -	}
> -
> -reset_info_exit:
> +	hdr.count = fill.count;
> +	hdr.flags = fill.flags;
>  	if (copy_to_user(arg, &hdr, minsz))
> -		ret = -EFAULT;
> +		return -EFAULT;
> 
> -	if (!ret) {
> -		if (copy_to_user(&arg->devices, devices,
> -				 hdr.count * sizeof(*devices)))
> -			ret = -EFAULT;
> -	}
> -
> -	kfree(devices);
> -	return ret;
> +	if (fill.count != fill.devices - arg->devices)

Should be "if (fill.count != (fill.devices - arg->devices) / sizeof(arg->devices[0]))" 😊

Regards,
Yi Liu

> +		return -ENOSPC;
> +	return 0;
>  }
> 
>  static int

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH v7 8/9] vfio/pci: Extend VFIO_DEVICE_GET_PCI_HOT_RESET_INFO for vfio device cdev
  2023-06-14 13:05           ` [Intel-gfx] " Liu, Yi L
@ 2023-06-14 13:37             ` Jason Gunthorpe
  -1 siblings, 0 replies; 77+ messages in thread
From: Jason Gunthorpe @ 2023-06-14 13:37 UTC (permalink / raw)
  To: Liu, Yi L
  Cc: alex.williamson, Tian, Kevin, joro, robin.murphy, cohuck,
	eric.auger, nicolinc, kvm, mjrosato, chao.p.peng, yi.y.sun,
	peterx, jasowang, shameerali.kolothum.thodi, lulu,
	suravee.suthikulpanit, intel-gvt-dev, intel-gfx, linux-s390, Hao,
	Xudong, Zhao, Yan Y, Xu, Terrence, Jiang, Yanting, Duan,
	Zhenzhong, clegoate

On Wed, Jun 14, 2023 at 01:05:45PM +0000, Liu, Yi L wrote:
> > -EAGAIN basically means the kernel internally malfunctioned - eg it
> > allocated too little space for the actual size of devices. That is no
> > longer possible in this version so it should never return -EAGAIN.
> 
> I still have one doubt. Per my understanding, -EAGAIN is there to handle
> devices newly plugged during the info reporting path, and I don’t think
> holding the dev_set lock can prevent that. But maybe -ENOSPC is enough.
> @Alex, what is your opinion?

If the device was plugged just before we computed the size, the old code
returned ENOSPC.

If it was plugged just after we computed the size, the old code returned
EAGAIN.

Here we just resolve this race consistently to always return ENOSPC,
which always means we ran out of space in the user-provided buffer.
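
As a rough illustration of the "keep counting, copy only what fits"
behaviour, here is a minimal userspace sketch. It is not the kernel code:
struct dep_dev, struct fill_state, fill_one() and report_devices() are
hypothetical stand-ins, and a plain store replaces copy_to_user().

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct vfio_pci_dependent_device. */
struct dep_dev {
	uint32_t id;
};

/* Hypothetical stand-in for struct vfio_pci_fill_info. */
struct fill_state {
	struct dep_dev *cur;	/* next free slot in the caller's buffer */
	struct dep_dev *end;	/* one past the last slot */
	uint32_t count;		/* devices seen so far, stored or not */
};

/* Per-device callback: always count, store only while there is room. */
static int fill_one(struct fill_state *fill, uint32_t id)
{
	fill->count++;
	if (fill->cur >= fill->end)
		return 0;	/* out of room: keep walking so count stays exact */

	fill->cur->id = id;
	fill->cur++;
	return 0;
}

/*
 * Walk all devices once. *count_out always receives the full number of
 * devices; the return value is -ENOSPC when some of them did not fit.
 */
static int report_devices(const uint32_t *ids, size_t n_ids,
			  struct dep_dev *buf, size_t capacity,
			  uint32_t *count_out)
{
	struct fill_state fill = { .cur = buf, .end = buf + capacity };
	size_t i;

	for (i = 0; i < n_ids; i++)
		fill_one(&fill, ids[i]);

	*count_out = fill.count;
	if (fill.count != (uint32_t)(fill.cur - buf))
		return -ENOSPC;
	return 0;
}
```

With three devices and room for two, report_devices() returns -ENOSPC and
sets *count_out to 3, so the caller knows how large a buffer to retry with.
A device appearing between two calls just produces another -ENOSPC with a
larger count.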

> > -	kfree(devices);
> > -	return ret;
> > +	if (fill.count != fill.devices - arg->devices)
> 
> Should be "if (fill.count != (fill.devices - arg->devices) / sizeof(arg->devices[0]))" 😊

devices is already a typed pointer, so the subtraction yields a count of
elements; the compiler does the /sizeof() itself.

Your version above would only be needed if it were a void *.
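
A small standalone sketch of the difference, in plain C. struct dep_dev and
both helpers are hypothetical names used only for illustration, and the
byte-based variant casts to char * because arithmetic on void * is a GNU
extension rather than standard C.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct vfio_pci_dependent_device. */
struct dep_dev {
	uint32_t id;
	uint16_t segment;
	uint8_t bus;
	uint8_t devfn;
};

/* Typed pointers: the difference is already a number of elements. */
static ptrdiff_t elems_between(const struct dep_dev *start,
			       const struct dep_dev *cur)
{
	return cur - start;
}

/* Untyped pointers: the difference is in bytes, so divide explicitly. */
static ptrdiff_t elems_between_bytes(const void *start, const void *cur)
{
	ptrdiff_t bytes = (const char *)cur - (const char *)start;

	return bytes / (ptrdiff_t)sizeof(struct dep_dev);
}
```

Both helpers give the same answer for pointers into the same array.
Dividing the typed difference by sizeof() again would shrink the result by
the element size.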

Jason

^ permalink raw reply	[flat|nested] 77+ messages in thread

* [Intel-gfx] ✗ Fi.CI.BUILD: failure for Enhance vfio PCI hot reset for vfio cdev device (rev7)
  2023-06-02 12:15 ` [Intel-gfx] " Yi Liu
                   ` (14 preceding siblings ...)
  (?)
@ 2023-06-14 15:47 ` Patchwork
  -1 siblings, 0 replies; 77+ messages in thread
From: Patchwork @ 2023-06-14 15:47 UTC (permalink / raw)
  To: Jason Gunthorpe; +Cc: intel-gfx

== Series Details ==

Series: Enhance vfio PCI hot reset for vfio cdev device (rev7)
URL   : https://patchwork.freedesktop.org/series/116991/
State : failure

== Summary ==

Error: patch https://patchwork.freedesktop.org/api/1.0/series/116991/revisions/7/mbox/ not applied
Applying: vfio/pci: Update comment around group_fd get in vfio_pci_ioctl_pci_hot_reset()
Applying: vfio/pci: Move the existing hot reset logic to be a helper
Applying: iommufd: Reserve all negative IDs in the iommufd xarray
Applying: iommufd: Add iommufd_ctx_has_group()
Applying: iommufd: Add helper to retrieve iommufd_ctx and devid
Applying: vfio: Mark cdev usage in vfio_device
Applying: vfio: Add helper to search vfio_device in a dev_set
Applying: vfio/pci: Extend VFIO_DEVICE_GET_PCI_HOT_RESET_INFO for vfio device cdev
Using index info to reconstruct a base tree...
M	drivers/vfio/pci/vfio_pci_core.c
Falling back to patching base and 3-way merge...
Auto-merging drivers/vfio/pci/vfio_pci_core.c
CONFLICT (content): Merge conflict in drivers/vfio/pci/vfio_pci_core.c
error: Failed to merge in the changes.
hint: Use 'git am --show-current-patch=diff' to see the failed patch
Patch failed at 0008 vfio/pci: Extend VFIO_DEVICE_GET_PCI_HOT_RESET_INFO for vfio device cdev
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".
Build failed, no error log produced



^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [Intel-gfx] [PATCH v7 8/9] vfio/pci: Extend VFIO_DEVICE_GET_PCI_HOT_RESET_INFO for vfio device cdev
  2023-06-14 13:37             ` [Intel-gfx] " Jason Gunthorpe
@ 2023-06-15  3:31               ` Liu, Yi L
  -1 siblings, 0 replies; 77+ messages in thread
From: Liu, Yi L @ 2023-06-15  3:31 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: mjrosato, jasowang, Hao, Xudong, Duan,  Zhenzhong, peterx, Xu,
	Terrence, chao.p.peng, linux-s390, kvm, lulu, Jiang, Yanting,
	joro, nicolinc, Tian, Kevin, Zhao, Yan Y, intel-gfx, eric.auger,
	intel-gvt-dev, yi.y.sun, clegoate, cohuck,
	shameerali.kolothum.thodi, suravee.suthikulpanit, robin.murphy

> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Wednesday, June 14, 2023 9:38 PM
> 
> On Wed, Jun 14, 2023 at 01:05:45PM +0000, Liu, Yi L wrote:
> > > -EAGAIN basically means the kernel internally malfunctioned - eg it
> > > allocated too little space for the actual size of devices. That is no
> > > longer possible in this version so it should never return -EAGAIN.
> >
> > I still have one doubt. Per my understanding, -EAGAIN is there to handle
> > devices newly plugged during the info reporting path, and I don’t think
> > holding the dev_set lock can prevent that. But maybe -ENOSPC is enough.
> > @Alex, what is your opinion?
> 
> If the device was plugged just before we computed the size, the old code
> returned ENOSPC.
> 
> If it was plugged just after we computed the size, the old code returned
> EAGAIN.

Yes.

> Here we just resolve this race consistently to always return ENOSPC,
> which always means we ran out of space in the user provided buffer.

This makes sense.

> > > -	kfree(devices);
> > > -	return ret;
> > > +	if (fill.count != fill.devices - arg->devices)
> >
> > Should be "if (fill.count != (fill.devices - arg->devices) / sizeof(arg->devices[0]))" 😊
> 
> devices is already a typed pointer, so the subtraction yields a count of
> elements; the compiler does the /sizeof() itself.
> 
> Your version above would only be needed if it were a void *.

Got it.

Regards,
Yi Liu

^ permalink raw reply	[flat|nested] 77+ messages in thread

end of thread, other threads:[~2023-06-15  3:31 UTC | newest]

Thread overview: 77+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-06-02 12:15 [PATCH v7 0/9] Enhance vfio PCI hot reset for vfio cdev device Yi Liu
2023-06-02 12:15 ` [Intel-gfx] " Yi Liu
2023-06-02 12:15 ` [PATCH v7 1/9] vfio/pci: Update comment around group_fd get in vfio_pci_ioctl_pci_hot_reset() Yi Liu
2023-06-02 12:15   ` [Intel-gfx] " Yi Liu
2023-06-02 12:15 ` [PATCH v7 2/9] vfio/pci: Move the existing hot reset logic to be a helper Yi Liu
2023-06-02 12:15   ` [Intel-gfx] " Yi Liu
2023-06-02 12:15 ` [PATCH v7 3/9] iommufd: Reserve all negative IDs in the iommufd xarray Yi Liu
2023-06-02 12:15   ` [Intel-gfx] " Yi Liu
2023-06-13 11:46   ` Jason Gunthorpe
2023-06-13 11:46     ` [Intel-gfx] " Jason Gunthorpe
2023-06-02 12:15 ` [PATCH v7 4/9] iommufd: Add iommufd_ctx_has_group() Yi Liu
2023-06-02 12:15   ` [Intel-gfx] " Yi Liu
2023-06-08 21:40   ` Alex Williamson
2023-06-08 21:40     ` Alex Williamson
2023-06-08 23:44     ` Liu, Yi L
2023-06-08 23:44       ` [Intel-gfx] " Liu, Yi L
2023-06-13 11:46   ` Jason Gunthorpe
2023-06-13 11:46     ` [Intel-gfx] " Jason Gunthorpe
2023-06-02 12:15 ` [PATCH v7 5/9] iommufd: Add helper to retrieve iommufd_ctx and devid Yi Liu
2023-06-02 12:15   ` [Intel-gfx] " Yi Liu
2023-06-13 11:47   ` Jason Gunthorpe
2023-06-13 11:47     ` [Intel-gfx] " Jason Gunthorpe
2023-06-02 12:15 ` [PATCH v7 6/9] vfio: Mark cdev usage in vfio_device Yi Liu
2023-06-02 12:15   ` [Intel-gfx] " Yi Liu
2023-06-13 17:56   ` Jason Gunthorpe
2023-06-13 17:56     ` [Intel-gfx] " Jason Gunthorpe
2023-06-14  5:56     ` Liu, Yi L
2023-06-14  5:56       ` [Intel-gfx] " Liu, Yi L
2023-06-14 12:11       ` Jason Gunthorpe
2023-06-14 12:11         ` [Intel-gfx] " Jason Gunthorpe
2023-06-02 12:15 ` [PATCH v7 7/9] vfio: Add helper to search vfio_device in a dev_set Yi Liu
2023-06-02 12:15   ` [Intel-gfx] " Yi Liu
2023-06-13 11:47   ` Jason Gunthorpe
2023-06-13 11:47     ` Jason Gunthorpe
2023-06-02 12:15 ` [PATCH v7 8/9] vfio/pci: Extend VFIO_DEVICE_GET_PCI_HOT_RESET_INFO for vfio device cdev Yi Liu
2023-06-02 12:15   ` [Intel-gfx] " Yi Liu
2023-06-08 22:26   ` Alex Williamson
2023-06-08 22:26     ` Alex Williamson
2023-06-09  0:04     ` Liu, Yi L
2023-06-09  0:04       ` [Intel-gfx] " Liu, Yi L
2023-06-13 11:46   ` Jason Gunthorpe
2023-06-13 11:46     ` [Intel-gfx] " Jason Gunthorpe
2023-06-13 12:50     ` Liu, Yi L
2023-06-13 12:50       ` [Intel-gfx] " Liu, Yi L
2023-06-13 14:32       ` Alex Williamson
2023-06-13 14:32         ` Alex Williamson
2023-06-13 17:40         ` Jason Gunthorpe
2023-06-13 17:40           ` [Intel-gfx] " Jason Gunthorpe
2023-06-13 18:23   ` Jason Gunthorpe
2023-06-13 18:23     ` [Intel-gfx] " Jason Gunthorpe
2023-06-14 10:35     ` Liu, Yi L
2023-06-14 10:35       ` [Intel-gfx] " Liu, Yi L
2023-06-14 12:17       ` Jason Gunthorpe
2023-06-14 12:17         ` [Intel-gfx] " Jason Gunthorpe
2023-06-14 13:05         ` Liu, Yi L
2023-06-14 13:05           ` [Intel-gfx] " Liu, Yi L
2023-06-14 13:37           ` Jason Gunthorpe
2023-06-14 13:37             ` [Intel-gfx] " Jason Gunthorpe
2023-06-15  3:31             ` Liu, Yi L
2023-06-15  3:31               ` Liu, Yi L
2023-06-02 12:15 ` [PATCH v7 9/9] vfio/pci: Allow passing zero-length fd array in VFIO_DEVICE_PCI_HOT_RESET Yi Liu
2023-06-02 12:15   ` [Intel-gfx] " Yi Liu
2023-06-08 22:30   ` Alex Williamson
2023-06-08 22:30     ` Alex Williamson
2023-06-09  0:13     ` Liu, Yi L
2023-06-09  0:13       ` [Intel-gfx] " Liu, Yi L
2023-06-09 14:38       ` Jason Gunthorpe
2023-06-09 14:38         ` [Intel-gfx] " Jason Gunthorpe
2023-06-13 18:09   ` Jason Gunthorpe
2023-06-13 18:09     ` [Intel-gfx] " Jason Gunthorpe
2023-06-02 15:14 ` [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for Enhance vfio PCI hot reset for vfio cdev device (rev5) Patchwork
2023-06-02 15:29 ` [Intel-gfx] ✓ Fi.CI.BAT: success " Patchwork
2023-06-04 20:05 ` [Intel-gfx] ✓ Fi.CI.IGT: " Patchwork
2023-06-08  6:59 ` [PATCH v7 0/9] Enhance vfio PCI hot reset for vfio cdev device Jiang, Yanting
2023-06-08  6:59   ` [Intel-gfx] " Jiang, Yanting
2023-06-13 20:47 ` [Intel-gfx] ✗ Fi.CI.BUILD: failure for Enhance vfio PCI hot reset for vfio cdev device (rev6) Patchwork
2023-06-14 15:47 ` [Intel-gfx] ✗ Fi.CI.BUILD: failure for Enhance vfio PCI hot reset for vfio cdev device (rev7) Patchwork
