All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v6 0/3] vfio/pci: Check the device set open count on reset
@ 2022-11-10  1:40 Anthony DeRossi
  2022-11-10  1:40 ` [PATCH v6 1/3] vfio: Fix container device registration life cycle Anthony DeRossi
                   ` (3 more replies)
  0 siblings, 4 replies; 10+ messages in thread
From: Anthony DeRossi @ 2022-11-10  1:40 UTC (permalink / raw)
  To: kvm; +Cc: alex.williamson, cohuck, jgg, kevin.tian, abhsahu, yishaih

This series fixes an issue where devices bound to vfio-pci are not reset when
they are released. Skipping the reset has unpredictable results depending on
the device, and can cause errors when accessing the device later or binding to
a different driver.

The first patch in this series fixes a life cycle issue that was discovered in
an earlier revision of the series.

Thank you Alex, Jason, and Kevin for your reviews and feedback. This revision
includes the changes suggested on v5, but without any changes to
vfio_device_set_open_count().

Anthony

v5 -> v6:
- Added a call to lockdep_assert_held() in patch 3
- Corrected "vfio_container_device_register()" in the patch 1 commit message
v5: https://lore.kernel.org/kvm/20221105224458.8180-1-ajderossi@gmail.com/

v4 -> v5:
- Replaced patch 2 with a patch that introduces a new function to get the
  open count of a device set
- Updated patch 3 to use the new function
v4: https://lore.kernel.org/kvm/20221104195727.4629-1-ajderossi@gmail.com/

v3 -> v4:
- Added a patch to fix device registration life cycle
- Added a patch to add a public open_count on vfio_device_set
- Changed the implementation to avoid private open_count usage
v3: https://lore.kernel.org/kvm/20221102055732.2110-1-ajderossi@gmail.com/

v2 -> v3:
- Added WARN_ON()
- Revised commit message
v2: https://lore.kernel.org/kvm/20221026194245.1769-1-ajderossi@gmail.com/

v1 -> v2:
- Changed reset behavior instead of open_count ordering
- Retitled from "vfio: Decrement open_count before close_device()"
v1: https://lore.kernel.org/kvm/20221025193820.4412-1-ajderossi@gmail.com/

Anthony DeRossi (3):
  vfio: Fix container device registration life cycle
  vfio: Export the device set open count
  vfio/pci: Check the device set open count on reset

 drivers/vfio/pci/vfio_pci_core.c | 10 +++++-----
 drivers/vfio/vfio_main.c         | 26 +++++++++++++++++++++-----
 include/linux/vfio.h             |  1 +
 3 files changed, 27 insertions(+), 10 deletions(-)

-- 
2.37.4


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH v6 1/3] vfio: Fix container device registration life cycle
  2022-11-10  1:40 [PATCH v6 0/3] vfio/pci: Check the device set open count on reset Anthony DeRossi
@ 2022-11-10  1:40 ` Anthony DeRossi
  2022-11-10  2:36   ` Yi Liu
  2022-11-10  1:40 ` [PATCH v6 2/3] vfio: Export the device set open count Anthony DeRossi
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 10+ messages in thread
From: Anthony DeRossi @ 2022-11-10  1:40 UTC (permalink / raw)
  To: kvm; +Cc: alex.williamson, cohuck, jgg, kevin.tian, abhsahu, yishaih

In vfio_device_open(), vfio_device_container_register() is always called
when open_count == 1. On error, vfio_device_container_unregister() is
only called when open_count == 1 and close_device is set. This leaks a
registration for devices without a close_device implementation.

In vfio_device_fops_release(), vfio_device_container_unregister() is
called unconditionally. This can cause a device to be unregistered
multiple times.

Treating container device registration/unregistration uniformly (always
when open_count == 1) fixes both issues.

Fixes: ce4b4657ff18 ("vfio: Replace the DMA unmapping notifier with a callback")
Signed-off-by: Anthony DeRossi <ajderossi@gmail.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
---
 drivers/vfio/vfio_main.c | 13 ++++++++-----
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
index 2d168793d4e1..9a4af880e941 100644
--- a/drivers/vfio/vfio_main.c
+++ b/drivers/vfio/vfio_main.c
@@ -801,8 +801,9 @@ static struct file *vfio_device_open(struct vfio_device *device)
 err_close_device:
 	mutex_lock(&device->dev_set->lock);
 	mutex_lock(&device->group->group_lock);
-	if (device->open_count == 1 && device->ops->close_device) {
-		device->ops->close_device(device);
+	if (device->open_count == 1) {
+		if (device->ops->close_device)
+			device->ops->close_device(device);
 
 		vfio_device_container_unregister(device);
 	}
@@ -1017,10 +1018,12 @@ static int vfio_device_fops_release(struct inode *inode, struct file *filep)
 	mutex_lock(&device->dev_set->lock);
 	vfio_assert_device_open(device);
 	mutex_lock(&device->group->group_lock);
-	if (device->open_count == 1 && device->ops->close_device)
-		device->ops->close_device(device);
+	if (device->open_count == 1) {
+		if (device->ops->close_device)
+			device->ops->close_device(device);
 
-	vfio_device_container_unregister(device);
+		vfio_device_container_unregister(device);
+	}
 	mutex_unlock(&device->group->group_lock);
 	device->open_count--;
 	if (device->open_count == 0)
-- 
2.37.4


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH v6 2/3] vfio: Export the device set open count
  2022-11-10  1:40 [PATCH v6 0/3] vfio/pci: Check the device set open count on reset Anthony DeRossi
  2022-11-10  1:40 ` [PATCH v6 1/3] vfio: Fix container device registration life cycle Anthony DeRossi
@ 2022-11-10  1:40 ` Anthony DeRossi
  2022-11-10  2:46   ` Yi Liu
  2022-11-10  1:40 ` [PATCH v6 3/3] vfio/pci: Check the device set open count on reset Anthony DeRossi
  2022-11-10 20:16 ` [PATCH v6 0/3] " Alex Williamson
  3 siblings, 1 reply; 10+ messages in thread
From: Anthony DeRossi @ 2022-11-10  1:40 UTC (permalink / raw)
  To: kvm; +Cc: alex.williamson, cohuck, jgg, kevin.tian, abhsahu, yishaih

The open count of a device set is the sum of the open counts of all
devices in the set. Drivers can use this value to determine whether
shared resources are in use without tracking them manually or accessing
the private open_count in vfio_device.

Signed-off-by: Anthony DeRossi <ajderossi@gmail.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
---
 drivers/vfio/vfio_main.c | 13 +++++++++++++
 include/linux/vfio.h     |  1 +
 2 files changed, 14 insertions(+)

diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
index 9a4af880e941..6e8804fe0095 100644
--- a/drivers/vfio/vfio_main.c
+++ b/drivers/vfio/vfio_main.c
@@ -125,6 +125,19 @@ static void vfio_release_device_set(struct vfio_device *device)
 	xa_unlock(&vfio_device_set_xa);
 }
 
+unsigned int vfio_device_set_open_count(struct vfio_device_set *dev_set)
+{
+	struct vfio_device *cur;
+	unsigned int open_count = 0;
+
+	lockdep_assert_held(&dev_set->lock);
+
+	list_for_each_entry(cur, &dev_set->device_list, dev_set_list)
+		open_count += cur->open_count;
+	return open_count;
+}
+EXPORT_SYMBOL_GPL(vfio_device_set_open_count);
+
 /*
  * Group objects - create, release, get, put, search
  */
diff --git a/include/linux/vfio.h b/include/linux/vfio.h
index e7cebeb875dd..fdd393f70b19 100644
--- a/include/linux/vfio.h
+++ b/include/linux/vfio.h
@@ -189,6 +189,7 @@ int vfio_register_emulated_iommu_dev(struct vfio_device *device);
 void vfio_unregister_group_dev(struct vfio_device *device);
 
 int vfio_assign_device_set(struct vfio_device *device, void *set_id);
+unsigned int vfio_device_set_open_count(struct vfio_device_set *dev_set);
 
 int vfio_mig_get_next_state(struct vfio_device *device,
 			    enum vfio_device_mig_state cur_fsm,
-- 
2.37.4


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH v6 3/3] vfio/pci: Check the device set open count on reset
  2022-11-10  1:40 [PATCH v6 0/3] vfio/pci: Check the device set open count on reset Anthony DeRossi
  2022-11-10  1:40 ` [PATCH v6 1/3] vfio: Fix container device registration life cycle Anthony DeRossi
  2022-11-10  1:40 ` [PATCH v6 2/3] vfio: Export the device set open count Anthony DeRossi
@ 2022-11-10  1:40 ` Anthony DeRossi
  2022-11-10  3:03   ` Yi Liu
  2022-11-10 20:16 ` [PATCH v6 0/3] " Alex Williamson
  3 siblings, 1 reply; 10+ messages in thread
From: Anthony DeRossi @ 2022-11-10  1:40 UTC (permalink / raw)
  To: kvm; +Cc: alex.williamson, cohuck, jgg, kevin.tian, abhsahu, yishaih

vfio_pci_dev_set_needs_reset() inspects the open_count of every device
in the set to determine whether a reset is allowed. The current device
always has open_count == 1 within vfio_pci_core_disable(), effectively
disabling the reset logic. This field is also documented as private in
vfio_device, so it should not be used to determine whether other devices
in the set are open.

Checking for vfio_device_set_open_count() > 1 on the device set fixes
both issues.

After commit 2cd8b14aaa66 ("vfio/pci: Move to the device set
infrastructure"), failure to create a new file for a device would cause
the reset to be skipped due to open_count being decremented after
calling close_device() in the error path.

After commit eadd86f835c6 ("vfio: Remove calls to
vfio_group_add_container_user()"), releasing a device would always skip
the reset due to an ordering change in vfio_device_fops_release().

Failing to reset the device leaves it in an unknown state, potentially
causing errors when it is accessed later or bound to a different driver.

This issue was observed with a Radeon RX Vega 56 [1002:687f] (rev c3)
assigned to a Windows guest. After shutting down the guest, unbinding
the device from vfio-pci, and binding the device to amdgpu:

[  548.007102] [drm:psp_hw_start [amdgpu]] *ERROR* PSP create ring failed!
[  548.027174] [drm:psp_hw_init [amdgpu]] *ERROR* PSP firmware loading failed
[  548.027242] [drm:amdgpu_device_fw_loading [amdgpu]] *ERROR* hw_init of IP block <psp> failed -22
[  548.027306] amdgpu 0000:0a:00.0: amdgpu: amdgpu_device_ip_init failed
[  548.027308] amdgpu 0000:0a:00.0: amdgpu: Fatal error during GPU init

Fixes: 2cd8b14aaa66 ("vfio/pci: Move to the device set infrastructure")
Fixes: eadd86f835c6 ("vfio: Remove calls to vfio_group_add_container_user()")
Signed-off-by: Anthony DeRossi <ajderossi@gmail.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
---
 drivers/vfio/pci/vfio_pci_core.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
index badc9d828cac..e030c2120183 100644
--- a/drivers/vfio/pci/vfio_pci_core.c
+++ b/drivers/vfio/pci/vfio_pci_core.c
@@ -2488,12 +2488,12 @@ static bool vfio_pci_dev_set_needs_reset(struct vfio_device_set *dev_set)
 	struct vfio_pci_core_device *cur;
 	bool needs_reset = false;
 
-	list_for_each_entry(cur, &dev_set->device_list, vdev.dev_set_list) {
-		/* No VFIO device in the set can have an open device FD */
-		if (cur->vdev.open_count)
-			return false;
+	/* No other VFIO device in the set can be open. */
+	if (vfio_device_set_open_count(dev_set) > 1)
+		return false;
+
+	list_for_each_entry(cur, &dev_set->device_list, vdev.dev_set_list)
 		needs_reset |= cur->needs_reset;
-	}
 	return needs_reset;
 }
 
-- 
2.37.4


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH v6 1/3] vfio: Fix container device registration life cycle
  2022-11-10  1:40 ` [PATCH v6 1/3] vfio: Fix container device registration life cycle Anthony DeRossi
@ 2022-11-10  2:36   ` Yi Liu
  0 siblings, 0 replies; 10+ messages in thread
From: Yi Liu @ 2022-11-10  2:36 UTC (permalink / raw)
  To: Anthony DeRossi, kvm
  Cc: alex.williamson, cohuck, jgg, kevin.tian, abhsahu, yishaih

On 2022/11/10 09:40, Anthony DeRossi wrote:
> In vfio_device_open(), vfio_device_container_register() is always called
> when open_count == 1. On error, vfio_device_container_unregister() is
> only called when open_count == 1 and close_device is set. This leaks a
> registration for devices without a close_device implementation.
> 
> In vfio_device_fops_release(), vfio_device_container_unregister() is
> called unconditionally. This can cause a device to be unregistered
> multiple times.
> 
> Treating container device registration/unregistration uniformly (always
> when open_count == 1) fixes both issues.
> 
> Fixes: ce4b4657ff18 ("vfio: Replace the DMA unmapping notifier with a callback")
> Signed-off-by: Anthony DeRossi <ajderossi@gmail.com>
> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
> Reviewed-by: Kevin Tian <kevin.tian@intel.com>
> ---
>   drivers/vfio/vfio_main.c | 13 ++++++++-----
>   1 file changed, 8 insertions(+), 5 deletions(-)


Reviewed-by: Yi Liu <yi.l.liu@intel.com>

> diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
> index 2d168793d4e1..9a4af880e941 100644
> --- a/drivers/vfio/vfio_main.c
> +++ b/drivers/vfio/vfio_main.c
> @@ -801,8 +801,9 @@ static struct file *vfio_device_open(struct vfio_device *device)
>   err_close_device:
>   	mutex_lock(&device->dev_set->lock);
>   	mutex_lock(&device->group->group_lock);
> -	if (device->open_count == 1 && device->ops->close_device) {
> -		device->ops->close_device(device);
> +	if (device->open_count == 1) {
> +		if (device->ops->close_device)
> +			device->ops->close_device(device);
>   
>   		vfio_device_container_unregister(device);
>   	}
> @@ -1017,10 +1018,12 @@ static int vfio_device_fops_release(struct inode *inode, struct file *filep)
>   	mutex_lock(&device->dev_set->lock);
>   	vfio_assert_device_open(device);
>   	mutex_lock(&device->group->group_lock);
> -	if (device->open_count == 1 && device->ops->close_device)
> -		device->ops->close_device(device);
> +	if (device->open_count == 1) {
> +		if (device->ops->close_device)
> +			device->ops->close_device(device);
>   
> -	vfio_device_container_unregister(device);
> +		vfio_device_container_unregister(device);
> +	}
>   	mutex_unlock(&device->group->group_lock);
>   	device->open_count--;
>   	if (device->open_count == 0)

-- 
Regards,
Yi Liu

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v6 2/3] vfio: Export the device set open count
  2022-11-10  1:40 ` [PATCH v6 2/3] vfio: Export the device set open count Anthony DeRossi
@ 2022-11-10  2:46   ` Yi Liu
  0 siblings, 0 replies; 10+ messages in thread
From: Yi Liu @ 2022-11-10  2:46 UTC (permalink / raw)
  To: Anthony DeRossi, kvm
  Cc: alex.williamson, cohuck, jgg, kevin.tian, abhsahu, yishaih

On 2022/11/10 09:40, Anthony DeRossi wrote:
> The open count of a device set is the sum of the open counts of all
> devices in the set. Drivers can use this value to determine whether
> shared resources are in use without tracking them manually or accessing
> the private open_count in vfio_device.
> 
> Signed-off-by: Anthony DeRossi <ajderossi@gmail.com>
> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
> Reviewed-by: Kevin Tian <kevin.tian@intel.com>
> ---
>   drivers/vfio/vfio_main.c | 13 +++++++++++++
>   include/linux/vfio.h     |  1 +
>   2 files changed, 14 insertions(+)

Reviewed-by: Yi Liu <yi.l.liu@intel.com>

> diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
> index 9a4af880e941..6e8804fe0095 100644
> --- a/drivers/vfio/vfio_main.c
> +++ b/drivers/vfio/vfio_main.c
> @@ -125,6 +125,19 @@ static void vfio_release_device_set(struct vfio_device *device)
>   	xa_unlock(&vfio_device_set_xa);
>   }
>   
> +unsigned int vfio_device_set_open_count(struct vfio_device_set *dev_set)
> +{
> +	struct vfio_device *cur;
> +	unsigned int open_count = 0;
> +
> +	lockdep_assert_held(&dev_set->lock);
> +
> +	list_for_each_entry(cur, &dev_set->device_list, dev_set_list)
> +		open_count += cur->open_count;
> +	return open_count;
> +}
> +EXPORT_SYMBOL_GPL(vfio_device_set_open_count);
> +
>   /*
>    * Group objects - create, release, get, put, search
>    */
> diff --git a/include/linux/vfio.h b/include/linux/vfio.h
> index e7cebeb875dd..fdd393f70b19 100644
> --- a/include/linux/vfio.h
> +++ b/include/linux/vfio.h
> @@ -189,6 +189,7 @@ int vfio_register_emulated_iommu_dev(struct vfio_device *device);
>   void vfio_unregister_group_dev(struct vfio_device *device);
>   
>   int vfio_assign_device_set(struct vfio_device *device, void *set_id);
> +unsigned int vfio_device_set_open_count(struct vfio_device_set *dev_set);
>   
>   int vfio_mig_get_next_state(struct vfio_device *device,
>   			    enum vfio_device_mig_state cur_fsm,

-- 
Regards,
Yi Liu

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v6 3/3] vfio/pci: Check the device set open count on reset
  2022-11-10  1:40 ` [PATCH v6 3/3] vfio/pci: Check the device set open count on reset Anthony DeRossi
@ 2022-11-10  3:03   ` Yi Liu
  2022-11-10  4:17     ` Alex Williamson
  0 siblings, 1 reply; 10+ messages in thread
From: Yi Liu @ 2022-11-10  3:03 UTC (permalink / raw)
  To: Anthony DeRossi, kvm
  Cc: alex.williamson, cohuck, jgg, kevin.tian, abhsahu, yishaih

Hi DeRossi,

On 2022/11/10 09:40, Anthony DeRossi wrote:
> vfio_pci_dev_set_needs_reset() inspects the open_count of every device
> in the set to determine whether a reset is allowed. The current device
> always has open_count == 1 within vfio_pci_core_disable(), effectively
> disabling the reset logic. This field is also documented as private in
> vfio_device, so it should not be used to determine whether other devices
> in the set are open.

haven't went through the prior version. maybe may question has been already
answered. My question is:

the major reason is the order problem in vfio_main.c. close_device() is
always called before decreasing open_count to be 0. So even other device
has no open fd, the current vfio_device still have one open count. So why
can't we just switch the order of open_count-- and close_device()?

> Checking for vfio_device_set_open_count() > 1 on the device set fixes
> both issues
tbh. it's weird to me that a driver needs to know the internal logic of
vfio core before knowing it needs to check the vfio_device_set_open_count()
in this way. Is vfio-pci the only driver that needs to do this check or
there are other drivers? If there are other drivers, maybe fixing the order
in core is better.

> After commit 2cd8b14aaa66 ("vfio/pci: Move to the device set
> infrastructure"), failure to create a new file for a device would cause
> the reset to be skipped due to open_count being decremented after
> calling close_device() in the error path.
> 
> After commit eadd86f835c6 ("vfio: Remove calls to
> vfio_group_add_container_user()"), releasing a device would always skip
> the reset due to an ordering change in vfio_device_fops_release().
> 
> Failing to reset the device leaves it in an unknown state, potentially
> causing errors when it is accessed later or bound to a different driver.
> 
> This issue was observed with a Radeon RX Vega 56 [1002:687f] (rev c3)
> assigned to a Windows guest. After shutting down the guest, unbinding
> the device from vfio-pci, and binding the device to amdgpu:
> 
> [  548.007102] [drm:psp_hw_start [amdgpu]] *ERROR* PSP create ring failed!
> [  548.027174] [drm:psp_hw_init [amdgpu]] *ERROR* PSP firmware loading failed
> [  548.027242] [drm:amdgpu_device_fw_loading [amdgpu]] *ERROR* hw_init of IP block <psp> failed -22
> [  548.027306] amdgpu 0000:0a:00.0: amdgpu: amdgpu_device_ip_init failed
> [  548.027308] amdgpu 0000:0a:00.0: amdgpu: Fatal error during GPU init
> 
> Fixes: 2cd8b14aaa66 ("vfio/pci: Move to the device set infrastructure")
> Fixes: eadd86f835c6 ("vfio: Remove calls to vfio_group_add_container_user()")
> Signed-off-by: Anthony DeRossi <ajderossi@gmail.com>
> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
> Reviewed-by: Kevin Tian <kevin.tian@intel.com>
> ---
>   drivers/vfio/pci/vfio_pci_core.c | 10 +++++-----
>   1 file changed, 5 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
> index badc9d828cac..e030c2120183 100644
> --- a/drivers/vfio/pci/vfio_pci_core.c
> +++ b/drivers/vfio/pci/vfio_pci_core.c
> @@ -2488,12 +2488,12 @@ static bool vfio_pci_dev_set_needs_reset(struct vfio_device_set *dev_set)
>   	struct vfio_pci_core_device *cur;
>   	bool needs_reset = false;
>   
> -	list_for_each_entry(cur, &dev_set->device_list, vdev.dev_set_list) {
> -		/* No VFIO device in the set can have an open device FD */
> -		if (cur->vdev.open_count)
> -			return false;
> +	/* No other VFIO device in the set can be open. */
> +	if (vfio_device_set_open_count(dev_set) > 1)
> +		return false;
> +
> +	list_for_each_entry(cur, &dev_set->device_list, vdev.dev_set_list)
>   		needs_reset |= cur->needs_reset;
> -	}
>   	return needs_reset;
>   }
>   

-- 
Regards,
Yi Liu

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v6 3/3] vfio/pci: Check the device set open count on reset
  2022-11-10  3:03   ` Yi Liu
@ 2022-11-10  4:17     ` Alex Williamson
  2022-11-10  4:33       ` Yi Liu
  0 siblings, 1 reply; 10+ messages in thread
From: Alex Williamson @ 2022-11-10  4:17 UTC (permalink / raw)
  To: Yi Liu; +Cc: Anthony DeRossi, kvm, cohuck, jgg, kevin.tian, abhsahu, yishaih

On Thu, 10 Nov 2022 11:03:29 +0800
Yi Liu <yi.l.liu@intel.com> wrote:

> Hi DeRossi,
> 
> On 2022/11/10 09:40, Anthony DeRossi wrote:
> > vfio_pci_dev_set_needs_reset() inspects the open_count of every device
> > in the set to determine whether a reset is allowed. The current device
> > always has open_count == 1 within vfio_pci_core_disable(), effectively
> > disabling the reset logic. This field is also documented as private in
> > vfio_device, so it should not be used to determine whether other devices
> > in the set are open.  
> 
> haven't went through the prior version. maybe may question has been already
> answered. My question is:
> 
> the major reason is the order problem in vfio_main.c. close_device() is
> always called before decreasing open_count to be 0. So even other device
> has no open fd, the current vfio_device still have one open count. So why
> can't we just switch the order of open_count-- and close_device()?

This is what was originally proposed and Jason shot it down:

https://lore.kernel.org/all/Y1kY0I4lr7KntbWp@ziepe.ca/
 
> > Checking for vfio_device_set_open_count() > 1 on the device set fixes
> > both issues  
> tbh. it's weird to me that a driver needs to know the internal logic of
> vfio core before knowing it needs to check the vfio_device_set_open_count()
> in this way. Is vfio-pci the only driver that needs to do this check or
> there are other drivers? If there are other drivers, maybe fixing the order
> in core is better.

Please see the evolution of reflck into device sets.  Both PCI and FSL
can have multiple devices in a set, AIUI.  The driver defines the set.
This ability to test for the last close among devices in the set is a
fundamental feature of the original reflck.  Thanks,

Alex


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v6 3/3] vfio/pci: Check the device set open count on reset
  2022-11-10  4:17     ` Alex Williamson
@ 2022-11-10  4:33       ` Yi Liu
  0 siblings, 0 replies; 10+ messages in thread
From: Yi Liu @ 2022-11-10  4:33 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Anthony DeRossi, kvm, cohuck, jgg, kevin.tian, abhsahu, yishaih

On 2022/11/10 12:17, Alex Williamson wrote:
> On Thu, 10 Nov 2022 11:03:29 +0800
> Yi Liu <yi.l.liu@intel.com> wrote:
> 
>> Hi DeRossi,
>>
>> On 2022/11/10 09:40, Anthony DeRossi wrote:
>>> vfio_pci_dev_set_needs_reset() inspects the open_count of every device
>>> in the set to determine whether a reset is allowed. The current device
>>> always has open_count == 1 within vfio_pci_core_disable(), effectively
>>> disabling the reset logic. This field is also documented as private in
>>> vfio_device, so it should not be used to determine whether other devices
>>> in the set are open.
>>
>> haven't went through the prior version. maybe may question has been already
>> answered. My question is:
>>
>> the major reason is the order problem in vfio_main.c. close_device() is
>> always called before decreasing open_count to be 0. So even other device
>> has no open fd, the current vfio_device still have one open count. So why
>> can't we just switch the order of open_count-- and close_device()?
> 
> This is what was originally proposed and Jason shot it down:
> 
> https://lore.kernel.org/all/Y1kY0I4lr7KntbWp@ziepe.ca/

got it. :-)

>>> Checking for vfio_device_set_open_count() > 1 on the device set fixes
>>> both issues
>> tbh. it's weird to me that a driver needs to know the internal logic of
>> vfio core before knowing it needs to check the vfio_device_set_open_count()
>> in this way. Is vfio-pci the only driver that needs to do this check or
>> there are other drivers? If there are other drivers, maybe fixing the order
>> in core is better.
> 
> Please see the evolution of reflck into device sets.  Both PCI and FSL
> can have multiple devices in a set, AIUI.  The driver defines the set.
> This ability to test for the last close among devices in the set is a
> fundamental feature of the original reflck.  Thanks,

thanks for the info.

-- 
Regards,
Yi Liu

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v6 0/3] vfio/pci: Check the device set open count on reset
  2022-11-10  1:40 [PATCH v6 0/3] vfio/pci: Check the device set open count on reset Anthony DeRossi
                   ` (2 preceding siblings ...)
  2022-11-10  1:40 ` [PATCH v6 3/3] vfio/pci: Check the device set open count on reset Anthony DeRossi
@ 2022-11-10 20:16 ` Alex Williamson
  3 siblings, 0 replies; 10+ messages in thread
From: Alex Williamson @ 2022-11-10 20:16 UTC (permalink / raw)
  To: Anthony DeRossi; +Cc: kvm, cohuck, jgg, kevin.tian, abhsahu, yishaih

On Wed,  9 Nov 2022 17:40:24 -0800
Anthony DeRossi <ajderossi@gmail.com> wrote:

> This series fixes an issue where devices bound to vfio-pci are not reset when
> they are released. Skipping the reset has unpredictable results depending on
> the device, and can cause errors when accessing the device later or binding to
> a different driver.
> 
> The first patch in this series fixes a life cycle issue that was discovered in
> an earlier revision of the series.
> 
> Thank you Alex, Jason, and Kevin for your reviews and feedback. This revision
> includes the changes suggested on v5, but without any changes to
> vfio_device_set_open_count().
> 
> Anthony
> 
> v5 -> v6:
> - Added a call to lockdep_assert_held() in patch 3
> - Corrected "vfio_container_device_register()" in the patch 1 commit message
> v5: https://lore.kernel.org/kvm/20221105224458.8180-1-ajderossi@gmail.com/
> 
> v4 -> v5:
> - Replaced patch 2 with a patch that introduces a new function to get the
>   open count of a device set
> - Updated patch 3 to use the new function
> v4: https://lore.kernel.org/kvm/20221104195727.4629-1-ajderossi@gmail.com/
> 
> v3 -> v4:
> - Added a patch to fix device registration life cycle
> - Added a patch to add a public open_count on vfio_device_set
> - Changed the implementation to avoid private open_count usage
> v3: https://lore.kernel.org/kvm/20221102055732.2110-1-ajderossi@gmail.com/
> 
> v2 -> v3:
> - Added WARN_ON()
> - Revised commit message
> v2: https://lore.kernel.org/kvm/20221026194245.1769-1-ajderossi@gmail.com/
> 
> v1 -> v2:
> - Changed reset behavior instead of open_count ordering
> - Retitled from "vfio: Decrement open_count before close_device()"
> v1: https://lore.kernel.org/kvm/20221025193820.4412-1-ajderossi@gmail.com/
> 
> Anthony DeRossi (3):
>   vfio: Fix container device registration life cycle
>   vfio: Export the device set open count
>   vfio/pci: Check the device set open count on reset
> 
>  drivers/vfio/pci/vfio_pci_core.c | 10 +++++-----
>  drivers/vfio/vfio_main.c         | 26 +++++++++++++++++++++-----
>  include/linux/vfio.h             |  1 +
>  3 files changed, 27 insertions(+), 10 deletions(-)
> 

Applied to vfio for-linus branch for v6.1.  Thanks,

Alex


^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2022-11-10 20:18 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-11-10  1:40 [PATCH v6 0/3] vfio/pci: Check the device set open count on reset Anthony DeRossi
2022-11-10  1:40 ` [PATCH v6 1/3] vfio: Fix container device registration life cycle Anthony DeRossi
2022-11-10  2:36   ` Yi Liu
2022-11-10  1:40 ` [PATCH v6 2/3] vfio: Export the device set open count Anthony DeRossi
2022-11-10  2:46   ` Yi Liu
2022-11-10  1:40 ` [PATCH v6 3/3] vfio/pci: Check the device set open count on reset Anthony DeRossi
2022-11-10  3:03   ` Yi Liu
2022-11-10  4:17     ` Alex Williamson
2022-11-10  4:33       ` Yi Liu
2022-11-10 20:16 ` [PATCH v6 0/3] " Alex Williamson

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.