From: "Tian, Kevin" <kevin.tian@intel.com>
To: Anthony DeRossi <ajderossi@gmail.com>,
	"kvm@vger.kernel.org" <kvm@vger.kernel.org>
Cc: "alex.williamson@redhat.com" <alex.williamson@redhat.com>,
	"cohuck@redhat.com" <cohuck@redhat.com>,
	"jgg@ziepe.ca" <jgg@ziepe.ca>,
	"abhsahu@nvidia.com" <abhsahu@nvidia.com>,
	"yishaih@nvidia.com" <yishaih@nvidia.com>
Subject: RE: [PATCH v5 3/3] vfio/pci: Check the device set open count on reset
Date: Wed, 9 Nov 2022 03:38:40 +0000
Message-ID: <BN9PR11MB52760A98AEB26DB36602F07E8C3E9@BN9PR11MB5276.namprd11.prod.outlook.com>
In-Reply-To: <20221105224458.8180-4-ajderossi@gmail.com>

> From: Anthony DeRossi <ajderossi@gmail.com>
> Sent: Sunday, November 6, 2022 6:45 AM
> 
> vfio_pci_dev_set_needs_reset() inspects the open_count of every device
> in the set to determine whether a reset is allowed. The current device
> always has open_count == 1 within vfio_pci_core_disable(), effectively
> disabling the reset logic. This field is also documented as private in
> vfio_device, so it should not be used to determine whether other devices
> in the set are open.
> 
> Checking for vfio_device_set_open_count() > 1 on the device set fixes
> both issues.
> 
> After commit 2cd8b14aaa66 ("vfio/pci: Move to the device set
> infrastructure"), failure to create a new file for a device would cause
> the reset to be skipped due to open_count being decremented after
> calling close_device() in the error path.
> 
> After commit eadd86f835c6 ("vfio: Remove calls to
> vfio_group_add_container_user()"), releasing a device would always skip
> the reset due to an ordering change in vfio_device_fops_release().
> 
> Failing to reset the device leaves it in an unknown state, potentially
> causing errors when it is accessed later or bound to a different driver.
> 
> This issue was observed with a Radeon RX Vega 56 [1002:687f] (rev c3)
> assigned to a Windows guest. After shutting down the guest, unbinding
> the device from vfio-pci, and binding the device to amdgpu:
> 
> [  548.007102] [drm:psp_hw_start [amdgpu]] *ERROR* PSP create ring failed!
> [  548.027174] [drm:psp_hw_init [amdgpu]] *ERROR* PSP firmware loading failed
> [  548.027242] [drm:amdgpu_device_fw_loading [amdgpu]] *ERROR* hw_init of IP block <psp> failed -22
> [  548.027306] amdgpu 0000:0a:00.0: amdgpu: amdgpu_device_ip_init failed
> [  548.027308] amdgpu 0000:0a:00.0: amdgpu: Fatal error during GPU init
> 
> Fixes: 2cd8b14aaa66 ("vfio/pci: Move to the device set infrastructure")
> Fixes: eadd86f835c6 ("vfio: Remove calls to vfio_group_add_container_user()")
> Signed-off-by: Anthony DeRossi <ajderossi@gmail.com>

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
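
For readers following the thread, the difference between the two checks described in the quoted commit message can be modelled in plain user-space C. This is only a rough sketch and not the kernel code: the model_* types, old_reset_allowed(), new_reset_allowed(), and model_self_check() are hypothetical names invented for illustration, and model_set_open_count() merely stands in for vfio_device_set_open_count() under the assumption that it sums the open counts across the set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for a VFIO device and its device set. */
struct model_device {
	unsigned int open_count;
	bool needs_reset;
};

struct model_dev_set {
	struct model_device *devs;
	size_t ndevs;
};

/* Assumed behaviour: total number of opens across the whole set. */
unsigned int model_set_open_count(const struct model_dev_set *set)
{
	unsigned int total = 0;

	for (size_t i = 0; i < set->ndevs; i++)
		total += set->devs[i].open_count;
	return total;
}

/* Old-style check: any device with a nonzero open_count blocks the reset. */
bool old_reset_allowed(const struct model_dev_set *set)
{
	bool needs_reset = false;

	for (size_t i = 0; i < set->ndevs; i++) {
		if (set->devs[i].open_count)
			return false;
		needs_reset |= set->devs[i].needs_reset;
	}
	return needs_reset;
}

/* New-style check: only opens beyond the device being closed block it. */
bool new_reset_allowed(const struct model_dev_set *set)
{
	bool needs_reset = false;

	if (model_set_open_count(set) > 1)
		return false;
	for (size_t i = 0; i < set->ndevs; i++)
		needs_reset |= set->devs[i].needs_reset;
	return needs_reset;
}

/* The device being disabled still has open_count == 1. */
void model_self_check(void)
{
	struct model_device devs[] = { { 1, true }, { 0, false } };
	struct model_dev_set set = { devs, 2 };

	assert(!old_reset_allowed(&set)); /* reset is always skipped */
	assert(new_reset_allowed(&set));  /* reset can proceed */

	devs[1].open_count = 1;           /* another device in the set is open */
	assert(!new_reset_allowed(&set));
}
```

In this model the old check can never pass while the closing device is still counted as open, which matches the "effectively disabling the reset logic" observation above; the set-wide count lets the reset proceed unless some other device in the set is open.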

Thread overview: 12+ messages
2022-11-05 22:44 [PATCH v5 0/3] vfio/pci: Check the device set open count on reset Anthony DeRossi
2022-11-05 22:44 ` [PATCH v5 1/3] vfio: Fix container device registration life cycle Anthony DeRossi
2022-11-09  0:43   ` Jason Gunthorpe
2022-11-09  3:36   ` Tian, Kevin
2022-11-05 22:44 ` [PATCH v5 2/3] vfio: Export the device set open count Anthony DeRossi
2022-11-08 23:52   ` Alex Williamson
2022-11-09  0:48   ` Jason Gunthorpe
2022-11-09 16:04     ` Alex Williamson
2022-11-09  3:37   ` Tian, Kevin
2022-11-05 22:44 ` [PATCH v5 3/3] vfio/pci: Check the device set open count on reset Anthony DeRossi
2022-11-09  0:53   ` Jason Gunthorpe
2022-11-09  3:38   ` Tian, Kevin [this message]
