From: Alex Williamson <alex.williamson@redhat.com>
To: linux-pci@vger.kernel.org
Cc: abhsahu@nvidia.com, targupta@nvidia.com, zhguo@redhat.com,
	alex.williamson@redhat.com, sdalvi@google.com
Subject: [PATCH] PCI: Extend D3hot delay for NVIDIA HDA controllers
Date: Thu, 13 Apr 2023 13:40:42 -0600
Message-ID: <20230413194042.605768-1-alex.williamson@redhat.com>
In-Reply-To: <168004421186.935858.12296629041962399467.stgit@omen>

Assignment of NVIDIA Ampere-based GPUs has regressed since the commit
referenced below: the reduced D3hot transition delay appears to open a
small window in which a D3hot->D0 transition followed by a bus reset can
wedge the device.  The entire device is subsequently unavailable, returns
-1 on config space reads, and is unrecoverable without a host reset.

This has been observed with RTX A2000 and A5000 GPU and audio functions
assigned to a Windows VM, where shutdown of the VM places the devices in
D3hot prior to vfio-pci performing a bus reset when userspace releases the
devices.  The issue has roughly a 2-3% chance of occurring per shutdown.

Restoring the HDA controller d3hot_delay to its effective value prior to
the commit below has been shown to resolve the issue.  NVIDIA confirms
this change should be safe for all of their HDA controllers.

Cc: Abhishek Sahu <abhsahu@nvidia.com>
Cc: Tarun Gupta <targupta@nvidia.com>
Fixes: 3e347969a577 ("PCI/PM: Reduce D3hot delay with usleep_range()")
Reported-by: Zhiyi Guo <zhguo@redhat.com>
Reviewed-by: Tarun Gupta <targupta@nvidia.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
---

Unfortunately Tarun's reply with confirmation doesn't show up on lore,
possibly due to html email, or else I'd provide that as a Link:.

 drivers/pci/quirks.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index 44cab813bf95..f4e2a88729fd 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -1939,6 +1939,19 @@ static void quirk_radeon_pm(struct pci_dev *dev)
 }
 DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ATI, 0x6741, quirk_radeon_pm);
 
+/*
+ * NVIDIA Ampere-based HDA controllers can wedge the whole device if a bus
+ * reset is performed too soon after transition to D0.  Extend d3hot_delay
+ * to the previous effective default for all NVIDIA HDA controllers.
+ */
+static void quirk_nvidia_hda_pm(struct pci_dev *dev)
+{
+	quirk_d3hot_delay(dev, 20);
+}
+DECLARE_PCI_FIXUP_CLASS_FINAL(PCI_VENDOR_ID_NVIDIA, PCI_ANY_ID,
+			      PCI_CLASS_MULTIMEDIA_HD_AUDIO, 8,
+			      quirk_nvidia_hda_pm);
+
 /*
  * Ryzen5/7 XHCI controllers fail upon resume from runtime suspend or s2idle.
  * https://bugzilla.kernel.org/show_bug.cgi?id=205587
-- 
2.39.2
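
For readers unfamiliar with the fixup machinery, the following is a rough
userspace sketch of what the quirk above is expected to do. It is not kernel
code. struct model_pci_dev, the model_* functions, and device ID 0x1234 are
hypothetical stand-ins invented for illustration. The sketch assumes that a
class fixup compares (class >> class_shift) against the declared class, that
quirk_d3hot_delay() only ever raises the delay, and that the default
d3hot_delay is 10 msec.

```c
#include <assert.h>
#include <stdint.h>

/* Values as defined in include/linux/pci_ids.h. */
#define PCI_VENDOR_ID_NVIDIA          0x10de
#define PCI_CLASS_MULTIMEDIA_HD_AUDIO 0x0403
#define PCI_ANY_ID                    (~0u)

/* Hypothetical stand-in for struct pci_dev; only the fields used here. */
struct model_pci_dev {
	uint16_t vendor;
	uint16_t device;
	uint32_t class;           /* 24 bits: base class, subclass, prog-if */
	unsigned int d3hot_delay; /* msec to wait after leaving D3hot */
};

/* Model of the match a class fixup performs before calling its hook. */
static int model_fixup_matches(const struct model_pci_dev *dev,
			       uint32_t vendor, uint32_t device,
			       uint32_t class, unsigned int class_shift)
{
	return (vendor == PCI_ANY_ID || vendor == dev->vendor) &&
	       (device == PCI_ANY_ID || device == dev->device) &&
	       (class == PCI_ANY_ID || class == (dev->class >> class_shift));
}

/* Model of quirk_d3hot_delay(): raise the delay, never lower it. */
static void model_quirk_d3hot_delay(struct model_pci_dev *dev,
				    unsigned int delay)
{
	if (dev->d3hot_delay >= delay)
		return;
	dev->d3hot_delay = delay;
}

/* Model of the declared fixup: any NVIDIA device, HD audio class, shift 8. */
void model_nvidia_hda_fixup(struct model_pci_dev *dev)
{
	if (model_fixup_matches(dev, PCI_VENDOR_ID_NVIDIA, PCI_ANY_ID,
				PCI_CLASS_MULTIMEDIA_HD_AUDIO, 8))
		model_quirk_d3hot_delay(dev, 20);
}

/* Self-check: only the NVIDIA HD audio function gets the longer delay. */
int model_selfcheck(void)
{
	/* 0x1234 is a made-up device ID; 10 msec is the assumed default. */
	struct model_pci_dev hda = { 0x10de, 0x1234, 0x040300, 10 };
	struct model_pci_dev gpu = { 0x10de, 0x1234, 0x030000, 10 };

	model_nvidia_hda_fixup(&hda);
	model_nvidia_hda_fixup(&gpu);
	assert(hda.d3hot_delay == 20);
	assert(gpu.d3hot_delay == 10);
	return 0;
}
```

Calling model_selfcheck() from a small main() exercises both cases. The
class shift of 8 drops the programming-interface byte, so the match covers
every HD audio function from the vendor regardless of prog-if, while the GPU
function (display class) keeps its existing delay.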

