linux-kernel.vger.kernel.org archive mirror
From: Shanker R Donthineni <sdonthineni@nvidia.com>
To: "Pali Rohár" <pali@kernel.org>
Cc: Bjorn Helgaas <helgaas@kernel.org>,
	Alex Williamson <alex.williamson@redhat.com>,
	Bjorn Helgaas <bhelgaas@google.com>, <linux-pci@vger.kernel.org>,
	<linux-kernel@vger.kernel.org>, Sinan Kaya <okaya@kernel.org>,
	Vikram Sethi <vsethi@nvidia.com>,
	Amey Narkhede <ameynarkhede03@gmail.com>
Subject: Re: [PATCH v4 2/2] PCI: Enable NO_BUS_RESET quirk for Nvidia GPUs
Date: Wed, 5 May 2021 10:35:58 -0500
Message-ID: <7519c44f-8b78-6f1b-1ef2-7e095c048696@nvidia.com>
In-Reply-To: <20210505121501.54dlrussyk7kij5d@pali>

Hi Pali,

On 5/5/21 7:15 AM, Pali Rohár wrote:
> Hello! If I understood this "reset" issue correctly, it means that
> affected PCIe GPU device cannot be reset via PCI Secondary Bus Reset
> (PCIe Warm Reset) and some special, platform specific reset type needs
> to be issued.
>
> And code for this platform specific reset is included in ACPI DSDT
> table.
Yes, correct.
> But because ACPI DSDT table is part of BIOS/firmware and not part of the
> PCIe GPU device itself, it means that this kind of reset is available to
> linux kernel only in the case when vendor of motherboard (or who burn
> BIOS/firmware into motherboard EEPROM) includes this specific code into
> HW. Am I Right?
The ACPI specification provides a standard mechanism for function-level reset
through the _RST method, and it should work for any OSPM, not just Linux.

https://docs.microsoft.com/en-us/windows-hardware/drivers/kernel/resetting-and-recovering-a-device
ACPI firmware: Function-level reset
To support function-level device reset, there must be an _RST method defined inside the Device scope. If present, this method will override the bus driver's implementation of function-level device reset (if present) for that device. When executed, the _RST method must reset only that device, and must not affect other devices. In addition, the device must stay connected on the bus.
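
To illustrate the override rule quoted above, here is a minimal userspace
sketch in C. It is a model only, not the patch and not kernel code:
struct model_dev, enum reset_kind, and pick_reset() are hypothetical names,
and the ordering simply mirrors the quoted text (a firmware _RST, if present,
takes precedence over the bus driver's function-level reset).

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a device; this is not a kernel structure. */
struct model_dev {
    bool has_acpi_rst;  /* firmware defines _RST inside the Device scope */
    bool has_pcie_flr;  /* device advertises PCIe Function Level Reset */
    bool no_bus_reset;  /* quirk: Secondary Bus Reset must not be used */
};

enum reset_kind {
    RESET_NONE,      /* no usable reset method */
    RESET_ACPI_RST,  /* platform reset via the ACPI _RST method */
    RESET_PCIE_FLR,  /* bus driver's function-level reset */
    RESET_BUS        /* Secondary Bus Reset (SBR) */
};

/*
 * Pick a reset method for the modeled device. The order is an assumption
 * for illustration: _RST overrides FLR, and SBR is the last resort, used
 * only when the no_bus_reset quirk is not set.
 */
enum reset_kind pick_reset(const struct model_dev *dev)
{
    if (dev->has_acpi_rst)
        return RESET_ACPI_RST;
    if (dev->has_pcie_flr)
        return RESET_PCIE_FLR;
    if (!dev->no_bus_reset)
        return RESET_BUS;
    return RESET_NONE;
}
```

In this model, an affected GPU on a platform whose firmware provides _RST
resolves to RESET_ACPI_RST, while the same GPU without _RST and with the
quirk set resolves to RESET_NONE, i.e. it cannot be reset at all.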
> So if this PCIe GPU device is connected to other motherboard or other
> system then this special platform reset in ACPI DSDT is not available.
PCI hardware resets won't work. The only way to reset the device is through
platform-specific code.
> What is doing default ACPI _RST() method on motherboards without this
> special platform reset hook? It probably would not be able to reset
> these PCIe GPU devices if standard SBR cannot reset them.
Yes, the BIOS/firmware has to support this reset on platforms where the
affected GPU devices are attached. These GPUs are not plug-in PCIe cards;
they exist only on server baseboards and are directly attached to the PCIe
fabric.
> Would not be better to include for these PCIe devices "native" linux
> code for resetting them?
A native reset requires a complicated code sequence and has to access many
platform-specific registers. We're taking advantage of the OS-independent,
standard ACPI _RST reset mechanism for resetting the GPU device.
> Please correct me if I'm wrong in my assumption or if I understood this
> issue incorrectly.
The GPU has side effects after an SBR is triggered: it requires a system
reboot to bring the device back to an operating state. This workaround is
to prevent SBR.
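
As a rough sketch of how such a workaround behaves, the following userspace
C model marks matching devices with a "no bus reset" flag and refuses SBR
for them. Everything here is an assumption for illustration: struct
model_pci_dev, MODEL_FLAG_NO_BUS_RESET, model_apply_quirk(), and
model_bus_reset_allowed() are hypothetical names, and the list of affected
device IDs is supplied by the caller because this message does not state
which IDs the patch matches.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_VENDOR_NVIDIA     0x10deu /* NVIDIA's PCI vendor ID */
#define MODEL_FLAG_NO_BUS_RESET 0x1u    /* hypothetical flag bit for this model */

/* Hypothetical model of a PCI device; this is not struct pci_dev. */
struct model_pci_dev {
    uint16_t vendor;
    uint16_t device;
    unsigned int flags;
};

/*
 * Set the no-bus-reset flag when the device is an NVIDIA device whose ID
 * appears in the caller-supplied list of affected IDs.
 */
void model_apply_quirk(struct model_pci_dev *dev,
                       const uint16_t *affected, size_t n)
{
    if (dev->vendor != MODEL_VENDOR_NVIDIA)
        return;

    for (size_t i = 0; i < n; i++) {
        if (dev->device == affected[i]) {
            dev->flags |= MODEL_FLAG_NO_BUS_RESET;
            return;
        }
    }
}

/* SBR is allowed only for devices that were not marked by the quirk. */
bool model_bus_reset_allowed(const struct model_pci_dev *dev)
{
    return (dev->flags & MODEL_FLAG_NO_BUS_RESET) == 0;
}
```

A caller would run model_apply_quirk() once per device at enumeration time
and consult model_bus_reset_allowed() before attempting an SBR, so that a
marked GPU is never left in the state that needs a reboot.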
