linux-pci.vger.kernel.org archive mirror
* [PATCH v4 0/3] Improve PCI device post-reset readiness polling
@ 2020-03-07 17:20 Stanislav Spassov
From: Stanislav Spassov @ 2020-03-07 17:20 UTC
  To: linux-pci
  Cc: Stanislav Spassov, Bjorn Helgaas, Thomas Gleixner, Andrew Morton,
	Jan H . Schönherr, Jonathan Corbet, Ashok Raj,
	Alex Williamson, Sinan Kaya, Rajat Jain

From: Stanislav Spassov <stanspas@amazon.de>

The first version of this patch series can be found here:
https://lore.kernel.org/linux-pci/20200223122057.6504-1-stanspas@amazon.com

The goal of this patch series is to solve an issue where pci_dev_wait
can cause system crashes. After a reset, a hung device may keep
responding with CRS completions indefinitely. If CRS Software Visibility
is enabled on the Root Port, attempting to read any register other than
PCI_VENDOR_ID will cause the Root Port to autonomously retry the request
without reporting back to the CPU core. Unless the number of retries or
the amount of time spent retrying is limited by platform-specific means,
this scenario leads to low-level platform timeouts (such as a TOR
Timeout), which can easily escalate to a crash.
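
The reason PCI_VENDOR_ID is the safe register to poll can be illustrated
with a small user-space simulation. This is only a sketch under stated
assumptions, not code from this series: struct crs_dev, crs_read_vendor_id()
and crs_poll_vendor_id() are hypothetical names, and the sleep is replaced
by a counter. It relies on the PCIe rule that, with CRS Software Visibility
enabled, a CRS completion for a Vendor ID read is reported to software as
Vendor ID 0x0001 (remaining bytes all ones), so the caller regains control
after every read and can enforce its own timeout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical simulated device: answers the next 'crs_reads_left'
 * Vendor ID reads with CRS, then returns its real ID dword. */
struct crs_dev {
	int crs_reads_left;
	uint32_t vendor_device;
};

/* With CRS SV enabled, a CRS completion shows up as Vendor ID 0x0001. */
static bool is_crs_vendor_id(uint32_t l)
{
	return (l & 0xffff) == 0x0001;
}

/* Stand-in for a config read of the dword at PCI_VENDOR_ID. */
static uint32_t crs_read_vendor_id(struct crs_dev *d)
{
	if (d->crs_reads_left > 0) {
		d->crs_reads_left--;
		return 0xffff0001;
	}
	return d->vendor_device;
}

/* Poll with exponential backoff. Returns the simulated milliseconds
 * waited, or -1 if the next delay would exceed the timeout. */
static int crs_poll_vendor_id(struct crs_dev *d, int timeout_ms)
{
	int delay = 1, waited = 0;

	assert(timeout_ms >= 0);
	while (is_crs_vendor_id(crs_read_vendor_id(d))) {
		if (waited + delay > timeout_ms)
			return -1;	/* give up instead of hanging */
		waited += delay;	/* stands in for msleep(delay) */
		delay *= 2;
	}
	return waited;
}
```

A device that keeps returning CRS forever makes crs_poll_vendor_id() return
-1 once the budget is used up, whereas a read of any other register in the
same state would never return to software.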

Feedback on v1 inspired many additional improvements to the device reset
code paths and reductions in post-reset delays. These improvements were
published as part of v2 (v3 contains only small build fixes).

It looks like there is immediate demand specifically for the CRS work,
so I am once again reducing the series to just that. The rest will be
posted as a separate patch series that will likely require more time and
iterations to stabilize.

Changes since v3:
- In pci_dev_wait(), added "timeout -= waited" to account for the time
  spent polling PCI_VENDOR_ID before falling back to polling PCI_COMMAND
  if device readiness could not be positively established via CRS (i.e.,
  if we stopped receiving CRS completions but did not receive a valid
  vendor ID due to dealing with an SR-IOV VF, or due to a different error)
- Simplified the commit message of "PCI: Add CRS handling to pci_dev_wait()"
  to avoid confusion as to when Root Ports will autonomously retry requests
  that resulted in CRS completions.
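
The "timeout -= waited" accounting from the first item can be sketched as a
user-space simulation. This is an illustration only, not the code from
patch 3/3: struct sim_dev, its fields, the 1 ms polling step, and the
vendor ID 0x1234 are all assumptions made for the example. Phase one polls
the vendor ID while CRS (0x0001) is seen; if the result is not a valid
vendor ID, phase two polls the dword at PCI_COMMAND with only the remaining
budget.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical simulated device driven by a simulated clock. */
struct sim_dev {
	int now_ms;		/* simulated current time */
	int crs_until_ms;	/* CRS completions before this time */
	int ready_at_ms;	/* PCI_COMMAND reads ~0 before this time */
	bool is_vf;		/* SR-IOV VFs read 0xffff as vendor ID */
};

static uint16_t sim_read_vendor(const struct sim_dev *d)
{
	if (d->now_ms < d->crs_until_ms)
		return 0x0001;	/* CRS as seen with CRS SV enabled */
	if (d->is_vf || d->now_ms < d->ready_at_ms)
		return 0xffff;
	return 0x1234;		/* made-up valid vendor ID */
}

static uint32_t sim_read_command(const struct sim_dev *d)
{
	return d->now_ms < d->ready_at_ms ? 0xffffffffu : 0x00000006u;
}

static bool seeing_crs(const struct sim_dev *d)
{
	return sim_read_vendor(d) == 0x0001;
}

static bool not_responding(const struct sim_dev *d)
{
	return sim_read_command(d) == 0xffffffffu;
}

/* Poll in 1 ms steps; returns ms waited, or -1 on timeout. */
static int poll_while(struct sim_dev *d,
		      bool (*busy)(const struct sim_dev *), int timeout_ms)
{
	int waited = 0;

	while (busy(d)) {
		if (waited >= timeout_ms)
			return -1;
		d->now_ms++;	/* stands in for msleep(1) */
		waited++;
	}
	return waited;
}

/* Two-phase wait sharing one timeout budget. */
static int sim_dev_wait(struct sim_dev *d, int timeout_ms)
{
	int waited, rest;

	assert(timeout_ms >= 0);
	waited = poll_while(d, seeing_crs, timeout_ms);
	if (waited < 0)
		return -1;
	if (sim_read_vendor(d) != 0xffff)
		return waited;	/* valid vendor ID: device is ready */

	timeout_ms -= waited;	/* charge phase one against the budget */
	rest = poll_while(d, not_responding, timeout_ms);
	return rest < 0 ? -1 : waited + rest;
}
```

Without the subtraction, a VF that stops returning CRS after 30 ms but never
becomes ready would be polled for the full timeout a second time, so the
total wait could approach twice the requested limit.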

Stanislav Spassov (3):
  PCI: Refactor polling loop out of pci_dev_wait
  PCI: Cache CRS Software Visibiliy in struct pci_dev
  PCI: Add CRS handling to pci_dev_wait()

 drivers/pci/pci.c   | 109 +++++++++++++++++++++++++++++++++++---------
 drivers/pci/probe.c |   8 +++-
 include/linux/pci.h |   3 ++
 3 files changed, 98 insertions(+), 22 deletions(-)


base-commit: bb6d3fb354c5ee8d6bde2d576eb7220ea09862b9
-- 
2.25.1




Amazon Development Center Germany GmbH
Krausenstr. 38
10117 Berlin
Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss
Eingetragen am Amtsgericht Charlottenburg unter HRB 149173 B
Sitz: Berlin
Ust-ID: DE 289 237 879




Thread overview: 19+ messages
2020-03-07 17:20 [PATCH v4 0/3] Improve PCI device post-reset readiness polling Stanislav Spassov
2020-03-07 17:20 ` [PATCH v4 1/3] PCI: Refactor polling loop out of pci_dev_wait Stanislav Spassov
2020-03-07 17:20 ` [PATCH v4 2/3] PCI: Cache CRS Software Visibiliy in struct pci_dev Stanislav Spassov
2021-09-12 13:32   ` Bjorn Helgaas
2021-09-13 16:06     ` Spassov, Stanislav
2020-03-07 17:20 ` [PATCH v4 3/3] PCI: Add CRS handling to pci_dev_wait() Stanislav Spassov
2020-03-09 15:55   ` Sinan Kaya
2020-03-09 16:19     ` Raj, Ashok
2020-03-09 16:38       ` Spassov, Stanislav
2020-03-09 17:33         ` Sinan Kaya
2021-09-11 14:03   ` Bjorn Helgaas
2021-09-13 16:29     ` Spassov, Stanislav
2021-09-13 16:38       ` Bjorn Helgaas
2021-09-13 18:04         ` Spassov, Stanislav
2021-09-14 17:53           ` Rajat Jain
2021-09-13 16:07   ` Bjorn Helgaas
2021-09-13 16:39     ` Spassov, Stanislav
2021-01-22  8:54 ` [PATCH v4 0/3] Improve PCI device post-reset readiness polling David Woodhouse
2021-09-10  9:32   ` David Woodhouse
