From: Stanislav Spassov <stanspas@amazon.com>
To: <linux-pci@vger.kernel.org>
Cc: "Stanislav Spassov" <stanspas@amazon.de>,
	"Bjorn Helgaas" <bhelgaas@google.com>,
	"Thomas Gleixner" <tglx@linutronix.de>,
	"Andrew Morton" <akpm@linux-foundation.org>,
	"Jan H . Schönherr" <jschoenh@amazon.de>,
	"Jonathan Corbet" <corbet@lwn.net>,
	"Ashok Raj" <ashok.raj@intel.com>,
	"Alex Williamson" <alex.williamson@redhat.com>,
	"Sinan Kaya" <okaya@kernel.org>,
	"Rajat Jain" <rajatja@google.com>
Subject: [PATCH v4 0/3] Improve PCI device post-reset readiness polling
Date: Sat, 7 Mar 2020 18:20:41 +0100	[thread overview]
Message-ID: <20200307172044.29645-1-stanspas@amazon.com> (raw)

From: Stanislav Spassov <stanspas@amazon.de>

The first version of this patch series can be found here:
https://lore.kernel.org/linux-pci/20200223122057.6504-1-stanspas@amazon.com

The goal of this patch series is to solve an issue where pci_dev_wait()
can cause system crashes. After a reset, a hung device may keep
responding with Configuration Request Retry Status (CRS) completions
indefinitely. If CRS Software Visibility is enabled on the Root Port,
attempting to read any register other than
PCI_VENDOR_ID will cause the Root Port to autonomously retry the request
without reporting back to the CPU core. Unless the number of retries or
the amount of time spent retrying is limited by platform-specific means,
this scenario leads to low-level platform timeouts (such as a TOR
Timeout), which can easily escalate to a crash.
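To illustrate why PCI_VENDOR_ID is the one safe register to poll, here is a minimal, self-contained sketch (not the code from this series). With CRS Software Visibility enabled, a Vendor ID read that receives a CRS completion is reported to software as the reserved value 0x0001 instead of being retried by the Root Port, so software can notice the CRS and back off. The mock device, struct mock_dev, mock_read_vendor_id() and poll_vendor_id() are hypothetical stand-ins for real config-space accesses, and the 1 ms starting delay with doubling is an assumed exponential backoff.

```c
#include <stdint.h>

/* With CRS Software Visibility enabled, a Vendor ID read that receives a
 * CRS completion is reported to software as this reserved value. */
#define CRS_VENDOR_ID 0x0001u
#define ALL_ONES_16   0xFFFFu

/* Hypothetical mock device: answers CRS to its first `crs_left`
 * Vendor ID reads, then returns `vendor`. */
struct mock_dev {
	int crs_left;
	uint16_t vendor;
};

static uint16_t mock_read_vendor_id(struct mock_dev *d)
{
	if (d->crs_left > 0) {
		d->crs_left--;
		return CRS_VENDOR_ID;
	}
	return d->vendor;
}

enum wait_result { DEV_READY, DEV_TIMEOUT, DEV_NO_VENDOR };

/* Poll Vendor ID with exponential backoff (1, 2, 4, ... ms) until the
 * device stops answering with CRS or the next sleep would exceed
 * timeout_ms. *waited_ms reports the simulated time slept. */
static enum wait_result poll_vendor_id(struct mock_dev *d, int timeout_ms,
				       int *waited_ms)
{
	int delay = 1;
	uint16_t id;

	*waited_ms = 0;
	while ((id = mock_read_vendor_id(d)) == CRS_VENDOR_ID) {
		if (*waited_ms + delay > timeout_ms)
			return DEV_TIMEOUT;
		*waited_ms += delay;	/* stands in for msleep(delay) */
		delay *= 2;
	}
	/* CRS has stopped; all ones means no valid Vendor ID was returned. */
	return id == ALL_ONES_16 ? DEV_NO_VENDOR : DEV_READY;
}
```

A device that stays in CRS forever now ends in DEV_TIMEOUT after a bounded wait, instead of the Root Port retrying an access to some other register indefinitely.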

Feedback on v1 inspired many additional improvements throughout the
device reset code paths, including reduced post-reset delays. These
improvements were published as part of v2 (v3 only contains small build
fixes).

It looks like there is immediate demand specifically for the CRS work,
so I am once again reducing the series to just that. The remaining reset
improvements will be posted as a separate patch series that will likely
require more time and iterations to stabilize.

Changes since v3:
- In pci_dev_wait(), added "timeout -= waited" to account for the time
  spent polling PCI_VENDOR_ID before falling back to polling PCI_COMMAND
  when device readiness could not be positively established via CRS
  (i.e., when we stopped receiving CRS completions but did not receive a
  valid Vendor ID, either because the device is an SR-IOV VF or because
  of a different error)
- Simplified the commit message of "PCI: Add CRS handling to pci_dev_wait()"
  to avoid confusion as to when Root Ports will autonomously retry requests
  that resulted in CRS completions.
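The timeout accounting described in the first change can be sketched as a two-phase wait. This is a minimal illustration under stated assumptions, not the actual pci_dev_wait() code: struct mock_dev, read_vendor_id(), read_command_dword() and wait_ready() are hypothetical, a Vendor ID of 0xFFFF models an SR-IOV VF, "ready" in the fallback phase is assumed to mean the PCI_COMMAND dword no longer reads as all ones, and restarting the backoff at 1 ms in phase 2 is an assumption.

```c
#include <stdint.h>

#define CRS_VENDOR_ID 0x0001u	/* Vendor ID value reported for CRS */
#define ALL_ONES_16   0xFFFFu
#define ALL_ONES_32   0xFFFFFFFFu

/* Hypothetical mock device. `vendor` = 0xFFFF models an SR-IOV VF,
 * whose Vendor ID register reads as all ones. PCI_COMMAND reads
 * return all ones for the first `cmd_not_ready` reads. */
struct mock_dev {
	int crs_left;
	uint16_t vendor;
	int cmd_not_ready;
};

static uint16_t read_vendor_id(struct mock_dev *d)
{
	if (d->crs_left > 0) {
		d->crs_left--;
		return CRS_VENDOR_ID;
	}
	return d->vendor;
}

static uint32_t read_command_dword(struct mock_dev *d)
{
	if (d->cmd_not_ready > 0) {
		d->cmd_not_ready--;
		return ALL_ONES_32;
	}
	return 0x00100006u;	/* arbitrary non-all-ones register value */
}

/* Returns 0 once ready, -1 on timeout; *spent_ms is total time slept. */
static int wait_ready(struct mock_dev *d, int timeout_ms, int *spent_ms)
{
	int waited = 0, delay = 1;
	uint16_t id;

	*spent_ms = 0;
	/* Phase 1: poll Vendor ID while the device keeps answering CRS. */
	while ((id = read_vendor_id(d)) == CRS_VENDOR_ID) {
		if (waited + delay > timeout_ms)
			return -1;
		waited += delay;	/* stands in for msleep(delay) */
		*spent_ms += delay;
		delay *= 2;
	}
	if (id != ALL_ONES_16)
		return 0;	/* valid Vendor ID: readiness established */

	/* Phase 2: CRS stopped without a valid Vendor ID. Charge the time
	 * already spent against the budget ("timeout -= waited"). */
	timeout_ms -= waited;
	waited = 0;
	delay = 1;
	while (read_command_dword(d) == ALL_ONES_32) {
		if (waited + delay > timeout_ms)
			return -1;
		waited += delay;
		*spent_ms += delay;
		delay *= 2;
	}
	return 0;
}
```

For example, with a 100 ms budget, a VF that answers CRS for 63 ms leaves only 37 ms for the PCI_COMMAND phase, so a device that never becomes ready times out after 94 ms in total. Without the subtraction, the second phase alone could sleep another 63 ms, for 126 ms overall.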

Stanislav Spassov (3):
  PCI: Refactor polling loop out of pci_dev_wait
  PCI: Cache CRS Software Visibility in struct pci_dev
  PCI: Add CRS handling to pci_dev_wait()

 drivers/pci/pci.c   | 109 +++++++++++++++++++++++++++++++++++---------
 drivers/pci/probe.c |   8 +++-
 include/linux/pci.h |   3 ++
 3 files changed, 98 insertions(+), 22 deletions(-)


base-commit: bb6d3fb354c5ee8d6bde2d576eb7220ea09862b9
-- 
2.25.1




Amazon Development Center Germany GmbH
Krausenstr. 38
10117 Berlin
Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss
Eingetragen am Amtsgericht Charlottenburg unter HRB 149173 B
Sitz: Berlin
Ust-ID: DE 289 237 879



