All of lore.kernel.org
 help / color / mirror / Atom feed
From: Rajat Jain <rajatja@google.com>
To: "Spassov, Stanislav" <stanspas@amazon.de>
Cc: "helgaas@kernel.org" <helgaas@kernel.org>,
	"corbet@lwn.net" <corbet@lwn.net>,
	"alex.williamson@redhat.com" <alex.williamson@redhat.com>,
	"ashok.raj@intel.com" <ashok.raj@intel.com>,
	"okaya@kernel.org" <Okaya@kernel.org>,
	"tglx@linutronix.de" <tglx@linutronix.de>,
	"akpm@linux-foundation.org" <akpm@linux-foundation.org>,
	"Schönherr, Jan H." <jschoenh@amazon.de>,
	"linux-pci@vger.kernel.org" <linux-pci@vger.kernel.org>,
	"bhelgaas@google.com" <bhelgaas@google.com>
Subject: Re: [PATCH v4 3/3] PCI: Add CRS handling to pci_dev_wait()
Date: Tue, 14 Sep 2021 10:53:20 -0700	[thread overview]
Message-ID: <CACK8Z6HcG45DNYQEGxDvZbL8WkF2uVoO5rCTOh1yCjxnxco5ig@mail.gmail.com> (raw)
In-Reply-To: <9cae764f00c5969c364728ed031c29ce49c03480.camel@amazon.de>

On Mon, Sep 13, 2021 at 11:04 AM Spassov, Stanislav <stanspas@amazon.de> wrote:
>
> On Mon, 2021-09-13 at 11:38 -0500, Bjorn Helgaas wrote:
> > On Mon, Sep 13, 2021 at 04:29:51PM +0000, Spassov, Stanislav wrote:
> > > On Sat, 2021-09-11 at 09:03 -0500, Bjorn Helgaas wrote:
> > >
> > > I later understood the specific CPU did have a proprietary register for
> > > "limiting the number of loops" that the PCIe spec talks about, and indeed
> > > that register was set to "no limit". Coupled with the stuck device, these
> > > indefinite retries eventually triggered TOR timeout.
> >
> > "No limit" sounds like a pretty bad choice, given that it means the
> > CPU will essentially hang forever because of a defective I/O device.
> > There should be a timeout so software can recover (the *device* may
> > never recover, but that's no reason why the kernel must crash).
> >
>
> Correct. "No limit" is definitely a bad choice for that register,
> and fixing the value would be preferable to any software solution.
>
> Unfortunately, at least in the case I worked on, that register was
> not accessible by the kernel.

I can acknowledge that I have across exactly the same issue (no limit
on retries, results in CPU hang) on another old Intel root port too in
the past:
https://lore.kernel.org/linux-pci/53FFA54D.9000907@gmail.com/
https://lkml.org/lkml/2014/8/1/186

and had the same problem (no way to limit the number of retries). I'd
be interested and will keep a lookout for the next patch Stanislav
sends out!

Thanks!

Rajat

> Intel exposes many CPU configuration
> registers in terms of virtual PCI devices residing directly on Root
> Buses, and the system/platform firmware is able to use vendor-provided
> means to completely hide some of these pseudo-devices from the OS.
>
> Additionally, the way the PCIe spec is phrased, not every Root Complex
> implementation is required to even have such a limiting register, while
> all implementations that advertise CRS SV capability are required to
> behave as prescribed when PCI_VENDOR_ID is read. Hence why I believe
> this patch is a general robustness improvement, rather than a workaround
> for a specific device/platform.
>
>
>
> Amazon Development Center Germany GmbH
> Krausenstr. 38
> 10117 Berlin
> Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss
> Eingetragen am Amtsgericht Charlottenburg unter HRB 149173 B
> Sitz: Berlin
> Ust-ID: DE 289 237 879
>
>

  reply	other threads:[~2021-09-14 17:54 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-03-07 17:20 [PATCH v4 0/3] Improve PCI device post-reset readiness polling Stanislav Spassov
2020-03-07 17:20 ` [PATCH v4 1/3] PCI: Refactor polling loop out of pci_dev_wait Stanislav Spassov
2020-03-07 17:20 ` [PATCH v4 2/3] PCI: Cache CRS Software Visibiliy in struct pci_dev Stanislav Spassov
2021-09-12 13:32   ` Bjorn Helgaas
2021-09-13 16:06     ` Spassov, Stanislav
2020-03-07 17:20 ` [PATCH v4 3/3] PCI: Add CRS handling to pci_dev_wait() Stanislav Spassov
2020-03-09 15:55   ` Sinan Kaya
2020-03-09 16:19     ` Raj, Ashok
2020-03-09 16:38       ` Spassov, Stanislav
2020-03-09 17:33         ` Sinan Kaya
2021-09-11 14:03   ` Bjorn Helgaas
2021-09-13 16:29     ` Spassov, Stanislav
2021-09-13 16:38       ` Bjorn Helgaas
2021-09-13 18:04         ` Spassov, Stanislav
2021-09-14 17:53           ` Rajat Jain [this message]
2021-09-13 16:07   ` Bjorn Helgaas
2021-09-13 16:39     ` Spassov, Stanislav
2021-01-22  8:54 ` [PATCH v4 0/3] Improve PCI device post-reset readiness polling David Woodhouse
2021-09-10  9:32   ` David Woodhouse

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CACK8Z6HcG45DNYQEGxDvZbL8WkF2uVoO5rCTOh1yCjxnxco5ig@mail.gmail.com \
    --to=rajatja@google.com \
    --cc=Okaya@kernel.org \
    --cc=akpm@linux-foundation.org \
    --cc=alex.williamson@redhat.com \
    --cc=ashok.raj@intel.com \
    --cc=bhelgaas@google.com \
    --cc=corbet@lwn.net \
    --cc=helgaas@kernel.org \
    --cc=jschoenh@amazon.de \
    --cc=linux-pci@vger.kernel.org \
    --cc=stanspas@amazon.de \
    --cc=tglx@linutronix.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.