linux-pci.vger.kernel.org archive mirror
From: Bjorn Helgaas <helgaas@kernel.org>
To: "Pali Rohár" <pali@kernel.org>
Cc: "Maciej W. Rozycki" <macro@orcam.me.uk>,
	Bjorn Helgaas <bhelgaas@google.com>, Stefan Roese <sr@denx.de>,
	Jim Wilson <wilson@tuliptree.org>,
	David Abdurachmanov <david.abdurachmanov@gmail.com>,
	linux-pci@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v5 5/5] PCI: Work around PCIe link training failures
Date: Thu, 3 Nov 2022 19:01:37 -0500
Message-ID: <20221104000137.GA54487@bhelgaas>
In-Reply-To: <20221103234111.ykexx733fty7g3da@pali>

On Fri, Nov 04, 2022 at 12:41:11AM +0100, Pali Rohár wrote:
> On Thursday 03 November 2022 18:13:35 Bjorn Helgaas wrote:
> > [+cc Pali]
> > 
> > On Sat, Sep 17, 2022 at 01:03:38PM +0100, Maciej W. Rozycki wrote:
> > > Attempt to handle cases such as with a downstream port of the ASMedia 
> > > ASM2824 PCIe switch where link training never completes and the link 
> > > continues switching between speeds indefinitely with the data link layer 
> > > never reaching the active state.
> > > 
> > > It has been observed with a downstream port of the ASMedia ASM2824 Gen 3 
> > > switch wired to the upstream port of the Pericom PI7C9X2G304 Gen 2 
> > > switch, using a Delock Riser Card PCI Express x1 > 2 x PCIe x1 device, 
> > > P/N 41433, wired to a SiFive HiFive Unmatched board.  In this setup the 
> > > switches are supposed to negotiate the link speed of preferably 5.0GT/s, 
> > > falling back to 2.5GT/s.
> > > 
> > > Instead the link continues oscillating between the two speeds, at the 
> > > rate of 34-35 times per second, with link training reported repeatedly 
> > > active ~84% of the time.  Forcibly limiting the target link speed to 
> > > 2.5GT/s with the upstream ASM2824 device however makes the two switches 
> > > communicate correctly.  Removing the speed restriction afterwards makes 
> > > the two devices switch to 5.0GT/s then.
> > > 
> > > Make use of these observations then and detect the inability to train 
> > > the link, by checking for the Data Link Layer Link Active status bit 
> > > being off while the Link Bandwidth Management Status bit indicates that 
> > > hardware has changed the link speed or width in an attempt to correct 
> > > unreliable link operation.
> > > 
> > > Restrict the speed to 2.5GT/s then with the Target Link Speed field, 
> > > request a retrain and wait 200ms for the data link to go up.  If this 
> > > turns out successful, then lift the restriction, letting the devices 
> > > negotiate a higher speed.
> > > 
> > > Also check for a 2.5GT/s speed restriction the firmware may have already 
> > > arranged and lift it too with ports of devices known to continue working 
> > > afterwards, currently the ASM2824 only, that already report their data 
> > > link being up.
> > 
> > This quirk is run at boot-time and resume-time.  What happens after a
> > Secondary Bus Reset, as is done by pci_reset_secondary_bus()?
> 
> Flipping SBR bit can be done on any PCI-to-PCI bridge device and in this
> topology there are the following: PCIe Root Port, ASMedia PCIe Switch
> Upstream Port, ASMedia PCIe Switch Downstream Port, Pericom PCIe Switch
> Upstream Port, Pericom PCIe Switch Downstream Port.
> (Maciej, I hope that this is the whole topology and that there is no other
> device of PCI-to-PCI bridge type in your setup; please correct me)
> 
> Bjorn, to make it clear, on which device do you mean to issue the secondary
> bus reset?

IIUC, the problem is observed on the link between the ASM2824
downstream port and the PI7C9X2G304 upstream port, so my question is
about asserting SBR on the ASM2824 downstream port.  I think that
should cause the link between ASM2824 and PI7C9X2G304 to go down and
back up.
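
To make the sequence I have in mind concrete, here is a rough user-space
sketch of it.  It is only a toy model, not kernel code: `struct toy_port`
and the `toy_*` helpers are made up for illustration, and only the
PCI_BRIDGE_CTL_BUS_RESET value is taken from pci_regs.h.  It records that
setting SBR takes the link down and that clearing it restarts training;
whether that training then succeeds on the ASM2824 is exactly the open
question.

```c
#include <assert.h>
#include <stdint.h>

/* Secondary Bus Reset bit of the Bridge Control register (pci_regs.h). */
#define PCI_BRIDGE_CTL_BUS_RESET 0x40

/* Hypothetical toy model of a downstream port; not a kernel structure. */
struct toy_port {
	uint16_t bridge_control;   /* modelled Bridge Control register */
	unsigned int link_downs;   /* times the link was forced down */
	unsigned int train_starts; /* times link training was restarted */
};

/* Model a config write to Bridge Control and its effect on the link. */
static void toy_write_bridge_control(struct toy_port *p, uint16_t val)
{
	int was_in_reset, now_in_reset;

	assert(p);
	was_in_reset = (p->bridge_control & PCI_BRIDGE_CTL_BUS_RESET) != 0;
	now_in_reset = (val & PCI_BRIDGE_CTL_BUS_RESET) != 0;
	p->bridge_control = val;

	if (!was_in_reset && now_in_reset)
		p->link_downs++;	/* hot reset: LinkUp becomes 0 */
	else if (was_in_reset && !now_in_reset)
		p->train_starts++;	/* LTSSM goes back to Detect and retrains */
}

/* Set SBR, then clear it again, leaving the other control bits alone. */
static void toy_secondary_bus_reset(struct toy_port *p)
{
	uint16_t ctrl = p->bridge_control;

	toy_write_bridge_control(p, ctrl | PCI_BRIDGE_CTL_BUS_RESET);
	/* Real code has to keep the bit set for a while before clearing it. */
	toy_write_bridge_control(p, ctrl & (uint16_t)~PCI_BRIDGE_CTL_BUS_RESET);
}
```

The point of the model is only that one reset yields one link-down event
and one fresh training attempt, with the quirk nowhere in that path.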

Thanks for the question; I didn't notice before that this quirk
applies to *all* devices.  I'm a little queasy about trying to fix
problems we have not observed.  In this case, I think the hardware is
*supposed* to establish a link at the highest supported speed
automatically.

If we need to work around a hardware bug, that's fine, but I'm not
sure I want to blindly try to help things along.
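
For reference, my reading of the condition in the patch description, which
is what would make the quirk fire on any port rather than only the ASM2824,
is roughly the following.  This is a sketch based on the commit log, not
the patch itself: the helper names are hypothetical, and only the register
bit values are taken from pci_regs.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Link Status / Link Control / Link Control 2 bits from pci_regs.h. */
#define PCI_EXP_LNKSTA_DLLLA		0x2000	/* Data Link Layer Link Active */
#define PCI_EXP_LNKSTA_LBMS		0x4000	/* Link Bandwidth Management Status */
#define PCI_EXP_LNKCTL_RL		0x0020	/* Retrain Link */
#define PCI_EXP_LNKCTL2_TLS		0x000f	/* Target Link Speed field */
#define PCI_EXP_LNKCTL2_TLS_2_5GT	0x0001	/* Target Link Speed 2.5GT/s */

/* Hypothetical helper: data link is down, yet hardware changed speed/width. */
static bool toy_link_training_failing(uint16_t lnksta)
{
	assert((PCI_EXP_LNKSTA_DLLLA & PCI_EXP_LNKSTA_LBMS) == 0);
	return (lnksta & (PCI_EXP_LNKSTA_DLLLA | PCI_EXP_LNKSTA_LBMS)) ==
	       PCI_EXP_LNKSTA_LBMS;
}

/* Hypothetical helper: replace only the Target Link Speed field. */
static uint16_t toy_set_target_speed(uint16_t lnkctl2, uint16_t tls)
{
	return (uint16_t)((lnkctl2 & ~PCI_EXP_LNKCTL2_TLS) |
			  (tls & PCI_EXP_LNKCTL2_TLS));
}

/* Hypothetical helper: Link Control value that requests a retrain. */
static uint16_t toy_request_retrain(uint16_t lnkctl)
{
	return (uint16_t)(lnkctl | PCI_EXP_LNKCTL_RL);
}
```

A caller would clamp with toy_set_target_speed(lnkctl2,
PCI_EXP_LNKCTL2_TLS_2_5GT), write toy_request_retrain(lnkctl), wait for
DLLLA, and then write back the original Target Link Speed.  Nothing in the
predicate is specific to a vendor or device ID, which is what makes me
uneasy.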

> Because I would not be surprised if different things happen when issuing
> bus reset on different parts of that topology.
> 
> > PCIe r6.0, sec 7.5.1.3.13, says "setting Secondary Bus Reset triggers
> > a hot reset on the corresponding PCI Express Port".  Sec 4.2.7 says
> > LinkUp is 0 in the LTSSM Hot Reset state, and the Hot Reset state
> > leads to Detect, so it looks like this reset would cause the link to
> > go down and come back up.
> > 
> > Can you tell if that's what happens?  Does the link negotiation fail
> > then, too?
> > 
> > If it does fail then, I don't know how hard we need to work to fix it.
> > Maybe we just accept it?  Or maybe we need a "quirk-after-reset" phase
> > or something?
> > 
> > Bjorn
