From: Bjorn Helgaas <helgaas@kernel.org>
To: "Pali Rohár" <pali@kernel.org>
Cc: "Maciej W. Rozycki" <macro@orcam.me.uk>,
	Bjorn Helgaas <bhelgaas@google.com>, Stefan Roese <sr@denx.de>,
	Jim Wilson <wilson@tuliptree.org>,
	David Abdurachmanov <david.abdurachmanov@gmail.com>,
	linux-pci@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v5 5/5] PCI: Work around PCIe link training failures
Date: Thu, 3 Nov 2022 19:01:37 -0500
Message-ID: <20221104000137.GA54487@bhelgaas>
In-Reply-To: <20221103234111.ykexx733fty7g3da@pali>

On Fri, Nov 04, 2022 at 12:41:11AM +0100, Pali Rohár wrote:
> On Thursday 03 November 2022 18:13:35 Bjorn Helgaas wrote:
> > [+cc Pali]
> > 
> > On Sat, Sep 17, 2022 at 01:03:38PM +0100, Maciej W. Rozycki wrote:
> > > Attempt to handle cases such as with a downstream port of the ASMedia 
> > > ASM2824 PCIe switch where link training never completes and the link 
> > > continues switching between speeds indefinitely with the data link layer 
> > > never reaching the active state.
> > > 
> > > It has been observed with a downstream port of the ASMedia ASM2824 Gen 3 
> > > switch wired to the upstream port of the Pericom PI7C9X2G304 Gen 2 
> > > switch, using a Delock Riser Card PCI Express x1 > 2 x PCIe x1 device, 
> > > P/N 41433, wired to a SiFive HiFive Unmatched board.  In this setup the 
> > > switches are supposed to negotiate the link speed of preferably 5.0GT/s, 
> > > falling back to 2.5GT/s.
> > > 
> > > Instead the link continues oscillating between the two speeds, at the 
> > > rate of 34-35 times per second, with link training reported repeatedly 
> > > active ~84% of the time.  Forcibly limiting the target link speed to 
> > > 2.5GT/s with the upstream ASM2824 device however makes the two switches 
> > > communicate correctly.  Removing the speed restriction afterwards makes 
> > > the two devices switch to 5.0GT/s then.
> > > 
> > > Make use of these observations then and detect the inability to train 
> > > the link, by checking for the Data Link Layer Link Active status bit 
> > > being off while the Link Bandwidth Management Status bit indicates 
> > > that hardware has changed the link speed or width in an attempt to 
> > > correct unreliable link operation.
> > > 
> > > Restrict the speed to 2.5GT/s then with the Target Link Speed field, 
> > > request a retrain and wait 200ms for the data link to go up.  If this 
> > > turns out successful, then lift the restriction, letting the devices 
> > > negotiate a higher speed.
> > > 
> > > Also check for a 2.5GT/s speed restriction the firmware may have already 
> > > arranged and lift it too with ports of devices known to continue working 
> > > afterwards, currently the ASM2824 only, that already report their data 
> > > link being up.
> > 
> > This quirk is run at boot-time and resume-time.  What happens after a
> > Secondary Bus Reset, as is done by pci_reset_secondary_bus()?
> 
> Flipping the SBR bit can be done on any PCI-to-PCI bridge device, and in
> this topology there are the following: PCIe Root Port, ASMedia PCIe Switch
> Upstream Port, ASMedia PCIe Switch Downstream Port, Pericom PCIe Switch
> Upstream Port, Pericom PCIe Switch Downstream Port.
> (Maciej, I hope that this is the whole topology and that there is no other
> device of PCI-to-PCI bridge type in your setup; please correct me.)
> 
> Bjorn, to make it clear, on which device do you mean to issue a secondary
> bus reset?

IIUC, the problem is observed on the link between the ASM2824
downstream port and the PI7C9X2G304 upstream port, so my question is
about asserting SBR on the ASM2824 downstream port.  I think that
should cause the link between ASM2824 and PI7C9X2G304 to go down and
back up.
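
To make the expected sequence concrete, here is a rough user-space
model of asserting SBR on a bridge, not kernel code.  The struct and
the model_*() helpers are hypothetical names made up for illustration;
only PCI_BRIDGE_CTL_BUS_RESET (0x40) matches the kernel's pci_regs.h.
It assumes, per the spec text quoted further down, that setting SBR
takes LinkUp to 0 and that the link retrains once SBR is cleared.  The
delays real code needs between the writes are left out.

```c
#include <stdbool.h>
#include <stdint.h>

#define PCI_BRIDGE_CTL_BUS_RESET 0x40	/* Secondary Bus Reset bit */

/* Hypothetical model of a bridge and the link below it. */
struct model_bridge {
	uint16_t bridge_ctl;		/* Bridge Control register value */
	bool link_up;			/* LinkUp on the secondary side */
	unsigned int link_down_events;	/* times the link was taken down */
};

/* Model a Bridge Control write: a 0 -> 1 SBR transition drops the link. */
static void model_write_bridge_ctl(struct model_bridge *b, uint16_t val)
{
	bool was_set = b->bridge_ctl & PCI_BRIDGE_CTL_BUS_RESET;
	bool now_set = val & PCI_BRIDGE_CTL_BUS_RESET;

	if (!was_set && now_set) {
		b->link_up = false;
		b->link_down_events++;
	}
	b->bridge_ctl = val;
}

/*
 * Set SBR, then clear it.  retrain_ok is an assumption supplied by the
 * caller: whether link training succeeds after the reset.
 */
static bool model_secondary_bus_reset(struct model_bridge *b, bool retrain_ok)
{
	model_write_bridge_ctl(b, b->bridge_ctl | PCI_BRIDGE_CTL_BUS_RESET);
	/* Real code has to wait here before releasing the reset. */
	model_write_bridge_ctl(b,
		(uint16_t)(b->bridge_ctl & ~PCI_BRIDGE_CTL_BUS_RESET));
	b->link_up = retrain_ok;
	return b->link_up;
}
```

With retrain_ok false the model leaves the link down after the reset,
which is the case the question is about.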

Thanks for the question; I didn't notice before that this quirk
applies to *all* devices.  I'm a little queasy about trying to fix
problems we have not observed.  In this case, I think the hardware is
*supposed* to establish a link at the highest supported speed
automatically.

If we need to work around a hardware bug, that's fine, but I'm not
sure I want to blindly try to help things along.
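
For reference, the condition described in the quoted commit message
(DLLLA clear while LBMS is set) and the 2.5GT/s Target Link Speed
restriction can be sketched as below.  This is a standalone
illustration, not the code from the patch; the two helper names are
hypothetical, and the bit values are the ones I believe pci_regs.h
uses for these fields.

```c
#include <stdbool.h>
#include <stdint.h>

#define PCI_EXP_LNKSTA_DLLLA		0x2000	/* Data Link Layer Link Active */
#define PCI_EXP_LNKSTA_LBMS		0x4000	/* Link Bandwidth Management Status */
#define PCI_EXP_LNKCTL2_TLS		0x000f	/* Target Link Speed field */
#define PCI_EXP_LNKCTL2_TLS_2_5GT	0x0001	/* Target Link Speed 2.5GT/s */

/*
 * Hypothetical helper: true when the Link Status value shows the data
 * link layer inactive while hardware has reported a speed/width change.
 */
static bool link_training_looks_stuck(uint16_t lnksta)
{
	return (lnksta & (PCI_EXP_LNKSTA_DLLLA | PCI_EXP_LNKSTA_LBMS)) ==
	       PCI_EXP_LNKSTA_LBMS;
}

/*
 * Hypothetical helper: return a Link Control 2 value with Target Link
 * Speed set to 2.5GT/s and all other bits preserved.
 */
static uint16_t lnkctl2_limit_to_2_5gt(uint16_t lnkctl2)
{
	return (uint16_t)((lnkctl2 & ~PCI_EXP_LNKCTL2_TLS) |
			  PCI_EXP_LNKCTL2_TLS_2_5GT);
}
```

Nothing in the check itself identifies the ASM2824, so any port whose
Link Status reads this way would match.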

> Because I would not be surprised if different things happen when issuing
> a bus reset on different parts of that topology.
> 
> > PCIe r6.0, sec 7.5.1.3.13, says "setting Secondary Bus Reset triggers
> > a hot reset on the corresponding PCI Express Port".  Sec 4.2.7 says
> > LinkUp is 0 in the LTSSM Hot Reset state, and the Hot Reset state
> > leads to Detect, so it looks like this reset would cause the link to
> > go down and come back up.
> > 
> > Can you tell if that's what happens?  Does the link negotiation fail
> > then, too?
> > 
> > If it does fail then, I don't know how hard we need to work to fix it.
> > Maybe we just accept it?  Or maybe we need a "quirk-after-reset" phase
> > or something?
> > 
> > Bjorn
