linux-pci.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] pci: fix broadcom secondary bus reset handling
@ 2024-05-01 14:51 Keith Busch
  2024-05-01 19:55 ` Bjorn Helgaas
  2024-05-01 21:50 ` Lukas Wunner
  0 siblings, 2 replies; 6+ messages in thread
From: Keith Busch @ 2024-05-01 14:51 UTC (permalink / raw)
  To: linux-pci; +Cc: Keith Busch, Suganath Prabu S, Peter Delevoryas

From: Keith Busch <kbusch@kernel.org>

After a link reset, the Broadcom / LSI PEX890xx PCIe Gen 5 Switch in synth
mode will temporarily insert a fake place-holder device, 1000 02b2, before
the link is actually active for the expected downstream device. Confirm
the device's identifier matches what we expect before moving forward.
Otherwise, the pciehp driver may unmask hotplug notifications before
the link is actually active, which triggers an undesired device removal.

Cc: Suganath Prabu S <suganath-prabu.subramani@broadcom.com>
Cc: Peter Delevoryas <pdel@meta.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
---
 drivers/pci/pci.c | 16 +++++++++++-----
 1 file changed, 11 insertions(+), 5 deletions(-)

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index e5f243dd42884..4dc00f7411a94 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -1255,6 +1255,7 @@ static int pci_dev_wait(struct pci_dev *dev, char *reset_type, int timeout)
 	int delay = 1;
 	bool retrain = false;
 	struct pci_dev *bridge;
+	u32 vid = dev->vendor | dev->device << 16;
 
 	if (pci_is_pcie(dev)) {
 		bridge = pci_upstream_bridge(dev);
@@ -1268,17 +1269,22 @@ static int pci_dev_wait(struct pci_dev *dev, char *reset_type, int timeout)
 	 * responding to them with CRS completions.  The Root Port will
 	 * generally synthesize ~0 (PCI_ERROR_RESPONSE) data to complete
 	 * the read (except when CRS SV is enabled and the read was for the
-	 * Vendor ID; in that case it synthesizes 0x0001 data).
+	 * Vendor ID; in that case it synthesizes 0x0001 data, or if the device
+	 * is downstream a Broadcom switch, which syntesizes a fake device)
 	 *
 	 * Wait for the device to return a non-CRS completion.  Read the
-	 * Command register instead of Vendor ID so we don't have to
-	 * contend with the CRS SV value.
+	 * Command register instead of Vendor ID so we don't have to contend
+	 * with the CRS SV value. But, also read the Vendor and Device ID's
+	 * to defeat Broadcom switch's placeholder device.
 	 */
 	for (;;) {
-		u32 id;
+		u32 id, l;
 
+		pci_read_config_dword(dev, PCI_VENDOR_ID, &l);
 		pci_read_config_dword(dev, PCI_COMMAND, &id);
-		if (!PCI_POSSIBLE_ERROR(id))
+
+		if (!PCI_POSSIBLE_ERROR(id) && !PCI_POSSIBLE_ERROR(l) &&
+		    l == vid)
 			break;
 
 		if (delay > timeout) {
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH] pci: fix broadcom secondary bus reset handling
  2024-05-01 14:51 [PATCH] pci: fix broadcom secondary bus reset handling Keith Busch
@ 2024-05-01 19:55 ` Bjorn Helgaas
  2024-05-02  8:47   ` Keith Busch
  2024-05-01 21:50 ` Lukas Wunner
  1 sibling, 1 reply; 6+ messages in thread
From: Bjorn Helgaas @ 2024-05-01 19:55 UTC (permalink / raw)
  To: Keith Busch
  Cc: linux-pci, Keith Busch, Suganath Prabu S, Peter Delevoryas, Lukas Wunner

[+cc Lukas since you mentioned pciehp]

On Wed, May 01, 2024 at 07:51:18AM -0700, Keith Busch wrote:
> From: Keith Busch <kbusch@kernel.org>
> 
> After a link reset, the Broadcom / LSI PEX890xx PCIe Gen 5 Switch in synth
> mode will temporarily insert a fake place-holder device, 1000 02b2, before
> the link is actually active for the expected downstream device. Confirm
> the device's identifier matches what we expect before moving forward.
> Otherwise, the pciehp driver may unmask hotplug notifications before
> the link is actually active, which triggers an undesired device removal.

Is this a Switch that doesn't conform to the PCIe spec, or is there
something wrong with the way we're looking for a CRS completion?

In the absence of a device defect, I would not expect to need a
Broacom-specific comment in this code.

> Cc: Suganath Prabu S <suganath-prabu.subramani@broadcom.com>
> Cc: Peter Delevoryas <pdel@meta.com>
> Signed-off-by: Keith Busch <kbusch@kernel.org>
> ---
>  drivers/pci/pci.c | 16 +++++++++++-----
>  1 file changed, 11 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index e5f243dd42884..4dc00f7411a94 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -1255,6 +1255,7 @@ static int pci_dev_wait(struct pci_dev *dev, char *reset_type, int timeout)
>  	int delay = 1;
>  	bool retrain = false;
>  	struct pci_dev *bridge;
> +	u32 vid = dev->vendor | dev->device << 16;
>  
>  	if (pci_is_pcie(dev)) {
>  		bridge = pci_upstream_bridge(dev);
> @@ -1268,17 +1269,22 @@ static int pci_dev_wait(struct pci_dev *dev, char *reset_type, int timeout)
>  	 * responding to them with CRS completions.  The Root Port will
>  	 * generally synthesize ~0 (PCI_ERROR_RESPONSE) data to complete
>  	 * the read (except when CRS SV is enabled and the read was for the
> -	 * Vendor ID; in that case it synthesizes 0x0001 data).
> +	 * Vendor ID; in that case it synthesizes 0x0001 data, or if the device
> +	 * is downstream a Broadcom switch, which syntesizes a fake device)

s/syntesizes/synthesizes/

>  	 * Wait for the device to return a non-CRS completion.  Read the
> -	 * Command register instead of Vendor ID so we don't have to
> -	 * contend with the CRS SV value.
> +	 * Command register instead of Vendor ID so we don't have to contend
> +	 * with the CRS SV value. But, also read the Vendor and Device ID's
> +	 * to defeat Broadcom switch's placeholder device.

s/ID's/IDs/

>  	 */
>  	for (;;) {
> -		u32 id;
> +		u32 id, l;
>  
> +		pci_read_config_dword(dev, PCI_VENDOR_ID, &l);
>  		pci_read_config_dword(dev, PCI_COMMAND, &id);
> -		if (!PCI_POSSIBLE_ERROR(id))
> +
> +		if (!PCI_POSSIBLE_ERROR(id) && !PCI_POSSIBLE_ERROR(l) &&
> +		    l == vid)
>  			break;
>  
>  		if (delay > timeout) {
> -- 
> 2.43.0
> 

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] pci: fix broadcom secondary bus reset handling
  2024-05-01 14:51 [PATCH] pci: fix broadcom secondary bus reset handling Keith Busch
  2024-05-01 19:55 ` Bjorn Helgaas
@ 2024-05-01 21:50 ` Lukas Wunner
  2024-05-02  8:56   ` Keith Busch
  1 sibling, 1 reply; 6+ messages in thread
From: Lukas Wunner @ 2024-05-01 21:50 UTC (permalink / raw)
  To: Keith Busch; +Cc: linux-pci, Keith Busch, Suganath Prabu S, Peter Delevoryas

On Wed, May 01, 2024 at 07:51:18AM -0700, Keith Busch wrote:
> After a link reset, the Broadcom / LSI PEX890xx PCIe Gen 5 Switch in synth
> mode will temporarily insert a fake place-holder device, 1000 02b2, before
> the link is actually active for the expected downstream device. Confirm
> the device's identifier matches what we expect before moving forward.
> Otherwise, the pciehp driver may unmask hotplug notifications before
> the link is actually active, which triggers an undesired device removal.

This won't work if the device was hot-swapped with a different one
and thus correctly returns a different Vendor/Device ID.  We'd wait
for the device to report the previous device's Vendor/Device ID,
which doesn't make sense.

It would be possible to raise d3cold_delay in struct pci_dev for
children of affected Broadcom switches.  Have you considered that
as a potential solution?

Thanks,

Lukas

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] pci: fix broadcom secondary bus reset handling
  2024-05-01 19:55 ` Bjorn Helgaas
@ 2024-05-02  8:47   ` Keith Busch
  2024-05-08 21:03     ` Keith Busch
  0 siblings, 1 reply; 6+ messages in thread
From: Keith Busch @ 2024-05-02  8:47 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Keith Busch, linux-pci, Suganath Prabu S, Peter Delevoryas, Lukas Wunner

On Wed, May 01, 2024 at 02:55:34PM -0500, Bjorn Helgaas wrote:
> On Wed, May 01, 2024 at 07:51:18AM -0700, Keith Busch wrote:
> > From: Keith Busch <kbusch@kernel.org>
> > 
> > After a link reset, the Broadcom / LSI PEX890xx PCIe Gen 5 Switch in synth
> > mode will temporarily insert a fake place-holder device, 1000 02b2, before
> > the link is actually active for the expected downstream device. Confirm
> > the device's identifier matches what we expect before moving forward.
> > Otherwise, the pciehp driver may unmask hotplug notifications before
> > the link is actually active, which triggers an undesired device removal.
> 
> Is this a Switch that doesn't conform to the PCIe spec, or is there
> something wrong with the way we're looking for a CRS completion?
> 
> In the absence of a device defect, I would not expect to need a
> Broacom-specific comment in this code.

Yeah, you're right. I started off quirking specific devices, then it
evolved to the more generic handling this turned into, but didn't update
the commit log or comments accordingly.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] pci: fix broadcom secondary bus reset handling
  2024-05-01 21:50 ` Lukas Wunner
@ 2024-05-02  8:56   ` Keith Busch
  0 siblings, 0 replies; 6+ messages in thread
From: Keith Busch @ 2024-05-02  8:56 UTC (permalink / raw)
  To: Lukas Wunner; +Cc: Keith Busch, linux-pci, Suganath Prabu S, Peter Delevoryas

On Wed, May 01, 2024 at 11:50:47PM +0200, Lukas Wunner wrote:
> On Wed, May 01, 2024 at 07:51:18AM -0700, Keith Busch wrote:
> > After a link reset, the Broadcom / LSI PEX890xx PCIe Gen 5 Switch in synth
> > mode will temporarily insert a fake place-holder device, 1000 02b2, before
> > the link is actually active for the expected downstream device. Confirm
> > the device's identifier matches what we expect before moving forward.
> > Otherwise, the pciehp driver may unmask hotplug notifications before
> > the link is actually active, which triggers an undesired device removal.
> 
> This won't work if the device was hot-swapped with a different one
> and thus correctly returns a different Vendor/Device ID.  We'd wait
> for the device to report the previous device's Vendor/Device ID,
> which doesn't make sense.
> 
> It would be possible to raise d3cold_delay in struct pci_dev for
> children of affected Broadcom switches.  Have you considered that
> as a potential solution?

Good point, there's more paths I need to consider here. The path this is
addressing is through pciehp's reset_slot handling, which temporarily
disables the link change and presence detection. In the error scenario,
the secondary bus reset completes too quickly, which re-enables the
pciehp events before the downstream device has settled. Once it settles,
that triggers a Link Change/PDC event, then we lose our device.

I briefly considered a quirk for d3cold_delay, but I was hoping for
something more programatic than adding an arbitrary delay. That might be
okay though.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] pci: fix broadcom secondary bus reset handling
  2024-05-02  8:47   ` Keith Busch
@ 2024-05-08 21:03     ` Keith Busch
  0 siblings, 0 replies; 6+ messages in thread
From: Keith Busch @ 2024-05-08 21:03 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Keith Busch, linux-pci, Suganath Prabu S, Peter Delevoryas, Lukas Wunner

On Thu, May 02, 2024 at 09:47:31AM +0100, Keith Busch wrote:
> Yeah, you're right. I started off quirking specific devices, then it
> evolved to the more generic handling this turned into, but didn't update
> the commit log or comments accordingly.

Quick update, we're testing configurable options to modify the pcie port
behavior, so a kernel-side update may not happen if that is successful.

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2024-05-08 21:03 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-05-01 14:51 [PATCH] pci: fix broadcom secondary bus reset handling Keith Busch
2024-05-01 19:55 ` Bjorn Helgaas
2024-05-02  8:47   ` Keith Busch
2024-05-08 21:03     ` Keith Busch
2024-05-01 21:50 ` Lukas Wunner
2024-05-02  8:56   ` Keith Busch

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).