linux-pci.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Re: [PATCH] pci: Work around PCIe link training failures
       [not found] ` <20211114154152.kmabs4k3gdrzlzke@pali>
@ 2021-11-14 18:07   ` Maciej W. Rozycki
  2021-11-14 19:28     ` Pali Rohár
  0 siblings, 1 reply; 3+ messages in thread
From: Maciej W. Rozycki @ 2021-11-14 18:07 UTC (permalink / raw)
  To: Pali Rohár
  Cc: u-boot, linux-pci, Stefan Roese, Simon Glass, Phil Sutter,
	Vladimir Oltean, Bin Meng, Tim Harvey, Tom Rini, Jim Wilson,
	David Abdurachmanov

Hi Pali,

> > +	dm_pci_read_config32(dev, pcie_off + PCI_EXP_LNKCAP, &exp_lnkcap);
> > +	dm_pci_read_config32(dev, pcie_off + PCI_EXP_LNKCAP2, &exp_lnkcap2);
> > +	for (speed = (exp_lnkcap & PCI_EXP_LNKCAP_SLS) - 1;
> > +	     speed >= PCI_EXP_LNKCAP_SLS_2_5GB;
> > +	     speed--) {
> > +		if (exp_lnkcap2 && (exp_lnkcap2 & 1 << speed) == 0)
> > +			continue;
> 
> This check is not correct because lnkcap2 is optional also when lnkctl2
> is implemented. IIRC unimplemented lnkcap2 with implemented lnkctl2 can
> happen only for link speeds <= 5GT/s and when lnkcap2 returns zero then
> software should detect speeds from lnkcap register. And I have such PCIe
> GEN2 HW where PCIe Root Ports implements lnkctl2, but lnkcap2 returns
> zeros. So it is not corner case.

 Well, this code checks for non-zero lnkcap2 first and ignores it if it's 
zero, so I believe it does the right thing.  Have I missed anything?

> > @@ -267,6 +410,11 @@ void dm_pciauto_prescan_setup_bridge(str
> >  		cmdstat |= PCI_COMMAND_IO;
> >  	}
> >  
> > +	/* For PCIe devices see if we need to retrain the link by hand */
> > +	pcie_off = dm_pci_find_capability(dev, PCI_CAP_ID_EXP);
> > +	if (pcie_off)
> > +		dm_pciauto_exp_fixup_link(dev, pcie_off);
> > +
> >  	/* Enable memory and I/O accesses, enable bus master */
> >  	dm_pci_write_config16(dev, PCI_COMMAND, cmdstat | PCI_COMMAND_MASTER);
> 
> This code cannot be enabled globally for all PCIe devices. There are
> more PCIe cards which do not like retraining link and touching link
> retrain bit in downstream port break them (= power on/off is needed).

 Sigh.  It looks like plenty of breakage to choose from.

 However we have a case here where the link does not work anyway, so by 
trying to retrain it we're not going to make it any worse.  At worst it 
will remain in the non-working state.  I have tried hard to make my code 
rather careful in not touching links that do work, so unless I missed a 
case (please point out one if I did), this is not going to affect those 
PCIe devices that do not like link retraining.

> Could you bring this discussion to Linux PCI list? I guess that same
> issue had to be fixed also in Linux kernel and it would be great to have
> same fix in both projects.

 It could be done in Linux in addition to U-Boot, although I think doing 
that in the firmware is more important, especially as there could be a 
boot device downstream such a switch.  And depending on the platform Linux 
does not always reassign buses, so a user intervention (i.e. an explicitly 
added kernel parameter) would have to be required.

 And these are generic PCIe switches, they could be anywhere and with the 
weird combinations of hardware interfaces available now (e.g. PCI-PCIe 
bridge adapters or M.2 to regular PCIe slot adapters) virtually any 
combination is possible.

 E.g. I have a 1997-vintage dual Pentium MMX box (82439HX host bridge; 
[8086:1250]) with PCIe devices, although it does require a lot of Linux 
interventions to cope with its firmware limitations.  NB I plan to add 
some NVMe storage to that box, and I believe the ASM2824 has been used in 
some M.2 carrier boards meant for NVMe devices.

 I've cc-ed the linux-pci list in case someone there wants to chime in.

> For me it looks like Asmedia specific bug and so it should be applied
> only for affected Asmedia devices based on PCI vendor/device ids.

 It is surely one option to consider, though I'd prefer to keep this piece 
generic and enabled as widely as possible in a best effort to make things 
work seamlessly, so as to prevent end users from having to chase a similar 
problem with another device.  It was not exactly easy to figure out what 
is going on here to me and I am a professional with decades of experience.  
So what could an ordinary user do other than concluding that things do not 
work?  I think a seamless solution is always preferable.

 As I noted ASMedia declined to comment, so it's hard to say if the cause 
has been nailed correctly and which devices if any, beyond [1b21:2824], 
are affected.

 Thank you for your input.

  Maciej

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [PATCH] pci: Work around PCIe link training failures
  2021-11-14 18:07   ` [PATCH] pci: Work around PCIe link training failures Maciej W. Rozycki
@ 2021-11-14 19:28     ` Pali Rohár
  2021-11-15  3:35       ` Maciej W. Rozycki
  0 siblings, 1 reply; 3+ messages in thread
From: Pali Rohár @ 2021-11-14 19:28 UTC (permalink / raw)
  To: Maciej W. Rozycki
  Cc: u-boot, linux-pci, Stefan Roese, Simon Glass, Phil Sutter,
	Vladimir Oltean, Bin Meng, Tim Harvey, Tom Rini, Jim Wilson,
	David Abdurachmanov

Hello!

On Sunday 14 November 2021 18:07:18 Maciej W. Rozycki wrote:
> Hi Pali,
> 
> > > +	dm_pci_read_config32(dev, pcie_off + PCI_EXP_LNKCAP, &exp_lnkcap);
> > > +	dm_pci_read_config32(dev, pcie_off + PCI_EXP_LNKCAP2, &exp_lnkcap2);
> > > +	for (speed = (exp_lnkcap & PCI_EXP_LNKCAP_SLS) - 1;
> > > +	     speed >= PCI_EXP_LNKCAP_SLS_2_5GB;
> > > +	     speed--) {
> > > +		if (exp_lnkcap2 && (exp_lnkcap2 & 1 << speed) == 0)
> > > +			continue;
> > 
> > This check is not correct because lnkcap2 is optional also when lnkctl2
> > is implemented. IIRC unimplemented lnkcap2 with implemented lnkctl2 can
> > happen only for link speeds <= 5GT/s and when lnkcap2 returns zero then
> > software should detect speeds from lnkcap register. And I have such PCIe
> > GEN2 HW where PCIe Root Ports implements lnkctl2, but lnkcap2 returns
> > zeros. So it is not corner case.
> 
>  Well, this code checks for non-zero lnkcap2 first and ignores it if it's 
> zero, so I believe it does the right thing.  Have I missed anything?

I'm reading spec again and I'm not sure now. It has following section:

  For software to determine the supported Link speeds for components
  where the Link Capabilities 2 register is either not implemented, or
  the value of its Supported Link Speeds Vector is 0000000b, software
  can read bits 3:0 of the Link Capabilities register (now defined to be
  the Max Link Speed field), and interpret the value as follows:
    0001b 2.5 GT/s Link speed supported
    0010b 5.0 GT/s and 2.5 GT/s Link speeds supported

I'm not sure how it should be interpreted, but my assumption is that
empty lnkcap2 implicates that only 5GT/s or 2.5GT/s speeds are supported
and therefore trying higher should not be done...

Also in spec is following note:

  It is strongly encouraged that software primarily utilize the
  Supported Link Speeds Vector instead of the Max Link Speed field.

So based on this note, iteration should be done via lnkcap2 bits instead
of lnkcap speed.

> > > @@ -267,6 +410,11 @@ void dm_pciauto_prescan_setup_bridge(str
> > >  		cmdstat |= PCI_COMMAND_IO;
> > >  	}
> > >  
> > > +	/* For PCIe devices see if we need to retrain the link by hand */
> > > +	pcie_off = dm_pci_find_capability(dev, PCI_CAP_ID_EXP);
> > > +	if (pcie_off)
> > > +		dm_pciauto_exp_fixup_link(dev, pcie_off);
> > > +
> > >  	/* Enable memory and I/O accesses, enable bus master */
> > >  	dm_pci_write_config16(dev, PCI_COMMAND, cmdstat | PCI_COMMAND_MASTER);
> > 
> > This code cannot be enabled globally for all PCIe devices. There are
> > more PCIe cards which do not like retraining link and touching link
> > retrain bit in downstream port break them (= power on/off is needed).
> 
>  Sigh.  It looks like plenty of breakage to choose from.
> 
>  However we have a case here where the link does not work anyway, so by 
> trying to retrain it we're not going to make it any worse.  At worst it 
> will remain in the non-working state.  I have tried hard to make my code 
> rather careful in not touching links that do work, so unless I missed a 
> case (please point out one if I did), this is not going to affect those 
> PCIe devices that do not like link retraining.

It looks like it should not cause issue. This is something which I said
to myself during debugging other issue and then I found those Atheros
wifi cards which enters into broken state if device on other side of the
PCIe link tries to retrain link. Something which I would never expect
that could be broken :-(

> > Could you bring this discussion to Linux PCI list? I guess that same
> > issue had to be fixed also in Linux kernel and it would be great to have
> > same fix in both projects.
> 
>  It could be done in Linux in addition to U-Boot, although I think doing 
> that in the firmware is more important, especially as there could be a 
> boot device downstream such a switch.  And depending on the platform Linux 
> does not always reassign buses, so a user intervention (i.e. an explicitly 
> added kernel parameter) would have to be required.

It is important to have it in both components (U-Boot and Linux). For
example native PCIe controller drivers in linux kernel as a first thing
do complete reset of controller together with connected devices. So
whatever do U-Boot is completely lost. And important is that Linux
kernel drivers should not depend on some bootloader configuration. And
note that your patch implements this workaround in CONFIG_PCI_PNP code,
so if board disable this option, workaround is not applied.

>  And these are generic PCIe switches, they could be anywhere and with the 
> weird combinations of hardware interfaces available now (e.g. PCI-PCIe 
> bridge adapters or M.2 to regular PCIe slot adapters) virtually any 
> combination is possible.
> 
>  E.g. I have a 1997-vintage dual Pentium MMX box (82439HX host bridge; 
> [8086:1250]) with PCIe devices, although it does require a lot of Linux 
> interventions to cope with its firmware limitations.  NB I plan to add 
> some NVMe storage to that box, and I believe the ASM2824 has been used in 
> some M.2 carrier boards meant for NVMe devices.

Hm... now thinking about your patch... and if it is general, applied to
all devices, should it also be applied to any type of downstream port?
Meaning also for root ports, or PCIe port of PCI/X to PCIe bridge (I
guess that such old platforms with only PCI host bridge without PCIe
could be affected too if you connect PCIe card via PCI/X to PCIe bridge
and then this bridge to host PCI slot).

Also one important note. I have some PCIe devices which do not want to
automatically negotiate maximal link speed (5 GT/s) and stay on initial
2.5 GT/s until flipping link retrain bit happen. But I have not debugged
this fully (I postponed this issue to future). So maybe this something
similar to your Asmedia issue: link speed is detected incorrectly and
software needs to setup (maximal) link speed manually.

>  I've cc-ed the linux-pci list in case someone there wants to chime in.
> 
> > For me it looks like Asmedia specific bug and so it should be applied
> > only for affected Asmedia devices based on PCI vendor/device ids.
> 
>  It is surely one option to consider, though I'd prefer to keep this piece 
> generic and enabled as widely as possible in a best effort to make things 
> work seamlessly, so as to prevent end users from having to chase a similar 
> problem with another device.  It was not exactly easy to figure out what 
> is going on here to me and I am a professional with decades of experience.  
> So what could an ordinary user do other than concluding that things do not 
> work?  I think a seamless solution is always preferable.
> 
>  As I noted ASMedia declined to comment, so it's hard to say if the cause 
> has been nailed correctly and which devices if any, beyond [1b21:2824], 
> are affected.

Yea, we do not know... And because we know that there is lot of broken
HW which works with current version, we need to be very careful when
introducing some workaround which is called on every hardware.

For example something similar like in your patch was implemented in
Marvell SerDes U-Boot driver, which controls physical layer (so on place
where nobody would expect touching higher layer code). Just this code
did not touched retrain link bit... And so it could not have worked and
was recently removed in patch series:
https://lore.kernel.org/u-boot/20210924205922.25432-1-marek.behun@nic.cz/

I'm not saying that I'm opposing this patch. Just I would like see how
is this issue will be fixed in kernel as kernel general workaround would
affect lot of more devices. And so solution accepted by kernel project
should be perfectly fine also for smaller project like U-Boot.

>  Thank you for your input.
> 
>   Maciej

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [PATCH] pci: Work around PCIe link training failures
  2021-11-14 19:28     ` Pali Rohár
@ 2021-11-15  3:35       ` Maciej W. Rozycki
  0 siblings, 0 replies; 3+ messages in thread
From: Maciej W. Rozycki @ 2021-11-15  3:35 UTC (permalink / raw)
  To: Pali Rohár
  Cc: u-boot, linux-pci, Stefan Roese, Simon Glass, Phil Sutter,
	Vladimir Oltean, Bin Meng, Tim Harvey, Tom Rini,
	David Abdurachmanov

Hi Pali,

> >  Well, this code checks for non-zero lnkcap2 first and ignores it if it's 
> > zero, so I believe it does the right thing.  Have I missed anything?
> 
> I'm reading spec again and I'm not sure now. It has following section:
> 
>   For software to determine the supported Link speeds for components
>   where the Link Capabilities 2 register is either not implemented, or
>   the value of its Supported Link Speeds Vector is 0000000b, software
>   can read bits 3:0 of the Link Capabilities register (now defined to be
>   the Max Link Speed field), and interpret the value as follows:
>     0001b 2.5 GT/s Link speed supported
>     0010b 5.0 GT/s and 2.5 GT/s Link speeds supported

 This is consistent with speeds defined with PCIe rev. 2.0, which is the 
last revision not to define the Link Capabilities 2 register.

> I'm not sure how it should be interpreted, but my assumption is that
> empty lnkcap2 implicates that only 5GT/s or 2.5GT/s speeds are supported
> and therefore trying higher should not be done...

 I can see no ambiguity here.

 The Link Capabilities register always defines the maximum speed supported 
and given that there is no speed masking defined for PCIe rev. 2.0 and 
below all speeds up to the maximum must be always supported with hardware 
compliant to these specification revisions.

 As from PCIe rev. 3.0 the Link Capabilities 2 register provides support 
for speed masking and therefore not all speeds below the maximum given by 
the Link Capabilities register must be supported with hardware compliant 
to this and higher specification revisions.

> Also in spec is following note:
> 
>   It is strongly encouraged that software primarily utilize the
>   Supported Link Speeds Vector instead of the Max Link Speed field.
> 
> So based on this note, iteration should be done via lnkcap2 bits instead
> of lnkcap speed.

 I believe my code does iterate over lnkcap2 bits already, because speed 
encodings that correspond to those lnkcap2 bits that are zero are skipped.

> >  It could be done in Linux in addition to U-Boot, although I think doing 
> > that in the firmware is more important, especially as there could be a 
> > boot device downstream such a switch.  And depending on the platform Linux 
> > does not always reassign buses, so a user intervention (i.e. an explicitly 
> > added kernel parameter) would have to be required.
> 
> It is important to have it in both components (U-Boot and Linux). For
> example native PCIe controller drivers in linux kernel as a first thing
> do complete reset of controller together with connected devices. So
> whatever do U-Boot is completely lost. And important is that Linux
> kernel drivers should not depend on some bootloader configuration. And
> note that your patch implements this workaround in CONFIG_PCI_PNP code,
> so if board disable this option, workaround is not applied.

 Well, everything is not Linux.  The lone Unmatched board can run at least 
FreeBSD or Haiku right now, and it is only one of the numerous machines 
supported by U-Boot.  A piece of code in Linux will not help other OSes.

 Also as I noted not all Linux platforms discard the firmware settings 
with PCI/e devices, let alone bring them to their power-on defaults; this 
certainly does not happen with the Unmatched or I couldn't have benefitted 
with this change of mine proposed or earlier hacks with poking at the 
relevant registers from the U-Boot command line.  So in my scenario this 
fix surely qualifies as good enough.  Interestingly Linux does notice the 
clamp imposed by my change with the offending link:

pci 0000:05:00.0: 2.000 Gb/s available PCIe bandwidth, limited by 2.5 GT/s PCIe x1 link at 0000:02:03.0 (capable of 8.000 Gb/s with 5.0 GT/s PCIe x2 link)

(the x1 vs x2 limitation is imposed by wiring one lane only in the Delock 
device in the first place; then I wired it to a single-lane M.2 connector 
of the Unmatched).

 That said I'm not opposed to porting this code to Linux, I have certainly 
contributed infinitely more code to Linux than I did to U-Boot.  It is 
just that my resources are limited, so I need to cautiously allocate them.  
I have other priorities, the Unmatched box is in my remote lab and I am 
only going to have physical access to it this week only.  The next 
opportunity may be next year at the earliest.  And I dare not reflashing 
U-Boot remotely so as not to lose access with a broken build because I 
need the machine for other purposes like GCC verification.

 So I'd rather focus on getting the change right for U-Boot now, while I 
am here.  If we agree on the way to move forward with this code, then I 
can look into porting this stuff to Linux, as I can surely add another 
kernel image to boot the system from remotely in a safe manner.

> >  And these are generic PCIe switches, they could be anywhere and with the 
> > weird combinations of hardware interfaces available now (e.g. PCI-PCIe 
> > bridge adapters or M.2 to regular PCIe slot adapters) virtually any 
> > combination is possible.
> > 
> >  E.g. I have a 1997-vintage dual Pentium MMX box (82439HX host bridge; 
> > [8086:1250]) with PCIe devices, although it does require a lot of Linux 
> > interventions to cope with its firmware limitations.  NB I plan to add 
> > some NVMe storage to that box, and I believe the ASM2824 has been used in 
> > some M.2 carrier boards meant for NVMe devices.
> 
> Hm... now thinking about your patch... and if it is general, applied to
> all devices, should it also be applied to any type of downstream port?
> 
> Meaning also for root ports, or PCIe port of PCI/X to PCIe bridge (I
> guess that such old platforms with only PCI host bridge without PCIe
> could be affected too if you connect PCIe card via PCI/X to PCIe bridge
> and then this bridge to host PCI slot).

 I guess so.  I considered root ports, but somehow it did not occur to me 
that they can be wired directly to slots (any sane manufacturer would get 
their onboard devices sorted), which of course they can.  I forgot about 
the PCI/PCI-X to PCI Express bridges though.  Thanks for the suggestion.

 As this is a trivial change to make I'll post an update, but I will wait 
till the end of the day or so so as to let people who do this stuff as a 
part of their dayjob have a chance to give feedback.

> >  As I noted ASMedia declined to comment, so it's hard to say if the cause 
> > has been nailed correctly and which devices if any, beyond [1b21:2824], 
> > are affected.
> 
> Yea, we do not know... And because we know that there is lot of broken
> HW which works with current version, we need to be very careful when
> introducing some workaround which is called on every hardware.
> 
> For example something similar like in your patch was implemented in
> Marvell SerDes U-Boot driver, which controls physical layer (so on place
> where nobody would expect touching higher layer code). Just this code
> did not touched retrain link bit... And so it could not have worked and
> was recently removed in patch series:
> https://lore.kernel.org/u-boot/20210924205922.25432-1-marek.behun@nic.cz/

 Sure, which is why this change tries hard to be conservative and checks 
multiple conditions to make sure a link is in trouble before it pokes at 
it.  If you suspect that these checks may be insufficient, then I'm open 
to further suggestions.

 NB I forgot to mention in the change description and I can add it in v2 
that the rate of the speed oscillation/link training with the ASM2824 is 
34-35 times per second.  So it's quite a lengthy process.

> I'm not saying that I'm opposing this patch. Just I would like see how
> is this issue will be fixed in kernel as kernel general workaround would
> affect lot of more devices. And so solution accepted by kernel project
> should be perfectly fine also for smaller project like U-Boot.

 Well, as far as I'm concerned I'd propose a functional equivalent that 
has been merely mechanically adapted to Linux internal interfaces (though 
busy-looping for extended periods with interrupts disabled, which is what 
is needed here, may be problematic in Linux or indeed any OS, unlike with 
the firmware; maybe PCI enumeration is early enough for that not to matter 
in reality though).

 Again, thanks for your input.

  Maciej

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2021-11-15  3:35 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <alpine.DEB.2.21.2111140303040.19625@angie.orcam.me.uk>
     [not found] ` <20211114154152.kmabs4k3gdrzlzke@pali>
2021-11-14 18:07   ` [PATCH] pci: Work around PCIe link training failures Maciej W. Rozycki
2021-11-14 19:28     ` Pali Rohár
2021-11-15  3:35       ` Maciej W. Rozycki

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).