linux-kernel.vger.kernel.org archive mirror
* [PATCH 0/2] PCI: Fix disconnect related issues
@ 2024-01-29 11:27 Ilpo Järvinen
  2024-01-29 11:27 ` [PATCH 1/2] PCI: Clear LBMS on resume to avoid Target Speed quirk Ilpo Järvinen
  2024-01-29 11:27 ` [PATCH 2/2] PCI: Do not wait for disconnected devices when resuming Ilpo Järvinen
  0 siblings, 2 replies; 15+ messages in thread
From: Ilpo Järvinen @ 2024-01-29 11:27 UTC (permalink / raw)
  To: Bjorn Helgaas, linux-pci, Maciej W. Rozycki
  Cc: Mika Westerberg, linux-kernel, Ilpo Järvinen

This series addresses two PCI suspend-related issues that Mika Westerberg
reported to occur because of a Thunderbolt disconnect.

Ilpo Järvinen (2):
  PCI: Clear LBMS on resume to avoid Target Speed quirk
  PCI: Do not wait for disconnected devices when resuming

 drivers/pci/pci-driver.c | 6 ++++++
 drivers/pci/pci.c        | 5 +++++
 drivers/pci/pci.h        | 4 +++-
 3 files changed, 14 insertions(+), 1 deletion(-)

-- 
2.39.2



* [PATCH 1/2] PCI: Clear LBMS on resume to avoid Target Speed quirk
  2024-01-29 11:27 [PATCH 0/2] PCI: Fix disconnect related issues Ilpo Järvinen
@ 2024-01-29 11:27 ` Ilpo Järvinen
  2024-01-29 18:43   ` Bjorn Helgaas
  2024-01-29 11:27 ` [PATCH 2/2] PCI: Do not wait for disconnected devices when resuming Ilpo Järvinen
  1 sibling, 1 reply; 15+ messages in thread
From: Ilpo Järvinen @ 2024-01-29 11:27 UTC (permalink / raw)
  To: Bjorn Helgaas, linux-pci, Maciej W. Rozycki, linux-kernel
  Cc: Mika Westerberg, Ilpo Järvinen

While a device is runtime suspended along with its PCIe hierarchy, the
device could get disconnected. Because of the suspend, the device
disconnection cannot be detected until portdrv/hotplug have resumed. On
runtime resume, pcie_wait_for_link_delay() is called:

  pci_pm_runtime_resume()
    pci_pm_bridge_power_up_actions()
      pci_bridge_wait_for_secondary_bus()
        pcie_wait_for_link_delay()

Because the device is already disconnected, this results in cascading
failures:

  1. pcie_wait_for_link_status() returns -ETIMEDOUT.

  2. After the commit a89c82249c37 ("PCI: Work around PCIe link
     training failures"), pcie_failed_link_retrain() spuriously detects
     this failure as a Link Retraining failure and attempts the Target
     Speed trick, which also fails.

  3. pci_bridge_wait_for_secondary_bus() then calls pci_dev_wait() which
     cannot succeed (but waits ~1 minute, delaying the resume).

The Target Speed trick (in step 2) is only used if LBMS bit (PCIe r6.1
sec 7.5.3.8) is set. For links that have been operational before
suspend, it is well possible that LBMS has been set at the bridge and
remains on. Thus, after resume, LBMS does not indicate the link needs
the Target Speed quirk. Clear LBMS on resume for bridges to avoid the
issue.

Fixes: a89c82249c37 ("PCI: Work around PCIe link training failures")
Reported-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Tested-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
---
 drivers/pci/pci-driver.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
index 51ec9e7e784f..05a114962df3 100644
--- a/drivers/pci/pci-driver.c
+++ b/drivers/pci/pci-driver.c
@@ -574,6 +574,12 @@ static void pci_pm_bridge_power_up_actions(struct pci_dev *pci_dev)
 {
 	int ret;
 
+	/*
+	 * Clear LBMS on resume to avoid spuriously triggering Target Speed
+	 * quirk in pcie_failed_link_retrain().
+	 */
+	pcie_capability_write_word(pci_dev, PCI_EXP_LNKSTA, PCI_EXP_LNKSTA_LBMS);
+
 	ret = pci_bridge_wait_for_secondary_bus(pci_dev, "resume");
 	if (ret) {
 		/*
-- 
2.39.2



* [PATCH 2/2] PCI: Do not wait for disconnected devices when resuming
  2024-01-29 11:27 [PATCH 0/2] PCI: Fix disconnect related issues Ilpo Järvinen
  2024-01-29 11:27 ` [PATCH 1/2] PCI: Clear LBMS on resume to avoid Target Speed quirk Ilpo Järvinen
@ 2024-01-29 11:27 ` Ilpo Järvinen
  2024-01-29 18:55   ` Bjorn Helgaas
  1 sibling, 1 reply; 15+ messages in thread
From: Ilpo Järvinen @ 2024-01-29 11:27 UTC (permalink / raw)
  To: Bjorn Helgaas, linux-pci, Maciej W. Rozycki, linux-kernel
  Cc: Mika Westerberg, Ilpo Järvinen

On runtime resume, pci_dev_wait() is called:
  pci_pm_runtime_resume()
    pci_pm_bridge_power_up_actions()
      pci_bridge_wait_for_secondary_bus()
        pci_dev_wait()

While a device is runtime suspended along with its PCIe hierarchy, the
device could get disconnected. In such a case, the link will not come up
no matter how log pci_dev_wait() waits for it.

Besides the above mentioned case, there could be other ways to get the
device disconnected while pci_dev_wait() is waiting for the link to
come up.

Make pci_dev_wait() exit if the device is already disconnected to
avoid unnecessary delay. As a disconnected device is not really even a
failure in the same sense as a link failing to come up for whatever
reason, return 0 instead of an errno.

Also make sure the compiler does not become too clever with
dev->error_state and use READ_ONCE() to force a fetch of the
up-to-date value.

Reported-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Tested-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
---
 drivers/pci/pci.c | 5 +++++
 drivers/pci/pci.h | 4 +++-
 2 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index d8f11a078924..ec9bf6c90312 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -1250,6 +1250,11 @@ static int pci_dev_wait(struct pci_dev *dev, char *reset_type, int timeout)
 	for (;;) {
 		u32 id;
 
+		if (pci_dev_is_disconnected(dev)) {
+			pci_dbg(dev, "disconnected; not waiting\n");
+			return 0;
+		}
+
 		pci_read_config_dword(dev, PCI_COMMAND, &id);
 		if (!PCI_POSSIBLE_ERROR(id))
 			break;
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index 2336a8d1edab..563a275dff67 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -4,6 +4,8 @@
 
 #include <linux/pci.h>
 
+#include <asm/rwonce.h>
+
 /* Number of possible devfns: 0.0 to 1f.7 inclusive */
 #define MAX_NR_DEVFNS 256
 
@@ -370,7 +372,7 @@ static inline int pci_dev_set_disconnected(struct pci_dev *dev, void *unused)
 
 static inline bool pci_dev_is_disconnected(const struct pci_dev *dev)
 {
-	return dev->error_state == pci_channel_io_perm_failure;
+	return READ_ONCE(dev->error_state) == pci_channel_io_perm_failure;
 }
 
 /* pci_dev priv_flags */
-- 
2.39.2



* Re: [PATCH 1/2] PCI: Clear LBMS on resume to avoid Target Speed quirk
  2024-01-29 11:27 ` [PATCH 1/2] PCI: Clear LBMS on resume to avoid Target Speed quirk Ilpo Järvinen
@ 2024-01-29 18:43   ` Bjorn Helgaas
  2024-01-30 11:53     ` Ilpo Järvinen
  0 siblings, 1 reply; 15+ messages in thread
From: Bjorn Helgaas @ 2024-01-29 18:43 UTC (permalink / raw)
  To: Ilpo Järvinen
  Cc: Bjorn Helgaas, linux-pci, Maciej W. Rozycki, linux-kernel,
	Mika Westerberg

On Mon, Jan 29, 2024 at 01:27:09PM +0200, Ilpo Järvinen wrote:
> While a device is runtime suspended along with its PCIe hierarchy, the
> device could get disconnected. Because of the suspend, the device
> disconnection cannot be detected until portdrv/hotplug have resumed. On
> runtime resume, pcie_wait_for_link_delay() is called:
> 
>   pci_pm_runtime_resume()
>     pci_pm_bridge_power_up_actions()
>       pci_bridge_wait_for_secondary_bus()
>         pcie_wait_for_link_delay()
> 
> Because the device is already disconnected, this results in cascading
> failures:
> 
>   1. pcie_wait_for_link_status() returns -ETIMEDOUT.
> 
>   2. After the commit a89c82249c37 ("PCI: Work around PCIe link
>      training failures"),

I think this also depends on the merge resolution in 1abb47390350
("Merge branch 'pci/enumeration'").  Just looking at a89c82249c37 in
isolation suggests that pcie_wait_for_link_status() returning
-ETIMEDOUT would not cause pcie_wait_for_link_delay() to call
pcie_failed_link_retrain().

>      pcie_failed_link_retrain() spuriously detects
>      this failure as a Link Retraining failure and attempts the Target
>      Speed trick, which also fails.

Based on the comment below, I guess "Target Speed trick" probably
refers to the "retrain at 2.5GT/s, then remove the speed restriction
and retrain again" part of pcie_failed_link_retrain() (which I guess
is basically the entire point of the function)?

>   3. pci_bridge_wait_for_secondary_bus() then calls pci_dev_wait() which
>      cannot succeed (but waits ~1 minute, delaying the resume).
> 
> The Target Speed trick (in step 2) is only used if LBMS bit (PCIe r6.1
> sec 7.5.3.8) is set. For links that have been operational before
> suspend, it is well possible that LBMS has been set at the bridge and
> remains on. Thus, after resume, LBMS does not indicate the link needs
> the Target Speed quirk. Clear LBMS on resume for bridges to avoid the
> issue.
> 
> Fixes: a89c82249c37 ("PCI: Work around PCIe link training failures")
> Reported-by: Mika Westerberg <mika.westerberg@linux.intel.com>
> Tested-by: Mika Westerberg <mika.westerberg@linux.intel.com>
> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
> ---
>  drivers/pci/pci-driver.c | 6 ++++++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
> index 51ec9e7e784f..05a114962df3 100644
> --- a/drivers/pci/pci-driver.c
> +++ b/drivers/pci/pci-driver.c
> @@ -574,6 +574,12 @@ static void pci_pm_bridge_power_up_actions(struct pci_dev *pci_dev)
>  {
>  	int ret;
>  
> +	/*
> +	 * Clear LBMS on resume to avoid spuriously triggering Target Speed
> +	 * quirk in pcie_failed_link_retrain().
> +	 */
> +	pcie_capability_write_word(pci_dev, PCI_EXP_LNKSTA, PCI_EXP_LNKSTA_LBMS);
> +
>  	ret = pci_bridge_wait_for_secondary_bus(pci_dev, "resume");
>  	if (ret) {
>  		/*
> -- 
> 2.39.2
> 


* Re: [PATCH 2/2] PCI: Do not wait for disconnected devices when resuming
  2024-01-29 11:27 ` [PATCH 2/2] PCI: Do not wait for disconnected devices when resuming Ilpo Järvinen
@ 2024-01-29 18:55   ` Bjorn Helgaas
  2024-01-30 13:15     ` Ilpo Järvinen
  0 siblings, 1 reply; 15+ messages in thread
From: Bjorn Helgaas @ 2024-01-29 18:55 UTC (permalink / raw)
  To: Ilpo Järvinen
  Cc: Bjorn Helgaas, linux-pci, Maciej W. Rozycki, linux-kernel,
	Mika Westerberg

On Mon, Jan 29, 2024 at 01:27:10PM +0200, Ilpo Järvinen wrote:
> On runtime resume, pci_dev_wait() is called:
>   pci_pm_runtime_resume()
>     pci_pm_bridge_power_up_actions()
>       pci_bridge_wait_for_secondary_bus()
>         pci_dev_wait()
> 
> While a device is runtime suspended along with its PCIe hierarchy, the
> device could get disconnected. In such a case, the link will not come up
> no matter how log pci_dev_wait() waits for it.

s/PCIe/PCI/ (unless this is a PCIe-specific thing)
s/log/long/

> Besides the above mentioned case, there could be other ways to get the
> device disconnected while pci_dev_wait() is waiting for the link to
> come up.
> 
> Make pci_dev_wait() exit if the device is already disconnected to
> avoid unnecessary delay. As a disconnected device is not really even a
> failure in the same sense as a link failing to come up for whatever
> reason, return 0 instead of an errno.

The device being disconnected is not the same as a link failure.  Do
all the callers do the right thing if pci_dev_wait() returns success
when there's no device there?

> Also make sure the compiler does not become too clever with
> dev->error_state and use READ_ONCE() to force a fetch of the
> up-to-date value.

I think we should have a comment there to say why READ_ONCE() is
needed.  Otherwise it's hard to know whether a future change might
make it unnecessary.

> Reported-by: Mika Westerberg <mika.westerberg@linux.intel.com>
> Tested-by: Mika Westerberg <mika.westerberg@linux.intel.com>
> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
> ---
>  drivers/pci/pci.c | 5 +++++
>  drivers/pci/pci.h | 4 +++-
>  2 files changed, 8 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index d8f11a078924..ec9bf6c90312 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -1250,6 +1250,11 @@ static int pci_dev_wait(struct pci_dev *dev, char *reset_type, int timeout)
>  	for (;;) {
>  		u32 id;
>  
> +		if (pci_dev_is_disconnected(dev)) {
> +			pci_dbg(dev, "disconnected; not waiting\n");
> +			return 0;
> +		}
> +
>  		pci_read_config_dword(dev, PCI_COMMAND, &id);
>  		if (!PCI_POSSIBLE_ERROR(id))
>  			break;
> diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
> index 2336a8d1edab..563a275dff67 100644
> --- a/drivers/pci/pci.h
> +++ b/drivers/pci/pci.h
> @@ -4,6 +4,8 @@
>  
>  #include <linux/pci.h>
>  
> +#include <asm/rwonce.h>
> +
>  /* Number of possible devfns: 0.0 to 1f.7 inclusive */
>  #define MAX_NR_DEVFNS 256
>  
> @@ -370,7 +372,7 @@ static inline int pci_dev_set_disconnected(struct pci_dev *dev, void *unused)
>  
>  static inline bool pci_dev_is_disconnected(const struct pci_dev *dev)
>  {
> -	return dev->error_state == pci_channel_io_perm_failure;
> +	return READ_ONCE(dev->error_state) == pci_channel_io_perm_failure;
>  }
>  
>  /* pci_dev priv_flags */
> -- 
> 2.39.2
> 


* Re: [PATCH 1/2] PCI: Clear LBMS on resume to avoid Target Speed quirk
  2024-01-29 18:43   ` Bjorn Helgaas
@ 2024-01-30 11:53     ` Ilpo Järvinen
  2024-01-30 16:41       ` Maciej W. Rozycki
  0 siblings, 1 reply; 15+ messages in thread
From: Ilpo Järvinen @ 2024-01-30 11:53 UTC (permalink / raw)
  To: Bjorn Helgaas; +Cc: linux-pci, Maciej W. Rozycki, LKML, Mika Westerberg

On Mon, 29 Jan 2024, Bjorn Helgaas wrote:

> On Mon, Jan 29, 2024 at 01:27:09PM +0200, Ilpo Järvinen wrote:
> > While a device is runtime suspended along with its PCIe hierarchy, the
> > device could get disconnected. Because of the suspend, the device
> > disconnection cannot be detected until portdrv/hotplug have resumed. On
> > runtime resume, pcie_wait_for_link_delay() is called:
> > 
> >   pci_pm_runtime_resume()
> >     pci_pm_bridge_power_up_actions()
> >       pci_bridge_wait_for_secondary_bus()
> >         pcie_wait_for_link_delay()
> > 
> > Because the device is already disconnected, this results in cascading
> > failures:
> > 
> >   1. pcie_wait_for_link_status() returns -ETIMEDOUT.
> > 
> >   2. After the commit a89c82249c37 ("PCI: Work around PCIe link
> >      training failures"),
> 
> I think this also depends on the merge resolution in 1abb47390350
> ("Merge branch 'pci/enumeration'").  Just looking at a89c82249c37 in
> isolation suggests that pcie_wait_for_link_status() returning
> -ETIMEDOUT would not cause pcie_wait_for_link_delay() to call
> pcie_failed_link_retrain().

I was aware of the merge but I seem to have somehow misanalyzed the return
values earlier, since I can no longer reach my earlier conclusion and have
now ended up agreeing with your analysis that 1abb47390350 broke it.

That would imply there is a logic error in 1abb47390350 in addition to
the LBMS-logic problem in a89c82249c37 that my patch is fixing... However, I
cannot pinpoint a single error because there seems to be more than one in
the whole code.

First of all, this is not true for pcie_failed_link_retrain():
 * Return TRUE if the link has been successfully retrained, otherwise FALSE.
If LBMS is not set, the Target Speed quirk is not applied but the function 
still returns true. I think that should be changed to early return false
when no LBMS is present.
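
For illustration, a minimal sketch of what such an early return could look
like, assuming the names and overall shape of the current code (this is not
the actual patch referred to below):

  bool pcie_failed_link_retrain(struct pci_dev *dev)
  {
          u16 lnksta;

          /* ... existing eligibility checks ... */

          pcie_capability_read_word(dev, PCI_EXP_LNKSTA, &lnksta);
          if (!(lnksta & PCI_EXP_LNKSTA_LBMS))
                  return false;   /* No LBMS: no retrain was attempted */

          /* ... the Target Speed sequence as before ... */
  }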

But if I make that change, then pcie_wait_for_link_delay() will do 
msleep() + return true, and pci_bridge_wait_for_secondary_bus() will call 
long ~60s pci_dev_wait().

I'll try to come up with another patch to clean up all that return logic so
that it actually starts to make some sense.

> >      pcie_failed_link_retrain() spuriously detects
> >      this failure as a Link Retraining failure and attempts the Target
> >      Speed trick, which also fails.
> 
> Based on the comment below, I guess "Target Speed trick" probably
> refers to the "retrain at 2.5GT/s, then remove the speed restriction
> and retrain again" part of pcie_failed_link_retrain() (which I guess
> is basically the entire point of the function)?

Yes. I'll change the wording slightly to make it more obvious and put
(Target Speed quirk) in parentheses so I can use it below.

> >   3. pci_bridge_wait_for_secondary_bus() then calls pci_dev_wait() which
> >      cannot succeed (but waits ~1 minute, delaying the resume).
> > 
> > The Target Speed trick (in step 2) is only used if LBMS bit (PCIe r6.1
> > sec 7.5.3.8) is set. For links that have been operational before
> > suspend, it is well possible that LBMS has been set at the bridge and
> > remains on. Thus, after resume, LBMS does not indicate the link needs
> > the Target Speed quirk. Clear LBMS on resume for bridges to avoid the
> > issue.


-- 
 i.


* Re: [PATCH 2/2] PCI: Do not wait for disconnected devices when resuming
  2024-01-29 18:55   ` Bjorn Helgaas
@ 2024-01-30 13:15     ` Ilpo Järvinen
  2024-02-02 17:03       ` Ilpo Järvinen
  0 siblings, 1 reply; 15+ messages in thread
From: Ilpo Järvinen @ 2024-01-30 13:15 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Bjorn Helgaas, linux-pci, Maciej W. Rozycki, LKML, Mika Westerberg

On Mon, 29 Jan 2024, Bjorn Helgaas wrote:

> On Mon, Jan 29, 2024 at 01:27:10PM +0200, Ilpo Järvinen wrote:
> > On runtime resume, pci_dev_wait() is called:
> >   pci_pm_runtime_resume()
> >     pci_pm_bridge_power_up_actions()
> >       pci_bridge_wait_for_secondary_bus()
> >         pci_dev_wait()
> > 
> > While a device is runtime suspended along with its PCIe hierarchy, the
> > device could get disconnected. In such a case, the link will not come up
> > no matter how log pci_dev_wait() waits for it.
> 
> s/PCIe/PCI/ (unless this is a PCIe-specific thing)
> s/log/long/
> 
> > Besides the above mentioned case, there could be other ways to get the
> > device disconnected while pci_dev_wait() is waiting for the link to
> > come up.
> > 
> > Make pci_dev_wait() exit if the device is already disconnected to
> > avoid unnecessary delay. As a disconnected device is not really even a
> > failure in the same sense as a link failing to come up for whatever
> > reason, return 0 instead of an errno.
> 
> The device being disconnected is not the same as a link failure.

On this we agree, and it's what I tried to write above.

> Do
> all the callers do the right thing if pci_dev_wait() returns success
> when there's no device there?

It's a bit complicated. I honestly don't know what the best approach is
here, so I'm very much open to your suggestion as to what would be preferable.

There are two main use cases (more than two callers, but they seem to boil
down to two use cases).

One use case is the reset-related functions, and those would probably prefer
to have an error returned if the wait, and thus the reset, failed.

Then the other is waiting for buses, that is,
pci_bridge_wait_for_secondary_bus(), which returns 0 if there's no device
(wait successful). For it, it would make sense to return 0 also when the
device is disconnected because it seems analogous to the case where
there's no device in the first place.

Perhaps it would be better to return -ENOTTY from pci_dev_wait() and check
for that disconnected condition inside
pci_bridge_wait_for_secondary_bus()? To further complicate things,
however, DPC also depends on the return value of
pci_bridge_wait_for_secondary_bus() and, from its perspective, returning an
error when there is a device that is disconnected might be the desirable
alternative (I'm not fully sure how everything in DPC works, but I
highly suspect I'm correct here).

Either way, the fix itself doesn't care and seemed to work regardless of the
actual value returned by pci_dev_wait().
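
A rough sketch of that alternative, for illustration only (the exact split of
responsibilities is hypothetical, not code from the posted series):

  /* pci_dev_wait() would report the disconnect to its callers ... */
  if (pci_dev_is_disconnected(dev)) {
          pci_dbg(dev, "disconnected; not waiting\n");
          return -ENOTTY;
  }

  /* ... and pci_bridge_wait_for_secondary_bus() would treat it as benign: */
  ret = pci_dev_wait(child, reset_type, timeout);
  if (ret && pci_dev_is_disconnected(child))
          return 0;       /* an absent device is not a bus-wait failure */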

> > Also make sure the compiler does not become too clever with
> > dev->error_state and use READ_ONCE() to force a fetch of the
> > up-to-date value.
> 
> I think we should have a comment there to say why READ_ONCE() is
> needed.  Otherwise it's hard to know whether a future change might
> make it unnecessary.

Sure, I'll add a comment there.
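
For instance, something along these lines (the wording is illustrative, not
the final comment):

  static inline bool pci_dev_is_disconnected(const struct pci_dev *dev)
  {
          /*
           * error_state may be changed asynchronously by
           * pci_dev_set_disconnected() while a wait loop polls this helper;
           * READ_ONCE() forces a fresh fetch on every call so the loop
           * cannot keep re-using a stale value.
           */
          return READ_ONCE(dev->error_state) == pci_channel_io_perm_failure;
  }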

-- 
 i.


* Re: [PATCH 1/2] PCI: Clear LBMS on resume to avoid Target Speed quirk
  2024-01-30 11:53     ` Ilpo Järvinen
@ 2024-01-30 16:41       ` Maciej W. Rozycki
  2024-01-30 17:33         ` Ilpo Järvinen
  0 siblings, 1 reply; 15+ messages in thread
From: Maciej W. Rozycki @ 2024-01-30 16:41 UTC (permalink / raw)
  To: Ilpo Järvinen; +Cc: Bjorn Helgaas, linux-pci, LKML, Mika Westerberg

On Tue, 30 Jan 2024, Ilpo Järvinen wrote:

> First of all, this is not true for pcie_failed_link_retrain():
>  * Return TRUE if the link has been successfully retrained, otherwise FALSE.
> If LBMS is not set, the Target Speed quirk is not applied but the function 
> still returns true. I think that should be changed to early return false
> when no LBMS is present.

 I think there is a corner case here indeed.  If Link Active reporting is 
supported and neither DLLLA nor LBMS are set at entry, then the function 
indeed returns success even though the link is down and no attempt to 
retrain will have been made.  So this does indeed look like a case for a return
with the FALSE result.

 I think most easily this can be sorted by delegating the return result to 
a temporary variable, preset to FALSE and then only updated from results 
of the calls to `pcie_retrain_link'.  I can offer a patch as the author of 
the code and one who has means to verify it right away if that helped.
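
In other words, roughly this pattern (a sketch of the idea only; the actual
patch may differ):

  bool pcie_failed_link_retrain(struct pci_dev *dev)
  {
          bool ret = false;       /* preset: no successful retrain */
          u16 lnksta;

          /* ... eligibility checks ... */

          pcie_capability_read_word(dev, PCI_EXP_LNKSTA, &lnksta);
          if ((lnksta & (PCI_EXP_LNKSTA_LBMS | PCI_EXP_LNKSTA_DLLLA)) ==
              PCI_EXP_LNKSTA_LBMS) {
                  /* ... clamp the target speed to 2.5GT/s ... */
                  ret = pcie_retrain_link(dev, false) == 0;
                  /* ... then lift the restriction and retrain again ... */
          }
          return ret;
  }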

 Overall I guess it's all legacy of how this code evolved before it was
finally merged.

> > >   3. pci_bridge_wait_for_secondary_bus() then calls pci_dev_wait() which
> > >      cannot succeed (but waits ~1 minute, delaying the resume).
> > > 
> > > The Target Speed trick (in step 2) is only used if LBMS bit (PCIe r6.1
> > > sec 7.5.3.8) is set. For links that have been operational before
> > > suspend, it is well possible that LBMS has been set at the bridge and
> > > remains on. Thus, after resume, LBMS does not indicate the link needs
> > > the Target Speed quirk. Clear LBMS on resume for bridges to avoid the
> > > issue.

 This can be problematic AFAICT however.  While I am not able to verify 
suspend/resume operation with my devices, I expect the behaviour to be 
exactly the same after resume as after a bus reset: the link will fail to 
negotiate and the LBMS and DLLLA bits will be respectively set and clear.  
Consequently if you clear LBMS at resume, then the workaround won't 
trigger and the link will remain inoperational in its limbo state.

 What kind of scenario does the LBMS bit get set in that you have
trouble with?  You write that in your case the downstream device has been
disconnected while the bus hierarchy was suspended and it is not present
anymore at resume, is that correct?

 But in that case no link negotiation could have been possible and 
consequently the LBMS bit mustn't have been set by hardware (according to 
my understanding of PCIe), because (for compliant, non-broken devices 
anyway) it is only specified to be set for ports that can communicate with 
the other link end (the spec explicitly says there mustn't have been a 
transition through the DL_Down status for the port).

 Am I missing something?

  Maciej


* Re: [PATCH 1/2] PCI: Clear LBMS on resume to avoid Target Speed quirk
  2024-01-30 16:41       ` Maciej W. Rozycki
@ 2024-01-30 17:33         ` Ilpo Järvinen
  2024-02-01  9:47           ` Ilpo Järvinen
  0 siblings, 1 reply; 15+ messages in thread
From: Ilpo Järvinen @ 2024-01-30 17:33 UTC (permalink / raw)
  To: Maciej W. Rozycki; +Cc: Bjorn Helgaas, linux-pci, LKML, Mika Westerberg

On Tue, 30 Jan 2024, Maciej W. Rozycki wrote:

> On Tue, 30 Jan 2024, Ilpo Järvinen wrote:
> 
> > First of all, this is not true for pcie_failed_link_retrain():
> >  * Return TRUE if the link has been successfully retrained, otherwise FALSE.
> > If LBMS is not set, the Target Speed quirk is not applied but the function 
> > still returns true. I think that should be changed to early return false
> > when no LBMS is present.
> 
>  I think there is a corner case here indeed.  If Link Active reporting is 
> supported and neither DLLLA nor LBMS are set at entry, then the function 
> indeed returns success even though the link is down and no attempt to 
> retrain will have been made.  So this does indeed look like a case for a return
> with the FALSE result.
> 
>  I think most easily this can be sorted by delegating the return result to 
> a temporary variable, preset to FALSE and then only updated from results 
> of the calls to `pcie_retrain_link'.  I can offer a patch as the author of 
> the code and one who has means to verify it right away if that helped.

I already wrote a patch which addresses this together with the caller 
site changes.

>  Overall I guess it's all legacy of how this code evolved before it was
> finally merged.

Indeed, it looks like the step-by-step changes didn't yield a good result here.

> > > >   3. pci_bridge_wait_for_secondary_bus() then calls pci_dev_wait() which
> > > >      cannot succeed (but waits ~1 minute, delaying the resume).
> > > > 
> > > > The Target Speed trick (in step 2) is only used if LBMS bit (PCIe r6.1
> > > > sec 7.5.3.8) is set. For links that have been operational before
> > > > suspend, it is well possible that LBMS has been set at the bridge and
> > > > remains on. Thus, after resume, LBMS does not indicate the link needs
> > > > the Target Speed quirk. Clear LBMS on resume for bridges to avoid the
> > > > issue.
> 
>  This can be problematic AFAICT however.  While I am not able to verify 
> suspend/resume operation with my devices, I expect the behaviour to be 
> exactly the same after resume as after a bus reset: the link will fail to 
> negotiate and the LBMS and DLLLA bits will be respectively set and clear.  
> Consequently if you clear LBMS at resume, then the workaround won't 
> trigger and the link will remain inoperational in its limbo state.

How do you get the LBMS set in the first place? Isn't that because the
link tries to come up, so shouldn't it reassert that bit again before the
code ends up in the target speed quirk? That is, I assumed you actually
wanted to detect LBMS getting set during the pcie_wait_for_link_status() call
preceding the pcie_failed_link_retrain() call?

There's always an option to store it into pci_dev for later use when the 
quirk is found to be working for the first time if other solutions don't 
work.

In any case, and unrelated to this patch, the way this quirk monopolizes
the LBMS bit is going to have to be changed because it won't be reliable with
the PCIe BW controller that sets up an irq for LBMS (and clears the bit).
In bwctrl v5 (yet to be posted) I'll add an LBMS counter into bwctrl to allow
this quirk to keep working (which will need to be confirmed).
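
Very roughly, the counter idea could look like this (bwctrl v5 is unposted,
so every name below is made up for illustration):

  static atomic_t lbms_count;     /* would be per-port in a real version */

  /* Hypothetical bwctrl interrupt handler counting LBMS assertions */
  static irqreturn_t pcie_bwctrl_irq(int irq, void *context)
  {
          struct pci_dev *port = context;
          u16 lnksta;

          pcie_capability_read_word(port, PCI_EXP_LNKSTA, &lnksta);
          if (!(lnksta & PCI_EXP_LNKSTA_LBMS))
                  return IRQ_NONE;

          atomic_inc(&lbms_count);        /* the quirk could test this later */
          /* Clear LBMS so the next bandwidth change gets latched again */
          pcie_capability_write_word(port, PCI_EXP_LNKSTA,
                                     PCI_EXP_LNKSTA_LBMS);
          return IRQ_HANDLED;
  }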

>  What kind of scenario does the LBMS bit get set in that you have
> trouble with?  You write that in your case the downstream device has been
> disconnected while the bus hierarchy was suspended and it is not present
> anymore at resume, is that correct?
>
>  But in that case no link negotiation could have been possible and 
> consequently the LBMS bit mustn't have been set by hardware (according to 
> my understanding of PCIe), because (for compliant, non-broken devices 
> anyway) it is only specified to be set for ports that can communicate with 
> the other link end (the spec explicitly says there mustn't have been a 
> transition through the DL_Down status for the port).
>
>  Am I missing something?

Yes, when resuming the device is already gone but the bridge still has 
LBMS set. My understanding is that it was set because it was there
from pre-suspend time but I've not really taken a deep look into it 
because the problem and fix seemed obvious.

I read that "without the Port transitioning through DL_Down status" 
differently than you, I only interpret that it relates to the two 
bullets following it. ...So if retrain bit is set, and link then goes 
down, the bullet no longer applies and LBMS should not be set because 
there was transition through DL_Down. But I could well be wrong...

-- 
 i.


* Re: [PATCH 1/2] PCI: Clear LBMS on resume to avoid Target Speed quirk
  2024-01-30 17:33         ` Ilpo Järvinen
@ 2024-02-01  9:47           ` Ilpo Järvinen
  2024-02-01 18:49             ` Maciej W. Rozycki
  0 siblings, 1 reply; 15+ messages in thread
From: Ilpo Järvinen @ 2024-02-01  9:47 UTC (permalink / raw)
  To: Maciej W. Rozycki, Bjorn Helgaas; +Cc: linux-pci, LKML, Mika Westerberg

On Tue, 30 Jan 2024, Ilpo Järvinen wrote:

> On Tue, 30 Jan 2024, Maciej W. Rozycki wrote:
> 
> > On Tue, 30 Jan 2024, Ilpo Järvinen wrote:
> > 
> > > First of all, this is not true for pcie_failed_link_retrain():
> > >  * Return TRUE if the link has been successfully retrained, otherwise FALSE.
> > > If LBMS is not set, the Target Speed quirk is not applied but the function 
> > > still returns true. I think that should be changed to early return false
> > > when no LBMS is present.
> > 
> >  I think there is a corner case here indeed.  If Link Active reporting is 
> > supported and neither DLLLA nor LBMS are set at entry, then the function 
> > indeed returns success even though the link is down and no attempt to 
> > retrain will have been made.  So this does indeed look like a case for a return
> > with the FALSE result.
> > 
> >  I think most easily this can be sorted by delegating the return result to 
> > a temporary variable, preset to FALSE and then only updated from results 
> > of the calls to `pcie_retrain_link'.  I can offer a patch as the author of 
> > the code and one who has means to verify it right away if that helped.
> 
> I already wrote a patch which addresses this together with the caller 
> site changes.
> 
> >  Overall I guess it's all legacy of how this code evolved before it was
> > finally merged.
> 
> Indeed, it looks like the step-by-step changes didn't yield a good result here.
> 
> > > > >   3. pci_bridge_wait_for_secondary_bus() then calls pci_dev_wait() which
> > > > >      cannot succeed (but waits ~1 minute, delaying the resume).
> > > > > 
> > > > > The Target Speed trick (in step 2) is only used if LBMS bit (PCIe r6.1
> > > > > sec 7.5.3.8) is set. For links that have been operational before
> > > > > suspend, it is well possible that LBMS has been set at the bridge and
> > > > > remains on. Thus, after resume, LBMS does not indicate the link needs
> > > > > the Target Speed quirk. Clear LBMS on resume for bridges to avoid the
> > > > > issue.
> > 
> >  This can be problematic AFAICT however.  While I am not able to verify 
> > suspend/resume operation with my devices, I expect the behaviour to be 
> > exactly the same after resume as after a bus reset: the link will fail to 
> > negotiate and the LBMS and DLLLA bits will be respectively set and clear.  
> > Consequently if you clear LBMS at resume, then the workaround won't 
> > trigger and the link will remain inoperational in its limbo state.
> 
> How do you get the LBMS set in the first place? Isn't that because the
> link tries to come up, so shouldn't it reassert that bit again before the
> code ends up in the target speed quirk? That is, I assumed you actually
> wanted to detect LBMS getting set during the pcie_wait_for_link_status() call
> preceding the pcie_failed_link_retrain() call?
> 
> There's always an option to store it into pci_dev for later use when the 
> quirk is found to be working for the first time if other solutions don't 
> work.
> 
> In any case, and unrelated to this patch, the way this quirk monopolizes
> the LBMS bit is going to have to be changed because it won't be reliable with
> the PCIe BW controller that sets up an irq for LBMS (and clears the bit).
> In bwctrl v5 (yet to be posted) I'll add an LBMS counter into bwctrl to allow
> this quirk to keep working (which will need to be confirmed).
> 
> >  What kind of scenario does the LBMS bit get set in that you have
> > trouble with?  You write that in your case the downstream device has been
> > disconnected while the bus hierarchy was suspended and it is not present
> > anymore at resume, is that correct?
> >
> >  But in that case no link negotiation could have been possible and 
> > consequently the LBMS bit mustn't have been set by hardware (according to 
> > my understanding of PCIe), because (for compliant, non-broken devices 
> > anyway) it is only specified to be set for ports that can communicate with 
> > the other link end (the spec explicitly says there mustn't have been a 
> > transition through the DL_Down status for the port).
> >
> >  Am I missing something?
> 
> Yes, when resuming the device is already gone but the bridge still has 
> LBMS set. My understanding is that it was set because it was there
> from pre-suspend time but I've not really taken a deep look into it 
> because the problem and fix seemed obvious.
> 
> I read that "without the Port transitioning through DL_Down status" 
> differently than you, I only interpret that it relates to the two 
> bullets following it. ...So if retrain bit is set, and link then goes 
> down, the bullet no longer applies and LBMS should not be set because 
> there was transition through DL_Down. But I could well be wrong...

Hi again,

I went to check Root Ports on some machines and toggled their Link
Disable bit on. None of the Root Ports I tested cleared LBMS. That means
LBMS being on after resume is not enough to differentiate the HW which
needs the Target Speed quirk from that which does not, unless we clear
LBMS at some point, which was what this patch tried to do.
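
Roughly what each of those tests amounted to, as a hypothetical debug snippet
(not code from the series):

  /* Force the link down and back up, then check whether LBMS survived */
  static void lbms_link_disable_test(struct pci_dev *port)
  {
          u16 lnksta;

          pcie_capability_set_word(port, PCI_EXP_LNKCTL, PCI_EXP_LNKCTL_LD);
          msleep(100);
          pcie_capability_clear_word(port, PCI_EXP_LNKCTL, PCI_EXP_LNKCTL_LD);
          msleep(100);

          pcie_capability_read_word(port, PCI_EXP_LNKSTA, &lnksta);
          pci_info(port, "LBMS %s after Link Disable toggle\n",
                   lnksta & PCI_EXP_LNKSTA_LBMS ? "still set" : "cleared");
  }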

So this misidentification problem in the quirk looks quite widespread but 
is mostly dormant because Links do come up normally so the quirk code is 
never called.

I might conduct an even larger set of tests once I script a way to
automatically pick a "safe" Root Port (something that is connected to a
device but unused; I manually picked Root Ports above unused NICs for the
manual tests I already did). But I don't believe it changes anything, and
in the end I won't find any Root Ports that would actually clear LBMS on
their own when the Link goes down.

So I would be really curious now to know how you get LBMS set on the
device that needs the Target Speed quirk. Is this true (from the commit
a89c82249c37 ("PCI: Work around PCIe link training failures")):

"Instead the link continues oscillating between the two speeds, at the 
rate of 34-35 times per second, with link training reported repeatedly 
active ~84% of the time."

?

Because if it is constantly picking another speed, it would mean you get 
LBMS set over and over again, no? If that happens 34-35 times per second, 
it should be set already again when we get into that quirk because there 
was some wait before it gets called.

-- 
 i.


* Re: [PATCH 1/2] PCI: Clear LBMS on resume to avoid Target Speed quirk
  2024-02-01  9:47           ` Ilpo Järvinen
@ 2024-02-01 18:49             ` Maciej W. Rozycki
  2024-02-02 15:27               ` Ilpo Järvinen
  2024-02-12 17:56               ` Maciej W. Rozycki
  0 siblings, 2 replies; 15+ messages in thread
From: Maciej W. Rozycki @ 2024-02-01 18:49 UTC (permalink / raw)
  To: Ilpo Järvinen; +Cc: Bjorn Helgaas, linux-pci, LKML, Mika Westerberg

On Thu, 1 Feb 2024, Ilpo Järvinen wrote:

> > >  I think there is a corner case here indeed.  If Link Active reporting is 
> > > supported and neither DLLLA nor LBMS are set at entry, then the function 
> > > indeed returns success even though the link is down and no attempt to 
> > > retrain will have been made.  So this does indeed look like a case for a return
> > > with the FALSE result.
> > > 
> > >  I think most easily this can be sorted by delegating the return result to 
> > > a temporary variable, preset to FALSE and then only updated from results 
> > > of the calls to `pcie_retrain_link'.  I can offer a patch as the author of 
> > > the code and one who has means to verify it right away if that helped.
> > 
> > I already wrote a patch which addresses this together with the caller 
> > site changes.

 Can you post it here for review then, as surely it's a standalone change?

> > >  This can be problematic AFAICT however.  While I am not able to verify 
> > > suspend/resume operation with my devices, I expect the behaviour to be 
> > > exactly the same after resume as after a bus reset: the link will fail to 
> > > negotiate and the LBMS and DLLLA bits will be respectively set and clear.  
> > > Consequently if you clear LBMS at resume, then the workaround won't 
> > > trigger and the link will remain inoperational in its limbo state.
> > 
> > How do you get the LBMS set in the first place? Isn't that because the
> > link tries to come up, so shouldn't it reassert that bit again before the
> > code ends up in the target speed quirk? That is, I assumed you actually
> > wanted to detect LBMS getting set during the pcie_wait_for_link_status() call
> > preceding the pcie_failed_link_retrain() call?

 It is a good question what the sequence of events exactly is that sets 
the LBMS bit.  I don't know the answer offhand.

> > There's always an option to store it into pci_dev for later use when the 
> > quirk is found to be working for the first time if other solutions don't 
> > work.

 Indeed there is.

> > In any case, and unrelated to this patch, the way this quirk monopolizes
> > the LBMS bit is going to have to be changed because it won't be reliable with
> > the PCIe BW controller that sets up an irq for LBMS (and clears the bit).
> > In bwctrl v5 (yet to be posted) I'll add an LBMS counter into bwctrl to allow
> > this quirk to keep working (which will need to be confirmed).

 If there's an interrupt handler for LBMS events, then it may be the best 
approach if the quirk is triggered by the handler instead, possibly as a 
softirq.

> > >  What kind of scenario does the LBMS bit get set in that you have
> > > trouble with?  You write that in your case the downstream device has been
> > > disconnected while the bus hierarchy was suspended and it is not present
> > > anymore at resume, is that correct?
> > >
> > >  But in that case no link negotiation could have been possible and 
> > > consequently the LBMS bit mustn't have been set by hardware (according to 
> > > my understanding of PCIe), because (for compliant, non-broken devices 
> > > anyway) it is only specified to be set for ports that can communicate with 
> > > the other link end (the spec explicitly says there mustn't have been a 
> > > transition through the DL_Down status for the port).
> > >
> > >  Am I missing something?
> > 
> > Yes, when resuming the device is already gone but the bridge still has 
> > LBMS set. My understanding is that it was set because it was there
> > from pre-suspend time but I've not really taken a deep look into it 
> > because the problem and fix seemed obvious.

 I've always been confused with the suspend/resume terminology: I'd have 
assumed this would have gone through a power cycle, in which case the LBMS 
bit would have necessarily been cleared in the transition, because its 
required state at power-up/reset is 0, so the value of 1 observed would be 
a result of what has happened solely through the resume stage.  Otherwise 
it may make sense to clear the bit in the course of the suspend stage, 
though it wouldn't be race-free I'm afraid.

> > I read that "without the Port transitioning through DL_Down status" 
> > differently than you, I only interpret that it relates to the two 
> > bullets following it. ...So if retrain bit is set, and link then goes 
> > down, the bullet no longer applies and LBMS should not be set because 
> > there was transition through DL_Down. But I could well be wrong...

 What I refer to is that if you suspend your system, remove the device 
that originally caused the quirk to trigger and then resume your system 
with the device absent, then LBMS couldn't have been set in the course of 
resume, because the port couldn't have come out of the DL_Down status in 
the absence of the downstream device.  Do you interpret it differently?

 Of course once set the bit isn't self-clearing except at power-up/reset.

> So I would be really curious now to know how you get LBMS set on the
> device that needs the Target Speed quirk. Is this true (from the commit
> a89c82249c37 ("PCI: Work around PCIe link training failures")):
> 
> "Instead the link continues oscillating between the two speeds, at the 
> rate of 34-35 times per second, with link training reported repeatedly 
> active ~84% of the time."
> 
> ?

 That is what I have observed.  It was so long ago I don't remember how I 
calculated the figures anymore, it may have been with a U-Boot debug patch 
made to collect samples (because with U-Boot you can just poll the LT bit 
while busy-looping).  I'd have to try and dig out the old stuff.

> Because if it is constantly picking another speed, it would mean you get 
> LBMS set over and over again, no? If that happens 34-35 times per second, 
> it should be set already again when we get into that quirk because there 
> was some wait before it gets called.

 I'll see if I can experiment with the hardware over the next couple of 
days and come back with my findings.

  Maciej


* Re: [PATCH 1/2] PCI: Clear LBMS on resume to avoid Target Speed quirk
  2024-02-01 18:49             ` Maciej W. Rozycki
@ 2024-02-02 15:27               ` Ilpo Järvinen
  2024-02-07 12:33                 ` Ilpo Järvinen
  2024-02-12 17:56               ` Maciej W. Rozycki
  1 sibling, 1 reply; 15+ messages in thread
From: Ilpo Järvinen @ 2024-02-02 15:27 UTC (permalink / raw)
  To: Maciej W. Rozycki; +Cc: Bjorn Helgaas, linux-pci, LKML, Mika Westerberg

On Thu, 1 Feb 2024, Maciej W. Rozycki wrote:

> On Thu, 1 Feb 2024, Ilpo Järvinen wrote:

> > > >  This can be problematic AFAICT however.  While I am not able to verify 
> > > > suspend/resume operation with my devices, I expect the behaviour to be 
> > > > exactly the same after resume as after a bus reset: the link will fail to 
> > > > negotiate and the LBMS and DLLLA bits will be respectively set and clear.  
> > > > Consequently if you clear LBMS at resume, then the workaround won't 
> > > > trigger and the link will remain inoperational in its limbo state.
> > > 
> > > How do you get the LBMS set in the first place? Isn't that because the
> > > link tries to come up, so shouldn't it reassert that bit again before the
> > > code ends up in the target speed quirk? That is, I assumed you actually
> > > wanted to detect LBMS getting set during the pcie_wait_for_link_status() call
> > > preceding the pcie_failed_link_retrain() call?
> 
>  It is a good question what the sequence of events exactly is that sets 
> the LBMS bit.  I don't know the answer offhand.
>
> > > In any case, and unrelated to this patch, the way this quirk monopolizes
> > > the LBMS bit is going to have to be changed because it won't be reliable with
> > > the PCIe BW controller that sets up an irq for LBMS (and clears the bit).
> > > In bwctrl v5 (yet to be posted) I'll add an LBMS counter into bwctrl to allow
> > > this quirk to keep working (which will need to be confirmed).
> 
>  If there's an interrupt handler for LBMS events, then it may be the best 
> approach if the quirk is triggered by the handler instead, possibly as a 
> softirq.

Okay, I'll look into changing the code in that direction. The small
trouble there is that soon I'll have very little that can be configured away
from the bandwidth controller, because this quirk already relates to the
target speed changing code and the LBMS handling will require the other side
to always be compiled in too...

> > > >  What kind of scenario does the LBMS bit get set in that you have
> > > > trouble with?  You write that in your case the downstream device has been
> > > > disconnected while the bus hierarchy was suspended and it is not present
> > > > anymore at resume, is that correct?
> > > >
> > > >  But in that case no link negotiation could have been possible and 
> > > > consequently the LBMS bit mustn't have been set by hardware (according to 
> > > > my understanding of PCIe), because (for compliant, non-broken devices 
> > > > anyway) it is only specified to be set for ports that can communicate with 
> > > > the other link end (the spec explicitly says there mustn't have been a 
> > > > transition through the DL_Down status for the port).
> > > >
> > > >  Am I missing something?
> > > 
> > > Yes, when resuming the device is already gone but the bridge still has 
> > > LBMS set. My understanding is that it was set because it was there
> > > from pre-suspend time but I've not really taken a deep look into it 
> > > because the problem and fix seemed obvious.
> 
>  I've always been confused with the suspend/resume terminology: I'd have 
> assumed this would have gone through a power cycle, in which case the LBMS 
> bit would have necessarily been cleared in the transition, because its 
> required state at power-up/reset is 0, so the value of 1 observed would be 
> a result of what has happened solely through the resume stage.  Otherwise 
> it may make sense to clear the bit in the course of the suspend stage, 
> though it wouldn't be race-free I'm afraid.

I also thought of suspend as one possibility but yes, it's racy. Mika also
suggested clearing LBMS after each successful retrain, but that wouldn't
cover all possible ways to get LBMS set, as devices can set it
independently of the OS. Keeping it cleared constantly is pretty much what
will happen with the bandwidth controller anyway.
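
As a sketch, Mika's suggestion would roughly mean doing this at the end of a
successful pcie_retrain_link() (the placement is illustrative, not a posted
patch):

  rc = pcie_wait_for_link_status(pdev, use_lt, !use_lt);
  if (!rc)
          /* Retrain succeeded: drop the stale latched LBMS */
          pcie_capability_write_word(pdev, PCI_EXP_LNKSTA,
                                     PCI_EXP_LNKSTA_LBMS);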

> > > I read that "without the Port transitioning through DL_Down status" 
> > > differently than you, I only interpret that it relates to the two 
> > > bullets following it. ...So if retrain bit is set, and link then goes 
> > > down, the bullet no longer applies and LBMS should not be set because 
> > > there was transition through DL_Down. But I could well be wrong...
> 
> What I refer to is that if you suspend your system, remove the device 
> that originally caused the quirk to trigger and then resume your system 
> with the device absent,

A small correction here: the quirk didn't trigger initially for the
device, it does that only after resume. And even then the quirk is called
only because the link doesn't come up.

In the longer term, I'd actually want to have hotplug resumed earlier so the
disconnect could be detected before/while all this waiting related to link
up is done. But that's very complicated to realize in practice because
hotplug lurks behind portdrv, so resuming it earlier isn't going to be
about just moving a few lines around.

> then LBMS couldn't have been set in the course of 
> resume, because the port couldn't have come out of the DL_Down status in 
> the absence of the downstream device.  Do you interpret it differently?

That's a good question and I don't have an answer to this yet; that is,
I don't fully understand what happens to those devices during the runtime
suspend/resume cycle and what the exact mechanism is that preserves the
LBMS bit. I'll look more into it.

But I agree that if the device goes cold enough and downstream is 
disconnected, the port should no longer have a way to reassert LBMS.

> Of course once set the bit isn't self-clearing except at power-up/reset.

Okay, I had misunderstood you to mean it would be self-clearing whenever
DL_Down happens. I can see that could have been one possible
interpretation of the text fragment in the spec.

> > So I would be really curious now to know how you get LBMS set on the
> > device that needs the Target Speed quirk. Is this true (from the commit
> > a89c82249c37 ("PCI: Work around PCIe link training failures")):
> > 
> > "Instead the link continues oscillating between the two speeds, at the 
> > rate of 34-35 times per second, with link training reported repeatedly 
> > active ~84% of the time."
> > 
> > ?
> 
>  That is what I have observed.  It was so long ago I don't remember how I 
> calculated the figures anymore, it may have been with a U-Boot debug patch 
> made to collect samples (because with U-Boot you can just poll the LT bit 
> while busy-looping).  I'd have to try and dig out the old stuff.

I'd guess it probably sets the bit on each try, or perhaps only on the 
subset of tries which were "successful" before the link almost immediately 
runs into another error (that 16% of the time).

> > Because if it is constantly picking another speed, it would mean you get 
> > LBMS set over and over again, no? If that happens 34-35 times per second, 
> > it should be set already again when we get into that quirk because there 
> > was some wait before it gets called.
> 
>  I'll see if I can experiment with the hardware over the next couple of 
> days and come back with my findings.

Okay thanks.


-- 
 i.


* Re: [PATCH 2/2] PCI: Do not wait for disconnected devices when resuming
  2024-01-30 13:15     ` Ilpo Järvinen
@ 2024-02-02 17:03       ` Ilpo Järvinen
  0 siblings, 0 replies; 15+ messages in thread
From: Ilpo Järvinen @ 2024-02-02 17:03 UTC (permalink / raw)
  To: Ilpo Järvinen
  Cc: Bjorn Helgaas, Bjorn Helgaas, linux-pci, Maciej W. Rozycki, LKML,
	Mika Westerberg

On Tue, 30 Jan 2024, Ilpo Järvinen wrote:

> On Mon, 29 Jan 2024, Bjorn Helgaas wrote:
> 
> > On Mon, Jan 29, 2024 at 01:27:10PM +0200, Ilpo Järvinen wrote:
> > > On runtime resume, pci_dev_wait() is called:
> > >   pci_pm_runtime_resume()
> > >     pci_pm_bridge_power_up_actions()
> > >       pci_bridge_wait_for_secondary_bus()
> > >         pci_dev_wait()
> > > 
> > > While a device is runtime suspended along with its PCIe hierarchy, the
> > > device could get disconnected. In such a case, the link will not come up
> > > no matter how log pci_dev_wait() waits for it.
> > 
> > s/PCIe/PCI/ (unless this is a PCIe-specific thing)
> > s/log/long/
> > 
> > > Besides the above mentioned case, there could be other ways to get the
> > > device disconnected while pci_dev_wait() is waiting for the link to
> > > come up.
> > > 
> > > Make pci_dev_wait() exit if the device is already disconnected to
> > > avoid unnecessary delay. As a disconnected device is not really even a
> > > failure in the same sense as a link failing to come up for whatever
> > > reason, return 0 instead of an errno.
> > 
> > The device being disconnected is not the same as a link failure.
> 
> On this we agree, and it's what I tried to write above.
> 
> > Do
> > all the callers do the right thing if pci_dev_wait() returns success
> > when there's no device there?
> 
> It's a bit complicated. I honestly don't know what the best approach is
> here, so I'm very much open to your suggestion as to what would be preferable.
> 
> There are two main use cases (more than two callers, but they seem to boil
> down to two use cases).
> 
> One use case is the reset-related functions, and those would probably prefer
> to have an error returned if the wait, and thus the reset, failed.
> 
> Then the other is waiting for buses, that is,
> pci_bridge_wait_for_secondary_bus(), which returns 0 if there's no device
> (wait successful). For it, it would make sense to return 0 also when the
> device is disconnected because it seems analogous to the case where
> there's no device in the first place.
> 
> Perhaps it would be better to return -ENOTTY from pci_dev_wait() and check
> for that disconnected condition inside
> pci_bridge_wait_for_secondary_bus()? To further complicate things,
> however, DPC also depends on the return value of
> pci_bridge_wait_for_secondary_bus() and, from its perspective, returning an
> error when there is a device that is disconnected might be the desirable
> alternative (I'm not fully sure how everything in DPC works, but I
> highly suspect I'm correct here).

Just to note here: I intend to reverse the return to -ENOTTY in v2.
It is easier, and doing anything more complicated than that felt like
over-engineering because it would have just avoided marking the same
disconnected devices disconnected another time, which is harmless.
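
That is, the hunk in pci_dev_wait() would presumably become (a sketch of the
stated intent; the actual v2 may differ):

  if (pci_dev_is_disconnected(dev)) {
          pci_dbg(dev, "disconnected; not waiting\n");
          return -ENOTTY;
  }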

-- 
 i.


* Re: [PATCH 1/2] PCI: Clear LBMS on resume to avoid Target Speed quirk
  2024-02-02 15:27               ` Ilpo Järvinen
@ 2024-02-07 12:33                 ` Ilpo Järvinen
  0 siblings, 0 replies; 15+ messages in thread
From: Ilpo Järvinen @ 2024-02-07 12:33 UTC (permalink / raw)
  To: Maciej W. Rozycki, Bjorn Helgaas; +Cc: linux-pci, LKML, Mika Westerberg

On Fri, 2 Feb 2024, Ilpo Järvinen wrote:
> On Thu, 1 Feb 2024, Maciej W. Rozycki wrote:
> > On Thu, 1 Feb 2024, Ilpo Järvinen wrote:
> >
> > > > >  What kind of scenario does the LBMS bit get set in that you have
> > > > > trouble with?  You write that in your case the downstream device has been
> > > > > disconnected while the bus hierarchy was suspended and it is not present
> > > > > anymore at resume, is that correct?
> > > > >
> > > > >  But in that case no link negotiation could have been possible and 
> > > > > consequently the LBMS bit mustn't have been set by hardware (according to 
> > > > > my understanding of PCIe), because (for compliant, non-broken devices 
> > > > > anyway) it is only specified to be set for ports that can communicate with 
> > > > > the other link end (the spec explicitly says there mustn't have been a 
> > > > > transition through the DL_Down status for the port).
> > > > >
> > > > >  Am I missing something?
> > > > 
> > > > Yes, when resuming the device is already gone but the bridge still has 
> > > > LBMS set. My understanding is that it was set because it was there
> > > > from pre-suspend time but I've not really taken a deep look into it 
> > > > because the problem and fix seemed obvious.
> > 
> >  I've always been confused with the suspend/resume terminology: I'd have 
> > assumed this would have gone through a power cycle, in which case the LBMS 
> > bit would have necessarily been cleared in the transition, because its 
> > required state at power-up/reset is 0, so the value of 1 observed would be 
> > a result of what has happened solely through the resume stage.  Otherwise 
> > it may make sense to clear the bit in the course of the suspend stage, 
> > though it wouldn't be race-free I'm afraid.
> 
> I also thought of suspend as one possibility but yes, it's racy. Mika also
> suggested clearing LBMS after each successful retrain, but that wouldn't
> cover all possible ways to get LBMS set, as devices can set it
> independently of the OS. Keeping it cleared constantly is pretty much what
> will happen with the bandwidth controller anyway.
> 
> > > > I read that "without the Port transitioning through DL_Down status" 
> > > > differently than you, I only interpret that it relates to the two 
> > > > bullets following it. ...So if retrain bit is set, and link then goes 
> > > > down, the bullet no longer applies and LBMS should not be set because 
> > > > there was transition through DL_Down. But I could well be wrong...
> > 
> > What I refer to is that if you suspend your system, remove the device 
> > that originally caused the quirk to trigger and then resume your system 
> > with the device absent,
> 
> A small correction here: the quirk didn't trigger initially for the
> device, it does that only after resume. And even then the quirk is called
> only because the link doesn't come up.
> 
> In the longer term, I'd actually want to have hotplug resumed earlier so the
> disconnect could be detected before/while all this waiting related to link
> up is done. But that's very complicated to realize in practice because
> hotplug lurks behind portdrv, so resuming it earlier isn't going to be
> about just moving a few lines around.
> 
> > then LBMS couldn't have been set in the course of 
> > resume, because the port couldn't have come out of the DL_Down status in 
> > the absence of the downstream device.  Do you interpret it differently?
> 
> That's a good question and I don't have an answer to this yet; that is,
> I don't fully understand what happens to those devices during the runtime
> suspend/resume cycle and what the exact mechanism is that preserves the
> LBMS bit. I'll look more into it.
> 
> But I agree that if the device goes cold enough and downstream is 
> disconnected, the port should no longer have a way to reassert LBMS.

It seems that the root cause here is that the bridge ports do not enter
D3cold but remain in D3hot because of an ACPI power resource that is
shared (with Thunderbolt in this case, but that's just one example; there
could be other similar sharing outside of what PCI controls).

There is an attempt to power off the entire hierarchy into D3cold on
suspend but the ports won't reach that state. Because the port remains in
D3hot, the config space is preserved, and the LBMS bit along with it.

So it seems that we cannot make the assumption that a device enters D3cold 
just because it was suspended.


On the positive side, it was also tested that the logic fix I sent earlier
solved the most visible problem, which is the delay on resume. It doesn't
address the false activation of the Target Speed quirk, because the LBMS
bit is still set, but the symptoms are no longer the same because of the
corrected logic.

So to solve the main issue with the delay, there's no need to clear LBMS,
and the patch you're preparing/testing for pcie_failed_link_retrain(),
together with the logic fix in its caller, is enough to address the first
issue.

I still think those two should be put into the same commit because the
true <-> false changes, if made separately, lead to additional
incoherent states, but if Bjorn wants them separately, at least they
should be put back-to-back so the brokenness is just within a single
commit in the history.

In addition, my 2nd patch addresses another issue which is unrelated
to this despite similar symptoms with extra delay on resume. I'll send v2
of that 2nd patch separately with the inverted return value.

> > > Because if it is constantly picking another speed, it would mean you get 
> > > LBMS set over and over again, no? If that happens 34-35 times per second, 
> > > it should be set already again when we get into that quirk because there 
> > > was some wait before it gets called.
> > 
> >  I'll see if I can experiment with the hardware over the next couple of 
> > days and come back with my findings.
> 
> Okay thanks.

One point I'd be very interested to know is whether the link actually comes
up successfully (even if briefly), because this has large implications for
whether the quirk can actually be invoked from the bandwidth controller code.


-- 
 i.


* Re: [PATCH 1/2] PCI: Clear LBMS on resume to avoid Target Speed quirk
  2024-02-01 18:49             ` Maciej W. Rozycki
  2024-02-02 15:27               ` Ilpo Järvinen
@ 2024-02-12 17:56               ` Maciej W. Rozycki
  1 sibling, 0 replies; 15+ messages in thread
From: Maciej W. Rozycki @ 2024-02-12 17:56 UTC (permalink / raw)
  To: Ilpo Järvinen; +Cc: Bjorn Helgaas, linux-pci, LKML, Mika Westerberg

On Thu, 1 Feb 2024, Maciej W. Rozycki wrote:

> > > There's always an option to store it into pci_dev for later use when the 
> > > quirk is found to be working for the first time if other solutions don't 
> > > work.
> 
>  Indeed there is.

 On second thoughts there is not.

 The thing is if we handle a situation where the downstream device can be 
replaced while in suspend (which I infer from the discussion that we do), 
then it could be that one device will require the quirk while the other 
one will not.

 So any previous results have to be considered irrelevant and the whole 
dance with the quirk has to be repeated every time at resume or hot-plug.

  Maciej


end of thread

Thread overview: 15+ messages
2024-01-29 11:27 [PATCH 0/2] PCI: Fix disconnect related issues Ilpo Järvinen
2024-01-29 11:27 ` [PATCH 1/2] PCI: Clear LBMS on resume to avoid Target Speed quirk Ilpo Järvinen
2024-01-29 18:43   ` Bjorn Helgaas
2024-01-30 11:53     ` Ilpo Järvinen
2024-01-30 16:41       ` Maciej W. Rozycki
2024-01-30 17:33         ` Ilpo Järvinen
2024-02-01  9:47           ` Ilpo Järvinen
2024-02-01 18:49             ` Maciej W. Rozycki
2024-02-02 15:27               ` Ilpo Järvinen
2024-02-07 12:33                 ` Ilpo Järvinen
2024-02-12 17:56               ` Maciej W. Rozycki
2024-01-29 11:27 ` [PATCH 2/2] PCI: Do not wait for disconnected devices when resuming Ilpo Järvinen
2024-01-29 18:55   ` Bjorn Helgaas
2024-01-30 13:15     ` Ilpo Järvinen
2024-02-02 17:03       ` Ilpo Järvinen
