All of lore.kernel.org
 help / color / mirror / Atom feed
From: Bjorn Helgaas <helgaas@kernel.org>
To: Mika Westerberg <mika.westerberg@linux.intel.com>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>,
	Len Brown <lenb@kernel.org>, Lukas Wunner <lukas@wunner.de>,
	Keith Busch <keith.busch@intel.com>,
	Alex Williamson <alex.williamson@redhat.com>,
	Alexandru Gagniuc <mr.nuke.me@gmail.com>,
	Kai-Heng Feng <kai.heng.feng@canonical.com>,
	Matthias Andree <matthias.andree@gmx.de>,
	Paul Menzel <pmenzel@molgen.mpg.de>,
	Nicholas Johnson <nicholas.johnson-opensource@outlook.com.au>,
	linux-pci@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v2 2/2] PCI: Add missing link delays required by the PCIe spec
Date: Sat, 26 Oct 2019 09:19:38 -0500	[thread overview]
Message-ID: <20191026141938.GA177525@google.com> (raw)
In-Reply-To: <20191004123947.11087-3-mika.westerberg@linux.intel.com>

On Fri, Oct 04, 2019 at 03:39:47PM +0300, Mika Westerberg wrote:
> Currently Linux does not follow PCIe spec regarding the required delays
> after reset. A concrete example is a Thunderbolt add-in-card that
> consists of a PCIe switch and two PCIe endpoints:
> 
>   +-1b.0-[01-6b]----00.0-[02-6b]--+-00.0-[03]----00.0 TBT controller
>                                   +-01.0-[04-36]-- DS hotplug port
>                                   +-02.0-[37]----00.0 xHCI controller
>                                   \-04.0-[38-6b]-- DS hotplug port
> 
> The root port (1b.0) and the PCIe switch downstream ports are all PCIe
> gen3 so they support 8GT/s link speeds.
> 
> We wait for the PCIe hierarchy to enter D3cold (runtime):
> 
>   pcieport 0000:00:1b.0: power state changed by ACPI to D3cold
> 
> When it wakes up from D3cold, according to the PCIe 4.0 section 5.8 the
> PCIe switch is put to reset and its power is re-applied. This means that
> we must follow the rules in PCIe 4.0 section 6.6.1.

If you have the PCIe 5.0 spec, can you update these references to
that?  If not, I'm happy to do it for you.

> For the PCIe gen3 ports we are dealing with here, the following applies:
> 
>   With a Downstream Port that supports Link speeds greater than 5.0
>   GT/s, software must wait a minimum of 100 ms after Link training
>   completes before sending a Configuration Request to the device
>   immediately below that Port. Software can determine when Link training
>   completes by polling the Data Link Layer Link Active bit or by setting
>   up an associated interrupt (see Section 6.7.3.3).
> 
> Translating this into the above topology we would need to do this (DLLLA
> stands for Data Link Layer Link Active):
> 
>   0000:00:1b.0: wait for 100 ms after DLLLA is set before access to 0000:01:00.0
>   0000:02:00.0: wait for 100 ms after DLLLA is set before access to 0000:03:00.0
>   0000:02:02.0: wait for 100 ms after DLLLA is set before access to 0000:37:00.0
> 
> I've instrumented the kernel with some additional logging so we can see
> the actual delays performed:
> 
>   pcieport 0000:00:1b.0: power state changed by ACPI to D0
>   pcieport 0000:00:1b.0: waiting for D3cold delay of 100 ms
>   pcieport 0000:00:1b.0: waiting for D3hot delay of 10 ms
>   pcieport 0000:02:01.0: waiting for D3hot delay of 10 ms
>   pcieport 0000:02:04.0: waiting for D3hot delay of 10 ms
> 
> For the switch upstream port (01:00.0 reachable through 00:1b.0 root
> port) we wait for 100 ms but not taking into account the DLLLA
> requirement. We then wait 10 ms for D3hot -> D0 transition of the root
> port and the two downstream hotplug ports. This means that we deviate
> from what the spec requires.
> 
> Performing the same check for system sleep (s2idle) transitions it turns
> out to be even worse. None of the mandatory delays are performed. If
> this would be S3 instead of s2idle then according to PCI FW spec 3.2
> section 4.6.8. there is a specific _DSM that allows the OS to skip the
> delays but this platform does not provide the _DSM and does not go to S3
> anyway so no firmware is involved that could already handle these
> delays.
> 
> On this particular platform these delays are not actually needed because
> there is an additional delay as part of the ACPI power resource that is
> used to turn on power to the hierarchy but since that additional delay
> is not required by any of standards (PCIe, ACPI) it is not present in
> the Intel Ice Lake, for example where missing the mandatory delays
> causes pciehp to start tearing down the stack too early (links are not
> yet trained). Below is an example how it looks like when this happens:
> 
>   pcieport 0000:83:04.0: pciehp: Slot(4): Card not present
>   pcieport 0000:87:04.0: PME# disabled
>   pcieport 0000:83:04.0: pciehp: pciehp_unconfigure_device: domain:bus:dev = 0000:86:00
>   pcieport 0000:86:00.0: Refused to change power state, currently in D3
>   pcieport 0000:86:00.0: restoring config space at offset 0x3c (was 0xffffffff, writing 0x201ff)
>   pcieport 0000:86:00.0: restoring config space at offset 0x38 (was 0xffffffff, writing 0x0)
>   ...
> 
> There is also one reported case (see the bugzilla link below) where the
> missing delay causes xHCI on a Titan Ridge controller fail to runtime
> resume when USB-C dock is plugged. This does not involve pciehp but
> instead it PCI core fails to runtime resume the xHCI device:
> 
>   pcieport 0000:04:02.0: restoring config space at offset 0xc (was 0x10000, writing 0x10020)
>   pcieport 0000:04:02.0: restoring config space at offset 0x4 (was 0x100000, writing 0x100406)
>   xhci_hcd 0000:39:00.0: Refused to change power state, currently in D3
>   xhci_hcd 0000:39:00.0: restoring config space at offset 0x3c (was 0xffffffff, writing 0x1ff)
>   xhci_hcd 0000:39:00.0: restoring config space at offset 0x38 (was 0xffffffff, writing 0x0)
>   ...
> 
> For this reason, introduce a new function pci_bridge_wait_for_secondary_bus()
> that is called on PCI core resume and runtime resume paths accordingly
> if the bridge entered D3cold (and thus went through reset).
> 
> This is second attempt to add the missing delays. The previous solution
> in commit c2bf1fc212f7 ("PCI: Add missing link delays required by the
> PCIe spec") was reverted because of two issues it caused:
> 
>   1. One system become unresponsive after S3 resume due to PME service
>      spinning in pcie_pme_work_fn(). The root port in question reports
>      that the xHCI sent PME but the xHCI device itself does not have PME
>      status set. The PME status bit is never cleared in the root port
>      resulting the indefinite loop in pcie_pme_work_fn().
> 
>   2. Slows down resume if the root/downstream port does not support
>      Data Link Layer Active Reporting because pcie_wait_for_link_delay()
>      waits 1100 ms in that case.
> 
> This version should avoid the above issues because we restrict the delay
> to happen only if the port went into D3cold.
> 
> Link: https://lore.kernel.org/linux-pci/SL2P216MB01878BBCD75F21D882AEEA2880C60@SL2P216MB0187.KORP216.PROD.OUTLOOK.COM/
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=203885
> Reported-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
> Tested-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
> Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
> ---
>  drivers/pci/pci-driver.c | 18 ++++++++
>  drivers/pci/pci.c        | 92 +++++++++++++++++++++++++++++++++++++---
>  drivers/pci/pci.h        |  1 +
>  3 files changed, 104 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
> index a8124e47bf6e..74a144c9cf4e 100644
> --- a/drivers/pci/pci-driver.c
> +++ b/drivers/pci/pci-driver.c
> @@ -917,6 +917,7 @@ static int pci_pm_suspend_noirq(struct device *dev)
>  static int pci_pm_resume_noirq(struct device *dev)
>  {
>  	struct pci_dev *pci_dev = to_pci_dev(dev);
> +	bool d3cold = pci_dev->current_state == PCI_D3cold;
>  	struct device_driver *drv = dev->driver;
>  	int error = 0;
>  
> @@ -947,6 +948,14 @@ static int pci_pm_resume_noirq(struct device *dev)
>  
>  	pcie_pme_root_status_cleanup(pci_dev);
>  
> +	/*
> +	 * If the hierarchy went into D3cold wait for the secondary bus to
> +	 * become accessible. This is important for PCIe to prevent pciehp
> +	 * from tearing down the downstream devices too soon.

The pciehp connection isn't obvious to me.

> +	 */
> +	if (d3cold)
> +		pci_bridge_wait_for_secondary_bus(pci_dev);

This will need to be rebased on top of my pci/pm branch, but I think
that's minor.

Can we move this closer to where we initiate the reset?  It's pretty
hard to tell from looking at pci_pm_resume_noirq() that there's a
reset happening here.

For D3cold->D0, I guess that would be somewhere down in
platform_pci_set_power_state()?  Maybe acpi_pci_set_power_state()?
What about the mid_pci_set_power_state() path?  Does that need this
too?

In the ACPI spec, _PS0 doesn't say anything about delays.  _ON (which
I assume is not for PCI devices themselves) *does* say firmware is
responsible for sequencing delays, so I would tend to assume it's
really firmware's job and we shouldn't need to do this in the kernel
at all.

What about D3hot->D0?  When a bridge (Root Port or Switch Downstream
Port) is in D3hot, I'm not really clear on the state of its link.  If
the link is down, I assume putting the bridge in D0 will bring it up
and we'd have to wait for that?  If so, we'd need to do something in
the reset path, e.g., pci_pm_reset()?

Sketch of control flow for this patch:

  pci_pm_resume_noirq
    if (...)
      pci_pm_default_resume_early
	pci_power_up
	  if (...)
	    platform_pci_set_power_state
	      acpi_pci_set_power_state
	  pci_raw_set_power_state(D0)
+   if (d3cold)
+     pci_bridge_wait_for_secondary_bus
    if (legacy)
      pci_legacy_resume_early
    else
      drv->pm->resume_noirq

>  	if (drv && drv->pm && drv->pm->resume_noirq)
>  		error = drv->pm->resume_noirq(dev);

  reply	other threads:[~2019-10-26 14:19 UTC|newest]

Thread overview: 79+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-10-04 12:39 [PATCH v2 0/2] PCI: Add missing link delays Mika Westerberg
2019-10-04 12:39 ` [PATCH v2 1/2] PCI: Introduce pcie_wait_for_link_delay() Mika Westerberg
2020-08-08 20:22   ` Marc MERLIN
2020-08-08 20:23     ` Marc MERLIN
2020-08-09 16:31     ` Marc MERLIN
2020-09-06 18:18     ` pcieport 0000:00:01.0: PME: Spurious native interrupt (nvidia with nouveau and thunderbolt on thinkpad P73) Marc MERLIN
2020-09-06 18:18       ` Marc MERLIN
2020-09-06 18:26       ` Matthias Andree
2020-09-07 19:14       ` [Nouveau] " Karol Herbst
2020-09-07 19:14         ` Karol Herbst
2020-09-07 20:58         ` [Nouveau] " Marc MERLIN
2020-09-07 20:58           ` Marc MERLIN
2020-09-07 23:51           ` [Nouveau] " Karol Herbst
2020-09-07 23:51             ` Karol Herbst
2020-09-08  0:29             ` [Nouveau] " Marc MERLIN
2020-05-29 18:03               ` 5.5 kernel: using nouveau or something else just long enough to turn off Quadro RTX 4000 Mobile for hybrid graphics? Marc MERLIN
     [not found]                 ` <20200529180315.GA18804-xnduUnryOU1AfugRpC6u6w@public.gmane.org>
2020-05-29 18:53                   ` Ilia Mirkin
     [not found]                     ` <CAKb7Uvhw2EYo1RR-=NGgLO3CU9QTRWchcAw1injffybZbJ-zOA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2020-05-29 19:46                       ` Marc MERLIN
     [not found]                         ` <20200529194605.GB18804-xnduUnryOU1AfugRpC6u6w@public.gmane.org>
2020-05-30 17:32                           ` Karol Herbst
2023-04-19  6:49                         ` [Nouveau] 6.1 still cannot get display on Thinkpad P73Quadro " Marc MERLIN
2023-04-21  5:46                           ` [Nouveau] 6.2 still cannot get hdmi display out on Thinkpad P73 Quadro RTX 4000 Mobile/TU104 Marc MERLIN
     [not found]                       ` <CACO55tsvY0t_z986VVoYCvxuBASdZ+rQcDtZ_dAtQR60NLmQQw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2020-05-31 18:31                         ` 5.5 kernel: using nouveau or something else just long enough to turn off Quadro RTX 4000 Mobile for hybrid graphics? Marc MERLIN
2020-12-26 11:12                 ` 5.9.11 still hanging 2mn at each boot and looping on nvidia-gpu 0000:01:00.3: PME# enabled (Quadro RTX 4000 Mobile) Marc MERLIN
2020-12-26 11:12                   ` Marc MERLIN
2020-12-27 18:28                   ` [Nouveau] " Ilia Mirkin
2020-12-27 18:28                     ` Ilia Mirkin
2021-01-27 21:33                   ` Bjorn Helgaas
2021-01-27 21:33                     ` Bjorn Helgaas
2021-01-28 20:59                     ` Bjorn Helgaas
2021-01-28 20:59                       ` [Nouveau] " Bjorn Helgaas
2021-01-29  0:56                     ` Marc MERLIN
2021-01-29  0:56                       ` [Nouveau] " Marc MERLIN
2021-01-29 21:20                       ` Bjorn Helgaas
2021-01-29 21:20                         ` [Nouveau] " Bjorn Helgaas
2021-01-30  2:04                         ` Marc MERLIN
2021-01-30  2:04                           ` [Nouveau] " Marc MERLIN
2021-05-05 21:42                           ` [Nouveau] 5.12.1 0010:nvkm_falcon_v1_wait_for_halt+0x8f/0xb9 [nouveau] Marc MERLIN
2021-05-06 14:50                             ` Bjorn Helgaas
2021-05-25  3:13                               ` Ben Skeggs
2020-12-29 15:51                 ` 5.9.11 still hanging 2mn at each boot and looping on nvidia-gpu 0000:01:00.3: PME# enabled (Quadro RTX 4000 Mobile) Marc MERLIN
2020-12-29 15:51                   ` Marc MERLIN
2020-12-29 16:33                   ` Ilia Mirkin
2020-12-29 16:33                     ` Ilia Mirkin
     [not found]                     ` <CAKb7UviFP_YVxC4PO7MDNnw6NDrD=3BCGF37umwAfaimjbX9Pw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2020-12-29 17:47                       ` Marc MERLIN
     [not found]                         ` <20201229174750.GI23389-xnduUnryOU1AfugRpC6u6w@public.gmane.org>
2021-01-04 11:49                           ` Marc MERLIN
     [not found]                             ` <20210104114955.GM32533-xnduUnryOU1AfugRpC6u6w@public.gmane.org>
2021-01-04 13:28                               ` Karol Herbst
     [not found]                                 ` <CACO55tsdG37YKv7FV2er4hRnXk9vmwMbPuPptA+=ZtziWXC2+g-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2021-01-07 11:49                                   ` Marc MERLIN
2020-12-30 12:16                       ` ael
2020-09-13 20:15               ` [Nouveau] pcieport 0000:00:01.0: PME: Spurious native interrupt (nvidia with nouveau and thunderbolt on thinkpad P73) Marc MERLIN
2020-09-13 20:15                 ` Marc MERLIN
     [not found]                 ` <20200913201545.GL2622-xnduUnryOU1AfugRpC6u6w@public.gmane.org>
2020-09-19 23:18                   ` Marc MERLIN
2019-10-04 12:39 ` [PATCH v2 2/2] PCI: Add missing link delays required by the PCIe spec Mika Westerberg
2019-10-26 14:19   ` Bjorn Helgaas [this message]
2019-10-28 11:28     ` Mika Westerberg
2019-10-28 13:42       ` Bjorn Helgaas
2019-10-28 18:06         ` Mika Westerberg
2019-10-28 20:16           ` Bjorn Helgaas
2019-10-29 11:15             ` Mika Westerberg
2019-10-29 20:27               ` Bjorn Helgaas
2019-10-30 11:15                 ` Mika Westerberg
2019-10-31 22:31                   ` Bjorn Helgaas
2019-11-01 11:19                     ` Mika Westerberg
2019-11-05  0:00                       ` Bjorn Helgaas
2019-11-05  9:54                         ` Mika Westerberg
2019-11-05 12:58                           ` Mika Westerberg
2019-11-05 20:01                             ` Bjorn Helgaas
2019-11-06 13:31                               ` Mika Westerberg
2019-11-05 15:00                           ` Bjorn Helgaas
2019-11-05 15:28                             ` Mika Westerberg
2019-11-05 16:10                               ` Bjorn Helgaas
2019-11-06 13:29                                 ` Mika Westerberg
2019-10-29 20:54   ` Bjorn Helgaas
2019-10-30 11:33     ` Mika Westerberg
2019-10-04 12:57 ` [PATCH v2 0/2] PCI: Add missing link delays Matthias Andree
2019-10-04 13:06   ` Mika Westerberg
2019-10-05  7:34     ` Matthias Andree
2019-10-07  9:32       ` Mika Westerberg
2019-10-07 15:15         ` Matthias Andree
2019-10-08  9:05           ` Mika Westerberg

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20191026141938.GA177525@google.com \
    --to=helgaas@kernel.org \
    --cc=alex.williamson@redhat.com \
    --cc=kai.heng.feng@canonical.com \
    --cc=keith.busch@intel.com \
    --cc=lenb@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pci@vger.kernel.org \
    --cc=lukas@wunner.de \
    --cc=matthias.andree@gmx.de \
    --cc=mika.westerberg@linux.intel.com \
    --cc=mr.nuke.me@gmail.com \
    --cc=nicholas.johnson-opensource@outlook.com.au \
    --cc=pmenzel@molgen.mpg.de \
    --cc=rjw@rjwysocki.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.