All of lore.kernel.org
 help / color / mirror / Atom feed
From: Bjorn Helgaas <helgaas@kernel.org>
To: Mika Westerberg <mika.westerberg@linux.intel.com>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>,
	Len Brown <lenb@kernel.org>, Lukas Wunner <lukas@wunner.de>,
	Keith Busch <keith.busch@intel.com>,
	Alex Williamson <alex.williamson@redhat.com>,
	Alexandru Gagniuc <mr.nuke.me@gmail.com>,
	Kai-Heng Feng <kai.heng.feng@canonical.com>,
	Paul Menzel <pmenzel@molgen.mpg.de>,
	Nicholas Johnson <nicholas.johnson-opensource@outlook.com.au>,
	linux-pci@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v2 2/2] PCI: Add missing link delays required by the PCIe spec
Date: Thu, 31 Oct 2019 17:31:44 -0500	[thread overview]
Message-ID: <20191031223144.GA81598@google.com> (raw)
In-Reply-To: <20191030111516.GX2593@lahna.fi.intel.com>

On Wed, Oct 30, 2019 at 01:15:16PM +0200, Mika Westerberg wrote:
> On Tue, Oct 29, 2019 at 03:27:09PM -0500, Bjorn Helgaas wrote:
> > On Tue, Oct 29, 2019 at 01:15:20PM +0200, Mika Westerberg wrote:
> > > On Mon, Oct 28, 2019 at 03:16:53PM -0500, Bjorn Helgaas wrote:
> > > > > The related hardware event is resume in this case. Can you point
> > > > > me to the actual point where you want me to put this?
> > > > 
> > > > "Resume" is a Linux software concept, so of course the PCIe spec
> > > > doesn't say anything about it.  The spec talks about delays
> > > > related to resets and device power and link state transitions, so
> > > > somehow we have to connect the Linux delay with those hardware
> > > > events.
> > > > 
> > > > Since we're talking about a transition from D3cold, this has to be
> > > > done via something external to the device such as power
> > > > regulators.  For ACPI systems that's probably hidden inside _PS0
> > > > or something similar.  That's opaque, but at least it's a hook
> > > > that says "here's where we put the device into D0".  I suggested
> > > > acpi_pci_set_power_state() as a possibility since I think that's
> > > > the lowest-level point where we have the pci_dev so we know the
> > > > current state and the new state.
> > > 
> > > I looked at how we could use acpi_pci_set_power_state() but I don't
> > > think it is possible because it is likely that only the root port
> > > has the power resource that is used to bring the link to L2 or L3.
> > > However, we would need to repeat the delay for each downstream/root
> > > port if there are multiple PCIe switches in the topology.
> > 
> > OK, I think I understand why that's a problem (correct me if I'm
> > wrong):
> > 
> >   We call pci_pm_resume_noirq() for every device, but it only calls
> >   acpi_pci_set_power_state() for devices that have _PS0 or _PR0
> >   methods.  So if the delay is in acpi_pci_set_power_state() and we
> >   have A -> B -> C where only A has _PS0, we would delay for the link
> >   to B to come up, but not for the link to C.
> 
> Yes, that's correct.
> 
> > I do see that we do need both delays.  In acpi_pci_set_power_state()
> > when we transition A from D3cold->D0, I assume that single _PS0
> > evaluation on A causes B to transition from D3cold->D3hot, which in
> > turn causes C to transition from D3cold->D3hot.  Is that your
> > understanding, too?
> 
> Not exactly :)
> 
> It is _ON() that causes the links to be retrained and it also causes the
> PERST# (reset) to be unasserted for the whole topology transitioning all
> devices into D0unitialized (default value for PMCSR PowerState field is 0).

OK.  I guess the important thing is that a single power-on from D3cold
at any point in the hierarchy can power on the entire subtree rooted
at that point.  So if we have RP -> SUP -> SDP0..SDP7 where SDP0..SDP7
are Switch Downstream Ports, when we evaluate _ON for RP, PERST# will
be deasserted below it, and everything downstream should start the
process of going to D0uninitialized.

And we can't rely on any other hooks like _ON/_PS0 invocations for
SUP, SDPx, etc, where we could do additional delays.

> > If the delay is in pci_pm_resume_noirq() (as in your patch), what
> > happens with a switch with several Downstream Ports?  I assume that
> > all the Downstream Ports start their transition out of D3cold
> > basically simultaneously, so we probably don't need N delays, do we?
> 
> No. Actually Linux already resumes these in parallel because async
> suspend is set for them (for system suspend that is).

So I think we have something like this:

  pci_pm_resume_noirq(RP)
    pdev->current_state == PCI_D3cold  # HW actually in D3cold
    _ON(RP)                            # turns on entire hierarchy
    pci_bridge_wait_for_secondary_bus  # waits only for RP -> SUP link

  pci_pm_resume_noirq(SUP)
    pdev->current_state == PCI_D3cold  # HW probably in D0uninitialized
    pci_bridge_wait_for_secondary_bus  # no wait (not a downstream port)

  pci_pm_resume_noirq(SDP0)
    pdev->current_state == PCI_D3cold  # HW probably in D0uninitialized
    pci_bridge_wait_for_secondary_bus  # waits for SDP0 -> ? link

  ...

  pci_pm_resume_noirq(SDP7)
    pdev->current_state == PCI_D3cold  # HW probably in D0uninitialized
    pci_bridge_wait_for_secondary_bus  # waits for SDP7 -> ? link

and we have 1 delay for the Root Port plus 8 delays (one for each
Switch Downstream Port), and as soon as SUP has been resumed,
SDP0..SDP7 can be resumed simultaneously (assuming async is set for
them)?

I'm not a huge fan of relying on async because the asynchrony is far
removed from this code and really hard to figure out.  Maybe an
alternative would be to figure out in the pci_pm_resume_noirq(RP) path
how many levels of links to wait for.

Ideally someone expert in PCIe but not in Linux would be able to look
at the local code and verify that it matches the spec.  If verification
requires extensive analysis or someone expert in *both* PCIe and
Linux, it makes maintenance much harder.

> > The outline of the pci_pm_resume_noirq() part of this patch is:
> > 
> >   pci_pm_resume_noirq
> >     if (!dev->skip_bus_pm ...)   # <-- condition 1
> >       pci_pm_default_resume_early
> >         pci_power_up
> >           if (platform_pci_power_manageable())   # _PS0 or _PR0 exist?
> >             platform_pci_set_power_state
> >               pci_platform_pm->set_state
> >                 acpi_pci_set_power_state(PCI_D0) # acpi_pci_platform_pm.set_state
> >                   acpi_device_set_power(ACPI_STATE_D0) # <-- eval _PS0
> > +   if (d3cold)                  # <-- condition 2
> > +     pci_bridge_wait_for_secondary_bus
> > 
> > Another thing that niggles at me here is that the condition for
> > calling pci_bridge_wait_for_secondary_bus() is completely different
> > than the condition for changing the power state.  If we didn't change
> > the power state, there's no reason to wait, is there?
> 
> Indeed, if you are talking about the dev->skip_bus_pm check there is no
> point to wait if we did not change the power state. I would assume that
> d3cold is false in that case but we could also do this for clarity:
> 
> 	if (!dev->skip_bus_pm && d3cold)
> 		pci_bridge_wait_for_secondary_bus(...)

Could the wait go in pci_power_up()?  That would at least connect it
directly with a -> D0 transition.  Or, if that doesn't seem the right
place for it, could we do the following?

  if (!(pci_dev->skip_bus_pm && pm_suspend_no_platform())) {
    pci_pm_default_resume_early(pci_dev);
    if (d3cold)
      pci_bridge_wait_for_secondary_bus(pci_dev);
  }

  pci_fixup_device(pci_fixup_resume_early, pci_dev);
  pcie_pme_root_status_cleanup(pci_dev);

  if (pci_has_legacy_pm_support(pci_dev))
    return pci_legacy_resume_early(dev);
  ...

Either way would also fix the problem that with the current patch, if
the device has legacy PM support, we call pci_legacy_resume_early()
but we don't wait for the secondary bus.

> > The outline of the pci_pm_runtime_resume() part of this patch is:
> > 
> >   pci_pm_runtime_resume
> >     pci_restore_standard_config
> >       if (dev->current_state != PCI_D0)
> >         pci_set_power_state(PCI_D0)
> >           __pci_start_power_transition
> >             pci_platform_power_transition
> >               if (platform_pci_power_manageable())   # _PS0 or _PR0 exist?
> >                 platform_pci_set_power_state
> >                   pci_platform_pm->set_state
> >                     acpi_pci_set_power_state(PCI_D0) # acpi_pci_platform_pm.set_state
> >                       acpi_device_set_power(ACPI_STATE_D0) # <-- eval _PS0
> >               pci_raw_set_power_state
> >           __pci_complete_power_transition
> > +   if (d3cold)
> > +     pci_bridge_wait_for_secondary_bus
> > 
> > In this part, the power state change is inside
> > pci_restore_standard_config(), which calls pci_set_power_state().
> > There are many other callers of pci_set_power_state(); can we be sure
> > that none of them need a delay?
> 
> Since we are handling the delay when we resume the downstream port, not
> when we resume the device itself, I think the link should be up already
> and the device accessible if someone calls pci_set_power_state() for it
> (as the parent is always resumed before children).

Ah, yeah, I guess that since all the calls I see are for non-bridge
devices, there would be no delay for a secondary bus.

This is a tangent, but there are ~140 pci_set_power_state(PCI_D0)
calls, mostly from .resume() methods of drivers with legacy PM.  Are
those even necessary?  It looks like the PCI core does this so the
driver wouldn't need to:

  pci_pm_resume_noirq
    pci_pm_default_resume_early
      pci_power_up
        pci_raw_set_power_state(dev, PCI_D0)   # <-- PCI core

  pci_pm_resume
    if (pci_has_legacy_pm_support(pci_dev))
      pci_legacy_resume(dev)
        drv->resume
	  pci_set_power_state(PCI_D0)          # <-- driver .resume()

> > > > But it seems that at least some ACPI firmware doesn't do those
> > > > delays, so I guess our only alternatives are to always do it in
> > > > the OS or have some sort of blacklist.  And it doesn't really seem
> > > > practical to maintain a blacklist.
> > > 
> > > I really think this is crystal clear:
> > 
> > I am agreeing with you that the OS needs to do the delays.

Did you miss this part?  I said below that the existence of the _DSM
*by itself* doesn't convince me.  But I think the lack of clarity and
the fact that at least some firmware doesn't do it means that the OS
must do it.

> > > The OS is always responsible for the delays described in the PCIe
> > > spec.
> > 
> > If the ACPI spec contained this statement, it would be useful, but I
> > haven't seen it.  It's certainly true that some combination of
> > firmware and the OS is responsible for the delays :)
> > 
> > > However, if the platform implements some of them say in _ON or _PS0
> > > methods then it can notify the OS about this by using the _DSM so
> > > the OS does not need to duplicate all of them.
> > 
> > That makes good sense, but there are other reasons for using that
> > _DSM, e.g., firmware may know that MID or similar devices are not
> > really PCI devices and don't need delays anywhere.  So the existence
> > of the _DSM by itself doesn't convince me that the OS is responsible
> > for the delays.
> 
> Hmm, my interpretion of the specs is that OS is responsible for these
> delays but if you can't be convinced then how you propose we handle this
> problem? I mean there are two cases already listed in the changelog of
> this patch from a real systems that need these delays. I don't think we
> can just say people that unfortunately your system will not be supported
> by Linux because we are not convinced that OS should do these delays. ;-)

  reply	other threads:[~2019-10-31 22:31 UTC|newest]

Thread overview: 79+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-10-04 12:39 [PATCH v2 0/2] PCI: Add missing link delays Mika Westerberg
2019-10-04 12:39 ` [PATCH v2 1/2] PCI: Introduce pcie_wait_for_link_delay() Mika Westerberg
2020-08-08 20:22   ` Marc MERLIN
2020-08-08 20:23     ` Marc MERLIN
2020-08-09 16:31     ` Marc MERLIN
2020-09-06 18:18     ` pcieport 0000:00:01.0: PME: Spurious native interrupt (nvidia with nouveau and thunderbolt on thinkpad P73) Marc MERLIN
2020-09-06 18:18       ` Marc MERLIN
2020-09-06 18:26       ` Matthias Andree
2020-09-07 19:14       ` [Nouveau] " Karol Herbst
2020-09-07 19:14         ` Karol Herbst
2020-09-07 20:58         ` [Nouveau] " Marc MERLIN
2020-09-07 20:58           ` Marc MERLIN
2020-09-07 23:51           ` [Nouveau] " Karol Herbst
2020-09-07 23:51             ` Karol Herbst
2020-09-08  0:29             ` [Nouveau] " Marc MERLIN
2020-05-29 18:03               ` 5.5 kernel: using nouveau or something else just long enough to turn off Quadro RTX 4000 Mobile for hybrid graphics? Marc MERLIN
     [not found]                 ` <20200529180315.GA18804-xnduUnryOU1AfugRpC6u6w@public.gmane.org>
2020-05-29 18:53                   ` Ilia Mirkin
     [not found]                     ` <CAKb7Uvhw2EYo1RR-=NGgLO3CU9QTRWchcAw1injffybZbJ-zOA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2020-05-29 19:46                       ` Marc MERLIN
     [not found]                         ` <20200529194605.GB18804-xnduUnryOU1AfugRpC6u6w@public.gmane.org>
2020-05-30 17:32                           ` Karol Herbst
2023-04-19  6:49                         ` [Nouveau] 6.1 still cannot get display on Thinkpad P73Quadro " Marc MERLIN
2023-04-21  5:46                           ` [Nouveau] 6.2 still cannot get hdmi display out on Thinkpad P73 Quadro RTX 4000 Mobile/TU104 Marc MERLIN
     [not found]                       ` <CACO55tsvY0t_z986VVoYCvxuBASdZ+rQcDtZ_dAtQR60NLmQQw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2020-05-31 18:31                         ` 5.5 kernel: using nouveau or something else just long enough to turn off Quadro RTX 4000 Mobile for hybrid graphics? Marc MERLIN
2020-12-26 11:12                 ` 5.9.11 still hanging 2mn at each boot and looping on nvidia-gpu 0000:01:00.3: PME# enabled (Quadro RTX 4000 Mobile) Marc MERLIN
2020-12-26 11:12                   ` Marc MERLIN
2020-12-27 18:28                   ` [Nouveau] " Ilia Mirkin
2020-12-27 18:28                     ` Ilia Mirkin
2021-01-27 21:33                   ` Bjorn Helgaas
2021-01-27 21:33                     ` Bjorn Helgaas
2021-01-28 20:59                     ` Bjorn Helgaas
2021-01-28 20:59                       ` [Nouveau] " Bjorn Helgaas
2021-01-29  0:56                     ` Marc MERLIN
2021-01-29  0:56                       ` [Nouveau] " Marc MERLIN
2021-01-29 21:20                       ` Bjorn Helgaas
2021-01-29 21:20                         ` [Nouveau] " Bjorn Helgaas
2021-01-30  2:04                         ` Marc MERLIN
2021-01-30  2:04                           ` [Nouveau] " Marc MERLIN
2021-05-05 21:42                           ` [Nouveau] 5.12.1 0010:nvkm_falcon_v1_wait_for_halt+0x8f/0xb9 [nouveau] Marc MERLIN
2021-05-06 14:50                             ` Bjorn Helgaas
2021-05-25  3:13                               ` Ben Skeggs
2020-12-29 15:51                 ` 5.9.11 still hanging 2mn at each boot and looping on nvidia-gpu 0000:01:00.3: PME# enabled (Quadro RTX 4000 Mobile) Marc MERLIN
2020-12-29 15:51                   ` Marc MERLIN
2020-12-29 16:33                   ` Ilia Mirkin
2020-12-29 16:33                     ` Ilia Mirkin
     [not found]                     ` <CAKb7UviFP_YVxC4PO7MDNnw6NDrD=3BCGF37umwAfaimjbX9Pw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2020-12-29 17:47                       ` Marc MERLIN
     [not found]                         ` <20201229174750.GI23389-xnduUnryOU1AfugRpC6u6w@public.gmane.org>
2021-01-04 11:49                           ` Marc MERLIN
     [not found]                             ` <20210104114955.GM32533-xnduUnryOU1AfugRpC6u6w@public.gmane.org>
2021-01-04 13:28                               ` Karol Herbst
     [not found]                                 ` <CACO55tsdG37YKv7FV2er4hRnXk9vmwMbPuPptA+=ZtziWXC2+g-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2021-01-07 11:49                                   ` Marc MERLIN
2020-12-30 12:16                       ` ael
2020-09-13 20:15               ` [Nouveau] pcieport 0000:00:01.0: PME: Spurious native interrupt (nvidia with nouveau and thunderbolt on thinkpad P73) Marc MERLIN
2020-09-13 20:15                 ` Marc MERLIN
     [not found]                 ` <20200913201545.GL2622-xnduUnryOU1AfugRpC6u6w@public.gmane.org>
2020-09-19 23:18                   ` Marc MERLIN
2019-10-04 12:39 ` [PATCH v2 2/2] PCI: Add missing link delays required by the PCIe spec Mika Westerberg
2019-10-26 14:19   ` Bjorn Helgaas
2019-10-28 11:28     ` Mika Westerberg
2019-10-28 13:42       ` Bjorn Helgaas
2019-10-28 18:06         ` Mika Westerberg
2019-10-28 20:16           ` Bjorn Helgaas
2019-10-29 11:15             ` Mika Westerberg
2019-10-29 20:27               ` Bjorn Helgaas
2019-10-30 11:15                 ` Mika Westerberg
2019-10-31 22:31                   ` Bjorn Helgaas [this message]
2019-11-01 11:19                     ` Mika Westerberg
2019-11-05  0:00                       ` Bjorn Helgaas
2019-11-05  9:54                         ` Mika Westerberg
2019-11-05 12:58                           ` Mika Westerberg
2019-11-05 20:01                             ` Bjorn Helgaas
2019-11-06 13:31                               ` Mika Westerberg
2019-11-05 15:00                           ` Bjorn Helgaas
2019-11-05 15:28                             ` Mika Westerberg
2019-11-05 16:10                               ` Bjorn Helgaas
2019-11-06 13:29                                 ` Mika Westerberg
2019-10-29 20:54   ` Bjorn Helgaas
2019-10-30 11:33     ` Mika Westerberg
2019-10-04 12:57 ` [PATCH v2 0/2] PCI: Add missing link delays Matthias Andree
2019-10-04 13:06   ` Mika Westerberg
2019-10-05  7:34     ` Matthias Andree
2019-10-07  9:32       ` Mika Westerberg
2019-10-07 15:15         ` Matthias Andree
2019-10-08  9:05           ` Mika Westerberg

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20191031223144.GA81598@google.com \
    --to=helgaas@kernel.org \
    --cc=alex.williamson@redhat.com \
    --cc=kai.heng.feng@canonical.com \
    --cc=keith.busch@intel.com \
    --cc=lenb@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pci@vger.kernel.org \
    --cc=lukas@wunner.de \
    --cc=mika.westerberg@linux.intel.com \
    --cc=mr.nuke.me@gmail.com \
    --cc=nicholas.johnson-opensource@outlook.com.au \
    --cc=pmenzel@molgen.mpg.de \
    --cc=rjw@rjwysocki.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.