All of lore.kernel.org
 help / color / mirror / Atom feed
From: Bjorn Helgaas <helgaas@kernel.org>
To: Mika Westerberg <mika.westerberg@linux.intel.com>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>,
	Len Brown <lenb@kernel.org>, Lukas Wunner <lukas@wunner.de>,
	Keith Busch <keith.busch@intel.com>,
	Alex Williamson <alex.williamson@redhat.com>,
	Alexandru Gagniuc <mr.nuke.me@gmail.com>,
	Kai-Heng Feng <kai.heng.feng@canonical.com>,
	Matthias Andree <matthias.andree@gmx.de>,
	Paul Menzel <pmenzel@molgen.mpg.de>,
	Nicholas Johnson <nicholas.johnson-opensource@outlook.com.au>,
	linux-pci@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v2 2/2] PCI: Add missing link delays required by the PCIe spec
Date: Tue, 29 Oct 2019 15:27:09 -0500	[thread overview]
Message-ID: <20191029202708.GA38926@google.com> (raw)
In-Reply-To: <20191029111520.GE2593@lahna.fi.intel.com>

On Tue, Oct 29, 2019 at 01:15:20PM +0200, Mika Westerberg wrote:
> On Mon, Oct 28, 2019 at 03:16:53PM -0500, Bjorn Helgaas wrote:
> > > The related hardware event is resume in this case. Can you point
> > > me to the actual point where you want me to put this?
> > 
> > "Resume" is a Linux software concept, so of course the PCIe spec
> > doesn't say anything about it.  The spec talks about delays
> > related to resets and device power and link state transitions, so
> > somehow we have to connect the Linux delay with those hardware
> > events.
> > 
> > Since we're talking about a transition from D3cold, this has to be
> > done via something external to the device such as power
> > regulators.  For ACPI systems that's probably hidden inside _PS0
> > or something similar.  That's opaque, but at least it's a hook
> > that says "here's where we put the device into D0".  I suggested
> > acpi_pci_set_power_state() as a possibility since I think that's
> > the lowest-level point where we have the pci_dev so we know the
> > current state and the new state.
> 
> I looked at how we could use acpi_pci_set_power_state() but I don't
> think it is possible because it is likely that only the root port
> has the power resource that is used to bring the link to L2 or L3.
> However, we would need to repeat the delay for each downstream/root
> port if there are multiple PCIe switches in the topology.

OK, I think I understand why that's a problem (correct me if I'm
wrong):

  We call pci_pm_resume_noirq() for every device, but it only calls
  acpi_pci_set_power_state() for devices that have _PS0 or _PR0
  methods.  So if the delay is in acpi_pci_set_power_state() and we
  have A -> B -> C where only A has _PS0, we would delay for the link
  to B to come up, but not for the link to C.

I do see that we do need both delays.  In acpi_pci_set_power_state()
when we transition A from D3cold->D0, I assume that single _PS0
evaluation on A causes B to transition from D3cold->D3hot, which in
turn causes C to transition from D3cold->D3hot.  Is that your
understanding, too?

We do know that topology in acpi_pci_set_power_state(), since we have
the pci_dev for A, so it seems conceivable that we could descend the
hierarchy and delay for each level.

If the delay is in pci_pm_resume_noirq() (as in your patch), what
happens with a switch with several Downstream Ports?  I assume that
all the Downstream Ports start their transition out of D3cold
basically simultaneously, so we probably don't need N delays, do we?
It seems a little messy to optimize this in pci_pm_resume_noirq().

The outline of the pci_pm_resume_noirq() part of this patch is:

  pci_pm_resume_noirq
    if (!dev->skip_bus_pm ...)   # <-- condition 1
      pci_pm_default_resume_early
        pci_power_up
          if (platform_pci_power_manageable())   # _PS0 or _PR0 exist?
            platform_pci_set_power_state
              pci_platform_pm->set_state
                acpi_pci_set_power_state(PCI_D0) # acpi_pci_platform_pm.set_state
                  acpi_device_set_power(ACPI_STATE_D0) # <-- eval _PS0
+   if (d3cold)                  # <-- condition 2
+     pci_bridge_wait_for_secondary_bus

Another thing that niggles at me here is that the condition for
calling pci_bridge_wait_for_secondary_bus() is completely different
than the condition for changing the power state.  If we didn't change
the power state, there's no reason to wait, is there?

The outline of the pci_pm_runtime_resume() part of this patch is:

  pci_pm_runtime_resume
    pci_restore_standard_config
      if (dev->current_state != PCI_D0)
        pci_set_power_state(PCI_D0)
          __pci_start_power_transition
            pci_platform_power_transition
              if (platform_pci_power_manageable())   # _PS0 or _PR0 exist?
                platform_pci_set_power_state
                  pci_platform_pm->set_state
                    acpi_pci_set_power_state(PCI_D0) # acpi_pci_platform_pm.set_state
                      acpi_device_set_power(ACPI_STATE_D0) # <-- eval _PS0
              pci_raw_set_power_state
          __pci_complete_power_transition
+   if (d3cold)
+     pci_bridge_wait_for_secondary_bus

In this part, the power state change is inside
pci_restore_standard_config(), which calls pci_set_power_state().
There are many other callers of pci_set_power_state(); can we be sure
that none of them need a delay?

> > But it seems that at least some ACPI firmware doesn't do those
> > delays, so I guess our only alternatives are to always do it in
> > the OS or have some sort of blacklist.  And it doesn't really seem
> > practical to maintain a blacklist.
> 
> I really think this is crystal clear:

I am agreeing with you that the OS needs to do the delays.

> The OS is always responsible for the delays described in the PCIe
> spec.

If the ACPI spec contained this statement, it would be useful, but I
haven't seen it.  It's certainly true that some combination of
firmware and the OS is responsible for the delays :)

> However, if the platform implements some of them say in _ON or _PS0
> methods then it can notify the OS about this by using the _DSM so
> the OS does not need to duplicate all of them.

That makes good sense, but there are other reasons for using that
_DSM, e.g., firmware may know that MID or similar devices are not
really PCI devices and don't need delays anywhere.  So the existence
of the _DSM by itself doesn't convince me that the OS is responsible
for the delays.

> > > > In pci_pm_reset(), we're doing the D0->D3hot->D0 transitions
> > > > specifically to do a reset, so No_Soft_Reset is false.
> > > > Doesn't 6.6.1 say we need at least 100ms here?
> > > 
> > > No since it does not go into D3cold. It just "reset" the thing
> > > if it happens to do internal reset after D3hot -> D0.
> > 
> > Sec 5.8, Figure 5-18 says D3hot->D0uninitialized is a "Soft
> > Reset", which unfortunately is not defined.
> > 
> > My guess is that in sec 5.9, Table 5-13, the 10ms delay is for the
> > D3hot->D0active (i.e., No_Soft_Reset=1) transition, and the
> > D3hot->D0uninitialized (i.e., No_Soft_Reset=0) that does a "soft
> > reset" (whatever that is) probably requires more and we should
> > handle it like a conventional reset to be safe.
> 
> I think it simply means the device functional context is lost (there
> is more in section 5.3.1.4). Linux handles this properly already
> (well at least according the minimum timings required by the spec)
> and restores the context accordingly after it has waited for the
> 10ms.
> 
> It is the D3cold (where links go to L2 or L3) where we really need
> the delays so that the link gets properly trained before we start
> poking the downstream device.

I'm already speculating above, so I don't think I can contribute
anything useful here.

  reply	other threads:[~2019-10-29 20:27 UTC|newest]

Thread overview: 79+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-10-04 12:39 [PATCH v2 0/2] PCI: Add missing link delays Mika Westerberg
2019-10-04 12:39 ` [PATCH v2 1/2] PCI: Introduce pcie_wait_for_link_delay() Mika Westerberg
2020-08-08 20:22   ` Marc MERLIN
2020-08-08 20:23     ` Marc MERLIN
2020-08-09 16:31     ` Marc MERLIN
2020-09-06 18:18     ` pcieport 0000:00:01.0: PME: Spurious native interrupt (nvidia with nouveau and thunderbolt on thinkpad P73) Marc MERLIN
2020-09-06 18:18       ` Marc MERLIN
2020-09-06 18:26       ` Matthias Andree
2020-09-07 19:14       ` [Nouveau] " Karol Herbst
2020-09-07 19:14         ` Karol Herbst
2020-09-07 20:58         ` [Nouveau] " Marc MERLIN
2020-09-07 20:58           ` Marc MERLIN
2020-09-07 23:51           ` [Nouveau] " Karol Herbst
2020-09-07 23:51             ` Karol Herbst
2020-09-08  0:29             ` [Nouveau] " Marc MERLIN
2020-05-29 18:03               ` 5.5 kernel: using nouveau or something else just long enough to turn off Quadro RTX 4000 Mobile for hybrid graphics? Marc MERLIN
     [not found]                 ` <20200529180315.GA18804-xnduUnryOU1AfugRpC6u6w@public.gmane.org>
2020-05-29 18:53                   ` Ilia Mirkin
     [not found]                     ` <CAKb7Uvhw2EYo1RR-=NGgLO3CU9QTRWchcAw1injffybZbJ-zOA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2020-05-29 19:46                       ` Marc MERLIN
     [not found]                         ` <20200529194605.GB18804-xnduUnryOU1AfugRpC6u6w@public.gmane.org>
2020-05-30 17:32                           ` Karol Herbst
2023-04-19  6:49                         ` [Nouveau] 6.1 still cannot get display on Thinkpad P73Quadro " Marc MERLIN
2023-04-21  5:46                           ` [Nouveau] 6.2 still cannot get hdmi display out on Thinkpad P73 Quadro RTX 4000 Mobile/TU104 Marc MERLIN
     [not found]                       ` <CACO55tsvY0t_z986VVoYCvxuBASdZ+rQcDtZ_dAtQR60NLmQQw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2020-05-31 18:31                         ` 5.5 kernel: using nouveau or something else just long enough to turn off Quadro RTX 4000 Mobile for hybrid graphics? Marc MERLIN
2020-12-26 11:12                 ` 5.9.11 still hanging 2mn at each boot and looping on nvidia-gpu 0000:01:00.3: PME# enabled (Quadro RTX 4000 Mobile) Marc MERLIN
2020-12-26 11:12                   ` Marc MERLIN
2020-12-27 18:28                   ` [Nouveau] " Ilia Mirkin
2020-12-27 18:28                     ` Ilia Mirkin
2021-01-27 21:33                   ` Bjorn Helgaas
2021-01-27 21:33                     ` Bjorn Helgaas
2021-01-28 20:59                     ` Bjorn Helgaas
2021-01-28 20:59                       ` [Nouveau] " Bjorn Helgaas
2021-01-29  0:56                     ` Marc MERLIN
2021-01-29  0:56                       ` [Nouveau] " Marc MERLIN
2021-01-29 21:20                       ` Bjorn Helgaas
2021-01-29 21:20                         ` [Nouveau] " Bjorn Helgaas
2021-01-30  2:04                         ` Marc MERLIN
2021-01-30  2:04                           ` [Nouveau] " Marc MERLIN
2021-05-05 21:42                           ` [Nouveau] 5.12.1 0010:nvkm_falcon_v1_wait_for_halt+0x8f/0xb9 [nouveau] Marc MERLIN
2021-05-06 14:50                             ` Bjorn Helgaas
2021-05-25  3:13                               ` Ben Skeggs
2020-12-29 15:51                 ` 5.9.11 still hanging 2mn at each boot and looping on nvidia-gpu 0000:01:00.3: PME# enabled (Quadro RTX 4000 Mobile) Marc MERLIN
2020-12-29 15:51                   ` Marc MERLIN
2020-12-29 16:33                   ` Ilia Mirkin
2020-12-29 16:33                     ` Ilia Mirkin
     [not found]                     ` <CAKb7UviFP_YVxC4PO7MDNnw6NDrD=3BCGF37umwAfaimjbX9Pw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2020-12-29 17:47                       ` Marc MERLIN
     [not found]                         ` <20201229174750.GI23389-xnduUnryOU1AfugRpC6u6w@public.gmane.org>
2021-01-04 11:49                           ` Marc MERLIN
     [not found]                             ` <20210104114955.GM32533-xnduUnryOU1AfugRpC6u6w@public.gmane.org>
2021-01-04 13:28                               ` Karol Herbst
     [not found]                                 ` <CACO55tsdG37YKv7FV2er4hRnXk9vmwMbPuPptA+=ZtziWXC2+g-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2021-01-07 11:49                                   ` Marc MERLIN
2020-12-30 12:16                       ` ael
2020-09-13 20:15               ` [Nouveau] pcieport 0000:00:01.0: PME: Spurious native interrupt (nvidia with nouveau and thunderbolt on thinkpad P73) Marc MERLIN
2020-09-13 20:15                 ` Marc MERLIN
     [not found]                 ` <20200913201545.GL2622-xnduUnryOU1AfugRpC6u6w@public.gmane.org>
2020-09-19 23:18                   ` Marc MERLIN
2019-10-04 12:39 ` [PATCH v2 2/2] PCI: Add missing link delays required by the PCIe spec Mika Westerberg
2019-10-26 14:19   ` Bjorn Helgaas
2019-10-28 11:28     ` Mika Westerberg
2019-10-28 13:42       ` Bjorn Helgaas
2019-10-28 18:06         ` Mika Westerberg
2019-10-28 20:16           ` Bjorn Helgaas
2019-10-29 11:15             ` Mika Westerberg
2019-10-29 20:27               ` Bjorn Helgaas [this message]
2019-10-30 11:15                 ` Mika Westerberg
2019-10-31 22:31                   ` Bjorn Helgaas
2019-11-01 11:19                     ` Mika Westerberg
2019-11-05  0:00                       ` Bjorn Helgaas
2019-11-05  9:54                         ` Mika Westerberg
2019-11-05 12:58                           ` Mika Westerberg
2019-11-05 20:01                             ` Bjorn Helgaas
2019-11-06 13:31                               ` Mika Westerberg
2019-11-05 15:00                           ` Bjorn Helgaas
2019-11-05 15:28                             ` Mika Westerberg
2019-11-05 16:10                               ` Bjorn Helgaas
2019-11-06 13:29                                 ` Mika Westerberg
2019-10-29 20:54   ` Bjorn Helgaas
2019-10-30 11:33     ` Mika Westerberg
2019-10-04 12:57 ` [PATCH v2 0/2] PCI: Add missing link delays Matthias Andree
2019-10-04 13:06   ` Mika Westerberg
2019-10-05  7:34     ` Matthias Andree
2019-10-07  9:32       ` Mika Westerberg
2019-10-07 15:15         ` Matthias Andree
2019-10-08  9:05           ` Mika Westerberg

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20191029202708.GA38926@google.com \
    --to=helgaas@kernel.org \
    --cc=alex.williamson@redhat.com \
    --cc=kai.heng.feng@canonical.com \
    --cc=keith.busch@intel.com \
    --cc=lenb@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pci@vger.kernel.org \
    --cc=lukas@wunner.de \
    --cc=matthias.andree@gmx.de \
    --cc=mika.westerberg@linux.intel.com \
    --cc=mr.nuke.me@gmail.com \
    --cc=nicholas.johnson-opensource@outlook.com.au \
    --cc=pmenzel@molgen.mpg.de \
    --cc=rjw@rjwysocki.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.