linux-pci.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Re: [PATCH v5 2/2] PCI/PM: Fix pci_pm_suspend_noirq() to disable PTM
       [not found] ` <20220429210538.2270472-2-rajvi.jingar@intel.com>
@ 2022-05-12 13:49   ` Rafael J. Wysocki
  2022-05-12 17:42     ` Bjorn Helgaas
  0 siblings, 1 reply; 14+ messages in thread
From: Rafael J. Wysocki @ 2022-05-12 13:49 UTC (permalink / raw)
  To: Rajvi Jingar, bhelgaas; +Cc: david.e.box, linux-pci, linux-kernel, linux-pm

On 4/29/2022 11:05 PM, Rajvi Jingar wrote:
> For the PCIe devices (like nvme) that do not go into D3 state still need to
> disable PTM to allow the port to enter a lower-power PM state and the SoC
> to reach a lower-power idle state as a whole. Move the pci_disable_ptm()
> out of pci_prepare_to_sleep() as this code path is not followed for devices
> that do not go into D3. This fixes the issue seen on Dell XPS 9300 with
> Ice Lake CPU and Dell Precision 5530 with Coffee Lake CPU platforms to get
> improved residency in low power idle states.
>
> Also, on receiving a PTM Request from a downstream device, if PTM is
> disabled on the root port, as per PCIe r6.0, sec 6.21.3, such a request
> would cause an Unsupported Request error. So it must first disable PTM in
> any downstream devices.
>
> Fixes: a697f072f5da ("PCI: Disable PTM during suspend to save power")
> Signed-off-by: Rajvi Jingar <rajvi.jingar@intel.com>
> Suggested-by: David E. Box <david.e.box@linux.intel.com>
> ---
>   v1 -> v2: add Fixes tag in commit message
>   v2 -> v3: move changelog after "---" marker
>   v3 -> v4: add "---" marker after changelog
>   v4 -> v5: move pci_disable_ptm() out of the pci_dev->state_saved check.
> 	   disable PTM for all devices, not just root ports.
> ---
>   drivers/pci/pci-driver.c | 28 +++++++++++++++++++---------
>   drivers/pci/pci.c        | 10 ----------
>   2 files changed, 19 insertions(+), 19 deletions(-)
>
> diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
> index 8b55a90126a2..400dd18a9cf5 100644
> --- a/drivers/pci/pci-driver.c
> +++ b/drivers/pci/pci-driver.c
> @@ -813,6 +813,7 @@ static int pci_pm_suspend_late(struct device *dev)
>   
>   static int pci_pm_suspend_noirq(struct device *dev)
>   {
> +	unsigned int dev_state_saved;
>   	struct pci_dev *pci_dev = to_pci_dev(dev);
>   	const struct dev_pm_ops *pm = dev->driver ? dev->driver->pm : NULL;
>   
> @@ -845,16 +846,25 @@ static int pci_pm_suspend_noirq(struct device *dev)
>   		}
>   	}
>   
> -	if (!pci_dev->state_saved) {
> +	dev_state_saved = pci_dev->state_saved;

If pci_dev->state_saved is set here, the device may be in D3cold already 
and disabling PTM for it will not work.  Of course, it is not necessary 
to disable PTM for it then, but this case need to be taken care of.

> +	if (!dev_state_saved)
>   		pci_save_state(pci_dev);
> -		/*
> -		 * If the device is a bridge with a child in D0 below it, it needs to
> -		 * stay in D0, so check skip_bus_pm to avoid putting it into a
> -		 * low-power state in that case.
> -		 */
> -		if (!pci_dev->skip_bus_pm && pci_power_manageable(pci_dev))
> -			pci_prepare_to_sleep(pci_dev);
> -	}
> +
> +	/*
> +	 * There are systems (for example, Intel mobile chips since Coffee
> +	 * Lake) where the power drawn while suspended can be significantly
> +	 * reduced by disabling PTM as this allows the SoC to reach a
> +	 * lower-power idle state as a whole.
> +	 */

Something like this should suffice IMV:

if (!dev_state_saved || pci_dev->current_state != PCI_D3cold)

         pci_disable_ptm(pci_dev);


> +	pci_disable_ptm(pci_dev);
> +
> +	/* If the device is a bridge with a child in D0 below it, it needs to
> +	 * stay in D0, so check skip_bus_pm to avoid putting it into a
> +	 * low-power state in that case.
> +	 */
> +	if (!dev_state_saved && !pci_dev->skip_bus_pm &&
> +	    pci_power_manageable(pci_dev))
> +		pci_prepare_to_sleep(pci_dev);
>   
>   	pci_dbg(pci_dev, "PCI PM: Suspend power state: %s\n",
>   		pci_power_name(pci_dev->current_state));
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index 9ecce435fb3f..f8768672c064 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -2660,16 +2660,6 @@ int pci_prepare_to_sleep(struct pci_dev *dev)
>   	if (target_state == PCI_POWER_ERROR)
>   		return -EIO;
>   
> -	/*
> -	 * There are systems (for example, Intel mobile chips since Coffee
> -	 * Lake) where the power drawn while suspended can be significantly
> -	 * reduced by disabling PTM on PCIe root ports as this allows the
> -	 * port to enter a lower-power PM state and the SoC to reach a
> -	 * lower-power idle state as a whole.
> -	 */
> -	if (pci_pcie_type(dev) == PCI_EXP_TYPE_ROOT_PORT)
> -		pci_disable_ptm(dev);
> -
>   	pci_enable_wake(dev, target_state, wakeup);
>   
>   	error = pci_set_power_state(dev, target_state);

And the restoration of the PTM state on errors in pci_prepare_to_sleep() 
is not needed any more.



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v5 2/2] PCI/PM: Fix pci_pm_suspend_noirq() to disable PTM
  2022-05-12 13:49   ` [PATCH v5 2/2] PCI/PM: Fix pci_pm_suspend_noirq() to disable PTM Rafael J. Wysocki
@ 2022-05-12 17:42     ` Bjorn Helgaas
  2022-05-12 17:52       ` Rafael J. Wysocki
  0 siblings, 1 reply; 14+ messages in thread
From: Bjorn Helgaas @ 2022-05-12 17:42 UTC (permalink / raw)
  To: Rajvi Jingar
  Cc: Rafael J. Wysocki, bhelgaas, david.e.box, linux-pci,
	linux-kernel, linux-pm

Hi Rajvi,

I received your v1, v2, v3, v4, v5 postings because they were sent
directly to bhelgaas@google.com, but for some reason vger doesn't like
them so they don't show up on the mailing list:

  https://lore.kernel.org/all/?q=a%3Arajvi.jingar

I looked at the ones I received directly and don't see an obvious
problem.  Maybe there's a hint here?

  http://vger.kernel.org/majordomo-info.html

All patches should appear on the linux-pci mailing list before
applying them, so we need to figure this out somehow.  In fact, I read
and review patches from linux-pci, so I often don't even see things
that are just sent directly to bhelgaas@google.com. 

On Thu, May 12, 2022 at 03:49:18PM +0200, Rafael J. Wysocki wrote:
> On 4/29/2022 11:05 PM, Rajvi Jingar wrote:
> > For the PCIe devices (like nvme) that do not go into D3 state still need to
> > disable PTM to allow the port to enter a lower-power PM state and the SoC
> > to reach a lower-power idle state as a whole. Move the pci_disable_ptm()
> > out of pci_prepare_to_sleep() as this code path is not followed for devices
> > that do not go into D3. This fixes the issue seen on Dell XPS 9300 with
> > Ice Lake CPU and Dell Precision 5530 with Coffee Lake CPU platforms to get
> > improved residency in low power idle states.

I think the paragraph above is a distraction, and the real reason is
the paragraph below.

> > Also, on receiving a PTM Request from a downstream device, if PTM is
> > disabled on the root port, as per PCIe r6.0, sec 6.21.3, such a request
> > would cause an Unsupported Request error. So it must first disable PTM in
> > any downstream devices.
> > 
> > Fixes: a697f072f5da ("PCI: Disable PTM during suspend to save power")
> > Signed-off-by: Rajvi Jingar <rajvi.jingar@intel.com>
> > Suggested-by: David E. Box <david.e.box@linux.intel.com>
> > ---
> >   v1 -> v2: add Fixes tag in commit message
> >   v2 -> v3: move changelog after "---" marker
> >   v3 -> v4: add "---" marker after changelog
> >   v4 -> v5: move pci_disable_ptm() out of the pci_dev->state_saved check.
> > 	   disable PTM for all devices, not just root ports.
> > ---
> >   drivers/pci/pci-driver.c | 28 +++++++++++++++++++---------
> >   drivers/pci/pci.c        | 10 ----------
> >   2 files changed, 19 insertions(+), 19 deletions(-)
> > 
> > diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
> > index 8b55a90126a2..400dd18a9cf5 100644
> > --- a/drivers/pci/pci-driver.c
> > +++ b/drivers/pci/pci-driver.c
> > @@ -813,6 +813,7 @@ static int pci_pm_suspend_late(struct device *dev)
> >   static int pci_pm_suspend_noirq(struct device *dev)
> >   {
> > +	unsigned int dev_state_saved;
> >   	struct pci_dev *pci_dev = to_pci_dev(dev);
> >   	const struct dev_pm_ops *pm = dev->driver ? dev->driver->pm : NULL;
> > @@ -845,16 +846,25 @@ static int pci_pm_suspend_noirq(struct device *dev)
> >   		}
> >   	}
> > -	if (!pci_dev->state_saved) {
> > +	dev_state_saved = pci_dev->state_saved;
> 
> If pci_dev->state_saved is set here, the device may be in D3cold already and
> disabling PTM for it will not work.  Of course, it is not necessary to
> disable PTM for it then, but this case need to be taken care of.
> 
> > +	if (!dev_state_saved)
> >   		pci_save_state(pci_dev);
> > -		/*
> > -		 * If the device is a bridge with a child in D0 below it, it needs to
> > -		 * stay in D0, so check skip_bus_pm to avoid putting it into a
> > -		 * low-power state in that case.
> > -		 */
> > -		if (!pci_dev->skip_bus_pm && pci_power_manageable(pci_dev))
> > -			pci_prepare_to_sleep(pci_dev);
> > -	}
> > +
> > +	/*
> > +	 * There are systems (for example, Intel mobile chips since Coffee
> > +	 * Lake) where the power drawn while suspended can be significantly
> > +	 * reduced by disabling PTM as this allows the SoC to reach a
> > +	 * lower-power idle state as a whole.

I think the argument for disabling PTM is that:

  - If a PTM Requester is put in a low-power state, a PTM Responder
    upstream from it may also be put in a low-power state.

  - Putting a Port in D1, D2, or D3hot does not prohibit it from
    sending or responding to PTM Requests (I'd be glad to be corrected
    about this).

  - We want to disable PTM on Responders when they are in a low-power
    state.

  - Per 6.21.3, a PTM Requester must not be enabled when the upstream
    PTM Responder is disabled.

  - Therefore, we must disable all PTM on all downstream PTM
    Requesters before disabling it on the PTM Responder, e.g., a Root
    Port.

This has nothing specifically to do with Coffee Lake or other Intel
chips, so I think the comment should be merely something to the
effect that "disabling PTM reduces power consumption."

> Something like this should suffice IMV:
> 
> if (!dev_state_saved || pci_dev->current_state != PCI_D3cold)
> 
>         pci_disable_ptm(pci_dev);

It makes sense to me that we needn't disable PTM if the device is in
D3cold.  But the "!dev_state_saved" condition depends on what the
driver did.  Why is that important?  Why should we not do the
following?

  if (pci_dev->current_state != PCI_D3cold)
    pci_disable_ptm(pci_dev);

Bjorn

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v5 2/2] PCI/PM: Fix pci_pm_suspend_noirq() to disable PTM
  2022-05-12 17:42     ` Bjorn Helgaas
@ 2022-05-12 17:52       ` Rafael J. Wysocki
  2022-05-12 18:35         ` Bjorn Helgaas
  0 siblings, 1 reply; 14+ messages in thread
From: Rafael J. Wysocki @ 2022-05-12 17:52 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Rajvi Jingar, Rafael J. Wysocki, Bjorn Helgaas, David Box,
	Linux PCI, Linux Kernel Mailing List, Linux PM

On Thu, May 12, 2022 at 7:42 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
>
> Hi Rajvi,
>
> I received your v1, v2, v3, v4, v5 postings because they were sent
> directly to bhelgaas@google.com, but for some reason vger doesn't like
> them so they don't show up on the mailing list:
>
>   https://lore.kernel.org/all/?q=a%3Arajvi.jingar
>
> I looked at the ones I received directly and don't see an obvious
> problem.  Maybe there's a hint here?
>
>   http://vger.kernel.org/majordomo-info.html
>
> All patches should appear on the linux-pci mailing list before
> applying them, so we need to figure this out somehow.  In fact, I read
> and review patches from linux-pci, so I often don't even see things
> that are just sent directly to bhelgaas@google.com.
>
> On Thu, May 12, 2022 at 03:49:18PM +0200, Rafael J. Wysocki wrote:
> > On 4/29/2022 11:05 PM, Rajvi Jingar wrote:
> > > For the PCIe devices (like nvme) that do not go into D3 state still need to
> > > disable PTM to allow the port to enter a lower-power PM state and the SoC
> > > to reach a lower-power idle state as a whole. Move the pci_disable_ptm()
> > > out of pci_prepare_to_sleep() as this code path is not followed for devices
> > > that do not go into D3. This fixes the issue seen on Dell XPS 9300 with
> > > Ice Lake CPU and Dell Precision 5530 with Coffee Lake CPU platforms to get
> > > improved residency in low power idle states.
>
> I think the paragraph above is a distraction, and the real reason is
> the paragraph below.
>
> > > Also, on receiving a PTM Request from a downstream device, if PTM is
> > > disabled on the root port, as per PCIe r6.0, sec 6.21.3, such a request
> > > would cause an Unsupported Request error. So it must first disable PTM in
> > > any downstream devices.
> > >
> > > Fixes: a697f072f5da ("PCI: Disable PTM during suspend to save power")
> > > Signed-off-by: Rajvi Jingar <rajvi.jingar@intel.com>
> > > Suggested-by: David E. Box <david.e.box@linux.intel.com>
> > > ---
> > >   v1 -> v2: add Fixes tag in commit message
> > >   v2 -> v3: move changelog after "---" marker
> > >   v3 -> v4: add "---" marker after changelog
> > >   v4 -> v5: move pci_disable_ptm() out of the pci_dev->state_saved check.
> > >        disable PTM for all devices, not just root ports.
> > > ---
> > >   drivers/pci/pci-driver.c | 28 +++++++++++++++++++---------
> > >   drivers/pci/pci.c        | 10 ----------
> > >   2 files changed, 19 insertions(+), 19 deletions(-)
> > >
> > > diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
> > > index 8b55a90126a2..400dd18a9cf5 100644
> > > --- a/drivers/pci/pci-driver.c
> > > +++ b/drivers/pci/pci-driver.c
> > > @@ -813,6 +813,7 @@ static int pci_pm_suspend_late(struct device *dev)
> > >   static int pci_pm_suspend_noirq(struct device *dev)
> > >   {
> > > +   unsigned int dev_state_saved;
> > >     struct pci_dev *pci_dev = to_pci_dev(dev);
> > >     const struct dev_pm_ops *pm = dev->driver ? dev->driver->pm : NULL;
> > > @@ -845,16 +846,25 @@ static int pci_pm_suspend_noirq(struct device *dev)
> > >             }
> > >     }
> > > -   if (!pci_dev->state_saved) {
> > > +   dev_state_saved = pci_dev->state_saved;
> >
> > If pci_dev->state_saved is set here, the device may be in D3cold already and
> > disabling PTM for it will not work.  Of course, it is not necessary to
> > disable PTM for it then, but this case need to be taken care of.
> >
> > > +   if (!dev_state_saved)
> > >             pci_save_state(pci_dev);
> > > -           /*
> > > -            * If the device is a bridge with a child in D0 below it, it needs to
> > > -            * stay in D0, so check skip_bus_pm to avoid putting it into a
> > > -            * low-power state in that case.
> > > -            */
> > > -           if (!pci_dev->skip_bus_pm && pci_power_manageable(pci_dev))
> > > -                   pci_prepare_to_sleep(pci_dev);
> > > -   }
> > > +
> > > +   /*
> > > +    * There are systems (for example, Intel mobile chips since Coffee
> > > +    * Lake) where the power drawn while suspended can be significantly
> > > +    * reduced by disabling PTM as this allows the SoC to reach a
> > > +    * lower-power idle state as a whole.
>
> I think the argument for disabling PTM is that:
>
>   - If a PTM Requester is put in a low-power state, a PTM Responder
>     upstream from it may also be put in a low-power state.
>
>   - Putting a Port in D1, D2, or D3hot does not prohibit it from
>     sending or responding to PTM Requests (I'd be glad to be corrected
>     about this).
>
>   - We want to disable PTM on Responders when they are in a low-power
>     state.
>
>   - Per 6.21.3, a PTM Requester must not be enabled when the upstream
>     PTM Responder is disabled.
>
>   - Therefore, we must disable all PTM on all downstream PTM
>     Requesters before disabling it on the PTM Responder, e.g., a Root
>     Port.
>
> This has nothing specifically to do with Coffee Lake or other Intel
> chips, so I think the comment should be merely something to the
> effect that "disabling PTM reduces power consumption."
>
> > Something like this should suffice IMV:
> >
> > if (!dev_state_saved || pci_dev->current_state != PCI_D3cold)
> >
> >         pci_disable_ptm(pci_dev);
>
> It makes sense to me that we needn't disable PTM if the device is in
> D3cold.  But the "!dev_state_saved" condition depends on what the
> driver did.  Why is that important?  Why should we not do the
> following?
>
>   if (pci_dev->current_state != PCI_D3cold)
>     pci_disable_ptm(pci_dev);

We can do this too.  I thought we could skip the power state check if
dev_state_saved was unset, because then we would know that the power
state was not D3cold.  It probably isn't worth the hassle though.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v5 2/2] PCI/PM: Fix pci_pm_suspend_noirq() to disable PTM
  2022-05-12 17:52       ` Rafael J. Wysocki
@ 2022-05-12 18:35         ` Bjorn Helgaas
  2022-05-13 22:00           ` Jingar, Rajvi
  0 siblings, 1 reply; 14+ messages in thread
From: Bjorn Helgaas @ 2022-05-12 18:35 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Rajvi Jingar, Rafael J. Wysocki, Bjorn Helgaas, David Box,
	Linux PCI, Linux Kernel Mailing List, Linux PM

On Thu, May 12, 2022 at 07:52:36PM +0200, Rafael J. Wysocki wrote:
> On Thu, May 12, 2022 at 7:42 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > On Thu, May 12, 2022 at 03:49:18PM +0200, Rafael J. Wysocki wrote:

> > > Something like this should suffice IMV:
> > >
> > > if (!dev_state_saved || pci_dev->current_state != PCI_D3cold)
> > >
> > >         pci_disable_ptm(pci_dev);
> >
> > It makes sense to me that we needn't disable PTM if the device is in
> > D3cold.  But the "!dev_state_saved" condition depends on what the
> > driver did.  Why is that important?  Why should we not do the
> > following?
> >
> >   if (pci_dev->current_state != PCI_D3cold)
> >     pci_disable_ptm(pci_dev);
> 
> We can do this too.  I thought we could skip the power state check if
> dev_state_saved was unset, because then we would know that the power
> state was not D3cold.  It probably isn't worth the hassle though.

Ah, thanks.  IMHO it's easier to analyze for correctness if we only
check the power state.

Bjorn

^ permalink raw reply	[flat|nested] 14+ messages in thread

* RE: [PATCH v5 2/2] PCI/PM: Fix pci_pm_suspend_noirq() to disable PTM
  2022-05-12 18:35         ` Bjorn Helgaas
@ 2022-05-13 22:00           ` Jingar, Rajvi
  2022-05-14 17:12             ` Rafael J. Wysocki
  2022-05-16 20:09             ` Bjorn Helgaas
  0 siblings, 2 replies; 14+ messages in thread
From: Jingar, Rajvi @ 2022-05-13 22:00 UTC (permalink / raw)
  To: Bjorn Helgaas, Rafael J. Wysocki
  Cc: Wysocki, Rafael J, Bjorn Helgaas, David Box, Linux PCI,
	Linux Kernel Mailing List, Linux PM


> -----Original Message-----
> From: Bjorn Helgaas <helgaas@kernel.org>
> Sent: Thursday, May 12, 2022 11:36 AM
> To: Rafael J. Wysocki <rafael@kernel.org>
> Cc: Jingar, Rajvi <rajvi.jingar@intel.com>; Wysocki, Rafael J
> <rafael.j.wysocki@intel.com>; Bjorn Helgaas <bhelgaas@google.com>; David Box
> <david.e.box@linux.intel.com>; Linux PCI <linux-pci@vger.kernel.org>; Linux
> Kernel Mailing List <linux-kernel@vger.kernel.org>; Linux PM <linux-
> pm@vger.kernel.org>
> Subject: Re: [PATCH v5 2/2] PCI/PM: Fix pci_pm_suspend_noirq() to disable PTM
> 
> On Thu, May 12, 2022 at 07:52:36PM +0200, Rafael J. Wysocki wrote:
> > On Thu, May 12, 2022 at 7:42 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > > On Thu, May 12, 2022 at 03:49:18PM +0200, Rafael J. Wysocki wrote:
> 
> > > > Something like this should suffice IMV:
> > > >
> > > > if (!dev_state_saved || pci_dev->current_state != PCI_D3cold)
> > > >
> > > >         pci_disable_ptm(pci_dev);
> > >
> > > It makes sense to me that we needn't disable PTM if the device is in
> > > D3cold.  But the "!dev_state_saved" condition depends on what the
> > > driver did.  Why is that important?  Why should we not do the
> > > following?
> > >
> > >   if (pci_dev->current_state != PCI_D3cold)
> > >     pci_disable_ptm(pci_dev);
> >
> > We can do this too.  I thought we could skip the power state check if
> > dev_state_saved was unset, because then we would know that the power
> > state was not D3cold.  It probably isn't worth the hassle though.
>

We see issue with certain platforms where only checking if device power 
state in D3Cold is not enough and the !dev_state_saved check is needed
when disabling PTM. Device like nvme is relying on ASPM, it stays in D0 but 
state is saved. Touching the config space wakes up the device which 
prevents the system from entering into low power state.

Following would fix the issue:

 if (!pci_dev->state_save) {
                pci_save_state(pci_dev);

               pci_disable_ptm(pci_dev);

                if (!pci_dev->skip_bus_pm && pci_power_manageable(pci_dev))
                        pci_prepare_to_sleep(pci_dev);
}

> Ah, thanks.  IMHO it's easier to analyze for correctness if we only
> check the power state.
> 
> Bjorn

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v5 2/2] PCI/PM: Fix pci_pm_suspend_noirq() to disable PTM
  2022-05-13 22:00           ` Jingar, Rajvi
@ 2022-05-14 17:12             ` Rafael J. Wysocki
  2022-05-16 20:09             ` Bjorn Helgaas
  1 sibling, 0 replies; 14+ messages in thread
From: Rafael J. Wysocki @ 2022-05-14 17:12 UTC (permalink / raw)
  To: Jingar, Rajvi
  Cc: Bjorn Helgaas, Rafael J. Wysocki, Wysocki, Rafael J,
	Bjorn Helgaas, David Box, Linux PCI, Linux Kernel Mailing List,
	Linux PM

On Sat, May 14, 2022 at 12:01 AM Jingar, Rajvi <rajvi.jingar@intel.com> wrote:
>
>
> > -----Original Message-----
> > From: Bjorn Helgaas <helgaas@kernel.org>
> > Sent: Thursday, May 12, 2022 11:36 AM
> > To: Rafael J. Wysocki <rafael@kernel.org>
> > Cc: Jingar, Rajvi <rajvi.jingar@intel.com>; Wysocki, Rafael J
> > <rafael.j.wysocki@intel.com>; Bjorn Helgaas <bhelgaas@google.com>; David Box
> > <david.e.box@linux.intel.com>; Linux PCI <linux-pci@vger.kernel.org>; Linux
> > Kernel Mailing List <linux-kernel@vger.kernel.org>; Linux PM <linux-
> > pm@vger.kernel.org>
> > Subject: Re: [PATCH v5 2/2] PCI/PM: Fix pci_pm_suspend_noirq() to disable PTM
> >
> > On Thu, May 12, 2022 at 07:52:36PM +0200, Rafael J. Wysocki wrote:
> > > On Thu, May 12, 2022 at 7:42 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > > > On Thu, May 12, 2022 at 03:49:18PM +0200, Rafael J. Wysocki wrote:
> >
> > > > > Something like this should suffice IMV:
> > > > >
> > > > > if (!dev_state_saved || pci_dev->current_state != PCI_D3cold)
> > > > >
> > > > >         pci_disable_ptm(pci_dev);
> > > >
> > > > It makes sense to me that we needn't disable PTM if the device is in
> > > > D3cold.  But the "!dev_state_saved" condition depends on what the
> > > > driver did.  Why is that important?  Why should we not do the
> > > > following?
> > > >
> > > >   if (pci_dev->current_state != PCI_D3cold)
> > > >     pci_disable_ptm(pci_dev);
> > >
> > > We can do this too.  I thought we could skip the power state check if
> > > dev_state_saved was unset, because then we would know that the power
> > > state was not D3cold.  It probably isn't worth the hassle though.
> >
>
> We see issue with certain platforms where only checking if device power
> state in D3Cold is not enough and the !dev_state_saved check is needed
> when disabling PTM. Device like nvme is relying on ASPM, it stays in D0 but
> state is saved. Touching the config space wakes up the device which
> prevents the system from entering into low power state.
>
> Following would fix the issue:
>
>  if (!pci_dev->state_save) {
>                 pci_save_state(pci_dev);
>
>                pci_disable_ptm(pci_dev);
>
>                 if (!pci_dev->skip_bus_pm && pci_power_manageable(pci_dev))
>                         pci_prepare_to_sleep(pci_dev);
> }

Well, the point is to also disable PTM for devices that were put into
D3 by their drivers.

In addition to D3cold, the check could cover D0 too, that is

if (pci_dev->current_state > D0 && pci_dev->current_state < PCI_D3cold)
         pci_disable_ptm(pci_dev);

> > Ah, thanks.  IMHO it's easier to analyze for correctness if we only
> > check the power state.
> >
> > Bjorn

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v5 2/2] PCI/PM: Fix pci_pm_suspend_noirq() to disable PTM
  2022-05-13 22:00           ` Jingar, Rajvi
  2022-05-14 17:12             ` Rafael J. Wysocki
@ 2022-05-16 20:09             ` Bjorn Helgaas
  2022-05-16 20:59               ` Rafael J. Wysocki
  1 sibling, 1 reply; 14+ messages in thread
From: Bjorn Helgaas @ 2022-05-16 20:09 UTC (permalink / raw)
  To: Jingar, Rajvi
  Cc: Rafael J. Wysocki, Wysocki, Rafael J, Bjorn Helgaas, David Box,
	Linux PCI, Linux Kernel Mailing List, Linux PM

On Fri, May 13, 2022 at 10:00:48PM +0000, Jingar, Rajvi wrote:
> 
> > -----Original Message-----
> > From: Bjorn Helgaas <helgaas@kernel.org>
> > Sent: Thursday, May 12, 2022 11:36 AM
> > To: Rafael J. Wysocki <rafael@kernel.org>
> > Cc: Jingar, Rajvi <rajvi.jingar@intel.com>; Wysocki, Rafael J
> > <rafael.j.wysocki@intel.com>; Bjorn Helgaas <bhelgaas@google.com>; David Box
> > <david.e.box@linux.intel.com>; Linux PCI <linux-pci@vger.kernel.org>; Linux
> > Kernel Mailing List <linux-kernel@vger.kernel.org>; Linux PM <linux-
> > pm@vger.kernel.org>
> > Subject: Re: [PATCH v5 2/2] PCI/PM: Fix pci_pm_suspend_noirq() to disable PTM
> > 
> > On Thu, May 12, 2022 at 07:52:36PM +0200, Rafael J. Wysocki wrote:
> > > On Thu, May 12, 2022 at 7:42 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > > > On Thu, May 12, 2022 at 03:49:18PM +0200, Rafael J. Wysocki wrote:
> > 
> > > > > Something like this should suffice IMV:
> > > > >
> > > > > if (!dev_state_saved || pci_dev->current_state != PCI_D3cold)
> > > > >
> > > > >         pci_disable_ptm(pci_dev);
> > > >
> > > > It makes sense to me that we needn't disable PTM if the device is in
> > > > D3cold.  But the "!dev_state_saved" condition depends on what the
> > > > driver did.  Why is that important?  Why should we not do the
> > > > following?
> > > >
> > > >   if (pci_dev->current_state != PCI_D3cold)
> > > >     pci_disable_ptm(pci_dev);
> > >
> > > We can do this too.  I thought we could skip the power state
> > > check if dev_state_saved was unset, because then we would know
> > > that the power state was not D3cold.  It probably isn't worth
> > > the hassle though.
> 
> We see issue with certain platforms where only checking if device
> power state in D3Cold is not enough and the !dev_state_saved check
> is needed when disabling PTM. Device like nvme is relying on ASPM,
> it stays in D0 but state is saved. Touching the config space wakes
> up the device which prevents the system from entering into low power
> state.

Correct me if I'm wrong: for NVMe devices, nvme_suspend() has already
saved state and put the device in some low-power state.  Disabling PTM
here is functionally OK but prevents a system low power state, so you
want to leave PTM enabled.

But I must be missing something because pci_prepare_to_sleep()
currently disables PTM for Root Ports.  If we leave PTM enabled on
NVMe but disable it on the Root Port above it, any PTM Request from
NVMe will cause an Unsupported Request error.

Disabling PTM must be coordinated across PTM Requesters and PTM
Responders.  That means the decision to disable cannot depend on
driver-specific things like whether the driver has saved state.

> Following would fix the issue:
> 
>  if (!pci_dev->state_save) {
>                 pci_save_state(pci_dev);
> 
>                pci_disable_ptm(pci_dev);
> 
>                 if (!pci_dev->skip_bus_pm && pci_power_manageable(pci_dev))
>                         pci_prepare_to_sleep(pci_dev);
> }
> 
> > Ah, thanks.  IMHO it's easier to analyze for correctness if we only
> > check the power state.
> > 
> > Bjorn

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v5 2/2] PCI/PM: Fix pci_pm_suspend_noirq() to disable PTM
  2022-05-16 20:09             ` Bjorn Helgaas
@ 2022-05-16 20:59               ` Rafael J. Wysocki
  2022-05-17 14:48                 ` Bjorn Helgaas
  0 siblings, 1 reply; 14+ messages in thread
From: Rafael J. Wysocki @ 2022-05-16 20:59 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Jingar, Rajvi, Rafael J. Wysocki, Wysocki, Rafael J,
	Bjorn Helgaas, David Box, Linux PCI, Linux Kernel Mailing List,
	Linux PM

On Mon, May 16, 2022 at 10:09 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
>
> On Fri, May 13, 2022 at 10:00:48PM +0000, Jingar, Rajvi wrote:
> >
> > > -----Original Message-----
> > > From: Bjorn Helgaas <helgaas@kernel.org>
> > > Sent: Thursday, May 12, 2022 11:36 AM
> > > To: Rafael J. Wysocki <rafael@kernel.org>
> > > Cc: Jingar, Rajvi <rajvi.jingar@intel.com>; Wysocki, Rafael J
> > > <rafael.j.wysocki@intel.com>; Bjorn Helgaas <bhelgaas@google.com>; David Box
> > > <david.e.box@linux.intel.com>; Linux PCI <linux-pci@vger.kernel.org>; Linux
> > > Kernel Mailing List <linux-kernel@vger.kernel.org>; Linux PM <linux-
> > > pm@vger.kernel.org>
> > > Subject: Re: [PATCH v5 2/2] PCI/PM: Fix pci_pm_suspend_noirq() to disable PTM
> > >
> > > On Thu, May 12, 2022 at 07:52:36PM +0200, Rafael J. Wysocki wrote:
> > > > On Thu, May 12, 2022 at 7:42 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > > > > On Thu, May 12, 2022 at 03:49:18PM +0200, Rafael J. Wysocki wrote:
> > >
> > > > > > Something like this should suffice IMV:
> > > > > >
> > > > > > if (!dev_state_saved || pci_dev->current_state != PCI_D3cold)
> > > > > >
> > > > > >         pci_disable_ptm(pci_dev);
> > > > >
> > > > > It makes sense to me that we needn't disable PTM if the device is in
> > > > > D3cold.  But the "!dev_state_saved" condition depends on what the
> > > > > driver did.  Why is that important?  Why should we not do the
> > > > > following?
> > > > >
> > > > >   if (pci_dev->current_state != PCI_D3cold)
> > > > >     pci_disable_ptm(pci_dev);
> > > >
> > > > We can do this too.  I thought we could skip the power state
> > > > check if dev_state_saved was unset, because then we would know
> > > > that the power state was not D3cold.  It probably isn't worth
> > > > the hassle though.
> >
> > We see issue with certain platforms where only checking if device
> > power state in D3Cold is not enough and the !dev_state_saved check
> > is needed when disabling PTM. Device like nvme is relying on ASPM,
> > it stays in D0 but state is saved. Touching the config space wakes
> > up the device which prevents the system from entering into low power
> > state.
>
> Correct me if I'm wrong: for NVMe devices, nvme_suspend() has already
> saved state and put the device in some low-power state.  Disabling PTM
> here is functionally OK but prevents a system low power state, so you
> want to leave PTM enabled.
>
> But I must be missing something because pci_prepare_to_sleep()
> currently disables PTM for Root Ports.  If we leave PTM enabled on
> NVMe but disable it on the Root Port above it, any PTM Request from
> NVMe will cause an Unsupported Request error.
>
> Disabling PTM must be coordinated across PTM Requesters and PTM
> Responders.  That means the decision to disable cannot depend on
> driver-specific things like whether the driver has saved state.

Setting state_saved generally informs pci_pm_suspend_noirq() that the
device has already been handled and it doesn't need to do anything to
it.

But you are right that PTM should be disabled on downstream devices as
well as on the ports that those devices are connected to and it can be
done even if the given device has already been handled, so the
state_saved value is technically irrelevant.

That's why I suggested to check if the power state is between D0 and
D3cold (exclusive) and only disable PTM if that is the case.  It is
pointless to disable PTM for devices in D3cold and it may be harmful
for devices that are left in D0.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v5 2/2] PCI/PM: Fix pci_pm_suspend_noirq() to disable PTM
  2022-05-16 20:59               ` Rafael J. Wysocki
@ 2022-05-17 14:48                 ` Bjorn Helgaas
  2022-05-17 14:54                   ` Rafael J. Wysocki
  0 siblings, 1 reply; 14+ messages in thread
From: Bjorn Helgaas @ 2022-05-17 14:48 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Jingar, Rajvi, Wysocki, Rafael J, Bjorn Helgaas, David Box,
	Linux PCI, Linux Kernel Mailing List, Linux PM

On Mon, May 16, 2022 at 10:59:32PM +0200, Rafael J. Wysocki wrote:
> On Mon, May 16, 2022 at 10:09 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > On Fri, May 13, 2022 at 10:00:48PM +0000, Jingar, Rajvi wrote:
> > > > -----Original Message-----
> > > > From: Bjorn Helgaas <helgaas@kernel.org>
> > > > Sent: Thursday, May 12, 2022 11:36 AM
> > > > To: Rafael J. Wysocki <rafael@kernel.org>
> > > > Cc: Jingar, Rajvi <rajvi.jingar@intel.com>; Wysocki, Rafael J
> > > > <rafael.j.wysocki@intel.com>; Bjorn Helgaas <bhelgaas@google.com>; David Box
> > > > <david.e.box@linux.intel.com>; Linux PCI <linux-pci@vger.kernel.org>; Linux
> > > > Kernel Mailing List <linux-kernel@vger.kernel.org>; Linux PM <linux-
> > > > pm@vger.kernel.org>
> > > > Subject: Re: [PATCH v5 2/2] PCI/PM: Fix pci_pm_suspend_noirq() to disable PTM
> > > >
> > > > On Thu, May 12, 2022 at 07:52:36PM +0200, Rafael J. Wysocki wrote:
> > > > > On Thu, May 12, 2022 at 7:42 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > > > > > On Thu, May 12, 2022 at 03:49:18PM +0200, Rafael J. Wysocki wrote:
> > > >
> > > > > > > Something like this should suffice IMV:
> > > > > > >
> > > > > > > if (!dev_state_saved || pci_dev->current_state != PCI_D3cold)
> > > > > > >
> > > > > > >         pci_disable_ptm(pci_dev);
> > > > > >
> > > > > > It makes sense to me that we needn't disable PTM if the device is in
> > > > > > D3cold.  But the "!dev_state_saved" condition depends on what the
> > > > > > driver did.  Why is that important?  Why should we not do the
> > > > > > following?
> > > > > >
> > > > > >   if (pci_dev->current_state != PCI_D3cold)
> > > > > >     pci_disable_ptm(pci_dev);
> > > > >
> > > > > We can do this too.  I thought we could skip the power state
> > > > > check if dev_state_saved was unset, because then we would know
> > > > > that the power state was not D3cold.  It probably isn't worth
> > > > > the hassle though.
> > >
> > > We see issue with certain platforms where only checking if device
> > > power state in D3Cold is not enough and the !dev_state_saved check
> > > is needed when disabling PTM. Device like nvme is relying on ASPM,
> > > it stays in D0 but state is saved. Touching the config space wakes
> > > up the device which prevents the system from entering into low power
> > > state.
> >
> > Correct me if I'm wrong: for NVMe devices, nvme_suspend() has already
> > saved state and put the device in some low-power state.  Disabling PTM
> > here is functionally OK but prevents a system low power state, so you
> > want to leave PTM enabled.
> >
> > But I must be missing something because pci_prepare_to_sleep()
> > currently disables PTM for Root Ports.  If we leave PTM enabled on
> > NVMe but disable it on the Root Port above it, any PTM Request from
> > NVMe will cause an Unsupported Request error.
> >
> > Disabling PTM must be coordinated across PTM Requesters and PTM
> > Responders.  That means the decision to disable cannot depend on
> > driver-specific things like whether the driver has saved state.
> 
> Setting state_saved generally informs pci_pm_suspend_noirq() that the
> device has already been handled and it doesn't need to do anything to
> it.
> 
> But you are right that PTM should be disabled on downstream devices as
> well as on the ports that those devices are connected to and it can be
> done even if the given device has already been handled, so the
> state_saved value is technically irrelevant.
> 
> That's why I suggested to check if the power state is between D0 and
> D3cold (exclusive) and only disable PTM if that is the case.  It is
> pointless to disable PTM for devices in D3cold and it may be harmful
> for devices that are left in D0.

"... it may be harmful for devices that are left in D0" -- I want to
understand this better.  It sounds like nvme_suspend() leaves the
device in some device-specific low-power flavor of D0, and subsequent
config accesses take it out of that low-power situation?  

If that's the case, it sounds a little brittle.  I don't think it's
obvious that "pci_dev->state_saved was set by the driver" means "no
config accesses allowed in pci_pm_suspend_noirq()."  And
pci_pm_suspend_noirq() calls quirks via pci_fixup_device(), which are
very likely to do config accesses.

Maybe PTM needs to be disabled earlier, e.g., in pci_pm_suspend()?  I
don't think PTM uses any interrupts, so there's probably no reason
interrupts need to be disabled before disabling PTM.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v5 2/2] PCI/PM: Fix pci_pm_suspend_noirq() to disable PTM
  2022-05-17 14:48                 ` Bjorn Helgaas
@ 2022-05-17 14:54                   ` Rafael J. Wysocki
  2022-05-17 18:05                     ` David E. Box
  0 siblings, 1 reply; 14+ messages in thread
From: Rafael J. Wysocki @ 2022-05-17 14:54 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Rafael J. Wysocki, Jingar, Rajvi, Wysocki, Rafael J,
	Bjorn Helgaas, David Box, Linux PCI, Linux Kernel Mailing List,
	Linux PM

On Tue, May 17, 2022 at 4:48 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
>
> On Mon, May 16, 2022 at 10:59:32PM +0200, Rafael J. Wysocki wrote:
> > On Mon, May 16, 2022 at 10:09 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > > On Fri, May 13, 2022 at 10:00:48PM +0000, Jingar, Rajvi wrote:
> > > > > -----Original Message-----
> > > > > From: Bjorn Helgaas <helgaas@kernel.org>
> > > > > Sent: Thursday, May 12, 2022 11:36 AM
> > > > > To: Rafael J. Wysocki <rafael@kernel.org>
> > > > > Cc: Jingar, Rajvi <rajvi.jingar@intel.com>; Wysocki, Rafael J
> > > > > <rafael.j.wysocki@intel.com>; Bjorn Helgaas <bhelgaas@google.com>; David Box
> > > > > <david.e.box@linux.intel.com>; Linux PCI <linux-pci@vger.kernel.org>; Linux
> > > > > Kernel Mailing List <linux-kernel@vger.kernel.org>; Linux PM <linux-
> > > > > pm@vger.kernel.org>
> > > > > Subject: Re: [PATCH v5 2/2] PCI/PM: Fix pci_pm_suspend_noirq() to disable PTM
> > > > >
> > > > > On Thu, May 12, 2022 at 07:52:36PM +0200, Rafael J. Wysocki wrote:
> > > > > > On Thu, May 12, 2022 at 7:42 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > > > > > > On Thu, May 12, 2022 at 03:49:18PM +0200, Rafael J. Wysocki wrote:
> > > > >
> > > > > > > > Something like this should suffice IMV:
> > > > > > > >
> > > > > > > > if (!dev_state_saved || pci_dev->current_state != PCI_D3cold)
> > > > > > > >
> > > > > > > >         pci_disable_ptm(pci_dev);
> > > > > > >
> > > > > > > It makes sense to me that we needn't disable PTM if the device is in
> > > > > > > D3cold.  But the "!dev_state_saved" condition depends on what the
> > > > > > > driver did.  Why is that important?  Why should we not do the
> > > > > > > following?
> > > > > > >
> > > > > > >   if (pci_dev->current_state != PCI_D3cold)
> > > > > > >     pci_disable_ptm(pci_dev);
> > > > > >
> > > > > > We can do this too.  I thought we could skip the power state
> > > > > > check if dev_state_saved was unset, because then we would know
> > > > > > that the power state was not D3cold.  It probably isn't worth
> > > > > > the hassle though.
> > > >
> > > > We see issue with certain platforms where only checking if device
> > > > power state in D3Cold is not enough and the !dev_state_saved check
> > > > is needed when disabling PTM. Device like nvme is relying on ASPM,
> > > > it stays in D0 but state is saved. Touching the config space wakes
> > > > up the device which prevents the system from entering into low power
> > > > state.
> > >
> > > Correct me if I'm wrong: for NVMe devices, nvme_suspend() has already
> > > saved state and put the device in some low-power state.  Disabling PTM
> > > here is functionally OK but prevents a system low power state, so you
> > > want to leave PTM enabled.
> > >
> > > But I must be missing something because pci_prepare_to_sleep()
> > > currently disables PTM for Root Ports.  If we leave PTM enabled on
> > > NVMe but disable it on the Root Port above it, any PTM Request from
> > > NVMe will cause an Unsupported Request error.
> > >
> > > Disabling PTM must be coordinated across PTM Requesters and PTM
> > > Responders.  That means the decision to disable cannot depend on
> > > driver-specific things like whether the driver has saved state.
> >
> > Setting state_saved generally informs pci_pm_suspend_noirq() that the
> > device has already been handled and it doesn't need to do anything to
> > it.
> >
> > But you are right that PTM should be disabled on downstream devices as
> > well as on the ports that those devices are connected to and it can be
> > done even if the given device has already been handled, so the
> > state_saved value is technically irrelevant.
> >
> > That's why I suggested to check if the power state is between D0 and
> > D3cold (exclusive) and only disable PTM if that is the case.  It is
> > pointless to disable PTM for devices in D3cold and it may be harmful
> > for devices that are left in D0.
>
> "... it may be harmful for devices that are left in D0" -- I want to
> understand this better.  It sounds like nvme_suspend() leaves the
> device in some device-specific low-power flavor of D0, and subsequent
> config accesses take it out of that low-power situation?

That's my understanding of it.

> If that's the case, it sounds a little brittle.  I don't think it's
> obvious that "pci_dev->state_saved was set by the driver" means "no
> config accesses allowed in pci_pm_suspend_noirq()."

Well, yes and no.  The device may be in D3cold then, so
pci_pm_suspend_noirq() should at least check that before accessing its
config space.

> And pci_pm_suspend_noirq() calls quirks via pci_fixup_device(), which are
> very likely to do config accesses.
>
> Maybe PTM needs to be disabled earlier, e.g., in pci_pm_suspend()?  I
> don't think PTM uses any interrupts, so there's probably no reason
> interrupts need to be disabled before disabling PTM.

That certainly is worth investigation.  For one, I don't see any
obvious downsides of doing so.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v5 2/2] PCI/PM: Fix pci_pm_suspend_noirq() to disable PTM
  2022-05-17 14:54                   ` Rafael J. Wysocki
@ 2022-05-17 18:05                     ` David E. Box
  2022-05-19 19:01                       ` Rafael J. Wysocki
  0 siblings, 1 reply; 14+ messages in thread
From: David E. Box @ 2022-05-17 18:05 UTC (permalink / raw)
  To: Rafael J. Wysocki, Bjorn Helgaas
  Cc: Jingar, Rajvi, Wysocki, Rafael J, Bjorn Helgaas, Linux PCI,
	Linux Kernel Mailing List, Linux PM

On Tue, 2022-05-17 at 16:54 +0200, Rafael J. Wysocki wrote:
> On Tue, May 17, 2022 at 4:48 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > 
> > On Mon, May 16, 2022 at 10:59:32PM +0200, Rafael J. Wysocki wrote:
> > > On Mon, May 16, 2022 at 10:09 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > > > On Fri, May 13, 2022 at 10:00:48PM +0000, Jingar, Rajvi wrote:
> > > > > > -----Original Message-----
> > > > > > From: Bjorn Helgaas <helgaas@kernel.org>
> > > > > > Sent: Thursday, May 12, 2022 11:36 AM
> > > > > > To: Rafael J. Wysocki <rafael@kernel.org>
> > > > > > Cc: Jingar, Rajvi <rajvi.jingar@intel.com>; Wysocki, Rafael J
> > > > > > <rafael.j.wysocki@intel.com>; Bjorn Helgaas <bhelgaas@google.com>;
> > > > > > David Box
> > > > > > <david.e.box@linux.intel.com>; Linux PCI <linux-pci@vger.kernel.org>;
> > > > > > Linux
> > > > > > Kernel Mailing List <linux-kernel@vger.kernel.org>; Linux PM <linux-
> > > > > > pm@vger.kernel.org>
> > > > > > Subject: Re: [PATCH v5 2/2] PCI/PM: Fix pci_pm_suspend_noirq() to
> > > > > > disable PTM
> > > > > > 
> > > > > > On Thu, May 12, 2022 at 07:52:36PM +0200, Rafael J. Wysocki wrote:
> > > > > > > On Thu, May 12, 2022 at 7:42 PM Bjorn Helgaas <helgaas@kernel.org>
> > > > > > > wrote:
> > > > > > > > On Thu, May 12, 2022 at 03:49:18PM +0200, Rafael J. Wysocki wrote:
> > > > > > 
> > > > > > > > > Something like this should suffice IMV:
> > > > > > > > > 
> > > > > > > > > if (!dev_state_saved || pci_dev->current_state != PCI_D3cold)
> > > > > > > > > 
> > > > > > > > >         pci_disable_ptm(pci_dev);
> > > > > > > > 
> > > > > > > > It makes sense to me that we needn't disable PTM if the device is
> > > > > > > > in
> > > > > > > > D3cold.  But the "!dev_state_saved" condition depends on what the
> > > > > > > > driver did.  Why is that important?  Why should we not do the
> > > > > > > > following?
> > > > > > > > 
> > > > > > > >   if (pci_dev->current_state != PCI_D3cold)
> > > > > > > >     pci_disable_ptm(pci_dev);
> > > > > > > 
> > > > > > > We can do this too.  I thought we could skip the power state
> > > > > > > check if dev_state_saved was unset, because then we would know
> > > > > > > that the power state was not D3cold.  It probably isn't worth
> > > > > > > the hassle though.
> > > > > 
> > > > > We see issue with certain platforms where only checking if device
> > > > > power state in D3Cold is not enough and the !dev_state_saved check
> > > > > is needed when disabling PTM. Device like nvme is relying on ASPM,
> > > > > it stays in D0 but state is saved. Touching the config space wakes
> > > > > up the device which prevents the system from entering into low power
> > > > > state.
> > > > 
> > > > Correct me if I'm wrong: for NVMe devices, nvme_suspend() has already
> > > > saved state and put the device in some low-power state.  Disabling PTM
> > > > here is functionally OK but prevents a system low power state, so you
> > > > want to leave PTM enabled.
> > > > 
> > > > But I must be missing something because pci_prepare_to_sleep()
> > > > currently disables PTM for Root Ports.  If we leave PTM enabled on
> > > > NVMe but disable it on the Root Port above it, any PTM Request from
> > > > NVMe will cause an Unsupported Request error.
> > > > 
> > > > Disabling PTM must be coordinated across PTM Requesters and PTM
> > > > Responders.  That means the decision to disable cannot depend on
> > > > driver-specific things like whether the driver has saved state.
> > > 
> > > Setting state_saved generally informs pci_pm_suspend_noirq() that the
> > > device has already been handled and it doesn't need to do anything to
> > > it.
> > > 
> > > But you are right that PTM should be disabled on downstream devices as
> > > well as on the ports that those devices are connected to and it can be
> > > done even if the given device has already been handled, so the
> > > state_saved value is technically irrelevant.
> > > 
> > > That's why I suggested to check if the power state is between D0 and
> > > D3cold (exclusive) and only disable PTM if that is the case.  It is
> > > pointless to disable PTM for devices in D3cold and it may be harmful
> > > for devices that are left in D0.
> > 
> > "... it may be harmful for devices that are left in D0" -- I want to
> > understand this better.  It sounds like nvme_suspend() leaves the
> > device in some device-specific low-power flavor of D0, and subsequent
> > config accesses take it out of that low-power situation?
> 

This is exactly what we see. It's not all machines, but in our lab we've seen in
it on 3 production systems out of about 20. And they were all different
generations, a 7th, 8th, and 10th gen.

nvme_suspend is relying on NVMe APST / PCIe ASPM to put the device in a low
power state. The link state will be L1 or deeper while the device remains in D0.

https://nvmexpress.org/resources/nvm-express-technology-features/nvme-technology-power-features/


> That's my understanding of it.
> 
> > If that's the case, it sounds a little brittle.  I don't think it's
> > obvious that "pci_dev->state_saved was set by the driver" means "no
> > config accesses allowed in pci_pm_suspend_noirq()."
> 
> Well, yes and no.  The device may be in D3cold then, so
> pci_pm_suspend_noirq() should at least check that before accessing its
> config space.
> 
> > And pci_pm_suspend_noirq() calls quirks via pci_fixup_device(), which are
> > very likely to do config accesses.
> > 
> > Maybe PTM needs to be disabled earlier, e.g., in pci_pm_suspend()?  I
> > don't think PTM uses any interrupts, so there's probably no reason
> > interrupts need to be disabled before disabling PTM.
> 
> That certainly is worth investigation.  For one, I don't see any
> obvious downsides of doing so.

We will look at this.

David



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v5 2/2] PCI/PM: Fix pci_pm_suspend_noirq() to disable PTM
  2022-05-17 18:05                     ` David E. Box
@ 2022-05-19 19:01                       ` Rafael J. Wysocki
  2022-05-23 18:32                         ` Bjorn Helgaas
  0 siblings, 1 reply; 14+ messages in thread
From: Rafael J. Wysocki @ 2022-05-19 19:01 UTC (permalink / raw)
  To: David Box, Bjorn Helgaas
  Cc: Rafael J. Wysocki, Jingar, Rajvi, Wysocki, Rafael J,
	Bjorn Helgaas, Linux PCI, Linux Kernel Mailing List, Linux PM

On Tue, May 17, 2022 at 8:05 PM David E. Box
<david.e.box@linux.intel.com> wrote:
>
> On Tue, 2022-05-17 at 16:54 +0200, Rafael J. Wysocki wrote:
> > On Tue, May 17, 2022 at 4:48 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > >
> > > On Mon, May 16, 2022 at 10:59:32PM +0200, Rafael J. Wysocki wrote:
> > > > On Mon, May 16, 2022 at 10:09 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > > > > On Fri, May 13, 2022 at 10:00:48PM +0000, Jingar, Rajvi wrote:
> > > > > > > -----Original Message-----
> > > > > > > From: Bjorn Helgaas <helgaas@kernel.org>
> > > > > > > Sent: Thursday, May 12, 2022 11:36 AM
> > > > > > > To: Rafael J. Wysocki <rafael@kernel.org>
> > > > > > > Cc: Jingar, Rajvi <rajvi.jingar@intel.com>; Wysocki, Rafael J
> > > > > > > <rafael.j.wysocki@intel.com>; Bjorn Helgaas <bhelgaas@google.com>;
> > > > > > > David Box
> > > > > > > <david.e.box@linux.intel.com>; Linux PCI <linux-pci@vger.kernel.org>;
> > > > > > > Linux
> > > > > > > Kernel Mailing List <linux-kernel@vger.kernel.org>; Linux PM <linux-
> > > > > > > pm@vger.kernel.org>
> > > > > > > Subject: Re: [PATCH v5 2/2] PCI/PM: Fix pci_pm_suspend_noirq() to
> > > > > > > disable PTM
> > > > > > >
> > > > > > > On Thu, May 12, 2022 at 07:52:36PM +0200, Rafael J. Wysocki wrote:
> > > > > > > > On Thu, May 12, 2022 at 7:42 PM Bjorn Helgaas <helgaas@kernel.org>
> > > > > > > > wrote:
> > > > > > > > > On Thu, May 12, 2022 at 03:49:18PM +0200, Rafael J. Wysocki wrote:
> > > > > > >
> > > > > > > > > > Something like this should suffice IMV:
> > > > > > > > > >
> > > > > > > > > > if (!dev_state_saved || pci_dev->current_state != PCI_D3cold)
> > > > > > > > > >
> > > > > > > > > >         pci_disable_ptm(pci_dev);
> > > > > > > > >
> > > > > > > > > It makes sense to me that we needn't disable PTM if the device is
> > > > > > > > > in
> > > > > > > > > D3cold.  But the "!dev_state_saved" condition depends on what the
> > > > > > > > > driver did.  Why is that important?  Why should we not do the
> > > > > > > > > following?
> > > > > > > > >
> > > > > > > > >   if (pci_dev->current_state != PCI_D3cold)
> > > > > > > > >     pci_disable_ptm(pci_dev);
> > > > > > > >
> > > > > > > > We can do this too.  I thought we could skip the power state
> > > > > > > > check if dev_state_saved was unset, because then we would know
> > > > > > > > that the power state was not D3cold.  It probably isn't worth
> > > > > > > > the hassle though.
> > > > > >
> > > > > > We see issue with certain platforms where only checking if device
> > > > > > power state in D3Cold is not enough and the !dev_state_saved check
> > > > > > is needed when disabling PTM. Device like nvme is relying on ASPM,
> > > > > > it stays in D0 but state is saved. Touching the config space wakes
> > > > > > up the device which prevents the system from entering into low power
> > > > > > state.
> > > > >
> > > > > Correct me if I'm wrong: for NVMe devices, nvme_suspend() has already
> > > > > saved state and put the device in some low-power state.  Disabling PTM
> > > > > here is functionally OK but prevents a system low power state, so you
> > > > > want to leave PTM enabled.
> > > > >
> > > > > But I must be missing something because pci_prepare_to_sleep()
> > > > > currently disables PTM for Root Ports.  If we leave PTM enabled on
> > > > > NVMe but disable it on the Root Port above it, any PTM Request from
> > > > > NVMe will cause an Unsupported Request error.
> > > > >
> > > > > Disabling PTM must be coordinated across PTM Requesters and PTM
> > > > > Responders.  That means the decision to disable cannot depend on
> > > > > driver-specific things like whether the driver has saved state.
> > > >
> > > > Setting state_saved generally informs pci_pm_suspend_noirq() that the
> > > > device has already been handled and it doesn't need to do anything to
> > > > it.
> > > >
> > > > But you are right that PTM should be disabled on downstream devices as
> > > > well as on the ports that those devices are connected to and it can be
> > > > done even if the given device has already been handled, so the
> > > > state_saved value is technically irrelevant.
> > > >
> > > > That's why I suggested to check if the power state is between D0 and
> > > > D3cold (exclusive) and only disable PTM if that is the case.  It is
> > > > pointless to disable PTM for devices in D3cold and it may be harmful
> > > > for devices that are left in D0.
> > >
> > > "... it may be harmful for devices that are left in D0" -- I want to
> > > understand this better.  It sounds like nvme_suspend() leaves the
> > > device in some device-specific low-power flavor of D0, and subsequent
> > > config accesses take it out of that low-power situation?
> >
>
> This is exactly what we see. It's not all machines, but in our lab we've seen in
> it on 3 production systems out of about 20. And they were all different
> generations, a 7th, 8th, and 10th gen.
>
> nvme_suspend is relying on NVMe APST / PCIe ASPM to put the device in a low
> power state. The link state will be L1 or deeper while the device remains in D0.
>
> https://nvmexpress.org/resources/nvm-express-technology-features/nvme-technology-power-features/
>
>
> > That's my understanding of it.
> >
> > > If that's the case, it sounds a little brittle.  I don't think it's
> > > obvious that "pci_dev->state_saved was set by the driver" means "no
> > > config accesses allowed in pci_pm_suspend_noirq()."
> >
> > Well, yes and no.  The device may be in D3cold then, so
> > pci_pm_suspend_noirq() should at least check that before accessing its
> > config space.
> >
> > > And pci_pm_suspend_noirq() calls quirks via pci_fixup_device(), which are
> > > very likely to do config accesses.
> > >
> > > Maybe PTM needs to be disabled earlier, e.g., in pci_pm_suspend()?  I
> > > don't think PTM uses any interrupts, so there's probably no reason
> > > interrupts need to be disabled before disabling PTM.
> >
> > That certainly is worth investigation.  For one, I don't see any
> > obvious downsides of doing so.
>
> We will look at this.

Appreciated.

In the meantime, I think that it would make sense to pick up the first
patch in this series which is a good cleanup regardless.

Bjorn, could you do that, please?

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v5 2/2] PCI/PM: Fix pci_pm_suspend_noirq() to disable PTM
  2022-05-19 19:01                       ` Rafael J. Wysocki
@ 2022-05-23 18:32                         ` Bjorn Helgaas
  2022-05-23 19:02                           ` Rafael J. Wysocki
  0 siblings, 1 reply; 14+ messages in thread
From: Bjorn Helgaas @ 2022-05-23 18:32 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: David Box, Jingar, Rajvi, Wysocki, Rafael J, Bjorn Helgaas,
	Linux PCI, Linux Kernel Mailing List, Linux PM

On Thu, May 19, 2022 at 09:01:27PM +0200, Rafael J. Wysocki wrote:
> ...

> In the meantime, I think that it would make sense to pick up the first
> patch in this series which is a good cleanup regardless.
> 
> Bjorn, could you do that, please?

As far as I know, this series has never actually made it to
linux-pci@vger.kernel.org, so I haven't seen the 1/2 patch.  This
query:

  https://lore.kernel.org/linux-pci/?q=f%3Arajvi.jingar

finds only a couple responses.  I did mention this to Rajvi, but
haven't heard anything yet.

Bjorn

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v5 2/2] PCI/PM: Fix pci_pm_suspend_noirq() to disable PTM
  2022-05-23 18:32                         ` Bjorn Helgaas
@ 2022-05-23 19:02                           ` Rafael J. Wysocki
  0 siblings, 0 replies; 14+ messages in thread
From: Rafael J. Wysocki @ 2022-05-23 19:02 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Rafael J. Wysocki, David Box, Jingar, Rajvi, Wysocki, Rafael J,
	Bjorn Helgaas, Linux PCI, Linux Kernel Mailing List, Linux PM

On Mon, May 23, 2022 at 8:32 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
>
> On Thu, May 19, 2022 at 09:01:27PM +0200, Rafael J. Wysocki wrote:
> > ...
>
> > In the meantime, I think that it would make sense to pick up the first
> > patch in this series which is a good cleanup regardless.
> >
> > Bjorn, could you do that, please?
>
> As far as I know, this series has never actually made it to
> linux-pci@vger.kernel.org, so I haven't seen the 1/2 patch.  This
> query:
>
>   https://lore.kernel.org/linux-pci/?q=f%3Arajvi.jingar
>
> finds only a couple responses.  I did mention this to Rajvi, but
> haven't heard anything yet.

OK, I'll resubmit that patch for him.

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2022-05-23 19:24 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <20220429210538.2270472-1-rajvi.jingar@intel.com>
     [not found] ` <20220429210538.2270472-2-rajvi.jingar@intel.com>
2022-05-12 13:49   ` [PATCH v5 2/2] PCI/PM: Fix pci_pm_suspend_noirq() to disable PTM Rafael J. Wysocki
2022-05-12 17:42     ` Bjorn Helgaas
2022-05-12 17:52       ` Rafael J. Wysocki
2022-05-12 18:35         ` Bjorn Helgaas
2022-05-13 22:00           ` Jingar, Rajvi
2022-05-14 17:12             ` Rafael J. Wysocki
2022-05-16 20:09             ` Bjorn Helgaas
2022-05-16 20:59               ` Rafael J. Wysocki
2022-05-17 14:48                 ` Bjorn Helgaas
2022-05-17 14:54                   ` Rafael J. Wysocki
2022-05-17 18:05                     ` David E. Box
2022-05-19 19:01                       ` Rafael J. Wysocki
2022-05-23 18:32                         ` Bjorn Helgaas
2022-05-23 19:02                           ` Rafael J. Wysocki

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).