[PATCH v2 0/3] PCI / ACPI: Handle sibling devices sharing power resources

* [PATCH v2 0/3] PCI / ACPI: Handle sibling devices sharing power resources
@ 2019-06-18 16:18 Mika Westerberg
  2019-06-18 16:18 ` [PATCH v2 1/3] PCI / ACPI: Use cached ACPI device state to get PCI device power state Mika Westerberg
                   ` (3 more replies)
  0 siblings, 4 replies; 22+ messages in thread
From: Mika Westerberg @ 2019-06-18 16:18 UTC (permalink / raw)
  To: Rafael J. Wysocki, Bjorn Helgaas
  Cc: Len Brown, Lukas Wunner, Keith Busch, Alex Williamson,
	Alexandru Gagniuc, Mika Westerberg, linux-acpi, linux-pci

Hi all,

Based on the discussion of the patch series I sent previously [1] to deal
with sibling devices sharing ACPI power resources, I prepared a reworked
version that addresses the comments I got.

To summarize, in Intel Ice Lake the Thunderbolt controller, PCIe root ports
and xHCI all share power resources. When they are all in D3hot, the power
resources (returned by _PR3) can be turned off, powering off the whole
block. However, there are two issues around this.

Firstly, the PCI core sets the device power state by asking what the real
ACPI power state is. As a result, all but the last device sharing the
power resources are recorded as being in D3hot when the power resources
are turned off. This causes issues if the user runs, for example, 'lspci',
because the device is really in D3cold, so what the user gets back is all
ones (0xffffffff).

Secondly, if any of the devices is runtime resumed, the power resources are
turned on, bringing all other devices sharing the resources to
D0uninitialized and losing their wakeup configuration.

This series aims to fix the two issues by:

  1. Using the ACPI cached power state when PCI devices are transitioned
     into low power states instead of reading back the "real" power state.

  2. Introducing the concept of "_PR0 dependent devices" that get runtime
     resumed whenever their power resource (which they might share with
     other sibling devices) gets turned on.
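
The second point can be illustrated with a small user-space model. This is
only a sketch of the idea, not the code from the series: struct shared_res,
struct dep_dev and both helpers are hypothetical names, and "resumed" merely
stands in for asking the device to runtime resume.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_DEPS 8

/* Hypothetical model of a device that depends on a shared power resource. */
struct dep_dev {
	const char *name;
	bool resumed;	/* set once the device has been asked to resume */
};

/* Hypothetical model of a power resource and its registered dependents. */
struct shared_res {
	bool on;
	struct dep_dev *deps[MAX_DEPS];
	size_t ndeps;
};

/* Register dev as a dependent of res; returns false if the table is full. */
static bool res_add_dependent(struct shared_res *res, struct dep_dev *dev)
{
	assert(res != NULL && dev != NULL);
	if (res->ndeps >= MAX_DEPS)
		return false;
	res->deps[res->ndeps++] = dev;
	return true;
}

/*
 * Turn the resource on. Every registered dependent is resumed as well, so
 * that no sibling is left behind in D0uninitialized without its driver
 * getting a chance to reprogram it.
 */
static void res_turn_on(struct shared_res *res)
{
	size_t i;

	assert(res != NULL);
	if (res->on)
		return;	/* already on, nothing changes for the dependents */
	res->on = true;
	for (i = 0; i < res->ndeps; i++)
		res->deps[i]->resumed = true;
}
```

In this model a caller would register RP0, RP1 and NHI against the same
resource; turning the resource on for any one of them then marks all three
as resumed.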

The series is based on the idea of Rafael J. Wysocki <rafael@kernel.org>.

[1] https://www.spinics.net/lists/linux-pci/msg83583.html

Mika Westerberg (3):
  PCI / ACPI: Use cached ACPI device state to get PCI device power state
  ACPI / PM: Introduce concept of a _PR0 dependent device
  PCI / ACPI: Add _PR0 dependent devices

 drivers/acpi/power.c    | 139 ++++++++++++++++++++++++++++++++++++++++
 drivers/pci/pci-acpi.c  |   5 +-
 include/acpi/acpi_bus.h |   4 ++
 3 files changed, 147 insertions(+), 1 deletion(-)

-- 
2.20.1

* [PATCH v2 1/3] PCI / ACPI: Use cached ACPI device state to get PCI device power state
  2019-06-18 16:18 [PATCH v2 0/3] PCI / ACPI: Handle sibling devices sharing power resources Mika Westerberg
@ 2019-06-18 16:18 ` Mika Westerberg
  2019-06-19 21:28   ` Bjorn Helgaas
  2019-06-21 11:56   ` Rafael J. Wysocki
  2019-06-18 16:18 ` [PATCH v2 2/3] ACPI / PM: Introduce concept of a _PR0 dependent device Mika Westerberg
                   ` (2 subsequent siblings)
  3 siblings, 2 replies; 22+ messages in thread
From: Mika Westerberg @ 2019-06-18 16:18 UTC (permalink / raw)
  To: Rafael J. Wysocki, Bjorn Helgaas
  Cc: Len Brown, Lukas Wunner, Keith Busch, Alex Williamson,
	Alexandru Gagniuc, Mika Westerberg, linux-acpi, linux-pci

Intel Ice Lake has an integrated Thunderbolt controller which means that
the PCIe topology is extended directly from the two root ports (RP0 and
RP1). Power management is handled by ACPI power resources that are
shared between the root ports, Thunderbolt controller (NHI) and xHCI
controller.

The topology with the power resources (marked with []) looks like:

  Host bridge
    |
    +- RP0 ---\
    +- RP1 ---|--+--> [TBT]
    +- NHI --/   |
    |            |
    |            v
    +- xHCI --> [D3C]

Here TBT and D3C are the shared ACPI power resources. ACPI _PR3() method
returns either TBT or D3C or both.

Say we runtime suspend first the root ports RP0 and RP1, then NHI. Since
the TBT power resource is still on when the root ports are runtime
suspended, their dev->current_state is set to D3hot. When NHI is runtime
suspended, TBT is finally turned off, but the state of the root ports
remains D3hot.

If the user now runs lspci for instance, the result is all 1's like in
the below output (07.0 is the first root port, RP0):

00:07.0 PCI bridge: Intel Corporation Device 8a1d (rev ff) (prog-if ff)
    !!! Unknown header type 7f
    Kernel driver in use: pcieport

I short the hardware state is not in sync with the software state
anymore. The exact same thing happens with the PME polling thread which
ends up bringing the root ports back into D0 after they are runtime
suspended.

ACPI core already sets the device state to be D3cold when it drops its
references to the power resources returned by _PR3 even if these power
resources are still physically on (other devices still reference them).
However, in PCI core we call acpi_device_get_power() to figure out the
power state and that returns the "real" power state based on the state
of its power resources.

To make it work with the shared power resources, modify
acpi_pci_get_power_state() so that it reads the ACPI device power state
that was cached by the ACPI core. This makes the PCI device power state
match the ACPI device power state regardless of the state of the shared
power resources that may still be on at this point.

Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
---
 drivers/pci/pci-acpi.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/pci/pci-acpi.c b/drivers/pci/pci-acpi.c
index 1897847ceb0c..b782acac26c5 100644
--- a/drivers/pci/pci-acpi.c
+++ b/drivers/pci/pci-acpi.c
@@ -685,7 +685,8 @@ static pci_power_t acpi_pci_get_power_state(struct pci_dev *dev)
 	if (!adev || !acpi_device_power_manageable(adev))
 		return PCI_UNKNOWN;

-	if (acpi_device_get_power(adev, &state) || state == ACPI_STATE_UNKNOWN)
+	state = adev->power.state;
+	if (state == ACPI_STATE_UNKNOWN)
 		return PCI_UNKNOWN;

 	return state_conv[state];
-- 
2.20.1
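
The mismatch described in the changelog above can be reproduced in a small
user-space model. This is a sketch under stated assumptions, not kernel
code: struct power_res, struct model_dev and the helper names are made up
for illustration, and a plain reference count stands in for however the
ACPI core tracks the users of a power resource.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model; none of these names are kernel APIs. */
enum dstate { D0, D3HOT, D3COLD };

struct power_res {
	int refcount;	/* number of devices that still need the resource */
};

struct model_dev {
	struct power_res *res;	/* shared resource listed by _PR3 */
	bool holds_ref;		/* does this device still reference res? */
	enum dstate cached;	/* state recorded at the last transition */
};

/* Start a device in D0 holding a reference to the shared resource. */
static void dev_init(struct model_dev *dev, struct power_res *res)
{
	dev->res = res;
	dev->holds_ref = true;
	dev->cached = D0;
	res->refcount++;
}

/*
 * Drop the device's reference and record D3cold as the cached state, even
 * if other devices keep the resource physically on.
 */
static void dev_suspend(struct model_dev *dev)
{
	assert(dev->holds_ref && dev->res->refcount > 0);
	dev->holds_ref = false;
	dev->res->refcount--;
	dev->cached = D3COLD;
}

/* "Real" state, derived from whether the shared resource is still on. */
static enum dstate dev_real_state(const struct model_dev *dev)
{
	if (dev->holds_ref)
		return D0;
	return dev->res->refcount > 0 ? D3HOT : D3COLD;
}
```

With RP0 and NHI sharing one resource, suspending RP0 first leaves its
"real" state at D3hot while the cached state is already D3cold. Once NHI is
suspended too, both views agree on D3cold, but a value sampled earlier from
the "real" state is stale by then.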

* Re: [PATCH v2 1/3] PCI / ACPI: Use cached ACPI device state to get PCI device power state
  2019-06-18 16:18 ` [PATCH v2 1/3] PCI / ACPI: Use cached ACPI device state to get PCI device power state Mika Westerberg
@ 2019-06-19 21:28   ` Bjorn Helgaas
  2019-06-20  8:27     ` Mika Westerberg
  2019-06-21 11:56   ` Rafael J. Wysocki
  1 sibling, 1 reply; 22+ messages in thread
From: Bjorn Helgaas @ 2019-06-19 21:28 UTC (permalink / raw)
  To: Mika Westerberg
  Cc: Rafael J. Wysocki, Len Brown, Lukas Wunner, Keith Busch,
	Alex Williamson, Alexandru Gagniuc, linux-acpi, linux-pci

On Tue, Jun 18, 2019 at 07:18:56PM +0300, Mika Westerberg wrote:
> Intel Ice Lake has an integrated Thunderbolt controller which means that
> the PCIe topology is extended directly from the two root ports (RP0 and
> RP1).

A PCIe topology is always extended directly from root ports,
regardless of whether a Thunderbolt controller is integrated, so I
guess I'm missing the point you're making.  It doesn't sound like this
is anything specific to Thunderbolt?

> Power management is handled by ACPI power resources that are
> shared between the root ports, Thunderbolt controller (NHI) and xHCI
> controller.
> 
> The topology with the power resources (marked with []) looks like:
> 
>   Host bridge
>     |
>     +- RP0 ---\
>     +- RP1 ---|--+--> [TBT]
>     +- NHI --/   |
>     |            |
>     |            v
>     +- xHCI --> [D3C]
> 
> Here TBT and D3C are the shared ACPI power resources. ACPI _PR3() method
> returns either TBT or D3C or both.
> 
> Say we runtime suspend first the root ports RP0 and RP1, then NHI. Now
> since the TBT power resource is still on when the root ports are runtime
> suspended their dev->current_state is set to D3hot. When NHI is runtime
> suspended TBT is finally turned off but state of the root ports remain
> to be D3hot.
> 
> If the user now runs lspci for instance, the result is all 1's like in
> the below output (07.0 is the first root port, RP0):
> 
> 00:07.0 PCI bridge: Intel Corporation Device 8a1d (rev ff) (prog-if ff)
>     !!! Unknown header type 7f
>     Kernel driver in use: pcieport
> 
> I short the hardware state is not in sync with the software state
> anymore. The exact same thing happens with the PME polling thread which
> ends up bringing the root ports back into D0 after they are runtime
> suspended.

s/I /In /

* Re: [PATCH v2 1/3] PCI / ACPI: Use cached ACPI device state to get PCI device power state
  2019-06-19 21:28   ` Bjorn Helgaas
@ 2019-06-20  8:27     ` Mika Westerberg
  2019-06-20 13:16       ` Bjorn Helgaas
  0 siblings, 1 reply; 22+ messages in thread
From: Mika Westerberg @ 2019-06-20  8:27 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Rafael J. Wysocki, Len Brown, Lukas Wunner, Keith Busch,
	Alex Williamson, Alexandru Gagniuc, linux-acpi, linux-pci

On Wed, Jun 19, 2019 at 04:28:01PM -0500, Bjorn Helgaas wrote:
> On Tue, Jun 18, 2019 at 07:18:56PM +0300, Mika Westerberg wrote:
> > Intel Ice Lake has an integrated Thunderbolt controller which means that
> > the PCIe topology is extended directly from the two root ports (RP0 and
> > RP1).
> 
> A PCIe topology is always extended directly from root ports,
> regardless of whether a Thunderbolt controller is integrated, so I
> guess I'm missing the point you're making.  It doesn't sound like this
> is anything specific to Thunderbolt?

The point I'm trying to make here is to explain why this is a problem now
and was not with the previous discrete controllers. With those there
was only a single ACPI power resource for the root port and the
Thunderbolt host router was connected to that root port. The PCIe hierarchy
was extended through downstream ports (not root ports) of that
controller (which includes a PCIe switch).

Now the thing is part of the SoC so power management is different and
causes problems in Linux.

> > Power management is handled by ACPI power resources that are
> > shared between the root ports, Thunderbolt controller (NHI) and xHCI
> > controller.
> > 
> > The topology with the power resources (marked with []) looks like:
> > 
> >   Host bridge
> >     |
> >     +- RP0 ---\
> >     +- RP1 ---|--+--> [TBT]
> >     +- NHI --/   |
> >     |            |
> >     |            v
> >     +- xHCI --> [D3C]
> > 
> > Here TBT and D3C are the shared ACPI power resources. ACPI _PR3() method
> > returns either TBT or D3C or both.
> > 
> > Say we runtime suspend first the root ports RP0 and RP1, then NHI. Now
> > since the TBT power resource is still on when the root ports are runtime
> > suspended their dev->current_state is set to D3hot. When NHI is runtime
> > suspended TBT is finally turned off but state of the root ports remain
> > to be D3hot.
> > 
> > If the user now runs lspci for instance, the result is all 1's like in
> > the below output (07.0 is the first root port, RP0):
> > 
> > 00:07.0 PCI bridge: Intel Corporation Device 8a1d (rev ff) (prog-if ff)
> >     !!! Unknown header type 7f
> >     Kernel driver in use: pcieport
> > 
> > I short the hardware state is not in sync with the software state
> > anymore. The exact same thing happens with the PME polling thread which
> > ends up bringing the root ports back into D0 after they are runtime
> > suspended.
> 
> s/I /In /

Thanks, I'll fix it.

* Re: [PATCH v2 1/3] PCI / ACPI: Use cached ACPI device state to get PCI device power state
  2019-06-20  8:27     ` Mika Westerberg
@ 2019-06-20 13:16       ` Bjorn Helgaas
  2019-06-20 13:37         ` Mika Westerberg
  0 siblings, 1 reply; 22+ messages in thread
From: Bjorn Helgaas @ 2019-06-20 13:16 UTC (permalink / raw)
  To: Mika Westerberg
  Cc: Rafael J. Wysocki, Len Brown, Lukas Wunner, Keith Busch,
	Alex Williamson, Alexandru Gagniuc, linux-acpi, linux-pci

On Thu, Jun 20, 2019 at 11:27:30AM +0300, Mika Westerberg wrote:
> On Wed, Jun 19, 2019 at 04:28:01PM -0500, Bjorn Helgaas wrote:
> > On Tue, Jun 18, 2019 at 07:18:56PM +0300, Mika Westerberg wrote:
> > > Intel Ice Lake has an integrated Thunderbolt controller which
> > > means that the PCIe topology is extended directly from the two
> > > root ports (RP0 and RP1).
> > 
> > A PCIe topology is always extended directly from root ports,
> > regardless of whether a Thunderbolt controller is integrated, so I
> > guess I'm missing the point you're making.  It doesn't sound like
> > this is anything specific to Thunderbolt?
>
> The point I'm trying to make here is to explain why this is problem
> now and not with the previous discrete controllers. With the
> previous there was only a single ACPI power resource for the root
> port and the Thunderbolt host router was connected to that root
> port. PCIe hierarchy was extended through downstream ports (not root
> ports) of that controller (which includes PCIe switch).

Sounds like you're using "PCIe topology extension" to mean
specifically something below a Thunderbolt controller, excluding a
subtree below a root port.  I don't think the PCI core is aware of
that distinction.

> Now the thing is part of the SoC so power management is different
> and causes problems in Linux.

The SoC is a physical packaging issue that really doesn't enter into
the specs directly.  I'm trying to get at the logical topology
questions in terms of the PCIe and ACPI specs.

I assume we could dream up a non-Thunderbolt topology that would show
the same problem?

> > > Power management is handled by ACPI power resources that are
> > > shared between the root ports, Thunderbolt controller (NHI) and xHCI
> > > controller.
> > > 
> > > The topology with the power resources (marked with []) looks like:
> > > 
> > >   Host bridge
> > >     |
> > >     +- RP0 ---\
> > >     +- RP1 ---|--+--> [TBT]
> > >     +- NHI --/   |
> > >     |            |
> > >     |            v
> > >     +- xHCI --> [D3C]
> > > 
> > > Here TBT and D3C are the shared ACPI power resources. ACPI
> > > _PR3() method returns either TBT or D3C or both.

I'm not very familiar with _PR3.  I guess this is under an ACPI object
representing a PCI device, e.g., \_SB.PCI0.RP0._PR3?

> > > Say we runtime suspend first the root ports RP0 and RP1, then
> > > NHI. Now since the TBT power resource is still on when the root
> > > ports are runtime suspended their dev->current_state is set to
> > > D3hot. When NHI is runtime suspended TBT is finally turned off
> > > but state of the root ports remain to be D3hot.

So in this example we might have:

  _SB.PCI0.RP0._PR3: TBT
  _SB.PCI0.RP1._PR3: TBT
  _SB.PCI0.NHI._PR3: TBT

And when Linux figures out that everything depending on TBT is in
D3hot, it evaluates TBT._OFF, which puts them all in D3cold?  And part
of the problem is that they're now in D3cold (where config access
doesn't work) but Linux still thinks they're in D3hot (where config
access would work)?

I feel like I'm missing something because I don't know how D3C is
involved, since you didn't mention suspending xHCI.

And I can't mentally match up the patch with the D3hot/D3cold state
change (if indeed that's the problem).  If we were updating the path
that evaluates _OFF so it changed the power state of all dependent
devices, *that* would make a lot of sense to me because it sounds like
that's where the physical change happens that makes things out of
sync.

> > > If the user now runs lspci for instance, the result is all 1's like in
> > > the below output (07.0 is the first root port, RP0):
> > > 
> > > 00:07.0 PCI bridge: Intel Corporation Device 8a1d (rev ff) (prog-if ff)
> > >     !!! Unknown header type 7f
> > >     Kernel driver in use: pcieport

* Re: [PATCH v2 1/3] PCI / ACPI: Use cached ACPI device state to get PCI device power state
  2019-06-20 13:16       ` Bjorn Helgaas
@ 2019-06-20 13:37         ` Mika Westerberg
  2019-06-20 14:15           ` Bjorn Helgaas
  0 siblings, 1 reply; 22+ messages in thread
From: Mika Westerberg @ 2019-06-20 13:37 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Rafael J. Wysocki, Len Brown, Lukas Wunner, Keith Busch,
	Alex Williamson, Alexandru Gagniuc, linux-acpi, linux-pci

On Thu, Jun 20, 2019 at 08:16:49AM -0500, Bjorn Helgaas wrote:
> On Thu, Jun 20, 2019 at 11:27:30AM +0300, Mika Westerberg wrote:
> > On Wed, Jun 19, 2019 at 04:28:01PM -0500, Bjorn Helgaas wrote:
> > > On Tue, Jun 18, 2019 at 07:18:56PM +0300, Mika Westerberg wrote:
> > > > Intel Ice Lake has an integrated Thunderbolt controller which
> > > > means that the PCIe topology is extended directly from the two
> > > > root ports (RP0 and RP1).
> > > 
> > > A PCIe topology is always extended directly from root ports,
> > > regardless of whether a Thunderbolt controller is integrated, so I
> > > guess I'm missing the point you're making.  It doesn't sound like
> > > this is anything specific to Thunderbolt?
> >
> > The point I'm trying to make here is to explain why this is problem
> > now and not with the previous discrete controllers. With the
> > previous there was only a single ACPI power resource for the root
> > port and the Thunderbolt host router was connected to that root
> > port. PCIe hierarchy was extended through downstream ports (not root
> > ports) of that controller (which includes PCIe switch).
> 
> Sounds like you're using "PCIe topology extension" to mean
> specifically something below a Thunderbolt controller, excluding a
> subtree below a root port.  I don't think the PCI core is aware of
> that distinction.

Right, it is not.

> > Now the thing is part of the SoC so power management is different
> > and causes problems in Linux.
> 
> The SoC is a physical packaging issue that really doesn't enter into
> the specs directly.  I'm trying to get at the logical topology
> questions in terms of the PCIe and ACPI specs.
> 
> I assume we could dream up a non-Thunderbolt topology that would show
> the same problem?

Yes.

> > > > Power management is handled by ACPI power resources that are
> > > > shared between the root ports, Thunderbolt controller (NHI) and xHCI
> > > > controller.
> > > > 
> > > > The topology with the power resources (marked with []) looks like:
> > > > 
> > > >   Host bridge
> > > >     |
> > > >     +- RP0 ---\
> > > >     +- RP1 ---|--+--> [TBT]
> > > >     +- NHI --/   |
> > > >     |            |
> > > >     |            v
> > > >     +- xHCI --> [D3C]
> > > > 
> > > > Here TBT and D3C are the shared ACPI power resources. ACPI
> > > > _PR3() method returns either TBT or D3C or both.
> 
> I'm not very familiar with _PR3.  I guess this is under an ACPI object
> representing a PCI device, e.g., \_SB.PCI0.RP0._PR3?

Correct.

> > > > Say we runtime suspend first the root ports RP0 and RP1, then
> > > > NHI. Now since the TBT power resource is still on when the root
> > > > ports are runtime suspended their dev->current_state is set to
> > > > D3hot. When NHI is runtime suspended TBT is finally turned off
> > > > but state of the root ports remain to be D3hot.
> 
> So in this example we might have:
> 
>   _SB.PCI0.RP0._PR3: TBT
>   _SB.PCI0.RP1._PR3: TBT
>   _SB.PCI0.NHI._PR3: TBT

and also D3C.

> And when Linux figures out that everything depending on TBT is in
> D3hot, it evaluates TBT._OFF, which puts them all in D3cold?  And part
> of the problem is that they're now in D3cold (where config access
> doesn't work) but Linux still thinks they're in D3hot (where config
> access would work)?

Exactly.

> I feel like I'm missing something because I don't know how D3C is
> involved, since you didn't mention suspending xHCI.

That's another power resource so we will also have D3C turned off when
xHCI gets suspended but I did not want to complicate things too much in
the changelog.

> And I can't mentally match up the patch with the D3hot/D3cold state
> change (if indeed that's the problem).  If we were updating the path
> that evaluates _OFF so it changed the power state of all dependent
> devices, *that* would make a lot of sense to me because it sounds like
> that's where the physical change happens that makes things out of
> sync.

I did that in the first version [1] but Rafael pointed out that it is
racy one way or another [2].

[1] https://www.spinics.net/lists/linux-pci/msg83583.html
[2] https://www.spinics.net/lists/linux-pci/msg83600.html

* Re: [PATCH v2 1/3] PCI / ACPI: Use cached ACPI device state to get PCI device power state
  2019-06-20 13:37         ` Mika Westerberg
@ 2019-06-20 14:15           ` Bjorn Helgaas
  2019-06-21 10:32             ` Rafael J. Wysocki
  0 siblings, 1 reply; 22+ messages in thread
From: Bjorn Helgaas @ 2019-06-20 14:15 UTC (permalink / raw)
  To: Mika Westerberg
  Cc: Rafael J. Wysocki, Len Brown, Lukas Wunner, Keith Busch,
	Alex Williamson, Alexandru Gagniuc, linux-acpi, linux-pci

On Thu, Jun 20, 2019 at 04:37:10PM +0300, Mika Westerberg wrote:
> On Thu, Jun 20, 2019 at 08:16:49AM -0500, Bjorn Helgaas wrote:
> > On Thu, Jun 20, 2019 at 11:27:30AM +0300, Mika Westerberg wrote:
> > > On Wed, Jun 19, 2019 at 04:28:01PM -0500, Bjorn Helgaas wrote:
> > > > On Tue, Jun 18, 2019 at 07:18:56PM +0300, Mika Westerberg wrote:
> > > > > Intel Ice Lake has an integrated Thunderbolt controller which
> > > > > means that the PCIe topology is extended directly from the two
> > > > > root ports (RP0 and RP1).
> > > > 
> > > > A PCIe topology is always extended directly from root ports,
> > > > regardless of whether a Thunderbolt controller is integrated, so I
> > > > guess I'm missing the point you're making.  It doesn't sound like
> > > > this is anything specific to Thunderbolt?
> > >
> > > The point I'm trying to make here is to explain why this is problem
> > > now and not with the previous discrete controllers. With the
> > > previous there was only a single ACPI power resource for the root
> > > port and the Thunderbolt host router was connected to that root
> > > port. PCIe hierarchy was extended through downstream ports (not root
> > > ports) of that controller (which includes PCIe switch).
> > 
> > Sounds like you're using "PCIe topology extension" to mean
> > specifically something below a Thunderbolt controller, excluding a
> > subtree below a root port.  I don't think the PCI core is aware of
> > that distinction.
> 
> Right it is not.
> 
> > > Now the thing is part of the SoC so power management is different
> > > and causes problems in Linux.
> > 
> > The SoC is a physical packaging issue that really doesn't enter into
> > the specs directly.  I'm trying to get at the logical topology
> > questions in terms of the PCIe and ACPI specs.
> > 
> > I assume we could dream up a non-Thunderbolt topology that would show
> > the same problem?
> 
> Yes.
> 
> > > > > Power management is handled by ACPI power resources that are
> > > > > shared between the root ports, Thunderbolt controller (NHI) and xHCI
> > > > > controller.
> > > > > 
> > > > > The topology with the power resources (marked with []) looks like:
> > > > > 
> > > > >   Host bridge
> > > > >     |
> > > > >     +- RP0 ---\
> > > > >     +- RP1 ---|--+--> [TBT]
> > > > >     +- NHI --/   |
> > > > >     |            |
> > > > >     |            v
> > > > >     +- xHCI --> [D3C]
> > > > > 
> > > > > Here TBT and D3C are the shared ACPI power resources. ACPI
> > > > > _PR3() method returns either TBT or D3C or both.
> > 
> > I'm not very familiar with _PR3.  I guess this is under an ACPI object
> > representing a PCI device, e.g., \_SB.PCI0.RP0._PR3?
> 
> Correct.
> 
> > > > > Say we runtime suspend first the root ports RP0 and RP1, then
> > > > > NHI. Now since the TBT power resource is still on when the root
> > > > > ports are runtime suspended their dev->current_state is set to
> > > > > D3hot. When NHI is runtime suspended TBT is finally turned off
> > > > > but state of the root ports remain to be D3hot.
> > 
> > So in this example we might have:
> > 
> >   _SB.PCI0.RP0._PR3: TBT
> >   _SB.PCI0.RP1._PR3: TBT
> >   _SB.PCI0.NHI._PR3: TBT
> 
> and also D3C.
> 
> > And when Linux figures out that everything depending on TBT is in
> > D3hot, it evaluates TBT._OFF, which puts them all in D3cold?  And part
> > of the problem is that they're now in D3cold (where config access
> > doesn't work) but Linux still thinks they're in D3hot (where config
> > access would work)?
> 
> Exactly.
> 
> > I feel like I'm missing something because I don't know how D3C is
> > involved, since you didn't mention suspending xHCI.
> 
> That's another power resource so we will also have D3C turned off when
> xHCI gets suspended but I did not want to complicate things too much in
> the changelog.

If D3C isn't essential to seeing this problem, you could just omit it
altogether.  I think stripping out anything that's not essential will
make it easier to think about the underlying issues.

> > And I can't mentally match up the patch with the D3hot/D3cold state
> > change (if indeed that's the problem).  If we were updating the path
> > that evaluates _OFF so it changed the power state of all dependent
> > devices, *that* would make a lot of sense to me because it sounds like
> > that's where the physical change happens that makes things out of
> > sync.
> 
> I did that in the first version [1] but Rafael pointed out that it is
> racy one way or another [2].
> 
> [1] https://www.spinics.net/lists/linux-pci/msg83583.html
> [2] https://www.spinics.net/lists/linux-pci/msg83600.html

Yeah, interesting.  It was definitely a much larger patch.  I don't
know enough to comment on the races.  I would wonder whether there's a
way to get rid of the caches that become stale, but that's just an
idle thought, not a suggestion.

Bjorn

* Re: [PATCH v2 1/3] PCI / ACPI: Use cached ACPI device state to get PCI device power state
  2019-06-20 14:15           ` Bjorn Helgaas
@ 2019-06-21 10:32             ` Rafael J. Wysocki
  2019-06-21 13:09               ` Bjorn Helgaas
  2019-06-24 11:14               ` Rafael J. Wysocki
  0 siblings, 2 replies; 22+ messages in thread
From: Rafael J. Wysocki @ 2019-06-21 10:32 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Mika Westerberg, Rafael J. Wysocki, Len Brown, Lukas Wunner,
	Keith Busch, Alex Williamson, Alexandru Gagniuc,
	ACPI Devel Maling List, Linux PCI

On Thu, Jun 20, 2019 at 4:15 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
>
> On Thu, Jun 20, 2019 at 04:37:10PM +0300, Mika Westerberg wrote:
> > On Thu, Jun 20, 2019 at 08:16:49AM -0500, Bjorn Helgaas wrote:
> > > On Thu, Jun 20, 2019 at 11:27:30AM +0300, Mika Westerberg wrote:
> > > > On Wed, Jun 19, 2019 at 04:28:01PM -0500, Bjorn Helgaas wrote:
> > > > > On Tue, Jun 18, 2019 at 07:18:56PM +0300, Mika Westerberg wrote:
> > > > > > Intel Ice Lake has an integrated Thunderbolt controller which
> > > > > > means that the PCIe topology is extended directly from the two
> > > > > > root ports (RP0 and RP1).
> > > > >
> > > > > A PCIe topology is always extended directly from root ports,
> > > > > regardless of whether a Thunderbolt controller is integrated, so I
> > > > > guess I'm missing the point you're making.  It doesn't sound like
> > > > > this is anything specific to Thunderbolt?
> > > >
> > > > The point I'm trying to make here is to explain why this is problem
> > > > now and not with the previous discrete controllers. With the
> > > > previous there was only a single ACPI power resource for the root
> > > > port and the Thunderbolt host router was connected to that root
> > > > port. PCIe hierarchy was extended through downstream ports (not root
> > > > ports) of that controller (which includes PCIe switch).
> > >
> > > Sounds like you're using "PCIe topology extension" to mean
> > > specifically something below a Thunderbolt controller, excluding a
> > > subtree below a root port.  I don't think the PCI core is aware of
> > > that distinction.
> >
> > Right it is not.
> >
> > > > Now the thing is part of the SoC so power management is different
> > > > and causes problems in Linux.
> > >
> > > The SoC is a physical packaging issue that really doesn't enter into
> > > the specs directly.  I'm trying to get at the logical topology
> > > questions in terms of the PCIe and ACPI specs.
> > >
> > > I assume we could dream up a non-Thunderbolt topology that would show
> > > the same problem?
> >
> > Yes.
> >
> > > > > > Power management is handled by ACPI power resources that are
> > > > > > shared between the root ports, Thunderbolt controller (NHI) and xHCI
> > > > > > controller.
> > > > > >
> > > > > > The topology with the power resources (marked with []) looks like:
> > > > > >
> > > > > >   Host bridge
> > > > > >     |
> > > > > >     +- RP0 ---\
> > > > > >     +- RP1 ---|--+--> [TBT]
> > > > > >     +- NHI --/   |
> > > > > >     |            |
> > > > > >     |            v
> > > > > >     +- xHCI --> [D3C]
> > > > > >
> > > > > > Here TBT and D3C are the shared ACPI power resources. ACPI
> > > > > > _PR3() method returns either TBT or D3C or both.
> > >
> > > I'm not very familiar with _PR3.  I guess this is under an ACPI object
> > > representing a PCI device, e.g., \_SB.PCI0.RP0._PR3?
> >
> > Correct.
> >
> > > > > > Say we runtime suspend first the root ports RP0 and RP1, then
> > > > > > NHI. Now since the TBT power resource is still on when the root
> > > > > > ports are runtime suspended their dev->current_state is set to
> > > > > > D3hot. When NHI is runtime suspended TBT is finally turned off
> > > > > > but state of the root ports remain to be D3hot.
> > >
> > > So in this example we might have:
> > >
> > >   _SB.PCI0.RP0._PR3: TBT
> > >   _SB.PCI0.RP1._PR3: TBT
> > >   _SB.PCI0.NHI._PR3: TBT
> >
> > and also D3C.
> >
> > > And when Linux figures out that everything depending on TBT is in
> > > D3hot, it evaluates TBT._OFF, which puts them all in D3cold?  And part
> > > of the problem is that they're now in D3cold (where config access
> > > doesn't work) but Linux still thinks they're in D3hot (where config
> > > access would work)?
> >
> > Exactly.
> >
> > > I feel like I'm missing something because I don't know how D3C is
> > > involved, since you didn't mention suspending xHCI.
> >
> > That's another power resource so we will also have D3C turned off when
> > xHCI gets suspended but I did not want to complicate things too much in
> > the changelog.
>
> If D3C isn't essential to seeing this problem, you could just omit it
> altogether.  I think stripping out anything that's not essential will
> make it easier to think about the underlying issues.
>
> > > And I can't mentally match up the patch with the D3hot/D3cold state
> > > change (if indeed that's the problem).  If we were updating the path
> > > that evaluates _OFF so it changed the power state of all dependent
> > > devices, *that* would make a lot of sense to me because it sounds like
> > > that's where the physical change happens that makes things out of
> > > sync.
> >
> > I did that in the first version [1] but Rafael pointed out that it is
> > racy one way or another [2].
> >
> > [1] https://www.spinics.net/lists/linux-pci/msg83583.html
> > [2] https://www.spinics.net/lists/linux-pci/msg83600.html
>
> Yeah, interesting.  It was definitely a much larger patch.  I don't
> know enough to comment on the races.

Say two power resources are listed by _PR3 for one device (because why
not?) and you want to change the device's state to D3cold only if the
two power resources are both "off".  Then, you need some locking (or
equivalent) to synchronize two power resources with each other, so
that you can change the device's state when the last of them goes _OFF.
Currently, there is no such synchronization between power resources
other than the "system_level" value which may not be reliable enough
for this type of use.

Or you can say that the device is in D3cold if at least one of the
power resources is _OFF, but IMO that may not really be consistent
with the view that the "logical" power state of the device should
reflect the physical reality accurately.
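
To make the two options above concrete, here is a minimal userspace sketch
(not kernel code; struct power_resource and both helper names are made up
for illustration) of the two candidate definitions of "the device is in
D3cold" when _PR3 lists more than one power resource:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of an ACPI power resource: only its on/off state. */
struct power_resource {
	bool on;
};

/* Option 1: D3cold only when every resource listed by _PR3 is off. */
static bool d3cold_if_all_off(const struct power_resource *const *res, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (res[i]->on)
			return false;
	return true;
}

/* Option 2: D3cold as soon as at least one listed resource is off. */
static bool d3cold_if_any_off(const struct power_resource *const *res, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (!res[i]->on)
			return true;
	return false;
}
```

With one resource off and the other still on, the two helpers disagree.
The sketch deliberately leaves out locking: option 1 is the one that would
need the resources to be synchronized with each other in real code.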

> I would wonder whether there's a way to get rid of the caches that become stale,

I guess what you mean is that the "cached" (or rather "logical" or
"expected") power state value may become different from what is
returned by acpi_device_get_power() for the device.

The problem here is that acpi_device_get_power() really only should be
used for two purposes: (1) To initialize adev->power.state, or to
update it via acpi_device_update_power(), and (2) by the
"real_power_state" sysfs attribute (of ACPI device objects).  The
adev->power.state value should be used everywhere else, in principle, so
Mika's patch is correct.

[Note that adev->power.state cannot be updated after calling
acpi_device_get_power() to the value returned by it without updating
the reference counters of the power resources that are "on" *exactly*
because of the problem at hand here.]

> but that's just an idle thought, not a suggestion.

After the initialization of the ACPI subsystem, the authoritative
source of the ACPI device power state information is
adev->power.state.  The ACPI subsystem is expected to update this
value as needed going forward (including system-wide transitions like
resume from S3).
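
As a rough illustration of why a "real" state read at suspend time goes
stale while the cached value does not, here is a small userspace model (not
kernel code; every name in it is hypothetical, and "cached" merely plays the
role of adev->power.state).  It assumes a single reference-counted power
resource shared by several devices:

```c
#include <assert.h>
#include <stdbool.h>

enum power_state { STATE_D0, STATE_D3HOT, STATE_D3COLD };

/* Hypothetical shared power resource with a reference count. */
struct power_resource {
	unsigned int ref_count;	/* number of devices that still need it on */
};

/* Hypothetical device: one shared resource plus the cached state. */
struct device {
	struct power_resource *res;
	enum power_state cached;	/* stands in for adev->power.state */
};

/* "Real" state: D3cold only once the shared resource is really off. */
static enum power_state real_state(const struct device *dev)
{
	if (dev->cached == STATE_D0)
		return STATE_D0;
	return dev->res->ref_count ? STATE_D3HOT : STATE_D3COLD;
}

/* Suspend: drop the reference and record the requested state. */
static void suspend(struct device *dev)
{
	assert(dev->cached == STATE_D0 && dev->res->ref_count > 0);
	dev->res->ref_count--;
	dev->cached = STATE_D3COLD;
}
```

If two devices share the resource and the first one is suspended,
real_state() reports D3hot for it.  Once the second one is suspended, the
first is really in D3cold, so a value saved from the earlier read is wrong,
while the cached value was D3cold all along.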

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 1/3] PCI / ACPI: Use cached ACPI device state to get PCI device power state
  2019-06-21 10:32             ` Rafael J. Wysocki
@ 2019-06-21 13:09               ` Bjorn Helgaas
  2019-06-22  8:51                 ` Rafael J. Wysocki
  2019-06-24 11:14               ` Rafael J. Wysocki
  1 sibling, 1 reply; 22+ messages in thread
From: Bjorn Helgaas @ 2019-06-21 13:09 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Mika Westerberg, Rafael J. Wysocki, Len Brown, Lukas Wunner,
	Keith Busch, Alex Williamson, Alexandru Gagniuc,
	ACPI Devel Maling List, Linux PCI

On Fri, Jun 21, 2019 at 12:32:22PM +0200, Rafael J. Wysocki wrote:
> On Thu, Jun 20, 2019 at 4:15 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > On Thu, Jun 20, 2019 at 04:37:10PM +0300, Mika Westerberg wrote:
> > > On Thu, Jun 20, 2019 at 08:16:49AM -0500, Bjorn Helgaas wrote:
> > > > On Thu, Jun 20, 2019 at 11:27:30AM +0300, Mika Westerberg wrote:
> > > > > On Wed, Jun 19, 2019 at 04:28:01PM -0500, Bjorn Helgaas wrote:
> > > > > > On Tue, Jun 18, 2019 at 07:18:56PM +0300, Mika Westerberg wrote:
> > > > > > > Intel Ice Lake has an integrated Thunderbolt controller which
> > > > > > > means that the PCIe topology is extended directly from the two
> > > > > > > root ports (RP0 and RP1).
> > > > > >
> > > > > > A PCIe topology is always extended directly from root ports,
> > > > > > regardless of whether a Thunderbolt controller is integrated, so I
> > > > > > guess I'm missing the point you're making.  It doesn't sound like
> > > > > > this is anything specific to Thunderbolt?
> > > > >
> > > > > The point I'm trying to make here is to explain why this is problem
> > > > > now and not with the previous discrete controllers. With the
> > > > > previous there was only a single ACPI power resource for the root
> > > > > port and the Thunderbolt host router was connected to that root
> > > > > port. PCIe hierarchy was extended through downstream ports (not root
> > > > > ports) of that controller (which includes PCIe switch).
> > > >
> > > > Sounds like you're using "PCIe topology extension" to mean
> > > > specifically something below a Thunderbolt controller, excluding a
> > > > subtree below a root port.  I don't think the PCI core is aware of
> > > > that distinction.
> > >
> > > Right it is not.
> > >
> > > > > Now the thing is part of the SoC so power management is different
> > > > > and causes problems in Linux.
> > > >
> > > > The SoC is a physical packaging issue that really doesn't enter into
> > > > the specs directly.  I'm trying to get at the logical topology
> > > > questions in terms of the PCIe and ACPI specs.
> > > >
> > > > I assume we could dream up a non-Thunderbolt topology that would show
> > > > the same problem?
> > >
> > > Yes.
> > >
> > > > > > > Power management is handled by ACPI power resources that are
> > > > > > > shared between the root ports, Thunderbolt controller (NHI) and xHCI
> > > > > > > controller.
> > > > > > >
> > > > > > > The topology with the power resources (marked with []) looks like:
> > > > > > >
> > > > > > >   Host bridge
> > > > > > >     |
> > > > > > >     +- RP0 ---\
> > > > > > >     +- RP1 ---|--+--> [TBT]
> > > > > > >     +- NHI --/   |
> > > > > > >     |            |
> > > > > > >     |            v
> > > > > > >     +- xHCI --> [D3C]
> > > > > > >
> > > > > > > Here TBT and D3C are the shared ACPI power resources. ACPI
> > > > > > > _PR3() method returns either TBT or D3C or both.
> > > >
> > > > I'm not very familiar with _PR3.  I guess this is under an ACPI object
> > > > representing a PCI device, e.g., \_SB.PCI0.RP0._PR3?
> > >
> > > Correct.
> > >
> > > > > > > Say we runtime suspend first the root ports RP0 and RP1, then
> > > > > > > NHI. Now since the TBT power resource is still on when the root
> > > > > > > ports are runtime suspended their dev->current_state is set to
> > > > > > > D3hot. When NHI is runtime suspended TBT is finally turned off
> > > > > > > but state of the root ports remain to be D3hot.
> > > >
> > > > So in this example we might have:
> > > >
> > > >   _SB.PCI0.RP0._PR3: TBT
> > > >   _SB.PCI0.RP1._PR3: TBT
> > > >   _SB.PCI0.NHI._PR3: TBT
> > >
> > > and also D3C.
> > >
> > > > And when Linux figures out that everything depending on TBT is in
> > > > D3hot, it evaluates TBT._OFF, which puts them all in D3cold?  And part
> > > > of the problem is that they're now in D3cold (where config access
> > > > doesn't work) but Linux still thinks they're in D3hot (where config
> > > > access would work)?
> > >
> > > Exactly.
> > >
> > > > I feel like I'm missing something because I don't know how D3C is
> > > > involved, since you didn't mention suspending xHCI.
> > >
> > > That's another power resource so we will also have D3C turned off when
> > > xHCI gets suspended but I did not want to complicate things too much in
> > > the changelog.
> >
> > If D3C isn't essential to seeing this problem, you could just omit it
> > altogether.  I think stripping out anything that's not essential will
> > make it easier to think about the underlying issues.
> >
> > > > And I can't mentally match up the patch with the D3hot/D3cold state
> > > > change (if indeed that's the problem).  If we were updating the path
> > > > that evaluates _OFF so it changed the power state of all dependent
> > > > devices, *that* would make a lot of sense to me because it sounds like
> > > > that's where the physical change happens that makes things out of
> > > > sync.
> > >
> > > I did that in the first version [1] but Rafael pointed out that it is
> > > racy one way or another [2].
> > >
> > > [1] https://www.spinics.net/lists/linux-pci/msg83583.html
> > > [2] https://www.spinics.net/lists/linux-pci/msg83600.html
> >
> > Yeah, interesting.  It was definitely a much larger patch.  I don't
> > know enough to comment on the races.
> 
> Say two power resources are listed by _PR3 for one device (because why
> not?) and you want to change the device's state to D3cold only if the
> two power resources are both "off".  Then, you need some locking (or
> equivalent) to synchronize two power resources with each other, so
> that you can change the device's state when the last of them goes _OFF.
> Currently, there is no such synchronization between power resources
> other than the "system_level" value, which may not be reliable enough
> for this type of use.
> 
> Or you can say that the device is in D3cold if at least one of the
> power resources is _OFF, but IMO that may not really be consistent
> with the view that the "logical" power state of the device should
> reflect the physical reality accurately.
> 
> > I would wonder whether there's a way to get rid of the caches that become stale,
> 
> I guess what you mean is that the "cached" (or rather "logical" or
> "expected") power state value may become different from what is
> returned by acpi_device_get_power() for the device.
> 
> The problem here is that acpi_device_get_power() really only should be
> used for two purposes: (1) To initialize adev->power.state, or to
> update it via acpi_device_update_power(), and (2) by the
> "real_power_state" sysfs attribute (of ACPI device objects).  The
> adev->power.state value should be used everywhere else, in principle, so
> Mika's patch is correct.
> 
> [Note that adev->power.state cannot be updated after calling
> acpi_device_get_power() to the value returned by it without updating
> the reference counters of the power resources that are "on" *exactly*
> because of the problem at hand here.]
> 
> > but that's just an idle thought, not a suggestion.
> 
> After the initialization of the ACPI subsystem, the authoritative
> source of the ACPI device power state information is
> adev->power.state.  The ACPI subsystem is expected to update this
> value as needed going forward (including system-wide transitions like
> resume from S3).

Thanks, this is all very helpful!  Do you by any chance add
lore.kernel.org links to commit logs when applying patches?  This is a
case where I think the discussion could be useful in the future.

Link: https://lore.kernel.org/r/20190618161858.77834-2-mika.westerberg@linux.intel.com

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 1/3] PCI / ACPI: Use cached ACPI device state to get PCI device power state
  2019-06-21 13:09               ` Bjorn Helgaas
@ 2019-06-22  8:51                 ` Rafael J. Wysocki
  2019-06-24 10:57                   ` Mika Westerberg
  0 siblings, 1 reply; 22+ messages in thread
From: Rafael J. Wysocki @ 2019-06-22  8:51 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Rafael J. Wysocki, Mika Westerberg, Len Brown, Lukas Wunner,
	Keith Busch, Alex Williamson, Alexandru Gagniuc,
	ACPI Devel Maling List, Linux PCI

On Friday, June 21, 2019 3:09:20 PM CEST Bjorn Helgaas wrote:
> On Fri, Jun 21, 2019 at 12:32:22PM +0200, Rafael J. Wysocki wrote:
> > On Thu, Jun 20, 2019 at 4:15 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > > On Thu, Jun 20, 2019 at 04:37:10PM +0300, Mika Westerberg wrote:
> > > > On Thu, Jun 20, 2019 at 08:16:49AM -0500, Bjorn Helgaas wrote:
> > > > > On Thu, Jun 20, 2019 at 11:27:30AM +0300, Mika Westerberg wrote:
> > > > > > On Wed, Jun 19, 2019 at 04:28:01PM -0500, Bjorn Helgaas wrote:
> > > > > > > On Tue, Jun 18, 2019 at 07:18:56PM +0300, Mika Westerberg wrote:
> > > > > > > > Intel Ice Lake has an integrated Thunderbolt controller which
> > > > > > > > means that the PCIe topology is extended directly from the two
> > > > > > > > root ports (RP0 and RP1).
> > > > > > >
> > > > > > > A PCIe topology is always extended directly from root ports,
> > > > > > > regardless of whether a Thunderbolt controller is integrated, so I
> > > > > > > guess I'm missing the point you're making.  It doesn't sound like
> > > > > > > this is anything specific to Thunderbolt?
> > > > > >
> > > > > > The point I'm trying to make here is to explain why this is problem
> > > > > > now and not with the previous discrete controllers. With the
> > > > > > previous there was only a single ACPI power resource for the root
> > > > > > port and the Thunderbolt host router was connected to that root
> > > > > > port. PCIe hierarchy was extended through downstream ports (not root
> > > > > > ports) of that controller (which includes PCIe switch).
> > > > >
> > > > > Sounds like you're using "PCIe topology extension" to mean
> > > > > specifically something below a Thunderbolt controller, excluding a
> > > > > subtree below a root port.  I don't think the PCI core is aware of
> > > > > that distinction.
> > > >
> > > > Right it is not.
> > > >
> > > > > > Now the thing is part of the SoC so power management is different
> > > > > > and causes problems in Linux.
> > > > >
> > > > > The SoC is a physical packaging issue that really doesn't enter into
> > > > > the specs directly.  I'm trying to get at the logical topology
> > > > > questions in terms of the PCIe and ACPI specs.
> > > > >
> > > > > I assume we could dream up a non-Thunderbolt topology that would show
> > > > > the same problem?
> > > >
> > > > Yes.
> > > >
> > > > > > > > Power management is handled by ACPI power resources that are
> > > > > > > > shared between the root ports, Thunderbolt controller (NHI) and xHCI
> > > > > > > > controller.
> > > > > > > >
> > > > > > > > The topology with the power resources (marked with []) looks like:
> > > > > > > >
> > > > > > > >   Host bridge
> > > > > > > >     |
> > > > > > > >     +- RP0 ---\
> > > > > > > >     +- RP1 ---|--+--> [TBT]
> > > > > > > >     +- NHI --/   |
> > > > > > > >     |            |
> > > > > > > >     |            v
> > > > > > > >     +- xHCI --> [D3C]
> > > > > > > >
> > > > > > > > Here TBT and D3C are the shared ACPI power resources. ACPI
> > > > > > > > _PR3() method returns either TBT or D3C or both.
> > > > >
> > > > > I'm not very familiar with _PR3.  I guess this is under an ACPI object
> > > > > representing a PCI device, e.g., \_SB.PCI0.RP0._PR3?
> > > >
> > > > Correct.
> > > >
> > > > > > > > Say we runtime suspend first the root ports RP0 and RP1, then
> > > > > > > > NHI. Now since the TBT power resource is still on when the root
> > > > > > > > ports are runtime suspended their dev->current_state is set to
> > > > > > > > D3hot. When NHI is runtime suspended TBT is finally turned off
> > > > > > > > but state of the root ports remain to be D3hot.
> > > > >
> > > > > So in this example we might have:
> > > > >
> > > > >   _SB.PCI0.RP0._PR3: TBT
> > > > >   _SB.PCI0.RP1._PR3: TBT
> > > > >   _SB.PCI0.NHI._PR3: TBT
> > > >
> > > > and also D3C.
> > > >
> > > > > And when Linux figures out that everything depending on TBT is in
> > > > > D3hot, it evaluates TBT._OFF, which puts them all in D3cold?  And part
> > > > > of the problem is that they're now in D3cold (where config access
> > > > > doesn't work) but Linux still thinks they're in D3hot (where config
> > > > > access would work)?
> > > >
> > > > Exactly.
> > > >
> > > > > I feel like I'm missing something because I don't know how D3C is
> > > > > involved, since you didn't mention suspending xHCI.
> > > >
> > > > That's another power resource so we will also have D3C turned off when
> > > > xHCI gets suspended but I did not want to complicate things too much in
> > > > the changelog.
> > >
> > > If D3C isn't essential to seeing this problem, you could just omit it
> > > altogether.  I think stripping out anything that's not essential will
> > > make it easier to think about the underlying issues.
> > >
> > > > > And I can't mentally match up the patch with the D3hot/D3cold state
> > > > > change (if indeed that's the problem).  If we were updating the path
> > > > > that evaluates _OFF so it changed the power state of all dependent
> > > > > devices, *that* would make a lot of sense to me because it sounds like
> > > > > that's where the physical change happens that makes things out of
> > > > > sync.
> > > >
> > > > I did that in the first version [1] but Rafael pointed out that it is
> > > > racy one way or another [2].
> > > >
> > > > [1] https://www.spinics.net/lists/linux-pci/msg83583.html
> > > > [2] https://www.spinics.net/lists/linux-pci/msg83600.html
> > >
> > > Yeah, interesting.  It was definitely a much larger patch.  I don't
> > > know enough to comment on the races.
> > 
> > Say two power resources are listed by _PR3 for one device (because why
> > not?) and you want to change the device's state to D3cold only if the
> > two power resources are both "off".  Then, you need some locking (or
> > equivalent) to synchronize two power resources with each other, so
> > that you can change the device's state when the last of them goes _OFF.
> > Currently, there is no such synchronization between power resources
> > other than the "system_level" value, which may not be reliable enough
> > for this type of use.
> > 
> > Or you can say that the device is in D3cold if at least one of the
> > power resources is _OFF, but IMO that may not really be consistent
> > with the view that the "logical" power state of the device should
> > reflect the physical reality accurately.
> > 
> > > I would wonder whether there's a way to get rid of the caches that become stale,
> > 
> > I guess what you mean is that the "cached" (or rather "logical" or
> > "expected") power state value may become different from what is
> > returned by acpi_device_get_power() for the device.
> > 
> > The problem here is that acpi_device_get_power() really only should be
> > used for two purposes: (1) To initialize adev->power.state, or to
> > update it via acpi_device_update_power(), and (2) by the
> > "real_power_state" sysfs attribute (of ACPI device objects).  The
> > adev->power.state value should be used everywhere else, in principle, so
> > Mika's patch is correct.
> > 
> > [Note that adev->power.state cannot be updated after calling
> > acpi_device_get_power() to the value returned by it without updating
> > the reference counters of the power resources that are "on" *exactly*
> > because of the problem at hand here.]
> > 
> > > but that's just an idle thought, not a suggestion.
> > 
> > After the initialization of the ACPI subsystem, the authoritative
> > source of the ACPI device power state information is
> > adev->power.state.  The ACPI subsystem is expected to update this
> > value as needed going forward (including system-wide transitions like
> > resume from S3).
> 
> Thanks, this is all very helpful!  Do you by any chance add
> lore.kernel.org links to commit logs when applying patches?  This is a
> case where I think the discussion could be useful in the future.
> 
> Link: https://lore.kernel.org/r/20190618161858.77834-2-mika.westerberg@linux.intel.com

Agreed, and thanks for the URL.

I guess Mika can add this tag to the patch changelog.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 1/3] PCI / ACPI: Use cached ACPI device state to get PCI device power state
  2019-06-22  8:51                 ` Rafael J. Wysocki
@ 2019-06-24 10:57                   ` Mika Westerberg
  0 siblings, 0 replies; 22+ messages in thread
From: Mika Westerberg @ 2019-06-24 10:57 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Bjorn Helgaas, Rafael J. Wysocki, Len Brown, Lukas Wunner,
	Keith Busch, Alex Williamson, Alexandru Gagniuc,
	ACPI Devel Maling List, Linux PCI

On Sat, Jun 22, 2019 at 10:51:28AM +0200, Rafael J. Wysocki wrote:
> > Thanks, this is all very helpful!  Do you by any chance add
> > lore.kernel.org links to commit logs when applying patches?  This is a
> > case where I think the discussion could be useful in the future.
> > 
> > Link: https://lore.kernel.org/r/20190618161858.77834-2-mika.westerberg@linux.intel.com
> 
> Agreed, and thanks for the URL.
> 
> I guess Mika can add this tag to the patch changelog.

Sure I'll add it.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 1/3] PCI / ACPI: Use cached ACPI device state to get PCI device power state
  2019-06-21 10:32             ` Rafael J. Wysocki
  2019-06-21 13:09               ` Bjorn Helgaas
@ 2019-06-24 11:14               ` Rafael J. Wysocki
  2019-06-25  9:45                 ` Mika Westerberg
  1 sibling, 1 reply; 22+ messages in thread
From: Rafael J. Wysocki @ 2019-06-24 11:14 UTC (permalink / raw)
  To: Bjorn Helgaas, Mika Westerberg
  Cc: Len Brown, Lukas Wunner, Keith Busch, Alex Williamson,
	Alexandru Gagniuc, ACPI Devel Maling List, Linux PCI

On Friday, June 21, 2019 12:32:22 PM CEST Rafael J. Wysocki wrote:
> On Thu, Jun 20, 2019 at 4:15 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
> >
> > On Thu, Jun 20, 2019 at 04:37:10PM +0300, Mika Westerberg wrote:
> > > On Thu, Jun 20, 2019 at 08:16:49AM -0500, Bjorn Helgaas wrote:
> > > > On Thu, Jun 20, 2019 at 11:27:30AM +0300, Mika Westerberg wrote:
> > > > > On Wed, Jun 19, 2019 at 04:28:01PM -0500, Bjorn Helgaas wrote:
> > > > > > On Tue, Jun 18, 2019 at 07:18:56PM +0300, Mika Westerberg wrote:
> > > > > > > Intel Ice Lake has an integrated Thunderbolt controller which
> > > > > > > means that the PCIe topology is extended directly from the two
> > > > > > > root ports (RP0 and RP1).
> > > > > >
> > > > > > A PCIe topology is always extended directly from root ports,
> > > > > > regardless of whether a Thunderbolt controller is integrated, so I
> > > > > > guess I'm missing the point you're making.  It doesn't sound like
> > > > > > this is anything specific to Thunderbolt?
> > > > >
> > > > > The point I'm trying to make here is to explain why this is problem
> > > > > now and not with the previous discrete controllers. With the
> > > > > previous there was only a single ACPI power resource for the root
> > > > > port and the Thunderbolt host router was connected to that root
> > > > > port. PCIe hierarchy was extended through downstream ports (not root
> > > > > ports) of that controller (which includes PCIe switch).
> > > >
> > > > Sounds like you're using "PCIe topology extension" to mean
> > > > specifically something below a Thunderbolt controller, excluding a
> > > > subtree below a root port.  I don't think the PCI core is aware of
> > > > that distinction.
> > >
> > > Right it is not.
> > >
> > > > > Now the thing is part of the SoC so power management is different
> > > > > and causes problems in Linux.
> > > >
> > > > The SoC is a physical packaging issue that really doesn't enter into
> > > > the specs directly.  I'm trying to get at the logical topology
> > > > questions in terms of the PCIe and ACPI specs.
> > > >
> > > > I assume we could dream up a non-Thunderbolt topology that would show
> > > > the same problem?
> > >
> > > Yes.
> > >
> > > > > > > Power management is handled by ACPI power resources that are
> > > > > > > shared between the root ports, Thunderbolt controller (NHI) and xHCI
> > > > > > > controller.
> > > > > > >
> > > > > > > The topology with the power resources (marked with []) looks like:
> > > > > > >
> > > > > > >   Host bridge
> > > > > > >     |
> > > > > > >     +- RP0 ---\
> > > > > > >     +- RP1 ---|--+--> [TBT]
> > > > > > >     +- NHI --/   |
> > > > > > >     |            |
> > > > > > >     |            v
> > > > > > >     +- xHCI --> [D3C]
> > > > > > >
> > > > > > > Here TBT and D3C are the shared ACPI power resources. ACPI
> > > > > > > _PR3() method returns either TBT or D3C or both.
> > > >
> > > > I'm not very familiar with _PR3.  I guess this is under an ACPI object
> > > > representing a PCI device, e.g., \_SB.PCI0.RP0._PR3?
> > >
> > > Correct.
> > >
> > > > > > > Say we runtime suspend first the root ports RP0 and RP1, then
> > > > > > > NHI. Now since the TBT power resource is still on when the root
> > > > > > > ports are runtime suspended their dev->current_state is set to
> > > > > > > D3hot. When NHI is runtime suspended TBT is finally turned off
> > > > > > > but state of the root ports remain to be D3hot.
> > > >
> > > > So in this example we might have:
> > > >
> > > >   _SB.PCI0.RP0._PR3: TBT
> > > >   _SB.PCI0.RP1._PR3: TBT
> > > >   _SB.PCI0.NHI._PR3: TBT
> > >
> > > and also D3C.
> > >
> > > > And when Linux figures out that everything depending on TBT is in
> > > > D3hot, it evaluates TBT._OFF, which puts them all in D3cold?  And part
> > > > of the problem is that they're now in D3cold (where config access
> > > > doesn't work) but Linux still thinks they're in D3hot (where config
> > > > access would work)?
> > >
> > > Exactly.
> > >
> > > > I feel like I'm missing something because I don't know how D3C is
> > > > involved, since you didn't mention suspending xHCI.
> > >
> > > That's another power resource so we will also have D3C turned off when
> > > xHCI gets suspended but I did not want to complicate things too much in
> > > the changelog.
> >
> > If D3C isn't essential to seeing this problem, you could just omit it
> > altogether.  I think stripping out anything that's not essential will
> > make it easier to think about the underlying issues.
> >
> > > > And I can't mentally match up the patch with the D3hot/D3cold state
> > > > change (if indeed that's the problem).  If we were updating the path
> > > > that evaluates _OFF so it changed the power state of all dependent
> > > > devices, *that* would make a lot of sense to me because it sounds like
> > > > that's where the physical change happens that makes things out of
> > > > sync.
> > >
> > > I did that in the first version [1] but Rafael pointed out that it is
> > > racy one way or another [2].
> > >
> > > [1] https://www.spinics.net/lists/linux-pci/msg83583.html
> > > [2] https://www.spinics.net/lists/linux-pci/msg83600.html
> >
> > Yeah, interesting.  It was definitely a much larger patch.  I don't
> > know enough to comment on the races.
> 
> Say two power resources are listed by _PR3 for one device (because why
> not?) and you want to change the device's state to D3cold only if the
> two power resources are both "off".  Then, you need some locking (or
> equivalent) to synchronize two power resources with each other, so
> that you can change the device's state when the last of them goes _OFF.
> Currently, there is no such synchronization between power resources
> other than the "system_level" value, which may not be reliable enough
> for this type of use.
> 
> Or you can say that the device is in D3cold if at least one of the
> power resources is _OFF, but IMO that may not really be consistent
> with the view that the "logical" power state of the device should
> reflect the physical reality accurately.
> 
> > I would wonder whether there's a way to get rid of the caches that become stale,
> 
> I guess what you mean is that the "cached" (or rather "logical" or
> "expected") power state value may become different from what is
> returned by acpi_device_get_power() for the device.
> 
> The problem here is that acpi_device_get_power() really only should be
> used for two purposes: (1) To initialize adev->power.state, or to
> update it via acpi_device_update_power(), and (2) by the
> "real_power_state" sysfs attribute (of ACPI device objects).  The
> adev->power.state value should be used everywhere else, in principle, so
> Mika's patch is correct.

Well, it is an improvement, but it is not sufficient.

> [Note that adev->power.state cannot be updated after calling
> acpi_device_get_power() to the value returned by it without updating
> the reference counters of the power resources that are "on" *exactly*
> because of the problem at hand here.]

That is obviously correct, but ->

> > but that's just an idle thought, not a suggestion.
> 
> After the initialization of the ACPI subsystem, the authoritative
> source of the ACPI device power state information is
> adev->power.state.  The ACPI subsystem is expected to update this
> value as needed going forward (including system-wide transitions like
> resume from S3).

-> the "resume from S3 or hibernation" case needs special handling, because
in that case the device power state need not reflect the information the ACPI
subsystem has.  That only matters if adev->power.state is ACPI_STATE_D0 and
the device is actually *not* in D0, because in that case acpi_device_set_power()
will not work.  That case is not covered currently (it should be rare in
practice, though, if it happens at all), so something like the patch below
(untested) may be needed in addition to Mika's patch.

Still, there is also the "power state not matching" case in pci_pm_complete() that
needs to be covered, and the non-PCI ACPI PM has a similar issue in theory, so I
need to think about this a bit more.
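
The idea behind the "force" argument in the patch can be sketched in
userspace roughly as follows.  This is only a model under simplifying
assumptions, not the kernel implementation: struct device, set_state() and
power_up() are invented names, and set_state() is assumed to do nothing
when the cached state already equals the target.

```c
#include <stdbool.h>

enum power_state { STATE_D0, STATE_D3COLD };

/* Hypothetical device with a software view and a hardware view. */
struct device {
	enum power_state cached;	/* what the software believes */
	enum power_state hw;		/* what the hardware is really in */
};

/* Assumed helper: skips all work if the cached state matches the target. */
static void set_state(struct device *dev, enum power_state target)
{
	if (dev->cached == target)
		return;		/* nothing to do as far as software can tell */
	dev->hw = target;
	dev->cached = target;
}

/* Power-up with "force": refresh the cached value from hardware first. */
static void power_up(struct device *dev, bool force)
{
	if (force)
		dev->cached = dev->hw;
	set_state(dev, STATE_D0);
}
```

Starting from cached == D0 with the hardware actually in D3cold, a call
without force leaves the hardware where it is, whereas a forced call first
notices the mismatch and then performs the transition to D0.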

---
 drivers/pci/pci-acpi.c |   14 +++++++++++++-
 drivers/pci/pci-mid.c  |    3 ++-
 drivers/pci/pci.c      |    9 +++++----
 drivers/pci/pci.h      |    2 +-
 4 files changed, 21 insertions(+), 7 deletions(-)

Index: linux-pm/drivers/pci/pci-acpi.c
===================================================================
--- linux-pm.orig/drivers/pci/pci-acpi.c
+++ linux-pm/drivers/pci/pci-acpi.c
@@ -632,7 +632,8 @@ static bool acpi_pci_power_manageable(st
 	return adev ? acpi_device_power_manageable(adev) : false;
 }
 
-static int acpi_pci_set_power_state(struct pci_dev *dev, pci_power_t state)
+static int acpi_pci_set_power_state(struct pci_dev *dev, pci_power_t state,
+				    bool force)
 {
 	struct acpi_device *adev = ACPI_COMPANION(&dev->dev);
 	static const u8 state_conv[] = {
@@ -657,6 +658,17 @@ static int acpi_pci_set_power_state(stru
 		}
 		/* Fall through */
 	case PCI_D0:
+		if (force) {
+			int acpi_state;
+
+			error = acpi_device_update_power(adev, &acpi_state);
+			if (error)
+				return error;
+
+			if (acpi_state == ACPI_STATE_D0)
+				return 0;
+		}
+		/* fall through */
 	case PCI_D1:
 	case PCI_D2:
 	case PCI_D3hot:
Index: linux-pm/drivers/pci/pci-mid.c
===================================================================
--- linux-pm.orig/drivers/pci/pci-mid.c
+++ linux-pm/drivers/pci/pci-mid.c
@@ -21,7 +21,8 @@ static bool mid_pci_power_manageable(str
 	return true;
 }
 
-static int mid_pci_set_power_state(struct pci_dev *pdev, pci_power_t state)
+static int mid_pci_set_power_state(struct pci_dev *pdev, pci_power_t state,
+				   bool not_used)
 {
 	return intel_mid_pci_set_power_state(pdev, state);
 }
Index: linux-pm/drivers/pci/pci.c
===================================================================
--- linux-pm.orig/drivers/pci/pci.c
+++ linux-pm/drivers/pci/pci.c
@@ -767,9 +767,10 @@ static inline bool platform_pci_power_ma
 }
 
 static inline int platform_pci_set_power_state(struct pci_dev *dev,
-					       pci_power_t t)
+					       pci_power_t state, bool force)
 {
-	return pci_platform_pm ? pci_platform_pm->set_state(dev, t) : -ENOSYS;
+	return pci_platform_pm ?
+		pci_platform_pm->set_state(dev, state, force) : -ENOSYS;
 }
 
 static inline pci_power_t platform_pci_get_power_state(struct pci_dev *dev)
@@ -944,7 +945,7 @@ void pci_update_current_state(struct pci
 void pci_power_up(struct pci_dev *dev)
 {
 	if (platform_pci_power_manageable(dev))
-		platform_pci_set_power_state(dev, PCI_D0);
+		platform_pci_set_power_state(dev, PCI_D0, true);
 
 	pci_raw_set_power_state(dev, PCI_D0);
 	pci_update_current_state(dev, PCI_D0);
@@ -960,7 +961,7 @@ static int pci_platform_power_transition
 	int error;
 
 	if (platform_pci_power_manageable(dev)) {
-		error = platform_pci_set_power_state(dev, state);
+		error = platform_pci_set_power_state(dev, state, false);
 		if (!error)
 			pci_update_current_state(dev, state);
 	} else
Index: linux-pm/drivers/pci/pci.h
===================================================================
--- linux-pm.orig/drivers/pci/pci.h
+++ linux-pm/drivers/pci/pci.h
@@ -67,7 +67,7 @@ int pci_bus_error_reset(struct pci_dev *
 struct pci_platform_pm_ops {
 	bool (*bridge_d3)(struct pci_dev *dev);
 	bool (*is_manageable)(struct pci_dev *dev);
-	int (*set_state)(struct pci_dev *dev, pci_power_t state);
+	int (*set_state)(struct pci_dev *dev, pci_power_t state, bool force);
 	pci_power_t (*get_state)(struct pci_dev *dev);
 	pci_power_t (*choose_state)(struct pci_dev *dev);
 	int (*set_wakeup)(struct pci_dev *dev, bool enable);

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 1/3] PCI / ACPI: Use cached ACPI device state to get PCI device power state
  2019-06-24 11:14               ` Rafael J. Wysocki
@ 2019-06-25  9:45                 ` Mika Westerberg
  2019-06-25 10:00                   ` Rafael J. Wysocki
  0 siblings, 1 reply; 22+ messages in thread
From: Mika Westerberg @ 2019-06-25  9:45 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Bjorn Helgaas, Len Brown, Lukas Wunner, Keith Busch,
	Alex Williamson, Alexandru Gagniuc, ACPI Devel Maling List,
	Linux PCI

On Mon, Jun 24, 2019 at 01:14:47PM +0200, Rafael J. Wysocki wrote:
> > The problem here is that acpi_device_get_power() really only should be
> > used for two purposes: (1) To initialize adev->power.state, or to
> > update it via acpi_device_update_power(), and (2) by the
> > "real_power_state" sysfs attribute (of ACPI device objects).  The
> > adev->power.state value should be used anywhere else, in principle, so
> > the Mika's patch is correct.
> 
> Well, it is an improvement, but it is not sufficient.
> 
> > [Note that adev->power.state cannot be updated after calling
> > acpi_device_get_power() to the value returned by it without updating
> > the reference counters of the power resources that are "on" *exactly*
> > because of the problem at hand here.]
> 
> That is obviously correct, but ->
> 
> > > but that's just an idle thought, not a suggestion.
> > 
> > After the initialization of the ACPI subsystem, the authoritative
> > source of the ACPI device power state information is
> > adev->power.state.  The ACPI subsystem is expected to update this
> > value as needed going forward (including system-wide transitions like
> > resume from S3).
> 
> -> the "resume from S3 or hibernation" case needs special handling, because
> in that case the device power state need not reflect the information the ACPI
> subsystem has.  That only matters if adev->power.state is ACPI_STATE_D0 and
> the device is actually *not* in D0, because in that case acpi_device_set_power()
> will not work. 

I guess you are talking about the special-cased devices that we leave in
D0 when system suspend (via firmware) is entered?

> So that case is not covered currently (it should be rare in practice,
> though, if it happens at all), so something like the patch below (untested) may
> be needed in addition to the Mika's patch.

Looks good to me.

> Still, there is also the "power state not matching" case in pci_pm_complete() that's
> need to be covered and the non-PCI ACPI PM has a similar issue in theory, so I
> need to think about this a bit more.

Do you want me to hold off sending an updated version of the patch
series while we figure this one out or is it fine if I send it out now
and we can add further details on top?


* Re: [PATCH v2 1/3] PCI / ACPI: Use cached ACPI device state to get PCI device power state
  2019-06-25  9:45                 ` Mika Westerberg
@ 2019-06-25 10:00                   ` Rafael J. Wysocki
  2019-06-25 10:08                     ` Mika Westerberg
  0 siblings, 1 reply; 22+ messages in thread
From: Rafael J. Wysocki @ 2019-06-25 10:00 UTC (permalink / raw)
  To: Mika Westerberg
  Cc: Rafael J. Wysocki, Bjorn Helgaas, Len Brown, Lukas Wunner,
	Keith Busch, Alex Williamson, Alexandru Gagniuc,
	ACPI Devel Maling List, Linux PCI

On Tue, Jun 25, 2019 at 11:46 AM Mika Westerberg
<mika.westerberg@linux.intel.com> wrote:
>
> On Mon, Jun 24, 2019 at 01:14:47PM +0200, Rafael J. Wysocki wrote:
> > > The problem here is that acpi_device_get_power() really only should be
> > > used for two purposes: (1) To initialize adev->power.state, or to
> > > update it via acpi_device_update_power(), and (2) by the
> > > "real_power_state" sysfs attribute (of ACPI device objects).  The
> > > adev->power.state value should be used anywhere else, in principle, so
> > > the Mika's patch is correct.
> >
> > Well, it is an improvement, but it is not sufficient.
> >
> > > [Note that adev->power.state cannot be updated after calling
> > > acpi_device_get_power() to the value returned by it without updating
> > > the reference counters of the power resources that are "on" *exactly*
> > > because of the problem at hand here.]
> >
> > That is obviously correct, but ->
> >
> > > > but that's just an idle thought, not a suggestion.
> > >
> > > After the initialization of the ACPI subsystem, the authoritative
> > > source of the ACPI device power state information is
> > > adev->power.state.  The ACPI subsystem is expected to update this
> > > value as needed going forward (including system-wide transitions like
> > > resume from S3).
> >
> > -> the "resume from S3 or hibernation" case needs special handling, because
> > in that case the device power state need not reflect the information the ACPI
> > subsystem has.  That only matters if adev->power.state is ACPI_STATE_D0 and
> > the device is actually *not* in D0, because in that case acpi_device_set_power()
> > will not work.
>
> I guess you are talking about the special-cased devices that we leave in
> D0 when system suspend (via firmware) is entered?
>
> > So that case is not covered currently (it should be rare in practice,
> > though, if it happens at all), so something like the patch below (untested) may
> > be needed in addition to the Mika's patch.
>
> Looks good to me.

I actually decided to address this issue in acpi_device_set_power() as
it may affect devices beyond PCI in principle.  I will send a patch
for that shortly.
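
To illustrate the failure mode discussed above (this is not the patch
mentioned here, only a toy userspace model with made-up names), assume
the set-power path returns early whenever the cached state already
equals the target. A device whose cached state is D0 but which is
really powered off is then never touched:

```c
#include <assert.h>

/* Hypothetical states for the model; not the kernel's ACPI_STATE_* values. */
enum toy_state { TOY_D0, TOY_D3COLD };

struct toy_dev {
	enum toy_state cached;	/* what software believes (adev->power.state analogue) */
	enum toy_state actual;	/* what the hardware is really in */
	int transitions;	/* how many times the hardware was touched */
};

/*
 * Models a set-power routine that skips all work when the cached state
 * already matches the target. This early return is an assumption taken
 * from the description in the thread.
 */
static int toy_set_power(struct toy_dev *d, enum toy_state target)
{
	assert(target == TOY_D0 || target == TOY_D3COLD);

	if (d->cached == target)
		return 0;	/* hardware is not touched at all */

	d->actual = target;
	d->cached = target;
	d->transitions++;
	return 0;
}
```

Starting from cached == TOY_D0 and actual == TOY_D3COLD, a call to
toy_set_power(&d, TOY_D0) succeeds but leaves the device powered off,
which is the mismatch that can follow a resume from S3 or hibernation.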

> > Still, there is also the "power state not matching" case in pci_pm_complete() that's
> > need to be covered and the non-PCI ACPI PM has a similar issue in theory, so I
> > need to think about this a bit more.
>
> Do you want me to hold off sending an updated version of the patch
> series while we figure this one out or is it fine if I send it out now
> and we can add further details on top?

It is independent of the other fix, so it can be sent now just fine IMO.


* Re: [PATCH v2 1/3] PCI / ACPI: Use cached ACPI device state to get PCI device power state
  2019-06-25 10:00                   ` Rafael J. Wysocki
@ 2019-06-25 10:08                     ` Mika Westerberg
  0 siblings, 0 replies; 22+ messages in thread
From: Mika Westerberg @ 2019-06-25 10:08 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Rafael J. Wysocki, Bjorn Helgaas, Len Brown, Lukas Wunner,
	Keith Busch, Alex Williamson, Alexandru Gagniuc,
	ACPI Devel Maling List, Linux PCI

On Tue, Jun 25, 2019 at 12:00:57PM +0200, Rafael J. Wysocki wrote:
> On Tue, Jun 25, 2019 at 11:46 AM Mika Westerberg
> <mika.westerberg@linux.intel.com> wrote:
> >
> > On Mon, Jun 24, 2019 at 01:14:47PM +0200, Rafael J. Wysocki wrote:
> > > > The problem here is that acpi_device_get_power() really only should be
> > > > used for two purposes: (1) To initialize adev->power.state, or to
> > > > update it via acpi_device_update_power(), and (2) by the
> > > > "real_power_state" sysfs attribute (of ACPI device objects).  The
> > > > adev->power.state value should be used anywhere else, in principle, so
> > > > the Mika's patch is correct.
> > >
> > > Well, it is an improvement, but it is not sufficient.
> > >
> > > > [Note that adev->power.state cannot be updated after calling
> > > > acpi_device_get_power() to the value returned by it without updating
> > > > the reference counters of the power resources that are "on" *exactly*
> > > > because of the problem at hand here.]
> > >
> > > That is obviously correct, but ->
> > >
> > > > > but that's just an idle thought, not a suggestion.
> > > >
> > > > After the initialization of the ACPI subsystem, the authoritative
> > > > source of the ACPI device power state information is
> > > > adev->power.state.  The ACPI subsystem is expected to update this
> > > > value as needed going forward (including system-wide transitions like
> > > > resume from S3).
> > >
> > > -> the "resume from S3 or hibernation" case needs special handling, because
> > > in that case the device power state need not reflect the information the ACPI
> > > subsystem has.  That only matters if adev->power.state is ACPI_STATE_D0 and
> > > the device is actually *not* in D0, because in that case acpi_device_set_power()
> > > will not work.
> >
> > I guess you are talking about the special-cased devices that we leave in
> > D0 when system suspend (via firmware) is entered?
> >
> > > So that case is not covered currently (it should be rare in practice,
> > > though, if it happens at all), so something like the patch below (untested) may
> > > be needed in addition to the Mika's patch.
> >
> > Looks good to me.
> 
> I actually decided to address this issue in acpi_device_set_power() as
> it may affect devices beyond PCI in principle.  I will send a patch
> for that shortly.

Thanks!

> > > Still, there is also the "power state not matching" case in pci_pm_complete() that's
> > > need to be covered and the non-PCI ACPI PM has a similar issue in theory, so I
> > > need to think about this a bit more.
> >
> > Do you want me to hold off sending an updated version of the patch
> > series while we figure this one out or is it fine if I send it out now
> > and we can add further details on top?
> 
> It is independent of the other fix, so it can be sent now just fine IMO.

OK, I'll send it out in a minute.


* Re: [PATCH v2 1/3] PCI / ACPI: Use cached ACPI device state to get PCI device power state
  2019-06-18 16:18 ` [PATCH v2 1/3] PCI / ACPI: Use cached ACPI device state to get PCI device power state Mika Westerberg
  2019-06-19 21:28   ` Bjorn Helgaas
@ 2019-06-21 11:56   ` Rafael J. Wysocki
  2019-06-24 10:58     ` Mika Westerberg
  1 sibling, 1 reply; 22+ messages in thread
From: Rafael J. Wysocki @ 2019-06-21 11:56 UTC (permalink / raw)
  To: Mika Westerberg
  Cc: Rafael J. Wysocki, Bjorn Helgaas, Len Brown, Lukas Wunner,
	Keith Busch, Alex Williamson, Alexandru Gagniuc,
	ACPI Devel Maling List, Linux PCI

On Tue, Jun 18, 2019 at 6:19 PM Mika Westerberg
<mika.westerberg@linux.intel.com> wrote:
>

Actually, to start with, you can say that the ACPI power state
returned by acpi_device_get_power() may depend on the configuration of
ACPI power resources in the system which may change at any time after
acpi_device_get_power() has returned, unless the reference counters of
the ACPI power resources in question are set to prevent that from
happening.  Thus it is invalid to use acpi_device_get_power() in
acpi_pci_get_power_state() the way it is done now and the value of the
power.state field in the corresponding struct acpi_device object
(which reflects the ACPI power resources reference counting, among
other things) should be used instead.

Then you can describe the particular issue below as an example.

IMO that would explain the rationale better here.
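
A minimal userspace sketch of that rationale follows. All names are
hypothetical and the model is deliberately simplified: one shared
resource, and the "real" state of a suspended device is inferred only
from whether that resource is on. It shows that a value sampled from
the resource can go stale, while the cached per-device state follows
the device's own reference:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical states for the model; not the kernel's ACPI_STATE_* values. */
enum toy_state { TOY_D0, TOY_D3HOT, TOY_D3COLD };

/* A shared power resource: physically on while anyone holds a reference. */
struct toy_power_resource {
	unsigned int ref_count;
};

struct toy_device {
	struct toy_power_resource *pr3;	/* resource the device depends on */
	bool holds_ref;			/* does this device reference pr3? */
	enum toy_state cached;		/* analogue of adev->power.state */
};

/*
 * "Real" state of a suspended device, inferred from the resource alone.
 * Simplification: only meaningful once the device dropped its reference.
 */
static enum toy_state toy_real_state(const struct toy_device *d)
{
	return d->pr3->ref_count ? TOY_D3HOT : TOY_D3COLD;
}

static void toy_resume(struct toy_device *d)
{
	if (!d->holds_ref) {
		d->pr3->ref_count++;
		d->holds_ref = true;
	}
	d->cached = TOY_D0;
}

static void toy_suspend(struct toy_device *d)
{
	if (d->holds_ref) {
		assert(d->pr3->ref_count > 0);
		d->pr3->ref_count--;
		d->holds_ref = false;
	}
	/* The cached state follows this device's own reference only. */
	d->cached = TOY_D3COLD;
}
```

With two devices sharing one resource, suspending the first gives a
"real" state of TOY_D3HOT (the sibling still holds the resource) while
its cached state is already TOY_D3COLD. Once the sibling suspends too,
the earlier sample is wrong and the cached value is still right.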

> Intel Ice Lake has an integrated Thunderbolt controller which means that
> the PCIe topology is extended directly from the two root ports (RP0 and
> RP1). Power management is handled by ACPI power resources that are
> shared between the root ports, Thunderbolt controller (NHI) and xHCI
> controller.
>
> The topology with the power resources (marked with []) looks like:
>
>   Host bridge
>     |
>     +- RP0 ---\
>     +- RP1 ---|--+--> [TBT]
>     +- NHI --/   |
>     |            |
>     |            v
>     +- xHCI --> [D3C]
>
> Here TBT and D3C are the shared ACPI power resources. ACPI _PR3() method
> returns either TBT or D3C or both.
>
> Say we runtime suspend first the root ports RP0 and RP1, then NHI. Now
> since the TBT power resource is still on when the root ports are runtime
> suspended their dev->current_state is set to D3hot. When NHI is runtime
> suspended TBT is finally turned off but the state of the root ports
> remains D3hot.
>
> If the user now runs lspci for instance, the result is all 1's like in
> the below output (07.0 is the first root port, RP0):
>
> 00:07.0 PCI bridge: Intel Corporation Device 8a1d (rev ff) (prog-if ff)
>     !!! Unknown header type 7f
>     Kernel driver in use: pcieport
>
> In short the hardware state is not in sync with the software state
> anymore. The exact same thing happens with the PME polling thread which
> ends up bringing the root ports back into D0 after they are runtime
> suspended.
>
> ACPI core already sets the device state to be D3cold when it drops its
> references to the power resources returned by _PR3 even if these power
> resources are still physically on (other devices still reference them).
> However, in PCI core we call acpi_device_get_power() to figure out the
> power state and that returns the "real" power state based on the state
> of its power resources.
>
> To make it work with the shared power resources modify
> acpi_pci_get_power_state() so that it reads the ACPI device power state
> that was cached by the ACPI core. This makes the PCI device power state
> match the ACPI device power state regardless of state of the shared
> power resources that may still be on at this point.
>
> Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
> ---
>  drivers/pci/pci-acpi.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/pci/pci-acpi.c b/drivers/pci/pci-acpi.c
> index 1897847ceb0c..b782acac26c5 100644
> --- a/drivers/pci/pci-acpi.c
> +++ b/drivers/pci/pci-acpi.c
> @@ -685,7 +685,8 @@ static pci_power_t acpi_pci_get_power_state(struct pci_dev *dev)
>         if (!adev || !acpi_device_power_manageable(adev))
>                 return PCI_UNKNOWN;
>
> -       if (acpi_device_get_power(adev, &state) || state == ACPI_STATE_UNKNOWN)
> +       state = adev->power.state;
> +       if (state == ACPI_STATE_UNKNOWN)
>                 return PCI_UNKNOWN;
>
>         return state_conv[state];
> --
> 2.20.1
>


* Re: [PATCH v2 1/3] PCI / ACPI: Use cached ACPI device state to get PCI device power state
  2019-06-21 11:56   ` Rafael J. Wysocki
@ 2019-06-24 10:58     ` Mika Westerberg
  0 siblings, 0 replies; 22+ messages in thread
From: Mika Westerberg @ 2019-06-24 10:58 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Rafael J. Wysocki, Bjorn Helgaas, Len Brown, Lukas Wunner,
	Keith Busch, Alex Williamson, Alexandru Gagniuc,
	ACPI Devel Maling List, Linux PCI

On Fri, Jun 21, 2019 at 01:56:49PM +0200, Rafael J. Wysocki wrote:
> On Tue, Jun 18, 2019 at 6:19 PM Mika Westerberg
> <mika.westerberg@linux.intel.com> wrote:
> >
> 
> Actually, to start with, you can say that the ACPI power state
> returned by acpi_device_get_power() may depend on the configuration of
> ACPI power resources in the system which may change at any time after
> acpi_device_get_power() has returned, unless the reference counters of
> the ACPI power resources in question are set to prevent that from
> happening.  Thus it is invalid to use acpi_device_get_power() in
> acpi_pci_get_power_state() the way it is done now and the value of the
> power.state field in the corresponding struct acpi_device object
> (which reflects the ACPI power resources reference counting, among
> other things) should be used instead.
> 
> Then you can describe the particular issue below as an example.
> 
> IMO that would explain the rationale better here.

Thanks! I'll update the changelog accordingly.


* [PATCH v2 2/3] ACPI / PM: Introduce concept of a _PR0 dependent device
  2019-06-18 16:18 [PATCH v2 0/3] PCI / ACPI: Handle sibling devices sharing power resources Mika Westerberg
  2019-06-18 16:18 ` [PATCH v2 1/3] PCI / ACPI: Use cached ACPI device state to get PCI device power state Mika Westerberg
@ 2019-06-18 16:18 ` Mika Westerberg
  2019-06-19 13:20   ` Rafael J. Wysocki
  2019-06-18 16:18 ` [PATCH v2 3/3] PCI / ACPI: Add _PR0 dependent devices Mika Westerberg
  2019-06-19 13:24 ` [PATCH v2 0/3] PCI / ACPI: Handle sibling devices sharing power resources Rafael J. Wysocki
  3 siblings, 1 reply; 22+ messages in thread
From: Mika Westerberg @ 2019-06-18 16:18 UTC (permalink / raw)
  To: Rafael J. Wysocki, Bjorn Helgaas
  Cc: Len Brown, Lukas Wunner, Keith Busch, Alex Williamson,
	Alexandru Gagniuc, Mika Westerberg, linux-acpi, linux-pci

If there are shared power resources between otherwise unrelated devices,
turning them on causes the other devices sharing them to be powered up
as well. In the case of PCI, the devices go into the D0uninitialized
state, meaning that if they were configured to trigger wake, that
configuration is lost at this point.

For this reason introduce a concept of "_PR0 dependent device" that can
be added to any ACPI device that has power resources. The dependent
device will be included in a list of dependent devices for all power
resources returned by the ACPI device's _PR0 (assuming it has one).
Whenever a power resource having dependent devices is turned physically
on (its _ON method is called) we runtime resume all of them to allow
their driver or in case of PCI the PCI core to re-initialize the device
and its wake configuration.

This adds two functions that can be used to add and remove these
dependent devices. Note that the dependent device does not necessarily
need to share power resources, so this functionality can be used to add
"software dependencies" as well if needed.
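
The mechanism described above can be modelled in userspace roughly as
follows. This is a sketch only: the names are hypothetical, a fixed
array stands in for the kernel list, locking is left out, and a
counter stands in for pm_request_resume():

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_DEPS 8

struct toy_dev {
	const char *name;
	int resume_requests;	/* stand-in for pm_request_resume() calls */
};

struct toy_resource {
	struct toy_dev *deps[TOY_MAX_DEPS];
	size_t ndeps;
};

/* Add @dev once; returns 0 on success or if already present, -1 if full. */
static int toy_add_dependent(struct toy_resource *res, struct toy_dev *dev)
{
	assert(dev != NULL);

	for (size_t i = 0; i < res->ndeps; i++)
		if (res->deps[i] == dev)
			return 0;

	if (res->ndeps == TOY_MAX_DEPS)
		return -1;

	res->deps[res->ndeps++] = dev;
	return 0;
}

/* Remove @dev if present; order of the remaining entries is not kept. */
static void toy_remove_dependent(struct toy_resource *res, struct toy_dev *dev)
{
	for (size_t i = 0; i < res->ndeps; i++) {
		if (res->deps[i] == dev) {
			res->deps[i] = res->deps[--res->ndeps];
			return;
		}
	}
}

/* Stand-in for the point right after the resource's _ON has run. */
static void toy_power_on(struct toy_resource *res)
{
	/* As in the patch: nothing to do with zero or one dependent. */
	if (res->ndeps < 2)
		return;

	for (size_t i = 0; i < res->ndeps; i++)
		res->deps[i]->resume_requests++;
}
```

With only one dependent registered, toy_power_on() requests nothing.
Once a second device is added, every dependent on the resource gets a
resume request when it is turned on.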

Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
---
 drivers/acpi/power.c    | 139 ++++++++++++++++++++++++++++++++++++++++
 include/acpi/acpi_bus.h |   4 ++
 2 files changed, 143 insertions(+)

diff --git a/drivers/acpi/power.c b/drivers/acpi/power.c
index a916417b9e70..76d298192940 100644
--- a/drivers/acpi/power.c
+++ b/drivers/acpi/power.c
@@ -42,6 +42,11 @@ ACPI_MODULE_NAME("power");
 #define ACPI_POWER_RESOURCE_STATE_ON	0x01
 #define ACPI_POWER_RESOURCE_STATE_UNKNOWN 0xFF
 
+struct acpi_power_dependent_device {
+	struct device *dev;
+	struct list_head node;
+};
+
 struct acpi_power_resource {
 	struct acpi_device device;
 	struct list_head list_node;
@@ -51,6 +56,7 @@ struct acpi_power_resource {
 	unsigned int ref_count;
 	bool wakeup_enabled;
 	struct mutex resource_lock;
+	struct list_head dependents;
 };
 
 struct acpi_power_resource_entry {
@@ -232,8 +238,125 @@ static int acpi_power_get_list_state(struct list_head *list, int *state)
 	return 0;
 }
 
+static int
+acpi_power_resource_add_dependent(struct acpi_power_resource *resource,
+				  struct device *dev)
+{
+	struct acpi_power_dependent_device *dep;
+	int ret = 0;
+
+	mutex_lock(&resource->resource_lock);
+	list_for_each_entry(dep, &resource->dependents, node) {
+		/* Only add it once */
+		if (dep->dev == dev)
+			goto unlock;
+	}
+
+	dep = kzalloc(sizeof(*dep), GFP_KERNEL);
+	if (!dep) {
+		ret = -ENOMEM;
+		goto unlock;
+	}
+
+	dep->dev = dev;
+	list_add_tail(&dep->node, &resource->dependents);
+	dev_dbg(dev, "added power dependency to [%s]\n", resource->name);
+
+unlock:
+	mutex_unlock(&resource->resource_lock);
+	return ret;
+}
+
+static void
+acpi_power_resource_remove_dependent(struct acpi_power_resource *resource,
+				     struct device *dev)
+{
+	struct acpi_power_dependent_device *dep;
+
+	mutex_lock(&resource->resource_lock);
+	list_for_each_entry(dep, &resource->dependents, node) {
+		if (dep->dev == dev) {
+			list_del(&dep->node);
+			kfree(dep);
+			dev_dbg(dev, "removed power dependency to [%s]\n",
+				resource->name);
+			break;
+		}
+	}
+	mutex_unlock(&resource->resource_lock);
+}
+
+/**
+ * acpi_device_power_add_dependent - Add dependent device of this ACPI device
+ * @adev: ACPI device pointer
+ * @dev: Dependent device
+ *
+ * If @adev has non-empty _PR0 the @dev is added as dependent device to all
+ * power resources returned by it. This means that whenever these power
+ * resources are turned _ON the dependent devices get runtime resumed. This
+ * is needed for devices such as PCI to allow its driver to re-initialize
+ * it after it went to D0uninitialized.
+ *
+ * If @adev does not have _PR0 this does nothing.
+ *
+ * Returns %0 in case of success and negative errno otherwise.
+ */
+int acpi_device_power_add_dependent(struct acpi_device *adev,
+				    struct device *dev)
+{
+	struct acpi_power_resource_entry *entry;
+	struct list_head *resources;
+	int ret;
+
+	if (!adev->power.flags.power_resources)
+		return 0;
+	if (!adev->power.states[ACPI_STATE_D0].flags.valid)
+		return 0;
+
+	resources = &adev->power.states[ACPI_STATE_D0].resources;
+	list_for_each_entry(entry, resources, node) {
+		ret = acpi_power_resource_add_dependent(entry->resource, dev);
+		if (ret)
+			goto err;
+	}
+
+	return 0;
+
+err:
+	list_for_each_entry(entry, resources, node)
+		acpi_power_resource_remove_dependent(entry->resource, dev);
+
+	return ret;
+}
+
+/**
+ * acpi_device_power_remove_dependent - Remove dependent device
+ * @adev: ACPI device pointer
+ * @dev: Dependent device
+ *
+ * Does the opposite of acpi_device_power_add_dependent() and removes the
+ * dependent device if it is found. Can be called to @adev that does not
+ * have _PR0 as well.
+ */
+void acpi_device_power_remove_dependent(struct acpi_device *adev,
+					struct device *dev)
+{
+	struct acpi_power_resource_entry *entry;
+	struct list_head *resources;
+
+	if (!adev->power.flags.power_resources)
+		return;
+	if (!adev->power.states[ACPI_STATE_D0].flags.valid)
+		return;
+
+	resources = &adev->power.states[ACPI_STATE_D0].resources;
+	list_for_each_entry_reverse(entry, resources, node)
+		acpi_power_resource_remove_dependent(entry->resource, dev);
+}
+
 static int __acpi_power_on(struct acpi_power_resource *resource)
 {
+	struct acpi_power_dependent_device *dep;
 	acpi_status status = AE_OK;
 
 	status = acpi_evaluate_object(resource->device.handle, "_ON", NULL, NULL);
@@ -243,6 +366,21 @@ static int __acpi_power_on(struct acpi_power_resource *resource)
 	ACPI_DEBUG_PRINT((ACPI_DB_INFO, "Power resource [%s] turned on\n",
 			  resource->name));
 
+	/*
+	 * If there are other dependents on this power resource we need to
+	 * resume them now so that their drivers can re-initialize the
+	 * hardware properly after it went back to D0.
+	 */
+	if (list_empty(&resource->dependents) ||
+	    list_is_singular(&resource->dependents))
+		return 0;
+
+	list_for_each_entry(dep, &resource->dependents, node) {
+		dev_dbg(dep->dev, "runtime resuming because [%s] turned on\n",
+			resource->name);
+		pm_request_resume(dep->dev);
+	}
+
 	return 0;
 }
 
@@ -810,6 +948,7 @@ int acpi_add_power_resource(acpi_handle handle)
 				ACPI_STA_DEFAULT);
 	mutex_init(&resource->resource_lock);
 	INIT_LIST_HEAD(&resource->list_node);
+	INIT_LIST_HEAD(&resource->dependents);
 	resource->name = device->pnp.bus_id;
 	strcpy(acpi_device_name(device), ACPI_POWER_DEVICE_NAME);
 	strcpy(acpi_device_class(device), ACPI_POWER_CLASS);
diff --git a/include/acpi/acpi_bus.h b/include/acpi/acpi_bus.h
index 31b6c87d6240..4752ff0a9d9b 100644
--- a/include/acpi/acpi_bus.h
+++ b/include/acpi/acpi_bus.h
@@ -513,6 +513,10 @@ int acpi_device_fix_up_power(struct acpi_device *device);
 int acpi_bus_update_power(acpi_handle handle, int *state_p);
 int acpi_device_update_power(struct acpi_device *device, int *state_p);
 bool acpi_bus_power_manageable(acpi_handle handle);
+int acpi_device_power_add_dependent(struct acpi_device *adev,
+				    struct device *dev);
+void acpi_device_power_remove_dependent(struct acpi_device *adev,
+					struct device *dev);
 
 #ifdef CONFIG_PM
 bool acpi_bus_can_wakeup(acpi_handle handle);
-- 
2.20.1



* Re: [PATCH v2 2/3] ACPI / PM: Introduce concept of a _PR0 dependent device
  2019-06-18 16:18 ` [PATCH v2 2/3] ACPI / PM: Introduce concept of a _PR0 dependent device Mika Westerberg
@ 2019-06-19 13:20   ` Rafael J. Wysocki
  2019-06-19 13:34     ` Mika Westerberg
  0 siblings, 1 reply; 22+ messages in thread
From: Rafael J. Wysocki @ 2019-06-19 13:20 UTC (permalink / raw)
  To: Mika Westerberg
  Cc: Rafael J. Wysocki, Bjorn Helgaas, Len Brown, Lukas Wunner,
	Keith Busch, Alex Williamson, Alexandru Gagniuc,
	ACPI Devel Maling List, Linux PCI

On Tue, Jun 18, 2019 at 6:19 PM Mika Westerberg
<mika.westerberg@linux.intel.com> wrote:
>
> If there are shared power resources between otherwise unrelated devices
> turning them on causes the other devices sharing them to be powered up
> as well. In case of PCI devices go into D0uninitialized state meaning
> that if they were configured to trigger wake that configuration is lost
> at this point.
>
> For this reason introduce a concept of "_PR0 dependent device" that can
> be added to any ACPI device that has power resources. The dependent
> device will be included in a list of dependent devices for all power
> resources returned by the ACPI device's _PR0 (assuming it has one).
> Whenever a power resource having dependent devices is turned physically
> on (its _ON method is called) we runtime resume all of them to allow
> their driver or in case of PCI the PCI core to re-initialize the device
> and its wake configuration.
>
> This adds two functions that can be used to add and remove these
> dependent devices. Note the dependent device does not necessary need
> share power resources so this functionality can be used to add "software
> dependencies" as well if needed.
>
> Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
> ---
>  drivers/acpi/power.c    | 139 ++++++++++++++++++++++++++++++++++++++++
>  include/acpi/acpi_bus.h |   4 ++
>  2 files changed, 143 insertions(+)
>
> diff --git a/drivers/acpi/power.c b/drivers/acpi/power.c
> index a916417b9e70..76d298192940 100644
> --- a/drivers/acpi/power.c
> +++ b/drivers/acpi/power.c
> @@ -42,6 +42,11 @@ ACPI_MODULE_NAME("power");
>  #define ACPI_POWER_RESOURCE_STATE_ON   0x01
>  #define ACPI_POWER_RESOURCE_STATE_UNKNOWN 0xFF
>
> +struct acpi_power_dependent_device {
> +       struct device *dev;
> +       struct list_head node;
> +};
> +
>  struct acpi_power_resource {
>         struct acpi_device device;
>         struct list_head list_node;
> @@ -51,6 +56,7 @@ struct acpi_power_resource {
>         unsigned int ref_count;
>         bool wakeup_enabled;
>         struct mutex resource_lock;
> +       struct list_head dependents;
>  };
>
>  struct acpi_power_resource_entry {
> @@ -232,8 +238,125 @@ static int acpi_power_get_list_state(struct list_head *list, int *state)
>         return 0;
>  }
>
> +static int
> +acpi_power_resource_add_dependent(struct acpi_power_resource *resource,
> +                                 struct device *dev)
> +{
> +       struct acpi_power_dependent_device *dep;
> +       int ret = 0;
> +
> +       mutex_lock(&resource->resource_lock);
> +       list_for_each_entry(dep, &resource->dependents, node) {
> +               /* Only add it once */
> +               if (dep->dev == dev)
> +                       goto unlock;
> +       }
> +
> +       dep = kzalloc(sizeof(*dep), GFP_KERNEL);
> +       if (!dep) {
> +               ret = -ENOMEM;
> +               goto unlock;
> +       }
> +
> +       dep->dev = dev;
> +       list_add_tail(&dep->node, &resource->dependents);
> +       dev_dbg(dev, "added power dependency to [%s]\n", resource->name);
> +
> +unlock:
> +       mutex_unlock(&resource->resource_lock);
> +       return ret;
> +}
> +
> +static void
> +acpi_power_resource_remove_dependent(struct acpi_power_resource *resource,
> +                                    struct device *dev)
> +{
> +       struct acpi_power_dependent_device *dep;
> +
> +       mutex_lock(&resource->resource_lock);
> +       list_for_each_entry(dep, &resource->dependents, node) {
> +               if (dep->dev == dev) {
> +                       list_del(&dep->node);
> +                       kfree(dep);
> +                       dev_dbg(dev, "removed power dependency to [%s]\n",
> +                               resource->name);
> +                       break;
> +               }
> +       }
> +       mutex_unlock(&resource->resource_lock);
> +}
> +
> +/**
> + * acpi_device_power_add_dependent - Add dependent device of this ACPI device
> + * @adev: ACPI device pointer
> + * @dev: Dependent device
> + *
> + * If @adev has non-empty _PR0 the @dev is added as dependent device to all
> + * power resources returned by it. This means that whenever these power
> + * resources are turned _ON the dependent devices get runtime resumed. This
> + * is needed for devices such as PCI to allow its driver to re-initialize
> + * it after it went to D0uninitialized.
> + *
> + * If @adev does not have _PR0 this does nothing.
> + *
> + * Returns %0 in case of success and negative errno otherwise.
> + */
> +int acpi_device_power_add_dependent(struct acpi_device *adev,
> +                                   struct device *dev)
> +{
> +       struct acpi_power_resource_entry *entry;
> +       struct list_head *resources;
> +       int ret;
> +
> +       if (!adev->power.flags.power_resources)
> +               return 0;
> +       if (!adev->power.states[ACPI_STATE_D0].flags.valid)
> +               return 0;

The two checks above can be replaced with an
adev->flags.power_manageable one AFAICS (the "valid" flag is always
set for D0 and the list below will be empty if there are no power
resources).

Same for acpi_device_power_remove_dependent(), of course.

Apart from this LGTM.
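
A rough userspace sketch of the simplification suggested above, with
hypothetical stand-in types (a bool for adev->flags.power_manageable
and an array for the D0 resource list). The point is that an empty
list already makes the loop a no-op, so one check is enough:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_RES 4

struct toy_resource {
	int ndependents;	/* how many dependents were registered */
};

struct toy_adev {
	bool power_manageable;			  /* adev->flags.power_manageable analogue */
	struct toy_resource *d0_res[TOY_MAX_RES]; /* D0 (_PR0) resource list analogue */
	size_t nd0;				  /* 0 when the device has no _PR0 */
};

/* Register one dependent on every D0 resource of @adev; returns 0. */
static int toy_add_dependent(struct toy_adev *adev)
{
	assert(adev->nd0 <= TOY_MAX_RES);

	/* Single check replacing the power_resources and D0 "valid" tests. */
	if (!adev->power_manageable)
		return 0;

	/* With no _PR0 the list is empty and this loop does nothing. */
	for (size_t i = 0; i < adev->nd0; i++)
		adev->d0_res[i]->ndependents++;

	return 0;
}
```

Both a device that is not power manageable and a manageable device
without _PR0 return 0 here without registering anything.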


* Re: [PATCH v2 2/3] ACPI / PM: Introduce concept of a _PR0 dependent device
  2019-06-19 13:20   ` Rafael J. Wysocki
@ 2019-06-19 13:34     ` Mika Westerberg
  0 siblings, 0 replies; 22+ messages in thread
From: Mika Westerberg @ 2019-06-19 13:34 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Rafael J. Wysocki, Bjorn Helgaas, Len Brown, Lukas Wunner,
	Keith Busch, Alex Williamson, Alexandru Gagniuc,
	ACPI Devel Maling List, Linux PCI

On Wed, Jun 19, 2019 at 03:20:45PM +0200, Rafael J. Wysocki wrote:
> > +int acpi_device_power_add_dependent(struct acpi_device *adev,
> > +                                   struct device *dev)
> > +{
> > +       struct acpi_power_resource_entry *entry;
> > +       struct list_head *resources;
> > +       int ret;
> > +
> > +       if (!adev->power.flags.power_resources)
> > +               return 0;
> > +       if (!adev->power.states[ACPI_STATE_D0].flags.valid)
> > +               return 0;
> 
> The two checks above can be replaced with an
> adev->flags.power_manageable one AFAICS (the "valid" flag is always
> set for D0 and the list below will be empty if there are no power
> resources).
> 
> Same for acpi_device_power_remove_dependent(), of course.

OK, I'll do that in next version.

> Apart from this LGTM.

Thanks!


* [PATCH v2 3/3] PCI / ACPI: Add _PR0 dependent devices
  2019-06-18 16:18 [PATCH v2 0/3] PCI / ACPI: Handle sibling devices sharing power resources Mika Westerberg
  2019-06-18 16:18 ` [PATCH v2 1/3] PCI / ACPI: Use cached ACPI device state to get PCI device power state Mika Westerberg
  2019-06-18 16:18 ` [PATCH v2 2/3] ACPI / PM: Introduce concept of a _PR0 dependent device Mika Westerberg
@ 2019-06-18 16:18 ` Mika Westerberg
  2019-06-19 13:24 ` [PATCH v2 0/3] PCI / ACPI: Handle sibling devices sharing power resources Rafael J. Wysocki
  3 siblings, 0 replies; 22+ messages in thread
From: Mika Westerberg @ 2019-06-18 16:18 UTC (permalink / raw)
  To: Rafael J. Wysocki, Bjorn Helgaas
  Cc: Len Brown, Lukas Wunner, Keith Busch, Alex Williamson,
	Alexandru Gagniuc, Mika Westerberg, linux-acpi, linux-pci

If otherwise unrelated PCI devices share ACPI power resources turning
them on causes the devices to enter D0uninitialized power state which may
cause problems.

For example, in Intel Ice Lake two root ports (RP0 and RP1), the
Thunderbolt controller (NHI) and the xHCI controller all share power
resources, as can be seen in the topology below where power resources
are marked with []:

  Host bridge
    |
    +- RP0 ---\
    +- RP1 ---|--+--> [TBT]
    +- NHI --/   |
    |            |
    |            v
    +- xHCI --> [D3C]

Consider a situation where all devices sharing the power resources are
in D3cold (the power resources are turned off) and, for example, the
Thunderbolt controller is runtime resumed, which turns the power
resources on. This means that the other devices sharing them
(RP0, RP1 and xHCI) are transitioned into D0uninitialized state. If they
were configured to trigger wake (PME) on a certain event, that
configuration gets lost after reset so we would need to re-initialize
them to get the wakeup working as expected again. To do so we would need
to runtime resume all of them to make sure their registers get restored
properly before we can runtime suspend them again.

Since we just added the concept of a "_PR0 dependent device", we can
solve this by calling the relevant add/remove functions when the PCI
device is bound to its ACPI representation. If it has power resources,
the PCI device will be added to them as a dependent device and runtime
resumed whenever they are physically turned on. This should make sure
the PCI core can reconfigure wakes after the device is transitioned
into D0uninitialized.
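
The behaviour described above can be modelled with a small user-space
sketch. This is not the kernel implementation: every sim_* name is
hypothetical, and the model simply assumes that powering on a resource
resets all sharing devices and that a runtime resume restores their
wakeup configuration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SIM_MAX_DEPENDENTS 8

/* Hypothetical stand-in for a PCI device; not the kernel's struct device. */
struct sim_device {
	const char *name;
	bool wake_configured;	/* true while the PME configuration is intact */
	int resume_count;	/* how many times the device was runtime resumed */
};

/* Hypothetical stand-in for a shared ACPI power resource such as [TBT]. */
struct sim_power_resource {
	const char *name;
	bool on;
	struct sim_device *dependents[SIM_MAX_DEPENDENTS];
	size_t nr_dependents;
};

/* Register dev so it is resumed whenever res is physically turned on. */
static int sim_add_dependent(struct sim_power_resource *res,
			     struct sim_device *dev)
{
	assert(res && dev);
	if (res->nr_dependents >= SIM_MAX_DEPENDENTS)
		return -1;
	res->dependents[res->nr_dependents++] = dev;
	return 0;
}

/* Drop dev from the dependents of res; a no-op if it was never added. */
static void sim_remove_dependent(struct sim_power_resource *res,
				 struct sim_device *dev)
{
	size_t i;

	assert(res && dev);
	for (i = 0; i < res->nr_dependents; i++) {
		if (res->dependents[i] != dev)
			continue;
		/* Fill the hole with the last entry; order does not matter. */
		res->dependents[i] = res->dependents[--res->nr_dependents];
		return;
	}
}

/* Model of a runtime resume: registers are restored, wakeup works again. */
static void sim_runtime_resume(struct sim_device *dev)
{
	dev->wake_configured = true;
	dev->resume_count++;
}

/* Physically turn res on and resume every dependent device. */
static void sim_resource_on(struct sim_power_resource *res)
{
	size_t i;

	assert(res);
	if (res->on)
		return;		/* already powered, nothing was reset */
	res->on = true;

	/* Power-on puts every sharing device into D0uninitialized. */
	for (i = 0; i < res->nr_dependents; i++)
		res->dependents[i]->wake_configured = false;

	/* Resuming the dependents lets their configuration be restored. */
	for (i = 0; i < res->nr_dependents; i++)
		sim_runtime_resume(res->dependents[i]);
}
```

In this model a device that was removed from the dependents list before
the power-on is not resumed, which mirrors why the add and remove calls
are paired in the setup and cleanup paths.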

Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
---
 drivers/pci/pci-acpi.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/pci/pci-acpi.c b/drivers/pci/pci-acpi.c
index b782acac26c5..2abe0eeafb53 100644
--- a/drivers/pci/pci-acpi.c
+++ b/drivers/pci/pci-acpi.c
@@ -902,6 +902,7 @@ static void pci_acpi_setup(struct device *dev)
 		device_wakeup_enable(dev);

 	acpi_pci_wakeup(pci_dev, false);
+	acpi_device_power_add_dependent(adev, dev);
 }

 static void pci_acpi_cleanup(struct device *dev)
@@ -914,6 +915,7 @@ static void pci_acpi_cleanup(struct device *dev)

 	pci_acpi_remove_pm_notifier(adev);
 	if (adev->wakeup.flags.valid) {
+		acpi_device_power_remove_dependent(adev, dev);
 		if (pci_dev->bridge_d3)
 			device_wakeup_disable(dev);

-- 
2.20.1


* Re: [PATCH v2 0/3] PCI / ACPI: Handle sibling devices sharing power resources
  2019-06-18 16:18 [PATCH v2 0/3] PCI / ACPI: Handle sibling devices sharing power resources Mika Westerberg
                   ` (2 preceding siblings ...)
  2019-06-18 16:18 ` [PATCH v2 3/3] PCI / ACPI: Add _PR0 dependent devices Mika Westerberg
@ 2019-06-19 13:24 ` Rafael J. Wysocki
  3 siblings, 0 replies; 22+ messages in thread
From: Rafael J. Wysocki @ 2019-06-19 13:24 UTC (permalink / raw)
  To: Mika Westerberg
  Cc: Rafael J. Wysocki, Bjorn Helgaas, Len Brown, Lukas Wunner,
	Keith Busch, Alex Williamson, Alexandru Gagniuc,
	ACPI Devel Maling List, Linux PCI

On Tue, Jun 18, 2019 at 6:19 PM Mika Westerberg
<mika.westerberg@linux.intel.com> wrote:
>
> Hi all,
>
> Based on a discussion regarding patch series I sent previously [1] to deal
> with sibling devices sharing ACPI power resources, I prepared a new
> reworked version according to the comments I got.
>
> To summarize, in Intel Ice Lake the Thunderbolt controller, PCIe root ports
> and xHCI all share power resources. When they are all in D3hot power
> resources (returned by _PR3) can be turned off powering off the whole
> block. However, there are two issues around this.
>
> Firstly the PCI core sets the device power state by asking what the real
> ACPI power state is. This results that all but last device sharing the
> power resources are in D3hot when the power resources are turned off. This
> causes issues if user runs for example 'lspci' because the device is really
> in D3cold so what user gets back is all ones (0xffffffff).
>
> Secondly if any of the device is runtime resumed the power resources are
> turned on bringing all other devices sharing the resources to
> D0uninitialized losing their wakeup configuration.
>
> This series aims to fix the two issues by:
>
>   1. Using the ACPI cached power state when PCI devices are transitioned
>      into low power states instead of reading back the "real" power state.
>
>   2. Introducing concept of "_PR0 dependent devices" that get runtime
>      resumed whenever their power resource (which they might share with
>      other sibling devices) gets turned on.
>
> The series is based on the idea of Rafael J. Wysocki <rafael@kernel.org>.
>
> [1] https://www.spinics.net/lists/linux-pci/msg83583.html
>
> Mika Westerberg (3):
>   PCI / ACPI: Use cached ACPI device state to get PCI device power state
>   ACPI / PM: Introduce concept of a _PR0 dependent device
>   PCI / ACPI: Add _PR0 dependent devices

LGTM overall, patch [2/3] can be simplified slightly IMO (already sent
comments for that one).

As far as I'm concerned, the other patches need not be updated.


end of thread, other threads:[~2019-06-25 10:08 UTC | newest]

Thread overview: 22+ messages
2019-06-18 16:18 [PATCH v2 0/3] PCI / ACPI: Handle sibling devices sharing power resources Mika Westerberg
2019-06-18 16:18 ` [PATCH v2 1/3] PCI / ACPI: Use cached ACPI device state to get PCI device power state Mika Westerberg
2019-06-19 21:28   ` Bjorn Helgaas
2019-06-20  8:27     ` Mika Westerberg
2019-06-20 13:16       ` Bjorn Helgaas
2019-06-20 13:37         ` Mika Westerberg
2019-06-20 14:15           ` Bjorn Helgaas
2019-06-21 10:32             ` Rafael J. Wysocki
2019-06-21 13:09               ` Bjorn Helgaas
2019-06-22  8:51                 ` Rafael J. Wysocki
2019-06-24 10:57                   ` Mika Westerberg
2019-06-24 11:14               ` Rafael J. Wysocki
2019-06-25  9:45                 ` Mika Westerberg
2019-06-25 10:00                   ` Rafael J. Wysocki
2019-06-25 10:08                     ` Mika Westerberg
2019-06-21 11:56   ` Rafael J. Wysocki
2019-06-24 10:58     ` Mika Westerberg
2019-06-18 16:18 ` [PATCH v2 2/3] ACPI / PM: Introduce concept of a _PR0 dependent device Mika Westerberg
2019-06-19 13:20   ` Rafael J. Wysocki
2019-06-19 13:34     ` Mika Westerberg
2019-06-18 16:18 ` [PATCH v2 3/3] PCI / ACPI: Add _PR0 dependent devices Mika Westerberg
2019-06-19 13:24 ` [PATCH v2 0/3] PCI / ACPI: Handle sibling devices sharing power resources Rafael J. Wysocki
