linux-pci.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] PCI: don't power-off apple thunderbolt controller on s2idle
@ 2021-05-06 17:38 Konstantin Kharlamov
  2021-05-06 21:48 ` Bjorn Helgaas
  2021-06-07 23:17 ` Bjorn Helgaas
  0 siblings, 2 replies; 30+ messages in thread
From: Konstantin Kharlamov @ 2021-05-06 17:38 UTC (permalink / raw)
  To: linux-pci; +Cc: hi-angel

On Macbook 2013 resuming from s2idle results in external monitor no
longer being detected, and dmesg having errors like:

    pcieport 0000:06:00.0: can't change power state from D3hot to D0 (config space inaccessible)

and a stacktrace. The reason turned out that the hw that the quirk
powers off does not get powered on back on resume.

Thus, add a check for s2idle to the quirk, and do nothing if the suspend
mode is s2idle.

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=212767
Signed-off-by: Konstantin Kharlamov <Hi-Angel@yandex.ru>
---
 drivers/pci/quirks.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index 653660e3ba9e..86fedcec37e2 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -27,6 +27,7 @@
 #include <linux/nvme.h>
 #include <linux/platform_data/x86/apple.h>
 #include <linux/pm_runtime.h>
+#include <linux/suspend.h>
 #include <linux/switchtec.h>
 #include <asm/dma.h>	/* isa_dma_bridge_buggy */
 #include "pci.h"
@@ -3646,6 +3647,13 @@ static void quirk_apple_poweroff_thunderbolt(struct pci_dev *dev)
 		return;
 	if (pci_pcie_type(dev) != PCI_EXP_TYPE_UPSTREAM)
 		return;
+
+	/*
+	 * If suspend mode is s2idle, power won't get restored on resume.
+	 */
+	if (!pm_suspend_via_firmware())
+		return;
+
 	bridge = ACPI_HANDLE(&dev->dev);
 	if (!bridge)
 		return;
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH] PCI: don't power-off apple thunderbolt controller on s2idle
  2021-05-06 17:38 [PATCH] PCI: don't power-off apple thunderbolt controller on s2idle Konstantin Kharlamov
@ 2021-05-06 21:48 ` Bjorn Helgaas
  2021-05-06 22:07   ` Lukas Wunner
                     ` (2 more replies)
  2021-06-07 23:17 ` Bjorn Helgaas
  1 sibling, 3 replies; 30+ messages in thread
From: Bjorn Helgaas @ 2021-05-06 21:48 UTC (permalink / raw)
  To: Konstantin Kharlamov
  Cc: linux-pci, Rafael J. Wysocki, Andreas Noever, linux-pm

[+cc Rafael, Andreas, linux-pm]

On Thu, May 06, 2021 at 08:38:20PM +0300, Konstantin Kharlamov wrote:
> On Macbook 2013 resuming from s2idle results in external monitor no
> longer being detected, and dmesg having errors like:
> 
>     pcieport 0000:06:00.0: can't change power state from D3hot to D0 (config space inaccessible)
> 
> and a stacktrace. The reason turned out that the hw that the quirk
> powers off does not get powered on back on resume.

quirk_apple_poweroff_thunderbolt() was added in 2014 by 1df5172c5c25
("PCI: Suspend/resume quirks for Apple thunderbolt").  It claims
"power is automatically restored before resume," so there must be
something special about s2idle that prevents the power-on.

IIUC this change will reduce the s2idle power savings.  I would feel
better about this if we understood what the difference was.  

> Thus, add a check for s2idle to the quirk, and do nothing if the suspend
> mode is s2idle.

Obviously the *hardware* hasn't changed since 1df5172c5c25.  Is s2idle
something that wasn't tested back then, or is this problem connected
to an s2idle change since then?  Can we identify a commit that
introduced this problem?  That would help with backporting or stable
tags.

> Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=212767

Thanks for this!  Would you mind attaching the output of
"sudo lspci -vvv"?  If you attach any other dmesg, could you
use "dmesg --color=never" so the log doesn't include all the
escape characters?

> Signed-off-by: Konstantin Kharlamov <Hi-Angel@yandex.ru>
> ---
>  drivers/pci/quirks.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
> 
> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> index 653660e3ba9e..86fedcec37e2 100644
> --- a/drivers/pci/quirks.c
> +++ b/drivers/pci/quirks.c
> @@ -27,6 +27,7 @@
>  #include <linux/nvme.h>
>  #include <linux/platform_data/x86/apple.h>
>  #include <linux/pm_runtime.h>
> +#include <linux/suspend.h>
>  #include <linux/switchtec.h>
>  #include <asm/dma.h>	/* isa_dma_bridge_buggy */
>  #include "pci.h"
> @@ -3646,6 +3647,13 @@ static void quirk_apple_poweroff_thunderbolt(struct pci_dev *dev)
>  		return;
>  	if (pci_pcie_type(dev) != PCI_EXP_TYPE_UPSTREAM)
>  		return;
> +
> +	/*
> +	 * If suspend mode is s2idle, power won't get restored on resume.
> +	 */
> +	if (!pm_suspend_via_firmware())
> +		return;
> +
>  	bridge = ACPI_HANDLE(&dev->dev);
>  	if (!bridge)
>  		return;
> -- 
> 2.31.1
> 

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] PCI: don't power-off apple thunderbolt controller on s2idle
  2021-05-06 21:48 ` Bjorn Helgaas
@ 2021-05-06 22:07   ` Lukas Wunner
  2021-05-07  9:51     ` Rafael J. Wysocki
  2021-05-07 13:30     ` Bjorn Helgaas
  2021-05-07  9:32   ` Konstantin Kharlamov
  2021-05-20 11:58   ` Rafael J. Wysocki
  2 siblings, 2 replies; 30+ messages in thread
From: Lukas Wunner @ 2021-05-06 22:07 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Konstantin Kharlamov, linux-pci, Rafael J. Wysocki,
	Andreas Noever, linux-pm

On Thu, May 06, 2021 at 04:48:42PM -0500, Bjorn Helgaas wrote:
> On Thu, May 06, 2021 at 08:38:20PM +0300, Konstantin Kharlamov wrote:
> > On Macbook 2013 resuming from s2idle results in external monitor no
> > longer being detected, and dmesg having errors like:
> > 
> >     pcieport 0000:06:00.0: can't change power state from D3hot to D0 (config space inaccessible)
> > 
> > and a stacktrace. The reason turned out that the hw that the quirk
> > powers off does not get powered on back on resume.
> 
> quirk_apple_poweroff_thunderbolt() was added in 2014 by 1df5172c5c25
> ("PCI: Suspend/resume quirks for Apple thunderbolt").  It claims
> "power is automatically restored before resume," so there must be
> something special about s2idle that prevents the power-on.

With s2idle, the machine isn't suspended via ACPI, so the AML code
which powers the controller off isn't executed.  The dance to prepare
the controller for power-off consequently isn't necessary but rather
harmful.

To get the same power savings as with ACPI suspend, the controller
needs to be powered off via runtime suspend.  I posted patches for
that back in 2016.  I'm using them on my laptop, they need some
polishing and rebasing before I can repost them due to massive
changes that have happened in the thunderbolt driver in the meantime.
Without these patches, the controller sucks 1.5W of power in s2idle.

> Obviously the *hardware* hasn't changed since 1df5172c5c25.  Is s2idle
> something that wasn't tested back then, or is this problem connected
> to an s2idle change since then?  Can we identify a commit that
> introduced this problem?  That would help with backporting or stable
> tags.

Yes I believe the quirk predates the introduction of s2idle by a couple
of years.

> > Signed-off-by: Konstantin Kharlamov <Hi-Angel@yandex.ru>

Reviewed-by: Lukas Wunner <lukas@wunner.de>

The patch looks fine to me.

Thanks,

Lukas

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] PCI: don't power-off apple thunderbolt controller on s2idle
  2021-05-06 21:48 ` Bjorn Helgaas
  2021-05-06 22:07   ` Lukas Wunner
@ 2021-05-07  9:32   ` Konstantin Kharlamov
  2021-05-07 13:07     ` Bjorn Helgaas
  2021-05-20 11:58   ` Rafael J. Wysocki
  2 siblings, 1 reply; 30+ messages in thread
From: Konstantin Kharlamov @ 2021-05-07  9:32 UTC (permalink / raw)
  To: Bjorn Helgaas; +Cc: linux-pci, Rafael J. Wysocki, Andreas Noever, linux-pm

On Thu, 2021-05-06 at 16:48 -0500, Bjorn Helgaas wrote:
> [+cc Rafael, Andreas, linux-pm]
> 
> On Thu, May 06, 2021 at 08:38:20PM +0300, Konstantin Kharlamov wrote:
> > On Macbook 2013 resuming from s2idle results in external monitor no
> > longer being detected, and dmesg having errors like:
> > 
> >     pcieport 0000:06:00.0: can't change power state from D3hot to D0 (config
> > space inaccessible)
> > 
> > and a stacktrace. The reason turned out that the hw that the quirk
> > powers off does not get powered on back on resume.
> 
> quirk_apple_poweroff_thunderbolt() was added in 2014 by 1df5172c5c25
> ("PCI: Suspend/resume quirks for Apple thunderbolt").  It claims
> "power is automatically restored before resume," so there must be
> something special about s2idle that prevents the power-on.
> 
> IIUC this change will reduce the s2idle power savings.  I would feel
> better about this if we understood what the difference was.  
> 
> > Thus, add a check for s2idle to the quirk, and do nothing if the suspend
> > mode is s2idle.
> 
> Obviously the *hardware* hasn't changed since 1df5172c5c25.  Is s2idle
> something that wasn't tested back then, or is this problem connected
> to an s2idle change since then?  Can we identify a commit that
> introduced this problem?  That would help with backporting or stable
> tags.
> 
> > Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=212767
> 
> Thanks for this!  Would you mind attaching the output of
> "sudo lspci -vvv"?  If you attach any other dmesg, could you
> use "dmesg --color=never" so the log doesn't include all the
> escape characters?

Thank you! So, just to be clear: in lieu of Lukas Wunner's reply, do you still want `lspci` and `dmesg` outputs, or are you okay with the information Lukas provided?

And while on it, an unrelated question to you as a maintainer: I never contributed to the kernel before: in case you are okay with the patch, what happens now that I got R-b, should I resend a v2 of it with the R-b added?


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] PCI: don't power-off apple thunderbolt controller on s2idle
  2021-05-06 22:07   ` Lukas Wunner
@ 2021-05-07  9:51     ` Rafael J. Wysocki
  2021-05-08  8:48       ` Lukas Wunner
  2021-05-07 13:30     ` Bjorn Helgaas
  1 sibling, 1 reply; 30+ messages in thread
From: Rafael J. Wysocki @ 2021-05-07  9:51 UTC (permalink / raw)
  To: Lukas Wunner
  Cc: Bjorn Helgaas, Konstantin Kharlamov, Linux PCI,
	Rafael J. Wysocki, Andreas Noever, Linux PM

On Fri, May 7, 2021 at 12:07 AM Lukas Wunner <lukas@wunner.de> wrote:
>
> On Thu, May 06, 2021 at 04:48:42PM -0500, Bjorn Helgaas wrote:
> > On Thu, May 06, 2021 at 08:38:20PM +0300, Konstantin Kharlamov wrote:
> > > On Macbook 2013 resuming from s2idle results in external monitor no
> > > longer being detected, and dmesg having errors like:
> > >
> > >     pcieport 0000:06:00.0: can't change power state from D3hot to D0 (config space inaccessible)
> > >
> > > and a stacktrace. The reason turned out that the hw that the quirk
> > > powers off does not get powered on back on resume.
> >
> > quirk_apple_poweroff_thunderbolt() was added in 2014 by 1df5172c5c25
> > ("PCI: Suspend/resume quirks for Apple thunderbolt").  It claims
> > "power is automatically restored before resume," so there must be
> > something special about s2idle that prevents the power-on.
>
> With s2idle, the machine isn't suspended via ACPI, so the AML code
> which powers the controller off isn't executed.  The dance to prepare
> the controller for power-off consequently isn't necessary but rather
> harmful.
>
> To get the same power savings as with ACPI suspend, the controller
> needs to be powered off via runtime suspend.

I'm not quite sure why runtime PM needs to be involved.

The controller suspend can happen in the system-wide suspend code path directly.

> I posted patches for
> that back in 2016.  I'm using them on my laptop, they need some
> polishing and rebasing before I can repost them due to massive
> changes that have happened in the thunderbolt driver in the meantime.
> Without these patches, the controller sucks 1.5W of power in s2idle.
>
> > Obviously the *hardware* hasn't changed since 1df5172c5c25.  Is s2idle
> > something that wasn't tested back then, or is this problem connected
> > to an s2idle change since then?  Can we identify a commit that
> > introduced this problem?  That would help with backporting or stable
> > tags.
>
> Yes I believe the quirk predates the introduction of s2idle by a couple
> of years.
>
> > > Signed-off-by: Konstantin Kharlamov <Hi-Angel@yandex.ru>
>
> Reviewed-by: Lukas Wunner <lukas@wunner.de>
>
> The patch looks fine to me.
>
> Thanks,
>
> Lukas

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] PCI: don't power-off apple thunderbolt controller on s2idle
  2021-05-07  9:32   ` Konstantin Kharlamov
@ 2021-05-07 13:07     ` Bjorn Helgaas
  2021-05-07 13:48       ` Konstantin Kharlamov
  0 siblings, 1 reply; 30+ messages in thread
From: Bjorn Helgaas @ 2021-05-07 13:07 UTC (permalink / raw)
  To: Konstantin Kharlamov
  Cc: linux-pci, Rafael J. Wysocki, Andreas Noever, linux-pm, Lukas Wunner

[+cc Lukas]

On Fri, May 07, 2021 at 12:32:20PM +0300, Konstantin Kharlamov wrote:
> On Thu, 2021-05-06 at 16:48 -0500, Bjorn Helgaas wrote:
> > [+cc Rafael, Andreas, linux-pm]
> > 
> > On Thu, May 06, 2021 at 08:38:20PM +0300, Konstantin Kharlamov wrote:
> > > On Macbook 2013 resuming from s2idle results in external monitor no
> > > longer being detected, and dmesg having errors like:
> > > 
> > >     pcieport 0000:06:00.0: can't change power state from D3hot to D0 (config
> > > space inaccessible)
> > > 
> > > and a stacktrace. The reason turned out that the hw that the quirk
> > > powers off does not get powered on back on resume.
> > 
> > quirk_apple_poweroff_thunderbolt() was added in 2014 by 1df5172c5c25
> > ("PCI: Suspend/resume quirks for Apple thunderbolt").  It claims
> > "power is automatically restored before resume," so there must be
> > something special about s2idle that prevents the power-on.
> > 
> > IIUC this change will reduce the s2idle power savings.  I would feel
> > better about this if we understood what the difference was.  
> > 
> > > Thus, add a check for s2idle to the quirk, and do nothing if the suspend
> > > mode is s2idle.
> > 
> > Obviously the *hardware* hasn't changed since 1df5172c5c25.  Is s2idle
> > something that wasn't tested back then, or is this problem connected
> > to an s2idle change since then?  Can we identify a commit that
> > introduced this problem?  That would help with backporting or stable
> > tags.
> > 
> > > Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=212767
> > 
> > Thanks for this!  Would you mind attaching the output of
> > "sudo lspci -vvv"?  If you attach any other dmesg, could you
> > use "dmesg --color=never" so the log doesn't include all the
> > escape characters?
> 
> Thank you! So, just to be clear: in lieu of Lukas Wunner's reply, do
> you still want `lspci` and `dmesg` outputs, or are you okay with the
> information Lukas provided?

Yes, please attach at least lspci output.  It helps understand the
topology and may be useful in the future.  I wish we had a similar
bugzilla with more information about the original 1df5172c5c25.

> And while on it, an unrelated question to you as a maintainer: I
> never contributed to the kernel before: in case you are okay with
> the patch, what happens now that I got R-b, should I resend a v2 of
> it with the R-b added?

You don't need to resend a patch if the only change is to add
reviewed-by, acked-by, or similar tags.

Bjorn

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] PCI: don't power-off apple thunderbolt controller on s2idle
  2021-05-06 22:07   ` Lukas Wunner
  2021-05-07  9:51     ` Rafael J. Wysocki
@ 2021-05-07 13:30     ` Bjorn Helgaas
  2021-05-07 14:08       ` Konstantin Kharlamov
                         ` (2 more replies)
  1 sibling, 3 replies; 30+ messages in thread
From: Bjorn Helgaas @ 2021-05-07 13:30 UTC (permalink / raw)
  To: Lukas Wunner
  Cc: Konstantin Kharlamov, linux-pci, Rafael J. Wysocki,
	Andreas Noever, linux-pm

On Fri, May 07, 2021 at 12:07:38AM +0200, Lukas Wunner wrote:
> On Thu, May 06, 2021 at 04:48:42PM -0500, Bjorn Helgaas wrote:
> > On Thu, May 06, 2021 at 08:38:20PM +0300, Konstantin Kharlamov wrote:
> > > On Macbook 2013 resuming from s2idle results in external monitor no
> > > longer being detected, and dmesg having errors like:
> > > 
> > >     pcieport 0000:06:00.0: can't change power state from D3hot to D0 (config space inaccessible)
> > > 
> > > and a stacktrace. The reason turned out that the hw that the quirk
> > > powers off does not get powered on back on resume.
> > 
> > quirk_apple_poweroff_thunderbolt() was added in 2014 by 1df5172c5c25
> > ("PCI: Suspend/resume quirks for Apple thunderbolt").  It claims
> > "power is automatically restored before resume," so there must be
> > something special about s2idle that prevents the power-on.
> 
> With s2idle, the machine isn't suspended via ACPI, so the AML code
> which powers the controller off isn't executed.  The dance to prepare
> the controller for power-off consequently isn't necessary but rather
> harmful.
> 
> To get the same power savings as with ACPI suspend, the controller
> needs to be powered off via runtime suspend.  I posted patches for
> that back in 2016.  I'm using them on my laptop, they need some
> polishing and rebasing before I can repost them due to massive
> changes that have happened in the thunderbolt driver in the meantime.
> Without these patches, the controller sucks 1.5W of power in s2idle.
> 
> > Obviously the *hardware* hasn't changed since 1df5172c5c25.  Is s2idle
> > something that wasn't tested back then, or is this problem connected
> > to an s2idle change since then?  Can we identify a commit that
> > introduced this problem?  That would help with backporting or stable
> > tags.
> 
> Yes I believe the quirk predates the introduction of s2idle by a couple
> of years.

In an ideal world, we would know which commit introduced s2idle and
hence the possibility of hitting this bug, and we would add a Fixes:
tag for that commit so we could connect this fix with it.

Apart from that, what I don't like about this (and about the original
1df5172c5c25) is that there's no connection to a spec or to documented
behavior of the device or of suspend/resume.

For example, "With s2idle, the machine isn't suspended via ACPI, so
the AML code which powers the controller off isn't executed."  AFAICT
that isn't actually a required, documented property of s2idle, but
rather it reaches into the internal implementation.

The code comment "If suspend mode is s2idle, power won't get restored
on resume" is similar.  !pm_suspend_via_firmware() tells us that
platform firmware won't be invoked.  But the connection between *that*
and "power won't get restored" is unexplained.

> > > Signed-off-by: Konstantin Kharlamov <Hi-Angel@yandex.ru>
> 
> Reviewed-by: Lukas Wunner <lukas@wunner.de>

Thanks for looking at this!

Bjorn

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] PCI: don't power-off apple thunderbolt controller on s2idle
  2021-05-07 13:07     ` Bjorn Helgaas
@ 2021-05-07 13:48       ` Konstantin Kharlamov
  0 siblings, 0 replies; 30+ messages in thread
From: Konstantin Kharlamov @ 2021-05-07 13:48 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: linux-pci, Rafael J. Wysocki, Andreas Noever, linux-pm, Lukas Wunner

On Fri, 2021-05-07 at 08:07 -0500, Bjorn Helgaas wrote:
> [+cc Lukas]
> 
> On Fri, May 07, 2021 at 12:32:20PM +0300, Konstantin Kharlamov wrote:
> > On Thu, 2021-05-06 at 16:48 -0500, Bjorn Helgaas wrote:
> > > [+cc Rafael, Andreas, linux-pm]
> > > 
> > > On Thu, May 06, 2021 at 08:38:20PM +0300, Konstantin Kharlamov wrote:
> > > > On Macbook 2013 resuming from s2idle results in external monitor no
> > > > longer being detected, and dmesg having errors like:
> > > > 
> > > >     pcieport 0000:06:00.0: can't change power state from D3hot to D0
> > > > (config
> > > > space inaccessible)
> > > > 
> > > > and a stacktrace. The reason turned out that the hw that the quirk
> > > > powers off does not get powered on back on resume.
> > > 
> > > quirk_apple_poweroff_thunderbolt() was added in 2014 by 1df5172c5c25
> > > ("PCI: Suspend/resume quirks for Apple thunderbolt").  It claims
> > > "power is automatically restored before resume," so there must be
> > > something special about s2idle that prevents the power-on.
> > > 
> > > IIUC this change will reduce the s2idle power savings.  I would feel
> > > better about this if we understood what the difference was.  
> > > 
> > > > Thus, add a check for s2idle to the quirk, and do nothing if the suspend
> > > > mode is s2idle.
> > > 
> > > Obviously the *hardware* hasn't changed since 1df5172c5c25.  Is s2idle
> > > something that wasn't tested back then, or is this problem connected
> > > to an s2idle change since then?  Can we identify a commit that
> > > introduced this problem?  That would help with backporting or stable
> > > tags.
> > > 
> > > > Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=212767
> > > 
> > > Thanks for this!  Would you mind attaching the output of
> > > "sudo lspci -vvv"?  If you attach any other dmesg, could you
> > > use "dmesg --color=never" so the log doesn't include all the
> > > escape characters?
> > 
> > Thank you! So, just to be clear: in lieu of Lukas Wunner's reply, do
> > you still want `lspci` and `dmesg` outputs, or are you okay with the
> > information Lukas provided?
> 
> Yes, please attach at least lspci output.  It helps understand the
> topology and may be useful in the future.  I wish we had a similar
> bugzilla with more information about the original 1df5172c5c25.

Thanks! I attached the `sudo lspci -vvv` output to the bugreport, here's a direct link to the attachment https://bugzilla.kernel.org/attachment.cgi?id=296689


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] PCI: don't power-off apple thunderbolt controller on s2idle
  2021-05-07 13:30     ` Bjorn Helgaas
@ 2021-05-07 14:08       ` Konstantin Kharlamov
  2021-05-12 20:36         ` Konstantin Kharlamov
  2021-05-19 17:28         ` Bjorn Helgaas
  2021-05-07 15:02       ` Rafael J. Wysocki
  2021-05-08  8:20       ` Lukas Wunner
  2 siblings, 2 replies; 30+ messages in thread
From: Konstantin Kharlamov @ 2021-05-07 14:08 UTC (permalink / raw)
  To: Bjorn Helgaas, Lukas Wunner
  Cc: linux-pci, Rafael J. Wysocki, Andreas Noever, linux-pm

On Fri, 2021-05-07 at 08:30 -0500, Bjorn Helgaas wrote:
> On Fri, May 07, 2021 at 12:07:38AM +0200, Lukas Wunner wrote:
> > On Thu, May 06, 2021 at 04:48:42PM -0500, Bjorn Helgaas wrote:
> > > On Thu, May 06, 2021 at 08:38:20PM +0300, Konstantin Kharlamov wrote:
> > > > On Macbook 2013 resuming from s2idle results in external monitor no
> > > > longer being detected, and dmesg having errors like:
> > > > 
> > > >     pcieport 0000:06:00.0: can't change power state from D3hot to D0
> > > > (config space inaccessible)
> > > > 
> > > > and a stacktrace. The reason turned out that the hw that the quirk
> > > > powers off does not get powered on back on resume.
> > > 
> > > quirk_apple_poweroff_thunderbolt() was added in 2014 by 1df5172c5c25
> > > ("PCI: Suspend/resume quirks for Apple thunderbolt").  It claims
> > > "power is automatically restored before resume," so there must be
> > > something special about s2idle that prevents the power-on.
> > 
> > With s2idle, the machine isn't suspended via ACPI, so the AML code
> > which powers the controller off isn't executed.  The dance to prepare
> > the controller for power-off consequently isn't necessary but rather
> > harmful.
> > 
> > To get the same power savings as with ACPI suspend, the controller
> > needs to be powered off via runtime suspend.  I posted patches for
> > that back in 2016.  I'm using them on my laptop, they need some
> > polishing and rebasing before I can repost them due to massive
> > changes that have happened in the thunderbolt driver in the meantime.
> > Without these patches, the controller sucks 1.5W of power in s2idle.
> > 
> > > Obviously the *hardware* hasn't changed since 1df5172c5c25.  Is s2idle
> > > something that wasn't tested back then, or is this problem connected
> > > to an s2idle change since then?  Can we identify a commit that
> > > introduced this problem?  That would help with backporting or stable
> > > tags.
> > 
> > Yes I believe the quirk predates the introduction of s2idle by a couple
> > of years.
> 
> In an ideal world, we would know which commit introduced s2idle and
> hence the possibility of hitting this bug, and we would add a Fixes:
> tag for that commit so we could connect this fix with it.
> 
> Apart from that, what I don't like about this (and about the original
> 1df5172c5c25) is that there's no connection to a spec or to documented
> behavior of the device or of suspend/resume.

I did some research, and found that s2idle was first introduced in 2013 in
commit 7e73c5ae6e799 (except it wasn't called "s2idle", by that name it goes
since around 2016 as Lukas mentioned. In 7e73c5ae6e799 it is called "freeze").
This is before 1df5172c5c25 which was added in 2014, so I guess we can add a:

	Fixes: 1df5172c5c25 ("PCI: Suspend/resume quirks for Apple
thunderbolt")

> For example, "With s2idle, the machine isn't suspended via ACPI, so
> the AML code which powers the controller off isn't executed."  AFAICT
> that isn't actually a required, documented property of s2idle, but
> rather it reaches into the internal implementation.
> 
> The code comment "If suspend mode is s2idle, power won't get restored
> on resume" is similar.  !pm_suspend_via_firmware() tells us that
> platform firmware won't be invoked.  But the connection between *that*
> and "power won't get restored" is unexplained.

Sorry, I can't comment anything regarding AML and power management in general since I am really new to all of this. However, regarding the usage of the `pm_suspend_via_firmware()`: yeah, I also think it is unclear what this does, and I was thinking about adding a wrapper function something like `is_s2idle()` to the suspend.h, which would simply call `pm_suspend_via_firmware` internally. I didn't do that because I thought that usage of pm_suspend_via_firmware() for that task is just something people working with power management are supposed to know, but if someone else questions it too, I can make such wrapper, it's just a 3 lines-of-code change.

FWIW, originally I found out that pm_suspend_via_firmware() can be used for detecting s2idle by simply asking about it on linux-pm: https://marc.info/?l=linux-pm&m=162029296108775&w=2


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] PCI: don't power-off apple thunderbolt controller on s2idle
  2021-05-07 13:30     ` Bjorn Helgaas
  2021-05-07 14:08       ` Konstantin Kharlamov
@ 2021-05-07 15:02       ` Rafael J. Wysocki
  2021-05-08  8:20       ` Lukas Wunner
  2 siblings, 0 replies; 30+ messages in thread
From: Rafael J. Wysocki @ 2021-05-07 15:02 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Lukas Wunner, Konstantin Kharlamov, Linux PCI, Rafael J. Wysocki,
	Andreas Noever, Linux PM

On Fri, May 7, 2021 at 3:30 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
>
> On Fri, May 07, 2021 at 12:07:38AM +0200, Lukas Wunner wrote:
> > On Thu, May 06, 2021 at 04:48:42PM -0500, Bjorn Helgaas wrote:
> > > On Thu, May 06, 2021 at 08:38:20PM +0300, Konstantin Kharlamov wrote:
> > > > On Macbook 2013 resuming from s2idle results in external monitor no
> > > > longer being detected, and dmesg having errors like:
> > > >
> > > >     pcieport 0000:06:00.0: can't change power state from D3hot to D0 (config space inaccessible)
> > > >
> > > > and a stacktrace. The reason turned out that the hw that the quirk
> > > > powers off does not get powered on back on resume.
> > >
> > > quirk_apple_poweroff_thunderbolt() was added in 2014 by 1df5172c5c25
> > > ("PCI: Suspend/resume quirks for Apple thunderbolt").  It claims
> > > "power is automatically restored before resume," so there must be
> > > something special about s2idle that prevents the power-on.
> >
> > With s2idle, the machine isn't suspended via ACPI, so the AML code
> > which powers the controller off isn't executed.  The dance to prepare
> > the controller for power-off consequently isn't necessary but rather
> > harmful.
> >
> > To get the same power savings as with ACPI suspend, the controller
> > needs to be powered off via runtime suspend.  I posted patches for
> > that back in 2016.  I'm using them on my laptop, they need some
> > polishing and rebasing before I can repost them due to massive
> > changes that have happened in the thunderbolt driver in the meantime.
> > Without these patches, the controller sucks 1.5W of power in s2idle.
> >
> > > Obviously the *hardware* hasn't changed since 1df5172c5c25.  Is s2idle
> > > something that wasn't tested back then, or is this problem connected
> > > to an s2idle change since then?  Can we identify a commit that
> > > introduced this problem?  That would help with backporting or stable
> > > tags.
> >
> > Yes I believe the quirk predates the introduction of s2idle by a couple
> > of years.
>
> In an ideal world, we would know which commit introduced s2idle and
> hence the possibility of hitting this bug, and we would add a Fixes:
> tag for that commit so we could connect this fix with it.
>
> Apart from that, what I don't like about this (and about the original
> 1df5172c5c25) is that there's no connection to a spec or to documented
> behavior of the device or of suspend/resume.
>
> For example, "With s2idle, the machine isn't suspended via ACPI, so
> the AML code which powers the controller off isn't executed."  AFAICT
> that isn't actually a required, documented property of s2idle, but
> rather it reaches into the internal implementation.

I tend to agree, but not completely.

The substantial difference between s2idle and S2RAM is that with the
latter control is passed to the platform firmware at the end of the
suspend transition, but that part of the platform firmware is SMM
rather than AML and so this has no bearing on whether or not power is
removed from the controller by any AML code.

That said, the platform firmware code completing the S2RAM suspend
transition does remove power from various system components which may
(and probably does) include the Thunderbolt controller.

IOW, the s2idle path needs to actively power manage the controller in
order to achieve desirable results, whereas in the S2RAM case that
isn't strictly necessary due to the fundamental difference between the
two variants of system-wide suspend.

> The code comment "If suspend mode is s2idle, power won't get restored
> on resume" is similar.  !pm_suspend_via_firmware() tells us that
> platform firmware won't be invoked.  But the connection between *that*
> and "power won't get restored" is unexplained.

Right.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] PCI: don't power-off apple thunderbolt controller on s2idle
  2021-05-07 13:30     ` Bjorn Helgaas
  2021-05-07 14:08       ` Konstantin Kharlamov
  2021-05-07 15:02       ` Rafael J. Wysocki
@ 2021-05-08  8:20       ` Lukas Wunner
  2 siblings, 0 replies; 30+ messages in thread
From: Lukas Wunner @ 2021-05-08  8:20 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Konstantin Kharlamov, linux-pci, Rafael J. Wysocki,
	Andreas Noever, linux-pm

On Fri, May 07, 2021 at 08:30:02AM -0500, Bjorn Helgaas wrote:
> Apart from that, what I don't like about this (and about the original
> 1df5172c5c25) is that there's no connection to a spec or to documented
> behavior of the device or of suspend/resume.
> 
> For example, "With s2idle, the machine isn't suspended via ACPI, so
> the AML code which powers the controller off isn't executed."  AFAICT
> that isn't actually a required, documented property of s2idle, but
> rather it reaches into the internal implementation.

Actually I'm not sure whether AML code is involved here, I may have mixed
up terms.  To the best of my knowledge, macOS doesn't use s2idle, at least
not on such older MacBooks.  It either puts the machine into ACPI S3
(S2RAM) or S4 (hibernate, suspend to disk).  The Thunderbolt controller's
power rail is off in ACPI S3 and S4.

A peculiarity of the "Cactus Ridge" Thunderbolt 1 controller built into
2012/2013 era MacBooks is that it requires execution of the ACPI methods
in quirk_apple_poweroff_thunderbolt() before power is cut.  Otherwise
the controller is dead when power is reinstated and a cold reboot is
required to resurrect it.  At least that's my understanding, I don't
own a machine with such a chip.  Neither older controllers (such as the
2011 era "Light Ridge") nor newer controllers (such as 2013/2014
"Falcon Ridge") need the quirk.

The ACPI methods seem to fiddle with GPIOs which are connected to
the Force Power, Go2Sx and Ok2Go2Sx pins of the Cactus Ridge controller.
The meaning of the ACPI methods and the pins isn't documented publicly,
it's all reverse-engineered from the macOS driver and leaked schematics.
Some educated guesses as to their meaning are contained in my Thunderbolt
runtime PM patches of 2016:

https://github.com/l1k/linux/commit/65f56e6c8446

 * * On Macs with Thunderbolt 1 Gen 2 controllers (Cactus Ridge 4C):
[...]
 *   * Additional SXFP method ("Force Power"), accepts only argument 0,
 *     switches the controller off. This carries out just the raw power change,
 *     unlike XRPE which disables the link on the PCIe Root Port in an orderly
 *     fashion before switching off the controller.
 *   * Additional SXLV, SXIO, SXIL methods to utilize the Go2Sx and Ok2Go2Sx
 *     pins (see background below). Apparently SXLV toggles the value given to
 *     the POC via Go2Sx (0 or 1), SXIO changes the direction (0 or 1) and SXIL
 *     returns the value received from the POC via Ok2Go2Sx.
[...]
 * * On gen 2 controllers (Cactus Ridge 4C), Intel integrated the MCU into the
 *   controller and called it POC. This caused a change of semantics for XRIN
 *   and XRIL. The POC is powered by a separate 3.3V rail which is active even
 *   in sleep state S4. It only draws a very small current. The regular 3.3V
 *   and 1.05V power rails are no longer controlled by the southbridge but by
 *   the POC. In other words the controller powers *itself* up and down! It is
 *   instructed to do so with the Go2Sx pin. Another pin, Ok2Go2Sx, allows the
 *   controller to indicate if it is ready to power itself down. Apple wires
 *   Go2Sx and Ok2Go2Sx to the same GPIO pin on the southbridge, hence the pin
 *   is used bidirectionally. A third pin, Force Power, is intended by Intel
 *   for debug only but Apple abuses it for XRPE/TRPE and SXFP. Perhaps it
 *   leads to larger power saving gains. They utilize Go2Sx and Ok2Go2Sx only
 *   on Cactus Ridge, presumably because the controller somehow requires that.
 *   On Falcon Ridge they forego these pins and rely solely on Force Power.

With s2idle, the machine never transitions to ACPI S3 or S4.  Consequently,
power to the controller isn't cut, but the quirk prepares it for the
S3/S4 transition.  When the system is woken from s2idle, the chip is
apparently inaccessible.  Solution is to either not execute the PCI quirk
at all (as is done by the present patch) or to undo the quirk when waking
from s2idle.  We don't know how to do the latter, it would have to be
reverse-engineered.  Maybe it can't be done at all with the available
ACPI methods.

Thanks,

Lukas

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] PCI: don't power-off apple thunderbolt controller on s2idle
  2021-05-07  9:51     ` Rafael J. Wysocki
@ 2021-05-08  8:48       ` Lukas Wunner
  0 siblings, 0 replies; 30+ messages in thread
From: Lukas Wunner @ 2021-05-08  8:48 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Bjorn Helgaas, Konstantin Kharlamov, Linux PCI,
	Rafael J. Wysocki, Andreas Noever, Linux PM

On Fri, May 07, 2021 at 11:51:20AM +0200, Rafael J. Wysocki wrote:
> On Fri, May 7, 2021 at 12:07 AM Lukas Wunner <lukas@wunner.de> wrote:
> > With s2idle, the machine isn't suspended via ACPI, so the AML code
> > which powers the controller off isn't executed.  The dance to prepare
> > the controller for power-off consequently isn't necessary but rather
> > harmful.
> >
> > To get the same power savings as with ACPI suspend, the controller
> > needs to be powered off via runtime suspend.
> 
> I'm not quite sure why runtime PM needs to be involved.
> 
> The controller suspend can happen in the system-wide suspend code path
> directly.

Sorry, my comments were unclear.  What I meant to say is that the
Thunderbolt runtime PM patches allow us to power the Thunderbolt
controller up or down from the OS and we can re-use that code in
the system-wide suspend code path if s2idle is used.

Without s2idle, power to the controller is cut by transitioning
to ACPI S3 or S4, which disables the controller's power rails
without OS involvement.

Thanks,

Lukas

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] PCI: don't power-off apple thunderbolt controller on s2idle
  2021-05-07 14:08       ` Konstantin Kharlamov
@ 2021-05-12 20:36         ` Konstantin Kharlamov
  2021-05-17 19:51           ` PING " Konstantin Kharlamov
  2021-05-19 17:28         ` Bjorn Helgaas
  1 sibling, 1 reply; 30+ messages in thread
From: Konstantin Kharlamov @ 2021-05-12 20:36 UTC (permalink / raw)
  To: Bjorn Helgaas, Lukas Wunner
  Cc: linux-pci, Rafael J. Wysocki, Andreas Noever, linux-pm

Bjorn: so, given this and other comments, what's the decision on this patch? Any
particular changes I should do for it to be accepted?

On Fri, 2021-05-07 at 17:08 +0300, Konstantin Kharlamov wrote:
> On Fri, 2021-05-07 at 08:30 -0500, Bjorn Helgaas wrote:
> > On Fri, May 07, 2021 at 12:07:38AM +0200, Lukas Wunner wrote:
> > > On Thu, May 06, 2021 at 04:48:42PM -0500, Bjorn Helgaas wrote:
> > > > On Thu, May 06, 2021 at 08:38:20PM +0300, Konstantin Kharlamov wrote:
> > > > > On Macbook 2013 resuming from s2idle results in external monitor no
> > > > > longer being detected, and dmesg having errors like:
> > > > > 
> > > > >     pcieport 0000:06:00.0: can't change power state from D3hot to D0
> > > > > (config space inaccessible)
> > > > > 
> > > > > and a stacktrace. The reason turned out that the hw that the quirk
> > > > > powers off does not get powered on back on resume.
> > > > 
> > > > quirk_apple_poweroff_thunderbolt() was added in 2014 by 1df5172c5c25
> > > > ("PCI: Suspend/resume quirks for Apple thunderbolt").  It claims
> > > > "power is automatically restored before resume," so there must be
> > > > something special about s2idle that prevents the power-on.
> > > 
> > > With s2idle, the machine isn't suspended via ACPI, so the AML code
> > > which powers the controller off isn't executed.  The dance to prepare
> > > the controller for power-off consequently isn't necessary but rather
> > > harmful.
> > > 
> > > To get the same power savings as with ACPI suspend, the controller
> > > needs to be powered off via runtime suspend.  I posted patches for
> > > that back in 2016.  I'm using them on my laptop, they need some
> > > polishing and rebasing before I can repost them due to massive
> > > changes that have happened in the thunderbolt driver in the meantime.
> > > Without these patches, the controller sucks 1.5W of power in s2idle.
> > > 
> > > > Obviously the *hardware* hasn't changed since 1df5172c5c25.  Is s2idle
> > > > something that wasn't tested back then, or is this problem connected
> > > > to an s2idle change since then?  Can we identify a commit that
> > > > introduced this problem?  That would help with backporting or stable
> > > > tags.
> > > 
> > > Yes I believe the quirk predates the introduction of s2idle by a couple
> > > of years.
> > 
> > In an ideal world, we would know which commit introduced s2idle and
> > hence the possibility of hitting this bug, and we would add a Fixes:
> > tag for that commit so we could connect this fix with it.
> > 
> > Apart from that, what I don't like about this (and about the original
> > 1df5172c5c25) is that there's no connection to a spec or to documented
> > behavior of the device or of suspend/resume.
> 
> I did some research, and found that s2idle was first introduced in 2013 in
> commit 7e73c5ae6e799 (except it wasn't called "s2idle", by that name it goes
> since around 2016 as Lukas mentioned. In 7e73c5ae6e799 it is called "freeze").
> This is before 1df5172c5c25 which was added in 2014, so I guess we can add a:
> 
>         Fixes: 1df5172c5c25 ("PCI: Suspend/resume quirks for Apple
> thunderbolt")
> 
> > For example, "With s2idle, the machine isn't suspended via ACPI, so
> > the AML code which powers the controller off isn't executed."  AFAICT
> > that isn't actually a required, documented property of s2idle, but
> > rather it reaches into the internal implementation.
> > 
> > The code comment "If suspend mode is s2idle, power won't get restored
> > on resume" is similar.  !pm_suspend_via_firmware() tells us that
> > platform firmware won't be invoked.  But the connection between *that*
> > and "power won't get restored" is unexplained.
> 
> Sorry, I can't comment anything regarding AML and power management in general
> since I am really new to all of this. However, regarding the usage of the
> `pm_suspend_via_firmware()`: yeah, I also think it is unclear what this does,
> and I was thinking about adding a wrapper function something like
> `is_s2idle()` to the suspend.h, which would simply call
> `pm_suspend_via_firmware` internally. I didn't do that because I thought that
> usage of pm_suspend_via_firmware() for that task is just something people
> working with power management are supposed to know, but if someone else
> questions it too, I can make such wrapper, it's just a 3 lines-of-code change.
> 
> FWIW, originally I found out that pm_suspend_via_firmware() can be used for
> detecting s2idle by simply asking about it on linux-pm:
> https://marc.info/?l=linux-pm&m=162029296108775&w=2
> 



^ permalink raw reply	[flat|nested] 30+ messages in thread

* PING Re: [PATCH] PCI: don't power-off apple thunderbolt controller on s2idle
  2021-05-12 20:36         ` Konstantin Kharlamov
@ 2021-05-17 19:51           ` Konstantin Kharlamov
  0 siblings, 0 replies; 30+ messages in thread
From: Konstantin Kharlamov @ 2021-05-17 19:51 UTC (permalink / raw)
  To: Bjorn Helgaas, Lukas Wunner
  Cc: linux-pci, Rafael J. Wysocki, Andreas Noever, linux-pm

Ping

On Wed, 2021-05-12 at 23:36 +0300, Konstantin Kharlamov wrote:
> Bjorn: so, given this and other comments, what's the decision on this patch?
> Any
> particular changes I should do for it to be accepted?
> 
> On Fri, 2021-05-07 at 17:08 +0300, Konstantin Kharlamov wrote:
> > On Fri, 2021-05-07 at 08:30 -0500, Bjorn Helgaas wrote:
> > > On Fri, May 07, 2021 at 12:07:38AM +0200, Lukas Wunner wrote:
> > > > On Thu, May 06, 2021 at 04:48:42PM -0500, Bjorn Helgaas wrote:
> > > > > On Thu, May 06, 2021 at 08:38:20PM +0300, Konstantin Kharlamov wrote:
> > > > > > On Macbook 2013 resuming from s2idle results in external monitor no
> > > > > > longer being detected, and dmesg having errors like:
> > > > > > 
> > > > > >     pcieport 0000:06:00.0: can't change power state from D3hot to D0
> > > > > > (config space inaccessible)
> > > > > > 
> > > > > > and a stacktrace. The reason turned out that the hw that the quirk
> > > > > > powers off does not get powered on back on resume.
> > > > > 
> > > > > quirk_apple_poweroff_thunderbolt() was added in 2014 by 1df5172c5c25
> > > > > ("PCI: Suspend/resume quirks for Apple thunderbolt").  It claims
> > > > > "power is automatically restored before resume," so there must be
> > > > > something special about s2idle that prevents the power-on.
> > > > 
> > > > With s2idle, the machine isn't suspended via ACPI, so the AML code
> > > > which powers the controller off isn't executed.  The dance to prepare
> > > > the controller for power-off consequently isn't necessary but rather
> > > > harmful.
> > > > 
> > > > To get the same power savings as with ACPI suspend, the controller
> > > > needs to be powered off via runtime suspend.  I posted patches for
> > > > that back in 2016.  I'm using them on my laptop, they need some
> > > > polishing and rebasing before I can repost them due to massive
> > > > changes that have happened in the thunderbolt driver in the meantime.
> > > > Without these patches, the controller sucks 1.5W of power in s2idle.
> > > > 
> > > > > Obviously the *hardware* hasn't changed since 1df5172c5c25.  Is s2idle
> > > > > something that wasn't tested back then, or is this problem connected
> > > > > to an s2idle change since then?  Can we identify a commit that
> > > > > introduced this problem?  That would help with backporting or stable
> > > > > tags.
> > > > 
> > > > Yes I believe the quirk predates the introduction of s2idle by a couple
> > > > of years.
> > > 
> > > In an ideal world, we would know which commit introduced s2idle and
> > > hence the possibility of hitting this bug, and we would add a Fixes:
> > > tag for that commit so we could connect this fix with it.
> > > 
> > > Apart from that, what I don't like about this (and about the original
> > > 1df5172c5c25) is that there's no connection to a spec or to documented
> > > behavior of the device or of suspend/resume.
> > 
> > I did some research, and found that s2idle was first introduced in 2013 in
> > commit 7e73c5ae6e799 (except it wasn't called "s2idle", by that name it goes
> > since around 2016 as Lukas mentioned. In 7e73c5ae6e799 it is called
> > "freeze").
> > This is before 1df5172c5c25 which was added in 2014, so I guess we can add
> > a:
> > 
> >         Fixes: 1df5172c5c25 ("PCI: Suspend/resume quirks for Apple
> > thunderbolt")
> > 
> > > For example, "With s2idle, the machine isn't suspended via ACPI, so
> > > the AML code which powers the controller off isn't executed."  AFAICT
> > > that isn't actually a required, documented property of s2idle, but
> > > rather it reaches into the internal implementation.
> > > 
> > > The code comment "If suspend mode is s2idle, power won't get restored
> > > on resume" is similar.  !pm_suspend_via_firmware() tells us that
> > > platform firmware won't be invoked.  But the connection between *that*
> > > and "power won't get restored" is unexplained.
> > 
> > Sorry, I can't comment anything regarding AML and power management in
> > general
> > since I am really new to all of this. However, regarding the usage of the
> > `pm_suspend_via_firmware()`: yeah, I also think it is unclear what this
> > does,
> > and I was thinking about adding a wrapper function something like
> > `is_s2idle()` to the suspend.h, which would simply call
> > `pm_suspend_via_firmware` internally. I didn't do that because I thought
> > that
> > usage of pm_suspend_via_firmware() for that task is just something people
> > working with power management are supposed to know, but if someone else
> > questions it too, I can make such wrapper, it's just a 3 lines-of-code
> > change.
> > 
> > FWIW, originally I found out that pm_suspend_via_firmware() can be used for
> > detecting s2idle by simply asking about it on linux-pm:
> > https://marc.info/?l=linux-pm&m=162029296108775&w=2
> > 
> 
> 



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] PCI: don't power-off apple thunderbolt controller on s2idle
  2021-05-07 14:08       ` Konstantin Kharlamov
  2021-05-12 20:36         ` Konstantin Kharlamov
@ 2021-05-19 17:28         ` Bjorn Helgaas
  2021-05-19 19:12           ` Rafael J. Wysocki
  1 sibling, 1 reply; 30+ messages in thread
From: Bjorn Helgaas @ 2021-05-19 17:28 UTC (permalink / raw)
  To: Konstantin Kharlamov
  Cc: Lukas Wunner, linux-pci, Rafael J. Wysocki, Andreas Noever, linux-pm

On Fri, May 07, 2021 at 05:08:42PM +0300, Konstantin Kharlamov wrote:
> On Fri, 2021-05-07 at 08:30 -0500, Bjorn Helgaas wrote:
> > On Fri, May 07, 2021 at 12:07:38AM +0200, Lukas Wunner wrote:
> > > On Thu, May 06, 2021 at 04:48:42PM -0500, Bjorn Helgaas wrote:
> > > > On Thu, May 06, 2021 at 08:38:20PM +0300, Konstantin Kharlamov wrote:
> > > > > On Macbook 2013 resuming from s2idle results in external monitor no
> > > > > longer being detected, and dmesg having errors like:
> > > > > 
> > > > >     pcieport 0000:06:00.0: can't change power state from D3hot to D0
> > > > > (config space inaccessible)
> > > > > 
> > > > > and a stacktrace. The reason turned out that the hw that the quirk
> > > > > powers off does not get powered on back on resume.

> > For example, "With s2idle, the machine isn't suspended via ACPI, so
> > the AML code which powers the controller off isn't executed."  AFAICT
> > that isn't actually a required, documented property of s2idle, but
> > rather it reaches into the internal implementation.

> > The code comment "If suspend mode is s2idle, power won't get restored
> > on resume" is similar.  !pm_suspend_via_firmware() tells us that
> > platform firmware won't be invoked.  But the connection between *that*
> > and "power won't get restored" is unexplained.
> 
> Sorry, I can't comment anything regarding AML and power management
> in general since I am really new to all of this. However, regarding
> the usage of the `pm_suspend_via_firmware()`: yeah, I also think it
> is unclear what this does, and I was thinking about adding a wrapper
> function something like `is_s2idle()` to the suspend.h, which would
> simply call `pm_suspend_via_firmware` internally.

No, that's not my point at all.  I don't really care whether the
interface is called pm_suspend_via_firmware() or is_s2idle().

What I don't like about this is that it's all just unexplained magic,
as was the original quirk.

IIUC, the quirk is applied by pci_pm_suspend_noirq() *after* it puts
the device in a low-power state.  Here's my uninformed speculation
about what happens:

  - On suspend, pci_pm_suspend_noirq() puts device in low-power state.
    My *guess* is this means D0 or D3hot for s2idle, and D3cold for
    everything else.  [Do we have sufficient debug to find out what
    these states are?]

  - pci_pm_suspend_noirq() does pci_fixup_suspend_late fixups,
    including quirk_apple_poweroff_thunderbolt().

  - quirk_apple_poweroff_thunderbolt() runs the magic SXIO/SXFP/SXLF
    methods, which apparently turn off more power.

  - On resume, pci_pm_resume_noirq() brings the device back to D0.

    If we're resuming from standby, S2RAM, or STD, I speculate the
    device is in D3cold, so this involves running AML that seems to
    undo whatever SXIO/SXFP/SXLF did.

    If we're resuming from s2idle, I speculate the device is in D0 or
    D3hot, and we run different AML (or maybe no AML at all), and we
    *don't* undo the effects of SXIO/SXFP/SXLF, so the device doesn't
    work.

If the above is anything like what's happening, we should be able to
skip SXIO/SXFP/SXLF based on the current power state of the device.
E.g., if the device is in D0 or D3hot, we should not use
SXIO/SXFP/SXLF to yank power.

That would seem more connected to the observable state of the device
than using pm_suspend_via_firmware(), which relies on the connection
between s2idle and PM_SUSPEND_FLAG_FW_SUSPEND (which is not at all
obvious) and the power state of the device while in s2idle (also not
obvious).

Bjorn

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] PCI: don't power-off apple thunderbolt controller on s2idle
  2021-05-19 17:28         ` Bjorn Helgaas
@ 2021-05-19 19:12           ` Rafael J. Wysocki
  2021-05-19 19:48             ` Bjorn Helgaas
  0 siblings, 1 reply; 30+ messages in thread
From: Rafael J. Wysocki @ 2021-05-19 19:12 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Konstantin Kharlamov, Lukas Wunner, Linux PCI, Rafael J. Wysocki,
	Andreas Noever, Linux PM

On Wed, May 19, 2021 at 7:28 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
>
> On Fri, May 07, 2021 at 05:08:42PM +0300, Konstantin Kharlamov wrote:
> > On Fri, 2021-05-07 at 08:30 -0500, Bjorn Helgaas wrote:
> > > On Fri, May 07, 2021 at 12:07:38AM +0200, Lukas Wunner wrote:
> > > > On Thu, May 06, 2021 at 04:48:42PM -0500, Bjorn Helgaas wrote:
> > > > > On Thu, May 06, 2021 at 08:38:20PM +0300, Konstantin Kharlamov wrote:
> > > > > > On Macbook 2013 resuming from s2idle results in external monitor no
> > > > > > longer being detected, and dmesg having errors like:
> > > > > >
> > > > > >     pcieport 0000:06:00.0: can't change power state from D3hot to D0
> > > > > > (config space inaccessible)
> > > > > >
> > > > > > and a stacktrace. The reason turned out that the hw that the quirk
> > > > > > powers off does not get powered on back on resume.
>
> > > For example, "With s2idle, the machine isn't suspended via ACPI, so
> > > the AML code which powers the controller off isn't executed."  AFAICT
> > > that isn't actually a required, documented property of s2idle, but
> > > rather it reaches into the internal implementation.
>
> > > The code comment "If suspend mode is s2idle, power won't get restored
> > > on resume" is similar.  !pm_suspend_via_firmware() tells us that
> > > platform firmware won't be invoked.  But the connection between *that*
> > > and "power won't get restored" is unexplained.
> >
> > Sorry, I can't comment anything regarding AML and power management
> > in general since I am really new to all of this. However, regarding
> > the usage of the `pm_suspend_via_firmware()`: yeah, I also think it
> > is unclear what this does, and I was thinking about adding a wrapper
> > function something like `is_s2idle()` to the suspend.h, which would
> > simply call `pm_suspend_via_firmware` internally.
>
> No, that's not my point at all.  I don't really care whether the
> interface is called pm_suspend_via_firmware() or is_s2idle().
>
> What I don't like about this is that it's all just unexplained magic,
> as was the original quirk.
>
> IIUC, the quirk is applied by pci_pm_suspend_noirq() *after* it puts
> the device in a low-power state.  Here's my uninformed speculation
> about what happens:
>
>   - On suspend, pci_pm_suspend_noirq() puts device in low-power state.

Right.

>     My *guess* is this means D0 or D3hot for s2idle, and D3cold for
>     everything else.

This really depends on the ACPI methods involved, but as a rule (and
I'm not aware of any exceptions) the target power state of the device
for s2idle and "everything else" is the same.  If D3cold is possible,
it will be D3cold.  If not, that will be D3hot etc.

The only difference (from the individual device perspective) is that
in the ACPI S3 (or S4 if this matters here) case the platform firmware
(and that's not AML but something like SMM) may cut power off from it
later.

>  [Do we have sufficient debug to find out what these states are?]

I'm not sure what you mean.  What states are supported or something else?

>   - pci_pm_suspend_noirq() does pci_fixup_suspend_late fixups,
>     including quirk_apple_poweroff_thunderbolt().
>
>   - quirk_apple_poweroff_thunderbolt() runs the magic SXIO/SXFP/SXLF
>     methods, which apparently turn off more power.
>
>   - On resume, pci_pm_resume_noirq() brings the device back to D0.
>
>     If we're resuming from standby, S2RAM, or STD, I speculate the
>     device is in D3cold, so this involves running AML that seems to
>     undo whatever SXIO/SXFP/SXLF did.

The standby, S2RAM and STD cases are actually different in that respect.

In the STD case we get the device from the restore kernel and it most
likely has been enumerated by it, so it is in whatever power state the
restore kernel puts it into.

For standby that most likely is the state in which the device was left
during suspend.

For S2RAM, it most likely will be something like D0-uninitialized,
because the platform firmware initiating the resume transition (which
is not AML, but again something like SMM) restores power to the
platform (including the majority of devices), but doesn't initialize
them as a rule.

The STD and S2RAM resume will undo the "magic", standby resume may not.

>
>     If we're resuming from s2idle, I speculate the device is in D0 or
>     D3hot, and we run different AML (or maybe no AML at all), and we

It is in whatever power state it was left in during suspend and we run
the same AML as in the other cases, modulo the possible power state
difference (with respect to the other cases).

>     *don't* undo the effects of SXIO/SXFP/SXLF, so the device doesn't
>     work.

This is correct.

> If the above is anything like what's happening, we should be able to
> skip SXIO/SXFP/SXLF based on the current power state of the device.

Not really.

> E.g., if the device is in D0 or D3hot, we should not use
> SXIO/SXFP/SXLF to yank power.
>
> That would seem more connected to the observable state of the device
> than using pm_suspend_via_firmware(), which relies on the connection
> between s2idle and PM_SUSPEND_FLAG_FW_SUSPEND (which is not at all
> obvious) and the power state of the device while in s2idle (also not
> obvious).

The problem is related to the fact that in s2idle the platform
firmware does not finalize the suspend transition and, consequently,
it doesn't initiate the resume transition.  Therefore whatever power
state the device was left in during suspend must be dealt with during
the subsequent resume.  Hence, if whatever is done by SXIO/SXFP/SXLF
in the suspend path cannot be reversed in the resume path by the
kernel (because there is no known method to do that), they should not
be invoked.  And that's exactly because the platform firmware will not
finalize the suspend transition which is indicated by
PM_SUSPEND_FLAG_FW_SUSPEND being unset.

Cheers!

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] PCI: don't power-off apple thunderbolt controller on s2idle
  2021-05-19 19:12           ` Rafael J. Wysocki
@ 2021-05-19 19:48             ` Bjorn Helgaas
  2021-05-20 11:27               ` Rafael J. Wysocki
  0 siblings, 1 reply; 30+ messages in thread
From: Bjorn Helgaas @ 2021-05-19 19:48 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Konstantin Kharlamov, Lukas Wunner, Linux PCI, Rafael J. Wysocki,
	Andreas Noever, Linux PM

On Wed, May 19, 2021 at 09:12:26PM +0200, Rafael J. Wysocki wrote:
> On Wed, May 19, 2021 at 7:28 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > On Fri, May 07, 2021 at 05:08:42PM +0300, Konstantin Kharlamov wrote:
> > > On Fri, 2021-05-07 at 08:30 -0500, Bjorn Helgaas wrote:
> > > > On Fri, May 07, 2021 at 12:07:38AM +0200, Lukas Wunner wrote:
> > > > > On Thu, May 06, 2021 at 04:48:42PM -0500, Bjorn Helgaas wrote:
> > > > > > On Thu, May 06, 2021 at 08:38:20PM +0300, Konstantin Kharlamov wrote:
> > > > > > > On Macbook 2013 resuming from s2idle results in external monitor no
> > > > > > > longer being detected, and dmesg having errors like:
> > > > > > >
> > > > > > >     pcieport 0000:06:00.0: can't change power state from D3hot to D0
> > > > > > > (config space inaccessible)
> > > > > > >
> > > > > > > and a stacktrace. The reason turned out that the hw that the quirk
> > > > > > > powers off does not get powered on back on resume.
> >
> > > > For example, "With s2idle, the machine isn't suspended via ACPI, so
> > > > the AML code which powers the controller off isn't executed."  AFAICT
> > > > that isn't actually a required, documented property of s2idle, but
> > > > rather it reaches into the internal implementation.
> >
> > > > The code comment "If suspend mode is s2idle, power won't get restored
> > > > on resume" is similar.  !pm_suspend_via_firmware() tells us that
> > > > platform firmware won't be invoked.  But the connection between *that*
> > > > and "power won't get restored" is unexplained.
> > >
> > > Sorry, I can't comment anything regarding AML and power management
> > > in general since I am really new to all of this. However, regarding
> > > the usage of the `pm_suspend_via_firmware()`: yeah, I also think it
> > > is unclear what this does, and I was thinking about adding a wrapper
> > > function something like `is_s2idle()` to the suspend.h, which would
> > > simply call `pm_suspend_via_firmware` internally.
> >
> > No, that's not my point at all.  I don't really care whether the
> > interface is called pm_suspend_via_firmware() or is_s2idle().
> >
> > What I don't like about this is that it's all just unexplained magic,
> > as was the original quirk.
> >
> > IIUC, the quirk is applied by pci_pm_suspend_noirq() *after* it puts
> > the device in a low-power state.  Here's my uninformed speculation
> > about what happens:
> >
> >   - On suspend, pci_pm_suspend_noirq() puts device in low-power state.
> 
> Right.
> 
> >     My *guess* is this means D0 or D3hot for s2idle, and D3cold for
> >     everything else.
> 
> This really depends on the ACPI methods involved, but as a rule (and
> I'm not aware of any exceptions) the target power state of the device
> for s2idle and "everything else" is the same.  If D3cold is possible,
> it will be D3cold.  If not, that will be D3hot etc.
> 
> The only difference (from the individual device perspective) is that
> in the ACPI S3 (or S4 if this matters here) case the platform firmware
> (and that's not AML but something like SMM) may cut power off from it
> later.
> 
> >  [Do we have sufficient debug to find out what these states are?]
> 
> I'm not sure what you mean.  What states are supported or something else?

I was curious about what states we actually put this device in for
s2idle/standby/s2ram/std because I was hoping we could decide based
on what state s2idle used.  But if they all use the same target state,
that idea won't work.

> >   - pci_pm_suspend_noirq() does pci_fixup_suspend_late fixups,
> >     including quirk_apple_poweroff_thunderbolt().
> >
> >   - quirk_apple_poweroff_thunderbolt() runs the magic SXIO/SXFP/SXLF
> >     methods, which apparently turn off more power.
> >
> >   - On resume, pci_pm_resume_noirq() brings the device back to D0.
> >
> >     If we're resuming from standby, S2RAM, or STD, I speculate the
> >     device is in D3cold, so this involves running AML that seems to
> >     undo whatever SXIO/SXFP/SXLF did.
> 
> The standby, S2RAM and STD cases are actually different in that respect.
> 
> In the STD case we get the device from the restore kernel and it most
> likely has been enumerated by it, so it is in whatever power state the
> restore kernel puts it into.
> 
> For standby that most likely is the state in which the device was left
> during suspend.
> 
> For S2RAM, it most likely will be something like D0-uninitialized,
> because the platform firmware initiating the resume transition (which
> is not AML, but again something like SMM) restores power to the
> platform (including the majority of devices), but doesn't initialize
> them as a rule.
> 
> The STD and S2RAM resume will undo the "magic", standby resume may not.
> 
> >     If we're resuming from s2idle, I speculate the device is in D0 or
> >     D3hot, and we run different AML (or maybe no AML at all), and we
> 
> It is in whatever power state it was left in during suspend and we run
> the same AML as in the other cases, modulo the possible power state
> difference (with respect to the other cases).
> 
> >     *don't* undo the effects of SXIO/SXFP/SXLF, so the device doesn't
> >     work.
> 
> This is correct.
> 
> > If the above is anything like what's happening, we should be able to
> > skip SXIO/SXFP/SXLF based on the current power state of the device.
> 
> Not really.
> 
> > E.g., if the device is in D0 or D3hot, we should not use
> > SXIO/SXFP/SXLF to yank power.
> >
> > That would seem more connected to the observable state of the device
> > than using pm_suspend_via_firmware(), which relies on the connection
> > between s2idle and PM_SUSPEND_FLAG_FW_SUSPEND (which is not at all
> > obvious) and the power state of the device while in s2idle (also not
> > obvious).
> 
> The problem is related to the fact that in s2idle the platform
> firmware does not finalize the suspend transition and, consequently,
> it doesn't initiate the resume transition.  Therefore whatever power
> state the device was left in during suspend must be dealt with during
> the subsequent resume.  Hence, if whatever is done by SXIO/SXFP/SXLF
> in the suspend path cannot be reversed in the resume path by the
> kernel (because there is no known method to do that), they should not
> be invoked.  And that's exactly because the platform firmware will not
> finalize the suspend transition which is indicated by
> PM_SUSPEND_FLAG_FW_SUSPEND being unset.

How can we connect "if (!pm_suspend_via_firmware())" in this patch
with the fact that firmware doesn't finalize suspend (and consequently
does not reverse things in resume)?

I don't see any use of pm_suspend_via_firmware() or
PM_SUSPEND_FLAG_FW_SUSPEND that looks relevant.  That makes it really
hard to figure out how this patch works and how to avoid breaking it
in the future.

Thanks for your patience in explaining all this!

Bjorn

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] PCI: don't power-off apple thunderbolt controller on s2idle
  2021-05-19 19:48             ` Bjorn Helgaas
@ 2021-05-20 11:27               ` Rafael J. Wysocki
  2021-05-20 11:54                 ` Rafael J. Wysocki
  0 siblings, 1 reply; 30+ messages in thread
From: Rafael J. Wysocki @ 2021-05-20 11:27 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Rafael J. Wysocki, Konstantin Kharlamov, Lukas Wunner, Linux PCI,
	Rafael J. Wysocki, Andreas Noever, Linux PM

On Wed, May 19, 2021 at 9:48 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
>
> On Wed, May 19, 2021 at 09:12:26PM +0200, Rafael J. Wysocki wrote:
> > On Wed, May 19, 2021 at 7:28 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > > On Fri, May 07, 2021 at 05:08:42PM +0300, Konstantin Kharlamov wrote:
> > > > On Fri, 2021-05-07 at 08:30 -0500, Bjorn Helgaas wrote:
> > > > > On Fri, May 07, 2021 at 12:07:38AM +0200, Lukas Wunner wrote:
> > > > > > On Thu, May 06, 2021 at 04:48:42PM -0500, Bjorn Helgaas wrote:
> > > > > > > On Thu, May 06, 2021 at 08:38:20PM +0300, Konstantin Kharlamov wrote:
> > > > > > > > On Macbook 2013 resuming from s2idle results in external monitor no
> > > > > > > > longer being detected, and dmesg having errors like:
> > > > > > > >
> > > > > > > >     pcieport 0000:06:00.0: can't change power state from D3hot to D0
> > > > > > > > (config space inaccessible)
> > > > > > > >
> > > > > > > > and a stacktrace. The reason turned out that the hw that the quirk
> > > > > > > > powers off does not get powered on back on resume.
> > >
> > > > > For example, "With s2idle, the machine isn't suspended via ACPI, so
> > > > > the AML code which powers the controller off isn't executed."  AFAICT
> > > > > that isn't actually a required, documented property of s2idle, but
> > > > > rather it reaches into the internal implementation.
> > >
> > > > > The code comment "If suspend mode is s2idle, power won't get restored
> > > > > on resume" is similar.  !pm_suspend_via_firmware() tells us that
> > > > > platform firmware won't be invoked.  But the connection between *that*
> > > > > and "power won't get restored" is unexplained.
> > > >
> > > > Sorry, I can't comment anything regarding AML and power management
> > > > in general since I am really new to all of this. However, regarding
> > > > the usage of the `pm_suspend_via_firmware()`: yeah, I also think it
> > > > is unclear what this does, and I was thinking about adding a wrapper
> > > > function something like `is_s2idle()` to the suspend.h, which would
> > > > simply call `pm_suspend_via_firmware` internally.
> > >
> > > No, that's not my point at all.  I don't really care whether the
> > > interface is called pm_suspend_via_firmware() or is_s2idle().
> > >
> > > What I don't like about this is that it's all just unexplained magic,
> > > as was the original quirk.
> > >
> > > IIUC, the quirk is applied by pci_pm_suspend_noirq() *after* it puts
> > > the device in a low-power state.  Here's my uninformed speculation
> > > about what happens:
> > >
> > >   - On suspend, pci_pm_suspend_noirq() puts device in low-power state.
> >
> > Right.
> >
> > >     My *guess* is this means D0 or D3hot for s2idle, and D3cold for
> > >     everything else.
> >
> > This really depends on the ACPI methods involved, but as a rule (and
> > I'm not aware of any exceptions) the target power state of the device
> > for s2idle and "everything else" is the same.  If D3cold is possible,
> > it will be D3cold.  If not, that will be D3hot etc.
> >
> > The only difference (from the individual device perspective) is that
> > in the ACPI S3 (or S4 if this matters here) case the platform firmware
> > (and that's not AML but something like SMM) may cut power off from it
> > later.
> >
> > >  [Do we have sufficient debug to find out what these states are?]
> >
> > I'm not sure what you mean.  What states are supported or something else?
>
> I was curious about what states we actually put this device in for
> s2idle/standby/s2ram/std because I was hoping we could decide based
> on what state s2idle used.  But if they all use the same target state,
> that idea won't work.
>
> > >   - pci_pm_suspend_noirq() does pci_fixup_suspend_late fixups,
> > >     including quirk_apple_poweroff_thunderbolt().
> > >
> > >   - quirk_apple_poweroff_thunderbolt() runs the magic SXIO/SXFP/SXLF
> > >     methods, which apparently turn off more power.
> > >
> > >   - On resume, pci_pm_resume_noirq() brings the device back to D0.
> > >
> > >     If we're resuming from standby, S2RAM, or STD, I speculate the
> > >     device is in D3cold, so this involves running AML that seems to
> > >     undo whatever SXIO/SXFP/SXLF did.
> >
> > The standby, S2RAM and STD cases are actually different in that respect.
> >
> > In the STD case we get the device from the restore kernel and it most
> > likely has been enumerated by it, so it is in whatever power state the
> > restore kernel puts it into.
> >
> > For standby that most likely is the state in which the device was left
> > during suspend.
> >
> > For S2RAM, it most likely will be something like D0-uninitialized,
> > because the platform firmware initiating the resume transition (which
> > is not AML, but again something like SMM) restores power to the
> > platform (including the majority of devices), but doesn't initialize
> > them as a rule.
> >
> > The STD and S2RAM resume will undo the "magic", standby resume may not.
> >
> > >     If we're resuming from s2idle, I speculate the device is in D0 or
> > >     D3hot, and we run different AML (or maybe no AML at all), and we
> >
> > It is in whatever power state it was left in during suspend and we run
> > the same AML as in the other cases, modulo the possible power state
> > difference (with respect to the other cases).
> >
> > >     *don't* undo the effects of SXIO/SXFP/SXLF, so the device doesn't
> > >     work.
> >
> > This is correct.
> >
> > > If the above is anything like what's happening, we should be able to
> > > skip SXIO/SXFP/SXLF based on the current power state of the device.
> >
> > Not really.
> >
> > > E.g., if the device is in D0 or D3hot, we should not use
> > > SXIO/SXFP/SXLF to yank power.
> > >
> > > That would seem more connected to the observable state of the device
> > > than using pm_suspend_via_firmware(), which relies on the connection
> > > between s2idle and PM_SUSPEND_FLAG_FW_SUSPEND (which is not at all
> > > obvious) and the power state of the device while in s2idle (also not
> > > obvious).
> >
> > The problem is related to the fact that in s2idle the platform
> > firmware does not finalize the suspend transition and, consequently,
> > it doesn't initiate the resume transition.  Therefore whatever power
> > state the device was left in during suspend must be dealt with during
> > the subsequent resume.  Hence, if whatever is done by SXIO/SXFP/SXLF
> > in the suspend path cannot be reversed in the resume path by the
> > kernel (because there is no known method to do that), they should not
> > be invoked.  And that's exactly because the platform firmware will not
> > finalize the suspend transition which is indicated by
> > PM_SUSPEND_FLAG_FW_SUSPEND being unset.
>
> How can we connect "if (!pm_suspend_via_firmware())" in this patch
> with the fact that firmware doesn't finalize suspend (and consequently
> does not reverse things in resume)?
>
> I don't see any use of pm_suspend_via_firmware() or
> PM_SUSPEND_FLAG_FW_SUSPEND that looks relevant.

First of all, there is a kerneldoc comment next to
pm_suspend_via_firmware() which is relevant.  Especially the last
paragraph of that comment applies directly to the case at hand IMV.

Next, if you see how pm_suspend_global_flags get populated, you'll
notice that pm_suspend_clear_flags() is called early in both the
system-wide suspend and hibernation paths and then the ACPI platform
callbacks invoke pm_set_suspend_via_firmware() when the platform is
going to finalize the suspend transition.  [In particular, notice the
acpi_state > ACPI_STATE_S1 condition in acpi_suspend_begin(), which is
related to the difference between ACPI S3/S4 and standby that I was
talking about before.]  This means that, on a system with ACPI,
PM_SUSPEND_FLAG_FW_SUSPEND is set only if the target system state is
above S1 and in those cases the firmware finalizes the suspend
transition.

Of course, s2idle callbacks don't invoke pm_set_suspend_via_firmware()
at all, because the target ACPI system state for s2idle is S0 (the
working state from the ACPI perspective).

> That makes it really hard to figure out how this patch works and how to avoid breaking it
> in the future.

Well, if you don't trust kerneldoc comments, you get to figure out
things yourself which may not be straightforward ...

> Thanks for your patience in explaining all this!

YW

Cheers!

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] PCI: don't power-off apple thunderbolt controller on s2idle
  2021-05-20 11:27               ` Rafael J. Wysocki
@ 2021-05-20 11:54                 ` Rafael J. Wysocki
  2021-05-20 19:49                   ` Bjorn Helgaas
  0 siblings, 1 reply; 30+ messages in thread
From: Rafael J. Wysocki @ 2021-05-20 11:54 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Rafael J. Wysocki, Konstantin Kharlamov, Lukas Wunner, Linux PCI,
	Rafael J. Wysocki, Andreas Noever, Linux PM

On Thu, May 20, 2021 at 1:27 PM Rafael J. Wysocki <rafael@kernel.org> wrote:
>
> On Wed, May 19, 2021 at 9:48 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
> >
> > On Wed, May 19, 2021 at 09:12:26PM +0200, Rafael J. Wysocki wrote:
> > > On Wed, May 19, 2021 at 7:28 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > > > On Fri, May 07, 2021 at 05:08:42PM +0300, Konstantin Kharlamov wrote:
> > > > > On Fri, 2021-05-07 at 08:30 -0500, Bjorn Helgaas wrote:
> > > > > > On Fri, May 07, 2021 at 12:07:38AM +0200, Lukas Wunner wrote:
> > > > > > > On Thu, May 06, 2021 at 04:48:42PM -0500, Bjorn Helgaas wrote:
> > > > > > > > On Thu, May 06, 2021 at 08:38:20PM +0300, Konstantin Kharlamov wrote:
> > > > > > > > > On Macbook 2013 resuming from s2idle results in external monitor no
> > > > > > > > > longer being detected, and dmesg having errors like:
> > > > > > > > >
> > > > > > > > >     pcieport 0000:06:00.0: can't change power state from D3hot to D0
> > > > > > > > > (config space inaccessible)
> > > > > > > > >
> > > > > > > > > and a stacktrace. The reason turned out that the hw that the quirk
> > > > > > > > > powers off does not get powered on back on resume.
> > > >
> > > > > > For example, "With s2idle, the machine isn't suspended via ACPI, so
> > > > > > the AML code which powers the controller off isn't executed."  AFAICT
> > > > > > that isn't actually a required, documented property of s2idle, but
> > > > > > rather it reaches into the internal implementation.
> > > >
> > > > > > The code comment "If suspend mode is s2idle, power won't get restored
> > > > > > on resume" is similar.  !pm_suspend_via_firmware() tells us that
> > > > > > platform firmware won't be invoked.  But the connection between *that*
> > > > > > and "power won't get restored" is unexplained.
> > > > >
> > > > > Sorry, I can't comment anything regarding AML and power management
> > > > > in general since I am really new to all of this. However, regarding
> > > > > the usage of the `pm_suspend_via_firmware()`: yeah, I also think it
> > > > > is unclear what this does, and I was thinking about adding a wrapper
> > > > > function something like `is_s2idle()` to the suspend.h, which would
> > > > > simply call `pm_suspend_via_firmware` internally.
> > > >
> > > > No, that's not my point at all.  I don't really care whether the
> > > > interface is called pm_suspend_via_firmware() or is_s2idle().
> > > >
> > > > What I don't like about this is that it's all just unexplained magic,
> > > > as was the original quirk.
> > > >
> > > > IIUC, the quirk is applied by pci_pm_suspend_noirq() *after* it puts
> > > > the device in a low-power state.  Here's my uninformed speculation
> > > > about what happens:
> > > >
> > > >   - On suspend, pci_pm_suspend_noirq() puts device in low-power state.
> > >
> > > Right.
> > >
> > > >     My *guess* is this means D0 or D3hot for s2idle, and D3cold for
> > > >     everything else.
> > >
> > > This really depends on the ACPI methods involved, but as a rule (and
> > > I'm not aware of any exceptions) the target power state of the device
> > > for s2idle and "everything else" is the same.  If D3cold is possible,
> > > it will be D3cold.  If not, that will be D3hot etc.
> > >
> > > The only difference (from the individual device perspective) is that
> > > in the ACPI S3 (or S4 if this matters here) case the platform firmware
> > > (and that's not AML but something like SMM) may cut power off from it
> > > later.
> > >
> > > >  [Do we have sufficient debug to find out what these states are?]
> > >
> > > I'm not sure what you mean.  What states are supported or something else?
> >
> > I was curious about what states we actually put this device in for
> > s2idle/standby/s2ram/std because I was hoping we could decide based
> > on what state s2idle used.  But if they all use the same target state,
> > that idea won't work.
> >
> > > >   - pci_pm_suspend_noirq() does pci_fixup_suspend_late fixups,
> > > >     including quirk_apple_poweroff_thunderbolt().
> > > >
> > > >   - quirk_apple_poweroff_thunderbolt() runs the magic SXIO/SXFP/SXLF
> > > >     methods, which apparently turn off more power.
> > > >
> > > >   - On resume, pci_pm_resume_noirq() brings the device back to D0.
> > > >
> > > >     If we're resuming from standby, S2RAM, or STD, I speculate the
> > > >     device is in D3cold, so this involves running AML that seems to
> > > >     undo whatever SXIO/SXFP/SXLF did.
> > >
> > > The standby, S2RAM and STD cases are actually different in that respect.
> > >
> > > In the STD case we get the device from the restore kernel and it most
> > > likely has been enumerated by it, so it is in whatever power state the
> > > restore kernel puts it into.
> > >
> > > For standby that most likely is the state in which the device was left
> > > during suspend.
> > >
> > > For S2RAM, it most likely will be something like D0-uninitialized,
> > > because the platform firmware initiating the resume transition (which
> > > is not AML, but again something like SMM) restores power to the
> > > platform (including the majority of devices), but doesn't initialize
> > > them as a rule.
> > >
> > > The STD and S2RAM resume will undo the "magic", standby resume may not.
> > >
> > > >     If we're resuming from s2idle, I speculate the device is in D0 or
> > > >     D3hot, and we run different AML (or maybe no AML at all), and we
> > >
> > > It is in whatever power state it was left in during suspend and we run
> > > the same AML as in the other cases, modulo the possible power state
> > > difference (with respect to the other cases).
> > >
> > > >     *don't* undo the effects of SXIO/SXFP/SXLF, so the device doesn't
> > > >     work.
> > >
> > > This is correct.
> > >
> > > > If the above is anything like what's happening, we should be able to
> > > > skip SXIO/SXFP/SXLF based on the current power state of the device.
> > >
> > > Not really.
> > >
> > > > E.g., if the device is in D0 or D3hot, we should not use
> > > > SXIO/SXFP/SXLF to yank power.
> > > >
> > > > That would seem more connected to the observable state of the device
> > > > than using pm_suspend_via_firmware(), which relies on the connection
> > > > between s2idle and PM_SUSPEND_FLAG_FW_SUSPEND (which is not at all
> > > > obvious) and the power state of the device while in s2idle (also not
> > > > obvious).
> > >
> > > The problem is related to the fact that in s2idle the platform
> > > firmware does not finalize the suspend transition and, consequently,
> > > it doesn't initiate the resume transition.  Therefore whatever power
> > > state the device was left in during suspend must be dealt with during
> > > the subsequent resume.  Hence, if whatever is done by SXIO/SXFP/SXLF
> > > in the suspend path cannot be reversed in the resume path by the
> > > kernel (because there is no known method to do that), they should not
> > > be invoked.  And that's exactly because the platform firmware will not
> > > finalize the suspend transition which is indicated by
> > > PM_SUSPEND_FLAG_FW_SUSPEND being unset.
> >
> > How can we connect "if (!pm_suspend_via_firmware())" in this patch
> > with the fact that firmware doesn't finalize suspend (and consequently
> > does not reverse things in resume)?
> >
> > I don't see any use of pm_suspend_via_firmware() or
> > PM_SUSPEND_FLAG_FW_SUSPEND that looks relevant.
>
> First of all, there is a kerneldoc comment next to
> pm_suspend_via_firmware() which is relevant.  Especially the last
> paragraph of that comment applies directly to the case at hand IMV.

BTW, the problem at hand is not that s2idle in particular needs to be
treated in a special way (this appears to be the source of all
confusion here).  The problem is that the kernel cannot undo the
SXIO/SXFP/SXLF magic without passing control to the platform firmware.

And "passing contol to the platform firmware" doesn't mean "executing
some AML" here, because control remains in the kernel when AML is
executed.  "Passing control to the platform firmware" means letting
some native firmware code (like SMM code) run which happens at the end
of S2/S3/S4 suspend transitions and it does not happen during S1
(standby) and s2idle suspend transitions.

That's why using SXIO/SXFP/SXLF is only valid during S2/S3/S4 suspend
transitions and it is not valid during s2idle and S1 suspend
transitions (and yes, S1 is also affected, so s2idle is not special in
that respect at all).

IMO the changelog of the patch needs to be rewritten, but the code
change made by it is reasonable.

Cheers!

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] PCI: don't power-off apple thunderbolt controller on s2idle
  2021-05-06 21:48 ` Bjorn Helgaas
  2021-05-06 22:07   ` Lukas Wunner
  2021-05-07  9:32   ` Konstantin Kharlamov
@ 2021-05-20 11:58   ` Rafael J. Wysocki
  2 siblings, 0 replies; 30+ messages in thread
From: Rafael J. Wysocki @ 2021-05-20 11:58 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Konstantin Kharlamov, Linux PCI, Rafael J. Wysocki,
	Andreas Noever, Linux PM

On Thu, May 6, 2021 at 11:48 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
>
> [+cc Rafael, Andreas, linux-pm]
>
> On Thu, May 06, 2021 at 08:38:20PM +0300, Konstantin Kharlamov wrote:
> > On Macbook 2013 resuming from s2idle results in external monitor no
> > longer being detected, and dmesg having errors like:
> >
> >     pcieport 0000:06:00.0: can't change power state from D3hot to D0 (config space inaccessible)
> >
> > and a stacktrace. The reason turned out that the hw that the quirk
> > powers off does not get powered on back on resume.
>
> quirk_apple_poweroff_thunderbolt() was added in 2014 by 1df5172c5c25
> ("PCI: Suspend/resume quirks for Apple thunderbolt").  It claims
> "power is automatically restored before resume," so there must be
> something special about s2idle that prevents the power-on.
>
> IIUC this change will reduce the s2idle power savings.  I would feel
> better about this if we understood what the difference was.
>
> > Thus, add a check for s2idle to the quirk, and do nothing if the suspend
> > mode is s2idle.
>
> Obviously the *hardware* hasn't changed since 1df5172c5c25.  Is s2idle
> something that wasn't tested back then, or is this problem connected
> to an s2idle change since then?  Can we identify a commit that
> introduced this problem?  That would help with backporting or stable
> tags.
>
> > Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=212767
>
> Thanks for this!  Would you mind attaching the output of
> "sudo lspci -vvv"?  If you attach any other dmesg, could you
> use "dmesg --color=never" so the log doesn't include all the
> escape characters?
>
> > Signed-off-by: Konstantin Kharlamov <Hi-Angel@yandex.ru>
> > ---
> >  drivers/pci/quirks.c | 8 ++++++++
> >  1 file changed, 8 insertions(+)
> >
> > diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> > index 653660e3ba9e..86fedcec37e2 100644
> > --- a/drivers/pci/quirks.c
> > +++ b/drivers/pci/quirks.c
> > @@ -27,6 +27,7 @@
> >  #include <linux/nvme.h>
> >  #include <linux/platform_data/x86/apple.h>
> >  #include <linux/pm_runtime.h>
> > +#include <linux/suspend.h>
> >  #include <linux/switchtec.h>
> >  #include <asm/dma.h> /* isa_dma_bridge_buggy */
> >  #include "pci.h"
> > @@ -3646,6 +3647,13 @@ static void quirk_apple_poweroff_thunderbolt(struct pci_dev *dev)
> >               return;
> >       if (pci_pcie_type(dev) != PCI_EXP_TYPE_UPSTREAM)
> >               return;
> > +
> > +     /*
> > +      * If suspend mode is s2idle, power won't get restored on resume.
> > +      */

The comment above is incorrect.

The reason to return here is because whatever is done below cannot be
undone unless the platform firmware is invoked to finalize the suspend
transition, which is what the check is about.  And this is not
specific to s2idle.

> > +     if (!pm_suspend_via_firmware())
> > +             return;
> > +
> >       bridge = ACPI_HANDLE(&dev->dev);
> >       if (!bridge)
> >               return;
> > --
> > 2.31.1
> >

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] PCI: don't power-off apple thunderbolt controller on s2idle
  2021-05-20 11:54                 ` Rafael J. Wysocki
@ 2021-05-20 19:49                   ` Bjorn Helgaas
  2021-05-20 23:28                     ` Konstantin Kharlamov
                                       ` (2 more replies)
  0 siblings, 3 replies; 30+ messages in thread
From: Bjorn Helgaas @ 2021-05-20 19:49 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Konstantin Kharlamov, Lukas Wunner, Linux PCI, Rafael J. Wysocki,
	Andreas Noever, Linux PM

On Thu, May 20, 2021 at 01:54:05PM +0200, Rafael J. Wysocki wrote:
> On Thu, May 20, 2021 at 1:27 PM Rafael J. Wysocki <rafael@kernel.org> wrote:
> > On Wed, May 19, 2021 at 9:48 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > > On Wed, May 19, 2021 at 09:12:26PM +0200, Rafael J. Wysocki wrote:

> > > > The problem is related to the fact that in s2idle the platform
> > > > firmware does not finalize the suspend transition and, consequently,
> > > > it doesn't initiate the resume transition.  Therefore whatever power
> > > > state the device was left in during suspend must be dealt with during
> > > > the subsequent resume.  Hence, if whatever is done by SXIO/SXFP/SXLF
> > > > in the suspend path cannot be reversed in the resume path by the
> > > > kernel (because there is no known method to do that), they should not
> > > > be invoked.  And that's exactly because the platform firmware will not
> > > > finalize the suspend transition which is indicated by
> > > > PM_SUSPEND_FLAG_FW_SUSPEND being unset.
> > >
> > > How can we connect "if (!pm_suspend_via_firmware())" in this patch
> > > with the fact that firmware doesn't finalize suspend (and consequently
> > > does not reverse things in resume)?
> > >
> > > I don't see any use of pm_suspend_via_firmware() or
> > > PM_SUSPEND_FLAG_FW_SUSPEND that looks relevant.
> >
> > First of all, there is a kerneldoc comment next to
> > pm_suspend_via_firmware() which is relevant.  Especially the last
> > paragraph of that comment applies directly to the case at hand IMV.

I do read kerneldoc, but I *rely* on the code, and it's nice when I
can match up the kerneldoc with what the code is doing :)

Part of my confusion is that "passing control to platform firmware"
isn't particularly useful in itself because it doesn't give a clue
about what firmware is *doing*.  Without knowing what it does, we
can't reason about how kernel's actions interact with firmware's
actions.

> BTW, the problem at hand is not that s2idle in particular needs to be
> treated in a special way (this appears to be the source of all
> confusion here).  The problem is that the kernel cannot undo the
> SXIO/SXFP/SXLF magic without passing control to the platform firmware.

I assume this is really a case of "the kernel doesn't know *what* to
do, but platform firmware does," so in principle the kernel *could*
undo the SXIO/SXFP/SXLF magic if it knew what to do.  

> And "passing control to the platform firmware" doesn't mean "executing
> some AML" here, because control remains in the kernel when AML is
> executed.  "Passing control to the platform firmware" means letting
> some native firmware code (like SMM code) run which happens at the end
> of S2/S3/S4 suspend transitions and it does not happen during S1
> (standby) and s2idle suspend transitions.
> 
> That's why using SXIO/SXFP/SXLF is only valid during S2/S3/S4 suspend
> transitions and it is not valid during s2idle and S1 suspend
> transitions (and yes, S1 is also affected, so s2idle is not special in
> that respect at all).
> 
> IMO the changelog of the patch needs to be rewritten, but the code
> change made by it is reasonable.

So IIUC the comment should say something like:

  SXIO/SXFP/SXLF turns off power to the Thunderbolt controller.  We
  don't know how to turn it back on again, but firmware does, so we
  can only use SXIO/SXFP/SXLF if we're suspending via firmware.

Actually, it sounds like the important thing is that we rely on the
firmware *resume* path to turn on the power again.

pm_resume_via_firmware() *sounds* like it would be appropriate, but
the kerneldoc says that's for use after resume, and it tells us
whether firmware has *already* handled the wakeup event.  And
PM_SUSPEND_FLAG_FW_RESUME isn't set until after we've run these
suspend_late fixups, so it wouldn't work here.

Bjorn

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] PCI: don't power-off apple thunderbolt controller on s2idle
  2021-05-20 19:49                   ` Bjorn Helgaas
@ 2021-05-20 23:28                     ` Konstantin Kharlamov
  2021-05-24  6:59                       ` Konstantin Kharlamov
  2021-05-20 23:55                     ` [PATCH v2] PCI: don't call firmware hooks on suspend unless it's fw-controlled Konstantin Kharlamov
  2021-05-21  9:47                     ` [PATCH] PCI: don't power-off apple thunderbolt controller on s2idle Rafael J. Wysocki
  2 siblings, 1 reply; 30+ messages in thread
From: Konstantin Kharlamov @ 2021-05-20 23:28 UTC (permalink / raw)
  To: Bjorn Helgaas, Rafael J. Wysocki
  Cc: Lukas Wunner, Linux PCI, Rafael J. Wysocki, Andreas Noever, Linux PM

Thank you very much. Well send then a v2 with the comment in a minute.

On Thu, 2021-05-20 at 14:49 -0500, Bjorn Helgaas wrote:
> On Thu, May 20, 2021 at 01:54:05PM +0200, Rafael J. Wysocki wrote:
> > On Thu, May 20, 2021 at 1:27 PM Rafael J. Wysocki <rafael@kernel.org> wrote:
> > > On Wed, May 19, 2021 at 9:48 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > > > On Wed, May 19, 2021 at 09:12:26PM +0200, Rafael J. Wysocki wrote:
> 
> > > > > The problem is related to the fact that in s2idle the platform
> > > > > firmware does not finalize the suspend transition and, consequently,
> > > > > it doesn't initiate the resume transition.  Therefore whatever power
> > > > > state the device was left in during suspend must be dealt with during
> > > > > the subsequent resume.  Hence, if whatever is done by SXIO/SXFP/SXLF
> > > > > in the suspend path cannot be reversed in the resume path by the
> > > > > kernel (because there is no known method to do that), they should not
> > > > > be invoked.  And that's exactly because the platform firmware will not
> > > > > finalize the suspend transition which is indicated by
> > > > > PM_SUSPEND_FLAG_FW_SUSPEND being unset.
> > > > 
> > > > How can we connect "if (!pm_suspend_via_firmware())" in this patch
> > > > with the fact that firmware doesn't finalize suspend (and consequently
> > > > does not reverse things in resume)?
> > > > 
> > > > I don't see any use of pm_suspend_via_firmware() or
> > > > PM_SUSPEND_FLAG_FW_SUSPEND that looks relevant.
> > > 
> > > First of all, there is a kerneldoc comment next to
> > > pm_suspend_via_firmware() which is relevant.  Especially the last
> > > paragraph of that comment applies directly to the case at hand IMV.
> 
> I do read kerneldoc, but I *rely* on the code, and it's nice when I
> can match up the kerneldoc with what the code is doing :)
> 
> Part of my confusion is that "passing control to platform firmware"
> isn't particularly useful in itself because it doesn't give a clue
> about what firmware is *doing*.  Without knowing what it does, we
> can't reason about how kernel's actions interact with firmware's
> actions.
> 
> > BTW, the problem at hand is not that s2idle in particular needs to be
> > treated in a special way (this appears to be the source of all
> > confusion here).  The problem is that the kernel cannot undo the
> > SXIO/SXFP/SXLF magic without passing control to the platform firmware.
> 
> I assume this is really a case of "the kernel doesn't know *what* to
> do, but platform firmware does," so in principle the kernel *could*
> undo the SXIO/SXFP/SXLF magic if it knew what to do.  
> 
> > And "passing control to the platform firmware" doesn't mean "executing
> > some AML" here, because control remains in the kernel when AML is
> > executed.  "Passing control to the platform firmware" means letting
> > some native firmware code (like SMM code) run which happens at the end
> > of S2/S3/S4 suspend transitions and it does not happen during S1
> > (standby) and s2idle suspend transitions.
> > 
> > That's why using SXIO/SXFP/SXLF is only valid during S2/S3/S4 suspend
> > transitions and it is not valid during s2idle and S1 suspend
> > transitions (and yes, S1 is also affected, so s2idle is not special in
> > that respect at all).
> > 
> > IMO the changelog of the patch needs to be rewritten, but the code
> > change made by it is reasonable.
> 
> So IIUC the comment should say something like:
> 
>   SXIO/SXFP/SXLF turns off power to the Thunderbolt controller.  We
>   don't know how to turn it back on again, but firmware does, so we
>   can only use SXIO/SXFP/SXLF if we're suspending via firmware.
> 
> Actually, it sounds like the important thing is that we rely on the
> firmware *resume* path to turn on the power again.
> 
> pm_resume_via_firmware() *sounds* like it would be appropriate, but
> the kerneldoc says that's for use after resume, and it tells us
> whether firmware has *already* handled the wakeup event.  And
> PM_SUSPEND_FLAG_FW_RESUME isn't set until after we've run these
> suspend_late fixups, so it wouldn't work here.
> 
> Bjorn



^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH v2] PCI: don't call firmware hooks on suspend unless it's fw-controlled
  2021-05-20 19:49                   ` Bjorn Helgaas
  2021-05-20 23:28                     ` Konstantin Kharlamov
@ 2021-05-20 23:55                     ` Konstantin Kharlamov
  2021-05-28  7:39                       ` PING: " Konstantin Kharlamov
  2021-05-21  9:47                     ` [PATCH] PCI: don't power-off apple thunderbolt controller on s2idle Rafael J. Wysocki
  2 siblings, 1 reply; 30+ messages in thread
From: Konstantin Kharlamov @ 2021-05-20 23:55 UTC (permalink / raw)
  To: linux-pci; +Cc: hi-angel, helgaas, lukas, rjw, andreas.noever

On Macbook 2013 resuming from s2idle resulted in external monitor no
longer being detected, and dmesg having errors like:

    pcieport 0000:06:00.0: can't change power state from D3hot to D0 (config space inaccessible)

and a stacktrace. The reason is that in s2idle (and in S1 as noted by
Rafael) we do not call firmware code to handle suspend, and as result
while waking up firmware also does not handle resume.

This means, for the Thunderbolt controller that gets disabled in the
quirk by calling the firmware methods, there's no one to wake it back up
on resume.

To quote Rafael Wysocki:

> "Passing control to the platform firmware" means letting
> some native firmware code (like SMM code) run which happens at the end
> of S2/S3/S4 suspend transitions and it does not happen during S1
> (standby) and s2idle suspend transitions.
>
> That's why using SXIO/SXFP/SXLF is only valid during S2/S3/S4 suspend
> transitions and it is not valid during s2idle and S1 suspend
> transitions (and yes, S1 is also affected, so s2idle is not special in
> that respect at all).

Thus, return early from the quirk when suspend mode isn't one that calls
firmware.

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=212767
Signed-off-by: Konstantin Kharlamov <Hi-Angel@yandex.ru>
Reviewed-by: Lukas Wunner <lukas@wunner.de>
---
 drivers/pci/quirks.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index 653660e3ba9e..f86b6388a04a 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -27,6 +27,7 @@
 #include <linux/nvme.h>
 #include <linux/platform_data/x86/apple.h>
 #include <linux/pm_runtime.h>
+#include <linux/suspend.h>
 #include <linux/switchtec.h>
 #include <asm/dma.h>	/* isa_dma_bridge_buggy */
 #include "pci.h"
@@ -3646,6 +3647,15 @@ static void quirk_apple_poweroff_thunderbolt(struct pci_dev *dev)
 		return;
 	if (pci_pcie_type(dev) != PCI_EXP_TYPE_UPSTREAM)
 		return;
+
+	/*
+	 * SXIO/SXFP/SXLF turns off power to the Thunderbolt controller.  We don't
+	 * know how to turn it back on again, but firmware does, so we can only use
+	 * SXIO/SXFP/SXLF if we're suspending via firmware.
+	 */
+	if (!pm_suspend_via_firmware())
+		return;
+
 	bridge = ACPI_HANDLE(&dev->dev);
 	if (!bridge)
 		return;
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH] PCI: don't power-off apple thunderbolt controller on s2idle
  2021-05-20 19:49                   ` Bjorn Helgaas
  2021-05-20 23:28                     ` Konstantin Kharlamov
  2021-05-20 23:55                     ` [PATCH v2] PCI: don't call firmware hooks on suspend unless it's fw-controlled Konstantin Kharlamov
@ 2021-05-21  9:47                     ` Rafael J. Wysocki
  2 siblings, 0 replies; 30+ messages in thread
From: Rafael J. Wysocki @ 2021-05-21  9:47 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Rafael J. Wysocki, Konstantin Kharlamov, Lukas Wunner, Linux PCI,
	Rafael J. Wysocki, Andreas Noever, Linux PM

On Thu, May 20, 2021 at 9:49 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
>
> On Thu, May 20, 2021 at 01:54:05PM +0200, Rafael J. Wysocki wrote:
> > On Thu, May 20, 2021 at 1:27 PM Rafael J. Wysocki <rafael@kernel.org> wrote:
> > > On Wed, May 19, 2021 at 9:48 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > > > On Wed, May 19, 2021 at 09:12:26PM +0200, Rafael J. Wysocki wrote:
>
> > > > > The problem is related to the fact that in s2idle the platform
> > > > > firmware does not finalize the suspend transition and, consequently,
> > > > > it doesn't initiate the resume transition.  Therefore whatever power
> > > > > state the device was left in during suspend must be dealt with during
> > > > > the subsequent resume.  Hence, if whatever is done by SXIO/SXFP/SXLF
> > > > > in the suspend path cannot be reversed in the resume path by the
> > > > > kernel (because there is no known method to do that), they should not
> > > > > be invoked.  And that's exactly because the platform firmware will not
> > > > > finalize the suspend transition which is indicated by
> > > > > PM_SUSPEND_FLAG_FW_SUSPEND being unset.
> > > >
> > > > How can we connect "if (!pm_suspend_via_firmware())" in this patch
> > > > with the fact that firmware doesn't finalize suspend (and consequently
> > > > does not reverse things in resume)?
> > > >
> > > > I don't see any use of pm_suspend_via_firmware() or
> > > > PM_SUSPEND_FLAG_FW_SUSPEND that looks relevant.
> > >
> > > First of all, there is a kerneldoc comment next to
> > > pm_suspend_via_firmware() which is relevant.  Especially the last
> > > paragraph of that comment applies directly to the case at hand IMV.
>
> I do read kerneldoc, but I *rely* on the code, and it's nice when I
> can match up the kerneldoc with what the code is doing :)

Fair enough.

> Part of my confusion is that "passing control to platform firmware"
> isn't particularly useful in itself because it doesn't give a clue
> about what firmware is *doing*.  Without knowing what it does, we
> can't reason about how kernel's actions interact with firmware's
> actions.

While we don't know exactly what happens when the platform firmware is
running, we can reasonably expect that devices will get reset as a
result of that.

It's kind of like heating up things in a microwave oven: you don't
need to know exactly what happens when it is working, but nevertheless
you can predict with quite high confidence what the outcome of that
will be.

> > BTW, the problem at hand is not that s2idle in particular needs to be
> > treated in a special way (this appears to be the source of all
> > confusion here).  The problem is that the kernel cannot undo the
> > SXIO/SXFP/SXLF magic without passing control to the platform firmware.
>
> I assume this is really a case of "the kernel doesn't know *what* to
> do, but platform firmware does," so in principle the kernel *could*
> undo the SXIO/SXFP/SXLF magic if it knew what to do.

In general, that may or may not be the case.

I guess it is the case here, because Lukas seems to know how to make
this work, but the AML in question might be prepared with the
assumption that the firmware code finalizing the transition will run
after it.

> > And "passing control to the platform firmware" doesn't mean "executing
> > some AML" here, because control remains in the kernel when AML is
> > executed.  "Passing control to the platform firmware" means letting
> > some native firmware code (like SMM code) run which happens at the end
> > of S2/S3/S4 suspend transitions and it does not happen during S1
> > (standby) and s2idle suspend transitions.
> >
> > That's why using SXIO/SXFP/SXLF is only valid during S2/S3/S4 suspend
> > transitions and it is not valid during s2idle and S1 suspend
> > transitions (and yes, S1 is also affected, so s2idle is not special in
> > that respect at all).
> >
> > IMO the changelog of the patch needs to be rewritten, but the code
> > change made by it is reasonable.
>
> So IIUC the comment should say something like:
>
>   SXIO/SXFP/SXLF turns off power to the Thunderbolt controller.  We
>   don't know how to turn it back on again, but firmware does, so we
>   can only use SXIO/SXFP/SXLF if we're suspending via firmware.
>
> Actually, it sounds like the important thing is that we rely on the
> firmware *resume* path to turn on the power again.

Actually both the suspend and resume paths in it together, but the
important piece is that if the firmware runs at the end of suspend, it
will also run at the beginning of resume.

> pm_resume_via_firmware() *sounds* like it would be appropriate, but
> the kerneldoc says that's for use after resume,

That's right.

The rule of thumb is to use pm_suspend_via_firmware() in the
system-wide suspend code paths and pm_resume_via_firmware() in the
resume ones.  They are simply complementary.

And because in this particular case a decision needs to be made
whether or not to do something in a suspend path, the former is the
right one to use.

> and it tells us whether firmware has *already* handled the wakeup event.  And
> PM_SUSPEND_FLAG_FW_RESUME isn't set until after we've run these
> suspend_late fixups, so it wouldn't work here.

Right.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] PCI: don't power-off apple thunderbolt controller on s2idle
  2021-05-20 23:28                     ` Konstantin Kharlamov
@ 2021-05-24  6:59                       ` Konstantin Kharlamov
  0 siblings, 0 replies; 30+ messages in thread
From: Konstantin Kharlamov @ 2021-05-24  6:59 UTC (permalink / raw)
  To: Bjorn Helgaas, Rafael J. Wysocki
  Cc: Lukas Wunner, Linux PCI, Rafael J. Wysocki, Andreas Noever, Linux PM

FTR, the patch is on `linux-pci` with all the CCs except `linux-pm`. I changed
the title a bit based on the discussions here, it is:

	[PATCH v2] PCI: don't call firmware hooks on suspend unless it's fw-
controlled

On Fri, 2021-05-21 at 02:28 +0300, Konstantin Kharlamov wrote:
> Thank you very much. Well send then a v2 with the comment in a minute.
> 
> On Thu, 2021-05-20 at 14:49 -0500, Bjorn Helgaas wrote:
> > On Thu, May 20, 2021 at 01:54:05PM +0200, Rafael J. Wysocki wrote:
> > > On Thu, May 20, 2021 at 1:27 PM Rafael J. Wysocki <rafael@kernel.org>
> > > wrote:
> > > > On Wed, May 19, 2021 at 9:48 PM Bjorn Helgaas <helgaas@kernel.org>
> > > > wrote:
> > > > > On Wed, May 19, 2021 at 09:12:26PM +0200, Rafael J. Wysocki wrote:
> > 
> > > > > > The problem is related to the fact that in s2idle the platform
> > > > > > firmware does not finalize the suspend transition and, consequently,
> > > > > > it doesn't initiate the resume transition.  Therefore whatever power
> > > > > > state the device was left in during suspend must be dealt with
> > > > > > during
> > > > > > the subsequent resume.  Hence, if whatever is done by SXIO/SXFP/SXLF
> > > > > > in the suspend path cannot be reversed in the resume path by the
> > > > > > kernel (because there is no known method to do that), they should
> > > > > > not
> > > > > > be invoked.  And that's exactly because the platform firmware will
> > > > > > not
> > > > > > finalize the suspend transition which is indicated by
> > > > > > PM_SUSPEND_FLAG_FW_SUSPEND being unset.
> > > > > 
> > > > > How can we connect "if (!pm_suspend_via_firmware())" in this patch
> > > > > with the fact that firmware doesn't finalize suspend (and consequently
> > > > > does not reverse things in resume)?
> > > > > 
> > > > > I don't see any use of pm_suspend_via_firmware() or
> > > > > PM_SUSPEND_FLAG_FW_SUSPEND that looks relevant.
> > > > 
> > > > First of all, there is a kerneldoc comment next to
> > > > pm_suspend_via_firmware() which is relevant.  Especially the last
> > > > paragraph of that comment applies directly to the case at hand IMV.
> > 
> > I do read kerneldoc, but I *rely* on the code, and it's nice when I
> > can match up the kerneldoc with what the code is doing :)
> > 
> > Part of my confusion is that "passing control to platform firmware"
> > isn't particularly useful in itself because it doesn't give a clue
> > about what firmware is *doing*.  Without knowing what it does, we
> > can't reason about how kernel's actions interact with firmware's
> > actions.
> > 
> > > BTW, the problem at hand is not that s2idle in particular needs to be
> > > treated in a special way (this appears to be the source of all
> > > confusion here).  The problem is that the kernel cannot undo the
> > > SXIO/SXFP/SXLF magic without passing control to the platform firmware.
> > 
> > I assume this is really a case of "the kernel doesn't know *what* to
> > do, but platform firmware does," so in principle the kernel *could*
> > undo the SXIO/SXFP/SXLF magic if it knew what to do.  
> > 
> > > And "passing control to the platform firmware" doesn't mean "executing
> > > some AML" here, because control remains in the kernel when AML is
> > > executed.  "Passing control to the platform firmware" means letting
> > > some native firmware code (like SMM code) run which happens at the end
> > > of S2/S3/S4 suspend transitions and it does not happen during S1
> > > (standby) and s2idle suspend transitions.
> > > 
> > > That's why using SXIO/SXFP/SXLF is only valid during S2/S3/S4 suspend
> > > transitions and it is not valid during s2idle and S1 suspend
> > > transitions (and yes, S1 is also affected, so s2idle is not special in
> > > that respect at all).
> > > 
> > > IMO the changelog of the patch needs to be rewritten, but the code
> > > change made by it is reasonable.
> > 
> > So IIUC the comment should say something like:
> > 
> >   SXIO/SXFP/SXLF turns off power to the Thunderbolt controller.  We
> >   don't know how to turn it back on again, but firmware does, so we
> >   can only use SXIO/SXFP/SXLF if we're suspending via firmware.
> > 
> > Actually, it sounds like the important thing is that we rely on the
> > firmware *resume* path to turn on the power again.
> > 
> > pm_resume_via_firmware() *sounds* like it would be appropriate, but
> > the kerneldoc says that's for use after resume, and it tells us
> > whether firmware has *already* handled the wakeup event.  And
> > PM_SUSPEND_FLAG_FW_RESUME isn't set until after we've run these
> > suspend_late fixups, so it wouldn't work here.
> > 
> > Bjorn
> 
> 



^ permalink raw reply	[flat|nested] 30+ messages in thread

* PING: Re: [PATCH v2] PCI: don't call firmware hooks on suspend unless it's fw-controlled
  2021-05-20 23:55                     ` [PATCH v2] PCI: don't call firmware hooks on suspend unless it's fw-controlled Konstantin Kharlamov
@ 2021-05-28  7:39                       ` Konstantin Kharlamov
  2021-06-03  8:36                         ` PING: " Konstantin Kharlamov
  0 siblings, 1 reply; 30+ messages in thread
From: Konstantin Kharlamov @ 2021-05-28  7:39 UTC (permalink / raw)
  To: linux-pci; +Cc: helgaas, lukas, rjw, andreas.noever

On Fri, 2021-05-21 at 02:55 +0300, Konstantin Kharlamov wrote:
> On Macbook 2013 resuming from s2idle resulted in external monitor no
> longer being detected, and dmesg having errors like:
> 
>     pcieport 0000:06:00.0: can't change power state from D3hot to D0 (config
> space inaccessible)
> 
> and a stacktrace. The reason is that in s2idle (and in S1 as noted by
> Rafael) we do not call firmware code to handle suspend, and as result
> while waking up firmware also does not handle resume.
> 
> This means, for the Thunderbolt controller that gets disabled in the
> quirk by calling the firmware methods, there's no one to wake it back up
> on resume.
> 
> To quote Rafael Wysocki:
> 
> > "Passing control to the platform firmware" means letting
> > some native firmware code (like SMM code) run which happens at the end
> > of S2/S3/S4 suspend transitions and it does not happen during S1
> > (standby) and s2idle suspend transitions.
> > 
> > That's why using SXIO/SXFP/SXLF is only valid during S2/S3/S4 suspend
> > transitions and it is not valid during s2idle and S1 suspend
> > transitions (and yes, S1 is also affected, so s2idle is not special in
> > that respect at all).
> 
> Thus, return early from the quirk when suspend mode isn't one that calls
> firmware.
> 
> Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=212767
> Signed-off-by: Konstantin Kharlamov <Hi-Angel@yandex.ru>
> Reviewed-by: Lukas Wunner <lukas@wunner.de>
> ---
>  drivers/pci/quirks.c | 10 ++++++++++
>  1 file changed, 10 insertions(+)
> 
> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> index 653660e3ba9e..f86b6388a04a 100644
> --- a/drivers/pci/quirks.c
> +++ b/drivers/pci/quirks.c
> @@ -27,6 +27,7 @@
>  #include <linux/nvme.h>
>  #include <linux/platform_data/x86/apple.h>
>  #include <linux/pm_runtime.h>
> +#include <linux/suspend.h>
>  #include <linux/switchtec.h>
>  #include <asm/dma.h>   /* isa_dma_bridge_buggy */
>  #include "pci.h"
> @@ -3646,6 +3647,15 @@ static void quirk_apple_poweroff_thunderbolt(struct
> pci_dev *dev)
>                 return;
>         if (pci_pcie_type(dev) != PCI_EXP_TYPE_UPSTREAM)
>                 return;
> +
> +       /*
> +        * SXIO/SXFP/SXLF turns off power to the Thunderbolt controller.  We
> don't
> +        * know how to turn it back on again, but firmware does, so we can
> only use
> +        * SXIO/SXFP/SXLF if we're suspending via firmware.
> +        */
> +       if (!pm_suspend_via_firmware())
> +               return;
> +
>         bridge = ACPI_HANDLE(&dev->dev);
>         if (!bridge)
>                 return;



^ permalink raw reply	[flat|nested] 30+ messages in thread

* PING: Re: PING: Re: [PATCH v2] PCI: don't call firmware hooks on suspend unless it's fw-controlled
  2021-05-28  7:39                       ` PING: " Konstantin Kharlamov
@ 2021-06-03  8:36                         ` Konstantin Kharlamov
  2021-06-03 17:46                           ` Bjorn Helgaas
  0 siblings, 1 reply; 30+ messages in thread
From: Konstantin Kharlamov @ 2021-06-03  8:36 UTC (permalink / raw)
  To: linux-pci; +Cc: helgaas, lukas, rjw, andreas.noever

On Fri, 2021-05-28 at 10:39 +0300, Konstantin Kharlamov wrote:
> On Fri, 2021-05-21 at 02:55 +0300, Konstantin Kharlamov wrote:
> > On Macbook 2013 resuming from s2idle resulted in external monitor no
> > longer being detected, and dmesg having errors like:
> > 
> >     pcieport 0000:06:00.0: can't change power state from D3hot to D0 (config
> > space inaccessible)
> > 
> > and a stacktrace. The reason is that in s2idle (and in S1 as noted by
> > Rafael) we do not call firmware code to handle suspend, and as result
> > while waking up firmware also does not handle resume.
> > 
> > This means, for the Thunderbolt controller that gets disabled in the
> > quirk by calling the firmware methods, there's no one to wake it back up
> > on resume.
> > 
> > To quote Rafael Wysocki:
> > 
> > > "Passing control to the platform firmware" means letting
> > > some native firmware code (like SMM code) run which happens at the end
> > > of S2/S3/S4 suspend transitions and it does not happen during S1
> > > (standby) and s2idle suspend transitions.
> > > 
> > > That's why using SXIO/SXFP/SXLF is only valid during S2/S3/S4 suspend
> > > transitions and it is not valid during s2idle and S1 suspend
> > > transitions (and yes, S1 is also affected, so s2idle is not special in
> > > that respect at all).
> > 
> > Thus, return early from the quirk when suspend mode isn't one that calls
> > firmware.
> > 
> > Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=212767
> > Signed-off-by: Konstantin Kharlamov <Hi-Angel@yandex.ru>
> > Reviewed-by: Lukas Wunner <lukas@wunner.de>
> > ---
> >  drivers/pci/quirks.c | 10 ++++++++++
> >  1 file changed, 10 insertions(+)
> > 
> > diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> > index 653660e3ba9e..f86b6388a04a 100644
> > --- a/drivers/pci/quirks.c
> > +++ b/drivers/pci/quirks.c
> > @@ -27,6 +27,7 @@
> >  #include <linux/nvme.h>
> >  #include <linux/platform_data/x86/apple.h>
> >  #include <linux/pm_runtime.h>
> > +#include <linux/suspend.h>
> >  #include <linux/switchtec.h>
> >  #include <asm/dma.h>   /* isa_dma_bridge_buggy */
> >  #include "pci.h"
> > @@ -3646,6 +3647,15 @@ static void quirk_apple_poweroff_thunderbolt(struct
> > pci_dev *dev)
> >                 return;
> >         if (pci_pcie_type(dev) != PCI_EXP_TYPE_UPSTREAM)
> >                 return;
> > +
> > +       /*
> > +        * SXIO/SXFP/SXLF turns off power to the Thunderbolt controller.  We
> > don't
> > +        * know how to turn it back on again, but firmware does, so we can
> > only use
> > +        * SXIO/SXFP/SXLF if we're suspending via firmware.
> > +        */
> > +       if (!pm_suspend_via_firmware())
> > +               return;
> > +
> >         bridge = ACPI_HANDLE(&dev->dev);
> >         if (!bridge)
> >                 return;
> 
> 



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: PING: Re: PING: Re: [PATCH v2] PCI: don't call firmware hooks on suspend unless it's fw-controlled
  2021-06-03  8:36                         ` PING: " Konstantin Kharlamov
@ 2021-06-03 17:46                           ` Bjorn Helgaas
  2021-06-04  8:30                             ` Konstantin Kharlamov
  0 siblings, 1 reply; 30+ messages in thread
From: Bjorn Helgaas @ 2021-06-03 17:46 UTC (permalink / raw)
  To: Konstantin Kharlamov; +Cc: linux-pci, lukas, rjw, andreas.noever

On Thu, Jun 03, 2021 at 11:36:43AM +0300, Konstantin Kharlamov wrote:
> On Fri, 2021-05-28 at 10:39 +0300, Konstantin Kharlamov wrote:
> > On Fri, 2021-05-21 at 02:55 +0300, Konstantin Kharlamov wrote:
> > > On Macbook 2013 resuming from s2idle resulted in external monitor no
> > > longer being detected, and dmesg having errors like:
> > > 
> > >     pcieport 0000:06:00.0: can't change power state from D3hot to D0 (config
> > > space inaccessible)
> > > 
> > > and a stacktrace. The reason is that in s2idle (and in S1 as noted by
> > > Rafael) we do not call firmware code to handle suspend, and as result
> > > while waking up firmware also does not handle resume.
> > > 
> > > This means, for the Thunderbolt controller that gets disabled in the
> > > quirk by calling the firmware methods, there's no one to wake it back up
> > > on resume.
> > > 
> > > To quote Rafael Wysocki:
> > > 
> > > > "Passing control to the platform firmware" means letting
> > > > some native firmware code (like SMM code) run which happens at the end
> > > > of S2/S3/S4 suspend transitions and it does not happen during S1
> > > > (standby) and s2idle suspend transitions.
> > > > 
> > > > That's why using SXIO/SXFP/SXLF is only valid during S2/S3/S4 suspend
> > > > transitions and it is not valid during s2idle and S1 suspend
> > > > transitions (and yes, S1 is also affected, so s2idle is not special in
> > > > that respect at all).
> > > 
> > > Thus, return early from the quirk when suspend mode isn't one that calls
> > > firmware.
> > > 
> > > Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=212767
> > > Signed-off-by: Konstantin Kharlamov <Hi-Angel@yandex.ru>
> > > Reviewed-by: Lukas Wunner <lukas@wunner.de>
> > > ---
> > >  drivers/pci/quirks.c | 10 ++++++++++
> > >  1 file changed, 10 insertions(+)
> > > 
> > > diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> > > index 653660e3ba9e..f86b6388a04a 100644
> > > --- a/drivers/pci/quirks.c
> > > +++ b/drivers/pci/quirks.c
> > > @@ -27,6 +27,7 @@
> > >  #include <linux/nvme.h>
> > >  #include <linux/platform_data/x86/apple.h>
> > >  #include <linux/pm_runtime.h>
> > > +#include <linux/suspend.h>
> > >  #include <linux/switchtec.h>
> > >  #include <asm/dma.h>   /* isa_dma_bridge_buggy */
> > >  #include "pci.h"
> > > @@ -3646,6 +3647,15 @@ static void quirk_apple_poweroff_thunderbolt(struct
> > > pci_dev *dev)
> > >                 return;
> > >         if (pci_pcie_type(dev) != PCI_EXP_TYPE_UPSTREAM)
> > >                 return;
> > > +
> > > +       /*
> > > +        * SXIO/SXFP/SXLF turns off power to the Thunderbolt controller.  We
> > > don't
> > > +        * know how to turn it back on again, but firmware does, so we can
> > > only use
> > > +        * SXIO/SXFP/SXLF if we're suspending via firmware.
> > > +        */
> > > +       if (!pm_suspend_via_firmware())
> > > +               return;
> > > +
> > >         bridge = ACPI_HANDLE(&dev->dev);
> > >         if (!bridge)
> > >                 return;
> > 
> > 
> 
> 

Don't worry, I haven't forgotten, but I've been busy with some other
patches.  If you ever want to check on the status, you can search for
the patch on https://patchwork.kernel.org/project/linux-pci/list.  The
patchwork search is not super convenient (it's buried in the "Show
patches with" link), but here's your patch:

https://patchwork.kernel.org/project/linux-pci/patch/20210520235501.917397-1-Hi-Angel@yandex.ru/

It's currently "New" which means it's still in my queue.  I change the
state to "Accepted," "Not Applicable," "Superseded," etc., when I
apply or drop patches.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: PING: Re: PING: Re: [PATCH v2] PCI: don't call firmware hooks on suspend unless it's fw-controlled
  2021-06-03 17:46                           ` Bjorn Helgaas
@ 2021-06-04  8:30                             ` Konstantin Kharlamov
  0 siblings, 0 replies; 30+ messages in thread
From: Konstantin Kharlamov @ 2021-06-04  8:30 UTC (permalink / raw)
  To: Bjorn Helgaas; +Cc: linux-pci, lukas, rjw, andreas.noever

Cool, I see, thank you very much!

On Thu, 2021-06-03 at 12:46 -0500, Bjorn Helgaas wrote:
> On Thu, Jun 03, 2021 at 11:36:43AM +0300, Konstantin Kharlamov wrote:
> > On Fri, 2021-05-28 at 10:39 +0300, Konstantin Kharlamov wrote:
> > > On Fri, 2021-05-21 at 02:55 +0300, Konstantin Kharlamov wrote:
> > > > On Macbook 2013 resuming from s2idle resulted in external monitor no
> > > > longer being detected, and dmesg having errors like:
> > > > 
> > > >     pcieport 0000:06:00.0: can't change power state from D3hot to D0
> > > > (config
> > > > space inaccessible)
> > > > 
> > > > and a stacktrace. The reason is that in s2idle (and in S1 as noted by
> > > > Rafael) we do not call firmware code to handle suspend, and as result
> > > > while waking up firmware also does not handle resume.
> > > > 
> > > > This means, for the Thunderbolt controller that gets disabled in the
> > > > quirk by calling the firmware methods, there's no one to wake it back up
> > > > on resume.
> > > > 
> > > > To quote Rafael Wysocki:
> > > > 
> > > > > "Passing control to the platform firmware" means letting
> > > > > some native firmware code (like SMM code) run which happens at the end
> > > > > of S2/S3/S4 suspend transitions and it does not happen during S1
> > > > > (standby) and s2idle suspend transitions.
> > > > > 
> > > > > That's why using SXIO/SXFP/SXLF is only valid during S2/S3/S4 suspend
> > > > > transitions and it is not valid during s2idle and S1 suspend
> > > > > transitions (and yes, S1 is also affected, so s2idle is not special in
> > > > > that respect at all).
> > > > 
> > > > Thus, return early from the quirk when suspend mode isn't one that calls
> > > > firmware.
> > > > 
> > > > Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=212767
> > > > Signed-off-by: Konstantin Kharlamov <Hi-Angel@yandex.ru>
> > > > Reviewed-by: Lukas Wunner <lukas@wunner.de>
> > > > ---
> > > >  drivers/pci/quirks.c | 10 ++++++++++
> > > >  1 file changed, 10 insertions(+)
> > > > 
> > > > diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> > > > index 653660e3ba9e..f86b6388a04a 100644
> > > > --- a/drivers/pci/quirks.c
> > > > +++ b/drivers/pci/quirks.c
> > > > @@ -27,6 +27,7 @@
> > > >  #include <linux/nvme.h>
> > > >  #include <linux/platform_data/x86/apple.h>
> > > >  #include <linux/pm_runtime.h>
> > > > +#include <linux/suspend.h>
> > > >  #include <linux/switchtec.h>
> > > >  #include <asm/dma.h>   /* isa_dma_bridge_buggy */
> > > >  #include "pci.h"
> > > > @@ -3646,6 +3647,15 @@ static void
> > > > quirk_apple_poweroff_thunderbolt(struct
> > > > pci_dev *dev)
> > > >                 return;
> > > >         if (pci_pcie_type(dev) != PCI_EXP_TYPE_UPSTREAM)
> > > >                 return;
> > > > +
> > > > +       /*
> > > > +        * SXIO/SXFP/SXLF turns off power to the Thunderbolt
> > > > controller.  We
> > > > don't
> > > > +        * know how to turn it back on again, but firmware does, so we
> > > > can
> > > > only use
> > > > +        * SXIO/SXFP/SXLF if we're suspending via firmware.
> > > > +        */
> > > > +       if (!pm_suspend_via_firmware())
> > > > +               return;
> > > > +
> > > >         bridge = ACPI_HANDLE(&dev->dev);
> > > >         if (!bridge)
> > > >                 return;
> > > 
> > > 
> > 
> > 
> 
> Don't worry, I haven't forgotten, but I've been busy with some other
> patches.  If you ever want to check on the status, you can search for
> the patch on https://patchwork.kernel.org/project/linux-pci/list.  The
> patchwork search is not super convenient (it's buried in the "Show
> patches with" link), but here's your patch:
> 
> https://patchwork.kernel.org/project/linux-pci/patch/20210520235501.917397-1-Hi-Angel@yandex.ru/
> 
> It's currently "New" which means it's still in my queue.  I change the
> state to "Accepted," "Not Applicable," "Superseded," etc., when I
> apply or drop patches.



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] PCI: don't power-off apple thunderbolt controller on s2idle
  2021-05-06 17:38 [PATCH] PCI: don't power-off apple thunderbolt controller on s2idle Konstantin Kharlamov
  2021-05-06 21:48 ` Bjorn Helgaas
@ 2021-06-07 23:17 ` Bjorn Helgaas
  1 sibling, 0 replies; 30+ messages in thread
From: Bjorn Helgaas @ 2021-06-07 23:17 UTC (permalink / raw)
  To: Konstantin Kharlamov
  Cc: linux-pci, Lukas Wunner, Rafael J. Wysocki, Andreas Noever, linux-pm

[+cc Lukas, Rafael, Andreas, linux-pm]

On Thu, May 06, 2021 at 08:38:20PM +0300, Konstantin Kharlamov wrote:
> On Macbook 2013 resuming from s2idle results in external monitor no
> longer being detected, and dmesg having errors like:
> 
>     pcieport 0000:06:00.0: can't change power state from D3hot to D0 (config space inaccessible)
> 
> and a stacktrace. The reason turned out that the hw that the quirk
> powers off does not get powered on back on resume.
> 
> Thus, add a check for s2idle to the quirk, and do nothing if the suspend
> mode is s2idle.
> 
> Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=212767
> Signed-off-by: Konstantin Kharlamov <Hi-Angel@yandex.ru>

Applied as below to pci/pm for v5.14, thanks!

> ---
>  drivers/pci/quirks.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
> 
> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> index 653660e3ba9e..86fedcec37e2 100644
> --- a/drivers/pci/quirks.c
> +++ b/drivers/pci/quirks.c
> @@ -27,6 +27,7 @@
>  #include <linux/nvme.h>
>  #include <linux/platform_data/x86/apple.h>
>  #include <linux/pm_runtime.h>
> +#include <linux/suspend.h>
>  #include <linux/switchtec.h>
>  #include <asm/dma.h>	/* isa_dma_bridge_buggy */
>  #include "pci.h"
> @@ -3646,6 +3647,13 @@ static void quirk_apple_poweroff_thunderbolt(struct pci_dev *dev)
>  		return;
>  	if (pci_pcie_type(dev) != PCI_EXP_TYPE_UPSTREAM)
>  		return;
> +
> +	/*
> +	 * If suspend mode is s2idle, power won't get restored on resume.
> +	 */
> +	if (!pm_suspend_via_firmware())
> +		return;
> +
>  	bridge = ACPI_HANDLE(&dev->dev);
>  	if (!bridge)
>  		return;
> 

commit 4694ae373dc2 ("PCI: Leave Apple Thunderbolt controllers on for s2idle or standby")
Author: Konstantin Kharlamov <Hi-Angel@yandex.ru>
Date:   Fri May 21 02:55:01 2021 +0300

    PCI: Leave Apple Thunderbolt controllers on for s2idle or standby
    
    On Macbook 2013, resuming from suspend-to-idle or standby resulted in the
    external monitor no longer being detected, a stacktrace, and errors like
    this in dmesg:
    
      pcieport 0000:06:00.0: can't change power state from D3hot to D0 (config space inaccessible)
    
    The reason is that we know how to turn power to the Thunderbolt controller
    *off* via the SXIO/SXFP/SXLF methods, but we don't know how to turn power
    back on.  We have to rely on firmware to turn the power back on.
    
    When going to the "suspend-to-idle" or "standby" system sleep states,
    firmware is not involved either on the suspend side or the resume side, so
    we can't use SXIO/SXFP/SXLF to turn the power off.
    
    Skip SXIO/SXFP/SXLF when firmware isn't involved in suspend, e.g., when
    we're going to the "suspend-to-idle" or "standby" system sleep states.
    
    Fixes: 1df5172c5c25 ("PCI: Suspend/resume quirks for Apple thunderbolt")
    Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=212767
    Link: https://lore.kernel.org/r/20210520235501.917397-1-Hi-Angel@yandex.ru
    Signed-off-by: Konstantin Kharlamov <Hi-Angel@yandex.ru>
    Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
    Reviewed-by: Lukas Wunner <lukas@wunner.de>
    Cc: stable@vger.kernel.org

diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index dcb229de1acb..0dde9c5259f2 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -27,6 +27,7 @@
 #include <linux/nvme.h>
 #include <linux/platform_data/x86/apple.h>
 #include <linux/pm_runtime.h>
+#include <linux/suspend.h>
 #include <linux/switchtec.h>
 #include <asm/dma.h>	/* isa_dma_bridge_buggy */
 #include "pci.h"
@@ -3634,6 +3635,16 @@ static void quirk_apple_poweroff_thunderbolt(struct pci_dev *dev)
 		return;
 	if (pci_pcie_type(dev) != PCI_EXP_TYPE_UPSTREAM)
 		return;
+
+	/*
+	 * SXIO/SXFP/SXLF turns off power to the Thunderbolt controller.
+	 * We don't know how to turn it back on again, but firmware does,
+	 * so we can only use SXIO/SXFP/SXLF if we're suspending via
+	 * firmware.
+	 */
+	if (!pm_suspend_via_firmware())
+		return;
+
 	bridge = ACPI_HANDLE(&dev->dev);
 	if (!bridge)
 		return;

^ permalink raw reply related	[flat|nested] 30+ messages in thread

end of thread, other threads:[~2021-06-07 23:17 UTC | newest]

Thread overview: 30+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-05-06 17:38 [PATCH] PCI: don't power-off apple thunderbolt controller on s2idle Konstantin Kharlamov
2021-05-06 21:48 ` Bjorn Helgaas
2021-05-06 22:07   ` Lukas Wunner
2021-05-07  9:51     ` Rafael J. Wysocki
2021-05-08  8:48       ` Lukas Wunner
2021-05-07 13:30     ` Bjorn Helgaas
2021-05-07 14:08       ` Konstantin Kharlamov
2021-05-12 20:36         ` Konstantin Kharlamov
2021-05-17 19:51           ` PING " Konstantin Kharlamov
2021-05-19 17:28         ` Bjorn Helgaas
2021-05-19 19:12           ` Rafael J. Wysocki
2021-05-19 19:48             ` Bjorn Helgaas
2021-05-20 11:27               ` Rafael J. Wysocki
2021-05-20 11:54                 ` Rafael J. Wysocki
2021-05-20 19:49                   ` Bjorn Helgaas
2021-05-20 23:28                     ` Konstantin Kharlamov
2021-05-24  6:59                       ` Konstantin Kharlamov
2021-05-20 23:55                     ` [PATCH v2] PCI: don't call firmware hooks on suspend unless it's fw-controlled Konstantin Kharlamov
2021-05-28  7:39                       ` PING: " Konstantin Kharlamov
2021-06-03  8:36                         ` PING: " Konstantin Kharlamov
2021-06-03 17:46                           ` Bjorn Helgaas
2021-06-04  8:30                             ` Konstantin Kharlamov
2021-05-21  9:47                     ` [PATCH] PCI: don't power-off apple thunderbolt controller on s2idle Rafael J. Wysocki
2021-05-07 15:02       ` Rafael J. Wysocki
2021-05-08  8:20       ` Lukas Wunner
2021-05-07  9:32   ` Konstantin Kharlamov
2021-05-07 13:07     ` Bjorn Helgaas
2021-05-07 13:48       ` Konstantin Kharlamov
2021-05-20 11:58   ` Rafael J. Wysocki
2021-06-07 23:17 ` Bjorn Helgaas

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).