Linux-PM Archive on lore.kernel.org
 help / color / Atom feed
* Help me fix a regression caused by 56b9918490 (PM: sleep: Simplify suspend-to-idle control flow)
@ 2019-11-22  0:11 Kenneth R. Crudup
  2019-11-22 10:12 ` Rafael J. Wysocki
  0 siblings, 1 reply; 37+ messages in thread
From: Kenneth R. Crudup @ 2019-11-22  0:11 UTC (permalink / raw)
  To: rafael.j.wysocki; +Cc: linux-pm


I've been running Linus' tip for a while, mostly to get the PM improvements
on s2idle (I hate this mode so much- I'm tired of taking a warm laptop out
of my bag) and I'd reported after a previous merge to Linus' tip from pm-next:

On Wed, 18 Sep 2019, Kenneth R. Crudup wrote:

> - Randomly, if left suspended, nothing other than a hard power off will get
>   it back  ...  It appears "ec_no_wakeup=1" doesn't have this issue

So I noticed this bug seems to only happen if the laptop is plugged in and
charging- but only if the battery is < 100%; now that I had a real failure
mode I started bisecting, and I traced it to 56b991849, "PM: sleep: Simplify
suspend-to-idle control flow".

What happens is I'll initiate a suspend, then when I try to resume the power
LED doesn't come on, and the machine is completely unresponsive. I have to
force a reset by holding the power key.

I'd like to help you fix this issue; I figure if it's affecting my device
it's affecting others, too. I'm quite proficient with the kernel overall,
but know very little about ACPI or the EC. I'm assuming- totally wild guess-
that when we get some of those EC GPEs(?) while suspended(?) we're panic()ing
or hanging somewhere so the BIOS is getting hung up and the whole system is
stalled out.

Where should I begin to help debug this? I have EFI store set up so BUG_ON()s
et al.  will dump there for retrival later if inserting those at critical
places could help.

Thanks,

	-Kenny

-- 
Kenneth R. Crudup  Sr. SW Engineer, Scott County Consulting, Silicon Valley

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: Help me fix a regression caused by 56b9918490 (PM: sleep: Simplify suspend-to-idle control flow)
  2019-11-22  0:11 Help me fix a regression caused by 56b9918490 (PM: sleep: Simplify suspend-to-idle control flow) Kenneth R. Crudup
@ 2019-11-22 10:12 ` Rafael J. Wysocki
  2019-11-22 12:45   ` Rafael J. Wysocki
  2019-11-22 17:29   ` Kenneth R. Crudup
  0 siblings, 2 replies; 37+ messages in thread
From: Rafael J. Wysocki @ 2019-11-22 10:12 UTC (permalink / raw)
  To: Kenneth R. Crudup; +Cc: Rafael Wysocki, Linux PM

On Fri, Nov 22, 2019 at 1:28 AM Kenneth R. Crudup <kenny@panix.com> wrote:
>
>
> I've been running Linus' tip for a while, mostly to get the PM improvements
> on s2idle (I hate this mode so much- I'm tired of taking a warm laptop out
> of my bag) and I'd reported after a previous merge to Linus' tip from pm-next:
>
> On Wed, 18 Sep 2019, Kenneth R. Crudup wrote:
>
> > - Randomly, if left suspended, nothing other than a hard power off will get
> >   it back  ...  It appears "ec_no_wakeup=1" doesn't have this issue
>
> So I noticed this bug seems to only happen if the laptop is plugged in and
> charging- but only if the battery is < 100%; now that I had a real failure
> mode I started bisecting, and I traced it to 56b991849, "PM: sleep: Simplify
> suspend-to-idle control flow".
>
> What happens is I'll initiate a suspend, then when I try to resume the power
> LED doesn't come on, and the machine is completely unresponsive. I have to
> force a reset by holding the power key.
>
> I'd like to help you fix this issue; I figure if it's affecting my device
> it's affecting others, too. I'm quite proficient with the kernel overall,
> but know very little about ACPI or the EC. I'm assuming- totally wild guess-
> that when we get some of those EC GPEs(?) while suspended(?) we're panic()ing
> or hanging somewhere so the BIOS is getting hung up and the whole system is
> stalled out.
>
> Where should I begin to help debug this? I have EFI store set up so BUG_ON()s
> et al.  will dump there for retrival later if inserting those at critical
> places could help.

First off, please try to change the sleep_no_lps0 ACPI module
parameter (/sys/module/acpi/parameters/sleep_no_lps0) to 1 and see if
that makes any difference.  If it helps, I don't think we can do much
except for blacklisting your machine.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: Help me fix a regression caused by 56b9918490 (PM: sleep: Simplify suspend-to-idle control flow)
  2019-11-22 10:12 ` Rafael J. Wysocki
@ 2019-11-22 12:45   ` Rafael J. Wysocki
  2019-11-22 17:35     ` Kenneth R. Crudup
  2019-11-23 10:24     ` Kenneth R. Crudup
  2019-11-22 17:29   ` Kenneth R. Crudup
  1 sibling, 2 replies; 37+ messages in thread
From: Rafael J. Wysocki @ 2019-11-22 12:45 UTC (permalink / raw)
  To: Kenneth R. Crudup; +Cc: Rafael Wysocki, Linux PM

On Fri, Nov 22, 2019 at 11:12 AM Rafael J. Wysocki <rafael@kernel.org> wrote:
>
> On Fri, Nov 22, 2019 at 1:28 AM Kenneth R. Crudup <kenny@panix.com> wrote:
> >
> >
> > I've been running Linus' tip for a while, mostly to get the PM improvements
> > on s2idle (I hate this mode so much- I'm tired of taking a warm laptop out
> > of my bag) and I'd reported after a previous merge to Linus' tip from pm-next:
> >
> > On Wed, 18 Sep 2019, Kenneth R. Crudup wrote:
> >
> > > - Randomly, if left suspended, nothing other than a hard power off will get
> > >   it back  ...  It appears "ec_no_wakeup=1" doesn't have this issue

BTW, is the ec_no_wakeup=1 workaround still effective?

If so, does the lid and/or power button wake up the system from
suspend with ec_no_wakeup=1?

> > So I noticed this bug seems to only happen if the laptop is plugged in and
> > charging- but only if the battery is < 100%; now that I had a real failure
> > mode I started bisecting, and I traced it to 56b991849, "PM: sleep: Simplify
> > suspend-to-idle control flow".
> >
> > What happens is I'll initiate a suspend, then when I try to resume the power
> > LED doesn't come on, and the machine is completely unresponsive. I have to
> > force a reset by holding the power key.
> >
> > I'd like to help you fix this issue; I figure if it's affecting my device
> > it's affecting others, too. I'm quite proficient with the kernel overall,
> > but know very little about ACPI or the EC. I'm assuming- totally wild guess-
> > that when we get some of those EC GPEs(?) while suspended(?) we're panic()ing
> > or hanging somewhere so the BIOS is getting hung up and the whole system is
> > stalled out.
> >
> > Where should I begin to help debug this? I have EFI store set up so BUG_ON()s
> > et al.  will dump there for retrival later if inserting those at critical
> > places could help.
>
> First off, please try to change the sleep_no_lps0 ACPI module
> parameter (/sys/module/acpi/parameters/sleep_no_lps0) to 1 and see if
> that makes any difference.  If it helps, I don't think we can do much
> except for blacklisting your machine.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: Help me fix a regression caused by 56b9918490 (PM: sleep: Simplify suspend-to-idle control flow)
  2019-11-22 10:12 ` Rafael J. Wysocki
  2019-11-22 12:45   ` Rafael J. Wysocki
@ 2019-11-22 17:29   ` Kenneth R. Crudup
  1 sibling, 0 replies; 37+ messages in thread
From: Kenneth R. Crudup @ 2019-11-22 17:29 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: Rafael Wysocki, Linux PM


On Fri, 22 Nov 2019, Rafael J. Wysocki wrote:

> First off, please try to change the sleep_no_lps0 ACPI module
> parameter (/sys/module/acpi/parameters/sleep_no_lps0) to 1 and see if
> that makes any difference.

I did try that; didn't seem to help. I will run a few tests to make sure,
though.

> If it helps, I don't think we can do much except for blacklisting your machine.

Urgh. Blacklisting from what/which facility?

Every now and then, when this thing goes to sleep it mostly goes down to 50mA
(measured by the USB-C charging port) idle current; I wish it were less, but
this is "modern" suspend. But every Nth suspend it will draw 180 to 240 mA for
reasons I can't deduce, and I was wondering if perhaps we're getting wedged in
an intermediate state on the way into s2idle. That's why I'm trying to make
sure I have access to every possible suspend enhancement.

At one point it was a regression in i915 that caused a high suspend current
at all times and I helped Chris Wilson and those guys track down, but that's
been OK for a while.

One thing that I do get is the dump on resume when it doesn't go into S0ix
(intel_pmc_core.warn_on_s0ix_failures=1) and there's a number of what I
assume are clocks and power rails that are still "on"; I imagine some of
that is done by the BIOS (e.g., "MPHY_CORE_GATED", "CNV_VNNAON_REQ_ACT").
Do you know what these are mnemonics for?

Thanks,

	-Kenny

-- 
Kenneth R. Crudup  Sr. SW Engineer, Scott County Consulting, Silicon Valley

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: Help me fix a regression caused by 56b9918490 (PM: sleep: Simplify suspend-to-idle control flow)
  2019-11-22 12:45   ` Rafael J. Wysocki
@ 2019-11-22 17:35     ` Kenneth R. Crudup
  2019-11-23 10:24     ` Kenneth R. Crudup
  1 sibling, 0 replies; 37+ messages in thread
From: Kenneth R. Crudup @ 2019-11-22 17:35 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: Rafael Wysocki, Linux PM


On Fri, 22 Nov 2019, Rafael J. Wysocki wrote:

> > > > - Randomly, if left suspended, nothing other than a hard power off will get
> > > >   it back  ...  It appears "ec_no_wakeup=1" doesn't have this issue

> BTW, is the ec_no_wakeup=1 workaround still effective?

I'm pretty sure that's always effective, but:

> If so, does the lid and/or power button wake up the system from
> suspend with ec_no_wakeup=1?

No, hitting a key on the keyboard is the only thing that wakes it up. This
actually isn't a bad thing for me at all- but as per my previous E-mail I'm
trying to determine the cause for the occasional much higher suspended idle
draw and I was thinking using these workarounds means I'm not taking advantage
of all the suspend optimisations you've added.

That being said, I do have "XHC" disabled as a wakeup source (written to
/proc/acpi/wakeup) else BT mouse movement would wake this thing up.

	-Kenny

-- 
Kenneth R. Crudup  Sr. SW Engineer, Scott County Consulting, Silicon Valley

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: Help me fix a regression caused by 56b9918490 (PM: sleep: Simplify suspend-to-idle control flow)
  2019-11-22 12:45   ` Rafael J. Wysocki
  2019-11-22 17:35     ` Kenneth R. Crudup
@ 2019-11-23 10:24     ` Kenneth R. Crudup
  2019-11-24 16:02       ` Rafael J. Wysocki
  1 sibling, 1 reply; 37+ messages in thread
From: Kenneth R. Crudup @ 2019-11-23 10:24 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: Rafael Wysocki, Linux PM


On Fri, 22 Nov 2019, Rafael J. Wysocki wrote:

> BTW, is the ec_no_wakeup=1 workaround still effective?

OK, I was wrong the first time, at least from the new Linus' tip (as of
merge commit 2027cabe6af anyway). So I did some testing, and I now have a matrix:

If "sleep_no_lps0" == 1, the machine never goes fully to sleep; the power
light stays on and the backlight goes off, but if I have external monitors
connected they're still showing dmesg activity. This is independent of the
state of "ec_no_wakeup".

If "ec_no_wakeup" == 1, the system *at times* will go to sleep and never return
(i.e. no power light comes on, it's totally unresponsive until I do a hard
reset with a power-button long-press) whether I'm plugged in or not.
This is new behavior.

If "ec_no_wakeup" == 0, the system goes fully to sleep and either of the
power button, lid opening or hitting a key resumes the laptop, but if I'm
plugged in and actually charging when I suspend (and I suspect if I plug
it in during suspend) it never returns, as in the case above.

Help! What can I do to return to the behavior of right before the s0 rework?

Where in the code could I start looking to try to find out where the machine
goes dead?

	-Kenny

-- 
Kenneth R. Crudup  Sr. SW Engineer, Scott County Consulting, Silicon Valley

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: Help me fix a regression caused by 56b9918490 (PM: sleep: Simplify suspend-to-idle control flow)
  2019-11-23 10:24     ` Kenneth R. Crudup
@ 2019-11-24 16:02       ` Rafael J. Wysocki
  2019-11-25  3:40         ` Kenneth R. Crudup
  2019-11-25  5:50         ` Kenneth R. Crudup
  0 siblings, 2 replies; 37+ messages in thread
From: Rafael J. Wysocki @ 2019-11-24 16:02 UTC (permalink / raw)
  To: Kenneth R. Crudup; +Cc: Rafael J. Wysocki, Rafael Wysocki, Linux PM

On Sat, Nov 23, 2019 at 11:24 AM Kenneth R. Crudup <kenny@panix.com> wrote:
>
>
> On Fri, 22 Nov 2019, Rafael J. Wysocki wrote:
>
> > BTW, is the ec_no_wakeup=1 workaround still effective?
>
> OK, I was wrong the first time, at least from the new Linus' tip (as of
> merge commit 2027cabe6af anyway). So I did some testing, and I now have a matrix:
>
> If "sleep_no_lps0" == 1, the machine never goes fully to sleep; the power
> light stays on and the backlight goes off, but if I have external monitors
> connected they're still showing dmesg activity. This is independent of the
> state of "ec_no_wakeup".

Hmm.  The external monitors part is something you have never mentioned.

> If "ec_no_wakeup" == 1, the system *at times* will go to sleep and never return

This is unclear.

What exactly do you mean by "go to sleep"?

Which part of the behavior does the "at times" phrase apply to?  The
"going to sleep" or coming back?  Or both?

> (i.e. no power light comes on, it's totally unresponsive until I do a hard
> reset with a power-button long-press) whether I'm plugged in or not.
> This is new behavior.

So how did it behave in 5.3.y?

> If "ec_no_wakeup" == 0, the system goes fully to sleep and either of the
> power button, lid opening or hitting a key resumes the laptop, but if I'm
> plugged in and actually charging when I suspend (and I suspect if I plug
> it in during suspend) it never returns, as in the case above.

OK, so ec_no_wakeup doesn't really change the behavior substantially,
it only makes certain things more or less likely to happen.

Does it still hang if you use the keyboard to wake up the system?

> Help! What can I do to return to the behavior of right before the s0 rework?

I guess you mean the 5.3.y behavior.  And what was it?

> Where in the code could I start looking to try to find out where the machine
> goes dead?

Well, because you identified 56b991849 as the first bad commit, the
following three lines of code in drivers/acpi/sleep.c are likely to be
the source of the problem:

        acpi_os_wait_events_complete(); /* synchronize EC GPE processing */
        acpi_ec_flush_work();
        acpi_os_wait_events_complete(); /* synchronize Notify handling */

Can you please try to comment them out and retest?

Note that you most likely won't be able to wake up the system via the
lid/power button without them, so use the keyboard to way it up during
that test.

Thanks!

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: Help me fix a regression caused by 56b9918490 (PM: sleep: Simplify suspend-to-idle control flow)
  2019-11-24 16:02       ` Rafael J. Wysocki
@ 2019-11-25  3:40         ` Kenneth R. Crudup
  2019-11-25 13:27           ` Rafael J. Wysocki
  2019-11-25  5:50         ` Kenneth R. Crudup
  1 sibling, 1 reply; 37+ messages in thread
From: Kenneth R. Crudup @ 2019-11-25  3:40 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: Rafael Wysocki, Linux PM


Thanks for getting back to me.

On Sun, 24 Nov 2019, Rafael J. Wysocki wrote:

> > If "sleep_no_lps0" == 1, the machine never goes fully to sleep; the power
> > light stays on and the backlight goes off, but if I have external monitors
> > connected they're still showing dmesg activity. This is independent of the
> > state of "ec_no_wakeup".

> Hmm.  The external monitors part is something you have never mentioned.

I didn't realize that myself until I'd tried "sleep_no_lps0" testing for this
thread and happened to have my Thunderbolt dock connected.

> > If "ec_no_wakeup" == 1, the system *at times* will go to sleep and never return

> This is unclear.  What exactly do you mean by "go to sleep"?

It appears to do a suspend cycle; the screen goes off, the power light goes out,
and the power consumption (as measured at the charge port) (usually) goes to the
smallest draw this laptop is capable of in s2idle.

> Which part of the behavior does the "at times" phrase apply to?  The
> "going to sleep" or coming back?  Or both?

The coming back. Many times I'll hit a key on the keyboard (when "ec_no_wakeup"
is set) or open the lid or hit the the power button (if it's not set) and nothing
happens. IIRC the current draw doesn't increase either, but don't quote me on
that (it's easy enough to reproduce, so I'll try it out and report back).

> > (i.e. no power light comes on, it's totally unresponsive until I do a hard
> > reset with a power-button long-press) whether I'm plugged in or not.
> > This is new behavior.

> So how did it behave in 5.3.y?
...
> > Help! What can I do to return to the behavior of right before the s0 rework?
> I guess you mean the 5.3.y behavior.  And what was it?
...

Seemed to always work; I don't recall any issues with s2idle in earlier
kernels. Sometimes my idle draw would be much higher than it should be, but
I have zero clue as to why that is (which is why I'm chasing down bleeding-
edge PM commits).

> > If "ec_no_wakeup" == 0, the system goes fully to sleep and either of the
> > power button, lid opening or hitting a key resumes the laptop, but if I'm
> > plugged in and actually charging when I suspend (and I suspect if I plug
> > it in during suspend) it never returns, as in the case above.

> OK, so ec_no_wakeup doesn't really change the behavior substantially,
> it only makes certain things more or less likely to happen.
> Does it still hang if you use the keyboard to wake up the system?

When "ec_no_wakeup" is set, ONLY the keyboard wakes up the system, and the dead
system is unrelated to the method I'm using to wake things up.

> > Where in the code could I start looking to try to find out where the machine
> > goes dead?
>
> Well, because you identified 56b991849 as the first bad commit, the
> following three lines of code in drivers/acpi/sleep.c are likely to be
> the source of the problem:
>
>         acpi_os_wait_events_complete(); /* synchronize EC GPE processing */
>         acpi_ec_flush_work();
>         acpi_os_wait_events_complete(); /* synchronize Notify handling */
>
> Can you please try to comment them out and retest?

I'll do that and get back to you.

> Note that you most likely won't be able to wake up the system via the
> lid/power button without them

Yeah, I'm used to that.

	-Kenny

-- 
Kenneth R. Crudup  Sr. SW Engineer, Scott County Consulting, Silicon Valley

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: Help me fix a regression caused by 56b9918490 (PM: sleep: Simplify suspend-to-idle control flow)
  2019-11-24 16:02       ` Rafael J. Wysocki
  2019-11-25  3:40         ` Kenneth R. Crudup
@ 2019-11-25  5:50         ` Kenneth R. Crudup
  2019-11-25  7:17           ` Kenneth R. Crudup
  1 sibling, 1 reply; 37+ messages in thread
From: Kenneth R. Crudup @ 2019-11-25  5:50 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: Rafael Wysocki, Linux PM


On Sun, 24 Nov 2019, Rafael J. Wysocki wrote:

> Well, because you identified 56b991849 as the first bad commit, the
> following three lines of code in drivers/acpi/sleep.c are likely to be
> the source of the problem:
>
>         acpi_os_wait_events_complete(); /* synchronize EC GPE processing */
>         acpi_ec_flush_work();
>         acpi_os_wait_events_complete(); /* synchronize Notify handling */

Huh- so this is encouraging- I've removed these lines, and while I haven't
been able to kill it yet (give me time :)) what DOES happen now is I get
wakeups from sleep on POWER events- charger connect (and disconnect)- and
now I wonder if this is somehow related to the issue of the suspend not
resuming if the machine is charging?

Are we masking out any events, perhaps?

	-Kenny

-- 
Kenneth R. Crudup  Sr. SW Engineer, Scott County Consulting, Silicon Valley

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: Help me fix a regression caused by 56b9918490 (PM: sleep: Simplify suspend-to-idle control flow)
  2019-11-25  5:50         ` Kenneth R. Crudup
@ 2019-11-25  7:17           ` Kenneth R. Crudup
  0 siblings, 0 replies; 37+ messages in thread
From: Kenneth R. Crudup @ 2019-11-25  7:17 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: Rafael Wysocki, Linux PM


On Sun, 24 Nov 2019, Kenneth R. Crudup wrote:

> Huh- so this is encouraging- I've removed these lines, and while I haven't
> been able to kill it yet (give me time :)) what DOES happen now is I get
> wakeups from sleep on POWER events- charger connect (and disconnect)- and
> now I wonder if this is somehow related to the issue of the suspend not
> resuming if the machine is charging?

Also, the system will not suspend if the charger is attached; it's kind of
like the "sleep_no_lps0=1" behavior (BTW, neither "sleep_no_lps0" nor
"ec_no_wakeup" have been modified from the defaults of "0" while those lines
have been deleted).

I've done 14 suspend cycles according to /sys/power/suspend_stats/success so far,
and while none of them have been for more than an hour or so, the laptop is cold
when I do resume it. I can't tell what the power draw is any more 'cause it
won't sleep while an external charger is connected- which I can live with if
it means the laptop doesn't get stuck in that intermediate suspend state
where it draws > 3x more current than the lowest idle state.

	-Kenny

-- 
Kenneth R. Crudup  Sr. SW Engineer, Scott County Consulting, Silicon Valley

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: Help me fix a regression caused by 56b9918490 (PM: sleep: Simplify suspend-to-idle control flow)
  2019-11-25  3:40         ` Kenneth R. Crudup
@ 2019-11-25 13:27           ` Rafael J. Wysocki
  2019-11-25 14:14             ` Rafael J. Wysocki
  2019-11-25 16:21             ` Kenneth R. Crudup
  0 siblings, 2 replies; 37+ messages in thread
From: Rafael J. Wysocki @ 2019-11-25 13:27 UTC (permalink / raw)
  To: Kenneth R. Crudup; +Cc: Rafael J. Wysocki, Rafael Wysocki, Linux PM

On Mon, Nov 25, 2019 at 4:40 AM Kenneth R. Crudup <kenny@panix.com> wrote:
>
>
> Thanks for getting back to me.
>
> On Sun, 24 Nov 2019, Rafael J. Wysocki wrote:
>
> > > If "sleep_no_lps0" == 1, the machine never goes fully to sleep; the power
> > > light stays on and the backlight goes off, but if I have external monitors
> > > connected they're still showing dmesg activity. This is independent of the
> > > state of "ec_no_wakeup".
>
> > Hmm.  The external monitors part is something you have never mentioned.
>
> I didn't realize that myself until I'd tried "sleep_no_lps0" testing for this
> thread and happened to have my Thunderbolt dock connected.

OK, but with sleep_no_lps0, does it also hang during resume or not?

> > > If "ec_no_wakeup" == 1, the system *at times* will go to sleep and never return
>
> > This is unclear.  What exactly do you mean by "go to sleep"?
>
> It appears to do a suspend cycle; the screen goes off, the power light goes out,
> and the power consumption (as measured at the charge port) (usually) goes to the
> smallest draw this laptop is capable of in s2idle.
>
> > Which part of the behavior does the "at times" phrase apply to?  The
> > "going to sleep" or coming back?  Or both?
>
> The coming back. Many times I'll hit a key on the keyboard (when "ec_no_wakeup"
> is set) or open the lid or hit the the power button (if it's not set) and nothing
> happens. IIRC the current draw doesn't increase either, but don't quote me on
> that (it's easy enough to reproduce, so I'll try it out and report back).
>
> > > (i.e. no power light comes on, it's totally unresponsive until I do a hard
> > > reset with a power-button long-press) whether I'm plugged in or not.
> > > This is new behavior.
>
> > So how did it behave in 5.3.y?
> ...
> > > Help! What can I do to return to the behavior of right before the s0 rework?
> > I guess you mean the 5.3.y behavior.  And what was it?
> ...
>
> Seemed to always work; I don't recall any issues with s2idle in earlier
> kernels. Sometimes my idle draw would be much higher than it should be, but
> I have zero clue as to why that is (which is why I'm chasing down bleeding-
> edge PM commits).
>
> > > If "ec_no_wakeup" == 0, the system goes fully to sleep and either of the
> > > power button, lid opening or hitting a key resumes the laptop, but if I'm
> > > plugged in and actually charging when I suspend (and I suspect if I plug
> > > it in during suspend) it never returns, as in the case above.
>
> > OK, so ec_no_wakeup doesn't really change the behavior substantially,
> > it only makes certain things more or less likely to happen.
> > Does it still hang if you use the keyboard to wake up the system?
>
> When "ec_no_wakeup" is set, ONLY the keyboard wakes up the system, and the dead
> system is unrelated to the method I'm using to wake things up.

In that case whatever happens in acpi_s2idle_wake() can be ruled out,
because on your system that function effectively is a NOP with
ec_no_wakeup.  I don't expect the test below to add anything new to
what we know already.

> > > Where in the code could I start looking to try to find out where the machine
> > > goes dead?
> >
> > Well, because you identified 56b991849 as the first bad commit, the
> > following three lines of code in drivers/acpi/sleep.c are likely to be
> > the source of the problem:
> >
> >         acpi_os_wait_events_complete(); /* synchronize EC GPE processing */
> >         acpi_ec_flush_work();
> >         acpi_os_wait_events_complete(); /* synchronize Notify handling */
> >
> > Can you please try to comment them out and retest?
>
> I'll do that and get back to you.
>
> > Note that you most likely won't be able to wake up the system via the
> > lid/power button without them
>
> Yeah, I'm used to that.

So, as stated above, this test is not likely to be conclusive.

Now, given that the changes in acpi_s2idle_wake() don't matter, you
seem to be missing the acpi_s2idle_sync() after
dpm_noirq_resume_devices(), because dropping it was the only other
substantial change made by commit 56b991849 AFAICS.

I'll send you a patch to try for that.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: Help me fix a regression caused by 56b9918490 (PM: sleep: Simplify suspend-to-idle control flow)
  2019-11-25 13:27           ` Rafael J. Wysocki
@ 2019-11-25 14:14             ` Rafael J. Wysocki
  2019-11-25 18:27               ` Kenneth R. Crudup
  2019-11-25 16:21             ` Kenneth R. Crudup
  1 sibling, 1 reply; 37+ messages in thread
From: Rafael J. Wysocki @ 2019-11-25 14:14 UTC (permalink / raw)
  To: Kenneth R. Crudup; +Cc: Rafael J. Wysocki, Rafael Wysocki, Linux PM

On Monday, November 25, 2019 2:27:00 PM CET Rafael J. Wysocki wrote:
> On Mon, Nov 25, 2019 at 4:40 AM Kenneth R. Crudup <kenny@panix.com> wrote:
> >
> >
> > Thanks for getting back to me.
> >
> > On Sun, 24 Nov 2019, Rafael J. Wysocki wrote:
> >
> > > > If "sleep_no_lps0" == 1, the machine never goes fully to sleep; the power
> > > > light stays on and the backlight goes off, but if I have external monitors
> > > > connected they're still showing dmesg activity. This is independent of the
> > > > state of "ec_no_wakeup".
> >
> > > Hmm.  The external monitors part is something you have never mentioned.
> >
> > I didn't realize that myself until I'd tried "sleep_no_lps0" testing for this
> > thread and happened to have my Thunderbolt dock connected.
> 
> OK, but with sleep_no_lps0, does it also hang during resume or not?
> 
> > > > If "ec_no_wakeup" == 1, the system *at times* will go to sleep and never return
> >
> > > This is unclear.  What exactly do you mean by "go to sleep"?
> >
> > It appears to do a suspend cycle; the screen goes off, the power light goes out,
> > and the power consumption (as measured at the charge port) (usually) goes to the
> > smallest draw this laptop is capable of in s2idle.
> >
> > > Which part of the behavior does the "at times" phrase apply to?  The
> > > "going to sleep" or coming back?  Or both?
> >
> > The coming back. Many times I'll hit a key on the keyboard (when "ec_no_wakeup"
> > is set) or open the lid or hit the the power button (if it's not set) and nothing
> > happens. IIRC the current draw doesn't increase either, but don't quote me on
> > that (it's easy enough to reproduce, so I'll try it out and report back).
> >
> > > > (i.e. no power light comes on, it's totally unresponsive until I do a hard
> > > > reset with a power-button long-press) whether I'm plugged in or not.
> > > > This is new behavior.
> >
> > > So how did it behave in 5.3.y?
> > ...
> > > > Help! What can I do to return to the behavior of right before the s0 rework?
> > > I guess you mean the 5.3.y behavior.  And what was it?
> > ...
> >
> > Seemed to always work; I don't recall any issues with s2idle in earlier
> > kernels. Sometimes my idle draw would be much higher than it should be, but
> > I have zero clue as to why that is (which is why I'm chasing down bleeding-
> > edge PM commits).
> >
> > > > If "ec_no_wakeup" == 0, the system goes fully to sleep and either of the
> > > > power button, lid opening or hitting a key resumes the laptop, but if I'm
> > > > plugged in and actually charging when I suspend (and I suspect if I plug
> > > > it in during suspend) it never returns, as in the case above.
> >
> > > OK, so ec_no_wakeup doesn't really change the behavior substantially,
> > > it only makes certain things more or less likely to happen.
> > > Does it still hang if you use the keyboard to wake up the system?
> >
> > When "ec_no_wakeup" is set, ONLY the keyboard wakes up the system, and the dead
> > system is unrelated to the method I'm using to wake things up.
> 
> In that case whatever happens in acpi_s2idle_wake() can be ruled out,
> because on your system that function effectively is a NOP with
> ec_no_wakeup.  I don't expect the test below to add anything new to
> what we know already.
> 
> > > > Where in the code could I start looking to try to find out where the machine
> > > > goes dead?
> > >
> > > Well, because you identified 56b991849 as the first bad commit, the
> > > following three lines of code in drivers/acpi/sleep.c are likely to be
> > > the source of the problem:
> > >
> > >         acpi_os_wait_events_complete(); /* synchronize EC GPE processing */
> > >         acpi_ec_flush_work();
> > >         acpi_os_wait_events_complete(); /* synchronize Notify handling */
> > >
> > > Can you please try to comment them out and retest?
> >
> > I'll do that and get back to you.
> >
> > > Note that you most likely won't be able to wake up the system via the
> > > lid/power button without them
> >
> > Yeah, I'm used to that.
> 
> So, as stated above, this test is not likely to be conclusive.
> 
> Now, given that the changes in acpi_s2idle_wake() don't matter, you
> seem to be missing the acpi_s2idle_sync() after
> dpm_noirq_resume_devices(), because dropping it was the only other
> substantial change made by commit 56b991849 AFAICS.
> 
> I'll send you a patch to try for that.

Appended (untested).

---
 drivers/acpi/sleep.c |   13 ++++++++++---
 1 file changed, 10 insertions(+), 3 deletions(-)

Index: linux-pm/drivers/acpi/sleep.c
===================================================================
--- linux-pm.orig/drivers/acpi/sleep.c
+++ linux-pm/drivers/acpi/sleep.c
@@ -977,6 +977,13 @@ static int acpi_s2idle_prepare_late(void
 	return 0;
 }
 
+static void acpi_s2idle_sync(void)
+{
+	acpi_os_wait_events_complete(); /* synchronize GPE processing */
+	acpi_ec_flush_work();
+	acpi_os_wait_events_complete(); /* synchronize Notify handling */
+}
+
 static void acpi_s2idle_wake(void)
 {
 	/*
@@ -1005,9 +1012,7 @@ static void acpi_s2idle_wake(void)
 		 * The EC driver uses the system workqueue and an additional
 		 * special one, so those need to be flushed too.
 		 */
-		acpi_os_wait_events_complete(); /* synchronize EC GPE processing */
-		acpi_ec_flush_work();
-		acpi_os_wait_events_complete(); /* synchronize Notify handling */
+		acpi_s2idle_sync();
 
 		rearm_wake_irq(acpi_sci_irq);
 	}
@@ -1024,6 +1029,8 @@ static void acpi_s2idle_restore_early(vo
 
 static void acpi_s2idle_restore(void)
 {
+	acpi_s2idle_sync();
+
 	s2idle_wakeup = false;
 
 	acpi_enable_all_runtime_gpes();





^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: Help me fix a regression caused by 56b9918490 (PM: sleep: Simplify suspend-to-idle control flow)
  2019-11-25 13:27           ` Rafael J. Wysocki
  2019-11-25 14:14             ` Rafael J. Wysocki
@ 2019-11-25 16:21             ` Kenneth R. Crudup
  2019-11-25 21:46               ` Rafael J. Wysocki
  1 sibling, 1 reply; 37+ messages in thread
From: Kenneth R. Crudup @ 2019-11-25 16:21 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: Rafael Wysocki, Linux PM


On Mon, 25 Nov 2019, Rafael J. Wysocki wrote:

> OK, but with sleep_no_lps0, does it also hang during resume or not?

It never goes to sleep, is what I'm saying. Screen goes dark, but the machine
seems to never go into sleep states.

> In that case whatever happens in acpi_s2idle_wake() can be ruled out,
> because on your system that function effectively is a NOP with
> ec_no_wakeup.  I don't expect the test below to add anything new to
> what we know already.

> > >         acpi_os_wait_events_complete(); /* synchronize EC GPE processing */
> > >         acpi_ec_flush_work();
> > >         acpi_os_wait_events_complete(); /* synchronize Notify handling */

> So, as stated above, this test is not likely to be conclusive.

I dunno ... did you see my other E-mail? Other than power-charge events
waking up my laptop (and not going to sleep while plugged in), it hasn't
hung up in several resume attempts, and doing power changes while asleep
just wakes it up (be it charger being plugged in, or charger removed).

Maybe this is expected behavor with those lines commented out, but at least
I know now that it's possible to suspend and successfully resume under my
standard workflow on a later-model kernel.

> I'll send you a patch to try for that.

I see that's in your next E-mail. I'll try it out shortly.

	-Kenny

-- 
Kenneth R. Crudup  Sr. SW Engineer, Scott County Consulting, Silicon Valley

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: Help me fix a regression caused by 56b9918490 (PM: sleep: Simplify suspend-to-idle control flow)
  2019-11-25 14:14             ` Rafael J. Wysocki
@ 2019-11-25 18:27               ` Kenneth R. Crudup
  2019-11-25 20:11                 ` Kenneth R. Crudup
  2019-11-25 21:47                 ` Rafael J. Wysocki
  0 siblings, 2 replies; 37+ messages in thread
From: Kenneth R. Crudup @ 2019-11-25 18:27 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: Rafael J. Wysocki, Rafael Wysocki, Linux PM


On Mon, 25 Nov 2019, Rafael J. Wysocki wrote:

> > I'll send you a patch to try for that.

> Appended (untested).

This is promising! I've only tested a handful of suspend cycles, but so far:

- Charging events don't wake it up
- It suspends with the charger connected
- Suspend current draw consistently gets to the lowest idle point
- It always comes back from suspend, even after connecting a charger when suspended

... IOW, everything just seems to work, and better than before. I just
remembered why I had to use "ec_no_wakeup" (pre suspend-rework) in the first
place- this is a convertible and accelerometer events would take the laptop
out of (and immediately back into) suspend and your rework fixed that but then
I had the issues listed in my original E-mail.

I'll keep testing it (and the best way to get something to break is to proclaim
it's "fixed" :), but for now I'm pretty happy with the patch.

	-Kenny

-- 
Kenneth R. Crudup  Sr. SW Engineer, Scott County Consulting, Silicon Valley

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: Help me fix a regression caused by 56b9918490 (PM: sleep: Simplify suspend-to-idle control flow)
  2019-11-25 18:27               ` Kenneth R. Crudup
@ 2019-11-25 20:11                 ` Kenneth R. Crudup
  2019-11-25 20:19                   ` Kenneth R. Crudup
  2019-11-25 22:01                   ` Rafael J. Wysocki
  2019-11-25 21:47                 ` Rafael J. Wysocki
  1 sibling, 2 replies; 37+ messages in thread
From: Kenneth R. Crudup @ 2019-11-25 20:11 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: Rafael J. Wysocki, Rafael Wysocki, Linux PM


On Mon, 25 Nov 2019, Kenneth R. Crudup wrote:

> I'll keep testing it (and the best way to get something to break is to
> proclaim it's "fixed" :)

... grumble ...!

So, put the laptop to sleep, put it in my bag[1], then head off to go work.
Get to where I'm working, plug in the power meter, it's now drawing 150mA
while idle ...  uh oh. Sure enough, open the lid, hit some keys, dead. Had
to do a long- power-button reset to get it back up again. I note that it's
now charging (I may not have allowed the battery to drain enough for the
charging circuit think the battery's discharged enough to begin a charge
cycle before).

I'm going to go back to the original patch (the three removed lines) for now
(which may not do any better long-term, but I'll have to at least try.

[1] - I can let it sit idle at home and it'll be OK, but when I put it in
my bag and go somewhere it always seems to be the trigger for failed resumes
I have no idea why.

BTW, what was different about your 2nd patch (the addition of acpi_s2idle_sync()
to the resume cycle) was I was getting WMI power events captured during
resume in the EC blocked/unblocked path, which hadn't happened before:

-----
Nov 25 10:16:56 hp-x360n kernel: [ 4418.792164] PM: suspend entry (s2idle)
Nov 25 10:16:56 hp-x360n kernel: [ 4418.801869] Filesystems sync: 0.009 seconds
Nov 25 10:17:29 hp-x360n kernel: [ 4418.802615] Freezing user space processes ... (elapsed 0.002 seconds) done.
Nov 25 10:17:29 hp-x360n kernel: [ 4418.805200] OOM killer disabled.
Nov 25 10:17:29 hp-x360n kernel: [ 4418.805201] Freezing remaining freezable tasks ... (elapsed 0.000 seconds) done.
Nov 25 10:17:29 hp-x360n kernel: [ 4418.806127] printk: Suspending console(s) (use no_console_suspend to debug)
Nov 25 10:17:29 hp-x360n kernel: [ 4418.817183] [drm] GuC communication disabled
Nov 25 10:17:29 hp-x360n kernel: [ 4419.296506] ACPI: EC: interrupt blocked
*** Nov 25 10:17:29 hp-x360n kernel: [ 4440.111494] hp_wmi_notify(): event_id 0x20001, event_data 0x00000000
*** Nov 25 10:17:29 hp-x360n kernel: [ 4440.111496] hp_wmi: Unknown event_id - 131073 - 0x0
*** Nov 25 10:17:29 hp-x360n kernel: [ 4440.198013] hp_wmi_notify(): event_id 0x03, event_data 0x00000000
Nov 25 10:17:29 hp-x360n kernel: [ 4449.527844] ACPI: EC: interrupt unblocked
Nov 25 10:17:29 hp-x360n kernel: [ 4451.118846] intel_pmc_core INT33A1:00: CPU did not enter SLP_S0!!! (S0ix cnt=0)
Nov 25 10:17:29 hp-x360n kernel: [ 4451.126383] usb usb3: root hub lost power or was reset
Nov 25 10:17:29 hp-x360n kernel: [ 4451.126384] usb usb4: root hub lost power or was reset
Nov 25 10:17:29 hp-x360n kernel: [ 4451.127298] [drm] GuC communication enabled
Nov 25 10:17:29 hp-x360n kernel: [ 4451.127400] i915 0000:00:02.0: GuC firmware i915/kbl_guc_33.0.0.bin version 33.0 subm
ission:disabled
Nov 25 10:17:29 hp-x360n kernel: [ 4451.127401] i915 0000:00:02.0: HuC firmware i915/kbl_huc_ver02_00_1810.bin version 2.
0 authenticated:yes
Nov 25 10:17:29 hp-x360n kernel: [ 4452.229576] OOM killer enabled.
Nov 25 10:17:29 hp-x360n kernel: [ 4452.229577] Restarting tasks ... done.
Nov 25 10:17:29 hp-x360n auto-rotate[15448]: normal
Nov 25 10:17:29 hp-x360n kernel: [ 4452.256340] thermal thermal_zone7: failed to read out thermal zone (-61)
Nov 25 10:17:29 hp-x360n systemd-udevd[18777]: Process '/usr/sbin/tlp auto' failed with exit code 4.
Nov 25 10:17:29 hp-x360n systemd-udevd[18777]: Process '/usr/sbin/tlp auto' failed with exit code 4.
Nov 25 10:17:29 hp-x360n systemd-sleep[18673]: System resumed.
Nov 25 10:17:29 hp-x360n kernel: [ 4452.313345] PM: suspend exit
----

(best I can tell, event_id 0x20001 is a charger unplug event and event_id 0x03 is a charger plug event).

BTW, is ther any way to determine if this is an issue with some other driver? I'd think I'd be getting BUG_ON()s
if that were it, though.

	-Kenny

-- 
Kenneth R. Crudup  Sr. SW Engineer, Scott County Consulting, Silicon Valley

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: Help me fix a regression caused by 56b9918490 (PM: sleep: Simplify suspend-to-idle control flow)
  2019-11-25 20:11                 ` Kenneth R. Crudup
@ 2019-11-25 20:19                   ` Kenneth R. Crudup
  2019-11-25 22:01                   ` Rafael J. Wysocki
  1 sibling, 0 replies; 37+ messages in thread
From: Kenneth R. Crudup @ 2019-11-25 20:19 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: Rafael J. Wysocki, Rafael Wysocki, Linux PM


On Mon, 25 Nov 2019, Kenneth R. Crudup wrote:

> Had to do a long- power-button reset to get it back up again.

Datapoint that may be useful- when I have to do the hard power cycle after
a failed suspend, the ports on this laptop (2x USB-C/Thunderbolt 3) get
totally wedged up- the charging light stays on even if there's nothing
plugged into either port, and USB/TB operation is almost always wedged
up (devices not recognized). I have to use a special key sequence (hold
down F6 and power at the same time for several seconds) to get it unwedged.

The system seems to also think the laptop is charging as well, even with
nothing plugged into either port. It's like something in the ACPI/BIOS
doesn't get "unsuspended" either.

I used to have TB and USB as modules that would get unloaded at suspend to
see if those were related to this issue, but that only seemed to make idle
power draw even higher for some reason.

	-Kenny

-- 
Kenneth R. Crudup  Sr. SW Engineer, Scott County Consulting, Silicon Valley

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: Help me fix a regression caused by 56b9918490 (PM: sleep: Simplify suspend-to-idle control flow)
  2019-11-25 16:21             ` Kenneth R. Crudup
@ 2019-11-25 21:46               ` Rafael J. Wysocki
  2019-11-25 23:02                 ` Kenneth R. Crudup
  0 siblings, 1 reply; 37+ messages in thread
From: Rafael J. Wysocki @ 2019-11-25 21:46 UTC (permalink / raw)
  To: Kenneth R. Crudup; +Cc: Rafael J. Wysocki, Rafael Wysocki, Linux PM

On Mon, Nov 25, 2019 at 5:21 PM Kenneth R. Crudup <kenny@panix.com> wrote:
>
>
> On Mon, 25 Nov 2019, Rafael J. Wysocki wrote:
>
> > OK, but with sleep_no_lps0, does it also hang during resume or not?
>
> It never goes to sleep, is what I'm saying. Screen goes dark, but the machine
> seems to never go into sleep states.

So what exactly does happen when you try to suspend with
sleep_no_lps0?  Does it return to user space right away by itself, or
do you need to do anything to make it do so?

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: Help me fix a regression caused by 56b9918490 (PM: sleep: Simplify suspend-to-idle control flow)
  2019-11-25 18:27               ` Kenneth R. Crudup
  2019-11-25 20:11                 ` Kenneth R. Crudup
@ 2019-11-25 21:47                 ` Rafael J. Wysocki
  1 sibling, 0 replies; 37+ messages in thread
From: Rafael J. Wysocki @ 2019-11-25 21:47 UTC (permalink / raw)
  To: Kenneth R. Crudup
  Cc: Rafael J. Wysocki, Rafael J. Wysocki, Rafael Wysocki, Linux PM

On Mon, Nov 25, 2019 at 7:27 PM Kenneth R. Crudup <kenny@panix.com> wrote:
>
>
> On Mon, 25 Nov 2019, Rafael J. Wysocki wrote:
>
> > > I'll send you a patch to try for that.
>
> > Appended (untested).
>
> This is promising! I've only tested a handful of suspend cycles, but so far:
>
> - Charging events don't wake it up
> - It suspends with the charger connected
> - Suspend current draw consistently gets to the lowest idle point
> - It always comes back from suspend, even after connecting a charger when suspended
>
> ... IOW, everything just seems to work, and better than before. I just
> remembered why I had to use "ec_no_wakeup" (pre suspend-rework) in the first
> place- this is a convertible and accelerometer events would take the laptop
> out of (and immediately back into) suspend and your rework fixed that but then
> I had the issues listed in my original E-mail.
>
> I'll keep testing it (and the best way to get something to break is to proclaim
> it's "fixed" :), but for now I'm pretty happy with the patch.

Well, so with this patch the problem is at least not reproducible
readily it seems?

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: Help me fix a regression caused by 56b9918490 (PM: sleep: Simplify suspend-to-idle control flow)
  2019-11-25 20:11                 ` Kenneth R. Crudup
  2019-11-25 20:19                   ` Kenneth R. Crudup
@ 2019-11-25 22:01                   ` Rafael J. Wysocki
  2019-11-25 23:32                     ` Kenneth R. Crudup
  1 sibling, 1 reply; 37+ messages in thread
From: Rafael J. Wysocki @ 2019-11-25 22:01 UTC (permalink / raw)
  To: Kenneth R. Crudup
  Cc: Rafael J. Wysocki, Rafael J. Wysocki, Rafael Wysocki, Linux PM

On Mon, Nov 25, 2019 at 9:11 PM Kenneth R. Crudup <kenny@panix.com> wrote:
>
>
> On Mon, 25 Nov 2019, Kenneth R. Crudup wrote:
>
> > I'll keep testing it (and the best way to get something to break is to
> > proclaim it's "fixed" :)
>
> ... grumble ...!
>
> So, put the laptop to sleep, put it in my bag[1], then head off to go work.
> Get to where I'm working, plug in the power meter, it's now drawing 150mA
> while idle ...  uh oh. Sure enough, open the lid, hit some keys, dead. Had
> to do a long- power-button reset to get it back up again. I note that it's
> now charging (I may not have allowed the battery to drain enough for the
> charging circuit think the battery's discharged enough to begin a charge
> cycle before).

It looks like the platform has problems with switching power between
AC and battery while suspended.  Also this appears to be related to
the EC.

Also I'm wondering if the problem is reproducible with ec_no_wakeup
and the last patch I posted applied.

> I'm going to go back to the original patch (the three removed lines) for now
> (which may not do any better long-term, but I'll have to at least try.

The last patch I posted is orthogonal to this.  It actually should be
entirely transparent on systems without any issues.

But with that patch applied, please try to comment out the
acpi_s2idle_sync() in acpi_s2idle_wake() and retest.

> [1] - I can let it sit idle at home and it'll be OK, but when I put it in
> my bag and go somewhere it always seems to be the trigger for failed resumes
> I have no idea why.
>
> BTW, what was different about your 2nd patch (the addition of acpi_s2idle_sync()
> to the resume cycle) was I was getting WMI power events captured during
> resume in the EC blocked/unblocked path, which hadn't happened before:

That's because of the added ACPI events flushing in acpi_s2idle_restore().

> -----
> Nov 25 10:16:56 hp-x360n kernel: [ 4418.792164] PM: suspend entry (s2idle)
> Nov 25 10:16:56 hp-x360n kernel: [ 4418.801869] Filesystems sync: 0.009 seconds
> Nov 25 10:17:29 hp-x360n kernel: [ 4418.802615] Freezing user space processes ... (elapsed 0.002 seconds) done.
> Nov 25 10:17:29 hp-x360n kernel: [ 4418.805200] OOM killer disabled.
> Nov 25 10:17:29 hp-x360n kernel: [ 4418.805201] Freezing remaining freezable tasks ... (elapsed 0.000 seconds) done.
> Nov 25 10:17:29 hp-x360n kernel: [ 4418.806127] printk: Suspending console(s) (use no_console_suspend to debug)
> Nov 25 10:17:29 hp-x360n kernel: [ 4418.817183] [drm] GuC communication disabled
> Nov 25 10:17:29 hp-x360n kernel: [ 4419.296506] ACPI: EC: interrupt blocked
> *** Nov 25 10:17:29 hp-x360n kernel: [ 4440.111494] hp_wmi_notify(): event_id 0x20001, event_data 0x00000000
> *** Nov 25 10:17:29 hp-x360n kernel: [ 4440.111496] hp_wmi: Unknown event_id - 131073 - 0x0

This looks kind of weird.

Do you get similar "unknown" events during normal operation too?

Anyway, hp_wmi is involved and I'm quite unsure if it is ready to
handle events during acpi_s2idle_wake(), so that's why it may not work
every time with the last patch applied.

> *** Nov 25 10:17:29 hp-x360n kernel: [ 4440.198013] hp_wmi_notify(): event_id 0x03, event_data 0x00000000
> Nov 25 10:17:29 hp-x360n kernel: [ 4449.527844] ACPI: EC: interrupt unblocked
> Nov 25 10:17:29 hp-x360n kernel: [ 4451.118846] intel_pmc_core INT33A1:00: CPU did not enter SLP_S0!!! (S0ix cnt=0)
> Nov 25 10:17:29 hp-x360n kernel: [ 4451.126383] usb usb3: root hub lost power or was reset
> Nov 25 10:17:29 hp-x360n kernel: [ 4451.126384] usb usb4: root hub lost power or was reset
> Nov 25 10:17:29 hp-x360n kernel: [ 4451.127298] [drm] GuC communication enabled
> Nov 25 10:17:29 hp-x360n kernel: [ 4451.127400] i915 0000:00:02.0: GuC firmware i915/kbl_guc_33.0.0.bin version 33.0 subm
> ission:disabled
> Nov 25 10:17:29 hp-x360n kernel: [ 4451.127401] i915 0000:00:02.0: HuC firmware i915/kbl_huc_ver02_00_1810.bin version 2.
> 0 authenticated:yes
> Nov 25 10:17:29 hp-x360n kernel: [ 4452.229576] OOM killer enabled.
> Nov 25 10:17:29 hp-x360n kernel: [ 4452.229577] Restarting tasks ... done.
> Nov 25 10:17:29 hp-x360n auto-rotate[15448]: normal
> Nov 25 10:17:29 hp-x360n kernel: [ 4452.256340] thermal thermal_zone7: failed to read out thermal zone (-61)
> Nov 25 10:17:29 hp-x360n systemd-udevd[18777]: Process '/usr/sbin/tlp auto' failed with exit code 4.
> Nov 25 10:17:29 hp-x360n systemd-udevd[18777]: Process '/usr/sbin/tlp auto' failed with exit code 4.
> Nov 25 10:17:29 hp-x360n systemd-sleep[18673]: System resumed.
> Nov 25 10:17:29 hp-x360n kernel: [ 4452.313345] PM: suspend exit
> ----
>
> (best I can tell, event_id 0x20001 is a charger unplug event and event_id 0x03 is a charger plug event).
>
> BTW, is ther any way to determine if this is an issue with some other driver? I'd think I'd be getting BUG_ON()s
> if that were it, though.

Not necessarily.

It appears to be related to hp_wmi as mentioned above.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: Help me fix a regression caused by 56b9918490 (PM: sleep: Simplify suspend-to-idle control flow)
  2019-11-25 21:46               ` Rafael J. Wysocki
@ 2019-11-25 23:02                 ` Kenneth R. Crudup
  2019-11-26  8:53                   ` Rafael J. Wysocki
  0 siblings, 1 reply; 37+ messages in thread
From: Kenneth R. Crudup @ 2019-11-25 23:02 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: Rafael Wysocki, Linux PM


On Mon, 25 Nov 2019, Rafael J. Wysocki wrote:

> So what exactly does happen when you try to suspend with
> sleep_no_lps0?  Does it return to user space right away by itself

Yeah. It's like it never suspends (or suspends quickly and comes right
back; I didn't bother to look at the dmesg in those cases). I could find
out if you wish, but- see my next E-mails.

	-Kenny

-- 
Kenneth R. Crudup  Sr. SW Engineer, Scott County Consulting, Silicon Valley

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: Help me fix a regression caused by 56b9918490 (PM: sleep: Simplify suspend-to-idle control flow)
  2019-11-25 22:01                   ` Rafael J. Wysocki
@ 2019-11-25 23:32                     ` Kenneth R. Crudup
  2019-11-26  8:50                       ` Rafael J. Wysocki
  0 siblings, 1 reply; 37+ messages in thread
From: Kenneth R. Crudup @ 2019-11-25 23:32 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: Rafael J. Wysocki, Rafael Wysocki, Linux PM


On Mon, 25 Nov 2019, Rafael J. Wysocki wrote:

> It looks like the platform has problems with switching power between
> AC and battery while suspended.  Also this appears to be related to the EC.

That being said, I always keep a stock kernel on this machine (latest is Kubuntu
vmlinuz-5.0.0-35-generic) and the failure mode isn't reproducible on that. I'm
still wondering what's substantially different between the versions.

BTW, I've got breaking it down to a science now:

- Let the battery drain pretty well down (< ~80%)
- Boot the kernel I wish to test, while still on battery
- Intiate a suspend/resume, which will come back OK
- Initiate another suspend
- Plug in the charger. If I have my power meter in, I see it do the PD
  negotiation (it'll start off at 20V/150mA, then it'll PD ramp up to
  a full 2.5-2.75A @20v to charge the battery)
- Try to resume. It'll be totally dead and I have to long-power-button
  to get it back

- If I have "ec_no_wakeup=1" set, then the first time it seems to resume
  OK from the procedure above, then I'll wait about two minutes then
  flip the laptop around a bunch (apparently getting the accelerometer
  involved, which I guess happens during a car ride) which can also
  trigger the hard hangup when I plug it in before suspending

> Also I'm wondering if the problem is reproducible with ec_no_wakeup
> and the last patch I posted applied.

With the stock -rc kernel, I used to be able to work around this issue with
"ec_no_wakeup=1", but sometimes the battery would drain while "suspended"
far higher than idle at random.

With the second patchset (the introduction of acpi_s2idle_sync()) the
*second* resume while the battery is charging locks it up hard.

> But with that patch applied, please try to comment out the
> acpi_s2idle_sync() in acpi_s2idle_wake() and retest.

So get this- I took it out of acpi_s2idle_wake() and left it in acpi_s2idle_restore() ;
when I do this, I get the issue where it seems to no longer care about the
charging state, BUT it is back to responding to charger events (i.e., it won't
go to sleep as long as it's plugged in, and "ec_no_wakeup=1" has no effect on this.

It also turns out that it doesn't apparently matter if I remove the call of acpi_s2idle_sync()
from either of acpi_s2idle_wake() or acpi_s2idle_restore() but I get the behavior above.
I took a shot in the dark and commented out the call to acpi_ec_flush_work() in acpi_s2idle_sync()
and left the calls to acpi_s2idle_sync() in both _wake and _restore and also get this
behavior, which to me implies that whatever is in the acpi_ec_flush_work() call (and I
looked at it, but didn't go any further) is where it's hanging up. I can put WARN_ON(cond)
calls in places that'll write to EFI pstore, if you think that would be useful.

As this workaround does at least (so far, anyway) keep the machine from going off into
a black hole, I'm willing to live with it until/as we find a workaround that gets it
right; even though it'll be somewhat annoying to not go to sleep when charged AND
automatically wake up on power connection/disconnection events (can those be filtered
out of /proc/acpi/wakeup_sources ? Nothing looked obvious) at least I have a laptop
when I need it.

Can you think of any other strategy to try (and I'm all ears at this point; this
kind of breakage- while almost certainly that something in this HP BIOS/ACPI
is doing wrong, should be properly worked around for the rest of us poor suckers
who got these devices so I'm willing to help either fix it,

> > [1] - I can let it sit idle at home and it'll be OK, but when I put it in
> > my bag and go somewhere it always seems to be the trigger for failed resumes
> > I have no idea why.
> > BTW, what was different about your 2nd patch (the addition of acpi_s2idle_sync()
> > to the resume cycle) was I was getting WMI power events captured during
> > resume in the EC blocked/unblocked path, which hadn't happened before:

> That's because of the added ACPI events flushing in acpi_s2idle_restore().

> > Nov 25 10:17:29 hp-x360n kernel: [ 4419.296506] ACPI: EC: interrupt blocked
> > *** Nov 25 10:17:29 hp-x360n kernel: [ 4440.111494] hp_wmi_notify(): event_id 0x20001, event_data 0x00000000
> > *** Nov 25 10:17:29 hp-x360n kernel: [ 4440.111496] hp_wmi: Unknown event_id - 131073 - 0x0

> This looks kind of weird.
> Do you get similar "unknown" events during normal operation too?

Those two "unknows" only come about during charger state events, and I do see them when
up and running when I plug and unplug (that line with hp_wmi_notify() is my own addition).

> Anyway, hp_wmi is involved and I'm quite unsure if it is ready to
> handle events during acpi_s2idle_wake(), so that's why it may not work
> every time with the last patch applied.

So, I tried removing the total WMI subsystem first, then removed the HP-WMI subsystem with
basic WMI in. No change. Going thru the sequence above still locked things up hard.

	-Kenny

-- 
Kenneth R. Crudup  Sr. SW Engineer, Scott County Consulting, Silicon Valley

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: Help me fix a regression caused by 56b9918490 (PM: sleep: Simplify suspend-to-idle control flow)
  2019-11-25 23:32                     ` Kenneth R. Crudup
@ 2019-11-26  8:50                       ` Rafael J. Wysocki
  2019-11-26 16:12                         ` Rafael J. Wysocki
  2019-11-26 16:15                         ` Kenneth R. Crudup
  0 siblings, 2 replies; 37+ messages in thread
From: Rafael J. Wysocki @ 2019-11-26  8:50 UTC (permalink / raw)
  To: Kenneth R. Crudup
  Cc: Rafael J. Wysocki, Rafael J. Wysocki, Rafael Wysocki, Linux PM

On Tue, Nov 26, 2019 at 12:32 AM Kenneth R. Crudup <kenny@panix.com> wrote:
>
>
> On Mon, 25 Nov 2019, Rafael J. Wysocki wrote:
>
> > It looks like the platform has problems with switching power between
> > AC and battery while suspended.  Also this appears to be related to the EC.
>
> That being said, I always keep a stock kernel on this machine (latest is Kubuntu
> vmlinuz-5.0.0-35-generic) and the failure mode isn't reproducible on that. I'm
> still wondering what's substantially different between the versions.
>
> BTW, I've got breaking it down to a science now:
>
> - Let the battery drain pretty well down (< ~80%)
> - Boot the kernel I wish to test, while still on battery
> - Intiate a suspend/resume, which will come back OK
> - Initiate another suspend
> - Plug in the charger. If I have my power meter in, I see it do the PD
>   negotiation (it'll start off at 20V/150mA, then it'll PD ramp up to
>   a full 2.5-2.75A @20v to charge the battery)
> - Try to resume. It'll be totally dead and I have to long-power-button
>   to get it back
>
> - If I have "ec_no_wakeup=1" set, then the first time it seems to resume
>   OK from the procedure above, then I'll wait about two minutes then
>   flip the laptop around a bunch (apparently getting the accelerometer
>   involved, which I guess happens during a car ride) which can also
>   trigger the hard hangup when I plug it in before suspending
>
> > Also I'm wondering if the problem is reproducible with ec_no_wakeup
> > and the last patch I posted applied.
>
> With the stock -rc kernel, I used to be able to work around this issue with
> "ec_no_wakeup=1", but sometimes the battery would drain while "suspended"
> far higher than idle at random.
>
> With the second patchset (the introduction of acpi_s2idle_sync()) the
> *second* resume while the battery is charging locks it up hard.
>
> > But with that patch applied, please try to comment out the
> > acpi_s2idle_sync() in acpi_s2idle_wake() and retest.
>
> So get this- I took it out of acpi_s2idle_wake() and left it in acpi_s2idle_restore() ;
> when I do this, I get the issue where it seems to no longer care about the
> charging state, BUT it is back to responding to charger events (i.e., it won't
> go to sleep as long as it's plugged in, and "ec_no_wakeup=1" has no effect on this.
>
> It also turns out that it doesn't apparently matter if I remove the call of acpi_s2idle_sync()
> from either of acpi_s2idle_wake() or acpi_s2idle_restore() but I get the behavior above.
> I took a shot in the dark and commented out the call to acpi_ec_flush_work() in acpi_s2idle_sync()
> and left the calls to acpi_s2idle_sync() in both _wake and _restore and also get this
> behavior, which to me implies that whatever is in the acpi_ec_flush_work() call (and I
> looked at it, but didn't go any further) is where it's hanging up.

It doesn't have to be like that.

What acpi_ec_flush_work() does it to process the EC events received so
far and they may trigger notifications that are processed
subsequently.  The next acpi_os_wait_events_complete() is there to
wait for that processing to finish.

It looks like it hangs somewhere in the platform firmware, but it is
hard to say where exactly.

> I can put WARN_ON(cond)
> calls in places that'll write to EFI pstore, if you think that would be useful.
>
> As this workaround does at least (so far, anyway) keep the machine from going off into
> a black hole,

So to be precise you mean commenting out acpi_ec_flush_work() in
acpi_s2idle_sync() with the

https://lore.kernel.org/linux-pm/alpine.DEB.2.21.1911220920500.24730@hp-x360n/T/#m199e175157c9cddf48fdee16681f6c2f0549a364

patch applied?

And does the power button/lid wakeup work without that?

> I'm willing to live with it until/as we find a workaround that gets it
> right; even though it'll be somewhat annoying to not go to sleep when charged AND
> automatically wake up on power connection/disconnection events (can those be filtered
> out of /proc/acpi/wakeup_sources ? Nothing looked obvious) at least I have a laptop
> when I need it.

There should be a way to filter them out.

From the above it looks like this happens with "ec_no_wakeup=1" too,
is that correct?

> Can you think of any other strategy to try (and I'm all ears at this point; this
> kind of breakage- while almost certainly that something in this HP BIOS/ACPI
> is doing wrong, should be properly worked around for the rest of us poor suckers
> who got these devices so I'm willing to help either fix it,

It's rather that the platform firmware has expectations that we don't
know about.

Yes, it should be possible to diagnose it further with some extra work.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: Help me fix a regression caused by 56b9918490 (PM: sleep: Simplify suspend-to-idle control flow)
  2019-11-25 23:02                 ` Kenneth R. Crudup
@ 2019-11-26  8:53                   ` Rafael J. Wysocki
  0 siblings, 0 replies; 37+ messages in thread
From: Rafael J. Wysocki @ 2019-11-26  8:53 UTC (permalink / raw)
  To: Kenneth R. Crudup; +Cc: Rafael J. Wysocki, Rafael Wysocki, Linux PM

On Tue, Nov 26, 2019 at 12:02 AM Kenneth R. Crudup <kenny@panix.com> wrote:
>
>
> On Mon, 25 Nov 2019, Rafael J. Wysocki wrote:
>
> > So what exactly does happen when you try to suspend with
> > sleep_no_lps0?  Does it return to user space right away by itself
>
> Yeah. It's like it never suspends (or suspends quickly and comes right
> back; I didn't bother to look at the dmesg in those cases). I could find
> out if you wish, but- see my next E-mails.

It would be good to see how far it goes.

Please echo 1 to /sys/power/pm_debug_messages and send me a dmesg part
covering a suspend cycle with sleep_no_lps0 equal to 1.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: Help me fix a regression caused by 56b9918490 (PM: sleep: Simplify suspend-to-idle control flow)
  2019-11-26  8:50                       ` Rafael J. Wysocki
@ 2019-11-26 16:12                         ` Rafael J. Wysocki
  2019-11-26 16:15                         ` Kenneth R. Crudup
  1 sibling, 0 replies; 37+ messages in thread
From: Rafael J. Wysocki @ 2019-11-26 16:12 UTC (permalink / raw)
  To: Kenneth R. Crudup; +Cc: Rafael J. Wysocki, Rafael Wysocki, Linux PM

On Tue, Nov 26, 2019 at 9:50 AM Rafael J. Wysocki <rafael@kernel.org> wrote:
>
> On Tue, Nov 26, 2019 at 12:32 AM Kenneth R. Crudup <kenny@panix.com> wrote:
> >
> >
> > On Mon, 25 Nov 2019, Rafael J. Wysocki wrote:
> >
> > > It looks like the platform has problems with switching power between
> > > AC and battery while suspended.  Also this appears to be related to the EC.
> >
> > That being said, I always keep a stock kernel on this machine (latest is Kubuntu
> > vmlinuz-5.0.0-35-generic) and the failure mode isn't reproducible on that. I'm
> > still wondering what's substantially different between the versions.
> >
> > BTW, I've got breaking it down to a science now:
> >
> > - Let the battery drain pretty well down (< ~80%)
> > - Boot the kernel I wish to test, while still on battery
> > - Intiate a suspend/resume, which will come back OK
> > - Initiate another suspend
> > - Plug in the charger. If I have my power meter in, I see it do the PD
> >   negotiation (it'll start off at 20V/150mA, then it'll PD ramp up to
> >   a full 2.5-2.75A @20v to charge the battery)
> > - Try to resume. It'll be totally dead and I have to long-power-button
> >   to get it back
> >
> > - If I have "ec_no_wakeup=1" set, then the first time it seems to resume
> >   OK from the procedure above, then I'll wait about two minutes then
> >   flip the laptop around a bunch (apparently getting the accelerometer
> >   involved, which I guess happens during a car ride) which can also
> >   trigger the hard hangup when I plug it in before suspending
> >
> > > Also I'm wondering if the problem is reproducible with ec_no_wakeup
> > > and the last patch I posted applied.
> >
> > With the stock -rc kernel, I used to be able to work around this issue with
> > "ec_no_wakeup=1", but sometimes the battery would drain while "suspended"
> > far higher than idle at random.
> >
> > With the second patchset (the introduction of acpi_s2idle_sync()) the
> > *second* resume while the battery is charging locks it up hard.
> >
> > > But with that patch applied, please try to comment out the
> > > acpi_s2idle_sync() in acpi_s2idle_wake() and retest.
> >
> > So get this- I took it out of acpi_s2idle_wake() and left it in acpi_s2idle_restore() ;
> > when I do this, I get the issue where it seems to no longer care about the
> > charging state, BUT it is back to responding to charger events (i.e., it won't
> > go to sleep as long as it's plugged in, and "ec_no_wakeup=1" has no effect on this.
> >
> > It also turns out that it doesn't apparently matter if I remove the call of acpi_s2idle_sync()
> > from either of acpi_s2idle_wake() or acpi_s2idle_restore() but I get the behavior above.
> > I took a shot in the dark and commented out the call to acpi_ec_flush_work() in acpi_s2idle_sync()
> > and left the calls to acpi_s2idle_sync() in both _wake and _restore and also get this
> > behavior, which to me implies that whatever is in the acpi_ec_flush_work() call (and I
> > looked at it, but didn't go any further) is where it's hanging up.
>
> It doesn't have to be like that.
>
> What acpi_ec_flush_work() does it to process the EC events received so
> far and they may trigger notifications that are processed
> subsequently.  The next acpi_os_wait_events_complete() is there to
> wait for that processing to finish.
>
> It looks like it hangs somewhere in the platform firmware, but it is
> hard to say where exactly.
>
> > I can put WARN_ON(cond)
> > calls in places that'll write to EFI pstore, if you think that would be useful.
> >
> > As this workaround does at least (so far, anyway) keep the machine from going off into
> > a black hole,
>
> So to be precise you mean commenting out acpi_ec_flush_work() in
> acpi_s2idle_sync() with the
>
> https://lore.kernel.org/linux-pm/alpine.DEB.2.21.1911220920500.24730@hp-x360n/T/#m199e175157c9cddf48fdee16681f6c2f0549a364
>
> patch applied?
>
> And does the power button/lid wakeup work without that?
>
> > I'm willing to live with it until/as we find a workaround that gets it
> > right; even though it'll be somewhat annoying to not go to sleep when charged AND
> > automatically wake up on power connection/disconnection events (can those be filtered
> > out of /proc/acpi/wakeup_sources ? Nothing looked obvious) at least I have a laptop
> > when I need it.
>
> There should be a way to filter them out.
>
> From the above it looks like this happens with "ec_no_wakeup=1" too,
> is that correct?
>
> > Can you think of any other strategy to try (and I'm all ears at this point; this
> > kind of breakage- while almost certainly that something in this HP BIOS/ACPI
> > is doing wrong, should be properly worked around for the rest of us poor suckers
> > who got these devices so I'm willing to help either fix it,
>
> It's rather that the platform firmware has expectations that we don't
> know about.
>
> Yes, it should be possible to diagnose it further with some extra work.

To that end, please send me the output of dmidecode and acpidump from
your laptop.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: Help me fix a regression caused by 56b9918490 (PM: sleep: Simplify suspend-to-idle control flow)
  2019-11-26  8:50                       ` Rafael J. Wysocki
  2019-11-26 16:12                         ` Rafael J. Wysocki
@ 2019-11-26 16:15                         ` Kenneth R. Crudup
  2019-11-26 16:27                           ` Rafael J. Wysocki
  1 sibling, 1 reply; 37+ messages in thread
From: Kenneth R. Crudup @ 2019-11-26 16:15 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: Rafael J. Wysocki, Rafael Wysocki, Linux PM


On Tue, 26 Nov 2019, Rafael J. Wysocki wrote:

> > BTW, I've got breaking it down to a science now:

> > - Let the battery drain pretty well down (< ~80%)
> > - Boot the kernel I wish to test, while still on battery
> > - Intiate a suspend/resume, which will come back OK
> > - Initiate another suspend
> > - Plug in the charger. If I have my power meter in, I see it do the PD
> >   negotiation (it'll start off at 20V/150mA, then it'll PD ramp up to
> >   a full 2.5-2.75A @20v to charge the battery)
> > - Try to resume. It'll be totally dead and I have to long-power-button
> >   to get it back

Huh. So ... I run bleeding edge and grab Linus' tip as it's pushed; I'd
seen some changes to workqueues in last night's merge. Personally, I feel
that if workqueues were AFU the kernel would be a hot mess, but at this
point since there's one in the path of acpi_ec_flush_work() called from
acpi_s2idle_sync() and I'm kinda desperate at this point for at least
something to help fix this, I put your 2nd patch back in earnest (uncommented
out acpi_ec_flush_work() from acpi_s2idle_sync()) and added a pair of
WARN_ON(1)s (which should dump to pstore- or is that "BUG_ON()"?) around
the spin_lock/unlock in acpi_ec_query_flushed().

Who knows if this is a race condition, the extra (slight) overhead of the
output of the WARN_ON, or just dumb luck but it ... here I go, tempting
fate ... seems to be working. I drained the battery to 75% and ran the
procedure a couple of times yesterday, and so far, so good.

Of course, I'll have to give it the "car ride test" (and I'll be damned if
I know why that brings this bug out so consistently that flipping it around
on its axes randomly for a while does), but I'm hopeful (and again, a public
announcement is usually guaranteed to make it break).

Stay tuned, again,

	-Kenny

-- 
Kenneth R. Crudup  Sr. SW Engineer, Scott County Consulting, Silicon Valley

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: Help me fix a regression caused by 56b9918490 (PM: sleep: Simplify suspend-to-idle control flow)
  2019-11-26 16:15                         ` Kenneth R. Crudup
@ 2019-11-26 16:27                           ` Rafael J. Wysocki
  2019-11-26 16:35                             ` Kenneth R. Crudup
  0 siblings, 1 reply; 37+ messages in thread
From: Rafael J. Wysocki @ 2019-11-26 16:27 UTC (permalink / raw)
  To: Kenneth R. Crudup
  Cc: Rafael J. Wysocki, Rafael J. Wysocki, Rafael Wysocki, Linux PM

On Tue, Nov 26, 2019 at 5:15 PM Kenneth R. Crudup <kenny@panix.com> wrote:
>
>
> On Tue, 26 Nov 2019, Rafael J. Wysocki wrote:
>
> > > BTW, I've got breaking it down to a science now:
>
> > > - Let the battery drain pretty well down (< ~80%)
> > > - Boot the kernel I wish to test, while still on battery
> > > - Intiate a suspend/resume, which will come back OK
> > > - Initiate another suspend
> > > - Plug in the charger. If I have my power meter in, I see it do the PD
> > >   negotiation (it'll start off at 20V/150mA, then it'll PD ramp up to
> > >   a full 2.5-2.75A @20v to charge the battery)
> > > - Try to resume. It'll be totally dead and I have to long-power-button
> > >   to get it back
>
> Huh. So ... I run bleeding edge and grab Linus' tip as it's pushed; I'd
> seen some changes to workqueues in last night's merge. Personally, I feel
> that if workqueues were AFU the kernel would be a hot mess, but at this
> point since there's one in the path of acpi_ec_flush_work() called from
> acpi_s2idle_sync() and I'm kinda desperate at this point for at least
> something to help fix this, I put your 2nd patch back in earnest (uncommented
> out acpi_ec_flush_work() from acpi_s2idle_sync()) and added a pair of
> WARN_ON(1)s (which should dump to pstore- or is that "BUG_ON()"?) around
> the spin_lock/unlock in acpi_ec_query_flushed().

OK, so just to double check if I understand you correctly, you are
running the Linus' tip with this patch on top

https://lore.kernel.org/linux-pm/alpine.DEB.2.21.1911220920500.24730@hp-x360n/T/#m199e175157c9cddf48fdee16681f6c2f0549a364

and with two extra WARN_ON(1) statements in acpi_ec_query_flushed()?

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: Help me fix a regression caused by 56b9918490 (PM: sleep: Simplify suspend-to-idle control flow)
  2019-11-26 16:27                           ` Rafael J. Wysocki
@ 2019-11-26 16:35                             ` Kenneth R. Crudup
  2019-11-26 18:48                               ` Rafael J. Wysocki
  0 siblings, 1 reply; 37+ messages in thread
From: Kenneth R. Crudup @ 2019-11-26 16:35 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: Rafael J. Wysocki, Rafael Wysocki, Linux PM

On Tue, 26 Nov 2019, Rafael J. Wysocki wrote:

> OK, so just to double check if I understand you correctly, you are
> running the Linus' tip with [the patch] on top and with two extra
> WARN_ON(1) statements in acpi_ec_query_flushed()?

Yeah:

----
$ git diff lk-linus/master drivers/acpi/

diff --git a/drivers/acpi/ec.c b/drivers/acpi/ec.c
index da1e5c5ce150..0bb13a596e4f 100644
--- a/drivers/acpi/ec.c
+++ b/drivers/acpi/ec.c
@@ -530,9 +530,11 @@ static bool acpi_ec_query_flushed(struct acpi_ec *ec)
 	bool flushed;
 	unsigned long flags;

+	WARN_ON(1);
 	spin_lock_irqsave(&ec->lock, flags);
 	flushed = !ec->nr_pending_queries;
 	spin_unlock_irqrestore(&ec->lock, flags);
+	WARN_ON(1);
 	return flushed;
 }

diff --git a/drivers/acpi/sleep.c b/drivers/acpi/sleep.c
index 2af937a8b1c5..003b314eda29 100644
--- a/drivers/acpi/sleep.c
+++ b/drivers/acpi/sleep.c
@@ -977,6 +977,13 @@ static int acpi_s2idle_prepare_late(void)
 	return 0;
 }

+static void acpi_s2idle_sync(void)
+{
+	acpi_os_wait_events_complete(); /* synchronize GPE processing */
+	acpi_ec_flush_work();
+	acpi_os_wait_events_complete(); /* synchronize Notify handling */
+}
+
 static void acpi_s2idle_wake(void)
 {
 	/*
@@ -1005,9 +1012,7 @@ static void acpi_s2idle_wake(void)
 		 * The EC driver uses the system workqueue and an additional
 		 * special one, so those need to be flushed too.
 		 */
-		acpi_os_wait_events_complete(); /* synchronize EC GPE processing */
-		acpi_ec_flush_work();
-		acpi_os_wait_events_complete(); /* synchronize Notify handling */
+		acpi_s2idle_sync();

 		rearm_wake_irq(acpi_sci_irq);
 	}
@@ -1024,6 +1029,8 @@ static void acpi_s2idle_restore_early(void)

 static void acpi_s2idle_restore(void)
 {
+	acpi_s2idle_sync();
+
 	s2idle_wakeup = false;

 	acpi_enable_all_runtime_gpes();
----

	-Kenny

-- 
Kenneth R. Crudup  Sr. SW Engineer, Scott County Consulting, Silicon Valley

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: Help me fix a regression caused by 56b9918490 (PM: sleep: Simplify suspend-to-idle control flow)
  2019-11-26 16:35                             ` Kenneth R. Crudup
@ 2019-11-26 18:48                               ` Rafael J. Wysocki
  2019-11-26 19:03                                 ` Kenneth R. Crudup
  0 siblings, 1 reply; 37+ messages in thread
From: Rafael J. Wysocki @ 2019-11-26 18:48 UTC (permalink / raw)
  To: Kenneth R. Crudup; +Cc: Rafael J. Wysocki, Rafael Wysocki, Linux PM

On Tuesday, November 26, 2019 5:35:10 PM CET Kenneth R. Crudup wrote:
> On Tue, 26 Nov 2019, Rafael J. Wysocki wrote:
> 
> > OK, so just to double check if I understand you correctly, you are
> > running the Linus' tip with [the patch] on top and with two extra
> > WARN_ON(1) statements in acpi_ec_query_flushed()?
> 
> Yeah:
> 
> ----
> $ git diff lk-linus/master drivers/acpi/
> 
> diff --git a/drivers/acpi/ec.c b/drivers/acpi/ec.c
> index da1e5c5ce150..0bb13a596e4f 100644
> --- a/drivers/acpi/ec.c
> +++ b/drivers/acpi/ec.c
> @@ -530,9 +530,11 @@ static bool acpi_ec_query_flushed(struct acpi_ec *ec)
>  	bool flushed;
>  	unsigned long flags;
> 
> +	WARN_ON(1);
>  	spin_lock_irqsave(&ec->lock, flags);
>  	flushed = !ec->nr_pending_queries;
>  	spin_unlock_irqrestore(&ec->lock, flags);
> +	WARN_ON(1);
>  	return flushed;
>  }

I've just realized that it is not necessary to use acpi_ec_query_flushed() in
acpi_ec_flush_work() at all, because it calls flush_scheduled_work() anyway,
so the appended patch should be good to go.

Can you try it, please?

---
 drivers/acpi/ec.c    |   20 ++++++++++----------
 drivers/acpi/sleep.c |   13 ++++++++++---
 2 files changed, 20 insertions(+), 13 deletions(-)

Index: linux-pm/drivers/acpi/sleep.c
===================================================================
--- linux-pm.orig/drivers/acpi/sleep.c
+++ linux-pm/drivers/acpi/sleep.c
@@ -977,6 +977,13 @@ static int acpi_s2idle_prepare_late(void
 	return 0;
 }
 
+static void acpi_s2idle_sync(void)
+{
+	acpi_os_wait_events_complete(); /* synchronize GPE processing */
+	acpi_ec_flush_work();
+	acpi_os_wait_events_complete(); /* synchronize Notify handling */
+}
+
 static void acpi_s2idle_wake(void)
 {
 	/*
@@ -1005,9 +1012,7 @@ static void acpi_s2idle_wake(void)
 		 * The EC driver uses the system workqueue and an additional
 		 * special one, so those need to be flushed too.
 		 */
-		acpi_os_wait_events_complete(); /* synchronize EC GPE processing */
-		acpi_ec_flush_work();
-		acpi_os_wait_events_complete(); /* synchronize Notify handling */
+		acpi_s2idle_sync();
 
 		rearm_wake_irq(acpi_sci_irq);
 	}
@@ -1024,6 +1029,8 @@ static void acpi_s2idle_restore_early(vo
 
 static void acpi_s2idle_restore(void)
 {
+	acpi_s2idle_sync();
+
 	s2idle_wakeup = false;
 
 	acpi_enable_all_runtime_gpes();
Index: linux-pm/drivers/acpi/ec.c
===================================================================
--- linux-pm.orig/drivers/acpi/ec.c
+++ linux-pm/drivers/acpi/ec.c
@@ -536,13 +536,8 @@ static bool acpi_ec_query_flushed(struct
 	return flushed;
 }
 
-static void __acpi_ec_flush_event(struct acpi_ec *ec)
+static void acpi_ec_flush_query_queue(void)
 {
-	/*
-	 * When ec_freeze_events is true, we need to flush events in
-	 * the proper position before entering the noirq stage.
-	 */
-	wait_event(ec->wait, acpi_ec_query_flushed(ec));
 	if (ec_query_wq)
 		flush_workqueue(ec_query_wq);
 }
@@ -554,14 +549,19 @@ static void acpi_ec_disable_event(struct
 	spin_lock_irqsave(&ec->lock, flags);
 	__acpi_ec_disable_event(ec);
 	spin_unlock_irqrestore(&ec->lock, flags);
-	__acpi_ec_flush_event(ec);
+
+	/*
+	 * When ec_freeze_events is true, we need to flush events in
+	 * the proper position before entering the noirq stage.
+	 */
+	wait_event(ec->wait, acpi_ec_query_flushed(ec));
+
+	acpi_ec_flush_query_queue();
 }
 
 void acpi_ec_flush_work(void)
 {
-	if (first_ec)
-		__acpi_ec_flush_event(first_ec);
-
+	acpi_ec_flush_query_queue();
 	flush_scheduled_work();
 }
 #endif /* CONFIG_PM_SLEEP */




^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: Help me fix a regression caused by 56b9918490 (PM: sleep: Simplify suspend-to-idle control flow)
  2019-11-26 18:48                               ` Rafael J. Wysocki
@ 2019-11-26 19:03                                 ` Kenneth R. Crudup
  2019-11-26 19:09                                   ` Rafael J. Wysocki
  0 siblings, 1 reply; 37+ messages in thread
From: Kenneth R. Crudup @ 2019-11-26 19:03 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: Rafael J. Wysocki, Rafael Wysocki, Linux PM


On Tue, 26 Nov 2019, Rafael J. Wysocki wrote:

> I've just realized that it is not necessary to use acpi_ec_query_flushed() in
> acpi_ec_flush_work() at all, because it calls flush_scheduled_work() anyway,
> so the appended patch should be good to go.
...
> Can you try it, please?

IFFFF this current fingers-crossed build fails I certainly shall.

	-Kenny

-- 
Kenneth R. Crudup  Sr. SW Engineer, Scott County Consulting, Silicon Valley

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: Help me fix a regression caused by 56b9918490 (PM: sleep: Simplify suspend-to-idle control flow)
  2019-11-26 19:03                                 ` Kenneth R. Crudup
@ 2019-11-26 19:09                                   ` Rafael J. Wysocki
  2019-11-26 19:13                                     ` Kenneth R. Crudup
                                                       ` (2 more replies)
  0 siblings, 3 replies; 37+ messages in thread
From: Rafael J. Wysocki @ 2019-11-26 19:09 UTC (permalink / raw)
  To: Kenneth R. Crudup
  Cc: Rafael J. Wysocki, Rafael J. Wysocki, Rafael Wysocki, Linux PM

On Tue, Nov 26, 2019 at 8:03 PM Kenneth R. Crudup <kenny@panix.com> wrote:
>
>
> On Tue, 26 Nov 2019, Rafael J. Wysocki wrote:
>
> > I've just realized that it is not necessary to use acpi_ec_query_flushed() in
> > acpi_ec_flush_work() at all, because it calls flush_scheduled_work() anyway,
> > so the appended patch should be good to go.
> ...
> > Can you try it, please?
>
> IFFFF this current fingers-crossed build fails I certainly shall.

Well, it would be useful to try it anyway.

In any case, I'm going to go ahead and post the EC part of it as a
proper patch and integrate it.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: Help me fix a regression caused by 56b9918490 (PM: sleep: Simplify suspend-to-idle control flow)
  2019-11-26 19:09                                   ` Rafael J. Wysocki
@ 2019-11-26 19:13                                     ` Kenneth R. Crudup
  2019-11-26 19:45                                     ` Kenneth R. Crudup
  2019-11-26 23:56                                     ` Kenneth R. Crudup
  2 siblings, 0 replies; 37+ messages in thread
From: Kenneth R. Crudup @ 2019-11-26 19:13 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: Rafael J. Wysocki, Rafael Wysocki, Linux PM


On Tue, 26 Nov 2019, Rafael J. Wysocki wrote:

> In any case, I'm going to go ahead and post the EC part of it as a
> proper patch and integrate it.

Well, if you're that confident in it, OK. I try to be fragile with the machine
when I'm working 'cause recreating all my Konsoles and environment and such is
kind of a PITA when it fails to resume, but I need to take a break now anyway.

Let me try and break this current setup, git stash it and put yours in and
I'll report back in a little bit.

	-Kenny

-- 
Kenneth R. Crudup  Sr. SW Engineer, Scott County Consulting, Silicon Valley

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: Help me fix a regression caused by 56b9918490 (PM: sleep: Simplify suspend-to-idle control flow)
  2019-11-26 19:09                                   ` Rafael J. Wysocki
  2019-11-26 19:13                                     ` Kenneth R. Crudup
@ 2019-11-26 19:45                                     ` Kenneth R. Crudup
  2019-11-26 23:56                                     ` Kenneth R. Crudup
  2 siblings, 0 replies; 37+ messages in thread
From: Kenneth R. Crudup @ 2019-11-26 19:45 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: Rafael J. Wysocki, Rafael Wysocki, Linux PM


On Tue, 26 Nov 2019, Rafael J. Wysocki wrote:

> Well, it would be useful to try it anyway.

Running it on top of Linus' tip. Well try and break it and report back.

	-Kenny

-- 
Kenneth R. Crudup  Sr. SW Engineer, Scott County Consulting, Silicon Valley

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: Help me fix a regression caused by 56b9918490 (PM: sleep: Simplify suspend-to-idle control flow)
  2019-11-26 19:09                                   ` Rafael J. Wysocki
  2019-11-26 19:13                                     ` Kenneth R. Crudup
  2019-11-26 19:45                                     ` Kenneth R. Crudup
@ 2019-11-26 23:56                                     ` Kenneth R. Crudup
  2019-11-27  2:35                                       ` Kenneth R. Crudup
  2 siblings, 1 reply; 37+ messages in thread
From: Kenneth R. Crudup @ 2019-11-26 23:56 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: Rafael J. Wysocki, Rafael Wysocki, Linux PM


On Tue, 26 Nov 2019, Rafael J. Wysocki wrote:

> Well, it would be useful to try it anyway.

Welp, so far, so good- and you MUST have changed something for sure, as I'm
getting these that I never got before; I suspended my laptop at 13:50 to
head to the airport (and yeah, I'm at 35K ft sending this :) ) and I just
opened my laptop for the first time now (15:38) ;

----
$ egrep 'suspend e' /var/log/syslog
...
Nov 26 13:50:09 hp-x360n kernel: [ 4601.889382] PM: suspend entry (s2idle)

Nov 26 14:04:21 hp-x360n kernel: [ 5452.857469] PM: suspend exit
Nov 26 14:04:24 hp-x360n kernel: [ 5455.950753] PM: suspend entry (s2idle)

Nov 26 14:04:34 hp-x360n kernel: [ 5465.709205] PM: suspend exit
Nov 26 14:05:04 hp-x360n kernel: [ 5496.135123] PM: suspend entry (s2idle)

Nov 26 14:05:23 hp-x360n kernel: [ 5515.506735] PM: suspend exit
Nov 26 15:06:56 hp-x360n kernel: [ 9210.061126] PM: suspend entry (s2idle)

Nov 26 15:13:02 hp-x360n kernel: [ 9575.620949] PM: suspend exit
Nov 26 15:23:04 hp-x360n kernel: [10178.010968] PM: suspend entry (s2idle)

Nov 26 15:26:07 hp-x360n kernel: [10361.368381] PM: suspend exit
Nov 26 15:36:10 hp-x360n kernel: [10964.067995] PM: suspend entry (s2idle)

Nov 26 15:38:13 hp-x360n kernel: [11087.514354] PM: suspend exit
----

So there were some 5 times where my laptop spontaneously (I'll bet it was
that damned accelerometer) woke up, sometimes for minutes at at time, all
the while in my laptop bag. I have a recessed power button on this thing
so I don't think it was getting bumped.

But the great news about this is this maybe explains why I'll try and resume
after being mobile; whatever awakening we used to do wouldn't be handled
(again, guessing :)) and we'd freeze up.

I keep saying the best way to break something "fixed" is to pubically
announce it so, so I'ma do it- but I have a good feeling about your last
patchset.

Now- how do I go about figuring out how to keep (what I can only assume is
the EC) quiet unless I'm actually trying to work on this thing? I can go back
to "ec_no_wakeup" but it seems like we're so close, might as well keep on going!

	-Kenny

-- 
Kenneth R. Crudup  Sr. SW Engineer, Scott County Consulting, Silicon Valley

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: Help me fix a regression caused by 56b9918490 (PM: sleep: Simplify suspend-to-idle control flow)
  2019-11-26 23:56                                     ` Kenneth R. Crudup
@ 2019-11-27  2:35                                       ` Kenneth R. Crudup
  2019-11-27  8:31                                         ` Rafael J. Wysocki
  0 siblings, 1 reply; 37+ messages in thread
From: Kenneth R. Crudup @ 2019-11-27  2:35 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: Rafael J. Wysocki, Rafael Wysocki, Linux PM


On Tue, 26 Nov 2019, Kenneth R. Crudup wrote:

> So there were some 5 times where my laptop spontaneously (I'll bet it was
> that damned accelerometer) woke up, sometimes for minutes at at time, all
> the while in my laptop bag.
...
> But the great news about this is this maybe explains why I'll try and resume
> after being mobile; whatever awakening we used to do wouldn't be handled
> (again, guessing :)) and we'd freeze up.

So I'd just realized something- when you'd first pushed the series of commits
changing the S2idle flow, I'd sent an E-mail about some issues I'd had:

----
Date: Wed, 18 Sep 2019 18:05:53
From: Kenneth R. Crudup <kenny@panix.com>
To: rafael.j.wysocki@intel.com
Cc: linux-pm@vger.kernel.org
Message-ID: <alpine.DEB.2.21.1909181742470.2771@hp-x360n>
Subject: Help me help you debug what seems to be an EC resume issue
...
>>> Before these commits, I used to set "acpi.ec_no_wakeup=1" because the
>>> orientation sensor (at least, and probably other things) would wake up
>>> the laptop (then immediately suspend), which I'm sure was using up
>>> battery while I'm just walking around.
>>> I've turned off "ec_no_wakeup" for testing and the good news is the
>>> orientation sensor doesn't cause the laptop to draw more power when shaking it.
----

Whoops- I guess in the face of everything we now know, that was indeed a
bug and not a feature, because I'd also posted:

>>>  Randomly, if left suspended, nothing other than a hard power off will get
>>>  it back (and I can't be sure, but I think current consumption can be normal
>>>  when it suspends, but this seems to only happen if I've unplugged the
>>>  charger after suspending (so no power meter))

So now we know.

So, now that apparently (fingers still crossed, but so far, so very good)
you've squashed that resume bug, do I have any recourse about this thing
waking up for just by staring at it very hard other than "ec_no_wakeup=1"?

Personally I don't mind at all having the keyboard being the only thing
that'll take the laptop out of suspend, and the only reason I was looking
for not having to modify the EC operation was 'cause I thought having that
set meant we'd short-circuited some other s2idle optimization, and I'm
constantly trying to minimize s2idle power consumption.

If all "ec_no_wakeup=1" actually does is silently ignore EC wakeup events,
then I'll stick with it.

Thanks,

	-Kenny

-- 
Kenneth R. Crudup  Sr. SW Engineer, Scott County Consulting, Silicon Valley

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: Help me fix a regression caused by 56b9918490 (PM: sleep: Simplify suspend-to-idle control flow)
  2019-11-27  2:35                                       ` Kenneth R. Crudup
@ 2019-11-27  8:31                                         ` Rafael J. Wysocki
  2019-11-27 22:30                                           ` Kenneth R. Crudup
  0 siblings, 1 reply; 37+ messages in thread
From: Rafael J. Wysocki @ 2019-11-27  8:31 UTC (permalink / raw)
  To: Kenneth R. Crudup
  Cc: Rafael J. Wysocki, Rafael J. Wysocki, Rafael Wysocki, Linux PM

On Wed, Nov 27, 2019 at 3:35 AM Kenneth R. Crudup <kenny@panix.com> wrote:
>
>
> On Tue, 26 Nov 2019, Kenneth R. Crudup wrote:
>
> > So there were some 5 times where my laptop spontaneously (I'll bet it was
> > that damned accelerometer) woke up, sometimes for minutes at at time, all
> > the while in my laptop bag.
> ...
> > But the great news about this is this maybe explains why I'll try and resume
> > after being mobile; whatever awakening we used to do wouldn't be handled
> > (again, guessing :)) and we'd freeze up.
>
> So I'd just realized something- when you'd first pushed the series of commits
> changing the S2idle flow, I'd sent an E-mail about some issues I'd had:
>
> ----
> Date: Wed, 18 Sep 2019 18:05:53
> From: Kenneth R. Crudup <kenny@panix.com>
> To: rafael.j.wysocki@intel.com
> Cc: linux-pm@vger.kernel.org
> Message-ID: <alpine.DEB.2.21.1909181742470.2771@hp-x360n>
> Subject: Help me help you debug what seems to be an EC resume issue
> ...
> >>> Before these commits, I used to set "acpi.ec_no_wakeup=1" because the
> >>> orientation sensor (at least, and probably other things) would wake up
> >>> the laptop (then immediately suspend), which I'm sure was using up
> >>> battery while I'm just walking around.
> >>> I've turned off "ec_no_wakeup" for testing and the good news is the
> >>> orientation sensor doesn't cause the laptop to draw more power when shaking it.
> ----
>
> Whoops- I guess in the face of everything we now know, that was indeed a
> bug and not a feature, because I'd also posted:
>
> >>>  Randomly, if left suspended, nothing other than a hard power off will get
> >>>  it back (and I can't be sure, but I think current consumption can be normal
> >>>  when it suspends, but this seems to only happen if I've unplugged the
> >>>  charger after suspending (so no power meter))
>
> So now we know.
>
> So, now that apparently (fingers still crossed, but so far, so very good)
> you've squashed that resume bug, do I have any recourse about this thing
> waking up for just by staring at it very hard other than "ec_no_wakeup=1"?
>
> Personally I don't mind at all having the keyboard being the only thing
> that'll take the laptop out of suspend, and the only reason I was looking
> for not having to modify the EC operation was 'cause I thought having that
> set meant we'd short-circuited some other s2idle optimization, and I'm
> constantly trying to minimize s2idle power consumption.
>
> If all "ec_no_wakeup=1" actually does is silently ignore EC wakeup events,
> then I'll stick with it.

Yes, that's all it does.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: Help me fix a regression caused by 56b9918490 (PM: sleep: Simplify suspend-to-idle control flow)
  2019-11-27  8:31                                         ` Rafael J. Wysocki
@ 2019-11-27 22:30                                           ` Kenneth R. Crudup
  2019-11-28 16:25                                             ` Rafael J. Wysocki
  0 siblings, 1 reply; 37+ messages in thread
From: Kenneth R. Crudup @ 2019-11-27 22:30 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: Rafael J. Wysocki, Rafael Wysocki, Linux PM


> > If all "ec_no_wakeup=1" actually does is silently ignore EC wakeup events,
> > then I'll stick with it.

> Yes, that's all it does.

Thanks.

BTW, I'm calling it on that patch- it's been working all day long. And with
ec_no_wakeup=1 no spurious wakeups, either:

----
$ fgrep "suspend e" /var/log/syslog
...
Nov 26 19:36:46 hp-x360n kernel: [  995.519483] PM: suspend entry (s2idle)
Nov 27 06:56:44 hp-x360n kernel: [41793.786663] PM: suspend exit
----

Sadly I lost ~9% battery over that period due to "modern" suspend, but at least
it's been cool the last few times I've taken it out of the bag. Wish I could
find out what's left powered up, though.

And finally, last questions about the EC:

- Is it possible to query it once it's awakened the system to find out exactly why?

- Is it possible to have the EC mask/ignore certain of those reasons?

Thanks for all your hard work,

	-Kenny

-- 
Kenneth R. Crudup  Sr. SW Engineer, Scott County Consulting, Silicon Valley

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: Help me fix a regression caused by 56b9918490 (PM: sleep: Simplify suspend-to-idle control flow)
  2019-11-27 22:30                                           ` Kenneth R. Crudup
@ 2019-11-28 16:25                                             ` Rafael J. Wysocki
  0 siblings, 0 replies; 37+ messages in thread
From: Rafael J. Wysocki @ 2019-11-28 16:25 UTC (permalink / raw)
  To: Kenneth R. Crudup
  Cc: Rafael J. Wysocki, Rafael J. Wysocki, Rafael Wysocki, Linux PM

On Wed, Nov 27, 2019 at 11:30 PM Kenneth R. Crudup <kenny@panix.com> wrote:
>
>
> > > If all "ec_no_wakeup=1" actually does is silently ignore EC wakeup events,
> > > then I'll stick with it.
>
> > Yes, that's all it does.
>
> Thanks.
>
> BTW, I'm calling it on that patch- it's been working all day long. And with
> ec_no_wakeup=1 no spurious wakeups, either:
>
> ----
> $ fgrep "suspend e" /var/log/syslog
> ...
> Nov 26 19:36:46 hp-x360n kernel: [  995.519483] PM: suspend entry (s2idle)
> Nov 27 06:56:44 hp-x360n kernel: [41793.786663] PM: suspend exit
> ----
>
> Sadly I lost ~9% battery over that period due to "modern" suspend, but at least
> it's been cool the last few times I've taken it out of the bag. Wish I could
> find out what's left powered up, though.
>
> And finally, last questions about the EC:
>
> - Is it possible to query it once it's awakened the system to find out exactly why?

That's what the current code does, but it also processes everything
the EC wants it to process at that time in case there are real system
wakeup events hidden in that (which is the case on some systems).

> - Is it possible to have the EC mask/ignore certain of those reasons?

Yes, it is, at least in principle, but some extra information is
necessary for that, like for example which Notify() targets to ignore,
and that's rather system-specific.

BTW, it should be possible to enable dynamic debug in ec.c and collect
some extra output from it to see what the EC is doing, for example,
when spurious wakeups occur.

> Thanks for all your hard work,

YW

BTW2, I have a real (non-debug) patch for the $subject issue in the
works, will post it later today or tomorrow, most likely.

^ permalink raw reply	[flat|nested] 37+ messages in thread

end of thread, back to index

Thread overview: 37+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-11-22  0:11 Help me fix a regression caused by 56b9918490 (PM: sleep: Simplify suspend-to-idle control flow) Kenneth R. Crudup
2019-11-22 10:12 ` Rafael J. Wysocki
2019-11-22 12:45   ` Rafael J. Wysocki
2019-11-22 17:35     ` Kenneth R. Crudup
2019-11-23 10:24     ` Kenneth R. Crudup
2019-11-24 16:02       ` Rafael J. Wysocki
2019-11-25  3:40         ` Kenneth R. Crudup
2019-11-25 13:27           ` Rafael J. Wysocki
2019-11-25 14:14             ` Rafael J. Wysocki
2019-11-25 18:27               ` Kenneth R. Crudup
2019-11-25 20:11                 ` Kenneth R. Crudup
2019-11-25 20:19                   ` Kenneth R. Crudup
2019-11-25 22:01                   ` Rafael J. Wysocki
2019-11-25 23:32                     ` Kenneth R. Crudup
2019-11-26  8:50                       ` Rafael J. Wysocki
2019-11-26 16:12                         ` Rafael J. Wysocki
2019-11-26 16:15                         ` Kenneth R. Crudup
2019-11-26 16:27                           ` Rafael J. Wysocki
2019-11-26 16:35                             ` Kenneth R. Crudup
2019-11-26 18:48                               ` Rafael J. Wysocki
2019-11-26 19:03                                 ` Kenneth R. Crudup
2019-11-26 19:09                                   ` Rafael J. Wysocki
2019-11-26 19:13                                     ` Kenneth R. Crudup
2019-11-26 19:45                                     ` Kenneth R. Crudup
2019-11-26 23:56                                     ` Kenneth R. Crudup
2019-11-27  2:35                                       ` Kenneth R. Crudup
2019-11-27  8:31                                         ` Rafael J. Wysocki
2019-11-27 22:30                                           ` Kenneth R. Crudup
2019-11-28 16:25                                             ` Rafael J. Wysocki
2019-11-25 21:47                 ` Rafael J. Wysocki
2019-11-25 16:21             ` Kenneth R. Crudup
2019-11-25 21:46               ` Rafael J. Wysocki
2019-11-25 23:02                 ` Kenneth R. Crudup
2019-11-26  8:53                   ` Rafael J. Wysocki
2019-11-25  5:50         ` Kenneth R. Crudup
2019-11-25  7:17           ` Kenneth R. Crudup
2019-11-22 17:29   ` Kenneth R. Crudup

Linux-PM Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-pm/0 linux-pm/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-pm linux-pm/ https://lore.kernel.org/linux-pm \
		linux-pm@vger.kernel.org
	public-inbox-index linux-pm

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-pm


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git