All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC] PM: suspend: Upstreaming wakeup reason capture support
@ 2022-01-10 18:49 Kelly Rossmoyer
  2022-01-24 17:37 ` John Stultz
  2022-01-27 19:54 ` Rafael J. Wysocki
  0 siblings, 2 replies; 17+ messages in thread
From: Kelly Rossmoyer @ 2022-01-10 18:49 UTC (permalink / raw)
  To: Rafael J. Wysocki, Pavel Machek, Len Brown, Greg Kroah-Hartman,
	Lee Jones, Vijay Nayak
  Cc: linux-pm, linux-kernel

# Introduction

To aid optimization, troubleshooting, and attribution of battery life, the
Android kernel currently includes a set of patches which provide enhanced
visibility into kernel suspend/resume/abort behaviors.  The capabilities
and implementation of this feature have evolved significantly since an
unsuccessful attempt to upstream the original code
(https://lkml.org/lkml/2014/3/10/716), and we would like to (re)start a
conversation about upstreaming, starting with the central question: is
there support for upstreaming this set of features?

# Motivation

Of the many factors influencing battery life on Linux-powered mobile
devices, kernel suspend tends to be amongst the most impactful.  Maximizing
time spent in suspend and minimizing the frequency of net-negative suspend
cycles are both important contributors to battery life optimization.  But
enabling that optimization - and troubleshooting when things go wrong -
requires more observability of suspend/resume/abort behavior than Linux
currently provides.  While mechanisms like `/sys/power/pm_wakeup_irq` and
wakeup_source stats are useful, they are incomplete and scattered.  The
Android kernel wakeup reason patches implement significant improvements in
that area.

# Features

As of today, the active set of patches surface the following
suspend-related data:

* wakeup IRQs, including:
   * multiple IRQs if more than one is pending during resume flow
   * unmapped HW IRQs (wakeup-capable in HW) that should not be
     occurring
   * misconfigured IRQs (e.g. both enable_irq_wake() and
     IRQF_NO_SUSPEND)
   * threaded IRQs (not just the parent chip's IRQ)

* non-IRQ wakeups, including:
   * wakeups caused by an IRQ that was consumed by lower-level SW
   * wakeups from SOC architecture that don't manifest as IRQs

* abort reasons, including:
   * wakeup_source activity
   * failure to freeze userspace
   * failure to suspend devices
   * failed syscore_suspend callback

* durations from the most recent cycle, including:
   * time spent doing suspend/resume work
   * time spent in suspend

In addition to battery life optimization and troubleshooting, some of these
capabilities also lay the groundwork for efforts around improving
attribution of wakeups/aborts (e.g. to specific processes, device features,
external devices, etc).

# Shortcomings

While the core implementation (see below) is relatively straightforward and
localized, calls into that core are somewhat widely spread in order to
capture the breadth of events of interest.  The pervasiveness of those
hooks is clearly an area where improvement would be beneficial, especially
if a cleaner solution preserved equivalent capabilities.

# Existing Code

As a reference for how Android currently implements the core code for these
features (which would need a bit of work before submission even if all
features were included), see the following link:

https://android.googlesource.com/kernel/common/+/refs/heads/android-mainline/kernel/power/wakeup_reason.c


--

Kelly Rossmoyer | Software Engineer | krossmo@google.com

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC] PM: suspend: Upstreaming wakeup reason capture support
  2022-01-10 18:49 [RFC] PM: suspend: Upstreaming wakeup reason capture support Kelly Rossmoyer
@ 2022-01-24 17:37 ` John Stultz
  2022-01-26  5:09   ` Zichar Zhang
  2022-01-27 19:54 ` Rafael J. Wysocki
  1 sibling, 1 reply; 17+ messages in thread
From: John Stultz @ 2022-01-24 17:37 UTC (permalink / raw)
  To: Kelly Rossmoyer, Zichar Zhang
  Cc: Rafael J. Wysocki, Pavel Machek, Len Brown, Greg Kroah-Hartman,
	Lee Jones, Vijay Nayak, linux-pm, linux-kernel, Sumit Semwal

On Tue, Jan 11, 2022 at 5:06 AM Kelly Rossmoyer <krossmo@google.com> wrote:
>
> # Introduction
>
> To aid optimization, troubleshooting, and attribution of battery life, the
> Android kernel currently includes a set of patches which provide enhanced
> visibility into kernel suspend/resume/abort behaviors.  The capabilities
> and implementation of this feature have evolved significantly since an
> unsuccessful attempt to upstream the original code
> (https://lkml.org/lkml/2014/3/10/716), and we would like to (re)start a
> conversation about upstreaming, starting with the central question: is
> there support for upstreaming this set of features?
>
> # Motivation
>
> Of the many factors influencing battery life on Linux-powered mobile
> devices, kernel suspend tends to be amongst the most impactful.  Maximizing
> time spent in suspend and minimizing the frequency of net-negative suspend
> cycles are both important contributors to battery life optimization.  But
> enabling that optimization - and troubleshooting when things go wrong -
> requires more observability of suspend/resume/abort behavior than Linux
> currently provides.  While mechanisms like `/sys/power/pm_wakeup_irq` and
> wakeup_source stats are useful, they are incomplete and scattered.  The
> Android kernel wakeup reason patches implement significant improvements in
> that area.
>
> # Features
>
> As of today, the active set of patches surface the following
> suspend-related data:
>
> * wakeup IRQs, including:
>    * multiple IRQs if more than one is pending during resume flow
>    * unmapped HW IRQs (wakeup-capable in HW) that should not be
>      occurring
>    * misconfigured IRQs (e.g. both enable_irq_wake() and
>      IRQF_NO_SUSPEND)
>    * threaded IRQs (not just the parent chip's IRQ)
>
> * non-IRQ wakeups, including:
>    * wakeups caused by an IRQ that was consumed by lower-level SW
>    * wakeups from SOC architecture that don't manifest as IRQs
>
> * abort reasons, including:
>    * wakeup_source activity
>    * failure to freeze userspace
>    * failure to suspend devices
>    * failed syscore_suspend callback
>
> * durations from the most recent cycle, including:
>    * time spent doing suspend/resume work
>    * time spent in suspend
>
> In addition to battery life optimization and troubleshooting, some of these
> capabilities also lay the groundwork for efforts around improving
> attribution of wakeups/aborts (e.g. to specific processes, device features,
> external devices, etc).
>
> # Shortcomings
>
> While the core implementation (see below) is relatively straightforward and
> localized, calls into that core are somewhat widely spread in order to
> capture the breadth of events of interest.  The pervasiveness of those
> hooks is clearly an area where improvement would be beneficial, especially
> if a cleaner solution preserved equivalent capabilities.
>
> # Existing Code
>
> As a reference for how Android currently implements the core code for these
> features (which would need a bit of work before submission even if all
> features were included), see the following link:
>
> https://android.googlesource.com/kernel/common/+/refs/heads/android-mainline/kernel/power/wakeup_reason.c
>

Hey Kelly!
  So Zichar (added to the thread here) has been working for a little
while on his own approach to upstream a simplified version of the
wakeup_reason functionality. He's just gotten it to a place where it
can be shared, so I wanted to pull him in so he could reply with his
proposal.

thanks
-john

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC] PM: suspend: Upstreaming wakeup reason capture support
  2022-01-24 17:37 ` John Stultz
@ 2022-01-26  5:09   ` Zichar Zhang
  2022-01-28  4:55     ` [PATCH 1/1] [RFC] wakeup_reason: Add infrastructure to log and report why the system resumed from suspend Zichar Zhang
  2022-01-29  7:52     ` [RFC] PM: suspend: Upstreaming wakeup reason capture support Kelly Rossmoyer
  0 siblings, 2 replies; 17+ messages in thread
From: Zichar Zhang @ 2022-01-26  5:09 UTC (permalink / raw)
  To: John Stultz
  Cc: Kelly Rossmoyer, Rafael J. Wysocki, Pavel Machek, Len Brown,
	Greg Kroah-Hartman, Lee Jones, Vijay Nayak, linux-pm,
	linux-kernel, Sumit Semwal

hello John,  Kelly!

I'm working on upstream "wakeup_reason" patch.
But I found It's a little bit hard to do that, cause it is too heavy.

The android patch put "wakeup_reason",  suspend/resume process and
interrupt subsystem together make it hard to upstream all these at once. I
think maybe we can cooperate with each other, and do it step by step.
And I make a simplified "wakeup_reason" patch as the start.

It just report the  normal IRQs which cause system wake up from suspend
and non-device "wakeups" which use kernel interface "wakeup source". It
can cover most of the situations.

Here I placed the patch, so that people can review it:
https://android-review.googlesource.com/c/kernel/common/+/1958188

Here is the differences from the requirement:

>> As of today, the active set of patches surface the following
>> suspend-related data:
>>
>> * wakeup IRQs, including:
>>    * multiple IRQs if more than one is pending during resume flow
>>    * unmapped HW IRQs (wakeup-capable in HW) that should not be
>>      occurring
>>    * misconfigured IRQs (e.g. both enable_irq_wake() and
>>      IRQF_NO_SUSPEND)
>>    * threaded IRQs (not just the parent chip's IRQ)

    -- If we do these things, the additional codes should be added in
    interrupt subsystem or some interrupt controller drivers. These
    codes are no relationship with the interrupt subsystem or the
    interrupt controllers. And we can't distinguish these IRQs from
    "non-suspend" environment, even the "Android patch" can't do that.
    So it will make the things complicated.
    -- If these IRQs do hanpen, the code in this commit will catch
    them as "unknown wakeup reason" and suggest user to check the
    kernel log for more information.
    -- I think We should cooperate with interrupt subsystem to log
    these "abnormal" IRQs. And it could be several additional
    commits to accomplish this work together then.

>> * abort reasons, including:
>>    * wakeup_source activity
>>    * failure to freeze userspace
>>    * failure to suspend devices
>>    * failed syscore_suspend callback

    -- These codes are "intrusive" to kernel (suspend/resume).
    -- These "errors" are already printed in kernel log, if we add
    these log to "wakeup_reason" either, it will cause double
    "log string" in section ".data", which just waste of the memory.
    -- As mentioned before, if these "abort action" happened, you
    can catch string "unknown wakeup reason", and check the kernel
    log then.

>> * durations from the most recent cycle, including:
>>    * time spent doing suspend/resume work
>>    * time spent in suspend

    -- Just separate these from "wakeup reason".
    It should be done in another commit.

The patch is not complete, these is the next steps:
1. add interface to show time spend in suspend/resume work.
(after this, I think it could be works and replace the android patch)
2. we need solve the "unmapped HW IRQs", "misconfigured IRQs" and
"threaded IRQs" problem.
(this is the hardest one)
3. add a string report interface for the "wakeup reason" not belongs to
any thing above.
(kernel actually didn't know the hardware wake up reason. The IRQs just
one of the reason for wake up. So it just report a "wakeup reason" from
interrupt subsystem. It just a coincidence that most hardware "wakeup
reason" is also the interupt signal.  Even the "interrupt" and "wake up"
signal are separated from each other in GIC700.
So give them a chance to report the their "wakeup reason".)

This patch can be tested in Android using the following change to AOSP:
 https://android-review.googlesource.com/c/platform/system/hardware/interfaces/+/1958666

And there is a stress test code for the interfaces in kernel:
 https://android-review.googlesource.com/c/kernel/common/+/1958189

Best
zichar


On Tue, 25 Jan 2022 at 01:38, John Stultz <john.stultz@linaro.org> wrote:
>
> On Tue, Jan 11, 2022 at 5:06 AM Kelly Rossmoyer <krossmo@google.com> wrote:
> >
> > # Introduction
> >
> > To aid optimization, troubleshooting, and attribution of battery life, the
> > Android kernel currently includes a set of patches which provide enhanced
> > visibility into kernel suspend/resume/abort behaviors.  The capabilities
> > and implementation of this feature have evolved significantly since an
> > unsuccessful attempt to upstream the original code
> > (https://lkml.org/lkml/2014/3/10/716), and we would like to (re)start a
> > conversation about upstreaming, starting with the central question: is
> > there support for upstreaming this set of features?
> >
> > # Motivation
> >
> > Of the many factors influencing battery life on Linux-powered mobile
> > devices, kernel suspend tends to be amongst the most impactful.  Maximizing
> > time spent in suspend and minimizing the frequency of net-negative suspend
> > cycles are both important contributors to battery life optimization.  But
> > enabling that optimization - and troubleshooting when things go wrong -
> > requires more observability of suspend/resume/abort behavior than Linux
> > currently provides.  While mechanisms like `/sys/power/pm_wakeup_irq` and
> > wakeup_source stats are useful, they are incomplete and scattered.  The
> > Android kernel wakeup reason patches implement significant improvements in
> > that area.
> >
> > # Features
> >
> > As of today, the active set of patches surface the following
> > suspend-related data:
> >
> > * wakeup IRQs, including:
> >    * multiple IRQs if more than one is pending during resume flow
> >    * unmapped HW IRQs (wakeup-capable in HW) that should not be
> >      occurring
> >    * misconfigured IRQs (e.g. both enable_irq_wake() and
> >      IRQF_NO_SUSPEND)
> >    * threaded IRQs (not just the parent chip's IRQ)
> >
> > * non-IRQ wakeups, including:
> >    * wakeups caused by an IRQ that was consumed by lower-level SW
> >    * wakeups from SOC architecture that don't manifest as IRQs
> >
> > * abort reasons, including:
> >    * wakeup_source activity
> >    * failure to freeze userspace
> >    * failure to suspend devices
> >    * failed syscore_suspend callback
> >
> > * durations from the most recent cycle, including:
> >    * time spent doing suspend/resume work
> >    * time spent in suspend
> >
> > In addition to battery life optimization and troubleshooting, some of these
> > capabilities also lay the groundwork for efforts around improving
> > attribution of wakeups/aborts (e.g. to specific processes, device features,
> > external devices, etc).
> >
> > # Shortcomings
> >
> > While the core implementation (see below) is relatively straightforward and
> > localized, calls into that core are somewhat widely spread in order to
> > capture the breadth of events of interest.  The pervasiveness of those
> > hooks is clearly an area where improvement would be beneficial, especially
> > if a cleaner solution preserved equivalent capabilities.
> >
> > # Existing Code
> >
> > As a reference for how Android currently implements the core code for these
> > features (which would need a bit of work before submission even if all
> > features were included), see the following link:
> >
> > https://android.googlesource.com/kernel/common/+/refs/heads/android-mainline/kernel/power/wakeup_reason.c
> >
>
> Hey Kelly!
>   So Zichar (added to the thread here) has been working for a little
> while on his own approach to upstream a simplified version of the
> wakeup_reason functionality. He's just gotten it to a place where it
> can be shared, so I wanted to pull him in so he could reply with his
> proposal.
>
> thanks
> -john

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC] PM: suspend: Upstreaming wakeup reason capture support
  2022-01-10 18:49 [RFC] PM: suspend: Upstreaming wakeup reason capture support Kelly Rossmoyer
  2022-01-24 17:37 ` John Stultz
@ 2022-01-27 19:54 ` Rafael J. Wysocki
  2022-01-27 20:10   ` Rafael J. Wysocki
  1 sibling, 1 reply; 17+ messages in thread
From: Rafael J. Wysocki @ 2022-01-27 19:54 UTC (permalink / raw)
  To: Kelly Rossmoyer
  Cc: Rafael J. Wysocki, Pavel Machek, Len Brown, Greg Kroah-Hartman,
	Lee Jones, Vijay Nayak, Linux PM, Linux Kernel Mailing List

On Mon, Jan 10, 2022 at 7:49 PM Kelly Rossmoyer <krossmo@google.com> wrote:
>
> # Introduction
>
> To aid optimization, troubleshooting, and attribution of battery life, the
> Android kernel currently includes a set of patches which provide enhanced
> visibility into kernel suspend/resume/abort behaviors.  The capabilities
> and implementation of this feature have evolved significantly since an
> unsuccessful attempt to upstream the original code
> (https://lkml.org/lkml/2014/3/10/716), and we would like to (re)start a
> conversation about upstreaming, starting with the central question: is
> there support for upstreaming this set of features?
>
> # Motivation
>
> Of the many factors influencing battery life on Linux-powered mobile
> devices, kernel suspend tends to be amongst the most impactful.  Maximizing
> time spent in suspend and minimizing the frequency of net-negative suspend
> cycles are both important contributors to battery life optimization.  But
> enabling that optimization - and troubleshooting when things go wrong -
> requires more observability of suspend/resume/abort behavior than Linux
> currently provides.  While mechanisms like `/sys/power/pm_wakeup_irq` and
> wakeup_source stats are useful, they are incomplete and scattered.  The
> Android kernel wakeup reason patches implement significant improvements in
> that area.
>
> # Features
>
> As of today, the active set of patches surface the following
> suspend-related data:
>
> * wakeup IRQs, including:
>    * multiple IRQs if more than one is pending during resume flow
>    * unmapped HW IRQs (wakeup-capable in HW) that should not be
>      occurring
>    * misconfigured IRQs (e.g. both enable_irq_wake() and
>      IRQF_NO_SUSPEND)
>    * threaded IRQs (not just the parent chip's IRQ)
>
> * non-IRQ wakeups, including:
>    * wakeups caused by an IRQ that was consumed by lower-level SW
>    * wakeups from SOC architecture that don't manifest as IRQs
>
> * abort reasons, including:
>    * wakeup_source activity
>    * failure to freeze userspace
>    * failure to suspend devices
>    * failed syscore_suspend callback
>
> * durations from the most recent cycle, including:
>    * time spent doing suspend/resume work
>    * time spent in suspend
>
> In addition to battery life optimization and troubleshooting, some of these
> capabilities also lay the groundwork for efforts around improving
> attribution of wakeups/aborts (e.g. to specific processes, device features,
> external devices, etc).
>
> # Shortcomings
>
> While the core implementation (see below) is relatively straightforward and
> localized, calls into that core are somewhat widely spread in order to
> capture the breadth of events of interest.  The pervasiveness of those
> hooks is clearly an area where improvement would be beneficial, especially
> if a cleaner solution preserved equivalent capabilities.
>
> # Existing Code
>
> As a reference for how Android currently implements the core code for these
> features (which would need a bit of work before submission even if all
> features were included), see the following link:
>
> https://android.googlesource.com/kernel/common/+/refs/heads/android-mainline/kernel/power/wakeup_reason.c

So as Zichar said, this is quite heavy-weight.

I'm not fundamentally against adding more infrastructure to help
identify issues related to system suspend, but there needs to be a
clear benefit associated with any change in this direction.  Also
adding significant overhead just for this purpose alone is rather out
of the question.

I would advise you to follow the suggestion to split the work into
smaller pieces and submit them one at a time, possibly starting with
the ones bringing the most significant benefits to the table.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC] PM: suspend: Upstreaming wakeup reason capture support
  2022-01-27 19:54 ` Rafael J. Wysocki
@ 2022-01-27 20:10   ` Rafael J. Wysocki
  2022-01-29  8:26     ` Kelly Rossmoyer
  0 siblings, 1 reply; 17+ messages in thread
From: Rafael J. Wysocki @ 2022-01-27 20:10 UTC (permalink / raw)
  To: Kelly Rossmoyer
  Cc: Rafael J. Wysocki, Pavel Machek, Len Brown, Greg Kroah-Hartman,
	Lee Jones, Vijay Nayak, Linux PM, Linux Kernel Mailing List

On Thu, Jan 27, 2022 at 8:54 PM Rafael J. Wysocki <rafael@kernel.org> wrote:
>
> On Mon, Jan 10, 2022 at 7:49 PM Kelly Rossmoyer <krossmo@google.com> wrote:
> >
> > # Introduction
> >
> > To aid optimization, troubleshooting, and attribution of battery life, the
> > Android kernel currently includes a set of patches which provide enhanced
> > visibility into kernel suspend/resume/abort behaviors.  The capabilities
> > and implementation of this feature have evolved significantly since an
> > unsuccessful attempt to upstream the original code
> > (https://lkml.org/lkml/2014/3/10/716), and we would like to (re)start a
> > conversation about upstreaming, starting with the central question: is
> > there support for upstreaming this set of features?
> >
> > # Motivation
> >
> > Of the many factors influencing battery life on Linux-powered mobile
> > devices, kernel suspend tends to be amongst the most impactful.  Maximizing
> > time spent in suspend and minimizing the frequency of net-negative suspend
> > cycles are both important contributors to battery life optimization.  But
> > enabling that optimization - and troubleshooting when things go wrong -
> > requires more observability of suspend/resume/abort behavior than Linux
> > currently provides.  While mechanisms like `/sys/power/pm_wakeup_irq` and
> > wakeup_source stats are useful, they are incomplete and scattered.  The
> > Android kernel wakeup reason patches implement significant improvements in
> > that area.
> >
> > # Features
> >
> > As of today, the active set of patches surface the following
> > suspend-related data:
> >
> > * wakeup IRQs, including:
> >    * multiple IRQs if more than one is pending during resume flow
> >    * unmapped HW IRQs (wakeup-capable in HW) that should not be
> >      occurring
> >    * misconfigured IRQs (e.g. both enable_irq_wake() and
> >      IRQF_NO_SUSPEND)
> >    * threaded IRQs (not just the parent chip's IRQ)
> >
> > * non-IRQ wakeups, including:
> >    * wakeups caused by an IRQ that was consumed by lower-level SW
> >    * wakeups from SOC architecture that don't manifest as IRQs
> >
> > * abort reasons, including:
> >    * wakeup_source activity
> >    * failure to freeze userspace
> >    * failure to suspend devices
> >    * failed syscore_suspend callback
> >
> > * durations from the most recent cycle, including:
> >    * time spent doing suspend/resume work
> >    * time spent in suspend
> >
> > In addition to battery life optimization and troubleshooting, some of these
> > capabilities also lay the groundwork for efforts around improving
> > attribution of wakeups/aborts (e.g. to specific processes, device features,
> > external devices, etc).
> >
> > # Shortcomings
> >
> > While the core implementation (see below) is relatively straightforward and
> > localized, calls into that core are somewhat widely spread in order to
> > capture the breadth of events of interest.  The pervasiveness of those
> > hooks is clearly an area where improvement would be beneficial, especially
> > if a cleaner solution preserved equivalent capabilities.
> >
> > # Existing Code
> >
> > As a reference for how Android currently implements the core code for these
> > features (which would need a bit of work before submission even if all
> > features were included), see the following link:
> >
> > https://android.googlesource.com/kernel/common/+/refs/heads/android-mainline/kernel/power/wakeup_reason.c
>
> So as Zichar said, this is quite heavy-weight.
>
> I'm not fundamentally against adding more infrastructure to help
> identify issues related to system suspend, but there needs to be a
> clear benefit associated with any change in this direction.

That said, the general idea behind wakeup_source objects is that every
system wakeup event should be recorded in one of them which then can
be used for later analysis.

If there are reasons why this cannot work in general, what are they?

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH 1/1] [RFC] wakeup_reason: Add infrastructure to log and report why the system resumed from suspend.
  2022-01-26  5:09   ` Zichar Zhang
@ 2022-01-28  4:55     ` Zichar Zhang
  2022-01-28  6:56       ` kernel test robot
                         ` (3 more replies)
  2022-01-29  7:52     ` [RFC] PM: suspend: Upstreaming wakeup reason capture support Kelly Rossmoyer
  1 sibling, 4 replies; 17+ messages in thread
From: Zichar Zhang @ 2022-01-28  4:55 UTC (permalink / raw)
  To: zichar.zhang
  Cc: gregkh, john.stultz, krossmo, lee.jones, len.brown, linux-kernel,
	linux-pm, nayakvij, pavel, rafael, sumit.semwal, amit.pundir,
	kernel-team

When optimizing for power usage, it's useful to be able to track and
log each time why the system woke up from suspend. This is useful as
it helps us understand why we might be seeing more wakeups then we
expect. For a while now, Android has carried the "wakeup_reasons"
patch which provides this to allow developers to optimize battery life
on devices.

This patch tries to provide a simplified version of the
Android wakeup_reasons functionality. It tracks both software
wakeup_sources as well as IRQS that brought the device out of suspend,
and exposes these values as a string via /sys/power/last_wakeup_reason.

This differs slightly from the Android patch, in that it doesn't track
the suspend-abort reason logging, the misconfigured or unmapped IRQS,
the threaded IRQS, and time spent in suspend/resume. But it provides
an initial simple step to add a useful portion of the logic.

Here is the requirements from https://lkml.org/lkml/2022/1/10/1134 for
"wakeup_reason" with line head "[Y]" and "[N]" to notify if this commit
meet the requirements or not. And if it is not, the detail reason is
right behind it after "--".

* wakeup IRQs, including:
    [Y]* multiple IRQs if more than one is pending during resume flow
    [N]* unmapped HW IRQs
    [N]* misconfigured IRQs
    [N]* threaded IRQs (not just the parent chip's IRQ)
    -- If we do these things, the additional codes should be added in
    interrupt subsystem or some interrupt controller drivers. These
    codes are no relationship with the interrupt subsystem or the
    interrupt controllers. And we can't distinguish these IRQs from
    "non-suspend" environment, even the "Android patch" can't do that.
    So it will make the things complicated.
    -- If these IRQs do hanpen, the code in this commit will catch
    them as "unknown wakeup reason" and suggest user to check the
    kernel log for more information.
    -- I think We should cooperate with interrupt subsystem to log
    these "abnormal" IRQs. And it could be several additional
    commits to accomplish this work together then.

* non-IRQ wakeups, including:
    [Y]* wakeups caused by an IRQ that was consumed by lower-level SW
    [Y]* wakeups from SOC architecture that don't manifest as IRQs

* abort reasons, including:
    [N]* wakeup_source activity
    [N]* failure to freeze userspace
    [N]* failure to suspend devices
    [N]* failed syscore_suspend callback
    -- These codes are "intrusive" to kernel (suspend/resume).
    -- These "errors" are already printed in kernel log, if we add
    these log to "wakeup_reason" either, it will cause double
    "log string" in section ".data", which just waste of the memory.
    -- As mentioned before, if these "abort action" happened, you
    can catch string "unknown wakeup reason", and check the kernel
    log then.

* durations from the most recent cycle, including:
    [N]* time spent doing suspend/resume work
    [N]* time spent in suspend
    -- Just separate these from "wakeup reason". It should be done
    in another commit.

This patch can be tested in Android using the following change to AOSP:
  https://android-review.googlesource.com/c/platform/system/hardware/interfaces/+/1958666

And there is a stress test code for the interfaces in kernel:
  https://android-review.googlesource.com/c/kernel/common/+/1958189

Any feedback or thoughts would be welcome!

Signed-off-by: Zichar Zhang <zichar.zhang@linaro.org>
Change-Id: Id70f3cbec15f24ea999d7f643e9589be9c047f2b
---
 Documentation/ABI/testing/sysfs-power |   9 ++
 drivers/base/power/wakeup.c           |   6 +
 include/linux/wakeup_reason.h         |  35 +++++
 kernel/power/Makefile                 |   1 +
 kernel/power/main.c                   |  11 ++
 kernel/power/wakeup_reason.c          | 183 ++++++++++++++++++++++++++
 6 files changed, 245 insertions(+)
 create mode 100644 include/linux/wakeup_reason.h
 create mode 100644 kernel/power/wakeup_reason.c

diff --git a/Documentation/ABI/testing/sysfs-power b/Documentation/ABI/testing/sysfs-power
index 90ec4987074b..e79d7a49a24e 100644
--- a/Documentation/ABI/testing/sysfs-power
+++ b/Documentation/ABI/testing/sysfs-power
@@ -182,6 +182,15 @@ Description:
 		to a sleep state if any wakeup events are reported after the
 		write has returned.
 
+What:		/sys/power/last_wakeup_reason
+Date:		Jan 2022
+Contact:	Zichar Zhang <zichar.zhang@linaro.org>
+Description:
+		The /sys/power/last_wakeup_reason file shows the last reason
+		causing system "wake up" from suspend state. It could be an
+		interrupt signal, a kernel "wakeup_source" or just some other
+		reason logged into this file, and shows as a string.
+
 What:		/sys/power/reserved_size
 Date:		May 2011
 Contact:	Rafael J. Wysocki <rjw@rjwysocki.net>
diff --git a/drivers/base/power/wakeup.c b/drivers/base/power/wakeup.c
index 99bda0da23a8..a91ee1b8c0df 100644
--- a/drivers/base/power/wakeup.c
+++ b/drivers/base/power/wakeup.c
@@ -15,6 +15,7 @@
 #include <linux/seq_file.h>
 #include <linux/debugfs.h>
 #include <linux/pm_wakeirq.h>
+#include <linux/wakeup_reason.h>
 #include <trace/events/power.h>
 
 #include "power.h"
@@ -924,6 +925,7 @@ bool pm_wakeup_pending(void)
 
 	if (ret) {
 		pm_pr_dbg("Wakeup pending, aborting suspend\n");
+		log_ws_wakeup_reason();
 		pm_print_active_wakeup_sources();
 	}
 
@@ -947,11 +949,15 @@ void pm_wakeup_clear(bool reset)
 	pm_wakeup_irq = 0;
 	if (reset)
 		atomic_set(&pm_abort_suspend, 0);
+
+	clear_wakeup_reason();
 }
 
 void pm_system_irq_wakeup(unsigned int irq_number)
 {
 	if (pm_wakeup_irq == 0) {
+		log_irq_wakeup_reason(irq_number);
+
 		pm_wakeup_irq = irq_number;
 		pm_system_wakeup();
 	}
diff --git a/include/linux/wakeup_reason.h b/include/linux/wakeup_reason.h
new file mode 100644
index 000000000000..8d6e7a0e0bad
--- /dev/null
+++ b/include/linux/wakeup_reason.h
@@ -0,0 +1,35 @@
+/*
+ * include/linux/wakeup_reason.h
+ *
+ * Logs the reason which caused the kernel to resume
+ * from the suspend mode.
+ *
+ * Copyright (C) 2021 Linaro, Inc.
+ * This software is licensed under the terms of the GNU General Public
+ * License version 2, as published by the Free Software Foundation, and
+ * may be copied, distributed, and modified under those terms.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#ifndef _LINUX_WAKEUP_REASON_H
+#define _LINUX_WAKEUP_REASON_H
+
+#define MAX_WAKEUP_REASON_STR_LEN 256
+
+#ifdef CONFIG_SUSPEND
+ssize_t log_ws_wakeup_reason(void);
+ssize_t log_irq_wakeup_reason(unsigned int irq_number);
+void clear_wakeup_reason(void);
+ssize_t last_wakeup_reason_get(char *buf, ssize_t max);
+#else
+ssize_t log_ws_wakeup_reason(void) { }
+ssize_t log_irq_wakeup_reason(unsigned int irq_number) { }
+void clear_wakeup_reason(void) { }
+ssize_t last_wakeup_reason_get(char *buf, ssize_t max) { }
+#endif
+
+#endif /* _LINUX_WAKEUP_REASON_H */
diff --git a/kernel/power/Makefile b/kernel/power/Makefile
index 5899260a8bef..73d77edc6a08 100644
--- a/kernel/power/Makefile
+++ b/kernel/power/Makefile
@@ -14,6 +14,7 @@ obj-$(CONFIG_HIBERNATION)	+= hibernate.o snapshot.o swap.o
 obj-$(CONFIG_HIBERNATION_SNAPSHOT_DEV) += user.o
 obj-$(CONFIG_PM_AUTOSLEEP)	+= autosleep.o
 obj-$(CONFIG_PM_WAKELOCKS)	+= wakelock.o
+obj-$(CONFIG_PM_SLEEP)		+= wakeup_reason.o
 
 obj-$(CONFIG_MAGIC_SYSRQ)	+= poweroff.o
 
diff --git a/kernel/power/main.c b/kernel/power/main.c
index 44169f3081fd..859af097813c 100644
--- a/kernel/power/main.c
+++ b/kernel/power/main.c
@@ -16,6 +16,7 @@
 #include <linux/suspend.h>
 #include <linux/syscalls.h>
 #include <linux/pm_runtime.h>
+#include <linux/wakeup_reason.h>
 
 #include "power.h"
 
@@ -739,6 +740,15 @@ static ssize_t wakeup_count_store(struct kobject *kobj,
 
 power_attr(wakeup_count);
 
+static ssize_t last_wakeup_reason_show(struct kobject *kobj,
+				struct kobj_attribute *attr,
+				char *buf)
+{
+	return last_wakeup_reason_get(buf, PAGE_SIZE);
+}
+
+power_attr_ro(last_wakeup_reason);
+
 #ifdef CONFIG_PM_AUTOSLEEP
 static ssize_t autosleep_show(struct kobject *kobj,
 			      struct kobj_attribute *attr,
@@ -892,6 +902,7 @@ static struct attribute * g[] = {
 #ifdef CONFIG_PM_SLEEP
 	&pm_async_attr.attr,
 	&wakeup_count_attr.attr,
+	&last_wakeup_reason_attr.attr,
 #ifdef CONFIG_SUSPEND
 	&mem_sleep_attr.attr,
 	&sync_on_suspend_attr.attr,
diff --git a/kernel/power/wakeup_reason.c b/kernel/power/wakeup_reason.c
new file mode 100644
index 000000000000..38d5f5e4d373
--- /dev/null
+++ b/kernel/power/wakeup_reason.c
@@ -0,0 +1,183 @@
+/*
+ * driver/base/power/wakeup_reason.c
+ *
+ * Logs the reasons which caused the kernel to resume from
+ * the suspend mode.
+ *
+ * Copyright (C) 2021 Linaro, Inc.
+ * This software is licensed under the terms of the GNU General Public
+ * License version 2, as published by the Free Software Foundation, and
+ * may be copied, distributed, and modified under those terms.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#include <linux/kernel.h>
+#include <linux/irq.h>
+#include <linux/interrupt.h>
+#include <linux/io.h>
+#include <linux/init.h>
+#include <linux/spinlock.h>
+#include <linux/notifier.h>
+#include <linux/suspend.h>
+#include <linux/wakeup_reason.h>
+
+static DEFINE_SPINLOCK(wakeup_reason_lock);
+
+static bool capture_reasons;
+static char wakeup_reason_str[MAX_WAKEUP_REASON_STR_LEN];
+
+ssize_t log_ws_wakeup_reason(void)
+{
+	struct wakeup_source *ws, *last_active_ws = NULL;
+	int idx, max, len = 0;
+	bool active = false;
+	unsigned long flags;
+
+	spin_lock_irqsave(&wakeup_reason_lock, flags);
+
+	if (!capture_reasons) {
+		goto out;
+	}
+
+	idx = wakeup_sources_read_lock();
+	max = MAX_WAKEUP_REASON_STR_LEN;
+	for_each_wakeup_source(ws) {
+		if (ws->active && len < max) {
+			if (!active)
+				len += scnprintf(wakeup_reason_str, max,
+						"Pending Wakeup Sources: ");
+			len += scnprintf(wakeup_reason_str + len, max - len,
+				"%s ", ws->name);
+			active = true;
+		} else if (!active &&
+			   (!last_active_ws ||
+			    ktime_to_ns(ws->last_time) >
+			    ktime_to_ns(last_active_ws->last_time))) {
+			last_active_ws = ws;
+		}
+	}
+	if (!active && last_active_ws) {
+		len = scnprintf(wakeup_reason_str, max,
+				"Last active Wakeup Source: %s",
+				last_active_ws->name);
+	}
+	len += scnprintf(wakeup_reason_str + len, max - len, "\n");
+	wakeup_sources_read_unlock(idx);
+
+out:
+	spin_unlock_irqrestore(&wakeup_reason_lock, flags);
+
+	return len;
+}
+EXPORT_SYMBOL(log_ws_wakeup_reason);
+
+ssize_t log_irq_wakeup_reason(unsigned int irq_number)
+{
+	int len = 0;
+	struct irq_desc *desc;
+	const char *name = "null";
+	unsigned long flags;
+
+	desc = irq_to_desc(irq_number);
+	if (desc == NULL)
+		name = "stray irq";
+	else if (desc->action && desc->action->name)
+		name = desc->action->name;
+
+	spin_lock_irqsave(&wakeup_reason_lock, flags);
+
+	len = strnlen(wakeup_reason_str, MAX_WAKEUP_REASON_STR_LEN);
+	len += scnprintf(wakeup_reason_str + len,
+			MAX_WAKEUP_REASON_STR_LEN - len,
+			"%d %s\n", irq_number, name);
+
+	spin_unlock_irqrestore(&wakeup_reason_lock, flags);
+
+	return len;
+}
+EXPORT_SYMBOL(log_irq_wakeup_reason);
+
+void clear_wakeup_reason(void)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&wakeup_reason_lock, flags);
+
+	memset(wakeup_reason_str, 0, sizeof(wakeup_reason_str));
+
+	spin_unlock_irqrestore(&wakeup_reason_lock, flags);
+}
+EXPORT_SYMBOL(clear_wakeup_reason);
+
+ssize_t last_wakeup_reason_get(char *buf, ssize_t max)
+{
+	ssize_t len, size = 0;
+	unsigned long flags;
+
+	if (!buf) {
+		return 0;
+	}
+
+	spin_lock_irqsave(&wakeup_reason_lock, flags);
+
+	len = strnlen(wakeup_reason_str, MAX_WAKEUP_REASON_STR_LEN);
+	if (len > 0) {
+		size = scnprintf(buf, max, "%s", wakeup_reason_str);
+	} else {
+		size = -ENODATA;
+	}
+
+	spin_unlock_irqrestore(&wakeup_reason_lock, flags);
+
+	return size;
+}
+EXPORT_SYMBOL(last_wakeup_reason_get);
+
+static int wakeup_reason_pm_event(struct notifier_block *notifier,
+		unsigned long pm_event, void *unused)
+{
+	unsigned long flags;
+
+	switch (pm_event) {
+	case PM_SUSPEND_PREPARE:
+		spin_lock_irqsave(&wakeup_reason_lock, flags);
+		capture_reasons = true;
+		spin_unlock_irqrestore(&wakeup_reason_lock, flags);
+
+		clear_wakeup_reason();
+		break;
+	case PM_POST_SUSPEND:
+		spin_lock_irqsave(&wakeup_reason_lock, flags);
+		capture_reasons = false;
+		if (!strnlen(wakeup_reason_str, MAX_WAKEUP_REASON_STR_LEN)) {
+			scnprintf(wakeup_reason_str, MAX_WAKEUP_REASON_STR_LEN,
+				"unknown wakeup reason, please check the kernel log\n");
+		}
+		spin_unlock_irqrestore(&wakeup_reason_lock, flags);
+
+		pr_debug("Resume caused by %s\n", wakeup_reason_str);
+		break;
+	default:
+		break;
+	}
+	return NOTIFY_DONE;
+}
+
+static struct notifier_block wakeup_reason_pm_notifier_block = {
+	.notifier_call = wakeup_reason_pm_event,
+};
+
+static int __init wakeup_reason_init(void)
+{
+	if (register_pm_notifier(&wakeup_reason_pm_notifier_block)) {
+		pr_warn("[%s] failed to register PM notifier\n", __func__);
+		return -EPERM;
+	}
+
+	return 0;
+}
+late_initcall(wakeup_reason_init);
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/1] [RFC] wakeup_reason: Add infrastructure to log and report why the system resumed from suspend.
  2022-01-28  4:55     ` [PATCH 1/1] [RFC] wakeup_reason: Add infrastructure to log and report why the system resumed from suspend Zichar Zhang
@ 2022-01-28  6:56       ` kernel test robot
  2022-01-28  7:01       ` Greg KH
                         ` (2 subsequent siblings)
  3 siblings, 0 replies; 17+ messages in thread
From: kernel test robot @ 2022-01-28  6:56 UTC (permalink / raw)
  To: kbuild-all

[-- Attachment #1: Type: text/plain, Size: 4345 bytes --]

Hi Zichar,

[FYI, it's a private test report for your RFC patch.]
[auto build test WARNING on rafael-pm/linux-next]
[also build test WARNING on linux/master linus/master v5.17-rc1 next-20220127]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Zichar-Zhang/wakeup_reason-Add-infrastructure-to-log-and-report-why-the-system-resumed-from-suspend/20220128-125627
base:   https://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git linux-next
config: riscv-allmodconfig (https://download.01.org/0day-ci/archive/20220128/202201281421.yoAJA4z1-lkp(a)intel.com/config)
compiler: riscv64-linux-gcc (GCC) 11.2.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/0day-ci/linux/commit/b932c9637e3e3a14497ff60caedfc67608d93c13
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Zichar-Zhang/wakeup_reason-Add-infrastructure-to-log-and-report-why-the-system-resumed-from-suspend/20220128-125627
        git checkout b932c9637e3e3a14497ff60caedfc67608d93c13
        # save the config file to linux build tree
        mkdir build_dir
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-11.2.0 make.cross O=build_dir ARCH=riscv SHELL=/bin/bash kernel/

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All warnings (new ones prefixed by >>):

   In file included from kernel/power/main.c:19:
>> include/linux/wakeup_reason.h:29:9: warning: no previous prototype for 'log_ws_wakeup_reason' [-Wmissing-prototypes]
      29 | ssize_t log_ws_wakeup_reason(void) { }
         |         ^~~~~~~~~~~~~~~~~~~~
>> include/linux/wakeup_reason.h:30:9: warning: no previous prototype for 'log_irq_wakeup_reason' [-Wmissing-prototypes]
      30 | ssize_t log_irq_wakeup_reason(unsigned int irq_number) { }
         |         ^~~~~~~~~~~~~~~~~~~~~
>> include/linux/wakeup_reason.h:31:6: warning: no previous prototype for 'clear_wakeup_reason' [-Wmissing-prototypes]
      31 | void clear_wakeup_reason(void) { }
         |      ^~~~~~~~~~~~~~~~~~~
>> include/linux/wakeup_reason.h:32:9: warning: no previous prototype for 'last_wakeup_reason_get' [-Wmissing-prototypes]
      32 | ssize_t last_wakeup_reason_get(char *buf, ssize_t max) { }
         |         ^~~~~~~~~~~~~~~~~~~~~~
   include/linux/wakeup_reason.h: In function 'log_ws_wakeup_reason':
   include/linux/wakeup_reason.h:29:38: error: control reaches end of non-void function [-Werror=return-type]
      29 | ssize_t log_ws_wakeup_reason(void) { }
         |                                      ^
   include/linux/wakeup_reason.h: In function 'log_irq_wakeup_reason':
   include/linux/wakeup_reason.h:30:58: error: control reaches end of non-void function [-Werror=return-type]
      30 | ssize_t log_irq_wakeup_reason(unsigned int irq_number) { }
         |                                                          ^
   include/linux/wakeup_reason.h: In function 'last_wakeup_reason_get':
   include/linux/wakeup_reason.h:32:58: error: control reaches end of non-void function [-Werror=return-type]
      32 | ssize_t last_wakeup_reason_get(char *buf, ssize_t max) { }
         |                                                          ^
   cc1: some warnings being treated as errors


vim +/log_ws_wakeup_reason +29 include/linux/wakeup_reason.h

    22	
    23	#ifdef CONFIG_SUSPEND
    24	ssize_t log_ws_wakeup_reason(void);
    25	ssize_t log_irq_wakeup_reason(unsigned int irq_number);
    26	void clear_wakeup_reason(void);
    27	ssize_t last_wakeup_reason_get(char *buf, ssize_t max);
    28	#else
  > 29	ssize_t log_ws_wakeup_reason(void) { }
  > 30	ssize_t log_irq_wakeup_reason(unsigned int irq_number) { }
  > 31	void clear_wakeup_reason(void) { }
  > 32	ssize_t last_wakeup_reason_get(char *buf, ssize_t max) { }
    33	#endif
    34	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all(a)lists.01.org

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/1] [RFC] wakeup_reason: Add infrastructure to log and report why the system resumed from suspend.
  2022-01-28  4:55     ` [PATCH 1/1] [RFC] wakeup_reason: Add infrastructure to log and report why the system resumed from suspend Zichar Zhang
  2022-01-28  6:56       ` kernel test robot
@ 2022-01-28  7:01       ` Greg KH
  2022-01-28  8:43         ` Zichar Zhang
  2022-01-28  8:45         ` kernel test robot
  2022-01-28  9:32       ` Lee Jones
  3 siblings, 1 reply; 17+ messages in thread
From: Greg KH @ 2022-01-28  7:01 UTC (permalink / raw)
  To: Zichar Zhang
  Cc: john.stultz, krossmo, lee.jones, len.brown, linux-kernel,
	linux-pm, nayakvij, pavel, rafael, sumit.semwal, amit.pundir,
	kernel-team

On Fri, Jan 28, 2022 at 12:55:22PM +0800, Zichar Zhang wrote:
> 
> Signed-off-by: Zichar Zhang <zichar.zhang@linaro.org>
> Change-Id: Id70f3cbec15f24ea999d7f643e9589be9c047f2b

Please always run checkpatch.pl on your submissions so that you do not
get people asking you to run checkpatch.pl on your submissions.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/1] [RFC] wakeup_reason: Add infrastructure to log and report why the system resumed from suspend.
  2022-01-28  7:01       ` Greg KH
@ 2022-01-28  8:43         ` Zichar Zhang
  0 siblings, 0 replies; 17+ messages in thread
From: Zichar Zhang @ 2022-01-28  8:43 UTC (permalink / raw)
  To: Greg KH
  Cc: john.stultz, krossmo, lee.jones, len.brown, linux-kernel,
	linux-pm, nayakvij, pavel, rafael, sumit.semwal, amit.pundir,
	kernel-team

Sorry, I'll do that and resend to this email thread.

On Fri, 28 Jan 2022 at 15:01, Greg KH <gregkh@linuxfoundation.org> wrote:
>
> On Fri, Jan 28, 2022 at 12:55:22PM +0800, Zichar Zhang wrote:
> >
> > Signed-off-by: Zichar Zhang <zichar.zhang@linaro.org>
> > Change-Id: Id70f3cbec15f24ea999d7f643e9589be9c047f2b
>
> Please always run checkpatch.pl on your submissions so that you do not
> get people asking you to run checkpatch.pl on your submissions.
>
> thanks,
>
> greg k-h

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/1] [RFC] wakeup_reason: Add infrastructure to log and report why the system resumed from suspend.
  2022-01-28  4:55     ` [PATCH 1/1] [RFC] wakeup_reason: Add infrastructure to log and report why the system resumed from suspend Zichar Zhang
@ 2022-01-28  8:45         ` kernel test robot
  2022-01-28  7:01       ` Greg KH
                           ` (2 subsequent siblings)
  3 siblings, 0 replies; 17+ messages in thread
From: kernel test robot @ 2022-01-28  8:45 UTC (permalink / raw)
  To: Zichar Zhang; +Cc: llvm, kbuild-all

Hi Zichar,

[FYI, it's a private test report for your RFC patch.]
[auto build test WARNING on rafael-pm/linux-next]
[also build test WARNING on linux/master linus/master v5.17-rc1 next-20220127]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Zichar-Zhang/wakeup_reason-Add-infrastructure-to-log-and-report-why-the-system-resumed-from-suspend/20220128-125627
base:   https://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git linux-next
config: i386-randconfig-a014-20220124 (https://download.01.org/0day-ci/archive/20220128/202201281645.eC7tWb3B-lkp@intel.com/config)
compiler: clang version 14.0.0 (https://github.com/llvm/llvm-project 33b45ee44b1f32ffdbc995e6fec806271b4b3ba4)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/0day-ci/linux/commit/b932c9637e3e3a14497ff60caedfc67608d93c13
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Zichar-Zhang/wakeup_reason-Add-infrastructure-to-log-and-report-why-the-system-resumed-from-suspend/20220128-125627
        git checkout b932c9637e3e3a14497ff60caedfc67608d93c13
        # save the config file to linux build tree
        mkdir build_dir
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=i386 SHELL=/bin/bash kernel/power/

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All warnings (new ones prefixed by >>):

   In file included from kernel/power/main.c:19:
>> include/linux/wakeup_reason.h:29:9: warning: no previous prototype for function 'log_ws_wakeup_reason' [-Wmissing-prototypes]
   ssize_t log_ws_wakeup_reason(void) { }
           ^
   include/linux/wakeup_reason.h:29:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
   ssize_t log_ws_wakeup_reason(void) { }
   ^
   static 
   include/linux/wakeup_reason.h:29:38: error: non-void function does not return a value [-Werror,-Wreturn-type]
   ssize_t log_ws_wakeup_reason(void) { }
                                        ^
>> include/linux/wakeup_reason.h:30:9: warning: no previous prototype for function 'log_irq_wakeup_reason' [-Wmissing-prototypes]
   ssize_t log_irq_wakeup_reason(unsigned int irq_number) { }
           ^
   include/linux/wakeup_reason.h:30:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
   ssize_t log_irq_wakeup_reason(unsigned int irq_number) { }
   ^
   static 
   include/linux/wakeup_reason.h:30:58: error: non-void function does not return a value [-Werror,-Wreturn-type]
   ssize_t log_irq_wakeup_reason(unsigned int irq_number) { }
                                                            ^
>> include/linux/wakeup_reason.h:31:6: warning: no previous prototype for function 'clear_wakeup_reason' [-Wmissing-prototypes]
   void clear_wakeup_reason(void) { }
        ^
   include/linux/wakeup_reason.h:31:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
   void clear_wakeup_reason(void) { }
   ^
   static 
>> include/linux/wakeup_reason.h:32:9: warning: no previous prototype for function 'last_wakeup_reason_get' [-Wmissing-prototypes]
   ssize_t last_wakeup_reason_get(char *buf, ssize_t max) { }
           ^
   include/linux/wakeup_reason.h:32:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
   ssize_t last_wakeup_reason_get(char *buf, ssize_t max) { }
   ^
   static 
   include/linux/wakeup_reason.h:32:58: error: non-void function does not return a value [-Werror,-Wreturn-type]
   ssize_t last_wakeup_reason_get(char *buf, ssize_t max) { }
                                                            ^
   4 warnings and 3 errors generated.


vim +/log_ws_wakeup_reason +29 include/linux/wakeup_reason.h

    22	
    23	#ifdef CONFIG_SUSPEND
    24	ssize_t log_ws_wakeup_reason(void);
    25	ssize_t log_irq_wakeup_reason(unsigned int irq_number);
    26	void clear_wakeup_reason(void);
    27	ssize_t last_wakeup_reason_get(char *buf, ssize_t max);
    28	#else
  > 29	ssize_t log_ws_wakeup_reason(void) { }
  > 30	ssize_t log_irq_wakeup_reason(unsigned int irq_number) { }
  > 31	void clear_wakeup_reason(void) { }
  > 32	ssize_t last_wakeup_reason_get(char *buf, ssize_t max) { }
    33	#endif
    34	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/1] [RFC] wakeup_reason: Add infrastructure to log and report why the system resumed from suspend.
@ 2022-01-28  8:45         ` kernel test robot
  0 siblings, 0 replies; 17+ messages in thread
From: kernel test robot @ 2022-01-28  8:45 UTC (permalink / raw)
  To: kbuild-all

[-- Attachment #1: Type: text/plain, Size: 4897 bytes --]

Hi Zichar,

[FYI, it's a private test report for your RFC patch.]
[auto build test WARNING on rafael-pm/linux-next]
[also build test WARNING on linux/master linus/master v5.17-rc1 next-20220127]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Zichar-Zhang/wakeup_reason-Add-infrastructure-to-log-and-report-why-the-system-resumed-from-suspend/20220128-125627
base:   https://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git linux-next
config: i386-randconfig-a014-20220124 (https://download.01.org/0day-ci/archive/20220128/202201281645.eC7tWb3B-lkp(a)intel.com/config)
compiler: clang version 14.0.0 (https://github.com/llvm/llvm-project 33b45ee44b1f32ffdbc995e6fec806271b4b3ba4)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/0day-ci/linux/commit/b932c9637e3e3a14497ff60caedfc67608d93c13
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Zichar-Zhang/wakeup_reason-Add-infrastructure-to-log-and-report-why-the-system-resumed-from-suspend/20220128-125627
        git checkout b932c9637e3e3a14497ff60caedfc67608d93c13
        # save the config file to linux build tree
        mkdir build_dir
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=i386 SHELL=/bin/bash kernel/power/

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All warnings (new ones prefixed by >>):

   In file included from kernel/power/main.c:19:
>> include/linux/wakeup_reason.h:29:9: warning: no previous prototype for function 'log_ws_wakeup_reason' [-Wmissing-prototypes]
   ssize_t log_ws_wakeup_reason(void) { }
           ^
   include/linux/wakeup_reason.h:29:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
   ssize_t log_ws_wakeup_reason(void) { }
   ^
   static 
   include/linux/wakeup_reason.h:29:38: error: non-void function does not return a value [-Werror,-Wreturn-type]
   ssize_t log_ws_wakeup_reason(void) { }
                                        ^
>> include/linux/wakeup_reason.h:30:9: warning: no previous prototype for function 'log_irq_wakeup_reason' [-Wmissing-prototypes]
   ssize_t log_irq_wakeup_reason(unsigned int irq_number) { }
           ^
   include/linux/wakeup_reason.h:30:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
   ssize_t log_irq_wakeup_reason(unsigned int irq_number) { }
   ^
   static 
   include/linux/wakeup_reason.h:30:58: error: non-void function does not return a value [-Werror,-Wreturn-type]
   ssize_t log_irq_wakeup_reason(unsigned int irq_number) { }
                                                            ^
>> include/linux/wakeup_reason.h:31:6: warning: no previous prototype for function 'clear_wakeup_reason' [-Wmissing-prototypes]
   void clear_wakeup_reason(void) { }
        ^
   include/linux/wakeup_reason.h:31:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
   void clear_wakeup_reason(void) { }
   ^
   static 
>> include/linux/wakeup_reason.h:32:9: warning: no previous prototype for function 'last_wakeup_reason_get' [-Wmissing-prototypes]
   ssize_t last_wakeup_reason_get(char *buf, ssize_t max) { }
           ^
   include/linux/wakeup_reason.h:32:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
   ssize_t last_wakeup_reason_get(char *buf, ssize_t max) { }
   ^
   static 
   include/linux/wakeup_reason.h:32:58: error: non-void function does not return a value [-Werror,-Wreturn-type]
   ssize_t last_wakeup_reason_get(char *buf, ssize_t max) { }
                                                            ^
   4 warnings and 3 errors generated.


vim +/log_ws_wakeup_reason +29 include/linux/wakeup_reason.h

    22	
    23	#ifdef CONFIG_SUSPEND
    24	ssize_t log_ws_wakeup_reason(void);
    25	ssize_t log_irq_wakeup_reason(unsigned int irq_number);
    26	void clear_wakeup_reason(void);
    27	ssize_t last_wakeup_reason_get(char *buf, ssize_t max);
    28	#else
  > 29	ssize_t log_ws_wakeup_reason(void) { }
  > 30	ssize_t log_irq_wakeup_reason(unsigned int irq_number) { }
  > 31	void clear_wakeup_reason(void) { }
  > 32	ssize_t last_wakeup_reason_get(char *buf, ssize_t max) { }
    33	#endif
    34	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all(a)lists.01.org

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/1] [RFC] wakeup_reason: Add infrastructure to log and report why the system resumed from suspend.
  2022-01-28  4:55     ` [PATCH 1/1] [RFC] wakeup_reason: Add infrastructure to log and report why the system resumed from suspend Zichar Zhang
                         ` (2 preceding siblings ...)
  2022-01-28  8:45         ` kernel test robot
@ 2022-01-28  9:32       ` Lee Jones
  3 siblings, 0 replies; 17+ messages in thread
From: Lee Jones @ 2022-01-28  9:32 UTC (permalink / raw)
  To: Zichar Zhang
  Cc: gregkh, john.stultz, krossmo, len.brown, linux-kernel, linux-pm,
	nayakvij, pavel, rafael, sumit.semwal, amit.pundir, kernel-team

Just a little surface review.

I'll let the big people (SMEs) do the technical bit.

On Fri, 28 Jan 2022, Zichar Zhang wrote:

> When optimizing for power usage, it's useful to be able to track and
> log each time why the system woke up from suspend. This is useful as
> it helps us understand why we might be seeing more wakeups then we
> expect. For a while now, Android has carried the "wakeup_reasons"
> patch which provides this to allow developers to optimize battery life
> on devices.
> 
> This patch tries to provide a simplified version of the
> Android wakeup_reasons functionality. It tracks both software
> wakeup_sources as well as IRQS that brought the device out of suspend,
> and exposes these values as a string via /sys/power/last_wakeup_reason.
> 
> This differs slightly from the Android patch, in that it doesn't track
> the suspend-abort reason logging, the misconfigured or unmapped IRQS,
> the threaded IRQS, and time spent in suspend/resume. But it provides
> an initial simple step to add a useful portion of the logic.
> 
> Here is the requirements from https://lkml.org/lkml/2022/1/10/1134 for
> "wakeup_reason" with line head "[Y]" and "[N]" to notify if this commit
> meet the requirements or not. And if it is not, the detail reason is
> right behind it after "--".
> 
> * wakeup IRQs, including:
>     [Y]* multiple IRQs if more than one is pending during resume flow
>     [N]* unmapped HW IRQs
>     [N]* misconfigured IRQs
>     [N]* threaded IRQs (not just the parent chip's IRQ)
>     -- If we do these things, the additional codes should be added in
>     interrupt subsystem or some interrupt controller drivers. These
>     codes are no relationship with the interrupt subsystem or the
>     interrupt controllers. And we can't distinguish these IRQs from
>     "non-suspend" environment, even the "Android patch" can't do that.
>     So it will make the things complicated.
>     -- If these IRQs do hanpen, the code in this commit will catch
>     them as "unknown wakeup reason" and suggest user to check the
>     kernel log for more information.
>     -- I think We should cooperate with interrupt subsystem to log
>     these "abnormal" IRQs. And it could be several additional
>     commits to accomplish this work together then.
> 
> * non-IRQ wakeups, including:
>     [Y]* wakeups caused by an IRQ that was consumed by lower-level SW
>     [Y]* wakeups from SOC architecture that don't manifest as IRQs
> 
> * abort reasons, including:
>     [N]* wakeup_source activity
>     [N]* failure to freeze userspace
>     [N]* failure to suspend devices
>     [N]* failed syscore_suspend callback
>     -- These codes are "intrusive" to kernel (suspend/resume).
>     -- These "errors" are already printed in kernel log, if we add
>     these log to "wakeup_reason" either, it will cause double
>     "log string" in section ".data", which just waste of the memory.
>     -- As mentioned before, if these "abort action" happened, you
>     can catch string "unknown wakeup reason", and check the kernel
>     log then.
> 
> * durations from the most recent cycle, including:
>     [N]* time spent doing suspend/resume work
>     [N]* time spent in suspend
>     -- Just separate these from "wakeup reason". It should be done
>     in another commit.
> 
> This patch can be tested in Android using the following change to AOSP:
>   https://android-review.googlesource.com/c/platform/system/hardware/interfaces/+/1958666
> 
> And there is a stress test code for the interfaces in kernel:
>   https://android-review.googlesource.com/c/kernel/common/+/1958189
> 
> Any feedback or thoughts would be welcome!
> 
> Signed-off-by: Zichar Zhang <zichar.zhang@linaro.org>

You should probably tip your hat to the original authors at one
point.

> Change-Id: Id70f3cbec15f24ea999d7f643e9589be9c047f2b

No Androidness please.

> ---
>  Documentation/ABI/testing/sysfs-power |   9 ++
>  drivers/base/power/wakeup.c           |   6 +
>  include/linux/wakeup_reason.h         |  35 +++++
>  kernel/power/Makefile                 |   1 +
>  kernel/power/main.c                   |  11 ++
>  kernel/power/wakeup_reason.c          | 183 ++++++++++++++++++++++++++
>  6 files changed, 245 insertions(+)
>  create mode 100644 include/linux/wakeup_reason.h
>  create mode 100644 kernel/power/wakeup_reason.c
> 
> diff --git a/Documentation/ABI/testing/sysfs-power b/Documentation/ABI/testing/sysfs-power
> index 90ec4987074b..e79d7a49a24e 100644
> --- a/Documentation/ABI/testing/sysfs-power
> +++ b/Documentation/ABI/testing/sysfs-power
> @@ -182,6 +182,15 @@ Description:
>  		to a sleep state if any wakeup events are reported after the
>  		write has returned.
>  
> +What:		/sys/power/last_wakeup_reason
> +Date:		Jan 2022
> +Contact:	Zichar Zhang <zichar.zhang@linaro.org>

Are you pledging to be the official contact for the foreseeable
future?  Is this okay with the original authors?

> +Description:
> +		The /sys/power/last_wakeup_reason file shows the last reason
> +		causing system "wake up" from suspend state. It could be an
> +		interrupt signal, a kernel "wakeup_source" or just some other
> +		reason logged into this file, and shows as a string.
> +
>  What:		/sys/power/reserved_size
>  Date:		May 2011
>  Contact:	Rafael J. Wysocki <rjw@rjwysocki.net>
> diff --git a/drivers/base/power/wakeup.c b/drivers/base/power/wakeup.c
> index 99bda0da23a8..a91ee1b8c0df 100644
> --- a/drivers/base/power/wakeup.c
> +++ b/drivers/base/power/wakeup.c
> @@ -15,6 +15,7 @@
>  #include <linux/seq_file.h>
>  #include <linux/debugfs.h>
>  #include <linux/pm_wakeirq.h>
> +#include <linux/wakeup_reason.h>
>  #include <trace/events/power.h>
>  
>  #include "power.h"
> @@ -924,6 +925,7 @@ bool pm_wakeup_pending(void)
>  
>  	if (ret) {
>  		pm_pr_dbg("Wakeup pending, aborting suspend\n");
> +		log_ws_wakeup_reason();
>  		pm_print_active_wakeup_sources();
>  	}
>  
> @@ -947,11 +949,15 @@ void pm_wakeup_clear(bool reset)
>  	pm_wakeup_irq = 0;
>  	if (reset)
>  		atomic_set(&pm_abort_suspend, 0);
> +
> +	clear_wakeup_reason();
>  }
>  
>  void pm_system_irq_wakeup(unsigned int irq_number)
>  {
>  	if (pm_wakeup_irq == 0) {
> +		log_irq_wakeup_reason(irq_number);
> +
>  		pm_wakeup_irq = irq_number;
>  		pm_system_wakeup();
>  	}
> diff --git a/include/linux/wakeup_reason.h b/include/linux/wakeup_reason.h
> new file mode 100644
> index 000000000000..8d6e7a0e0bad
> --- /dev/null
> +++ b/include/linux/wakeup_reason.h
> @@ -0,0 +1,35 @@
> +/*
> + * include/linux/wakeup_reason.h

These have a tendency to go out of date.

> + * Logs the reason which caused the kernel to resume
> + * from the suspend mode.
> + *
> + * Copyright (C) 2021 Linaro, Inc.

You can't just take Copyright{ed,written} code wholesale.

You need to keep the original Google one.

> + * This software is licensed under the terms of the GNU General Public
> + * License version 2, as published by the Free Software Foundation, and
> + * may be copied, distributed, and modified under those terms.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.

Replace with SPDX.

> + */

[...]

> +++ b/kernel/power/wakeup_reason.c
> @@ -0,0 +1,183 @@
> +/*
> + * driver/base/power/wakeup_reason.c
> + *
> + * Logs the reasons which caused the kernel to resume from
> + * the suspend mode.
> + *
> + * Copyright (C) 2021 Linaro, Inc.
> + * This software is licensed under the terms of the GNU General Public
> + * License version 2, as published by the Free Software Foundation, and
> + * may be copied, distributed, and modified under those terms.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + */

As above.

> +#include <linux/kernel.h>
> +#include <linux/irq.h>
> +#include <linux/interrupt.h>
> +#include <linux/io.h>
> +#include <linux/init.h>
> +#include <linux/spinlock.h>
> +#include <linux/notifier.h>
> +#include <linux/suspend.h>
> +#include <linux/wakeup_reason.h>

Alphabetical.

[...]

> +ssize_t log_ws_wakeup_reason(void)
> +{
> +	struct wakeup_source *ws, *last_active_ws = NULL;
> +	int idx, max, len = 0;
> +	bool active = false;
> +	unsigned long flags;
> +
> +	spin_lock_irqsave(&wakeup_reason_lock, flags);
> +
> +	if (!capture_reasons) {
> +		goto out;
> +	}

Over-bracketing.

[...]

> +EXPORT_SYMBOL(log_ws_wakeup_reason);

wakeup_reason_* might be better/clearer way to namespace.

> +ssize_t log_irq_wakeup_reason(unsigned int irq_number)
> +{
> +	int len = 0;

Smaller data types (int, char, bool) usually go at the bottom below
larger ones (struct *).

> +	struct irq_desc *desc;
> +	const char *name = "null";
> +	unsigned long flags;
> +
> +	desc = irq_to_desc(irq_number);
> +	if (desc == NULL)

if (!desc)

[...]

> +ssize_t last_wakeup_reason_get(char *buf, ssize_t max)
> +{
> +	ssize_t len, size = 0;
> +	unsigned long flags;
> +
> +	if (!buf) {
> +		return 0;
> +	}

Over-bracketing.  I won't keep reporting on this.

-- 
Lee Jones [李琼斯]
Principal Technical Lead - Developer Services
Linaro.org │ Open source software for Arm SoCs
Follow Linaro: Facebook | Twitter | Blog

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC] PM: suspend: Upstreaming wakeup reason capture support
  2022-01-26  5:09   ` Zichar Zhang
  2022-01-28  4:55     ` [PATCH 1/1] [RFC] wakeup_reason: Add infrastructure to log and report why the system resumed from suspend Zichar Zhang
@ 2022-01-29  7:52     ` Kelly Rossmoyer
  1 sibling, 0 replies; 17+ messages in thread
From: Kelly Rossmoyer @ 2022-01-29  7:52 UTC (permalink / raw)
  To: Zichar Zhang
  Cc: John Stultz, Rafael J. Wysocki, Pavel Machek, Len Brown,
	Greg Kroah-Hartman, Lee Jones, Vijay Nayak, linux-pm,
	linux-kernel, Sumit Semwal

On Tue, Jan 25, 2022 at 9:09 PM Zichar Zhang <zichar.zhang@linaro.org> wrote:
>
> hello John,  Kelly!

Hello!  Happy to see what the future may bring in this space, and looking
forward to what we can all do to move this area forward.

>     -- If these IRQs do hanpen, the code in this commit will catch
>     them as "unknown wakeup reason" and suggest user to check the
>     kernel log for more information.

I would argue that - at least in Android context, as that's what I know -
the kernel log is not a substitute for wakeup reason capture.
1) It is common to find yourself troubleshooting battery life issues for
   which kernel log data is not unavailable.
2) Troubleshooting is not the only consideration.  Since "why aren't we in
   suspend right now?" is such a key question for mobile device battery
   life, this is also about power attribution.  And I think trying to build
   that upon live kernel log parsing would be both inefficient and brittle.
   (But my lack of knowledge is vast, so maybe that's a solvable problem?)

> >> * abort reasons, including:
> >>    * wakeup_source activity
> >>    * failure to freeze userspace
> >>    * failure to suspend devices
> >>    * failed syscore_suspend callback
>
>     -- As mentioned before, if these "abort action" happened, you
>     can catch string "unknown wakeup reason", and check the kernel
>     log then.

I don't think the kernel log is a solution here.  Suspend aborts can be a
significant fraction of how suspend attempts end, making them a key
contributor to battery drain.  If the eventual set of patches doesn't solve
for at least the most common kinds of suspend aborts, that leaves a lot of
power observability off the table.  And again, at least on Android, kernel
log content is often not available for the interval when a series of
suspend aborts contributed to power drain.

> The patch is not complete, these is the next steps:
> 1. add interface to show time spend in suspend/resume work.
> (after this, I think it could be works and replace the android patch)

I believe Android would experience a significant regression in capability
with only those two patches implemented, so that would require a lot of
careful consideration.

> 2. we need solve the "unmapped HW IRQs", "misconfigured IRQs" and
> "threaded IRQs" problem.
> (this is the hardest one)

FWIW, I'm already expecting unmapped HW IRQs and misconfigured IRQs not to
make the cut.  Those have helped solve real and recurring issues, but
they're admittedly niche solutions to infrequent problems at the expense of
messy code coupling, so... maybe not broadly beneficial for Linux.

Threaded wakeup IRQs are a different matter.  As one example, there are
architectures in which the RTC wakeup IRQ is threaded and knowing "this
string of wakeups was due to the RTC" is a lot more useful than "this
string of wakeups was due to an irqchip with tens of child IRQs".

> ... So it just report a "wakeup reason" from
> interrupt subsystem. It just a coincidence that most hardware "wakeup
> reason" is also the interupt signal.  Even the "interrupt" and "wake up"
> signal are separated from each other in GIC700.
> So give them a chance to report the their "wakeup reason".)

Agreed.  But I do hope we can find a way to bring them together so a single
story is presented to userspace.  Or, barring that, at least make it easy
for userspace to figure that out after the fact.

Thanks so much for the work you are doing on this and taking the time to
walk me through your thoughts.  I'm happy to contribute in whatever manner
adds value.

--

Kelly Rossmoyer | Software Engineer | krossmo@google.com | 858-239-4111

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC] PM: suspend: Upstreaming wakeup reason capture support
  2022-01-27 20:10   ` Rafael J. Wysocki
@ 2022-01-29  8:26     ` Kelly Rossmoyer
  2022-01-30 14:46       ` Rafael J. Wysocki
  0 siblings, 1 reply; 17+ messages in thread
From: Kelly Rossmoyer @ 2022-01-29  8:26 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Pavel Machek, Len Brown, Greg Kroah-Hartman, Lee Jones,
	Vijay Nayak, Linux PM, Linux Kernel Mailing List

On Thu, Jan 27, 2022 at 12:10 PM Rafael J. Wysocki <rafael@kernel.org> wrote:
>
> On Thu, Jan 27, 2022 at 8:54 PM Rafael J. Wysocki <rafael@kernel.org> wrote:
> >
> > On Mon, Jan 10, 2022 at 7:49 PM Kelly Rossmoyer <krossmo@google.com> wrote:
> > >
> > So as Zichar said, this is quite heavy-weight.
> >
> > I'm not fundamentally against adding more infrastructure to help
> > identify issues related to system suspend, but there needs to be a
> > clear benefit associated with any change in this direction.
>
> That said, the general idea behind wakeup_source objects is that every
> system wakeup event should be recorded in one of them which then can
> be used for later analysis.
>
> If there are reasons why this cannot work in general, what are they?

I won't presume to say that it "cannot work in general."  Nearly everyone on
this thread has more expertise here than I do, and I'm keenly aware of how
much I don't know.  :-)

What I will say is that - across the chips and architectures I've worked upon
over the last few years - that concept has not appeared to match observed
reality.  From what I've seen (which is a very narrow slice of the Linux
universe, but I suspect is at least pretty representative of Android):
* resumes from successful suspends are typically accompanied by a flurry of
  wakesource activity from which it is not possible to determine what actually
  caused the resume (despite last changed timestamps)
* resumes that aren't accompanied by wakeup-armed IRQs can be even
  less well-reflected by wakesource activity
* I believe inferring wakeup reasons from wakesource stats would require
  having a snapshot from the last moment prior to suspend, which seems
  unsolvable from userspace
* suspend aborts (which can be even more harmful for battery life than
  "true" wakeups) are often caused by things that aren't reflected by specific
  wakesources (e.g. a driver failing to suspend)

And as I mentioned in my reply to Zichar, this isn't solely about
troubleshooting.  There's a lot of room to improve on user-focused power
attribution, and I'm hoping to build change in that direction upon the same
foundation.  Having the best possible data about "why we're awake" serves both
goals.

Tangentially, the new(ish) wakesource stats interface has also proved to be
quite difficult to utilize robustly from userspace (at least on Android, maybe
not elsewhere?).  But that's a different fish for a different fryer, that I'm
hoping to tackle later this year.


--

Kelly Rossmoyer | Software Engineer | krossmo@google.com | 858-239-4111

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC] PM: suspend: Upstreaming wakeup reason capture support
  2022-01-29  8:26     ` Kelly Rossmoyer
@ 2022-01-30 14:46       ` Rafael J. Wysocki
  2022-02-02  7:59         ` Kelly Rossmoyer
  0 siblings, 1 reply; 17+ messages in thread
From: Rafael J. Wysocki @ 2022-01-30 14:46 UTC (permalink / raw)
  To: Kelly Rossmoyer
  Cc: Rafael J. Wysocki, Pavel Machek, Len Brown, Greg Kroah-Hartman,
	Lee Jones, Vijay Nayak, Linux PM, Linux Kernel Mailing List

On Sat, Jan 29, 2022 at 9:27 AM Kelly Rossmoyer <krossmo@google.com> wrote:
>
> On Thu, Jan 27, 2022 at 12:10 PM Rafael J. Wysocki <rafael@kernel.org> wrote:
> >
> > On Thu, Jan 27, 2022 at 8:54 PM Rafael J. Wysocki <rafael@kernel.org> wrote:
> > >
> > > On Mon, Jan 10, 2022 at 7:49 PM Kelly Rossmoyer <krossmo@google.com> wrote:
> > > >
> > > So as Zichar said, this is quite heavy-weight.
> > >
> > > I'm not fundamentally against adding more infrastructure to help
> > > identify issues related to system suspend, but there needs to be a
> > > clear benefit associated with any change in this direction.
> >
> > That said, the general idea behind wakeup_source objects is that every
> > system wakeup event should be recorded in one of them which then can
> > be used for later analysis.
> >
> > If there are reasons why this cannot work in general, what are they?
>
> I won't presume to say that it "cannot work in general."  Nearly everyone on
> this thread has more expertise here than I do, and I'm keenly aware of how
> much I don't know.  :-)
>
> What I will say is that - across the chips and architectures I've worked upon
> over the last few years - that concept has not appeared to match observed
> reality.  From what I've seen (which is a very narrow slice of the Linux
> universe, but I suspect is at least pretty representative of Android):
> * resumes from successful suspends are typically accompanied by a flurry of
>   wakesource activity from which it is not possible to determine what actually
>   caused the resume (despite last changed timestamps)

So I wonder how you are going to determine what actually caused the
resume reliably.

> * resumes that aren't accompanied by wakeup-armed IRQs can be even
>   less well-reflected by wakesource activity

Do you have examples of these other than the aborts mentioned below?

> * I believe inferring wakeup reasons from wakesource stats would require
>   having a snapshot from the last moment prior to suspend, which seems
>   unsolvable from userspace

That can be addressed by extending the wakeup sources in principle.

> * suspend aborts (which can be even more harmful for battery life than
>   "true" wakeups) are often caused by things that aren't reflected by specific
>   wakesources (e.g. a driver failing to suspend)

Which again can be addressed by using special wakeup sources for
registering these "wakeups"  or similar.

> And as I mentioned in my reply to Zichar, this isn't solely about
> troubleshooting.  There's a lot of room to improve on user-focused power
> attribution, and I'm hoping to build change in that direction upon the same
> foundation.  Having the best possible data about "why we're awake" serves both
> goals.

Generally speaking, there is one wakeup-related framework in the
kernel (wakeup sources) and you want to add another one sort of on top
of it and it is still quite unclear to me what can be done with the
new framework that cannot be achieved with the old one (possibly with
some extensions),

Let's first talk about the specific problems to address and then we'll
decide whether or not we need yet another piece of infrastructure to
address them.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC] PM: suspend: Upstreaming wakeup reason capture support
  2022-01-30 14:46       ` Rafael J. Wysocki
@ 2022-02-02  7:59         ` Kelly Rossmoyer
  0 siblings, 0 replies; 17+ messages in thread
From: Kelly Rossmoyer @ 2022-02-02  7:59 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Pavel Machek, Len Brown, Greg Kroah-Hartman, Lee Jones,
	Vijay Nayak, Linux PM, Linux Kernel Mailing List, Zichar Zhang

+Zichar, to try to pull the threads together.

On Sun, Jan 30, 2022 at 6:46 AM Rafael J. Wysocki <rafael@kernel.org> wrote:
>
> On Sat, Jan 29, 2022 at 9:27 AM Kelly Rossmoyer <krossmo@google.com> wrote:
> >
> > On Thu, Jan 27, 2022 at 12:10 PM Rafael J. Wysocki <rafael@kernel.org> wrote:
> > >
> > > That said, the general idea behind wakeup_source objects is that every
> > > system wakeup event should be recorded in one of them which then can
> > > be used for later analysis.
> > >
> > > If there are reasons why this cannot work in general, what are they?
> >
> > I won't presume to say that it "cannot work in general."  Nearly everyone on
> > this thread has more expertise here than I do, and I'm keenly aware of how
> > much I don't know.  :-)
> >
> > What I will say is that - across the chips and architectures I've worked upon
> > over the last few years - that concept has not appeared to match observed
> > reality.  From what I've seen (which is a very narrow slice of the Linux
> > universe, but I suspect is at least pretty representative of Android):
> > * resumes from successful suspends are typically accompanied by a flurry of
> >   wakesource activity from which it is not possible to determine what actually
> >   caused the resume (despite last changed timestamps)
>
> So I wonder how you are going to determine what actually caused the
> resume reliably.

I don't have the cross-platform/cross-arch experience to know if this is
broadly applicable, but right now what we're doing is taking:
* the first suspend abort reason that was captured if there was one, or if
  not...
* all pending wakeup-armed IRQs, if there are any, or if not...
* the first non-IRQ, non-abort reason logged, which mostly tends to emerge
  from platform-specific code (e.g. the Foo logic block decided it was
  time to power on for reasons that aren't reflected in any IRQs)

> > * resumes that aren't accompanied by wakeup-armed IRQs can be even
> >   less well-reflected by wakesource activity
>
> Do you have examples of these other than the aborts mentioned below?

Nothing super detailed, as my area of power work has only gotten down to
relatively high-level portions of the kernel as opposed to lower-level SOC
architecture.  But two examples that come to mind include:
* a secure watchdog interrupt fires, which wakes up a hypervisor that
  handles and clears the interrupt, leaving no IRQ pending by the time
  kernel execution resumes
* power control logic outside of the CPUs decides to turn CPU clusters
  back on to support a use case currently driven by other logic on the
  chip that doesn't involve any wakeup-armed IRQs (this could plausibly be
  something like a low-power logic block that handles audio playback
  causing the CPUs to be woken up so it can receive the next several
  seconds of audio data to buffer up before the kernel suspends again)

> > * I believe inferring wakeup reasons from wakesource stats would require
> >   having a snapshot from the last moment prior to suspend, which seems
> >   unsolvable from userspace
>
> That can be addressed by extending the wakeup sources in principle.
>
> > * suspend aborts (which can be even more harmful for battery life than
> >   "true" wakeups) are often caused by things that aren't reflected by specific
> >   wakesources (e.g. a driver failing to suspend)
>
> Which again can be addressed by using special wakeup sources for
> registering these "wakeups"  or similar.

Do you have the outline of a concept in mind, or is this more about the
general principle of extending what's there vs. adding something new?
(Apologies if what you're alluding to should be clear to me... I'm afraid
I don't have the relevant experience to envision what this could look like.)

> > And as I mentioned in my reply to Zichar, this isn't solely about
> > troubleshooting.  There's a lot of room to improve on user-focused power
> > attribution, and I'm hoping to build change in that direction upon the same
> > foundation.  Having the best possible data about "why we're awake" serves both
> > goals.
>
> Generally speaking, there is one wakeup-related framework in the
> kernel (wakeup sources) and you want to add another one sort of on top
> of it and it is still quite unclear to me what can be done with the
> new framework that cannot be achieved with the old one (possibly with
> some extensions),

I guess my (neophytic) impression has been that there are actually already
three frameworks (not just one) for wakeup-related data (where "wakeup" in
this sense is "the reason(s) we aren't suspended", including aborts):
* wakeup_source stats
* pm_wakeup_irq
* suspend_stats

And if those were extended, such that:
* the name of the last active wakeup_source was exposed to userspace
  to decode suspend aborts due to wakeup_count changes
* pm_wakeup_irq captured potentially multiple pending wakeup_armed
  IRQs during resume instead of just one
* some sense of causation was added to wakeup_sources to enable their
  use for non-IRQ resume causes (e.g. PM core outside of the CPUs
  turned CPUs back on for reasons that only platform logic can decode
  and report)
* the userspace interface to wakeup_source stats wasn't so flawed

Then very determined userspace code could combine those things with their
pre-suspend-attempt states and the fourth key piece of data (the return
value from the write to /sys/power/state, assuming kernel autosuspend
isn't used) to put together something that would be close to what the
Android wakeup_reason code is doing.  And then it would break because
there's nothing in the kernel tying those different pieces together in a
cohesive way that's guaranteed to work with some degree of stability.

> Let's first talk about the specific problems to address and then we'll
> decide whether or not we need yet another piece of infrastructure to
> address them.

I'm probably being overly optimistic, but my intention is more to tie
together and shore up the gaps in existing pieces that are currently
disjoint and incomplete, as opposed to just throwing yet another
framework into the mix.

--

Kelly Rossmoyer | Software Engineer | krossmo@google.com

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC] PM: suspend: Upstreaming wakeup reason capture support
@ 2022-01-30 18:15 Zichar Zhang
  0 siblings, 0 replies; 17+ messages in thread
From: Zichar Zhang @ 2022-01-30 18:15 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Greg Kroah-Hartman, Kelly Rossmoyer, Lee Jones, Len Brown,
	linux-kernel, linux-pm, Vijay Nayak, Pavel Machek

Hi Rafael, Kelly

hello Rafael, it is a little bit late for me to reply to you. I was
finding the way to
reply to you, cause I'm not in the "cc list". So, thanks Kelly in that way. :)
I'm totally agree with you that we should split the work into smaller
pieces and do it step by step.

Hi Kelly,
I got the strong signal from you that you insist on your requirement.
It's reasonable, and I want that if I am the user too. :)

But it's could be some problem for me to do all of that. And I am calling help
here. Yes! I need some help!

I want to seperate this task into 4 part:
1. user interface: like sysfs file /sys/power/last_wakeup_reason.
2. report interface: call by "wakeup sources" to report "wakeup reason".
3. report operation in kernel: like "interrupt subsystem". (a common interface)
4. report operation in device: like WDT driver, GIC driver or other
device driver.

I think we should do the first 3 parts, but not the last one, cause it is device
specific things. Device and BSP should do that, I insist that.
Part 1 and 2 are easily to do, and I can rework again and agian until it is all
right for everyone.
So it is clear that we have problem with the third part. And yes! it is very
hard.

Kernel desn't know how the "machine" wakeup, kernel just offer the interface
that user can mark the "wakeup source", like IRQD_WAKEUP_STATE flag
and "ws"(wakeup source) interface (acturally they are fake wakeup sources).
These works well and we can easily to report these which Android patch and
mine already do that.
But the left things is hard.Cause kernel or "subsystem" in kernel desn't has
any mechanism to do that. Then we are facing these three things:

1. "misconfigured" and "unmapped" IRQs reporting.
Android patch just add a "wakeup report" interface here once it was occurred,
even it's not in a "suspend" state, and even one of them was in GIC driver.
if I was the maintainer I won't take this, but the question is what should I do
for that?
(Maybe I shoud give a task to "interrupt subsystem people" and ask them to
do that? :) )

2. errors in suspend/resume process.
That means if there is a error occurs in suspend/resume process I need to
report it as "wakeup reason".Which is just "abort wakeup reason" as  Kelly
said. But it is lots of errors may occurs here, and which one I should report,
and is that enough?
And as  Kelly said the code is "messy", that does hit the point.

3. threaded inerrupt
Sorry, I don't find the properly place in kernel to report there "wakeup
reason". Maybe that's my lack of knowalige. :)
It's seem like some interrupt chip driver should do that? I don't know. Maybe
I should offer a interface and just let "user" to use it?

So, that all the things I got.
And again kelly, I got your mind, and I will try to think this over again to see
if I can find a way to do that.

besides, thanks Jone and John, I will rework the patch after this discusion.

And any advice could help! :)
Or you have a better idea, I can help you to do yours!

Best
Zichar

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2022-02-02  8:00 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-01-10 18:49 [RFC] PM: suspend: Upstreaming wakeup reason capture support Kelly Rossmoyer
2022-01-24 17:37 ` John Stultz
2022-01-26  5:09   ` Zichar Zhang
2022-01-28  4:55     ` [PATCH 1/1] [RFC] wakeup_reason: Add infrastructure to log and report why the system resumed from suspend Zichar Zhang
2022-01-28  6:56       ` kernel test robot
2022-01-28  7:01       ` Greg KH
2022-01-28  8:43         ` Zichar Zhang
2022-01-28  8:45       ` kernel test robot
2022-01-28  8:45         ` kernel test robot
2022-01-28  9:32       ` Lee Jones
2022-01-29  7:52     ` [RFC] PM: suspend: Upstreaming wakeup reason capture support Kelly Rossmoyer
2022-01-27 19:54 ` Rafael J. Wysocki
2022-01-27 20:10   ` Rafael J. Wysocki
2022-01-29  8:26     ` Kelly Rossmoyer
2022-01-30 14:46       ` Rafael J. Wysocki
2022-02-02  7:59         ` Kelly Rossmoyer
2022-01-30 18:15 Zichar Zhang

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.