All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 RESEND] thermal core: add option to run PM_POST_SUSPEND asynchronously
@ 2023-11-20 23:40 Radu Solea
  2023-11-29 12:20 ` Daniel Lezcano
  2023-11-29 13:08 ` Rafael J. Wysocki
  0 siblings, 2 replies; 15+ messages in thread
From: Radu Solea @ 2023-11-20 23:40 UTC (permalink / raw)
  To: linux-pm; +Cc: rafael, Radu Solea

Some thermal zones are bus connected and slow to resume, thus
delaying actions which depend on completion of PM_POST_SUSPEND.
Add optional execution path to resume thermal zones on the system
unbounded workqueue.

Signed-off-by: Radu Solea <radusolea@google.com>
---
 drivers/thermal/Kconfig        | 11 +++++++
 drivers/thermal/thermal_core.c | 58 ++++++++++++++++++++++++++++++----
 2 files changed, 62 insertions(+), 7 deletions(-)

diff --git a/drivers/thermal/Kconfig b/drivers/thermal/Kconfig
index c81a00fbca7d..148d6e9734c6 100644
--- a/drivers/thermal/Kconfig
+++ b/drivers/thermal/Kconfig
@@ -91,6 +91,17 @@ config THERMAL_WRITABLE_TRIPS
 	  Say 'Y' here if you would like to allow userspace tools to
 	  change trip temperatures.
 
+config THERMAL_ASYNC_RESUME
+	bool "Thermal async init zones on system resume"
+	default n
+	help
+	  Re-initialize thermal zones asynchronously on system resume.
+	  Thermal zone sensors may be attached on slow buses, impacting
+	  the duration of PM_POST_SUSPEND. If that is a concern enable
+	  this switch.
+
+	  If in doubt, say N.
+
 choice
 	prompt "Default Thermal governor"
 	default THERMAL_DEFAULT_GOV_STEP_WISE
diff --git a/drivers/thermal/thermal_core.c b/drivers/thermal/thermal_core.c
index 9c17d35ccbbd..8706f4ff5746 100644
--- a/drivers/thermal/thermal_core.c
+++ b/drivers/thermal/thermal_core.c
@@ -21,6 +21,10 @@
 #include <linux/of.h>
 #include <linux/suspend.h>
 
+#ifdef CONFIG_THERMAL_ASYNC_RESUME
+#include <linux/workqueue.h>
+#endif /* CONFIG_THERMAL_ASYNC_RESUME */
+
 #define CREATE_TRACE_POINTS
 #include "thermal_trace.h"
 
@@ -41,6 +45,10 @@ static atomic_t in_suspend;
 
 static struct thermal_governor *def_governor;
 
+#ifdef CONFIG_THERMAL_ASYNC_RESUME
+static struct work_struct *resume_thermal_zones_wk;
+#endif /* CONFIG_THERMAL_ASYNC_RESUME */
+
 /*
  * Governor section: set of functions to handle thermal governors
  *
@@ -1506,26 +1514,53 @@ struct thermal_zone_device *thermal_zone_get_zone_by_name(const char *name)
 }
 EXPORT_SYMBOL_GPL(thermal_zone_get_zone_by_name);
 
-static int thermal_pm_notify(struct notifier_block *nb,
-			     unsigned long mode, void *_unused)
+static inline void resume_thermal_zones(void)
 {
 	struct thermal_zone_device *tz;
 
+	list_for_each_entry(tz, &thermal_tz_list, node) {
+		if (!thermal_zone_device_is_enabled(tz))
+			continue;
+
+		thermal_zone_device_init(tz);
+		thermal_zone_device_update(tz, THERMAL_EVENT_UNSPECIFIED);
+	}
+}
+
+#ifdef CONFIG_THERMAL_ASYNC_RESUME
+static void thermal_resume_work_fn(struct work_struct *work)
+{
+	resume_thermal_zones();
+}
+
+static inline void thermal_resume_enqueue(void)
+{
+	INIT_WORK(resume_thermal_zones_wk, thermal_resume_work_fn);
+	queue_work(system_unbound_wq, resume_thermal_zones_wk);
+}
+#endif /* CONFIG_THERMAL_ASYNC_RESUME */
+
+static int thermal_pm_notify(struct notifier_block *nb,
+			     unsigned long mode, void *_unused)
+{
 	switch (mode) {
 	case PM_HIBERNATION_PREPARE:
 	case PM_RESTORE_PREPARE:
 	case PM_SUSPEND_PREPARE:
+#ifdef CONFIG_THERMAL_ASYNC_RESUME
+		flush_work(resume_thermal_zones_wk);
+#endif /* CONFIG_THERMAL_ASYNC_RESUME */
 		atomic_set(&in_suspend, 1);
 		break;
 	case PM_POST_HIBERNATION:
 	case PM_POST_RESTORE:
 	case PM_POST_SUSPEND:
 		atomic_set(&in_suspend, 0);
-		list_for_each_entry(tz, &thermal_tz_list, node) {
-			thermal_zone_device_init(tz);
-			thermal_zone_device_update(tz,
-						   THERMAL_EVENT_UNSPECIFIED);
-		}
+#ifdef CONFIG_THERMAL_ASYNC_RESUME
+		thermal_resume_enqueue();
+#else /* CONFIG_THERMAL_ASYNC_RESUME */
+		resume_thermal_zones();
+#endif /* CONFIG_THERMAL_ASYNC_RESUME */
 		break;
 	default:
 		break;
@@ -1541,6 +1576,15 @@ static int __init thermal_init(void)
 {
 	int result;
 
+#ifdef CONFIG_THERMAL_ASYNC_RESUME
+	resume_thermal_zones_wk = kmalloc(sizeof(*resume_thermal_zones_wk),
+					GFP_KERNEL);
+	if (!resume_thermal_zones_wk) {
+		result = -ENOMEM;
+		goto error;
+	}
+#endif /* CONFIG_THERMAL_ASYNC_RESUME */
+
 	result = thermal_netlink_init();
 	if (result)
 		goto error;
-- 
2.43.0.rc1.413.gea7ed67945-goog



^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 RESEND] thermal core: add option to run PM_POST_SUSPEND asynchronously
  2023-11-20 23:40 [PATCH v2 RESEND] thermal core: add option to run PM_POST_SUSPEND asynchronously Radu Solea
@ 2023-11-29 12:20 ` Daniel Lezcano
  2023-12-06  1:20   ` Radu Solea
  2023-11-29 13:08 ` Rafael J. Wysocki
  1 sibling, 1 reply; 15+ messages in thread
From: Daniel Lezcano @ 2023-11-29 12:20 UTC (permalink / raw)
  To: Radu Solea, linux-pm; +Cc: rafael

On 21/11/2023 00:40, Radu Solea wrote:
> Some thermal zones are bus connected and slow to resume, thus
> delaying actions which depend on completion of PM_POST_SUSPEND.
> Add optional execution path to resume thermal zones on the system
> unbounded workqueue.
> 
> Signed-off-by: Radu Solea <radusolea@google.com>
> ---

This async change may have a lot of hidden implications.

Could you elaborate more the issue and how the async will fix the problem?

If you have a platform being slow to resume, can you provide numbers 
with and without this option?

Thanks
   -- D.

-- 
<http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs

Follow Linaro:  <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 RESEND] thermal core: add option to run PM_POST_SUSPEND asynchronously
  2023-11-20 23:40 [PATCH v2 RESEND] thermal core: add option to run PM_POST_SUSPEND asynchronously Radu Solea
  2023-11-29 12:20 ` Daniel Lezcano
@ 2023-11-29 13:08 ` Rafael J. Wysocki
  2023-11-30 20:33   ` Radu Solea
  2023-12-06  1:31   ` Radu Solea
  1 sibling, 2 replies; 15+ messages in thread
From: Rafael J. Wysocki @ 2023-11-29 13:08 UTC (permalink / raw)
  To: Radu Solea; +Cc: linux-pm, rafael

On Tue, Nov 21, 2023 at 12:40 AM Radu Solea <radusolea@google.com> wrote:
>
> Some thermal zones are bus connected and slow to resume, thus
> delaying actions which depend on completion of PM_POST_SUSPEND.

What actions in particular?

> Add optional execution path to resume thermal zones on the system
> unbounded workqueue.

Why optional?

This is only useful for people building their own custom kernels.

> Signed-off-by: Radu Solea <radusolea@google.com>
> ---
>  drivers/thermal/Kconfig        | 11 +++++++
>  drivers/thermal/thermal_core.c | 58 ++++++++++++++++++++++++++++++----
>  2 files changed, 62 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/thermal/Kconfig b/drivers/thermal/Kconfig
> index c81a00fbca7d..148d6e9734c6 100644
> --- a/drivers/thermal/Kconfig
> +++ b/drivers/thermal/Kconfig
> @@ -91,6 +91,17 @@ config THERMAL_WRITABLE_TRIPS
>           Say 'Y' here if you would like to allow userspace tools to
>           change trip temperatures.
>
> +config THERMAL_ASYNC_RESUME
> +       bool "Thermal async init zones on system resume"
> +       default n
> +       help
> +         Re-initialize thermal zones asynchronously on system resume.
> +         Thermal zone sensors may be attached on slow buses, impacting

"Slow" relative to what?  How can it be determined?

> +         the duration of PM_POST_SUSPEND. If that is a concern enable
> +         this switch.
> +
> +         If in doubt, say N.
> +
>  choice
>         prompt "Default Thermal governor"
>         default THERMAL_DEFAULT_GOV_STEP_WISE

In the first place, I would like to know the exact motivation for this change.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 RESEND] thermal core: add option to run PM_POST_SUSPEND asynchronously
  2023-11-29 13:08 ` Rafael J. Wysocki
@ 2023-11-30 20:33   ` Radu Solea
  2023-12-01  9:12     ` Daniel Lezcano
  2023-12-06  1:31   ` Radu Solea
  1 sibling, 1 reply; 15+ messages in thread
From: Radu Solea @ 2023-11-30 20:33 UTC (permalink / raw)
  To: Rafael J. Wysocki, daniel.lezcano; +Cc: linux-pm

Thank you for taking the time to discuss this.
In the context of PM_POST_SUSPEND, there are not many guarantees. It
mainly just guarantees that the code is executed at a particular step
in the resume sequence and, implicitly, that the execution happens
before another power transition. There are no order or timing promises
made.
Doing work on the notification chain impacts at least two things
relevant to this change: triggering of the next items on the chain and
completion of the write() to /sys/power/state. In a multicore context,
both of these matter.
The system context is battery-powered embedded devices with thermal
zones on-board. These thermal zones are bus connected and operate
asynchronously to the executing core, introducing delays.
I proposed the change as optional because I cannot account for all
possible cases. However, in the specific context of these devices,
custom kernel configurations are the norm and it is expected that the
integration team validates system assumptions (such as system
unbounded queue load).
The ultimate reason for this change is that I want my 50ms back.
Joking aside, there are not many ways to determine that the system has
completed a resume cycle. The write() to /sys/power/state completing
is important in signaling that to the system at large.
I did not include numbers in the initial submission because I do not
expect that the numbers I see on a particular integration are easily
replicable in other situations. However, the 50ms I am currently
chasing are important in user-interactive systems and impact overall
power consumption.
Thank you,
Radu

On Wed, Nov 29, 2023 at 5:09 AM Rafael J. Wysocki <rafael@kernel.org> wrote:
>
> On Tue, Nov 21, 2023 at 12:40 AM Radu Solea <radusolea@google.com> wrote:
> >
> > Some thermal zones are bus connected and slow to resume, thus
> > delaying actions which depend on completion of PM_POST_SUSPEND.
>
> What actions in particular?
>
> > Add optional execution path to resume thermal zones on the system
> > unbounded workqueue.
>
> Why optional?
>
> This is only useful for people building their own custom kernels.
>
> > Signed-off-by: Radu Solea <radusolea@google.com>
> > ---
> >  drivers/thermal/Kconfig        | 11 +++++++
> >  drivers/thermal/thermal_core.c | 58 ++++++++++++++++++++++++++++++----
> >  2 files changed, 62 insertions(+), 7 deletions(-)
> >
> > diff --git a/drivers/thermal/Kconfig b/drivers/thermal/Kconfig
> > index c81a00fbca7d..148d6e9734c6 100644
> > --- a/drivers/thermal/Kconfig
> > +++ b/drivers/thermal/Kconfig
> > @@ -91,6 +91,17 @@ config THERMAL_WRITABLE_TRIPS
> >           Say 'Y' here if you would like to allow userspace tools to
> >           change trip temperatures.
> >
> > +config THERMAL_ASYNC_RESUME
> > +       bool "Thermal async init zones on system resume"
> > +       default n
> > +       help
> > +         Re-initialize thermal zones asynchronously on system resume.
> > +         Thermal zone sensors may be attached on slow buses, impacting
>
> "Slow" relative to what?  How can it be determined?
>
> > +         the duration of PM_POST_SUSPEND. If that is a concern enable
> > +         this switch.
> > +
> > +         If in doubt, say N.
> > +
> >  choice
> >         prompt "Default Thermal governor"
> >         default THERMAL_DEFAULT_GOV_STEP_WISE
>
> In the first place, I would like to know the exact motivation for this change.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 RESEND] thermal core: add option to run PM_POST_SUSPEND asynchronously
  2023-11-30 20:33   ` Radu Solea
@ 2023-12-01  9:12     ` Daniel Lezcano
  0 siblings, 0 replies; 15+ messages in thread
From: Daniel Lezcano @ 2023-12-01  9:12 UTC (permalink / raw)
  To: Radu Solea, Rafael J. Wysocki; +Cc: linux-pm


Hi Radu,

On 30/11/2023 21:33, Radu Solea wrote:
> Thank you for taking the time to discuss this.
> In the context of PM_POST_SUSPEND, there are not many guarantees. It
> mainly just guarantees that the code is executed at a particular step
> in the resume sequence and, implicitly, that the execution happens
> before another power transition. There are no order or timing promises
> made.
> Doing work on the notification chain impacts at least two things
> relevant to this change: triggering of the next items on the chain and
> completion of the write() to /sys/power/state. In a multicore context,
> both of these matter.
> The system context is battery-powered embedded devices with thermal
> zones on-board. These thermal zones are bus connected and operate
> asynchronously to the executing core, introducing delays.
> I proposed the change as optional because I cannot account for all
> possible cases. However, in the specific context of these devices,
> custom kernel configurations are the norm and it is expected that the
> integration team validates system assumptions (such as system
> unbounded queue load).
> The ultimate reason for this change is that I want my 50ms back.
> Joking aside, there are not many ways to determine that the system has
> completed a resume cycle. The write() to /sys/power/state completing
> is important in signaling that to the system at large.
> I did not include numbers in the initial submission because I do not
> expect that the numbers I see on a particular integration are easily
> replicable in other situations. However, the 50ms I am currently
> chasing are important in user-interactive systems and impact overall
> power consumption.
> Thank you,
> Radu

Please, respond inline the questions Rafael and I asked.

Thanks

   -- Daniel

> On Wed, Nov 29, 2023 at 5:09 AM Rafael J. Wysocki <rafael@kernel.org> wrote:
>>
>> On Tue, Nov 21, 2023 at 12:40 AM Radu Solea <radusolea@google.com> wrote:
>>>
>>> Some thermal zones are bus connected and slow to resume, thus
>>> delaying actions which depend on completion of PM_POST_SUSPEND.
>>
>> What actions in particular?
>>
>>> Add optional execution path to resume thermal zones on the system
>>> unbounded workqueue.
>>
>> Why optional?
>>
>> This is only useful for people building their own custom kernels.
>>
>>> Signed-off-by: Radu Solea <radusolea@google.com>
>>> ---
>>>   drivers/thermal/Kconfig        | 11 +++++++
>>>   drivers/thermal/thermal_core.c | 58 ++++++++++++++++++++++++++++++----
>>>   2 files changed, 62 insertions(+), 7 deletions(-)
>>>
>>> diff --git a/drivers/thermal/Kconfig b/drivers/thermal/Kconfig
>>> index c81a00fbca7d..148d6e9734c6 100644
>>> --- a/drivers/thermal/Kconfig
>>> +++ b/drivers/thermal/Kconfig
>>> @@ -91,6 +91,17 @@ config THERMAL_WRITABLE_TRIPS
>>>            Say 'Y' here if you would like to allow userspace tools to
>>>            change trip temperatures.
>>>
>>> +config THERMAL_ASYNC_RESUME
>>> +       bool "Thermal async init zones on system resume"
>>> +       default n
>>> +       help
>>> +         Re-initialize thermal zones asynchronously on system resume.
>>> +         Thermal zone sensors may be attached on slow buses, impacting
>>
>> "Slow" relative to what?  How can it be determined?
>>
>>> +         the duration of PM_POST_SUSPEND. If that is a concern enable
>>> +         this switch.
>>> +
>>> +         If in doubt, say N.
>>> +
>>>   choice
>>>          prompt "Default Thermal governor"
>>>          default THERMAL_DEFAULT_GOV_STEP_WISE
>>
>> In the first place, I would like to know the exact motivation for this change.

-- 
<http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs

Follow Linaro:  <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 RESEND] thermal core: add option to run PM_POST_SUSPEND asynchronously
  2023-11-29 12:20 ` Daniel Lezcano
@ 2023-12-06  1:20   ` Radu Solea
  2023-12-06 11:23     ` Daniel Lezcano
  0 siblings, 1 reply; 15+ messages in thread
From: Radu Solea @ 2023-12-06  1:20 UTC (permalink / raw)
  To: Daniel Lezcano; +Cc: linux-pm, rafael

On Wed, Nov 29, 2023 at 4:20 AM Daniel Lezcano
<daniel.lezcano@linaro.org> wrote:
>
> On 21/11/2023 00:40, Radu Solea wrote:
> > Some thermal zones are bus connected and slow to resume, thus
> > delaying actions which depend on completion of PM_POST_SUSPEND.
> > Add optional execution path to resume thermal zones on the system
> > unbounded workqueue.
> >
> > Signed-off-by: Radu Solea <radusolea@google.com>
> > ---
>
> This async change may have a lot of hidden implications.
>
> Could you elaborate more the issue and how the async will fix the problem?
>
> If you have a platform being slow to resume, can you provide numbers
> with and without this option?
>
> Thanks
>    -- D.
>
> --
> <http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs
>
> Follow Linaro:  <http://www.facebook.com/pages/Linaro> Facebook |
> <http://twitter.com/#!/linaroorg> Twitter |
> <http://www.linaro.org/linaro-blog/> Blog
>

In multicore systems PM_POST_SUSPEND is executed on a single core.
Any work done in the notification chain delays all subsequent actions
in the chain with respect to system time, including the completion of
the write() to /sys/power/state.
I didn't include numbers from my system since they are likely
irrelevant for other systems out there. The particular number I'm
chasing is ~50ms.
This comes from having on-board peripherals as thermal zones, they
execute async and significantly slower than the main core, add a/d
conversions and bus delays to that and it's easy to see those numbers.
Making the entire sequence synchronous to itself and async to
PM_POST_SUSPEND isn't that much of a change, it allows the sequence to
run on any core with spare cycles delayed with whatever the system
unbounded queue load is at the time.
(on my target system) I've seen consistent time gains (those same
50ms) to PM_POST_SUSPEND completion with this sequence actually
completing before the chain finishes, this will vary from integration
to integration.

Thank you,
Radu.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 RESEND] thermal core: add option to run PM_POST_SUSPEND asynchronously
  2023-11-29 13:08 ` Rafael J. Wysocki
  2023-11-30 20:33   ` Radu Solea
@ 2023-12-06  1:31   ` Radu Solea
  1 sibling, 0 replies; 15+ messages in thread
From: Radu Solea @ 2023-12-06  1:31 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: linux-pm

On Wed, Nov 29, 2023 at 5:09 AM Rafael J. Wysocki <rafael@kernel.org> wrote:
>
> On Tue, Nov 21, 2023 at 12:40 AM Radu Solea <radusolea@google.com> wrote:
> >
> > Some thermal zones are bus connected and slow to resume, thus
> > delaying actions which depend on completion of PM_POST_SUSPEND.
>
> What actions in particular?
Anything in the PM_POST_SUSPEND chain coming after this sequence and
the completion of the write() to /sys/power/state.

>
> > Add optional execution path to resume thermal zones on the system
> > unbounded workqueue.
>
> Why optional?
I proposed the change as optional because I'm not certain I can
account for all possible system loads.
However, in the specific context of embedded devices, custom kernel
configurations are the norm and it is expected that the integration
team validates system assumptions (such as system unbounded queue
load).

>
> This is only useful for people building their own custom kernels.
>
> > Signed-off-by: Radu Solea <radusolea@google.com>
> > ---
> >  drivers/thermal/Kconfig        | 11 +++++++
> >  drivers/thermal/thermal_core.c | 58 ++++++++++++++++++++++++++++++----
> >  2 files changed, 62 insertions(+), 7 deletions(-)
> >
> > diff --git a/drivers/thermal/Kconfig b/drivers/thermal/Kconfig
> > index c81a00fbca7d..148d6e9734c6 100644
> > --- a/drivers/thermal/Kconfig
> > +++ b/drivers/thermal/Kconfig
> > @@ -91,6 +91,17 @@ config THERMAL_WRITABLE_TRIPS
> >           Say 'Y' here if you would like to allow userspace tools to
> >           change trip temperatures.
> >
> > +config THERMAL_ASYNC_RESUME
> > +       bool "Thermal async init zones on system resume"
> > +       default n
> > +       help
> > +         Re-initialize thermal zones asynchronously on system resume.
> > +         Thermal zone sensors may be attached on slow buses, impacting
>
> "Slow" relative to what?  How can it be determined?
>
> > +         the duration of PM_POST_SUSPEND. If that is a concern enable
> > +         this switch.
> > +
> > +         If in doubt, say N.
> > +
> >  choice
> >         prompt "Default Thermal governor"
> >         default THERMAL_DEFAULT_GOV_STEP_WISE
>
> In the first place, I would like to know the exact motivation for this change.
Immediate value to me is a significant reduction(within my particular
system context) in the time it takes to complete the write() to
/sys/power/state.
In general this would move the execution to the next available core
when the queue gets to the work item spreading (some of) the resume
load across cores.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 RESEND] thermal core: add option to run PM_POST_SUSPEND asynchronously
  2023-12-06  1:20   ` Radu Solea
@ 2023-12-06 11:23     ` Daniel Lezcano
  2023-12-11 23:25       ` Radu Solea
  0 siblings, 1 reply; 15+ messages in thread
From: Daniel Lezcano @ 2023-12-06 11:23 UTC (permalink / raw)
  To: Radu Solea; +Cc: linux-pm, rafael


Hi Radu,

On 06/12/2023 02:20, Radu Solea wrote:
> On Wed, Nov 29, 2023 at 4:20 AM Daniel Lezcano
> <daniel.lezcano@linaro.org> wrote:
>>
>> On 21/11/2023 00:40, Radu Solea wrote:
>>> Some thermal zones are bus connected and slow to resume, thus
>>> delaying actions which depend on completion of PM_POST_SUSPEND.
>>> Add optional execution path to resume thermal zones on the system
>>> unbounded workqueue.
>>>
>>> Signed-off-by: Radu Solea <radusolea@google.com>
>>> ---
>>
>> This async change may have a lot of hidden implications.
>>
>> Could you elaborate more the issue and how the async will fix the problem?
>>
>> If you have a platform being slow to resume, can you provide numbers
>> with and without this option?
>>
>> Thanks
>>     -- D.
>>
>> --
>> <http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs
>>
>> Follow Linaro:  <http://www.facebook.com/pages/Linaro> Facebook |
>> <http://twitter.com/#!/linaroorg> Twitter |
>> <http://www.linaro.org/linaro-blog/> Blog
>>
> 
> In multicore systems PM_POST_SUSPEND is executed on a single core.
> Any work done in the notification chain delays all subsequent actions
> in the chain with respect to system time, including the completion of
> the write() to /sys/power/state.
> I didn't include numbers from my system since they are likely
> irrelevant for other systems out there. The particular number I'm
> chasing is ~50ms.
> This comes from having on-board peripherals as thermal zones, they
> execute async and significantly slower than the main core, add a/d
> conversions and bus delays to that and it's easy to see those numbers.
> Making the entire sequence synchronous to itself and async to
> PM_POST_SUSPEND isn't that much of a change, it allows the sequence to
> run on any core with spare cycles delayed with whatever the system
> unbounded queue load is at the time.
> (on my target system) I've seen consistent time gains (those same
> 50ms) to PM_POST_SUSPEND completion with this sequence actually
> completing before the chain finishes, this will vary from integration
> to integration.

Sorry but I don't see how you can have a gain of 50ms by doing the 
restore asynchronously.

Can you give a more detailed description of the hardware? How many 
thermal zones?



-- 
<http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs

Follow Linaro:  <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 RESEND] thermal core: add option to run PM_POST_SUSPEND asynchronously
  2023-12-06 11:23     ` Daniel Lezcano
@ 2023-12-11 23:25       ` Radu Solea
  2023-12-12  7:00         ` Daniel Lezcano
  0 siblings, 1 reply; 15+ messages in thread
From: Radu Solea @ 2023-12-11 23:25 UTC (permalink / raw)
  To: Daniel Lezcano; +Cc: linux-pm, rafael

On Wed, Dec 6, 2023 at 3:23 AM Daniel Lezcano <daniel.lezcano@linaro.org> wrote:
>
>
> Hi Radu,
>
> On 06/12/2023 02:20, Radu Solea wrote:
> > On Wed, Nov 29, 2023 at 4:20 AM Daniel Lezcano
> > <daniel.lezcano@linaro.org> wrote:
> >>
> >> On 21/11/2023 00:40, Radu Solea wrote:
> >>> Some thermal zones are bus connected and slow to resume, thus
> >>> delaying actions which depend on completion of PM_POST_SUSPEND.
> >>> Add optional execution path to resume thermal zones on the system
> >>> unbounded workqueue.
> >>>
> >>> Signed-off-by: Radu Solea <radusolea@google.com>
> >>> ---
> >>
> >> This async change may have a lot of hidden implications.
> >>
> >> Could you elaborate more the issue and how the async will fix the problem?
> >>
> >> If you have a platform being slow to resume, can you provide numbers
> >> with and without this option?
> >>
> >> Thanks
> >>     -- D.
> >>
> >> --
> >> <http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs
> >>
> >> Follow Linaro:  <http://www.facebook.com/pages/Linaro> Facebook |
> >> <http://twitter.com/#!/linaroorg> Twitter |
> >> <http://www.linaro.org/linaro-blog/> Blog
> >>
> >
> > In multicore systems PM_POST_SUSPEND is executed on a single core.
> > Any work done in the notification chain delays all subsequent actions
> > in the chain with respect to system time, including the completion of
> > the write() to /sys/power/state.
> > I didn't include numbers from my system since they are likely
> > irrelevant for other systems out there. The particular number I'm
> > chasing is ~50ms.
> > This comes from having on-board peripherals as thermal zones, they
> > execute async and significantly slower than the main core, add a/d
> > conversions and bus delays to that and it's easy to see those numbers.
> > Making the entire sequence synchronous to itself and async to
> > PM_POST_SUSPEND isn't that much of a change, it allows the sequence to
> > run on any core with spare cycles delayed with whatever the system
> > unbounded queue load is at the time.
> > (on my target system) I've seen consistent time gains (those same
> > 50ms) to PM_POST_SUSPEND completion with this sequence actually
> > completing before the chain finishes, this will vary from integration
> > to integration.
>
> Sorry but I don't see how you can have a gain of 50ms by doing the
> restore asynchronously.
>
> Can you give a more detailed description of the hardware? How many
> thermal zones?
>
>
I can't go into much detail about the hardware. But let's put it this
way, if thermal_zone_device_update() takes 5 ms for each device (read
temp, get trips, set trips, etc). Assume 5 onboard thermal zones, on a
good day, ignoring system churn you'd get to around 25ms (already
significant).
Now on top of that add that these devices have multiple functions,
like a PMIC for example. The resume sequence is the perfect time frame
where you'd encounter more than one operation aimed at any one of
these devices. Unless you have uncommonly smart drivers and devices,
these will be queued.
The driver in most cases will spin (hardly ideal, but realistic), even
if they would yield the effect on the completion of the chain is at
least the same or, likely, worse.

To the patch itself, I realized I've been somewhat hamfisted.
thermal_zone_device_init() should not be deferred, and likely should
execute for all zones before the in_suspend lock-out is released. I'll
correct that once we've landed on something.

To my 50ms, it's almost the worst-case, but it happens way more often
than would be comfortable.

>
> --
> <http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs
>
> Follow Linaro:  <http://www.facebook.com/pages/Linaro> Facebook |
> <http://twitter.com/#!/linaroorg> Twitter |
> <http://www.linaro.org/linaro-blog/> Blog
>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 RESEND] thermal core: add option to run PM_POST_SUSPEND asynchronously
  2023-12-11 23:25       ` Radu Solea
@ 2023-12-12  7:00         ` Daniel Lezcano
  2023-12-14  0:21           ` Radu Solea
  0 siblings, 1 reply; 15+ messages in thread
From: Daniel Lezcano @ 2023-12-12  7:00 UTC (permalink / raw)
  To: Radu Solea; +Cc: linux-pm, rafael

On 12/12/2023 00:25, Radu Solea wrote:
> On Wed, Dec 6, 2023 at 3:23 AM Daniel Lezcano <daniel.lezcano@linaro.org> wrote:
>>
>>
>> Hi Radu,
>>
>> On 06/12/2023 02:20, Radu Solea wrote:
>>> On Wed, Nov 29, 2023 at 4:20 AM Daniel Lezcano
>>> <daniel.lezcano@linaro.org> wrote:
>>>>
>>>> On 21/11/2023 00:40, Radu Solea wrote:
>>>>> Some thermal zones are bus connected and slow to resume, thus
>>>>> delaying actions which depend on completion of PM_POST_SUSPEND.
>>>>> Add optional execution path to resume thermal zones on the system
>>>>> unbounded workqueue.
>>>>>
>>>>> Signed-off-by: Radu Solea <radusolea@google.com>
>>>>> ---
>>>>
>>>> This async change may have a lot of hidden implications.
>>>>
>>>> Could you elaborate more the issue and how the async will fix the problem?
>>>>
>>>> If you have a platform being slow to resume, can you provide numbers
>>>> with and without this option?
>>>>
>>>> Thanks
>>>>      -- D.
>>>>
>>>> --
>>>> <http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs
>>>>
>>>> Follow Linaro:  <http://www.facebook.com/pages/Linaro> Facebook |
>>>> <http://twitter.com/#!/linaroorg> Twitter |
>>>> <http://www.linaro.org/linaro-blog/> Blog
>>>>
>>>
>>> In multicore systems PM_POST_SUSPEND is executed on a single core.
>>> Any work done in the notification chain delays all subsequent actions
>>> in the chain with respect to system time, including the completion of
>>> the write() to /sys/power/state.
>>> I didn't include numbers from my system since they are likely
>>> irrelevant for other systems out there. The particular number I'm
>>> chasing is ~50ms.
>>> This comes from having on-board peripherals as thermal zones, they
>>> execute async and significantly slower than the main core, add a/d
>>> conversions and bus delays to that and it's easy to see those numbers.
>>> Making the entire sequence synchronous to itself and async to
>>> PM_POST_SUSPEND isn't that much of a change, it allows the sequence to
>>> run on any core with spare cycles delayed with whatever the system
>>> unbounded queue load is at the time.
>>> (on my target system) I've seen consistent time gains (those same
>>> 50ms) to PM_POST_SUSPEND completion with this sequence actually
>>> completing before the chain finishes, this will vary from integration
>>> to integration.
>>
>> Sorry but I don't see how you can have a gain of 50ms by doing the
>> restore asynchronously.
>>
>> Can you give a more detailed description of the hardware? How many
>> thermal zones?
>>
>>
> I can't go into much detail about the hardware. But let's put it this
> way, if thermal_zone_device_update() takes 5 ms for each device (read
> temp, get trips, set trips, etc). Assume 5 onboard thermal zones, on a
> good day, ignoring system churn you'd get to around 25ms (already
> significant).
> Now on top of that add that these devices have multiple functions,
> like a PMIC for example. The resume sequence is the perfect time frame
> where you'd encounter more than one operation aimed at any one of
> these devices. Unless you have uncommonly smart drivers and devices,
> these will be queued.
> The driver in most cases will spin (hardly ideal, but realistic), even
> if they would yield the effect on the completion of the chain is at
> least the same or, likely, worse.
> 
> To the patch itself, I realized I've been somewhat hamfisted.
> thermal_zone_device_init() should not be deferred, and likely should
> execute for all zones before the in_suspend lock-out is released. I'll
> correct that once we've landed on something.
> 
> To my 50ms, it's almost the worst-case, but it happens way more often
> than would be comfortable.

If you call monitor_thermal_zone() instead of 
thermal_zone_device_update(), does it speed up the resume time ?


>> --
>> <http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs
>>
>> Follow Linaro:  <http://www.facebook.com/pages/Linaro> Facebook |
>> <http://twitter.com/#!/linaroorg> Twitter |
>> <http://www.linaro.org/linaro-blog/> Blog
>>

-- 
<http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs

Follow Linaro:  <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 RESEND] thermal core: add option to run PM_POST_SUSPEND asynchronously
  2023-12-12  7:00         ` Daniel Lezcano
@ 2023-12-14  0:21           ` Radu Solea
  2023-12-14  8:25             ` Daniel Lezcano
  0 siblings, 1 reply; 15+ messages in thread
From: Radu Solea @ 2023-12-14  0:21 UTC (permalink / raw)
  To: Daniel Lezcano; +Cc: linux-pm, rafael

On Mon, Dec 11, 2023 at 11:00 PM Daniel Lezcano
<daniel.lezcano@linaro.org> wrote:
>
> On 12/12/2023 00:25, Radu Solea wrote:
> > On Wed, Dec 6, 2023 at 3:23 AM Daniel Lezcano <daniel.lezcano@linaro.org> wrote:
> >>
> >>
> >> Hi Radu,
> >>
> >> On 06/12/2023 02:20, Radu Solea wrote:
> >>> On Wed, Nov 29, 2023 at 4:20 AM Daniel Lezcano
> >>> <daniel.lezcano@linaro.org> wrote:
> >>>>
> >>>> On 21/11/2023 00:40, Radu Solea wrote:
> >>>>> Some thermal zones are bus connected and slow to resume, thus
> >>>>> delaying actions which depend on completion of PM_POST_SUSPEND.
> >>>>> Add optional execution path to resume thermal zones on the system
> >>>>> unbounded workqueue.
> >>>>>
> >>>>> Signed-off-by: Radu Solea <radusolea@google.com>
> >>>>> ---
> >>>>
> >>>> This async change may have a lot of hidden implications.
> >>>>
> >>>> Could you elaborate more the issue and how the async will fix the problem?
> >>>>
> >>>> If you have a platform being slow to resume, can you provide numbers
> >>>> with and without this option?
> >>>>
> >>>> Thanks
> >>>>      -- D.
> >>>>
> >>>> --
> >>>> <http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs
> >>>>
> >>>> Follow Linaro:  <http://www.facebook.com/pages/Linaro> Facebook |
> >>>> <http://twitter.com/#!/linaroorg> Twitter |
> >>>> <http://www.linaro.org/linaro-blog/> Blog
> >>>>
> >>>
> >>> In multicore systems PM_POST_SUSPEND is executed on a single core.
> >>> Any work done in the notification chain delays all subsequent actions
> >>> in the chain with respect to system time, including the completion of
> >>> the write() to /sys/power/state.
> >>> I didn't include numbers from my system since they are likely
> >>> irrelevant for other systems out there. The particular number I'm
> >>> chasing is ~50ms.
> >>> This comes from having on-board peripherals as thermal zones, they
> >>> execute async and significantly slower than the main core, add a/d
> >>> conversions and bus delays to that and it's easy to see those numbers.
> >>> Making the entire sequence synchronous to itself and async to
> >>> PM_POST_SUSPEND isn't that much of a change, it allows the sequence to
> >>> run on any core with spare cycles delayed with whatever the system
> >>> unbounded queue load is at the time.
> >>> (on my target system) I've seen consistent time gains (those same
> >>> 50ms) to PM_POST_SUSPEND completion with this sequence actually
> >>> completing before the chain finishes, this will vary from integration
> >>> to integration.
> >>
> >> Sorry but I don't see how you can have a gain of 50ms by doing the
> >> restore asynchronously.
> >>
> >> Can you give a more detailed description of the hardware? How many
> >> thermal zones?
> >>
> >>
> > I can't go into much detail about the hardware. But let's put it this
> > way, if thermal_zone_device_update() takes 5 ms for each device (read
> > temp, get trips, set trips, etc). Assume 5 onboard thermal zones, on a
> > good day, ignoring system churn you'd get to around 25ms (already
> > significant).
> > Now on top of that add that these devices have multiple functions,
> > like a PMIC for example. The resume sequence is the perfect time frame
> > where you'd encounter more than one operation aimed at any one of
> > these devices. Unless you have uncommonly smart drivers and devices,
> > these will be queued.
> > The driver in most cases will spin (hardly ideal, but realistic), even
> > if they would yield the effect on the completion of the chain is at
> > least the same or, likely, worse.
> >
> > To the patch itself, I realized I've been somewhat hamfisted.
> > thermal_zone_device_init() should not be deferred, and likely should
> > execute for all zones before the in_suspend lock-out is released. I'll
> > correct that once we've landed on something.
> >
> > To my 50ms, it's almost the worst-case, but it happens way more often
> > than would be comfortable.
>
> If you call monitor_thermal_zone() instead of
> thermal_zone_device_update(), does it speed up the resume time ?
>
Looks like it does, I'll rework the patch. Don't think the config
switch is useful anymore.
>
> >> --
> >> <http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs
> >>
> >> Follow Linaro:  <http://www.facebook.com/pages/Linaro> Facebook |
> >> <http://twitter.com/#!/linaroorg> Twitter |
> >> <http://www.linaro.org/linaro-blog/> Blog
> >>
>
> --
> <http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs
>
> Follow Linaro:  <http://www.facebook.com/pages/Linaro> Facebook |
> <http://twitter.com/#!/linaroorg> Twitter |
> <http://www.linaro.org/linaro-blog/> Blog
>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 RESEND] thermal core: add option to run PM_POST_SUSPEND asynchronously
  2023-12-14  0:21           ` Radu Solea
@ 2023-12-14  8:25             ` Daniel Lezcano
  2023-12-14 18:26               ` Radu Solea
  0 siblings, 1 reply; 15+ messages in thread
From: Daniel Lezcano @ 2023-12-14  8:25 UTC (permalink / raw)
  To: Radu Solea; +Cc: linux-pm, rafael

On 14/12/2023 01:21, Radu Solea wrote:

[ ... ]

>>> I can't go into much detail about the hardware. But let's put it this
>>> way, if thermal_zone_device_update() takes 5 ms for each device (read
>>> temp, get trips, set trips, etc). Assume 5 onboard thermal zones, on a
>>> good day, ignoring system churn you'd get to around 25ms (already
>>> significant).
>>> Now on top of that add that these devices have multiple functions,
>>> like a PMIC for example. The resume sequence is the perfect time frame
>>> where you'd encounter more than one operation aimed at any one of
>>> these devices. Unless you have uncommonly smart drivers and devices,
>>> these will be queued.
>>> The driver in most cases will spin (hardly ideal, but realistic), even
>>> if they would yield the effect on the completion of the chain is at
>>> least the same or, likely, worse.
>>>
>>> To the patch itself, I realized I've been somewhat hamfisted.
>>> thermal_zone_device_init() should not be deferred, and likely should
>>> execute for all zones before the in_suspend lock-out is released. I'll
>>> correct that once we've landed on something.
>>>
>>> To my 50ms, it's almost the worst-case, but it happens way more often
>>> than would be comfortable.
>>
>> If you call monitor_thermal_zone() instead of
>> thermal_zone_device_update(), does it speed up the resume time ?
>>
> Looks like it does, I'll rework the patch. Don't think the config
> switch is useful anymore.

Well, we probably have to go a bit further in the concept.

There are some thermal zones which may need to be checked immediately 
and others no. That depends on the temperature speed behavior.

The higher is the temperature speed for a device, the lower is the 
polling (including zero).

So the monitoring delay can be used as a criteria to check if the 
thermal zone needs an update or postpone the monitoring.

As we don't want to change the current behavior, we can add a global 
option telling the polling delay above which we call 
monitor_thermal_zone() and defaulting to zero.

Does it make sense?

-- 
<http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs

Follow Linaro:  <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 RESEND] thermal core: add option to run PM_POST_SUSPEND asynchronously
  2023-12-14  8:25             ` Daniel Lezcano
@ 2023-12-14 18:26               ` Radu Solea
  2023-12-18 19:14                 ` Radu Solea
  0 siblings, 1 reply; 15+ messages in thread
From: Radu Solea @ 2023-12-14 18:26 UTC (permalink / raw)
  To: Daniel Lezcano; +Cc: linux-pm, rafael

On Thu, Dec 14, 2023 at 12:25 AM Daniel Lezcano
<daniel.lezcano@linaro.org> wrote:
>
> On 14/12/2023 01:21, Radu Solea wrote:
>
> [ ... ]
>
> >>> I can't go into much detail about the hardware. But let's put it this
> >>> way, if thermal_zone_device_update() takes 5 ms for each device (read
> >>> temp, get trips, set trips, etc). Assume 5 onboard thermal zones, on a
> >>> good day, ignoring system churn you'd get to around 25ms (already
> >>> significant).
> >>> Now on top of that add that these devices have multiple functions,
> >>> like a PMIC for example. The resume sequence is the perfect time frame
> >>> where you'd encounter more than one operation aimed at any one of
> >>> these devices. Unless you have uncommonly smart drivers and devices,
> >>> these will be queued.
> >>> The driver in most cases will spin (hardly ideal, but realistic), even
> >>> if they would yield the effect on the completion of the chain is at
> >>> least the same or, likely, worse.
> >>>
> >>> To the patch itself, I realized I've been somewhat hamfisted.
> >>> thermal_zone_device_init() should not be deferred, and likely should
> >>> execute for all zones before the in_suspend lock-out is released. I'll
> >>> correct that once we've landed on something.
> >>>
> >>> To my 50ms, it's almost the worst-case, but it happens way more often
> >>> than would be comfortable.
> >>
> >> If you call monitor_thermal_zone() instead of
> >> thermal_zone_device_update(), does it speed up the resume time ?
> >>
> > Looks like it does, I'll rework the patch. Don't think the config
> > switch is useful anymore.
>
> Well, we probably have to go a bit further in the concept.
>
> There are some thermal zones which may need to be checked immediately
> and others no. That depends on the temperature speed behavior.
>
> The higher is the temperature speed for a device, the lower is the
> polling (including zero).
>
> So the monitoring delay can be used as a criteria to check if the
> thermal zone needs an update or postpone the monitoring.
>
> As we don't want to change the current behavior, we can add a global
> option telling the polling delay above which we call
> monitor_thermal_zone() and defaulting to zero.
>
> Does it make sense?
It does, alternatively we add a device tree flag to the tz and not add
a side effect to the delay.
Either works imo.

>
> --
> <http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs
>
> Follow Linaro:  <http://www.facebook.com/pages/Linaro> Facebook |
> <http://twitter.com/#!/linaroorg> Twitter |
> <http://www.linaro.org/linaro-blog/> Blog
>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 RESEND] thermal core: add option to run PM_POST_SUSPEND asynchronously
  2023-12-14 18:26               ` Radu Solea
@ 2023-12-18 19:14                 ` Radu Solea
  2023-12-18 19:37                   ` Rafael J. Wysocki
  0 siblings, 1 reply; 15+ messages in thread
From: Radu Solea @ 2023-12-18 19:14 UTC (permalink / raw)
  To: Daniel Lezcano; +Cc: linux-pm, rafael

On Thu, Dec 14, 2023 at 10:26 AM Radu Solea <radusolea@google.com> wrote:
>
> On Thu, Dec 14, 2023 at 12:25 AM Daniel Lezcano
> <daniel.lezcano@linaro.org> wrote:
> >
> > On 14/12/2023 01:21, Radu Solea wrote:
> >
> > [ ... ]
> >
> > >>> I can't go into much detail about the hardware. But let's put it this
> > >>> way, if thermal_zone_device_update() takes 5 ms for each device (read
> > >>> temp, get trips, set trips, etc). Assume 5 onboard thermal zones, on a
> > >>> good day, ignoring system churn you'd get to around 25ms (already
> > >>> significant).
> > >>> Now on top of that add that these devices have multiple functions,
> > >>> like a PMIC for example. The resume sequence is the perfect time frame
> > >>> where you'd encounter more than one operation aimed at any one of
> > >>> these devices. Unless you have uncommonly smart drivers and devices,
> > >>> these will be queued.
> > >>> The driver in most cases will spin (hardly ideal, but realistic), even
> > >>> if they would yield the effect on the completion of the chain is at
> > >>> least the same or, likely, worse.
> > >>>
> > >>> To the patch itself, I realized I've been somewhat hamfisted.
> > >>> thermal_zone_device_init() should not be deferred, and likely should
> > >>> execute for all zones before the in_suspend lock-out is released. I'll
> > >>> correct that once we've landed on something.
> > >>>
> > >>> To my 50ms, it's almost the worst-case, but it happens way more often
> > >>> than would be comfortable.
> > >>
> > >> If you call monitor_thermal_zone() instead of
> > >> thermal_zone_device_update(), does it speed up the resume time ?
> > >>
> > > Looks like it does, I'll rework the patch. Don't think the config
> > > switch is useful anymore.
> >
> > Well, we probably have to go a bit further in the concept.
> >
> > There are some thermal zones which may need to be checked immediately
> > and others no. That depends on the temperature speed behavior.
> >
> > The higher is the temperature speed for a device, the lower is the
> > polling (including zero).
> >
> > So the monitoring delay can be used as a criteria to check if the
> > thermal zone needs an update or postpone the monitoring.
> >
> > As we don't want to change the current behavior, we can add a global
> > option telling the polling delay above which we call
> > monitor_thermal_zone() and defaulting to zero.
> >
> > Does it make sense?
> It does, alternatively we add a device tree flag to the tz and not add
> a side effect to the delay.
> Either works imo.
>
On a second, more careful look, I don't think using polling-delay is
going to work as expected here. A 0 delay is used for interrupt-driven
systems. thermal_zone_device_set_polling() will cancel work if the
delay is 0. So it looks like we effectively end up without a resume
update for those devices. We could reuse the delay-based approach with
a resume-specific work item so it doesn't get cancelled. On the other
hand I suspect that the threshold which we would add for async will
never move from zero.
> >
> > --
> > <http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs
> >
> > Follow Linaro:  <http://www.facebook.com/pages/Linaro> Facebook |
> > <http://twitter.com/#!/linaroorg> Twitter |
> > <http://www.linaro.org/linaro-blog/> Blog
> >

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 RESEND] thermal core: add option to run PM_POST_SUSPEND asynchronously
  2023-12-18 19:14                 ` Radu Solea
@ 2023-12-18 19:37                   ` Rafael J. Wysocki
  0 siblings, 0 replies; 15+ messages in thread
From: Rafael J. Wysocki @ 2023-12-18 19:37 UTC (permalink / raw)
  To: Radu Solea; +Cc: Daniel Lezcano, linux-pm, rafael

On Mon, Dec 18, 2023 at 8:15 PM Radu Solea <radusolea@google.com> wrote:
>
> On Thu, Dec 14, 2023 at 10:26 AM Radu Solea <radusolea@google.com> wrote:
> >
> > On Thu, Dec 14, 2023 at 12:25 AM Daniel Lezcano
> > <daniel.lezcano@linaro.org> wrote:
> > >
> > > On 14/12/2023 01:21, Radu Solea wrote:
> > >
> > > [ ... ]
> > >
> > > >>> I can't go into much detail about the hardware. But let's put it this
> > > >>> way, if thermal_zone_device_update() takes 5 ms for each device (read
> > > >>> temp, get trips, set trips, etc). Assume 5 onboard thermal zones, on a
> > > >>> good day, ignoring system churn you'd get to around 25ms (already
> > > >>> significant).
> > > >>> Now on top of that add that these devices have multiple functions,
> > > >>> like a PMIC for example. The resume sequence is the perfect time frame
> > > >>> where you'd encounter more than one operation aimed at any one of
> > > >>> these devices. Unless you have uncommonly smart drivers and devices,
> > > >>> these will be queued.
> > > >>> The driver in most cases will spin (hardly ideal, but realistic), even
> > > >>> if they would yield the effect on the completion of the chain is at
> > > >>> least the same or, likely, worse.
> > > >>>
> > > >>> To the patch itself, I realized I've been somewhat hamfisted.
> > > >>> thermal_zone_device_init() should not be deferred, and likely should
> > > >>> execute for all zones before the in_suspend lock-out is released. I'll
> > > >>> correct that once we've landed on something.
> > > >>>
> > > >>> To my 50ms, it's almost the worst-case, but it happens way more often
> > > >>> than would be comfortable.
> > > >>
> > > >> If you call monitor_thermal_zone() instead of
> > > >> thermal_zone_device_update(), does it speed up the resume time ?
> > > >>
> > > > Looks like it does, I'll rework the patch. Don't think the config
> > > > switch is useful anymore.
> > >
> > > Well, we probably have to go a bit further in the concept.
> > >
> > > There are some thermal zones which may need to be checked immediately
> > > and others no. That depends on the temperature speed behavior.
> > >
> > > The higher is the temperature speed for a device, the lower is the
> > > polling (including zero).
> > >
> > > So the monitoring delay can be used as a criteria to check if the
> > > thermal zone needs an update or postpone the monitoring.
> > >
> > > As we don't want to change the current behavior, we can add a global
> > > option telling the polling delay above which we call
> > > monitor_thermal_zone() and defaulting to zero.
> > >
> > > Does it make sense?
> > It does, alternatively we add a device tree flag to the tz and not add
> > a side effect to the delay.
> > Either works imo.
> >
> On a second, more careful look, I don't think using polling-delay is
> going to work as expected here. A 0 delay is used for interrupt-driven
> systems. thermal_zone_device_set_polling() will cancel work if the
> delay is 0. So it looks like we effectively end up without a resume
> update for those devices. We could reuse the delay-based approach with
> a resume-specific work item so it doesn't get cancelled. On the other
> hand I suspect that the threshold which we would add for async will
> never move from zero.

Please see https://lore.kernel.org/linux-pm/1886695.tdWV9SEqCh@kreacher/

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2023-12-18 19:37 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-11-20 23:40 [PATCH v2 RESEND] thermal core: add option to run PM_POST_SUSPEND asynchronously Radu Solea
2023-11-29 12:20 ` Daniel Lezcano
2023-12-06  1:20   ` Radu Solea
2023-12-06 11:23     ` Daniel Lezcano
2023-12-11 23:25       ` Radu Solea
2023-12-12  7:00         ` Daniel Lezcano
2023-12-14  0:21           ` Radu Solea
2023-12-14  8:25             ` Daniel Lezcano
2023-12-14 18:26               ` Radu Solea
2023-12-18 19:14                 ` Radu Solea
2023-12-18 19:37                   ` Rafael J. Wysocki
2023-11-29 13:08 ` Rafael J. Wysocki
2023-11-30 20:33   ` Radu Solea
2023-12-01  9:12     ` Daniel Lezcano
2023-12-06  1:31   ` Radu Solea

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.