linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/1] drivers: devfreq: use DELAYED_WORK in DEVFREQ monitoring subsystem
@ 2020-01-27 15:17 lukasz.luba
  2020-01-27 15:17 ` [PATCH 1/1] drivers: devfreq: add DELAYED_WORK to " lukasz.luba
                   ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: lukasz.luba @ 2020-01-27 15:17 UTC (permalink / raw)
  To: myungjoo.ham, kyungmin.park, cw00.choi, linux-pm, linux-kernel
  Cc: b.zolnierkie, lukasz.luba

From: Lukasz Luba <lukasz.luba@arm.com>

Hi all,

This patch is a continuation of my previous work for fixing DEVFREQ monitoring
subsystem [1]. The issue is around DEFERRABLE_WORK, which uses TIMER_DEFERRABLE
under the hood which will work normally when the system is busy, but will not
cause a CPU to come out of idle and serve the DEVFREQ monitoring requests.

This is especially important in the SMP systems with many CPUs, when the load
balance tries to keep some CPUs idle. The next service request could not be
triggered when the CPU went idle in the meantime.

The DELAYED_WORK is going to be triggered even on an idle CPU. This will allow
to call the DEVFREQ monitoring in reliable intervals. Some of the drivers might
use internal counters to monitor their load, when the DEVFREQ work is not
triggered in a predictable way, these counters might overflow leaving the
device in undefined state.

To observe the difference, the trace output might be used, i.e.

echo 1 > /sys/kernel/debug/tracing/events/devfreq/enable
#your test starts here, i.e. 'sleep 5' or 'dd ' or 'gfxbench'
echo 0 > /sys/kernel/debug/tracing/events/devfreq/enable
cat /sys/kernel/debug/tracing/trace

When there are some registered devfreq drivers, you should see the traces
'devfreq_moniotor' triggered in reliable intervals.

The patch set is based on Chanwoo's devfreq repository and branch
'devfreq-next' [2].

Regards,
Lukasz Luba

[1] https://lkml.org/lkml/2019/2/12/1179
[2] https://git.kernel.org/pub/scm/linux/kernel/git/chanwoo/linux.git/log/?h=devfreq-next


Lukasz Luba (1):
  drivers: devfreq: add DELAYED_WORK to monitoring subsystem

 drivers/devfreq/Kconfig   | 19 +++++++++++++++++++
 drivers/devfreq/devfreq.c |  6 +++++-
 2 files changed, 24 insertions(+), 1 deletion(-)

-- 
2.17.1


^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH 1/1] drivers: devfreq: add DELAYED_WORK to monitoring subsystem
  2020-01-27 15:17 [PATCH 0/1] drivers: devfreq: use DELAYED_WORK in DEVFREQ monitoring subsystem lukasz.luba
@ 2020-01-27 15:17 ` lukasz.luba
  2020-01-30 11:47 ` [PATCH 0/1] drivers: devfreq: use DELAYED_WORK in DEVFREQ " Lukasz Luba
  2020-01-30 11:47 ` Lukasz Luba
  2 siblings, 0 replies; 8+ messages in thread
From: lukasz.luba @ 2020-01-27 15:17 UTC (permalink / raw)
  To: myungjoo.ham, kyungmin.park, cw00.choi, linux-pm, linux-kernel
  Cc: b.zolnierkie, lukasz.luba

From: Lukasz Luba <lukasz.luba@arm.com>

This change aims to add different mechanism used in monitoring subsystem
which rely on DELAYED_WORK instead of DEFERRABLE_WORK. The DEFERRABLE_WORK
used TIMER_DEFERRABLE under the hood, which will not cause a CPU to come
out of idle and service the DEVFREQ monitoring queued work. This is
especially important in SMP systems, when some of the CPUs might be kept
in idle, while the other may have a big load.

The DELAYED_WORK is going to be triggered even on an idle CPU and will
serve the monitoring update request.

The difference could be observed in the trace output, when devfreq trace
option is enabled, i.e.
echo 1 > /sys/kernel/debug/tracing/events/devfreq/enable
sleep 5
echo 0 > /sys/kernel/debug/tracing/events/devfreq/enable
cat /sys/kernel/debug/tracing/trace

Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
---
 drivers/devfreq/Kconfig   | 19 +++++++++++++++++++
 drivers/devfreq/devfreq.c |  6 +++++-
 2 files changed, 24 insertions(+), 1 deletion(-)

diff --git a/drivers/devfreq/Kconfig b/drivers/devfreq/Kconfig
index 0b1df12e0f21..fba9cd55f434 100644
--- a/drivers/devfreq/Kconfig
+++ b/drivers/devfreq/Kconfig
@@ -30,6 +30,25 @@ menuconfig PM_DEVFREQ
 
 if PM_DEVFREQ
 
+config DEVFREQ_USE_DELAYED_WORK
+	bool "Use DELAYED_WORK in DEVFREQ monitoring subsystem"
+	help
+	  This changes the default DEVFREQ framework monitoring subsystem
+	  from using DEFERRABLE_WORK to DELAYED_WORK. The former uses
+	  TIMER_DEFERRABLE, which will work normally when the system is busy,
+	  but will not cause a CPU to come out of idle and serve the DEVFREQ
+	  monitoring requests. This is especially important in the SMP systems
+	  with many CPUs, when the load balance tries to keep some CPUs idle.
+	  The next service request could not be triggered when the CPU went
+	  idle in the meantime.
+
+	  The DELAYED_WORK is going to be triggered even on an idle CPU. This
+	  will allow to call the DEVFREQ driver in more reliable fashion and
+	  prevent i.e. from overflowing the device's counters.
+
+	  Enable this when you have SMP system and you do not see the DEVFREQ
+	  monitoring to be triggered in your trace output.
+
 comment "DEVFREQ Governors"
 
 config DEVFREQ_GOV_SIMPLE_ONDEMAND
diff --git a/drivers/devfreq/devfreq.c b/drivers/devfreq/devfreq.c
index cceee8bc3c2f..319c63c4774c 100644
--- a/drivers/devfreq/devfreq.c
+++ b/drivers/devfreq/devfreq.c
@@ -457,7 +457,11 @@ void devfreq_monitor_start(struct devfreq *devfreq)
 	if (devfreq->governor->interrupt_driven)
 		return;
 
-	INIT_DEFERRABLE_WORK(&devfreq->work, devfreq_monitor);
+	if (IS_ENABLED(CONFIG_DEVFREQ_USE_DELAYED_WORK))
+		INIT_DELAYED_WORK(&devfreq->work, devfreq_monitor);
+	else
+		INIT_DEFERRABLE_WORK(&devfreq->work, devfreq_monitor);
+
 	if (devfreq->profile->polling_ms)
 		queue_delayed_work(devfreq_wq, &devfreq->work,
 			msecs_to_jiffies(devfreq->profile->polling_ms));
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH 0/1] drivers: devfreq: use DELAYED_WORK in DEVFREQ monitoring subsystem
  2020-01-27 15:17 [PATCH 0/1] drivers: devfreq: use DELAYED_WORK in DEVFREQ monitoring subsystem lukasz.luba
  2020-01-27 15:17 ` [PATCH 1/1] drivers: devfreq: add DELAYED_WORK to " lukasz.luba
@ 2020-01-30 11:47 ` Lukasz Luba
  2020-01-31  0:42   ` Chanwoo Choi
  2020-01-30 11:47 ` Lukasz Luba
  2 siblings, 1 reply; 8+ messages in thread
From: Lukasz Luba @ 2020-01-30 11:47 UTC (permalink / raw)
  To: myungjoo.ham, kyungmin.park, cw00.choi, linux-pm, linux-kernel
  Cc: b.zolnierkie

Hi Chanwoo, MyungJoo,

Gentle ping. The issue is not only in the devfreq itself,
but also it affects thermal. The devfreq cooling rely on
busy_time and total_time updated by the devfreq monitoring
(in simple_ondemand).
Thermal uses DELAYED_WORK and is more reliable, but uses stale
data from devfreq_dev_stats. It is especially visible when
you have cgroup spanning one cluster. Android uses cgroups
heavily. You can make easily this setup using 'taskset',
run some benchmarks and observe 'devfreq_monitor' traces and
timestamps, i.e. for your exynos-bus.

The patch is really non-invasive and simple. It can be a good starting
point for testing and proposing other solutions.

Regards,
Lukasz

On 1/27/20 3:17 PM, lukasz.luba@arm.com wrote:
> From: Lukasz Luba <lukasz.luba@arm.com>
> 
> Hi all,
> 
> This patch is a continuation of my previous work for fixing DEVFREQ monitoring
> subsystem [1]. The issue is around DEFERRABLE_WORK, which uses TIMER_DEFERRABLE
> under the hood which will work normally when the system is busy, but will not
> cause a CPU to come out of idle and serve the DEVFREQ monitoring requests.
> 
> This is especially important in the SMP systems with many CPUs, when the load
> balance tries to keep some CPUs idle. The next service request could not be
> triggered when the CPU went idle in the meantime.
> 
> The DELAYED_WORK is going to be triggered even on an idle CPU. This will allow
> to call the DEVFREQ monitoring in reliable intervals. Some of the drivers might
> use internal counters to monitor their load, when the DEVFREQ work is not
> triggered in a predictable way, these counters might overflow leaving the
> device in undefined state.
> 
> To observe the difference, the trace output might be used, i.e.
> 
> echo 1 > /sys/kernel/debug/tracing/events/devfreq/enable
> #your test starts here, i.e. 'sleep 5' or 'dd ' or 'gfxbench'
> echo 0 > /sys/kernel/debug/tracing/events/devfreq/enable
> cat /sys/kernel/debug/tracing/trace
> 
> When there are some registered devfreq drivers, you should see the traces
> 'devfreq_moniotor' triggered in reliable intervals.
> 
> The patch set is based on Chanwoo's devfreq repository and branch
> 'devfreq-next' [2].
> 
> Regards,
> Lukasz Luba
> 
> [1] https://lkml.org/lkml/2019/2/12/1179
> [2] https://git.kernel.org/pub/scm/linux/kernel/git/chanwoo/linux.git/log/?h=devfreq-next
> 
> 
> Lukasz Luba (1):
>    drivers: devfreq: add DELAYED_WORK to monitoring subsystem
> 
>   drivers/devfreq/Kconfig   | 19 +++++++++++++++++++
>   drivers/devfreq/devfreq.c |  6 +++++-
>   2 files changed, 24 insertions(+), 1 deletion(-)
> 

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 0/1] drivers: devfreq: use DELAYED_WORK in DEVFREQ monitoring subsystem
  2020-01-27 15:17 [PATCH 0/1] drivers: devfreq: use DELAYED_WORK in DEVFREQ monitoring subsystem lukasz.luba
  2020-01-27 15:17 ` [PATCH 1/1] drivers: devfreq: add DELAYED_WORK to " lukasz.luba
  2020-01-30 11:47 ` [PATCH 0/1] drivers: devfreq: use DELAYED_WORK in DEVFREQ " Lukasz Luba
@ 2020-01-30 11:47 ` Lukasz Luba
  2 siblings, 0 replies; 8+ messages in thread
From: Lukasz Luba @ 2020-01-30 11:47 UTC (permalink / raw)
  To: myungjoo.ham, kyungmin.park, cw00.choi, linux-pm, linux-kernel
  Cc: b.zolnierkie

Hi Chanwoo, MyungJoo,

Gentle ping. The issue is not only in the devfreq itself,
but also it affects thermal. The devfreq cooling rely on
busy_time and total_time updated by the devfreq monitoring
(in simple_ondemand).
Thermal uses DELAYED_WORK and is more reliable, but uses stale
data from devfreq_dev_stats. It is especially visible when
you have cgroup spanning one cluster. Android uses cgroups
heavily. You can make easily this setup using 'taskset',
run some benchmarks and observe 'devfreq_monitor' traces and
timestamps, i.e. for your exynos-bus.

The patch is really non-invasive and simple. It can be a good starting
point for testing and proposing other solutions.

Regards,
Lukasz

On 1/27/20 3:17 PM, lukasz.luba@arm.com wrote:
> From: Lukasz Luba <lukasz.luba@arm.com>
> 
> Hi all,
> 
> This patch is a continuation of my previous work for fixing DEVFREQ monitoring
> subsystem [1]. The issue is around DEFERRABLE_WORK, which uses TIMER_DEFERRABLE
> under the hood which will work normally when the system is busy, but will not
> cause a CPU to come out of idle and serve the DEVFREQ monitoring requests.
> 
> This is especially important in the SMP systems with many CPUs, when the load
> balance tries to keep some CPUs idle. The next service request could not be
> triggered when the CPU went idle in the meantime.
> 
> The DELAYED_WORK is going to be triggered even on an idle CPU. This will allow
> to call the DEVFREQ monitoring in reliable intervals. Some of the drivers might
> use internal counters to monitor their load, when the DEVFREQ work is not
> triggered in a predictable way, these counters might overflow leaving the
> device in undefined state.
> 
> To observe the difference, the trace output might be used, i.e.
> 
> echo 1 > /sys/kernel/debug/tracing/events/devfreq/enable
> #your test starts here, i.e. 'sleep 5' or 'dd ' or 'gfxbench'
> echo 0 > /sys/kernel/debug/tracing/events/devfreq/enable
> cat /sys/kernel/debug/tracing/trace
> 
> When there are some registered devfreq drivers, you should see the traces
> 'devfreq_moniotor' triggered in reliable intervals.
> 
> The patch set is based on Chanwoo's devfreq repository and branch
> 'devfreq-next' [2].
> 
> Regards,
> Lukasz Luba
> 
> [1] https://lkml.org/lkml/2019/2/12/1179
> [2] https://git.kernel.org/pub/scm/linux/kernel/git/chanwoo/linux.git/log/?h=devfreq-next
> 
> 
> Lukasz Luba (1):
>    drivers: devfreq: add DELAYED_WORK to monitoring subsystem
> 
>   drivers/devfreq/Kconfig   | 19 +++++++++++++++++++
>   drivers/devfreq/devfreq.c |  6 +++++-
>   2 files changed, 24 insertions(+), 1 deletion(-)
> 

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 0/1] drivers: devfreq: use DELAYED_WORK in DEVFREQ monitoring subsystem
  2020-01-30 11:47 ` [PATCH 0/1] drivers: devfreq: use DELAYED_WORK in DEVFREQ " Lukasz Luba
@ 2020-01-31  0:42   ` Chanwoo Choi
  2020-01-31  0:47     ` Chanwoo Choi
  0 siblings, 1 reply; 8+ messages in thread
From: Chanwoo Choi @ 2020-01-31  0:42 UTC (permalink / raw)
  To: Lukasz Luba, myungjoo.ham, kyungmin.park, linux-pm, linux-kernel
  Cc: b.zolnierkie, Kamil Konieczny

Hi Lukasz,

On 1/30/20 8:47 PM, Lukasz Luba wrote:
> Hi Chanwoo, MyungJoo,
> 
> Gentle ping. The issue is not only in the devfreq itself,
> but also it affects thermal. The devfreq cooling rely on
> busy_time and total_time updated by the devfreq monitoring
> (in simple_ondemand).
> Thermal uses DELAYED_WORK and is more reliable, but uses stale
> data from devfreq_dev_stats. It is especially visible when
> you have cgroup spanning one cluster. Android uses cgroups
> heavily. You can make easily this setup using 'taskset',
> run some benchmarks and observe 'devfreq_monitor' traces and
> timestamps, i.e. for your exynos-bus.
> 
> The patch is really non-invasive and simple. It can be a good starting
> point for testing and proposing other solutions.

Sorry for late reply. I'm preparing the RFC patch about my approach
to support this requirement as following:

As you knew, DEFERRABLE_WORK with CONFIG_NO_HZ focuses on removing
the redundant of power-consumption by preventing the unneeded wakeup
from idle state if there are no any interrupts and runnable threads.

Finally, I agree the requirement of delaywd_work for devfreq subsystem.
But, I would like to support both deferrable_work and delayed_work
on devfreq subsystem. It is better to select either deferrable_work
or delayed_work by user like Kamil's suggestion[1].
[1] https://lore.kernel.org/patchwork/patch/1164317/
- [2/4] PM / devfreq: add possibility for delayed work

But, I want to change the timer type for devfreq device
using simple_ondemand governor via sysfs as following:

Example:

1.
enum work_timer_type {
	DEVFREQ_WORK_TIMER_DEFERRABLE = 0,
	DEVFREQ_WORK_TIMER_DELAYED = 0,
};

struct devfreq_simple_ondemand_data {
	unsigned int upthreshold;
	unsigned int downdifferential;
	enum work_timer_type timer_type;
};

The developer of devfreq device driver can choose
the default work time type by initializing the 'timer_type of 
struct devfreq_simple_ondemand_data'.

2. Change the work timer type at the runtime
- Change the work timer type from 'deferrable' to 'delayed'
$ echo delayed > /sys/class/devfreq/devfreq0/work_timer_type
$ cat /sys/class/devfreq/devfreq0/work_timer_type
delayed

- Change the work timer type from 'delayed' to 'deferrable'
$ echo deferrable > /sys/class/devfreq/devfreq0/work_timer_type
$ cat /sys/class/devfreq/devfreq0/work_timer_type
deferrable

I'm developing the RFC patch and then I'll send it as soon as possible.

> 
> Regards,
> Lukasz
> 
> On 1/27/20 3:17 PM, lukasz.luba@arm.com wrote:
>> From: Lukasz Luba <lukasz.luba@arm.com>
>>
>> Hi all,
>>
>> This patch is a continuation of my previous work for fixing DEVFREQ monitoring
>> subsystem [1]. The issue is around DEFERRABLE_WORK, which uses TIMER_DEFERRABLE
>> under the hood which will work normally when the system is busy, but will not
>> cause a CPU to come out of idle and serve the DEVFREQ monitoring requests.
>>
>> This is especially important in the SMP systems with many CPUs, when the load
>> balance tries to keep some CPUs idle. The next service request could not be
>> triggered when the CPU went idle in the meantime.
>>
>> The DELAYED_WORK is going to be triggered even on an idle CPU. This will allow
>> to call the DEVFREQ monitoring in reliable intervals. Some of the drivers might
>> use internal counters to monitor their load, when the DEVFREQ work is not
>> triggered in a predictable way, these counters might overflow leaving the
>> device in undefined state.
>>
>> To observe the difference, the trace output might be used, i.e.
>>
>> echo 1 > /sys/kernel/debug/tracing/events/devfreq/enable
>> #your test starts here, i.e. 'sleep 5' or 'dd ' or 'gfxbench'
>> echo 0 > /sys/kernel/debug/tracing/events/devfreq/enable
>> cat /sys/kernel/debug/tracing/trace
>>
>> When there are some registered devfreq drivers, you should see the traces
>> 'devfreq_moniotor' triggered in reliable intervals.
>>
>> The patch set is based on Chanwoo's devfreq repository and branch
>> 'devfreq-next' [2].
>>
>> Regards,
>> Lukasz Luba
>>
>> [1] https://protect2.fireeye.com/url?k=d26154c0-8fb20fd4-d260df8f-0cc47a31ce4e-ba68a61e16ee1965&u=https://lkml.org/lkml/2019/2/12/1179
>> [2] https://git.kernel.org/pub/scm/linux/kernel/git/chanwoo/linux.git/log/?h=devfreq-next
>>
>>
>> Lukasz Luba (1):
>>    drivers: devfreq: add DELAYED_WORK to monitoring subsystem
>>
>>   drivers/devfreq/Kconfig   | 19 +++++++++++++++++++
>>   drivers/devfreq/devfreq.c |  6 +++++-
>>   2 files changed, 24 insertions(+), 1 deletion(-)
>>
> 
> 


-- 
Best Regards,
Chanwoo Choi
Samsung Electronics

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 0/1] drivers: devfreq: use DELAYED_WORK in DEVFREQ monitoring subsystem
  2020-01-31  0:42   ` Chanwoo Choi
@ 2020-01-31  0:47     ` Chanwoo Choi
  2020-01-31  9:38       ` Lukasz Luba
  0 siblings, 1 reply; 8+ messages in thread
From: Chanwoo Choi @ 2020-01-31  0:47 UTC (permalink / raw)
  To: Lukasz Luba, myungjoo.ham, kyungmin.park, linux-pm, linux-kernel
  Cc: b.zolnierkie, Kamil Konieczny

On 1/31/20 9:42 AM, Chanwoo Choi wrote:
> Hi Lukasz,
> 
> On 1/30/20 8:47 PM, Lukasz Luba wrote:
>> Hi Chanwoo, MyungJoo,
>>
>> Gentle ping. The issue is not only in the devfreq itself,
>> but also it affects thermal. The devfreq cooling rely on
>> busy_time and total_time updated by the devfreq monitoring
>> (in simple_ondemand).
>> Thermal uses DELAYED_WORK and is more reliable, but uses stale
>> data from devfreq_dev_stats. It is especially visible when
>> you have cgroup spanning one cluster. Android uses cgroups
>> heavily. You can make easily this setup using 'taskset',
>> run some benchmarks and observe 'devfreq_monitor' traces and
>> timestamps, i.e. for your exynos-bus.
>>
>> The patch is really non-invasive and simple. It can be a good starting
>> point for testing and proposing other solutions.
> 
> Sorry for late reply. I'm preparing the RFC patch about my approach
> to support this requirement as following:
> 
> As you knew, DEFERRABLE_WORK with CONFIG_NO_HZ focuses on removing
> the redundant of power-consumption by preventing the unneeded wakeup
> from idle state if there are no any interrupts and runnable threads.
> 
> Finally, I agree the requirement of delaywd_work for devfreq subsystem.
> But, I would like to support both deferrable_work and delayed_work
> on devfreq subsystem. It is better to select either deferrable_work
> or delayed_work by user like Kamil's suggestion[1].
> [1] https://lore.kernel.org/patchwork/patch/1164317/
> - [2/4] PM / devfreq: add possibility for delayed work
> 
> But, I want to change the timer type for devfreq device
> using simple_ondemand governor via sysfs as following:
> 
> Example:
> 
> 1.
> enum work_timer_type {
> 	DEVFREQ_WORK_TIMER_DEFERRABLE = 0,
> 	DEVFREQ_WORK_TIMER_DELAYED = 0,
> };
> 
> struct devfreq_simple_ondemand_data {
> 	unsigned int upthreshold;
> 	unsigned int downdifferential;
> 	enum work_timer_type timer_type;
> };
> 
> The developer of devfreq device driver can choose
> the default work time type by initializing the 'timer_type of 
> struct devfreq_simple_ondemand_data'.
> 
> 2. Change the work timer type at the runtime
> - Change the work timer type from 'deferrable' to 'delayed'
> $ echo delayed > /sys/class/devfreq/devfreq0/work_timer_type
> $ cat /sys/class/devfreq/devfreq0/work_timer_type
> delayed
> 
> - Change the work timer type from 'delayed' to 'deferrable'
> $ echo deferrable > /sys/class/devfreq/devfreq0/work_timer_type
> $ cat /sys/class/devfreq/devfreq0/work_timer_type
> deferrable
> 

And
Only show '/sys/class/devfreq/devfreq0/work_timer_type' sysfs attribute,
if devfreq device uses the simple_ondemand. Because this 'work_timer_type'
sysfs attribute only depends on simple_ondemand governor and are useful.

So, 'work_timer_type' sysfs attribute will be handled
at drivers/devfreq/governor_simpleondemand.c.

After posting my suggestion, we can discuss it.


> I'm developing the RFC patch and then I'll send it as soon as possible.
> 
>>
>> Regards,
>> Lukasz
>>
>> On 1/27/20 3:17 PM, lukasz.luba@arm.com wrote:
>>> From: Lukasz Luba <lukasz.luba@arm.com>
>>>
>>> Hi all,
>>>
>>> This patch is a continuation of my previous work for fixing DEVFREQ monitoring
>>> subsystem [1]. The issue is around DEFERRABLE_WORK, which uses TIMER_DEFERRABLE
>>> under the hood which will work normally when the system is busy, but will not
>>> cause a CPU to come out of idle and serve the DEVFREQ monitoring requests.
>>>
>>> This is especially important in the SMP systems with many CPUs, when the load
>>> balance tries to keep some CPUs idle. The next service request could not be
>>> triggered when the CPU went idle in the meantime.
>>>
>>> The DELAYED_WORK is going to be triggered even on an idle CPU. This will allow
>>> to call the DEVFREQ monitoring in reliable intervals. Some of the drivers might
>>> use internal counters to monitor their load, when the DEVFREQ work is not
>>> triggered in a predictable way, these counters might overflow leaving the
>>> device in undefined state.
>>>
>>> To observe the difference, the trace output might be used, i.e.
>>>
>>> echo 1 > /sys/kernel/debug/tracing/events/devfreq/enable
>>> #your test starts here, i.e. 'sleep 5' or 'dd ' or 'gfxbench'
>>> echo 0 > /sys/kernel/debug/tracing/events/devfreq/enable
>>> cat /sys/kernel/debug/tracing/trace
>>>
>>> When there are some registered devfreq drivers, you should see the traces
>>> 'devfreq_moniotor' triggered in reliable intervals.
>>>
>>> The patch set is based on Chanwoo's devfreq repository and branch
>>> 'devfreq-next' [2].
>>>
>>> Regards,
>>> Lukasz Luba
>>>
>>> [1] https://protect2.fireeye.com/url?k=d26154c0-8fb20fd4-d260df8f-0cc47a31ce4e-ba68a61e16ee1965&u=https://lkml.org/lkml/2019/2/12/1179
>>> [2] https://git.kernel.org/pub/scm/linux/kernel/git/chanwoo/linux.git/log/?h=devfreq-next
>>>
>>>
>>> Lukasz Luba (1):
>>>    drivers: devfreq: add DELAYED_WORK to monitoring subsystem
>>>
>>>   drivers/devfreq/Kconfig   | 19 +++++++++++++++++++
>>>   drivers/devfreq/devfreq.c |  6 +++++-
>>>   2 files changed, 24 insertions(+), 1 deletion(-)
>>>
>>
>>
> 
> 


-- 
Best Regards,
Chanwoo Choi
Samsung Electronics

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 0/1] drivers: devfreq: use DELAYED_WORK in DEVFREQ monitoring subsystem
  2020-01-31  0:47     ` Chanwoo Choi
@ 2020-01-31  9:38       ` Lukasz Luba
  2020-02-03  1:10         ` Chanwoo Choi
  0 siblings, 1 reply; 8+ messages in thread
From: Lukasz Luba @ 2020-01-31  9:38 UTC (permalink / raw)
  To: Chanwoo Choi, myungjoo.ham, kyungmin.park, linux-pm, linux-kernel
  Cc: b.zolnierkie, Kamil Konieczny

Hi Chanwoo,

On 1/31/20 12:47 AM, Chanwoo Choi wrote:
> On 1/31/20 9:42 AM, Chanwoo Choi wrote:
>> Hi Lukasz,
>>
>> On 1/30/20 8:47 PM, Lukasz Luba wrote:
>>> Hi Chanwoo, MyungJoo,
>>>
>>> Gentle ping. The issue is not only in the devfreq itself,
>>> but also it affects thermal. The devfreq cooling rely on
>>> busy_time and total_time updated by the devfreq monitoring
>>> (in simple_ondemand).
>>> Thermal uses DELAYED_WORK and is more reliable, but uses stale
>>> data from devfreq_dev_stats. It is especially visible when
>>> you have cgroup spanning one cluster. Android uses cgroups
>>> heavily. You can make easily this setup using 'taskset',
>>> run some benchmarks and observe 'devfreq_monitor' traces and
>>> timestamps, i.e. for your exynos-bus.
>>>
>>> The patch is really non-invasive and simple. It can be a good starting
>>> point for testing and proposing other solutions.
>>
>> Sorry for late reply. I'm preparing the RFC patch about my approach
>> to support this requirement as following:
>>
>> As you knew, DEFERRABLE_WORK with CONFIG_NO_HZ focuses on removing
>> the redundant of power-consumption by preventing the unneeded wakeup
>> from idle state if there are no any interrupts and runnable threads.
>>
>> Finally, I agree the requirement of delaywd_work for devfreq subsystem.
>> But, I would like to support both deferrable_work and delayed_work
>> on devfreq subsystem. It is better to select either deferrable_work
>> or delayed_work by user like Kamil's suggestion[1].
>> [1] https://lore.kernel.org/patchwork/patch/1164317/
>> - [2/4] PM / devfreq: add possibility for delayed work
>>
>> But, I want to change the timer type for devfreq device
>> using simple_ondemand governor via sysfs as following:
>>
>> Example:
>>
>> 1.
>> enum work_timer_type {
>> 	DEVFREQ_WORK_TIMER_DEFERRABLE = 0,
>> 	DEVFREQ_WORK_TIMER_DELAYED = 0,
>> };
>>
>> struct devfreq_simple_ondemand_data {
>> 	unsigned int upthreshold;
>> 	unsigned int downdifferential;
>> 	enum work_timer_type timer_type;
>> };
>>
>> The developer of devfreq device driver can choose
>> the default work time type by initializing the 'timer_type of
>> struct devfreq_simple_ondemand_data'.
>>
>> 2. Change the work timer type at the runtime
>> - Change the work timer type from 'deferrable' to 'delayed'
>> $ echo delayed > /sys/class/devfreq/devfreq0/work_timer_type
>> $ cat /sys/class/devfreq/devfreq0/work_timer_type
>> delayed
>>
>> - Change the work timer type from 'delayed' to 'deferrable'
>> $ echo deferrable > /sys/class/devfreq/devfreq0/work_timer_type
>> $ cat /sys/class/devfreq/devfreq0/work_timer_type
>> deferrable
>>
> 
> And
> Only show '/sys/class/devfreq/devfreq0/work_timer_type' sysfs attribute,
> if devfreq device uses the simple_ondemand. Because this 'work_timer_type'
> sysfs attribute only depends on simple_ondemand governor and are useful.
> 
> So, 'work_timer_type' sysfs attribute will be handled
> at drivers/devfreq/governor_simpleondemand.c.
> 
> After posting my suggestion, we can discuss it.
> 
> 
>> I'm developing the RFC patch and then I'll send it as soon as possible.

Good, thank you for the explanation. For the first glance the design
looks OK, we can discuss it a bit more in you RFC series.
I would recommend to not make it conditional on simple_ondemand governor
just add a comment that for i.e. performance or passive governors it has
less sense to use this setting. There might be some other governors
loaded as modules, which could benefit from it, or in Android e.g.
https://android.googlesource.com/kernel/msm/+/refs/heads/android-msm-coral-4.14-android10/drivers/devfreq/governor_msm_adreno_tz.c

It would be good if it can land in mainline before v5.8-v5.9.

Regards,
Lukasz



^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 0/1] drivers: devfreq: use DELAYED_WORK in DEVFREQ monitoring subsystem
  2020-01-31  9:38       ` Lukasz Luba
@ 2020-02-03  1:10         ` Chanwoo Choi
  0 siblings, 0 replies; 8+ messages in thread
From: Chanwoo Choi @ 2020-02-03  1:10 UTC (permalink / raw)
  To: Lukasz Luba, myungjoo.ham, kyungmin.park, linux-pm, linux-kernel
  Cc: b.zolnierkie, Kamil Konieczny

On 1/31/20 6:38 PM, Lukasz Luba wrote:
> Hi Chanwoo,
> 
> On 1/31/20 12:47 AM, Chanwoo Choi wrote:
>> On 1/31/20 9:42 AM, Chanwoo Choi wrote:
>>> Hi Lukasz,
>>>
>>> On 1/30/20 8:47 PM, Lukasz Luba wrote:
>>>> Hi Chanwoo, MyungJoo,
>>>>
>>>> Gentle ping. The issue is not only in the devfreq itself,
>>>> but also it affects thermal. The devfreq cooling rely on
>>>> busy_time and total_time updated by the devfreq monitoring
>>>> (in simple_ondemand).
>>>> Thermal uses DELAYED_WORK and is more reliable, but uses stale
>>>> data from devfreq_dev_stats. It is especially visible when
>>>> you have cgroup spanning one cluster. Android uses cgroups
>>>> heavily. You can make easily this setup using 'taskset',
>>>> run some benchmarks and observe 'devfreq_monitor' traces and
>>>> timestamps, i.e. for your exynos-bus.
>>>>
>>>> The patch is really non-invasive and simple. It can be a good starting
>>>> point for testing and proposing other solutions.
>>>
>>> Sorry for late reply. I'm preparing the RFC patch about my approach
>>> to support this requirement as following:
>>>
>>> As you knew, DEFERRABLE_WORK with CONFIG_NO_HZ focuses on removing
>>> the redundant of power-consumption by preventing the unneeded wakeup
>>> from idle state if there are no any interrupts and runnable threads.
>>>
>>> Finally, I agree the requirement of delaywd_work for devfreq subsystem.
>>> But, I would like to support both deferrable_work and delayed_work
>>> on devfreq subsystem. It is better to select either deferrable_work
>>> or delayed_work by user like Kamil's suggestion[1].
>>> [1] https://lore.kernel.org/patchwork/patch/1164317/
>>> - [2/4] PM / devfreq: add possibility for delayed work
>>>
>>> But, I want to change the timer type for devfreq device
>>> using simple_ondemand governor via sysfs as following:
>>>
>>> Example:
>>>
>>> 1.
>>> enum work_timer_type {
>>>     DEVFREQ_WORK_TIMER_DEFERRABLE = 0,
>>>     DEVFREQ_WORK_TIMER_DELAYED = 0,
>>> };
>>>
>>> struct devfreq_simple_ondemand_data {
>>>     unsigned int upthreshold;
>>>     unsigned int downdifferential;
>>>     enum work_timer_type timer_type;
>>> };
>>>
>>> The developer of devfreq device driver can choose
>>> the default work time type by initializing the 'timer_type of
>>> struct devfreq_simple_ondemand_data'.
>>>
>>> 2. Change the work timer type at the runtime
>>> - Change the work timer type from 'deferrable' to 'delayed'
>>> $ echo delayed > /sys/class/devfreq/devfreq0/work_timer_type
>>> $ cat /sys/class/devfreq/devfreq0/work_timer_type
>>> delayed
>>>
>>> - Change the work timer type from 'delayed' to 'deferrable'
>>> $ echo deferrable > /sys/class/devfreq/devfreq0/work_timer_type
>>> $ cat /sys/class/devfreq/devfreq0/work_timer_type
>>> deferrable
>>>
>>
>> And
>> Only show '/sys/class/devfreq/devfreq0/work_timer_type' sysfs attribute,
>> if devfreq device uses the simple_ondemand. Because this 'work_timer_type'
>> sysfs attribute only depends on simple_ondemand governor and are useful.
>>
>> So, 'work_timer_type' sysfs attribute will be handled
>> at drivers/devfreq/governor_simpleondemand.c.
>>
>> After posting my suggestion, we can discuss it.
>>
>>
>>> I'm developing the RFC patch and then I'll send it as soon as possible.
> 
> Good, thank you for the explanation. For the first glance the design
> looks OK, we can discuss it a bit more in you RFC series.
> I would recommend to not make it conditional on simple_ondemand governor
> just add a comment that for i.e. performance or passive governors it has
> less sense to use this setting. There might be some other governors
> loaded as modules, which could benefit from it, or in Android e.g.
> https://android.googlesource.com/kernel/msm/+/refs/heads/android-msm-coral-4.14-android10/drivers/devfreq/governor_msm_adreno_tz.c

OK. Instead, I'll add the flag for governors. The governor flag
indicates each sysfs entries like polling_interval, work_timer_type.

If each governor want to use the specific sysfs attributes,
just set the flag when governor is defined.

Thanks.

> 
> It would be good if it can land in mainline before v5.8-v5.9.
> 
> Regards,
> Lukasz
> 
> 
> 
> 


-- 
Best Regards,
Chanwoo Choi
Samsung Electronics

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2020-02-03  1:02 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-01-27 15:17 [PATCH 0/1] drivers: devfreq: use DELAYED_WORK in DEVFREQ monitoring subsystem lukasz.luba
2020-01-27 15:17 ` [PATCH 1/1] drivers: devfreq: add DELAYED_WORK to " lukasz.luba
2020-01-30 11:47 ` [PATCH 0/1] drivers: devfreq: use DELAYED_WORK in DEVFREQ " Lukasz Luba
2020-01-31  0:42   ` Chanwoo Choi
2020-01-31  0:47     ` Chanwoo Choi
2020-01-31  9:38       ` Lukasz Luba
2020-02-03  1:10         ` Chanwoo Choi
2020-01-30 11:47 ` Lukasz Luba

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).