linux-kernel.vger.kernel.org archive mirror
* [RFC PATCH] Increase in idle power with schedutil
@ 2016-05-18 12:53 Shilpasri G Bhat
  2016-05-18 12:53 ` [RFC PATCH] cpufreq: powernv: Add fast_switch callback Shilpasri G Bhat
  2016-05-18 21:11 ` [RFC PATCH] Increase in idle power with schedutil Rafael J. Wysocki
  0 siblings, 2 replies; 12+ messages in thread
From: Shilpasri G Bhat @ 2016-05-18 12:53 UTC
  To: rjw
  Cc: viresh.kumar, linux-pm, linux-kernel, ego, shreyas, akshay.adiga,
	linuxppc-dev, Shilpasri G Bhat

This patch adds a driver callback for fast_switch, and the observations
below on the schedutil governor were made with this patch applied.

On POWER8 there is a regression observed with schedutil compared to
ondemand. With schedutil the frequency does not ramp down and is
mostly stuck at max frequency during idle. This is because of the
watchdog timer, an RT task that fires every 4 seconds and
results in a request for max frequency each time.

In a completely idle system, when no processes are running apart
from a few short-running housekeeping tasks (like watchdog), the system is
stuck at max frequency due to 'cpufreq_trigger_update()':

static inline void cpufreq_trigger_update(u64 time)
{
        cpufreq_update_util(time, ULONG_MAX, 0);
}

If there is no noise apart from the watchdog timer, the CPU is held at
max frequency for no good reason. On a 16-core system I see a 20%
increase in idle power with schedutil compared to the ondemand
governor.

Below is a trace with the 'sched:sched_switch' and 'power:cpu_frequency'
events. Here the watchdog timer, which runs for a very short period,
requests Pmax, and this is triggered regularly.

<idle>-0  19059.992912: sched_switch: prev_comm=swapper/16  prev_state=R
				==> next_comm=watchdog/16 
watchdog/16-107 19059.992914: cpu_frequency: state=4322000 cpu_id=16
watchdog/16-107 19059.992915: sched_switch: prev_comm=watchdog/16 prev_state=S
	 			==> next_comm=swapper/16 

However, adding a cpufreq hook in pick_next_task_idle() to decrease the
frequency helped reduce the problem:

static inline void cpufreq_trigger_idle(u64 time)
{
       cpufreq_update_util(time, 0, 1);
}

This might not be the right fix for the problem; however, this thread
reports the other shortcomings of cpufreq_trigger_update().
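The two hooks pull in opposite directions because of how schedutil turns a (util, max) pair into a request. Below is a simplified user-space model, assumed to follow schedutil's get_next_freq() of this period (1.25 * max_freq * util / max in the frequency-invariant case) plus the ULONG_MAX special case; struct freq_limits and model_next_freq() are hypothetical names, and the limits used in the example are illustrative only.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical stand-in for the limits schedutil consults (all in kHz). */
struct freq_limits {
	unsigned int min;      /* like policy->min */
	unsigned int max;      /* like policy->max */
	unsigned int hw_max;   /* like policy->cpuinfo.max_freq */
};

/*
 * Simplified model of schedutil's frequency selection:
 *  - util == ULONG_MAX is a "go to max" sentinel
 *    (what cpufreq_trigger_update() passes);
 *  - otherwise request 1.25 * hw_max * util / max;
 *  - finally clamp the request to [min, max].
 */
static unsigned int model_next_freq(const struct freq_limits *l,
				    unsigned long util, unsigned long max)
{
	unsigned long long f;

	if (util == ULONG_MAX) {
		f = l->hw_max;
	} else {
		assert(max > 0);
		f = (unsigned long long)(l->hw_max + (l->hw_max >> 2)) * util / max;
	}

	if (f < l->min)
		return l->min;
	if (f > l->max)
		return l->max;
	return (unsigned int)f;
}
```

In this model, cpufreq_update_util(time, ULONG_MAX, 0) always lands on the maximum, and cpufreq_update_util(time, 0, 1) computes 0, which the clamp turns into the policy minimum. If nothing issues a new request on the way into idle, the watchdog's maximum request simply stays in place; the idle hook supplies the missing low request.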

Shilpasri G Bhat (1):
  cpufreq: powernv: Add fast_switch callback

 drivers/cpufreq/powernv-cpufreq.c | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)

-- 
1.9.3


* [RFC PATCH] cpufreq: powernv: Add fast_switch callback
  2016-05-18 12:53 [RFC PATCH] Increase in idle power with schedutil Shilpasri G Bhat
@ 2016-05-18 12:53 ` Shilpasri G Bhat
  2016-05-18 21:22   ` Rafael J. Wysocki
  2016-05-18 21:11 ` [RFC PATCH] Increase in idle power with schedutil Rafael J. Wysocki
  1 sibling, 1 reply; 12+ messages in thread
From: Shilpasri G Bhat @ 2016-05-18 12:53 UTC
  To: rjw
  Cc: viresh.kumar, linux-pm, linux-kernel, ego, shreyas, akshay.adiga,
	linuxppc-dev, Shilpasri G Bhat

Add a fast_switch driver callback to support frequency updates in
interrupt context when using the schedutil governor. Changing the frequency
in interrupt context removes the workload jitter that can be seen
when a kworker thread is used to change the frequency.

Signed-off-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com>
---
 drivers/cpufreq/powernv-cpufreq.c | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)

diff --git a/drivers/cpufreq/powernv-cpufreq.c b/drivers/cpufreq/powernv-cpufreq.c
index 54c4536..4553eb6 100644
--- a/drivers/cpufreq/powernv-cpufreq.c
+++ b/drivers/cpufreq/powernv-cpufreq.c
@@ -678,6 +678,8 @@ static int powernv_cpufreq_cpu_init(struct cpufreq_policy *policy)
 	for (i = 0; i < threads_per_core; i++)
 		cpumask_set_cpu(base + i, policy->cpus);
 
+	policy->fast_switch_possible = true;
+
 	kn = kernfs_find_and_get(policy->kobj.sd, throttle_attr_grp.name);
 	if (!kn) {
 		int ret;
@@ -854,6 +856,24 @@ static void powernv_cpufreq_stop_cpu(struct cpufreq_policy *policy)
 	del_timer_sync(&gpstates->timer);
 }
 
+static unsigned int powernv_fast_switch(struct cpufreq_policy *policy,
+					unsigned int target_freq)
+{
+	int index;
+	struct powernv_smp_call_data freq_data;
+
+	cpufreq_frequency_table_target(policy, policy->freq_table,
+				       target_freq,
+				       CPUFREQ_RELATION_C, &index);
+	if (index < 0 || index >= powernv_pstate_info.nr_pstates)
+		return CPUFREQ_ENTRY_INVALID;
+	freq_data.pstate_id = powernv_freqs[index].driver_data;
+	freq_data.gpstate_id = powernv_freqs[index].driver_data;
+	set_pstate(&freq_data);
+
+	return pstate_id_to_freq(-index);
+}
+
 static struct cpufreq_driver powernv_cpufreq_driver = {
 	.name		= "powernv-cpufreq",
 	.flags		= CPUFREQ_CONST_LOOPS,
@@ -861,6 +881,7 @@ static struct cpufreq_driver powernv_cpufreq_driver = {
 	.exit		= powernv_cpufreq_cpu_exit,
 	.verify		= cpufreq_generic_frequency_table_verify,
 	.target_index	= powernv_cpufreq_target_index,
+	.fast_switch	= powernv_fast_switch,
 	.get		= powernv_cpufreq_get,
 	.stop_cpu	= powernv_cpufreq_stop_cpu,
 	.attr		= powernv_cpu_freq_attr,
-- 
1.9.3
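A rough model of what the fast_switch callback buys schedutil: with fast switching, the frequency is changed directly from the scheduler's update hook; without it, the request has to be handed to a kworker, which is where the jitter mentioned in the commit message comes from. Everything below (struct policy, commit_freq(), toy_fast_switch(), FREQ_INVALID) is a hypothetical simplification, not the cpufreq core API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define FREQ_INVALID (~0u)  /* stand-in for CPUFREQ_ENTRY_INVALID */

/* Hypothetical, trimmed-down policy: only what this sketch needs. */
struct policy {
	unsigned int min, max, cur;  /* kHz */
	bool fast_switch_enabled;    /* driver set fast_switch_possible */
	bool work_queued;            /* slow path: a kworker will do the switch */
	unsigned int queued_freq;
};

typedef unsigned int (*fast_switch_fn)(struct policy *, unsigned int);

/* Toy driver callback: pretend the hardware accepts any frequency. */
static unsigned int toy_fast_switch(struct policy *p, unsigned int freq)
{
	(void)p;
	return freq;
}

/* Called from the scheduler's update hook, i.e. in interrupt context. */
static void commit_freq(struct policy *p, fast_switch_fn fast_switch,
			unsigned int next_f)
{
	/* Keep the request within the policy limits. */
	if (next_f < p->min)
		next_f = p->min;
	if (next_f > p->max)
		next_f = p->max;

	if (p->fast_switch_enabled) {
		unsigned int f;

		assert(fast_switch != NULL);
		f = fast_switch(p, next_f);   /* change the frequency right here */
		if (f != FREQ_INVALID)
			p->cur = f;
	} else {
		/* Cannot touch the hardware from here: defer to a worker. */
		p->queued_freq = next_f;
		p->work_queued = true;
	}
}
```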


* Re: [RFC PATCH] Increase in idle power with schedutil
  2016-05-18 12:53 [RFC PATCH] Increase in idle power with schedutil Shilpasri G Bhat
  2016-05-18 12:53 ` [RFC PATCH] cpufreq: powernv: Add fast_switch callback Shilpasri G Bhat
@ 2016-05-18 21:11 ` Rafael J. Wysocki
  2016-05-19 11:40   ` Peter Zijlstra
  1 sibling, 1 reply; 12+ messages in thread
From: Rafael J. Wysocki @ 2016-05-18 21:11 UTC
  To: Shilpasri G Bhat
  Cc: Rafael J. Wysocki, Viresh Kumar, linux-pm,
	Linux Kernel Mailing List, Gautham R. Shenoy, shreyas,
	akshay.adiga, linuxppc-dev, Steve Muckle, Peter Zijlstra

On Wed, May 18, 2016 at 2:53 PM, Shilpasri G Bhat
<shilpa.bhat@linux.vnet.ibm.com> wrote:
> This patch adds a driver callback for fast_switch, and the observations
> below on the schedutil governor were made with this patch applied.
>
> On POWER8 there is a regression observed with schedutil compared to
> ondemand. With schedutil the frequency does not ramp down and is
> mostly stuck at max frequency during idle. This is because of the
> watchdog timer, an RT task that fires every 4 seconds and
> results in a request for max frequency each time.

Well, yes, that would be problematic.

I guess Steve Muckle's cross-CPU utilization updates series might
help (you can find it in the linux-pm patchwork).

> In a completely idle system, when no processes are running apart
> from a few short-running housekeeping tasks (like watchdog), the system is
> stuck at max frequency due to 'cpufreq_trigger_update()':
>
> static inline void cpufreq_trigger_update(u64 time)
> {
>         cpufreq_update_util(time, ULONG_MAX, 0);
> }
>
> If there is no noise apart from the watchdog timer, the CPU is held at
> max frequency for no good reason. On a 16-core system I see a 20%
> increase in idle power with schedutil compared to the ondemand
> governor.
>
> Below is a trace with the 'sched:sched_switch' and 'power:cpu_frequency'
> events. Here the watchdog timer, which runs for a very short period,
> requests Pmax, and this is triggered regularly.
>
> <idle>-0  19059.992912: sched_switch: prev_comm=swapper/16  prev_state=R
>                                 ==> next_comm=watchdog/16
> watchdog/16-107 19059.992914: cpu_frequency: state=4322000 cpu_id=16
> watchdog/16-107 19059.992915: sched_switch: prev_comm=watchdog/16 prev_state=S
>                                 ==> next_comm=swapper/16
>
> However, adding a cpufreq hook in pick_next_task_idle() to decrease the
> frequency helped reduce the problem:
>
> static inline void cpufreq_trigger_idle(u64 time)
> {
>        cpufreq_update_util(time, 0, 1);
> }
>
> This might not be the right fix for the problem; however, this thread
> reports the other shortcomings of cpufreq_trigger_update().


* Re: [RFC PATCH] cpufreq: powernv: Add fast_switch callback
  2016-05-18 12:53 ` [RFC PATCH] cpufreq: powernv: Add fast_switch callback Shilpasri G Bhat
@ 2016-05-18 21:22   ` Rafael J. Wysocki
  0 siblings, 0 replies; 12+ messages in thread
From: Rafael J. Wysocki @ 2016-05-18 21:22 UTC
  To: Shilpasri G Bhat
  Cc: Rafael J. Wysocki, Viresh Kumar, linux-pm,
	Linux Kernel Mailing List, Gautham R. Shenoy, shreyas,
	akshay.adiga, linuxppc-dev, Peter Zijlstra

On Wed, May 18, 2016 at 2:53 PM, Shilpasri G Bhat
<shilpa.bhat@linux.vnet.ibm.com> wrote:
> Add a fast_switch driver callback to support frequency updates in
> interrupt context when using the schedutil governor. Changing the frequency
> in interrupt context removes the workload jitter that can be seen
> when a kworker thread is used to change the frequency.
>
> Signed-off-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com>

This looks simple enough. :-)

A couple of comments, though.

> ---
>  drivers/cpufreq/powernv-cpufreq.c | 21 +++++++++++++++++++++
>  1 file changed, 21 insertions(+)
>
> diff --git a/drivers/cpufreq/powernv-cpufreq.c b/drivers/cpufreq/powernv-cpufreq.c
> index 54c4536..4553eb6 100644
> --- a/drivers/cpufreq/powernv-cpufreq.c
> +++ b/drivers/cpufreq/powernv-cpufreq.c
> @@ -678,6 +678,8 @@ static int powernv_cpufreq_cpu_init(struct cpufreq_policy *policy)
>         for (i = 0; i < threads_per_core; i++)
>                 cpumask_set_cpu(base + i, policy->cpus);
>
> +       policy->fast_switch_possible = true;
> +
>         kn = kernfs_find_and_get(policy->kobj.sd, throttle_attr_grp.name);
>         if (!kn) {
>                 int ret;
> @@ -854,6 +856,24 @@ static void powernv_cpufreq_stop_cpu(struct cpufreq_policy *policy)
>         del_timer_sync(&gpstates->timer);
>  }
>
> +static unsigned int powernv_fast_switch(struct cpufreq_policy *policy,
> +                                       unsigned int target_freq)
> +{
> +       int index;
> +       struct powernv_smp_call_data freq_data;
> +
> +       cpufreq_frequency_table_target(policy, policy->freq_table,
> +                                      target_freq,
> +                                      CPUFREQ_RELATION_C, &index);

According to the discussion I had with Peter some time ago, this
should be RELATION_L, or you may end up using a frequency that's not
sufficient to meet a deadline somewhere.

Also, cpufreq_frequency_table_target() is somewhat heavyweight,
especially if the table is known to be sorted (which I guess is the
case).
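Both review points can be illustrated together. CPUFREQ_RELATION_L selects the lowest table frequency at or above the target, so the CPU never runs slower than requested, and a sorted table allows a binary search instead of a scan over every entry. This sketch assumes the table is sorted in descending order (index 0 = highest frequency) and contains no invalid entries; find_index_relation_l() is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

/*
 * RELATION_L-style lookup in a frequency table sorted in DESCENDING order
 * (index 0 = highest frequency): return the index of the lowest frequency
 * that is >= target. If target exceeds every entry, return index 0, the
 * highest frequency available. Binary search: O(log n) instead of a scan.
 */
static size_t find_index_relation_l(const unsigned int *freqs, size_t n,
				    unsigned int target)
{
	size_t lo = 0, hi = n;

	assert(freqs != NULL && n > 0);

	/* Entries >= target form a prefix; find where that prefix ends. */
	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (freqs[mid] >= target)
			lo = mid + 1;
		else
			hi = mid;
	}

	/* lo entries are >= target; the last of them is the lowest such one. */
	return lo == 0 ? 0 : lo - 1;
}
```

For example, with a table of {4000, 3500, 3000, 2500, 2000} and a target of 3200, this picks 3500 (index 1), whereas a closest-match (RELATION_C) choice would be 3000, below the request.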

> +       if (index < 0 || index >= powernv_pstate_info.nr_pstates)
> +               return CPUFREQ_ENTRY_INVALID;
> +       freq_data.pstate_id = powernv_freqs[index].driver_data;
> +       freq_data.gpstate_id = powernv_freqs[index].driver_data;
> +       set_pstate(&freq_data);
> +
> +       return pstate_id_to_freq(-index);
> +}
> +
>  static struct cpufreq_driver powernv_cpufreq_driver = {
>         .name           = "powernv-cpufreq",
>         .flags          = CPUFREQ_CONST_LOOPS,
> @@ -861,6 +881,7 @@ static struct cpufreq_driver powernv_cpufreq_driver = {
>         .exit           = powernv_cpufreq_cpu_exit,
>         .verify         = cpufreq_generic_frequency_table_verify,
>         .target_index   = powernv_cpufreq_target_index,
> +       .fast_switch    = powernv_fast_switch,
>         .get            = powernv_cpufreq_get,
>         .stop_cpu       = powernv_cpufreq_stop_cpu,
>         .attr           = powernv_cpu_freq_attr,
> --


* Re: [RFC PATCH] Increase in idle power with schedutil
  2016-05-18 21:11 ` [RFC PATCH] Increase in idle power with schedutil Rafael J. Wysocki
@ 2016-05-19 11:40   ` Peter Zijlstra
  2016-05-19 14:30     ` Rafael J. Wysocki
                       ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: Peter Zijlstra @ 2016-05-19 11:40 UTC
  To: Rafael J. Wysocki
  Cc: Shilpasri G Bhat, Rafael J. Wysocki, Viresh Kumar, linux-pm,
	Linux Kernel Mailing List, Gautham R. Shenoy, shreyas,
	akshay.adiga, linuxppc-dev, Steve Muckle

On Wed, May 18, 2016 at 11:11:51PM +0200, Rafael J. Wysocki wrote:
> On Wed, May 18, 2016 at 2:53 PM, Shilpasri G Bhat
> <shilpa.bhat@linux.vnet.ibm.com> wrote:
> > This patch adds a driver callback for fast_switch, and the observations
> > below on the schedutil governor were made with this patch applied.
> >
> > On POWER8 there is a regression observed with schedutil compared to
> > ondemand. With schedutil the frequency does not ramp down and is
> > mostly stuck at max frequency during idle. This is because of the
> > watchdog timer, an RT task that fires every 4 seconds and
> > results in a request for max frequency each time.
> 
> Well, yes, that would be problematic.
> 

Right; we need to come up with something for RT tasks; but what happens
if you disable the watchdog? This should be entirely doable and might
give a better comparison.


* Re: [RFC PATCH] Increase in idle power with schedutil
  2016-05-19 11:40   ` Peter Zijlstra
@ 2016-05-19 14:30     ` Rafael J. Wysocki
  2016-05-20 12:23     ` Shilpasri G Bhat
       [not found]     ` <201605201223.u4KCNWn9028105@mx0a-001b2d01.pphosted.com>
  2 siblings, 0 replies; 12+ messages in thread
From: Rafael J. Wysocki @ 2016-05-19 14:30 UTC
  To: Peter Zijlstra
  Cc: Rafael J. Wysocki, Shilpasri G Bhat, Rafael J. Wysocki,
	Viresh Kumar, linux-pm, Linux Kernel Mailing List,
	Gautham R. Shenoy, shreyas, akshay.adiga, linuxppc-dev,
	Steve Muckle

On Thu, May 19, 2016 at 1:40 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Wed, May 18, 2016 at 11:11:51PM +0200, Rafael J. Wysocki wrote:
>> On Wed, May 18, 2016 at 2:53 PM, Shilpasri G Bhat
>> <shilpa.bhat@linux.vnet.ibm.com> wrote:
>> > This patch adds a driver callback for fast_switch, and the observations
>> > below on the schedutil governor were made with this patch applied.
>> >
>> > On POWER8 there is a regression observed with schedutil compared to
>> > ondemand. With schedutil the frequency does not ramp down and is
>> > mostly stuck at max frequency during idle. This is because of the
>> > watchdog timer, an RT task that fires every 4 seconds and
>> > results in a request for max frequency each time.
>>
>> Well, yes, that would be problematic.
>>
>
> Right; we need to come up with something for RT tasks;

I think we need the hints thing for that, to be able to distinguish
between RT and the rest.

Also, in this particular case it looks like an RT task is the only task
that wakes up often enough, and we don't drop the frequency when going
idle. Do we need a hook somewhere in the idle path?
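One way to read the "hints" suggestion: pass a flag alongside the utilization so that an RT request and an idle-entry request no longer have to be encoded as magic util values such as ULONG_MAX and 0. The enum and freq_for_hint() below are hypothetical, and the non-hinted path uses plain proportional scaling for brevity.

```c
#include <assert.h>

/* Hypothetical hint flags passed alongside a utilization update. */
enum freq_hint {
	HINT_NONE,  /* ordinary update: scale with utilization */
	HINT_RT,    /* an RT task is running: request the maximum */
	HINT_IDLE,  /* the CPU is about to idle: request the minimum */
};

static unsigned int freq_for_hint(enum freq_hint hint, unsigned long util,
				  unsigned long max, unsigned int fmin,
				  unsigned int fmax)
{
	unsigned long long f;

	switch (hint) {
	case HINT_RT:
		return fmax;
	case HINT_IDLE:
		return fmin;
	default:
		/* Plain proportional scaling, clamped to [fmin, fmax]. */
		assert(max > 0);
		f = (unsigned long long)fmax * util / max;
		if (f < fmin)
			return fmin;
		return f > fmax ? fmax : (unsigned int)f;
	}
}
```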


* Re: [RFC PATCH] Increase in idle power with schedutil
  2016-05-19 11:40   ` Peter Zijlstra
  2016-05-19 14:30     ` Rafael J. Wysocki
@ 2016-05-20 12:23     ` Shilpasri G Bhat
       [not found]     ` <201605201223.u4KCNWn9028105@mx0a-001b2d01.pphosted.com>
  2 siblings, 0 replies; 12+ messages in thread
From: Shilpasri G Bhat @ 2016-05-20 12:23 UTC
  To: Peter Zijlstra
  Cc: Rafael J. Wysocki, Viresh Kumar, linux-pm,
	Linux Kernel Mailing List, Gautham R. Shenoy, shreyas,
	akshay.adiga, linuxppc-dev, Steve Muckle

Hi,

On 05/19/2016 05:10 PM, Peter Zijlstra wrote:
> On Wed, May 18, 2016 at 11:11:51PM +0200, Rafael J. Wysocki wrote:
>> On Wed, May 18, 2016 at 2:53 PM, Shilpasri G Bhat
>> <shilpa.bhat@linux.vnet.ibm.com> wrote:
>>> This patch adds a driver callback for fast_switch, and the observations
>>> below on the schedutil governor were made with this patch applied.
>>>
>>> On POWER8 there is a regression observed with schedutil compared to
>>> ondemand. With schedutil the frequency does not ramp down and is
>>> mostly stuck at max frequency during idle. This is because of the
>>> watchdog timer, an RT task that fires every 4 seconds and
>>> results in a request for max frequency each time.
>>
>> Well, yes, that would be problematic.
>>
> 
> Right; we need to come up with something for RT tasks; but what happens
> if you disable the watchdog? This should be entirely doable and might
> give a better comparison.
> 

Below are the comparisons with the watchdog disabled.
Both schedutil and ondemand have a similar ramp-down trend, and in both
cases I can see that the CPU frequency is not reduced in a deterministic
fashion. In an observation window of 30 seconds after running a workload, I can
see that the frequency is not ramped down on some CPUs in the system, which
stay idle at max frequency.

Below is a sample trace showing the frequency requests when the CPU enters
idle with schedutil:
<...>-3528  7650.011010: cpu_frequency: state=4322000 cpu_id=120
<...>-3528  7650.027540: sched_switch: prev_comm=ppc64_cpu prev_state=x ==>
			next_comm=swapper/120
<idle>-0    7650.035017: cpu_frequency: state=4322000 cpu_id=120
<idle>-0    7729.683536: cpu_frequency: state=4322000 cpu_id=120
<idle>-0    7729.683552: sched_switch: prev_comm=swapper/120 prev_state=R ==>
			next_comm=kworker/120:1
kworker/120  7729.683565: sched_switch: prev_comm=kworker/120:1 prev_state=S ==>
			 next_comm=swapper/120

However, the ondemand governor (with the watchdog enabled) benefits from the noise
created by the watchdog timer and is able to bring down the frequency.

Thanks and Regards,
Shilpa


* Re: [RFC PATCH] Increase in idle power with schedutil
       [not found]     ` <201605201223.u4KCNWn9028105@mx0a-001b2d01.pphosted.com>
@ 2016-05-22 10:39       ` Peter Zijlstra
  2016-05-22 20:42         ` Steve Muckle
  0 siblings, 1 reply; 12+ messages in thread
From: Peter Zijlstra @ 2016-05-22 10:39 UTC
  To: Shilpasri G Bhat
  Cc: Rafael J. Wysocki, Viresh Kumar, linux-pm,
	Linux Kernel Mailing List, Gautham R. Shenoy, shreyas,
	akshay.adiga, linuxppc-dev, Steve Muckle

On Fri, May 20, 2016 at 05:53:41PM +0530, Shilpasri G Bhat wrote:
> 
> Below are the comparisons with the watchdog disabled.
> Both schedutil and ondemand have a similar ramp-down trend, and in both
> cases I can see that the CPU frequency is not reduced in a deterministic
> fashion. In an observation window of 30 seconds after running a workload, I can
> see that the frequency is not ramped down on some CPUs in the system, which
> stay idle at max frequency.

So does it actually matter what the frequency is when you idle? Isn't
the whole thing clock gated anyway?

Because this seems to generate contradictory requirements, on the one
hand we want to stay idle as long as possible while on the other hand
you seem to want to clock down while idle, which requires not being
idle.

If it matters; should not your idle state muck explicitly set/restore
frequency?
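A minimal sketch of the idea that the idle code itself sets and restores the frequency, assuming a per-CPU structure owned by the idle driver; struct idle_freq_state, idle_enter() and idle_exit() are hypothetical, and the assignments stand in for whatever register writes the platform needs.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-CPU state for an idle driver that manages frequency. */
struct idle_freq_state {
	unsigned int cur;    /* frequency currently programmed, kHz */
	unsigned int fmin;   /* lowest frequency, kHz */
	unsigned int saved;  /* frequency to restore on wakeup */
	bool restore;        /* true if idle_enter() lowered the frequency */
};

/* Idle entry: remember the current frequency and drop to fmin. */
static void idle_enter(struct idle_freq_state *s)
{
	assert(!s->restore);  /* no nested idle entry */
	s->saved = s->cur;
	s->restore = true;
	s->cur = s->fmin;     /* a real driver would program the hardware here */
}

/* Idle exit: put back whatever the governor had asked for. */
static void idle_exit(struct idle_freq_state *s)
{
	if (s->restore) {
		s->cur = s->saved;
		s->restore = false;
	}
}
```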


* Re: [RFC PATCH] Increase in idle power with schedutil
  2016-05-22 10:39       ` Peter Zijlstra
@ 2016-05-22 20:42         ` Steve Muckle
  2016-05-23  9:00           ` Lorenzo Pieralisi
  2016-05-23  9:24           ` Peter Zijlstra
  0 siblings, 2 replies; 12+ messages in thread
From: Steve Muckle @ 2016-05-22 20:42 UTC
  To: Peter Zijlstra, Daniel Lezcano
  Cc: Shilpasri G Bhat, Rafael J. Wysocki, Viresh Kumar, linux-pm,
	Linux Kernel Mailing List, Gautham R. Shenoy, shreyas,
	akshay.adiga, linuxppc-dev, Steve Muckle

On Sun, May 22, 2016 at 12:39:12PM +0200, Peter Zijlstra wrote:
> On Fri, May 20, 2016 at 05:53:41PM +0530, Shilpasri G Bhat wrote:
> > 
> > Below are the comparisons with the watchdog disabled.
> > Both schedutil and ondemand have a similar ramp-down trend, and in both
> > cases I can see that the CPU frequency is not reduced in a deterministic
> > fashion. In an observation window of 30 seconds after running a workload, I can
> > see that the frequency is not ramped down on some CPUs in the system, which
> > stay idle at max frequency.
> 
> So does it actually matter what the frequency is when you idle? Isn't
> the whole thing clock gated anyway?
> 
> Because this seems to generate contradictory requirements, on the one
> hand we want to stay idle as long as possible while on the other hand
> you seem to want to clock down while idle, which requires not being
> idle.
> 
> If it matters; should not your idle state muck explicitly set/restore
> frequency?

AFAIK this is very platform dependent. Some will waste more power than
others when a CPU idles above fmin due to things like resource (bus
bandwidth, shared cache freq etc) voting.

It is also true that there is power spent going to fmin (and then
perhaps restoring the frequency when idle ends) which will be in part a
function of how slow the frequency change operation is on that platform.

I think Daniel Lezcano (added) was exploring the idea of having cpuidle
drivers take the expected idle duration and potentially communicate to
cpufreq to reduce the frequency depending on a platform-specific
cost/benefit analysis.
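The cost/benefit analysis can be made concrete with a break-even check: lowering the frequency pays off only if the idle-power saving, accumulated over the expected idle duration, exceeds the energy spent switching down and back up. The structure, units, and numbers here are assumptions for illustration; a real platform would supply its own figures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-platform numbers a cpuidle/cpufreq driver might know. */
struct idle_dvfs_cost {
	uint32_t idle_power_hi_mw;  /* idle power at the current (high) OPP */
	uint32_t idle_power_lo_mw;  /* idle power at fmin */
	uint64_t switch_energy_nj;  /* energy to go to fmin and back */
};

static bool worth_dropping_freq(const struct idle_dvfs_cost *c,
				uint64_t expected_idle_us)
{
	uint64_t saved_nj;

	assert(c != NULL);
	if (c->idle_power_lo_mw >= c->idle_power_hi_mw)
		return false;  /* fmin does not lower idle power: nothing to win */

	/* mW * us = nJ */
	saved_nj = (uint64_t)(c->idle_power_hi_mw - c->idle_power_lo_mw) *
		   expected_idle_us;
	return saved_nj > c->switch_energy_nj;
}
```

With made-up values of 50 mW and 30 mW idle power and a 100000 nJ round-trip switch cost, the break-even point is 100000 nJ / 20 mW = 5000 us, so a 1 ms idle period does not justify the switch but a 10 ms one does.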


* Re: [RFC PATCH] Increase in idle power with schedutil
  2016-05-22 20:42         ` Steve Muckle
@ 2016-05-23  9:00           ` Lorenzo Pieralisi
  2016-05-23  9:24             ` Peter Zijlstra
  2016-05-23  9:24           ` Peter Zijlstra
  1 sibling, 1 reply; 12+ messages in thread
From: Lorenzo Pieralisi @ 2016-05-23  9:00 UTC
  To: Steve Muckle
  Cc: Peter Zijlstra, Daniel Lezcano, Shilpasri G Bhat,
	Rafael J. Wysocki, Viresh Kumar, linux-pm,
	Linux Kernel Mailing List, Gautham R. Shenoy, shreyas,
	akshay.adiga, linuxppc-dev

On Sun, May 22, 2016 at 01:42:52PM -0700, Steve Muckle wrote:
> On Sun, May 22, 2016 at 12:39:12PM +0200, Peter Zijlstra wrote:
> > On Fri, May 20, 2016 at 05:53:41PM +0530, Shilpasri G Bhat wrote:
> > > 
> > > Below are the comparisons with the watchdog disabled.
> > > Both schedutil and ondemand have a similar ramp-down trend, and in both
> > > cases I can see that the CPU frequency is not reduced in a deterministic
> > > fashion. In an observation window of 30 seconds after running a workload, I can
> > > see that the frequency is not ramped down on some CPUs in the system, which
> > > stay idle at max frequency.
> > 
> > So does it actually matter what the frequency is when you idle? Isn't
> > the whole thing clock gated anyway?
> > 
> > Because this seems to generate contradictory requirements, on the one
> > hand we want to stay idle as long as possible while on the other hand
> > you seem to want to clock down while idle, which requires not being
> > idle.
> > 
> > If it matters; should not your idle state muck explicitly set/restore
> > frequency?
> 
> AFAIK this is very platform dependent. Some will waste more power than
> others when a CPU idles above fmin due to things like resource (bus
> bandwidth, shared cache freq etc) voting.

It is also related to static leakage power, which depends on the operating
voltage (i.e. higher operating frequencies require higher voltage), so
scaling the frequency before going idle may not be effective unless the
voltage is scaled down as well.
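This point follows from the usual first-order CMOS power model (a textbook approximation, not a POWER8-specific figure): dynamic power scales with switching activity, frequency and the square of voltage, while static (leakage) power depends on voltage but not on frequency.

```latex
\begin{aligned}
P &\approx \underbrace{\alpha C V^{2} f}_{\text{dynamic}}
   + \underbrace{V\, I_{\mathrm{leak}}(V)}_{\text{static}} \\
P_{\mathrm{idle}} &\approx V\, I_{\mathrm{leak}}(V)
   \qquad (\text{clock gated: } \alpha f \approx 0)
\end{aligned}
```

Under this model, lowering f before idle changes the idle power only insofar as it allows V to drop.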

Lorenzo


* Re: [RFC PATCH] Increase in idle power with schedutil
  2016-05-22 20:42         ` Steve Muckle
  2016-05-23  9:00           ` Lorenzo Pieralisi
@ 2016-05-23  9:24           ` Peter Zijlstra
  1 sibling, 0 replies; 12+ messages in thread
From: Peter Zijlstra @ 2016-05-23  9:24 UTC
  To: Steve Muckle
  Cc: Daniel Lezcano, Shilpasri G Bhat, Rafael J. Wysocki,
	Viresh Kumar, linux-pm, Linux Kernel Mailing List,
	Gautham R. Shenoy, shreyas, akshay.adiga, linuxppc-dev

On Sun, May 22, 2016 at 01:42:52PM -0700, Steve Muckle wrote:

> > So does it actually matter what the frequency is when you idle? Isn't
> > the whole thing clock gated anyway?
> > 
> > Because this seems to generate contradictory requirements, on the one
> > hand we want to stay idle as long as possible while on the other hand
> > you seem to want to clock down while idle, which requires not being
> > idle.
> > 
> > If it matters; should not your idle state muck explicitly set/restore
> > frequency?
> 
> AFAIK this is very platform dependent. Some will waste more power than
> others when a CPU idles above fmin due to things like resource (bus
> bandwidth, shared cache freq etc) voting.

Oh agreed, completely platform dependent. 'Luckily' all this cpuidle is
already very platform dependent.

> It is also true that there is power spent going to fmin (and then
> perhaps restoring the frequency when idle ends) which will be in part a
> function of how slow the frequency change operation is on that platform.

Agreed.

> I think Daniel Lezcano (added) was exploring the idea of having cpuidle
> drivers take the expected idle duration and potentially communicate to
> cpufreq to reduce the frequency depending on a platform-specific
> cost/benefit analysis.

Right; that's along the lines I was thinking. If the idle guesstimate and
the idle QoS both allow it (i.e. it wins on power and doesn't violate
wake-up latency), muck with DVFS on the idle path.
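The condition that both the idle guesstimate and the idle QoS must allow it reduces to two checks. In this sketch the break-even time and restore latency are assumed to come from the platform driver and the latency limit from a PM QoS request; struct idle_dvfs_input and may_drop_freq_on_idle() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical inputs for an idle-path DVFS decision (all times in us). */
struct idle_dvfs_input {
	uint64_t expected_idle_us;   /* cpuidle governor's guesstimate */
	uint64_t break_even_us;      /* idle time after which fmin saves energy */
	uint64_t restore_latency_us; /* time to ramp back up after wakeup */
	uint64_t latency_req_us;     /* wake-up latency the QoS request allows */
};

/* Drop the frequency on idle entry only if it wins on power AND meets QoS. */
static bool may_drop_freq_on_idle(const struct idle_dvfs_input *in)
{
	bool wins_on_power, meets_qos;

	assert(in != NULL);
	wins_on_power = in->expected_idle_us > in->break_even_us;
	meets_qos = in->restore_latency_us <= in->latency_req_us;
	return wins_on_power && meets_qos;
}
```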


* Re: [RFC PATCH] Increase in idle power with schedutil
  2016-05-23  9:00           ` Lorenzo Pieralisi
@ 2016-05-23  9:24             ` Peter Zijlstra
  0 siblings, 0 replies; 12+ messages in thread
From: Peter Zijlstra @ 2016-05-23  9:24 UTC
  To: Lorenzo Pieralisi
  Cc: Steve Muckle, Daniel Lezcano, Shilpasri G Bhat,
	Rafael J. Wysocki, Viresh Kumar, linux-pm,
	Linux Kernel Mailing List, Gautham R. Shenoy, shreyas,
	akshay.adiga, linuxppc-dev

On Mon, May 23, 2016 at 10:00:04AM +0100, Lorenzo Pieralisi wrote:
> It is also related to static leakage power, which depends on the operating
> voltage (i.e. higher operating frequencies require higher voltage), so
> scaling the frequency before going idle may not be effective unless the
> voltage is scaled down as well.

Sure, but the platform drivers 'know' all this and can make the right
decision.


end of thread

Thread overview: 12+ messages
2016-05-18 12:53 [RFC PATCH] Increase in idle power with schedutil Shilpasri G Bhat
2016-05-18 12:53 ` [RFC PATCH] cpufreq: powernv: Add fast_switch callback Shilpasri G Bhat
2016-05-18 21:22   ` Rafael J. Wysocki
2016-05-18 21:11 ` [RFC PATCH] Increase in idle power with schedutil Rafael J. Wysocki
2016-05-19 11:40   ` Peter Zijlstra
2016-05-19 14:30     ` Rafael J. Wysocki
2016-05-20 12:23     ` Shilpasri G Bhat
     [not found]     ` <201605201223.u4KCNWn9028105@mx0a-001b2d01.pphosted.com>
2016-05-22 10:39       ` Peter Zijlstra
2016-05-22 20:42         ` Steve Muckle
2016-05-23  9:00           ` Lorenzo Pieralisi
2016-05-23  9:24             ` Peter Zijlstra
2016-05-23  9:24           ` Peter Zijlstra
