* switching to top frequency too frequent with ondemand governor and no_hz
@ 2011-06-01 16:08 Markus Trippelsdorf
  2011-06-01 17:34 ` David C Niemi
  0 siblings, 1 reply; 12+ messages in thread
From: Markus Trippelsdorf @ 2011-06-01 16:08 UTC (permalink / raw)
  To: cpufreq

There seems to be a major difference in the behavior of the ondemand
governor depending on whether CONFIG_NO_HZ is set or not in the kernel
.config.

In the NO_HZ case the ondemand governor spends too much time at the
highest frequency and is also very trigger happy.

I have compared the two cases on my system:
powernow-k8: Found 1 AMD Phenom(tm) II X4 955 Processor (4 cpu cores) (version 2.20.00)
powernow-k8:    0 : pstate 0 (3200 MHz)
powernow-k8:    1 : pstate 1 (2500 MHz)
powernow-k8:    2 : pstate 2 (2100 MHz)
powernow-k8:    3 : pstate 3 (800 MHz)

When I run:
watch -n.1 'cat /proc/cpuinfo|grep MHz'
on an otherwise idle system, I can see that the frequency always stays
at 800 MHz in the "CONFIG_NO_HZ not set" case. But it will very
frequently switch to 3200 MHz in the CONFIG_NO_HZ=y case under the same
conditions.

This also manifests itself in the cpufreq/stats/time_in_state
statistics (again on a mostly idle system):

First taken with:
echo 200 > /sys/devices/system/cpu/cpufreq/ondemand/sampling_down_factor
(BTW wouldn't it make sense to use something like this as the default
value?)

cat /sys/devices/system/cpu/cpu0/cpufreq/stats/time_in_state

CONFIG_NO_HZ not set: 
3200000 5845
2500000 0
2100000 5
800000 31552

CONFIG_NO_HZ=y:
3200000 17650
2500000 0
2100000 0
800000 31129

And with the default sampling_down_factor=1

CONFIG_NO_HZ not set: 
3200000 140
2500000 2
2100000 29
800000 16614

CONFIG_NO_HZ=y:
3200000 538
2500000 9
2100000 77
800000 16287

Now my question is, is this expected? And what could be done to make the
NO_HZ behavior more like the "CONFIG_NO_HZ not set" behavior.
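
To compare the listings above on a common scale, the raw counters can be
converted to a share of total time per frequency. This is only a minimal
sketch, assuming POSIX sh and awk. The helper name `residency` is made up
for illustration and is not part of any cpufreq tool. It reads
`<freq_kHz> <time>` pairs, the format of time_in_state, from standard input.

```shell
#!/bin/sh
# residency: hypothetical helper, not part of any cpufreq tool.
# Reads "<freq_kHz> <time>" pairs (the time_in_state format) from stdin
# and prints each frequency in MHz with its share of the total time.
residency() {
    awk '
        NF == 2 { n++; freq[n] = $1; t[n] = $2; total += $2 }
        END {
            if (total == 0) exit 1
            for (i = 1; i <= n; i++)
                printf "%d MHz %.1f%%\n", freq[i] / 1000, 100 * t[i] / total
        }
    '
}

# Example with the sampling_down_factor=200, CONFIG_NO_HZ=y figures above.
residency <<'EOF'
3200000 17650
2500000 0
2100000 0
800000 31129
EOF
```

On a live system the same helper could be fed directly, for example with
`residency < /sys/devices/system/cpu/cpu0/cpufreq/stats/time_in_state`.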

-- 
Markus


* Re: switching to top frequency too frequent with ondemand governor and no_hz
  2011-06-01 16:08 switching to top frequency too frequent with ondemand governor and no_hz Markus Trippelsdorf
@ 2011-06-01 17:34 ` David C Niemi
  2011-06-01 18:00   ` Markus Trippelsdorf
  0 siblings, 1 reply; 12+ messages in thread
From: David C Niemi @ 2011-06-01 17:34 UTC (permalink / raw)
  To: Markus Trippelsdorf; +Cc: cpufreq


A very interesting bit of information.  What do you have set for
up_threshold?  You may have to set it higher for CONFIG_NO_HZ than
without, based on your symptoms.  Another thing to look at is your
sampling_rate.  I'm guessing it differs between CONFIG_NO_HZ being set
or not.

And perhaps you need to set sampling_down_factor a bit lower.  I
consider 100 a reasonable default, but a default of "1" was put in
initially to make the behavior of the patch that enabled the factor
identical with not having the patch.  If you are more concerned with
saving power than maximizing throughput, you might consider a much
lower value like 5 or 10.
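
One way to collect the tunables mentioned here (up_threshold,
sampling_rate, sampling_down_factor) in a single listing is sketched
below. It assumes the global ondemand directory used elsewhere in this
thread. The helper name `dump_tunables` is hypothetical, and the layout
of that directory may differ on other kernel versions.

```shell
#!/bin/sh
# dump_tunables: hypothetical helper that prints "<name> <value>" for every
# readable regular file in a directory, one tunable per line.
dump_tunables() {
    dir=$1
    for f in "$dir"/*; do
        [ -f "$f" ] && [ -r "$f" ] || continue
        printf '%s %s\n' "$(basename "$f")" "$(cat "$f")"
    done
}

# Directory taken from the echo command earlier in the thread; skip
# quietly if it does not exist on this machine.
ONDEMAND_DIR=/sys/devices/system/cpu/cpufreq/ondemand
if [ -d "$ONDEMAND_DIR" ]; then
    dump_tunables "$ONDEMAND_DIR"
fi
```

Running it once with CONFIG_NO_HZ set and once without, then diffing the
two listings, would show whether any default differs between the builds.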

DCN





* Re: switching to top frequency too frequent with ondemand governor and no_hz
  2011-06-01 17:34 ` David C Niemi
@ 2011-06-01 18:00   ` Markus Trippelsdorf
  2011-06-02 11:41     ` Markus Trippelsdorf
  0 siblings, 1 reply; 12+ messages in thread
From: Markus Trippelsdorf @ 2011-06-01 18:00 UTC (permalink / raw)
  To: David C Niemi; +Cc: cpufreq, Vincent Guittot, Dave Jones, linux-kernel

On 2011.06.01 at 13:34 -0400, David C Niemi wrote:
> A very interesting bit of information.  What do you have set for
> up_threshold?  You may have to set it higher for CONFIG_NO_HZ than
> without, based on your symptoms.  Another thing to look at is your
> sampling_rate.  I'm guessing it differs between CONFIG_NO_HZ being set
> or not.

I've played with all those parameters, but unfortunately it didn't make
any difference.

> And perhaps you need to set sampling_down_factor a bit lower.  I
> consider 100 a reasonable default, but a default of "1" was put in
> initially to make the behavior of the patch that enabled the factor
> identical with not having the patch.  If you are more concerned with
> saving power than maximizing throughput, you might consider a much
> lower value like 5 or 10.

Yes, I've tried different values and 200 turned out to be the best based
on my preferences (throughput over power saving). It makes a big
difference in the compile time of bigger projects, especially during the
configuration phase.

But I have found the root cause of symptoms described above by
bisection. It turned out that 2.6.39 is also affected, so I've bisected
down to 2.6.38. 
This is the result:

 5cb2c3bd0c5e0f3ced63f250ec2ad59d7c5c626a is the first bad commit
 commit 5cb2c3bd0c5e0f3ced63f250ec2ad59d7c5c626a
 Author: Vincent Guittot <vincent.guittot@linaro.org>
 Date:   Mon Feb 7 17:14:25 2011 +0100

     [CPUFREQ] calculate delay after dbs_check_cpu

When I revert the above in 3.0-rc1 the CONFIG_NO_HZ=y symptoms vanish.

-- 
Markus


* Re: switching to top frequency too frequent with ondemand governor and no_hz
  2011-06-01 18:00   ` Markus Trippelsdorf
@ 2011-06-02 11:41     ` Markus Trippelsdorf
  2011-06-06  7:35       ` Vincent Guittot
  0 siblings, 1 reply; 12+ messages in thread
From: Markus Trippelsdorf @ 2011-06-02 11:41 UTC (permalink / raw)
  To: David C Niemi; +Cc: cpufreq, Vincent Guittot, Dave Jones, linux-kernel

On 2011.06.01 at 20:00 +0200, Markus Trippelsdorf wrote:
> But I have found the root cause of symptoms described above by
> bisection. It turned out that 2.6.39 is also affected, so I've bisected
> down to 2.6.38. 
> This is the result:
> 
>  5cb2c3bd0c5e0f3ced63f250ec2ad59d7c5c626a is the first bad commit
>  commit 5cb2c3bd0c5e0f3ced63f250ec2ad59d7c5c626a
>  Author: Vincent Guittot <vincent.guittot@linaro.org>
>  Date:   Mon Feb 7 17:14:25 2011 +0100
> 
>      [CPUFREQ] calculate delay after dbs_check_cpu
> 
> When I revert the above in 3.0-rc1 the CONFIG_NO_HZ=y symptoms vanish.

Here are some numbers to back this claim:

cat /sys/devices/system/cpu/cpu0/cpufreq/stats/time_in_state
(with sampling_down_factor=200)

CONFIG_NO_HZ not set:
3200000 1766
2500000 0
2100000 1479
800000 30787

CONFIG_NO_HZ=y:
3200000 922
2500000 0
2100000 2313
800000 31217

So the behavior in both cases is (roughly) the same again.

-- 
Markus


* Re: switching to top frequency too frequent with ondemand governor and no_hz
  2011-06-02 11:41     ` Markus Trippelsdorf
@ 2011-06-06  7:35       ` Vincent Guittot
  2011-06-06 11:20         ` Markus Trippelsdorf
  0 siblings, 1 reply; 12+ messages in thread
From: Vincent Guittot @ 2011-06-06  7:35 UTC (permalink / raw)
  To: Markus Trippelsdorf; +Cc: David C Niemi, cpufreq, Dave Jones, linux-kernel

On 2 June 2011 13:41, Markus Trippelsdorf <markus@trippelsdorf.de> wrote:
> On 2011.06.01 at 20:00 +0200, Markus Trippelsdorf wrote:
>> But I have found the root cause of symptoms described above by
>> bisection. It turned out that 2.6.39 is also affected, so I've bisected
>> down to 2.6.38.
>> This is the result:
>>
>>  5cb2c3bd0c5e0f3ced63f250ec2ad59d7c5c626a is the first bad commit
>>  commit 5cb2c3bd0c5e0f3ced63f250ec2ad59d7c5c626a
>>  Author: Vincent Guittot <vincent.guittot@linaro.org>
>>  Date:   Mon Feb 7 17:14:25 2011 +0100
>>
>>      [CPUFREQ] calculate delay after dbs_check_cpu
>>
>> When I revert the above in 3.0-rc1 the CONFIG_NO_HZ=y symptoms vanish.
>

The patch you mentioned solves a problem that occurs when the ondemand
governor goes from the highest frequency to a lower one. Without the
patch, the governor uses the longest sampling period (sampling_rate *
sampling_down_factor) at a low frequency during the first period after
decreasing the frequency. This can lead to a long time frame
(sampling_rate * sampling_down_factor) with a low frequency but an
overloaded CPU.

The other fix in the patch concerns the powersave bias mode. The
governor did not use the right period for the low frequency step
(freq_lo_jiffies) but a larger one (sampling_rate *
sampling_down_factor), so the ratio between the time spent at the low
and the high frequency was wrong.

Do you use the powersave bias mode?

Could you give us more statistics? The number of state transitions
could be an interesting value. Is there a difference with and without
CONFIG_NO_HZ? What is your sampling rate?

One difference with CONFIG_NO_HZ is the real sampling period, which can
be greater than the timer configuration because of the deferrable
mode. The deferrable mode has nearly no effect when CONFIG_NO_HZ is
not set because the tick timer will ensure enough CPU activity to
trigger the governor. When CONFIG_NO_HZ is set, the ondemand governor
work is triggered at the beginning of a burst of CPU activity, so a
short CPU load is more likely to fall into one period instead of being
split across two different periods. This behavior is quite useful for
responsiveness but can generate spurious frequency increases if the
sampling rate is too short.

Vincent

> Here are some numbers to back this claim:
>
> cat /sys/devices/system/cpu/cpu0/cpufreq/stats/time_in_state
> (with sampling_down_factor=200)
>
> CONFIG_NO_HZ not set:
> 3200000 1766
> 2500000 0
> 2100000 1479
> 800000 30787
>
> CONFIG_NO_HZ=y:
> 3200000 922
> 2500000 0
> 2100000 2313
> 800000 31217
>
> So the behavior in both cases is (roughly) the same again.
>

> --
> Markus
>


* Re: switching to top frequency too frequent with ondemand governor and no_hz
  2011-06-06  7:35       ` Vincent Guittot
@ 2011-06-06 11:20         ` Markus Trippelsdorf
  2011-06-06 13:11           ` Vincent Guittot
  0 siblings, 1 reply; 12+ messages in thread
From: Markus Trippelsdorf @ 2011-06-06 11:20 UTC (permalink / raw)
  To: Vincent Guittot; +Cc: David C Niemi, cpufreq, Dave Jones, linux-kernel

On 2011.06.06 at 09:35 +0200, Vincent Guittot wrote:
> The patch you mentioned solves a problem that occurs when the ondemand
> governor goes from the highest frequency to a lower one. Without the
> patch, the governor uses the longest sampling period (sampling_rate *
> sampling_down_factor) at a low frequency during the first period after
> decreasing the frequency. This can lead to a long time frame
> (sampling_rate * sampling_down_factor) with a low frequency but an
> overloaded CPU.

The problem with the patch is that it results in an ondemand behavior
that almost totally ignores the middle frequencies (2100 and 2500 MHz in
my case) with CONFIG_NO_HZ. If you also set the sampling_down_factor to
something like >=100 then the CPU will spend much of the time at the top
frequency even if there is no workload whatsoever.

> The other fix in the patch concerns the powersave bias mode. The
> governor did not use the right period for the low frequency step
> (freq_lo_jiffies) but a larger one (sampling_rate *
> sampling_down_factor), so the ratio between the time spent at the low
> and the high frequency was wrong.
> 
> Do you use the powersave bias mode?

No.

> Could you give us more statistics? The number of state transitions
> could be an interesting value. Is there a difference with and without
> CONFIG_NO_HZ? What is your sampling rate?

These are my settings:

ignore_nice_load 0
io_is_busy 0
powersave_bias 0
sampling_down_factor 200
sampling_rate 10000
sampling_rate_min 10000
up_threshold 95

cat /sys/devices/system/cpu/cpu0/cpufreq/stats/* (time_in_state, then
total_trans) on an otherwise idle machine with CONFIG_NO_HZ and
5cb2c3bd0c5e0f reverted:
3200000 532
2500000 172
2100000 2703
800000 20995
153

and with your patch and also CONFIG_NO_HZ:
3200000 11795
2500000 0
2100000 0
800000 20620
213

Which shows the problem very nicely.
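
Since the two runs above cover different lengths of time, it can help to
relate the transition counts to the observed duration. The sketch below
assumes that the trailing number in each listing is total_trans (the
glob `stats/*` would print it after time_in_state) and that
time_in_state counts units of 10 ms, as the kernel's cpufreq-stats
documentation describes. The helper name `trans_rate` is hypothetical.

```shell
#!/bin/sh
# trans_rate: hypothetical helper. Reads "<freq_kHz> <time>" pairs followed
# by a single transition count (the shape of the listings above) and
# prints the total observed time and the transitions per second.
# Assumes time_in_state counts 10 ms units.
trans_rate() {
    awk '
        NF == 2 { total += $2 }
        NF == 1 { trans = $1 }
        END {
            if (total == 0) exit 1
            secs = total / 100
            printf "%.2f s, %d transitions, %.2f per second\n", secs, trans, trans / secs
        }
    '
}

# CONFIG_NO_HZ=y with 5cb2c3bd0c5e0f reverted
trans_rate <<'EOF'
3200000 532
2500000 172
2100000 2703
800000 20995
153
EOF

# CONFIG_NO_HZ=y with the patch applied
trans_rate <<'EOF'
3200000 11795
2500000 0
2100000 0
800000 20620
213
EOF
```

This only normalizes the counters already posted. It does not say
anything about how long each individual visit to a frequency lasted.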

> One difference with CONFIG_NO_HZ is the real sampling period, which can
> be greater than the timer configuration because of the deferrable
> mode. The deferrable mode has nearly no effect when CONFIG_NO_HZ is
> not set because the tick timer will ensure enough CPU activity to
> trigger the governor. When CONFIG_NO_HZ is set, the ondemand governor
> work is triggered at the beginning of a burst of CPU activity, so a
> short CPU load is more likely to fall into one period instead of being
> split across two different periods. This behavior is quite useful for
> responsiveness but can generate spurious frequency increases if the
> sampling rate is too short.

Hm, my sampling rate (10000) is already the minimum rate available.

-- 
Markus


* Re: switching to top frequency too frequent with ondemand governor and no_hz
  2011-06-06 11:20         ` Markus Trippelsdorf
@ 2011-06-06 13:11           ` Vincent Guittot
  2011-06-06 14:16             ` Markus Trippelsdorf
  0 siblings, 1 reply; 12+ messages in thread
From: Vincent Guittot @ 2011-06-06 13:11 UTC (permalink / raw)
  To: Markus Trippelsdorf; +Cc: David C Niemi, cpufreq, Dave Jones, linux-kernel

On 6 June 2011 13:20, Markus Trippelsdorf <markus@trippelsdorf.de> wrote:
> On 2011.06.06 at 09:35 +0200, Vincent Guittot wrote:
>> The patch you mentioned solves a problem that occurs when the ondemand
>> governor goes from the highest frequency to a lower one. Without the
>> patch, the governor uses the longest sampling period (sampling_rate *
>> sampling_down_factor) at a low frequency during the first period after
>> decreasing the frequency. This can lead to a long time frame
>> (sampling_rate * sampling_down_factor) with a low frequency but an
>> overloaded CPU.
>
> The problem with the patch is that it results in an ondemand behavior
> that almost totally ignores the middle frequencies (2100 and 2500 MHz in
> my case) with CONFIG_NO_HZ. If you also set the sampling_down_factor to
> something like >=100 then the CPU will spend much of the time at the top
> frequency even if there is no workload whatsoever.
>

In fact, one main goal of the ondemand governor is to switch to the max
frequency as soon as cpu activity is detected, to ensure the
responsiveness of the system. If your idle activity is made of bursts
of cpu activity and your sampling period is small, your system will
switch between the highest and the lowest frequency. In contrast,
the conservative governor modifies the frequency step by step.

>> The other correction of the patch is linked to the powersave bias
>> mode. The governor didn't use the right period for the low frequency
>> step (freq_lo_jiffies) but a larger one (sampling period * scaling
>> down factor). The ratio between low and high frequency was not the
>> right one.
>>
>> Do you use the powersave bias mode ?
>
> No.
>
>> Could you give us more statistics : the number of state transition
>> could be an interesting value. Is there a difference with and without
>> CONFIG_NO_HZ ? What is your sampling rate ?
>
> These are my settings:
>
> ignore_nice_load 0
> io_is_busy 0
> powersave_bias 0
> sampling_down_factor 200
> sampling_rate 10000
> sampling_rate_min 10000
> up_threshold 95
>
> cat sys/devices/system/cpu/cpu0/cpufreq/stats/* on an otherwise idle
> machine with CONFIG_NO_HZ and 5cb2c3bd0c5e0f reverted:
> 3200000 532
> 2500000 172
> 2100000 2703
> 800000 20995
> 153
>

With this configuration (without the patch), there is a period of 2
seconds (sampling_rate 10000 us * sampling_down_factor 200) at a low
frequency when the governor comes back from the highest frequency.
During these 2 seconds, you will not be able to go back to max
frequency. So, if your cpu is overloaded during this 2-second period,
you will not increase your frequency. For this use case, your cpufreq
responsiveness is more than 2 seconds.

> and with your patch and also CONFIG_NO_HZ:
> 3200000 11795
> 2500000 0
> 2100000 0
> 800000 20620
> 213
>
> Which shows the problem very nicely.
>

My understanding is that your idle activity is made of cpu activities
which are 10ms long and which trigger the increase of the frequency.

>> One difference with CONFIG_NO_HZ is the real sampling period which can
>> be greater than the timer configuration because of the deferrable
>> mode. The deferrable mode has nearly no effect when CONFIG_NO_HZ is
>> not set because the tick timer will ensure enough cpu activity to
>> trigger the governor. When CONFIG_NO_HZ is set, the ondemand governor
>> work is triggered at the beginning of a cpu activity so we have more
>> chance to have a short cpu load in one period instead of splitting it
>> into 2 differents periods. This behavior is quite useful for
>> responsiveness but can generates spurious frequency increase if the
>> sampling rate is too short.
>
> Hm, my sampling rate (10000) is already the most minimal rate available.
>

It seems that your sampling period is too small: the ondemand
governor detects your idle activity as an increase of the cpu activity
and, as a result, increases the frequency. Have you tried to
increase the sampling rate and decrease your sampling_down_factor,
which also seems quite high?

Vincent

> --
> Markus
>

* Re: switching to top frequency too frequent with ondemand governor and no_hz
  2011-06-06 13:11           ` Vincent Guittot
@ 2011-06-06 14:16             ` Markus Trippelsdorf
  2011-06-06 16:34               ` Vincent Guittot
  0 siblings, 1 reply; 12+ messages in thread
From: Markus Trippelsdorf @ 2011-06-06 14:16 UTC (permalink / raw)
  To: Vincent Guittot; +Cc: David C Niemi, cpufreq, Dave Jones, linux-kernel

On 2011.06.06 at 15:11 +0200, Vincent Guittot wrote:
> On 6 June 2011 13:20, Markus Trippelsdorf <markus@trippelsdorf.de> wrote:
> > On 2011.06.06 at 09:35 +0200, Vincent Guittot wrote:
> >> On 2 June 2011 13:41, Markus Trippelsdorf <markus@trippelsdorf.de> wrote:
> >> > On 2011.06.01 at 20:00 +0200, Markus Trippelsdorf wrote:
> >> >> But I have found the root cause of symptoms described above by
> >> >> bisection. It turned out that 2.6.39 is also affected, so I've bisected
> >> >> down to 2.6.38.
> >> >> This is the result:
> >> >>
> >> >>  5cb2c3bd0c5e0f3ced63f250ec2ad59d7c5c626a is the first bad commit
> >> >>  commit 5cb2c3bd0c5e0f3ced63f250ec2ad59d7c5c626a
> >> >>  Author: Vincent Guittot <vincent.guittot@linaro.org>
> >> >>  Date:   Mon Feb 7 17:14:25 2011 +0100
> >> >>
> >> >>      [CPUFREQ] calculate delay after dbs_check_cpu
> >> >>
> >> >> When I revert the above in 3.0-rc1 the CONFIG_NO_HZ=y symptoms vanish.
> >> >
> >>
> >> The patch, you have mentioned, solves a problem when ondemand governor
> >> goes  from highest frequency to a lower one. Without the patch, the
> >> governor uses the longest sampling period (sampling period * scaling
> >> down factor) with a low frequency during the 1st period after
> >> decreasing the frequency. This can lead to a large time frame
> >> (sampling period * scaling down factor) with a low frequency but an
> >> overloaded cpu.
> >
> > The problem with the patch is that it results in an ondemand behavior
> > that almost totally ignores the middle frequencies (2100 and 2500 MHz in
> > my case) with CONFIG_NO_HZ. If you also set the sampling_down_factor to
> > something like >=100 then the CPU will spend much of the time at the top
> > frequency even if there is no workload whatsoever.
> >
> 
> In fact, one main goal of the ondemand governor is to switch to max
> frequency as soon as there is a cpu activity is detected to ensure the
> responsiveness of the system. If your idle activity is made of burst
> of cpu activity and your sampling period is small,  your sytems will
> switch between the highest and the lowest frequency. At the contrary,
> the conservative governor modifies the frequency in a step by step
> manner.

Understood. But this is a change in behavior due to your patch.

> >> The other correction of the patch is linked to the powersave bias
> >> mode. The governor didn't use the right period for the low frequency
> >> step (freq_lo_jiffies) but a larger one (sampling period * scaling
> >> down factor). The ratio between low and high frequency was not the
> >> right one.
> >>
> >> Do you use the powersave bias mode ?
> >
> > No.
> >
> >> Could you give us more statistics : the number of state transition
> >> could be an interesting value. Is there a difference with and without
> >> CONFIG_NO_HZ ? What is your sampling rate ?
> >
> > These are my settings:
> >
> > ignore_nice_load 0
> > io_is_busy 0
> > powersave_bias 0
> > sampling_down_factor 200
> > sampling_rate 10000
> > sampling_rate_min 10000
> > up_threshold 95
> >
> > cat sys/devices/system/cpu/cpu0/cpufreq/stats/* on an otherwise idle
> > machine with CONFIG_NO_HZ and 5cb2c3bd0c5e0f reverted:
> > 3200000 532
> > 2500000 172
> > 2100000 2703
> > 800000 20995
> > 153
> >
> 
> With this configuration (without the patch), there is a period of 2
> seconds with a low frequency when the governor comes back from the
> highest frequency. During these 2 seconds, you will not be able to go
> back to max frequency. So, if your cpu is overloaded during this 2
> seconds period, you will not increase your frequency. For this use
> case, your cpufreq responsiveness is more then 2 seconds.

I don't see these 2 second delays (being stuck on a low frequency) on my
system. On the contrary as soon as there is sufficient load it switches
to the highest frequency immediately.

> > and with your patch and also CONFIG_NO_HZ:
> > 3200000 11795
> > 2500000 0
> > 2100000 0
> > 800000 20620
> > 213
> >
> > Which shows the problem very nicely.
> >
> 
> My understand is that your idle activity is made of cpu activities
> which are 10ms long and which trigs the increase of the frequency.

Could it be that the call to dbs_check_cpu(dbs_info) itself is the
reason for these activities?

> >> One difference with CONFIG_NO_HZ is the real sampling period which can
> >> be greater than the timer configuration because of the deferrable
> >> mode. The deferrable mode has nearly no effect when CONFIG_NO_HZ is
> >> not set because the tick timer will ensure enough cpu activity to
> >> trigger the governor. When CONFIG_NO_HZ is set, the ondemand governor
> >> work is triggered at the beginning of a cpu activity so we have more
> >> chance to have a short cpu load in one period instead of splitting it
> >> into 2 differents periods. This behavior is quite useful for
> >> responsiveness but can generates spurious frequency increase if the
> >> sampling rate is too short.
> >
> > Hm, my sampling rate (10000) is already the most minimal rate available.
> >
> 
> It's seems that your sampling period is too small and the ondemand
> governor detects your idle activity as an increase of the cpu activity
> and as a result, it increases the frequency. Have you tried to
> increase the sampling rate and decrease your sampling_down_factor
> which seems to be also quite high ?

Please note that these are all default values (with the exception of
sampling_down_factor). So why should I fiddle with the parameters when
everything was working fine before your patch went in? And even if I
increase the sampling rate and decrease the sampling_down_factor, I
cannot replicate the old behavior. So IMHO it's a regression.

Thanks.
-- 
Markus

* Re: switching to top frequency too frequent with ondemand governor and no_hz
  2011-06-06 14:16             ` Markus Trippelsdorf
@ 2011-06-06 16:34               ` Vincent Guittot
  2011-06-06 17:51                 ` Markus Trippelsdorf
  2011-06-06 19:43                 ` Markus Trippelsdorf
  0 siblings, 2 replies; 12+ messages in thread
From: Vincent Guittot @ 2011-06-06 16:34 UTC (permalink / raw)
  To: Markus Trippelsdorf; +Cc: David C Niemi, cpufreq, Dave Jones, linux-kernel

On 6 June 2011 16:16, Markus Trippelsdorf <markus@trippelsdorf.de> wrote:
> On 2011.06.06 at 15:11 +0200, Vincent Guittot wrote:
>> On 6 June 2011 13:20, Markus Trippelsdorf <markus@trippelsdorf.de> wrote:
>> > On 2011.06.06 at 09:35 +0200, Vincent Guittot wrote:
>> >> On 2 June 2011 13:41, Markus Trippelsdorf <markus@trippelsdorf.de> wrote:
>> >> > On 2011.06.01 at 20:00 +0200, Markus Trippelsdorf wrote:
>> >> >> But I have found the root cause of symptoms described above by
>> >> >> bisection. It turned out that 2.6.39 is also affected, so I've bisected
>> >> >> down to 2.6.38.
>> >> >> This is the result:
>> >> >>
>> >> >>  5cb2c3bd0c5e0f3ced63f250ec2ad59d7c5c626a is the first bad commit
>> >> >>  commit 5cb2c3bd0c5e0f3ced63f250ec2ad59d7c5c626a
>> >> >>  Author: Vincent Guittot <vincent.guittot@linaro.org>
>> >> >>  Date:   Mon Feb 7 17:14:25 2011 +0100
>> >> >>
>> >> >>      [CPUFREQ] calculate delay after dbs_check_cpu
>> >> >>
>> >> >> When I revert the above in 3.0-rc1 the CONFIG_NO_HZ=y symptoms vanish.
>> >> >
>> >>
>> >> The patch, you have mentioned, solves a problem when ondemand governor
>> >> goes  from highest frequency to a lower one. Without the patch, the
>> >> governor uses the longest sampling period (sampling period * scaling
>> >> down factor) with a low frequency during the 1st period after
>> >> decreasing the frequency. This can lead to a large time frame
>> >> (sampling period * scaling down factor) with a low frequency but an
>> >> overloaded cpu.
>> >
>> > The problem with the patch is that it results in an ondemand behavior
>> > that almost totally ignores the middle frequencies (2100 and 2500 MHz in
>> > my case) with CONFIG_NO_HZ. If you also set the sampling_down_factor to
>> > something like >=100 then the CPU will spend much of the time at the top
>> > frequency even if there is no workload whatsoever.
>> >
>>
>> In fact, one main goal of the ondemand governor is to switch to max
>> frequency as soon as there is a cpu activity is detected to ensure the
>> responsiveness of the system. If your idle activity is made of burst
>> of cpu activity and your sampling period is small,  your sytems will
>> switch between the highest and the lowest frequency. At the contrary,
>> the conservative governor modifies the frequency in a step by step
>> manner.
>
> Understood. But this a change in behavior due to your patch.
>
>> >> The other correction of the patch is linked to the powersave bias
>> >> mode. The governor didn't use the right period for the low frequency
>> >> step (freq_lo_jiffies) but a larger one (sampling period * scaling
>> >> down factor). The ratio between low and high frequency was not the
>> >> right one.
>> >>
>> >> Do you use the powersave bias mode ?
>> >
>> > No.
>> >
>> >> Could you give us more statistics : the number of state transition
>> >> could be an interesting value. Is there a difference with and without
>> >> CONFIG_NO_HZ ? What is your sampling rate ?
>> >
>> > These are my settings:
>> >
>> > ignore_nice_load 0
>> > io_is_busy 0
>> > powersave_bias 0
>> > sampling_down_factor 200
>> > sampling_rate 10000
>> > sampling_rate_min 10000
>> > up_threshold 95
>> >
>> > cat sys/devices/system/cpu/cpu0/cpufreq/stats/* on an otherwise idle
>> > machine with CONFIG_NO_HZ and 5cb2c3bd0c5e0f reverted:
>> > 3200000 532
>> > 2500000 172
>> > 2100000 2703
>> > 800000 20995
>> > 153
>> >
>>
>> With this configuration (without the patch), there is a period of 2
>> seconds with a low frequency when the governor comes back from the
>> highest frequency. During these 2 seconds, you will not be able to go
>> back to max frequency. So, if your cpu is overloaded during this 2
>> seconds period, you will not increase your frequency. For this use
>> case, your cpufreq responsiveness is more then 2 seconds.
>
> I don't see these 2 second delays (being stuck on a low frequency) on my
> system. On the contrary as soon as there is sufficient load it switches
> to the highest frequency immediately.
>

Let's assume that your system is at the highest frequency.

Without the patch, you have the following sequence:

-> do_dbs_timer
    -> delay = usecs_to_jiffies(dbs_tuners_ins.sampling_rate * dbs_info->rate_mult);
       // delay will be equal to 10000*200 = 2000000us
    -> dbs_check_cpu
          Let's assume that your cpu load is quite small
          -> freq_next = max_load_freq / (dbs_tuners_ins.up_threshold - dbs_tuners_ins.down_differential);
             // freq_next is set to your lowest frequency
          -> __cpufreq_driver_target(policy, freq_next, CPUFREQ_RELATION_L);
    -> queue_delayed_work_on(cpu, kondemand_wq, &dbs_info->work, delay);

The delay value is set to sampling_rate * rate_mult, but the frequency
is now the lowest one, which is not the correct behavior of the
sampling_down_factor feature.
The patch only solves this issue.
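
The ordering issue in the sequence above can be illustrated with a small
user-space model. This is only a sketch under simplifying assumptions, not
the kernel code: struct dbs_state, check_cpu(), next_delay_old() and
next_delay_new() are hypothetical names, and the load check is reduced to
a single "high or low" flag.

```c
/* Simplified stand-in for the governor state; NOT the kernel structures. */
struct dbs_state {
    unsigned int sampling_rate_us;     /* models the sampling_rate tunable */
    unsigned int sampling_down_factor; /* models the sampling_down_factor tunable */
    unsigned int rate_mult;            /* > 1 only while held at top frequency */
    int at_max_freq;                   /* 1 if currently at the top frequency */
};

/*
 * Hypothetical model of the load check: high load selects the top
 * frequency and arms rate_mult, low load drops the frequency and
 * resets rate_mult to 1.
 */
static void check_cpu(struct dbs_state *s, int load_is_high)
{
    if (load_is_high) {
        s->at_max_freq = 1;
        s->rate_mult = s->sampling_down_factor;
    } else {
        s->at_max_freq = 0;
        s->rate_mult = 1;
    }
}

/* Old ordering: the delay is computed BEFORE the load check. */
static unsigned long next_delay_old(struct dbs_state *s, int load_is_high)
{
    unsigned long delay = (unsigned long)s->sampling_rate_us * s->rate_mult;

    check_cpu(s, load_is_high);
    return delay; /* may still carry the stale rate_mult */
}

/* New ordering: the delay is computed AFTER the load check. */
static unsigned long next_delay_new(struct dbs_state *s, int load_is_high)
{
    check_cpu(s, load_is_high);
    return (unsigned long)s->sampling_rate_us * s->rate_mult;
}
```

Starting at top frequency with sampling_rate 10000 and rate_mult 200, a
low-load sample yields a 2000000us delay in the old ordering (at the low
frequency) and a 10000us delay in the new ordering.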

>> > and with your patch and also CONFIG_NO_HZ:
>> > 3200000 11795
>> > 2500000 0
>> > 2100000 0
>> > 800000 20620
>> > 213
>> >
>> > Which shows the problem very nicely.
>> >
>>
>> My understand is that your idle activity is made of cpu activities
>> which are 10ms long and which trigs the increase of the frequency.
>
> Could it be that the call to dbs_check_cpu(dbs_info) itself is the
> reason for these activities?
>
>> >> One difference with CONFIG_NO_HZ is the real sampling period which can
>> >> be greater than the timer configuration because of the deferrable
>> >> mode. The deferrable mode has nearly no effect when CONFIG_NO_HZ is
>> >> not set because the tick timer will ensure enough cpu activity to
>> >> trigger the governor. When CONFIG_NO_HZ is set, the ondemand governor
>> >> work is triggered at the beginning of a cpu activity so we have more
>> >> chance to have a short cpu load in one period instead of splitting it
>> >> into 2 differents periods. This behavior is quite useful for
>> >> responsiveness but can generates spurious frequency increase if the
>> >> sampling rate is too short.
>> >
>> > Hm, my sampling rate (10000) is already the most minimal rate available.
>> >
>>
>> It's seems that your sampling period is too small and the ondemand
>> governor detects your idle activity as an increase of the cpu activity
>> and as a result, it increases the frequency. Have you tried to
>> increase the sampling rate and decrease your sampling_down_factor
>> which seems to be also quite high ?
>
> Please note that these are all default values (with the exception of
> sampling_down_factor). So why should I fiddle with the parameters when
> everything was working fine before your patch went in? And even if I
> increase the sampling rate and decrease the sampling_down_factor, I
> cannot replicate the old behavior. So IMHO it's a regression.
>

IMHO, the previous results were "good" because of the bug in the
sampling_down_factor which was "filtering" some cpu activities after
decreasing the frequency.

The best cpufreq statistics in idle should be achieved when the
sampling_down_factor is set to 1, because the sampling_down_factor
feature was designed to "improve performance by reducing the overhead
of load evaluation and helping the CPU stay at its top speed"
(Documentation/cpu-freq/governors.txt).

Could you make some measurements with sampling_down_factor set to 1
and with sampling_down_factor set to 200? The cpufreq statistics start at
system boot, but we are interested in the idle use case, so we should
use the delta between 2 statistics outputs in order to remove the boot
measurements. Using the following command in idle should be enough:

# cat /sys/devices/system/cpu/cpu0/cpufreq/stats/* && sleep 60 && cat /sys/devices/system/cpu/cpu0/cpufreq/stats/*
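
Taking the delta between two time_in_state snapshots can be sketched as
below. This is a minimal illustration, assuming both snapshots have already
been parsed into arrays in the same frequency order; struct state_time and
stats_delta() are hypothetical names, not part of any kernel or tool API.

```c
#include <stddef.h>

/* One row of time_in_state: a frequency and its accumulated time. */
struct state_time {
    unsigned long freq_khz;   /* frequency in kHz, e.g. 3200000 */
    unsigned long long ticks; /* accumulated time, in units of 10 ms */
};

/*
 * Store after[i] - before[i] in out[i] for each of the n rows.
 * Returns 0 on success, or -1 if the two snapshots do not list the
 * same frequencies in the same order or a counter went backwards.
 */
static int stats_delta(const struct state_time *before,
                       const struct state_time *after,
                       size_t n, struct state_time *out)
{
    for (size_t i = 0; i < n; i++) {
        if (before[i].freq_khz != after[i].freq_khz ||
            after[i].ticks < before[i].ticks)
            return -1;
        out[i].freq_khz = after[i].freq_khz;
        out[i].ticks = after[i].ticks - before[i].ticks;
    }
    return 0;
}
```

With a 60 second sleep between the snapshots, the per-frequency deltas
should add up to roughly 6000 units.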

I have tested different configurations on my dual core Arm platform
(sampling_down_factor=1, 10; CONFIG_NO_HZ set or not) but I don't see
any difference.

my settings are :
ignore_nice_load 0
io_is_busy 0
powersave_bias 0
sampling_down_factor 10
sampling_rate 20000
sampling_rate_min 20000
up_threshold 95

Thanks,

Vincent

> Thanks.
> --
> Markus
>

* Re: switching to top frequency too frequent with ondemand governor and no_hz
  2011-06-06 16:34               ` Vincent Guittot
@ 2011-06-06 17:51                 ` Markus Trippelsdorf
  2011-06-07  7:34                   ` Vincent Guittot
  2011-06-06 19:43                 ` Markus Trippelsdorf
  1 sibling, 1 reply; 12+ messages in thread
From: Markus Trippelsdorf @ 2011-06-06 17:51 UTC (permalink / raw)
  To: Vincent Guittot; +Cc: David C Niemi, cpufreq, Dave Jones, linux-kernel

On 2011.06.06 at 18:34 +0200, Vincent Guittot wrote:
> On 6 June 2011 16:16, Markus Trippelsdorf <markus@trippelsdorf.de> wrote:
> > On 2011.06.06 at 15:11 +0200, Vincent Guittot wrote:
> >> On 6 June 2011 13:20, Markus Trippelsdorf <markus@trippelsdorf.de> wrote:
> >> > On 2011.06.06 at 09:35 +0200, Vincent Guittot wrote:
> >> >> On 2 June 2011 13:41, Markus Trippelsdorf <markus@trippelsdorf.de> wrote:
> >> >> > On 2011.06.01 at 20:00 +0200, Markus Trippelsdorf wrote:
> >> >> >> But I have found the root cause of symptoms described above by
> >> >> >> bisection. It turned out that 2.6.39 is also affected, so I've bisected
> >> >> >> down to 2.6.38.
> >> >> >> This is the result:
> >> >> >>
> >> >> >>  5cb2c3bd0c5e0f3ced63f250ec2ad59d7c5c626a is the first bad commit
> >> >> >>  commit 5cb2c3bd0c5e0f3ced63f250ec2ad59d7c5c626a
> >> >> >>  Author: Vincent Guittot <vincent.guittot@linaro.org>
> >> >> >>  Date:   Mon Feb 7 17:14:25 2011 +0100
> >> >> >>
> >> >> >>      [CPUFREQ] calculate delay after dbs_check_cpu
> >> >> >>
> >> >> >> When I revert the above in 3.0-rc1 the CONFIG_NO_HZ=y symptoms vanish.
> >> >> >
> >> >>
> >> >> The patch, you have mentioned, solves a problem when ondemand governor
> >> >> goes  from highest frequency to a lower one. Without the patch, the
> >> >> governor uses the longest sampling period (sampling period * scaling
> >> >> down factor) with a low frequency during the 1st period after
> >> >> decreasing the frequency. This can lead to a large time frame
> >> >> (sampling period * scaling down factor) with a low frequency but an
> >> >> overloaded cpu.
> >> >
> >> > The problem with the patch is that it results in an ondemand behavior
> >> > that almost totally ignores the middle frequencies (2100 and 2500 MHz in
> >> > my case) with CONFIG_NO_HZ. If you also set the sampling_down_factor to
> >> > something like >=100 then the CPU will spend much of the time at the top
> >> > frequency even if there is no workload whatsoever.
> >> >
> >>
> >> In fact, one main goal of the ondemand governor is to switch to max
> >> frequency as soon as there is a cpu activity is detected to ensure the
> >> responsiveness of the system. If your idle activity is made of burst
> >> of cpu activity and your sampling period is small,  your sytems will
> >> switch between the highest and the lowest frequency. At the contrary,
> >> the conservative governor modifies the frequency in a step by step
> >> manner.
> >
> > Understood. But this a change in behavior due to your patch.
> >
> >> >> The other correction of the patch is linked to the powersave bias
> >> >> mode. The governor didn't use the right period for the low frequency
> >> >> step (freq_lo_jiffies) but a larger one (sampling period * scaling
> >> >> down factor). The ratio between low and high frequency was not the
> >> >> right one.
> >> >>
> >> >> Do you use the powersave bias mode ?
> >> >
> >> > No.
> >> >
> >> >> Could you give us more statistics : the number of state transition
> >> >> could be an interesting value. Is there a difference with and without
> >> >> CONFIG_NO_HZ ? What is your sampling rate ?
> >> >
> >> > These are my settings:
> >> >
> >> > ignore_nice_load 0
> >> > io_is_busy 0
> >> > powersave_bias 0
> >> > sampling_down_factor 200
> >> > sampling_rate 10000
> >> > sampling_rate_min 10000
> >> > up_threshold 95
> >> >
> >> > cat sys/devices/system/cpu/cpu0/cpufreq/stats/* on an otherwise idle
> >> > machine with CONFIG_NO_HZ and 5cb2c3bd0c5e0f reverted:
> >> > 3200000 532
> >> > 2500000 172
> >> > 2100000 2703
> >> > 800000 20995
> >> > 153
> >> >
> >>
> >> With this configuration (without the patch), there is a period of 2
> >> seconds with a low frequency when the governor comes back from the
> >> highest frequency. During these 2 seconds, you will not be able to go
> >> back to max frequency. So, if your cpu is overloaded during this 2
> >> seconds period, you will not increase your frequency. For this use
> >> case, your cpufreq responsiveness is more then 2 seconds.
> >
> > I don't see these 2 second delays (being stuck on a low frequency) on my
> > system. On the contrary as soon as there is sufficient load it switches
> > to the highest frequency immediately.
> >
> 
> Let assume that your system is at the highest frequency
> 
> without the patch, you have the following sequence :
> 
> ->do_dbs_timer
>     -> delay = usecs_to_jiffies(dbs_tuners_ins.sampling_rate *
> dbs_info->rate_mult); // delay will be equal to 10000*200=2000000us
>     -> dbs_check_cpu
>            Let assume that your cpu load is quite small
>           -> freq_next = max_load_freq / (dbs_tuners_ins.up_threshold
> - dbs_tuners_ins.down_differential); //freq_next is set to your lowest
> frequency
>           -> __cpufreq_driver_target(policy, freq_next, CPUFREQ_RELATION_L);
>     -> 	queue_delayed_work_on(cpu, kondemand_wq, &dbs_info->work, delay);
> 
> the delay value is set to sampling_rate * rate_mult but the frequency
> is the lowest one which is not the correct behavior of the
> sampling_down_factor feature.
> the patch only solves this issue.
> 
> >> > and with your patch and also CONFIG_NO_HZ:
> >> > 3200000 11795
> >> > 2500000 0
> >> > 2100000 0
> >> > 800000 20620
> >> > 213
> >> >
> >> > Which shows the problem very nicely.
> >> >
> >>
> >> My understand is that your idle activity is made of cpu activities
> >> which are 10ms long and which trigs the increase of the frequency.
> >
> > Could it be that the call to dbs_check_cpu(dbs_info) itself is the
> > reason for these activities?
> >
> >> >> One difference with CONFIG_NO_HZ is the real sampling period which can
> >> >> be greater than the timer configuration because of the deferrable
> >> >> mode. The deferrable mode has nearly no effect when CONFIG_NO_HZ is
> >> >> not set because the tick timer will ensure enough cpu activity to
> >> >> trigger the governor. When CONFIG_NO_HZ is set, the ondemand governor
> >> >> work is triggered at the beginning of a cpu activity so we have more
> >> >> chance to have a short cpu load in one period instead of splitting it
> >> >> into 2 differents periods. This behavior is quite useful for
> >> >> responsiveness but can generates spurious frequency increase if the
> >> >> sampling rate is too short.
> >> >
> >> > Hm, my sampling rate (10000) is already the most minimal rate available.
> >> >
> >>
> >> It's seems that your sampling period is too small and the ondemand
> >> governor detects your idle activity as an increase of the cpu activity
> >> and as a result, it increases the frequency. Have you tried to
> >> increase the sampling rate and decrease your sampling_down_factor
> >> which seems to be also quite high ?
> >
> > Please note that these are all default values (with the exception of
> > sampling_down_factor). So why should I fiddle with the parameters when
> > everything was working fine before your patch went in? And even if I
> > increase the sampling rate and decrease the sampling_down_factor, I
> > cannot replicate the old behavior. So IMHO it's a regression.
> >
> 
> IMHO, the previous results were "good" because of the bug in the
> sampling_down_factor which was "filtering" some cpu activities after
> decreasing the frequency.
> 
> The best cpufreq statistic should be achieved in idle when the
> sampling_down_factor is set to 1 because the sampling_down_factor
> feature has been done to "improve performance by reducing the overhead
> of load evaluation and helping the CPU stay at its top speed"
> (Documentation/cpu-freq/governors.txt).
> 
> Could you make some measurements with sampling_down_factor set to 1
> and sampling_down_factor set to 200 ? The cpufreq statistic starts at
> system boot but we are interested in idle use case result so we should
> use the delta between 2 statistics outputs in order to remove boot
> measurements. Using the following command in idle should be enough #
> cat /sys/devices/system/cpu/cpu0/cpufreq/stats/* && sleep 60 && cat
> /sys/devices/system/cpu/cpu0/cpufreq/stats/*

OK. 

On a totally idle system: 

1) With your patch: 

* sampling_down_factor=200
cat /sys/devices/system/cpu/cpu0/cpufreq/stats/* && sleep 60 && cat /sys/devices/system/cpu/cpu0/cpufreq/stats/*
3200000 507	
2500000 0
2100000 0
800000 903
13
3200000 533
2500000 0
2100000 0
800000 6876
14

diff:
3200000 26
2500000 0
2100000 0
800000 5973

* sampling_down_factor=1
3200000 1078
2500000 3
2100000 49
800000 15632
79
3200000 1078
2500000 3
2100000 49
800000 21632
79

diff:
3200000 0
2500000 0
2100000 0
800000 6000


2) Without your patch (reverted):

* sampling_down_factor=200
3200000 106
2500000 0
2100000 339
800000 1260
15
3200000 106
2500000 0
2100000 339
800000 7259
15

diff:
3200000 0
2500000 0
2100000 0
800000 5999

* sampling_down_factor=1
3200000 134
2500000 142
2100000 694
800000 13006
30
3200000 134
2500000 142
2100000 694
800000 19005
30

diff:
3200000 0
2500000 0
2100000 0
800000 5999


And now the same measurements while running:
watch -n.1 'cat /proc/cpuinfo|grep MHz'
in another terminal.

1) With your patch:

* sampling_down_factor=200
3200000 1243
2500000 4
2100000 68
800000 36493
187
3200000 1373
2500000 4
2100000 68
800000 42363
192

diff:
3200000 130
2500000 0
2100000 0
800000 5870

* sampling_down_factor=1
3200000 1205
2500000 4
2100000 67
800000 27873
171
3200000 1209
2500000 4
2100000 67
800000 33869
179

diff:
3200000 4
2500000 0
2100000 0
800000 5996

2) Without your patch (reverted):

* sampling_down_factor=200
3200000 240
2500000 0
2100000 505
800000 12842
41
3200000 245
2500000 0
2100000 505
800000 18836
51

diff:
3200000 5
2500000 0
2100000 0
800000 5994

* sampling_down_factor=1
3200000 230
2500000 0
2100000 505
800000 5497
31
3200000 234
2500000 0
2100000 505
800000 11493
39

diff:
3200000 4
2500000 0
2100000 0
800000 5996

So, with sampling_down_factor=200 and "watch -n.1" running, the CPU
spends 1300 msec at top speed during the 60 second window vs. 50 msec
without your patch.
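
The conversion behind these figures is simple: time_in_state counts in
units of 10 ms, so a diff of 130 units is 1300 msec and a diff of 5 units
is 50 msec. A minimal sketch, with ticks_to_ms() and share_percent() as
hypothetical helper names:

```c
/* Convert a time_in_state count (units of 10 ms) to milliseconds. */
static unsigned long long ticks_to_ms(unsigned long long ticks)
{
    return ticks * 10ULL;
}

/* Share of the measurement window spent in one state, in percent. */
static double share_percent(unsigned long long ticks,
                            unsigned long long total_ticks)
{
    if (total_ticks == 0)
        return 0.0; /* empty window: avoid dividing by zero */
    return 100.0 * (double)ticks / (double)total_ticks;
}
```

For the patched case, 130 of the 6000 units in the window were spent at
3200000 kHz, which is a little over 2% of the time.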

BTW what irritates me is that "watch -n.1 'cat /proc/cpuinfo|grep MHz'"
shows way more frequency changes than what is reported in cpufreq/stats/.

-- 
Markus

* Re: switching to top frequency too frequent with ondemand governor and no_hz
  2011-06-06 16:34               ` Vincent Guittot
  2011-06-06 17:51                 ` Markus Trippelsdorf
@ 2011-06-06 19:43                 ` Markus Trippelsdorf
  1 sibling, 0 replies; 12+ messages in thread
From: Markus Trippelsdorf @ 2011-06-06 19:43 UTC (permalink / raw)
  To: Vincent Guittot; +Cc: David C Niemi, cpufreq, Dave Jones, linux-kernel

On 2011.06.06 at 18:34 +0200, Vincent Guittot wrote:
> On 6 June 2011 16:16, Markus Trippelsdorf <markus@trippelsdorf.de> wrote:
> > On 2011.06.06 at 15:11 +0200, Vincent Guittot wrote:
> >> On 6 June 2011 13:20, Markus Trippelsdorf <markus@trippelsdorf.de> wrote:
> >> > On 2011.06.06 at 09:35 +0200, Vincent Guittot wrote:
> >> >> On 2 June 2011 13:41, Markus Trippelsdorf <markus@trippelsdorf.de> wrote:
> >> >> > On 2011.06.01 at 20:00 +0200, Markus Trippelsdorf wrote:
> >> >> >> But I have found the root cause of symptoms described above by
> >> >> >> bisection. It turned out that 2.6.39 is also affected, so I've bisected
> >> >> >> down to 2.6.38.
> >> >> >> This is the result:
> >> >> >>
> >> >> >>  5cb2c3bd0c5e0f3ced63f250ec2ad59d7c5c626a is the first bad commit
> >> >> >>  commit 5cb2c3bd0c5e0f3ced63f250ec2ad59d7c5c626a
> >> >> >>  Author: Vincent Guittot <vincent.guittot@linaro.org>
> >> >> >>  Date:   Mon Feb 7 17:14:25 2011 +0100
> >> >> >>
> >> >> >>      [CPUFREQ] calculate delay after dbs_check_cpu
> >> >> >>
> >> >> >> When I revert the above in 3.0-rc1 the CONFIG_NO_HZ=y symptoms vanish.
> >> >> >
> >> >>
> >> >> The patch, you have mentioned, solves a problem when ondemand governor
> >> >> goes  from highest frequency to a lower one. Without the patch, the
> >> >> governor uses the longest sampling period (sampling period * scaling
> >> >> down factor) with a low frequency during the 1st period after
> >> >> decreasing the frequency. This can lead to a large time frame
> >> >> (sampling period * scaling down factor) with a low frequency but an
> >> >> overloaded cpu.
> >> >
> >> > The problem with the patch is that it results in an ondemand behavior
> >> > that almost totally ignores the middle frequencies (2100 and 2500 MHz in
> >> > my case) with CONFIG_NO_HZ. If you also set the sampling_down_factor to
> >> > something like >=100 then the CPU will spend much of the time at the top
> >> > frequency even if there is no workload whatsoever.
> >> >
> >>
> >> In fact, one main goal of the ondemand governor is to switch to max
> >> frequency as soon as there is a cpu activity is detected to ensure the
> >> responsiveness of the system. If your idle activity is made of burst
> >> of cpu activity and your sampling period is small,  your sytems will
> >> switch between the highest and the lowest frequency. At the contrary,
> >> the conservative governor modifies the frequency in a step by step
> >> manner.
> >
> > Understood. But this a change in behavior due to your patch.
> >
> >> >> The other correction of the patch is linked to the powersave bias
> >> >> mode. The governor didn't use the right period for the low frequency
> >> >> step (freq_lo_jiffies) but a larger one (sampling period * scaling
> >> >> down factor). The ratio between low and high frequency was not the
> >> >> right one.
> >> >>
> >> >> Do you use the powersave bias mode ?
> >> >
> >> > No.
> >> >
> >> >> Could you give us more statistics : the number of state transition
> >> >> could be an interesting value. Is there a difference with and without
> >> >> CONFIG_NO_HZ ? What is your sampling rate ?
> >> >
> >> > These are my settings:
> >> >
> >> > ignore_nice_load 0
> >> > io_is_busy 0
> >> > powersave_bias 0
> >> > sampling_down_factor 200
> >> > sampling_rate 10000
> >> > sampling_rate_min 10000
> >> > up_threshold 95
> >> >
> >> > cat sys/devices/system/cpu/cpu0/cpufreq/stats/* on an otherwise idle
> >> > machine with CONFIG_NO_HZ and 5cb2c3bd0c5e0f reverted:
> >> > 3200000 532
> >> > 2500000 172
> >> > 2100000 2703
> >> > 800000 20995
> >> > 153
> >> >
> >>
> >> With this configuration (without the patch), there is a period of 2
> >> seconds with a low frequency when the governor comes back from the
> >> highest frequency. During these 2 seconds, you will not be able to go
> >> back to max frequency. So, if your cpu is overloaded during this 2
> >> seconds period, you will not increase your frequency. For this use
> >> case, your cpufreq responsiveness is more than 2 seconds.
> >
> > I don't see these 2 second delays (being stuck on a low frequency) on my
> > system. On the contrary as soon as there is sufficient load it switches
> > to the highest frequency immediately.
> >
> 
> Let's assume that your system is at the highest frequency.
> 
> Without the patch, you have the following sequence:
> 
> -> do_dbs_timer
>     -> delay = usecs_to_jiffies(dbs_tuners_ins.sampling_rate * dbs_info->rate_mult);
>        // delay will be equal to 10000 * 200 = 2000000 us
>     -> dbs_check_cpu
>           Let's assume that your cpu load is quite small
>           -> freq_next = max_load_freq / (dbs_tuners_ins.up_threshold - dbs_tuners_ins.down_differential);
>              // freq_next is set to your lowest frequency
>           -> __cpufreq_driver_target(policy, freq_next, CPUFREQ_RELATION_L);
>     -> queue_delayed_work_on(cpu, kondemand_wq, &dbs_info->work, delay);
> 
> The delay value is set to sampling_rate * rate_mult, but the frequency
> is the lowest one, which is not the correct behavior of the
> sampling_down_factor feature.
> The patch only solves this issue.
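
The ordering problem described in the quoted explanation can be modelled in a few lines. This is only a sketch, not the kernel code: `next_delay_us` and its parameters are hypothetical names, and it assumes that the load check drops the multiplier back to 1 whenever it lowers the frequency. The values 10000 and 200 are the sampling_rate and sampling_down_factor quoted in this thread.

```python
SAMPLING_RATE_US = 10_000     # sampling_rate from the thread (microseconds)
SAMPLING_DOWN_FACTOR = 200    # sampling_down_factor from the thread


def next_delay_us(rate_mult_before, cpu_is_busy, delay_after_check):
    """Return the delay until the next governor sample, in microseconds.

    Assumption (not taken from the thread): the load check keeps the
    multiplier at sampling_down_factor while the CPU is busy and resets
    it to 1 when it decides to lower the frequency.
    """
    rate_mult_after = SAMPLING_DOWN_FACTOR if cpu_is_busy else 1
    # Old ordering: the delay uses the multiplier from before the check.
    # New ordering: the delay uses the multiplier chosen by the check.
    rate_mult = rate_mult_after if delay_after_check else rate_mult_before
    return SAMPLING_RATE_US * rate_mult


if __name__ == "__main__":
    # CPU was at top speed (multiplier 200) and has just gone idle.
    print("delay computed before check:", next_delay_us(200, False, False), "us")
    print("delay computed after check: ", next_delay_us(200, False, True), "us")
```

Under these assumptions the old ordering leaves the governor waiting 2 seconds at the lowest frequency before it samples again, while the new ordering samples again after 10 ms.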

> 
> IMHO, the previous results were "good" because of the bug in the
> sampling_down_factor which was "filtering" some cpu activities after
> decreasing the frequency.

OK, this explains the issue that I was seeing.

To prove the point, here are the "emerge" times of the ncurses library in
Gentoo (unpacking, configuring, compiling and installing) for
different sampling_down_factor values.

sampling_down_factor    merge time
(with your patch)
1                       1 minute and 59 seconds.
20                      1 minute and 47 seconds.
100                     1 minute and 29 seconds.
150                     1 minute and 24 seconds.
200                     1 minute and 22 seconds.
300                     1 minute and 20 seconds.
500                     1 minute and 12 seconds.
1500                    1 minute and 7 seconds.
(with patch reverted)
1                       2 minutes and 4 seconds.
20                      1 minute and 55 seconds.
200                     1 minute and 41 seconds.

As you can see, your patch beats the reverted case at every
sampling_down_factor measured for both (1, 20 and 200). It also shows
that sampling_down_factor makes a huge difference in compilation
time.
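
For a quick comparison, the merge times in the table can be converted to seconds. The sketch below copies the figures from the table; the helper names `to_seconds` and `saved_seconds` are made up for illustration.

```python
import re

# Merge times copied from the table above, keyed by sampling_down_factor.
WITH_PATCH = {
    1: "1 minute and 59 seconds.",
    20: "1 minute and 47 seconds.",
    100: "1 minute and 29 seconds.",
    150: "1 minute and 24 seconds.",
    200: "1 minute and 22 seconds.",
    300: "1 minute and 20 seconds.",
    500: "1 minute and 12 seconds.",
    1500: "1 minute and 7 seconds.",
}
REVERTED = {
    1: "2 minutes and 4 seconds.",
    20: "1 minute and 55 seconds.",
    200: "1 minute and 41 seconds.",
}


def to_seconds(text):
    """Convert text such as '1 minute and 59 seconds.' to whole seconds."""
    match = re.fullmatch(r"(\d+) minutes? and (\d+) seconds?\.?", text.strip())
    if match is None:
        raise ValueError(f"unrecognised duration: {text!r}")
    minutes, seconds = (int(group) for group in match.groups())
    return 60 * minutes + seconds


def saved_seconds():
    """Seconds saved by the patch for each factor measured in both runs."""
    common = sorted(WITH_PATCH.keys() & REVERTED.keys())
    return {f: to_seconds(REVERTED[f]) - to_seconds(WITH_PATCH[f]) for f in common}


if __name__ == "__main__":
    for factor, saved in saved_seconds().items():
        print(f"sampling_down_factor={factor}: {saved} s faster with the patch")
```

Only the factors 1, 20 and 200 appear in both runs, so those are the only ones compared.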

-- 
Markus


* Re: switching to top frequency too frequent with ondemand governor and no_hz
  2011-06-06 17:51                 ` Markus Trippelsdorf
@ 2011-06-07  7:34                   ` Vincent Guittot
  0 siblings, 0 replies; 12+ messages in thread
From: Vincent Guittot @ 2011-06-07  7:34 UTC (permalink / raw)
  To: Markus Trippelsdorf; +Cc: David C Niemi, cpufreq, Dave Jones, linux-kernel

On 6 June 2011 19:51, Markus Trippelsdorf <markus@trippelsdorf.de> wrote:
> On 2011.06.06 at 18:34 +0200, Vincent Guittot wrote:
>> On 6 June 2011 16:16, Markus Trippelsdorf <markus@trippelsdorf.de> wrote:
>> > On 2011.06.06 at 15:11 +0200, Vincent Guittot wrote:
>> >> On 6 June 2011 13:20, Markus Trippelsdorf <markus@trippelsdorf.de> wrote:
>> >> > On 2011.06.06 at 09:35 +0200, Vincent Guittot wrote:
>> >> >> On 2 June 2011 13:41, Markus Trippelsdorf <markus@trippelsdorf.de> wrote:
>> >> >> > On 2011.06.01 at 20:00 +0200, Markus Trippelsdorf wrote:
>> >> >> >> But I have found the root cause of symptoms described above by
>> >> >> >> bisection. It turned out that 2.6.39 is also affected, so I've bisected
>> >> >> >> down to 2.6.38.
>> >> >> >> This is the result:
>> >> >> >>
>> >> >> >>  5cb2c3bd0c5e0f3ced63f250ec2ad59d7c5c626a is the first bad commit
>> >> >> >>  commit 5cb2c3bd0c5e0f3ced63f250ec2ad59d7c5c626a
>> >> >> >>  Author: Vincent Guittot <vincent.guittot@linaro.org>
>> >> >> >>  Date:   Mon Feb 7 17:14:25 2011 +0100
>> >> >> >>
>> >> >> >>      [CPUFREQ] calculate delay after dbs_check_cpu
>> >> >> >>
>> >> >> >> When I revert the above in 3.0-rc1 the CONFIG_NO_HZ=y symptoms vanish.
>> >> >> >
>> >> >>
>> >> >> The patch, you have mentioned, solves a problem when ondemand governor
>> >> >> goes  from highest frequency to a lower one. Without the patch, the
>> >> >> governor uses the longest sampling period (sampling period * scaling
>> >> >> down factor) with a low frequency during the 1st period after
>> >> >> decreasing the frequency. This can lead to a large time frame
>> >> >> (sampling period * scaling down factor) with a low frequency but an
>> >> >> overloaded cpu.
>> >> >
>> >> > The problem with the patch is that it results in an ondemand behavior
>> >> > that almost totally ignores the middle frequencies (2100 and 2500 MHz in
>> >> > my case) with CONFIG_NO_HZ. If you also set the sampling_down_factor to
>> >> > something like >=100 then the CPU will spend much of the time at the top
>> >> > frequency even if there is no workload whatsoever.
>> >> >
>> >>
>> >> In fact, one main goal of the ondemand governor is to switch to max
>> >> frequency as soon as cpu activity is detected, to ensure the
>> >> responsiveness of the system. If your idle activity is made of bursts
>> >> of cpu activity and your sampling period is small, your system will
>> >> switch between the highest and the lowest frequency. In contrast,
>> >> the conservative governor modifies the frequency in a step by step
>> >> manner.
>> >
>> > Understood. But this is a change in behavior due to your patch.
>> >
>> >> >> The other correction of the patch is linked to the powersave bias
>> >> >> mode. The governor didn't use the right period for the low frequency
>> >> >> step (freq_lo_jiffies) but a larger one (sampling period * scaling
>> >> >> down factor). The ratio between low and high frequency was not the
>> >> >> right one.
>> >> >>
>> >> >> Do you use the powersave bias mode ?
>> >> >
>> >> > No.
>> >> >
>> >> >> Could you give us more statistics : the number of state transition
>> >> >> could be an interesting value. Is there a difference with and without
>> >> >> CONFIG_NO_HZ ? What is your sampling rate ?
>> >> >
>> >> > These are my settings:
>> >> >
>> >> > ignore_nice_load 0
>> >> > io_is_busy 0
>> >> > powersave_bias 0
>> >> > sampling_down_factor 200
>> >> > sampling_rate 10000
>> >> > sampling_rate_min 10000
>> >> > up_threshold 95
>> >> >
>> >> > cat sys/devices/system/cpu/cpu0/cpufreq/stats/* on an otherwise idle
>> >> > machine with CONFIG_NO_HZ and 5cb2c3bd0c5e0f reverted:
>> >> > 3200000 532
>> >> > 2500000 172
>> >> > 2100000 2703
>> >> > 800000 20995
>> >> > 153
>> >> >
>> >>
>> >> With this configuration (without the patch), there is a period of 2
>> >> seconds with a low frequency when the governor comes back from the
>> >> highest frequency. During these 2 seconds, you will not be able to go
>> >> back to max frequency. So, if your cpu is overloaded during this 2
>> >> seconds period, you will not increase your frequency. For this use
>> >> case, your cpufreq responsiveness is more than 2 seconds.
>> >
>> > I don't see these 2 second delays (being stuck on a low frequency) on my
>> > system. On the contrary as soon as there is sufficient load it switches
>> > to the highest frequency immediately.
>> >
>>
>> Let's assume that your system is at the highest frequency.
>>
>> Without the patch, you have the following sequence:
>>
>> -> do_dbs_timer
>>     -> delay = usecs_to_jiffies(dbs_tuners_ins.sampling_rate * dbs_info->rate_mult);
>>        // delay will be equal to 10000 * 200 = 2000000 us
>>     -> dbs_check_cpu
>>           Let's assume that your cpu load is quite small
>>           -> freq_next = max_load_freq / (dbs_tuners_ins.up_threshold - dbs_tuners_ins.down_differential);
>>              // freq_next is set to your lowest frequency
>>           -> __cpufreq_driver_target(policy, freq_next, CPUFREQ_RELATION_L);
>>     -> queue_delayed_work_on(cpu, kondemand_wq, &dbs_info->work, delay);
>>
>> The delay value is set to sampling_rate * rate_mult, but the frequency
>> is the lowest one, which is not the correct behavior of the
>> sampling_down_factor feature.
>> The patch only solves this issue.
>>
>> >> > and with your patch and also CONFIG_NO_HZ:
>> >> > 3200000 11795
>> >> > 2500000 0
>> >> > 2100000 0
>> >> > 800000 20620
>> >> > 213
>> >> >
>> >> > Which shows the problem very nicely.
>> >> >
>> >>
>> >> My understanding is that your idle activity is made of cpu activities
>> >> which are 10ms long and which trigger the increase of the frequency.
>> >
>> > Could it be that the call to dbs_check_cpu(dbs_info) itself is the
>> > reason for these activities?
>> >
>> >> >> One difference with CONFIG_NO_HZ is the real sampling period which can
>> >> >> be greater than the timer configuration because of the deferrable
>> >> >> mode. The deferrable mode has nearly no effect when CONFIG_NO_HZ is
>> >> >> not set because the tick timer will ensure enough cpu activity to
>> >> >> trigger the governor. When CONFIG_NO_HZ is set, the ondemand governor
>> >> >> work is triggered at the beginning of a cpu activity so we have more
>> >> >> chance to have a short cpu load in one period instead of splitting it
>> >> >> into 2 different periods. This behavior is quite useful for
>> >> >> responsiveness but can generate spurious frequency increases if the
>> >> >> sampling rate is too short.
>> >> >
>> >> > Hm, my sampling rate (10000) is already the most minimal rate available.
>> >> >
>> >>
>> >> It seems that your sampling period is too small and the ondemand
>> >> governor detects your idle activity as an increase of the cpu activity
>> >> and as a result, it increases the frequency. Have you tried to
>> >> increase the sampling rate and decrease your sampling_down_factor
>> >> which seems to be also quite high ?
>> >
>> > Please note that these are all default values (with the exception of
>> > sampling_down_factor). So why should I fiddle with the parameters when
>> > everything was working fine before your patch went in? And even if I
>> > increase the sampling rate and decrease the sampling_down_factor, I
>> > cannot replicate the old behavior. So IMHO it's a regression.
>> >
>>
>> IMHO, the previous results were "good" because of the bug in the
>> sampling_down_factor which was "filtering" some cpu activities after
>> decreasing the frequency.
>>
>> The best cpufreq statistic should be achieved in idle when the
>> sampling_down_factor is set to 1 because the sampling_down_factor
>> feature has been done to "improve performance by reducing the overhead
>> of load evaluation and helping the CPU stay at its top speed"
>> (Documentation/cpu-freq/governors.txt).
>>
>> Could you make some measurements with sampling_down_factor set to 1
>> and sampling_down_factor set to 200 ? The cpufreq statistic starts at
>> system boot but we are interested in idle use case result so we should
>> use the delta between 2 statistics outputs in order to remove boot
>> measurements. Using the following command in idle should be enough #
>> cat /sys/devices/system/cpu/cpu0/cpufreq/stats/* && sleep 60 && cat
>> /sys/devices/system/cpu/cpu0/cpufreq/stats/*
>
> OK.
>
> On a totally idle system:
>
> 1) With your patch:
>
> * sampling_down_factor=200
> cat /sys/devices/system/cpu/cpu0/cpufreq/stats/* && sleep 60 && cat /sys/devices/system/cpu/cpu0/cpufreq/stats/*
> 3200000 507
> 2500000 0
> 2100000 0
> 800000 903
> 13
> 3200000 533
> 2500000 0
> 2100000 0
> 800000 6876
> 14
>
> diff:
> 3200000 26
> 2500000 0
> 2100000 0
> 800000 5973
>
> * sampling_down_factor=1
> 3200000 1078
> 2500000 3
> 2100000 49
> 800000 15632
> 79
> 3200000 1078
> 2500000 3
> 2100000 49
> 800000 21632
> 79
>
> diff:
> 3200000 0
> 2500000 0
> 2100000 0
> 800000 6000
>
>
> 2) Without your patch (reverted):
>
> * sampling_down_factor=200
> 3200000 106
> 2500000 0
> 2100000 339
> 800000 1260
> 15
> 3200000 106
> 2500000 0
> 2100000 339
> 800000 7259
> 15
>
> diff:
> 3200000 0
> 2500000 0
> 2100000 0
> 800000 5999
>
> * sampling_down_factor=1
> 3200000 134
> 2500000 142
> 2100000 694
> 800000 13006
> 30
> 3200000 134
> 2500000 142
> 2100000 694
> 800000 19005
> 30
>
> diff:
> 3200000 0
> 2500000 0
> 2100000 0
> 800000 5999
>
>
> And now the same measurements while running:
> watch -n.1 'cat /proc/cpuinfo|grep MHz'
> in another terminal.
>
> 1) With your patch:
>
> * sampling_down_factor=200
> 3200000 1243
> 2500000 4
> 2100000 68
> 800000 36493
> 187
> 3200000 1373
> 2500000 4
> 2100000 68
> 800000 42363
> 192
>
> diff:
> 3200000 130
> 2500000 0
> 2100000 0
> 800000 5870
>
> * sampling_down_factor=1
> 3200000 1205
> 2500000 4
> 2100000 67
> 800000 27873
> 171
> 3200000 1209
> 2500000 4
> 2100000 67
> 800000 33869
> 179
>
> diff:
> 3200000 4
> 2500000 0
> 2100000 0
> 800000 5996
>
> 2) Without your patch (reverted):
>
> * sampling_down_factor=200
> 3200000 240
> 2500000 0
> 2100000 505
> 800000 12842
> 41
> 3200000 245
> 2500000 0
> 2100000 505
> 800000 18836
> 51
>
> diff:
> 3200000 5
> 2500000 0
> 2100000 0
> 800000 5994
>
> * sampling_down_factor=1
> 3200000 230
> 2500000 0
> 2100000 505
> 800000 5497
> 31
> 3200000 234
> 2500000 0
> 2100000 505
> 800000 11493
> 39
>
> diff:
> 3200000 4
> 2500000 0
> 2100000 0
> 800000 5996
>
> So, with sampling_down_factor=200 and "watch -n.1" running, the CPU
> spends 1300 msec on top speed vs. 50 msec without your patch.
>
> BTW what irritates me is that "watch -n.1 'cat /proc/cpuinfo|grep MHz'"
> shows way more frequency changes than what is reported in cpufreq/stats/.
>

OK, so the additional activity generated by watch is enough to trigger
the ondemand governor, and that explains your stats results.
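
The before/after comparison used for the measurements quoted above can also be scripted. The following is a rough sketch: `parse_time_in_state` and `delta_ms` are hypothetical helper names, and it assumes the 10 ms counting unit implied by the figures in this thread (a delta of 130 reported as 1300 msec). The sample data is the sampling_down_factor=200 run with the patch while "watch -n.1" was running.

```python
TICK_MS = 10  # assumed unit of the time_in_state counters (10 ms)


def parse_time_in_state(text):
    """Map frequency (kHz) to its counter, skipping lines such as total_trans."""
    ticks = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 2 and all(field.isdigit() for field in fields):
            ticks[int(fields[0])] = int(fields[1])
    return ticks


def delta_ms(before, after):
    """Time spent at each frequency between two snapshots, in milliseconds."""
    first = parse_time_in_state(before)
    second = parse_time_in_state(after)
    return {freq: (second[freq] - first.get(freq, 0)) * TICK_MS for freq in second}


BEFORE = """3200000 1243
2500000 4
2100000 68
800000 36493
187"""

AFTER = """3200000 1373
2500000 4
2100000 68
800000 42363
192"""

if __name__ == "__main__":
    for freq, ms in delta_ms(BEFORE, AFTER).items():
        print(f"{freq} kHz: {ms} ms")
```

The single-number line in each snapshot is the transition count from the same stats directory, which is why the parser ignores lines that do not have exactly two numeric fields.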

> --
> Markus
>


end of thread, other threads:[~2011-06-07  7:34 UTC | newest]

Thread overview: 12+ messages
2011-06-01 16:08 switching to top frequency too frequent with ondemand governor and no_hz Markus Trippelsdorf
2011-06-01 17:34 ` David C Niemi
2011-06-01 18:00   ` Markus Trippelsdorf
2011-06-02 11:41     ` Markus Trippelsdorf
2011-06-06  7:35       ` Vincent Guittot
2011-06-06 11:20         ` Markus Trippelsdorf
2011-06-06 13:11           ` Vincent Guittot
2011-06-06 14:16             ` Markus Trippelsdorf
2011-06-06 16:34               ` Vincent Guittot
2011-06-06 17:51                 ` Markus Trippelsdorf
2011-06-07  7:34                   ` Vincent Guittot
2011-06-06 19:43                 ` Markus Trippelsdorf
