* Improving High-Load Performance with the Ondemand Governor
@ 2010-09-09 14:28 David C Niemi
2010-09-10 7:40 ` Andi Kleen
2010-09-16 20:39 ` Improving High-Load Performance with the Ondemand Governor [PATCH ATTACHED] David C Niemi
0 siblings, 2 replies; 15+ messages in thread
From: David C Niemi @ 2010-09-09 14:28 UTC (permalink / raw)
To: cpufreq
I have tested patches for both 2.6.18 and 2.6.32, but before sharing
them I'd like to first describe the problem I'm trying to solve and the
strategy I've been trying and get some feedback on it.
I have an application for RHEL 5-based network servers where the
Performance "governor" was being used due to measurably worse
performance with the stock Ondemand governor. The hardware includes
Woodcrest, Opteron, and Nehalem dual-socket machines with CPUs towards
the high-performance end. My changes have been in production use for
over a year on RHEL 5.x, and I'm now looking at applying them to RHEL 6
and would like to get them into the mainstream kernel. I believe my
changes can be generally beneficial to neutral across all applications
if done right.
The workload has periods of really high CPU utilization with lulls in
between, and the servers need to respond quickly to the onset of load to
avoid dropping packets. This resulted in 3 goals for my work with the
governor:
1) Negligible overhead when at high CPU utilization
2) Save power when truly idle
3) Ramp up quickly to the high-performance state when load appears
One of the first things I discovered is that the Ondemand governor has
symmetric logic for deciding to increase or decrease clock speed. This
might be good for a battery-powered device, but under heavy load, the
overhead of checking load on all cores on a frequent basis impairs
performance very noticeably. I also noticed that even under heavy
loads, the CPU speed would not remain at maximum all the time. The
governor was seeking any chance to downshift for the slightest perceived
dip in load, which in this case resulted in dropped packets; this is
simply not good behavior for my application.
Sampling less frequently helps somewhat, but not enough, and conflicts
with goal #3.
Lowering up_threshold helps somewhat too, but not enough, as it can only
be lowered to 11 and it does not solve the conflict between goals #1 and #3.
My Strategy:
1) (re)introduced the sampling_down tunable, but made it work a bit
differently. This turned out to be the centerpiece and most important
of all my changes. When set to 1 (default) it changes nothing from
existing behavior. If set to more than one, it is a multiplier for the
scheduling interval when in the top CPU speed. So if we set it to 100,
the overhead of checking for idle CPU is reduced to 1% of what it was when
we are really busy, and we are much less prone to downshift as long as
we continue to be busy. But as soon as we are not at the top speed,
scheduling goes back to normal so we can quickly respond to a load spike.
2) made it possible for up_threshold to be set much lower (5) to improve
responsiveness to sudden load spikes.
3) Made hysteresis (DOWN_DIFFERENTIAL) scalable based on up_threshold,
in order to make it possible to reach an up_threshold of 5.
4) Clock speed jitter is highly undesirable, and became more noticeable
when up_threshold is small. A specific problem I found is that the
overhead of lowering clock speed can be mistaken for more load, causing
the CPU to upshift again right away. I solved this by throwing away the
sample right after reducing speed, as it is never going to be a good
indication of what the normal load really is. When increasing speed,
the extra load is harmless and nothing needs to be changed.
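The effect of the sampling_down multiplier described in item 1 can be
sketched in user-space C as follows. This is only an illustration under
stated assumptions, not the patch itself: the helper names
effective_interval_us and relative_overhead_percent are hypothetical, and
the multiplier is assumed to apply only while the CPU is at its top speed.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not taken from the patch): time between two load
 * samples. The multiplier only applies while the CPU is already at its
 * top speed; at any lower speed the normal interval is used, so a load
 * spike is still noticed quickly.
 */
static uint64_t effective_interval_us(uint32_t sampling_interval_us,
                                      uint32_t sampling_down,
                                      int at_top_speed)
{
    assert(sampling_down >= 1);  /* 1 means "unchanged behaviour" */
    if (!at_top_speed)
        return sampling_interval_us;
    /* Widen to 64 bits before multiplying so the product cannot wrap. */
    return (uint64_t)sampling_interval_us * sampling_down;
}

/* Sampling overhead at top speed, as a percentage of the stock overhead. */
static double relative_overhead_percent(uint32_t sampling_down)
{
    assert(sampling_down >= 1);
    return 100.0 / (double)sampling_down;
}
```

With a sampling interval of 10000 us and a multiplier of 100, the first
helper yields 1000000 us (one sample per second) at top speed and 10000 us
at any other speed, and the second yields 1%, matching the figure quoted in
item 1.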
Additional observations:
5) I don't like the addition of a down_differential variable per CPU. I
consider it to be unnecessary baggage, and would prefer to always
calculate down_differential (hysteresis) whenever needed on the fly
based on up_threshold. I don't think it should be a tunable because
there is a fairly narrow range of useful values that are probably better
to calculate automatically.
6) The MICRO_FREQUENCY changes are not very helpful to my cause. An
UP_THRESHOLD of 95 is awful for my goal #3, a DOWN_DIFFERENTIAL of 3 is
very jitter-inducing, and a sample rate (really interval) of 10000 is
way too fast. I'd like to hear what these changes are intended to do so
I can preserve their intent while meeting my needs too.
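One way the on-the-fly hysteresis from observation 5 might be calculated is
sketched below. The proportional rule and the helper name
scaled_down_differential are assumptions made purely for illustration; this
message does not give the actual formula. The sketch keeps the stock ratio
of DEF_FREQUENCY_DOWN_DIFFERENTIAL (10) to DEF_FREQUENCY_UP_THRESHOLD (80)
and never lets the result reach up_threshold.

```c
#include <assert.h>

#define DEF_FREQUENCY_DOWN_DIFFERENTIAL 10u
#define DEF_FREQUENCY_UP_THRESHOLD      80u

/*
 * Hypothetical rule (an assumption, not the author's formula): scale the
 * hysteresis in proportion to up_threshold, keep it at least 1, and keep
 * it strictly below up_threshold so that
 * (up_threshold - differential) stays positive.
 */
static unsigned int scaled_down_differential(unsigned int up_threshold)
{
    unsigned int diff;

    assert(up_threshold >= 2 && up_threshold <= 100);
    diff = up_threshold * DEF_FREQUENCY_DOWN_DIFFERENTIAL
           / DEF_FREQUENCY_UP_THRESHOLD;
    if (diff < 1)
        diff = 1;
    if (diff >= up_threshold)
        diff = up_threshold - 1;  /* defensive clamp */
    return diff;
}
```

Under this rule the stock up_threshold of 80 keeps its differential of 10,
while an up_threshold of 5 gets a differential of 1, so no separate tunable
would be needed.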
David C Niemi
* Re: Improving High-Load Performance with the Ondemand Governor
2010-09-09 14:28 Improving High-Load Performance with the Ondemand Governor David C Niemi
@ 2010-09-10 7:40 ` Andi Kleen
2010-09-13 20:18 ` David C Niemi
2010-09-16 20:39 ` Improving High-Load Performance with the Ondemand Governor [PATCH ATTACHED] David C Niemi
1 sibling, 1 reply; 15+ messages in thread
From: Andi Kleen @ 2010-09-10 7:40 UTC (permalink / raw)
To: David C Niemi; +Cc: cpufreq, lenb
David C Niemi <dniemi@verisign.com> writes:
Perhaps better post to linux-kernel next time, I think cpufreq
is mostly dead these days.
> I have tested patches for both 2.6.18 and 2.6.32, but before sharing
> them I'd like to first describe the problem I'm trying to solve and
> the strategy I've been trying and get some feedback on it.
These are all ancient in terms of mainline kernel. The latest
kernel should have some improvements, perhaps try them first.
On Nehalem class systems with recent kernels it often helps to use the
"intel_idle" driver too, because that gives the governor more
accurate latencies to work with. Many BIOS are known to report
incorrect latencies.
> The workload has periods of really high CPU utilization with lulls in
> between, and the servers need to respond quickly to the onset of load
> to avoid dropping packets. This resulted in 3 goals for my work with
> the governor:
>
> 1) Negligible overhead when at high CPU utilization
> 2) Save power when truly idle
> 3) Ramp up quickly to the high-performance state when load appears
FWIW when you're truly idle you typically don't need ondemand,
the idle states on modern CPUs go to the lowest frequency by themselves
or simply turn off the frequency completely.
ondemand and p-states mainly help you on moderate load.
Just going to highest state unconditionally would be somewhat
counterproductive to that goal.
-Andi
--
ak@linux.intel.com -- Speaking for myself only.
* Re: Improving High-Load Performance with the Ondemand Governor
2010-09-10 7:40 ` Andi Kleen
@ 2010-09-13 20:18 ` David C Niemi
2010-09-13 20:54 ` Andi Kleen
0 siblings, 1 reply; 15+ messages in thread
From: David C Niemi @ 2010-09-13 20:18 UTC (permalink / raw)
To: Andi Kleen; +Cc: cpufreq, lenb
Andi Kleen wrote:
>
> David C Niemi <dniemi@verisign.com> writes:
>
> Perhaps better post to linux-kernel next time, I think cpufreq
> is mostly dead these days.
>
Hello Andi, thanks for your quick response.
There were some lively discussions on it in the fairly recent past.
I'll post on linux-kernel if I don't get enough feedback.
>
> > I have tested patches for both 2.6.18 and 2.6.32, but before sharing
> > them I'd like to first describe the problem I'm trying to solve and
> > the strategy I've been trying and get some feedback on it.
>
> These are all ancient in terms of mainline kernel. The latest
> kernel should have some improvements, perhaps try them first.
>
I have looked at the latest kernels too, and the changes in the ondemand
governor between that and RHEL 6's 2.6.32 kernel are quite modest. I
mention 2.6.18 just because it's what's been out in the field a while.
> On Nehalem class systems with recent kernels it often helps to use the
> "intel_idle" driver too, because that gives the governor more
> accurate latencies to work with. Many BIOS are known to report
> incorrect latencies.
>
Thanks for the suggestion.
I haven't seen much in the way of inaccurate latency problems, but then
most of my testing has been on a fairly constrained set of fairly good
hardware.
> > The workload has periods of really high CPU utilization with lulls in
> > between, and the servers need to respond quickly to the onset of load
> > to avoid dropping packets. This resulted in 3 goals for my work with
> > the governor:
> >
> > 1) Negligible overhead when at high CPU utilization
> > 2) Save power when truly idle
> > 3) Ramp up quickly to the high-performance state when load appears
>
> FWIW when you're truly idle you typically don't need ondemand,
> the idle states on modern CPUs go to the lowest frequency by themselves
> or simply turn off the frequency completely.
>
I do see c-states getting used on Intel hardware to save power, and in
some cases these are quite effective. On AMD hardware lowering
frequency tends to be very important to saving power. But you must
choose some governor or other, and if you choose the performance
(non)governor, clock frequency does NOT change by itself. There are
other governors more attuned to portable devices, but that's a different
application; the ondemand governor is the closest I could find.
> ondemand and p-states mainly help you on moderate load.
>
> Just going to highest state unconditionally would be somewhat
> counterproductive to that goal.
>
On moderate load I might agree, but on the servers I care about it is a
workload that's a bit like war -- long periods of boredom punctuated by
sudden bursts of sheer terror. So I am really only very interested in
active idle and max performance, not so much states in between. Of
course, on new Intel hardware that decision can be made in a fairly
fine-grained way; you do not have to ramp up every core just because one
is busy. But if performance during the peaks is inferior to that of the
performance non-governor, we will end up being told to use that instead,
running flat-out all the time and saving no power at all other than what
is automatically saved through c-states.
David C Niemi
* Re: Improving High-Load Performance with the Ondemand Governor
2010-09-13 20:18 ` David C Niemi
@ 2010-09-13 20:54 ` Andi Kleen
2010-09-13 22:02 ` David C Niemi
0 siblings, 1 reply; 15+ messages in thread
From: Andi Kleen @ 2010-09-13 20:54 UTC (permalink / raw)
To: David C Niemi; +Cc: cpufreq, lenb
On Mon, 13 Sep 2010 16:18:51 -0400
David C Niemi <dniemi@verisign.com> wrote:
> > > I have tested patches for both 2.6.18 and 2.6.32, but before
> > > sharing them I'd like to first describe the problem I'm trying to
> > > solve and the strategy I've been trying and get some feedback on
> > > it.
> >
> > These are all ancient in terms of mainline kernel. The latest
> > kernel should have some improvements, perhaps try them first.
> >
> I have looked at the latest kernels too, and the changes in the
> ondemand governor between that and RHEL 6's 2.6.32 kernel are quite
> modest. I mention 2.6.18 just because it's what's been out in the
> field a while.
Most of the interesting changes were post 2.6.32 (2.6.32 is ancient
too for mainline)
> > FWIW when you're truly idle you typically don't need ondemand,
> > the idle states on modern CPUs go to the lowest frequency by
> > themselves or simply turn off the frequency completely.
> >
> I do see c-states getting used on Intel hardware to save power, and
ondemand has nothing to do with c-states, c-states are handled
by the menu governor.
> in some cases these are quite effective. On AMD hardware lowering
> frequency tends to be very important to saving power.
AFAIK modern AMD doesn't need this either in c-states.
> > ondemand and p-states mainly help you on moderate load.
> >
> > Just going to highest state unconditionally would be somewhat
> > counterproductive to that goal.
> >
> On moderate load I might agree, but on the servers I care about it is
> a workload that's a bit like war -- long periods of boredom
> punctuated by sudden bursts of sheer terror.
In this case on modern hardware you don't need a p-state
governor at all except for "performance"
-Andi
--
ak@linux.intel.com -- Speaking for myself only.
* Re: Improving High-Load Performance with the Ondemand Governor
2010-09-13 20:54 ` Andi Kleen
@ 2010-09-13 22:02 ` David C Niemi
0 siblings, 0 replies; 15+ messages in thread
From: David C Niemi @ 2010-09-13 22:02 UTC (permalink / raw)
To: Andi Kleen; +Cc: cpufreq
Andi Kleen wrote:
> On Mon, 13 Sep 2010 16:18:51 -0400
> David C Niemi <dniemi@verisign.com> wrote:
>
>
>> I have looked at the latest kernels too, and the changes in the
>> ondemand governor between that and RHEL 6's 2.6.32 kernel are quite
>> modest. I mention 2.6.18 just because it's what's been out in the
>> field a while.
>>
>
> Most of the interesting changes were post 2.6.32 (2.6.32 is ancient
> too for mainline)
>
I did see a few changes in cpufreq_ondemand.c between 2.6.32 and the git
version I grabbed last week, but not really relevant to what I was
trying to do.
>>> FWIW when you're truly idle you typically don't need ondemand,
>>> the idle states on modern CPUs go to the lowest frequency by
>>> themselves or simply turn off the frequency completely.
>>>
>>>
>> I do see c-states getting used on Intel hardware to save power, and
>>
>
> ondemand has nothing to do with c-states, c-states are handled
> by the menu governor.
>
We're using the standard cpuidle on the newer (RHEL 6 beta-based)
kernels. If you think there are compelling improvements in it after
2.6.32 I'll certainly take a look.
>> in some cases these are quite effective. On AMD hardware lowering
>> frequency tends to be very important to saving power.
>>
>
> AFAIK modern AMD doesn't need this either in c-states.
>
It makes a dramatic difference in power consumption whether you use a
p-state governor on the 2-year-old AMD hardware that matters to me. On
both old (Woodcrest) and new (Nehalem) Intel hardware the difference is
much smaller, as c-states are the dominant form of power saving, but
using a p-state governor still makes a measurable difference. On the
plus side the power-saving c-states we are using don't measurably hurt
performance on our workloads, so cpuidle is doing a pretty good job;
whereas the stock ondemand p-state governor does in a big way.
>> On moderate load I might agree, but on the servers I care about it is
>> a workload that's a bit like war -- long periods of boredom
>> punctuated by sudden bursts of sheer terror.
>>
>
> In this case on modern hardware you don't need a p-state
> governor at all except for "performance"
>
> -Andi
>
No doubt true in the long run, but see above.
DCN
* Re: Improving High-Load Performance with the Ondemand Governor [PATCH ATTACHED]
2010-09-09 14:28 Improving High-Load Performance with the Ondemand Governor David C Niemi
2010-09-10 7:40 ` Andi Kleen
@ 2010-09-16 20:39 ` David C Niemi
2010-09-17 9:25 ` Thomas Renninger
` (2 more replies)
1 sibling, 3 replies; 15+ messages in thread
From: David C Niemi @ 2010-09-16 20:39 UTC (permalink / raw)
To: cpufreq
[-- Attachment #1: Type: text/plain, Size: 1940 bytes --]
I've been doing more testing, and have a couple of observations. I'm
attaching a minimal form of my changes as a patch for the latest
2.6.pre36 git version of the driver. However, it is difficult for me to
test under anything other than 2.6.32 (RHEL 6 beta 2), and there are
some minor differences, though I don't believe they are relevant to my
results.
It looks like "io_is_busy" set to 1 is quite beneficial for quickly
reacting to the onset of load.
I do see a lot of downshifting from the top speed when a core is at
"100%" CPU; presumably this reflects little stalls and lulls, so I expect
"sampling_down_factor" values greater than 1 to remain useful and the
tunable itself to remain desirable.
I've been testing on a dual Xeon X5680 system (at other times I've tested
on 2-year-old dual Opterons).
I observe about a 10W power consumption reduction at idle between the
"performance" governor and the "ondemand" governor. I've seen even
bigger differences under load, as much as 40 watts, though that could be
associated with some performance differences. I haven't tried to
quantify the effect of the sampling_down_factor tunable on power
consumption under load, presumably it increases it, but its usage is
voluntary and that is to be expected.
I have been unable to find a value of up_threshold that does not switch
frequency on at least one core pretty frequently (ranging from a couple of
times a minute to several times a second). However, with fairly fast
sampling intervals (10000 to 50000) I see pretty quick reaction to load
even with UP_THRESHOLD set high (e.g. 50 or even 95). So it is likely
my previous efforts to extend the possible values of UP_THRESHOLD from
11 to 5 are no longer necessary, and are not included in the attached
patch. There are other things I would like to consider doing, however,
that I'll bring up afterwards, but not in this minimal patch.
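For anyone reproducing these settings, the tunables mentioned above could be
set from a small C helper like the sketch below. Several things here are
assumptions rather than facts from this message: the directory
/sys/devices/system/cpu/cpufreq/ondemand/ (older kernels may expose the
ondemand tunables per CPU instead), the helper names write_tunable and
read_tunable, and the availability of sampling_down_factor, which only
exists with the attached patch applied.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper: write an unsigned value to a tunable file.
 * Returns 0 on success, -1 on failure (for example when the file does
 * not exist or is not writable).
 */
static int write_tunable(const char *path, unsigned int value)
{
    FILE *f;
    int ok;

    assert(path != NULL);
    f = fopen(path, "w");
    if (f == NULL)
        return -1;
    ok = fprintf(f, "%u\n", value) > 0;
    if (fclose(f) != 0)
        ok = 0;  /* a failed flush means the value may not have landed */
    return ok ? 0 : -1;
}

/* Hypothetical helper: read an unsigned value back; 0 on success. */
static int read_tunable(const char *path, unsigned int *value)
{
    FILE *f;
    int ok;

    assert(path != NULL && value != NULL);
    f = fopen(path, "r");
    if (f == NULL)
        return -1;
    ok = fscanf(f, "%u", value) == 1;
    fclose(f);
    return ok ? 0 : -1;
}
```

Run as root, a call such as
write_tunable("/sys/devices/system/cpu/cpufreq/ondemand/sampling_rate", 50000)
would select the slow end of the sampling intervals discussed above, and the
same helper could set io_is_busy to 1.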
David C Niemi
[-- Attachment #2: cpufreq_ondemand.c-git.patch --]
[-- Type: text/x-patch, Size: 3608 bytes --]
--- cpufreq_ondemand.c-git 2010-09-08 16:02:01.000000000 -0400
+++ cpufreq_ondemand.c-git-dcn 2010-09-16 16:31:27.000000000 -0400
@@ -30,10 +30,12 @@
#define DEF_FREQUENCY_DOWN_DIFFERENTIAL (10)
#define DEF_FREQUENCY_UP_THRESHOLD (80)
+#define DEF_SAMPLING_DOWN_FACTOR (1)
+#define MAX_SAMPLING_DOWN_FACTOR (100000)
#define MICRO_FREQUENCY_DOWN_DIFFERENTIAL (3)
#define MICRO_FREQUENCY_UP_THRESHOLD (95)
#define MICRO_FREQUENCY_MIN_SAMPLE_RATE (10000)
-#define MIN_FREQUENCY_UP_THRESHOLD (11)
+#define MIN_FREQUENCY_UP_THRESHOLD (5)
#define MAX_FREQUENCY_UP_THRESHOLD (100)
/*
@@ -82,6 +84,7 @@
unsigned int freq_lo;
unsigned int freq_lo_jiffies;
unsigned int freq_hi_jiffies;
+ unsigned int rate_mult;
int cpu;
unsigned int sample_type:1;
/*
@@ -108,10 +111,12 @@
unsigned int up_threshold;
unsigned int down_differential;
unsigned int ignore_nice;
+ unsigned int sampling_down_factor;
unsigned int powersave_bias;
unsigned int io_is_busy;
} dbs_tuners_ins = {
.up_threshold = DEF_FREQUENCY_UP_THRESHOLD,
+ .sampling_down_factor = DEF_SAMPLING_DOWN_FACTOR,
.down_differential = DEF_FREQUENCY_DOWN_DIFFERENTIAL,
.ignore_nice = 0,
.powersave_bias = 0,
@@ -259,6 +264,7 @@
show_one(sampling_rate, sampling_rate);
show_one(io_is_busy, io_is_busy);
show_one(up_threshold, up_threshold);
+show_one(sampling_down_factor, sampling_down_factor);
show_one(ignore_nice_load, ignore_nice);
show_one(powersave_bias, powersave_bias);
@@ -340,6 +346,32 @@
return count;
}
+static ssize_t store_sampling_down_factor(struct kobject *a,
+ struct attribute *b, const char *buf, size_t count)
+{
+ unsigned int input, j;
+ int ret;
+ ret = sscanf(buf, "%u", &input);
+
+ mutex_lock(&dbs_mutex);
+ if (ret != 1 || input > MAX_SAMPLING_DOWN_FACTOR || input < 1) {
+ mutex_unlock(&dbs_mutex);
+ return -EINVAL;
+ }
+
+ dbs_tuners_ins.sampling_down_factor = input;
+
+ /* Reset down sampling multiplier in case it was active */
+ for_each_online_cpu(j) {
+ struct cpu_dbs_info_s *dbs_info;
+ dbs_info = &per_cpu(od_cpu_dbs_info, j);
+ dbs_info->rate_mult = 1;
+ }
+ mutex_unlock(&dbs_mutex);
+
+ return count;
+}
+
static ssize_t store_ignore_nice_load(struct kobject *a, struct attribute *b,
const char *buf, size_t count)
{
@@ -409,6 +441,7 @@
&sampling_rate_min.attr,
&sampling_rate.attr,
&up_threshold.attr,
+ &sampling_down_factor.attr,
&ignore_nice_load.attr,
&powersave_bias.attr,
&io_is_busy.attr,
@@ -562,6 +595,10 @@
/* Check for frequency increase */
if (max_load_freq > dbs_tuners_ins.up_threshold * policy->cur) {
+ /* If switching to max speed, apply sampling_down_factor */
+ if (policy->cur < policy->max)
+ this_dbs_info->rate_mult =
+ dbs_tuners_ins.sampling_down_factor;
dbs_freq_increase(policy, policy->max);
return;
}
@@ -584,6 +621,9 @@
(dbs_tuners_ins.up_threshold -
dbs_tuners_ins.down_differential);
+ /* No longer fully busy, reset rate_mult */
+ this_dbs_info->rate_mult = 1;
+
if (freq_next < policy->min)
freq_next = policy->min;
@@ -607,7 +647,8 @@
int sample_type = dbs_info->sample_type;
/* We want all CPUs to do sampling nearly on same jiffy */
- int delay = usecs_to_jiffies(dbs_tuners_ins.sampling_rate);
+ int delay = usecs_to_jiffies(dbs_tuners_ins.sampling_rate
+ * dbs_info->rate_mult);
if (num_online_cpus() > 1)
delay -= jiffies % delay;
@@ -711,6 +752,7 @@
}
}
this_dbs_info->cpu = cpu;
+ this_dbs_info->rate_mult = 1;
ondemand_powersave_bias_init_cpu(cpu);
/*
* Start the timerschedule work, when this governor
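The rate_mult handling in the patch above can be modelled in user space
roughly as follows. This is a simplified sketch, not kernel code: struct
model and the function names are hypothetical stand-ins for the per-CPU
state and the two hunks that touch rate_mult. The last helper checks the
product that the patch passes to usecs_to_jiffies(); assuming a 32-bit
unsigned int, a large sampling_rate combined with a factor near
MAX_SAMPLING_DOWN_FACTOR would not fit, so a range check might be worth
considering.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_SAMPLING_DOWN_FACTOR 100000u

struct model {               /* hypothetical stand-in for per-CPU state */
    unsigned int cur;        /* current frequency */
    unsigned int max;        /* policy maximum frequency */
    unsigned int rate_mult;  /* multiplier for the sampling interval */
};

/* Models the "frequency increase" hunk: the multiplier is armed only on
 * the transition to max speed, then the frequency jumps to max. */
static void on_load_above_threshold(struct model *m, unsigned int factor)
{
    assert(factor >= 1 && factor <= MAX_SAMPLING_DOWN_FACTOR);
    if (m->cur < m->max)
        m->rate_mult = factor;
    m->cur = m->max;
}

/* Models the "no longer fully busy" hunk: sampling returns to normal. */
static void on_load_below_threshold(struct model *m, unsigned int freq_next)
{
    m->rate_mult = 1;
    m->cur = freq_next;
}

/* True if sampling_rate_us * rate_mult does not fit in 32 bits. */
static int product_exceeds_u32(uint32_t sampling_rate_us, uint32_t rate_mult)
{
    return (uint64_t)sampling_rate_us * rate_mult > UINT32_MAX;
}
```

In this model a sampling_rate of 10000 with the maximum factor still fits
in 32 bits (the product is 1000000000), whereas 50000 with the maximum
factor gives 5000000000, which does not.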
* Re: Improving High-Load Performance with the Ondemand Governor [PATCH ATTACHED]
2010-09-16 20:39 ` Improving High-Load Performance with the Ondemand Governor [PATCH ATTACHED] David C Niemi
@ 2010-09-17 9:25 ` Thomas Renninger
2010-09-17 9:25 ` Thomas Renninger
2010-09-29 18:18 ` Venkatesh Pallipadi
2 siblings, 0 replies; 15+ messages in thread
From: Thomas Renninger @ 2010-09-17 9:25 UTC (permalink / raw)
To: David C Niemi; +Cc: discuss, linux-pm, cpufreq, Arjan van de Ven
On Thursday 16 September 2010 22:39:48 David C Niemi wrote:
> I've been doing more testing, and have a couple of observations. I'm
> attaching a minimal form of my changes as a patch for the latest
> 2.6.pre36 git version of the driver. However, it is difficult for me to
> test under anything other than 2.6.32 (RHEL 6 beta 2), and there are
> some minor differences, though I don't believe they are relevant to my
> results.
...
Arjan van de Ven "pre-announced" changes in the cpufreq area about
half a year ago:
Here is a comment from Arjan on the cpufreq list from 2010-04-19:
====================
Subject: Re: [PATCH 7/7] ondemand: Solve the big performance issue with
ondemand during disk IO
<cut>
As for your general "ondemand is for everyone" concern; there are many
things wrong with ondemand, and I'm writing a new governor to fix the
more fundamental issues with it (and also, frankly, so that I won't
break existing users and hardware I don't have access to). This is
basically a backport of a specific feature of my new governor to
ondemand because Andrew keeps hitting the really bad case and basically
ended up turning power management off.
</cut>
====================
Unfortunately, not much has happened since then.
If there is already an alpha version or some measurement results, it
would be great to see and compare those.
Also, there is a research group that has been working on this.
They also have a "new governor" approach and already have some
interesting results. Hopefully (they said they would within the next few
weeks) they can show some code that can then be compared on the same HW
with your approach and others:
http://www.betriebssysteme.org/Aktivitaeten/Treffen/2009-Bommerholz/Programm/docs/Talks/richling.pdf
I expect that with Arjan's latest "count IO as busy time" change, only some
fine-tuning of the ondemand (or, say, polling) approach can still be done.
The patch you sent probably increases performance a bit, again with
some power trade-offs depending on the HW and C-states available.
This is interesting:
---------------------
I've been testing on a dual Xeon X5680 system
(other times I've been testing on 2-year-old dual Opterons).
I observe about a 10W power consumption reduction at idle between the
"performance" governor and the "ondemand" governor.
---------------------
On the Opteron or Xeon system? That would mean that reducing frequency
from the OS is still an important power consumption knob even on the latest
Westmere systems.
Thomas
* Re: Improving High-Load Performance with the Ondemand Governor [PATCH ATTACHED]
2010-09-17 9:25 ` Thomas Renninger
2010-09-17 13:45 ` David C Niemi
@ 2010-09-17 13:45 ` David C Niemi
2010-09-17 13:46 ` Arjan van de Ven
2010-09-17 13:46 ` Arjan van de Ven
3 siblings, 0 replies; 15+ messages in thread
From: David C Niemi @ 2010-09-17 13:45 UTC (permalink / raw)
To: cpufreq; +Cc: discuss, linux-pm, Arjan van de Ven
Thomas Renninger wrote:
> On Thursday 16 September 2010 22:39:48 David C Niemi wrote:
>
>> I've been doing more testing, and have a couple of observations. I'm
>> attaching a minimal form of my changes as a patch for the latest
>> 2.6.pre36 git version of the driver. However, it is difficult for me to
>> test under anything other than 2.6.32 (RHEL 6 beta 2), and there are
>> some minor differences, though I don't believe they are relevant to my
>> results.
>>
> ...
> Arjan van de Ven "pre-announced" changes in the cpufreq area about
> half a year ago:
>
I saw his message. I expect substantial changes are needed in the long
run, but good alternatives to the Ondemand governor are not ready yet
and will have to go through a long period of testing on many kinds of
hardware. The patch I sent is much more tactical in nature. It is
intended to be a light-touch, low-risk change, adding one tunable (under
a name that existed previously in the Conservative governor) and without
changing default behavior in any way.
> http://www.betriebssysteme.org/Aktivitaeten/Treffen/2009-Bommerholz/Programm/docs/Talks/richling.pdf
>
Thanks for the link. I think integration with the scheduler makes a lot
of sense in the long run. I see that particular paper as being a bit
one-dimensional, though:
- It focused on energy consumption and performance while completing a
defined task, not power consumption on a mix of tasks and idle time.
Energy consumed in a defined task is an interesting data point, but not
even close to the only one; power consumption while in idle or switching
in and out of idle is how most of our CPU cores spend most of their time.
- There is no inherent reason the Ondemand governor should be inferior
to the Performance governor on long-running tasks (at least with my patch).
- They only looked at AMD hardware. Intel CPUs behave a lot
differently, relying a lot more on C-States than P-States for power
savings, and they may differ in other ways too.
- There will need to be some tunables, even with a very smart governor
integrated with the scheduler. For example, where along the
performance/power consumption tradeoff should the scheduler/governor be
aiming? Should it be optimizing for single-thread or many-thread
performance? Should it try to shut down a whole CPU (or core)
completely whenever possible, or keep everything running in active
idle? How important is it to react quickly at the onset of load?
- Ultimately we need to know something about which P-states do the most
work per unit energy, and that is not going to be the same for every
CPU. I'm skeptical having a wide range of P-States makes much sense.
There should perhaps be 3 states only per core: (A) minimum power active
idle, (B) maximum efficiency in terms of work done per unit energy, and
(C) maximum performance with no regard for energy consumption per se.
There are certain special steady-state workloads where an intermediate
power state is truly helpful, like Blu-Ray playback, but that one in
particular is being taken on by firmware over time, and I'm not sure
they are worth optimizing for.
- Ideally the hardware/firmware should have the task of making sure it
doesn't burn itself up, managing voltages and turning things on/off
appropriately for each P-state and/or C-state, giving the operating system
visibility into what is going on with respect to power consumption and
states, and otherwise following orders from the operating system about
what needs to be done. I think some implementations have gone too far
in the direction of trying to implement governor-like smarts into the
firmware or CPU, while inherently lacking the operating system's more
complete view of what is trying to be accomplished.
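The "maximum efficiency" state (B) from the list above could, in principle,
be picked from a table of per-state measurements as in the sketch below.
This rests on assumptions that are not established anywhere in this thread:
that a per-state power figure is available at all, and that work done scales
linearly with clock frequency. The struct and function names are
hypothetical.

```c
#include <assert.h>
#include <stddef.h>

struct pstate {
    unsigned int freq_mhz;  /* used here as a proxy for work per second */
    double power_watts;     /* power drawn in this state; must be > 0 */
};

/*
 * Return the index of the state with the highest frequency-per-watt
 * ratio, i.e. the most work per unit energy under the (strong)
 * assumption that work scales linearly with clock frequency.
 */
static size_t most_efficient_pstate(const struct pstate *states, size_t n)
{
    size_t best = 0;
    size_t i;

    assert(states != NULL && n > 0);
    for (i = 1; i < n; i++) {
        /* f_i / p_i > f_best / p_best, cross-multiplied to avoid division */
        if (states[i].freq_mhz * states[best].power_watts >
            states[best].freq_mhz * states[i].power_watts)
            best = i;
    }
    return best;
}
```

With a table sorted by frequency, the first entry would play the role of
state (A), the last entry state (C), and the returned index state (B); real
hardware would need measured figures rather than guesses.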
> This is interesting:
> ---------------------
> I've been testing on a dual Xeon X5680 system
> (other times I've been testing on 2-year-old dual Opterons).
> I observe about a 10W power consumption reduction at idle between the
> "performance" governor and the "ondemand" governor.
> ---------------------
> On the Opteron or Xeon system? That would mean that reducing frequency
> from the OS is still an important power consumption knob even on the latest
> Westmere systems.
>
That was on the 32nm (Westmere) CPUs, with hyperthreading on. On
Opterons power consumption differences between Performance and Ondemand
are much larger; as I mentioned, AMD and Intel behave quite differently
here. They also change behavior over time -- older Intel CPUs
(Woodcrest) had almost negligible power consumption differences by
changing clock speed, and some of them were not even capable of changing
clock speed at all. AMD has tended to allow very slow idle states,
around 1 GHz, while Intel's minimum is at 1.596 GHz; but Intel has been
more aggressive about shutting off inactive parts of caches and cores.
So anyway, I believe the Ondemand governor will continue to have a lot
of relevance for another year at least, until a replacement has been
(a) fully implemented, (b) widely tested, and (c) picked up downstream
by distributions. Without something like this patch, I'll be stuck with
the Performance governor in the meantime, which is far worse.
David C Niemi
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Improving High-Load Performance with the Ondemand Governor [PATCH ATTACHED]
2010-09-17 9:25 ` Thomas Renninger
` (2 preceding siblings ...)
2010-09-17 13:46 ` Arjan van de Ven
@ 2010-09-17 13:46 ` Arjan van de Ven
3 siblings, 0 replies; 15+ messages in thread
From: Arjan van de Ven @ 2010-09-17 13:46 UTC (permalink / raw)
To: Thomas Renninger; +Cc: discuss, linux-pm, David C Niemi, cpufreq
On Fri, 17 Sep 2010 11:25:44 +0200
Thomas Renninger <trenn@suse.de> wrote:
> On the Opteron or Xeon system? That would mean that reducing frequency
> from OS still is an important power consumption knob even on latest
> Westmere systems.
It is for staying out of the turbo range. The turbo range is not power
efficient (but good for performance).
Below turbo, the actual impact is a LOT less.
--
Arjan van de Ven Intel Open Source Technology Centre
For development, discussion and tips for power savings,
visit http://www.lesswatts.org
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Improving High-Load Performance with the Ondemand Governor [PATCH ATTACHED]
2010-09-17 13:45 ` David C Niemi
2010-09-18 10:13 ` [linux-pm] " Sripathy, Vishwanath
@ 2010-09-18 10:13 ` Sripathy, Vishwanath
1 sibling, 0 replies; 15+ messages in thread
From: Sripathy, Vishwanath @ 2010-09-18 10:13 UTC (permalink / raw)
To: David C Niemi, cpufreq; +Cc: discuss, linux-pm, Arjan van de Ven
David,
> -----Original Message-----
> From: linux-pm-bounces@lists.linux-foundation.org [mailto:linux-pm-bounces@lists.linux-foundation.org] On Behalf Of David C Niemi
> Sent: Friday, September 17, 2010 7:15 PM
> To: cpufreq@vger.kernel.org
> Cc: discuss@lesswatts.org; linux-pm@lists.linux-foundation.org; Arjan van de Ven
> Subject: Re: [linux-pm] Improving High-Load Performance with the Ondemand
> Governor [PATCH ATTACHED]
>
> Thomas Renninger wrote:
> > On Thursday 16 September 2010 22:39:48 David C Niemi wrote:
> >
> >> I've been doing more testing, and have a couple of observations. I'm
> >> attaching a minimal form of my changes as a patch for the latest
> >> 2.6.pre36 git version of the driver. However, it is difficult for me to
> >> test under anything other than 2.6.32 (RHEL 6 beta 2), and there are
> >> some minor differences, though I don't believe they are relevant to my
> >> results.
> >>
> > ...
> > Arjan van de Ven "pre-announced" changes in the cpufreq area about
> > half a year ago:
> >
> I saw his message. I expect substantial changes are needed in the long
> run, but good alternatives to the Ondemand governor are not ready yet
> and will have to go through a long period of testing on many kinds of
> hardware. The patch I sent is much more tactical in nature. It is
> intended to be a light-touch, low-risk change, adding one tunable (under
> a name that existed previously in the Conservative governor) and without
> changing default behavior in any way.
>
> > http://www.betriebssysteme.org/Aktivitaeten/Treffen/2009-Bommerholz/Programm/docs/Talks/richling.pdf
> >
> Thanks for the link. I think integration with the scheduler makes a lot
> of sense in the long run. I see that particular paper as being a bit
> one-dimensional, though:
> - It focused on energy consumption and performance while completing a
> defined task, not power consumption on a mix of tasks and idle time.
> Energy consumed in a defined task is an interesting data point, but not
> even close to the only one; power consumption while in idle or switching
> in and out of idle is how most of our CPU cores spend most of their time.
> - There is no inherent reason the Ondemand governor should be inferior
> to the Performance governor on long-running tasks (at least with my patch).
> - They only looked at AMD hardware. Intel CPUs behave a lot
> differently, relying a lot more on C-States than P-States for power
> savings, and they may differ in other ways too.
> - There will need to be some tunables, even with a very smart governor
> integrated with the scheduler. For example, where along the
> performance/power consumption tradeoff should the scheduler/governor be
> aiming? Should it be optimizing for single-thread or many-thread
> performance? Should it try to shut down a whole CPU (or core)
> completely whenever possible, or keep everything running in active
> idle? How important is it to react quickly at the onset of load?
> - Ultimately we need to know something about which P-states do the most
> work per unit energy, and that is not going to be the same for every
> CPU. I'm skeptical having a wide range of P-States makes much sense.
> There should perhaps be 3 states only per core: (A) minimum power active
> idle, (B) maximum efficiency in terms of work done per unit energy, and
> (C) maximum performance with no regard for energy consumption per se.
> There are certain special steady-state workloads where an intermediate
> power state is truly helpful, like Blu-Ray playback, but that one in
> particular is being taken on by firmware over time, and I'm not sure
> such workloads are worth optimizing for.
> - Ideally the hardware/firmware should have the task of making sure it
> doesn't burn itself up, managing voltages and turning things on/off as
> appropriate for each P-state and/or C-state, giving the operating system
> visibility into what is going on with respect to power consumption and
> states, and otherwise following orders from the operating system about
> what needs to be done. I think some implementations have gone too far
> in the direction of building governor-like smarts into the firmware or
> CPU, which inherently lacks the operating system's more complete view
> of what is trying to be accomplished.
>
> > Interesting is:
> > ---------------------
> > I've been testing on a dual Xeon X5680 system
> > (other times I've been testing on 2-year-old dual Opterons).
> > I observe about a 10W power consumption reduction at idle between the
> > "performance" governor and the "ondemand" governor.
> > ---------------------
> > On the Opteron or Xeon system? That would mean that reducing frequency
> > from OS still is an important power consumption knob even on latest Westmere
> > systems.
> >
>
> That was on the 32nm (Westmere) CPUs, with hyperthreading on. On
> Opterons the power consumption differences between Performance and
> Ondemand are much larger; as I mentioned, AMD and Intel behave quite
> differently here. Their behavior also changes over time: on older Intel
> CPUs (Woodcrest), changing clock speed made an almost negligible
> difference to power consumption, and some of them could not change
> clock speed at all. AMD has tended to allow very slow idle states,
> around 1 GHz, while Intel's minimum is at 1.596 GHz; but Intel has been
> more aggressive about shutting off inactive parts of caches and cores.
>
> So anyway, I believe the Ondemand governor will continue to have a lot
> of relevance for another year at least, until a replacement has been
> (a) fully implemented, (b) widely tested, and (c) picked up downstream
> by distributions. Without something like this patch, I'll be stuck with
> the Performance governor in the meantime, which is far worse.
Do you mind sharing this patch again? The subject says the patch is attached, but I could not find it in the mailing list :-(
Thanks
Vishwa
>
> David C Niemi
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Improving High-Load Performance with the Ondemand Governor [PATCH ATTACHED]
2010-09-16 20:39 ` Improving High-Load Performance with the Ondemand Governor [PATCH ATTACHED] David C Niemi
2010-09-17 9:25 ` Thomas Renninger
2010-09-17 9:25 ` Thomas Renninger
@ 2010-09-29 18:18 ` Venkatesh Pallipadi
2 siblings, 0 replies; 15+ messages in thread
From: Venkatesh Pallipadi @ 2010-09-29 18:18 UTC (permalink / raw)
To: David C Niemi; +Cc: cpufreq
On Thu, Sep 16, 2010 at 1:39 PM, David C Niemi <dniemi@verisign.com> wrote:
> I've been doing more testing, and have a couple of observations. I'm
> attaching a minimal form of my changes as a patch for the latest 2.6.pre36
> git version of the driver. However, it is difficult for me to test under
> anything other than 2.6.32 (RHEL 6 beta 2), and there are some minor
> differences, though I don't believe they are relevant to my results.
>
> It looks like "io_is_busy" set to 1 is quite beneficial for quickly reacting
> to the onset of load.
>
> I do see a lot of downshifting from the top speed when a core is at "100%"
> CPU; presumably this reflects little stalls and lulls, so I expect
> "sampling_down_factor" values greater than 1 to remain useful and the
> tunable itself to remain desirable.
>
> I've been testing on a dual Xeon X5680 system (other times I've been testing on
> 2-year-old dual Opterons).
>
> I observe about a 10W power consumption reduction at idle between the
> "performance" governor and the "ondemand" governor. I've seen even bigger
> differences under load, as much as 40 watts, though that could be associated
> with some performance differences. I haven't tried to quantify the effect
> of the sampling_down_factor tunable on power consumption under load;
> presumably it increases it, but its use is voluntary, so that is to be
> expected.
>
> I have been unable to find a value of up_threshold that does not switch
> frequency on at least one core pretty frequently (ranging from a couple of
> times a minute to several times a second). However, with fairly fast sampling
> intervals (10000 to 50000 us) I see pretty quick reaction to load even with
> UP_THRESHOLD set high (e.g. 50 or even 95). So it is likely my previous
> efforts to lower the minimum value of UP_THRESHOLD from 11 to 5 are no
> longer necessary, and they are not included in the attached patch. There are
> other things I would like to consider doing, however, that I'll bring up
> afterwards, but not in this minimal patch.
>
I do see this change in the patch, though. From your comment above, I
take it you did not mean to include it?
 #define MICRO_FREQUENCY_UP_THRESHOLD (95)
 #define MICRO_FREQUENCY_MIN_SAMPLE_RATE (10000)
-#define MIN_FREQUENCY_UP_THRESHOLD (11)
+#define MIN_FREQUENCY_UP_THRESHOLD (5)
 #define MAX_FREQUENCY_UP_THRESHOLD (100)
The problem with 5 is that when DEF_FREQUENCY_DOWN_DIFFERENTIAL is
used, the up_threshold - down_differential calculation in the code can
end up negative, which is not good.
One minor comment.
+
+	mutex_lock(&dbs_mutex);
+	if (ret != 1 || input > MAX_SAMPLING_DOWN_FACTOR || input < 1) {
+		mutex_unlock(&dbs_mutex);
+		return -EINVAL;
+	}
+
+	dbs_tuners_ins.sampling_down_factor = input;
You can move mutex_lock after input validation to make it a bit cleaner.
Otherwise the patch looks good. I agree that having sampling_down_factor
as a tunable will be useful in some situations.
Thanks,
Venki
^ permalink raw reply [flat|nested] 15+ messages in thread