* Improving High-Load Performance with the Ondemand Governor
@ 2010-09-09 14:28 David C Niemi
  2010-09-10  7:40 ` Andi Kleen
  2010-09-16 20:39 ` Improving High-Load Performance with the Ondemand Governor [PATCH ATTACHED] David C Niemi
  0 siblings, 2 replies; 15+ messages in thread
From: David C Niemi @ 2010-09-09 14:28 UTC (permalink / raw)
  To: cpufreq

I have tested patches for both 2.6.18 and 2.6.32, but before sharing 
them I'd first like to describe the problem I'm trying to solve and the 
strategy I've been trying, and get some feedback on it.

I have an application for RHEL 5-based network servers where the 
Performance "governor" was being used due to measurably worse 
performance with the stock Ondemand governor.  The hardware includes 
Woodcrest, Opteron, and Nehalem dual-socket machines with CPUs towards 
the high-performance end.  My changes have been in production use for 
over a year on RHEL 5.x, and I'm now looking at applying them to RHEL 6 
and would like to get them into the mainstream kernel.  I believe my 
changes can be generally beneficial, or at worst neutral, across all 
applications if done right.

The workload has periods of really high CPU utilization with lulls in 
between, and the servers need to respond quickly to the onset of load to 
avoid dropping packets.  This resulted in 3 goals for my work with the 
governor:

1) Negligible overhead when at high CPU utilization
2) Save power when truly idle
3) Ramp up quickly to the high-performance state when load appears

One of the first things I discovered is that the Ondemand governor has 
symmetric logic for deciding to increase or decrease clock speed.  This 
might be good for a battery-powered device, but under heavy load, the 
overhead of checking load on all cores on a frequent basis impairs 
performance very noticeably.  I also noticed that even under heavy 
loads, the CPU speed would not remain at maximum all the time.  The 
governor was seeking any chance to downshift for the slightest perceived 
dip in load, which in this case resulted in dropped packets; this is 
simply not good behavior for my application.

Sampling less frequently helps somewhat, but not enough, and conflicts 
with goal #3.

Lowering up_threshold helps somewhat too, but not enough, as it can only 
be lowered to 11 and it does not solve the conflict between goals #1 and #3.
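
To make the symmetry concrete, the decision logic amounts to roughly the 
following (a simplified sketch with illustrative names, not the actual 
dbs_check_cpu() code):

/*
 * Simplified sketch of the symmetric up/down decision; names and
 * structure are illustrative only.
 */
static void ondemand_check_sketch(struct cpufreq_policy *policy,
				  unsigned int load,
				  unsigned int up_threshold,
				  unsigned int down_differential)
{
	/* Upshift: a single sample above up_threshold jumps straight to max. */
	if (load > up_threshold) {
		__cpufreq_driver_target(policy, policy->max,
					CPUFREQ_RELATION_H);
		return;
	}

	/*
	 * Downshift: the very next sample below the hysteresis band already
	 * picks a lower frequency, so every check while busy is another
	 * chance to fall off the top speed.
	 */
	if (load < up_threshold - down_differential) {
		unsigned int freq_next = load * policy->cur /
				(up_threshold - down_differential);

		if (freq_next < policy->min)
			freq_next = policy->min;
		__cpufreq_driver_target(policy, freq_next,
					CPUFREQ_RELATION_L);
	}
}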

My Strategy:

1) (re)introduced the sampling_down_factor tunable, but made it work a 
bit differently.  This turned out to be the centerpiece and most 
important of all my changes.  When set to 1 (the default) it changes 
nothing from existing behavior.  If set to more than one, it acts as a 
multiplier for the sampling interval while at the top CPU speed.  So if 
we set it to 100, the overhead of checking for idle CPU is reduced to 1% 
of what it was when we are really busy, and we are much less prone to 
downshift as long as we continue to be busy.  But as soon as we are not 
at the top speed, sampling goes back to normal so we can quickly respond 
to a load spike.  (A sketch of the idea follows this list.)

2) made it possible for up_threshold to be set much lower (5) to improve 
responsiveness to sudden load spikes.

3) Made hysteresis (DOWN_DIFFERENTIAL) scalable based on up_threshold, 
in order to make it possible to reach an up_threshold of 5.

4) Clock speed jitter is highly undesirable, and becomes more noticeable 
when up_threshold is small.  A specific problem I found is that the 
overhead of lowering clock speed can be mistaken for more load, causing 
the CPU to upshift again right away.  I solved this by throwing away the 
sample right after reducing speed, as it is never going to be a good 
indication of what the normal load really is.  When increasing speed, 
the extra load is harmless and nothing needs to be changed.
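
To make items 1 and 4 concrete, here is a minimal sketch in the style of 
cpufreq_ondemand.c.  rate_mult corresponds to what my changes add, while 
just_lowered_speed is a hypothetical flag used only to illustrate item 4:

/* Item 1: stretch the sampling interval while running at the top speed. */
static inline int od_sample_delay(struct cpu_dbs_info_s *dbs_info)
{
	/*
	 * rate_mult is sampling_down_factor while the CPU is at policy->max
	 * and 1 otherwise, so a factor of 100 cuts the busy-time checking
	 * overhead to 1% of the default.
	 */
	return usecs_to_jiffies(dbs_tuners_ins.sampling_rate *
				dbs_info->rate_mult);
}

/*
 * Item 4: the sample taken right after a downshift includes the overhead
 * of the frequency transition itself, so discard it rather than letting
 * it trigger an immediate upshift.
 */
static inline int od_skip_this_sample(struct cpu_dbs_info_s *dbs_info)
{
	if (dbs_info->just_lowered_speed) {
		dbs_info->just_lowered_speed = 0;
		return 1;
	}
	return 0;
}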

Additional observations:

5) I don't like the addition of a down_differential variable per CPU.  I 
consider it to be unnecessary baggage, and would prefer to always 
calculate down_differential (hysteresis) whenever needed on the fly 
based on up_threshold.  I don't think it should be a tunable because 
there is a fairly narrow range of useful values that are probably better 
to calculate automatically.

6) The MICRO_FREQUENCY changes are not very helpful to my cause.  An 
UP_THRESHOLD of 95 is awful for my goal #3, a DOWN_DIFFERENTIAL of 3 is 
very jitter-inducing, and a sample rate (really interval) of 10000 is 
way too fast.  I'd like to hear what these changes are intended to do so 
I can preserve their intent while meeting my needs too.

David C Niemi

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Improving High-Load Performance with the Ondemand Governor
  2010-09-09 14:28 Improving High-Load Performance with the Ondemand Governor David C Niemi
@ 2010-09-10  7:40 ` Andi Kleen
  2010-09-13 20:18   ` David C Niemi
  2010-09-16 20:39 ` Improving High-Load Performance with the Ondemand Governor [PATCH ATTACHED] David C Niemi
  1 sibling, 1 reply; 15+ messages in thread
From: Andi Kleen @ 2010-09-10  7:40 UTC (permalink / raw)
  To: David C Niemi; +Cc: cpufreq, lenb

David C Niemi <dniemi@verisign.com> writes:

Perhaps better to post to linux-kernel next time; I think cpufreq
is mostly dead these days.

> I have tested patches for both 2.6.18 and 2.6.32, but before sharing
> them I'd like to first describe the problem I'm trying to solve and
> the strategy I've been trying and get some feedback on it.

These are all ancient in terms of mainline kernel. The latest
kernel should have some improvements, perhaps try them first.

On Nehalem class systems with recent kernels it often helps to use the
"intel_idle" driver too, because that gives the governour more 
accurate latencies to work with. Many BIOS are known to report
incorrect latencies.

> The workload has periods of really high CPU utilization with lulls in
> between, and the servers need to respond quickly to the onset of load
> to avoid dropping packets.  This resulted in 3 goals for my work with
> the governor:
>
> 1) Negligible overhead when at high CPU utilization
> 2) Save power when truly idle
> 3) Ramp up quickly to the high-performance state when load appears

FWIW when you're truly idle you typically don't need ondemand;
the idle states on modern CPUs go to the lowest frequency by themselves
or simply turn off the frequency completely.

ondemand and p-states mainly help you on moderate load.

Just going to the highest state unconditionally would be somewhat 
counterproductive to that goal.

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Improving High-Load Performance with the Ondemand Governor
  2010-09-10  7:40 ` Andi Kleen
@ 2010-09-13 20:18   ` David C Niemi
  2010-09-13 20:54     ` Andi Kleen
  0 siblings, 1 reply; 15+ messages in thread
From: David C Niemi @ 2010-09-13 20:18 UTC (permalink / raw)
  To: Andi Kleen; +Cc: cpufreq, lenb

Andi Kleen wrote:
>
> David C Niemi <dniemi@verisign.com> writes:
>
> Perhaps better post to linux-kernel next time, I think cpufreq
> is mostly dead these days.
>
Hello Andi, thanks for your quick response.

There were some lively discussions on it in the fairly recent past.  
I'll post on linux-kernel if I don't get enough feedback.
>
> > I have tested patches for both 2.6.18 and 2.6.32, but before sharing
> > them I'd like to first describe the problem I'm trying to solve and
> > the strategy I've been trying and get some feedback on it.
>
> These are all ancient in terms of mainline kernel. The latest
> kernel should have some improvements, perhaps try them first.
>
I have looked at the latest kernels too, and the changes in the ondemand 
governor between that and RHEL 6's 2.6.32 kernel are quite modest.  I 
mention 2.6.18 just because it's what's been out in the field a while.

> On Nehalem class systems with recent kernels it often helps to use the
> "intel_idle" driver too, because that gives the governour more
> accurate latencies to work with. Many BIOS are known to report
> incorrect latencies.
>
Thanks for the suggestion.

I haven't seen much in the way of inaccurate latency problems, but then 
most of my testing has been on a fairly constrained set of fairly good 
hardware.

> > The workload has periods of really high CPU utilization with lulls in
> > between, and the servers need to respond quickly to the onset of load
> > to avoid dropping packets.  This resulted in 3 goals for my work with
> > the governor:
> >
> > 1) Negligible overhead when at high CPU utilization
> > 2) Save power when truly idle
> > 3) Ramp up quickly to the high-performance state when load appears
>
> FWIW when you're truly idle you typically don't need ondemand,
> the idle states on modern CPUs go to the lowest frequency by themselves
> or simply turn off the frequency completely.
>
I do see c-states getting used on Intel hardware to save power, and in 
some cases these are quite effective.  On AMD hardware lowering 
frequency tends to be very important to saving power.  But you must 
choose some governor or other, and if you choose the performance 
(non)governor, the clock frequency does NOT change by itself.  There are 
other governors more attuned to portable devices, but that's a different 
application; the ondemand governor is the closest I could find.

> ondemand and p-states mainly help you on moderate load.
>
> Just going to highest state unconditionally would be somewhat
> contraproductive to that goal.
>
On moderate load I might agree, but on the servers I care about it is a 
workload that's a bit like war -- long periods of boredom punctuated by 
sudden bursts of sheer terror.  So I am really only interested in 
active idle and max performance, not so much the states in between.  Of 
course, on new Intel hardware that decision can be made in a fairly 
fine-grained way; you do not have to ramp up every core just because one 
is busy.  But if performance during the peaks is inferior to the 
performance non-governor, we will end up being told to use that, 
running flat-out all the time and saving no power at all other than 
what is automatically saved through c-states.

David C Niemi

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Improving High-Load Performance with the Ondemand Governor
  2010-09-13 20:18   ` David C Niemi
@ 2010-09-13 20:54     ` Andi Kleen
  2010-09-13 22:02       ` David C Niemi
  0 siblings, 1 reply; 15+ messages in thread
From: Andi Kleen @ 2010-09-13 20:54 UTC (permalink / raw)
  To: David C Niemi; +Cc: cpufreq, lenb

On Mon, 13 Sep 2010 16:18:51 -0400
David C Niemi <dniemi@verisign.com> wrote:


> > > I have tested patches for both 2.6.18 and 2.6.32, but before
> > > sharing them I'd like to first describe the problem I'm trying to
> > > solve and the strategy I've been trying and get some feedback on
> > > it.
> >
> > These are all ancient in terms of mainline kernel. The latest
> > kernel should have some improvements, perhaps try them first.
> >
> I have looked at the latest kernels too, and the changes in the
> ondemand governor between that and RHEL 6's 2.6.32 kernel are quite
> modest.  I mention 2.6.18 just because it's what's been out in the
> field a while.

Most of the interesting changes were post 2.6.32 (2.6.32 is ancient
too for mainline).

> > FWIW when you're truly idle you typically don't need ondemand,
> > the idle states on modern CPUs go to the lowest frequency by
> > themselves or simply turn off the frequency completely.
> >
> I do see c-states getting used on Intel hardware to save power, and

ondemand has nothing to do with c-states; c-states are handled
by the menu governor.

> in some cases these are quite effective.  On AMD hardware lowering 
> frequency tends to be very important to saving power.

AFAIK modern AMD doesn't need this either in c-states.

> > ondemand and p-states mainly help you on moderate load.
> >
> > Just going to highest state unconditionally would be somewhat
> > contraproductive to that goal.
> >
> On moderate load I might agree, but on the servers I care about it is
> a workload that's a bit like war -- long periods of boredom
> punctuated by sudden bursts of sheer terror. 

In this case on modern hardware you don't need a p-state
governor at all except for "performance".

-Andi
-- 
ak@linux.intel.com -- Speaking for myself only.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Improving High-Load Performance with the Ondemand Governor
  2010-09-13 20:54     ` Andi Kleen
@ 2010-09-13 22:02       ` David C Niemi
  0 siblings, 0 replies; 15+ messages in thread
From: David C Niemi @ 2010-09-13 22:02 UTC (permalink / raw)
  To: Andi Kleen; +Cc: cpufreq

Andi Kleen wrote:
> On Mon, 13 Sep 2010 16:18:51 -0400
> David C Niemi <dniemi@verisign.com> wrote:
>
>   
>> I have looked at the latest kernels too, and the changes in the
>> ondemand governor between that and RHEL 6's 2.6.32 kernel are quite
>> modest.  I mention 2.6.18 just because it's what's been out in the
>> field a while.
>>     
>
> Most of the interesting changes were post 2.6.32 (2.6.32 is ancient
> too for mainline)  
>   
I did see a few changes in cpufreq_ondemand.c between 2.6.32 and the git 
version I grabbed last week, but nothing really relevant to what I was 
trying to do.

>>> FWIW when you're truly idle you typically don't need ondemand,
>>> the idle states on modern CPUs go to the lowest frequency by
>>> themselves or simply turn off the frequency completely.
>>>
>>>       
>> I do see c-states getting used on Intel hardware to save power, and
>>     
>
> ondemand has nothing to do with c-states, c-states are handled
> by the menu governor.
>   
We're using the standard cpuidle on the newer (RHEL 6 beta-based) 
kernels.  If you think there are compelling improvements in it after 
2.6.32 I'll certainly take a look.
 
>> in some cases these are quite effective.  On AMD hardware lowering 
>> frequency tends to be very important to saving power.
>>     
>
> AFAIK modern AMD doesn't need this either in c-states.
>   
Whether or not you use a p-state governor makes a dramatic difference in 
power consumption on the 2-year-old AMD hardware that matters to me.  On 
both old (Woodcrest) and new (Nehalem) Intel hardware the difference is 
much smaller, as c-states are the dominant form of power saving, but 
using a p-state governor still makes a measurable difference.  On the 
plus side, the power-saving c-states we are using don't measurably hurt 
performance on our workloads, so cpuidle is doing a pretty good job, 
whereas the stock ondemand p-state governor hurts performance in a big way.

>> On moderate load I might agree, but on the servers I care about it is
>> a workload that's a bit like war -- long periods of boredom
>> punctuated by sudden bursts of sheer terror. 
>>     
>
> In this case on modern hardware you don't need a p-state
> governor at all except for "performance"
>
> -Andi
>   
No doubt true in the long run, but see above.

DCN

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Improving High-Load Performance with the Ondemand Governor [PATCH ATTACHED]
  2010-09-09 14:28 Improving High-Load Performance with the Ondemand Governor David C Niemi
  2010-09-10  7:40 ` Andi Kleen
@ 2010-09-16 20:39 ` David C Niemi
  2010-09-17  9:25   ` Thomas Renninger
                     ` (2 more replies)
  1 sibling, 3 replies; 15+ messages in thread
From: David C Niemi @ 2010-09-16 20:39 UTC (permalink / raw)
  To: cpufreq

[-- Attachment #1: Type: text/plain, Size: 1940 bytes --]

I've been doing more testing, and have a couple of observations.  I'm 
attaching a minimal form of my changes as a patch for the latest 
pre-2.6.36 git version of the driver.  However, it is difficult for me to 
test under anything other than 2.6.32 (RHEL 6 beta 2), and there are 
some minor differences, though I don't believe they are relevant to my 
results.

It looks like "io_is_busy" set to 1 is quite beneficial for quickly 
reacting the onset of load.

I do see a lot of downshifting from the top speed when a core is at 
"100%" CPU (presumably due to little stalls and lulls), so I expect 
"sampling_down_factor" values greater than 1 to continue to be useful.

I've been testing on a dual Xeon X5680 system (at other times I've been 
testing on 2-year-old dual Opterons).

I observe about a 10W power consumption reduction at idle between the 
"performance" governor and the "ondemand" governor.  I've seen even 
bigger differences under load, as much as 40 watts, though that could be 
associated with some performance differences.  I haven't tried to 
quantify the effect of the sampling_down_factor tunable on power 
consumption under load; presumably it increases it, but its usage is 
voluntary and that is to be expected.

I have been unable to find a value of up_threshold that does not switch 
frequency on at least one core pretty frequently (ranging from a couple of 
times a minute to several times a second).  However, with fairly fast 
sampling intervals (10000 to 50000) I see pretty quick reaction to load 
even with UP_THRESHOLD set high (e.g. 50 or even 95).  So it is likely 
my previous efforts to lower the minimum allowed value of UP_THRESHOLD 
from 11 to 5 are no longer necessary, and are not included in the attached 
patch.  There are other things I would like to consider doing, however, 
that I'll bring up afterwards, but not in this minimal patch.

David C Niemi

[-- Attachment #2: cpufreq_ondemand.c-git.patch --]
[-- Type: text/x-patch, Size: 3608 bytes --]

--- cpufreq_ondemand.c-git	2010-09-08 16:02:01.000000000 -0400
+++ cpufreq_ondemand.c-git-dcn	2010-09-16 16:31:27.000000000 -0400
@@ -30,10 +30,12 @@
 
 #define DEF_FREQUENCY_DOWN_DIFFERENTIAL		(10)
 #define DEF_FREQUENCY_UP_THRESHOLD		(80)
+#define DEF_SAMPLING_DOWN_FACTOR		(1)
+#define MAX_SAMPLING_DOWN_FACTOR		(100000)
 #define MICRO_FREQUENCY_DOWN_DIFFERENTIAL	(3)
 #define MICRO_FREQUENCY_UP_THRESHOLD		(95)
 #define MICRO_FREQUENCY_MIN_SAMPLE_RATE		(10000)
-#define MIN_FREQUENCY_UP_THRESHOLD		(11)
+#define MIN_FREQUENCY_UP_THRESHOLD		(5)
 #define MAX_FREQUENCY_UP_THRESHOLD		(100)
 
 /*
@@ -82,6 +84,7 @@
 	unsigned int freq_lo;
 	unsigned int freq_lo_jiffies;
 	unsigned int freq_hi_jiffies;
+	unsigned int rate_mult;
 	int cpu;
 	unsigned int sample_type:1;
 	/*
@@ -108,10 +111,12 @@
 	unsigned int up_threshold;
 	unsigned int down_differential;
 	unsigned int ignore_nice;
+	unsigned int sampling_down_factor;
 	unsigned int powersave_bias;
 	unsigned int io_is_busy;
 } dbs_tuners_ins = {
 	.up_threshold = DEF_FREQUENCY_UP_THRESHOLD,
+	.sampling_down_factor = DEF_SAMPLING_DOWN_FACTOR,
 	.down_differential = DEF_FREQUENCY_DOWN_DIFFERENTIAL,
 	.ignore_nice = 0,
 	.powersave_bias = 0,
@@ -259,6 +264,7 @@
 show_one(sampling_rate, sampling_rate);
 show_one(io_is_busy, io_is_busy);
 show_one(up_threshold, up_threshold);
+show_one(sampling_down_factor, sampling_down_factor);
 show_one(ignore_nice_load, ignore_nice);
 show_one(powersave_bias, powersave_bias);
 
@@ -340,6 +346,32 @@
 	return count;
 }
 
+static ssize_t store_sampling_down_factor(struct kobject *a,
+			struct attribute *b, const char *buf, size_t count)
+{
+	unsigned int input, j;
+	int ret;
+	ret = sscanf(buf, "%u", &input);
+
+	mutex_lock(&dbs_mutex);
+	if (ret != 1 || input > MAX_SAMPLING_DOWN_FACTOR || input < 1) {
+		mutex_unlock(&dbs_mutex);
+		return -EINVAL;
+	}
+
+	dbs_tuners_ins.sampling_down_factor = input;
+
+	/* Reset down sampling multiplier in case it was active */
+	for_each_online_cpu(j) {
+		struct cpu_dbs_info_s *dbs_info;
+		dbs_info = &per_cpu(od_cpu_dbs_info, j);
+		dbs_info->rate_mult = 1;
+	}
+	mutex_unlock(&dbs_mutex);
+
+	return count;
+}
+
 static ssize_t store_ignore_nice_load(struct kobject *a, struct attribute *b,
 				      const char *buf, size_t count)
 {
@@ -409,6 +441,7 @@
 	&sampling_rate_min.attr,
 	&sampling_rate.attr,
 	&up_threshold.attr,
+	&sampling_down_factor.attr,
 	&ignore_nice_load.attr,
 	&powersave_bias.attr,
 	&io_is_busy.attr,
@@ -562,6 +595,10 @@
 
 	/* Check for frequency increase */
 	if (max_load_freq > dbs_tuners_ins.up_threshold * policy->cur) {
+		/* If switching to max speed, apply sampling_down_factor */
+		if (policy->cur < policy->max)
+			this_dbs_info->rate_mult =
+				dbs_tuners_ins.sampling_down_factor;
 		dbs_freq_increase(policy, policy->max);
 		return;
 	}
@@ -584,6 +621,9 @@
 				(dbs_tuners_ins.up_threshold -
 				 dbs_tuners_ins.down_differential);
 
+		/* No longer fully busy, reset rate_mult */
+		this_dbs_info->rate_mult = 1;
+
 		if (freq_next < policy->min)
 			freq_next = policy->min;
 
@@ -607,7 +647,8 @@
 	int sample_type = dbs_info->sample_type;
 
 	/* We want all CPUs to do sampling nearly on same jiffy */
-	int delay = usecs_to_jiffies(dbs_tuners_ins.sampling_rate);
+	int delay = usecs_to_jiffies(dbs_tuners_ins.sampling_rate
+		* dbs_info->rate_mult);
 
 	if (num_online_cpus() > 1)
 		delay -= jiffies % delay;
@@ -711,6 +752,7 @@
 			}
 		}
 		this_dbs_info->cpu = cpu;
+		this_dbs_info->rate_mult = 1;
 		ondemand_powersave_bias_init_cpu(cpu);
 		/*
 		 * Start the timerschedule work, when this governor

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Improving High-Load Performance with the Ondemand Governor [PATCH ATTACHED]
  2010-09-16 20:39 ` Improving High-Load Performance with the Ondemand Governor [PATCH ATTACHED] David C Niemi
@ 2010-09-17  9:25   ` Thomas Renninger
  2010-09-17  9:25   ` Thomas Renninger
  2010-09-29 18:18   ` Venkatesh Pallipadi
  2 siblings, 0 replies; 15+ messages in thread
From: Thomas Renninger @ 2010-09-17  9:25 UTC (permalink / raw)
  To: David C Niemi; +Cc: discuss, linux-pm, cpufreq, Arjan van de Ven

On Thursday 16 September 2010 22:39:48 David C Niemi wrote:
> I've been doing more testing, and have a couple of observations.  I'm 
> attaching a minimal form of my changes as a patch for the latest 
> 2.6.pre36 git version of the driver.  However, it is difficult for me to 
> test under anything other than 2.6.32 (RHEL 6 beta 2), and there are 
> some minor differences, though I don't believe they are relevant to my 
> results.
...
Arjan van de Ven "pre-announced" changes in the cpufreq area about
half a year ago.
Here is a comment from Arjan on the cpufreq list from 2010-04-19:
====================
Subject: Re: [PATCH 7/7] ondemand: Solve the big performance issue with 
ondemand during disk IO

<cut>
As for your general "ondemand is for everyone" concern; there are many
things wrong with ondemand, and I'm writing a new governor to fix the
more fundamental issues with it (and also, frankly, so that I won't
break existing users and hardware I don't have access to). This is
basically a backport of a specific feature of my new governor to
ondemand because Andrew keeps hitting the really bad case and basically
ended up turning power management off.
</cut>
====================

Unfortunately not much has happened since then.
If there is already an alpha version or some measurement results, it
would be great to see/compare those.

There is also a research group who has fiddled with this.
They also have a "new governor" approach and already have some
interesting results. They hopefully (they said within the next few weeks)
can show some code which can then be compared on the same HW
against your approach and others:
http://www.betriebssysteme.org/Aktivitaeten/Treffen/2009-Bommerholz/Programm/docs/Talks/richling.pdf

I expect that with Arjan's latest "count IO as busy time" change there is
only some fine tuning of the ondemand (that is, polling) approach that
could still be done. The patch you sent probably increases performance a
bit, again with some power trade-offs depending on the HW and the C-states
available.

What is interesting is:
---------------------
I've testing on a dual Xeon X5680 system
(other times I've been testing on 2-year-old dual Opterons).
I observe about a 10W power consumption reduction at idle between the 
"performance" governor and the "ondemand" governor.
---------------------
On the Opteron or the Xeon system? That would mean that reducing frequency
from the OS is still an important power consumption knob even on the latest
Westmere systems.

         Thomas

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Improving High-Load Performance with the Ondemand Governor [PATCH ATTACHED]
  2010-09-17  9:25   ` Thomas Renninger
  2010-09-17 13:45     ` David C Niemi
@ 2010-09-17 13:45     ` David C Niemi
  2010-09-17 13:46     ` Arjan van de Ven
  2010-09-17 13:46     ` Arjan van de Ven
  3 siblings, 0 replies; 15+ messages in thread
From: David C Niemi @ 2010-09-17 13:45 UTC (permalink / raw)
  To: cpufreq; +Cc: discuss, linux-pm, Arjan van de Ven

Thomas Renninger wrote:
> On Thursday 16 September 2010 22:39:48 David C Niemi wrote:
>   
>> I've been doing more testing, and have a couple of observations.  I'm 
>> attaching a minimal form of my changes as a patch for the latest 
>> 2.6.pre36 git version of the driver.  However, it is difficult for me to 
>> test under anything other than 2.6.32 (RHEL 6 beta 2), and there are 
>> some minor differences, though I don't believe they are relevant to my 
>> results.
>>     
> ...
> Adrian van dev Van "pre-announced" changes in the cpufreq area about
> half a year ago:
>   
I saw his message.  I expect substantial changes are needed in the long 
run, but good alternatives to the Ondemand governor are not ready yet 
and will have to go through a long period of testing on many kinds of 
hardware.  The patch I sent is much more tactical in nature.  It 
is intended to be a light-touch, low-risk change, adding one tunable (under 
a name that existed previously in the Conservative governor) and without 
changing default behavior in any way.

> http://www.betriebssysteme.org/Aktivitaeten/Treffen/2009-Bommerholz/Programm/docs/Talks/richling.pdf
>   
Thanks for the link.  I think integration with the scheduler makes a lot 
of sense in the long run.  I see that particular paper as being a bit 
one-dimensional, though:
- It focused on energy consumption and performance while completing a 
defined task, not power consumption on a mix of tasks and idle time.  
Energy consumed in a defined task is an interesting data point, but not 
even close to the only one; power consumption while idle or switching in 
and out of idle matters most, because that is where most of our CPU cores 
spend most of their time.
- There is no inherent reason the Ondemand governor should be inferior 
to the Performance governor on long-running tasks (at least with my patch).
- They only looked at AMD hardware.  Intel CPUs behave a lot 
differently, relying a lot more on C-States than P-States for power 
savings, and they may differ in other ways too.
- There will need to be some tunables, even with a very smart governor 
integrated with the scheduler.  For example, where along the 
performance/power consumption tradeoff should the scheduler/governor be 
aiming?  Should it be optimizing for single-thread or many-thread 
performance?  Should it try to shut down a whole CPU (or core) 
completely whenever possible, or keep everything running in active 
idle?  How important is it to react quickly at the onset of load?
- Ultimately we need to know something about which P-states do the most 
work per unit energy, and that is not going to be the same for every 
CPU.  I'm skeptical that having a wide range of P-States makes much sense.  
There should perhaps be only 3 states per core: (A) minimum power active 
idle, (B) maximum efficiency in terms of work done per unit energy, and 
(C) maximum performance with no regard for energy consumption per se.  
There are certain special steady-state workloads where an intermediate 
power state is truly helpful, like Blu-Ray playback, but that one in 
particular is being taken on by firmware over time, and I'm not sure 
they are worth optimizing for.
- Ideally the hardware/firmware should have the task of making sure it 
doesn't burn itself up, managing voltages and turning things on/off 
as appropriate for each P-state and/or C-state, giving the operating system 
visibility into what is going on with respect to power consumption and 
states, and otherwise following orders from the operating system about 
what needs to be done.  I think some implementations have gone too far 
in the direction of trying to implement governor-like smarts into the 
firmware or CPU, while inherently lacking the operating system's more 
complete view of what the system is trying to accomplish.

> Interesting is:
> ---------------------
> I've testing on a dual Xeon X5680 system
> (other times I've been testing on 2-year-old dual Opterons).
> I observe about a 10W power consumption reduction at idle between the 
> "performance" governor and the "ondemand" governor.
> ---------------------
> On the Opteron or Xeon system? That would mean that reducing frequency
> from OS still is an important power consumption knob even on latest Westmere
> systems.
>   

That was on the 32nm (Westmere) CPUs, with hyperthreading on.  On 
Opterons power consumption differences between Performance and Ondemand 
are much larger; as I mentioned, AMD and Intel behave a lot differently 
here.  They also change behavior over time -- older Intel CPUs 
(Woodcrest) had almost negligible power consumption differences from 
changing clock speed, and some of them were not even capable of changing 
clock speed at all.  AMD has tended to allow very slow idle states, 
around 1 GHz, while Intel's minimum is at 1.596 GHz; but Intel has been 
more aggressive about shutting off inactive parts of caches and cores.

So anyway, I believe the Ondemand governor will continue to have a lot 
of relevance for another year at least, until a replacement is (a) fully 
implemented, (b) widely tested, and (c) has worked its way downstream to 
distributions.  Without something like this patch, I'll be stuck with 
the Performance governor in the meantime, which is far worse.

David C Niemi

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Improving High-Load Performance with the Ondemand Governor [PATCH ATTACHED]
  2010-09-17  9:25   ` Thomas Renninger
                       ` (2 preceding siblings ...)
  2010-09-17 13:46     ` Arjan van de Ven
@ 2010-09-17 13:46     ` Arjan van de Ven
  3 siblings, 0 replies; 15+ messages in thread
From: Arjan van de Ven @ 2010-09-17 13:46 UTC (permalink / raw)
  To: Thomas Renninger; +Cc: discuss, linux-pm, David C Niemi, cpufreq

On Fri, 17 Sep 2010 11:25:44 +0200
Thomas Renninger <trenn@suse.de> wrote:

> On the Opteron or Xeon system? That would mean that reducing frequency
> from OS still is an important power consumption knob even on latest
> Westmere systems.

It is for staying out of the turbo range; the turbo range is not power
efficient (but good for performance).

Below turbo, the actual impact is a LOT less...

-- 
Arjan van de Ven 	Intel Open Source Technology Centre
For development, discussion and tips for power savings, 
visit http://www.lesswatts.org

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Improving High-Load Performance with the Ondemand Governor [PATCH ATTACHED]
  2010-09-17 13:45     ` David C Niemi
  2010-09-18 10:13       ` [linux-pm] " Sripathy, Vishwanath
@ 2010-09-18 10:13       ` Sripathy, Vishwanath
  1 sibling, 0 replies; 15+ messages in thread
From: Sripathy, Vishwanath @ 2010-09-18 10:13 UTC (permalink / raw)
  To: David C Niemi, cpufreq; +Cc: discuss, linux-pm, Arjan van de Ven

David,

> -----Original Message-----
> From: linux-pm-bounces@lists.linux-foundation.org [mailto:linux-pm-
> bounces@lists.linux-foundation.org] On Behalf Of David C Niemi
> Sent: Friday, September 17, 2010 7:15 PM
> To: cpufreq@vger.kernel.org
> Cc: discuss@lesswatts.org; linux-pm@lists.linux-foundation.org; Arjan van de Ven
> Subject: Re: [linux-pm] Improving High-Load Performance with the Ondemand
> Governor [PATCH ATTACHED]
> 
> Thomas Renninger wrote:
> > On Thursday 16 September 2010 22:39:48 David C Niemi wrote:
> >
> >> I've been doing more testing, and have a couple of observations.  I'm
> >> attaching a minimal form of my changes as a patch for the latest
> >> 2.6.pre36 git version of the driver.  However, it is difficult for me to
> >> test under anything other than 2.6.32 (RHEL 6 beta 2), and there are
> >> some minor differences, though I don't believe they are relevant to my
> >> results.
> >>
> > ...
> > Adrian van dev Van "pre-announced" changes in the cpufreq area about
> > half a year ago:
> >
> I saw his message.  I expect substantial changes are needed in the long
> run, but good alternatives to the Ondemand governor are not ready yet
> and will have to go through a long period of testing on many kinds of
> hardware.  The patch I sent is much more tactical in nature.  It
> intended to be a light-touch, low-risk change, adding one tunable (under
> a name that existed previously in the Conservative governor) and without
> changing default behavior in any way.
> 
> > http://www.betriebssysteme.org/Aktivitaeten/Treffen/2009-
> Bommerholz/Programm/docs/Talks/richling.pdf
> >
> Thanks for the link.  I think integration with the scheduler makes a lot
> of sense in the long run.  I see that particular paper as being a bit
> one-dimensional, though:
> - It focused energy consumption and performance while completing a
> defined task, not power consumption on a mix of tasks and idle time.
> Energy consumed in a defined task is an interesting data point, but not
> even close to the only one; power consumption while in idle or switching
> in and out of idle is how most of our CPU cores spend most of their time.
> - There is no inherent reason the Ondemand governor should be inferior
> to the Performance governor on long-running tasks (at least with my patch).
> - They only looked at AMD hardware.  Intel CPUs behave a lot
> differently, relying a lot more on C-States than P-States for power
> savings, and they may differ in other ways too.
> - There will need to be some tunables, even with a very smart governor
> integrated with the scheduler.  For example, where along the
> performance/power consumption tradeoff should the scheduler/governor be
> aiming?  Should it be optimizing for single-thread or many-thread
> performance?  Should it try to shut down a whole CPU (or core)
> completely whenever possible, or keep everything running in active
> idle?  How important is it to react quickly at the onset of load?
> - Ultimately we need to know something about which P-states do the most
> work per unit energy, and that is not going to be the same for every
> CPU.  I'm skeptical having a wide range of P-States makes much sense.
> There should perhaps be 3 states only per core: (A) minimum power active
> idle, (B) maximum efficiency in terms of work done per unit energy, and
> (C) maximum performance with no regard for energy consumption per se.
> There are certain special steady-state workloads where an intermediate
> power state is truly helpful, like Blu-Ray playback, but that one in
> particular is being taken on by firmware over time, and I'm not sure
> they are worth optimizing for.
> - Ideally the hardware/firmware should have the task of making sure it
> doesn't burn itself up, managing voltages and turning things on/off
> appropriate for each P-state and/or C-state, giving the operating system
> visibility into what is going on with respect to power consumption and
> states, and otherwise following orders from the operating system about
> what needs to be done.  I think some implementations have gone too far
> in the direction of trying to implement governor-like smarts into the
> firmware or CPU, while inherently lacking the operating system's more
> complete view of what is trying to be accomplished.
> 
> > Interesting is:
> > ---------------------
> > I've testing on a dual Xeon X5680 system
> > (other times I've been testing on 2-year-old dual Opterons).
> > I observe about a 10W power consumption reduction at idle between the
> > "performance" governor and the "ondemand" governor.
> > ---------------------
> > On the Opteron or Xeon system? That would mean that reducing frequency
> > from OS still is an important power consumption knob even on latest Westmere
> > systems.
> >
> 
> That was on the 32nm (Westmere) CPUs, with hyperthreading on.  On
> Opterons power consumption differences between Performance and Ondemand
> are much larger, like I mentioned AMD and Intel behave a lot differently
> here.  They also change behavior over time -- older Intel CPUs
> (Woodcrest) had almost negligible power consumption differences by
> changing clock speed, and some of them were not even capable of changing
> clock speed at all.  AMD has tended to allow very slow idle states,
> around 1 GHz, while Intel's minimum is at 1.596 GHz; but Intel has been
> more aggressive about shutting off inactive parts of caches and cores.
> 
> So anyway, I believe the Ondemand governor will continue to have a lot
> of relevance for another year at least, until a replacement is (a) fully
> implemented, (b) widely tested, and (c) works its way downstream to
> distributions.  Without something like this patch, I'll be stuck with
> the Performance governor in the mean time, which is far worse.

Do you mind sharing this patch again? The subject says patch attached, but I could not find this patch in the mailing list :-(

Thanks
Vishwa
> 
> David C Niemi
> _______________________________________________
> linux-pm mailing list
> linux-pm@lists.linux-foundation.org
> https://lists.linux-foundation.org/mailman/listinfo/linux-pm

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Improving High-Load Performance with the Ondemand Governor [PATCH ATTACHED]
  2010-09-16 20:39 ` Improving High-Load Performance with the Ondemand Governor [PATCH ATTACHED] David C Niemi
  2010-09-17  9:25   ` Thomas Renninger
  2010-09-17  9:25   ` Thomas Renninger
@ 2010-09-29 18:18   ` Venkatesh Pallipadi
  2 siblings, 0 replies; 15+ messages in thread
From: Venkatesh Pallipadi @ 2010-09-29 18:18 UTC (permalink / raw)
  To: David C Niemi; +Cc: cpufreq

On Thu, Sep 16, 2010 at 1:39 PM, David C Niemi <dniemi@verisign.com> wrote:
> I've been doing more testing, and have a couple of observations.  I'm
> attaching a minimal form of my changes as a patch for the latest 2.6.pre36
> git version of the driver.  However, it is difficult for me to test under
> anything other than 2.6.32 (RHEL 6 beta 2), and there are some minor
> differences, though I don't believe they are relevant to my results.
>
> It looks like "io_is_busy" set to 1 is quite beneficial for quickly reacting
> the onset of load.
>
> I do see a lot of downshifting from the top speed when a core is at "100%"
> CPU, presumably this means little stalls and lulls, so I expect
> "sampling_down_factor" values greater than 1 continue to be useful and the
> sampling_down_factor continues to be desirable.
>
> I've testing on a dual Xeon X5680 system (other times I've been testing on
> 2-year-old dual Opterons).
>
> I observe about a 10W power consumption reduction at idle between the
> "performance" governor and the "ondemand" governor.  I've seen even bigger
> differences under load, as much as 40 watts, though that could be associated
> with some performance differences.  I haven't tried to quantify the effect
> of the sampling_down_factor tunable on power consumption under load,
> presumably it increases it, but its usage is voluntary and that is to be
> expected.
>
> I have been unable to find a value of up_threshold that does not switch
> frequency on at least one core pretty frequently (ranging a couple of times
> a minute to several times a second).  However, with fairly fast sampling
> intervals (10000 to 50000) I see pretty quick reaction to load even with
> UP_THRESHOLD set high (e.g. 50 or even 95).  So it is likely my previous
> efforts to extend the possible values of UP_THRESHOLD from 11 to 5 are no
> longer necessary, and are not included in the attached patch.  There are
> other things I would like to consider doing, however, that I'll bring up
> afterwards, but not in this minimal patch.
>

I do see this change in the patch. From the comment above, you did not
want that change?

 #define MICRO_FREQUENCY_UP_THRESHOLD		(95)
 #define MICRO_FREQUENCY_MIN_SAMPLE_RATE		(10000)
-#define MIN_FREQUENCY_UP_THRESHOLD		(11)
+#define MIN_FREQUENCY_UP_THRESHOLD		(5)
 #define MAX_FREQUENCY_UP_THRESHOLD		(100)

The problem with 5 is that when DEF_FREQUENCY_DOWN_DIFFERENTIAL is
used, the up_threshold - down_differential calculation in the code can end
up being negative, which is not good.
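
To spell that out (a quick worked example; both tunables are unsigned int
in dbs_tuners_ins, so the subtraction wraps around rather than going
negative, and the 80% load at 2 GHz input is purely illustrative):

#include <stdio.h>

/* Userspace demo of the up_threshold = 5 problem described above. */
int main(void)
{
	unsigned int up_threshold = 5;        /* new MIN_FREQUENCY_UP_THRESHOLD */
	unsigned int down_differential = 10;  /* DEF_FREQUENCY_DOWN_DIFFERENTIAL */
	unsigned int max_load_freq = 80 * 2000000; /* 80% load at 2 GHz, in kHz */
	unsigned int divisor = up_threshold - down_differential;

	printf("divisor   = %u\n", divisor);            /* 4294967291 (wrapped) */
	printf("freq_next = %u\n", max_load_freq / divisor);   /* 0 */
	return 0;
}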

One minor comment.
+
+	mutex_lock(&dbs_mutex);
+	if (ret != 1 || input > MAX_SAMPLING_DOWN_FACTOR || input < 1) {
+		mutex_unlock(&dbs_mutex);
+		return -EINVAL;
+	}
+
+	dbs_tuners_ins.sampling_down_factor = input;

You can move mutex_lock after the input validation to make it a bit cleaner.
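
For illustration, the reordering might look something like this (a sketch
based on the store_sampling_down_factor() in the posted patch, not a
tested replacement):

static ssize_t store_sampling_down_factor(struct kobject *a,
			struct attribute *b, const char *buf, size_t count)
{
	unsigned int input, j;
	int ret;

	/* Validate the input before taking dbs_mutex. */
	ret = sscanf(buf, "%u", &input);
	if (ret != 1 || input > MAX_SAMPLING_DOWN_FACTOR || input < 1)
		return -EINVAL;

	mutex_lock(&dbs_mutex);
	dbs_tuners_ins.sampling_down_factor = input;

	/* Reset down sampling multiplier in case it was active */
	for_each_online_cpu(j) {
		struct cpu_dbs_info_s *dbs_info;
		dbs_info = &per_cpu(od_cpu_dbs_info, j);
		dbs_info->rate_mult = 1;
	}
	mutex_unlock(&dbs_mutex);

	return count;
}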

Otherwise patch looks good. I agree that having sampling_down_factor
as a tunable will be useful in some situations.

Thanks,
Venki

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2010-09-29 18:18 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-09-09 14:28 Improving High-Load Performance with the Ondemand Governor David C Niemi
2010-09-10  7:40 ` Andi Kleen
2010-09-13 20:18   ` David C Niemi
2010-09-13 20:54     ` Andi Kleen
2010-09-13 22:02       ` David C Niemi
2010-09-16 20:39 ` Improving High-Load Performance with the Ondemand Governor [PATCH ATTACHED] David C Niemi
2010-09-17  9:25   ` Thomas Renninger
2010-09-17 13:45     ` David C Niemi
2010-09-18 10:13       ` Sripathy, Vishwanath
2010-09-17 13:46     ` Arjan van de Ven
2010-09-29 18:18   ` Venkatesh Pallipadi
