linux-kernel.vger.kernel.org archive mirror
* [PATCH] cpufreq_ondemand
@ 2004-10-17 22:29 Alexander Clouter
  2004-10-17 22:35 ` Con Kolivas
  2004-10-18  7:20 ` Dominik Brodowski
  0 siblings, 2 replies; 17+ messages in thread
From: Alexander Clouter @ 2004-10-17 22:29 UTC (permalink / raw)
  To: venkatesh.pallipadi, cpufreq; +Cc: linux-kernel


[-- Attachment #1.1: Type: text/plain, Size: 2944 bytes --]

Hi all,

After playing with the cpufreq_ondemand governor (many thanks to those who 
made it) I made a number of alterations which suit me at least.  I am really 
looking for feedback and, once people have fixed any bugs they find and made 
the code look neater, possible inclusion?

The improvements (well I think they are) I have made:

1. I have replaced the algorithm it used with one which calculates the number
	of cpu idle cycles that have passed and compares it to the number of
	idle cycles it would have expected to pass (for the defaults, 20%/80%)

	this means a couple of divisions have been removed, which is always 
	nice, and it led to clearer code (for me at least), that was 
	until I added the handful of 'if' conditionals though.... :-/

2. controllable through 
	/sys/.../ondemand/ignore_nice, you can tell it whether to count 'nice' 
	time as idle cpu cycles too.  With the default of '0' it does; set it 
	to '1' to treat 'nice' time as the cpu being in an active state.

3. (major) the scaling up and down of the cpufreq is now smoother.  I found 
	it really nasty that if it tripped < 20% idle time that the freq was 
	set to 100%.  This code smoothly increases the cpufreq as well as 
	doing a better job of decreasing it too

4. (minor) I changed DEF_SAMPLING_RATE_LATENCY_MULTIPLIER to 50000 and
	DEF_SAMPLING_DOWN_FACTOR to 5 as I found the defaults a bit annoying 
	on my system and resulted in the cpufreq constantly jumping.

	For my patch it works far better if the sampling rate is much lower 
	anyway, which can only be good for cpu efficiency in the long run

5. the granularity of how much the cpufreq is increased or decreased is 
	controlled by writing a percentage to /sys/.../ondemand/freq_step_percent

6. debugging (with 'watch -n1 cat /sys/.../ondemand/requested_freq') and 
	backwards 'compatibility' to act like the 'userspace' governor is 
	available with /sys/.../ondemand/requested_freq if 
	'freq_step_percent' is set to zero

7. there are extra checks so it does not bother trying to increase/decrease 
	the cpufreq if there is nothing to do, or nothing that can be done as 
	it is already at min/max (or freq_step_percent is zero)

The code seems to work for me fine.  This is my first patch and the first 
thing I have really posted here so be gentle with me :)

Comments and improvements are of course more than welcome.

Of course full thanks go to all the original authors, my C coding is naff and 
I would not have been able to do this if it was not for the pretty much 
complete (for my needs) cpufreq_ondemand module; Venkatesh did say we could 
rip out the core algorithm and replace it with our own easily, he was right 
:)

Cheers

Alex

-- 
 ___________________________________ 
< Two is company, three is an orgy. >
 ----------------------------------- 
        \   ^__^
         \  (oo)\_______
            (__)\       )\/\
                ||----w |
                ||     ||

[-- Attachment #1.2: updated-ondemand.diff --]
[-- Type: text/plain, Size: 10609 bytes --]

diff -u -U 2 -r -N -d linux-2.6.9-rc4.orig/drivers/cpufreq/cpufreq_ondemand.c linux-2.6.9-rc4/drivers/cpufreq/cpufreq_ondemand.c
--- linux-2.6.9-rc4.orig/drivers/cpufreq/cpufreq_ondemand.c	2004-10-11 03:58:49.000000000 +0100
+++ linux-2.6.9-rc4/drivers/cpufreq/cpufreq_ondemand.c	2004-10-17 18:32:28.000000000 +0100
@@ -56,8 +56,8 @@
 static unsigned int 				def_sampling_rate;
 #define MIN_SAMPLING_RATE			(def_sampling_rate / 2)
 #define MAX_SAMPLING_RATE			(500 * def_sampling_rate)
-#define DEF_SAMPLING_RATE_LATENCY_MULTIPLIER	(1000)
-#define DEF_SAMPLING_DOWN_FACTOR		(10)
+#define DEF_SAMPLING_RATE_LATENCY_MULTIPLIER	(50000)
+#define DEF_SAMPLING_DOWN_FACTOR		(5)
 #define TRANSITION_LATENCY_LIMIT		(10 * 1000)
 #define sampling_rate_in_HZ(x)			(((x * HZ) < (1000 * 1000))?1:((x * HZ) / (1000 * 1000)))
 
@@ -65,8 +65,8 @@
 
 struct cpu_dbs_info_s {
 	struct cpufreq_policy 	*cur_policy;
-	unsigned int 		prev_cpu_idle_up;
-	unsigned int 		prev_cpu_idle_down;
+	unsigned int 		prev_cpu_ticks;
+	unsigned int		prev_cpu_idle_ticks;
 	unsigned int 		enable;
 };
 static DEFINE_PER_CPU(struct cpu_dbs_info_s, cpu_dbs_info);
@@ -81,6 +81,9 @@
 	unsigned int		sampling_down_factor;
 	unsigned int		up_threshold;
 	unsigned int		down_threshold;
+	unsigned int		requested_freq;
+	unsigned int		freq_step_percent;
+	unsigned int		ignore_nice;
 };
 
 struct dbs_tuners dbs_tuners_ins = {
@@ -116,6 +119,22 @@
 {									\
 	return sprintf(buf, "%u\n", dbs_tuners_ins.object);		\
 }
+
+static ssize_t show_requested_freq(struct cpufreq_policy *policy, char *buf)
+{
+	return sprintf (buf, "%u\n", dbs_tuners_ins.requested_freq);
+}
+
+static ssize_t show_freq_step_percent(struct cpufreq_policy *policy, char *buf)
+{
+	return sprintf (buf, "%u\n", dbs_tuners_ins.freq_step_percent);
+}
+
+static ssize_t show_ignore_nice(struct cpufreq_policy *policy, char *buf)
+{
+	return sprintf (buf, "%u\n", dbs_tuners_ins.ignore_nice);
+}
+
 show_one(sampling_rate, sampling_rate);
 show_one(sampling_down_factor, sampling_down_factor);
 show_one(up_threshold, up_threshold);
@@ -189,6 +208,63 @@
 	return count;
 }
 
+static ssize_t store_ignore_nice(struct cpufreq_policy *unused,
+		const char *buf, size_t count)
+{
+	unsigned int input;
+	int ret;
+	ret = sscanf (buf, "%u", &input);
+	down(&dbs_sem);
+	if ( ret == 1 ) {
+		if ( input > 1 )
+			input = 1;
+		dbs_tuners_ins.ignore_nice = input;
+	}
+	up(&dbs_sem);
+	return count;
+}
+
+static ssize_t store_freq_step_percent(struct cpufreq_policy *unused,
+		const char *buf, size_t count)
+{
+	unsigned int input;
+	int ret;
+	ret = sscanf (buf, "%u", &input);
+	down(&dbs_sem);
+	if ( ret == 1 ) {
+		/* someone might find 'freq_step_percent = 0' useful so this is
+		 * why I have added support to manually set the freq also; I
+		 * guess this would then permit a userland tool to jump in
+		 * without rmmod/insmod'ing.  show/store_requested_freq is also
+		 * darn handy for debugging
+		 */
+		if ( input > 100 )
+			input = 100;
+		dbs_tuners_ins.freq_step_percent = input;
+	}
+	up(&dbs_sem);
+	return count;
+}
+
+static ssize_t store_requested_freq(struct cpufreq_policy *policy,
+		const char *buf, size_t count)
+{
+	unsigned int input;
+	int ret;
+	ret = sscanf (buf, "%u", &input);
+	down(&dbs_sem);
+	if ( ret == 1 ) {
+		if ( input < policy->min )
+			input = policy->min;
+		if ( input > policy->max )
+			input = policy->max;
+		dbs_tuners_ins.requested_freq = input;
+		__cpufreq_driver_target(policy, input, CPUFREQ_RELATION_H);
+	}
+	up(&dbs_sem);
+	return count;
+}
+
 #define define_one_rw(_name) 					\
 static struct freq_attr _name = { 				\
 	.attr = { .name = __stringify(_name), .mode = 0644 }, 	\
@@ -200,6 +276,9 @@
 define_one_rw(sampling_down_factor);
 define_one_rw(up_threshold);
 define_one_rw(down_threshold);
+define_one_rw(requested_freq);
+define_one_rw(freq_step_percent);
+define_one_rw(ignore_nice);
 
 static struct attribute * dbs_attributes[] = {
 	&sampling_rate_max.attr,
@@ -208,6 +287,9 @@
 	&sampling_down_factor.attr,
 	&up_threshold.attr,
 	&down_threshold.attr,
+	&requested_freq.attr,
+	&freq_step_percent.attr,
+	&ignore_nice.attr,
 	NULL
 };
 
@@ -220,10 +302,9 @@
 
 static void dbs_check_cpu(int cpu)
 {
-	unsigned int idle_ticks, up_idle_ticks, down_idle_ticks;
-	unsigned int total_idle_ticks;
-	unsigned int freq_down_step;
-	unsigned int freq_down_sampling_rate;
+	unsigned int total_ticks, total_idle_ticks;
+	unsigned int ticks, idle_ticks;
+	unsigned int freq_step;
 	static int down_skip[NR_CPUS];
 	struct cpu_dbs_info_s *this_dbs_info;
 
@@ -242,26 +323,82 @@
 	 *
 	 * Any frequency increase takes it to the maximum frequency. 
 	 * Frequency reduction happens at minimum steps of 
-	 * 5% of max_frequency 
+	 * 5% (default) of max_frequency 
+	 *
+	 * My modified routine compares the number of idle ticks with the
+	 * expected number of idle ticks for the boundaries and acts accordingly
+	 * - Alexander Clouter <alex-kernel@digriz.org.uk>
 	 */
-	/* Check for frequency increase */
-	total_idle_ticks = kstat_cpu(cpu).cpustat.idle +
+
+	/* get various cpu stats */
+	total_ticks =
+		kstat_cpu(cpu).cpustat.user +
+		kstat_cpu(cpu).cpustat.nice +
+		kstat_cpu(cpu).cpustat.system +
+		kstat_cpu(cpu).cpustat.softirq +
+		kstat_cpu(cpu).cpustat.irq +
+		kstat_cpu(cpu).cpustat.idle +
+		kstat_cpu(cpu).cpustat.iowait;
+	total_idle_ticks =
+		kstat_cpu(cpu).cpustat.idle +
 		kstat_cpu(cpu).cpustat.iowait;
-	idle_ticks = total_idle_ticks -
-		this_dbs_info->prev_cpu_idle_up;
-	this_dbs_info->prev_cpu_idle_up = total_idle_ticks;
 
-	/* Scale idle ticks by 100 and compare with up and down ticks */
-	idle_ticks *= 100;
-	up_idle_ticks = (100 - dbs_tuners_ins.up_threshold) *
-			sampling_rate_in_HZ(dbs_tuners_ins.sampling_rate);
+	/* if the /sys says we need to consider nice tasks as 'idle' time too */
+	if (dbs_tuners_ins.ignore_nice == 0)
+		total_idle_ticks += kstat_cpu(cpu).cpustat.nice;
+	
+	ticks = (total_ticks -
+		this_dbs_info->prev_cpu_ticks) * 100;
+	idle_ticks = (total_idle_ticks -
+		this_dbs_info->prev_cpu_idle_ticks) * 100;
+	
+	this_dbs_info->prev_cpu_ticks = total_ticks;
+	this_dbs_info->prev_cpu_idle_ticks = total_idle_ticks;
+	
+	/* nothing to do if we cannot shift the frequency */
+	if (dbs_tuners_ins.freq_step_percent == 0)
+		return;
+	
+	/* checks to see if we have anything to do or can do and breaks out if:
+	 *  - we are within the 20% <-> 80% region
+	 *  - if the cpu freq needs increasing we are not already at max
+	 *  - if the cpu freq needs decreasing we are not already at min
+	 *
+	 *  you have to love those parentheses.... :)
+	 */
+	if (!( ( (ticks-idle_ticks) > (dbs_tuners_ins.up_threshold*idle_ticks)
+			&& dbs_tuners_ins.requested_freq
+				!= this_dbs_info->cur_policy->max
+	       )
+  	    || ( (ticks-idle_ticks) < (dbs_tuners_ins.down_threshold*idle_ticks)
+			&& dbs_tuners_ins.requested_freq
+				!= this_dbs_info->cur_policy->min
+	       ) ) )
+		return;
 
-	if (idle_ticks < up_idle_ticks) {
+	/* max freq cannot be less than 100. But who knows.... */
+	if (unlikely(this_dbs_info->cur_policy->max < 100)) {
+		freq_step = dbs_tuners_ins.freq_step_percent;
+	} else {
+		freq_step = (dbs_tuners_ins.freq_step_percent *
+				this_dbs_info->cur_policy->max) / 100;
+	}
+
+	/* Check for frequency increase */
+	if ( (ticks-idle_ticks) > (dbs_tuners_ins.up_threshold*idle_ticks) ) {
+		dbs_tuners_ins.requested_freq += freq_step;
+		if (dbs_tuners_ins.requested_freq >
+				this_dbs_info->cur_policy->max)
+			dbs_tuners_ins.requested_freq =
+				this_dbs_info->cur_policy->max;
+
+		/* printk("up: %u->%u\n",
+				this_dbs_info->cur_policy->cur,
+				dbs_tuners_ins.requested_freq); */
 		__cpufreq_driver_target(this_dbs_info->cur_policy,
-			this_dbs_info->cur_policy->max, 
-			CPUFREQ_RELATION_H);
+        	       	dbs_tuners_ins.requested_freq,
+        	       	CPUFREQ_RELATION_H);
 		down_skip[cpu] = 0;
-		this_dbs_info->prev_cpu_idle_down = total_idle_ticks;
 		return;
 	}
 
@@ -270,27 +407,19 @@
 	if (down_skip[cpu] < dbs_tuners_ins.sampling_down_factor)
 		return;
 
-	idle_ticks = total_idle_ticks -
-		this_dbs_info->prev_cpu_idle_down;
-	/* Scale idle ticks by 100 and compare with up and down ticks */
-	idle_ticks *= 100;
 	down_skip[cpu] = 0;
-	this_dbs_info->prev_cpu_idle_down = total_idle_ticks;
-
-	freq_down_sampling_rate = dbs_tuners_ins.sampling_rate *
-		dbs_tuners_ins.sampling_down_factor;
-	down_idle_ticks = (100 - dbs_tuners_ins.down_threshold) *
-			sampling_rate_in_HZ(freq_down_sampling_rate);
-
-	if (idle_ticks > down_idle_ticks ) {
-		freq_down_step = (5 * this_dbs_info->cur_policy->max) / 100;
-
-		/* max freq cannot be less than 100. But who knows.... */
-		if (unlikely(freq_down_step == 0))
-			freq_down_step = 5;
-
+	if ( (ticks-idle_ticks) < (dbs_tuners_ins.down_threshold*idle_ticks) ) {
+		dbs_tuners_ins.requested_freq -= freq_step;
+		if (dbs_tuners_ins.requested_freq <
+				this_dbs_info->cur_policy->min)
+			dbs_tuners_ins.requested_freq =
+				this_dbs_info->cur_policy->min;
+		
+		/* printk("down: %u->%u\n",
+				this_dbs_info->cur_policy->cur,
+				dbs_tuners_ins.requested_freq); */
 		__cpufreq_driver_target(this_dbs_info->cur_policy,
-			this_dbs_info->cur_policy->cur - freq_down_step, 
+			dbs_tuners_ins.requested_freq, 
 			CPUFREQ_RELATION_H);
 		return;
 	}
@@ -344,10 +473,16 @@
 		down(&dbs_sem);
 		this_dbs_info->cur_policy = policy;
 		
-		this_dbs_info->prev_cpu_idle_up = 
+		this_dbs_info->prev_cpu_ticks =
+				kstat_cpu(cpu).cpustat.user +
+				kstat_cpu(cpu).cpustat.nice +
+				kstat_cpu(cpu).cpustat.system +
+				kstat_cpu(cpu).cpustat.softirq +
+				kstat_cpu(cpu).cpustat.irq +
 				kstat_cpu(cpu).cpustat.idle +
 				kstat_cpu(cpu).cpustat.iowait;
-		this_dbs_info->prev_cpu_idle_down = 
+		this_dbs_info->prev_cpu_idle_ticks = 
+				kstat_cpu(cpu).cpustat.nice +
 				kstat_cpu(cpu).cpustat.idle +
 				kstat_cpu(cpu).cpustat.iowait;
 		this_dbs_info->enable = 1;
@@ -368,7 +503,10 @@
 			def_sampling_rate = (latency / 1000) *
 					DEF_SAMPLING_RATE_LATENCY_MULTIPLIER;
 			dbs_tuners_ins.sampling_rate = def_sampling_rate;
-
+			dbs_tuners_ins.requested_freq
+				= this_dbs_info->cur_policy->cur;
+			dbs_tuners_ins.freq_step_percent = 5;
+			dbs_tuners_ins.ignore_nice = 0;
 			dbs_timer_init();
 		}
 		

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]


* Re: [PATCH] cpufreq_ondemand
  2004-10-17 22:29 [PATCH] cpufreq_ondemand Alexander Clouter
@ 2004-10-17 22:35 ` Con Kolivas
  2004-10-17 22:44   ` Alexander Clouter
                     ` (2 more replies)
  2004-10-18  7:20 ` Dominik Brodowski
  1 sibling, 3 replies; 17+ messages in thread
From: Con Kolivas @ 2004-10-17 22:35 UTC (permalink / raw)
  To: Alexander Clouter; +Cc: venkatesh.pallipadi, cpufreq, linux-kernel

Alexander Clouter wrote:
> 3. (major) the scaling up and down of the cpufreq is now smoother.  I found 
> 	it really nasty that if it tripped < 20% idle time that the freq was 
> 	set to 100%.  This code smoothly increases the cpufreq as well as 
> 	doing a better job of decreasing it too

I'd much prefer it shot up to 100% or else every time the cpu usage went 
up there'd be an obvious lag till the machine ran at its capable speed. 
I very much doubt the small amount of time it spent at 100% speed with 
the default design would decrease the battery life significantly as well.

Cheers,
Con


* Re: [PATCH] cpufreq_ondemand
  2004-10-17 22:35 ` Con Kolivas
@ 2004-10-17 22:44   ` Alexander Clouter
  2004-10-19 18:22   ` Bruno Ducrot
  2004-10-20  5:03   ` Andre Eisenbach
  2 siblings, 0 replies; 17+ messages in thread
From: Alexander Clouter @ 2004-10-17 22:44 UTC (permalink / raw)
  To: Con Kolivas; +Cc: venkatesh.pallipadi, cpufreq, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1158 bytes --]

On Oct 18, Con Kolivas wrote:
> 
> I'd much prefer it shot up to 100% or else every time the cpu usage went 
> up there'd be an obvious lag till the machine ran at its capable speed. 
>  I very much doubt the small amount of time it spent at 100% speed with 
> the default design would decrease the battery life significantly as well.
> 
The issue I found was that if you are running a process that is io bound, for
example, then you may never need to run your cpu at 100%, it will speed up
bit by bit[1] till it gets to a speed that is fast enough to deal with it
without max'ing the cpufreq.

This is after all exactly what most (if not all) of the userspace daemons try 
to do anyway.

Cheers

Alex

[1] also you might find that the task does not last long enough to warrant 
	jumping and lurking at 100% speed anyway
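
To put a rough number on "bit by bit", here is a toy model rather than 
anything from the patch.  `settle_freq` and the idea of expressing a steady 
load as the kHz it keeps busy (`demand_khz`) are assumptions made purely for 
illustration.

```c
#include <assert.h>

/*
 * Step up from min_khz until a steady load that keeps demand_khz worth
 * of cycles busy no longer exceeds up_threshold percent of the current
 * frequency, or max_khz is reached.  The number of steps taken is
 * stored in *steps.
 */
static unsigned int settle_freq(unsigned int demand_khz,
		unsigned int min_khz, unsigned int max_khz,
		unsigned int step_khz, unsigned int up_threshold,
		unsigned int *steps)
{
	unsigned int freq = min_khz;

	assert(steps != 0 && step_khz > 0 && min_khz <= max_khz);
	*steps = 0;

	/* demand/freq > up_threshold/100, written without a division */
	while (freq < max_khz &&
	       (unsigned long long)demand_khz * 100 >
	       (unsigned long long)up_threshold * freq) {
		freq = (max_khz - freq < step_khz) ? max_khz : freq + step_khz;
		(*steps)++;
	}
	return freq;
}
```

In this model a load needing 800000 kHz on a 600000-2000000 kHz cpu with 
100000 kHz steps and an 80% threshold settles at 1000000 kHz after four 
samples, and only a load the cpu cannot keep up with ever reaches the maximum.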

-- 
 _________________________________________ 
/ It's always darkest just before it gets \
\ pitch black.                            /
 ----------------------------------------- 
        \   ^__^
         \  (oo)\_______
            (__)\       )\/\
                ||----w |
                ||     ||

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]


* Re: [PATCH] cpufreq_ondemand
  2004-10-17 22:29 [PATCH] cpufreq_ondemand Alexander Clouter
  2004-10-17 22:35 ` Con Kolivas
@ 2004-10-18  7:20 ` Dominik Brodowski
  2004-10-18  8:12   ` Mattia Dongili
  2004-10-18  8:25   ` Alexander Clouter
  1 sibling, 2 replies; 17+ messages in thread
From: Dominik Brodowski @ 2004-10-18  7:20 UTC (permalink / raw)
  To: Alexander Clouter; +Cc: venkatesh.pallipadi, cpufreq, linux-kernel

Hi,

On Sun, Oct 17, 2004 at 11:29:16PM +0100, Alexander Clouter wrote:
> After playing with the cpufreq_ondemand governor (many thanks to those who 
> made it) I made a number of alterations which suit me at least.  Really 
> looking for feedback and of course once people have fixed any bugs they find 
> and made the code look neater, possible inclusion?

Or possibly a "fork" -- different dynamic cpufreq governors aren't a bad
thing to have. Else the whole modular approach would be wrong... So, even
if it doesn't get merged into cpufreq_ondemand, you can maintain it as a
differently named cpufreq governor.


> 2. controllable through 
> 	/sys/.../ondemand/ignore_nice, you can tell it to consider 'nice' 
> 	time as also idle cpu cycles.  Set it to '1' to treat 'nice' as cpu 
> 	in an active state.

Interesting bit, IIRC some userspace tool also does that.

> 4. (minor) I changed DEF_SAMPLING_RATE_LATENCY_MULTIPLIER to 50000 and
> 	DEF_SAMPLING_DOWN_FACTOR to 5 as I found the defaults a bit annoying 
> 	on my system and resulted in the cpufreq constantly jumping.
> 
> 	For my patch it works far better if the sampling rate is much lower 
> 	anyway, which can only be good for cpu efficiency in the long run

However, this means it takes much longer for the system to react to changes
in load... it's a tricky issue.

> 6. debugging (with 'watch -n1 cat /sys/.../ondemand/requested_freq') and 
> 	backwards 'compatibility' to act like the 'userspace' governor is 
> 	available with /sys/.../ondemand/requested_freq if 
> 	'freq_step_percent' is set to zero

Please don't do that. Userspace is the governor for userspace frequency
setting; if you want it, switch to userspace, if you want dynamic frequency
selection, use the original ondemand or your governor.

	Dominik


* Re: [PATCH] cpufreq_ondemand
  2004-10-18  7:20 ` Dominik Brodowski
@ 2004-10-18  8:12   ` Mattia Dongili
  2004-10-18  8:25   ` Alexander Clouter
  1 sibling, 0 replies; 17+ messages in thread
From: Mattia Dongili @ 2004-10-18  8:12 UTC (permalink / raw)
  To: cpufreq, linux-kernel; +Cc: Alexander Clouter, venkatesh.pallipadi

On Mon, Oct 18, 2004 at 09:20:45AM +0200, Dominik Brodowski wrote:
> Hi,
[...]
> > 2. controllable through 
> > 	/sys/.../ondemand/ignore_nice, you can tell it to consider 'nice' 
> > 	time as also idle cpu cycles.  Set it to '1' to treat 'nice' as cpu 
> > 	in an active state.
> 
> Interesting bit, IIRC some userspace tool also does that.

I'm implementing a "nice_scale" parameter in cpufreqd that offers more
control over nice cpu time. It's just a parameter (whose value must be >=
1, or 0 to disregard nice time entirely) that tells _how_much_ the nice
time has to be taken into consideration. It would be nice to have it in
the ondemand governor too.
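
A minimal sketch of what such a parameter could look like. Note that the
weighting (nice time divided by nice_scale) is my assumption for the sake of
illustration, and `busy_percent` is a hypothetical helper, not code taken from
cpufreqd or from the patch.

```c
#include <assert.h>

/*
 * Busy percentage over one interval with nice time scaled down.
 * nice_scale == 0 leaves nice time out of the busy figure entirely,
 * nice_scale >= 1 counts nice / nice_scale ticks as busy (assumed).
 */
static unsigned int busy_percent(unsigned int user, unsigned int system,
		unsigned int nice, unsigned int idle,
		unsigned int nice_scale)
{
	unsigned int total = user + system + nice + idle;
	unsigned int busy = user + system;

	assert(total > 0);

	if (nice_scale != 0)
		busy += nice / nice_scale;

	return (busy * 100) / total;
}
```

With 50 user, 10 system, 40 nice and 100 idle ticks this gives 50% for a
nice_scale of 1, 35% for 4 and 30% for 0.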

bye
-- 
mattia
:wq!


* Re: [PATCH] cpufreq_ondemand
  2004-10-18  7:20 ` Dominik Brodowski
  2004-10-18  8:12   ` Mattia Dongili
@ 2004-10-18  8:25   ` Alexander Clouter
  1 sibling, 0 replies; 17+ messages in thread
From: Alexander Clouter @ 2004-10-18  8:25 UTC (permalink / raw)
  To: venkatesh.pallipadi, cpufreq, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 3051 bytes --]

Morning all,

On Oct 18, Dominik Brodowski wrote:
> 
> Or possibly a "fork" -- different dynamic cpufreq governors aren't a bad
> thing to have. Else the whole modular approach would be wrong... So, even
> if it doesn't get merged into cpufreq_ondemand, you can maintain it as a
> differently named cpufreq governor.
> 
but but...that ruins my plans for world domination....

> 
> > 2. controllable through 
> > 	/sys/.../ondemand/ignore_nice, you can tell it to consider 'nice' 
> > 	time as also idle cpu cycles.  Set it to '1' to treat 'nice' as cpu 
> > 	in an active state.
> 
> Interesting bit, IIRC some userspace tool also does that.
> 
if I recall they have to munch through the whole of /proc to get this 
information; then again there is probably a clean and fast way of pulling 
those time values from /proc that I do not know of.
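
For reference, the aggregate figures sit on the first line of /proc/stat, so 
a userspace tool only needs something like the sketch below.  The helper 
names `parse_cpu_line` and `read_cpu_times` are made up for illustration; the 
field order (user, nice, system, idle, iowait, irq, softirq) is the 2.6 
layout.

```c
#include <assert.h>
#include <stdio.h>

struct cpu_times {
	unsigned long long user, nice, system, idle, iowait, irq, softirq;
};

/*
 * Parse the aggregate "cpu ..." line of /proc/stat; returns 0 on success.
 * Only meant for the first line, not the per-cpu "cpuN ..." lines.
 */
static int parse_cpu_line(const char *line, struct cpu_times *t)
{
	int n;

	assert(line != 0 && t != 0);
	n = sscanf(line, "cpu %llu %llu %llu %llu %llu %llu %llu",
			&t->user, &t->nice, &t->system, &t->idle,
			&t->iowait, &t->irq, &t->softirq);
	return n == 7 ? 0 : -1;
}

/* Read and parse the first line of /proc/stat; returns 0 on success. */
static int read_cpu_times(struct cpu_times *t)
{
	char line[256];
	FILE *f = fopen("/proc/stat", "r");
	int ret = -1;

	if (!f)
		return -1;
	if (fgets(line, sizeof(line), f))
		ret = parse_cpu_line(line, t);
	fclose(f);
	return ret;
}
```

Calling read_cpu_times() twice and subtracting gives the per-interval nice 
and idle deltas without walking the per-process entries.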

> > 4. (minor) I changed DEF_SAMPLING_RATE_LATENCY_MULTIPLIER to 50000 and
> > 	DEF_SAMPLING_DOWN_FACTOR to 5 as I found the defaults a bit annoying 
> > 	on my system and resulted in the cpufreq constantly jumping.
> > 
> > 	For my patch it works far better if the sampling rate is much lower 
> > 	anyway, which can only be good for cpu efficiency in the long run
> 
> However, this means it takes much longer for the system to react to changes
> in load... it's a tricky issue.
> 
it's all a case of trade-offs and of course everyone's mileage will vary.  For
me I want the CPU to slowly get faster and faster as a task might complete
fast enough without ramping it up to 100%.  Then again Con will probably
point out "pah, then the difference in battery saving is negligible" :)

On a laptop (regardless of whether it gives an overall order of magnitude
power saving or not) I would prefer the cpu speed to be as low as possible.  
Again everyone (well here in the UK) I chat to seems to prefer the slow 
increasing method which many of the userspace tools try to do anyway; then of 
course the argument "userland userland userland....".

> > 6. debugging (with 'watch -n1 cat /sys/.../ondemand/requested_freq') and 
> > 	backwards 'compatibility' to act like the 'userspace' governor is 
> > 	available with /sys/.../ondemand/requested_freq if 
> > 	'freq_step_percent' is set to zero
> 
> Please don't do that. Userspace is the governor for userspace frequency
> setting; if you want it, switch to userspace, if you want dynamic frequency
> selection, use the original ondemand or your governor.
> 
I thought a few people would grumble about that.  I needed a way to store the 
variable speed knob and that struct was the best place for it; looks like me 
tarting it up as a 'debugging' feature was not good enough :)

Cheers

Alex

-- 
 ________________________________________ 
/ All articles that coruscate with       \
\ resplendence are not truly auriferous. /
 ---------------------------------------- 
        \   ^__^
         \  (oo)\_______
            (__)\       )\/\
                ||----w |
                ||     ||

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]


* Re: [PATCH] cpufreq_ondemand
  2004-10-17 22:35 ` Con Kolivas
  2004-10-17 22:44   ` Alexander Clouter
@ 2004-10-19 18:22   ` Bruno Ducrot
  2004-10-20  5:03   ` Andre Eisenbach
  2 siblings, 0 replies; 17+ messages in thread
From: Bruno Ducrot @ 2004-10-19 18:22 UTC (permalink / raw)
  To: Con Kolivas; +Cc: Alexander Clouter, linux-kernel, cpufreq

Hi,

On Mon, Oct 18, 2004 at 08:35:49AM +1000, Con Kolivas wrote:
> Alexander Clouter wrote:
> > 3. (major) the scaling up and down of the cpufreq is now smoother.  I found 
> > 	it really nasty that if it tripped < 20% idle time that the freq was 
> > 	set to 100%.  This code smoothly increases the cpufreq as well as 
> > 	doing a better job of decreasing it too
> 
> I'd much prefer it shot up to 100% or else every time the cpu usage went 
> up there'd be an obvious lag till the machine ran at its capable speed. 
>  I very much doubt the small amount of time it spent at 100% speed with 
> the default design would decrease the battery life significantly as well.
> 

I mostly agree with you, but amd64 processors have unacceptable latency
for a transition between min and max freq, due to the step-by-step
requirements (200MHz IIRC).
Alexander's governor may then be OK for that kind of processor.

-- 
Bruno Ducrot

--  Which is worse:  ignorance or apathy?
--  Don't know.  Don't care.


* Re: [PATCH] cpufreq_ondemand
  2004-10-17 22:35 ` Con Kolivas
  2004-10-17 22:44   ` Alexander Clouter
  2004-10-19 18:22   ` Bruno Ducrot
@ 2004-10-20  5:03   ` Andre Eisenbach
  2004-10-20  7:35     ` Len Brown
  2 siblings, 1 reply; 17+ messages in thread
From: Andre Eisenbach @ 2004-10-20  5:03 UTC (permalink / raw)
  To: Con Kolivas; +Cc: Alexander Clouter, venkatesh.pallipadi, cpufreq, linux-kernel

On Mon, 18 Oct 2004 08:35:49 +1000, Con Kolivas <kernel@kolivas.org> wrote:
> I'd much prefer it shot up to 100% or else every time the cpu usage went
> up there'd be an obvious lag till the machine ran at its capable speed.
>   I very much doubt the small amount of time it spent at 100% speed with
> the default design would decrease the battery life significantly as well.

I like Alexander's idea better and will give it a good try. If the
speed steps down slowly but shoots up to 100% quickly (as it does right
now), even a small task (like opening a folder, or scrolling down in a
document) will cause a tiny spike to 100% which takes a while to go
back down. The result is that the CPU spends most of its time at 100%
or calming down. I wrote a small test program on my notebook which
confirms this.

It's either or. Either you go up AND down slowly (which I would
prefer), or you go up and down immediately. But spiking up and slowly
going back down is not a good combo.
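
To illustrate the asymmetry (this is a toy model, not the test program
mentioned above), the sketch below counts how many sampling intervals the
frequency stays above the minimum after a burst lasting one interval. The
name `intervals_above_min` is made up, and the model ignores
sampling_down_factor, which would stretch the decay further.

```c
#include <assert.h>

/*
 * Count the sampling intervals spent above min_khz after one busy
 * interval followed by idle time.  jump_to_max selects the current
 * behaviour (straight to max); otherwise a single step up is taken.
 * Going down always happens one step per interval.
 */
static unsigned int intervals_above_min(int jump_to_max,
		unsigned int min_khz, unsigned int max_khz,
		unsigned int step_khz)
{
	unsigned int freq = min_khz;
	unsigned int count = 0;

	assert(step_khz > 0 && min_khz <= max_khz);

	/* the one busy interval: go up */
	if (jump_to_max)
		freq = max_khz;
	else
		freq = (max_khz - freq < step_khz) ? max_khz : freq + step_khz;

	/* idle afterwards: step down until back at the minimum */
	while (freq > min_khz) {
		count++;
		freq = (freq - min_khz < step_khz) ? min_khz : freq - step_khz;
	}
	return count;
}
```

For a 600000-2000000 kHz cpu with 100000 kHz steps, jumping to max costs 14
intervals above the minimum while a single step up costs one.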

Alex has my vote, even though I have to give it some more testing.

Cheers,
    Andre


* Re: [PATCH] cpufreq_ondemand
  2004-10-20  5:03   ` Andre Eisenbach
@ 2004-10-20  7:35     ` Len Brown
  2004-10-20 14:30       ` Dominik Brodowski
  0 siblings, 1 reply; 17+ messages in thread
From: Len Brown @ 2004-10-20  7:35 UTC (permalink / raw)
  To: Andre Eisenbach; +Cc: Con Kolivas, linux-kernel, Alexander Clouter, cpufreq

On Wed, 2004-10-20 at 01:03, Andre Eisenbach wrote:

> ... If the
> speed steps down slowly but shoots up 100% quickly (as it is right
> now), even a small task (like opening a folder, or scrolling down in a
> document) will cause a tiny spike to 100% which takes a while to go
> back down. The result is that the CPU spends most of its time at 100%
> or calming down. I wrote a small test program on my notebook which
> confirms this.

The question is what POLICY we're trying to implement.  If the goal is
to be energy efficient while the user notices no performance hit,
then fast-up/slow-down is an EXCELLENT strategy.  But if the goal is to
optimize for power savings at the cost of impacting performance, then
another strategy may work better.

The point is that no strategy will be optimal for all policies.  Linux
needs a global power policy manager that the rest of the system can ask
about the current policy.  This way sub-systems can (automatically)
implement whatever local strategies are consistent with that global
policy.

-Len




* Re: [PATCH] cpufreq_ondemand
  2004-10-20  7:35     ` Len Brown
@ 2004-10-20 14:30       ` Dominik Brodowski
  2004-10-20 21:03         ` Len Brown
  0 siblings, 1 reply; 17+ messages in thread
From: Dominik Brodowski @ 2004-10-20 14:30 UTC (permalink / raw)
  To: Len Brown
  Cc: Andre Eisenbach, Alexander Clouter, linux-kernel, Con Kolivas, cpufreq

On Wed, Oct 20, 2004 at 03:35:35AM -0400, Len Brown wrote:
> On Wed, 2004-10-20 at 01:03, Andre Eisenbach wrote:
> 
> > ... If the
> > speed steps down slowly but shoots up 100% quickly (as it is right
> > now), even a small task (like opening a folder, or scrolling down in a
> > document) will cause a tiny spike to 100% which takes a while to go
> > back down. The result is that the CPU spends most of its time at 100%
> > or calming down. I wrote a small test program on my notebook which
> > confirms this.
> 
> The question is what POLICY we're trying to implement.

This is why there may be DIFFERENT policies a.k.a. governors in cpufreq.

>  If the goal is
> to be energy efficient while the user notices no performance hit,
> then fast-up/slow-down is an EXCELLENT strategy.  But if the goal is to
> optimize for power savings at the cost of impacting performance, then
> another strategy may work better.

> The point is that no strategy will be optimal for all policies.  Linux
> needs a global power policy manager that the rest of the system can ask
> about the current policy.  This way sub-systems can (automatically)
> implement whatever local strategies are consistent with that global
> policy.

Put it in userspace, and let it ask the cpufreq core in the kernel to use a
specific governor or another depending on what you want. That's what certain
userspace daemons / scripts already do, btw.
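
As a rough sketch of what such a daemon does, switching governors is a
single write to the scaling_governor file in sysfs. The helper name
`set_governor` is made up for illustration, and the path in the note below
assumes the usual per-cpu cpufreq directory.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Write a governor name to a scaling_governor-style file.
 * Returns 0 on success and -1 on any error.
 */
static int set_governor(const char *path, const char *governor)
{
	FILE *f;

	assert(path != 0 && governor != 0);

	f = fopen(path, "w");
	if (!f)
		return -1;
	if (fprintf(f, "%s\n", governor) < 0) {
		fclose(f);
		return -1;
	}
	/* the write may only hit the file on close, so check that too */
	return fclose(f) == 0 ? 0 : -1;
}
```

A policy daemon would call it as, for example,
set_governor("/sys/devices/system/cpu/cpu0/cpufreq/scaling_governor",
"ondemand") on AC power and pick another governor on battery.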

	Dominik


* Re: [PATCH] cpufreq_ondemand
  2004-10-20 14:30       ` Dominik Brodowski
@ 2004-10-20 21:03         ` Len Brown
  2004-10-20 21:18           ` Dominik Brodowski
  0 siblings, 1 reply; 17+ messages in thread
From: Len Brown @ 2004-10-20 21:03 UTC (permalink / raw)
  To: Dominik Brodowski
  Cc: Andre Eisenbach, Alexander Clouter, linux-kernel, Con Kolivas, cpufreq

On Wed, 2004-10-20 at 10:30, Dominik Brodowski wrote:
> On Wed, Oct 20, 2004 at 03:35:35AM -0400, Len Brown wrote:

> > The question is what POLICY we're trying to implement.
> 
> This is why there may be DIFFERENT policies a.k.a. governors in
> cpufreq.
....
> 
> Put it in userspace, and let it ask the cpufreq core in the kernel to
> use a specific governor or another depending on what you want. That's
> what certain userspace daemons / scripts already do, btw.

Processors are not the only devices with power management.  When a
device driver, say USB, or any ACPI or PCI power-managed device,
recognizes that its device is idle, who does it ask to find out what
power state to put the hardware in?  Today there is nobody to tell it
what to do.

The user's global desired power policy needs to be represented in the
kernel where all devices can get at it so they can make low-latency
policy-based decisions.  It isn't clear that the cpufreq multiple
governor implementation model would work well for the system as whole.

-Len




* Re: [PATCH] cpufreq_ondemand
  2004-10-20 21:03         ` Len Brown
@ 2004-10-20 21:18           ` Dominik Brodowski
  0 siblings, 0 replies; 17+ messages in thread
From: Dominik Brodowski @ 2004-10-20 21:18 UTC (permalink / raw)
  To: Len Brown
  Cc: Andre Eisenbach, Alexander Clouter, linux-kernel, Con Kolivas, cpufreq

On Wed, Oct 20, 2004 at 05:03:45PM -0400, Len Brown wrote:
> On Wed, 2004-10-20 at 10:30, Dominik Brodowski wrote:
> > On Wed, Oct 20, 2004 at 03:35:35AM -0400, Len Brown wrote:
> 
> > > The question is what POLICY we're trying to implement.
> > 
> > This is why there may be DIFFERENT policies a.k.a. governors in
> > cpufreq.
> ....
> > 
> > Put it in userspace, and let it ask the cpufreq core in the kernel to
> > use a specific governor or another depending on what you want. That's
> > what certain userspace daemons / scripts already do, btw.
> 
> Processors are not the only devices with power management.  When a
> device driver, say USB, or any ACPI or PCI power-managed device,
> recognizes that its device is idle, who does it ask to find out what
> power state to put the hardware in?  Today there is nobody to tell it
> what to do.

Something like sysfs' "detach_state" comes to my mind...

> The user's global desired power policy needs to be represented in the
> kernel where all devices can get at it so they can make low-latency
> policy-based decisions.  It isn't clear that the cpufreq multiple
> governor implementation model would work well for the system as whole.

The question is how much policy we want in the kernel instead of in
userspace. The actual implementation (i.e. fast transitions to idle states)
must be in the kernel, of course. However the policy decision of whether to
do such idling can and IMHO should be done in userspace.

My $0.02,

	Dominik


* Re: [PATCH] cpufreq_ondemand
  2004-10-18  8:39 ` Alexander Clouter
@ 2004-10-19  5:06   ` Willy Tarreau
  0 siblings, 0 replies; 17+ messages in thread
From: Willy Tarreau @ 2004-10-19  5:06 UTC (permalink / raw)
  To: Alexander Clouter
  Cc: Pallipadi, Venkatesh, Con Kolivas, cpufreq, linux-kernel

Hi,

On Mon, Oct 18, 2004 at 09:39:05AM +0100, Alexander Clouter wrote:
> I'm all for "this really should be done in userspace", but for something like 
> this I have a nagging feeling that it's neater in kernel-space.  Of course the 
> userspace one has the advantage (I think cpufreqd does it) that you can 
> decide if you want to increase the freq depending on what applications are 
> running.

Well, I've used a very simple daemon I wrote for more than a year now on a
vaio, and considering that I sometimes wanted to change it or even stop it,
I clearly prefer it in userspace rather than in the kernel. It was so convenient to
issue a "killall cpufrqd" whenever I wanted 'time' to return accurate values
on a particular process, that I cannot imagine what it would have been if it
had been in the kernel. Moreover, the vaio was unreliable with certain
intermediate frequencies, and it took me a lot of time to discover this
(burnBX was the only reliable trigger). I simply had to change a few lines
in my daemon to use different frequencies and that was all.

Cheers,
Willy



* Re: [PATCH] cpufreq_ondemand
  2004-10-18 22:48 Pallipadi, Venkatesh
@ 2004-10-18 23:18 ` Alexander Clouter
  0 siblings, 0 replies; 17+ messages in thread
From: Alexander Clouter @ 2004-10-18 23:18 UTC (permalink / raw)
  To: Pallipadi, Venkatesh; +Cc: Dominik Brodowski, cpufreq, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 5889 bytes --]

On Oct 18, Pallipadi, Venkatesh wrote:
> 
> >The improvements (well I think they are) I have made:
> >
> >1. I have replaced the algorithm it used with one which calculates the
> >	number of cpu idle cycles that have passed and compares it to the
> >	number of cpu cycles it would have expected to pass (for the
> >	defaults, 20%/80%)
> >
> >	this means a couple of divisions have been removed, which is always
> >	nice and it led to clearer code (for me at least), that was
> >	until I added the handful of 'if' conditionals though.... :-/
> 
> 
> Good idea. This part of the patch has to go into ondemand governor.
>
What I will do over the next few days is split up the patch to little bits 
(seems to keep the kernel gods happier, cannot say I blame them) and then 
post that for you all to pull apart and mull over?

> But, I think there is a minor bug in the code though.
> With current ondemand governor, we poll at some X freq and check 
> whether we need to increase the freq. And with some Y freq (Y > X and 
> a multiple of it), we check whether we need to decrease the freq.
> That is the reason I have two different variables 
> prev_cpu_idle_down and prev_cpu_idle_up to store the previous idle 
> times at these two different polling intervals (X and Y).
> Now, you have previous idle time at only one point. So, this may 
> not work cleanly. From the code I feel what will happen is that
> you will only see the CPU activity in the last X time and decide on 
> frequency down decisions (even though you check this with Y polling 
> interval). Not sure whether I was clear with this explanation.
> 
My code records both the total idle ticks and the overall ticks at the last
interval.  If I subtract those values from the ones at the next interval I can
work out what the 'cpu use' was over the period that has just passed, by
comparing the idle ticks against the overall ticks, and so whether it trips
the expected values for an increase or decrease in frequency.

This is really the main reason why the polling rate has to be decreased 
by a large amount (I make it occur 50 times less often) so the period does 
not get skewed by *very* brief cpu spikes.
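
A minimal sketch of the tick-delta comparison described above, not the code
from the patch. The struct, the enum and the function name are hypothetical,
and it assumes the 20%/80% defaults mean "raise below 20% idle, lower above
80% idle". Both sides of each comparison are scaled instead of divided, which
is how the divisions can be avoided:

```c
#include <stdint.h>

/* Hypothetical snapshot of the tick counters at one sampling point. */
struct tick_sample {
	uint64_t total_ticks;	/* all ticks elapsed so far */
	uint64_t idle_ticks;	/* ticks spent idle so far */
};

enum freq_hint { FREQ_KEEP, FREQ_RAISE, FREQ_LOWER };

/*
 * Compare the idle ticks seen since the previous sample with the idle
 * ticks expected at each threshold.  idle/total < pct/100 is rewritten
 * as idle * 100 < pct * total, so no division is needed.
 */
enum freq_hint evaluate_interval(const struct tick_sample *prev,
				 const struct tick_sample *now,
				 unsigned int up_idle_pct,
				 unsigned int down_idle_pct)
{
	uint64_t total = now->total_ticks - prev->total_ticks;
	uint64_t idle = now->idle_ticks - prev->idle_ticks;

	if (total == 0)
		return FREQ_KEEP;	/* no time has passed, nothing to judge */
	if (idle * 100 < (uint64_t)up_idle_pct * total)
		return FREQ_RAISE;	/* less idle time than the busy threshold */
	if (idle * 100 > (uint64_t)down_idle_pct * total)
		return FREQ_LOWER;	/* more idle time than the quiet threshold */
	return FREQ_KEEP;
}
```

With 100 ticks elapsed and 10 of them idle, evaluate_interval(&prev, &now,
20, 80) asks for a raise; with 90 idle it asks for a decrease.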

> Note, I haven't really run your version yet. This is what 
> I feel by looking at the patch. I may well be wrong.
> 
Well in the fashion of the netfilter folk, "Works for Me(tm)" :)  Sitting 
there with 'watch' on /sys/.../ondemand/requested_freq seems to return 
perfectly sane results.

> > 2. controllable through 
> >	/sys/.../ondemand/ignore_nice, you can tell it to consider 'nice'
> >	time as also idle cpu cycles.  Set it to '1' to treat 'nice' as cpu
> >	in an active state.
> >
> 
> OK. This has to be in ondemand governor as well.
> 
I'll split this out as I think it should be in there.

> >3. (major) the scaling up and down of the cpufreq is now smoother.  I found
> >	it really nasty that if it tripped < 20% idle time that the freq was
> >	set to 100%.  This code smoothly increases the cpufreq as well as
> >	doing a better job of decreasing it too
> >
> >4. (minor) I changed DEF_SAMPLING_RATE_LATENCY_MULTIPLIER to 50000 and
> >	DEF_SAMPLING_DOWN_FACTOR to 5 as I found the defaults a bit annoying
> >	on my system and resulted in the cpufreq constantly jumping.
> >
> >	For my patch it works far better if the sampling rate is much lower
> >	anyway, which can only be good for cpu efficiency in the long run
> 
> Somehow, I feel quick response time for increased load is more 
> important than smooth increase in frequency. As the CPU latency for 
> doing the freq transition is lower, I think governor should use that 
> and do quick adjustments to the freq depending on the load. Probably, I 
> am thinking more in terms of places where performance is critical.
> As Dominik pointed out, it's time to fork out a new ondemand 
> governor with this algorithm....
> 
I have been chatting to a few people and on desktop machines this is the
behaviour they of course prefer.  Overshoot and then pull down.  However all
us laptop users have a crotch to protect :) We (well four people out of the
*whole* linux community; better than a US poll I hear though) prefer an
overly conservative approach; hence my approach.  I did write it to suit
my needs more, obviously :)

> >5. the granularity of how much cpufreq is increased or decreased is
> >	controlled with sending a percentage to
> >	/sys/.../ondemand/freq_step_percent
> >
> >6. debugging (with 'watch -n1 cat /sys/.../ondemand/requested_freq') and
> >	backwards 'compatibility' to act like the 'userspace' governor is
> >	available with /sys/.../ondemand/requested_freq if
> >	'freq_step_percent' is set to zero
> 
> I again agree with Dominik's opinion on this :)
> 
guess the world domination plans go back to....

1. steal all the pants....
2. ....
3. rule the world

I do though think the step_freq bit should be there.

> Thanks for all the experiments and all these improvements.
> I will roll out a patch for ondemand governor soon, by 
> stealing some code from your patch below :)
> 
Not a problem.  I'm in a 'powersaving' mode so will race you if you want to
produce those patches :) After that I have to tell 'wmpower' it's a Bad
Idea(tm) to suck up 5% cpu time to poll for the whole ACPI state every 0.5s
with a host of other major issues :-/ Then there is....and...<complains to
himself>

Cheers all

Alex

-- 
 _____________________________________ 
/ A bird in the hand is worth what it \
\ will bring.                         /
 ------------------------------------- 
        \   ^__^
         \  (oo)\_______
            (__)\       )\/\
                ||----w |
                ||     ||

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]


* RE: [PATCH] cpufreq_ondemand
@ 2004-10-18 22:48 Pallipadi, Venkatesh
  2004-10-18 23:18 ` Alexander Clouter
  0 siblings, 1 reply; 17+ messages in thread
From: Pallipadi, Venkatesh @ 2004-10-18 22:48 UTC (permalink / raw)
  To: Dominik Brodowski, Alexander Clouter; +Cc: cpufreq, linux-kernel


>-----Original Message-----
>From:       Alexander Clouter <alex-kernel () digriz ! org ! uk>
>Date:       2004-10-17 22:29:16
>Message-ID: <20041017222916.GA30841 () inskipp ! digriz ! org ! uk>
>
>Hi all,
>
>After playing with the cpufreq_ondemand governor (many thanks to those who
>made it) I made a number of alterations which suit me at least.  Really
>looking for feedback and of course once people have fixed any bugs they find
>and made the code look neater, possible inclusion?
>
>The improvements (well I think they are) I have made:
>
>1. I have replaced the algorithm it used with one which calculates the
>	number of cpu idle cycles that have passed and compares it to the
>	number of cpu cycles it would have expected to pass (for the
>	defaults, 20%/80%)
>
>	this means a couple of divisions have been removed, which is always
>	nice and it led to clearer code (for me at least), that was
>	until I added the handful of 'if' conditionals though.... :-/


Good idea. This part of the patch has to go into ondemand governor.
But, I think there is a minor bug in the code though.
With current ondemand governor, we poll at some X freq and check 
whether we need to increase the freq. And with some Y freq (Y > X and 
a multiple of it), we check whether we need to decrease the freq.
That is the reason I have two different variables 
prev_cpu_idle_down and prev_cpu_idle_up to store the previous idle 
times at these two different polling intervals (X and Y).
Now, you have previous idle time at only one point. So, this may 
not work cleanly. From the code I feel what will happen is that
you will only see the CPU activity in the last X time and decide on 
frequency down decisions (even though you check this with Y polling 
interval). Not sure whether I was clear with this explanation.
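
A rough sketch of the two-window bookkeeping described above, assuming the
long interval Y is a whole multiple (down_factor) of the short interval X.
The names prev_cpu_idle_up and prev_cpu_idle_down come from the text; the
structs, the counter and the function are illustrative only and are not
taken from the governor source:

```c
#include <stdint.h>

/* Per-CPU sampling state: one saved idle count per window. */
struct dbs_sample_state {
	uint64_t prev_cpu_idle_up;	/* idle ticks at the last short (X) check */
	uint64_t prev_cpu_idle_down;	/* idle ticks at the last long (Y) check */
	unsigned int down_skip;		/* short checks since the last long check */
};

struct dbs_idle_deltas {
	uint64_t idle_up;	/* idle ticks over the last X interval */
	uint64_t idle_down;	/* idle ticks over the last Y interval, if due */
	int down_due;		/* nonzero when a full Y interval has elapsed */
};

/*
 * Called once per short interval.  The "up" delta is always refreshed;
 * the "down" delta is only produced every down_factor calls, and it is
 * measured against its own saved value so that it covers the whole Y
 * window rather than just the most recent X.
 */
struct dbs_idle_deltas dbs_sample(struct dbs_sample_state *s,
				  uint64_t idle_now,
				  unsigned int down_factor)
{
	struct dbs_idle_deltas d = { 0, 0, 0 };

	d.idle_up = idle_now - s->prev_cpu_idle_up;
	s->prev_cpu_idle_up = idle_now;

	if (++s->down_skip >= down_factor) {
		d.idle_down = idle_now - s->prev_cpu_idle_down;
		s->prev_cpu_idle_down = idle_now;
		s->down_skip = 0;
		d.down_due = 1;
	}
	return d;
}
```

With a single saved value, the "down" decision would see only the idle time
of the last X interval, which is the concern raised above.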

Note, I haven't really run your version yet. This is what 
I feel by looking at the patch. I may well be wrong.

>2. controllable through
>	/sys/.../ondemand/ignore_nice, you can tell it to consider 'nice'
>	time as also idle cpu cycles.  Set it to '1' to treat 'nice' as cpu
>	in an active state.
>

OK. This has to be in ondemand governor as well.

>3. (major) the scaling up and down of the cpufreq is now smoother.  I found
>	it really nasty that if it tripped < 20% idle time that the freq was
>	set to 100%.  This code smoothly increases the cpufreq as well as
>	doing a better job of decreasing it too
>
>4. (minor) I changed DEF_SAMPLING_RATE_LATENCY_MULTIPLIER to 50000 and
>	DEF_SAMPLING_DOWN_FACTOR to 5 as I found the defaults a bit annoying
>	on my system and resulted in the cpufreq constantly jumping.
>
>	For my patch it works far better if the sampling rate is much lower
>	anyway, which can only be good for cpu efficiency in the long run

Somehow, I feel quick response time for increased load is more 
important than smooth increase in frequency. As the CPU latency for 
doing the freq transition is lower, I think governor should use that 
and do quick adjustments to the freq depending on the load. Probably, I 
am thinking more in terms of places where performance is critical.
As Dominik pointed out, it's time to fork out a new ondemand 
governor with this algorithm....

>5. the granularity of how much cpufreq is increased or decreased is
>	controlled with sending a percentage to
>	/sys/.../ondemand/freq_step_percent
>
>6. debugging (with 'watch -n1 cat /sys/.../ondemand/requested_freq') and
>	backwards 'compatibility' to act like the 'userspace' governor is
>	available with /sys/.../ondemand/requested_freq if
>	'freq_step_percent' is set to zero

I again agree with Dominik's opinion on this :)

Thanks for all the experiments and all these improvements.
I will roll out a patch for ondemand governor soon, by 
stealing some code from your patch below :)

Thanks,
Venki



* Re: [PATCH] cpufreq_ondemand
  2004-10-18  4:56 Pallipadi, Venkatesh
@ 2004-10-18  8:39 ` Alexander Clouter
  2004-10-19  5:06   ` Willy Tarreau
  0 siblings, 1 reply; 17+ messages in thread
From: Alexander Clouter @ 2004-10-18  8:39 UTC (permalink / raw)
  To: Pallipadi, Venkatesh; +Cc: Con Kolivas, cpufreq, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 2044 bytes --]

On Oct 17, Pallipadi, Venkatesh wrote:
> 
> [snipped]
> 
> We can never accurately predict freq for some future load. 
> Say a CPU capable of 600, 800, 1000, 1200 and 1400 MHz is 
> running at 600 and we have sudden 100% CPU utilization, then 
> we cannot precisely say which should be the next freq. It 
> can be any of the higher possible freqs. And we felt performance 
> should get a higher priority whenever there are 
> tradeoffs like this.
> 
It took me a while to work out why speed decreasing was 'working' whilst 
speed increasing was not with my method; a good hour finding out that the 
cpufreq (correctly) goes to the lowest match.

My approach was not to try to predict the desired freq, it was just 
to increase it...well on demand at a steady rate towards 100% and then once 
the load disappears to reduce it.  Having used powernowd and found it does that 
rather nicely, then seeing the inclusion of cpufreq_ondemand, I tweaked 
cpufreq_ondemand to replace powernowd.
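
A minimal sketch of that kind of gradual stepping, not the code from the
patch. It assumes the step is a percentage of the maximum frequency and that
a zero step leaves the frequency alone; the function name and parameters are
hypothetical:

```c
/*
 * Move cur_khz one step towards max_khz (raise != 0) or towards min_khz
 * (raise == 0).  The step is step_pct percent of max_khz, which is an
 * assumption of this sketch.  The result never leaves [min_khz, max_khz].
 */
unsigned int step_freq(unsigned int cur_khz, unsigned int min_khz,
		       unsigned int max_khz, unsigned int step_pct, int raise)
{
	unsigned int step =
		(unsigned int)(((unsigned long long)max_khz * step_pct) / 100);

	/* Clamp the starting point first so the subtractions below are safe. */
	if (cur_khz < min_khz)
		cur_khz = min_khz;
	if (cur_khz > max_khz)
		cur_khz = max_khz;

	if (step == 0)
		return cur_khz;		/* stepping disabled in this sketch */

	if (raise)
		return (max_khz - cur_khz <= step) ? max_khz : cur_khz + step;

	return (cur_khz - min_khz <= step) ? min_khz : cur_khz - step;
}
```

For a 600000 to 1400000 kHz range and a 5% step, one raise from the bottom
gives 670000 kHz. The value handed to the driver may still be rounded to a
frequency the hardware supports, which is where the "lowest match" behaviour
mentioned above comes in.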

I'm all for "this really should be done in userspace", but for something like 
this I have a nagging feeling that it's neater in kernel-space.  Of course the 
userspace one has the advantage (I think cpufreqd does it) that you can 
decide if you want to increase the freq depending on what applications are 
running.

Of course you are using CPU cycles, though barely any, to have this floating 
requested_freq variable.  Of course I would love this to be in the kernel, 
mainly though I wanted people to improve upon it and such.

Meanwhile I am thinking of moving that freq_step variable bits to the /sys 
show/store functions to remove an avoidable divide.
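
One way that could look, as a sketch only: do the percentage-to-kHz division
once when the tunable is written, and let the sampling path use the cached
value. The struct and both functions are hypothetical stand-ins, not the
actual sysfs show/store handlers:

```c
/* Hypothetical tunables: the percentage as written, plus a cached step. */
struct step_tunables {
	unsigned int freq_step_pct;	/* value the user wrote, 0..100 */
	unsigned int freq_step_khz;	/* same step, precomputed in kHz */
};

/*
 * Stand-in for the store handler: validate the percentage and do the
 * divide here, once per write.  Returns 0 on success, -1 on bad input.
 */
int store_freq_step(struct step_tunables *t, unsigned int pct,
		    unsigned int max_khz)
{
	if (pct > 100)
		return -1;

	t->freq_step_pct = pct;
	t->freq_step_khz =
		(unsigned int)(((unsigned long long)max_khz * pct) / 100);
	return 0;
}

/* Sampling path: only an add and a compare, no division. */
unsigned int next_freq_up(const struct step_tunables *t,
			  unsigned int cur_khz, unsigned int max_khz)
{
	if (cur_khz >= max_khz || max_khz - cur_khz <= t->freq_step_khz)
		return max_khz;
	return cur_khz + t->freq_step_khz;
}
```

The cached value would need recomputing if the maximum frequency of the
policy changes, which this sketch does not handle.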

Cheers

Alex

-- 
 ____________________________________ 
/ Let your conscience be your guide. \
|                                    |
\ -- Pope                            /
 ------------------------------------ 
        \   ^__^
         \  (oo)\_______
            (__)\       )\/\
                ||----w |
                ||     ||

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]


* RE: [PATCH] cpufreq_ondemand
@ 2004-10-18  4:56 Pallipadi, Venkatesh
  2004-10-18  8:39 ` Alexander Clouter
  0 siblings, 1 reply; 17+ messages in thread
From: Pallipadi, Venkatesh @ 2004-10-18  4:56 UTC (permalink / raw)
  To: Con Kolivas, Alexander Clouter; +Cc: cpufreq, linux-kernel

>-----Original Message-----
>From: Con Kolivas [mailto:kernel@kolivas.org] 
>Sent: Sunday, October 17, 2004 3:36 PM
>To: Alexander Clouter
>Cc: Pallipadi, Venkatesh; cpufreq@www.linux.org.uk; 
>linux-kernel@vger.kernel.org
>Subject: Re: [PATCH] cpufreq_ondemand
>
>Alexander Clouter wrote:
>> 3. (major) the scaling up and down of the cpufreq is now smoother.  I found
>> 	it really nasty that if it tripped < 20% idle time that the freq was
>> 	set to 100%.  This code smoothly increases the cpufreq as well as
>> 	doing a better job of decreasing it too
>
>I'd much prefer it shot up to 100% or else every time the cpu usage went
>up there'd be an obvious lag till the machine ran at its capable speed.
>I very much doubt the small amount of time it spent at 100% speed with
>the default design would decrease the battery life significantly as well.
>

True. The current ondemand behaviour is by design. When CPU 
is at the lowest freq, and there is a sudden surge in load, 
we want it to go to max freq immediately, rather than wait 
for some more polling intervals. If max freq is too high, 
it will naturally lower to some intermediate freq later. 

We can never accurately predict freq for some future load. 
Say a CPU capable of 600, 800, 1000, 1200 and 1400 MHz is 
running at 600 and we have sudden 100% CPU utilization, then 
we cannot precisely say which should be the next freq. It 
can be any of the higher possible freqs. And we felt performance 
should get a higher priority whenever there are 
tradeoffs like this.
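
To put a rough number on the lag being traded off here, a small sketch that
counts how many sampling intervals a fixed-step policy needs to reach the
maximum, where jumping straight to max needs one. The function name and the
step value used below are made up for illustration; the 600 and 1400 figures
are the ones from the example above and the unit does not matter:

```c
#include <limits.h>

/*
 * Number of sampling intervals a fixed-step policy needs to go from cur
 * to max, raising by step each interval and clamping at max.  A zero
 * step never gets there, which is reported as UINT_MAX.
 */
unsigned int intervals_to_max(unsigned int cur, unsigned int max,
			      unsigned int step)
{
	unsigned int n = 0;

	if (cur >= max)
		return 0;
	if (step == 0)
		return UINT_MAX;

	while (cur < max) {
		cur = (max - cur <= step) ? max : cur + step;
		n++;
	}
	return n;
}
```

Going from 600 to 1400 in steps of 70 (5% of 1400) takes 12 intervals under
sustained load, against a single interval for the jump-to-max behaviour.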

Thanks,
Venki


end of thread, other threads:[~2004-10-20 21:26 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
2004-10-17 22:29 [PATCH] cpufreq_ondemand Alexander Clouter
2004-10-17 22:35 ` Con Kolivas
2004-10-17 22:44   ` Alexander Clouter
2004-10-19 18:22   ` Bruno Ducrot
2004-10-20  5:03   ` Andre Eisenbach
2004-10-20  7:35     ` Len Brown
2004-10-20 14:30       ` Dominik Brodowski
2004-10-20 21:03         ` Len Brown
2004-10-20 21:18           ` Dominik Brodowski
2004-10-18  7:20 ` Dominik Brodowski
2004-10-18  8:12   ` Mattia Dongili
2004-10-18  8:25   ` Alexander Clouter
2004-10-18  4:56 Pallipadi, Venkatesh
2004-10-18  8:39 ` Alexander Clouter
2004-10-19  5:06   ` Willy Tarreau
2004-10-18 22:48 Pallipadi, Venkatesh
2004-10-18 23:18 ` Alexander Clouter
