linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/7] Fix performance issue with ondemand governor
@ 2010-04-18 18:59 Arjan van de Ven
  2010-04-18 19:00 ` [PATCH 1/7] sched: add a comment to get_cpu_idle_time_us() Arjan van de Ven
                   ` (7 more replies)
  0 siblings, 8 replies; 53+ messages in thread
From: Arjan van de Ven @ 2010-04-18 18:59 UTC (permalink / raw)
  To: linux-kernel; +Cc: akpm, mingo, peterz, tglx, davej, cpufreq

There have been various reports of the ondemand governor causing some
serious performance issues, one of the latest ones from Andrew.
There are several fundamental issues with ondemand (being worked on),
but the report from Andrew can be fixed relatively easily.

The fundamental issue is that ondemand will go to a (too) low CPU
frequency for workloads that alternatingly disk and CPU bound...

-- 
Arjan van de Ven 	Intel Open Source Technology Centre
For development, discussion and tips for power savings, 
visit http://www.lesswatts.org

^ permalink raw reply	[flat|nested] 53+ messages in thread

* [PATCH 1/7] sched: add a comment to get_cpu_idle_time_us()
  2010-04-18 18:59 [PATCH 0/7] Fix performance issue with ondemand governor Arjan van de Ven
@ 2010-04-18 19:00 ` Arjan van de Ven
  2010-04-26 19:25   ` Rik van Riel
  2010-04-18 19:01 ` [PATCH 2/7] sched: introduce a function to update the idle statistics Arjan van de Ven
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 53+ messages in thread
From: Arjan van de Ven @ 2010-04-18 19:00 UTC (permalink / raw)
  To: linux-kernel; +Cc: Arjan van de Ven, akpm, mingo, peterz, tglx, davej, cpufreq

>From 6f9df3bc6571d6545c552151f408d69265e15f92 Mon Sep 17 00:00:00 2001
From: Arjan van de Ven <arjan@linux.intel.com>
Date: Sun, 18 Apr 2010 10:25:19 -0700
Subject: [PATCH 1/7] sched: add a comment to get_cpu_idle_time_us()

The exported function get_cpu_idle_time_us() has no comment
describing it; add a kerneldoc comment

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
---
 kernel/time/tick-sched.c |   14 ++++++++++++++
 1 files changed, 14 insertions(+), 0 deletions(-)

diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index f992762..54dc155 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -179,6 +179,20 @@ static ktime_t tick_nohz_start_idle(struct tick_sched *ts)
 	return now;
 }
 
+/**
+ * get_cpu_idle_time_us - get the total idle time of a cpu
+ * @cpu: CPU number to query
+ * @last_update_time: variable to store update time in
+ *
+ * Return the cummulative idle time (since boot) for a given
+ * CPU, in microseconds. The idle time returned includes
+ * the iowait time (unlike what "top" and co report).
+ *
+ * This time is measured via accounting rather than sampling,
+ * and is as accurate as ktime_get() is.
+ *
+ * This function returns -1 if NOHZ is not enabled.
+ */
 u64 get_cpu_idle_time_us(int cpu, u64 *last_update_time)
 {
 	struct tick_sched *ts = &per_cpu(tick_cpu_sched, cpu);
-- 
1.6.2.5


^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH 2/7] sched: introduce a function to update the idle statistics
  2010-04-18 18:59 [PATCH 0/7] Fix performance issue with ondemand governor Arjan van de Ven
  2010-04-18 19:00 ` [PATCH 1/7] sched: add a comment to get_cpu_idle_time_us() Arjan van de Ven
@ 2010-04-18 19:01 ` Arjan van de Ven
  2010-04-26 20:11   ` Rik van Riel
  2010-04-18 19:01 ` [PATCH 3/7] sched: update the idle statistics in get_cpu_idle_time_us() Arjan van de Ven
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 53+ messages in thread
From: Arjan van de Ven @ 2010-04-18 19:01 UTC (permalink / raw)
  To: linux-kernel; +Cc: Arjan van de Ven, akpm, mingo, peterz, tglx, davej, cpufreq

>From 166b7526ccfea8b44626b6023ff5b0a8eb869bb3 Mon Sep 17 00:00:00 2001
From: Arjan van de Ven <arjan@linux.intel.com>
Date: Sun, 18 Apr 2010 10:33:02 -0700
Subject: [PATCH 2/7] sched: introduce a function to update the idle statistics

Currently, two places update the idle statistics (and more to
come later in this series).

This patch creates a helper function for updating these statistics.

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
---
 kernel/time/tick-sched.c |   29 +++++++++++++++++++----------
 1 files changed, 19 insertions(+), 10 deletions(-)

diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 54dc155..ca2211d 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -150,14 +150,25 @@ static void tick_nohz_update_jiffies(ktime_t now)
 	touch_softlockup_watchdog();
 }
 
-static void tick_nohz_stop_idle(int cpu, ktime_t now)
+/*
+ * Updates the per cpu time idle statistics counters
+ */
+static void update_ts_time_stats(struct tick_sched *ts, ktime_t now)
 {
-	struct tick_sched *ts = &per_cpu(tick_cpu_sched, cpu);
 	ktime_t delta;
 
-	delta = ktime_sub(now, ts->idle_entrytime);
 	ts->idle_lastupdate = now;
-	ts->idle_sleeptime = ktime_add(ts->idle_sleeptime, delta);
+	if (ts->idle_active) {
+		delta = ktime_sub(now, ts->idle_entrytime);
+		ts->idle_sleeptime = ktime_add(ts->idle_sleeptime, delta);
+	}
+}
+
+static void tick_nohz_stop_idle(int cpu, ktime_t now)
+{
+	struct tick_sched *ts = &per_cpu(tick_cpu_sched, cpu);
+
+	update_ts_time_stats(ts, now);
 	ts->idle_active = 0;
 
 	sched_clock_idle_wakeup_event(0);
@@ -165,14 +176,12 @@ static void tick_nohz_stop_idle(int cpu, ktime_t now)
 
 static ktime_t tick_nohz_start_idle(struct tick_sched *ts)
 {
-	ktime_t now, delta;
+	ktime_t now;
 
 	now = ktime_get();
-	if (ts->idle_active) {
-		delta = ktime_sub(now, ts->idle_entrytime);
-		ts->idle_lastupdate = now;
-		ts->idle_sleeptime = ktime_add(ts->idle_sleeptime, delta);
-	}
+
+	update_ts_time_stats(ts, now);
+
 	ts->idle_entrytime = now;
 	ts->idle_active = 1;
 	sched_clock_idle_sleep_event();
-- 
1.6.2.5


^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH 3/7] sched: update the idle statistics in get_cpu_idle_time_us()
  2010-04-18 18:59 [PATCH 0/7] Fix performance issue with ondemand governor Arjan van de Ven
  2010-04-18 19:00 ` [PATCH 1/7] sched: add a comment to get_cpu_idle_time_us() Arjan van de Ven
  2010-04-18 19:01 ` [PATCH 2/7] sched: introduce a function to update the idle statistics Arjan van de Ven
@ 2010-04-18 19:01 ` Arjan van de Ven
  2010-04-26 20:36   ` Rik van Riel
  2010-04-18 19:02 ` [PATCH 4/7] sched: fold updating of the last update time into update_ts_time_stats() Arjan van de Ven
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 53+ messages in thread
From: Arjan van de Ven @ 2010-04-18 19:01 UTC (permalink / raw)
  To: linux-kernel; +Cc: Arjan van de Ven, akpm, mingo, peterz, tglx, davej, cpufreq

>From 60851b131900af03bf013afef69f3bcdbb04f1d6 Mon Sep 17 00:00:00 2001
From: Arjan van de Ven <arjan@linux.intel.com>
Date: Sun, 18 Apr 2010 10:41:30 -0700
Subject: [PATCH 3/7] sched: update the idle statistics in get_cpu_idle_time_us()

Right now, get_cpu_idle_time_us() only reports the idle statistics
upto the point the CPU entered last idle; not what is valid right now.

This patch adds an update of the idle statistics to get_cpu_idle_time_us(),
so that calling this function always returns statistics that are accurate
at the point of the call.

This includes resetting the start of the idle time for accounting purposes
to avoid double accounting.

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
---
 kernel/time/tick-sched.c |    7 ++++++-
 1 files changed, 6 insertions(+), 1 deletions(-)

diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index ca2211d..7dbad2f 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -161,6 +161,7 @@ static void update_ts_time_stats(struct tick_sched *ts, ktime_t now)
 	if (ts->idle_active) {
 		delta = ktime_sub(now, ts->idle_entrytime);
 		ts->idle_sleeptime = ktime_add(ts->idle_sleeptime, delta);
+		ts->idle_entrytime = now;
 	}
 }
 
@@ -205,14 +206,18 @@ static ktime_t tick_nohz_start_idle(struct tick_sched *ts)
 u64 get_cpu_idle_time_us(int cpu, u64 *last_update_time)
 {
 	struct tick_sched *ts = &per_cpu(tick_cpu_sched, cpu);
+	ktime_t now;
 
 	if (!tick_nohz_enabled)
 		return -1;
 
+	now = ktime_get();
+	update_ts_time_stats(ts, now);
+
 	if (ts->idle_active)
 		*last_update_time = ktime_to_us(ts->idle_lastupdate);
 	else
-		*last_update_time = ktime_to_us(ktime_get());
+		*last_update_time = ktime_to_us(now);
 
 	return ktime_to_us(ts->idle_sleeptime);
 }
-- 
1.6.2.5


^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH 4/7] sched: fold updating of the last update time into update_ts_time_stats()
  2010-04-18 18:59 [PATCH 0/7] Fix performance issue with ondemand governor Arjan van de Ven
                   ` (2 preceding siblings ...)
  2010-04-18 19:01 ` [PATCH 3/7] sched: update the idle statistics in get_cpu_idle_time_us() Arjan van de Ven
@ 2010-04-18 19:02 ` Arjan van de Ven
  2010-04-26 20:58   ` Rik van Riel
  2010-04-18 19:02 ` [PATCH 5/7] sched: eliminate the ts->idle_lastupdate field Arjan van de Ven
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 53+ messages in thread
From: Arjan van de Ven @ 2010-04-18 19:02 UTC (permalink / raw)
  To: linux-kernel; +Cc: Arjan van de Ven, akpm, mingo, peterz, tglx, davej, cpufreq

>From e75d6cd203e43ea4c5e9919f19e2882c066491b8 Mon Sep 17 00:00:00 2001
From: Arjan van de Ven <arjan@linux.intel.com>
Date: Sun, 18 Apr 2010 10:47:02 -0700
Subject: [PATCH 4/7] sched: fold updating of the last update time into update_ts_time_stats()

This patch folds the updating of the last_update_time into the
update_ts_time_stats() function, and updates the callers.

This allows for further cleanups that are done in the next patch.

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
---
 kernel/time/tick-sched.c |   22 +++++++++++-----------
 1 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 7dbad2f..ac54543 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -153,7 +153,8 @@ static void tick_nohz_update_jiffies(ktime_t now)
 /*
  * Updates the per cpu time idle statistics counters
  */
-static void update_ts_time_stats(struct tick_sched *ts, ktime_t now)
+static void
+update_ts_time_stats(struct tick_sched *ts, ktime_t now, u64 *last_update_time)
 {
 	ktime_t delta;
 
@@ -163,13 +164,19 @@ static void update_ts_time_stats(struct tick_sched *ts, ktime_t now)
 		ts->idle_sleeptime = ktime_add(ts->idle_sleeptime, delta);
 		ts->idle_entrytime = now;
 	}
+
+	if (ts->idle_active && last_update_time)
+		*last_update_time = ktime_to_us(ts->idle_lastupdate);
+	else
+		*last_update_time = ktime_to_us(now);
+
 }
 
 static void tick_nohz_stop_idle(int cpu, ktime_t now)
 {
 	struct tick_sched *ts = &per_cpu(tick_cpu_sched, cpu);
 
-	update_ts_time_stats(ts, now);
+	update_ts_time_stats(ts, now, NULL);
 	ts->idle_active = 0;
 
 	sched_clock_idle_wakeup_event(0);
@@ -181,7 +188,7 @@ static ktime_t tick_nohz_start_idle(struct tick_sched *ts)
 
 	now = ktime_get();
 
-	update_ts_time_stats(ts, now);
+	update_ts_time_stats(ts, now, NULL);
 
 	ts->idle_entrytime = now;
 	ts->idle_active = 1;
@@ -206,18 +213,11 @@ static ktime_t tick_nohz_start_idle(struct tick_sched *ts)
 u64 get_cpu_idle_time_us(int cpu, u64 *last_update_time)
 {
 	struct tick_sched *ts = &per_cpu(tick_cpu_sched, cpu);
-	ktime_t now;
 
 	if (!tick_nohz_enabled)
 		return -1;
 
-	now = ktime_get();
-	update_ts_time_stats(ts, now);
-
-	if (ts->idle_active)
-		*last_update_time = ktime_to_us(ts->idle_lastupdate);
-	else
-		*last_update_time = ktime_to_us(now);
+	update_ts_time_stats(ts, ktime_get(), last_update_time);
 
 	return ktime_to_us(ts->idle_sleeptime);
 }
-- 
1.6.2.5


^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH 5/7] sched: eliminate the ts->idle_lastupdate field
  2010-04-18 18:59 [PATCH 0/7] Fix performance issue with ondemand governor Arjan van de Ven
                   ` (3 preceding siblings ...)
  2010-04-18 19:02 ` [PATCH 4/7] sched: fold updating of the last update time into update_ts_time_stats() Arjan van de Ven
@ 2010-04-18 19:02 ` Arjan van de Ven
  2010-04-26 21:00   ` Rik van Riel
  2010-04-18 19:03 ` [PATCH 6/7] sched: introduce get_cpu_iowait_time_us() Arjan van de Ven
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 53+ messages in thread
From: Arjan van de Ven @ 2010-04-18 19:02 UTC (permalink / raw)
  To: linux-kernel; +Cc: Arjan van de Ven, akpm, mingo, peterz, tglx, davej, cpufreq

>From 526a9f347d5a953f37b67b4b2afb39d7b4d77a92 Mon Sep 17 00:00:00 2001
From: Arjan van de Ven <arjan@linux.intel.com>
Date: Sun, 18 Apr 2010 10:49:30 -0700
Subject: [PATCH 5/7] sched: eliminate the ts->idle_lastupdate field

Now that the only user of ts->idle_lastupdate is update_ts_time_stats(),
the entire field can be eliminated.

In update_ts_time_stats(), idle_lastupdate is first set to "now",
and a few lines later, the only user is an if() statement that
assigns a variable either to "now" or to ts->idle_lastupdate,
which has the value of "now" at that point.

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
---
 include/linux/tick.h     |    1 -
 kernel/time/tick-sched.c |    5 +----
 2 files changed, 1 insertions(+), 5 deletions(-)

diff --git a/include/linux/tick.h b/include/linux/tick.h
index d2ae79e..0343eed 100644
--- a/include/linux/tick.h
+++ b/include/linux/tick.h
@@ -60,7 +60,6 @@ struct tick_sched {
 	ktime_t				idle_waketime;
 	ktime_t				idle_exittime;
 	ktime_t				idle_sleeptime;
-	ktime_t				idle_lastupdate;
 	ktime_t				sleep_length;
 	unsigned long			last_jiffies;
 	unsigned long			next_jiffies;
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index ac54543..326f5f8 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -158,16 +158,13 @@ update_ts_time_stats(struct tick_sched *ts, ktime_t now, u64 *last_update_time)
 {
 	ktime_t delta;
 
-	ts->idle_lastupdate = now;
 	if (ts->idle_active) {
 		delta = ktime_sub(now, ts->idle_entrytime);
 		ts->idle_sleeptime = ktime_add(ts->idle_sleeptime, delta);
 		ts->idle_entrytime = now;
 	}
 
-	if (ts->idle_active && last_update_time)
-		*last_update_time = ktime_to_us(ts->idle_lastupdate);
-	else
+	if (last_update_time)
 		*last_update_time = ktime_to_us(now);
 
 }
-- 
1.6.2.5


^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH 6/7] sched: introduce get_cpu_iowait_time_us()
  2010-04-18 18:59 [PATCH 0/7] Fix performance issue with ondemand governor Arjan van de Ven
                   ` (4 preceding siblings ...)
  2010-04-18 19:02 ` [PATCH 5/7] sched: eliminate the ts->idle_lastupdate field Arjan van de Ven
@ 2010-04-18 19:03 ` Arjan van de Ven
  2010-04-26 21:05   ` Rik van Riel
  2010-04-18 19:03 ` [PATCH 7/7] ondemand: Solve the big performance issue with ondemand during disk IO Arjan van de Ven
  2010-04-26 21:41 ` [PATCH 0/7] Fix performance issue with ondemand governor Dave Jones
  7 siblings, 1 reply; 53+ messages in thread
From: Arjan van de Ven @ 2010-04-18 19:03 UTC (permalink / raw)
  To: linux-kernel; +Cc: Arjan van de Ven, akpm, mingo, peterz, tglx, davej, cpufreq

>From c4dd11703034f2ecbc3180603663fab14c292d7c Mon Sep 17 00:00:00 2001
From: Arjan van de Ven <arjan@linux.intel.com>
Date: Sun, 18 Apr 2010 10:57:43 -0700
Subject: [PATCH 6/7] sched: introduce get_cpu_iowait_time_us()

For the ondemand cpufreq governor, it is desired that the iowait
time is microaccounted in a similar way as idle time is.

This patch introduces the infrastructure to account and expose
this information via the get_cpu_iowait_time_us() function.

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
---
 include/linux/tick.h     |    4 ++++
 kernel/time/tick-sched.c |   28 ++++++++++++++++++++++++++++
 kernel/time/timer_list.c |    1 +
 3 files changed, 33 insertions(+), 0 deletions(-)

diff --git a/include/linux/tick.h b/include/linux/tick.h
index 0343eed..4aa3703 100644
--- a/include/linux/tick.h
+++ b/include/linux/tick.h
@@ -42,6 +42,7 @@ enum tick_nohz_mode {
  * @idle_waketime:	Time when the idle was interrupted
  * @idle_exittime:	Time when the idle state was left
  * @idle_sleeptime:	Sum of the time slept in idle with sched tick stopped
+ * @iowait_sleeptime:	Sum of the time slept in idle with sched tick stopped, with IO outstanding
  * @sleep_length:	Duration of the current idle sleep
  * @do_timer_lst:	CPU was the last one doing do_timer before going idle
  */
@@ -60,6 +61,7 @@ struct tick_sched {
 	ktime_t				idle_waketime;
 	ktime_t				idle_exittime;
 	ktime_t				idle_sleeptime;
+	ktime_t				iowait_sleeptime;
 	ktime_t				sleep_length;
 	unsigned long			last_jiffies;
 	unsigned long			next_jiffies;
@@ -123,6 +125,7 @@ extern void tick_nohz_stop_sched_tick(int inidle);
 extern void tick_nohz_restart_sched_tick(void);
 extern ktime_t tick_nohz_get_sleep_length(void);
 extern u64 get_cpu_idle_time_us(int cpu, u64 *last_update_time);
+extern u64 get_cpu_iowait_time_us(int cpu, u64 *last_update_time);
 # else
 static inline void tick_nohz_stop_sched_tick(int inidle) { }
 static inline void tick_nohz_restart_sched_tick(void) { }
@@ -133,6 +136,7 @@ static inline ktime_t tick_nohz_get_sleep_length(void)
 	return len;
 }
 static inline u64 get_cpu_idle_time_us(int cpu, u64 *unused) { return -1; }
+static inline u64 get_cpu_iowait(int cpu, u64 *unused) { return -1; }
 # endif /* !NO_HZ */
 
 #endif
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 326f5f8..a6104a8 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -161,6 +161,8 @@ update_ts_time_stats(struct tick_sched *ts, ktime_t now, u64 *last_update_time)
 	if (ts->idle_active) {
 		delta = ktime_sub(now, ts->idle_entrytime);
 		ts->idle_sleeptime = ktime_add(ts->idle_sleeptime, delta);
+		if (nr_iowait_cpu() > 0)
+			ts->iowait_sleeptime = ktime_add(ts->iowait_sleeptime, delta);
 		ts->idle_entrytime = now;
 	}
 
@@ -220,6 +222,32 @@ u64 get_cpu_idle_time_us(int cpu, u64 *last_update_time)
 }
 EXPORT_SYMBOL_GPL(get_cpu_idle_time_us);
 
+/*
+ * get_cpu_iowait_time_us - get the total iowait time of a cpu
+ * @cpu: CPU number to query
+ * @last_update_time: variable to store update time in
+ *
+ * Return the cummulative iowait time (since boot) for a given
+ * CPU, in microseconds.
+ *
+ * This time is measured via accounting rather than sampling,
+ * and is as accurate as ktime_get() is.
+ *
+ * This function returns -1 if NOHZ is not enabled.
+ */
+u64 get_cpu_iowait_time_us(int cpu, u64 *last_update_time)
+{
+	struct tick_sched *ts = &per_cpu(tick_cpu_sched, cpu);
+
+	if (!tick_nohz_enabled)
+		return -1;
+
+	update_ts_time_stats(ts, ktime_get(), last_update_time);
+
+	return ktime_to_us(ts->iowait_sleeptime);
+}
+EXPORT_SYMBOL_GPL(get_cpu_iowait_time_us);
+
 /**
  * tick_nohz_stop_sched_tick - stop the idle tick from the idle task
  *
diff --git a/kernel/time/timer_list.c b/kernel/time/timer_list.c
index 1a4a7dd..ab8f5e3 100644
--- a/kernel/time/timer_list.c
+++ b/kernel/time/timer_list.c
@@ -176,6 +176,7 @@ static void print_cpu(struct seq_file *m, int cpu, u64 now)
 		P_ns(idle_waketime);
 		P_ns(idle_exittime);
 		P_ns(idle_sleeptime);
+		P_ns(iowait_sleeptime);
 		P(last_jiffies);
 		P(next_jiffies);
 		P_ns(idle_expires);
-- 
1.6.2.5


^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH 7/7] ondemand: Solve the big performance issue with ondemand during disk IO
  2010-04-18 18:59 [PATCH 0/7] Fix performance issue with ondemand governor Arjan van de Ven
                   ` (5 preceding siblings ...)
  2010-04-18 19:03 ` [PATCH 6/7] sched: introduce get_cpu_iowait_time_us() Arjan van de Ven
@ 2010-04-18 19:03 ` Arjan van de Ven
  2010-04-19  8:29   ` Éric Piel
                     ` (4 more replies)
  2010-04-26 21:41 ` [PATCH 0/7] Fix performance issue with ondemand governor Dave Jones
  7 siblings, 5 replies; 53+ messages in thread
From: Arjan van de Ven @ 2010-04-18 19:03 UTC (permalink / raw)
  To: linux-kernel; +Cc: Arjan van de Ven, akpm, mingo, peterz, tglx, davej, cpufreq

>From 27966bedabea83c4f3ae77507eceb746b1f6ebae Mon Sep 17 00:00:00 2001
From: Arjan van de Ven <arjan@linux.intel.com>
Date: Sun, 18 Apr 2010 11:15:56 -0700
Subject: [PATCH 7/7] ondemand: Solve the big performance issue with ondemand during disk IO

The ondemand cpufreq governor uses CPU busy time (e.g. not-idle time) as
a measure for scaling the CPU frequency up or down.
If the CPU is busy, the CPU frequency scales up, if it's idle, the CPU
frequency scales down. Effectively, it uses the CPU busy time as proxy
variable for the more nebulous "how critical is performance right now"
question.

This algorithm falls flat on its face in the light of workloads where
you're alternatingly disk and CPU bound, such as the ever popular
"git grep", but also things like startup of programs and maildir using
email clients... much to the chagarin of Andrew Morton.

This patch changes the ondemand algorithm to count iowait time as busy,
not idle, time. As shown in the breakdown cases above, iowait is performance
critical often, and by counting iowait, the proxy variable becomes a more
accurate representation of the "how critical is performance" question.

The problem and fix are both verified with the "perf timechar" tool.

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
---
 drivers/cpufreq/cpufreq_ondemand.c |   30 ++++++++++++++++++++++++++++--
 1 files changed, 28 insertions(+), 2 deletions(-)

diff --git a/drivers/cpufreq/cpufreq_ondemand.c b/drivers/cpufreq/cpufreq_ondemand.c
index bd444dc..ed472f8 100644
--- a/drivers/cpufreq/cpufreq_ondemand.c
+++ b/drivers/cpufreq/cpufreq_ondemand.c
@@ -73,6 +73,7 @@ enum {DBS_NORMAL_SAMPLE, DBS_SUB_SAMPLE};
 
 struct cpu_dbs_info_s {
 	cputime64_t prev_cpu_idle;
+	cputime64_t prev_cpu_iowait;
 	cputime64_t prev_cpu_wall;
 	cputime64_t prev_cpu_nice;
 	struct cpufreq_policy *cur_policy;
@@ -148,6 +149,16 @@ static inline cputime64_t get_cpu_idle_time(unsigned int cpu, cputime64_t *wall)
 	return idle_time;
 }
 
+static inline cputime64_t get_cpu_iowait_time(unsigned int cpu, cputime64_t *wall)
+{
+	u64 iowait_time = get_cpu_iowait_time_us(cpu, wall);
+
+	if (iowait_time == -1ULL)
+		return 0;
+
+	return iowait_time;
+}
+
 /*
  * Find right freq to be set now with powersave_bias on.
  * Returns the freq_hi to be used right now and will set freq_hi_jiffies,
@@ -470,14 +481,15 @@ static void dbs_check_cpu(struct cpu_dbs_info_s *this_dbs_info)
 
 	for_each_cpu(j, policy->cpus) {
 		struct cpu_dbs_info_s *j_dbs_info;
-		cputime64_t cur_wall_time, cur_idle_time;
-		unsigned int idle_time, wall_time;
+		cputime64_t cur_wall_time, cur_idle_time, cur_iowait_time;
+		unsigned int idle_time, wall_time, iowait_time;
 		unsigned int load, load_freq;
 		int freq_avg;
 
 		j_dbs_info = &per_cpu(od_cpu_dbs_info, j);
 
 		cur_idle_time = get_cpu_idle_time(j, &cur_wall_time);
+		cur_iowait_time = get_cpu_iowait_time(j, &cur_wall_time);
 
 		wall_time = (unsigned int) cputime64_sub(cur_wall_time,
 				j_dbs_info->prev_cpu_wall);
@@ -487,6 +499,10 @@ static void dbs_check_cpu(struct cpu_dbs_info_s *this_dbs_info)
 				j_dbs_info->prev_cpu_idle);
 		j_dbs_info->prev_cpu_idle = cur_idle_time;
 
+		iowait_time = (unsigned int) cputime64_sub(cur_iowait_time,
+				j_dbs_info->prev_cpu_iowait);
+		j_dbs_info->prev_cpu_iowait = cur_iowait_time;
+
 		if (dbs_tuners_ins.ignore_nice) {
 			cputime64_t cur_nice;
 			unsigned long cur_nice_jiffies;
@@ -504,6 +520,16 @@ static void dbs_check_cpu(struct cpu_dbs_info_s *this_dbs_info)
 			idle_time += jiffies_to_usecs(cur_nice_jiffies);
 		}
 
+		/*
+		 * For the purpose of ondemand, waiting for disk IO is an
+		 * indication that you're performance critical, and not that
+		 * the system is actually idle. So subtract the iowait time
+		 * from the cpu idle time.
+		 */
+
+		if (idle_time >= iowait_time)
+			idle_time -= iowait_time;
+
 		if (unlikely(!wall_time || wall_time < idle_time))
 			continue;
 
-- 
1.6.2.5


^ permalink raw reply related	[flat|nested] 53+ messages in thread

* Re: [PATCH 7/7] ondemand: Solve the big performance issue with ondemand during disk IO
  2010-04-18 19:03 ` [PATCH 7/7] ondemand: Solve the big performance issue with ondemand during disk IO Arjan van de Ven
@ 2010-04-19  8:29   ` Éric Piel
  2010-04-19 13:43     ` Arjan van de Ven
  2010-04-19  9:09   ` Tvrtko Ursulin
                     ` (3 subsequent siblings)
  4 siblings, 1 reply; 53+ messages in thread
From: Éric Piel @ 2010-04-19  8:29 UTC (permalink / raw)
  To: Arjan van de Ven; +Cc: linux-kernel, akpm, mingo, peterz, tglx, davej, cpufreq

On 18/04/10 21:03, Arjan van de Ven wrote:
> From 27966bedabea83c4f3ae77507eceb746b1f6ebae Mon Sep 17 00:00:00 2001
> From: Arjan van de Ven <arjan@linux.intel.com>
> Date: Sun, 18 Apr 2010 11:15:56 -0700
> Subject: [PATCH 7/7] ondemand: Solve the big performance issue with
> ondemand during disk IO
> 
> The ondemand cpufreq governor uses CPU busy time (e.g. not-idle time) as
> a measure for scaling the CPU frequency up or down.
> If the CPU is busy, the CPU frequency scales up, if it's idle, the CPU
> frequency scales down. Effectively, it uses the CPU busy time as proxy
> variable for the more nebulous "how critical is performance right now"
> question.
> 
> This algorithm falls flat on its face in the light of workloads where
> you're alternatingly disk and CPU bound, such as the ever popular
> "git grep", but also things like startup of programs and maildir using
> email clients... much to the chagarin of Andrew Morton.
> 
> This patch changes the ondemand algorithm to count iowait time as busy,
> not idle, time. As shown in the breakdown cases above, iowait is performance
> critical often, and by counting iowait, the proxy variable becomes a more
> accurate representation of the "how critical is performance" question.
> 
> The problem and fix are both verified with the "perf timechar" tool.
Hi,
I don't doubt that keeping the cpu full frequency during IO can improve
some specific workloads, however in your log message you don't explain
how much we are loosing.

How much more energy is consumed when doing a "updatedb" or "find /" ?

I'm not saying your patch is wrong, but I'm saying that if you tweak the
ondemand governor just by looking at performance and not at energy
consumption, as it seems in your log message, it will soon look very
similar to the performance governor! So it would be good to check the
whole spectrum of what needs to be optimised :-)

Cheers,
Eric

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH 7/7] ondemand: Solve the big performance issue with ondemand during disk IO
  2010-04-18 19:03 ` [PATCH 7/7] ondemand: Solve the big performance issue with ondemand during disk IO Arjan van de Ven
  2010-04-19  8:29   ` Éric Piel
@ 2010-04-19  9:09   ` Tvrtko Ursulin
  2010-04-19 13:46     ` Arjan van de Ven
  2010-04-20  9:29   ` Thomas Renninger
                     ` (2 subsequent siblings)
  4 siblings, 1 reply; 53+ messages in thread
From: Tvrtko Ursulin @ 2010-04-19  9:09 UTC (permalink / raw)
  To: Arjan van de Ven; +Cc: linux-kernel, akpm, mingo, peterz, tglx, davej, cpufreq

On Sunday 18 Apr 2010 20:03:46 Arjan van de Ven wrote:
> The ondemand cpufreq governor uses CPU busy time (e.g. not-idle time) as
> a measure for scaling the CPU frequency up or down.
> If the CPU is busy, the CPU frequency scales up, if it's idle, the CPU
> frequency scales down. Effectively, it uses the CPU busy time as proxy
> variable for the more nebulous "how critical is performance right now"
> question.
>
> This algorithm falls flat on its face in the light of workloads where
> you're alternatingly disk and CPU bound, such as the ever popular
> "git grep", but also things like startup of programs and maildir using
> email clients... much to the chagarin of Andrew Morton.
>
> This patch changes the ondemand algorithm to count iowait time as busy,
> not idle, time. As shown in the breakdown cases above, iowait is
>  performance critical often, and by counting iowait, the proxy variable
>  becomes a more accurate representation of the "how critical is
>  performance" question.

Is the improvement really because IO benefited from CPU being held at a higher
frequency, or perhaps because it is now not scaled down during IO, so when CPU
intensive part of git grep comes along it is already "revved up"?

Or in other words, does a pure IO workload benefit from now higher selected
frequency?

One idea I had but a) never had time to implement it and b) was not sure it
would be accepted anyway, was to modify ondemand governor to ramp up
instantly, but slow down slowly (in a configurable way). Or to make it a hybrid
of conservative and ondemand in a way. (This was all long time ago so perhaps
today what I wanted to do then is already achievable with some knobs.) It was
not an idea out of thin air, but based on unpleasant desktop latencies with
ondemand, while performance governor was of course perfect.

Tvrtko




Sophos Plc, The Pentagon, Abingdon Science Park, Abingdon, OX14 3YP, United Kingdom.
Company Reg No 2096520. VAT Reg No GB 348 3873 20.

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH 7/7] ondemand: Solve the big performance issue with ondemand during disk IO
  2010-04-19  8:29   ` Éric Piel
@ 2010-04-19 13:43     ` Arjan van de Ven
  2010-04-19 14:30       ` Éric Piel
  2010-04-20  9:24       ` Thomas Renninger
  0 siblings, 2 replies; 53+ messages in thread
From: Arjan van de Ven @ 2010-04-19 13:43 UTC (permalink / raw)
  To: Éric Piel; +Cc: linux-kernel, akpm, mingo, peterz, tglx, davej, cpufreq

On Mon, 19 Apr 2010 10:29:47 +0200
Éric Piel <eric.piel@tremplin-utc.net> wrote:
> > 
> > The problem and fix are both verified with the "perf timechar" tool.
> Hi,
> I don't doubt that keeping the cpu full frequency during IO can
> improve some specific workloads, however in your log message you
> don't explain how much we are loosing.

first of all, it's so bad that people will just turn the whole power
management off... at which point fixing the really bad bug is actually
quite a win 

> 
> How much more energy is consumed when doing a "updatedb" or "find /" ?

on the machines I used this on (Core i7's) it's actually hardly
measurable. All CPUs I have access to turn the voltage entirely off
during idle, so while the frequency is higher, if you're mostly IO
bound, it's only for very short durations... while still being mostly
"off".

-- 
Arjan van de Ven 	Intel Open Source Technology Centre
For development, discussion and tips for power savings, 
visit http://www.lesswatts.org

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH 7/7] ondemand: Solve the big performance issue with ondemand during disk IO
  2010-04-19  9:09   ` Tvrtko Ursulin
@ 2010-04-19 13:46     ` Arjan van de Ven
  2010-04-19 15:29       ` Tvrtko Ursulin
  0 siblings, 1 reply; 53+ messages in thread
From: Arjan van de Ven @ 2010-04-19 13:46 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: linux-kernel, akpm, mingo, peterz, tglx, davej, cpufreq

On Mon, 19 Apr 2010 10:09:55 +0100
Tvrtko Ursulin <tvrtko.ursulin@sophos.com> wrote:

> Is the improvement really because IO benefited from CPU being held at
> a higher frequency, or perhaps because it is now not scaled down
> during IO, so when CPU intensive part of git grep comes along it is
> already "revved up"?

the IO part does not get much faster (some systems lower the FSB speed
at low frequencies, but disks are still much slower than the FSB
anyway), but the CPU part, which is performance critical for, say,
"git grep" does go faster.


> 
> Or in other words, does a pure IO workload benefit from now higher
> selected frequency?

no.
Mixed workloads do.
but pure IO workloads also don't suffer since while idle, the voltage
goes down anyway.


> 
> One idea I had but a) never had time to implement it and b) was not
> sure it would be accepted anyway, was to modify ondemand governor to
> ramp up instantly, but slow down slowly (in a configurable way).

that's what ondemand does already.


-- 
Arjan van de Ven 	Intel Open Source Technology Centre
For development, discussion and tips for power savings, 
visit http://www.lesswatts.org

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH 7/7] ondemand: Solve the big performance issue with ondemand during disk IO
  2010-04-19 13:43     ` Arjan van de Ven
@ 2010-04-19 14:30       ` Éric Piel
  2010-04-19 14:47         ` Arjan van de Ven
  2010-04-20  9:24       ` Thomas Renninger
  1 sibling, 1 reply; 53+ messages in thread
From: Éric Piel @ 2010-04-19 14:30 UTC (permalink / raw)
  To: Arjan van de Ven; +Cc: linux-kernel, akpm, mingo, peterz, tglx, davej, cpufreq

On 19/04/10 15:43, Arjan van de Ven wrote:
> On Mon, 19 Apr 2010 10:29:47 +0200
> Éric Piel <eric.piel@tremplin-utc.net> wrote:
>>>
>>> The problem and fix are both verified with the "perf timechar" tool.
>> Hi,
>> I don't doubt that keeping the cpu full frequency during IO can
>> improve some specific workloads, however in your log message you
>> don't explain how much we are loosing.
> 
> first of all, it's so bad that people will just turn the whole power
> management off... at which point fixing the really bad bug is actually
> quite a win 
Fair enough, and I have to fully agree, it's better to have a power
management which consumes a bit more than the unreachable optimal, than
having everyone switch it off completely.


>> How much more energy is consumed when doing a "updatedb" or "find /" ?
> 
> on the machines I used this on (Core i7's) it's actually hardly
> measurable. All CPUs I have access to turn the voltage entirely off
> during idle, so while the frequency is higher, if you're mostly IO
> bound, it's only for very short durations... while still being mostly
> "off".
Yes, but keep in mind the Linux ondemand governor should not be tweaked
only for the latest Intel CPUs. I've just done a very little and
_extremely rough_ measurement on my laptop with a Core Duo 2, and while
it seems that, indeed, at idle the frequency didn't matter for the
consumption (about 12.5W with any governor), when running updatedb (so
IO bound), the performance governor seems to consume more than ondemand
and powersave (14.8-15.8W instead of 14.0-14.6W). I'm very careful with
the results of this "experiment" because it's only using the ACPI report
for the power usage and done with many other programs in the background.
Nevertheless, it manages to convince me that this change is not going to
be as harmless for the power consumption as you suggest.

Don't take me wrong. Here, I'm not saying that this patch is bad per se.
In particular, I do understand the specific workloads it tries to
handle, and I don't have a better solution for it in mind. However, this
is quite a change in the ondemand governor logic, and the log message do
not mention anything about energy consumption. This triggers a big
warning "only half of the problem was looked at to get a solution"!

So if this patch ever goes in, it should at least have a better
changelog which describes what are the potential consequences on the
power consumption.

Cheers,
Eric

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH 7/7] ondemand: Solve the big performance issue with ondemand during disk IO
  2010-04-19 14:30       ` Éric Piel
@ 2010-04-19 14:47         ` Arjan van de Ven
  0 siblings, 0 replies; 53+ messages in thread
From: Arjan van de Ven @ 2010-04-19 14:47 UTC (permalink / raw)
  To: Éric Piel; +Cc: linux-kernel, akpm, mingo, peterz, tglx, davej, cpufreq

> > on the machines I used this on (Core i7's) it's actually hardly
> > measurable. All CPUs I have access to turn the voltage entirely off
> > during idle, so while the frequency is higher, if you're mostly IO
> > bound, it's only for very short durations... while still being
> > mostly "off".
> Yes, but keep in mind the Linux ondemand governor should not be
> tweaked only for the latest Intel CPUs.

this is not just "latest Intel".  Just about everyone does this.


> I've just done a very little
> and _extremely rough_ measurement on my laptop with a Core Duo 2, and
> while it seems that, indeed, at idle the frequency didn't matter for
> the consumption (about 12.5W with any governor), when running
> updatedb (so IO bound), the performance governor seems to consume
> more than ondemand and powersave (14.8-15.8W instead of 14.0-14.6W).
> I'm very careful with the results of this "experiment" because it's
> only using the ACPI report for the power usage and done with many
> other programs in the background. Nevertheless, it manages to
> convince me that this change is not going to be as harmless for the
> power consumption as you suggest.

be careful; you're measuring power not energy. You also need to take
into account that things are now done quicker, so that you can then be
idle longer later! So yes instantaneous you're using a bit more power
(since you're getting much more performance), but you're done much
quicker as well.... Once you do this the equation changes, and it's
more or less a wash.


As for your general "ondemand is for everyone" concern; there are many
things wrong with ondemand, and I'm writing a new governor to fix the
more fundamental issues with it (and also, frankly, so that I won't
break existing users and hardware I don't have access to). This is
basically a backport of a specific feature of my new governor to
ondemand because Andrew keeps hitting the really bad case and basically
ended up turning power management off.


-- 
Arjan van de Ven 	Intel Open Source Technology Centre
For development, discussion and tips for power savings, 
visit http://www.lesswatts.org

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH 7/7] ondemand: Solve the big performance issue with ondemand during disk IO
  2010-04-19 13:46     ` Arjan van de Ven
@ 2010-04-19 15:29       ` Tvrtko Ursulin
  2010-04-20  0:47         ` Arjan van de Ven
  0 siblings, 1 reply; 53+ messages in thread
From: Tvrtko Ursulin @ 2010-04-19 15:29 UTC (permalink / raw)
  To: Arjan van de Ven; +Cc: linux-kernel, akpm, mingo, peterz, tglx, davej, cpufreq

On Monday 19 Apr 2010 14:46:17 Arjan van de Ven wrote:
> On Mon, 19 Apr 2010 10:09:55 +0100
> > Or in other words, does a pure IO workload benefit from now higher
> > selected frequency?
>
> no.
> Mixed workloads do.
> but pure IO workloads also don't suffer since while idle, the voltage
> goes down anyway.

You mean that higher frequency does not have effect on power use if CPU is
idle? Is that true for all/most processors?

> > One idea I had but a) never had time to implement it and b) was not
> > sure it would be accepted anyway, was to modify ondemand governor to
> > ramp up instantly, but slow down slowly (in a configurable way).
>
> that's what ondemand does already.

How and where in the code and how to enable that behaviour? From my
experiments frequency goes down to minimum as soon as load goes away. What I
was talking about is gradual lowering over a configurable period. It is not
power efficient, but it could be good for latency in some workloads.

Tvrtko

Sophos Plc, The Pentagon, Abingdon Science Park, Abingdon, OX14 3YP, United Kingdom.
Company Reg No 2096520. VAT Reg No GB 348 3873 20.

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH 7/7] ondemand: Solve the big performance issue with ondemand during disk IO
  2010-04-19 15:29       ` Tvrtko Ursulin
@ 2010-04-20  0:47         ` Arjan van de Ven
  2010-04-20  9:10           ` Tvrtko Ursulin
  2010-04-23  5:26           ` Pavel Machek
  0 siblings, 2 replies; 53+ messages in thread
From: Arjan van de Ven @ 2010-04-20  0:47 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: linux-kernel, akpm, mingo, peterz, tglx, davej, cpufreq

On Mon, 19 Apr 2010 16:29:39 +0100
Tvrtko Ursulin <tvrtko.ursulin@sophos.com> wrote:

> On Monday 19 Apr 2010 14:46:17 Arjan van de Ven wrote:
> > On Mon, 19 Apr 2010 10:09:55 +0100
> > > Or in other words, does a pure IO workload benefit from now higher
> > > selected frequency?
> >
> > no.
> > Mixed workloads do.
> > but pure IO workloads also don't suffer since while idle, the
> > voltage goes down anyway.
> 
> You mean that higher frequency does not have effect on power use if
> CPU is idle? Is that true for all/most processors?

this is true for most processors that I'm aware of.
there's exceptions for things like where the idle time is really short,
where going up and down in voltage will take more energy than it'll
save and such.

> 
> > > One idea I had but a) never had time to implement it and b) was
> > > not sure it would be accepted anyway, was to modify ondemand
> > > governor to ramp up instantly, but slow down slowly (in a
> > > configurable way).
> >
> > that's what ondemand does already.
> 
> How and where in the code and how to enable that behaviour? From my
> experiments frequency goes down to minimum as soon as load goes away.
> What I was talking about is gradual lowering over a configurable
> period. It is not power efficient, but it could be good for latency
> in some workloads.

it's not even good for that ;-(

it's better then to stay high longer... at least on modern machines the
inbetween states are pretty much either useless or actually energy
hurting compared to the higher state.


> 
> Tvrtko
> 
> Sophos Plc, The Pentagon, Abingdon Science Park, Abingdon, OX14 3YP,
> United Kingdom. Company Reg No 2096520. VAT Reg No GB 348 3873 20.


-- 
Arjan van de Ven 	Intel Open Source Technology Centre
For development, discussion and tips for power savings, 
visit http://www.lesswatts.org

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH 7/7] ondemand: Solve the big performance issue with ondemand during disk IO
  2010-04-20  0:47         ` Arjan van de Ven
@ 2010-04-20  9:10           ` Tvrtko Ursulin
  2010-04-20 11:02             ` Arjan van de Ven
  2010-04-23  5:26           ` Pavel Machek
  1 sibling, 1 reply; 53+ messages in thread
From: Tvrtko Ursulin @ 2010-04-20  9:10 UTC (permalink / raw)
  To: Arjan van de Ven; +Cc: linux-kernel, akpm, mingo, peterz, tglx, davej, cpufreq

On Tuesday 20 Apr 2010 01:47:02 Arjan van de Ven wrote:
> > > > One idea I had but a) never had time to implement it and b) was
> > > > not sure it would be accepted anyway, was to modify ondemand
> > > > governor to ramp up instantly, but slow down slowly (in a
> > > > configurable way).
> > >
> > > that's what ondemand does already.
> >
> > How and where in the code and how to enable that behaviour? From my
> > experiments frequency goes down to minimum as soon as load goes away.
> > What I was talking about is gradual lowering over a configurable
> > period. It is not power efficient, but it could be good for latency
> > in some workloads.
>
> it's not even good for that ;-(
>
> it's better then to stay high longer... at least on modern machines the
> inbetween states are pretty much either useless or actually energy
> hurting compared to the higher state.

Why do you think it is not good for latency to stay at higher frequency for
longer? I had a machine until recently with a relatively slow Turion64 which
had 800Mhz minimum and 1.8Ghz maximum frequency states. With ondemand governor
when web browsing for example, frequency would drop to 800Mhz as soon as
scrolling or such stopped, and then was pushed back up to max on user
interaction. Overall experience was sluggish, but when switching to
performance governor it was much much better. That is why I proposed to have a
gradual lowering as an option in on-demand. You said it already does that - I
ask are you sure? And also now you are saying it would not be good for latency
- above is an example when it clearly does help (a lot).

Would this idea of gradual lowering help for the "git grep" use case as well?
To me it sounds better than taking IO wait into account.  What happens with
your scheme with a pure IO load? Ok, you say power use is not determined by
frequency, fine, but it sounds slightly wrong to max the frequency on pure IO
wait. If frequency per se really does not waste power then gradual lowering
could both "git grep" and my web browsing latency use case, what do you think?

Tvrtko

Sophos Plc, The Pentagon, Abingdon Science Park, Abingdon, OX14 3YP, United Kingdom.
Company Reg No 2096520. VAT Reg No GB 348 3873 20.

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH 7/7] ondemand: Solve the big performance issue with ondemand during disk IO
  2010-04-19 13:43     ` Arjan van de Ven
  2010-04-19 14:30       ` Éric Piel
@ 2010-04-20  9:24       ` Thomas Renninger
  2010-04-27  0:29         ` Mike Chan
  1 sibling, 1 reply; 53+ messages in thread
From: Thomas Renninger @ 2010-04-20  9:24 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: Éric Piel, linux-kernel, akpm, mingo, peterz, tglx, davej, cpufreq

On Monday 19 April 2010 15:43:25 Arjan van de Ven wrote:
> On Mon, 19 Apr 2010 10:29:47 +0200
> Éric Piel <eric.piel@tremplin-utc.net> wrote:
> > > 
> > > The problem and fix are both verified with the "perf timechar" tool.
> > Hi,
> > I don't doubt that keeping the cpu full frequency during IO can
> > improve some specific workloads, however in your log message you
> > don't explain how much we are loosing.
> 
> first of all, it's so bad that people will just turn the whole power
> management off... at which point fixing the really bad bug is actually
> quite a win
Not sure you fix a bug, I expect this was done on purpose.
The ondemand governor disadvantages processes with alternating short CPU
load peaks and idle sequences.
IO bound processes typically show up with such a behavior.

But I follow Eric and agree that if it costs that much, changing
above sounds sane.
Still, I could imagine some people might want to not raise freq on IO bound
process activity, therefore this should get another ondemand param, similar
to ignore_nice_load.

    Thomas

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH 7/7] ondemand: Solve the big performance issue with ondemand during disk IO
  2010-04-18 19:03 ` [PATCH 7/7] ondemand: Solve the big performance issue with ondemand during disk IO Arjan van de Ven
  2010-04-19  8:29   ` Éric Piel
  2010-04-19  9:09   ` Tvrtko Ursulin
@ 2010-04-20  9:29   ` Thomas Renninger
  2010-04-20 11:07     ` Arjan van de Ven
  2010-04-23  5:24   ` Pavel Machek
  2010-04-26 21:30   ` Rik van Riel
  4 siblings, 1 reply; 53+ messages in thread
From: Thomas Renninger @ 2010-04-20  9:29 UTC (permalink / raw)
  To: Arjan van de Ven; +Cc: linux-kernel, akpm, mingo, peterz, tglx, davej, cpufreq

On Sunday 18 April 2010 21:03:46 Arjan van de Ven wrote:
> From 27966bedabea83c4f3ae77507eceb746b1f6ebae Mon Sep 17 00:00:00 2001
> From: Arjan van de Ven <arjan@linux.intel.com>
> Date: Sun, 18 Apr 2010 11:15:56 -0700
> Subject: [PATCH 7/7] ondemand: Solve the big performance issue with ondemand during disk IO

 
> This algorithm falls flat on its face in the light of workloads where
> you're alternatingly disk and CPU bound, such as the ever popular
> "git grep", but also things like startup of programs and maildir using
> email clients... much to the chagarin of Andrew Morton.
Have you looked at:
/sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_transition_latency
and
/sys/devices/system/cpu/cpu0/cpufreq/ondemand/sampling_rate
Transition latency comes from ACPI tables and sampling rate depends
on it.
Reducing the sampling rate, significantly reduces performance loss.
There were bugs in this area, on latest kernels with latest (MSR switching)
HW you should see:
cat /sys/devices/system/cpu/cpu0/cpufreq/ondemand/sampling_rate
10000

    Thomas

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH 7/7] ondemand: Solve the big performance issue with ondemand during disk IO
  2010-04-20  9:10           ` Tvrtko Ursulin
@ 2010-04-20 11:02             ` Arjan van de Ven
  2010-04-28  8:57               ` Tvrtko Ursulin
  0 siblings, 1 reply; 53+ messages in thread
From: Arjan van de Ven @ 2010-04-20 11:02 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: linux-kernel, akpm, mingo, peterz, tglx, davej, cpufreq

On Tue, 20 Apr 2010 10:10:49 +0100
Tvrtko Ursulin <tvrtko.ursulin@sophos.com> wrote:
> >
> > it's better then to stay high longer... at least on modern machines
> > the inbetween states are pretty much either useless or actually
> > energy hurting compared to the higher state.
> 
> Why do you think it is not good for latency to stay at higher
> frequency for longer?

oh it will be. 

> governor it was much much better. That is why I proposed to have a
> gradual lowering as an option in on-demand. You said it already does
> that - I ask are you sure? And also now you are saying it would not
> be good for latency
> - above is an example when it clearly does help (a lot).

but the *gradual* lowering just does not make sense.
might as well just stay at the highest frequency for the duration for
which you do the gradual lowering, it's more efficient for power on most
modern machines.

The problem is if you do this gradual thing for long enough to help
you on desktop loads, or the stay-high thing, on server workloads the
power efficiency will completely and utterly tank. Been there tried
that ;-)


frankly, you're starting to touch at the more fundamental issues with
ondemand. I'm not trying to solve that in this small patch, they really
can only be solved in a complete redesign. While I'm doing such a
redesign, I was asked to at least fix this particular gap urgently.
Solving all of ondemand's issues needs a complete rethink of the core
assumptions it makes (which are not correct anymore nowadays).

-- 
Arjan van de Ven 	Intel Open Source Technology Centre
For development, discussion and tips for power savings, 
visit http://www.lesswatts.org

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH 7/7] ondemand: Solve the big performance issue with ondemand during disk IO
  2010-04-20  9:29   ` Thomas Renninger
@ 2010-04-20 11:07     ` Arjan van de Ven
  0 siblings, 0 replies; 53+ messages in thread
From: Arjan van de Ven @ 2010-04-20 11:07 UTC (permalink / raw)
  To: Thomas Renninger; +Cc: linux-kernel, akpm, mingo, peterz, tglx, davej, cpufreq

On Tue, 20 Apr 2010 11:29:02 +0200
Thomas Renninger <trenn@suse.de> wrote:

> Have you looked at:
> /sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_transition_latency
> and
> /sys/devices/system/cpu/cpu0/cpufreq/ondemand/sampling_rate
> Transition latency comes from ACPI tables and sampling rate depends
> on it.

of course I have; in practice on systems i use it is always a 10
milliseconds interval, which is the minimum it gets.
Yes this was a bug, no it wasn't the bug here ;-)

> Reducing the sampling rate, significantly reduces performance loss

while it does, it is not nearly sufficient for the alternating IO/CPU
cases I've looked at. I've looked at many timecharts for various IO/CPU
workloads, including my normal own use as well as Andrews and the CPU
busy periods are in the 1 - 20 msec range between IOs most of the time,
for which a 10 msec sampling is obviously problematic. 

-- 
Arjan van de Ven 	Intel Open Source Technology Centre
For development, discussion and tips for power savings, 
visit http://www.lesswatts.org

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH 7/7] ondemand: Solve the big performance issue with ondemand during disk IO
  2010-04-18 19:03 ` [PATCH 7/7] ondemand: Solve the big performance issue with ondemand during disk IO Arjan van de Ven
                     ` (2 preceding siblings ...)
  2010-04-20  9:29   ` Thomas Renninger
@ 2010-04-23  5:24   ` Pavel Machek
  2010-04-23  5:38     ` Willy Tarreau
  2010-04-23 13:52     ` Arjan van de Ven
  2010-04-26 21:30   ` Rik van Riel
  4 siblings, 2 replies; 53+ messages in thread
From: Pavel Machek @ 2010-04-23  5:24 UTC (permalink / raw)
  To: Arjan van de Ven; +Cc: linux-kernel, akpm, mingo, peterz, tglx, davej, cpufreq

Hi!

> The ondemand cpufreq governor uses CPU busy time (e.g. not-idle time) as
> a measure for scaling the CPU frequency up or down.
> If the CPU is busy, the CPU frequency scales up, if it's idle, the CPU
> frequency scales down. Effectively, it uses the CPU busy time as proxy
> variable for the more nebulous "how critical is performance right now"
> question.
> 
> This algorithm falls flat on its face in the light of workloads where
> you're alternatingly disk and CPU bound, such as the ever popular
> "git grep", but also things like startup of programs and maildir using
> email clients... much to the chagarin of Andrew Morton.
> 
> This patch changes the ondemand algorithm to count iowait time as busy,
> not idle, time. As shown in the breakdown cases above, iowait is performance
> critical often, and by counting iowait, the proxy variable becomes a more
> accurate representation of the "how critical is performance"
> question.

Well, and now, if you do something like cat /dev/<your usb1.1 hdd> >
/dev/null, you'll keep cpu on max frequency. Not a problem for new
core i7, but probably big deal for athlon 64.

Maybe modern cpus can and should simply react faster?

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH 7/7] ondemand: Solve the big performance issue with ondemand during disk IO
  2010-04-20  0:47         ` Arjan van de Ven
  2010-04-20  9:10           ` Tvrtko Ursulin
@ 2010-04-23  5:26           ` Pavel Machek
  1 sibling, 0 replies; 53+ messages in thread
From: Pavel Machek @ 2010-04-23  5:26 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: Tvrtko Ursulin, linux-kernel, akpm, mingo, peterz, tglx, davej, cpufreq

On Mon 2010-04-19 17:47:02, Arjan van de Ven wrote:
> On Mon, 19 Apr 2010 16:29:39 +0100
> Tvrtko Ursulin <tvrtko.ursulin@sophos.com> wrote:
> 
> > On Monday 19 Apr 2010 14:46:17 Arjan van de Ven wrote:
> > > On Mon, 19 Apr 2010 10:09:55 +0100
> > > > Or in other words, does a pure IO workload benefit from now higher
> > > > selected frequency?
> > >
> > > no.
> > > Mixed workloads do.
> > > but pure IO workloads also don't suffer since while idle, the
> > > voltage goes down anyway.
> > 
> > You mean that higher frequency does not have effect on power use if
> > CPU is idle? Is that true for all/most processors?
> 
> this is true for most processors that I'm aware of.
> there's exceptions for things like where the idle time is really short,

Is not that exactly what will happen for 'cat /dev/<usb1>' case?

Plus I suspect that older cpus are slower at changing voltages, and
slower at powering down when idle...

> > How and where in the code and how to enable that behaviour? From my
> > experiments frequency goes down to minimum as soon as load goes away.
> > What I was talking about is gradual lowering over a configurable
> > period. It is not power efficient, but it could be good for latency
> > in some workloads.
> 
> it's not even good for that ;-(
> 
> it's better then to stay high longer... at least on modern machines the
> inbetween states are pretty much either useless or actually energy
> hurting compared to the higher state.

So what about hiding those from ondemand on modern hw?
								Pavel

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH 7/7] ondemand: Solve the big performance issue with ondemand during disk IO
  2010-04-23  5:24   ` Pavel Machek
@ 2010-04-23  5:38     ` Willy Tarreau
  2010-04-23  8:50       ` Thomas Renninger
  2010-04-23 14:10       ` [PATCH 7/7] ondemand: Solve the big performance issue with ondemand during disk IO Arjan van de Ven
  2010-04-23 13:52     ` Arjan van de Ven
  1 sibling, 2 replies; 53+ messages in thread
From: Willy Tarreau @ 2010-04-23  5:38 UTC (permalink / raw)
  To: Pavel Machek
  Cc: Arjan van de Ven, linux-kernel, akpm, mingo, peterz, tglx, davej,
	cpufreq

On Fri, Apr 23, 2010 at 07:24:39AM +0200, Pavel Machek wrote:
> Hi!
> 
> > The ondemand cpufreq governor uses CPU busy time (e.g. not-idle time) as
> > a measure for scaling the CPU frequency up or down.
> > If the CPU is busy, the CPU frequency scales up, if it's idle, the CPU
> > frequency scales down. Effectively, it uses the CPU busy time as proxy
> > variable for the more nebulous "how critical is performance right now"
> > question.
> > 
> > This algorithm falls flat on its face in the light of workloads where
> > you're alternatingly disk and CPU bound, such as the ever popular
> > "git grep", but also things like startup of programs and maildir using
> > email clients... much to the chagarin of Andrew Morton.
> > 
> > This patch changes the ondemand algorithm to count iowait time as busy,
> > not idle, time. As shown in the breakdown cases above, iowait is performance
> > critical often, and by counting iowait, the proxy variable becomes a more
> > accurate representation of the "how critical is performance"
> > question.
> 
> Well, and now, if you do something like cat /dev/<your usb1.1 hdd> >
> /dev/null, you'll keep cpu on max frequency. Not a problem for new
> core i7, but probably big deal for athlon 64.

So that also means that my notebook's CPU fan will spin like mad as soon
as I access a USB key ? It will not help the key go faster...

Probably that iowait should only change the time required to go back to
low frequency. That way, if there is no CPU nor I/O activity, we can
switch back to a low frequency quickly, but if we see I/O activity, we
could decide to wait for 1 second (for instance) for CPU idle before
switching back to low frequency.

> Maybe modern cpus can and should simply react faster?

It's above all that we should not try to switch too fast if we know the
CPU can't follow.

Willy


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH 7/7] ondemand: Solve the big performance issue with ondemand during disk IO
  2010-04-23 13:52     ` Arjan van de Ven
@ 2010-04-23  8:38       ` Pavel Machek
  2010-04-23 16:06         ` Arjan van de Ven
  2010-04-24  4:56       ` Pavel Machek
  1 sibling, 1 reply; 53+ messages in thread
From: Pavel Machek @ 2010-04-23  8:38 UTC (permalink / raw)
  To: Arjan van de Ven; +Cc: linux-kernel, akpm, mingo, peterz, tglx, davej, cpufreq

On Fri 2010-04-23 06:52:48, Arjan van de Ven wrote:
> On Fri, 23 Apr 2010 07:24:39 +0200
> Pavel Machek <pavel@ucw.cz> wrote:
> 
> > 
> > Well, and now, if you do something like cat /dev/<your usb1.1 hdd> >
> > /dev/null, you'll keep cpu on max frequency. Not a problem for new
> > core i7, but probably big deal for athlon 64.
> 
> do you have facts not speculation for this? Does the athlon 64
> really

You want the patch applied, you should be able to justify it.

> keep its voltage high during idle? That would surprise me greatly...
> (and if it does, does it matter? the clock is stopped anyway there)

Yes, I believe it keeps voltage up, and as a leakage is big part of
power consumption there, stopped clocks will not help much.

I believe you are developing on wrong machine. Seems like core i7 just
wants max frequency, all the time. Older designs were not like that.

Do you have early intel speedstep machine near you?

> the only place where my patch makes a real difference is when the cpu
> is idle due to blocking IO! So do you have data that the athlon 64 gets
> too hot if you select a high frequency during an idle period, where the
> clock is already stopped?

iirc even idle power consumption was much higher when on max
voltage... I'll get some numbers from my old notes; I no longer have
the hw.

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH 7/7] ondemand: Solve the big performance issue with ondemand during disk IO
  2010-04-23  5:38     ` Willy Tarreau
@ 2010-04-23  8:50       ` Thomas Renninger
  2010-04-23 16:08         ` Arjan van de Ven
  2010-04-23 14:10       ` [PATCH 7/7] ondemand: Solve the big performance issue with ondemand during disk IO Arjan van de Ven
  1 sibling, 1 reply; 53+ messages in thread
From: Thomas Renninger @ 2010-04-23  8:50 UTC (permalink / raw)
  To: Willy Tarreau
  Cc: Pavel Machek, Arjan van de Ven, linux-kernel, akpm, mingo,
	peterz, tglx, davej, cpufreq

On Friday 23 April 2010 07:38:58 Willy Tarreau wrote:
> On Fri, Apr 23, 2010 at 07:24:39AM +0200, Pavel Machek wrote:
> > Hi!
> > 
... 
> > > This patch changes the ondemand algorithm to count iowait time as busy,
> > > not idle, time. As shown in the breakdown cases above, iowait is performance
> > > critical often, and by counting iowait, the proxy variable becomes a more
> > > accurate representation of the "how critical is performance"
> > > question.
> > 
> > Well, and now, if you do something like cat /dev/<your usb1.1 hdd> >
> > /dev/null, you'll keep cpu on max frequency. Not a problem for new
> > core i7, but probably big deal for athlon 64.
> 
> So that also means that my notebook's CPU fan will spin like mad as soon
> as I access a USB key ? It will not help the key go faster...
> 
> Probably that iowait should only change the time required to go back to
> low frequency. That way, if there is no CPU nor I/O activity, we can
> switch back to a low frequency quickly, but if we see I/O activity, we
> could decide to wait for 1 second (for instance)
that sounds wrong.
> for CPU idle before switching back to low frequency.
I'd just make this decission (consider IO time yes/no) configurable.
A userspace daemon/app could then e.g. switch when on battery, AC, if
it's a laptop in general or if user switched to a silent/powersave mode
when being in a library.
It may be more efficient to ramp the freq up during the whole IO process and
this is crucial in the server area, but most laptop/desktop users are happy
as long as the data for their video/audio stream is fetched from the device
quickly enough for displaying.
Especially on battery, users will appreciate some minutes of more battery
lifetime and do not care about some ms of IO latencies.
You would cut of this elementary cpufreq feature for the ondemand governor
with these patches.

    Thomas

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH 7/7] ondemand: Solve the big performance issue with ondemand during disk IO
  2010-04-23  5:24   ` Pavel Machek
  2010-04-23  5:38     ` Willy Tarreau
@ 2010-04-23 13:52     ` Arjan van de Ven
  2010-04-23  8:38       ` Pavel Machek
  2010-04-24  4:56       ` Pavel Machek
  1 sibling, 2 replies; 53+ messages in thread
From: Arjan van de Ven @ 2010-04-23 13:52 UTC (permalink / raw)
  To: Pavel Machek; +Cc: linux-kernel, akpm, mingo, peterz, tglx, davej, cpufreq

On Fri, 23 Apr 2010 07:24:39 +0200
Pavel Machek <pavel@ucw.cz> wrote:

> 
> Well, and now, if you do something like cat /dev/<your usb1.1 hdd> >
> /dev/null, you'll keep cpu on max frequency. Not a problem for new
> core i7, but probably big deal for athlon 64.

do you have facts not speculation for this? Does the athlon 64 really
keep its voltage high during idle? That would surprise me greatly...
(and if it does, does it matter? the clock is stopped anyway there)

the only place where my patch makes a real difference is when the cpu
is idle due to blocking IO! So do you have data that the athlon 64 gets
too hot if you select a high frequency during an idle period, where the
clock is already stopped?



-- 
Arjan van de Ven 	Intel Open Source Technology Centre
For development, discussion and tips for power savings, 
visit http://www.lesswatts.org

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH 7/7] ondemand: Solve the big performance issue with ondemand during disk IO
  2010-04-23  5:38     ` Willy Tarreau
  2010-04-23  8:50       ` Thomas Renninger
@ 2010-04-23 14:10       ` Arjan van de Ven
  2010-04-23 15:35         ` Willy Tarreau
  1 sibling, 1 reply; 53+ messages in thread
From: Arjan van de Ven @ 2010-04-23 14:10 UTC (permalink / raw)
  To: Willy Tarreau
  Cc: Pavel Machek, linux-kernel, akpm, mingo, peterz, tglx, davej, cpufreq

On Fri, 23 Apr 2010 07:38:58 +0200
Willy Tarreau <w@1wt.eu> wrote:

> On Fri, Apr 23, 2010 at 07:24:39AM +0200, Pavel Machek wrote:
> > Hi!
> > 
> > > The ondemand cpufreq governor uses CPU busy time (e.g. not-idle
> > > time) as a measure for scaling the CPU frequency up or down.
> > > If the CPU is busy, the CPU frequency scales up, if it's idle,
> > > the CPU frequency scales down. Effectively, it uses the CPU busy
> > > time as proxy variable for the more nebulous "how critical is
> > > performance right now" question.
> > > 
> > > This algorithm falls flat on its face in the light of workloads
> > > where you're alternatingly disk and CPU bound, such as the ever
> > > popular "git grep", but also things like startup of programs and
> > > maildir using email clients... much to the chagarin of Andrew
> > > Morton.
> > > 
> > > This patch changes the ondemand algorithm to count iowait time as
> > > busy, not idle, time. As shown in the breakdown cases above,
> > > iowait is performance critical often, and by counting iowait, the
> > > proxy variable becomes a more accurate representation of the "how
> > > critical is performance" question.
> > 
> > Well, and now, if you do something like cat /dev/<your usb1.1 hdd> >
> > /dev/null, you'll keep cpu on max frequency. Not a problem for new
> > core i7, but probably big deal for athlon 64.
> 
> So that also means that my notebook's CPU fan will spin like mad as
> soon as I access a USB key ? 

unlikely. your notebook CPU will stop its clocks, if not drop its
voltage, during idle. So during that time the frequency is 0; the only
difference is in how much leakage you get from a higher voltage
for those CPUs that do not powergate or drop their cpu in idle.
Most CPUs that I know of do either or both of that.



-- 
Arjan van de Ven 	Intel Open Source Technology Centre
For development, discussion and tips for power savings, 
visit http://www.lesswatts.org

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH 7/7] ondemand: Solve the big performance issue with ondemand during disk IO
  2010-04-23 14:10       ` [PATCH 7/7] ondemand: Solve the big performance issue with ondemand during disk IO Arjan van de Ven
@ 2010-04-23 15:35         ` Willy Tarreau
  0 siblings, 0 replies; 53+ messages in thread
From: Willy Tarreau @ 2010-04-23 15:35 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: Pavel Machek, linux-kernel, akpm, mingo, peterz, tglx, davej, cpufreq

On Fri, Apr 23, 2010 at 07:10:50AM -0700, Arjan van de Ven wrote:
> On Fri, 23 Apr 2010 07:38:58 +0200
> Willy Tarreau <w@1wt.eu> wrote:
> 
> > On Fri, Apr 23, 2010 at 07:24:39AM +0200, Pavel Machek wrote:
> > > Hi!
> > > 
> > > > The ondemand cpufreq governor uses CPU busy time (e.g. not-idle
> > > > time) as a measure for scaling the CPU frequency up or down.
> > > > If the CPU is busy, the CPU frequency scales up, if it's idle,
> > > > the CPU frequency scales down. Effectively, it uses the CPU busy
> > > > time as proxy variable for the more nebulous "how critical is
> > > > performance right now" question.
> > > > 
> > > > This algorithm falls flat on its face in the light of workloads
> > > > where you're alternatingly disk and CPU bound, such as the ever
> > > > popular "git grep", but also things like startup of programs and
> > > > maildir using email clients... much to the chagarin of Andrew
> > > > Morton.
> > > > 
> > > > This patch changes the ondemand algorithm to count iowait time as
> > > > busy, not idle, time. As shown in the breakdown cases above,
> > > > iowait is performance critical often, and by counting iowait, the
> > > > proxy variable becomes a more accurate representation of the "how
> > > > critical is performance" question.
> > > 
> > > Well, and now, if you do something like cat /dev/<your usb1.1 hdd> >
> > > /dev/null, you'll keep cpu on max frequency. Not a problem for new
> > > core i7, but probably big deal for athlon 64.
> > 
> > So that also means that my notebook's CPU fan will spin like mad as
> > soon as I access a USB key ? 
> 
> unlikely. your notebook CPU will stop its clocks, if not drop its
> voltage, during idle. So during that time the frequency is 0; the only
> difference is in how much leakage you get from a higher voltage
> for those CPUs that do not powergate or drop their cpu in idle.
> Most CPUs that I know of do either or both of that.

OK, I will have to give that a run on my Atom and Pentium-M, both of
which provide me with a very long run on battery precisely because
they almost always run at the lowest possible frequency and voltage.

On the pentium-M, I already know there is a difference between full
voltage and low voltage even when idle. Its lowest frequency is 600
MHz, and my kernel still has the old phc patch to lower clock and
voltage. When idle on this kernel, the fan is completely stopped
and never wakes up. When idle on a 2.6.32 (which does not have the
patch), its voltage doesn't drop though the frequency does, and the
fan runs at 1/4. It also lasts shorter (about 30 min less). So that
means that even when idle there's a noticeable difference.

As was suggested, if the difference appears to be significant, maybe
a /sys tunable would allow all users to select their preferred mode.
Alternatively, we could have two distinct ondemand settings, one for
pure CPU and another one for CPU+I/O.

Regards,
Willy


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH 7/7] ondemand: Solve the big performance issue with ondemand during disk IO
  2010-04-23  8:38       ` Pavel Machek
@ 2010-04-23 16:06         ` Arjan van de Ven
  0 siblings, 0 replies; 53+ messages in thread
From: Arjan van de Ven @ 2010-04-23 16:06 UTC (permalink / raw)
  To: Pavel Machek; +Cc: linux-kernel, akpm, mingo, peterz, tglx, davej, cpufreq

On Fri, 23 Apr 2010 10:38:28 +0200
Pavel Machek <pavel@ucw.cz> wrote:

> On Fri 2010-04-23 06:52:48, Arjan van de Ven wrote:
> > On Fri, 23 Apr 2010 07:24:39 +0200
> > Pavel Machek <pavel@ucw.cz> wrote:
> > 
> > > 
> > > Well, and now, if you do something like cat /dev/<your usb1.1
> > > hdd> > /dev/null, you'll keep cpu on max frequency. Not a problem
> > > hdd> > for new
> > > core i7, but probably big deal for athlon 64.
> > 
> > do you have facts not speculation for this? Does the athlon 64
> > really
> 
> You want the patch applied, you should be able to justify it.

You make a claim... all I am asking if you are doing just random guess
or basing this on facts.

The machines I have access to don't show any impact during actual idle,
because they stop clocks and generally even lower the voltage. 
You make a claim that a certain machine does not do either... all I'm
asking if that claim is based on data or on speculation.

> 
> > keep its voltage high during idle? That would surprise me greatly...
> > (and if it does, does it matter? the clock is stopped anyway there)
> 
> Yes, I believe it keeps voltage up, and as a leakage is big part of
> power consumption there, stopped clocks will not help much.

again do you have actual data? 

 
> I believe you are developing on wrong machine. Seems like core i7 just
> wants max frequency, all the time. Older designs were not like that.
> 
> Do you have early intel speedstep machine near you?

oh I use many different machines. the intel machines at least stop the
clocks, and for a really long time also lower the frequency in idle.
(especially during deeper C states, but even during C1)

> 
> > the only place where my patch makes a real difference is when the
> > cpu is idle due to blocking IO! So do you have data that the athlon
> > 64 gets too hot if you select a high frequency during an idle
> > period, where the clock is already stopped?
> 
> iirc even idle power consumption was much higher when on max
> voltage... I'll get some numbers from my old notes; I no longer have
> the hw.

make sure it's data based on tickless... without tickless we were never
really idle ;-(



-- 
Arjan van de Ven 	Intel Open Source Technology Centre
For development, discussion and tips for power savings, 
visit http://www.lesswatts.org

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH 7/7] ondemand: Solve the big performance issue with ondemand during disk IO
  2010-04-23  8:50       ` Thomas Renninger
@ 2010-04-23 16:08         ` Arjan van de Ven
  2010-04-27 11:39           ` Thomas Renninger
  0 siblings, 1 reply; 53+ messages in thread
From: Arjan van de Ven @ 2010-04-23 16:08 UTC (permalink / raw)
  To: Thomas Renninger
  Cc: Willy Tarreau, Pavel Machek, linux-kernel, akpm, mingo, peterz,
	tglx, davej, cpufreq

On Fri, 23 Apr 2010 10:50:10 +0200
Thomas Renninger <trenn@suse.de> wrote:
 Especially on battery, users will appreciate some minutes
> of more battery lifetime and do not care about some ms of IO
> latencies. 

the assumption that power doesn't matter on AC is a huge fiction
that any data center operator would love to get out of everyones head
as quickly as possible.




-- 
Arjan van de Ven 	Intel Open Source Technology Centre
For development, discussion and tips for power savings, 
visit http://www.lesswatts.org

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH 7/7] ondemand: Solve the big performance issue with ondemand during disk IO
  2010-04-23 13:52     ` Arjan van de Ven
  2010-04-23  8:38       ` Pavel Machek
@ 2010-04-24  4:56       ` Pavel Machek
  2010-05-01 23:29         ` Arjan van de Ven
  1 sibling, 1 reply; 53+ messages in thread
From: Pavel Machek @ 2010-04-24  4:56 UTC (permalink / raw)
  To: Arjan van de Ven; +Cc: linux-kernel, akpm, mingo, peterz, tglx, davej, cpufreq

Hi!

> > Well, and now, if you do something like cat /dev/<your usb1.1 hdd> >
> > /dev/null, you'll keep cpu on max frequency. Not a problem for new
> > core i7, but probably big deal for athlon 64.
> 
> do you have facts not speculation for this? Does the athlon 64 really
> keep its voltage high during idle? That would surprise me greatly...

So... some old data. It is not exactly athlon 64 -- I don't have that
particular number for it -- but: (from my old notes):

evo n620c [63.3 Wh]
~~~~~~~~~~~~~~~~~
idle machine at 1.7GHz, min bl: 13.0 W
idle machine at 600MHz, min bl: 12.0 W

thinkpad x32 [52Wh]
~~~~~~~~~~~~
idle machine at 600MHz, min bl:         10 W
             at 1.8GHz:                 +6.6 W

hp nx5k [56.8Wh]
~~~~~~~
idle, min bl:                   19 W
      min bl, 1.4GHz:           22 W

...so yes, I kind of see a pattern there. And you should be able to
easily see the difference, too, if you took something from that era...

									Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH 1/7] sched: add a comment to get_cpu_idle_time_us()
  2010-04-18 19:00 ` [PATCH 1/7] sched: add a comment to get_cpu_idle_time_us() Arjan van de Ven
@ 2010-04-26 19:25   ` Rik van Riel
  0 siblings, 0 replies; 53+ messages in thread
From: Rik van Riel @ 2010-04-26 19:25 UTC (permalink / raw)
  To: Arjan van de Ven; +Cc: linux-kernel, akpm, mingo, peterz, tglx, davej, cpufreq

On 04/18/2010 03:00 PM, Arjan van de Ven wrote:
>  From 6f9df3bc6571d6545c552151f408d69265e15f92 Mon Sep 17 00:00:00 2001
> From: Arjan van de Ven<arjan@linux.intel.com>
> Date: Sun, 18 Apr 2010 10:25:19 -0700
> Subject: [PATCH 1/7] sched: add a comment to get_cpu_idle_time_us()
>
> The exported function get_cpu_idle_time_us() has no comment
> describing it; add a kerneldoc comment
>
> Signed-off-by: Arjan van de Ven<arjan@linux.intel.com>

Reviewed-by: Rik van Riel <riel@redhat.com>

-- 
All rights reversed

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH 2/7] sched: introduce a function to update the idle statistics
  2010-04-18 19:01 ` [PATCH 2/7] sched: introduce a function to update the idle statistics Arjan van de Ven
@ 2010-04-26 20:11   ` Rik van Riel
  0 siblings, 0 replies; 53+ messages in thread
From: Rik van Riel @ 2010-04-26 20:11 UTC (permalink / raw)
  To: Arjan van de Ven; +Cc: linux-kernel, akpm, mingo, peterz, tglx, davej, cpufreq

On 04/18/2010 03:01 PM, Arjan van de Ven wrote:
>  From 166b7526ccfea8b44626b6023ff5b0a8eb869bb3 Mon Sep 17 00:00:00 2001
> From: Arjan van de Ven<arjan@linux.intel.com>
> Date: Sun, 18 Apr 2010 10:33:02 -0700
> Subject: [PATCH 2/7] sched: introduce a function to update the idle statistics
>
> Currently, two places update the idle statistics (and more to
> come later in this series).
>
> This patch creates a helper function for updating these statistics.
>
> Signed-off-by: Arjan van de Ven<arjan@linux.intel.com>

Reviewed-by: Rik van Riel <riel@redhat.com>

-- 
All rights reversed

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH 3/7] sched: update the idle statistics in get_cpu_idle_time_us()
  2010-04-18 19:01 ` [PATCH 3/7] sched: update the idle statistics in get_cpu_idle_time_us() Arjan van de Ven
@ 2010-04-26 20:36   ` Rik van Riel
  0 siblings, 0 replies; 53+ messages in thread
From: Rik van Riel @ 2010-04-26 20:36 UTC (permalink / raw)
  To: Arjan van de Ven; +Cc: linux-kernel, akpm, mingo, peterz, tglx, davej, cpufreq

On 04/18/2010 03:01 PM, Arjan van de Ven wrote:
>  From 60851b131900af03bf013afef69f3bcdbb04f1d6 Mon Sep 17 00:00:00 2001
> From: Arjan van de Ven<arjan@linux.intel.com>
> Date: Sun, 18 Apr 2010 10:41:30 -0700
> Subject: [PATCH 3/7] sched: update the idle statistics in get_cpu_idle_time_us()
>
> Right now, get_cpu_idle_time_us() only reports the idle statistics
> upto the point the CPU entered last idle; not what is valid right now.
>
> This patch adds an update of the idle statistics to get_cpu_idle_time_us(),
> so that calling this function always returns statistics that are accurate
> at the point of the call.
>
> This includes resetting the start of the idle time for accounting purposes
> to avoid double accounting.
>
> Signed-off-by: Arjan van de Ven<arjan@linux.intel.com>

Reviewed-by: Rik van Riel <riel@redhat.com>

-- 
All rights reversed

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH 4/7] sched: fold updating of the last update time into update_ts_time_stats()
  2010-04-18 19:02 ` [PATCH 4/7] sched: fold updating of the last update time into update_ts_time_stats() Arjan van de Ven
@ 2010-04-26 20:58   ` Rik van Riel
  0 siblings, 0 replies; 53+ messages in thread
From: Rik van Riel @ 2010-04-26 20:58 UTC (permalink / raw)
  To: Arjan van de Ven; +Cc: linux-kernel, akpm, mingo, peterz, tglx, davej, cpufreq

On 04/18/2010 03:02 PM, Arjan van de Ven wrote:
>  From e75d6cd203e43ea4c5e9919f19e2882c066491b8 Mon Sep 17 00:00:00 2001
> From: Arjan van de Ven<arjan@linux.intel.com>
> Date: Sun, 18 Apr 2010 10:47:02 -0700
> Subject: [PATCH 4/7] sched: fold updating of the last update time into update_ts_time_stats()
>
> This patch folds the updating of the last_update_time into the
> update_ts_time_stats() function, and updates the callers.
>
> This allows for further cleanups that are done in the next patch.
>
> Signed-off-by: Arjan van de Ven<arjan@linux.intel.com>

Reviewed-by: Rik van Riel <riel@redhat.com>

-- 
All rights reversed

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH 5/7] sched: eliminate the ts->idle_lastupdate field
  2010-04-18 19:02 ` [PATCH 5/7] sched: eliminate the ts->idle_lastupdate field Arjan van de Ven
@ 2010-04-26 21:00   ` Rik van Riel
  0 siblings, 0 replies; 53+ messages in thread
From: Rik van Riel @ 2010-04-26 21:00 UTC (permalink / raw)
  To: Arjan van de Ven; +Cc: linux-kernel, akpm, mingo, peterz, tglx, davej, cpufreq

On 04/18/2010 03:02 PM, Arjan van de Ven wrote:
>  From 526a9f347d5a953f37b67b4b2afb39d7b4d77a92 Mon Sep 17 00:00:00 2001
> From: Arjan van de Ven<arjan@linux.intel.com>
> Date: Sun, 18 Apr 2010 10:49:30 -0700
> Subject: [PATCH 5/7] sched: eliminate the ts->idle_lastupdate field
>
> Now that the only user of ts->idle_lastupdate is update_ts_time_stats(),
> the entire field can be eliminated.
>
> In update_ts_time_stats(), idle_lastupdate is first set to "now",
> and a few lines later, the only user is an if() statement that
> assigns a variable either to "now" or to ts->idle_lastupdate,
> which has the value of "now" at that point.
>
> Signed-off-by: Arjan van de Ven<arjan@linux.intel.com>

Reviewed-by: Rik van Riel <riel@redhat.com>

-- 
All rights reversed

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH 6/7] sched: introduce get_cpu_iowait_time_us()
  2010-04-18 19:03 ` [PATCH 6/7] sched: introduce get_cpu_iowait_time_us() Arjan van de Ven
@ 2010-04-26 21:05   ` Rik van Riel
  0 siblings, 0 replies; 53+ messages in thread
From: Rik van Riel @ 2010-04-26 21:05 UTC (permalink / raw)
  To: Arjan van de Ven; +Cc: linux-kernel, akpm, mingo, peterz, tglx, davej, cpufreq

On 04/18/2010 03:03 PM, Arjan van de Ven wrote:
>  From c4dd11703034f2ecbc3180603663fab14c292d7c Mon Sep 17 00:00:00 2001
> From: Arjan van de Ven<arjan@linux.intel.com>
> Date: Sun, 18 Apr 2010 10:57:43 -0700
> Subject: [PATCH 6/7] sched: introduce get_cpu_iowait_time_us()
>
> For the ondemand cpufreq governor, it is desired that the iowait
> time is microaccounted in a similar way as idle time is.
>
> This patch introduces the infrastructure to account and expose
> this information via the get_cpu_iowait_time_us() function.
>
> Signed-off-by: Arjan van de Ven<arjan@linux.intel.com>

Reviewed-by: Rik van Riel <riel@redhat.com>

-- 
All rights reversed

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH 7/7] ondemand: Solve the big performance issue with ondemand during disk IO
  2010-04-18 19:03 ` [PATCH 7/7] ondemand: Solve the big performance issue with ondemand during disk IO Arjan van de Ven
                     ` (3 preceding siblings ...)
  2010-04-23  5:24   ` Pavel Machek
@ 2010-04-26 21:30   ` Rik van Riel
  4 siblings, 0 replies; 53+ messages in thread
From: Rik van Riel @ 2010-04-26 21:30 UTC (permalink / raw)
  To: Arjan van de Ven; +Cc: linux-kernel, akpm, mingo, peterz, tglx, davej, cpufreq

On 04/18/2010 03:03 PM, Arjan van de Ven wrote:
>  From 27966bedabea83c4f3ae77507eceb746b1f6ebae Mon Sep 17 00:00:00 2001
> From: Arjan van de Ven<arjan@linux.intel.com>
> Date: Sun, 18 Apr 2010 11:15:56 -0700
> Subject: [PATCH 7/7] ondemand: Solve the big performance issue with ondemand during disk IO
>
> The ondemand cpufreq governor uses CPU busy time (e.g. not-idle time) as
> a measure for scaling the CPU frequency up or down.
> If the CPU is busy, the CPU frequency scales up, if it's idle, the CPU
> frequency scales down. Effectively, it uses the CPU busy time as proxy
> variable for the more nebulous "how critical is performance right now"
> question.
>
> This algorithm falls flat on its face in the light of workloads where
> you're alternatingly disk and CPU bound, such as the ever popular
> "git grep", but also things like startup of programs and maildir using
> email clients... much to the chagarin of Andrew Morton.
>
> This patch changes the ondemand algorithm to count iowait time as busy,
> not idle, time. As shown in the breakdown cases above, iowait is performance
> critical often, and by counting iowait, the proxy variable becomes a more
> accurate representation of the "how critical is performance" question.
>
> The problem and fix are both verified with the "perf timechar" tool.
>
> Signed-off-by: Arjan van de Ven<arjan@linux.intel.com>

Reviewed-by: Rik van Riel <riel@redhat.com>

-- 
All rights reversed

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH 0/7] Fix performance issue with ondemand governor
  2010-04-18 18:59 [PATCH 0/7] Fix performance issue with ondemand governor Arjan van de Ven
                   ` (6 preceding siblings ...)
  2010-04-18 19:03 ` [PATCH 7/7] ondemand: Solve the big performance issue with ondemand during disk IO Arjan van de Ven
@ 2010-04-26 21:41 ` Dave Jones
  2010-04-26 21:45   ` Dominik Brodowski
  7 siblings, 1 reply; 53+ messages in thread
From: Dave Jones @ 2010-04-26 21:41 UTC (permalink / raw)
  To: Arjan van de Ven; +Cc: linux-kernel, akpm, mingo, peterz, tglx, cpufreq

On Sun, Apr 18, 2010 at 11:59:49AM -0700, Arjan van de Ven wrote:
 > There have been various reports of the ondemand governor causing some
 > serious performance issues, one of the latest ones from Andrew.
 > There are several fundamental issues with ondemand (being worked on),
 > but the report from Andrew can be fixed relatively easily.
 > 
 > The fundamental issue is that ondemand will go to a (too) low CPU
 > frequency for workloads that alternatingly disk and CPU bound...

given the scheduler changes, these probably make more sense to go
via one of Ingo's trees ?
If everyone agrees, feel free to add my Signed-off-by:

	Dave


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH 0/7] Fix performance issue with ondemand governor
  2010-04-26 21:41 ` [PATCH 0/7] Fix performance issue with ondemand governor Dave Jones
@ 2010-04-26 21:45   ` Dominik Brodowski
  2010-04-26 21:59     ` Rik van Riel
  0 siblings, 1 reply; 53+ messages in thread
From: Dominik Brodowski @ 2010-04-26 21:45 UTC (permalink / raw)
  To: Dave Jones, Arjan van de Ven, linux-kernel, akpm, mingo, peterz,
	tglx, cpufreq

On Mon, Apr 26, 2010 at 05:41:56PM -0400, Dave Jones wrote:
> On Sun, Apr 18, 2010 at 11:59:49AM -0700, Arjan van de Ven wrote:
>  > There have been various reports of the ondemand governor causing some
>  > serious performance issues, one of the latest ones from Andrew.
>  > There are several fundamental issues with ondemand (being worked on),
>  > but the report from Andrew can be fixed relatively easily.
>  > 
>  > The fundamental issue is that ondemand will go to a (too) low CPU
>  > frequency for workloads that alternatingly disk and CPU bound...
> 
> given the scheduler changes, these probably make more sense to go
> via one of Ingo's trees ?
> If everyone agrees, feel free to add my Signed-off-by:

What about the issue Pavel reported and Willy confirmed (regression
on older hardware)?

Best,
	Dominik

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH 0/7] Fix performance issue with ondemand governor
  2010-04-26 21:45   ` Dominik Brodowski
@ 2010-04-26 21:59     ` Rik van Riel
  2010-04-26 22:05       ` Dominik Brodowski
  0 siblings, 1 reply; 53+ messages in thread
From: Rik van Riel @ 2010-04-26 21:59 UTC (permalink / raw)
  To: Dominik Brodowski, Dave Jones, Arjan van de Ven, linux-kernel,
	akpm, mingo, peterz, tglx, cpufreq

On 04/26/2010 05:45 PM, Dominik Brodowski wrote:

>> given the scheduler changes, these probably make more sense to go
>> via one of Ingo's trees ?
>> If everyone agrees, feel free to add my Signed-off-by:
>
> What about the issue Pavel reported and Willy confirmed (regression
> on older hardware)?

Having a /sys tunable makes sense, especially when running
on battery power.  I guess that can be introduced as a patch
8/7 :)

-- 
All rights reversed

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH 0/7] Fix performance issue with ondemand governor
  2010-04-26 21:59     ` Rik van Riel
@ 2010-04-26 22:05       ` Dominik Brodowski
  0 siblings, 0 replies; 53+ messages in thread
From: Dominik Brodowski @ 2010-04-26 22:05 UTC (permalink / raw)
  To: Rik van Riel
  Cc: Dave Jones, Arjan van de Ven, linux-kernel, akpm, mingo, peterz,
	tglx, cpufreq

On Mon, Apr 26, 2010 at 05:59:39PM -0400, Rik van Riel wrote:
> On 04/26/2010 05:45 PM, Dominik Brodowski wrote:
>
>>> given the scheduler changes, these probably make more sense to go
>>> via one of Ingo's trees ?
>>> If everyone agrees, feel free to add my Signed-off-by:
>>
>> What about the issue Pavel reported and Willy confirmed (regression
>> on older hardware)?
>
> Having a /sys tunable makes sense, especially when running
> on battery power.  I guess that can be introduced as a patch
> 8/7 :)

... only if combined with sane default values.

Best,
	Dominik

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH 7/7] ondemand: Solve the big performance issue with  ondemand during disk IO
  2010-04-20  9:24       ` Thomas Renninger
@ 2010-04-27  0:29         ` Mike Chan
  2010-04-27 13:01           ` Pavel Machek
  0 siblings, 1 reply; 53+ messages in thread
From: Mike Chan @ 2010-04-27  0:29 UTC (permalink / raw)
  To: Thomas Renninger
  Cc: Arjan van de Ven, Éric Piel, linux-kernel, akpm, mingo,
	peterz, tglx, davej, cpufreq

On Tue, Apr 20, 2010 at 2:24 AM, Thomas Renninger <trenn@suse.de> wrote:
> On Monday 19 April 2010 15:43:25 Arjan van de Ven wrote:
>> On Mon, 19 Apr 2010 10:29:47 +0200
>> Éric Piel <eric.piel@tremplin-utc.net> wrote:
>> > >
>> > > The problem and fix are both verified with the "perf timechar" tool.
>> > Hi,
>> > I don't doubt that keeping the cpu full frequency during IO can
>> > improve some specific workloads, however in your log message you
>> > don't explain how much we are loosing.
>>
>> first of all, it's so bad that people will just turn the whole power
>> management off... at which point fixing the really bad bug is actually
>> quite a win
> Not sure you fix a bug, I expect this was done on purpose.
> The ondemand governor disadvantages processes with alternating short CPU
> load peaks and idle sequences.
> IO bound processes typically show up with such a behavior.
>
> But I follow Eric and agree that if it costs that much, changing
> above sounds sane.
> Still, I could imagine some people might want to not raise freq on IO bound
> process activity, therefore this should get another ondemand param, similar
> to ignore_nice_load.
>

I agree with Thomas here. Some of these assumptions on IO / FSB
performance with cpu speed do not hold true on various ARM platforms.

Perhaps we could have a min_io_freq value? Which is the min speed for
the cpu to run at for IO bound activity. In the original patch,
min_io_freq = scaling_max_freq. For various arm devices I can happily
set min_io_freq to the lowest cpu speed that satisfies bus speeds.

-- Mike

>    Thomas
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
>

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH 7/7] ondemand: Solve the big performance issue with ondemand during disk IO
  2010-04-23 16:08         ` Arjan van de Ven
@ 2010-04-27 11:39           ` Thomas Renninger
  2010-05-04  3:48             ` [PATCH 8/7] cpufreq: make the iowait-is-busy-time a sysfs tunable Arjan van de Ven
  0 siblings, 1 reply; 53+ messages in thread
From: Thomas Renninger @ 2010-04-27 11:39 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: Willy Tarreau, Pavel Machek, linux-kernel, akpm, mingo, peterz,
	tglx, davej, cpufreq

On Friday 23 April 2010 06:08:19 pm Arjan van de Ven wrote:
> On Fri, 23 Apr 2010 10:50:10 +0200
> Thomas Renninger <trenn@suse.de> wrote:
>  Especially on battery, users will appreciate some minutes
>
> > of more battery lifetime and do not care about some ms of IO
> > latencies.
>
> the assumption that power doesn't matter on AC is a huge fiction
> that any data center operator would love to get out of everyones head
> as quickly as possible.

Have I said power doesn't matter on AC?
Do you agree that a datacenter has different performance vs power 
tradeoff demands as a battery driven mobile device?

Back to the topic:
As you did not answer on my (several) sysfs knob request(s), I expect
you agree with it and will add one.

It's not only for real usage, but to give people a chance to easily
test and get feedback on different HW.

Thanks,

  Thomas

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH 7/7] ondemand: Solve the big performance issue with ondemand during disk IO
  2010-04-27  0:29         ` Mike Chan
@ 2010-04-27 13:01           ` Pavel Machek
  2010-04-27 18:10             ` Mike Chan
  0 siblings, 1 reply; 53+ messages in thread
From: Pavel Machek @ 2010-04-27 13:01 UTC (permalink / raw)
  To: Mike Chan
  Cc: Thomas Renninger, Arjan van de Ven, ?ric Piel, linux-kernel,
	akpm, mingo, peterz, tglx, davej, cpufreq

Hi!

> > But I follow Eric and agree that if it costs that much, changing
> > above sounds sane.
> > Still, I could imagine some people might want to not raise freq on IO bound
> > process activity, therefore this should get another ondemand param, similar
> > to ignore_nice_load.
> >
> 
> I agree with Thomas here. Some of these assumptions on IO / FSB
> performance with cpu speed do not hold true on various ARM platforms.
> 
> Perhaps we could have a min_io_freq value? Which is the min speed for
> the cpu to run at for IO bound activity. In the original patch,
> min_io_freq = scaling_max_freq. For various arm devices I can happily
> set min_io_freq to the lowest cpu speed that satisfies bus speeds.

'satisfies bus speeds' == minimum cpu frequency where i/o works at all

 or

 == minimum cpu frequency where i/o works at full speed

?
								Pavel

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH 7/7] ondemand: Solve the big performance issue with  ondemand during disk IO
  2010-04-27 13:01           ` Pavel Machek
@ 2010-04-27 18:10             ` Mike Chan
  0 siblings, 0 replies; 53+ messages in thread
From: Mike Chan @ 2010-04-27 18:10 UTC (permalink / raw)
  To: Pavel Machek
  Cc: Thomas Renninger, Arjan van de Ven, ?ric Piel, linux-kernel,
	akpm, mingo, peterz, tglx, davej, cpufreq

On Tue, Apr 27, 2010 at 6:01 AM, Pavel Machek <pavel@ucw.cz> wrote:
> Hi!
>
>> > But I follow Eric and agree that if it costs that much, changing
>> > above sounds sane.
>> > Still, I could imagine some people might want to not raise freq on IO bound
>> > process activity, therefore this should get another ondemand param, similar
>> > to ignore_nice_load.
>> >
>>
>> I agree with Thomas here. Some of these assumptions on IO / FSB
>> performance with cpu speed do not hold true on various ARM platforms.
>>
>> Perhaps we could have a min_io_freq value? Which is the min speed for
>> the cpu to run at for IO bound activity. In the original patch,
>> min_io_freq = scaling_max_freq. For various arm devices I can happily
>> set min_io_freq to the lowest cpu speed that satisfies bus speeds.
>
> 'satisfies bus speeds' == minimum cpu frequency where i/o works at all
>
>  or
>
>  == minimum cpu frequency where i/o works at full speed
>

Sorry for the confusion, I mean the later, cpu frequency where i/o
works at full speed. However this value can be set via sysfs, it could
be both. Really its up to the user, if they want max io performance,
minimum io performance or a happy medium.

-- Mike

> ?
>                                                                Pavel
>
> --
> (english) http://www.livejournal.com/~pavelmachek
> (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
>

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH 7/7] ondemand: Solve the big performance issue with ondemand during disk IO
  2010-04-20 11:02             ` Arjan van de Ven
@ 2010-04-28  8:57               ` Tvrtko Ursulin
  0 siblings, 0 replies; 53+ messages in thread
From: Tvrtko Ursulin @ 2010-04-28  8:57 UTC (permalink / raw)
  To: Arjan van de Ven; +Cc: linux-kernel, akpm, mingo, peterz, tglx, davej, cpufreq

On Tuesday 20 Apr 2010 12:02:07 Arjan van de Ven wrote:
> On Tue, 20 Apr 2010 10:10:49 +0100
>
> Tvrtko Ursulin <tvrtko.ursulin@sophos.com> wrote:
> > > it's better then to stay high longer... at least on modern machines
> > > the inbetween states are pretty much either useless or actually
> > > energy hurting compared to the higher state.
> >
> > Why do you think it is not good for latency to stay at higher
> > frequency for longer?
>
> oh it will be.

Well you said that "it's not even good for that ;-(" when I said that it may
be good for latency...

> > governor it was much much better. That is why I proposed to have a
> > gradual lowering as an option in on-demand. You said it already does
> > that - I ask are you sure? And also now you are saying it would not
> > be good for latency
> > - above is an example when it clearly does help (a lot).
>
> but the *gradual* lowering just does not make sense.
> might as well just stay at the highest frequency for the duration for
> which you do the gradual lowering, it's more efficient for power on most
> modern machines.
>
> The problem is if you do this gradual thing for long enough to help
> you on desktop loads, or the stay-high thing, on server workloads the
> power efficiency will completely and utterly tank. Been there tried
> that ;-)

Maybe it does not make sense for energy efficiency but it does for latency, I
think we have now agreed on that so it would be a nice knob to have. If I had
some time I would implement it if for nothing then to spice up this thread. :)

I also like Mike Chan's idea of minimum_full_io_speed_frequency parameter, in
addition to my gradual lowering.

> frankly, you're starting to touch at the more fundamental issues with
> ondemand. I'm not trying to solve that in this small patch, they really
> can only be solved in a complete redesign. While I'm doing such a
> redesign, I was asked to at least fix this particular gap urgently.
> Solving all of ondemand's issues needs a complete rethink of the core
> assumptions it makes (which are not correct anymore nowadays).

I understand that, although you gave no detail on how you designed this new
ondemand. Problem I have is that to me this workaround is quite ugly. If you
have to do it please add minimum_full_io_speed_frequency knob and also make it
default to current behaviour (no increase on iowait).

Tvrtko


Sophos Plc, The Pentagon, Abingdon Science Park, Abingdon, OX14 3YP, United Kingdom.
Company Reg No 2096520. VAT Reg No GB 348 3873 20.

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH 7/7] ondemand: Solve the big performance issue with ondemand during disk IO
  2010-04-24  4:56       ` Pavel Machek
@ 2010-05-01 23:29         ` Arjan van de Ven
  0 siblings, 0 replies; 53+ messages in thread
From: Arjan van de Ven @ 2010-05-01 23:29 UTC (permalink / raw)
  To: Pavel Machek; +Cc: linux-kernel, akpm, mingo, peterz, tglx, davej, cpufreq

On Sat, 24 Apr 2010 06:56:26 +0200
Pavel Machek <pavel@ucw.cz> wrote:

> 
> So... some old data. It is not exactly athlon 64 -- I don't have that
> particular number for it -- but: (from my old notes):
> 
> thinkpad x32 [52Wh]
> ~~~~~~~~~~~~
> idle machine at 600MHz, min bl:         10 W
>              at 1.8GHz:                 +6.6 W
> 
> ...so yes, I kind of see a pattern there. And you should be able to
> easily see the difference, too, if you took something from that era...


so I finally found a machine based on a Pentium M (similar to this
Thinkpad, at least according to google) and spend half a day getting its
Fedora 8 installation to compile a modern kernel (these guys don't
compile very fast....).

Turns out that things are not as simple as your data suggest.
The complexity is in USB autosuspend.
Without USB devices, I see:

"powersave"	9.7 / 9.8  Watts (alternating between these readings)
"ondemand"      9.7 / 9.8  Watts
"performance"   9.8 Watts (solid on 9.8, not alternating)

if all USB devices are in autosuspend, it's pretty much the same
picture.

But if there is an active USB device, the CPU will no longer go to C4,
but will be stuck in C2 (this is the last generation of Intel chipsets
that suffer from this behavior; "c2 popup" is what the ich4m is
missing). And the C2 power behavior does show sensitivity to the CPU
frequency... roughly 3 1/2 Watts for my machine.

Doing the 'cat' from a usb mount as you suggested, which implies active
USB DMA, shows this 3 1/2 Watts increase between the old and new
ondemand. At the same time, the performance went up a little as well (a
3.7Gb took 3m32.5s before, and now takes 3m20.5s, so that saves a
little of the energy back).

 


-- 
Arjan van de Ven 	Intel Open Source Technology Centre
For development, discussion and tips for power savings, 
visit http://www.lesswatts.org

^ permalink raw reply	[flat|nested] 53+ messages in thread

* [PATCH 8/7] cpufreq: make the iowait-is-busy-time a sysfs tunable
  2010-04-27 11:39           ` Thomas Renninger
@ 2010-05-04  3:48             ` Arjan van de Ven
  2010-05-04  4:16               ` Willy Tarreau
                                 ` (2 more replies)
  0 siblings, 3 replies; 53+ messages in thread
From: Arjan van de Ven @ 2010-05-04  3:48 UTC (permalink / raw)
  To: Thomas Renninger
  Cc: Willy Tarreau, Pavel Machek, linux-kernel, akpm, mingo, peterz,
	tglx, davej, cpufreq, riel

On Tue, 27 Apr 2010 13:39:34 +0200
Thomas Renninger <trenn@suse.de> wrote:

> On Friday 23 April 2010 06:08:19 pm Arjan van de Ven wrote:
> > On Fri, 23 Apr 2010 10:50:10 +0200
> > Thomas Renninger <trenn@suse.de> wrote:
> >  Especially on battery, users will appreciate some minutes
> >
> > > of more battery lifetime and do not care about some ms of IO
> > > latencies.
> >
> > the assumption that power doesn't matter on AC is a huge fiction
> > that any data center operator would love to get out of everyones
> > head as quickly as possible.
> 
> Have I said power doesn't matter on AC?
> Do you agree that a datacenter has different performance vs power 
> tradeoff demands as a battery driven mobile device?
> 
> Back to the topic:
> As you did not answer on my (several) sysfs knob request(s), I expect
> you agree with it and will add one.
> 

yup it makes sense to have a sysfs knob with a sane default value

From: Arjan van de Ven <arjan@linux.intel.com>
Subject: [PATCH] cpufreq: make the iowait-is-busy-time a sysfs tunable

Pavel Machek pointed out that not all CPUs have an efficient idle
at high frequency. Specifically, older Intel and various AMD cpus
would get a higher power usage when copying files from USB.

Mike Chan pointed out that the same is true for various ARM chips
as well.

Thomas Renninger suggested to make this a sysfs tunable with a
reasonable default.

This patch adds a sysfs tunable for the new behavior, and uses
a very simple function to determine a reasonable default, depending
on the CPU vendor/type.

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
---
 drivers/cpufreq/cpufreq_ondemand.c |   46 +++++++++++++++++++++++++++++++++++-
 1 files changed, 45 insertions(+), 1 deletions(-)

diff --git a/drivers/cpufreq/cpufreq_ondemand.c b/drivers/cpufreq/cpufreq_ondemand.c
index ed472f8..4877e8f 100644
--- a/drivers/cpufreq/cpufreq_ondemand.c
+++ b/drivers/cpufreq/cpufreq_ondemand.c
@@ -109,6 +109,7 @@ static struct dbs_tuners {
 	unsigned int down_differential;
 	unsigned int ignore_nice;
 	unsigned int powersave_bias;
+	unsigned int io_is_busy;
 } dbs_tuners_ins = {
 	.up_threshold = DEF_FREQUENCY_UP_THRESHOLD,
 	.down_differential = DEF_FREQUENCY_DOWN_DIFFERENTIAL,
@@ -260,6 +261,7 @@ static ssize_t show_##file_name						\
 	return sprintf(buf, "%u\n", dbs_tuners_ins.object);		\
 }
 show_one(sampling_rate, sampling_rate);
+show_one(io_is_busy, io_is_busy);
 show_one(up_threshold, up_threshold);
 show_one(ignore_nice_load, ignore_nice);
 show_one(powersave_bias, powersave_bias);
@@ -310,6 +312,22 @@ static ssize_t store_sampling_rate(struct kobject *a, struct attribute *b,
 	return count;
 }
 
+static ssize_t store_io_is_busy(struct kobject *a, struct attribute *b,
+				   const char *buf, size_t count)
+{
+	unsigned int input;
+	int ret;
+	ret = sscanf(buf, "%u", &input);
+	if (ret != 1)
+		return -EINVAL;
+
+	mutex_lock(&dbs_mutex);
+	dbs_tuners_ins.io_is_busy = !!input;
+	mutex_unlock(&dbs_mutex);
+
+	return count;
+}
+
 static ssize_t store_up_threshold(struct kobject *a, struct attribute *b,
 				  const char *buf, size_t count)
 {
@@ -392,6 +410,7 @@ static struct global_attr _name = \
 __ATTR(_name, 0644, show_##_name, store_##_name)
 
 define_one_rw(sampling_rate);
+define_one_rw(io_is_busy);
 define_one_rw(up_threshold);
 define_one_rw(ignore_nice_load);
 define_one_rw(powersave_bias);
@@ -403,6 +422,7 @@ static struct attribute *dbs_attributes[] = {
 	&up_threshold.attr,
 	&ignore_nice_load.attr,
 	&powersave_bias.attr,
+	&io_is_busy.attr,
 	NULL
 };
 
@@ -527,7 +547,7 @@ static void dbs_check_cpu(struct cpu_dbs_info_s *this_dbs_info)
 		 * from the cpu idle time.
 		 */
 
-		if (idle_time >= iowait_time)
+		if (dbs_tuners_ins.io_is_busy && idle_time >= iowait_time)
 			idle_time -= iowait_time;
 
 		if (unlikely(!wall_time || wall_time < idle_time))
@@ -643,6 +663,29 @@ static inline void dbs_timer_exit(struct cpu_dbs_info_s *dbs_info)
 	cancel_delayed_work_sync(&dbs_info->work);
 }
 
+/*
+ * Not all CPUs want IO time to be accounted as busy; this depends on how
+ * efficient idling at a higher frequency/voltage is.
+ * Pavel Machek says this is not so for various generations of AMD and old
+ * Intel systems.
+ * Mike Chan (android.com) says this is also not true for ARM.
+ * Because of this, whitelist specific known (series) of CPUs by default, and
+ * leave all others up to the user.
+ */
+static int should_io_be_busy(void)
+{
+#if defined(CONFIG_X86)
+	/*
+	 * For Intel, Core 2 (model 15) and later have an efficient idle.
+	 */
+	if (boot_cpu_data.x86_vendor == X86_VENDOR_INTEL &&
+	    boot_cpu_data.x86 == 6 &&
+	    boot_cpu_data.x86_model >= 15)
+		return 1;
+#endif
+	return 0;
+}
+
 static int cpufreq_governor_dbs(struct cpufreq_policy *policy,
 				   unsigned int event)
 {
@@ -705,6 +748,7 @@ static int cpufreq_governor_dbs(struct cpufreq_policy *policy,
 			dbs_tuners_ins.sampling_rate =
 				max(min_sampling_rate,
 				    latency * LATENCY_MULTIPLIER);
+			dbs_tuners_ins.io_is_busy = should_io_be_busy();
 		}
 		mutex_unlock(&dbs_mutex);
 
-- 
1.6.1.3




-- 
Arjan van de Ven 	Intel Open Source Technology Centre
For development, discussion and tips for power savings, 
visit http://www.lesswatts.org

^ permalink raw reply related	[flat|nested] 53+ messages in thread

* Re: [PATCH 8/7] cpufreq: make the iowait-is-busy-time a sysfs tunable
  2010-05-04  3:48             ` [PATCH 8/7] cpufreq: make the iowait-is-busy-time a sysfs tunable Arjan van de Ven
@ 2010-05-04  4:16               ` Willy Tarreau
  2010-05-04  5:43               ` Pavel Machek
  2010-05-04 13:51               ` Rik van Riel
  2 siblings, 0 replies; 53+ messages in thread
From: Willy Tarreau @ 2010-05-04  4:16 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: Thomas Renninger, Pavel Machek, linux-kernel, akpm, mingo,
	peterz, tglx, davej, cpufreq, riel

On Mon, May 03, 2010 at 08:48:18PM -0700, Arjan van de Ven wrote:
> From: Arjan van de Ven <arjan@linux.intel.com>
> Subject: [PATCH] cpufreq: make the iowait-is-busy-time a sysfs tunable
> 
> Pavel Machek pointed out that not all CPUs have an efficient idle
> at high frequency. Specifically, older Intel and various AMD cpus
> would get a higher power usage when copying files from USB.
> 
> Mike Chan pointed out that the same is true for various ARM chips
> as well.
> 
> Thomas Renninger suggested to make this a sysfs tunable with a
> reasonable default.
> 
> This patch adds a sysfs tunable for the new behavior, and uses
> a very simple function to determine a reasonable default, depending
> on the CPU vendor/type.
> 
> Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
> ---

That's very nice Arjan. That way we'll be able to select between
performance and power savings, which is what most laptop users
want after all !

Thanks,
Willy


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH 8/7] cpufreq: make the iowait-is-busy-time a sysfs tunable
  2010-05-04  3:48             ` [PATCH 8/7] cpufreq: make the iowait-is-busy-time a sysfs tunable Arjan van de Ven
  2010-05-04  4:16               ` Willy Tarreau
@ 2010-05-04  5:43               ` Pavel Machek
  2010-05-04 13:51               ` Rik van Riel
  2 siblings, 0 replies; 53+ messages in thread
From: Pavel Machek @ 2010-05-04  5:43 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: Thomas Renninger, Willy Tarreau, linux-kernel, akpm, mingo,
	peterz, tglx, davej, cpufreq, riel

On Mon 2010-05-03 20:48:18, Arjan van de Ven wrote:
> On Tue, 27 Apr 2010 13:39:34 +0200
> Thomas Renninger <trenn@suse.de> wrote:
> 
> > On Friday 23 April 2010 06:08:19 pm Arjan van de Ven wrote:
> > > On Fri, 23 Apr 2010 10:50:10 +0200
> > > Thomas Renninger <trenn@suse.de> wrote:
> > >  Especially on battery, users will appreciate some minutes
> > >
> > > > of more battery lifetime and do not care about some ms of IO
> > > > latencies.
> > >
> > > the assumption that power doesn't matter on AC is a huge fiction
> > > that any data center operator would love to get out of everyones
> > > head as quickly as possible.
> > 
> > Have I said power doesn't matter on AC?
> > Do you agree that a datacenter has different performance vs power 
> > tradeoff demands as a battery driven mobile device?
> > 
> > Back to the topic:
> > As you did not answer on my (several) sysfs knob request(s), I expect
> > you agree with it and will add one.
> > 
> 
> yup it makes sense to have a sysfs knob with a sane default value
> 
> From: Arjan van de Ven <arjan@linux.intel.com>
> Subject: [PATCH] cpufreq: make the iowait-is-busy-time a sysfs tunable
> 
> Pavel Machek pointed out that not all CPUs have an efficient idle
> at high frequency. Specifically, older Intel and various AMD cpus
> would get a higher power usage when copying files from USB.
> 
> Mike Chan pointed out that the same is true for various ARM chips
> as well.
> 
> Thomas Renninger suggested to make this a sysfs tunable with a
> reasonable default.
> 
> This patch adds a sysfs tunable for the new behavior, and uses
> a very simple function to determine a reasonable default, depending
> on the CPU vendor/type.
> 
> Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>

ACK.

> ---
>  drivers/cpufreq/cpufreq_ondemand.c |   46 +++++++++++++++++++++++++++++++++++-
>  1 files changed, 45 insertions(+), 1 deletions(-)
> 
> diff --git a/drivers/cpufreq/cpufreq_ondemand.c b/drivers/cpufreq/cpufreq_ondemand.c
> index ed472f8..4877e8f 100644
> --- a/drivers/cpufreq/cpufreq_ondemand.c
> +++ b/drivers/cpufreq/cpufreq_ondemand.c
> @@ -109,6 +109,7 @@ static struct dbs_tuners {
>  	unsigned int down_differential;
>  	unsigned int ignore_nice;
>  	unsigned int powersave_bias;
> +	unsigned int io_is_busy;
>  } dbs_tuners_ins = {
>  	.up_threshold = DEF_FREQUENCY_UP_THRESHOLD,
>  	.down_differential = DEF_FREQUENCY_DOWN_DIFFERENTIAL,
> @@ -260,6 +261,7 @@ static ssize_t show_##file_name						\
>  	return sprintf(buf, "%u\n", dbs_tuners_ins.object);		\
>  }
>  show_one(sampling_rate, sampling_rate);
> +show_one(io_is_busy, io_is_busy);
>  show_one(up_threshold, up_threshold);
>  show_one(ignore_nice_load, ignore_nice);
>  show_one(powersave_bias, powersave_bias);
> @@ -310,6 +312,22 @@ static ssize_t store_sampling_rate(struct kobject *a, struct attribute *b,
>  	return count;
>  }
>  
> +static ssize_t store_io_is_busy(struct kobject *a, struct attribute *b,
> +				   const char *buf, size_t count)
> +{
> +	unsigned int input;
> +	int ret;
> +	ret = sscanf(buf, "%u", &input);
> +	if (ret != 1)
> +		return -EINVAL;
> +
> +	mutex_lock(&dbs_mutex);
> +	dbs_tuners_ins.io_is_busy = !!input;
> +	mutex_unlock(&dbs_mutex);
> +
> +	return count;
> +}
> +
>  static ssize_t store_up_threshold(struct kobject *a, struct attribute *b,
>  				  const char *buf, size_t count)
>  {
> @@ -392,6 +410,7 @@ static struct global_attr _name = \
>  __ATTR(_name, 0644, show_##_name, store_##_name)
>  
>  define_one_rw(sampling_rate);
> +define_one_rw(io_is_busy);
>  define_one_rw(up_threshold);
>  define_one_rw(ignore_nice_load);
>  define_one_rw(powersave_bias);
> @@ -403,6 +422,7 @@ static struct attribute *dbs_attributes[] = {
>  	&up_threshold.attr,
>  	&ignore_nice_load.attr,
>  	&powersave_bias.attr,
> +	&io_is_busy.attr,
>  	NULL
>  };
>  
> @@ -527,7 +547,7 @@ static void dbs_check_cpu(struct cpu_dbs_info_s *this_dbs_info)
>  		 * from the cpu idle time.
>  		 */
>  
> -		if (idle_time >= iowait_time)
> +		if (dbs_tuners_ins.io_is_busy && idle_time >= iowait_time)
>  			idle_time -= iowait_time;
>  
>  		if (unlikely(!wall_time || wall_time < idle_time))
> @@ -643,6 +663,29 @@ static inline void dbs_timer_exit(struct cpu_dbs_info_s *dbs_info)
>  	cancel_delayed_work_sync(&dbs_info->work);
>  }
>  
> +/*
> + * Not all CPUs want IO time to be accounted as busy; this depends on how
> + * efficient idling at a higher frequency/voltage is.
> + * Pavel Machek says this is not so for various generations of AMD and old
> + * Intel systems.
> + * Mike Chan (android.com) says this is also not true for ARM.
> + * Because of this, whitelist specific known (series) of CPUs by default, and
> + * leave all others up to the user.
> + */
> +static int should_io_be_busy(void)
> +{
> +#if defined(CONFIG_X86)
> +	/*
> +	 * For Intel, Core 2 (model 15) and later have an efficient idle.
> +	 */
> +	if (boot_cpu_data.x86_vendor == X86_VENDOR_INTEL &&
> +	    boot_cpu_data.x86 == 6 &&
> +	    boot_cpu_data.x86_model >= 15)
> +		return 1;
> +#endif
> +	return 0;
> +}
> +
>  static int cpufreq_governor_dbs(struct cpufreq_policy *policy,
>  				   unsigned int event)
>  {
> @@ -705,6 +748,7 @@ static int cpufreq_governor_dbs(struct cpufreq_policy *policy,
>  			dbs_tuners_ins.sampling_rate =
>  				max(min_sampling_rate,
>  				    latency * LATENCY_MULTIPLIER);
> +			dbs_tuners_ins.io_is_busy = should_io_be_busy();
>  		}
>  		mutex_unlock(&dbs_mutex);
>  

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH 8/7] cpufreq: make the iowait-is-busy-time a sysfs tunable
  2010-05-04  3:48             ` [PATCH 8/7] cpufreq: make the iowait-is-busy-time a sysfs tunable Arjan van de Ven
  2010-05-04  4:16               ` Willy Tarreau
  2010-05-04  5:43               ` Pavel Machek
@ 2010-05-04 13:51               ` Rik van Riel
  2 siblings, 0 replies; 53+ messages in thread
From: Rik van Riel @ 2010-05-04 13:51 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: Thomas Renninger, Willy Tarreau, Pavel Machek, linux-kernel,
	akpm, mingo, peterz, tglx, davej, cpufreq

On 05/03/2010 11:48 PM, Arjan van de Ven wrote:

> This patch adds a sysfs tunable for the new behavior, and uses
> a very simple function to determine a reasonable default, depending
> on the CPU vendor/type.
>
> Signed-off-by: Arjan van de Ven<arjan@linux.intel.com>

Acked-by: Rik van Riel <riel@redhat.com>

-- 
All rights reversed

^ permalink raw reply	[flat|nested] 53+ messages in thread

end of thread, other threads:[~2010-05-04 13:52 UTC | newest]

Thread overview: 53+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-04-18 18:59 [PATCH 0/7] Fix performance issue with ondemand governor Arjan van de Ven
2010-04-18 19:00 ` [PATCH 1/7] sched: add a comment to get_cpu_idle_time_us() Arjan van de Ven
2010-04-26 19:25   ` Rik van Riel
2010-04-18 19:01 ` [PATCH 2/7] sched: introduce a function to update the idle statistics Arjan van de Ven
2010-04-26 20:11   ` Rik van Riel
2010-04-18 19:01 ` [PATCH 3/7] sched: update the idle statistics in get_cpu_idle_time_us() Arjan van de Ven
2010-04-26 20:36   ` Rik van Riel
2010-04-18 19:02 ` [PATCH 4/7] sched: fold updating of the last update time into update_ts_time_stats() Arjan van de Ven
2010-04-26 20:58   ` Rik van Riel
2010-04-18 19:02 ` [PATCH 5/7] sched: eliminate the ts->idle_lastupdate field Arjan van de Ven
2010-04-26 21:00   ` Rik van Riel
2010-04-18 19:03 ` [PATCH 6/7] sched: introduce get_cpu_iowait_time_us() Arjan van de Ven
2010-04-26 21:05   ` Rik van Riel
2010-04-18 19:03 ` [PATCH 7/7] ondemand: Solve the big performance issue with ondemand during disk IO Arjan van de Ven
2010-04-19  8:29   ` Éric Piel
2010-04-19 13:43     ` Arjan van de Ven
2010-04-19 14:30       ` Éric Piel
2010-04-19 14:47         ` Arjan van de Ven
2010-04-20  9:24       ` Thomas Renninger
2010-04-27  0:29         ` Mike Chan
2010-04-27 13:01           ` Pavel Machek
2010-04-27 18:10             ` Mike Chan
2010-04-19  9:09   ` Tvrtko Ursulin
2010-04-19 13:46     ` Arjan van de Ven
2010-04-19 15:29       ` Tvrtko Ursulin
2010-04-20  0:47         ` Arjan van de Ven
2010-04-20  9:10           ` Tvrtko Ursulin
2010-04-20 11:02             ` Arjan van de Ven
2010-04-28  8:57               ` Tvrtko Ursulin
2010-04-23  5:26           ` Pavel Machek
2010-04-20  9:29   ` Thomas Renninger
2010-04-20 11:07     ` Arjan van de Ven
2010-04-23  5:24   ` Pavel Machek
2010-04-23  5:38     ` Willy Tarreau
2010-04-23  8:50       ` Thomas Renninger
2010-04-23 16:08         ` Arjan van de Ven
2010-04-27 11:39           ` Thomas Renninger
2010-05-04  3:48             ` [PATCH 8/7] cpufreq: make the iowait-is-busy-time a sysfs tunable Arjan van de Ven
2010-05-04  4:16               ` Willy Tarreau
2010-05-04  5:43               ` Pavel Machek
2010-05-04 13:51               ` Rik van Riel
2010-04-23 14:10       ` [PATCH 7/7] ondemand: Solve the big performance issue with ondemand during disk IO Arjan van de Ven
2010-04-23 15:35         ` Willy Tarreau
2010-04-23 13:52     ` Arjan van de Ven
2010-04-23  8:38       ` Pavel Machek
2010-04-23 16:06         ` Arjan van de Ven
2010-04-24  4:56       ` Pavel Machek
2010-05-01 23:29         ` Arjan van de Ven
2010-04-26 21:30   ` Rik van Riel
2010-04-26 21:41 ` [PATCH 0/7] Fix performance issue with ondemand governor Dave Jones
2010-04-26 21:45   ` Dominik Brodowski
2010-04-26 21:59     ` Rik van Riel
2010-04-26 22:05       ` Dominik Brodowski

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).