linux-kernel.vger.kernel.org archive mirror
* [PATCH 0/4] cpuidle: teo: Fix issues related to disabled idle states
@ 2019-10-10 21:30 Rafael J. Wysocki
  2019-10-10 21:32 ` [PATCH 1/4] cpuidle: teo: Ignore disabled idle states that are too deep Rafael J. Wysocki
                   ` (4 more replies)
  0 siblings, 5 replies; 11+ messages in thread
From: Rafael J. Wysocki @ 2019-10-10 21:30 UTC (permalink / raw)
  To: Linux PM
  Cc: LKML, Srinivas Pandruvada, Peter Zijlstra, Daniel Lezcano, Doug Smythies

Hi All,

There are a few issues related to the handling of disabled idle states in the
TEO (Timer-Events-Oriented) cpuidle governor which are addressed by this
series.

The application of the entire series is exactly equivalent to the testing patch
at https://lore.kernel.org/lkml/3490479.2dnHFFeJIp@kreacher/ , but IMO it is
cleaner to split the changes into smaller patches which also allows them to
be explained more accurately.

Thanks,
Rafael





* [PATCH 1/4] cpuidle: teo: Ignore disabled idle states that are too deep
  2019-10-10 21:30 [PATCH 0/4] cpuidle: teo: Fix issues related to disabled idle states Rafael J. Wysocki
@ 2019-10-10 21:32 ` Rafael J. Wysocki
  2019-11-05 19:50   ` Doug Smythies
  2019-10-10 21:32 ` [PATCH 2/4] cpuidle: teo: Rename local variable in teo_select() Rafael J. Wysocki
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 11+ messages in thread
From: Rafael J. Wysocki @ 2019-10-10 21:32 UTC (permalink / raw)
  To: Linux PM
  Cc: LKML, Srinivas Pandruvada, Peter Zijlstra, Daniel Lezcano, Doug Smythies

From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

Prevent disabled CPU idle states with target residencies beyond the
anticipated idle duration from being taken into account by the TEO
governor.
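
As an illustration only, the rule above can be sketched in user space as
follows.  The struct and function names (demo_state,
demo_relevant_disabled) are hypothetical and are not part of the
kernel; the sketch assumes a states table sorted by ascending target
residency and merely counts the disabled states that would still be
looked at.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for one entry of the idle states table. */
struct demo_state {
	unsigned int target_residency_us;
	bool disabled;
};

/*
 * Count the disabled states that would still be taken into account for
 * the given anticipated idle duration.  Disabled states whose target
 * residency exceeds duration_us are skipped entirely.
 */
static size_t demo_relevant_disabled(const struct demo_state *states,
				     size_t n, unsigned int duration_us)
{
	size_t i, relevant = 0;

	assert(states != NULL || n == 0);

	for (i = 0; i < n; i++) {
		if (!states[i].disabled)
			continue;

		if (states[i].target_residency_us > duration_us)
			continue;	/* too deep for this idle period */

		relevant++;
	}

	return relevant;
}
```

With disabled states at 20, 200 and 1000 us and an anticipated idle
duration of 150 us, only the 20 us state would remain relevant in this
sketch.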

Fixes: b26bf6ab716f ("cpuidle: New timer events oriented governor for tickless systems")
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---
 drivers/cpuidle/governors/teo.c |    7 +++++++
 1 file changed, 7 insertions(+)

Index: linux-pm/drivers/cpuidle/governors/teo.c
===================================================================
--- linux-pm.orig/drivers/cpuidle/governors/teo.c
+++ linux-pm/drivers/cpuidle/governors/teo.c
@@ -258,6 +258,13 @@ static int teo_select(struct cpuidle_dri
 
 		if (s->disabled || su->disable) {
 			/*
+			 * Ignore disabled states with target residencies beyond
+			 * the anticipated idle duration.
+			 */
+			if (s->target_residency > duration_us)
+				continue;
+
+			/*
 			 * If the "early hits" metric of a disabled state is
 			 * greater than the current maximum, it should be taken
 			 * into account, because it would be a mistake to select





* [PATCH 2/4] cpuidle: teo: Rename local variable in teo_select()
  2019-10-10 21:30 [PATCH 0/4] cpuidle: teo: Fix issues related to disabled idle states Rafael J. Wysocki
  2019-10-10 21:32 ` [PATCH 1/4] cpuidle: teo: Ignore disabled idle states that are too deep Rafael J. Wysocki
@ 2019-10-10 21:32 ` Rafael J. Wysocki
  2019-11-05 19:50   ` Doug Smythies
  2019-10-10 21:36 ` [PATCH 3/4] cpuidle: teo: Consider hits and misses metrics of disabled states Rafael J. Wysocki
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 11+ messages in thread
From: Rafael J. Wysocki @ 2019-10-10 21:32 UTC (permalink / raw)
  To: Linux PM
  Cc: LKML, Srinivas Pandruvada, Peter Zijlstra, Daniel Lezcano, Doug Smythies

From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

Rename a local variable in teo_select() in preparation for subsequent
code modifications.  No intentional functional impact.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---
 drivers/cpuidle/governors/teo.c |   19 +++++++++----------
 1 file changed, 9 insertions(+), 10 deletions(-)

Index: linux-pm/drivers/cpuidle/governors/teo.c
===================================================================
--- linux-pm.orig/drivers/cpuidle/governors/teo.c
+++ linux-pm/drivers/cpuidle/governors/teo.c
@@ -233,7 +233,7 @@ static int teo_select(struct cpuidle_dri
 {
 	struct teo_cpu *cpu_data = per_cpu_ptr(&teo_cpus, dev->cpu);
 	int latency_req = cpuidle_governor_latency_req(dev->cpu);
-	unsigned int duration_us, count;
+	unsigned int duration_us, early_hits;
 	int max_early_idx, constraint_idx, idx, i;
 	ktime_t delta_tick;
 
@@ -247,7 +247,7 @@ static int teo_select(struct cpuidle_dri
 	cpu_data->sleep_length_ns = tick_nohz_get_sleep_length(&delta_tick);
 	duration_us = ktime_to_us(cpu_data->sleep_length_ns);
 
-	count = 0;
+	early_hits = 0;
 	max_early_idx = -1;
 	constraint_idx = drv->state_count;
 	idx = -1;
@@ -270,12 +270,12 @@ static int teo_select(struct cpuidle_dri
 			 * into account, because it would be a mistake to select
 			 * a deeper state with lower "early hits" metric.  The
 			 * index cannot be changed to point to it, however, so
-			 * just increase the max count alone and let the index
-			 * still point to a shallower idle state.
+			 * just increase the "early hits" count alone and let
+			 * the index still point to a shallower idle state.
 			 */
 			if (max_early_idx >= 0 &&
-			    count < cpu_data->states[i].early_hits)
-				count = cpu_data->states[i].early_hits;
+			    early_hits < cpu_data->states[i].early_hits)
+				early_hits = cpu_data->states[i].early_hits;
 
 			continue;
 		}
@@ -291,10 +291,10 @@ static int teo_select(struct cpuidle_dri
 
 		idx = i;
 
-		if (count < cpu_data->states[i].early_hits &&
+		if (early_hits < cpu_data->states[i].early_hits &&
 		    !(tick_nohz_tick_stopped() &&
 		      drv->states[i].target_residency < TICK_USEC)) {
-			count = cpu_data->states[i].early_hits;
+			early_hits = cpu_data->states[i].early_hits;
 			max_early_idx = i;
 		}
 	}
@@ -323,10 +323,9 @@ static int teo_select(struct cpuidle_dri
 	if (idx < 0) {
 		idx = 0; /* No states enabled. Must use 0. */
 	} else if (idx > 0) {
+		unsigned int count = 0;
 		u64 sum = 0;
 
-		count = 0;
-
 		/*
 		 * Count and sum the most recent idle duration values less than
 		 * the current expected idle duration value.





* [PATCH 3/4] cpuidle: teo: Consider hits and misses metrics of disabled states
  2019-10-10 21:30 [PATCH 0/4] cpuidle: teo: Fix issues related to disabled idle states Rafael J. Wysocki
  2019-10-10 21:32 ` [PATCH 1/4] cpuidle: teo: Ignore disabled idle states that are too deep Rafael J. Wysocki
  2019-10-10 21:32 ` [PATCH 2/4] cpuidle: teo: Rename local variable in teo_select() Rafael J. Wysocki
@ 2019-10-10 21:36 ` Rafael J. Wysocki
  2019-11-05 19:50   ` Doug Smythies
  2019-10-10 21:37 ` [PATCH 4/4] cpuidle: teo: Fix "early hits" handling for disabled idle states Rafael J. Wysocki
  2019-10-18  7:21 ` [PATCH 0/4] cpuidle: teo: Fix issues related to " Doug Smythies
  4 siblings, 1 reply; 11+ messages in thread
From: Rafael J. Wysocki @ 2019-10-10 21:36 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Linux PM, LKML, Srinivas Pandruvada, Peter Zijlstra,
	Daniel Lezcano, Doug Smythies

From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

The TEO governor uses idle duration "bins" defined in accordance with
the CPU idle states table provided by the driver, so that each "bin"
covers the idle duration range between the target residency of the
idle state corresponding to it and the target residency of the closest
deeper idle state.  The governor collects statistics for each bin
regardless of whether or not the idle state corresponding to it is
currently enabled.

In particular, the "hits" and "misses" metrics measure the likelihood
of a situation in which both the time till the next timer (sleep
length) and the idle duration measured after wakeup fall into the
given bin.  Namely, if the "hits" value is greater than the "misses"
one, that situation is more likely than the one in which the sleep
length falls into the given bin, but the idle duration measured after
wakeup falls into a bin corresponding to one of the shallower idle
states.
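
To make the bin and metric definitions above concrete, here is a
minimal user-space sketch.  All names in it (demo_bin, demo_bin_of,
demo_record) are hypothetical, and it uses plain counters, which is an
assumption made for brevity and not a description of how the governor
actually weights its statistics.  A wakeup later than the sleep length
is treated as a "hit" here, also as an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-bin statistics, kept as plain counters. */
struct demo_bin {
	unsigned int hits;
	unsigned int misses;
	unsigned int early_hits;
};

/*
 * Return the index of the bin that duration_us falls into, that is the
 * deepest state whose target residency does not exceed duration_us.
 * residency_us[] is assumed to be sorted in ascending order.
 */
static size_t demo_bin_of(const unsigned int *residency_us, size_t n,
			  unsigned int duration_us)
{
	size_t i, bin = 0;

	assert(residency_us != NULL && n > 0);

	for (i = 0; i < n; i++) {
		if (residency_us[i] > duration_us)
			break;
		bin = i;
	}

	return bin;
}

/* Account one idle period: sleep length vs. idle duration after wakeup. */
static void demo_record(struct demo_bin *bins, const unsigned int *residency_us,
			size_t n, unsigned int sleep_us, unsigned int measured_us)
{
	size_t sleep_bin = demo_bin_of(residency_us, n, sleep_us);
	size_t measured_bin = demo_bin_of(residency_us, n, measured_us);

	if (measured_bin < sleep_bin) {
		/* Woken up early: "miss" for the sleep length bin ... */
		bins[sleep_bin].misses++;
		/* ... and "early hit" for the bin the wakeup fell into. */
		bins[measured_bin].early_hits++;
	} else {
		bins[sleep_bin].hits++;
	}
}
```

For target residencies of 0, 2, 20 and 200 us, a 500 us sleep length
followed by a 30 us measured idle duration counts as a "miss" for bin 3
and as an "early hit" for bin 2 in this sketch.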

If the idle state corresponding to the given bin is disabled, it
cannot be selected and if it turns out to be the one that should be
selected, a shallower idle state needs to be used instead of it.
Nevertheless, the metrics collected for the bin corresponding to it
are still valid and need to be taken into account as though that
state had not been disabled.

For this reason, make teo_select() always use the "hits" and "misses"
values of the idle duration range that the sleep length falls into,
even if the specific idle state corresponding to it is disabled.  If
the "hits" value is greater than the "misses" one in that case, select
the closest enabled shallower idle state.
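
A simplified user-space sketch of that selection rule follows.  It is
not the kernel code: the names (demo_state, demo_choice, demo_pick) are
hypothetical, and the latency constraint and the "early hits" handling
are deliberately left out.  It only shows the candidate index staying
on an enabled state while the "hits" and "misses" values come from the
disabled state's bin.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical state entry combining table data and bin statistics. */
struct demo_state {
	unsigned int target_residency_us;
	unsigned int hits;
	unsigned int misses;
	bool disabled;
};

/* Candidate state index plus the metrics used to judge it. */
struct demo_choice {
	int idx;		/* -1 if no enabled state was found */
	unsigned int hits;
	unsigned int misses;
};

static struct demo_choice demo_pick(const struct demo_state *s, size_t n,
				    unsigned int duration_us)
{
	struct demo_choice c = { -1, 0, 0 };
	size_t i;

	assert(s != NULL || n == 0);

	for (i = 0; i < n; i++) {
		if (s[i].disabled) {
			/* Disabled states that are too deep are ignored. */
			if (s[i].target_residency_us > duration_us)
				continue;

			/* Keep the candidate, but use this bin's metrics. */
			c.hits = s[i].hits;
			c.misses = s[i].misses;
			continue;
		}

		if (c.idx < 0) {
			c.idx = (int)i;	/* first enabled state */
			c.hits = s[i].hits;
			c.misses = s[i].misses;
		}

		if (s[i].target_residency_us > duration_us)
			break;

		c.idx = (int)i;
		c.hits = s[i].hits;
		c.misses = s[i].misses;
	}

	return c;
}
```

A caller would then compare c.hits with c.misses to decide whether the
candidate in c.idx is good enough, in the way the description above
outlines.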

Fixes: b26bf6ab716f ("cpuidle: New timer events oriented governor for tickless systems")
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---
 drivers/cpuidle/governors/teo.c |   25 +++++++++++++++++++++----
 1 file changed, 21 insertions(+), 4 deletions(-)

Index: linux-pm/drivers/cpuidle/governors/teo.c
===================================================================
--- linux-pm.orig/drivers/cpuidle/governors/teo.c
+++ linux-pm/drivers/cpuidle/governors/teo.c
@@ -233,7 +233,7 @@ static int teo_select(struct cpuidle_dri
 {
 	struct teo_cpu *cpu_data = per_cpu_ptr(&teo_cpus, dev->cpu);
 	int latency_req = cpuidle_governor_latency_req(dev->cpu);
-	unsigned int duration_us, early_hits;
+	unsigned int duration_us, hits, misses, early_hits;
 	int max_early_idx, constraint_idx, idx, i;
 	ktime_t delta_tick;
 
@@ -247,6 +247,8 @@ static int teo_select(struct cpuidle_dri
 	cpu_data->sleep_length_ns = tick_nohz_get_sleep_length(&delta_tick);
 	duration_us = ktime_to_us(cpu_data->sleep_length_ns);
 
+	hits = 0;
+	misses = 0;
 	early_hits = 0;
 	max_early_idx = -1;
 	constraint_idx = drv->state_count;
@@ -265,6 +267,17 @@ static int teo_select(struct cpuidle_dri
 				continue;
 
 			/*
+			 * This state is disabled, so the range of idle duration
+			 * values corresponding to it is covered by the current
+			 * candidate state, but still the "hits" and "misses"
+			 * metrics of the disabled state need to be used to
+			 * decide whether or not the state covering the range in
+			 * question is good enough.
+			 */
+			hits = cpu_data->states[i].hits;
+			misses = cpu_data->states[i].misses;
+
+			/*
 			 * If the "early hits" metric of a disabled state is
 			 * greater than the current maximum, it should be taken
 			 * into account, because it would be a mistake to select
@@ -280,8 +293,11 @@ static int teo_select(struct cpuidle_dri
 			continue;
 		}
 
-		if (idx < 0)
+		if (idx < 0) {
 			idx = i; /* first enabled state */
+			hits = cpu_data->states[i].hits;
+			misses = cpu_data->states[i].misses;
+		}
 
 		if (s->target_residency > duration_us)
 			break;
@@ -290,6 +306,8 @@ static int teo_select(struct cpuidle_dri
 			constraint_idx = i;
 
 		idx = i;
+		hits = cpu_data->states[i].hits;
+		misses = cpu_data->states[i].misses;
 
 		if (early_hits < cpu_data->states[i].early_hits &&
 		    !(tick_nohz_tick_stopped() &&
@@ -307,8 +325,7 @@ static int teo_select(struct cpuidle_dri
 	 * "early hits" metric, but if that cannot be determined, just use the
 	 * state selected so far.
 	 */
-	if (cpu_data->states[idx].hits <= cpu_data->states[idx].misses &&
-	    max_early_idx >= 0) {
+	if (hits <= misses && max_early_idx >= 0) {
 		idx = max_early_idx;
 		duration_us = drv->states[idx].target_residency;
 	}





* [PATCH 4/4] cpuidle: teo: Fix "early hits" handling for disabled idle states
  2019-10-10 21:30 [PATCH 0/4] cpuidle: teo: Fix issues related to disabled idle states Rafael J. Wysocki
                   ` (2 preceding siblings ...)
  2019-10-10 21:36 ` [PATCH 3/4] cpuidle: teo: Consider hits and misses metrics of disabled states Rafael J. Wysocki
@ 2019-10-10 21:37 ` Rafael J. Wysocki
  2019-11-05 19:50   ` Doug Smythies
  2019-10-18  7:21 ` [PATCH 0/4] cpuidle: teo: Fix issues related to " Doug Smythies
  4 siblings, 1 reply; 11+ messages in thread
From: Rafael J. Wysocki @ 2019-10-10 21:37 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Linux PM, LKML, Srinivas Pandruvada, Peter Zijlstra,
	Daniel Lezcano, Doug Smythies

From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

The TEO governor uses idle duration "bins" defined in accordance with
the CPU idle states table provided by the driver, so that each "bin"
covers the idle duration range between the target residency of the
idle state corresponding to it and the target residency of the closest
deeper idle state.  The governor collects statistics for each bin
regardless of whether or not the idle state corresponding to it is
currently enabled.

In particular, the "early hits" metric measures the likelihood of a
situation in which the idle duration measured after wakeup falls into
the given bin, but the time till the next timer (sleep length) falls
into a bin corresponding to one of the deeper idle states.  It is
used when the "hits" and "misses" metrics indicate that the state
"matching" the sleep length should not be selected, so that the state
with the maximum "early hits" value is selected instead of it.

If the idle state corresponding to the given bin is disabled, it
cannot be selected and if it turns out to be the one that should be
selected, a shallower idle state needs to be used instead of it.
Nevertheless, the metrics collected for the bin corresponding to it
are still valid and need to be taken into account as though that
state had not been disabled.

As far as the "early hits" metric is concerned, teo_select() tries to
take disabled states into account, but the state index corresponding
to the maximum "early hits" value computed by it may be incorrect.
Namely, it always uses the index of the previous maximum "early hits"
state then, but there may be enabled idle states closer to the
disabled one in question.  In particular, if the current candidate
state (whose index is the idx value) is closer to the disabled one
and the "early hits" value of the disabled state is greater than the
current maximum, the index of the current candidate state (idx)
should replace the "maximum early hits state" index.

Modify the code to handle that case correctly.
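
The intended index handling can be sketched in user space as below.
This is a simplified illustration with hypothetical names (demo_state,
demo_max_early_idx); it assumes a states table sorted by ascending
target residency and leaves out the latency constraint and the check
for a stopped tick, so it is not a drop-in model of teo_select().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical state entry: table data plus the "early hits" metric. */
struct demo_state {
	unsigned int target_residency_us;
	unsigned int early_hits;
	bool disabled;
};

/*
 * Return the index of the enabled state that should stand in for the
 * bin with the maximum "early hits" value, or -1 if there is none.
 */
static int demo_max_early_idx(const struct demo_state *s, size_t n,
			      unsigned int duration_us)
{
	unsigned int early_hits = 0;
	int idx = -1, max_early_idx = -1;
	size_t i;

	assert(s != NULL || n == 0);

	for (i = 0; i < n; i++) {
		if (s[i].disabled) {
			if (s[i].target_residency_us > duration_us)
				continue;

			if (early_hits >= s[i].early_hits || idx < 0)
				continue;

			/*
			 * The disabled bin holds the new maximum, so let the
			 * closest enabled shallower state (idx) represent it.
			 */
			early_hits = s[i].early_hits;
			max_early_idx = idx;
			continue;
		}

		if (idx < 0)
			idx = (int)i;	/* first enabled state */

		if (s[i].target_residency_us > duration_us)
			break;

		idx = (int)i;

		if (early_hits < s[i].early_hits) {
			early_hits = s[i].early_hits;
			max_early_idx = (int)i;
		}
	}

	return max_early_idx;
}
```

For example, with enabled states 0 and 1 ("early hits" 5 and 1), a
disabled state 2 ("early hits" 50) and an enabled state 3 ("early hits"
10), the sketch returns 1 for a 500 us idle duration, whereas keeping
the index of the previous maximum would have left it at 0.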

Fixes: b26bf6ab716f ("cpuidle: New timer events oriented governor for tickless systems")
Reported-by: Doug Smythies <dsmythies@telus.net>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---
 drivers/cpuidle/governors/teo.c |   35 ++++++++++++++++++++++++++---------
 1 file changed, 26 insertions(+), 9 deletions(-)

Index: linux-pm/drivers/cpuidle/governors/teo.c
===================================================================
--- linux-pm.orig/drivers/cpuidle/governors/teo.c
+++ linux-pm/drivers/cpuidle/governors/teo.c
@@ -277,18 +277,35 @@ static int teo_select(struct cpuidle_dri
 			hits = cpu_data->states[i].hits;
 			misses = cpu_data->states[i].misses;
 
+			if (early_hits >= cpu_data->states[i].early_hits ||
+			    idx < 0)
+				continue;
+
+			/*
+			 * If the current candidate state has been the one with
+			 * the maximum "early hits" metric so far, the "early
+			 * hits" metric of the disabled state replaces the
+			 * current "early hits" count to avoid selecting a
+			 * deeper state with lower "early hits" metric.
+			 */
+			if (max_early_idx == idx) {
+				early_hits = cpu_data->states[i].early_hits;
+				continue;
+			}
+
 			/*
-			 * If the "early hits" metric of a disabled state is
-			 * greater than the current maximum, it should be taken
-			 * into account, because it would be a mistake to select
-			 * a deeper state with lower "early hits" metric.  The
-			 * index cannot be changed to point to it, however, so
-			 * just increase the "early hits" count alone and let
-			 * the index still point to a shallower idle state.
+			 * The current candidate state is closer to the disabled
+			 * one than the current maximum "early hits" state, so
+			 * replace the latter with it, but in case the maximum
+			 * "early hits" state index has not been set so far,
+			 * check if the current candidate state is not too
+			 * shallow for that role.
 			 */
-			if (max_early_idx >= 0 &&
-			    early_hits < cpu_data->states[i].early_hits)
+			if (!(tick_nohz_tick_stopped() &&
+			      drv->states[idx].target_residency < TICK_USEC)) {
 				early_hits = cpu_data->states[i].early_hits;
+				max_early_idx = idx;
+			}
 
 			continue;
 		}





* RE: [PATCH 0/4] cpuidle: teo: Fix issues related to disabled idle states
  2019-10-10 21:30 [PATCH 0/4] cpuidle: teo: Fix issues related to disabled idle states Rafael J. Wysocki
                   ` (3 preceding siblings ...)
  2019-10-10 21:37 ` [PATCH 4/4] cpuidle: teo: Fix "early hits" handling for disabled idle states Rafael J. Wysocki
@ 2019-10-18  7:21 ` Doug Smythies
  4 siblings, 0 replies; 11+ messages in thread
From: Doug Smythies @ 2019-10-18  7:21 UTC (permalink / raw)
  To: 'Rafael J. Wysocki'
  Cc: 'LKML', 'Srinivas Pandruvada',
	'Peter Zijlstra', 'Daniel Lezcano',
	'Linux PM'

On 2019.10.10 Rafael J. Wysocki wrote:

> There are a few issues related to the handling of disabled idle states in the
> TEO (Timer-Events-Oriented) cpuidle governor which are addressed by this
> series.
>
> The application of the entire series is exactly equivalent to the testing patch
> at https://lore.kernel.org/lkml/3490479.2dnHFFeJIp@kreacher/ , but IMO it is
> cleaner to split the changes into smaller patches which also allows them to
> be explained more accurately.

Hi,

I have re-tested and continued testing using this 4 patch set.

Summary: So far, everything is fine.

Some, but not all, detail:

Reference kernel: 5.4-rc2 "stock"
Test kernel: 5.4-rc2 + this 4 patch set "rjw-4"

Test 1: An automated version of my best use case example.

This is a test with idle state 4 disabled.
The "max entries" column shows entries and exits for
idle state 0 over the test sample interval
(15 seconds in this case).

stock kernel:

idle-doug01 : begin ...

Per CPU PASS/FAIL : Totals: fail rate: max entries:
 0 1 2 3 4 5 6 7  :       : (percent):            :
 . . F . . . F .  :      8 : 25.0000 :      69939
 . . F . . . F .  :     16 : 25.0000 :      69938
 . . F . . . F .  :     24 : 25.0000 :      70126
...
 . . . . . . F F  :   1928 : 30.8610 :      66530
 . . . . . . F F  :   1936 : 30.8368 :      68451
 . . . . . . F F  :   1944 : 30.8128 :      68643
 . . . . . . F F  :   1952 : 30.7889 :      68645
 . . . . . . F F  :   1960 : 30.7653 :      68152
 . . . . . . F F  :   1968 : 30.7419 :      67145
 . . . . . . F F  :   1976 : 30.7186 :      67349
 . . . . . . F F  :   1984 : 30.6956 :      68481
 . . . . . . F F  :   1992 : 30.6727 :      67394
 . . . . . . F F  :   2000 : 30.6500 :      68645
^C --- SIGINT (^C) detected. Terminate gracefully, saving the sample data...
 . . . . . . F F  :   2008 : 30.6275 :      29010

Summary: Total Tests:   2008 : Total Fails:    615 : Fail rate (percent): 30.6275 : Per CPU:
CPU00:      0, CPU01:      0, CPU02:     12, CPU03:      0, CPU04:    152, CPU05:     43, CPU06:    197, CPU07:    211,

idle-doug01 : end ...

rjw-4 kernel:

idle-doug01 : begin ...

Per CPU PASS/FAIL : Totals: fail rate: max entries:
 0 1 2 3 4 5 6 7  :       : (percent):            :
 . . . . . . . .  :      8 :  0.0000 :          3
 . . . . . . . .  :     16 :  0.0000 :          7
 . . . . . . . .  :     24 :  0.0000 :         10
 . . . . . . . .  :     32 :  0.0000 :         10
 . . . . . . . .  :     40 :  0.0000 :         23
 . . . . . . . .  :     48 :  0.0000 :         23
 . . . . . . . .  :     56 :  0.0000 :          1
 . . . . . . . .  :     64 :  0.0000 :          1
 . . . . . . . .  :     72 :  0.0000 :          0
 . . . . . . . .  :     80 :  0.0000 :          0
 . . . . . . . .  :     88 :  0.0000 :          1
 . . . . . . . .  :     96 :  0.0000 :          1
 . . . . . . . .  :    104 :  0.0000 :          0
 . . . . . . . .  :    112 :  0.0000 :          0
 . . . . . . . .  :    120 :  0.0000 :          1
 . . . . . . . .  :    128 :  0.0000 :          2
 . . . . . . . .  :    136 :  0.0000 :          2
 . . . . . . . .  :    144 :  0.0000 :          3
 . . . . . . . .  :    152 :  0.0000 :          3
 . . . . . . . .  :    160 :  0.0000 :          4
 . . . . . . . .  :    168 :  0.0000 :          4
 . . . . . . . .  :    176 :  0.0000 :          6
 . . . . . . . .  :    184 :  0.0000 :          3
 . . . . . . . .  :    192 :  0.0000 :          4
...
 . . . . . . . .  :  11480 :  0.0000 :          5
 . . . . . . . .  :  11488 :  0.0000 :          4
 . . . . . . . .  :  11496 :  0.0000 :          6
 . . . . . . . .  :  11504 :  0.0000 :          8
 . . . . . . . .  :  11512 :  0.0000 :          5
 . . . . . . . .  :  11520 :  0.0000 :          4
 . . . . . . . .  :  11528 :  0.0000 :          4
 . . . . . . . .  :  11536 :  0.0000 :          3
 . . . . . . . .  :  11544 :  0.0000 :          2
 . . . . . . . .  :  11552 :  0.0000 :          5
 . . . . . . . .  :  11560 :  0.0000 :          3
 . . . . . . . .  :  11568 :  0.0000 :          3
^C --- SIGINT (^C) detected. Terminate gracefully, saving the sample data...
 . . . . . . . .  :  11576 :  0.0000 :          1

Summary: Total Tests:  11576 : Total Fails:      0 : Fail rate (percent):  0.0000 : Per CPU:
CPU00:      0, CPU01:      0, CPU02:      0, CPU03:      0, CPU04:      0, CPU05:      0, CPU06:      0, CPU07:      0,

idle-doug01 : end ...

Test 2: Have a look at all idle state enabled/disabled combinations.
This test is not very good, and only looks at processor package power.
Particularly for idle state 1, see Note 1 below.
But it's better than nothing.

stock kernel:

idle-disable-enable : begin ...

Idle State:  Per test PASS/FAIL  :     Power (Watts)    :
4 3 2 1 0 :                      :  Expected : Max diff :
0 0 0 0 0 : PASS PASS PASS PASS PASS PASS PASS PASS :     3.7 :    0.2
0 0 0 0 1 : PASS PASS PASS PASS PASS PASS PASS PASS :     3.7 :    0.2
0 0 0 1 0 : PASS PASS PASS PASS PASS PASS PASS PASS :     3.7 :    0.2
0 0 0 1 1 : PASS PASS PASS PASS PASS PASS PASS PASS :     3.7 :    0.2
0 0 1 0 0 : PASS PASS PASS PASS PASS PASS PASS PASS :     3.7 :    0.2
0 0 1 0 1 : PASS PASS PASS PASS PASS PASS PASS PASS :     3.7 :    0.2
0 0 1 1 0 : PASS PASS PASS PASS PASS PASS PASS PASS :     3.7 :    0.2
0 0 1 1 1 : PASS PASS PASS PASS PASS PASS PASS PASS :     3.7 :    0.2
0 1 0 0 0 : PASS PASS PASS PASS PASS PASS PASS PASS :     3.7 :    0.2
0 1 0 0 1 : PASS PASS PASS PASS PASS PASS PASS PASS :     3.7 :    0.2
0 1 0 1 0 : PASS PASS PASS PASS PASS PASS PASS PASS :     3.7 :    0.2
0 1 0 1 1 : PASS PASS PASS PASS PASS PASS PASS PASS :     3.7 :    0.2
0 1 1 0 0 : PASS PASS PASS PASS PASS PASS PASS PASS :     3.7 :    0.2
0 1 1 0 1 : PASS PASS PASS PASS PASS PASS PASS PASS :     3.7 :    0.2
0 1 1 1 0 : PASS PASS PASS PASS PASS PASS PASS PASS :     3.7 :    0.2
0 1 1 1 1 : PASS PASS PASS PASS PASS PASS PASS PASS :     3.7 :    0.2
1 0 0 0 0 : FAIL FAIL FAIL FAIL FAIL FAIL FAIL FAIL :     4.9 :   17.0
1 0 0 0 1 : FAIL PASS PASS PASS PASS PASS PASS PASS :     4.9 :    1.6
1 0 0 1 0 : PASS PASS PASS PASS PASS PASS PASS PASS :     4.9 :    0.8
1 0 0 1 1 : PASS PASS PASS PASS PASS PASS PASS PASS :     4.9 :    0.8
1 0 1 0 0 : PASS FAIL FAIL FAIL FAIL FAIL FAIL FAIL :     4.9 :   19.9
1 0 1 0 1 : PASS PASS PASS PASS PASS PASS PASS PASS :     4.9 :    0.9
1 0 1 1 0 : PASS PASS PASS PASS PASS PASS PASS PASS :     4.9 :    0.8
1 0 1 1 1 : PASS PASS PASS PASS PASS PASS PASS PASS :     4.9 :    0.8
1 1 0 0 0 : PASS PASS PASS PASS PASS PASS PASS PASS :     8.7 :    0.7
1 1 0 0 1 : PASS PASS PASS PASS PASS PASS PASS PASS :     8.7 :    0.7
1 1 0 1 0 : PASS PASS PASS PASS PASS PASS PASS PASS :     8.7 :    0.7
1 1 0 1 1 : PASS PASS PASS PASS PASS PASS PASS PASS :     8.7 :    0.7
1 1 1 0 0 : FAIL PASS PASS PASS PASS PASS PASS PASS :     9.6 :    2.5
1 1 1 0 1 : FAIL PASS PASS PASS PASS PASS PASS PASS :     9.6 :   11.3 <<< Note 1.
1 1 1 1 0 : PASS PASS PASS PASS PASS PASS PASS PASS :    52.0 :    1.0
1 1 1 1 1 : PASS PASS PASS PASS PASS PASS PASS PASS :    52.0 :    1.4

idle-disable-enable : end ...

rjw-4 kernel:

Note: Reduced to 3 tests per combination. 8 seconds per sample.

idle-disable-enable : begin ...

Idle State:  Per test PASS/FAIL  :     Power (Watts)    :
4 3 2 1 0 :                      :  Expected : Max diff :
0 0 0 0 0 : PASS PASS PASS :     3.7 :    0.2
0 0 0 0 1 : PASS PASS PASS :     3.7 :    0.2
0 0 0 1 0 : PASS PASS PASS :     3.7 :    0.2
0 0 0 1 1 : PASS PASS PASS :     3.7 :    0.2
0 0 1 0 0 : PASS PASS PASS :     3.7 :    0.3
0 0 1 0 1 : PASS PASS PASS :     3.7 :    0.2
0 0 1 1 0 : PASS PASS PASS :     3.7 :    0.2
0 0 1 1 1 : PASS PASS PASS :     3.7 :    0.3
0 1 0 0 0 : PASS PASS PASS :     3.7 :    0.2
0 1 0 0 1 : PASS PASS PASS :     3.7 :    0.3
0 1 0 1 0 : PASS PASS PASS :     3.7 :    0.3
0 1 0 1 1 : PASS PASS PASS :     3.7 :    0.2
0 1 1 0 0 : PASS PASS PASS :     3.7 :    0.2
0 1 1 0 1 : PASS PASS PASS :     3.7 :    0.2
0 1 1 1 0 : PASS PASS PASS :     3.7 :    0.3
0 1 1 1 1 : PASS PASS PASS :     3.7 :    0.3
1 0 0 0 0 : PASS PASS PASS :     4.9 :    0.8
1 0 0 0 1 : PASS PASS PASS :     4.9 :    0.8
1 0 0 1 0 : PASS PASS PASS :     4.9 :    0.8
1 0 0 1 1 : PASS PASS PASS :     4.9 :    0.8
1 0 1 0 0 : PASS PASS PASS :     4.9 :    0.8
1 0 1 0 1 : PASS PASS PASS :     4.9 :    0.8
1 0 1 1 0 : PASS PASS PASS :     4.9 :    0.8
1 0 1 1 1 : PASS PASS PASS :     4.9 :    0.8
1 1 0 0 0 : PASS PASS PASS :     8.7 :    0.8
1 1 0 0 1 : PASS PASS PASS :     8.7 :    0.8
1 1 0 1 0 : PASS PASS PASS :     8.7 :    0.9
1 1 0 1 1 : PASS PASS PASS :     8.7 :    0.9
1 1 1 0 0 : PASS PASS PASS :     9.6 :    1.7
1 1 1 0 1 : FAIL PASS PASS :     9.6 :    2.1  <<< Note 1.
1 1 1 1 0 : PASS PASS PASS :    52.0 :    0.5
1 1 1 1 1 : PASS PASS PASS :    52.0 :    1.0

idle-disable-enable : end ...

Note 1:
The processor package power used in idle state 1 is
a strong function of the current p-state setting:
p-state 16 (lowest): 8.8 watts
p-state 38 (highest): 21.9 watts
When the system load goes to idle during this test,
it can take on the order of tens of seconds for the
intel-pstate driver to request low p-states for
all CPUs. If there is an issue here, it is a subject
for another day.

... Doug




* RE: [PATCH 1/4] cpuidle: teo: Ignore disabled idle states that are too deep
  2019-10-10 21:32 ` [PATCH 1/4] cpuidle: teo: Ignore disabled idle states that are too deep Rafael J. Wysocki
@ 2019-11-05 19:50   ` Doug Smythies
  0 siblings, 0 replies; 11+ messages in thread
From: Doug Smythies @ 2019-11-05 19:50 UTC (permalink / raw)
  To: 'Rafael J. Wysocki', 'Linux PM'
  Cc: 'LKML', 'Srinivas Pandruvada',
	'Peter Zijlstra', 'Daniel Lezcano'

On 2019.10.10 14:32 Rafael J. Wysocki wrote:

> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>
> Prevent disabled CPU idle states with target residencies beyond the
> anticipated idle duration from being taken into account by the TEO
> governor.
> 
> Fixes: b26bf6ab716f ("cpuidle: New timer events oriented governor for tickless systems")
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> ---

Acked-by: Doug Smythies <dsmythies@telus.net>




* RE: [PATCH 2/4] cpuidle: teo: Rename local variable in teo_select()
  2019-10-10 21:32 ` [PATCH 2/4] cpuidle: teo: Rename local variable in teo_select() Rafael J. Wysocki
@ 2019-11-05 19:50   ` Doug Smythies
  0 siblings, 0 replies; 11+ messages in thread
From: Doug Smythies @ 2019-11-05 19:50 UTC (permalink / raw)
  To: 'Rafael J. Wysocki', 'Linux PM'
  Cc: 'LKML', 'Srinivas Pandruvada',
	'Peter Zijlstra', 'Daniel Lezcano'

On 2019.10.10 14:33 Rafael J. Wysocki wrote:

> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>
> Rename a local variable in teo_select() in preparation for subsequent
> code modifications.  No intentional functional impact.
>
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

This change makes it easier to read and understand.
Thanks.

Acked-by: Doug Smythies <dsmythies@telus.net>




* RE: [PATCH 3/4] cpuidle: teo: Consider hits and misses metrics of disabled states
  2019-10-10 21:36 ` [PATCH 3/4] cpuidle: teo: Consider hits and misses metrics of disabled states Rafael J. Wysocki
@ 2019-11-05 19:50   ` Doug Smythies
  0 siblings, 0 replies; 11+ messages in thread
From: Doug Smythies @ 2019-11-05 19:50 UTC (permalink / raw)
  To: 'Rafael J. Wysocki'
  Cc: 'Linux PM', 'LKML', 'Srinivas Pandruvada',
	'Peter Zijlstra', 'Daniel Lezcano'

On 2019.10.10 14:36 Rafael J. Wysocki wrote:

> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>
> The TEO governor uses idle duration "bins" defined in accordance with
> the CPU idle states table provided by the driver, so that each "bin"
> covers the idle duration range between the target residency of the
> idle state corresponding to it and the target residency of the closest
> deeper idle state.  The governor collects statistics for each bin
> regardless of whether or not the idle state corresponding to it is
> currently enabled.
>
> In particular, the "hits" and "misses" metrics measure the likelihood
> of a situation in which both the time till the next timer (sleep
> length) and the idle duration measured after wakeup fall into the
> given bin.  Namely, if the "hits" value is greater than the "misses"
> one, that situation is more likely than the one in which the sleep
> length falls into the given bin, but the idle duration measured after
> wakeup falls into a bin corresponding to one of the shallower idle
> states.
>
> If the idle state corresponding to the given bin is disabled, it
> cannot be selected and if it turns out to be the one that should be
> selected, a shallower idle state needs to be used instead of it.
> Nevertheless, the metrics collected for the bin corresponding to it
> are still valid and need to be taken into account as though that
> state had not been disabled.
>
> For this reason, make teo_select() always use the "hits" and "misses"
> values of the idle duration range that the sleep length falls into,
> even if the specific idle state corresponding to it is disabled.  If
> the "hits" value is greater than the "misses" one in that case, select
> the closest enabled shallower idle state.
> 
> Fixes: b26bf6ab716f ("cpuidle: New timer events oriented governor for tickless systems")
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

Great thanks.

Acked-by: Doug Smythies <dsmythies@telus.net>




* RE: [PATCH 4/4] cpuidle: teo: Fix "early hits" handling for disabled idle states
  2019-10-10 21:37 ` [PATCH 4/4] cpuidle: teo: Fix "early hits" handling for disabled idle states Rafael J. Wysocki
@ 2019-11-05 19:50   ` Doug Smythies
  2019-11-05 22:15     ` Rafael J. Wysocki
  0 siblings, 1 reply; 11+ messages in thread
From: Doug Smythies @ 2019-11-05 19:50 UTC (permalink / raw)
  To: 'Rafael J. Wysocki'
  Cc: 'Linux PM', 'LKML', 'Srinivas Pandruvada',
	'Peter Zijlstra', 'Daniel Lezcano'

On 2019.10.10 14:38 Rafael J. Wysocki wrote:

> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>
> The TEO governor uses idle duration "bins" defined in accordance with
> the CPU idle states table provided by the driver, so that each "bin"
> covers the idle duration range between the target residency of the
> idle state corresponding to it and the target residency of the closest
> deeper idle state.  The governor collects statistics for each bin
> regardless of whether or not the idle state corresponding to it is
> currently enabled.
>
> In particular, the "early hits" metric measures the likelihood of a
> situation in which the idle duration measured after wakeup falls into
> the given bin, but the time till the next timer (sleep length) falls
> into a bin corresponding to one of the deeper idle states.  It is
> used when the "hits" and "misses" metrics indicate that the state
> "matching" the sleep length should not be selected, so that the state
> with the maximum "early hits" value is selected instead of it.
>
> If the idle state corresponding to the given bin is disabled, it
> cannot be selected and if it turns out to be the one that should be
> selected, a shallower idle state needs to be used instead of it.
> Nevertheless, the metrics collected for the bin corresponding to it
> are still valid and need to be taken into account as though that
> state had not been disabled.
>
> As far as the "early hits" metric is concerned, teo_select() tries to
> take disabled states into account, but the state index corresponding
> to the maximum "early hits" value computed by it may be incorrect.
> Namely, it always uses the index of the previous maximum "early hits"
> state then, but there may be enabled idle states closer to the
> disabled one in question.  In particular, if the current candidate
> state (whose index is the idx value) is closer to the disabled one
> and the "early hits" value of the disabled state is greater than the
> current maximum, the index of the current candidate state (idx)
> should replace the "maximum early hits state" index.
>
> Modify the code to handle that case correctly.
> 
> Fixes: b26bf6ab716f ("cpuidle: New timer events oriented governor for tickless systems")
> Reported-by: Doug Smythies <dsmythies@telus.net>
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
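
The index handling described in the changelog quoted above can be sketched
as follows. This is an illustration only, not the patch itself: struct
sketch_bin and sketch_max_early_idx() are hypothetical names, the bins are
assumed to be ordered from shallowest to deepest, and the other checks made
by teo_select() are left out.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified per-bin data; not the kernel's structures. */
struct sketch_bin {
	bool disabled;
	unsigned int early_hits;
};

/*
 * Return the index of the enabled state that should represent the maximum
 * "early hits" value found among the bins, or -1 if there is none.
 */
static int sketch_max_early_idx(const struct sketch_bin *bins, size_t count)
{
	unsigned int early_hits = 0;	/* current maximum */
	int max_early_idx = -1;		/* state credited with that maximum */
	int idx = -1;			/* current candidate (enabled) state */

	for (size_t i = 0; i < count; i++) {
		if (bins[i].disabled) {
			/* No enabled shallower state yet, or no new maximum. */
			if (idx < 0 || bins[i].early_hits <= early_hits)
				continue;

			/*
			 * The candidate is the closest enabled state to the
			 * disabled one, so it takes over the maximum instead
			 * of the previous "maximum early hits" state.
			 */
			early_hits = bins[i].early_hits;
			max_early_idx = idx;
			continue;
		}

		idx = (int)i;
		if (bins[i].early_hits > early_hits) {
			early_hits = bins[i].early_hits;
			max_early_idx = idx;
		}
	}

	return max_early_idx;
}
```

With bins { enabled, 5 }, { enabled, 1 }, { disabled, 9 }, the sketch credits
state 1 with the maximum. Keeping the previous maximum's index, which is the
behavior the changelog describes as incorrect, would have picked state 0.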

I tested this patch set pretty thoroughly, but cannot claim exhaustively.
I did my best to mess it up via trying weird scenarios.
Unrelated issues discovered during testing are being handled on
other e-mail threads.

Tested-by: Doug Smythies <dsmythies@telus.net>




* Re: [PATCH 4/4] cpuidle: teo: Fix "early hits" handling for disabled idle states
  2019-11-05 19:50   ` Doug Smythies
@ 2019-11-05 22:15     ` Rafael J. Wysocki
  0 siblings, 0 replies; 11+ messages in thread
From: Rafael J. Wysocki @ 2019-11-05 22:15 UTC (permalink / raw)
  To: Doug Smythies
  Cc: Rafael J. Wysocki, Linux PM, LKML, Srinivas Pandruvada,
	Peter Zijlstra, Daniel Lezcano

On Tue, Nov 5, 2019 at 8:51 PM Doug Smythies <dsmythies@telus.net> wrote:
>
> On 2019.10.10 14:38 Rafael J. Wysocki wrote:
>
> > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> >
> > The TEO governor uses idle duration "bins" defined in accordance with
> > the CPU idle states table provided by the driver, so that each "bin"
> > covers the idle duration range between the target residency of the
> > idle state corresponding to it and the target residency of the closest
> > deeper idle state.  The governor collects statistics for each bin
> > regardless of whether or not the idle state corresponding to it is
> > currently enabled.
> >
> > In particular, the "early hits" metric measures the likelihood of a
> > situation in which the idle duration measured after wakeup falls into
> > the given bin, but the time till the next timer (sleep length) falls
> > into a bin corresponding to one of the deeper idle states.  It is
> > used when the "hits" and "misses" metrics indicate that the state
> > "matching" the sleep length should not be selected, so that the state
> > with the maximum "early hits" value is selected instead of it.
> >
> > If the idle state corresponding to the given bin is disabled, it
> > cannot be selected and if it turns out to be the one that should be
> > selected, a shallower idle state needs to be used instead of it.
> > Nevertheless, the metrics collected for the bin corresponding to it
> > are still valid and need to be taken into account as though that
> > state had not been disabled.
> >
> > As far as the "early hits" metric is concerned, teo_select() tries to
> > take disabled states into account, but the state index corresponding
> > to the maximum "early hits" value computed by it may be incorrect.
> > Namely, it always uses the index of the previous maximum "early hits"
> > state then, but there may be enabled idle states closer to the
> > disabled one in question.  In particular, if the current candidate
> > state (whose index is the idx value) is closer to the disabled one
> > and the "early hits" value of the disabled state is greater than the
> > current maximum, the index of the current candidate state (idx)
> > should replace the "maximum early hits state" index.
> >
> > Modify the code to handle that case correctly.
> >
> > Fixes: b26bf6ab716f ("cpuidle: New timer events oriented governor for tickless systems")
> > Reported-by: Doug Smythies <dsmythies@telus.net>
> > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>
> I tested this patch set pretty thoroughly, but cannot claim exhaustively.
> I did my best to mess it up via trying weird scenarios.
> Unrelated issues discovered during testing are being handled on
> other e-mail threads.
>
> Tested-by: Doug Smythies <dsmythies@telus.net>

Thanks for the testing and reviews, much appreciated!

