All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/7] Cpuidle: Minor fixes
@ 2014-02-24  6:29 Tuukka Tikkanen
  2014-02-24  6:29 ` [PATCH 1/7] Cpuidle: rename expected_us to next_timer_us in menu governor Tuukka Tikkanen
                   ` (8 more replies)
  0 siblings, 9 replies; 26+ messages in thread
From: Tuukka Tikkanen @ 2014-02-24  6:29 UTC (permalink / raw)
  To: linux-pm, rjw; +Cc: daniel.lezcano, tuukka.tikkanen, linux-kernel

This set of patches makes some minor changes to menu governor and the poll
idle state.

Patch 1 is simply a rename of a variable to make the name better represent
the contained information.

Patch 2 fixes calculating actual residency in cases where the entered state
is different from the state decided by the menu governor.

Patch 3 makes sure the menu governor timer coefficients are not updated
with values that might cause a coefficient to reach a value greater than
unity.

Patch 4 fixes calculation actual residency in cases where the entered state
does not support measuring residency. In such cases the residency time
is taken from time remaining until next timer expiry. The timer is expected
to go off at the start of exit latency, not after it. Therefore the exit
latency must not be substracted from the assumed value.

Patch 5 moves the performance multiplier based comparison out of the state
selection loop by changing it into a latency requirement. This allows
using a generic state selection function accepting only (duration, latency)
tuple as input. The change is possible by noting that performance multiplier
is used only to check for a minimum predicted idle duration to exit latency
ratio. As predicted idle duration is a constant for the loop, the maximum
allowed latency can be calculated outside of the loop.

Patch 6 prevents using negative values from tick_nohz_get_sleep_length()
in the menu governor. If unchecked, the negative values are used as huge
unsigned values. Negative values occur fairly often (e.g. on x86_64 I've
seen this happen several times per minute) on a busy system, allowing
the deepest state to win the selection while the shallowest should be picked.

Patch 7 adds CPUIDLE_FLAG_TIME_VALID to poll_idle. I do not know of any
platfrom where cpu_relax() would break ktime_get() and in fact poll_idle
uses ktime_get itself.
(Note: poll_idle updates dev->last_residency for some reason. Does it ever
get called without going through cpuidle_enter_state, which will overwrite
the value? Even if some state redirects to this state, the call will
eventually return to the framework. The redundant time measurement could
be removed, unless there is some obscure way of getting called on some
platform that I am unable to figure out.)

Tuukka Tikkanen (7):
  Cpuidle: rename expected_us to next_timer_us in menu governor
  Cpuidle: Use actual state latency in menu governor
  Cpuidle: Ensure menu coefficients stay within domain
  Cpuidle: Do not substract exit latency from assumed sleep length
  Cpuidle: Move perf multiplier calculation out of the selection loop
  Cpuidle: Deal with timer expiring in the past
  Cpuidle: poll state can measure residency

 drivers/cpuidle/driver.c         |    2 +-
 drivers/cpuidle/governors/menu.c |   85 ++++++++++++++++++++++++--------------
 2 files changed, 54 insertions(+), 33 deletions(-)

-- 
1.7.9.5


^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH 1/7] Cpuidle: rename expected_us to next_timer_us in menu governor
  2014-02-24  6:29 [PATCH 0/7] Cpuidle: Minor fixes Tuukka Tikkanen
@ 2014-02-24  6:29 ` Tuukka Tikkanen
  2014-02-24 16:52   ` Nicolas Pitre
  2014-03-05 22:26   ` Len Brown
  2014-02-24  6:29 ` [PATCH 2/7] Cpuidle: Use actual state latency " Tuukka Tikkanen
                   ` (7 subsequent siblings)
  8 siblings, 2 replies; 26+ messages in thread
From: Tuukka Tikkanen @ 2014-02-24  6:29 UTC (permalink / raw)
  To: linux-pm, rjw; +Cc: daniel.lezcano, tuukka.tikkanen, linux-kernel

The field expected_us is used to store the time remaining until next
timer expiry. The name is inaccurate, as we really do not expect all
wakeups to be caused by timers. In addition, another field with a very
similar name (predicted_us) is used to store the predicted time
remaining until any wakeup source being active.

This patch renames expected_us to next_timer_us in order to better
reflect the contained information.

Signed-off-by: Tuukka Tikkanen <tuukka.tikkanen@linaro.org>
---
 drivers/cpuidle/governors/menu.c |   18 +++++++++---------
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c
index cf7f2f0..e9a2a27 100644
--- a/drivers/cpuidle/governors/menu.c
+++ b/drivers/cpuidle/governors/menu.c
@@ -122,7 +122,7 @@ struct menu_device {
 	int		last_state_idx;
 	int             needs_update;
 
-	unsigned int	expected_us;
+	unsigned int	next_timer_us;
 	unsigned int	predicted_us;
 	unsigned int	exit_us;
 	unsigned int	bucket;
@@ -257,7 +257,7 @@ again:
 		stddev = int_sqrt(stddev);
 		if (((avg > stddev * 6) && (divisor * 4 >= INTERVALS * 3))
 							|| stddev <= 20) {
-			if (data->expected_us > avg)
+			if (data->next_timer_us > avg)
 				data->predicted_us = avg;
 			return;
 		}
@@ -306,11 +306,11 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev)
 
 	/* determine the expected residency time, round up */
 	t = ktime_to_timespec(tick_nohz_get_sleep_length());
-	data->expected_us =
+	data->next_timer_us =
 		t.tv_sec * USEC_PER_SEC + t.tv_nsec / NSEC_PER_USEC;
 
 
-	data->bucket = which_bucket(data->expected_us);
+	data->bucket = which_bucket(data->next_timer_us);
 
 	multiplier = performance_multiplier();
 
@@ -326,7 +326,7 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev)
 	 * operands are 32 bits.
 	 * Make sure to round up for half microseconds.
 	 */
-	data->predicted_us = div_round64((uint64_t)data->expected_us *
+	data->predicted_us = div_round64((uint64_t)data->next_timer_us *
 					 data->correction_factor[data->bucket],
 					 RESOLUTION * DECAY);
 
@@ -336,7 +336,7 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev)
 	 * We want to default to C1 (hlt), not to busy polling
 	 * unless the timer is happening really really soon.
 	 */
-	if (data->expected_us > 5 &&
+	if (data->next_timer_us > 5 &&
 	    !drv->states[CPUIDLE_DRIVER_STATE_START].disabled &&
 		dev->states_usage[CPUIDLE_DRIVER_STATE_START].disable == 0)
 		data->last_state_idx = CPUIDLE_DRIVER_STATE_START;
@@ -401,7 +401,7 @@ static void menu_update(struct cpuidle_driver *drv, struct cpuidle_device *dev)
 	 * for the whole expected time.
 	 */
 	if (unlikely(!(target->flags & CPUIDLE_FLAG_TIME_VALID)))
-		last_idle_us = data->expected_us;
+		last_idle_us = data->next_timer_us;
 
 
 	measured_us = last_idle_us;
@@ -418,8 +418,8 @@ static void menu_update(struct cpuidle_driver *drv, struct cpuidle_device *dev)
 	new_factor = data->correction_factor[data->bucket];
 	new_factor -= new_factor / DECAY;
 
-	if (data->expected_us > 0 && measured_us < MAX_INTERESTING)
-		new_factor += RESOLUTION * measured_us / data->expected_us;
+	if (data->next_timer_us > 0 && measured_us < MAX_INTERESTING)
+		new_factor += RESOLUTION * measured_us / data->next_timer_us;
 	else
 		/*
 		 * we were idle so long that we count it as a perfect
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 2/7] Cpuidle: Use actual state latency in menu governor
  2014-02-24  6:29 [PATCH 0/7] Cpuidle: Minor fixes Tuukka Tikkanen
  2014-02-24  6:29 ` [PATCH 1/7] Cpuidle: rename expected_us to next_timer_us in menu governor Tuukka Tikkanen
@ 2014-02-24  6:29 ` Tuukka Tikkanen
  2014-02-24 22:35   ` Daniel Lezcano
  2014-02-24  6:29 ` [PATCH 3/7] Cpuidle: Ensure menu coefficients stay within domain Tuukka Tikkanen
                   ` (6 subsequent siblings)
  8 siblings, 1 reply; 26+ messages in thread
From: Tuukka Tikkanen @ 2014-02-24  6:29 UTC (permalink / raw)
  To: linux-pm, rjw; +Cc: daniel.lezcano, tuukka.tikkanen, linux-kernel

Currently menu governor records the exit latency of the state it has
chosen for the idle period. The stored latency value is then later
used to calculate the actual length of the idle period. This value
may however be incorrect, as the entered state may not be the one
chosen by the governor. The entered state information is available,
so we can use that to obtain the real exit latency.

Signed-off-by: Tuukka Tikkanen <tuukka.tikkanen@linaro.org>
---
 drivers/cpuidle/governors/menu.c |    7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c
index e9a2a27..115386a 100644
--- a/drivers/cpuidle/governors/menu.c
+++ b/drivers/cpuidle/governors/menu.c
@@ -124,7 +124,6 @@ struct menu_device {
 
 	unsigned int	next_timer_us;
 	unsigned int	predicted_us;
-	unsigned int	exit_us;
 	unsigned int	bucket;
 	unsigned int	correction_factor[BUCKETS];
 	unsigned int	intervals[INTERVALS];
@@ -298,7 +297,6 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev)
 	}
 
 	data->last_state_idx = 0;
-	data->exit_us = 0;
 
 	/* Special case when user has set very strict latency requirement */
 	if (unlikely(latency_req == 0))
@@ -359,7 +357,6 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev)
 			continue;
 
 		data->last_state_idx = i;
-		data->exit_us = s->exit_latency;
 	}
 
 	return data->last_state_idx;
@@ -410,8 +407,8 @@ static void menu_update(struct cpuidle_driver *drv, struct cpuidle_device *dev)
 	 * We correct for the exit latency; we are assuming here that the
 	 * exit latency happens after the event that we're interested in.
 	 */
-	if (measured_us > data->exit_us)
-		measured_us -= data->exit_us;
+	if (measured_us > target->exit_latency)
+		measured_us -= target->exit_latency;
 
 
 	/* Update our correction ratio */
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 3/7] Cpuidle: Ensure menu coefficients stay within domain
  2014-02-24  6:29 [PATCH 0/7] Cpuidle: Minor fixes Tuukka Tikkanen
  2014-02-24  6:29 ` [PATCH 1/7] Cpuidle: rename expected_us to next_timer_us in menu governor Tuukka Tikkanen
  2014-02-24  6:29 ` [PATCH 2/7] Cpuidle: Use actual state latency " Tuukka Tikkanen
@ 2014-02-24  6:29 ` Tuukka Tikkanen
  2014-02-24 16:57   ` Nicolas Pitre
  2014-02-24  6:29 ` [PATCH 4/7] Cpuidle: Do not substract exit latency from assumed sleep length Tuukka Tikkanen
                   ` (5 subsequent siblings)
  8 siblings, 1 reply; 26+ messages in thread
From: Tuukka Tikkanen @ 2014-02-24  6:29 UTC (permalink / raw)
  To: linux-pm, rjw; +Cc: daniel.lezcano, tuukka.tikkanen, linux-kernel

The menu governor uses coefficients as one method of actual idle
period length estimation. The coefficients are, as detailed below,
multipliers giving expected idle period length from time until next
timer expiry. The multipliers are supposed to have domain of (0..1].

The coefficients are fractions where only the numerators are stored
and denominators are a shared constant RESOLUTION*DECAY. Since the
value of the coefficient should always be greater than 0 and less
than or equal to 1, the numerator must have a value greater than
0 and less than or equal to RESOLUTION*DECAY.

If the coefficients are updated with measured idle durations exceeding
timer length, the multiplier may reach values exceeding unity (i.e.
the stored numerator exceeds RESOLUTION*DECAY). This patch ensures that
the multipliers are updated with durations capped to timer length.

Signed-off-by: Tuukka Tikkanen <tuukka.tikkanen@linaro.org>
---
 drivers/cpuidle/governors/menu.c |    3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c
index 115386a..f0995dd 100644
--- a/drivers/cpuidle/governors/menu.c
+++ b/drivers/cpuidle/governors/menu.c
@@ -410,6 +410,9 @@ static void menu_update(struct cpuidle_driver *drv, struct cpuidle_device *dev)
 	if (measured_us > target->exit_latency)
 		measured_us -= target->exit_latency;
 
+	/* Make sure our coefficients do not exceed unity */
+	if (measured_us > data->next_timer_us)
+		measured_us = data->next_timer_us;
 
 	/* Update our correction ratio */
 	new_factor = data->correction_factor[data->bucket];
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 4/7] Cpuidle: Do not substract exit latency from assumed sleep length
  2014-02-24  6:29 [PATCH 0/7] Cpuidle: Minor fixes Tuukka Tikkanen
                   ` (2 preceding siblings ...)
  2014-02-24  6:29 ` [PATCH 3/7] Cpuidle: Ensure menu coefficients stay within domain Tuukka Tikkanen
@ 2014-02-24  6:29 ` Tuukka Tikkanen
  2014-02-24  6:29 ` [PATCH 5/7] Cpuidle: Move perf multiplier calculation out of the selection loop Tuukka Tikkanen
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 26+ messages in thread
From: Tuukka Tikkanen @ 2014-02-24  6:29 UTC (permalink / raw)
  To: linux-pm, rjw; +Cc: daniel.lezcano, tuukka.tikkanen, linux-kernel

The menu governor statistics update function tries to determine the
amount of time between entry to low power state and the occurrence
of the wakeup event. However, the time measured by the framework
includes exit latency on top of the desired value. This exit latency
is substracted from the measured value to obtain the desired value.

When measured value is not available, the menu governor assumes
the wakeup was caused by the timer and the time is equal to remaining
timer length. No exit latency should be substracted from this value.

This patch prevents the erroneous substraction and clarifies the
associated comment. It also removes one intermediate variable that
serves no purpose.

Signed-off-by: Tuukka Tikkanen <tuukka.tikkanen@linaro.org>
---
 drivers/cpuidle/governors/menu.c |   44 ++++++++++++++++++++++----------------
 1 file changed, 26 insertions(+), 18 deletions(-)

diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c
index f0995dd..b347c10 100644
--- a/drivers/cpuidle/governors/menu.c
+++ b/drivers/cpuidle/governors/menu.c
@@ -387,32 +387,40 @@ static void menu_update(struct cpuidle_driver *drv, struct cpuidle_device *dev)
 {
 	struct menu_device *data = &__get_cpu_var(menu_devices);
 	int last_idx = data->last_state_idx;
-	unsigned int last_idle_us = cpuidle_get_last_residency(dev);
 	struct cpuidle_state *target = &drv->states[last_idx];
 	unsigned int measured_us;
 	unsigned int new_factor;
 
 	/*
-	 * Ugh, this idle state doesn't support residency measurements, so we
-	 * are basically lost in the dark.  As a compromise, assume we slept
-	 * for the whole expected time.
+	 * Try to figure out how much time passed between entry to low
+	 * power state and occurrence of the wakeup event.
+	 *
+	 * If the entered idle state didn't support residency measurements,
+	 * we are basically lost in the dark how much time passed.
+	 * As a compromise, assume we slept for the whole expected time.
+	 *
+	 * Any measured amount of time will include the exit latency.
+	 * Since we are interested in when the wakeup begun, not when it
+	 * was completed, we must substract the exit latency. However, if
+	 * the measured amount of time is less than the exit latency,
+	 * assume the state was never reached and the exit latency is 0.
 	 */
-	if (unlikely(!(target->flags & CPUIDLE_FLAG_TIME_VALID)))
-		last_idle_us = data->next_timer_us;
-
+	if (unlikely(!(target->flags & CPUIDLE_FLAG_TIME_VALID))) {
+		/* Use timer value as is */
+		measured_us = data->next_timer_us;
 
-	measured_us = last_idle_us;
+	} else {
+		/* Use measured value */
+		measured_us = cpuidle_get_last_residency(dev);
 
-	/*
-	 * We correct for the exit latency; we are assuming here that the
-	 * exit latency happens after the event that we're interested in.
-	 */
-	if (measured_us > target->exit_latency)
-		measured_us -= target->exit_latency;
+		/* Deduct exit latency */
+		if (measured_us > target->exit_latency)
+			measured_us -= target->exit_latency;
 
-	/* Make sure our coefficients do not exceed unity */
-	if (measured_us > data->next_timer_us)
-		measured_us = data->next_timer_us;
+		/* Make sure our coefficients do not exceed unity */
+		if (measured_us > data->next_timer_us)
+			measured_us = data->next_timer_us;
+	}
 
 	/* Update our correction ratio */
 	new_factor = data->correction_factor[data->bucket];
@@ -439,7 +447,7 @@ static void menu_update(struct cpuidle_driver *drv, struct cpuidle_device *dev)
 	data->correction_factor[data->bucket] = new_factor;
 
 	/* update the repeating-pattern data */
-	data->intervals[data->interval_ptr++] = last_idle_us;
+	data->intervals[data->interval_ptr++] = measured_us;
 	if (data->interval_ptr >= INTERVALS)
 		data->interval_ptr = 0;
 }
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 5/7] Cpuidle: Move perf multiplier calculation out of the selection loop
  2014-02-24  6:29 [PATCH 0/7] Cpuidle: Minor fixes Tuukka Tikkanen
                   ` (3 preceding siblings ...)
  2014-02-24  6:29 ` [PATCH 4/7] Cpuidle: Do not substract exit latency from assumed sleep length Tuukka Tikkanen
@ 2014-02-24  6:29 ` Tuukka Tikkanen
  2014-02-24  6:29 ` [PATCH 6/7] Cpuidle: Deal with timer expiring in the past Tuukka Tikkanen
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 26+ messages in thread
From: Tuukka Tikkanen @ 2014-02-24  6:29 UTC (permalink / raw)
  To: linux-pm, rjw; +Cc: daniel.lezcano, tuukka.tikkanen, linux-kernel

The menu governor performance multiplier defines a minimum predicted
idle duration to latency ratio. Instead of checking this separately
in every iteration of the state selection loop, adjust the overall
latency restriction for the whole loop if this restriction is tighter
than what is set by the QoS subsystem.

The original test
s->exit_latency * multiplier > data->predicted_us
becomes
s->exit_latency > data->predicted_us / multiplier
by dividing both sides of the comparison by "multiplier".

While division is likely to be several times slower than multiplication,
the minor performance hit allows making a generic sleep state selection
function based on (sleep duration, maximum latency) tuple.

Signed-off-by: Tuukka Tikkanen <tuukka.tikkanen@linaro.org>
---
 drivers/cpuidle/governors/menu.c |   15 ++++++++++-----
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c
index b347c10..71b5232 100644
--- a/drivers/cpuidle/governors/menu.c
+++ b/drivers/cpuidle/governors/menu.c
@@ -288,7 +288,7 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev)
 	struct menu_device *data = &__get_cpu_var(menu_devices);
 	int latency_req = pm_qos_request(PM_QOS_CPU_DMA_LATENCY);
 	int i;
-	int multiplier;
+	unsigned int interactivity_req;
 	struct timespec t;
 
 	if (data->needs_update) {
@@ -310,8 +310,6 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev)
 
 	data->bucket = which_bucket(data->next_timer_us);
 
-	multiplier = performance_multiplier();
-
 	/*
 	 * if the correction factor is 0 (eg first time init or cpu hotplug
 	 * etc), we actually want to start out with a unity factor.
@@ -331,6 +329,15 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev)
 	get_typical_interval(data);
 
 	/*
+	 * Performance multiplier defines a minimum predicted idle
+	 * duration / latency ratio. Adjust the latency limit if
+	 * necessary.
+	 */
+	interactivity_req = data->predicted_us / performance_multiplier();
+	if (latency_req > interactivity_req)
+		latency_req = interactivity_req;
+
+	/*
 	 * We want to default to C1 (hlt), not to busy polling
 	 * unless the timer is happening really really soon.
 	 */
@@ -353,8 +360,6 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev)
 			continue;
 		if (s->exit_latency > latency_req)
 			continue;
-		if (s->exit_latency * multiplier > data->predicted_us)
-			continue;
 
 		data->last_state_idx = i;
 	}
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 6/7] Cpuidle: Deal with timer expiring in the past
  2014-02-24  6:29 [PATCH 0/7] Cpuidle: Minor fixes Tuukka Tikkanen
                   ` (4 preceding siblings ...)
  2014-02-24  6:29 ` [PATCH 5/7] Cpuidle: Move perf multiplier calculation out of the selection loop Tuukka Tikkanen
@ 2014-02-24  6:29 ` Tuukka Tikkanen
  2014-02-24 17:05   ` Nicolas Pitre
  2014-03-06  7:41   ` Len Brown
  2014-02-24  6:29 ` [PATCH 7/7] Cpuidle: poll state can measure residency Tuukka Tikkanen
                   ` (2 subsequent siblings)
  8 siblings, 2 replies; 26+ messages in thread
From: Tuukka Tikkanen @ 2014-02-24  6:29 UTC (permalink / raw)
  To: linux-pm, rjw; +Cc: daniel.lezcano, tuukka.tikkanen, linux-kernel

Sometimes (fairly often) when the cpuidle menu governor is making a decision
about idle state to enter the next timer for the cpu appears to expire in
the past. The menu governor expects the expiry to always be in the future
and in fact stores the time delta in an unsigned variable. However, when
the expiry is in the past, the value returned by tick_nohz_get_sleep_length
can be negative. This patch prevents using negative values, instead making
the governor return immediately similar to having latency requirement set
to 0.

Note: As with latency == 0, the return value is 0 with no check to see if
the state 0 has been disabled or not.

Signed-off-by: Tuukka Tikkanen <tuukka.tikkanen@linaro.org>
---
 drivers/cpuidle/governors/menu.c |   10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c
index 71b5232..c414468 100644
--- a/drivers/cpuidle/governors/menu.c
+++ b/drivers/cpuidle/governors/menu.c
@@ -302,8 +302,16 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev)
 	if (unlikely(latency_req == 0))
 		return 0;
 
-	/* determine the expected residency time, round up */
+	/*
+	 * Determine the expected residency time. If the time is negative,
+	 * a timer interrupt has probably just expired after disabling
+	 * interrupts. Return as quickly as possible in the most shallow
+	 * state possible. tv_nsec is always positive, so only check the
+	 * seconds.
+	 */
 	t = ktime_to_timespec(tick_nohz_get_sleep_length());
+	if (t.tv_sec < 0)
+		return 0;
 	data->next_timer_us =
 		t.tv_sec * USEC_PER_SEC + t.tv_nsec / NSEC_PER_USEC;
 
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 7/7] Cpuidle: poll state can measure residency
  2014-02-24  6:29 [PATCH 0/7] Cpuidle: Minor fixes Tuukka Tikkanen
                   ` (5 preceding siblings ...)
  2014-02-24  6:29 ` [PATCH 6/7] Cpuidle: Deal with timer expiring in the past Tuukka Tikkanen
@ 2014-02-24  6:29 ` Tuukka Tikkanen
  2014-02-25 10:56   ` Daniel Lezcano
  2014-02-24 21:24 ` [PATCH 0/7] Cpuidle: Minor fixes Daniel Lezcano
  2014-02-26  0:46 ` Rafael J. Wysocki
  8 siblings, 1 reply; 26+ messages in thread
From: Tuukka Tikkanen @ 2014-02-24  6:29 UTC (permalink / raw)
  To: linux-pm, rjw; +Cc: daniel.lezcano, tuukka.tikkanen, linux-kernel

For some platforms, a poll state is inserted in the cpuidle driver states.
The flags for the state do not indicate that timekeeping is not affected.
As the state does not do anything apart from calling cpu_relax(), the
times returned by ktime_get should remain valid. Add the missing flag.

Signed-off-by: Tuukka Tikkanen <tuukka.tikkanen@linaro.org>
---
 drivers/cpuidle/driver.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/cpuidle/driver.c b/drivers/cpuidle/driver.c
index 06dbe7c..136d6a2 100644
--- a/drivers/cpuidle/driver.c
+++ b/drivers/cpuidle/driver.c
@@ -209,7 +209,7 @@ static void poll_idle_init(struct cpuidle_driver *drv)
 	state->exit_latency = 0;
 	state->target_residency = 0;
 	state->power_usage = -1;
-	state->flags = 0;
+	state->flags = CPUIDLE_FLAG_TIME_VALID;
 	state->enter = poll_idle;
 	state->disabled = false;
 }
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/7] Cpuidle: rename expected_us to next_timer_us in menu governor
  2014-02-24  6:29 ` [PATCH 1/7] Cpuidle: rename expected_us to next_timer_us in menu governor Tuukka Tikkanen
@ 2014-02-24 16:52   ` Nicolas Pitre
  2014-03-05 22:26   ` Len Brown
  1 sibling, 0 replies; 26+ messages in thread
From: Nicolas Pitre @ 2014-02-24 16:52 UTC (permalink / raw)
  To: Tuukka Tikkanen; +Cc: linux-pm, rjw, daniel.lezcano, linux-kernel

On Mon, 24 Feb 2014, Tuukka Tikkanen wrote:

> The field expected_us is used to store the time remaining until next
> timer expiry. The name is inaccurate, as we really do not expect all
> wakeups to be caused by timers. In addition, another field with a very
> similar name (predicted_us) is used to store the predicted time
> remaining until any wakeup source being active.
> 
> This patch renames expected_us to next_timer_us in order to better
> reflect the contained information.
> 
> Signed-off-by: Tuukka Tikkanen <tuukka.tikkanen@linaro.org>

Acked-by: Nicolas Pitre <nico@linaro.org>

> ---
>  drivers/cpuidle/governors/menu.c |   18 +++++++++---------
>  1 file changed, 9 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c
> index cf7f2f0..e9a2a27 100644
> --- a/drivers/cpuidle/governors/menu.c
> +++ b/drivers/cpuidle/governors/menu.c
> @@ -122,7 +122,7 @@ struct menu_device {
>  	int		last_state_idx;
>  	int             needs_update;
>  
> -	unsigned int	expected_us;
> +	unsigned int	next_timer_us;
>  	unsigned int	predicted_us;
>  	unsigned int	exit_us;
>  	unsigned int	bucket;
> @@ -257,7 +257,7 @@ again:
>  		stddev = int_sqrt(stddev);
>  		if (((avg > stddev * 6) && (divisor * 4 >= INTERVALS * 3))
>  							|| stddev <= 20) {
> -			if (data->expected_us > avg)
> +			if (data->next_timer_us > avg)
>  				data->predicted_us = avg;
>  			return;
>  		}
> @@ -306,11 +306,11 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev)
>  
>  	/* determine the expected residency time, round up */
>  	t = ktime_to_timespec(tick_nohz_get_sleep_length());
> -	data->expected_us =
> +	data->next_timer_us =
>  		t.tv_sec * USEC_PER_SEC + t.tv_nsec / NSEC_PER_USEC;
>  
>  
> -	data->bucket = which_bucket(data->expected_us);
> +	data->bucket = which_bucket(data->next_timer_us);
>  
>  	multiplier = performance_multiplier();
>  
> @@ -326,7 +326,7 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev)
>  	 * operands are 32 bits.
>  	 * Make sure to round up for half microseconds.
>  	 */
> -	data->predicted_us = div_round64((uint64_t)data->expected_us *
> +	data->predicted_us = div_round64((uint64_t)data->next_timer_us *
>  					 data->correction_factor[data->bucket],
>  					 RESOLUTION * DECAY);
>  
> @@ -336,7 +336,7 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev)
>  	 * We want to default to C1 (hlt), not to busy polling
>  	 * unless the timer is happening really really soon.
>  	 */
> -	if (data->expected_us > 5 &&
> +	if (data->next_timer_us > 5 &&
>  	    !drv->states[CPUIDLE_DRIVER_STATE_START].disabled &&
>  		dev->states_usage[CPUIDLE_DRIVER_STATE_START].disable == 0)
>  		data->last_state_idx = CPUIDLE_DRIVER_STATE_START;
> @@ -401,7 +401,7 @@ static void menu_update(struct cpuidle_driver *drv, struct cpuidle_device *dev)
>  	 * for the whole expected time.
>  	 */
>  	if (unlikely(!(target->flags & CPUIDLE_FLAG_TIME_VALID)))
> -		last_idle_us = data->expected_us;
> +		last_idle_us = data->next_timer_us;
>  
>  
>  	measured_us = last_idle_us;
> @@ -418,8 +418,8 @@ static void menu_update(struct cpuidle_driver *drv, struct cpuidle_device *dev)
>  	new_factor = data->correction_factor[data->bucket];
>  	new_factor -= new_factor / DECAY;
>  
> -	if (data->expected_us > 0 && measured_us < MAX_INTERESTING)
> -		new_factor += RESOLUTION * measured_us / data->expected_us;
> +	if (data->next_timer_us > 0 && measured_us < MAX_INTERESTING)
> +		new_factor += RESOLUTION * measured_us / data->next_timer_us;
>  	else
>  		/*
>  		 * we were idle so long that we count it as a perfect
> -- 
> 1.7.9.5
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
> 

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 3/7] Cpuidle: Ensure menu coefficients stay within domain
  2014-02-24  6:29 ` [PATCH 3/7] Cpuidle: Ensure menu coefficients stay within domain Tuukka Tikkanen
@ 2014-02-24 16:57   ` Nicolas Pitre
  0 siblings, 0 replies; 26+ messages in thread
From: Nicolas Pitre @ 2014-02-24 16:57 UTC (permalink / raw)
  To: Tuukka Tikkanen; +Cc: linux-pm, rjw, daniel.lezcano, linux-kernel

On Mon, 24 Feb 2014, Tuukka Tikkanen wrote:

> The menu governor uses coefficients as one method of actual idle
> period length estimation. The coefficients are, as detailed below,
> multipliers giving expected idle period length from time until next
> timer expiry. The multipliers are supposed to have domain of (0..1].
> 
> The coefficients are fractions where only the numerators are stored
> and denominators are a shared constant RESOLUTION*DECAY. Since the
> value of the coefficient should always be greater than 0 and less
> than or equal to 1, the numerator must have a value greater than
> 0 and less than or equal to RESOLUTION*DECAY.
> 
> If the coefficients are updated with measured idle durations exceeding
> timer length, the multiplier may reach values exceeding unity (i.e.
> the stored numerator exceeds RESOLUTION*DECAY). This patch ensures that
> the multipliers are updated with durations capped to timer length.
> 
> Signed-off-by: Tuukka Tikkanen <tuukka.tikkanen@linaro.org>
Acked-by: Nicolas Pitre <nico@linaro.org>


> ---
>  drivers/cpuidle/governors/menu.c |    3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c
> index 115386a..f0995dd 100644
> --- a/drivers/cpuidle/governors/menu.c
> +++ b/drivers/cpuidle/governors/menu.c
> @@ -410,6 +410,9 @@ static void menu_update(struct cpuidle_driver *drv, struct cpuidle_device *dev)
>  	if (measured_us > target->exit_latency)
>  		measured_us -= target->exit_latency;
>  
> +	/* Make sure our coefficients do not exceed unity */
> +	if (measured_us > data->next_timer_us)
> +		measured_us = data->next_timer_us;
>  
>  	/* Update our correction ratio */
>  	new_factor = data->correction_factor[data->bucket];
> -- 
> 1.7.9.5
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
> 

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 6/7] Cpuidle: Deal with timer expiring in the past
  2014-02-24  6:29 ` [PATCH 6/7] Cpuidle: Deal with timer expiring in the past Tuukka Tikkanen
@ 2014-02-24 17:05   ` Nicolas Pitre
  2014-02-25  0:20     ` Rafael J. Wysocki
  2014-03-06  7:41   ` Len Brown
  1 sibling, 1 reply; 26+ messages in thread
From: Nicolas Pitre @ 2014-02-24 17:05 UTC (permalink / raw)
  To: Tuukka Tikkanen; +Cc: linux-pm, rjw, daniel.lezcano, linux-kernel

On Mon, 24 Feb 2014, Tuukka Tikkanen wrote:

> Sometimes (fairly often) when the cpuidle menu governor is making a decision
> about idle state to enter the next timer for the cpu appears to expire in
> the past. The menu governor expects the expiry to always be in the future
> and in fact stores the time delta in an unsigned variable. However, when
> the expiry is in the past, the value returned by tick_nohz_get_sleep_length
> can be negative. This patch prevents using negative values, instead making
> the governor return immediately similar to having latency requirement set
> to 0.
> 
> Note: As with latency == 0, the return value is 0 with no check to see if
> the state 0 has been disabled or not.

In your cover letter you mention some occurrences of the negative result 
being observed on x86.  That information is worth capturing in the 
commit log as well to distinguish between theoretical problems from 
actual observations.

> 
> Signed-off-by: Tuukka Tikkanen <tuukka.tikkanen@linaro.org>
> ---
>  drivers/cpuidle/governors/menu.c |   10 +++++++++-
>  1 file changed, 9 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c
> index 71b5232..c414468 100644
> --- a/drivers/cpuidle/governors/menu.c
> +++ b/drivers/cpuidle/governors/menu.c
> @@ -302,8 +302,16 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev)
>  	if (unlikely(latency_req == 0))
>  		return 0;
>  
> -	/* determine the expected residency time, round up */
> +	/*
> +	 * Determine the expected residency time. If the time is negative,
> +	 * a timer interrupt has probably just expired after disabling
> +	 * interrupts. Return as quickly as possible in the most shallow
> +	 * state possible. tv_nsec is always positive, so only check the
> +	 * seconds.
> +	 */
>  	t = ktime_to_timespec(tick_nohz_get_sleep_length());
> +	if (t.tv_sec < 0)
> +		return 0;
>  	data->next_timer_us =
>  		t.tv_sec * USEC_PER_SEC + t.tv_nsec / NSEC_PER_USEC;
>  
> -- 
> 1.7.9.5
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
> 

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 0/7] Cpuidle: Minor fixes
  2014-02-24  6:29 [PATCH 0/7] Cpuidle: Minor fixes Tuukka Tikkanen
                   ` (6 preceding siblings ...)
  2014-02-24  6:29 ` [PATCH 7/7] Cpuidle: poll state can measure residency Tuukka Tikkanen
@ 2014-02-24 21:24 ` Daniel Lezcano
  2014-02-26  0:46 ` Rafael J. Wysocki
  8 siblings, 0 replies; 26+ messages in thread
From: Daniel Lezcano @ 2014-02-24 21:24 UTC (permalink / raw)
  To: Tuukka Tikkanen, linux-pm, rjw; +Cc: linux-kernel, Arjan van de Ven

On 02/24/2014 07:29 AM, Tuukka Tikkanen wrote:
> This set of patches makes some minor changes to menu governor and the poll
> idle state.
>
> Patch 1 is simply a rename of a variable to make the name better represent
> the contained information.
>
> Patch 2 fixes calculating actual residency in cases where the entered state
> is different from the state decided by the menu governor.
>
> Patch 3 makes sure the menu governor timer coefficients are not updated
> with values that might cause a coefficient to reach a value greater than
> unity.
>
> Patch 4 fixes calculation actual residency in cases where the entered state
> does not support measuring residency. In such cases the residency time
> is taken from time remaining until next timer expiry. The timer is expected
> to go off at the start of exit latency, not after it. Therefore the exit
> latency must not be substracted from the assumed value.
>
> Patch 5 moves the performance multiplier based comparison out of the state
> selection loop by changing it into a latency requirement. This allows
> using a generic state selection function accepting only (duration, latency)
> tuple as input. The change is possible by noting that performance multiplier
> is used only to check for a minimum predicted idle duration to exit latency
> ratio. As predicted idle duration is a constant for the loop, the maximum
> allowed latency can be calculated outside of the loop.
>
> Patch 6 prevents using negative values from tick_nohz_get_sleep_length()
> in the menu governor. If unchecked, the negative values are used as huge
> unsigned values. Negative values occur fairly often (e.g. on x86_64 I've
> seen this happen several times per minute) on a busy system, allowing
> the deepest state to win the selection while the shallowest should be picked.
>
> Patch 7 adds CPUIDLE_FLAG_TIME_VALID to poll_idle. I do not know of any
> platfrom where cpu_relax() would break ktime_get() and in fact poll_idle
> uses ktime_get itself.
> (Note: poll_idle updates dev->last_residency for some reason. Does it ever
> get called without going through cpuidle_enter_state, which will overwrite
> the value? Even if some state redirects to this state, the call will
> eventually return to the framework. The redundant time measurement could
> be removed, unless there is some obscure way of getting called on some
> platform that I am unable to figure out.)
>
> Tuukka Tikkanen (7):
>    Cpuidle: rename expected_us to next_timer_us in menu governor
>    Cpuidle: Use actual state latency in menu governor
>    Cpuidle: Ensure menu coefficients stay within domain
>    Cpuidle: Do not substract exit latency from assumed sleep length
>    Cpuidle: Move perf multiplier calculation out of the selection loop
>    Cpuidle: Deal with timer expiring in the past
>    Cpuidle: poll state can measure residency
>
>   drivers/cpuidle/driver.c         |    2 +-
>   drivers/cpuidle/governors/menu.c |   85 ++++++++++++++++++++++++--------------
>   2 files changed, 54 insertions(+), 33 deletions(-)

You should put Arjan van de Ven in Cc for the menu governor changes.

Thanks
   -- Daniel


-- 
  <http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs

Follow Linaro:  <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 2/7] Cpuidle: Use actual state latency in menu governor
  2014-02-24  6:29 ` [PATCH 2/7] Cpuidle: Use actual state latency " Tuukka Tikkanen
@ 2014-02-24 22:35   ` Daniel Lezcano
  0 siblings, 0 replies; 26+ messages in thread
From: Daniel Lezcano @ 2014-02-24 22:35 UTC (permalink / raw)
  To: Tuukka Tikkanen, linux-pm, rjw; +Cc: linux-kernel

On 02/24/2014 07:29 AM, Tuukka Tikkanen wrote:
> Currently menu governor records the exit latency of the state it has
> chosen for the idle period. The stored latency value is then later
> used to calculate the actual length of the idle period. This value
> may however be incorrect, as the entered state may not be the one
> chosen by the governor. The entered state information is available,
> so we can use that to obtain the real exit latency.
>
> Signed-off-by: Tuukka Tikkanen <tuukka.tikkanen@linaro.org>

Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>

> ---
>   drivers/cpuidle/governors/menu.c |    7 ++-----
>   1 file changed, 2 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c
> index e9a2a27..115386a 100644
> --- a/drivers/cpuidle/governors/menu.c
> +++ b/drivers/cpuidle/governors/menu.c
> @@ -124,7 +124,6 @@ struct menu_device {
>
>   	unsigned int	next_timer_us;
>   	unsigned int	predicted_us;
> -	unsigned int	exit_us;
>   	unsigned int	bucket;
>   	unsigned int	correction_factor[BUCKETS];
>   	unsigned int	intervals[INTERVALS];
> @@ -298,7 +297,6 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev)
>   	}
>
>   	data->last_state_idx = 0;
> -	data->exit_us = 0;
>
>   	/* Special case when user has set very strict latency requirement */
>   	if (unlikely(latency_req == 0))
> @@ -359,7 +357,6 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev)
>   			continue;
>
>   		data->last_state_idx = i;
> -		data->exit_us = s->exit_latency;
>   	}
>
>   	return data->last_state_idx;
> @@ -410,8 +407,8 @@ static void menu_update(struct cpuidle_driver *drv, struct cpuidle_device *dev)
>   	 * We correct for the exit latency; we are assuming here that the
>   	 * exit latency happens after the event that we're interested in.
>   	 */
> -	if (measured_us > data->exit_us)
> -		measured_us -= data->exit_us;
> +	if (measured_us > target->exit_latency)
> +		measured_us -= target->exit_latency;
>
>
>   	/* Update our correction ratio */
>


-- 
  <http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs

Follow Linaro:  <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 6/7] Cpuidle: Deal with timer expiring in the past
  2014-02-24 17:05   ` Nicolas Pitre
@ 2014-02-25  0:20     ` Rafael J. Wysocki
  0 siblings, 0 replies; 26+ messages in thread
From: Rafael J. Wysocki @ 2014-02-25  0:20 UTC (permalink / raw)
  To: Nicolas Pitre, Tuukka Tikkanen; +Cc: linux-pm, daniel.lezcano, linux-kernel

On Monday, February 24, 2014 12:05:39 PM Nicolas Pitre wrote:
> On Mon, 24 Feb 2014, Tuukka Tikkanen wrote:
> 
> > Sometimes (fairly often) when the cpuidle menu governor is making a decision
> > about idle state to enter the next timer for the cpu appears to expire in
> > the past. The menu governor expects the expiry to always be in the future
> > and in fact stores the time delta in an unsigned variable. However, when
> > the expiry is in the past, the value returned by tick_nohz_get_sleep_length
> > can be negative. This patch prevents using negative values, instead making
> > the governor return immediately similar to having latency requirement set
> > to 0.
> > 
> > Note: As with latency == 0, the return value is 0 with no check to see if
> > the state 0 has been disabled or not.
> 
> In your cover letter you mention some occurrences of the negative result 
> being observed on x86.  That information is worth capturing in the 
> commit log as well to distinguish between theoretical problems from 
> actual observations.

In my opinion that applies to the other patches in the series too.  In all
cases it would be good to know whether or not the problems addressed are
theoretical or they can be reproduced in practice (and if so, then how), or
they already have been reported by somebody.

In the two latter cases the patches would be good candidates for -stable,
in particular.

Thanks,
Rafael


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 7/7] Cpuidle: poll state can measure residency
  2014-02-24  6:29 ` [PATCH 7/7] Cpuidle: poll state can measure residency Tuukka Tikkanen
@ 2014-02-25 10:56   ` Daniel Lezcano
  2014-02-25 15:40     ` Tuukka Tikkanen
  0 siblings, 1 reply; 26+ messages in thread
From: Daniel Lezcano @ 2014-02-25 10:56 UTC (permalink / raw)
  To: Tuukka Tikkanen, linux-pm, rjw; +Cc: linux-kernel

On 02/24/2014 07:29 AM, Tuukka Tikkanen wrote:
> For some platforms, a poll state is inserted in the cpuidle driver states.
> The flags for the state do not indicate that timekeeping is not affected.
> As the state does not do anything apart from calling cpu_relax(), the
> times returned by ktime_get should remain valid. Add the missing flag.

Yes, but it is done with the interrupt enabled, so the interrupt handler 
+ softirq handler times will be accounted in the residency time.

I am not sure adding this flag is good.

> Signed-off-by: Tuukka Tikkanen <tuukka.tikkanen@linaro.org>
> ---
>   drivers/cpuidle/driver.c |    2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/cpuidle/driver.c b/drivers/cpuidle/driver.c
> index 06dbe7c..136d6a2 100644
> --- a/drivers/cpuidle/driver.c
> +++ b/drivers/cpuidle/driver.c
> @@ -209,7 +209,7 @@ static void poll_idle_init(struct cpuidle_driver *drv)
>   	state->exit_latency = 0;
>   	state->target_residency = 0;
>   	state->power_usage = -1;
> -	state->flags = 0;
> +	state->flags = CPUIDLE_FLAG_TIME_VALID;
>   	state->enter = poll_idle;
>   	state->disabled = false;
>   }
>


-- 
  <http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs

Follow Linaro:  <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 7/7] Cpuidle: poll state can measure residency
  2014-02-25 10:56   ` Daniel Lezcano
@ 2014-02-25 15:40     ` Tuukka Tikkanen
  0 siblings, 0 replies; 26+ messages in thread
From: Tuukka Tikkanen @ 2014-02-25 15:40 UTC (permalink / raw)
  To: Daniel Lezcano; +Cc: Linux PM list, Rafael J. Wysocki, linux-kernel

On 25 February 2014 12:56, Daniel Lezcano <daniel.lezcano@linaro.org> wrote:
> On 02/24/2014 07:29 AM, Tuukka Tikkanen wrote:
>>
>> For some platforms, a poll state is inserted in the cpuidle driver states.
>> The flags for the state do not indicate that timekeeping is not affected.
>> As the state does not do anything apart from calling cpu_relax(), the
>> times returned by ktime_get should remain valid. Add the missing flag.
>
>
> Yes, but it is done with the interrupt enabled, so the interrupt handler +
> softirq handler times will be accounted in the residency time.
>
> I am not sure adding this flag is good.

Granted, with a slow CPU and very complex interrupt handing there
might be some extra time added. So let's consider what might happen:

Menu: Not having this flag assumes the residency matches time until
next timer expiry. Any measured amount of time less than that
indicates that the residency was shorter. We might not know exactly
how much, but we are closer to the truth and everything is fine. So
that leaves reporting time that exceeds what was established as the
time until next timer expiry. That too is OK (assuming another patch
in the series) as the value is limited to the time until next timer
expiry. In short, we get the same or better outcome and have no
issues.

Ladder: Last residency is assumed to be
last_state->threshold.promotion_time + 1 if the flag is not set.
Obviously we won't be considering demotion in that case and will go
into the promotion path. Any measured value below that is again closer
to the truth and any value at or above that is simply going to result
in the same outcome.

The only remaining issue is any hypothetical governor not currently in
the kernel tree. It would have to depend on accurate measurements
while being able to work with states that do not have the valid time
flag. I'm not sure if I can picture a scenario like that.

Tuukka

>
>> Signed-off-by: Tuukka Tikkanen <tuukka.tikkanen@linaro.org>
>> ---
>>   drivers/cpuidle/driver.c |    2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/drivers/cpuidle/driver.c b/drivers/cpuidle/driver.c
>> index 06dbe7c..136d6a2 100644
>> --- a/drivers/cpuidle/driver.c
>> +++ b/drivers/cpuidle/driver.c
>> @@ -209,7 +209,7 @@ static void poll_idle_init(struct cpuidle_driver *drv)
>>         state->exit_latency = 0;
>>         state->target_residency = 0;
>>         state->power_usage = -1;
>> -       state->flags = 0;
>> +       state->flags = CPUIDLE_FLAG_TIME_VALID;
>>         state->enter = poll_idle;
>>         state->disabled = false;
>>   }
>>
>
>
> --
>  <http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs
>
> Follow Linaro:  <http://www.facebook.com/pages/Linaro> Facebook |
> <http://twitter.com/#!/linaroorg> Twitter |
> <http://www.linaro.org/linaro-blog/> Blog
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-pm" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 0/7] Cpuidle: Minor fixes
  2014-02-24  6:29 [PATCH 0/7] Cpuidle: Minor fixes Tuukka Tikkanen
                   ` (7 preceding siblings ...)
  2014-02-24 21:24 ` [PATCH 0/7] Cpuidle: Minor fixes Daniel Lezcano
@ 2014-02-26  0:46 ` Rafael J. Wysocki
  2014-03-07 12:12   ` Rafael J. Wysocki
  8 siblings, 1 reply; 26+ messages in thread
From: Rafael J. Wysocki @ 2014-02-26  0:46 UTC (permalink / raw)
  To: Tuukka Tikkanen; +Cc: linux-pm, daniel.lezcano, linux-kernel

On Monday, February 24, 2014 08:29:30 AM Tuukka Tikkanen wrote:
> This set of patches makes some minor changes to menu governor and the poll
> idle state.
> 
> Patch 1 is simply a rename of a variable to make the name better represent
> the contained information.
> 
> Patch 2 fixes calculating actual residency in cases where the entered state
> is different from the state decided by the menu governor.
> 
> Patch 3 makes sure the menu governor timer coefficients are not updated
> with values that might cause a coefficient to reach a value greater than
> unity.
> 
> Patch 4 fixes calculation actual residency in cases where the entered state
> does not support measuring residency. In such cases the residency time
> is taken from time remaining until next timer expiry. The timer is expected
> to go off at the start of exit latency, not after it. Therefore the exit
> latency must not be substracted from the assumed value.
> 
> Patch 5 moves the performance multiplier based comparison out of the state
> selection loop by changing it into a latency requirement. This allows
> using a generic state selection function accepting only (duration, latency)
> tuple as input. The change is possible by noting that performance multiplier
> is used only to check for a minimum predicted idle duration to exit latency
> ratio. As predicted idle duration is a constant for the loop, the maximum
> allowed latency can be calculated outside of the loop.
> 
> Patch 6 prevents using negative values from tick_nohz_get_sleep_length()
> in the menu governor. If unchecked, the negative values are used as huge
> unsigned values. Negative values occur fairly often (e.g. on x86_64 I've
> seen this happen several times per minute) on a busy system, allowing
> the deepest state to win the selection while the shallowest should be picked.
> 
> Patch 7 adds CPUIDLE_FLAG_TIME_VALID to poll_idle. I do not know of any
> platfrom where cpu_relax() would break ktime_get() and in fact poll_idle
> uses ktime_get itself.
> (Note: poll_idle updates dev->last_residency for some reason. Does it ever
> get called without going through cpuidle_enter_state, which will overwrite
> the value? Even if some state redirects to this state, the call will
> eventually return to the framework. The redundant time measurement could
> be removed, unless there is some obscure way of getting called on some
> platform that I am unable to figure out.)
> 
> Tuukka Tikkanen (7):
>   Cpuidle: rename expected_us to next_timer_us in menu governor
>   Cpuidle: Use actual state latency in menu governor
>   Cpuidle: Ensure menu coefficients stay within domain
>   Cpuidle: Do not substract exit latency from assumed sleep length
>   Cpuidle: Move perf multiplier calculation out of the selection loop
>   Cpuidle: Deal with timer expiring in the past
>   Cpuidle: poll state can measure residency
> 
>  drivers/cpuidle/driver.c         |    2 +-
>  drivers/cpuidle/governors/menu.c |   85 ++++++++++++++++++++++++--------------
>  2 files changed, 54 insertions(+), 33 deletions(-)

I'm queuing this series up for 3.15.

It looks like at least some of these patches should go into 3.14 and -stable,
but for that I'd need some references like links to bug reports or documented
reproducibility (on what hardware, what it takes to reproduce etc.).

Thanks!

-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/7] Cpuidle: rename expected_us to next_timer_us in menu governor
  2014-02-24  6:29 ` [PATCH 1/7] Cpuidle: rename expected_us to next_timer_us in menu governor Tuukka Tikkanen
  2014-02-24 16:52   ` Nicolas Pitre
@ 2014-03-05 22:26   ` Len Brown
  1 sibling, 0 replies; 26+ messages in thread
From: Len Brown @ 2014-03-05 22:26 UTC (permalink / raw)
  To: Tuukka Tikkanen
  Cc: Linux PM list, Rafael J. Wysocki, Daniel Lezcano, linux-kernel

> This patch renames expected_us to next_timer_us in order to better
> reflect the contained information.
>
> Signed-off-by: Tuukka Tikkanen <tuukka.tikkanen@linaro.org>

Acked-by: Len Brown <len.brown@intel.com>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 6/7] Cpuidle: Deal with timer expiring in the past
  2014-02-24  6:29 ` [PATCH 6/7] Cpuidle: Deal with timer expiring in the past Tuukka Tikkanen
  2014-02-24 17:05   ` Nicolas Pitre
@ 2014-03-06  7:41   ` Len Brown
  2014-03-07  3:09     ` Len Brown
  2014-03-10 10:54     ` Tuukka Tikkanen
  1 sibling, 2 replies; 26+ messages in thread
From: Len Brown @ 2014-03-06  7:41 UTC (permalink / raw)
  To: Tuukka Tikkanen
  Cc: Linux PM list, Rafael J. Wysocki, Daniel Lezcano, linux-kernel

On Mon, Feb 24, 2014 at 1:29 AM, Tuukka Tikkanen
<tuukka.tikkanen@linaro.org> wrote:
> Sometimes (fairly often) when the cpuidle menu governor is making a decision
> about idle state to enter the next timer for the cpu appears to expire in
> the past. The menu governor expects the expiry to always be in the future
> and in fact stores the time delta in an unsigned variable. However, when
> the expiry is in the past, the value returned by tick_nohz_get_sleep_length
> can be negative. This patch prevents using negative values, instead making
> the governor return immediately similar to having latency requirement set
> to 0.
>
> Note: As with latency == 0, the return value is 0 with no check to see if
> the state 0 has been disabled or not.
>
> Signed-off-by: Tuukka Tikkanen <tuukka.tikkanen@linaro.org>
> ---
>  drivers/cpuidle/governors/menu.c |   10 +++++++++-
>  1 file changed, 9 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c
> index 71b5232..c414468 100644
> --- a/drivers/cpuidle/governors/menu.c
> +++ b/drivers/cpuidle/governors/menu.c
> @@ -302,8 +302,16 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev)
>         if (unlikely(latency_req == 0))
>                 return 0;
>
> -       /* determine the expected residency time, round up */
> +       /*
> +        * Determine the expected residency time. If the time is negative,
> +        * a timer interrupt has probably just expired after disabling
> +        * interrupts. Return as quickly as possible in the most shallow
> +        * state possible. tv_nsec is always positive, so only check the
> +        * seconds.
> +        */
>         t = ktime_to_timespec(tick_nohz_get_sleep_length());
> +       if (t.tv_sec < 0)
> +               return 0;
>         data->next_timer_us =
>                 t.tv_sec * USEC_PER_SEC + t.tv_nsec / NSEC_PER_USEC;
>

Are there special conditions that are necessary to provoke a negative
return value?
I've traced this code on several systems, and never seen a negative
return value.

However...
I do see values up to 300.2 seconds, and those large values seem to decay
at the rate of real-time so that after 5 minutes they are small, and then
jump back up to 300 seconds.

Some folks at Oracle debugged it down to use of NEXT_TIMER_MAX_DELTA
when there is _no_ timer currently pending on that CPU.  It seems this is easier
to observe, the more CPUs a system has -- though I've been able to reproduce
it on a system as small as a single-package 8-cpu systems.

One proposed way to address this is to cap large values at 1 second.
However, that will not recognize that for the period when the large
value decays to under 1 second, all of those are fiction.

Also, if we could identify the case where there is no future timer,
it seems that re-using dev->last_residency would probably be
a more useful guess than pretending we'll have a timer expire in 1 second.

thanks,
Len Brown, Intel Open Source Technology Cente

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 6/7] Cpuidle: Deal with timer expiring in the past
  2014-03-06  7:41   ` Len Brown
@ 2014-03-07  3:09     ` Len Brown
  2014-03-10 10:54     ` Tuukka Tikkanen
  1 sibling, 0 replies; 26+ messages in thread
From: Len Brown @ 2014-03-07  3:09 UTC (permalink / raw)
  To: Tuukka Tikkanen
  Cc: Linux PM list, Rafael J. Wysocki, Daniel Lezcano, linux-kernel

NAK

tick_nohz_get_sleep_length() has exactly 1 consumer -- cpuidle.

Rather than have cpuidle second guess what is returned,
tick_nohz_get_sleep_length() should be updated to return a value
that makes sense to cpuidle.

thanks,
Len Brown, Intel Open Source Technology Cent

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 0/7] Cpuidle: Minor fixes
  2014-02-26  0:46 ` Rafael J. Wysocki
@ 2014-03-07 12:12   ` Rafael J. Wysocki
  2014-03-11 19:35     ` Rik van Riel
  2014-03-11 21:06     ` Rik van Riel
  0 siblings, 2 replies; 26+ messages in thread
From: Rafael J. Wysocki @ 2014-03-07 12:12 UTC (permalink / raw)
  To: Tuukka Tikkanen; +Cc: linux-pm, daniel.lezcano, linux-kernel, Len Brown

On Wednesday, February 26, 2014 01:46:25 AM Rafael J. Wysocki wrote:
> On Monday, February 24, 2014 08:29:30 AM Tuukka Tikkanen wrote:
> > This set of patches makes some minor changes to menu governor and the poll
> > idle state.
> > 
> > Patch 1 is simply a rename of a variable to make the name better represent
> > the contained information.
> > 
> > Patch 2 fixes calculating actual residency in cases where the entered state
> > is different from the state decided by the menu governor.
> > 
> > Patch 3 makes sure the menu governor timer coefficients are not updated
> > with values that might cause a coefficient to reach a value greater than
> > unity.
> > 
> > Patch 4 fixes calculation actual residency in cases where the entered state
> > does not support measuring residency. In such cases the residency time
> > is taken from time remaining until next timer expiry. The timer is expected
> > to go off at the start of exit latency, not after it. Therefore the exit
> > latency must not be substracted from the assumed value.
> > 
> > Patch 5 moves the performance multiplier based comparison out of the state
> > selection loop by changing it into a latency requirement. This allows
> > using a generic state selection function accepting only (duration, latency)
> > tuple as input. The change is possible by noting that performance multiplier
> > is used only to check for a minimum predicted idle duration to exit latency
> > ratio. As predicted idle duration is a constant for the loop, the maximum
> > allowed latency can be calculated outside of the loop.
> > 
> > Patch 6 prevents using negative values from tick_nohz_get_sleep_length()
> > in the menu governor. If unchecked, the negative values are used as huge
> > unsigned values. Negative values occur fairly often (e.g. on x86_64 I've
> > seen this happen several times per minute) on a busy system, allowing
> > the deepest state to win the selection while the shallowest should be picked.
> > 
> > Patch 7 adds CPUIDLE_FLAG_TIME_VALID to poll_idle. I do not know of any
> > platfrom where cpu_relax() would break ktime_get() and in fact poll_idle
> > uses ktime_get itself.
> > (Note: poll_idle updates dev->last_residency for some reason. Does it ever
> > get called without going through cpuidle_enter_state, which will overwrite
> > the value? Even if some state redirects to this state, the call will
> > eventually return to the framework. The redundant time measurement could
> > be removed, unless there is some obscure way of getting called on some
> > platform that I am unable to figure out.)
> > 
> > Tuukka Tikkanen (7):
> >   Cpuidle: rename expected_us to next_timer_us in menu governor
> >   Cpuidle: Use actual state latency in menu governor
> >   Cpuidle: Ensure menu coefficients stay within domain
> >   Cpuidle: Do not substract exit latency from assumed sleep length
> >   Cpuidle: Move perf multiplier calculation out of the selection loop
> >   Cpuidle: Deal with timer expiring in the past
> >   Cpuidle: poll state can measure residency
> > 
> >  drivers/cpuidle/driver.c         |    2 +-
> >  drivers/cpuidle/governors/menu.c |   85 ++++++++++++++++++++++++--------------
> >  2 files changed, 54 insertions(+), 33 deletions(-)
> 
> I'm queuing this series up for 3.15.

Patch [6/7] dropped, because of the Len's objection.

Thanks!

-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 6/7] Cpuidle: Deal with timer expiring in the past
  2014-03-06  7:41   ` Len Brown
  2014-03-07  3:09     ` Len Brown
@ 2014-03-10 10:54     ` Tuukka Tikkanen
  2014-03-21 23:10       ` Len Brown
  1 sibling, 1 reply; 26+ messages in thread
From: Tuukka Tikkanen @ 2014-03-10 10:54 UTC (permalink / raw)
  To: Len Brown; +Cc: Linux PM list, Rafael J. Wysocki, Daniel Lezcano, linux-kernel

Hi,

On 6 March 2014 09:41, Len Brown <lenb@kernel.org> wrote:
>
> On Mon, Feb 24, 2014 at 1:29 AM, Tuukka Tikkanen
> <tuukka.tikkanen@linaro.org> wrote:
> > Sometimes (fairly often) when the cpuidle menu governor is making a decision
> > about idle state to enter the next timer for the cpu appears to expire in
> > the past. The menu governor expects the expiry to always be in the future
> > and in fact stores the time delta in an unsigned variable. However, when
> > the expiry is in the past, the value returned by tick_nohz_get_sleep_length
> > can be negative. This patch prevents using negative values, instead making
> > the governor return immediately similar to having latency requirement set
> > to 0.
> >
> > Note: As with latency == 0, the return value is 0 with no check to see if
> > the state 0 has been disabled or not.
> >
> > Signed-off-by: Tuukka Tikkanen <tuukka.tikkanen@linaro.org>
> > ---
> >  drivers/cpuidle/governors/menu.c |   10 +++++++++-
> >  1 file changed, 9 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c
> > index 71b5232..c414468 100644
> > --- a/drivers/cpuidle/governors/menu.c
> > +++ b/drivers/cpuidle/governors/menu.c
> > @@ -302,8 +302,16 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev)
> >         if (unlikely(latency_req == 0))
> >                 return 0;
> >
> > -       /* determine the expected residency time, round up */
> > +       /*
> > +        * Determine the expected residency time. If the time is negative,
> > +        * a timer interrupt has probably just expired after disabling
> > +        * interrupts. Return as quickly as possible in the most shallow
> > +        * state possible. tv_nsec is always positive, so only check the
> > +        * seconds.
> > +        */
> >         t = ktime_to_timespec(tick_nohz_get_sleep_length());
> > +       if (t.tv_sec < 0)
> > +               return 0;
> >         data->next_timer_us =
> >                 t.tv_sec * USEC_PER_SEC + t.tv_nsec / NSEC_PER_USEC;
> >
>
> Are there special conditions that are necessary to provoke a negative
> return value?
> I've traced this code on several systems, and never seen a negative
> return value.
>

By changing the if statement to
if (WARN_ONCE(t.tv_sec < 0, "Next timer in the past. tv_sec: %ld,
tv_usec: %lu", t.tv_sec, t.tv_usec)) return 0;

I eventually got
[ 3092.355144] WARNING: CPU: 4 PID: 0 at
drivers/cpuidle/governors/menu.c:315 menu_select+0x410/0x420()
[ 3092.355145] Next timer in the past. tv_sec: -1, tv_usec: 999999645
...
[ 3092.355169] CPU: 4 PID: 0 Comm: swapper/4 Not tainted 3.14.0-rc3+ #3

so the problem is still lurking there. However, as you can see from
the timestamp, either something has changed recently and made it much
much less frequent or I did not use the use case / hardware that
triggers it more often. It has been quite a long time since I
discovered and fixed this in my private trees, so I have forgotten the
best way to reproduce this, sorry. In any case, existence is still
confirmed.

This was on a i7-3770K with 64 bit kernel, running make clean && make
-j8 && make clean && make for kernel source tree. If I happen to
remember the use case + hardware combination that triggers this
rapidly, I'll let you know.

That said, I agree dropping this patch and fixing the called function
is a better idea.

Tuukka

>
> However...
> I do see values up to 300.2 seconds, and those large values seem to decay
> at the rate of real-time so that after 5 minutes they are small, and then
> jump back up to 300 seconds.
>
> Some folks at Oracle debugged it down to use of NEXT_TIMER_MAX_DELTA
> when there is _no_ timer currently pending on that CPU.  It seems this is easier
> to observe, the more CPUs a system has -- though I've been able to reproduce
> it on a system as small as a single-package 8-cpu systems.
>
> One proposed way to address this is to cap large values at 1 second.
> However, that will not recognize that for the period when the large
> value decays to under 1 second, all of those are fiction.
>
> Also, if we could identify the case where there is no future timer,
> it seems that re-using dev->last_residency would probably be
> a more useful guess than pretending we'll have a timer expire in 1 second.
>
> thanks,
> Len Brown, Intel Open Source Technology Cente

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 0/7] Cpuidle: Minor fixes
  2014-03-07 12:12   ` Rafael J. Wysocki
@ 2014-03-11 19:35     ` Rik van Riel
  2014-03-11 20:07       ` Rafael J. Wysocki
  2014-03-11 21:06     ` Rik van Riel
  1 sibling, 1 reply; 26+ messages in thread
From: Rik van Riel @ 2014-03-11 19:35 UTC (permalink / raw)
  To: Rafael J. Wysocki, Tuukka Tikkanen
  Cc: linux-pm, daniel.lezcano, linux-kernel, Len Brown

On 03/07/2014 07:12 AM, Rafael J. Wysocki wrote:
> On Wednesday, February 26, 2014 01:46:25 AM Rafael J. Wysocki wrote:
>> On Monday, February 24, 2014 08:29:30 AM Tuukka Tikkanen wrote:
>>> This set of patches makes some minor changes to menu governor and the poll
>>> idle state.
>>>
>>> Patch 1 is simply a rename of a variable to make the name better represent
>>> the contained information.
>>>
>>> Patch 2 fixes calculating actual residency in cases where the entered state
>>> is different from the state decided by the menu governor.
>>>
>>> Patch 3 makes sure the menu governor timer coefficients are not updated
>>> with values that might cause a coefficient to reach a value greater than
>>> unity.
>>>
>>> Patch 4 fixes calculation actual residency in cases where the entered state
>>> does not support measuring residency. In such cases the residency time
>>> is taken from time remaining until next timer expiry. The timer is expected
>>> to go off at the start of exit latency, not after it. Therefore the exit
>>> latency must not be substracted from the assumed value.
>>>
>>> Patch 5 moves the performance multiplier based comparison out of the state
>>> selection loop by changing it into a latency requirement. This allows
>>> using a generic state selection function accepting only (duration, latency)
>>> tuple as input. The change is possible by noting that performance multiplier
>>> is used only to check for a minimum predicted idle duration to exit latency
>>> ratio. As predicted idle duration is a constant for the loop, the maximum
>>> allowed latency can be calculated outside of the loop.
>>>
>>> Patch 6 prevents using negative values from tick_nohz_get_sleep_length()
>>> in the menu governor. If unchecked, the negative values are used as huge
>>> unsigned values. Negative values occur fairly often (e.g. on x86_64 I've
>>> seen this happen several times per minute) on a busy system, allowing
>>> the deepest state to win the selection while the shallowest should be picked.
>>>
>>> Patch 7 adds CPUIDLE_FLAG_TIME_VALID to poll_idle. I do not know of any
>>> platfrom where cpu_relax() would break ktime_get() and in fact poll_idle
>>> uses ktime_get itself.
>>> (Note: poll_idle updates dev->last_residency for some reason. Does it ever
>>> get called without going through cpuidle_enter_state, which will overwrite
>>> the value? Even if some state redirects to this state, the call will
>>> eventually return to the framework. The redundant time measurement could
>>> be removed, unless there is some obscure way of getting called on some
>>> platform that I am unable to figure out.)
>>>
>>> Tuukka Tikkanen (7):
>>>    Cpuidle: rename expected_us to next_timer_us in menu governor
>>>    Cpuidle: Use actual state latency in menu governor
>>>    Cpuidle: Ensure menu coefficients stay within domain
>>>    Cpuidle: Do not substract exit latency from assumed sleep length
>>>    Cpuidle: Move perf multiplier calculation out of the selection loop
>>>    Cpuidle: Deal with timer expiring in the past
>>>    Cpuidle: poll state can measure residency
>>>
>>>   drivers/cpuidle/driver.c         |    2 +-
>>>   drivers/cpuidle/governors/menu.c |   85 ++++++++++++++++++++++++--------------
>>>   2 files changed, 54 insertions(+), 33 deletions(-)
>>
>> I'm queuing this series up for 3.15.
>
> Patch [6/7] dropped, because of the Len's objection.

Did a replacement for patch 6/7 (fix the timer code so it
does not return a negative expiry time) get written and
merged somewhere? :)

I have been looking at the menu governor too recently,
but at a different aspect. I'll send out an RFC patch
soon (which will probably be shot down, and start
a discussion).


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 0/7] Cpuidle: Minor fixes
  2014-03-11 19:35     ` Rik van Riel
@ 2014-03-11 20:07       ` Rafael J. Wysocki
  0 siblings, 0 replies; 26+ messages in thread
From: Rafael J. Wysocki @ 2014-03-11 20:07 UTC (permalink / raw)
  To: Rik van Riel
  Cc: Tuukka Tikkanen, linux-pm, daniel.lezcano, linux-kernel, Len Brown

On Tuesday, March 11, 2014 03:35:06 PM Rik van Riel wrote:
> On 03/07/2014 07:12 AM, Rafael J. Wysocki wrote:
> > On Wednesday, February 26, 2014 01:46:25 AM Rafael J. Wysocki wrote:
> >> On Monday, February 24, 2014 08:29:30 AM Tuukka Tikkanen wrote:
> >>> This set of patches makes some minor changes to menu governor and the poll
> >>> idle state.
> >>>
> >>> Patch 1 is simply a rename of a variable to make the name better represent
> >>> the contained information.
> >>>
> >>> Patch 2 fixes calculating actual residency in cases where the entered state
> >>> is different from the state decided by the menu governor.
> >>>
> >>> Patch 3 makes sure the menu governor timer coefficients are not updated
> >>> with values that might cause a coefficient to reach a value greater than
> >>> unity.
> >>>
> >>> Patch 4 fixes calculation actual residency in cases where the entered state
> >>> does not support measuring residency. In such cases the residency time
> >>> is taken from time remaining until next timer expiry. The timer is expected
> >>> to go off at the start of exit latency, not after it. Therefore the exit
> >>> latency must not be substracted from the assumed value.
> >>>
> >>> Patch 5 moves the performance multiplier based comparison out of the state
> >>> selection loop by changing it into a latency requirement. This allows
> >>> using a generic state selection function accepting only (duration, latency)
> >>> tuple as input. The change is possible by noting that performance multiplier
> >>> is used only to check for a minimum predicted idle duration to exit latency
> >>> ratio. As predicted idle duration is a constant for the loop, the maximum
> >>> allowed latency can be calculated outside of the loop.
> >>>
> >>> Patch 6 prevents using negative values from tick_nohz_get_sleep_length()
> >>> in the menu governor. If unchecked, the negative values are used as huge
> >>> unsigned values. Negative values occur fairly often (e.g. on x86_64 I've
> >>> seen this happen several times per minute) on a busy system, allowing
> >>> the deepest state to win the selection while the shallowest should be picked.
> >>>
> >>> Patch 7 adds CPUIDLE_FLAG_TIME_VALID to poll_idle. I do not know of any
> >>> platfrom where cpu_relax() would break ktime_get() and in fact poll_idle
> >>> uses ktime_get itself.
> >>> (Note: poll_idle updates dev->last_residency for some reason. Does it ever
> >>> get called without going through cpuidle_enter_state, which will overwrite
> >>> the value? Even if some state redirects to this state, the call will
> >>> eventually return to the framework. The redundant time measurement could
> >>> be removed, unless there is some obscure way of getting called on some
> >>> platform that I am unable to figure out.)
> >>>
> >>> Tuukka Tikkanen (7):
> >>>    Cpuidle: rename expected_us to next_timer_us in menu governor
> >>>    Cpuidle: Use actual state latency in menu governor
> >>>    Cpuidle: Ensure menu coefficients stay within domain
> >>>    Cpuidle: Do not substract exit latency from assumed sleep length
> >>>    Cpuidle: Move perf multiplier calculation out of the selection loop
> >>>    Cpuidle: Deal with timer expiring in the past
> >>>    Cpuidle: poll state can measure residency
> >>>
> >>>   drivers/cpuidle/driver.c         |    2 +-
> >>>   drivers/cpuidle/governors/menu.c |   85 ++++++++++++++++++++++++--------------
> >>>   2 files changed, 54 insertions(+), 33 deletions(-)
> >>
> >> I'm queuing this series up for 3.15.
> >
> > Patch [6/7] dropped, because of the Len's objection.
> 
> Did a replacement for patch 6/7 (fix the timer code so it
> does not return a negative expiry time) get written and
> merged somewhere? :)

Not that I know of.

> I have been looking at the menu governor too recently,
> but at a different aspect. I'll send out an RFC patch
> soon (which will probably be shot down, and start
> a discussion).

OK

-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 0/7] Cpuidle: Minor fixes
  2014-03-07 12:12   ` Rafael J. Wysocki
  2014-03-11 19:35     ` Rik van Riel
@ 2014-03-11 21:06     ` Rik van Riel
  1 sibling, 0 replies; 26+ messages in thread
From: Rik van Riel @ 2014-03-11 21:06 UTC (permalink / raw)
  To: Rafael J. Wysocki, Tuukka Tikkanen
  Cc: linux-pm, daniel.lezcano, linux-kernel, Len Brown

On 03/07/2014 07:12 AM, Rafael J. Wysocki wrote:
> On Wednesday, February 26, 2014 01:46:25 AM Rafael J. Wysocki wrote:
>> On Monday, February 24, 2014 08:29:30 AM Tuukka Tikkanen wrote:
>>> This set of patches makes some minor changes to menu governor and the poll
>>> idle state.
>>>
>>> Patch 1 is simply a rename of a variable to make the name better represent
>>> the contained information.
>>>
>>> Patch 2 fixes calculating actual residency in cases where the entered state
>>> is different from the state decided by the menu governor.
>>>
>>> Patch 3 makes sure the menu governor timer coefficients are not updated
>>> with values that might cause a coefficient to reach a value greater than
>>> unity.
>>>
>>> Patch 4 fixes calculation actual residency in cases where the entered state
>>> does not support measuring residency. In such cases the residency time
>>> is taken from time remaining until next timer expiry. The timer is expected
>>> to go off at the start of exit latency, not after it. Therefore the exit
>>> latency must not be substracted from the assumed value.
>>>
>>> Patch 5 moves the performance multiplier based comparison out of the state
>>> selection loop by changing it into a latency requirement. This allows
>>> using a generic state selection function accepting only (duration, latency)
>>> tuple as input. The change is possible by noting that performance multiplier
>>> is used only to check for a minimum predicted idle duration to exit latency
>>> ratio. As predicted idle duration is a constant for the loop, the maximum
>>> allowed latency can be calculated outside of the loop.
>>>
>>> Patch 6 prevents using negative values from tick_nohz_get_sleep_length()
>>> in the menu governor. If unchecked, the negative values are used as huge
>>> unsigned values. Negative values occur fairly often (e.g. on x86_64 I've
>>> seen this happen several times per minute) on a busy system, allowing
>>> the deepest state to win the selection while the shallowest should be picked.
>>>
>>> Patch 7 adds CPUIDLE_FLAG_TIME_VALID to poll_idle. I do not know of any
>>> platfrom where cpu_relax() would break ktime_get() and in fact poll_idle
>>> uses ktime_get itself.
>>> (Note: poll_idle updates dev->last_residency for some reason. Does it ever
>>> get called without going through cpuidle_enter_state, which will overwrite
>>> the value? Even if some state redirects to this state, the call will
>>> eventually return to the framework. The redundant time measurement could
>>> be removed, unless there is some obscure way of getting called on some
>>> platform that I am unable to figure out.)
>>>
>>> Tuukka Tikkanen (7):
>>>    Cpuidle: rename expected_us to next_timer_us in menu governor
>>>    Cpuidle: Use actual state latency in menu governor
>>>    Cpuidle: Ensure menu coefficients stay within domain
>>>    Cpuidle: Do not substract exit latency from assumed sleep length
>>>    Cpuidle: Move perf multiplier calculation out of the selection loop
>>>    Cpuidle: Deal with timer expiring in the past
>>>    Cpuidle: poll state can measure residency
>>>
>>>   drivers/cpuidle/driver.c         |    2 +-
>>>   drivers/cpuidle/governors/menu.c |   85 ++++++++++++++++++++++++--------------
>>>   2 files changed, 54 insertions(+), 33 deletions(-)
>>
>> I'm queuing this series up for 3.15.
>
> Patch [6/7] dropped, because of the Len's objection.

Never mind my earlier email. Patch 6/7 appears to be
in linux-next ...


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 6/7] Cpuidle: Deal with timer expiring in the past
  2014-03-10 10:54     ` Tuukka Tikkanen
@ 2014-03-21 23:10       ` Len Brown
  0 siblings, 0 replies; 26+ messages in thread
From: Len Brown @ 2014-03-21 23:10 UTC (permalink / raw)
  To: Tuukka Tikkanen
  Cc: Linux PM list, Rafael J. Wysocki, Daniel Lezcano, linux-kernel

Tuukka,
I've reproduced this negative on a 48 thread 2-socket Xeon during boot
(seen it only once, so far).

expected_us gets calculated to be -1, which is truthful, since the
next timer return value was about 500ns in the past
and our math truncates.  This, in turn, confuses the heck out of
menu's prediction of the next event to be
over 500 sec in the future.

I suspect more breakage is lurking, so I'll poke some more...

Len Brown, Intel Open Source Technology Center

^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2014-03-21 23:10 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-02-24  6:29 [PATCH 0/7] Cpuidle: Minor fixes Tuukka Tikkanen
2014-02-24  6:29 ` [PATCH 1/7] Cpuidle: rename expected_us to next_timer_us in menu governor Tuukka Tikkanen
2014-02-24 16:52   ` Nicolas Pitre
2014-03-05 22:26   ` Len Brown
2014-02-24  6:29 ` [PATCH 2/7] Cpuidle: Use actual state latency " Tuukka Tikkanen
2014-02-24 22:35   ` Daniel Lezcano
2014-02-24  6:29 ` [PATCH 3/7] Cpuidle: Ensure menu coefficients stay within domain Tuukka Tikkanen
2014-02-24 16:57   ` Nicolas Pitre
2014-02-24  6:29 ` [PATCH 4/7] Cpuidle: Do not substract exit latency from assumed sleep length Tuukka Tikkanen
2014-02-24  6:29 ` [PATCH 5/7] Cpuidle: Move perf multiplier calculation out of the selection loop Tuukka Tikkanen
2014-02-24  6:29 ` [PATCH 6/7] Cpuidle: Deal with timer expiring in the past Tuukka Tikkanen
2014-02-24 17:05   ` Nicolas Pitre
2014-02-25  0:20     ` Rafael J. Wysocki
2014-03-06  7:41   ` Len Brown
2014-03-07  3:09     ` Len Brown
2014-03-10 10:54     ` Tuukka Tikkanen
2014-03-21 23:10       ` Len Brown
2014-02-24  6:29 ` [PATCH 7/7] Cpuidle: poll state can measure residency Tuukka Tikkanen
2014-02-25 10:56   ` Daniel Lezcano
2014-02-25 15:40     ` Tuukka Tikkanen
2014-02-24 21:24 ` [PATCH 0/7] Cpuidle: Minor fixes Daniel Lezcano
2014-02-26  0:46 ` Rafael J. Wysocki
2014-03-07 12:12   ` Rafael J. Wysocki
2014-03-11 19:35     ` Rik van Riel
2014-03-11 20:07       ` Rafael J. Wysocki
2014-03-11 21:06     ` Rik van Riel

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.