linux-kernel.vger.kernel.org archive mirror
* [RFT][PATCH v2 0/3] cpuidle: teo: Do not check timers unconditionally every time
@ 2023-08-03 20:57 Rafael J. Wysocki
  2023-08-03 21:07 ` [RFT][PATCH v2 1/3] cpuidle: teo: Do not call tick_nohz_get_sleep_length() upfront Rafael J. Wysocki
                   ` (4 more replies)
  0 siblings, 5 replies; 21+ messages in thread
From: Rafael J. Wysocki @ 2023-08-03 20:57 UTC (permalink / raw)
  To: Linux PM, Peter Zijlstra, Anna-Maria Behnsen
  Cc: LKML, Frederic Weisbecker, Kajetan Puchalski

Hi Folks,

This is the second iteration of:

https://lore.kernel.org/linux-pm/4511619.LvFx2qVVIh@kreacher/

with an additional patch.

There are some small modifications of patch [1/3] and the new
patch causes governor statistics to play a role in deciding whether
or not to stop the scheduler tick.

Testing would be much appreciated!

Thanks!





* [RFT][PATCH v2 1/3] cpuidle: teo: Do not call tick_nohz_get_sleep_length() upfront
  2023-08-03 20:57 [RFT][PATCH v2 0/3] cpuidle: teo: Do not check timers unconditionally every time Rafael J. Wysocki
@ 2023-08-03 21:07 ` Rafael J. Wysocki
  2023-08-03 21:09 ` [RFT][PATCH v2 2/3] cpuidle: teo: Skip tick_nohz_get_sleep_length() call in some cases Rafael J. Wysocki
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 21+ messages in thread
From: Rafael J. Wysocki @ 2023-08-03 21:07 UTC (permalink / raw)
  To: Linux PM, Peter Zijlstra, Anna-Maria Behnsen
  Cc: LKML, Frederic Weisbecker, Kajetan Puchalski

From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

Because the cost of calling tick_nohz_get_sleep_length() may increase
in the future, reorder the code in teo_select() so that it first uses
the governor statistics to pick a candidate idle state, then applies
the CPU utilization heuristic to it, and only then calls
tick_nohz_get_sleep_length() to obtain the sleep length value and
refine the selection if necessary.

This change by itself does not cause tick_nohz_get_sleep_length() to
be called less often, but it prepares the code for subsequent changes
that will do so.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---

v1 -> v2:
   * Fix up the "2 states and utilized CPU" special case handling, so it
     allows the tick to be stopped in all of the cases when the target
     residency of state 1 is above TICK_NSEC.
   * Fix up a stale comment.

---
 drivers/cpuidle/governors/teo.c |  105 ++++++++++++++++------------------------
 1 file changed, 44 insertions(+), 61 deletions(-)

Index: linux-pm/drivers/cpuidle/governors/teo.c
===================================================================
--- linux-pm.orig/drivers/cpuidle/governors/teo.c
+++ linux-pm/drivers/cpuidle/governors/teo.c
@@ -306,15 +306,10 @@ static void teo_update(struct cpuidle_dr
 	cpu_data->total += PULSE;
 }
 
-static bool teo_time_ok(u64 interval_ns)
+static bool teo_state_ok(int i, struct cpuidle_driver *drv)
 {
-	return !tick_nohz_tick_stopped() || interval_ns >= TICK_NSEC;
-}
-
-static s64 teo_middle_of_bin(int idx, struct cpuidle_driver *drv)
-{
-	return (drv->states[idx].target_residency_ns +
-		drv->states[idx+1].target_residency_ns) / 2;
+	return !tick_nohz_tick_stopped() ||
+		drv->states[i].target_residency_ns >= TICK_NSEC;
 }
 
 /**
@@ -354,6 +349,7 @@ static int teo_select(struct cpuidle_dri
 {
 	struct teo_cpu *cpu_data = per_cpu_ptr(&teo_cpus, dev->cpu);
 	s64 latency_req = cpuidle_governor_latency_req(dev->cpu);
+	ktime_t delta_tick = TICK_NSEC / 2;
 	unsigned int idx_intercept_sum = 0;
 	unsigned int intercept_sum = 0;
 	unsigned int idx_recent_sum = 0;
@@ -363,7 +359,6 @@ static int teo_select(struct cpuidle_dri
 	int constraint_idx = 0;
 	int idx0 = 0, idx = -1;
 	bool alt_intercepts, alt_recent;
-	ktime_t delta_tick;
 	bool cpu_utilized;
 	s64 duration_ns;
 	int i;
@@ -374,9 +369,11 @@ static int teo_select(struct cpuidle_dri
 	}
 
 	cpu_data->time_span_ns = local_clock();
-
-	duration_ns = tick_nohz_get_sleep_length(&delta_tick);
-	cpu_data->sleep_length_ns = duration_ns;
+	/*
+	 * Set the expected sleep length to infinity in case of an early
+	 * return.
+	 */
+	cpu_data->sleep_length_ns = KTIME_MAX;
 
 	/* Check if there is any choice in the first place. */
 	if (drv->state_count < 2) {
@@ -384,11 +381,8 @@ static int teo_select(struct cpuidle_dri
 		goto out_tick;
 	}
 
-	if (!dev->states_usage[0].disable) {
+	if (!dev->states_usage[0].disable)
 		idx = 0;
-		if (drv->states[1].target_residency_ns > duration_ns)
-			goto out_tick;
-	}
 
 	cpu_utilized = teo_cpu_is_utilized(dev->cpu, cpu_data);
 	/*
@@ -397,8 +391,6 @@ static int teo_select(struct cpuidle_dri
 	 * the shallowest non-polling state and exit.
 	 */
 	if (drv->state_count < 3 && cpu_utilized) {
-		/* The CPU is utilized, so assume a short idle duration. */
-		duration_ns = teo_middle_of_bin(0, drv);
 		/*
 		 * If state 0 is enabled and it is not a polling one, select it
 		 * right away unless the scheduler tick has been stopped, in
@@ -408,22 +400,17 @@ static int teo_select(struct cpuidle_dri
 		 * anyway.
 		 */
 		if ((!idx && !(drv->states[0].flags & CPUIDLE_FLAG_POLLING) &&
-		    teo_time_ok(duration_ns)) || dev->states_usage[1].disable) {
+		    teo_state_ok(0, drv)) || dev->states_usage[1].disable) {
 			idx = 0;
 			goto out_tick;
 		}
 		/* Assume that state 1 is not a polling one and use it. */
 		idx = 1;
+		duration_ns = drv->states[1].target_residency_ns;
 		goto end;
 	}
 
-	/*
-	 * Find the deepest idle state whose target residency does not exceed
-	 * the current sleep length and the deepest idle state not deeper than
-	 * the former whose exit latency does not exceed the current latency
-	 * constraint.  Compute the sums of metrics for early wakeup pattern
-	 * detection.
-	 */
+	/* Compute the sums of metrics for early wakeup pattern detection. */
 	for (i = 1; i < drv->state_count; i++) {
 		struct teo_bin *prev_bin = &cpu_data->state_bins[i-1];
 		struct cpuidle_state *s = &drv->states[i];
@@ -439,19 +426,15 @@ static int teo_select(struct cpuidle_dri
 		if (dev->states_usage[i].disable)
 			continue;
 
-		if (idx < 0) {
-			idx = i; /* first enabled state */
-			idx0 = i;
-		}
-
-		if (s->target_residency_ns > duration_ns)
-			break;
+		if (idx < 0)
+			idx0 = i; /* first enabled state */
 
 		idx = i;
 
 		if (s->exit_latency_ns <= latency_req)
 			constraint_idx = i;
 
+		/* Save the sums for the current state. */
 		idx_intercept_sum = intercept_sum;
 		idx_hit_sum = hit_sum;
 		idx_recent_sum = recent_sum;
@@ -465,7 +448,7 @@ static int teo_select(struct cpuidle_dri
 
 	if (idx == idx0) {
 		/*
-		 * This is the first enabled idle state, so use it, but do not
+		 * Only one idle state is enabled, so use it, but do not
 		 * allow the tick to be stopped if it is shallow enough.
 		 */
 		duration_ns = drv->states[idx].target_residency_ns;
@@ -479,13 +462,11 @@ static int teo_select(struct cpuidle_dri
 	 * all of the deeper states, or the sum of the numbers of recent
 	 * intercepts over all of the states shallower than the candidate one
 	 * is greater than a half of the number of recent events taken into
-	 * account, the CPU is likely to wake up early, so find an alternative
-	 * idle state to select.
+	 * account, a shallower idle state is likely to be a better choice.
 	 */
 	alt_intercepts = 2 * idx_intercept_sum > cpu_data->total - idx_hit_sum;
 	alt_recent = idx_recent_sum > NR_RECENT / 2;
 	if (alt_recent || alt_intercepts) {
-		s64 first_suitable_span_ns = duration_ns;
 		int first_suitable_idx = idx;
 
 		/*
@@ -494,44 +475,39 @@ static int teo_select(struct cpuidle_dri
 		 * cases (both with respect to intercepts overall and with
 		 * respect to the recent intercepts only) in the past.
 		 *
-		 * Take the possible latency constraint and duration limitation
-		 * present if the tick has been stopped already into account.
+		 * Take the possible duration limitation present if the tick
+		 * has been stopped already into account.
 		 */
 		intercept_sum = 0;
 		recent_sum = 0;
 
 		for (i = idx - 1; i >= 0; i--) {
 			struct teo_bin *bin = &cpu_data->state_bins[i];
-			s64 span_ns;
 
 			intercept_sum += bin->intercepts;
 			recent_sum += bin->recent;
 
-			span_ns = teo_middle_of_bin(i, drv);
-
 			if ((!alt_recent || 2 * recent_sum > idx_recent_sum) &&
 			    (!alt_intercepts ||
 			     2 * intercept_sum > idx_intercept_sum)) {
-				if (teo_time_ok(span_ns) &&
-				    !dev->states_usage[i].disable) {
+				/*
+				 * Use the current state unless it is too
+				 * shallow or disabled, in which case take the
+				 * first enabled state that is deep enough.
+				 */
+				if (teo_state_ok(i, drv) &&
+				    !dev->states_usage[i].disable)
 					idx = i;
-					duration_ns = span_ns;
-				} else {
-					/*
-					 * The current state is too shallow or
-					 * disabled, so take the first enabled
-					 * deeper state with suitable time span.
-					 */
+				else
 					idx = first_suitable_idx;
-					duration_ns = first_suitable_span_ns;
-				}
+
 				break;
 			}
 
 			if (dev->states_usage[i].disable)
 				continue;
 
-			if (!teo_time_ok(span_ns)) {
+			if (!teo_state_ok(i, drv)) {
 				/*
 				 * The current state is too shallow, but if an
 				 * alternative candidate state has been found,
@@ -543,7 +519,6 @@ static int teo_select(struct cpuidle_dri
 				break;
 			}
 
-			first_suitable_span_ns = span_ns;
 			first_suitable_idx = i;
 		}
 	}
@@ -562,14 +537,22 @@ static int teo_select(struct cpuidle_dri
 	 * not sufficiently large.
 	 */
 	if (cpu_utilized) {
-		s64 span_ns;
+		i = teo_find_shallower_state(drv, dev, idx, KTIME_MAX, true);
+		if (teo_state_ok(i, drv))
+			idx = i;
+	}
 
-		i = teo_find_shallower_state(drv, dev, idx, duration_ns, true);
-		span_ns = teo_middle_of_bin(i, drv);
-		if (teo_time_ok(span_ns)) {
+	duration_ns = tick_nohz_get_sleep_length(&delta_tick);
+	cpu_data->sleep_length_ns = duration_ns;
+
+	/*
+	 * If the closest expected timer is before the target residency of the
+	 * candidate state, a shallower one needs to be found.
+	 */
+	if (drv->states[idx].target_residency_ns > duration_ns) {
+		i = teo_find_shallower_state(drv, dev, idx, duration_ns, false);
+		if (teo_state_ok(i, drv))
 			idx = i;
-			duration_ns = span_ns;
-		}
 	}
 
 end:





* [RFT][PATCH v2 2/3] cpuidle: teo: Skip tick_nohz_get_sleep_length() call in some cases
  2023-08-03 20:57 [RFT][PATCH v2 0/3] cpuidle: teo: Do not check timers unconditionally every time Rafael J. Wysocki
  2023-08-03 21:07 ` [RFT][PATCH v2 1/3] cpuidle: teo: Do not call tick_nohz_get_sleep_length() upfront Rafael J. Wysocki
@ 2023-08-03 21:09 ` Rafael J. Wysocki
  2023-08-11  8:52   ` Anna-Maria Behnsen
  2023-08-03 21:11 ` [RFT][PATCH v2 3/3] cpuidle: teo: Gather statistics regarding whether or not to stop the tick Rafael J. Wysocki
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 21+ messages in thread
From: Rafael J. Wysocki @ 2023-08-03 21:09 UTC (permalink / raw)
  To: Linux PM, Peter Zijlstra, Anna-Maria Behnsen
  Cc: LKML, Frederic Weisbecker, Kajetan Puchalski

From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

Make teo_select() avoid calling tick_nohz_get_sleep_length() if the
candidate idle state to return is state 0, or if state 0 is a polling
one and the target residency of the current candidate state is below
a certain threshold.  In both cases it may be assumed that the CPU will
be woken up immediately by a non-timer wakeup source, so the timers
are not likely to matter.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---

v1 -> v2: No changes

---
 drivers/cpuidle/governors/teo.c |   22 ++++++++++++++++++++++
 1 file changed, 22 insertions(+)
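
As a rough illustration of the two new early exits (the example target
residency values below are assumed for illustration only):

/*
 * With RESIDENCY_THRESHOLD_NS = 15 us:
 *
 * - the candidate is state 0:
 *	skip the timers check (an immediate non-timer wakeup is expected);
 *
 * - state 0 is a polling state and the candidate state has a 2 us
 *   target residency:
 *	2 us < 15 us, so skip the timers check here too;
 *
 * - state 0 is a polling state and the candidate state has a 50 us
 *   target residency:
 *	50 us >= 15 us, so call tick_nohz_get_sleep_length() as before.
 */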

Index: linux-pm/drivers/cpuidle/governors/teo.c
===================================================================
--- linux-pm.orig/drivers/cpuidle/governors/teo.c
+++ linux-pm/drivers/cpuidle/governors/teo.c
@@ -166,6 +166,12 @@
  */
 #define NR_RECENT	9
 
+/*
+ * Idle state target residency threshold used for deciding whether or not to
+ * check the time till the closest expected timer event.
+ */
+#define RESIDENCY_THRESHOLD_NS	(15 * NSEC_PER_USEC)
+
 /**
  * struct teo_bin - Metrics used by the TEO cpuidle governor.
  * @intercepts: The "intercepts" metric.
@@ -542,6 +548,22 @@ static int teo_select(struct cpuidle_dri
 			idx = i;
 	}
 
+	/*
+	 * Skip the timers check if state 0 is the current candidate one,
+	 * because an immediate non-timer wakeup is expected in that case.
+	 */
+	if (!idx)
+		goto out_tick;
+
+	/*
+	 * If state 0 is a polling one, check if the target residency of
+	 * the current candidate state is low enough and skip the timers
+	 * check in that case too.
+	 */
+	if ((drv->states[0].flags & CPUIDLE_FLAG_POLLING) &&
+	    drv->states[idx].target_residency_ns < RESIDENCY_THRESHOLD_NS)
+		goto out_tick;
+
 	duration_ns = tick_nohz_get_sleep_length(&delta_tick);
 	cpu_data->sleep_length_ns = duration_ns;
 





* [RFT][PATCH v2 3/3] cpuidle: teo: Gather statistics regarding whether or not to stop the tick
  2023-08-03 20:57 [RFT][PATCH v2 0/3] cpuidle: teo: Do not check timers unconditionally every time Rafael J. Wysocki
  2023-08-03 21:07 ` [RFT][PATCH v2 1/3] cpuidle: teo: Do not call tick_nohz_get_sleep_length() upfront Rafael J. Wysocki
  2023-08-03 21:09 ` [RFT][PATCH v2 2/3] cpuidle: teo: Skip tick_nohz_get_sleep_length() call in some cases Rafael J. Wysocki
@ 2023-08-03 21:11 ` Rafael J. Wysocki
  2023-08-03 21:33 ` [RFT][PATCH v2 0/3] cpuidle: teo: Do not check timers unconditionally every time Rafael J. Wysocki
  2023-08-07 14:00 ` Kajetan Puchalski
  4 siblings, 0 replies; 21+ messages in thread
From: Rafael J. Wysocki @ 2023-08-03 21:11 UTC (permalink / raw)
  To: Linux PM, Peter Zijlstra, Anna-Maria Behnsen
  Cc: LKML, Frederic Weisbecker, Kajetan Puchalski

From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

Currently, if the target residency of the deepest idle state is less than
the tick period length, which is quite likely for HZ=100, and the deepest
idle state is about to be selected by the TEO idle governor, the decision
on whether or not to stop the scheduler tick is based entirely on the
time till the closest timer.  This is often insufficient, because timers
may not be in heavy use and there may be plenty of other CPU wakeup
events between the deepest idle state's target residency and the closest
tick.

Allow the governor to count those events by making the deepest idle
state's bin effectively end at TICK_NSEC and introducing an additional
"bin" for collecting "hit" events (ie. the ones in which the measured
idle duration falls into the same bin as the time till the closest
timer) with idle duration values past TICK_NSEC.

This way the "intercepts" metric for the deepest idle state's bin
becomes nonzero in general, and so it can influence the decision on
whether or not to stop the tick, possibly increasing the governor's
accuracy in that respect.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---
 drivers/cpuidle/governors/teo.c |   41 +++++++++++++++++++++++++++++++++++++++-
 1 file changed, 40 insertions(+), 1 deletion(-)
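
A worked example may help here (HZ and the state parameters below are
assumed for illustration only):

/*
 * Assume HZ=100 (TICK_NSEC = 10 ms) and a deepest idle state with a
 * 2 ms target residency:
 *
 * - sleep_length_ns = 50 ms, measured_ns = 30 ms:
 *	both values are past TICK_NSEC, so the wakeup is recorded as a
 *	"hit" in the extra tick bin (tick_hits), not in the deepest
 *	state's bin;
 *
 * - sleep_length_ns = 50 ms, measured_ns = 4 ms:
 *	the sleep length is past TICK_NSEC, but the measured duration
 *	falls into the deepest state's bin, so an "intercept" is recorded
 *	there and it can now argue against stopping the tick.
 *
 * On the selection side, the tick is then retained when the candidate
 * state's target residency is below TICK_NSEC and
 *
 *	tick_intercept_sum > total / 2 + total / 8
 *
 * that is, when intercepts before the tick account for more than 62.5%
 * of all recorded events.
 */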

Index: linux-pm/drivers/cpuidle/governors/teo.c
===================================================================
--- linux-pm.orig/drivers/cpuidle/governors/teo.c
+++ linux-pm/drivers/cpuidle/governors/teo.c
@@ -192,6 +192,7 @@ struct teo_bin {
  * @total: Grand total of the "intercepts" and "hits" metrics for all bins.
  * @next_recent_idx: Index of the next @recent_idx entry to update.
  * @recent_idx: Indices of bins corresponding to recent "intercepts".
+ * @tick_hits: Number of "hits" after TICK_NSEC.
  * @util_threshold: Threshold above which the CPU is considered utilized
  */
 struct teo_cpu {
@@ -201,6 +202,7 @@ struct teo_cpu {
 	unsigned int total;
 	int next_recent_idx;
 	int recent_idx[NR_RECENT];
+	unsigned int tick_hits;
 	unsigned long util_threshold;
 };
 
@@ -232,6 +234,7 @@ static void teo_update(struct cpuidle_dr
 {
 	struct teo_cpu *cpu_data = per_cpu_ptr(&teo_cpus, dev->cpu);
 	int i, idx_timer = 0, idx_duration = 0;
+	s64 target_residency_ns;
 	u64 measured_ns;
 
 	if (cpu_data->time_span_ns >= cpu_data->sleep_length_ns) {
@@ -272,7 +275,6 @@ static void teo_update(struct cpuidle_dr
 	 * fall into.
 	 */
 	for (i = 0; i < drv->state_count; i++) {
-		s64 target_residency_ns = drv->states[i].target_residency_ns;
 		struct teo_bin *bin = &cpu_data->state_bins[i];
 
 		bin->hits -= bin->hits >> DECAY_SHIFT;
@@ -280,6 +282,8 @@ static void teo_update(struct cpuidle_dr
 
 		cpu_data->total += bin->hits + bin->intercepts;
 
+		target_residency_ns = drv->states[i].target_residency_ns;
+
 		if (target_residency_ns <= cpu_data->sleep_length_ns) {
 			idx_timer = i;
 			if (target_residency_ns <= measured_ns)
@@ -295,6 +299,26 @@ static void teo_update(struct cpuidle_dr
 		cpu_data->state_bins[cpu_data->recent_idx[i]].recent--;
 
 	/*
+	 * If the deepest state's target residency is below the tick length,
+	 * make a record of it to help teo_select() decide whether or not
+	 * to stop the tick.  This effectively adds an extra hits-only bin
+	 * beyond the last state-related one.
+	 */
+	if (target_residency_ns < TICK_NSEC) {
+		cpu_data->tick_hits -= cpu_data->tick_hits >> DECAY_SHIFT;
+
+		cpu_data->total += cpu_data->tick_hits;
+
+		if (TICK_NSEC <= cpu_data->sleep_length_ns) {
+			idx_timer = drv->state_count;
+			if (TICK_NSEC <= measured_ns) {
+				cpu_data->tick_hits += PULSE;
+				goto end;
+			}
+		}
+	}
+
+	/*
 	 * If the measured idle duration falls into the same bin as the sleep
 	 * length, this is a "hit", so update the "hits" metric for that bin.
 	 * Otherwise, update the "intercepts" metric for the bin fallen into by
@@ -309,6 +333,7 @@ static void teo_update(struct cpuidle_dr
 		cpu_data->recent_idx[i] = idx_duration;
 	}
 
+end:
 	cpu_data->total += PULSE;
 }
 
@@ -356,6 +381,7 @@ static int teo_select(struct cpuidle_dri
 	struct teo_cpu *cpu_data = per_cpu_ptr(&teo_cpus, dev->cpu);
 	s64 latency_req = cpuidle_governor_latency_req(dev->cpu);
 	ktime_t delta_tick = TICK_NSEC / 2;
+	unsigned int tick_intercept_sum = 0;
 	unsigned int idx_intercept_sum = 0;
 	unsigned int intercept_sum = 0;
 	unsigned int idx_recent_sum = 0;
@@ -429,6 +455,8 @@ static int teo_select(struct cpuidle_dri
 		hit_sum += prev_bin->hits;
 		recent_sum += prev_bin->recent;
 
+		tick_intercept_sum = intercept_sum;
+
 		if (dev->states_usage[i].disable)
 			continue;
 
@@ -461,6 +489,8 @@ static int teo_select(struct cpuidle_dri
 		goto end;
 	}
 
+	tick_intercept_sum += cpu_data->state_bins[drv->state_count-1].intercepts;
+
 	/*
 	 * If the sum of the intercepts metric for all of the idle states
 	 * shallower than the current candidate one (idx) is greater than the
@@ -577,6 +607,15 @@ static int teo_select(struct cpuidle_dri
 			idx = i;
 	}
 
+	/*
+	 * If the selected state's target residency is below the tick length
+	 * and intercepts occurring before the tick length are the majority of
+	 * total wakeup events, do not stop the tick.
+	 */
+	if (drv->states[idx].target_residency_ns < TICK_NSEC &&
+	    tick_intercept_sum > cpu_data->total / 2 + cpu_data->total / 8)
+		duration_ns = TICK_NSEC / 2;
+
 end:
 	/*
 	 * Allow the tick to be stopped unless the selected state is a polling





* Re: [RFT][PATCH v2 0/3] cpuidle: teo: Do not check timers unconditionally every time
  2023-08-03 20:57 [RFT][PATCH v2 0/3] cpuidle: teo: Do not check timers unconditionally every time Rafael J. Wysocki
                   ` (2 preceding siblings ...)
  2023-08-03 21:11 ` [RFT][PATCH v2 3/3] cpuidle: teo: Gather statistics regarding whether or not to stop the tick Rafael J. Wysocki
@ 2023-08-03 21:33 ` Rafael J. Wysocki
  2023-08-07 15:38   ` Anna-Maria Behnsen
  2023-08-08 15:22   ` Doug Smythies
  2023-08-07 14:00 ` Kajetan Puchalski
  4 siblings, 2 replies; 21+ messages in thread
From: Rafael J. Wysocki @ 2023-08-03 21:33 UTC (permalink / raw)
  To: Linux PM, Anna-Maria Behnsen, Kajetan Puchalski
  Cc: Peter Zijlstra, LKML, Frederic Weisbecker

On Thu, Aug 3, 2023 at 11:12 PM Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
>
> [...]

For convenience, this series is now available in the following git branch:

git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git \
 pm-cpuidle-teo


* Re: [RFT][PATCH v2 0/3] cpuidle: teo: Do not check timers unconditionally every time
  2023-08-03 20:57 [RFT][PATCH v2 0/3] cpuidle: teo: Do not check timers unconditionally every time Rafael J. Wysocki
                   ` (3 preceding siblings ...)
  2023-08-03 21:33 ` [RFT][PATCH v2 0/3] cpuidle: teo: Do not check timers unconditionally every time Rafael J. Wysocki
@ 2023-08-07 14:00 ` Kajetan Puchalski
  2023-08-07 16:46   ` Rafael J. Wysocki
  4 siblings, 1 reply; 21+ messages in thread
From: Kajetan Puchalski @ 2023-08-07 14:00 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Linux PM, Peter Zijlstra, Anna-Maria Behnsen, LKML, Frederic Weisbecker

Hi Rafael,

On Thu, Aug 03, 2023 at 10:57:04PM +0200, Rafael J. Wysocki wrote:
> [...]

My test results including the v2 are below.

1. Geekbench 6

+---------------------------+---------------+-----------------+-------------------+----------------------+
|          metric           |      teo      |     teo_tick    |    teo_tick_rfc   |    teo_tick_rfc_v2   |
+---------------------------+---------------+-----------------+-------------------+----------------------+
|      multicore_score      | 3320.9 (0.0%) | 3303.3 (-0.53%) |  3293.6 (-0.82%)  |   3302.3 (-0.56%)    |
|           score           | 1415.7 (0.0%) | 1417.7 (0.14%)  |  1423.4 (0.54%)   |    1425.8 (0.71%)    |
|      CPU_total_power      | 2421.3 (0.0%) | 2429.3 (0.33%)  |  2442.2 (0.86%)   |    2461.9 (1.67%)    |
|  latency (AsyncTask #1)   | 49.41μ (0.0%) | 51.07μ (3.36%)  |   50.1μ (1.4%)    |    50.76μ (2.73%)    |
| latency (labs.geekbench6) | 65.63μ (0.0%) | 77.47μ (18.03%) | 55.82μ (-14.95%)  |    66.12μ (0.75%)    |
| latency (surfaceflinger)  | 39.46μ (0.0%) | 36.94μ (-6.39%) |  35.79μ (-9.28%)  |    40.36μ (2.3%)     |
+---------------------------+---------------+-----------------+-------------------+----------------------+

+----------------------+-------------+------------+
|         tag          |    type     | count_perc |
+----------------------+-------------+------------+
|         teo          |  too deep   |   2.034    |
|       teo_tick       |  too deep   |    2.16    |
|     teo_tick_rfc     |  too deep   |   2.071    |
|    teo_tick_rfc_v2   |  too deep   |   2.548    |
|         teo          | too shallow |   15.791   |
|       teo_tick       | too shallow |   20.881   |
|     teo_tick_rfc     | too shallow |   20.337   |
|    teo_tick_rfc_v2   | too shallow |   19.886   |
+----------------------+-------------+------------+


2. JetNews

+-----------------+---------------+----------------+-----------------+-----------------+
|     metric      |      teo      |    teo_tick    |  teo_tick_rfc   | teo_tick_rfc_v2 |
+-----------------+---------------+----------------+-----------------+-----------------+
|       fps       |  86.2 (0.0%)  |  86.4 (0.16%)  |  86.0 (-0.28%)  |  86.6 (0.41%)   |
|    janks_pc     |  0.8 (0.0%)   |  0.8 (-0.66%)  |  0.8 (-1.37%)   |  0.7 (-11.37%)  |
| CPU_total_power | 185.2 (0.0%)  | 178.2 (-3.76%) |  182.2 (-1.6%)  | 169.4 (-8.53%)  | <- very interesting
+-----------------+---------------+----------------+-----------------+-----------------+

+----------------------+-------------+--------------------+
|         tag          |    type     |     count_perc     |
+----------------------+-------------+--------------------+
|         teo          |  too deep   |       0.992        |
|       teo_tick       |  too deep   |       0.945        |
|     teo_tick_rfc     |  too deep   |       1.035        |
|    teo_tick_rfc_v2   |  too deep   |       1.127        |
|         teo          | too shallow |       17.085       |
|       teo_tick       | too shallow |       15.236       |
|     teo_tick_rfc     | too shallow |       15.379       |
|    teo_tick_rfc_v2   | too shallow |       15.34        |
+----------------------+-------------+--------------------+

All in all looks pretty good. Unfortunately there's a slightly larger
percentage of too deep sleeps with the v2 (which is probably where the
increase in GB6 power usage comes from), but the lower jank percentage +
substantially lower power usage for the UI workload are very promising.

Since we don't care about GB6 power usage as much as UI power usage, I'd
say that the patchset looks good :)

Tested-by: Kajetan Puchalski <kajetan.puchalski@arm.com>


* Re: [RFT][PATCH v2 0/3] cpuidle: teo: Do not check timers unconditionally every time
  2023-08-03 21:33 ` [RFT][PATCH v2 0/3] cpuidle: teo: Do not check timers unconditionally every time Rafael J. Wysocki
@ 2023-08-07 15:38   ` Anna-Maria Behnsen
  2023-08-07 15:39     ` Anna-Maria Behnsen
  2023-08-07 16:47     ` Rafael J. Wysocki
  2023-08-08 15:22   ` Doug Smythies
  1 sibling, 2 replies; 21+ messages in thread
From: Anna-Maria Behnsen @ 2023-08-07 15:38 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Linux PM, Kajetan Puchalski, Peter Zijlstra, LKML, Frederic Weisbecker


On Thu, 3 Aug 2023, Rafael J. Wysocki wrote:

> On Thu, Aug 3, 2023 at 11:12 PM Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> >
> > [...]
> 
> For convenience, this series is now available in the following git branch:
> 
> git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git \
>  pm-cpuidle-teo
> 

Gautham's tests and the distribution of idle time durations look pretty
good. Also, the prevention of calling tick_nohz_get_sleep_length() is very
nice (21477 calls of tick_nohz_next_event(), and the tick was stopped 2670
times).

Here is the deviation of idle time durations (based on your branch):

Idle Total		2670	100.00%
x >= 4ms		2537	95.02%
4ms > x >= 2ms		19	0.71%
2ms > x >= 1ms		10	0.37%
1ms > x >= 500us	7	0.26%
500us > x >= 250us	6	0.22%
250us > x >=100us	13	0.49%
100us > x >= 50us	17	0.64%
50us > x >= 25us	25	0.94%
25us > x >= 10us	22	0.82%
10us > x > 5us		9	0.34%
5us > x			5	0.19%


Thanks,

	Anna-Maria


* Re: [RFT][PATCH v2 0/3] cpuidle: teo: Do not check timers unconditionally every time
  2023-08-07 15:38   ` Anna-Maria Behnsen
@ 2023-08-07 15:39     ` Anna-Maria Behnsen
  2023-08-07 16:47     ` Rafael J. Wysocki
  1 sibling, 0 replies; 21+ messages in thread
From: Anna-Maria Behnsen @ 2023-08-07 15:39 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Linux PM, Kajetan Puchalski, Peter Zijlstra, LKML, Frederic Weisbecker


On Mon, 7 Aug 2023, Anna-Maria Behnsen wrote:

> On Thu, 3 Aug 2023, Rafael J. Wysocki wrote:
> 
> > [...]
> 
> Gautham's tests and the distribution of idle time durations look pretty
> good. Also, the prevention of calling tick_nohz_get_sleep_length() is very
> nice (21477 calls of tick_nohz_next_event(), and the tick was stopped 2670
> times).
> 
> Here is the deviation of idle time durations (based on your branch):

Sorry, s/deviation/distribution

> Idle Total		2670	100.00%
> x >= 4ms		2537	95.02%
> 4ms > x >= 2ms		19	0.71%
> 2ms > x >= 1ms		10	0.37%
> 1ms > x >= 500us	7	0.26%
> 500us > x >= 250us	6	0.22%
> 250us > x >=100us	13	0.49%
> 100us > x >= 50us	17	0.64%
> 50us > x >= 25us	25	0.94%
> 25us > x >= 10us	22	0.82%
> 10us > x > 5us		9	0.34%
> 5us > x			5	0.19%
> 
> 
> Thanks,
> 
> 	Anna-Maria


* Re: [RFT][PATCH v2 0/3] cpuidle: teo: Do not check timers unconditionally every time
  2023-08-07 14:00 ` Kajetan Puchalski
@ 2023-08-07 16:46   ` Rafael J. Wysocki
  0 siblings, 0 replies; 21+ messages in thread
From: Rafael J. Wysocki @ 2023-08-07 16:46 UTC (permalink / raw)
  To: Kajetan Puchalski
  Cc: Rafael J. Wysocki, Linux PM, Peter Zijlstra, Anna-Maria Behnsen,
	LKML, Frederic Weisbecker

Hi Kajetan,

On Mon, Aug 7, 2023 at 4:04 PM Kajetan Puchalski
<kajetan.puchalski@arm.com> wrote:
>
> [...]
>
> Since we don't care about GB6 power usage as much as UI power usage, I'd
> say that the patchset looks good :)
>
> Tested-by: Kajetan Puchalski <kajetan.puchalski@arm.com>

Thanks a lot, much appreciated!


* Re: [RFT][PATCH v2 0/3] cpuidle: teo: Do not check timers unconditionally every time
  2023-08-07 15:38   ` Anna-Maria Behnsen
  2023-08-07 15:39     ` Anna-Maria Behnsen
@ 2023-08-07 16:47     ` Rafael J. Wysocki
  2023-08-09 15:09       ` Anna-Maria Behnsen
  1 sibling, 1 reply; 21+ messages in thread
From: Rafael J. Wysocki @ 2023-08-07 16:47 UTC (permalink / raw)
  To: Anna-Maria Behnsen
  Cc: Rafael J. Wysocki, Linux PM, Kajetan Puchalski, Peter Zijlstra,
	LKML, Frederic Weisbecker

On Mon, Aug 7, 2023 at 5:38 PM Anna-Maria Behnsen
<anna-maria@linutronix.de> wrote:
>
> On Thu, 3 Aug 2023, Rafael J. Wysocki wrote:
>
> > [...]
>
> Gautham's tests and the distribution of idle time durations look pretty
> good. Also, the prevention of calling tick_nohz_get_sleep_length() is very
> nice (21477 calls of tick_nohz_next_event(), and the tick was stopped 2670
> times).
>
> Here is the deviation of idle time durations (based on your branch):
>
> Idle Total              2670    100.00%
> x >= 4ms                2537    95.02%
> 4ms > x >= 2ms          19      0.71%
> 2ms > x >= 1ms          10      0.37%
> 1ms > x >= 500us        7       0.26%
> 500us > x >= 250us      6       0.22%
> 250us > x >=100us       13      0.49%
> 100us > x >= 50us       17      0.64%
> 50us > x >= 25us        25      0.94%
> 25us > x >= 10us        22      0.82%
> 10us > x > 5us          9       0.34%
> 5us > x                 5       0.19%

Thanks a lot for the data!

Can I add a Tested-by: tag from you to this series?


* RE: [RFT][PATCH v2 0/3] cpuidle: teo: Do not check timers unconditionally every time
  2023-08-03 21:33 ` [RFT][PATCH v2 0/3] cpuidle: teo: Do not check timers unconditionally every time Rafael J. Wysocki
  2023-08-07 15:38   ` Anna-Maria Behnsen
@ 2023-08-08 15:22   ` Doug Smythies
  2023-08-08 16:43     ` Rafael J. Wysocki
  1 sibling, 1 reply; 21+ messages in thread
From: Doug Smythies @ 2023-08-08 15:22 UTC (permalink / raw)
  To: 'Rafael J. Wysocki'
  Cc: 'Peter Zijlstra', 'LKML',
	'Frederic Weisbecker', 'Linux PM',
	'Anna-Maria Behnsen', 'Kajetan Puchalski',
	Doug Smythies

On 2023.08.03 14:33 Rafael wrote:
> On Thu, Aug 3, 2023 at 11:12 PM Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
>>
>> [...]
>
> For convenience, this series is now available in the following git branch:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git \
> pm-cpuidle-teo

Hi Rafael,

Thank you for the git branch link.

I did some testing:

Disclaimer: I used areas of focus derived
from the original teo-util work last fall,
and did not check if they were still the best
places to look for issues.

CPU: Intel(R) Core(TM) i5-10600K CPU @ 4.10GHz
HWP: enabled
CPU frequency scaling driver: intel_pstate
CPU frequency scaling governor: performance
Kernel 1: 6.5-rc4 (1000 Hz tick rate)
Kernel 2: kernel 1 + this patch series (called "rjw")
System is extremely idle, other than the test work.

All tests were done with all idle governors:
menu, teo, ladder, rjw.

Test 1: 2 core ping pong sweep:

Pass a token between 2 CPUs on 2 different cores.
Do a variable amount of work at each stop.

Purpose: To utilize the shallowest idle states
and observe the transition from using more of 1
idle state to another.

Results: 
 
teo and rjw track fairly well, with
rjw reducing its use of idle state 0 before
teo as the work packet increases. The menu governor
does best overall, but performs worse over a greater
range of token loop times.

Details (power and idle stats; times):
http://smythies.com/~doug/linux/idle/teo-util2/ping-sweep/2-1/perf/
http://smythies.com/~doug/linux/idle/teo-util2/ping-sweep/2-1/2-core-ping-pong-sweep.png
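
For reference, this kind of 2-CPU token-passing loop can be sketched as
below. This is a hypothetical minimal sketch, not the actual test
program used here; the CPU numbers, loop count, and busy-loop length are
made up:

#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <unistd.h>

static void pin_to_cpu(int cpu)
{
	cpu_set_t set;

	CPU_ZERO(&set);
	CPU_SET(cpu, &set);
	if (sched_setaffinity(0, sizeof(set), &set))
		perror("sched_setaffinity");
}

int main(void)
{
	int ab[2], ba[2];	/* pipes: parent -> child, child -> parent */
	char token = 't';
	long i, loops = 1000000;
	volatile unsigned long spin;

	if (pipe(ab) || pipe(ba))
		return 1;

	if (fork() == 0) {
		pin_to_cpu(2);	/* child on one core */
		for (i = 0; i < loops; i++) {
			read(ab[0], &token, 1);
			for (spin = 0; spin < 1000; spin++)
				;	/* variable work at each stop */
			write(ba[1], &token, 1);
		}
		return 0;
	}

	pin_to_cpu(0);		/* parent on another core */
	for (i = 0; i < loops; i++) {
		write(ab[1], &token, 1);
		read(ba[0], &token, 1);
	}
	return 0;
}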

Test 2: 6 core ping pong sweep:

Pass a token between 6 CPUs on 6 different cores.
Do a variable amount of work at each stop.

Purpose: To utilize the midrange idle states
and observe the transitions from between use of
idle states.

Results: There is some instability in the results
in the early stages.
For unknown reasons, the rjw governor sometimes works
slower and at lower power. The condition is not 100%
repeatable.

Overall teo completed the test fastest (54.9 minutes),
followed by menu (56.2 minutes), then rjw (56.7 minutes),
then ladder (58.4 minutes). teo is faster throughout the
latter stages of the test, but at the cost of more power.
The differences seem to be in the transition from idle
state 1 to idle state 2 usage.

Details (power and idle stats; times):
http://smythies.com/~doug/linux/idle/teo-util2/ping-sweep/6-2/perf/
http://smythies.com/~doug/linux/idle/teo-util2/ping-sweep/6-2/6-core-ping-pong-sweep.png
http://smythies.com/~doug/linux/idle/teo-util2/ping-sweep/6-2/6-core-ping-pong-sweep-detail-a.png
http://smythies.com/~doug/linux/idle/teo-util2/ping-sweep/6-2/6-core-ping-pong-sweep-detail-b.png
http://smythies.com/~doug/linux/idle/teo-util2/ping-sweep/6-2/6-core-ping-pong-sweep-diffs.png

A re-run of the power and idle stats, showing inconsistent behaviour
(teo and rjw only, and no timing data):
http://smythies.com/~doug/linux/idle/teo-util2/ping-sweep/6-1/perf/

Test 3: sleeping ebizzy - 128 threads.

Purpose: This test has given interesting results in the past.
The test varies the sleep interval between record lookups.
The result is varying usage of idle states.

Results: It can be difficult to see any differences in
the overall timing graph, but a graph of differences
is revealing. teo outperforms rjw in the longer intervals
region of the test, at the cost of more power.

Details: (power and idle stats; times):
http://smythies.com/~doug/linux/idle/teo-util2/ebizzy/perf/
http://smythies.com/~doug/linux/idle/teo-util2/ebizzy/ebizzy-128-perf.png
http://smythies.com/~doug/linux/idle/teo-util2/ebizzy/ebizzy-128-perf-diffs.png

Test 4: 2 X 2 pair token passing. Dwell test. Fast:

Purpose: Dwell under one set of conditions. Observe
noise and/or any bi-stability.

Results (reference time is menu):
rjw: 3.0723 usecs/loop average. +3.15%
teo: 2.9917 usecs/loop average. +0.44%
menu: 2.97845 usecs/loop average. Reference
ladder: 4.077375 usecs/loop average. +36.9%

Powers are all similar, with ladder a bit lower.

Details: (power and idle stats; times):
http://smythies.com/~doug/linux/idle/teo-util2/many-0-400000000-2/perf/
http://smythies.com/~doug/linux/idle/teo-util2/many-0-400000000-2/times.txt

Test 5: 2 X 2 pair token passing. Dwell test. Medium:

Purpose: Dwell under one set of conditions. Observe
noise and/or any bi-stability.

Results (reference time is menu):
rjw: 11.3406 usecs/loop average. -0.69%
teo: 11.36765 usecs/loop average. -0.45%
menu: 11.41905 usecs/loop average. reference
ladder: 11.9535 usecs/loop average. +4.68%

Powers are all similar.

Details:
http://smythies.com/~doug/linux/idle/teo-util2/many-3000-100000000-2/perf/
http://smythies.com/~doug/linux/idle/teo-util2/many-3000-100000000-2/times.txt

Test 6: 2 X 2 pair token passing. Dwell test. Slow:

Purpose: Dwell under one set of conditions. Observe
noise and/or any bi-stability.

Results (reference time is menu):
rjw: 2591.70 usecs/loop average. +0.26%
teo: 2566.34 usecs/loop average. -0.72%
menu: 2585.00 usecs/loop average. reference
ladder: 2635.36 usecs/loop average. +1.95%

Powers are all similar, with ladder a bit lower. 
Due to the strong temperature to power use curve,
a much longer dwell test would need to be run to
be sure to get to steady state power usage.

Details:
http://smythies.com/~doug/linux/idle/teo-util2/many-1000000-342000-2/perf/
http://smythies.com/~doug/linux/idle/teo-util2/many-1000000-342000-2/times.txt

Test 7: 500 low load threads.

Purpose: This test has given interesting results
in the past.

500 threads at approximately 10 hertz work/sleep frequency
and about 0.0163 load per thread, 8.15 total.
CPUs about 32% idle.

Results:
rjw executed 0.01% faster than teo.
rjw used 5% less energy than teo.

Details:
http://smythies.com/~doug/linux/idle/teo-util2/waiter/perf/
http://smythies.com/~doug/linux/idle/teo-util2/waiter/times.txt

Conclusions: Overall, I am not seeing a compelling reason to
proceed with this patch set.

... Doug




* Re: [RFT][PATCH v2 0/3] cpuidle: teo: Do not check timers unconditionally every time
  2023-08-08 15:22   ` Doug Smythies
@ 2023-08-08 16:43     ` Rafael J. Wysocki
  2023-08-08 22:40       ` Doug Smythies
  0 siblings, 1 reply; 21+ messages in thread
From: Rafael J. Wysocki @ 2023-08-08 16:43 UTC (permalink / raw)
  To: Doug Smythies
  Cc: Rafael J. Wysocki, Peter Zijlstra, LKML, Frederic Weisbecker,
	Linux PM, Anna-Maria Behnsen, Kajetan Puchalski

On Tue, Aug 8, 2023 at 5:22 PM Doug Smythies <dsmythies@telus.net> wrote:
>
> [...]

Thanks a lot for doing this work, much appreciated!

> Conclusions: Overall, I am not seeing a compelling reason to
> proceed with this patch set.

On the other hand, if there is a separate compelling reason to do
that, it doesn't appear to lead to a major regression.


* Re: [RFT][PATCH v2 0/3] cpuidle: teo: Do not check timers unconditionally every time
  2023-08-08 16:43     ` Rafael J. Wysocki
@ 2023-08-08 22:40       ` Doug Smythies
  2023-08-09 16:24         ` Anna-Maria Behnsen
  2023-08-10  1:08         ` Doug Smythies
  0 siblings, 2 replies; 21+ messages in thread
From: Doug Smythies @ 2023-08-08 22:40 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Peter Zijlstra, LKML, Frederic Weisbecker, Linux PM,
	Anna-Maria Behnsen, Kajetan Puchalski, Doug Smythies

On Tue, Aug 8, 2023 at 9:43 AM Rafael J. Wysocki <rafael@kernel.org> wrote:
> On Tue, Aug 8, 2023 at 5:22 PM Doug Smythies <dsmythies@telus.net> wrote:
> > On 2023.08.03 14:33 Rafael wrote:
> > > [...]
> >
> > Hi Rafael,
> >
> > Thank you for the git branch link.
> >
> > I did some testing:


... deleted ...

> > Test 2: 6 core ping pong sweep:
> >
> > Pass a token between 6 CPUs on 6 different cores.
> > Do a variable amount of work at each stop.
> >
> > Purpose: To utilize the midrange idle states
> > and observe the transitions from between use of
> > idle states.
> >
> > Results: There is some instability in the results
> > in the early stages.
> > For unknown reasons, the rjw governor sometimes works
> > slower and at lower power. The condition is not 100%
> > repeatable.
> >
> > Overall teo completed the test fastest (54.9 minutes),
> > followed by menu (56.2 minutes), then rjw (56.7 minutes),
> > then ladder (58.4 minutes). teo is faster throughout the
> > latter stages of the test, but at the cost of more power.
> > The differences seem to be in the transition from idle
> > state 1 to idle state 2 usage.

The magnitude of the later-stage differences is significant.

... deleted ...

> Thanks a lot for doing this work, much appreciated!
>
> > Conclusions: Overall, I am not seeing a compelling reason to
> > proceed with this patch set.
>
> On the other hand, if there is a separate compelling reason to do
> that, it doesn't appear to lead to a major regression.

Agreed.

Just for additional information, a 6 core dwell test was run.
The test conditions were cherry-picked for dramatic effect:

teo: average: 1162.13 uSec/loop ; Std dev: 0.38
rjw: average: 1266.45 uSec/loop ; Std dev: 6.53 ; +9%

teo: average: 29.98 watts
rjw: average: 30.30 watts
(the same within thermal experimental error)

Details (power and idle stats over the 45 minute test period):
http://smythies.com/~doug/linux/idle/teo-util2/6-13568-147097/perf/


* Re: [RFT][PATCH v2 0/3] cpuidle: teo: Do not check timers unconditionally every time
  2023-08-07 16:47     ` Rafael J. Wysocki
@ 2023-08-09 15:09       ` Anna-Maria Behnsen
  2023-08-09 15:11         ` Rafael J. Wysocki
  0 siblings, 1 reply; 21+ messages in thread
From: Anna-Maria Behnsen @ 2023-08-09 15:09 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Linux PM, Kajetan Puchalski, Peter Zijlstra, LKML, Frederic Weisbecker


On Mon, 7 Aug 2023, Rafael J. Wysocki wrote:

> On Mon, Aug 7, 2023 at 5:38 PM Anna-Maria Behnsen
> <anna-maria@linutronix.de> wrote:
> >
> > [...]
> 
> Thanks a lot for the data!
> 
> Can I add a Tested-by: tag from you to this series?
> 

Sure - sorry for the delay!


* Re: [RFT][PATCH v2 0/3] cpuidle: teo: Do not check timers unconditionally every time
  2023-08-09 15:09       ` Anna-Maria Behnsen
@ 2023-08-09 15:11         ` Rafael J. Wysocki
  0 siblings, 0 replies; 21+ messages in thread
From: Rafael J. Wysocki @ 2023-08-09 15:11 UTC (permalink / raw)
  To: Anna-Maria Behnsen
  Cc: Rafael J. Wysocki, Linux PM, Kajetan Puchalski, Peter Zijlstra,
	LKML, Frederic Weisbecker

On Wed, Aug 9, 2023 at 5:10 PM Anna-Maria Behnsen
<anna-maria@linutronix.de> wrote:
>
> On Mon, 7 Aug 2023, Rafael J. Wysocki wrote:
>
> > On Mon, Aug 7, 2023 at 5:38 PM Anna-Maria Behnsen
> > <anna-maria@linutronix.de> wrote:
> > >
> > > On Thu, 3 Aug 2023, Rafael J. Wysocki wrote:
> > >
> > > > On Thu, Aug 3, 2023 at 11:12 PM Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> > > > >
> > > > > Hi Folks,
> > > > >
> > > > > This is the second iteration of:
> > > > >
> > > > > https://lore.kernel.org/linux-pm/4511619.LvFx2qVVIh@kreacher/
> > > > >
> > > > > with an additional patch.
> > > > >
> > > > > There are some small modifications of patch [1/3] and the new
> > > > > patch causes governor statistics to play a role in deciding whether
> > > > > or not to stop the scheduler tick.
> > > > >
> > > > > Testing would be much appreciated!
> > > >
> > > > For convenience, this series is now available in the following git branch:
> > > >
> > > > git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git \
> > > >  pm-cpuidle-teo
> > > >
> > >
> > > Gautham's tests and the distribution of idle time durations look pretty
> > > good. Also, the avoidance of tick_nohz_get_sleep_length() calls is very
> > > nice (21477 calls of tick_nohz_next_event(), and the tick was stopped
> > > 2670 times).
> > >
> > > Here is the distribution of idle time durations (based on your branch):
> > >
> > > Idle Total              2670    100.00%
> > > x >= 4ms                2537    95.02%
> > > 4ms > x >= 2ms          19      0.71%
> > > 2ms > x >= 1ms          10      0.37%
> > > 1ms > x >= 500us        7       0.26%
> > > 500us > x >= 250us      6       0.22%
> > > 250us > x >= 100us      13      0.49%
> > > 100us > x >= 50us       17      0.64%
> > > 50us > x >= 25us        25      0.94%
> > > 25us > x >= 10us        22      0.82%
> > > 10us > x > 5us          9       0.34%
> > > 5us > x                 5       0.19%
> >
> > Thanks a lot for the data!
> >
> > Can I add a Tested-by: tag from you to this series?
> >
>
> Sure - sorry for the delay!

No worries, thanks!

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [RFT][PATCH v2 0/3] cpuidle: teo: Do not check timers unconditionally every time
  2023-08-08 22:40       ` Doug Smythies
@ 2023-08-09 16:24         ` Anna-Maria Behnsen
  2023-08-10  0:43           ` Doug Smythies
  2023-08-10  1:08         ` Doug Smythies
  1 sibling, 1 reply; 21+ messages in thread
From: Anna-Maria Behnsen @ 2023-08-09 16:24 UTC (permalink / raw)
  To: Doug Smythies
  Cc: Rafael J. Wysocki, Peter Zijlstra, LKML, Frederic Weisbecker,
	Linux PM, Kajetan Puchalski

[-- Attachment #1: Type: text/plain, Size: 1828 bytes --]

Hi,

On Tue, 8 Aug 2023, Doug Smythies wrote:
> On Tue, Aug 8, 2023 at 9:43 AM Rafael J. Wysocki <rafael@kernel.org> wrote:
> > On Tue, Aug 8, 2023 at 5:22 PM Doug Smythies <dsmythies@telus.net> wrote:

[...]

> > > Conclusions: Overall, I am not seeing a compelling reason to
> > > proceed with this patch set.
> >
> > On the other hand, if there is a separate compelling reason to do
> > that, it doesn't appear to lead to a major regression.
> 
> Agreed.
> 

Regarding the compelling reason:

On a fully loaded machine with 256 CPUs, tick_nohz_next_event() is executed
~48000 times per second. With this patchset it is reduced to ~120 times per
second, a factor of 400. This is already an improvement.

tick_nohz_next_event() marks timer bases idle whenever possible, even if
the tick is not stopped afterwards. When a timer is enqueued remotely into
an idle timer base, an IPI is sent. Calling tick_nohz_next_event() only
when the system is not that busy prevents those unnecessary IPIs.
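
To illustrate the IPI point, here is a heavily condensed sketch, modeled
on the remote-enqueue path in kernel/time/timer.c (trigger_dyntick_cpu());
it is illustrative only, not the literal mainline code:

	/*
	 * Sketch: a timer is queued on a remote CPU's base.  If
	 * tick_nohz_next_event() previously marked that base idle,
	 * the remote CPU might otherwise sleep past the new expiry,
	 * so it has to be kicked with an IPI.
	 */
	static void enqueue_timer_sketch(struct timer_base *base,
					 struct timer_list *timer)
	{
		/* ... link the timer into the wheel ... */

		/*
		 * Fewer tick_nohz_next_event() calls mean fewer bases
		 * marked idle, and therefore fewer of these wakeup IPIs.
		 */
		if (base->is_idle)
			wake_up_nohz_cpu(base->cpu);
	}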

Besides those facts, I'm working on the timer pull model [0]. With this,
non-pinned timers can also be expired by other CPUs and do not prevent CPUs
from going idle. Those timers will be enqueued on the local CPU without any
heuristics. This helps to improve behavior (regarding power) when a system
is idle. But the call of tick_nohz_next_event() will be more expensive,
which led to a regression during testing. This regression is gone with the
new teo implementation; it seems that there is also an improvement under
load. I do not have finalized numbers, as it is still WIP (I came across
some other possible optimizations while analyzing the regression, which
I'm evaluating at the moment).

Thanks,

        Anna-Maria


[0] https://lore.kernel.org/lkml/20230524070629.6377-1-anna-maria@linutronix.de/

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [RFT][PATCH v2 0/3] cpuidle: teo: Do not check timers unconditionally every time
  2023-08-09 16:24         ` Anna-Maria Behnsen
@ 2023-08-10  0:43           ` Doug Smythies
  0 siblings, 0 replies; 21+ messages in thread
From: Doug Smythies @ 2023-08-10  0:43 UTC (permalink / raw)
  To: Anna-Maria Behnsen
  Cc: Rafael J. Wysocki, Peter Zijlstra, LKML, Frederic Weisbecker,
	Linux PM, Kajetan Puchalski, Doug Smythies

On Wed, Aug 9, 2023 at 9:24 AM Anna-Maria Behnsen
<anna-maria@linutronix.de> wrote:
> On Tue, 8 Aug 2023, Doug Smythies wrote:
> > On Tue, Aug 8, 2023 at 9:43 AM Rafael J. Wysocki <rafael@kernel.org> wrote:
> > > On Tue, Aug 8, 2023 at 5:22 PM Doug Smythies <dsmythies@telus.net> wrote:
>
> [...]
>
> > > > Conclusions: Overall, I am not seeing a compelling reason to
> > > > proceed with this patch set.
> > >
> > > On the other hand, if there is a separate compelling reason to do
> > > that, it doesn't appear to lead to a major regression.
> >
> > Agreed.
> >
>
> Regarding the compelling reason:
>
> On a fully loaded machine with 256 CPUs, tick_nohz_next_event() is executed
> ~48000 times per second. With this patchset it is reduced to ~120 times per
> second, a factor of 400. This is already an improvement.
>
> tick_nohz_next_event() marks timer bases idle whenever possible, even if
> the tick is not stopped afterwards. When a timer is enqueued remotely into
> an idle timer base, an IPI is sent. Calling tick_nohz_next_event() only
> when the system is not that busy prevents those unnecessary IPIs.
>
> Besides those facts, I'm working on the timer pull model [0]. With this,
> non-pinned timers can also be expired by other CPUs and do not prevent CPUs
> from going idle. Those timers will be enqueued on the local CPU without any
> heuristics. This helps to improve behavior (regarding power) when a system
> is idle. But the call of tick_nohz_next_event() will be more expensive,
> which led to a regression during testing. This regression is gone with the
> new teo implementation; it seems that there is also an improvement under
> load. I do not have finalized numbers, as it is still WIP (I came across
> some other possible optimizations while analyzing the regression, which
> I'm evaluating at the moment).
>
> Thanks,
>
>         Anna-Maria
>
>
> [0] https://lore.kernel.org/lkml/20230524070629.6377-1-anna-maria@linutronix.de/

Thank you for the context and the link.

... Doug

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [RFT][PATCH v2 0/3] cpuidle: teo: Do not check timers unconditionally every time
  2023-08-08 22:40       ` Doug Smythies
  2023-08-09 16:24         ` Anna-Maria Behnsen
@ 2023-08-10  1:08         ` Doug Smythies
  2023-08-10  7:27           ` Rafael J. Wysocki
  1 sibling, 1 reply; 21+ messages in thread
From: Doug Smythies @ 2023-08-10  1:08 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Peter Zijlstra, LKML, Frederic Weisbecker, Linux PM,
	Anna-Maria Behnsen, Kajetan Puchalski, Doug Smythies

Hi Rafael,

Please bear with me. As you know I have many tests
that search over a wide range of operating conditions
looking for areas to focus on in more detail.

On Tue, Aug 8, 2023 at 3:40 PM Doug Smythies <dsmythies@telus.net> wrote:
> On Tue, Aug 8, 2023 at 9:43 AM Rafael J. Wysocki <rafael@kernel.org> wrote:
> > On Tue, Aug 8, 2023 at 5:22 PM Doug Smythies <dsmythies@telus.net> wrote:
> > > On 2023.08.03 14:33 Rafael wrote:
> > > > On Thu, Aug 3, 2023 at 11:12 PM Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> > > >>
> > > >> Hi Folks,
> > > >>
> > > >> This is the second iteration of:
> > > >>
> > > >> https://lore.kernel.org/linux-pm/4511619.LvFx2qVVIh@kreacher/
> > > >>
> > > >> with an additional patch.
> > > >>
> > > >> There are some small modifications of patch [1/3] and the new
> > > >> patch causes governor statistics to play a role in deciding whether
> > > >> or not to stop the scheduler tick.
> > > >>
> > > >> Testing would be much appreciated!
> > > >
> > > > For convenience, this series is now available in the following git branch:
> > > >
> > > > git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git \
> > > > pm-cpuidle-teo
> > >
> > > Hi Rafael,
> > >
> > > Thank you for the git branch link.
> > >
> > > I did some testing:
>
>
> ... deleted ...
>
> > > Test 2: 6 core ping pong sweep:
> > >
> > > Pass a token between 6 CPUs on 6 different cores.
> > > Do a variable amount of work at each stop.
> > >
> > > Purpose: To utilize the midrange idle states
> > > and observe the transitions between the
> > > use of different idle states.
> > >
> > > Results: There is some instability in the results
> > > in the early stages.
> > > For unknown reasons, the rjw governor sometimes works
> > > slower and at lower power. The condition is not 100%
> > > repeatable.
> > >
> > > Overall teo completed the test fastest (54.9 minutes)
> > > Followed by menu (56.2 minutes), then rjw (56.7 minutes),
> > > then ladder (58.4 minutes). teo is faster throughout the
> > > latter stages of the test, but at the cost of more power.
> > > The differences seem to be in the transition from idle
> > > state 1 to idle state 2 usage.
>
> The magnitude of the later-stage differences is significant.
>
> ... deleted ...
>
> > Thanks a lot for doing this work, much appreciated!
> >
> > > Conclusions: Overall, I am not seeing a compelling reason to
> > > proceed with this patch set.
> >
> > On the other hand, if there is a separate compelling reason to do
> > that, it doesn't appear to lead to a major regression.
>
> Agreed.
>
> Just for additional information, a 6 core dwell test was run.
> The test conditions were cherry picked for dramatic effect:
>
> teo: average: 1162.13 uSec/loop ; Std dev: 0.38
> rjw: average: 1266.45 uSec/loop ; Std dev: 6.53 ; +9%
>
> teo: average: 29.98 watts
> rjw: average: 30.30 watts
> (the same within thermal experimental error)
>
> Details (power and idle stats over the 45 minute test period):
> http://smythies.com/~doug/linux/idle/teo-util2/6-13568-147097/perf/

Okay, so while the occasional selection of a deeper idle state might be
detrimental to latency-sensitive workflows such as the one above, it is
an overwhelming benefit to periodic workflows:

Test 8: low load periodic workflow.

There is an enormous range of work/sleep frequencies and loads
to pick from. There was no cherry picking for this test.

The only criterion is that the periodic fixed packet of work is
completed before the start of the next period (a sketch of such a
workload follows the results below).

Test 8 A: 1 load at about 3% and 347 Hz work/sleep frequency:
teo average processor package power: 16.38 watts
rjw average processor package power: 4.29 watts
or 73.8% improvement!!!!!

Test 8 B: 2 loads at about 3% and 347 Hz work/sleep frequency:
teo average processor package power: 18.35 watts
rjw average processor package power: 6.67 watts
or 63.7% improvement!!!!!
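
For reference, a minimal sketch of the kind of periodic low-load workload
used here (illustrative only, not the actual test program; the work-loop
iteration count for ~3% load is an assumption):

	#include <stdint.h>
	#include <time.h>

	#define PERIOD_NS (1000000000L / 347)	/* 347 Hz period */

	static void do_fixed_work(void)
	{
		/* Fixed packet of work; iteration count is a placeholder. */
		volatile uint64_t x = 0;
		for (long i = 0; i < 100000; i++)
			x += i;
	}

	int main(void)
	{
		struct timespec next;

		clock_gettime(CLOCK_MONOTONIC, &next);
		for (;;) {
			do_fixed_work();
			/* Advance the absolute deadline by one period. */
			next.tv_nsec += PERIOD_NS;
			if (next.tv_nsec >= 1000000000L) {
				next.tv_nsec -= 1000000000L;
				next.tv_sec++;
			}
			/* Sleep until the next period boundary. */
			clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME,
					&next, NULL);
		}
	}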

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [RFT][PATCH v2 0/3] cpuidle: teo: Do not check timers unconditionally every time
  2023-08-10  1:08         ` Doug Smythies
@ 2023-08-10  7:27           ` Rafael J. Wysocki
  0 siblings, 0 replies; 21+ messages in thread
From: Rafael J. Wysocki @ 2023-08-10  7:27 UTC (permalink / raw)
  To: Doug Smythies
  Cc: Rafael J. Wysocki, Peter Zijlstra, LKML, Frederic Weisbecker,
	Linux PM, Anna-Maria Behnsen, Kajetan Puchalski

Hi Doug,

On Thu, Aug 10, 2023 at 3:08 AM Doug Smythies <dsmythies@telus.net> wrote:
>
> Hi Rafael,
>
> Please bear with me. As you know I have many tests
> that search over a wide range of operating conditions
> looking for areas to focus on in more detail.
>
> On Tue, Aug 8, 2023 at 3:40 PM Doug Smythies <dsmythies@telus.net> wrote:
> > On Tue, Aug 8, 2023 at 9:43 AM Rafael J. Wysocki <rafael@kernel.org> wrote:
> > > On Tue, Aug 8, 2023 at 5:22 PM Doug Smythies <dsmythies@telus.net> wrote:
> > > > On 2023.08.03 14:33 Rafael wrote:
> > > > > On Thu, Aug 3, 2023 at 11:12 PM Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> > > > >>
> > > > >> Hi Folks,
> > > > >>
> > > > >> This is the second iteration of:
> > > > >>
> > > > >> https://lore.kernel.org/linux-pm/4511619.LvFx2qVVIh@kreacher/
> > > > >>
> > > > >> with an additional patch.
> > > > >>
> > > > >> There are some small modifications of patch [1/3] and the new
> > > > >> patch causes governor statistics to play a role in deciding whether
> > > > >> or not to stop the scheduler tick.
> > > > >>
> > > > >> Testing would be much appreciated!
> > > > >
> > > > > For convenience, this series is now available in the following git branch:
> > > > >
> > > > > git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git \
> > > > > pm-cpuidle-teo
> > > >
> > > > Hi Rafael,
> > > >
> > > > Thank you for the git branch link.
> > > >
> > > > I did some testing:
> >
> >
> > ... deleted ...
> >
> > > > Test 2: 6 core ping pong sweep:
> > > >
> > > > Pass a token between 6 CPUs on 6 different cores.
> > > > Do a variable amount of work at each stop.
> > > >
> > > > Purpose: To utilize the midrange idle states
> > > > and observe the transitions between the
> > > > use of different idle states.
> > > >
> > > > Results: There is some instability in the results
> > > > in the early stages.
> > > > For unknown reasons, the rjw governor sometimes works
> > > > slower and at lower power. The condition is not 100%
> > > > repeatable.
> > > >
> > > > Overall teo completed the test fastest (54.9 minutes)
> > > > Followed by menu (56.2 minutes), then rjw (56.7 minutes),
> > > > then ladder (58.4 minutes). teo is faster throughout the
> > > > latter stages of the test, but at the cost of more power.
> > > > The differences seem to be in the transition from idle
> > > > state 1 to idle state 2 usage.
> >
> > The magnitude of the later-stage differences is significant.
> >
> > ... deleted ...
> >
> > > Thanks a lot for doing this work, much appreciated!
> > >
> > > > Conclusions: Overall, I am not seeing a compelling reason to
> > > > proceed with this patch set.
> > >
> > > On the other hand, if there is a separate compelling reason to do
> > > that, it doesn't appear to lead to a major regression.
> >
> > Agreed.
> >
> > Just for additional information, a 6 core dwell test was run.
> > The test conditions were cherry picked for dramatic effect:
> >
> > teo: average: 1162.13 uSec/loop ; Std dev: 0.38
> > rjw: average: 1266.45 uSec/loop ; Std dev: 6.53 ; +9%
> >
> > teo: average: 29.98 watts
> > rjw: average: 30.30 watts
> > (the same within thermal experimental error)
> >
> > Details (power and idle stats over the 45 minute test period):
> > http://smythies.com/~doug/linux/idle/teo-util2/6-13568-147097/perf/
>
> Okay, so while the occasional selection of a deeper idle state might be
> detrimental to latency-sensitive workflows such as the one above, it is
> an overwhelming benefit to periodic workflows:
>
> Test 8: low load periodic workflow.
>
> There is an enormous range of work/sleep frequencies and loads
> to pick from. There was no cherry picking for this test.
>
> The only criterion is that the periodic fixed packet of work is
> completed before the start of the next period (a sketch of such a
> workload follows the results below).
>
> Test 8 A: 1 load at about 3% and 347 Hz work/sleep frequency:
> teo average processor package power: 16.38 watts
> rjw average processor package power: 4.29 watts
> or 73.8% improvement!!!!!
>
> Test 8 B: 2 loads at about 3% and 347 Hz work/sleep frequency:
> teo average processor package power: 18.35 watts
> rjw average processor package power: 6.67 watts
> or 63.7% improvement!!!!!

This is very interesting, thank you!

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [RFT][PATCH v2 2/3] cpuidle: teo: Skip tick_nohz_get_sleep_length() call in some cases
  2023-08-03 21:09 ` [RFT][PATCH v2 2/3] cpuidle: teo: Skip tick_nohz_get_sleep_length() call in some cases Rafael J. Wysocki
@ 2023-08-11  8:52   ` Anna-Maria Behnsen
  2023-08-11  9:40     ` Rafael J. Wysocki
  0 siblings, 1 reply; 21+ messages in thread
From: Anna-Maria Behnsen @ 2023-08-11  8:52 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Linux PM, Peter Zijlstra, LKML, Frederic Weisbecker, Kajetan Puchalski

Hi Rafael,

On Thu, 3 Aug 2023, Rafael J. Wysocki wrote:

> Index: linux-pm/drivers/cpuidle/governors/teo.c
> ===================================================================
> --- linux-pm.orig/drivers/cpuidle/governors/teo.c
> +++ linux-pm/drivers/cpuidle/governors/teo.c
> @@ -166,6 +166,12 @@
>   */
>  #define NR_RECENT	9
>  
> +/*
> + * Idle state target residency threshold used for deciding whether or not to
> + * check the time till the closest expected timer event.
> + */
> +#define RESIDENCY_THRESHOLD_NS	(15 * NSEC_PER_USEC)
> +

I would like to understand why this residency threshold is a fixed value
and not related to TICK_NSEC. I'm sure there is a reason for it - but for
me it is not obvious. Can you please explain it to me?

Thanks,

	Anna-Maria


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [RFT][PATCH v2 2/3] cpuidle: teo: Skip tick_nohz_get_sleep_length() call in some cases
  2023-08-11  8:52   ` Anna-Maria Behnsen
@ 2023-08-11  9:40     ` Rafael J. Wysocki
  0 siblings, 0 replies; 21+ messages in thread
From: Rafael J. Wysocki @ 2023-08-11  9:40 UTC (permalink / raw)
  To: Anna-Maria Behnsen
  Cc: Rafael J. Wysocki, Linux PM, Peter Zijlstra, LKML,
	Frederic Weisbecker, Kajetan Puchalski

Hi,

On Fri, Aug 11, 2023 at 10:52 AM Anna-Maria Behnsen
<anna-maria@linutronix.de> wrote:
>
> Hi Rafael,
>
> On Thu, 3 Aug 2023, Rafael J. Wysocki wrote:
>
> > Index: linux-pm/drivers/cpuidle/governors/teo.c
> > ===================================================================
> > --- linux-pm.orig/drivers/cpuidle/governors/teo.c
> > +++ linux-pm/drivers/cpuidle/governors/teo.c
> > @@ -166,6 +166,12 @@
> >   */
> >  #define NR_RECENT    9
> >
> > +/*
> > + * Idle state target residency threshold used for deciding whether or not to
> > + * check the time till the closest expected timer event.
> > + */
> > +#define RESIDENCY_THRESHOLD_NS       (15 * NSEC_PER_USEC)
> > +
>
> I would like to understand why this residency threshold is a fixed value
> and not related to TICK_NSEC. I'm sure there is a reason for it - but for
> me it is not obvious. Can you please explain it to me?

First off, I'm not convinced that there is any direct connection
between the TICK_NSEC value and which idle states can be regarded as
shallow.  HZ may vary between 100 and 1000 (an order of magnitude) and
this doesn't affect the target residencies of idle states in any way.

Next, the checks involving this value don't influence the
tick-stopping decision in any way, so I don't see a reason why they
should depend on TICK_NSEC.

Finally, it can be observed that ideally, the return value of
tick_nohz_get_sleep_length() should always be taken into
consideration, because it may influence the idle state selection in
any case.  However, if the target residency of the idle state selected
so far is really small, calling it may be skipped if it is costly,
because its contribution is not likely to be significant.  Worst-case,
we would end up with a CPU wakeup before the target residency of the
selected idle state has elapsed, so some energy would be lost and some
exit latency would be incurred in vain, which is why the call should
only be skipped when the numbers involved are small enough.

Now, what does "small enough" mean?  My answer is 15 us.
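
Schematically, the resulting flow in teo_select() then looks like the
condensed sketch below (illustrative only, not the literal patch; "idx"
stands for the candidate state picked from the statistics and
"duration_ns" for the sleep length used to refine it):

	s64 duration_ns;

	if (drv->states[idx].target_residency_ns < RESIDENCY_THRESHOLD_NS) {
		/*
		 * The candidate state is so shallow that a misprediction
		 * costs little (worst case, the CPU wakes up before the
		 * target residency has elapsed), so skip the potentially
		 * costly timer query and keep the candidate.
		 */
		return idx;
	}

	/* Otherwise refine the selection with the real sleep length. */
	duration_ns = tick_nohz_get_sleep_length(&delta_tick);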

^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2023-08-11  9:40 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-08-03 20:57 [RFT][PATCH v2 0/3] cpuidle: teo: Do not check timers unconditionally every time Rafael J. Wysocki
2023-08-03 21:07 ` [RFT][PATCH v2 1/3] cpuidle: teo: Do not call tick_nohz_get_sleep_length() upfront Rafael J. Wysocki
2023-08-03 21:09 ` [RFT][PATCH v2 2/3] cpuidle: teo: Skip tick_nohz_get_sleep_length() call in some cases Rafael J. Wysocki
2023-08-11  8:52   ` Anna-Maria Behnsen
2023-08-11  9:40     ` Rafael J. Wysocki
2023-08-03 21:11 ` [RFT][PATCH v2 3/3] cpuidle: teo: Gather statistics regarding whether or not to stop the tick Rafael J. Wysocki
2023-08-03 21:33 ` [RFT][PATCH v2 0/3] cpuidle: teo: Do not check timers unconditionally every time Rafael J. Wysocki
2023-08-07 15:38   ` Anna-Maria Behnsen
2023-08-07 15:39     ` Anna-Maria Behnsen
2023-08-07 16:47     ` Rafael J. Wysocki
2023-08-09 15:09       ` Anna-Maria Behnsen
2023-08-09 15:11         ` Rafael J. Wysocki
2023-08-08 15:22   ` Doug Smythies
2023-08-08 16:43     ` Rafael J. Wysocki
2023-08-08 22:40       ` Doug Smythies
2023-08-09 16:24         ` Anna-Maria Behnsen
2023-08-10  0:43           ` Doug Smythies
2023-08-10  1:08         ` Doug Smythies
2023-08-10  7:27           ` Rafael J. Wysocki
2023-08-07 14:00 ` Kajetan Puchalski
2023-08-07 16:46   ` Rafael J. Wysocki

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).