* [RESEND PATCH v1 0/2] Optimization CPU idle state impacted by tick
@ 2018-08-09 17:20 Leo Yan
  2018-08-09 17:20 ` [RESEND PATCH v1 1/2] cpuidle: menu: Correct the criteria for stopping tick Leo Yan
  2018-08-09 17:20 ` [RESEND PATCH v1 2/2] cpuidle: menu: Dismiss tick impaction on correction factors Leo Yan
  0 siblings, 2 replies; 13+ messages in thread
From: Leo Yan @ 2018-08-09 17:20 UTC (permalink / raw)
  To: Rafael J. Wysocki, Peter Zijlstra (Intel),
	Daniel Lezcano, Vincent Guittot, linux-kernel, Linux PM
  Cc: Leo Yan

Since Rafael's patch series 'sched/cpuidle: Idle loop rework' was merged
into the mainline kernel, it has resolved the Powernightmares issue [1]
by not stopping the tick during the idle loop; we verified that series
on an Arm platform (96boards Hikey620 with octa Cortex-A53 CPUs) using
the rt-app [2] program to generate workloads: a single task with a
period of 5ms and a duty cycle of 1%/3%/5%/10%/20%/30%/40%.
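
As a rough standalone equivalent of one such workload (a sketch only,
not the actual rt-app JSON configuration we used; the RUN_US/PERIOD_US
numbers below model the 40% duty cycle case from above):

	#include <time.h>
	#include <unistd.h>

	#define PERIOD_US 5000L
	#define RUN_US    2000L	/* 40% duty cycle; sleep for the rest */

	/* Busy-loop for roughly 'us' microseconds. */
	static void busy_for_us(long us)
	{
		struct timespec s, n;

		clock_gettime(CLOCK_MONOTONIC, &s);
		do {
			clock_gettime(CLOCK_MONOTONIC, &n);
		} while ((n.tv_sec - s.tv_sec) * 1000000L +
			 (n.tv_nsec - s.tv_nsec) / 1000L < us);
	}

	int main(void)
	{
		for (;;) {
			busy_for_us(RUN_US);
			/* The CPU should enter an idle state here. */
			usleep(PERIOD_US - RUN_US);
		}
		return 0;
	}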

After running these test cases, we found the CPU cannot stay in the
deepest idle state as expected; the issues are essentially related to
the sched tick:

The prominent issue is the criterion for deciding to stop the tick;
currently the criterion is whether the expected interval is less than
TICK_USEC, but it does not consider the idle state parameters, so we can
observe that even when the CPU has enough sleeping time it cannot enter
the deepest idle state; this is quite serious for some specific duty
cycle cases.

Another issue is that when the tick keeps running in the idle state, it
can heavily impact the 'menu' governor metrics; in particular it
introduces a lot of noise into the correction factors used for
next-event prediction.

This patch series tries to fix these two issues.  Patch 0001 defines a
time point for deciding whether to stop the tick; this time point
considers both the tick period and the maximum target residency, and the
prediction period is compared against it to decide whether the tick
needs to be stopped.  Patch 0002 always gives compensation for tick
wakeup events so that the tick's impact on the correction factors is
dismissed for the next prediction.

The table below compares the test cases without and with this patch
series; we run each case with a single task with a 5ms period and a
different duty cycle, for a total running time of 10s.  Based on the
tracing log, we compute statistics across all CPUs for the duration
spent in each idle state, in seconds (s); on the Hikey board the results
show improved selection of the C2 state (the CPU's deepest state).
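
(For reference, the per-state residencies can be derived from the
power:cpu_idle tracepoint, e.g. with a capture along the lines of
'trace-cmd record -e power:cpu_idle'; this command is illustrative and
not necessarily the exact tooling we used.)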

Some notations are used in the table:

state: C0: WFI; C1: CPU OFF; C2: Cluster OFF

All testing cases have single task with 5ms period:

		 Without patches                  With patches                   Difference
            -----------------------------  -----------------------------  -------------------------------
Duty cycle      C0        C1        C2         C0        C1         C2         C0        C1         C2
  1%        0.218589  4.208460  87.995606  0.119723  0.847116  91.940569  -0.098866  -3.361344  +3.944963
  3%        0.801521  5.031361  86.444753  0.147346  0.820276  91.761191  -0.654175  -4.211085  +5.316438
  5%        0.590236  2.733048  88.284541  0.149237  1.042383  90.490482  -0.440999  -1.690665  +2.205941
 10%        0.601922  6.282368  84.899870  0.169491  1.304985  89.725754  -0.432431  -4.977383  +4.825884
 20%        1.381870  8.531687  80.627691  0.307390  3.302562  86.686887  -1.074480  -5.229125  +6.059196
 30%        1.785221  6.974483  81.083312  0.548050  5.319929  83.551747  -1.237171  -1.654554  +2.468435
 40%        1.403247  6.474203  80.577176  0.467686  6.366482  81.983384  -0.935561  -0.107721  +1.406208


Leo Yan (2):
  cpuidle: menu: Correct the criteria for stopping tick
  cpuidle: menu: Dismiss tick impaction on correction factors

 drivers/cpuidle/governors/menu.c | 55 ++++++++++++++++++++++++++++++++--------
 1 file changed, 45 insertions(+), 10 deletions(-)

-- 
2.7.4



* [RESEND PATCH v1 1/2] cpuidle: menu: Correct the criteria for stopping tick
  2018-08-09 17:20 [RESEND PATCH v1 0/2] Optimization CPU idle state impacted by tick Leo Yan
@ 2018-08-09 17:20 ` Leo Yan
  2018-08-09 20:47   ` Rafael J. Wysocki
  2018-08-09 17:20 ` [RESEND PATCH v1 2/2] cpuidle: menu: Dismiss tick impaction on correction factors Leo Yan
  1 sibling, 1 reply; 13+ messages in thread
From: Leo Yan @ 2018-08-09 17:20 UTC (permalink / raw)
  To: Rafael J. Wysocki, Peter Zijlstra (Intel),
	Daniel Lezcano, Vincent Guittot, linux-kernel, Linux PM
  Cc: Leo Yan

The criterion for keeping the tick running is that the predicted idle
duration is less than TICK_USEC; the mainline kernel configures HZ=250
so TICK_USEC equals 4000us, so any prediction less than 4000us will not
stop the tick and the idle state will be fixed up to a shallower state.
On the other hand, take 96boards Hikey (octa Cortex-A53 CPUs) as an
example: the platform's deepest C-state is the cluster off state, whose
'target_residency' is 2700us.  If the 'menu' governor predicts the next
idle duration to be any value in the range [2700us, 4000us), then the
'menu' governor will keep the sched tick running and roll back to the
shallower CPU off state rather than cluster off.  As a result the CPU
has much less chance to enter the deepest state when a task repeatedly
runs on it with a 5000us period and 40% duty cycle (so the task runs for
2000us and then sleeps for 3000us in every period).  In theory, we
should permit the CPU to stay in the cluster off state, since the CPU's
sleeping time of 3000us exceeds its 'target_residency' of 2700us.

This issue is caused by the 'menu' governor's criterion for deciding
whether to keep the tick and roll back to a shallow state, namely
'expected_interval < TICK_USEC'.  This criterion only considers the tick
period; it does not consider idle state residency, so it misses the
better choice of a deeper idle state when the deepest idle state's
'target_residency' is less than TICK_USEC, which is quite common on Arm
platforms.

To fix this issue, this patch adds one extra variable,
'stop_tick_point', to help decide whether to stop the tick.  If the
prediction is longer than 'stop_tick_point' then the tick can be
stopped; otherwise the tick keeps running.

For 'stop_tick_point', besides comparing the prediction period with
TICK_USEC, we also need to consider the deepest idle state's
'target_residency'.  Hence 'stop_tick_point' is the minimum of the
deepest idle state's 'target_residency' and TICK_USEC.
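
Taking the Hikey numbers above as a worked example: with HZ=250,
TICK_USEC = 4000us and the deepest state's 'target_residency' = 2700us,
so stop_tick_point = min(4000, 2700) = 2700us.  A predicted idle
duration of 3000us is then longer than 'stop_tick_point', so the tick is
stopped and cluster off can be selected, where the old
'expected_interval < TICK_USEC' check would have kept the tick running.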

Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Leo Yan <leo.yan@linaro.org>
---
 drivers/cpuidle/governors/menu.c | 41 ++++++++++++++++++++++++++++++++++++++--
 1 file changed, 39 insertions(+), 2 deletions(-)

diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c
index 30ab759..2ce4068 100644
--- a/drivers/cpuidle/governors/menu.c
+++ b/drivers/cpuidle/governors/menu.c
@@ -294,6 +294,7 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev,
 	unsigned int expected_interval;
 	unsigned long nr_iowaiters, cpu_load;
 	ktime_t delta_next;
+	unsigned int stop_tick_point;
 
 	if (data->needs_update) {
 		menu_update(drv, dev);
@@ -406,11 +407,47 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev,
 		idx = 0; /* No states enabled. Must use 0. */
 
 	/*
+	 * Decide the time point for stopping the tick: if the prediction is
+	 * before this point it is better to keep the tick enabled, and if it
+	 * is after this point the CPU can stay in the idle state long enough
+	 * that the tick should be stopped.  This point needs to consider two
+	 * factors: one is the tick period and the other is the maximum
+	 * target residency.
+	 *
+	 * We can divide into the cases below:
+	 *
+	 * The first case is that the prediction is shorter than the maximum
+	 * target residency and also shorter than the tick period; this means
+	 * the prediction doesn't select the deepest idle state and the CPU
+	 * is supposed to be woken up within the tick period, so in this case
+	 * we should keep the tick enabled;
+	 *
+	 * The second case is that the prediction is shorter than the maximum
+	 * target residency but longer than the tick period; in this case the
+	 * idle state selection is already based on the prediction of a
+	 * shallower state and we expect some event to arrive later than the
+	 * tick to wake up the CPU; in other words, the CPU is likely to stay
+	 * in the expected idle state for a long while (longer than the tick
+	 * period), so it is reasonable to stop the tick.
+	 *
+	 * The third case is that the prediction is longer than the maximum
+	 * target residency, no matter whether it is longer or shorter than
+	 * the tick period; here we have selected the deepest idle state, so
+	 * it is pointless to let the tick wake up the CPU from that state.
+	 *
+	 * To summarize the cases above, we use min(TICK_USEC,
+	 * maximum_target_residency) as the critical point to decide whether
+	 * to stop the tick.
+	 */
+	stop_tick_point = min_t(unsigned int, TICK_USEC,
+			drv->states[drv->state_count-1].target_residency);
+
+	/*
 	 * Don't stop the tick if the selected state is a polling one or if the
-	 * expected idle duration is shorter than the tick period length.
+	 * expected idle duration is shorter than the estimated stop tick point.
 	 */
 	if ((drv->states[idx].flags & CPUIDLE_FLAG_POLLING) ||
-	    expected_interval < TICK_USEC) {
+	    expected_interval < stop_tick_point) {
 		unsigned int delta_next_us = ktime_to_us(delta_next);
 
 		*stop_tick = false;
-- 
2.7.4



* [RESEND PATCH v1 2/2] cpuidle: menu: Dismiss tick impaction on correction factors
  2018-08-09 17:20 [RESEND PATCH v1 0/2] Optimization CPU idle state impacted by tick Leo Yan
  2018-08-09 17:20 ` [RESEND PATCH v1 1/2] cpuidle: menu: Correct the criteria for stopping tick Leo Yan
@ 2018-08-09 17:20 ` Leo Yan
  2018-08-09 21:06   ` Rafael J. Wysocki
  1 sibling, 1 reply; 13+ messages in thread
From: Leo Yan @ 2018-08-09 17:20 UTC (permalink / raw)
  To: Rafael J. Wysocki, Peter Zijlstra (Intel),
	Daniel Lezcano, Vincent Guittot, linux-kernel, Linux PM
  Cc: Leo Yan

If the idle duration predictor detects a tick wakeup, and the condition
'data->next_timer_us > TICK_USEC' is met, it gives a big compensation
for the 'measured' interval; this is intended to avoid artificially
small correction factor values.  Unfortunately, this still cannot cover
all cases of the tick's impact on the correction factors, e.g. if the
predicted next event is less than TICK_USEC, then every wakeup by the
tick is treated as the usual case (with the exit latency subtracted),
and as a result the tick events heavily impact the correction factors.
Moreover, the coming tick is sometimes very close; especially the first
time the CPU becomes idle the tick expiry time can vary, so ticks can
introduce big deviations into the correction factors.

If the idle governor deliberately doesn't stop the tick timer, the tick
event arrives as expected at a fixed interval, so the tick event is
predictable; if the tick event arrives earlier than other normal timer
events and other possible wakeup events, we need to dismiss the tick's
impact on the correction factors.  This lets the correction factor array
be used purely to correct for other wakeup events rather than for the
sched tick.

This patch checks whether the wakeup is a tick wakeup; if so, it
assumes the CPU could have stayed in the idle state for enough time and
gives a high compensation for the 'measured' interval, which avoids the
tick's impact on the correction factor array.
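
As a worked example (assuming the in-tree value MAX_INTERESTING = 50000
in menu.c): every tick wakeup now records measured_us = 9 * 50000 / 10 =
45000us, i.e. 'idle long but not forever', regardless of the value of
'data->next_timer_us'.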

Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Leo Yan <leo.yan@linaro.org>
---
 drivers/cpuidle/governors/menu.c | 14 ++++++--------
 1 file changed, 6 insertions(+), 8 deletions(-)

diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c
index 2ce4068..43cbde3 100644
--- a/drivers/cpuidle/governors/menu.c
+++ b/drivers/cpuidle/governors/menu.c
@@ -525,15 +525,13 @@ static void menu_update(struct cpuidle_driver *drv, struct cpuidle_device *dev)
 	 * assume the state was never reached and the exit latency is 0.
 	 */
 
-	if (data->tick_wakeup && data->next_timer_us > TICK_USEC) {
+	if (data->tick_wakeup) {
 		/*
-		 * The nohz code said that there wouldn't be any events within
-		 * the tick boundary (if the tick was stopped), but the idle
-		 * duration predictor had a differing opinion.  Since the CPU
-		 * was woken up by a tick (that wasn't stopped after all), the
-		 * predictor was not quite right, so assume that the CPU could
-		 * have been idle long (but not forever) to help the idle
-		 * duration predictor do a better job next time.
+		 * Since the CPU was woken up by a tick (that wasn't stopped
+		 * after all), the predictor was not quite right, so assume
+		 * that the CPU could have been idle long (but not forever)
+		 * to help the idle duration predictor do a better job next
+		 * time.
 		 */
 		measured_us = 9 * MAX_INTERESTING / 10;
 	} else {
-- 
2.7.4



* Re: [RESEND PATCH v1 1/2] cpuidle: menu: Correct the criteria for stopping tick
  2018-08-09 17:20 ` [RESEND PATCH v1 1/2] cpuidle: menu: Correct the criteria for stopping tick Leo Yan
@ 2018-08-09 20:47   ` Rafael J. Wysocki
  2018-08-10  7:13     ` leo.yan
  0 siblings, 1 reply; 13+ messages in thread
From: Rafael J. Wysocki @ 2018-08-09 20:47 UTC (permalink / raw)
  To: Leo Yan
  Cc: Rafael J. Wysocki, Peter Zijlstra (Intel),
	Daniel Lezcano, Vincent Guittot, Linux Kernel Mailing List,
	Linux PM

On Thu, Aug 9, 2018 at 7:20 PM, Leo Yan <leo.yan@linaro.org> wrote:
> The criterion for keeping the tick running is that the predicted idle
> duration is less than TICK_USEC;

Yes, because if the predicted idle duration is less than the tick
period, stopping the tick is pointless overhead (if the governor
predicts a CPU wakeup within the tick period length range, it may as
well let the tick run, because the CPU will be woken up relatively
shortly anyway).

> the mainline kernel configures HZ=250 so TICK_USEC equals

To be precise, other values of HZ may be used too, depending on how
the kernel is configured.

> 4000us, so any prediction less than 4000us will not stop the tick and
> the idle state will be fixed up to a shallower state.  On the other
> hand, take 96boards Hikey (octa Cortex-A53 CPUs) as an example: the
> platform's deepest C-state is the cluster off state, whose
> 'target_residency' is 2700us.  If the 'menu' governor predicts the next
> idle duration to be any value in the range [2700us, 4000us), then the
> 'menu' governor will keep the sched tick running and roll back to the
> shallower CPU off state rather than cluster off.

Which is as expected.

> As a result the CPU has much less chance to enter the deepest state
> when a task repeatedly runs on it with a 5000us period and 40% duty
> cycle (so the task runs for 2000us and then sleeps for 3000us in every
> period).  In theory, we should permit the CPU to stay in the cluster
> off state, since the CPU's sleeping time of 3000us exceeds its
> 'target_residency' of 2700us.

For every particular choice of the criteria you can find a particular
case in which it will be suboptimal.

> This issue is caused by the 'menu' governor's criterion for deciding
> whether to keep the tick and roll back to a shallow state, namely
> 'expected_interval < TICK_USEC'.  This criterion only considers the
> tick period; it does not consider idle state residency, so it misses
> the better choice of a deeper idle state when the deepest idle state's
> 'target_residency' is less than TICK_USEC, which is quite common on Arm
> platforms.
>
> To fix this issue, this patch adds one extra variable,
> 'stop_tick_point', to help decide whether to stop the tick.  If the
> prediction is longer than 'stop_tick_point' then the tick can be
> stopped; otherwise the tick keeps running.

Opinions may differ on whether or not it is an issue that needs to be fixed.

> For 'stop_tick_point', besides comparing the prediction period with
> TICK_USEC, we also need to consider the deepest idle state's
> 'target_residency'.  Hence 'stop_tick_point' is the minimum of the
> deepest idle state's 'target_residency' and TICK_USEC.
>
> Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
> Cc: Vincent Guittot <vincent.guittot@linaro.org>
> Signed-off-by: Leo Yan <leo.yan@linaro.org>
> ---
>  drivers/cpuidle/governors/menu.c | 41 ++++++++++++++++++++++++++++++++++++++--
>  1 file changed, 39 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c
> index 30ab759..2ce4068 100644
> --- a/drivers/cpuidle/governors/menu.c
> +++ b/drivers/cpuidle/governors/menu.c
> @@ -294,6 +294,7 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev,
>         unsigned int expected_interval;
>         unsigned long nr_iowaiters, cpu_load;
>         ktime_t delta_next;
> +       unsigned int stop_tick_point;
>
>         if (data->needs_update) {
>                 menu_update(drv, dev);
> @@ -406,11 +407,47 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev,
>                 idx = 0; /* No states enabled. Must use 0. */
>
>         /*
> +        * Decide the time point for stopping the tick: if the prediction is
> +        * before this point it is better to keep the tick enabled, and if it
> +        * is after this point the CPU can stay in the idle state long enough
> +        * that the tick should be stopped.  This point needs to consider two
> +        * factors: one is the tick period and the other is the maximum
> +        * target residency.
> +        *
> +        * We can divide into the cases below:
> +        *
> +        * The first case is that the prediction is shorter than the maximum
> +        * target residency and also shorter than the tick period; this means
> +        * the prediction doesn't select the deepest idle state and the CPU
> +        * is supposed to be woken up within the tick period, so in this case
> +        * we should keep the tick enabled;
> +        *
> +        * The second case is that the prediction is shorter than the maximum
> +        * target residency but longer than the tick period; in this case the
> +        * idle state selection is already based on the prediction of a
> +        * shallower state and we expect some event to arrive later than the
> +        * tick to wake up the CPU; in other words, the CPU is likely to stay
> +        * in the expected idle state for a long while (longer than the tick
> +        * period), so it is reasonable to stop the tick.
> +        *
> +        * The third case is that the prediction is longer than the maximum
> +        * target residency, no matter whether it is longer or shorter than
> +        * the tick period; here we have selected the deepest idle state, so
> +        * it is pointless to let the tick wake up the CPU from that state.
> +        *
> +        * To summarize the cases above, we use min(TICK_USEC,
> +        * maximum_target_residency) as the critical point to decide whether
> +        * to stop the tick.
> +        */
> +       stop_tick_point = min_t(unsigned int, TICK_USEC,
> +                       drv->states[drv->state_count-1].target_residency);
> +
> +       /*
>          * Don't stop the tick if the selected state is a polling one or if the
> -        * expected idle duration is shorter than the tick period length.
> +        * expected idle duration is shorter than the estimated stop tick point.
>          */
>         if ((drv->states[idx].flags & CPUIDLE_FLAG_POLLING) ||
> -           expected_interval < TICK_USEC) {
> +           expected_interval < stop_tick_point) {

And that will cause the tick to be stopped unnecessarily in certain
situations, so why is this better?

>                 unsigned int delta_next_us = ktime_to_us(delta_next);
>
>                 *stop_tick = false;
> --


* Re: [RESEND PATCH v1 2/2] cpuidle: menu: Dismiss tick impaction on correction factors
  2018-08-09 17:20 ` [RESEND PATCH v1 2/2] cpuidle: menu: Dismiss tick impaction on correction factors Leo Yan
@ 2018-08-09 21:06   ` Rafael J. Wysocki
  0 siblings, 0 replies; 13+ messages in thread
From: Rafael J. Wysocki @ 2018-08-09 21:06 UTC (permalink / raw)
  To: Leo Yan
  Cc: Rafael J. Wysocki, Peter Zijlstra (Intel),
	Daniel Lezcano, Vincent Guittot, Linux Kernel Mailing List,
	Linux PM

On Thu, Aug 9, 2018 at 7:20 PM, Leo Yan <leo.yan@linaro.org> wrote:
> If the idle duration predictor detects a tick wakeup, and the condition
> 'data->next_timer_us > TICK_USEC' is met, it gives a big compensation
> for the 'measured' interval; this is intended to avoid artificially
> small correction factor values.  Unfortunately, this still cannot cover
> all cases of the tick's impact on the correction factors, e.g. if the
> predicted next event is less than TICK_USEC, then every wakeup by the
> tick is treated as the usual case (with the exit latency subtracted),
> and as a result the tick events heavily impact the correction factors.
>
> Moreover, the coming tick is sometimes very close; especially the first
> time the CPU becomes idle the tick expiry time can vary, so ticks can
> introduce big deviations into the correction factors.
>
> If the idle governor deliberately doesn't stop the tick timer, the tick
> event arrives as expected at a fixed interval, so the tick event is
> predictable; if the tick event arrives earlier than other normal timer
> events and other possible wakeup events, we need to dismiss the tick's
> impact on the correction factors.  This lets the correction factor
> array be used purely to correct for other wakeup events rather than for
> the sched tick.
>
> This patch checks whether the wakeup is a tick wakeup; if so, it
> assumes the CPU could have stayed in the idle state for enough time and
> gives a high compensation for the 'measured' interval, which avoids the
> tick's impact on the correction factor array.

Well, again, this is questionable.

Yes, you can remove the tick influence on correction factors this way,
but will the resulting idle duration predictions be actually better
because of that?

Do you have any data to demonstrate the difference?

>
> Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
> Cc: Vincent Guittot <vincent.guittot@linaro.org>
> Signed-off-by: Leo Yan <leo.yan@linaro.org>
> ---
>  drivers/cpuidle/governors/menu.c | 14 ++++++--------
>  1 file changed, 6 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c
> index 2ce4068..43cbde3 100644
> --- a/drivers/cpuidle/governors/menu.c
> +++ b/drivers/cpuidle/governors/menu.c
> @@ -525,15 +525,13 @@ static void menu_update(struct cpuidle_driver *drv, struct cpuidle_device *dev)
>          * assume the state was never reached and the exit latency is 0.
>          */
>
> -       if (data->tick_wakeup && data->next_timer_us > TICK_USEC) {
> +       if (data->tick_wakeup) {
>                 /*
> -                * The nohz code said that there wouldn't be any events within
> -                * the tick boundary (if the tick was stopped), but the idle
> -                * duration predictor had a differing opinion.  Since the CPU
> -                * was woken up by a tick (that wasn't stopped after all), the
> -                * predictor was not quite right, so assume that the CPU could
> -                * have been idle long (but not forever) to help the idle
> -                * duration predictor do a better job next time.
> +                * Since the CPU was woken up by a tick (that wasn't stopped
> +                * after all), the predictor was not quite right, so assume

This part of the comment is not valid any more IMO.

The fact that the CPU was woken up by the tick alone doesn't tell you
much about the prediction.  The tick may have not been stopped,
because the nohz code saw timer events within the tick boundary, in
which case the CPU could not be idle very long.  The next_timer_us
check is there to see what the nohz code told us.

> +                * that the CPU could have been idle long (but not forever)
> +                * to help the idle duration predictor do a better job next
> +                * time.
>                  */
>                 measured_us = 9 * MAX_INTERESTING / 10;
>         } else {
> --


* Re: [RESEND PATCH v1 1/2] cpuidle: menu: Correct the criteria for stopping tick
  2018-08-09 20:47   ` Rafael J. Wysocki
@ 2018-08-10  7:13     ` leo.yan
  2018-08-10  7:22       ` Rafael J. Wysocki
  0 siblings, 1 reply; 13+ messages in thread
From: leo.yan @ 2018-08-10  7:13 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Rafael J. Wysocki, Peter Zijlstra (Intel),
	Daniel Lezcano, Vincent Guittot, Linux Kernel Mailing List,
	Linux PM

On Thu, Aug 09, 2018 at 10:47:17PM +0200, Rafael J. Wysocki wrote:
> On Thu, Aug 9, 2018 at 7:20 PM, Leo Yan <leo.yan@linaro.org> wrote:
> > The criterion for keeping the tick running is that the predicted idle
> > duration is less than TICK_USEC;
> 
> Yes, because if the predicted idle duration is less than the tick
> period, stopping the tick is pointless overhead (if the governor
> predicts a CPU wakeup within the tick period length range, it may as
> well let the tick run, because the CPU will be woken up relatively
> shortly anyway).
> 
> > the mainline kernel configures HZ=250 so TICK_USEC equals
> 
> To be precise, other values of HZ may be used too, depending on how
> the kernel is configured.
> 
> > 4000us, so any prediction less than 4000us will not stop the tick
> > and the idle state will be fixed up to a shallower state.  On the
> > other hand, take 96boards Hikey (octa Cortex-A53 CPUs) as an example:
> > the platform's deepest C-state is the cluster off state, whose
> > 'target_residency' is 2700us.  If the 'menu' governor predicts the
> > next idle duration to be any value in the range [2700us, 4000us),
> > then the 'menu' governor will keep the sched tick running and roll
> > back to the shallower CPU off state rather than cluster off.
> 
> Which is as expected.
> 
> > As a result the CPU has much less chance to enter the deepest state
> > when a task repeatedly runs on it with a 5000us period and 40% duty
> > cycle (so the task runs for 2000us and then sleeps for 3000us in
> > every period).  In theory, we should permit the CPU to stay in the
> > cluster off state, since the CPU's sleeping time of 3000us exceeds
> > its 'target_residency' of 2700us.
> 
> For every particular choice of the criteria you can find a particular
> case in which it will be suboptimal.

But this patch wants to resolve a common case rather than a particular
case on an Arm board; this issue can happen across different platforms.
I will demonstrate this in the comments below.

[...]

> > diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c
> > index 30ab759..2ce4068 100644
> > --- a/drivers/cpuidle/governors/menu.c
> > +++ b/drivers/cpuidle/governors/menu.c
> > @@ -294,6 +294,7 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev,
> >         unsigned int expected_interval;
> >         unsigned long nr_iowaiters, cpu_load;
> >         ktime_t delta_next;
> > +       unsigned int stop_tick_point;
> >
> >         if (data->needs_update) {
> >                 menu_update(drv, dev);
> > @@ -406,11 +407,47 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev,
> >                 idx = 0; /* No states enabled. Must use 0. */
> >
> >         /*
> > +        * Decide the time point for stopping the tick: if the prediction is
> > +        * before this point it is better to keep the tick enabled, and if it
> > +        * is after this point the CPU can stay in the idle state long enough
> > +        * that the tick should be stopped.  This point needs to consider two
> > +        * factors: one is the tick period and the other is the maximum
> > +        * target residency.
> > +        *
> > +        * We can divide into the cases below:
> > +        *
> > +        * The first case is that the prediction is shorter than the maximum
> > +        * target residency and also shorter than the tick period; this means
> > +        * the prediction doesn't select the deepest idle state and the CPU
> > +        * is supposed to be woken up within the tick period, so in this case
> > +        * we should keep the tick enabled;
> > +        *
> > +        * The second case is that the prediction is shorter than the maximum
> > +        * target residency but longer than the tick period; in this case the
> > +        * idle state selection is already based on the prediction of a
> > +        * shallower state and we expect some event to arrive later than the
> > +        * tick to wake up the CPU; in other words, the CPU is likely to stay
> > +        * in the expected idle state for a long while (longer than the tick
> > +        * period), so it is reasonable to stop the tick.
> > +        *
> > +        * The third case is that the prediction is longer than the maximum
> > +        * target residency, no matter whether it is longer or shorter than
> > +        * the tick period; here we have selected the deepest idle state, so
> > +        * it is pointless to let the tick wake up the CPU from that state.
> > +        *
> > +        * To summarize the cases above, we use min(TICK_USEC,
> > +        * maximum_target_residency) as the critical point to decide whether
> > +        * to stop the tick.
> > +        */
> > +       stop_tick_point = min_t(unsigned int, TICK_USEC,
> > +                       drv->states[drv->state_count-1].target_residency);
> > +
> > +       /*
> >          * Don't stop the tick if the selected state is a polling one or if the
> > -        * expected idle duration is shorter than the tick period length.
> > +        * expected idle duration is shorter than the estimated stop tick point.
> >          */
> >         if ((drv->states[idx].flags & CPUIDLE_FLAG_POLLING) ||
> > -           expected_interval < TICK_USEC) {
> > +           expected_interval < stop_tick_point) {
> 
> And that will cause the tick to be stopped unnecessarily in certain
> situations, so why is this better?

Let's look at the two cases below: in the first case we configure
TICK_USEC=1000 (1ms) and in the second case we configure TICK_USEC=4000
(4ms).

Let's assume we do the testing on the same platform and have two runs.
In Case 1 we configure HZ=1000 so TICK_USEC=1ms, the expected_interval
is 3ms and the deepest idle state's target residency is 2ms; the idle
governor will choose the deepest state and will skip calibrating to a
shallower state because 'expected_interval' > TICK_USEC.

In Case 2 we configure HZ=250 so TICK_USEC=4ms; the expected_interval
(3ms) and the deepest idle state's target residency (2ms) are the same
as in Case 1, but because expected_interval < TICK_USEC the idle
governor will calibrate to select a shallower state.  If we imagine a
platform where the deepest idle state's target residency is a smaller
value, the gap to TICK_USEC is bigger and the deepest idle state is even
harder to select, since 'expected_interval' can easily fall into the
range [deepest target residency..TICK_USEC).

This patch makes no change for Case 1; it wants to optimize Case 2 so
that Case 2 has a chance to stay in the deepest idle state.  I
understand that from the performance perspective we need to avoid
stopping the tick for shallow states; on the other hand we should not
prevent the CPU from entering the deepest idle state just because we
want to keep the tick running, especially when the expected interval is
longer than the deepest state's target residency.

Case 1:
      Deepest idle state's target residency=2ms
                     |
                     V
|--------------------------------------------------------> time (ms)
      ^                              ^
      |                              |
TICK_USEC=1ms           expected_interval=3ms


Case 2:
      Deepest idle state's target residency = 2ms
                     |
                     V
|--------------------------------------------------------> time (ms)
                                     ^                  ^
                                     |                  |
                          expected_interval = 3ms   TICK_USEC = 4ms



> >                 unsigned int delta_next_us = ktime_to_us(delta_next);
> >
> >                 *stop_tick = false;
> > --


* Re: [RESEND PATCH v1 1/2] cpuidle: menu: Correct the criteria for stopping tick
  2018-08-10  7:13     ` leo.yan
@ 2018-08-10  7:22       ` Rafael J. Wysocki
  2018-08-10  8:49         ` leo.yan
  0 siblings, 1 reply; 13+ messages in thread
From: Rafael J. Wysocki @ 2018-08-10  7:22 UTC (permalink / raw)
  To: Leo Yan
  Cc: Rafael J. Wysocki, Rafael J. Wysocki, Peter Zijlstra (Intel),
	Daniel Lezcano, Vincent Guittot, Linux Kernel Mailing List,
	Linux PM

On Fri, Aug 10, 2018 at 9:13 AM,  <leo.yan@linaro.org> wrote:
> On Thu, Aug 09, 2018 at 10:47:17PM +0200, Rafael J. Wysocki wrote:
>> On Thu, Aug 9, 2018 at 7:20 PM, Leo Yan <leo.yan@linaro.org> wrote:

[cut]

>> And that will cause the tick to be stopped unnecessarily in certain
>> situations, so why is this better?
>
> Let's look at the two cases below: in the first case we configure
> TICK_USEC=1000 (1ms) and in the second case we configure TICK_USEC=4000
> (4ms).
>
> Let's assume we do the testing on the same platform and have two runs.
> In Case 1 we configure HZ=1000 so TICK_USEC=1ms, the expected_interval
> is 3ms and the deepest idle state's target residency is 2ms; the idle
> governor will choose the deepest state and will skip calibrating to a
> shallower state because 'expected_interval' > TICK_USEC.
>
> In Case 2 we configure HZ=250 so TICK_USEC=4ms; the expected_interval
> (3ms) and the deepest idle state's target residency (2ms) are the same
> as in Case 1, but because expected_interval < TICK_USEC the idle
> governor will calibrate to select a shallower state.  If we imagine a
> platform where the deepest idle state's target residency is a smaller
> value, the gap to TICK_USEC is bigger and the deepest idle state is
> even harder to select, since 'expected_interval' can easily fall into
> the range [deepest target residency..TICK_USEC).
>
> This patch makes no change for Case 1; it wants to optimize Case 2 so
> that Case 2 has a chance to stay in the deepest idle state.  I
> understand that from the performance perspective we need to avoid
> stopping the tick for shallow states; on the other hand we should not
> prevent the CPU from entering the deepest idle state just because we
> want to keep the tick running, especially when the expected interval is
> longer than the deepest state's target residency.
>
> Case 1:
>       Deepest idle state's target residency=2ms
>                      |
>                      V
> |--------------------------------------------------------> time (ms)
>       ^                              ^
>       |                              |
> TICK_USEC=1ms           expected_interval=3ms
>
>
> Case 2:
>       Deepest idle state's target residency = 2ms
>                      |
>                      V
> |--------------------------------------------------------> time (ms)
>                                      ^                  ^
>                                      |                  |
>                           expected_interval = 3ms   TICK_USEC = 4ms
>
>
>
>> >                 unsigned int delta_next_us = ktime_to_us(delta_next);
>> >
>> >                 *stop_tick = false;
>> > --

Well, I don't quite agree with the approach here, then.

As I said in the previous reply, IMO restarting the stopped tick
before leaving the loop in do_idle() is pointless overhead.  It is not
necessary to do that to avoid leaving CPUs in shallow idle states for
too long (I'll send an alternative patch to fix this issue shortly).

While you may think that pointless overhead is not a problem, I don't
quite agree with that.


* Re: [RESEND PATCH v1 1/2] cpuidle: menu: Correct the criteria for stopping tick
  2018-08-10  7:22       ` Rafael J. Wysocki
@ 2018-08-10  8:49         ` leo.yan
  2018-08-10  9:03           ` leo.yan
  0 siblings, 1 reply; 13+ messages in thread
From: leo.yan @ 2018-08-10  8:49 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Rafael J. Wysocki, Peter Zijlstra (Intel),
	Daniel Lezcano, Vincent Guittot, Linux Kernel Mailing List,
	Linux PM

On Fri, Aug 10, 2018 at 09:22:10AM +0200, Rafael J. Wysocki wrote:
> On Fri, Aug 10, 2018 at 9:13 AM,  <leo.yan@linaro.org> wrote:
> > On Thu, Aug 09, 2018 at 10:47:17PM +0200, Rafael J. Wysocki wrote:
> >> On Thu, Aug 9, 2018 at 7:20 PM, Leo Yan <leo.yan@linaro.org> wrote:
> 
> [cut]
> 
> >> And that will cause the tick to be stopped unnecessarily in certain
> >> situations, so why is this better?
> >
> > [cut]
> 
> Well, I don't quite agree with the approach here, then.
> 
> As I said in the previous reply, IMO restarting the stopped tick
> before leaving the loop in do_idle() is pointless overhead.  It is not
> necessary to do that to avoid leaving CPUs in shallow idle states for
> too long (I'll send an alternative patch to fix this issue shortly).
> 
> While you may think that pointless overhead is not a problem, I don't
> quite agree with that.

I disagree that this patch will introduce any extra overhead.

Firstly, the idle loop doesn't support restarting the tick even if this
patch tells it to; secondly, this patch mainly resolves the issue of the
CPU failing to stay in the deepest state in Case 2.  As a side effect it
can also tell the idle loop to restart the tick for Case 3 below; IMHO
it actually makes sense to tell the idle loop to enable the tick, and
the idle loop can ignore this information.

Furthermore, there is another thread with a patch to keep the tick
stopped after it has already been stopped in the idle loop.

So this patch is still valid.

Case 3:
      Deepest idle state's target residency = 2ms
                     |
                     V
|--------------------------------------------------------> time (ms)
      ^        `
      |         \
TICK_USEC=1ms   expected_interval = 1.5ms




* Re: [RESEND PATCH v1 1/2] cpuidle: menu: Correct the criteria for stopping tick
  2018-08-10  8:49         ` leo.yan
@ 2018-08-10  9:03           ` leo.yan
  2018-08-12 11:12             ` Rafael J. Wysocki
  0 siblings, 1 reply; 13+ messages in thread
From: leo.yan @ 2018-08-10  9:03 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Rafael J. Wysocki, Peter Zijlstra (Intel),
	Daniel Lezcano, Vincent Guittot, Linux Kernel Mailing List,
	Linux PM

On Fri, Aug 10, 2018 at 04:49:06PM +0800, Leo Yan wrote:
> On Fri, Aug 10, 2018 at 09:22:10AM +0200, Rafael J. Wysocki wrote:
> > On Fri, Aug 10, 2018 at 9:13 AM,  <leo.yan@linaro.org> wrote:
> > > On Thu, Aug 09, 2018 at 10:47:17PM +0200, Rafael J. Wysocki wrote:
> > >> On Thu, Aug 9, 2018 at 7:20 PM, Leo Yan <leo.yan@linaro.org> wrote:
> > 
> > [cut]
> > 
> > >> And that will cause the tick to be stopped unnecessarily in certain
> > >> situations, so why is this better?
> > >
> > > [cut]
> > 
> > Well, I don't quite agree with the approach here, then.
> > 
> > As I said in the previous reply, IMO restarting the stopped tick
> > before leaving the loop in do_idle() is pointless overhead.  It is not
> > necessary to do that to avoid leaving CPUs in shallow idle states for
> > too long (I'll send an alternative patch to fix this issue shortly).
> > 
> > While you may think that pointless overhead is not a problem, I don't
> > quite agree with that.
> 
> I disagree that this patch will introduce any extra overhead.
>
> Firstly, the idle loop doesn't support restarting the tick even if this
> patch tells it to; secondly, this patch mainly resolves the issue of the
> CPU failing to stay in the deepest state in Case 2.  As a side effect it
> can also tell the idle loop to restart the tick for Case 3 below; IMHO
> it actually makes sense to tell the idle loop to enable the tick, and
> the idle loop can ignore this information.
>
> Furthermore, there is another thread with a patch to keep the tick
> stopped after it has already been stopped in the idle loop.
>
> So this patch is still valid.

To correct Case 3 below: actually this case will disappear if we force
expected_interval=ktime_to_us(delta_next) as in another proposed patch.
If so, this patch will have no chance of introducing extra ticks.

 expected_interval                             Deepest idle state's
 = min(TICK_USEC,ktime_to_us(delta_next))      target residency = 2ms
 = TICK_USEC = 1ms                                    |
      |                                               |
      V                                               V
|--------------------------------------------------------> time (ms)
      ^
      |
TICK_USEC=1ms


* Re: [RESEND PATCH v1 1/2] cpuidle: menu: Correct the criteria for stopping tick
  2018-08-10  9:03           ` leo.yan
@ 2018-08-12 11:12             ` Rafael J. Wysocki
  2018-08-12 16:07               ` leo.yan
  0 siblings, 1 reply; 13+ messages in thread
From: Rafael J. Wysocki @ 2018-08-12 11:12 UTC (permalink / raw)
  To: Leo Yan
  Cc: Rafael J. Wysocki, Rafael Wysocki, Peter Zijlstra,
	Daniel Lezcano, Vincent Guittot, Linux Kernel Mailing List,
	Linux PM

On Fri, Aug 10, 2018 at 11:03 AM <leo.yan@linaro.org> wrote:
>
> On Fri, Aug 10, 2018 at 04:49:06PM +0800, Leo Yan wrote:
> > On Fri, Aug 10, 2018 at 09:22:10AM +0200, Rafael J. Wysocki wrote:
> > > On Fri, Aug 10, 2018 at 9:13 AM,  <leo.yan@linaro.org> wrote:
> > > > On Thu, Aug 09, 2018 at 10:47:17PM +0200, Rafael J. Wysocki wrote:
> > > >> On Thu, Aug 9, 2018 at 7:20 PM, Leo Yan <leo.yan@linaro.org> wrote:
> > >
> > > [cut]
> > >
> > > >> And that will cause the tick to be stopped unnecessarily in certain
> > > >> situations, so why is this better?
> > > >
> > > > [cut]
> > >
> > > Well, I don't quite agree with the approach here, then.
> > >
> > > As I said in the previous reply, IMO restarting the stopped tick
> > > before leaving the loop in do_idle() is pointless overhead.  It is not
> > > necessary to do that to avoid leaving CPUs in shallow idle states for
> > > too long (I'll send an alternative patch to fix this issue shortly).
> > >
> > > While you may think that pointless overhead is not a problem, I don't
> > > quite agree with that.
> >
> > I disagree that this patch will introduce any extra overhead.
> >
> > Firstly, the idle loop doesn't support restarting the tick even if
> > this patch tells it to;

I'm not talking about restarting the tick, but about stopping it more
often on average.

> > secondly, this patch mainly resolves the
> > issue of the CPU failing to stay in the deepest state in Case 2.

I understand what you are trying to achieve here, but I don't agree with it.

The condition modified by this patch is not about how much time the
CPU can potentially be idle, but about when it is expected to wake up.
The "expected" part is really key here.

The governor has gone through the effort of making an idle duration
prediction and now it has a certain expectation regarding when the
CPU will wake up.  If the governor's prediction is any good at all and
this expectation is in the tick range, the CPU will be woken up by
something close enough to the tick in the majority of cases, so there
is no need to stop the tick.  Not because the CPU cannot be idle
longer, but because it is expected to wake up early enough anyway (and
yes, you can argue that 2 times the tick range may still be "early
enough" and so on, but then I'd like to see numbers in support of
that).

Now, if the governor is junk and its predictions are useless, the
above will not be the case any more, but then I'm not sure what the
benefit from using that governor at all is. :-)

> > As a side effect it can also tell the idle loop to restart the tick
> > for Case 3 below; IMHO it actually makes sense to tell the idle loop
> > to enable the tick, and the idle loop can ignore this information.
> >
> > Furthermore, there is another thread with a patch to keep the tick
> > stopped after it has already been stopped in the idle loop.
> >
> > So this patch is still valid.
>
> To correct Case 3 below: actually this case will disappear if we force
> expected_interval=ktime_to_us(delta_next) as in another proposed patch.
> If so, this patch will have no chance of introducing extra ticks.

Yes, it will or at least it may.

Assuming shot noise wakeups, if
drv->states[drv->state_count-1].target_residency is less than
TICK_USEC, the tick will be stopped for CPUs more often on average
with the patch applied (simply because the idle duration range for
which it will not be stopped is narrower then).


* Re: [RESEND PATCH v1 1/2] cpuidle: menu: Correct the criteria for stopping tick
  2018-08-12 11:12             ` Rafael J. Wysocki
@ 2018-08-12 16:07               ` leo.yan
  2018-08-13  8:01                 ` Rafael J. Wysocki
  0 siblings, 1 reply; 13+ messages in thread
From: leo.yan @ 2018-08-12 16:07 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Rafael Wysocki, Peter Zijlstra, Daniel Lezcano, Vincent Guittot,
	Linux Kernel Mailing List, Linux PM

On Sun, Aug 12, 2018 at 01:12:41PM +0200, Rafael J. Wysocki wrote:
> On Fri, Aug 10, 2018 at 11:03 AM <leo.yan@linaro.org> wrote:
> >
> > On Fri, Aug 10, 2018 at 04:49:06PM +0800, Leo Yan wrote:
> > > On Fri, Aug 10, 2018 at 09:22:10AM +0200, Rafael J. Wysocki wrote:
> > > > On Fri, Aug 10, 2018 at 9:13 AM,  <leo.yan@linaro.org> wrote:
> > > > > On Thu, Aug 09, 2018 at 10:47:17PM +0200, Rafael J. Wysocki wrote:
> > > > >> On Thu, Aug 9, 2018 at 7:20 PM, Leo Yan <leo.yan@linaro.org> wrote:
> > > >
> > > > [cut]
> > > >
> > > > >> And that will cause the tick to be stopped unnecessarily in certain
> > > > >> situations, so why is this better?
> > > > >
> > > > > [cut]
> > > >
> > > > Well, I don't quite agree with the approach here, then.
> > > >
> > > > As I said in the previous reply, IMO restarting the stopped tick
> > > > before leaving the loop in do_idle() is pointless overhead.  It is not
> > > > necessary to do that to avoid leaving CPUs in shallow idle states for
> > > > too long (I'll send an alternative patch to fix this issue shortly).
> > > >
> > > > While you may think that pointless overhead is not a problem, I don't
> > > > quite agree with that.
> > >
> > > I disagree that this patch introduces any extra overhead.
> > >
> > > Firstly, the idle loop doesn't support restarting the tick even if
> > > this patch tells it to;
> 
> I'm not talking about restarting the tick, but about stopping it more
> often on average.

Ah, yes, I agree.

> > > secondly, this patch mainly resolves the issue of the CPU being
> > > unable to stay in the deepest state in Case 2,
> 
> I understand what you are trying to achieve here, but I don't agree with it.

I agree we need to find a more general method to fix this.

> The condition modified by this patch is not about how much time the
> CPU can potentially be idle, but about when it is expected to wake up.
> The "expected" part is really key here.
> 
> The governor has gone through the effort of making an idle duration
> prediction and now it has a certain expectation regarding when the
> CPU will wake up.  If the governor's prediction is any good at all and
> this expectation is in the tick range, the CPU will be woken up by
> something close enough to the tick in the majority of cases, so there
> is no need to stop the tick.  Not because the CPU cannot be idle
> longer, but because it is expected to wake up early enough anyway (and
> yes, you can argue that 2 times the tick range may still be "early
> enough" and so on, but then I'd like to see numbers in support of
> that).

Thanks for the explanation; I also think this is a good methodology,
but I want to improve on it a bit.  For example, the governor always
compares the 'expectation' against TICK_USEC, a predefined interval
used as a boundary, but in reality the next tick can arrive anywhere
in the range [0..TICK_USEC]; with the current method we cannot make
the decision based on the tick's actual delta.

I'd like to frame this issue as 'how to improve the decision for
stopping the tick?'.  If we can make a better decision, it's possible
to resolve Case 2 without stopping the tick more often; e.g. the CPU
can even enter the deepest idle state with the tick still running, if
the prediction is shorter than the time to the next tick.

I will send out a new patch set based on these ideas for reviewing.

> Now, if the governor is junk and its predictions are useless, the
> above will not be the case any more, but then I'm not sure what the
> benefit from using that governor at all is. :-)

I really don't think the governor and predictions are useless :)

Just a reminder of the side topic: after keeping the tick running in
the idle loop, the tick can also affect the predictions (e.g. it has
some impact on the correction factors, though modeling this needs more
time).

> > > as a side effect it can also tell the idle loop to restart the tick
> > > for Case 3 below; IMHO it actually makes sense to tell the idle loop
> > > to enable the tick, and the idle loop is free to ignore this info.
> > >
> > > Furthermore, there is another thread for the patch to keep the tick
> > > stopped once it has been stopped in the idle loop.
> > >
> > > So this patch is still valid.
> >
> > Correct for Case 3 below; actually this case will disappear if we
> > force expected_interval=ktime_to_us(delta_next) in another proposed
> > patch.  If so, this patch will have no chance to introduce extra
> > ticks.
> 
> Yes, it will or at least it may.
> 
> Assuming shot noise wakeups, if
> drv->states[drv->state_count-1].target_residency is less than
> TICK_USEC, the tick will be stopped for CPUs more often on average
> with the patch applied (simply because the idle duration range for
> which it will not be stopped is narrower then).

Yes, good point.  In the new approach I try to change the code to
compare against the next tick delta rather than TICK_USEC; this keeps
the tick running when the tick expiry is far away (similar to
comparing against TICK_USEC), but we can also stop the tick when it
is likely to break the idle residency.
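
To make this concrete, below is a minimal userspace model of the
decision I have in mind; it is only a sketch, not the actual menu.c
code, and the names predicted_us and delta_next_us are illustrative:

#include <stdbool.h>
#include <stdio.h>

/*
 * Model of the proposed criterion: compare the prediction against the
 * real time to the next tick instead of the fixed TICK_USEC boundary.
 */
static bool stop_tick(unsigned int predicted_us, unsigned int delta_next_us)
{
	/*
	 * The CPU is expected to wake before the next tick fires, so
	 * the tick cannot truncate the idle period: keep it running.
	 */
	if (predicted_us < delta_next_us)
		return false;

	/* The tick would fire inside the predicted idle period. */
	return true;
}

int main(void)
{
	/* Case 2 numbers: prediction 3ms, next tick 4ms away. */
	printf("stop=%d\n", stop_tick(3000, 4000));	/* stop=0 */
	/* Same prediction, but the tick fires in 1ms. */
	printf("stop=%d\n", stop_tick(3000, 1000));	/* stop=1 */
	return 0;
}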

Thanks,
Leo Yan

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RESEND PATCH v1 1/2] cpuidle: menu: Correct the criteria for stopping tick
  2018-08-12 16:07               ` leo.yan
@ 2018-08-13  8:01                 ` Rafael J. Wysocki
  2018-08-13  9:58                   ` leo.yan
  0 siblings, 1 reply; 13+ messages in thread
From: Rafael J. Wysocki @ 2018-08-13  8:01 UTC (permalink / raw)
  To: Leo Yan
  Cc: Rafael J. Wysocki, Rafael Wysocki, Peter Zijlstra,
	Daniel Lezcano, Vincent Guittot, Linux Kernel Mailing List,
	Linux PM

On Sun, Aug 12, 2018 at 6:07 PM <leo.yan@linaro.org> wrote:
>
> On Sun, Aug 12, 2018 at 01:12:41PM +0200, Rafael J. Wysocki wrote:
> > On Fri, Aug 10, 2018 at 11:03 AM <leo.yan@linaro.org> wrote:

[cut]

> >
> > Assuming shot noise wakeups, if
> > drv->states[drv->state_count-1].target_residency is less than
> > TICK_USEC, the tick will be stopped for CPUs more often on average
> > with the patch applied (simply because the idle duration range for
> > which it will not be stopped is narrower then).
>
> Yes, good point.  In the new approach I try to change the code to
> compare against the next tick delta rather than TICK_USEC; this keeps
> the tick running when the tick expiry is far away (similar to
> comparing against TICK_USEC), but we can also stop the tick when it
> is likely to break the idle residency.

We tried something similar and the results were not encouraging.
Please see https://lore.kernel.org/lkml/079e16b6-2e04-2518-0553-66becab226a6@tu-dresden.de/

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RESEND PATCH v1 1/2] cpuidle: menu: Correct the criteria for stopping tick
  2018-08-13  8:01                 ` Rafael J. Wysocki
@ 2018-08-13  9:58                   ` leo.yan
  0 siblings, 0 replies; 13+ messages in thread
From: leo.yan @ 2018-08-13  9:58 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Rafael Wysocki, Peter Zijlstra, Daniel Lezcano, Vincent Guittot,
	Linux Kernel Mailing List, Linux PM

On Mon, Aug 13, 2018 at 10:01:20AM +0200, Rafael J. Wysocki wrote:
> On Sun, Aug 12, 2018 at 6:07 PM <leo.yan@linaro.org> wrote:
> >
> > On Sun, Aug 12, 2018 at 01:12:41PM +0200, Rafael J. Wysocki wrote:
> > > On Fri, Aug 10, 2018 at 11:03 AM <leo.yan@linaro.org> wrote:
> 
> [cut]
> 
> > >
> > > Assuming shot noise wakeups, if
> > > drv->states[drv->state_count-1].target_residency is less than
> > > TICK_USEC, the tick will be stopped for CPUs more often on average
> > > with the patch applied (simply because the idle duration range for
> > > which it will not be stopped is narrower then).
> >
> > Yes, good point.  In the new approach I try to change the code to
> > compare against the next tick delta rather than TICK_USEC; this keeps
> > the tick running when the tick expiry is far away (similar to
> > comparing against TICK_USEC), but we can also stop the tick when it
> > is likely to break the idle residency.
> 
> We tried something similar and the results were not encouraging.
> Please see https://lore.kernel.org/lkml/079e16b6-2e04-2518-0553-66becab226a6@tu-dresden.de/

I reviewed the result.  The shared page says that using the next tick
delta resulted in increased power; I think this can be explained.

If we compare the prediction against the next tick delta rather than
TICK_USEC, Thomas's stated expectation is 'This works as I expect in
the sense of stopping the tick more often and allowing deeper sleep
states in some cases.'; but we also have to expect the change to give
powernightmares more chance to occur: if the expected_interval falls
into the range [delta_next_us..TICK_USEC), the idle governor will stop
the tick after comparing against the next tick delta, yet at the same
time it is very likely to select a shallow state because
expected_interval is a small value.  So although the change gives more
chance of staying in a deeper state, it also gives more chance of
staying in a shallow state for a long time.
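
For illustration, that window can be written down as below; this is a
sketch with made-up names, and the TICK_USEC value assumes HZ=250 as
in Case 2:

#include <stdbool.h>

#define TICK_USEC 4000U	/* assumption: HZ=250 */

/*
 * True exactly when the new criterion (compare against delta_next_us)
 * stops the tick while the old criterion (compare against TICK_USEC)
 * would have kept it running: a mispredicted shallow state in this
 * window has no tick left to wake the CPU and re-evaluate, i.e. a
 * potential powernightmare.
 */
static bool extra_stop_window(unsigned int expected_interval_us,
			      unsigned int delta_next_us)
{
	return expected_interval_us >= delta_next_us &&
	       expected_interval_us < TICK_USEC;
}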

In the testing report I don't find any statistics for idle state
duration.  The email says 'No more sched tick, no more C1E requests,
but increased power.', so the statistics cover only idle state
requests (entry/exit counts), not durations.  So it isn't clear to me
how the idle durations differ.

Because the change gives powernightmares more chance to occur, we need
an extra method to avoid them.  This is why I added an extra patch [1]:
it checks for a shallow state combined with a long-expiry timer, in
which case we should not stop the tick.
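
A rough sketch of that check is below; the names are approximate and
the threshold for the 'long-expiry timer' is my assumption here, so
please see [1] for the real condition:

#include <stdbool.h>

/*
 * Guard against powernightmares: a shallow state combined with a
 * far-away next timer event means the CPU could linger in the shallow
 * state for a long time, so keep the tick running to force an early
 * re-evaluation.
 */
static bool keep_tick_for_shallow_state(bool state_is_shallow,
					unsigned int delta_next_us,
					unsigned int deepest_residency_us)
{
	return state_is_shallow && delta_next_us > deepest_residency_us;
}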

Actually the powernightmares issue is not completely resolved, so it
still impacts the power data; once we really get rid of the impact of
powernightmares, I think we may have a chance to see the benefits of
comparing against the next tick delta.

Based on these ideas, I worked out the patch set 'Improvement stopping
tick decision making in 'menu' idle governor' [2]; the testing results
support the idle duration improvement, as shared in the cover letter.

Please help review and let me know if it's doable or not.  Thanks for
the suggestion.

Thanks,
Leo Yan

[1] https://lkml.org/lkml/2018/8/12/84
[2] https://lkml.org/lkml/2018/8/12/82

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2018-08-13  9:58 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-08-09 17:20 [RESEND PATCH v1 0/2] Optimization CPU idle state impacted by tick Leo Yan
2018-08-09 17:20 ` [RESEND PATCH v1 1/2] cpuidle: menu: Correct the criteria for stopping tick Leo Yan
2018-08-09 20:47   ` Rafael J. Wysocki
2018-08-10  7:13     ` leo.yan
2018-08-10  7:22       ` Rafael J. Wysocki
2018-08-10  8:49         ` leo.yan
2018-08-10  9:03           ` leo.yan
2018-08-12 11:12             ` Rafael J. Wysocki
2018-08-12 16:07               ` leo.yan
2018-08-13  8:01                 ` Rafael J. Wysocki
2018-08-13  9:58                   ` leo.yan
2018-08-09 17:20 ` [RESEND PATCH v1 2/2] cpuidle: menu: Dismiss tick impaction on correction factors Leo Yan
2018-08-09 21:06   ` Rafael J. Wysocki
