linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] cpuidle: menu: Handle stopped tick more aggressively
@ 2018-08-10  7:34 Rafael J. Wysocki
  2018-08-10  7:57 ` [PATCH v2] " Rafael J. Wysocki
                   ` (2 more replies)
  0 siblings, 3 replies; 20+ messages in thread
From: Rafael J. Wysocki @ 2018-08-10  7:34 UTC (permalink / raw)
  To: Linux PM; +Cc: Peter Zijlstra, LKML, Leo Yan, Frederic Weisbecker

From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

Commit 87c9fe6ee495 (cpuidle: menu: Avoid selecting shallow states
with stopped tick) missed the case when the target residencies of
deep idle states of CPUs are above the tick boundary which may cause
the CPU to get stuck in a shallow idle state for a long time.

Say there are two CPU idle states available: one shallow, with the
target residency much below the tick boundary and one deep, with
the target residency significantly above the tick boundary.  In
that case, if the tick has been stopped already and the expected
next timer event is relatively far in the future, the governor will
assume the idle duration to be equal to TICK_USEC and it will select
the idle state for the CPU accordingly.  However, that will cause the
shallow state to be selected even though it would have been more
energy-efficient to select the deep one.

To address this issue, modify the governor to always assume idle
duration to be equal to the time till the closest timer event if
the tick is not running which will cause the selected idle states
to always match the known CPU wakeup time.

Also make it always indicate that the tick should be stopped in
that case for consistency.

Fixes: 87c9fe6ee495 (cpuidle: menu: Avoid selecting shallow states with stopped tick)
Reported-by: Leo Yan <leo.yan@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---
 drivers/cpuidle/governors/menu.c |   49 ++++++++++++++++++---------------------
 1 file changed, 23 insertions(+), 26 deletions(-)

Index: linux-pm/drivers/cpuidle/governors/menu.c
===================================================================
--- linux-pm.orig/drivers/cpuidle/governors/menu.c
+++ linux-pm/drivers/cpuidle/governors/menu.c
@@ -307,6 +307,18 @@ static int menu_select(struct cpuidle_dr
 	/* determine the expected residency time, round up */
 	data->next_timer_us = ktime_to_us(tick_nohz_get_sleep_length(&delta_next));
 
+	/*
+	 * If the tick is already stopped, the cost of possible short idle
+	 * duration misprediction is much higher, because the CPU may be stuck
+	 * in a shallow idle state for a long time as a result of it.  In that
+	 * case say we might mispredict and use the known time till the closest
+	 * timer event for the idle state selection.
+	 */
+	if (tick_nohz_tick_stopped()) {
+		data->predicted_us = ktime_to_us(delta_next);
+		goto select;
+	}
+
 	get_iowait_load(&nr_iowaiters, &cpu_load);
 	data->bucket = which_bucket(data->next_timer_us, nr_iowaiters);
 
@@ -344,29 +356,15 @@ static int menu_select(struct cpuidle_dr
 	 */
 	data->predicted_us = min(data->predicted_us, expected_interval);
 
-	if (tick_nohz_tick_stopped()) {
-		/*
-		 * If the tick is already stopped, the cost of possible short
-		 * idle duration misprediction is much higher, because the CPU
-		 * may be stuck in a shallow idle state for a long time as a
-		 * result of it.  In that case say we might mispredict and try
-		 * to force the CPU into a state for which we would have stopped
-		 * the tick, unless a timer is going to expire really soon
-		 * anyway.
-		 */
-		if (data->predicted_us < TICK_USEC)
-			data->predicted_us = min_t(unsigned int, TICK_USEC,
-						   ktime_to_us(delta_next));
-	} else {
-		/*
-		 * Use the performance multiplier and the user-configurable
-		 * latency_req to determine the maximum exit latency.
-		 */
-		interactivity_req = data->predicted_us / performance_multiplier(nr_iowaiters, cpu_load);
-		if (latency_req > interactivity_req)
-			latency_req = interactivity_req;
-	}
+	/*
+	 * Use the performance multiplier and the user-configurable latency_req
+	 * to determine the maximum exit latency.
+	 */
+	interactivity_req = data->predicted_us / performance_multiplier(nr_iowaiters, cpu_load);
+	if (latency_req > interactivity_req)
+		latency_req = interactivity_req;
 
+select:
 	expected_interval = data->predicted_us;
 	/*
 	 * Find the idle state with the lowest power while satisfying
@@ -403,14 +401,13 @@ static int menu_select(struct cpuidle_dr
 	 * Don't stop the tick if the selected state is a polling one or if the
 	 * expected idle duration is shorter than the tick period length.
 	 */
-	if ((drv->states[idx].flags & CPUIDLE_FLAG_POLLING) ||
-	    expected_interval < TICK_USEC) {
+	if (((drv->states[idx].flags & CPUIDLE_FLAG_POLLING) ||
+	    expected_interval < TICK_USEC) && !tick_nohz_tick_stopped()) {
 		unsigned int delta_next_us = ktime_to_us(delta_next);
 
 		*stop_tick = false;
 
-		if (!tick_nohz_tick_stopped() && idx > 0 &&
-		    drv->states[idx].target_residency > delta_next_us) {
+		if (idx > 0 && drv->states[idx].target_residency > delta_next_us) {
 			/*
 			 * The tick is not going to be stopped and the target
 			 * residency of the state to be returned is not within


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH v2] cpuidle: menu: Handle stopped tick more aggressively
  2018-08-10  7:34 [PATCH] cpuidle: menu: Handle stopped tick more aggressively Rafael J. Wysocki
@ 2018-08-10  7:57 ` Rafael J. Wysocki
  2018-08-10  9:20   ` leo.yan
  2018-08-10 11:15   ` [PATCH v3] " Rafael J. Wysocki
  2018-08-11 16:32 ` [PATCH] " kbuild test robot
  2018-08-12 22:13 ` Dan Carpenter
  2 siblings, 2 replies; 20+ messages in thread
From: Rafael J. Wysocki @ 2018-08-10  7:57 UTC (permalink / raw)
  To: Linux PM; +Cc: Peter Zijlstra, LKML, Leo Yan, Frederic Weisbecker

From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Subject: [PATCH] cpuidle: menu: Handle stopped tick more aggressively

Commit 87c9fe6ee495 (cpuidle: menu: Avoid selecting shallow states
with stopped tick) missed the case when the target residencies of
deep idle states of CPUs are above the tick boundary which may cause
the CPU to get stuck in a shallow idle state for a long time.

Say there are two CPU idle states available: one shallow, with the
target residency much below the tick boundary and one deep, with
the target residency significantly above the tick boundary.  In
that case, if the tick has been stopped already and the expected
next timer event is relatively far in the future, the governor will
assume the idle duration to be equal to TICK_USEC and it will select
the idle state for the CPU accordingly.  However, that will cause the
shallow state to be selected even though it would have been more
energy-efficient to select the deep one.

To address this issue, modify the governor to always assume idle
duration to be equal to the time till the closest timer event if
the tick is not running which will cause the selected idle states
to always match the known CPU wakeup time.

Also make it always indicate that the tick should be stopped in
that case for consistency.

Fixes: 87c9fe6ee495 (cpuidle: menu: Avoid selecting shallow states with stopped tick)
Reported-by: Leo Yan <leo.yan@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---

-> v2: Initialize first_idx properly in the stopped tick case.

---
 drivers/cpuidle/governors/menu.c |   55 +++++++++++++++++----------------------
 1 file changed, 25 insertions(+), 30 deletions(-)

Index: linux-pm/drivers/cpuidle/governors/menu.c
===================================================================
--- linux-pm.orig/drivers/cpuidle/governors/menu.c
+++ linux-pm/drivers/cpuidle/governors/menu.c
@@ -285,9 +285,8 @@ static int menu_select(struct cpuidle_dr
 {
 	struct menu_device *data = this_cpu_ptr(&menu_devices);
 	int latency_req = cpuidle_governor_latency_req(dev->cpu);
-	int i;
-	int first_idx;
-	int idx;
+	int first_idx = 0;
+	int idx, i;
 	unsigned int interactivity_req;
 	unsigned int expected_interval;
 	unsigned long nr_iowaiters, cpu_load;
@@ -307,6 +306,18 @@ static int menu_select(struct cpuidle_dr
 	/* determine the expected residency time, round up */
 	data->next_timer_us = ktime_to_us(tick_nohz_get_sleep_length(&delta_next));
 
+	/*
+	 * If the tick is already stopped, the cost of possible short idle
+	 * duration misprediction is much higher, because the CPU may be stuck
+	 * in a shallow idle state for a long time as a result of it.  In that
+	 * case say we might mispredict and use the known time till the closest
+	 * timer event for the idle state selection.
+	 */
+	if (tick_nohz_tick_stopped()) {
+		data->predicted_us = ktime_to_us(delta_next);
+		goto select;
+	}
+
 	get_iowait_load(&nr_iowaiters, &cpu_load);
 	data->bucket = which_bucket(data->next_timer_us, nr_iowaiters);
 
@@ -322,7 +333,6 @@ static int menu_select(struct cpuidle_dr
 	expected_interval = get_typical_interval(data);
 	expected_interval = min(expected_interval, data->next_timer_us);
 
-	first_idx = 0;
 	if (drv->states[0].flags & CPUIDLE_FLAG_POLLING) {
 		struct cpuidle_state *s = &drv->states[1];
 		unsigned int polling_threshold;
@@ -344,29 +354,15 @@ static int menu_select(struct cpuidle_dr
 	 */
 	data->predicted_us = min(data->predicted_us, expected_interval);
 
-	if (tick_nohz_tick_stopped()) {
-		/*
-		 * If the tick is already stopped, the cost of possible short
-		 * idle duration misprediction is much higher, because the CPU
-		 * may be stuck in a shallow idle state for a long time as a
-		 * result of it.  In that case say we might mispredict and try
-		 * to force the CPU into a state for which we would have stopped
-		 * the tick, unless a timer is going to expire really soon
-		 * anyway.
-		 */
-		if (data->predicted_us < TICK_USEC)
-			data->predicted_us = min_t(unsigned int, TICK_USEC,
-						   ktime_to_us(delta_next));
-	} else {
-		/*
-		 * Use the performance multiplier and the user-configurable
-		 * latency_req to determine the maximum exit latency.
-		 */
-		interactivity_req = data->predicted_us / performance_multiplier(nr_iowaiters, cpu_load);
-		if (latency_req > interactivity_req)
-			latency_req = interactivity_req;
-	}
+	/*
+	 * Use the performance multiplier and the user-configurable latency_req
+	 * to determine the maximum exit latency.
+	 */
+	interactivity_req = data->predicted_us / performance_multiplier(nr_iowaiters, cpu_load);
+	if (latency_req > interactivity_req)
+		latency_req = interactivity_req;
 
+select:
 	expected_interval = data->predicted_us;
 	/*
 	 * Find the idle state with the lowest power while satisfying
@@ -403,14 +399,13 @@ static int menu_select(struct cpuidle_dr
 	 * Don't stop the tick if the selected state is a polling one or if the
 	 * expected idle duration is shorter than the tick period length.
 	 */
-	if ((drv->states[idx].flags & CPUIDLE_FLAG_POLLING) ||
-	    expected_interval < TICK_USEC) {
+	if (((drv->states[idx].flags & CPUIDLE_FLAG_POLLING) ||
+	    expected_interval < TICK_USEC) && !tick_nohz_tick_stopped()) {
 		unsigned int delta_next_us = ktime_to_us(delta_next);
 
 		*stop_tick = false;
 
-		if (!tick_nohz_tick_stopped() && idx > 0 &&
-		    drv->states[idx].target_residency > delta_next_us) {
+		if (idx > 0 && drv->states[idx].target_residency > delta_next_us) {
 			/*
 			 * The tick is not going to be stopped and the target
 			 * residency of the state to be returned is not within


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2] cpuidle: menu: Handle stopped tick more aggressively
  2018-08-10  7:57 ` [PATCH v2] " Rafael J. Wysocki
@ 2018-08-10  9:20   ` leo.yan
  2018-08-10 11:04     ` Rafael J. Wysocki
  2018-08-10 11:15   ` [PATCH v3] " Rafael J. Wysocki
  1 sibling, 1 reply; 20+ messages in thread
From: leo.yan @ 2018-08-10  9:20 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: Linux PM, Peter Zijlstra, LKML, Frederic Weisbecker

On Fri, Aug 10, 2018 at 09:57:18AM +0200, Rafael J . Wysocki wrote:
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> Subject: [PATCH] cpuidle: menu: Handle stopped tick more aggressively
> 
> Commit 87c9fe6ee495 (cpuidle: menu: Avoid selecting shallow states
> with stopped tick) missed the case when the target residencies of
> deep idle states of CPUs are above the tick boundary which may cause
> the CPU to get stuck in a shallow idle state for a long time.
> 
> Say there are two CPU idle states available: one shallow, with the
> target residency much below the tick boundary and one deep, with
> the target residency significantly above the tick boundary.  In
> that case, if the tick has been stopped already and the expected
> next timer event is relatively far in the future, the governor will
> assume the idle duration to be equal to TICK_USEC and it will select
> the idle state for the CPU accordingly.  However, that will cause the
> shallow state to be selected even though it would have been more
> energy-efficient to select the deep one.
> 
> To address this issue, modify the governor to always assume idle
> duration to be equal to the time till the closest timer event if
> the tick is not running which will cause the selected idle states
> to always match the known CPU wakeup time.
> 
> Also make it always indicate that the tick should be stopped in
> that case for consistency.
> 
> Fixes: 87c9fe6ee495 (cpuidle: menu: Avoid selecting shallow states with stopped tick)
> Reported-by: Leo Yan <leo.yan@linaro.org>
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> ---
> 
> -> v2: Initialize first_idx properly in the stopped tick case.
> 
> ---
>  drivers/cpuidle/governors/menu.c |   55 +++++++++++++++++----------------------
>  1 file changed, 25 insertions(+), 30 deletions(-)
> 
> Index: linux-pm/drivers/cpuidle/governors/menu.c
> ===================================================================
> --- linux-pm.orig/drivers/cpuidle/governors/menu.c
> +++ linux-pm/drivers/cpuidle/governors/menu.c
> @@ -285,9 +285,8 @@ static int menu_select(struct cpuidle_dr
>  {
>  	struct menu_device *data = this_cpu_ptr(&menu_devices);
>  	int latency_req = cpuidle_governor_latency_req(dev->cpu);
> -	int i;
> -	int first_idx;
> -	int idx;
> +	int first_idx = 0;
> +	int idx, i;
>  	unsigned int interactivity_req;
>  	unsigned int expected_interval;
>  	unsigned long nr_iowaiters, cpu_load;
> @@ -307,6 +306,18 @@ static int menu_select(struct cpuidle_dr
>  	/* determine the expected residency time, round up */
>  	data->next_timer_us = ktime_to_us(tick_nohz_get_sleep_length(&delta_next));
>  
> +	/*
> +	 * If the tick is already stopped, the cost of possible short idle
> +	 * duration misprediction is much higher, because the CPU may be stuck
> +	 * in a shallow idle state for a long time as a result of it.  In that
> +	 * case say we might mispredict and use the known time till the closest
> +	 * timer event for the idle state selection.
> +	 */
> +	if (tick_nohz_tick_stopped()) {
> +		data->predicted_us = ktime_to_us(delta_next);
> +		goto select;
> +	}
> +

This introduce two potential issues:

- This will totally ignore the typical pattern in idle loop; I
  observed on the mmc driver can trigger multiple times (> 10 times)
  with consistent interval;  but I have no strong opinion to not
  use next timer event for this case.

- Will this break correction factors when the CPU exit from idle?
  data->bucket is stale value ....

>  	get_iowait_load(&nr_iowaiters, &cpu_load);
>  	data->bucket = which_bucket(data->next_timer_us, nr_iowaiters);
>  
> @@ -322,7 +333,6 @@ static int menu_select(struct cpuidle_dr
>  	expected_interval = get_typical_interval(data);
>  	expected_interval = min(expected_interval, data->next_timer_us);
>  
> -	first_idx = 0;
>  	if (drv->states[0].flags & CPUIDLE_FLAG_POLLING) {
>  		struct cpuidle_state *s = &drv->states[1];
>  		unsigned int polling_threshold;
> @@ -344,29 +354,15 @@ static int menu_select(struct cpuidle_dr
>  	 */
>  	data->predicted_us = min(data->predicted_us, expected_interval);
>  
> -	if (tick_nohz_tick_stopped()) {
> -		/*
> -		 * If the tick is already stopped, the cost of possible short
> -		 * idle duration misprediction is much higher, because the CPU
> -		 * may be stuck in a shallow idle state for a long time as a
> -		 * result of it.  In that case say we might mispredict and try
> -		 * to force the CPU into a state for which we would have stopped
> -		 * the tick, unless a timer is going to expire really soon
> -		 * anyway.
> -		 */
> -		if (data->predicted_us < TICK_USEC)
> -			data->predicted_us = min_t(unsigned int, TICK_USEC,
> -						   ktime_to_us(delta_next));
> -	} else {
> -		/*
> -		 * Use the performance multiplier and the user-configurable
> -		 * latency_req to determine the maximum exit latency.
> -		 */
> -		interactivity_req = data->predicted_us / performance_multiplier(nr_iowaiters, cpu_load);
> -		if (latency_req > interactivity_req)
> -			latency_req = interactivity_req;
> -	}
> +	/*
> +	 * Use the performance multiplier and the user-configurable latency_req
> +	 * to determine the maximum exit latency.
> +	 */
> +	interactivity_req = data->predicted_us / performance_multiplier(nr_iowaiters, cpu_load);
> +	if (latency_req > interactivity_req)
> +		latency_req = interactivity_req;
>  
> +select:
>  	expected_interval = data->predicted_us;
>  	/*
>  	 * Find the idle state with the lowest power while satisfying
> @@ -403,14 +399,13 @@ static int menu_select(struct cpuidle_dr
>  	 * Don't stop the tick if the selected state is a polling one or if the
>  	 * expected idle duration is shorter than the tick period length.
>  	 */
> -	if ((drv->states[idx].flags & CPUIDLE_FLAG_POLLING) ||
> -	    expected_interval < TICK_USEC) {
> +	if (((drv->states[idx].flags & CPUIDLE_FLAG_POLLING) ||
> +	    expected_interval < TICK_USEC) && !tick_nohz_tick_stopped()) {

I am not sure this logic is right... Why not use below checking, so
for POLLING state we will never ask to stop the tick?

        if (drv->states[idx].flags & CPUIDLE_FLAG_POLLING ||
            (expected_interval < TICK_USEC && !tick_nohz_tick_stopped())) {

>  		unsigned int delta_next_us = ktime_to_us(delta_next);
>  
>  		*stop_tick = false;
>  
> -		if (!tick_nohz_tick_stopped() && idx > 0 &&
> -		    drv->states[idx].target_residency > delta_next_us) {
> +		if (idx > 0 && drv->states[idx].target_residency > delta_next_us) {
>  			/*
>  			 * The tick is not going to be stopped and the target
>  			 * residency of the state to be returned is not within
> 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2] cpuidle: menu: Handle stopped tick more aggressively
  2018-08-10  9:20   ` leo.yan
@ 2018-08-10 11:04     ` Rafael J. Wysocki
  2018-08-10 12:31       ` leo.yan
  0 siblings, 1 reply; 20+ messages in thread
From: Rafael J. Wysocki @ 2018-08-10 11:04 UTC (permalink / raw)
  To: Leo Yan
  Cc: Rafael J. Wysocki, Linux PM, Peter Zijlstra,
	Linux Kernel Mailing List, Frederic Weisbecker

On Fri, Aug 10, 2018 at 11:20 AM <leo.yan@linaro.org> wrote:
>
> On Fri, Aug 10, 2018 at 09:57:18AM +0200, Rafael J . Wysocki wrote:
> > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > Subject: [PATCH] cpuidle: menu: Handle stopped tick more aggressively
> >
> > Commit 87c9fe6ee495 (cpuidle: menu: Avoid selecting shallow states
> > with stopped tick) missed the case when the target residencies of
> > deep idle states of CPUs are above the tick boundary which may cause
> > the CPU to get stuck in a shallow idle state for a long time.
> >
> > Say there are two CPU idle states available: one shallow, with the
> > target residency much below the tick boundary and one deep, with
> > the target residency significantly above the tick boundary.  In
> > that case, if the tick has been stopped already and the expected
> > next timer event is relatively far in the future, the governor will
> > assume the idle duration to be equal to TICK_USEC and it will select
> > the idle state for the CPU accordingly.  However, that will cause the
> > shallow state to be selected even though it would have been more
> > energy-efficient to select the deep one.
> >
> > To address this issue, modify the governor to always assume idle
> > duration to be equal to the time till the closest timer event if
> > the tick is not running which will cause the selected idle states
> > to always match the known CPU wakeup time.
> >
> > Also make it always indicate that the tick should be stopped in
> > that case for consistency.
> >
> > Fixes: 87c9fe6ee495 (cpuidle: menu: Avoid selecting shallow states with stopped tick)
> > Reported-by: Leo Yan <leo.yan@linaro.org>
> > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > ---
> >
> > -> v2: Initialize first_idx properly in the stopped tick case.
> >
> > ---
> >  drivers/cpuidle/governors/menu.c |   55 +++++++++++++++++----------------------
> >  1 file changed, 25 insertions(+), 30 deletions(-)
> >
> > Index: linux-pm/drivers/cpuidle/governors/menu.c
> > ===================================================================
> > --- linux-pm.orig/drivers/cpuidle/governors/menu.c
> > +++ linux-pm/drivers/cpuidle/governors/menu.c
> > @@ -285,9 +285,8 @@ static int menu_select(struct cpuidle_dr
> >  {
> >       struct menu_device *data = this_cpu_ptr(&menu_devices);
> >       int latency_req = cpuidle_governor_latency_req(dev->cpu);
> > -     int i;
> > -     int first_idx;
> > -     int idx;
> > +     int first_idx = 0;
> > +     int idx, i;
> >       unsigned int interactivity_req;
> >       unsigned int expected_interval;
> >       unsigned long nr_iowaiters, cpu_load;
> > @@ -307,6 +306,18 @@ static int menu_select(struct cpuidle_dr
> >       /* determine the expected residency time, round up */
> >       data->next_timer_us = ktime_to_us(tick_nohz_get_sleep_length(&delta_next));
> >
> > +     /*
> > +      * If the tick is already stopped, the cost of possible short idle
> > +      * duration misprediction is much higher, because the CPU may be stuck
> > +      * in a shallow idle state for a long time as a result of it.  In that
> > +      * case say we might mispredict and use the known time till the closest
> > +      * timer event for the idle state selection.
> > +      */
> > +     if (tick_nohz_tick_stopped()) {
> > +             data->predicted_us = ktime_to_us(delta_next);
> > +             goto select;
> > +     }
> > +
>
> This introduce two potential issues:
>
> - This will totally ignore the typical pattern in idle loop; I
>   observed on the mmc driver can trigger multiple times (> 10 times)
>   with consistent interval;

I'm not sure what you mean by "ignore".

>  but I have no strong opinion to not  use next timer event for this case.

OK

> - Will this break correction factors when the CPU exit from idle?
>   data->bucket is stale value ....

Good point.

I'll send a v3 with this addressed.

>
> >       get_iowait_load(&nr_iowaiters, &cpu_load);
> >       data->bucket = which_bucket(data->next_timer_us, nr_iowaiters);
> >
> > @@ -322,7 +333,6 @@ static int menu_select(struct cpuidle_dr
> >       expected_interval = get_typical_interval(data);
> >       expected_interval = min(expected_interval, data->next_timer_us);
> >
> > -     first_idx = 0;
> >       if (drv->states[0].flags & CPUIDLE_FLAG_POLLING) {
> >               struct cpuidle_state *s = &drv->states[1];
> >               unsigned int polling_threshold;
> > @@ -344,29 +354,15 @@ static int menu_select(struct cpuidle_dr
> >        */
> >       data->predicted_us = min(data->predicted_us, expected_interval);
> >
> > -     if (tick_nohz_tick_stopped()) {
> > -             /*
> > -              * If the tick is already stopped, the cost of possible short
> > -              * idle duration misprediction is much higher, because the CPU
> > -              * may be stuck in a shallow idle state for a long time as a
> > -              * result of it.  In that case say we might mispredict and try
> > -              * to force the CPU into a state for which we would have stopped
> > -              * the tick, unless a timer is going to expire really soon
> > -              * anyway.
> > -              */
> > -             if (data->predicted_us < TICK_USEC)
> > -                     data->predicted_us = min_t(unsigned int, TICK_USEC,
> > -                                                ktime_to_us(delta_next));
> > -     } else {
> > -             /*
> > -              * Use the performance multiplier and the user-configurable
> > -              * latency_req to determine the maximum exit latency.
> > -              */
> > -             interactivity_req = data->predicted_us / performance_multiplier(nr_iowaiters, cpu_load);
> > -             if (latency_req > interactivity_req)
> > -                     latency_req = interactivity_req;
> > -     }
> > +     /*
> > +      * Use the performance multiplier and the user-configurable latency_req
> > +      * to determine the maximum exit latency.
> > +      */
> > +     interactivity_req = data->predicted_us / performance_multiplier(nr_iowaiters, cpu_load);
> > +     if (latency_req > interactivity_req)
> > +             latency_req = interactivity_req;
> >
> > +select:
> >       expected_interval = data->predicted_us;
> >       /*
> >        * Find the idle state with the lowest power while satisfying
> > @@ -403,14 +399,13 @@ static int menu_select(struct cpuidle_dr
> >        * Don't stop the tick if the selected state is a polling one or if the
> >        * expected idle duration is shorter than the tick period length.
> >        */
> > -     if ((drv->states[idx].flags & CPUIDLE_FLAG_POLLING) ||
> > -         expected_interval < TICK_USEC) {
> > +     if (((drv->states[idx].flags & CPUIDLE_FLAG_POLLING) ||
> > +         expected_interval < TICK_USEC) && !tick_nohz_tick_stopped()) {
>
> I am not sure this logic is right... Why not use below checking, so
> for POLLING state we will never ask to stop the tick?
>
>         if (drv->states[idx].flags & CPUIDLE_FLAG_POLLING ||
>             (expected_interval < TICK_USEC && !tick_nohz_tick_stopped())) {
>

The only effect of it would be setting stop_tick to false, but why
would that matter?

> >               unsigned int delta_next_us = ktime_to_us(delta_next);
> >
> >               *stop_tick = false;
> >
> > -             if (!tick_nohz_tick_stopped() && idx > 0 &&
> > -                 drv->states[idx].target_residency > delta_next_us) {
> > +             if (idx > 0 && drv->states[idx].target_residency > delta_next_us) {
> >                       /*
> >                        * The tick is not going to be stopped and the target
> >                        * residency of the state to be returned is not within
> >

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH v3] cpuidle: menu: Handle stopped tick more aggressively
  2018-08-10  7:57 ` [PATCH v2] " Rafael J. Wysocki
  2018-08-10  9:20   ` leo.yan
@ 2018-08-10 11:15   ` Rafael J. Wysocki
  2018-08-12 14:55     ` leo.yan
  2018-08-13 11:26     ` [PATCH v4] " Rafael J. Wysocki
  1 sibling, 2 replies; 20+ messages in thread
From: Rafael J. Wysocki @ 2018-08-10 11:15 UTC (permalink / raw)
  To: Linux PM; +Cc: Peter Zijlstra, LKML, Leo Yan, Frederic Weisbecker

From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

Commit 87c9fe6ee495 (cpuidle: menu: Avoid selecting shallow states
with stopped tick) missed the case when the target residencies of
deep idle states of CPUs are above the tick boundary which may cause
the CPU to get stuck in a shallow idle state for a long time.

Say there are two CPU idle states available: one shallow, with the
target residency much below the tick boundary and one deep, with
the target residency significantly above the tick boundary.  In
that case, if the tick has been stopped already and the expected
next timer event is relatively far in the future, the governor will
assume the idle duration to be equal to TICK_USEC and it will select
the idle state for the CPU accordingly.  However, that will cause the
shallow state to be selected even though it would have been more
energy-efficient to select the deep one.

To address this issue, modify the governor to always assume idle
duration to be equal to the time till the closest timer event if
the tick is not running which will cause the selected idle states
to always match the known CPU wakeup time.

Also make it always indicate that the tick should be stopped in
that case for consistency.

Fixes: 87c9fe6ee495 (cpuidle: menu: Avoid selecting shallow states with stopped tick)
Reported-by: Leo Yan <leo.yan@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---

-> v2: Initialize first_idx properly in the stopped tick case.

-> v3: Compute data->bucket before checking whether or not the tick has been
       stopped already to prevent it from becoming stale.

---
 drivers/cpuidle/governors/menu.c |   55 +++++++++++++++++----------------------
 1 file changed, 25 insertions(+), 30 deletions(-)

Index: linux-pm/drivers/cpuidle/governors/menu.c
===================================================================
--- linux-pm.orig/drivers/cpuidle/governors/menu.c
+++ linux-pm/drivers/cpuidle/governors/menu.c
@@ -285,9 +285,8 @@ static int menu_select(struct cpuidle_dr
 {
 	struct menu_device *data = this_cpu_ptr(&menu_devices);
 	int latency_req = cpuidle_governor_latency_req(dev->cpu);
-	int i;
-	int first_idx;
-	int idx;
+	int first_idx = 0;
+	int idx, i;
 	unsigned int interactivity_req;
 	unsigned int expected_interval;
 	unsigned long nr_iowaiters, cpu_load;
@@ -311,6 +310,18 @@ static int menu_select(struct cpuidle_dr
 	data->bucket = which_bucket(data->next_timer_us, nr_iowaiters);
 
 	/*
+	 * If the tick is already stopped, the cost of possible short idle
+	 * duration misprediction is much higher, because the CPU may be stuck
+	 * in a shallow idle state for a long time as a result of it.  In that
+	 * case say we might mispredict and use the known time till the closest
+	 * timer event for the idle state selection.
+	 */
+	if (tick_nohz_tick_stopped()) {
+		data->predicted_us = ktime_to_us(delta_next);
+		goto select;
+	}
+
+	/*
 	 * Force the result of multiplication to be 64 bits even if both
 	 * operands are 32 bits.
 	 * Make sure to round up for half microseconds.
@@ -322,7 +333,6 @@ static int menu_select(struct cpuidle_dr
 	expected_interval = get_typical_interval(data);
 	expected_interval = min(expected_interval, data->next_timer_us);
 
-	first_idx = 0;
 	if (drv->states[0].flags & CPUIDLE_FLAG_POLLING) {
 		struct cpuidle_state *s = &drv->states[1];
 		unsigned int polling_threshold;
@@ -344,29 +354,15 @@ static int menu_select(struct cpuidle_dr
 	 */
 	data->predicted_us = min(data->predicted_us, expected_interval);
 
-	if (tick_nohz_tick_stopped()) {
-		/*
-		 * If the tick is already stopped, the cost of possible short
-		 * idle duration misprediction is much higher, because the CPU
-		 * may be stuck in a shallow idle state for a long time as a
-		 * result of it.  In that case say we might mispredict and try
-		 * to force the CPU into a state for which we would have stopped
-		 * the tick, unless a timer is going to expire really soon
-		 * anyway.
-		 */
-		if (data->predicted_us < TICK_USEC)
-			data->predicted_us = min_t(unsigned int, TICK_USEC,
-						   ktime_to_us(delta_next));
-	} else {
-		/*
-		 * Use the performance multiplier and the user-configurable
-		 * latency_req to determine the maximum exit latency.
-		 */
-		interactivity_req = data->predicted_us / performance_multiplier(nr_iowaiters, cpu_load);
-		if (latency_req > interactivity_req)
-			latency_req = interactivity_req;
-	}
+	/*
+	 * Use the performance multiplier and the user-configurable latency_req
+	 * to determine the maximum exit latency.
+	 */
+	interactivity_req = data->predicted_us / performance_multiplier(nr_iowaiters, cpu_load);
+	if (latency_req > interactivity_req)
+		latency_req = interactivity_req;
 
+select:
 	expected_interval = data->predicted_us;
 	/*
 	 * Find the idle state with the lowest power while satisfying
@@ -403,14 +399,13 @@ static int menu_select(struct cpuidle_dr
 	 * Don't stop the tick if the selected state is a polling one or if the
 	 * expected idle duration is shorter than the tick period length.
 	 */
-	if ((drv->states[idx].flags & CPUIDLE_FLAG_POLLING) ||
-	    expected_interval < TICK_USEC) {
+	if (((drv->states[idx].flags & CPUIDLE_FLAG_POLLING) ||
+	    expected_interval < TICK_USEC) && !tick_nohz_tick_stopped()) {
 		unsigned int delta_next_us = ktime_to_us(delta_next);
 
 		*stop_tick = false;
 
-		if (!tick_nohz_tick_stopped() && idx > 0 &&
-		    drv->states[idx].target_residency > delta_next_us) {
+		if (idx > 0 && drv->states[idx].target_residency > delta_next_us) {
 			/*
 			 * The tick is not going to be stopped and the target
 			 * residency of the state to be returned is not within


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2] cpuidle: menu: Handle stopped tick more aggressively
  2018-08-10 11:04     ` Rafael J. Wysocki
@ 2018-08-10 12:31       ` leo.yan
  2018-08-12 10:07         ` Rafael J. Wysocki
  0 siblings, 1 reply; 20+ messages in thread
From: leo.yan @ 2018-08-10 12:31 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Rafael J. Wysocki, Linux PM, Peter Zijlstra,
	Linux Kernel Mailing List, Frederic Weisbecker

On Fri, Aug 10, 2018 at 01:04:22PM +0200, Rafael J. Wysocki wrote:
> On Fri, Aug 10, 2018 at 11:20 AM <leo.yan@linaro.org> wrote:
> >
> > On Fri, Aug 10, 2018 at 09:57:18AM +0200, Rafael J . Wysocki wrote:
> > > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > > Subject: [PATCH] cpuidle: menu: Handle stopped tick more aggressively
> > >
> > > Commit 87c9fe6ee495 (cpuidle: menu: Avoid selecting shallow states
> > > with stopped tick) missed the case when the target residencies of
> > > deep idle states of CPUs are above the tick boundary which may cause
> > > the CPU to get stuck in a shallow idle state for a long time.
> > >
> > > Say there are two CPU idle states available: one shallow, with the
> > > target residency much below the tick boundary and one deep, with
> > > the target residency significantly above the tick boundary.  In
> > > that case, if the tick has been stopped already and the expected
> > > next timer event is relatively far in the future, the governor will
> > > assume the idle duration to be equal to TICK_USEC and it will select
> > > the idle state for the CPU accordingly.  However, that will cause the
> > > shallow state to be selected even though it would have been more
> > > energy-efficient to select the deep one.
> > >
> > > To address this issue, modify the governor to always assume idle
> > > duration to be equal to the time till the closest timer event if
> > > the tick is not running which will cause the selected idle states
> > > to always match the known CPU wakeup time.
> > >
> > > Also make it always indicate that the tick should be stopped in
> > > that case for consistency.
> > >
> > > Fixes: 87c9fe6ee495 (cpuidle: menu: Avoid selecting shallow states with stopped tick)
> > > Reported-by: Leo Yan <leo.yan@linaro.org>
> > > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > > ---
> > >
> > > -> v2: Initialize first_idx properly in the stopped tick case.
> > >
> > > ---
> > >  drivers/cpuidle/governors/menu.c |   55 +++++++++++++++++----------------------
> > >  1 file changed, 25 insertions(+), 30 deletions(-)
> > >
> > > Index: linux-pm/drivers/cpuidle/governors/menu.c
> > > ===================================================================
> > > --- linux-pm.orig/drivers/cpuidle/governors/menu.c
> > > +++ linux-pm/drivers/cpuidle/governors/menu.c
> > > @@ -285,9 +285,8 @@ static int menu_select(struct cpuidle_dr
> > >  {
> > >       struct menu_device *data = this_cpu_ptr(&menu_devices);
> > >       int latency_req = cpuidle_governor_latency_req(dev->cpu);
> > > -     int i;
> > > -     int first_idx;
> > > -     int idx;
> > > +     int first_idx = 0;
> > > +     int idx, i;
> > >       unsigned int interactivity_req;
> > >       unsigned int expected_interval;
> > >       unsigned long nr_iowaiters, cpu_load;
> > > @@ -307,6 +306,18 @@ static int menu_select(struct cpuidle_dr
> > >       /* determine the expected residency time, round up */
> > >       data->next_timer_us = ktime_to_us(tick_nohz_get_sleep_length(&delta_next));
> > >
> > > +     /*
> > > +      * If the tick is already stopped, the cost of possible short idle
> > > +      * duration misprediction is much higher, because the CPU may be stuck
> > > +      * in a shallow idle state for a long time as a result of it.  In that
> > > +      * case say we might mispredict and use the known time till the closest
> > > +      * timer event for the idle state selection.
> > > +      */
> > > +     if (tick_nohz_tick_stopped()) {
> > > +             data->predicted_us = ktime_to_us(delta_next);
> > > +             goto select;
> > > +     }
> > > +
> >
> > This introduce two potential issues:
> >
> > - This will totally ignore the typical pattern in idle loop; I
> >   observed on the mmc driver can trigger multiple times (> 10 times)
> >   with consistent interval;
> 
> I'm not sure what you mean by "ignore".

You could see after move code from blow to this position, the typical
pattern interval will not be accounted; so if in the middle of idles
there have a bunch of interrupts with fix pattern, the upper code
cannot detect this pattern anymore.

[...]

> > > -     if ((drv->states[idx].flags & CPUIDLE_FLAG_POLLING) ||
> > > -         expected_interval < TICK_USEC) {
> > > +     if (((drv->states[idx].flags & CPUIDLE_FLAG_POLLING) ||
> > > +         expected_interval < TICK_USEC) && !tick_nohz_tick_stopped()) {
> >
> > I am not sure this logic is right... Why not use below checking, so
> > for POLLING state we will never ask to stop the tick?
> >
> >         if (drv->states[idx].flags & CPUIDLE_FLAG_POLLING ||
> >             (expected_interval < TICK_USEC && !tick_nohz_tick_stopped())) {
> >
> 
> The only effect of it would be setting stop_tick to false, but why
> would that matter?

Please consider below situation, not sure if this case is existed or
not:

  step1: first time: enter one idle state with stopping tick;
  step2: second time: select POLLING state and tick_nohz_tick_stopped()
  is true;

So in step2, it cannot set stop_tick to false with below sentence.

> > >               unsigned int delta_next_us = ktime_to_us(delta_next);
> > >
> > >               *stop_tick = false;

[...]

Thanks,
Leo Yan

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] cpuidle: menu: Handle stopped tick more aggressively
  2018-08-10  7:34 [PATCH] cpuidle: menu: Handle stopped tick more aggressively Rafael J. Wysocki
  2018-08-10  7:57 ` [PATCH v2] " Rafael J. Wysocki
@ 2018-08-11 16:32 ` kbuild test robot
  2018-08-12 22:13 ` Dan Carpenter
  2 siblings, 0 replies; 20+ messages in thread
From: kbuild test robot @ 2018-08-11 16:32 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: kbuild-all, Linux PM, Peter Zijlstra, LKML, Leo Yan, Frederic Weisbecker

[-- Attachment #1: Type: text/plain, Size: 15443 bytes --]

Hi Rafael,

I love your patch! Perhaps something to improve:

[auto build test WARNING on pm/linux-next]
[also build test WARNING on v4.18-rc8 next-20180810]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Rafael-J-Wysocki/cpuidle-menu-Handle-stopped-tick-more-aggressively/20180811-191914
base:   https://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git linux-next
config: mips-allyesconfig (attached as .config)
compiler: mips-linux-gnu-gcc (Debian 7.2.0-11) 7.2.0
reproduce:
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        GCC_VERSION=7.2.0 make.cross ARCH=mips 

Note: it may well be a FALSE warning. FWIW you are at least aware of it now.
http://gcc.gnu.org/wiki/Better_Uninitialized_Warnings

All warnings (new ones prefixed by >>):

   drivers/cpuidle/governors/menu.c: In function 'menu_select':
>> drivers/cpuidle/governors/menu.c:289:6: warning: 'first_idx' may be used uninitialized in this function [-Wmaybe-uninitialized]
     int first_idx;
         ^~~~~~~~~

vim +/first_idx +289 drivers/cpuidle/governors/menu.c

1f85f87d4 Arjan van de Ven              2010-05-24  276  
4f86d3a8e Len Brown                     2007-10-03  277  /**
4f86d3a8e Len Brown                     2007-10-03  278   * menu_select - selects the next idle state to enter
46bcfad7a Deepthi Dharwar               2011-10-28  279   * @drv: cpuidle driver containing state data
4f86d3a8e Len Brown                     2007-10-03  280   * @dev: the CPU
45f1ff59e Rafael J. Wysocki             2018-03-22  281   * @stop_tick: indication on whether or not to stop the tick
4f86d3a8e Len Brown                     2007-10-03  282   */
45f1ff59e Rafael J. Wysocki             2018-03-22  283  static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev,
45f1ff59e Rafael J. Wysocki             2018-03-22  284  		       bool *stop_tick)
4f86d3a8e Len Brown                     2007-10-03  285  {
229b6863b Christoph Lameter             2014-08-17  286  	struct menu_device *data = this_cpu_ptr(&menu_devices);
0fc784fb0 Rafael J. Wysocki             2018-05-30  287  	int latency_req = cpuidle_governor_latency_req(dev->cpu);
4f86d3a8e Len Brown                     2007-10-03  288  	int i;
3ed09c945 Nicholas Piggin               2017-06-26 @289  	int first_idx;
3ed09c945 Nicholas Piggin               2017-06-26  290  	int idx;
96e95182e tuukka.tikkanen@linaro.org    2014-02-24  291  	unsigned int interactivity_req;
e132b9b3b Rik van Riel                  2016-03-16  292  	unsigned int expected_interval;
372ba8cb4 Mel Gorman                    2014-08-06  293  	unsigned long nr_iowaiters, cpu_load;
296bb1e51 Rafael J. Wysocki             2018-04-05  294  	ktime_t delta_next;
4f86d3a8e Len Brown                     2007-10-03  295  
672917dcc Corrado Zoccolo               2009-09-21  296  	if (data->needs_update) {
46bcfad7a Deepthi Dharwar               2011-10-28  297  		menu_update(drv, dev);
672917dcc Corrado Zoccolo               2009-09-21  298  		data->needs_update = 0;
672917dcc Corrado Zoccolo               2009-09-21  299  	}
672917dcc Corrado Zoccolo               2009-09-21  300  
69d25870f Arjan van de Ven              2009-09-21  301  	/* Special case when user has set very strict latency requirement */
45f1ff59e Rafael J. Wysocki             2018-03-22  302  	if (unlikely(latency_req == 0)) {
45f1ff59e Rafael J. Wysocki             2018-03-22  303  		*stop_tick = false;
a2bd92023 venkatesh.pallipadi@intel.com 2008-07-30  304  		return 0;
45f1ff59e Rafael J. Wysocki             2018-03-22  305  	}
a2bd92023 venkatesh.pallipadi@intel.com 2008-07-30  306  
69d25870f Arjan van de Ven              2009-09-21  307  	/* determine the expected residency time, round up */
296bb1e51 Rafael J. Wysocki             2018-04-05  308  	data->next_timer_us = ktime_to_us(tick_nohz_get_sleep_length(&delta_next));
69d25870f Arjan van de Ven              2009-09-21  309  
5f9f09809 Rafael J. Wysocki             2018-08-10  310  	/*
5f9f09809 Rafael J. Wysocki             2018-08-10  311  	 * If the tick is already stopped, the cost of possible short idle
5f9f09809 Rafael J. Wysocki             2018-08-10  312  	 * duration misprediction is much higher, because the CPU may be stuck
5f9f09809 Rafael J. Wysocki             2018-08-10  313  	 * in a shallow idle state for a long time as a result of it.  In that
5f9f09809 Rafael J. Wysocki             2018-08-10  314  	 * case say we might mispredict and use the known time till the closest
5f9f09809 Rafael J. Wysocki             2018-08-10  315  	 * timer event for the idle state selection.
5f9f09809 Rafael J. Wysocki             2018-08-10  316  	 */
5f9f09809 Rafael J. Wysocki             2018-08-10  317  	if (tick_nohz_tick_stopped()) {
5f9f09809 Rafael J. Wysocki             2018-08-10  318  		data->predicted_us = ktime_to_us(delta_next);
5f9f09809 Rafael J. Wysocki             2018-08-10  319  		goto select;
5f9f09809 Rafael J. Wysocki             2018-08-10  320  	}
5f9f09809 Rafael J. Wysocki             2018-08-10  321  
372ba8cb4 Mel Gorman                    2014-08-06  322  	get_iowait_load(&nr_iowaiters, &cpu_load);
64b4ca5cb Mel Gorman                    2014-08-06  323  	data->bucket = which_bucket(data->next_timer_us, nr_iowaiters);
69d25870f Arjan van de Ven              2009-09-21  324  
69d25870f Arjan van de Ven              2009-09-21  325  	/*
51f245b89 Tuukka Tikkanen               2013-08-14  326  	 * Force the result of multiplication to be 64 bits even if both
51f245b89 Tuukka Tikkanen               2013-08-14  327  	 * operands are 32 bits.
51f245b89 Tuukka Tikkanen               2013-08-14  328  	 * Make sure to round up for half microseconds.
51f245b89 Tuukka Tikkanen               2013-08-14  329  	 */
ee3c86f35 Javi Merino                   2015-04-16  330  	data->predicted_us = DIV_ROUND_CLOSEST_ULL((uint64_t)data->next_timer_us *
51f245b89 Tuukka Tikkanen               2013-08-14  331  					 data->correction_factor[data->bucket],
69d25870f Arjan van de Ven              2009-09-21  332  					 RESOLUTION * DECAY);
69d25870f Arjan van de Ven              2009-09-21  333  
e132b9b3b Rik van Riel                  2016-03-16  334  	expected_interval = get_typical_interval(data);
e132b9b3b Rik van Riel                  2016-03-16  335  	expected_interval = min(expected_interval, data->next_timer_us);
96e95182e tuukka.tikkanen@linaro.org    2014-02-24  336  
dc2251bf9 Rafael J. Wysocki             2017-08-23  337  	first_idx = 0;
dc2251bf9 Rafael J. Wysocki             2017-08-23  338  	if (drv->states[0].flags & CPUIDLE_FLAG_POLLING) {
dc2251bf9 Rafael J. Wysocki             2017-08-23  339  		struct cpuidle_state *s = &drv->states[1];
0c313cb20 Rafael J. Wysocki             2016-03-20  340  		unsigned int polling_threshold;
0c313cb20 Rafael J. Wysocki             2016-03-20  341  
96e95182e tuukka.tikkanen@linaro.org    2014-02-24  342  		/*
69d25870f Arjan van de Ven              2009-09-21  343  		 * We want to default to C1 (hlt), not to busy polling
e132b9b3b Rik van Riel                  2016-03-16  344  		 * unless the timer is happening really really soon, or
e132b9b3b Rik van Riel                  2016-03-16  345  		 * C1's exit latency exceeds the user configured limit.
69d25870f Arjan van de Ven              2009-09-21  346  		 */
0c313cb20 Rafael J. Wysocki             2016-03-20  347  		polling_threshold = max_t(unsigned int, 20, s->target_residency);
0c313cb20 Rafael J. Wysocki             2016-03-20  348  		if (data->next_timer_us > polling_threshold &&
0c313cb20 Rafael J. Wysocki             2016-03-20  349  		    latency_req > s->exit_latency && !s->disabled &&
dc2251bf9 Rafael J. Wysocki             2017-08-23  350  		    !dev->states_usage[1].disable)
dc2251bf9 Rafael J. Wysocki             2017-08-23  351  			first_idx = 1;
9c4b2867e Rafael J. Wysocki             2016-01-14  352  	}
4f86d3a8e Len Brown                     2007-10-03  353  
71abbbf85 Ai Li                         2010-08-09  354  	/*
e132b9b3b Rik van Riel                  2016-03-16  355  	 * Use the lowest expected idle interval to pick the idle state.
e132b9b3b Rik van Riel                  2016-03-16  356  	 */
e132b9b3b Rik van Riel                  2016-03-16  357  	data->predicted_us = min(data->predicted_us, expected_interval);
e132b9b3b Rik van Riel                  2016-03-16  358  
e132b9b3b Rik van Riel                  2016-03-16  359  	/*
5f9f09809 Rafael J. Wysocki             2018-08-10  360  	 * Use the performance multiplier and the user-configurable latency_req
5f9f09809 Rafael J. Wysocki             2018-08-10  361  	 * to determine the maximum exit latency.
e132b9b3b Rik van Riel                  2016-03-16  362  	 */
e132b9b3b Rik van Riel                  2016-03-16  363  	interactivity_req = data->predicted_us / performance_multiplier(nr_iowaiters, cpu_load);
e132b9b3b Rik van Riel                  2016-03-16  364  	if (latency_req > interactivity_req)
e132b9b3b Rik van Riel                  2016-03-16  365  		latency_req = interactivity_req;
e132b9b3b Rik van Riel                  2016-03-16  366  
5f9f09809 Rafael J. Wysocki             2018-08-10  367  select:
45f1ff59e Rafael J. Wysocki             2018-03-22  368  	expected_interval = data->predicted_us;
e132b9b3b Rik van Riel                  2016-03-16  369  	/*
71abbbf85 Ai Li                         2010-08-09  370  	 * Find the idle state with the lowest power while satisfying
71abbbf85 Ai Li                         2010-08-09  371  	 * our constraints.
71abbbf85 Ai Li                         2010-08-09  372  	 */
3ed09c945 Nicholas Piggin               2017-06-26  373  	idx = -1;
3ed09c945 Nicholas Piggin               2017-06-26  374  	for (i = first_idx; i < drv->state_count; i++) {
46bcfad7a Deepthi Dharwar               2011-10-28  375  		struct cpuidle_state *s = &drv->states[i];
dc7fd275a ShuoX Liu                     2012-07-03  376  		struct cpuidle_state_usage *su = &dev->states_usage[i];
4f86d3a8e Len Brown                     2007-10-03  377  
cbc9ef028 Rafael J. Wysocki             2012-07-03  378  		if (s->disabled || su->disable)
3a53396b0 ShuoX Liu                     2012-03-28  379  			continue;
3ed09c945 Nicholas Piggin               2017-06-26  380  		if (idx == -1)
3ed09c945 Nicholas Piggin               2017-06-26  381  			idx = i; /* first enabled state */
148519120 Rafael J. Wysocki             2013-07-27  382  		if (s->target_residency > data->predicted_us)
8e37e1a2a Alex Shi                      2017-01-12  383  			break;
45f1ff59e Rafael J. Wysocki             2018-03-22  384  		if (s->exit_latency > latency_req) {
45f1ff59e Rafael J. Wysocki             2018-03-22  385  			/*
45f1ff59e Rafael J. Wysocki             2018-03-22  386  			 * If we break out of the loop for latency reasons, use
45f1ff59e Rafael J. Wysocki             2018-03-22  387  			 * the target residency of the selected state as the
45f1ff59e Rafael J. Wysocki             2018-03-22  388  			 * expected idle duration so that the tick is retained
45f1ff59e Rafael J. Wysocki             2018-03-22  389  			 * as long as that target residency is low enough.
45f1ff59e Rafael J. Wysocki             2018-03-22  390  			 */
45f1ff59e Rafael J. Wysocki             2018-03-22  391  			expected_interval = drv->states[idx].target_residency;
8e37e1a2a Alex Shi                      2017-01-12  392  			break;
45f1ff59e Rafael J. Wysocki             2018-03-22  393  		}
3ed09c945 Nicholas Piggin               2017-06-26  394  		idx = i;
71abbbf85 Ai Li                         2010-08-09  395  	}
4f86d3a8e Len Brown                     2007-10-03  396  
3ed09c945 Nicholas Piggin               2017-06-26  397  	if (idx == -1)
3ed09c945 Nicholas Piggin               2017-06-26  398  		idx = 0; /* No states enabled. Must use 0. */
3ed09c945 Nicholas Piggin               2017-06-26  399  
45f1ff59e Rafael J. Wysocki             2018-03-22  400  	/*
45f1ff59e Rafael J. Wysocki             2018-03-22  401  	 * Don't stop the tick if the selected state is a polling one or if the
45f1ff59e Rafael J. Wysocki             2018-03-22  402  	 * expected idle duration is shorter than the tick period length.
45f1ff59e Rafael J. Wysocki             2018-03-22  403  	 */
5f9f09809 Rafael J. Wysocki             2018-08-10  404  	if (((drv->states[idx].flags & CPUIDLE_FLAG_POLLING) ||
5f9f09809 Rafael J. Wysocki             2018-08-10  405  	    expected_interval < TICK_USEC) && !tick_nohz_tick_stopped()) {
296bb1e51 Rafael J. Wysocki             2018-04-05  406  		unsigned int delta_next_us = ktime_to_us(delta_next);
296bb1e51 Rafael J. Wysocki             2018-04-05  407  
45f1ff59e Rafael J. Wysocki             2018-03-22  408  		*stop_tick = false;
45f1ff59e Rafael J. Wysocki             2018-03-22  409  
5f9f09809 Rafael J. Wysocki             2018-08-10  410  		if (idx > 0 && drv->states[idx].target_residency > delta_next_us) {
296bb1e51 Rafael J. Wysocki             2018-04-05  411  			/*
296bb1e51 Rafael J. Wysocki             2018-04-05  412  			 * The tick is not going to be stopped and the target
296bb1e51 Rafael J. Wysocki             2018-04-05  413  			 * residency of the state to be returned is not within
296bb1e51 Rafael J. Wysocki             2018-04-05  414  			 * the time until the next timer event including the
296bb1e51 Rafael J. Wysocki             2018-04-05  415  			 * tick, so try to correct that.
296bb1e51 Rafael J. Wysocki             2018-04-05  416  			 */
296bb1e51 Rafael J. Wysocki             2018-04-05  417  			for (i = idx - 1; i >= 0; i--) {
296bb1e51 Rafael J. Wysocki             2018-04-05  418  			    if (drv->states[i].disabled ||
296bb1e51 Rafael J. Wysocki             2018-04-05  419  			        dev->states_usage[i].disable)
296bb1e51 Rafael J. Wysocki             2018-04-05  420  					continue;
296bb1e51 Rafael J. Wysocki             2018-04-05  421  
296bb1e51 Rafael J. Wysocki             2018-04-05  422  				idx = i;
296bb1e51 Rafael J. Wysocki             2018-04-05  423  				if (drv->states[i].target_residency <= delta_next_us)
296bb1e51 Rafael J. Wysocki             2018-04-05  424  					break;
296bb1e51 Rafael J. Wysocki             2018-04-05  425  			}
296bb1e51 Rafael J. Wysocki             2018-04-05  426  		}
296bb1e51 Rafael J. Wysocki             2018-04-05  427  	}
296bb1e51 Rafael J. Wysocki             2018-04-05  428  
3ed09c945 Nicholas Piggin               2017-06-26  429  	data->last_state_idx = idx;
3ed09c945 Nicholas Piggin               2017-06-26  430  
69d25870f Arjan van de Ven              2009-09-21  431  	return data->last_state_idx;
4f86d3a8e Len Brown                     2007-10-03  432  }
4f86d3a8e Len Brown                     2007-10-03  433  

:::::: The code at line 289 was first introduced by commit
:::::: 3ed09c94580de9d5b18cc35d1f97e9f24cd9233b cpuidle: menu: allow state 0 to be disabled

:::::: TO: Nicholas Piggin <npiggin@gmail.com>
:::::: CC: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 56510 bytes --]

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2] cpuidle: menu: Handle stopped tick more aggressively
  2018-08-10 12:31       ` leo.yan
@ 2018-08-12 10:07         ` Rafael J. Wysocki
  2018-08-12 13:44           ` leo.yan
  0 siblings, 1 reply; 20+ messages in thread
From: Rafael J. Wysocki @ 2018-08-12 10:07 UTC (permalink / raw)
  To: Leo Yan
  Cc: Rafael J. Wysocki, Rafael J. Wysocki, Linux PM, Peter Zijlstra,
	Linux Kernel Mailing List, Frederic Weisbecker

On Fri, Aug 10, 2018 at 2:31 PM <leo.yan@linaro.org> wrote:
>
> On Fri, Aug 10, 2018 at 01:04:22PM +0200, Rafael J. Wysocki wrote:
> > On Fri, Aug 10, 2018 at 11:20 AM <leo.yan@linaro.org> wrote:
> > >
> > > On Fri, Aug 10, 2018 at 09:57:18AM +0200, Rafael J . Wysocki wrote:
> > > > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > > > Subject: [PATCH] cpuidle: menu: Handle stopped tick more aggressively
> > > >
> > > > Commit 87c9fe6ee495 (cpuidle: menu: Avoid selecting shallow states
> > > > with stopped tick) missed the case when the target residencies of
> > > > deep idle states of CPUs are above the tick boundary which may cause
> > > > the CPU to get stuck in a shallow idle state for a long time.
> > > >
> > > > Say there are two CPU idle states available: one shallow, with the
> > > > target residency much below the tick boundary and one deep, with
> > > > the target residency significantly above the tick boundary.  In
> > > > that case, if the tick has been stopped already and the expected
> > > > next timer event is relatively far in the future, the governor will
> > > > assume the idle duration to be equal to TICK_USEC and it will select
> > > > the idle state for the CPU accordingly.  However, that will cause the
> > > > shallow state to be selected even though it would have been more
> > > > energy-efficient to select the deep one.
> > > >
> > > > To address this issue, modify the governor to always assume idle
> > > > duration to be equal to the time till the closest timer event if
> > > > the tick is not running which will cause the selected idle states
> > > > to always match the known CPU wakeup time.
> > > >
> > > > Also make it always indicate that the tick should be stopped in
> > > > that case for consistency.
> > > >
> > > > Fixes: 87c9fe6ee495 (cpuidle: menu: Avoid selecting shallow states with stopped tick)
> > > > Reported-by: Leo Yan <leo.yan@linaro.org>
> > > > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > > > ---
> > > >
> > > > -> v2: Initialize first_idx properly in the stopped tick case.
> > > >
> > > > ---
> > > >  drivers/cpuidle/governors/menu.c |   55 +++++++++++++++++----------------------
> > > >  1 file changed, 25 insertions(+), 30 deletions(-)
> > > >
> > > > Index: linux-pm/drivers/cpuidle/governors/menu.c
> > > > ===================================================================
> > > > --- linux-pm.orig/drivers/cpuidle/governors/menu.c
> > > > +++ linux-pm/drivers/cpuidle/governors/menu.c
> > > > @@ -285,9 +285,8 @@ static int menu_select(struct cpuidle_dr
> > > >  {
> > > >       struct menu_device *data = this_cpu_ptr(&menu_devices);
> > > >       int latency_req = cpuidle_governor_latency_req(dev->cpu);
> > > > -     int i;
> > > > -     int first_idx;
> > > > -     int idx;
> > > > +     int first_idx = 0;
> > > > +     int idx, i;
> > > >       unsigned int interactivity_req;
> > > >       unsigned int expected_interval;
> > > >       unsigned long nr_iowaiters, cpu_load;
> > > > @@ -307,6 +306,18 @@ static int menu_select(struct cpuidle_dr
> > > >       /* determine the expected residency time, round up */
> > > >       data->next_timer_us = ktime_to_us(tick_nohz_get_sleep_length(&delta_next));
> > > >
> > > > +     /*
> > > > +      * If the tick is already stopped, the cost of possible short idle
> > > > +      * duration misprediction is much higher, because the CPU may be stuck
> > > > +      * in a shallow idle state for a long time as a result of it.  In that
> > > > +      * case say we might mispredict and use the known time till the closest
> > > > +      * timer event for the idle state selection.
> > > > +      */
> > > > +     if (tick_nohz_tick_stopped()) {
> > > > +             data->predicted_us = ktime_to_us(delta_next);
> > > > +             goto select;
> > > > +     }
> > > > +
> > >
> > > This introduce two potential issues:
> > >
> > > - This will totally ignore the typical pattern in idle loop; I
> > >   observed on the mmc driver can trigger multiple times (> 10 times)
> > >   with consistent interval;
> >
> > I'm not sure what you mean by "ignore".
>
> You could see after move code from blow to this position, the typical
> pattern interval will not be accounted; so if in the middle of idles
> there have a bunch of interrupts with fix pattern, the upper code
> cannot detect this pattern anymore.

I'm not really following you here.

The part of the code skipped for tick_nohz_tick_stopped() doesn't
update the data at all AFAICS.  It only computes some values that
would be discarded later anyway, so I'm not sure what the point of
running that computation is.

The statistics are updated by menu_update() and that still runs and it
will take the actual wakeup events into account, won't it?

> [...]
>
> > > > -     if ((drv->states[idx].flags & CPUIDLE_FLAG_POLLING) ||
> > > > -         expected_interval < TICK_USEC) {
> > > > +     if (((drv->states[idx].flags & CPUIDLE_FLAG_POLLING) ||
> > > > +         expected_interval < TICK_USEC) && !tick_nohz_tick_stopped()) {
> > >
> > > I am not sure this logic is right... Why not use below checking, so
> > > for POLLING state we will never ask to stop the tick?
> > >
> > >         if (drv->states[idx].flags & CPUIDLE_FLAG_POLLING ||
> > >             (expected_interval < TICK_USEC && !tick_nohz_tick_stopped())) {
> > >
> >
> > The only effect of it would be setting stop_tick to false, but why
> > would that matter?
>
> Please consider below situation, not sure if this case is existed or
> not:
>
>   step1: first time: enter one idle state with stopping tick;
>   step2: second time: select POLLING state and tick_nohz_tick_stopped()
>   is true;
>
> So in step2, it cannot set stop_tick to false with below sentence.
>
> > > >               unsigned int delta_next_us = ktime_to_us(delta_next);
> > > >
> > > >               *stop_tick = false;

But setting *stop_tick has no effect as far as the current code is
concerned (up to the bug fixed by the other patch).

Also the POLLING state can only be selected if there are no other
states matching delta_next available in that case which means that
there will be a timer to break the polling loop soon enough (and BTW
the polling has a built-in timeout too), so I don't really see a
problem here.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2] cpuidle: menu: Handle stopped tick more aggressively
  2018-08-12 10:07         ` Rafael J. Wysocki
@ 2018-08-12 13:44           ` leo.yan
  2018-08-13  7:58             ` Rafael J. Wysocki
  0 siblings, 1 reply; 20+ messages in thread
From: leo.yan @ 2018-08-12 13:44 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Rafael J. Wysocki, Linux PM, Peter Zijlstra,
	Linux Kernel Mailing List, Frederic Weisbecker

On Sun, Aug 12, 2018 at 12:07:45PM +0200, Rafael J. Wysocki wrote:

[...]

> > > > > --- linux-pm.orig/drivers/cpuidle/governors/menu.c
> > > > > +++ linux-pm/drivers/cpuidle/governors/menu.c
> > > > > @@ -285,9 +285,8 @@ static int menu_select(struct cpuidle_dr
> > > > >  {
> > > > >       struct menu_device *data = this_cpu_ptr(&menu_devices);
> > > > >       int latency_req = cpuidle_governor_latency_req(dev->cpu);
> > > > > -     int i;
> > > > > -     int first_idx;
> > > > > -     int idx;
> > > > > +     int first_idx = 0;
> > > > > +     int idx, i;
> > > > >       unsigned int interactivity_req;
> > > > >       unsigned int expected_interval;
> > > > >       unsigned long nr_iowaiters, cpu_load;
> > > > > @@ -307,6 +306,18 @@ static int menu_select(struct cpuidle_dr
> > > > >       /* determine the expected residency time, round up */
> > > > >       data->next_timer_us = ktime_to_us(tick_nohz_get_sleep_length(&delta_next));
> > > > >
> > > > > +     /*
> > > > > +      * If the tick is already stopped, the cost of possible short idle
> > > > > +      * duration misprediction is much higher, because the CPU may be stuck
> > > > > +      * in a shallow idle state for a long time as a result of it.  In that
> > > > > +      * case say we might mispredict and use the known time till the closest
> > > > > +      * timer event for the idle state selection.
> > > > > +      */
> > > > > +     if (tick_nohz_tick_stopped()) {
> > > > > +             data->predicted_us = ktime_to_us(delta_next);
> > > > > +             goto select;
> > > > > +     }
> > > > > +
> > > >
> > > > This introduce two potential issues:
> > > >
> > > > - This will totally ignore the typical pattern in idle loop; I
> > > >   observed on the mmc driver can trigger multiple times (> 10 times)
> > > >   with consistent interval;
> > >
> > > I'm not sure what you mean by "ignore".
> >
> > You could see after move code from blow to this position, the typical
> > pattern interval will not be accounted; so if in the middle of idles
> > there have a bunch of interrupts with fix pattern, the upper code
> > cannot detect this pattern anymore.
> 
> I'm not really following you here.
> 
> The part of the code skipped for tick_nohz_tick_stopped() doesn't
> update the data at all AFAICS.  It only computes some values that
> would be discarded later anyway, so I'm not sure what the point of
> running that computation is.

Sorry I don't explain clearly, so try to rephrase:

With your patch for the tick stopped case, it directly uses tick delta
value as prediction and goto 'select' tag.  So it skips below code
pieces, these codes have minor improvement for typical pattern which
can be applied in the middle of idles, for example, the mmc driver
triggers 16 interrupts with ~1500us interval, these interrupts are all
handled within the idle loop, so the typical pattern can detect the mmc
interrupts pattern and it will help idle governor to select a shallower
idle state so can avoid to break the residency.

You mentioned these computed values would be discarded later, this is
true for most cases, but it isn't always true actually.  Without your
patch, the governor will discard the computed values only when
'data->predicted_us < TICK_USEC', otherwise the interval pattern is
still be applied in the prediction.

	expected_interval = get_typical_interval(data);
	expected_interval = min(expected_interval, data->next_timer_us);

	[...]

	/*
	 * Use the lowest expected idle interval to pick the idle state.
	 */
	data->predicted_us = min(data->predicted_us, expected_interval);

> The statistics are updated by menu_update() and that still runs and it
> will take the actual wakeup events into account, won't it?

Yes.

> > [...]
> >
> > > > > -     if ((drv->states[idx].flags & CPUIDLE_FLAG_POLLING) ||
> > > > > -         expected_interval < TICK_USEC) {
> > > > > +     if (((drv->states[idx].flags & CPUIDLE_FLAG_POLLING) ||
> > > > > +         expected_interval < TICK_USEC) && !tick_nohz_tick_stopped()) {
> > > >
> > > > I am not sure this logic is right... Why not use below checking, so
> > > > for POLLING state we will never ask to stop the tick?
> > > >
> > > >         if (drv->states[idx].flags & CPUIDLE_FLAG_POLLING ||
> > > >             (expected_interval < TICK_USEC && !tick_nohz_tick_stopped())) {
> > > >
> > >
> > > The only effect of it would be setting stop_tick to false, but why
> > > would that matter?
> >
> > Please consider below situation, not sure if this case is existed or
> > not:
> >
> >   step1: first time: enter one idle state with stopping tick;
> >   step2: second time: select POLLING state and tick_nohz_tick_stopped()
> >   is true;
> >
> > So in step2, it cannot set stop_tick to false with below sentence.
> >
> > > > >               unsigned int delta_next_us = ktime_to_us(delta_next);
> > > > >
> > > > >               *stop_tick = false;
> 
> But setting *stop_tick has no effect as far as the current code is
> concerned (up to the bug fixed by the other patch).

Yeah.

> Also the POLLING state can only be selected if there are no other
> states matching delta_next available in that case which means that
> there will be a timer to break the polling loop soon enough (and BTW
> the polling has a built-in timeout too), so I don't really see a
> problem here.

Ah, now I understand the logic and I misunderstand the POLLING mode
before; now agree with this.  Sorry for noise.

Thanks,
Leo Yan

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v3] cpuidle: menu: Handle stopped tick more aggressively
  2018-08-10 11:15   ` [PATCH v3] " Rafael J. Wysocki
@ 2018-08-12 14:55     ` leo.yan
  2018-08-13  8:11       ` Rafael J. Wysocki
  2018-08-20 10:15       ` Peter Zijlstra
  2018-08-13 11:26     ` [PATCH v4] " Rafael J. Wysocki
  1 sibling, 2 replies; 20+ messages in thread
From: leo.yan @ 2018-08-12 14:55 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: Linux PM, Peter Zijlstra, LKML, Frederic Weisbecker

On Fri, Aug 10, 2018 at 01:15:58PM +0200, Rafael J . Wysocki wrote:
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> 
> Commit 87c9fe6ee495 (cpuidle: menu: Avoid selecting shallow states
> with stopped tick) missed the case when the target residencies of
> deep idle states of CPUs are above the tick boundary which may cause
> the CPU to get stuck in a shallow idle state for a long time.
> 
> Say there are two CPU idle states available: one shallow, with the
> target residency much below the tick boundary and one deep, with
> the target residency significantly above the tick boundary.  In
> that case, if the tick has been stopped already and the expected
> next timer event is relatively far in the future, the governor will
> assume the idle duration to be equal to TICK_USEC and it will select
> the idle state for the CPU accordingly.  However, that will cause the
> shallow state to be selected even though it would have been more
> energy-efficient to select the deep one.
> 
> To address this issue, modify the governor to always assume idle
> duration to be equal to the time till the closest timer event if
> the tick is not running which will cause the selected idle states
> to always match the known CPU wakeup time.
> 
> Also make it always indicate that the tick should be stopped in
> that case for consistency.
> 
> Fixes: 87c9fe6ee495 (cpuidle: menu: Avoid selecting shallow states with stopped tick)
> Reported-by: Leo Yan <leo.yan@linaro.org>
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> ---
> 
> -> v2: Initialize first_idx properly in the stopped tick case.
> 
> -> v3: Compute data->bucket before checking whether or not the tick has been
>        stopped already to prevent it from becoming stale.
> 
> ---
>  drivers/cpuidle/governors/menu.c |   55 +++++++++++++++++----------------------
>  1 file changed, 25 insertions(+), 30 deletions(-)
> 
> Index: linux-pm/drivers/cpuidle/governors/menu.c
> ===================================================================
> --- linux-pm.orig/drivers/cpuidle/governors/menu.c
> +++ linux-pm/drivers/cpuidle/governors/menu.c
> @@ -285,9 +285,8 @@ static int menu_select(struct cpuidle_dr
>  {
>  	struct menu_device *data = this_cpu_ptr(&menu_devices);
>  	int latency_req = cpuidle_governor_latency_req(dev->cpu);
> -	int i;
> -	int first_idx;
> -	int idx;
> +	int first_idx = 0;
> +	int idx, i;
>  	unsigned int interactivity_req;
>  	unsigned int expected_interval;
>  	unsigned long nr_iowaiters, cpu_load;
> @@ -311,6 +310,18 @@ static int menu_select(struct cpuidle_dr
>  	data->bucket = which_bucket(data->next_timer_us, nr_iowaiters);
>  
>  	/*
> +	 * If the tick is already stopped, the cost of possible short idle
> +	 * duration misprediction is much higher, because the CPU may be stuck
> +	 * in a shallow idle state for a long time as a result of it.  In that
> +	 * case say we might mispredict and use the known time till the closest
> +	 * timer event for the idle state selection.
> +	 */
> +	if (tick_nohz_tick_stopped()) {
> +		data->predicted_us = ktime_to_us(delta_next);
> +		goto select;
> +	}

I tried this patch at my side, firstly just clarify this patch is okay
for me, but there have other underlying issues I observed the CPU
staying shallow idle state with tick stopped, so just note at here.

From my understanding, the rational for this patch is we
only use the timer event as the reliable wake up source; if there have
one short timer event then we can select shallow state, otherwise we
also can select deepest idle state for long expired timer.

This means the idle governor needs to know the reliable info for the
timer event, so far I observe there at least have two issues for timer
event delta value cannot be trusted.

The first one issue is caused by timer cancel, I wrote one case for
CPU_0 starting a hrtimer with pinned mode with short expire time and
when the CPU_0 goes to sleep this short timeout timer can let idle
governor selects a shallow state; at the meantime another CPU_1 will
be used to try to cancel the timer, my purpose is to cheat CPU_0 so can
see the CPU_0 staying in shallow state for long time;  it has low
percentage to cancel the timer successfully, but I do see seldomly the
timer can be canceled successfully so CPU_0 will stay in idle for long
time (I cannot explain why the timer cannot be canceled successfully
for every time, this might be another issue?).  This case is tricky,
but it's possible happen in drivers with timer cancel.

Another issue is caused by spurious interrupts; if we review the
function tick_nohz_get_sleep_length(), it uses 'ts->idle_entrytime' to
calculate tick or timer delta, so every time when exit from interrupt
and before enter idle governor, it needs to update
'ts->idle_entrytime'; but for spurious interrupts, it will not call
irq_enter() and irq_exit() pairs, so it doesn't invoke below flows:

  irq_exit()
    `->tick_irq_exit()
         `->tick_nohz_irq_exit()
              `->tick_nohz_start_idle()

As result, after spurious interrupts handling, the idle loop doesn't
update for ts->idle_entrytime so the governor might read back a stale
value.  I don't really locate this issue, but I can see the CPU is
waken up without any interrupt handling and then directly go to
sleep again, the menu governor selects one shallow state so the cpu
stay in shallow state for long time.

> +
> +	/*
>  	 * Force the result of multiplication to be 64 bits even if both
>  	 * operands are 32 bits.
>  	 * Make sure to round up for half microseconds.
> @@ -322,7 +333,6 @@ static int menu_select(struct cpuidle_dr
>  	expected_interval = get_typical_interval(data);
>  	expected_interval = min(expected_interval, data->next_timer_us);
>  
> -	first_idx = 0;
>  	if (drv->states[0].flags & CPUIDLE_FLAG_POLLING) {
>  		struct cpuidle_state *s = &drv->states[1];
>  		unsigned int polling_threshold;
> @@ -344,29 +354,15 @@ static int menu_select(struct cpuidle_dr
>  	 */
>  	data->predicted_us = min(data->predicted_us, expected_interval);
>  
> -	if (tick_nohz_tick_stopped()) {
> -		/*
> -		 * If the tick is already stopped, the cost of possible short
> -		 * idle duration misprediction is much higher, because the CPU
> -		 * may be stuck in a shallow idle state for a long time as a
> -		 * result of it.  In that case say we might mispredict and try
> -		 * to force the CPU into a state for which we would have stopped
> -		 * the tick, unless a timer is going to expire really soon
> -		 * anyway.
> -		 */
> -		if (data->predicted_us < TICK_USEC)
> -			data->predicted_us = min_t(unsigned int, TICK_USEC,
> -						   ktime_to_us(delta_next));
> -	} else {
> -		/*
> -		 * Use the performance multiplier and the user-configurable
> -		 * latency_req to determine the maximum exit latency.
> -		 */
> -		interactivity_req = data->predicted_us / performance_multiplier(nr_iowaiters, cpu_load);
> -		if (latency_req > interactivity_req)
> -			latency_req = interactivity_req;
> -	}
> +	/*
> +	 * Use the performance multiplier and the user-configurable latency_req
> +	 * to determine the maximum exit latency.
> +	 */
> +	interactivity_req = data->predicted_us / performance_multiplier(nr_iowaiters, cpu_load);
> +	if (latency_req > interactivity_req)
> +		latency_req = interactivity_req;
>  
> +select:
>  	expected_interval = data->predicted_us;
>  	/*
>  	 * Find the idle state with the lowest power while satisfying
> @@ -403,14 +399,13 @@ static int menu_select(struct cpuidle_dr
>  	 * Don't stop the tick if the selected state is a polling one or if the
>  	 * expected idle duration is shorter than the tick period length.
>  	 */
> -	if ((drv->states[idx].flags & CPUIDLE_FLAG_POLLING) ||
> -	    expected_interval < TICK_USEC) {
> +	if (((drv->states[idx].flags & CPUIDLE_FLAG_POLLING) ||
> +	    expected_interval < TICK_USEC) && !tick_nohz_tick_stopped()) {
>  		unsigned int delta_next_us = ktime_to_us(delta_next);
>  
>  		*stop_tick = false;
>  
> -		if (!tick_nohz_tick_stopped() && idx > 0 &&
> -		    drv->states[idx].target_residency > delta_next_us) {
> +		if (idx > 0 && drv->states[idx].target_residency > delta_next_us) {
>  			/*
>  			 * The tick is not going to be stopped and the target
>  			 * residency of the state to be returned is not within
> 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] cpuidle: menu: Handle stopped tick more aggressively
  2018-08-10  7:34 [PATCH] cpuidle: menu: Handle stopped tick more aggressively Rafael J. Wysocki
  2018-08-10  7:57 ` [PATCH v2] " Rafael J. Wysocki
  2018-08-11 16:32 ` [PATCH] " kbuild test robot
@ 2018-08-12 22:13 ` Dan Carpenter
  2 siblings, 0 replies; 20+ messages in thread
From: Dan Carpenter @ 2018-08-12 22:13 UTC (permalink / raw)
  To: kbuild, Rafael J. Wysocki
  Cc: kbuild-all, Linux PM, Peter Zijlstra, LKML, Leo Yan, Frederic Weisbecker

Hi Rafael,

I love your patch! Perhaps something to improve:

url:    https://github.com/0day-ci/linux/commits/Rafael-J-Wysocki/cpuidle-menu-Handle-stopped-tick-more-aggressively/20180811-191914
base:   https://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git linux-next

smatch warnings:
drivers/cpuidle/governors/menu.c:374 menu_select() error: uninitialized symbol 'first_idx'.

# https://github.com/0day-ci/linux/commit/5f9f09809ebd1b4f7820c9925a0cbd417bd3a823
git remote add linux-review https://github.com/0day-ci/linux
git remote update linux-review
git checkout 5f9f09809ebd1b4f7820c9925a0cbd417bd3a823
vim +/first_idx +374 drivers/cpuidle/governors/menu.c

1f85f87d4 Arjan van de Ven              2010-05-24  276  
4f86d3a8e Len Brown                     2007-10-03  277  /**
4f86d3a8e Len Brown                     2007-10-03  278   * menu_select - selects the next idle state to enter
46bcfad7a Deepthi Dharwar               2011-10-28  279   * @drv: cpuidle driver containing state data
4f86d3a8e Len Brown                     2007-10-03  280   * @dev: the CPU
45f1ff59e Rafael J. Wysocki             2018-03-22  281   * @stop_tick: indication on whether or not to stop the tick
4f86d3a8e Len Brown                     2007-10-03  282   */
45f1ff59e Rafael J. Wysocki             2018-03-22  283  static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev,
45f1ff59e Rafael J. Wysocki             2018-03-22  284  		       bool *stop_tick)
4f86d3a8e Len Brown                     2007-10-03  285  {
229b6863b Christoph Lameter             2014-08-17  286  	struct menu_device *data = this_cpu_ptr(&menu_devices);
0fc784fb0 Rafael J. Wysocki             2018-05-30  287  	int latency_req = cpuidle_governor_latency_req(dev->cpu);
4f86d3a8e Len Brown                     2007-10-03  288  	int i;
3ed09c945 Nicholas Piggin               2017-06-26  289  	int first_idx;
3ed09c945 Nicholas Piggin               2017-06-26  290  	int idx;
96e95182e tuukka.tikkanen@linaro.org    2014-02-24  291  	unsigned int interactivity_req;
e132b9b3b Rik van Riel                  2016-03-16  292  	unsigned int expected_interval;
372ba8cb4 Mel Gorman                    2014-08-06  293  	unsigned long nr_iowaiters, cpu_load;
296bb1e51 Rafael J. Wysocki             2018-04-05  294  	ktime_t delta_next;
4f86d3a8e Len Brown                     2007-10-03  295  
672917dcc Corrado Zoccolo               2009-09-21  296  	if (data->needs_update) {
46bcfad7a Deepthi Dharwar               2011-10-28  297  		menu_update(drv, dev);
672917dcc Corrado Zoccolo               2009-09-21  298  		data->needs_update = 0;
672917dcc Corrado Zoccolo               2009-09-21  299  	}
672917dcc Corrado Zoccolo               2009-09-21  300  
69d25870f Arjan van de Ven              2009-09-21  301  	/* Special case when user has set very strict latency requirement */
45f1ff59e Rafael J. Wysocki             2018-03-22  302  	if (unlikely(latency_req == 0)) {
45f1ff59e Rafael J. Wysocki             2018-03-22  303  		*stop_tick = false;
a2bd92023 venkatesh.pallipadi@intel.com 2008-07-30  304  		return 0;
45f1ff59e Rafael J. Wysocki             2018-03-22  305  	}
a2bd92023 venkatesh.pallipadi@intel.com 2008-07-30  306  
69d25870f Arjan van de Ven              2009-09-21  307  	/* determine the expected residency time, round up */
296bb1e51 Rafael J. Wysocki             2018-04-05  308  	data->next_timer_us = ktime_to_us(tick_nohz_get_sleep_length(&delta_next));
69d25870f Arjan van de Ven              2009-09-21  309  
5f9f09809 Rafael J. Wysocki             2018-08-10  310  	/*
5f9f09809 Rafael J. Wysocki             2018-08-10  311  	 * If the tick is already stopped, the cost of possible short idle
5f9f09809 Rafael J. Wysocki             2018-08-10  312  	 * duration misprediction is much higher, because the CPU may be stuck
5f9f09809 Rafael J. Wysocki             2018-08-10  313  	 * in a shallow idle state for a long time as a result of it.  In that
5f9f09809 Rafael J. Wysocki             2018-08-10  314  	 * case say we might mispredict and use the known time till the closest
5f9f09809 Rafael J. Wysocki             2018-08-10  315  	 * timer event for the idle state selection.
5f9f09809 Rafael J. Wysocki             2018-08-10  316  	 */
5f9f09809 Rafael J. Wysocki             2018-08-10  317  	if (tick_nohz_tick_stopped()) {
5f9f09809 Rafael J. Wysocki             2018-08-10  318  		data->predicted_us = ktime_to_us(delta_next);
5f9f09809 Rafael J. Wysocki             2018-08-10  319  		goto select;
                                                                        ^^^^^^^^^^^
We hit this goto

5f9f09809 Rafael J. Wysocki             2018-08-10  320  	}
5f9f09809 Rafael J. Wysocki             2018-08-10  321  
372ba8cb4 Mel Gorman                    2014-08-06  322  	get_iowait_load(&nr_iowaiters, &cpu_load);
64b4ca5cb Mel Gorman                    2014-08-06  323  	data->bucket = which_bucket(data->next_timer_us, nr_iowaiters);
69d25870f Arjan van de Ven              2009-09-21  324  
69d25870f Arjan van de Ven              2009-09-21  325  	/*
51f245b89 Tuukka Tikkanen               2013-08-14  326  	 * Force the result of multiplication to be 64 bits even if both
51f245b89 Tuukka Tikkanen               2013-08-14  327  	 * operands are 32 bits.
51f245b89 Tuukka Tikkanen               2013-08-14  328  	 * Make sure to round up for half microseconds.
51f245b89 Tuukka Tikkanen               2013-08-14  329  	 */
ee3c86f35 Javi Merino                   2015-04-16  330  	data->predicted_us = DIV_ROUND_CLOSEST_ULL((uint64_t)data->next_timer_us *
51f245b89 Tuukka Tikkanen               2013-08-14  331  					 data->correction_factor[data->bucket],
69d25870f Arjan van de Ven              2009-09-21  332  					 RESOLUTION * DECAY);
69d25870f Arjan van de Ven              2009-09-21  333  
e132b9b3b Rik van Riel                  2016-03-16  334  	expected_interval = get_typical_interval(data);
e132b9b3b Rik van Riel                  2016-03-16  335  	expected_interval = min(expected_interval, data->next_timer_us);
96e95182e tuukka.tikkanen@linaro.org    2014-02-24  336  
dc2251bf9 Rafael J. Wysocki             2017-08-23  337  	first_idx = 0;
dc2251bf9 Rafael J. Wysocki             2017-08-23  338  	if (drv->states[0].flags & CPUIDLE_FLAG_POLLING) {
dc2251bf9 Rafael J. Wysocki             2017-08-23  339  		struct cpuidle_state *s = &drv->states[1];
0c313cb20 Rafael J. Wysocki             2016-03-20  340  		unsigned int polling_threshold;
0c313cb20 Rafael J. Wysocki             2016-03-20  341  
96e95182e tuukka.tikkanen@linaro.org    2014-02-24  342  		/*
69d25870f Arjan van de Ven              2009-09-21  343  		 * We want to default to C1 (hlt), not to busy polling
e132b9b3b Rik van Riel                  2016-03-16  344  		 * unless the timer is happening really really soon, or
e132b9b3b Rik van Riel                  2016-03-16  345  		 * C1's exit latency exceeds the user configured limit.
69d25870f Arjan van de Ven              2009-09-21  346  		 */
0c313cb20 Rafael J. Wysocki             2016-03-20  347  		polling_threshold = max_t(unsigned int, 20, s->target_residency);
0c313cb20 Rafael J. Wysocki             2016-03-20  348  		if (data->next_timer_us > polling_threshold &&
0c313cb20 Rafael J. Wysocki             2016-03-20  349  		    latency_req > s->exit_latency && !s->disabled &&
dc2251bf9 Rafael J. Wysocki             2017-08-23  350  		    !dev->states_usage[1].disable)
dc2251bf9 Rafael J. Wysocki             2017-08-23  351  			first_idx = 1;
9c4b2867e Rafael J. Wysocki             2016-01-14  352  	}
4f86d3a8e Len Brown                     2007-10-03  353  
71abbbf85 Ai Li                         2010-08-09  354  	/*
e132b9b3b Rik van Riel                  2016-03-16  355  	 * Use the lowest expected idle interval to pick the idle state.
e132b9b3b Rik van Riel                  2016-03-16  356  	 */
e132b9b3b Rik van Riel                  2016-03-16  357  	data->predicted_us = min(data->predicted_us, expected_interval);
e132b9b3b Rik van Riel                  2016-03-16  358  
e132b9b3b Rik van Riel                  2016-03-16  359  	/*
5f9f09809 Rafael J. Wysocki             2018-08-10  360  	 * Use the performance multiplier and the user-configurable latency_req
5f9f09809 Rafael J. Wysocki             2018-08-10  361  	 * to determine the maximum exit latency.
e132b9b3b Rik van Riel                  2016-03-16  362  	 */
e132b9b3b Rik van Riel                  2016-03-16  363  	interactivity_req = data->predicted_us / performance_multiplier(nr_iowaiters, cpu_load);
e132b9b3b Rik van Riel                  2016-03-16  364  	if (latency_req > interactivity_req)
e132b9b3b Rik van Riel                  2016-03-16  365  		latency_req = interactivity_req;
e132b9b3b Rik van Riel                  2016-03-16  366  
5f9f09809 Rafael J. Wysocki             2018-08-10  367  select:
45f1ff59e Rafael J. Wysocki             2018-03-22  368  	expected_interval = data->predicted_us;
e132b9b3b Rik van Riel                  2016-03-16  369  	/*
71abbbf85 Ai Li                         2010-08-09  370  	 * Find the idle state with the lowest power while satisfying
71abbbf85 Ai Li                         2010-08-09  371  	 * our constraints.
71abbbf85 Ai Li                         2010-08-09  372  	 */
3ed09c945 Nicholas Piggin               2017-06-26  373  	idx = -1;
3ed09c945 Nicholas Piggin               2017-06-26 @374  	for (i = first_idx; i < drv->state_count; i++) {
                                                                     ^^^^^^^^^^^^^^
This is uninitialized

46bcfad7a Deepthi Dharwar               2011-10-28  375  		struct cpuidle_state *s = &drv->states[i];
dc7fd275a ShuoX Liu                     2012-07-03  376  		struct cpuidle_state_usage *su = &dev->states_usage[i];
4f86d3a8e Len Brown                     2007-10-03  377  
cbc9ef028 Rafael J. Wysocki             2012-07-03  378  		if (s->disabled || su->disable)
3a53396b0 ShuoX Liu                     2012-03-28  379  			continue;
3ed09c945 Nicholas Piggin               2017-06-26  380  		if (idx == -1)
3ed09c945 Nicholas Piggin               2017-06-26  381  			idx = i; /* first enabled state */
148519120 Rafael J. Wysocki             2013-07-27  382  		if (s->target_residency > data->predicted_us)
8e37e1a2a Alex Shi                      2017-01-12  383  			break;
45f1ff59e Rafael J. Wysocki             2018-03-22  384  		if (s->exit_latency > latency_req) {
45f1ff59e Rafael J. Wysocki             2018-03-22  385  			/*
45f1ff59e Rafael J. Wysocki             2018-03-22  386  			 * If we break out of the loop for latency reasons, use
45f1ff59e Rafael J. Wysocki             2018-03-22  387  			 * the target residency of the selected state as the
45f1ff59e Rafael J. Wysocki             2018-03-22  388  			 * expected idle duration so that the tick is retained
45f1ff59e Rafael J. Wysocki             2018-03-22  389  			 * as long as that target residency is low enough.
45f1ff59e Rafael J. Wysocki             2018-03-22  390  			 */
45f1ff59e Rafael J. Wysocki             2018-03-22  391  			expected_interval = drv->states[idx].target_residency;
8e37e1a2a Alex Shi                      2017-01-12  392  			break;
45f1ff59e Rafael J. Wysocki             2018-03-22  393  		}
3ed09c945 Nicholas Piggin               2017-06-26  394  		idx = i;
71abbbf85 Ai Li                         2010-08-09  395  	}
4f86d3a8e Len Brown                     2007-10-03  396  
3ed09c945 Nicholas Piggin               2017-06-26  397  	if (idx == -1)
3ed09c945 Nicholas Piggin               2017-06-26  398  		idx = 0; /* No states enabled. Must use 0. */
3ed09c945 Nicholas Piggin               2017-06-26  399  
45f1ff59e Rafael J. Wysocki             2018-03-22  400  	/*
45f1ff59e Rafael J. Wysocki             2018-03-22  401  	 * Don't stop the tick if the selected state is a polling one or if the
45f1ff59e Rafael J. Wysocki             2018-03-22  402  	 * expected idle duration is shorter than the tick period length.
45f1ff59e Rafael J. Wysocki             2018-03-22  403  	 */
5f9f09809 Rafael J. Wysocki             2018-08-10  404  	if (((drv->states[idx].flags & CPUIDLE_FLAG_POLLING) ||
5f9f09809 Rafael J. Wysocki             2018-08-10  405  	    expected_interval < TICK_USEC) && !tick_nohz_tick_stopped()) {
296bb1e51 Rafael J. Wysocki             2018-04-05  406  		unsigned int delta_next_us = ktime_to_us(delta_next);
296bb1e51 Rafael J. Wysocki             2018-04-05  407  
45f1ff59e Rafael J. Wysocki             2018-03-22  408  		*stop_tick = false;
45f1ff59e Rafael J. Wysocki             2018-03-22  409  
5f9f09809 Rafael J. Wysocki             2018-08-10  410  		if (idx > 0 && drv->states[idx].target_residency > delta_next_us) {
296bb1e51 Rafael J. Wysocki             2018-04-05  411  			/*
296bb1e51 Rafael J. Wysocki             2018-04-05  412  			 * The tick is not going to be stopped and the target
296bb1e51 Rafael J. Wysocki             2018-04-05  413  			 * residency of the state to be returned is not within
296bb1e51 Rafael J. Wysocki             2018-04-05  414  			 * the time until the next timer event including the
296bb1e51 Rafael J. Wysocki             2018-04-05  415  			 * tick, so try to correct that.
296bb1e51 Rafael J. Wysocki             2018-04-05  416  			 */
296bb1e51 Rafael J. Wysocki             2018-04-05  417  			for (i = idx - 1; i >= 0; i--) {
296bb1e51 Rafael J. Wysocki             2018-04-05  418  			    if (drv->states[i].disabled ||
296bb1e51 Rafael J. Wysocki             2018-04-05  419  			        dev->states_usage[i].disable)
296bb1e51 Rafael J. Wysocki             2018-04-05  420  					continue;
296bb1e51 Rafael J. Wysocki             2018-04-05  421  
296bb1e51 Rafael J. Wysocki             2018-04-05  422  				idx = i;
296bb1e51 Rafael J. Wysocki             2018-04-05  423  				if (drv->states[i].target_residency <= delta_next_us)
296bb1e51 Rafael J. Wysocki             2018-04-05  424  					break;
296bb1e51 Rafael J. Wysocki             2018-04-05  425  			}
296bb1e51 Rafael J. Wysocki             2018-04-05  426  		}
296bb1e51 Rafael J. Wysocki             2018-04-05  427  	}
296bb1e51 Rafael J. Wysocki             2018-04-05  428  
3ed09c945 Nicholas Piggin               2017-06-26  429  	data->last_state_idx = idx;
3ed09c945 Nicholas Piggin               2017-06-26  430  
69d25870f Arjan van de Ven              2009-09-21  431  	return data->last_state_idx;
4f86d3a8e Len Brown                     2007-10-03  432  }
4f86d3a8e Len Brown                     2007-10-03  433  

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2] cpuidle: menu: Handle stopped tick more aggressively
  2018-08-12 13:44           ` leo.yan
@ 2018-08-13  7:58             ` Rafael J. Wysocki
  0 siblings, 0 replies; 20+ messages in thread
From: Rafael J. Wysocki @ 2018-08-13  7:58 UTC (permalink / raw)
  To: Leo Yan
  Cc: Rafael J. Wysocki, Rafael J. Wysocki, Linux PM, Peter Zijlstra,
	Linux Kernel Mailing List, Frederic Weisbecker

On Sun, Aug 12, 2018 at 3:44 PM <leo.yan@linaro.org> wrote:
>
> On Sun, Aug 12, 2018 at 12:07:45PM +0200, Rafael J. Wysocki wrote:
>
> [...]
>
> > > > > > --- linux-pm.orig/drivers/cpuidle/governors/menu.c
> > > > > > +++ linux-pm/drivers/cpuidle/governors/menu.c
> > > > > > @@ -285,9 +285,8 @@ static int menu_select(struct cpuidle_dr
> > > > > >  {
> > > > > >       struct menu_device *data = this_cpu_ptr(&menu_devices);
> > > > > >       int latency_req = cpuidle_governor_latency_req(dev->cpu);
> > > > > > -     int i;
> > > > > > -     int first_idx;
> > > > > > -     int idx;
> > > > > > +     int first_idx = 0;
> > > > > > +     int idx, i;
> > > > > >       unsigned int interactivity_req;
> > > > > >       unsigned int expected_interval;
> > > > > >       unsigned long nr_iowaiters, cpu_load;
> > > > > > @@ -307,6 +306,18 @@ static int menu_select(struct cpuidle_dr
> > > > > >       /* determine the expected residency time, round up */
> > > > > >       data->next_timer_us = ktime_to_us(tick_nohz_get_sleep_length(&delta_next));
> > > > > >
> > > > > > +     /*
> > > > > > +      * If the tick is already stopped, the cost of possible short idle
> > > > > > +      * duration misprediction is much higher, because the CPU may be stuck
> > > > > > +      * in a shallow idle state for a long time as a result of it.  In that
> > > > > > +      * case say we might mispredict and use the known time till the closest
> > > > > > +      * timer event for the idle state selection.
> > > > > > +      */
> > > > > > +     if (tick_nohz_tick_stopped()) {
> > > > > > +             data->predicted_us = ktime_to_us(delta_next);
> > > > > > +             goto select;
> > > > > > +     }
> > > > > > +
> > > > >
> > > > > This introduce two potential issues:
> > > > >
> > > > > - This will totally ignore the typical pattern in idle loop; I
> > > > >   observed on the mmc driver can trigger multiple times (> 10 times)
> > > > >   with consistent interval;
> > > >
> > > > I'm not sure what you mean by "ignore".
> > >
> > > You could see after move code from blow to this position, the typical
> > > pattern interval will not be accounted; so if in the middle of idles
> > > there have a bunch of interrupts with fix pattern, the upper code
> > > cannot detect this pattern anymore.
> >
> > I'm not really following you here.
> >
> > The part of the code skipped for tick_nohz_tick_stopped() doesn't
> > update the data at all AFAICS.  It only computes some values that
> > would be discarded later anyway, so I'm not sure what the point of
> > running that computation is.
>
> Sorry I don't explain clearly, so try to rephrase:
>
> With your patch for the tick stopped case, it directly uses tick delta
> value as prediction and goto 'select' tag.  So it skips below code
> pieces, these codes have minor improvement for typical pattern which
> can be applied in the middle of idles, for example, the mmc driver
> triggers 16 interrupts with ~1500us interval, these interrupts are all
> handled within the idle loop, so the typical pattern can detect the mmc
> interrupts pattern and it will help idle governor to select a shallower
> idle state so can avoid to break the residency.
>
> You mentioned these computed values would be discarded later, this is
> true for most cases, but it isn't always true actually.  Without your
> patch, the governor will discard the computed values only when
> 'data->predicted_us < TICK_USEC', otherwise the interval pattern is
> still be applied in the prediction.

OK, right.

I'll fix that up in v4, thanks!

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v3] cpuidle: menu: Handle stopped tick more aggressively
  2018-08-12 14:55     ` leo.yan
@ 2018-08-13  8:11       ` Rafael J. Wysocki
  2018-08-20 10:15       ` Peter Zijlstra
  1 sibling, 0 replies; 20+ messages in thread
From: Rafael J. Wysocki @ 2018-08-13  8:11 UTC (permalink / raw)
  To: Leo Yan
  Cc: Rafael J. Wysocki, Linux PM, Peter Zijlstra,
	Linux Kernel Mailing List, Frederic Weisbecker

On Sun, Aug 12, 2018 at 4:55 PM <leo.yan@linaro.org> wrote:
>
> On Fri, Aug 10, 2018 at 01:15:58PM +0200, Rafael J . Wysocki wrote:
> > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> >

[cut]

>
> I tried this patch at my side, firstly just clarify this patch is okay
> for me, but there have other underlying issues I observed the CPU
> staying shallow idle state with tick stopped, so just note at here.

Thanks for testing!

> From my understanding, the rational for this patch is we
> only use the timer event as the reliable wake up source; if there have
> one short timer event then we can select shallow state, otherwise we
> also can select deepest idle state for long expired timer.
>
> This means the idle governor needs to know the reliable info for the
> timer event, so far I observe there at least have two issues for timer
> event delta value cannot be trusted.
>
> The first one issue is caused by timer cancel, I wrote one case for
> CPU_0 starting a hrtimer with pinned mode with short expire time and
> when the CPU_0 goes to sleep this short timeout timer can let idle
> governor selects a shallow state; at the meantime another CPU_1 will
> be used to try to cancel the timer, my purpose is to cheat CPU_0 so can
> see the CPU_0 staying in shallow state for long time;  it has low
> percentage to cancel the timer successfully, but I do see seldomly the
> timer can be canceled successfully so CPU_0 will stay in idle for long
> time (I cannot explain why the timer cannot be canceled successfully
> for every time, this might be another issue?).  This case is tricky,
> but it's possible happen in drivers with timer cancel.

Yes, it can potentially happen, but I'm not worried about it.  If it
happens, that will only be occasionally and without measurable effect
on total energy usage of the system.

> Another issue is caused by spurious interrupts; if we review the
> function tick_nohz_get_sleep_length(), it uses 'ts->idle_entrytime' to
> calculate tick or timer delta, so every time when exit from interrupt
> and before enter idle governor, it needs to update
> 'ts->idle_entrytime'; but for spurious interrupts, it will not call
> irq_enter() and irq_exit() pairs, so it doesn't invoke below flows:
>
>   irq_exit()
>     `->tick_irq_exit()
>          `->tick_nohz_irq_exit()
>               `->tick_nohz_start_idle()
>
> As result, after spurious interrupts handling, the idle loop doesn't
> update for ts->idle_entrytime so the governor might read back a stale
> value.  I don't really locate this issue, but I can see the CPU is
> waken up without any interrupt handling and then directly go to
> sleep again, the menu governor selects one shallow state so the cpu
> stay in shallow state for long time.

This sounds buggy, but again, spurious interrupts are not expected to
occur too often and if they do, they are a serious enough issue by
themselves.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH v4] cpuidle: menu: Handle stopped tick more aggressively
  2018-08-10 11:15   ` [PATCH v3] " Rafael J. Wysocki
  2018-08-12 14:55     ` leo.yan
@ 2018-08-13 11:26     ` Rafael J. Wysocki
  2018-08-14 10:34       ` [PATCH v5] " Rafael J. Wysocki
  1 sibling, 1 reply; 20+ messages in thread
From: Rafael J. Wysocki @ 2018-08-13 11:26 UTC (permalink / raw)
  To: Linux PM; +Cc: Peter Zijlstra, LKML, Leo Yan, Frederic Weisbecker

From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

Commit 87c9fe6ee495 (cpuidle: menu: Avoid selecting shallow states
with stopped tick) missed the case when the target residencies of
deep idle states of CPUs are above the tick boundary which may cause
the CPU to get stuck in a shallow idle state for a long time.

Say there are two CPU idle states available: one shallow, with the
target residency much below the tick boundary and one deep, with
the target residency significantly above the tick boundary.  In
that case, if the tick has been stopped already and the expected
next timer event is relatively far in the future, the governor will
assume the idle duration to be equal to TICK_USEC and it will select
the idle state for the CPU accordingly.  However, that will cause the
shallow state to be selected even though it would have been more
energy-efficient to select the deep one.

To address this issue, modify the governor to always use the time
till the closest timer event instead of the predicted idle duration
if the latter is less than the tick period length and the tick has
been stopped already.  Also make it extend the search for a matching
idle state if the tick is stopped to avoid settling on a shallow
state if deep states with target residencies above the tick period
length are available.

In addition, make it always indicate that the tick should be stopped
if it has been stopped already for consistency.

Fixes: 87c9fe6ee495 (cpuidle: menu: Avoid selecting shallow states with stopped tick)
Reported-by: Leo Yan <leo.yan@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---
-> v2: Initialize first_idx properly in the stopped tick case.
 
v2 -> v3: Compute data->bucket before checking whether or not the tick has been
          stopped already to prevent it from becoming stale.

v3 -> v4: Allow the usual state selection to be carried out if the tick has been
          stopped in case the predicted idle duration is greater than the tick
          period length and a matching state can be found without overriding
          the prediction result.
---
 drivers/cpuidle/governors/menu.c |   33 +++++++++++++++++++++------------
 1 file changed, 21 insertions(+), 12 deletions(-)

Index: linux-pm/drivers/cpuidle/governors/menu.c
===================================================================
--- linux-pm.orig/drivers/cpuidle/governors/menu.c
+++ linux-pm/drivers/cpuidle/governors/menu.c
@@ -349,14 +349,12 @@ static int menu_select(struct cpuidle_dr
 		 * If the tick is already stopped, the cost of possible short
 		 * idle duration misprediction is much higher, because the CPU
 		 * may be stuck in a shallow idle state for a long time as a
-		 * result of it.  In that case say we might mispredict and try
-		 * to force the CPU into a state for which we would have stopped
-		 * the tick, unless a timer is going to expire really soon
-		 * anyway.
+		 * result of it.  In that case say we might mispredict and use
+		 * the known time till the closest timer event for the idle
+		 * state selection.
 		 */
 		if (data->predicted_us < TICK_USEC)
-			data->predicted_us = min_t(unsigned int, TICK_USEC,
-						   ktime_to_us(delta_next));
+			data->predicted_us = ktime_to_us(delta_next);
 	} else {
 		/*
 		 * Use the performance multiplier and the user-configurable
@@ -381,8 +379,20 @@ static int menu_select(struct cpuidle_dr
 			continue;
 		if (idx == -1)
 			idx = i; /* first enabled state */
-		if (s->target_residency > data->predicted_us)
-			break;
+		if (s->target_residency > data->predicted_us) {
+			if (!tick_nohz_tick_stopped() ||
+			    drv->states[idx].target_residency >= TICK_USEC)
+				break;
+
+			/*
+			 * If the state selected so far is shallow and the time
+			 * till the closest timer event matches this state's
+			 * target residency, select this one to avoid getting
+			 * stuck in the shallow one for too long.
+			 */
+			if (s->target_residency > ktime_to_us(delta_next))
+				break;
+		}
 		if (s->exit_latency > latency_req) {
 			/*
 			 * If we break out of the loop for latency reasons, use
@@ -403,14 +413,13 @@ static int menu_select(struct cpuidle_dr
 	 * Don't stop the tick if the selected state is a polling one or if the
 	 * expected idle duration is shorter than the tick period length.
 	 */
-	if ((drv->states[idx].flags & CPUIDLE_FLAG_POLLING) ||
-	    expected_interval < TICK_USEC) {
+	if (((drv->states[idx].flags & CPUIDLE_FLAG_POLLING) ||
+	     expected_interval < TICK_USEC) && !tick_nohz_tick_stopped()) {
 		unsigned int delta_next_us = ktime_to_us(delta_next);
 
 		*stop_tick = false;
 
-		if (!tick_nohz_tick_stopped() && idx > 0 &&
-		    drv->states[idx].target_residency > delta_next_us) {
+		if (idx > 0 && drv->states[idx].target_residency > delta_next_us) {
 			/*
 			 * The tick is not going to be stopped and the target
 			 * residency of the state to be returned is not within


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH v5] cpuidle: menu: Handle stopped tick more aggressively
  2018-08-13 11:26     ` [PATCH v4] " Rafael J. Wysocki
@ 2018-08-14 10:34       ` Rafael J. Wysocki
  2018-08-14 15:44         ` leo.yan
  2018-08-20 11:02         ` Peter Zijlstra
  0 siblings, 2 replies; 20+ messages in thread
From: Rafael J. Wysocki @ 2018-08-14 10:34 UTC (permalink / raw)
  To: Linux PM; +Cc: Peter Zijlstra, LKML, Leo Yan, Frederic Weisbecker

From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

Commit 87c9fe6ee495 (cpuidle: menu: Avoid selecting shallow states
with stopped tick) missed the case when the target residencies of
deep idle states of CPUs are above the tick boundary which may cause
the CPU to get stuck in a shallow idle state for a long time.

Say there are two CPU idle states available: one shallow, with the
target residency much below the tick boundary and one deep, with
the target residency significantly above the tick boundary.  In
that case, if the tick has been stopped already and the expected
next timer event is relatively far in the future, the governor will
assume the idle duration to be equal to TICK_USEC and it will select
the idle state for the CPU accordingly.  However, that will cause the
shallow state to be selected even though it would have been more
energy-efficient to select the deep one.

To address this issue, modify the governor to always use the time
till the closest timer event instead of the predicted idle duration
if the latter is less than the tick period length and the tick has
been stopped already.  Also make it extend the search for a matching
idle state if the tick is stopped to avoid settling on a shallow
state if deep states with target residencies above the tick period
length are available.

In addition, make it always indicate that the tick should be stopped
if it has been stopped already for consistency.

Fixes: 87c9fe6ee495 (cpuidle: menu: Avoid selecting shallow states with stopped tick)
Reported-by: Leo Yan <leo.yan@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---
-> v2: Initialize first_idx properly in the stopped tick case.
 
v2 -> v3: Compute data->bucket before checking whether or not the tick has been
          stopped already to prevent it from becoming stale.

v3 -> v4: Allow the usual state selection to be carried out if the tick has been
          stopped in case the predicted idle duration is greater than the tick
          period length and a matching state can be found without overriding
          the prediction result.

v4 -> v5: Rework code to be more straightforward.  Functionally, it should
          behave like the v4.
---
 drivers/cpuidle/governors/menu.c |   36 ++++++++++++++++++++++++------------
 1 file changed, 24 insertions(+), 12 deletions(-)

Index: linux-pm/drivers/cpuidle/governors/menu.c
===================================================================
--- linux-pm.orig/drivers/cpuidle/governors/menu.c
+++ linux-pm/drivers/cpuidle/governors/menu.c
@@ -349,14 +349,12 @@ static int menu_select(struct cpuidle_dr
 		 * If the tick is already stopped, the cost of possible short
 		 * idle duration misprediction is much higher, because the CPU
 		 * may be stuck in a shallow idle state for a long time as a
-		 * result of it.  In that case say we might mispredict and try
-		 * to force the CPU into a state for which we would have stopped
-		 * the tick, unless a timer is going to expire really soon
-		 * anyway.
+		 * result of it.  In that case say we might mispredict and use
+		 * the known time till the closest timer event for the idle
+		 * state selection.
 		 */
 		if (data->predicted_us < TICK_USEC)
-			data->predicted_us = min_t(unsigned int, TICK_USEC,
-						   ktime_to_us(delta_next));
+			data->predicted_us = ktime_to_us(delta_next);
 	} else {
 		/*
 		 * Use the performance multiplier and the user-configurable
@@ -381,8 +379,22 @@ static int menu_select(struct cpuidle_dr
 			continue;
 		if (idx == -1)
 			idx = i; /* first enabled state */
-		if (s->target_residency > data->predicted_us)
-			break;
+		if (s->target_residency > data->predicted_us) {
+			if (!tick_nohz_tick_stopped())
+				break;
+
+			/*
+			 * If the state selected so far is shallow and this
+			 * state's target residency matches the time till the
+			 * closest timer event, select this one to avoid getting
+			 * stuck in the shallow one for too long.
+			 */
+			if (drv->states[idx].target_residency < TICK_USEC &&
+			    s->target_residency <= ktime_to_us(delta_next))
+				idx = i;
+
+			goto out;
+		}
 		if (s->exit_latency > latency_req) {
 			/*
 			 * If we break out of the loop for latency reasons, use
@@ -403,14 +415,13 @@ static int menu_select(struct cpuidle_dr
 	 * Don't stop the tick if the selected state is a polling one or if the
 	 * expected idle duration is shorter than the tick period length.
 	 */
-	if ((drv->states[idx].flags & CPUIDLE_FLAG_POLLING) ||
-	    expected_interval < TICK_USEC) {
+	if (((drv->states[idx].flags & CPUIDLE_FLAG_POLLING) ||
+	     expected_interval < TICK_USEC) && !tick_nohz_tick_stopped()) {
 		unsigned int delta_next_us = ktime_to_us(delta_next);
 
 		*stop_tick = false;
 
-		if (!tick_nohz_tick_stopped() && idx > 0 &&
-		    drv->states[idx].target_residency > delta_next_us) {
+		if (idx > 0 && drv->states[idx].target_residency > delta_next_us) {
 			/*
 			 * The tick is not going to be stopped and the target
 			 * residency of the state to be returned is not within
@@ -429,6 +440,7 @@ static int menu_select(struct cpuidle_dr
 		}
 	}
 
+out:
 	data->last_state_idx = idx;
 
 	return data->last_state_idx;


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v5] cpuidle: menu: Handle stopped tick more aggressively
  2018-08-14 10:34       ` [PATCH v5] " Rafael J. Wysocki
@ 2018-08-14 15:44         ` leo.yan
  2018-08-14 17:26           ` Rafael J. Wysocki
  2018-08-20 11:04           ` Peter Zijlstra
  2018-08-20 11:02         ` Peter Zijlstra
  1 sibling, 2 replies; 20+ messages in thread
From: leo.yan @ 2018-08-14 15:44 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: Linux PM, Peter Zijlstra, LKML, Frederic Weisbecker

On Tue, Aug 14, 2018 at 12:34:40PM +0200, Rafael J . Wysocki wrote:
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> 
> Commit 87c9fe6ee495 (cpuidle: menu: Avoid selecting shallow states
> with stopped tick) missed the case when the target residencies of
> deep idle states of CPUs are above the tick boundary which may cause
> the CPU to get stuck in a shallow idle state for a long time.
> 
> Say there are two CPU idle states available: one shallow, with the
> target residency much below the tick boundary and one deep, with
> the target residency significantly above the tick boundary.  In
> that case, if the tick has been stopped already and the expected
> next timer event is relatively far in the future, the governor will
> assume the idle duration to be equal to TICK_USEC and it will select
> the idle state for the CPU accordingly.  However, that will cause the
> shallow state to be selected even though it would have been more
> energy-efficient to select the deep one.
> 
> To address this issue, modify the governor to always use the time
> till the closest timer event instead of the predicted idle duration
> if the latter is less than the tick period length and the tick has
> been stopped already.  Also make it extend the search for a matching
> idle state if the tick is stopped to avoid settling on a shallow
> state if deep states with target residencies above the tick period
> length are available.
> 
> In addition, make it always indicate that the tick should be stopped
> if it has been stopped already for consistency.
> 
> Fixes: 87c9fe6ee495 (cpuidle: menu: Avoid selecting shallow states with stopped tick)
> Reported-by: Leo Yan <leo.yan@linaro.org>
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> ---
> -> v2: Initialize first_idx properly in the stopped tick case.
>  
> v2 -> v3: Compute data->bucket before checking whether or not the tick has been
>           stopped already to prevent it from becoming stale.
> 
> v3 -> v4: Allow the usual state selection to be carried out if the tick has been
>           stopped in case the predicted idle duration is greater than the tick
>           period length and a matching state can be found without overriding
>           the prediction result.
> 
> v4 -> v5: Rework code to be more straightforward.  Functionally, it should
>           behave like the v4.
> ---
>  drivers/cpuidle/governors/menu.c |   36 ++++++++++++++++++++++++------------
>  1 file changed, 24 insertions(+), 12 deletions(-)
> 
> Index: linux-pm/drivers/cpuidle/governors/menu.c
> ===================================================================
> --- linux-pm.orig/drivers/cpuidle/governors/menu.c
> +++ linux-pm/drivers/cpuidle/governors/menu.c
> @@ -349,14 +349,12 @@ static int menu_select(struct cpuidle_dr
>  		 * If the tick is already stopped, the cost of possible short
>  		 * idle duration misprediction is much higher, because the CPU
>  		 * may be stuck in a shallow idle state for a long time as a
> -		 * result of it.  In that case say we might mispredict and try
> -		 * to force the CPU into a state for which we would have stopped
> -		 * the tick, unless a timer is going to expire really soon
> -		 * anyway.
> +		 * result of it.  In that case say we might mispredict and use
> +		 * the known time till the closest timer event for the idle
> +		 * state selection.
>  		 */
>  		if (data->predicted_us < TICK_USEC)
> -			data->predicted_us = min_t(unsigned int, TICK_USEC,
> -						   ktime_to_us(delta_next));
> +			data->predicted_us = ktime_to_us(delta_next);
>  	} else {
>  		/*
>  		 * Use the performance multiplier and the user-configurable
> @@ -381,8 +379,22 @@ static int menu_select(struct cpuidle_dr
>  			continue;
>  		if (idx == -1)
>  			idx = i; /* first enabled state */
> -		if (s->target_residency > data->predicted_us)
> -			break;
> +		if (s->target_residency > data->predicted_us) {
> +			if (!tick_nohz_tick_stopped())
> +				break;
> +
> +			/*
> +			 * If the state selected so far is shallow and this
> +			 * state's target residency matches the time till the
> +			 * closest timer event, select this one to avoid getting
> +			 * stuck in the shallow one for too long.
> +			 */
> +			if (drv->states[idx].target_residency < TICK_USEC &&
> +			    s->target_residency <= ktime_to_us(delta_next))
> +				idx = i;

I took some time to understand v4 and v5; now I understand you want to
prompt to deeper idle state if the prediction falls in the range
between [TICK_USEC..max_target_residency).  This seems to me is a
huristics method but not general enough and IMHO this is more difficult
for code readable.

I agree this patch can resolve the issue you mentioned in the commit
log, but this patch will be fragile for below case, so it will select
state1 but not state2, how about you think for this case?

Idle_state0::
target_residency   TICK_USEC   Prediction
    |                 |       /
    V                 V      /
-----------------------------------------------------> Time length
                                ^                 ^
                                |                 |
                            Idle_state1:       Idle_state2:
                            target_residency   target_residency

> +			goto out;
> +		}
>  		if (s->exit_latency > latency_req) {
>  			/*
>  			 * If we break out of the loop for latency reasons, use
> @@ -403,14 +415,13 @@ static int menu_select(struct cpuidle_dr
>  	 * Don't stop the tick if the selected state is a polling one or if the
>  	 * expected idle duration is shorter than the tick period length.
>  	 */
> -	if ((drv->states[idx].flags & CPUIDLE_FLAG_POLLING) ||
> -	    expected_interval < TICK_USEC) {
> +	if (((drv->states[idx].flags & CPUIDLE_FLAG_POLLING) ||
> +	     expected_interval < TICK_USEC) && !tick_nohz_tick_stopped()) {
>  		unsigned int delta_next_us = ktime_to_us(delta_next);
>  
>  		*stop_tick = false;
>  
> -		if (!tick_nohz_tick_stopped() && idx > 0 &&
> -		    drv->states[idx].target_residency > delta_next_us) {
> +		if (idx > 0 && drv->states[idx].target_residency > delta_next_us) {
>  			/*
>  			 * The tick is not going to be stopped and the target
>  			 * residency of the state to be returned is not within
> @@ -429,6 +440,7 @@ static int menu_select(struct cpuidle_dr
>  		}
>  	}
>  
> +out:
>  	data->last_state_idx = idx;
>  
>  	return data->last_state_idx;
> 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v5] cpuidle: menu: Handle stopped tick more aggressively
  2018-08-14 15:44         ` leo.yan
@ 2018-08-14 17:26           ` Rafael J. Wysocki
  2018-08-20 11:04           ` Peter Zijlstra
  1 sibling, 0 replies; 20+ messages in thread
From: Rafael J. Wysocki @ 2018-08-14 17:26 UTC (permalink / raw)
  To: Leo Yan
  Cc: Rafael J. Wysocki, Linux PM, Peter Zijlstra,
	Linux Kernel Mailing List, Frederic Weisbecker

On Tue, Aug 14, 2018 at 5:44 PM <leo.yan@linaro.org> wrote:
>
> On Tue, Aug 14, 2018 at 12:34:40PM +0200, Rafael J . Wysocki wrote:
> > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> >
> > Commit 87c9fe6ee495 (cpuidle: menu: Avoid selecting shallow states
> > with stopped tick) missed the case when the target residencies of
> > deep idle states of CPUs are above the tick boundary which may cause
> > the CPU to get stuck in a shallow idle state for a long time.
> >
> > Say there are two CPU idle states available: one shallow, with the
> > target residency much below the tick boundary and one deep, with
> > the target residency significantly above the tick boundary.  In
> > that case, if the tick has been stopped already and the expected
> > next timer event is relatively far in the future, the governor will
> > assume the idle duration to be equal to TICK_USEC and it will select
> > the idle state for the CPU accordingly.  However, that will cause the
> > shallow state to be selected even though it would have been more
> > energy-efficient to select the deep one.
> >
> > To address this issue, modify the governor to always use the time
> > till the closest timer event instead of the predicted idle duration
> > if the latter is less than the tick period length and the tick has
> > been stopped already.  Also make it extend the search for a matching
> > idle state if the tick is stopped to avoid settling on a shallow
> > state if deep states with target residencies above the tick period
> > length are available.
> >
> > In addition, make it always indicate that the tick should be stopped
> > if it has been stopped already for consistency.
> >
> > Fixes: 87c9fe6ee495 (cpuidle: menu: Avoid selecting shallow states with stopped tick)
> > Reported-by: Leo Yan <leo.yan@linaro.org>
> > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > ---
> > -> v2: Initialize first_idx properly in the stopped tick case.
> >
> > v2 -> v3: Compute data->bucket before checking whether or not the tick has been
> >           stopped already to prevent it from becoming stale.
> >
> > v3 -> v4: Allow the usual state selection to be carried out if the tick has been
> >           stopped in case the predicted idle duration is greater than the tick
> >           period length and a matching state can be found without overriding
> >           the prediction result.
> >
> > v4 -> v5: Rework code to be more straightforward.  Functionally, it should
> >           behave like the v4.
> > ---
> >  drivers/cpuidle/governors/menu.c |   36 ++++++++++++++++++++++++------------
> >  1 file changed, 24 insertions(+), 12 deletions(-)
> >
> > Index: linux-pm/drivers/cpuidle/governors/menu.c
> > ===================================================================
> > --- linux-pm.orig/drivers/cpuidle/governors/menu.c
> > +++ linux-pm/drivers/cpuidle/governors/menu.c
> > @@ -349,14 +349,12 @@ static int menu_select(struct cpuidle_dr
> >                * If the tick is already stopped, the cost of possible short
> >                * idle duration misprediction is much higher, because the CPU
> >                * may be stuck in a shallow idle state for a long time as a
> > -              * result of it.  In that case say we might mispredict and try
> > -              * to force the CPU into a state for which we would have stopped
> > -              * the tick, unless a timer is going to expire really soon
> > -              * anyway.
> > +              * result of it.  In that case say we might mispredict and use
> > +              * the known time till the closest timer event for the idle
> > +              * state selection.
> >                */
> >               if (data->predicted_us < TICK_USEC)
> > -                     data->predicted_us = min_t(unsigned int, TICK_USEC,
> > -                                                ktime_to_us(delta_next));
> > +                     data->predicted_us = ktime_to_us(delta_next);
> >       } else {
> >               /*
> >                * Use the performance multiplier and the user-configurable
> > @@ -381,8 +379,22 @@ static int menu_select(struct cpuidle_dr
> >                       continue;
> >               if (idx == -1)
> >                       idx = i; /* first enabled state */
> > -             if (s->target_residency > data->predicted_us)
> > -                     break;
> > +             if (s->target_residency > data->predicted_us) {
> > +                     if (!tick_nohz_tick_stopped())
> > +                             break;
> > +
> > +                     /*
> > +                      * If the state selected so far is shallow and this
> > +                      * state's target residency matches the time till the
> > +                      * closest timer event, select this one to avoid getting
> > +                      * stuck in the shallow one for too long.
> > +                      */
> > +                     if (drv->states[idx].target_residency < TICK_USEC &&
> > +                         s->target_residency <= ktime_to_us(delta_next))
> > +                             idx = i;
>
> I took some time to understand v4 and v5; now I understand you want to
> prompt to deeper idle state if the prediction falls in the range
> between [TICK_USEC..max_target_residency).  This seems to me is a
> huristics method but not general enough and IMHO this is more difficult
> for code readable.
>
> I agree this patch can resolve the issue you mentioned in the commit
> log, but this patch will be fragile for below case, so it will select
> state1 but not state2, how about you think for this case?

It is OK to select state1 in this case IMO, because if the tick had
not been stopped already, it would have been stopped after selecting
state1 anyway (it is not a "polling" state and expected_interval is
not below TICK_USEC).

> Idle_state0::
> target_residency   TICK_USEC   Prediction
>     |                 |       /
>     V                 V      /
> -----------------------------------------------------> Time length
>                                 ^                 ^
>                                 |                 |
>                             Idle_state1:       Idle_state2:
>                             target_residency   target_residency
>

Basically, the idea is to avoid getting stuck in the idle states for
which the tick would not have been stopped, had it been still running.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v3] cpuidle: menu: Handle stopped tick more aggressively
  2018-08-12 14:55     ` leo.yan
  2018-08-13  8:11       ` Rafael J. Wysocki
@ 2018-08-20 10:15       ` Peter Zijlstra
  1 sibling, 0 replies; 20+ messages in thread
From: Peter Zijlstra @ 2018-08-20 10:15 UTC (permalink / raw)
  To: leo.yan
  Cc: Rafael J. Wysocki, Linux PM, LKML, Frederic Weisbecker, Thomas Gleixner

On Sun, Aug 12, 2018 at 10:55:15PM +0800, leo.yan@linaro.org wrote:
> The first one issue is caused by timer cancel, I wrote one case for
> CPU_0 starting a hrtimer with pinned mode with short expire time and
> when the CPU_0 goes to sleep this short timeout timer can let idle
> governor selects a shallow state; at the meantime another CPU_1 will
> be used to try to cancel the timer, my purpose is to cheat CPU_0 so can
> see the CPU_0 staying in shallow state for long time;  it has low
> percentage to cancel the timer successfully, but I do see seldomly the
> timer can be canceled successfully so CPU_0 will stay in idle for long
> time (I cannot explain why the timer cannot be canceled successfully
> for every time, this might be another issue?).  This case is tricky,
> but it's possible happen in drivers with timer cancel.

So this is really difficuly to make hapen I think; you first need the
CPU to go deep idle, such that it disabled the tick. Then you have to
start the hrtimer there (using an IPI I suppose) which will then force
the governor to pick a shallow idle state, and then you have to cancel
the timer before it gets triggered.

And then, if the CPU stays perfectly idle, it will be stuck in that
shallow state... forever more.

_However_ IIRC when we (remotely) cancel an hrtimer, we do not in fact
reprogram the timer hardware. So the timer _will_ trigger.
hrtimer_interrupt() will observe nothing to do and reprogram the
hardware for the next timer (if there is one).

This should be enough to cycle through the idle loop and re-select an
idle state and avoid this whole problem.

If that is not happening, then something is busted and we need to figure
out what.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v5] cpuidle: menu: Handle stopped tick more aggressively
  2018-08-14 10:34       ` [PATCH v5] " Rafael J. Wysocki
  2018-08-14 15:44         ` leo.yan
@ 2018-08-20 11:02         ` Peter Zijlstra
  1 sibling, 0 replies; 20+ messages in thread
From: Peter Zijlstra @ 2018-08-20 11:02 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: Linux PM, LKML, Leo Yan, Frederic Weisbecker

On Tue, Aug 14, 2018 at 12:34:40PM +0200, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> 
> Commit 87c9fe6ee495 (cpuidle: menu: Avoid selecting shallow states
> with stopped tick) missed the case when the target residencies of
> deep idle states of CPUs are above the tick boundary which may cause
> the CPU to get stuck in a shallow idle state for a long time.
> 
> Say there are two CPU idle states available: one shallow, with the
> target residency much below the tick boundary and one deep, with
> the target residency significantly above the tick boundary.  In
> that case, if the tick has been stopped already and the expected
> next timer event is relatively far in the future, the governor will
> assume the idle duration to be equal to TICK_USEC and it will select
> the idle state for the CPU accordingly.  However, that will cause the
> shallow state to be selected even though it would have been more
> energy-efficient to select the deep one.
> 
> To address this issue, modify the governor to always use the time
> till the closest timer event instead of the predicted idle duration
> if the latter is less than the tick period length and the tick has
> been stopped already.  Also make it extend the search for a matching
> idle state if the tick is stopped to avoid settling on a shallow
> state if deep states with target residencies above the tick period
> length are available.
> 
> In addition, make it always indicate that the tick should be stopped
> if it has been stopped already for consistency.
> 
> Fixes: 87c9fe6ee495 (cpuidle: menu: Avoid selecting shallow states with stopped tick)
> Reported-by: Leo Yan <leo.yan@linaro.org>
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

Seems OK,

Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v5] cpuidle: menu: Handle stopped tick more aggressively
  2018-08-14 15:44         ` leo.yan
  2018-08-14 17:26           ` Rafael J. Wysocki
@ 2018-08-20 11:04           ` Peter Zijlstra
  1 sibling, 0 replies; 20+ messages in thread
From: Peter Zijlstra @ 2018-08-20 11:04 UTC (permalink / raw)
  To: leo.yan; +Cc: Rafael J. Wysocki, Linux PM, LKML, Frederic Weisbecker

On Tue, Aug 14, 2018 at 11:44:41PM +0800, leo.yan@linaro.org wrote:
> I agree this patch can resolve the issue you mentioned in the commit
> log, but this patch will be fragile for below case, so it will select
> state1 but not state2, how about you think for this case?
> 
> Idle_state0::
> target_residency   TICK_USEC   Prediction
>     |                 |       /
>     V                 V      /
> -----------------------------------------------------> Time length
>                                 ^                 ^
>                                 |                 |
>                             Idle_state1:       Idle_state2:
>                             target_residency   target_residency

If someone can demonstate this is an actual problem; then we need to fix
it I suppose. But it would be very good to have a real world
need/reproducer for it first.

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2018-08-20 11:04 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-08-10  7:34 [PATCH] cpuidle: menu: Handle stopped tick more aggressively Rafael J. Wysocki
2018-08-10  7:57 ` [PATCH v2] " Rafael J. Wysocki
2018-08-10  9:20   ` leo.yan
2018-08-10 11:04     ` Rafael J. Wysocki
2018-08-10 12:31       ` leo.yan
2018-08-12 10:07         ` Rafael J. Wysocki
2018-08-12 13:44           ` leo.yan
2018-08-13  7:58             ` Rafael J. Wysocki
2018-08-10 11:15   ` [PATCH v3] " Rafael J. Wysocki
2018-08-12 14:55     ` leo.yan
2018-08-13  8:11       ` Rafael J. Wysocki
2018-08-20 10:15       ` Peter Zijlstra
2018-08-13 11:26     ` [PATCH v4] " Rafael J. Wysocki
2018-08-14 10:34       ` [PATCH v5] " Rafael J. Wysocki
2018-08-14 15:44         ` leo.yan
2018-08-14 17:26           ` Rafael J. Wysocki
2018-08-20 11:04           ` Peter Zijlstra
2018-08-20 11:02         ` Peter Zijlstra
2018-08-11 16:32 ` [PATCH] " kbuild test robot
2018-08-12 22:13 ` Dan Carpenter

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).