linux-kernel.vger.kernel.org archive mirror
* [PATCH] CPUIdle: Reevaluate C-states under CPU load to favor deeper C-states
@ 2011-09-19 23:35 Kevin Hilman
  2011-10-19 13:11 ` Kevin Hilman
  2011-11-04 21:46 ` Kevin Hilman
  0 siblings, 2 replies; 7+ messages in thread
From: Kevin Hilman @ 2011-09-19 23:35 UTC (permalink / raw)
  To: linux-kernel, Arjan van de Ven
  Cc: linux-arm-kernel, linux-omap, linux-pm, Nicole Chalhoub,
	Nicole Chalhoub, Kevin Hilman

From: Nicole Chalhoub <n-chalhoub@ti.com>

While there is CPU load, program a C-state specific one-shot timer in
order to give CPUidle another opportunity to pick a deeper C-state
instead of spending potentially long idle times in a shallow C-state.

Long winded version:
When going idle with a high load average, the CPUidle menu governor
picks a shallow C-state, since one of its guiding principles is "The
busier the system, the less impact of C-states is acceptable" (taken
from cpuidle/governors/menu.c).
That makes perfect sense.
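To make the "busier system, shallower C-state" rule concrete, here is a user-space sketch of the arithmetic that appears in the menu.c context of the patch. get_loadavg() converts the fixed-point load into a value scaled by 10. performance_multiplier() turns that into a multiplier, and a state is rejected once exit_latency * multiplier exceeds the predicted idle time. FSHIFT = 11 matches the kernel's fixed-point load format. The helper names here are hypothetical, and the sketch only reproduces the load and iowait terms.

```c
#include <assert.h>

/* Fixed-point load format (FSHIFT = 11), as used by LOAD_INT/LOAD_FRAC. */
#define FSHIFT       11
#define FIXED_1      (1UL << FSHIFT)
#define LOAD_INT(x)  ((x) >> FSHIFT)
#define LOAD_FRAC(x) LOAD_INT(((x) & (FIXED_1 - 1)) * 100)

/* Load scaled by 10, the way get_loadavg() computes it (1.5 -> 15). */
static int loadavg_x10(unsigned long load_fixed)
{
	return LOAD_INT(load_fixed) * 10 + LOAD_FRAC(load_fixed) / 10;
}

/* Shape of performance_multiplier(): 1 + 2 * load + 10 * iowait tasks. */
static int perf_multiplier(unsigned long load_fixed, int nr_iowait)
{
	assert(nr_iowait >= 0);
	return 1 + 2 * loadavg_x10(load_fixed) + 10 * nr_iowait;
}

/* menu_select() stops at the first state where
 * exit_latency * multiplier > predicted_us. */
static int state_allowed(unsigned int exit_latency_us, int mult,
			 unsigned int predicted_us)
{
	return exit_latency_us * mult <= predicted_us;
}
```

With a load of 1.5 (FIXED_1 + FIXED_1 / 2), the multiplier is 31. For a predicted 20 ms idle period, a state with a 1 ms exit latency is therefore rejected (31 ms > 20 ms), while on an idle system (multiplier 1) the same state is allowed.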

However, this misses power-saving opportunities for bursty workloads
with long idle times (e.g. MP3 playback). With such a workload, the
load average leads CPUidle to pick a shallow C-state, and because we
also go tickless, that shallow C-state is kept for the whole idle
period. If the idle period is long, a deeper C-state would have given
better power savings.
This patch gives CPUidle another chance at a deeper C-state by
programming a one-shot timer with a C-state-specific timeout, so that
when the timer fires the governor can re-evaluate the load and pick a
deeper C-state.
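Below is a minimal user-space sketch of the two-pass behaviour described above. It assumes a hypothetical three-state table and a simplified governor that only applies the exit_latency * multiplier check. The real menu_select() also checks other limits, such as the PM QoS latency. The names struct cstate, table and select_state() are illustrative, not kernel APIs. The timer_expired argument models dev->hrtimer_expired: once the timer has fired, the stale load value is treated as 0.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical C-state table: exit latency and re-evaluation timeout (us). */
struct cstate {
	unsigned int exit_latency_us;
	unsigned int hrtimer_timeout_us; /* 0 = never arm the timer */
};

static const struct cstate table[] = {
	{ 0, 0 },       /* C1 */
	{ 100, 0 },     /* C2 */
	{ 1000, 5000 }, /* C3: reconsider after 5 ms if rejected under load */
};
#define NSTATES (sizeof(table) / sizeof(table[0]))

struct decision {
	unsigned int state;      /* index of the chosen state */
	bool arm_timer;          /* would the governor arm the one-shot timer? */
	unsigned int timeout_us; /* timer timeout if armed */
};

/* One governor pass; the multiplier keeps only the load term. */
static struct decision select_state(unsigned int predicted_us, int load_x10,
				    bool timer_expired)
{
	struct decision d = { 0, false, 0 };
	int load = timer_expired ? 0 : load_x10;
	int mult = 1 + 2 * load;

	assert(load >= 0);
	for (unsigned int i = 1; i < NSTATES; i++) {
		if (table[i].exit_latency_us * mult > predicted_us) {
			/* Rejected because of load: schedule a re-evaluation. */
			if (table[i].hrtimer_timeout_us != 0 && load != 0) {
				d.arm_timer = true;
				d.timeout_us = table[i].hrtimer_timeout_us;
			}
			break;
		}
		d.state = i;
	}
	return d;
}
```

For example, select_state(20000, 15, false) picks C2 and arms a 5 ms timer. When the timer fires, select_state(15000, 15, true) is called with the remaining predicted idle time, ignores the stale load, and picks C3.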

Adding this timer for C-state re-evaluation improved the load estimation
on our ARM/OMAP4 platform and increased the time spent in deep C-states
(~50% of idle time in C-states deeper than C1). A saving of ~10 mA,
measured at the battery, was observed during MP3 playback on the
OMAP4/Blaze board.

Signed-off-by: Nicole Chalhoub <n-chalhoub@ti.com>
Signed-off-by: Kevin Hilman <khilman@ti.com>
---
 drivers/cpuidle/cpuidle.c        |   28 +++++++++++++++++++++++++-
 drivers/cpuidle/governors/menu.c |   39 ++++++++++++++++++++++++++++++++-----
 include/linux/cpuidle.h          |    4 +++
 3 files changed, 63 insertions(+), 8 deletions(-)

diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c
index 1994885..4b1ac0c 100644
--- a/drivers/cpuidle/cpuidle.c
+++ b/drivers/cpuidle/cpuidle.c
@@ -92,13 +92,33 @@ static void cpuidle_idle_call(void)
 	target_state->time += (unsigned long long)dev->last_residency;
 	target_state->usage++;
 
-	/* give the governor an opportunity to reflect on the outcome */
-	if (cpuidle_curr_governor->reflect)
+	hrtimer_cancel(&dev->cstate_timer);
+
+	/*
+	 * Give the governor an opportunity to reflect on the outcome.
+	 * Wakeups caused by the hrtimer are not taken into account:
+	 * they should not affect the predicted idle time.
+	 */
+	if ((!dev->hrtimer_expired) && cpuidle_curr_governor->reflect)
 		cpuidle_curr_governor->reflect(dev);
 	trace_power_end(0);
 }
 
 /**
+ * cstate_reassessment_timer - interrupt handler of the cstate hrtimer
+ * @handle:	the expired hrtimer
+ */
+static enum hrtimer_restart cstate_reassessment_timer(struct hrtimer *handle)
+{
+	struct cpuidle_device *data =
+		container_of(handle, struct cpuidle_device, cstate_timer);
+
+	data->hrtimer_expired = 1;
+
+	return HRTIMER_NORESTART;
+}
+
+/**
  * cpuidle_install_idle_handler - installs the cpuidle idle loop handler
  */
 void cpuidle_install_idle_handler(void)
@@ -185,6 +205,10 @@ int cpuidle_enable_device(struct cpuidle_device *dev)
 
 	dev->enabled = 1;
 
+	dev->hrtimer_expired = 0;
+	hrtimer_init(&dev->cstate_timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
+	dev->cstate_timer.function = cstate_reassessment_timer;
+
 	enabled_devices++;
 	return 0;
 
diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c
index 1b12870..fd54584 100644
--- a/drivers/cpuidle/governors/menu.c
+++ b/drivers/cpuidle/governors/menu.c
@@ -125,10 +125,21 @@ struct menu_device {
 #define LOAD_INT(x) ((x) >> FSHIFT)
 #define LOAD_FRAC(x) LOAD_INT(((x) & (FIXED_1-1)) * 100)
 
-static int get_loadavg(void)
+static int get_loadavg(struct cpuidle_device *dev)
 {
-	unsigned long this = this_cpu_load();
+	unsigned long this;
 
+	/*
+	 * this_cpu_load() returns the value of rq->load.weight
+	 * at the previous scheduler tick and not the current value.
+	 * If the timer expired, we are idle and there are no
+	 * runnable processes left in the current queue, so
+	 * return the current value of rq->load.weight, which is 0.
+	 */
+	if (dev->hrtimer_expired == 1)
+		return 0;
+	else
+		this = this_cpu_load();
 
 	return LOAD_INT(this) * 10 + LOAD_FRAC(this) / 10;
 }
@@ -166,13 +177,13 @@ static inline int which_bucket(unsigned int duration)
  * to be, the higher this multiplier, and thus the higher
  * the barrier to go to an expensive C state.
  */
-static inline int performance_multiplier(void)
+static inline int performance_multiplier(struct cpuidle_device *dev)
 {
 	int mult = 1;
 
 	/* for higher loadavg, we are more reluctant */
 
-	mult += 2 * get_loadavg();
+	mult += 2 * get_loadavg(dev);
 
 	/* for IO wait tasks (per cpu!) we add 5x each */
 	mult += 10 * nr_iowait_cpu(smp_processor_id());
@@ -236,6 +247,7 @@ static int menu_select(struct cpuidle_device *dev)
 	int latency_req = pm_qos_request(PM_QOS_CPU_DMA_LATENCY);
 	int i;
 	int multiplier;
+	ktime_t timeout;
 
 	if (data->needs_update) {
 		menu_update(dev);
@@ -256,7 +268,7 @@ static int menu_select(struct cpuidle_device *dev)
 
 	data->bucket = which_bucket(data->expected_us);
 
-	multiplier = performance_multiplier();
+	multiplier = performance_multiplier(dev);
 
 	/*
 	 * if the correction factor is 0 (eg first time init or cpu hotplug
@@ -287,12 +299,27 @@ static int menu_select(struct cpuidle_device *dev)
 			break;
 		if (s->exit_latency > latency_req)
 			break;
-		if (s->exit_latency * multiplier > data->predicted_us)
+		if (s->exit_latency * multiplier > data->predicted_us) {
+			/*
+			 * Could not enter the next C-state because of a high
+			 * load. Set a timer in order to check the load again
+			 * after the timeout expires and re-evaluate the C-state.
+			 */
+			if (s->hrtimer_timeout != 0 && get_loadavg(dev)) {
+				timeout =
+				       ktime_set(0,
+					   s->hrtimer_timeout * NSEC_PER_USEC);
+				hrtimer_start(&dev->cstate_timer, timeout,
+					   HRTIMER_MODE_REL);
+			}
 			break;
+		}
 		data->exit_us = s->exit_latency;
 		data->last_state_idx = i;
 	}
 
+	/* Reset hrtimer_expired which is set when the hrtimer fires */
+	dev->hrtimer_expired = 0;
 	return data->last_state_idx;
 }
 
diff --git a/include/linux/cpuidle.h b/include/linux/cpuidle.h
index 55215cc..8d11b52 100644
--- a/include/linux/cpuidle.h
+++ b/include/linux/cpuidle.h
@@ -16,6 +16,7 @@
 #include <linux/module.h>
 #include <linux/kobject.h>
 #include <linux/completion.h>
+#include <linux/hrtimer.h>
 
 #define CPUIDLE_STATE_MAX	8
 #define CPUIDLE_NAME_LEN	16
@@ -37,6 +38,7 @@ struct cpuidle_state {
 	unsigned int	exit_latency; /* in US */
 	unsigned int	power_usage; /* in mW */
 	unsigned int	target_residency; /* in US */
+	unsigned int	hrtimer_timeout; /* in US */
 
 	unsigned long long	usage;
 	unsigned long long	time; /* in US */
@@ -97,6 +99,8 @@ struct cpuidle_device {
 	struct completion	kobj_unregister;
 	void			*governor_data;
 	struct cpuidle_state	*safe_state;
+	struct hrtimer          cstate_timer;
+	unsigned int            hrtimer_expired;
 };
 
 DECLARE_PER_CPU(struct cpuidle_device *, cpuidle_devices);
-- 
1.7.0.4
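One detail in the menu_select() hunk may be worth double-checking on 32-bit ARM. Assuming NSEC_PER_USEC is the long constant 1000L from include/linux/time.h, the expression s->hrtimer_timeout * NSEC_PER_USEC is evaluated as unsigned long. On targets where long is 32 bits, it therefore wraps for timeouts above 4294967 us (about 4.29 s). C-state timeouts are presumably much shorter, so this is unlikely to matter in practice.

The user-space sketch below models the wrapped 32-bit product. It also shows a split into whole seconds plus leftover nanoseconds, which matches the (secs, nsecs) shape of ktime_set() and keeps nsecs below one second. struct sec_nsec and both helpers are hypothetical stand-ins, not kernel code.

```c
#include <assert.h>
#include <stdint.h>

#define USEC_PER_SEC      1000000U
#define NSEC_PER_USEC_U64 1000ULL

/* Hypothetical stand-in for the (secs, nsecs) arguments of ktime_set(). */
struct sec_nsec {
	int64_t sec;
	uint32_t nsec; /* always below one second */
};

/* Split a microsecond timeout into whole seconds and leftover
 * nanoseconds, using 64-bit arithmetic so nothing can wrap. */
static struct sec_nsec us_to_sec_nsec(unsigned int timeout_us)
{
	struct sec_nsec t;

	t.sec = timeout_us / USEC_PER_SEC;
	t.nsec = (uint32_t)((timeout_us % USEC_PER_SEC) * NSEC_PER_USEC_U64);
	assert(t.nsec < 1000000000U);
	return t;
}

/* Models timeout * NSEC_PER_USEC where unsigned long is 32 bits:
 * the product is reduced modulo 2^32. */
static uint32_t patch_nsecs_32bit(uint32_t timeout_us)
{
	return timeout_us * UINT32_C(1000);
}
```

For a 5 ms timeout, both forms agree (5000000 ns). For a 5 s timeout, the 32-bit product wraps to 705032704 ns, while the split form gives 5 s and 0 ns.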



* Re: [PATCH] CPUIdle: Reevaluate C-states under CPU load to favor deeper C-states
  2011-09-19 23:35 [PATCH] CPUIdle: Reevaluate C-states under CPU load to favor deeper C-states Kevin Hilman
@ 2011-10-19 13:11 ` Kevin Hilman
  2011-11-04 21:46 ` Kevin Hilman
  1 sibling, 0 replies; 7+ messages in thread
From: Kevin Hilman @ 2011-10-19 13:11 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: linux-kernel, linux-arm-kernel, linux-omap, linux-pm, Nicole Chalhoub

Hi Arjan,

Kevin Hilman <khilman@ti.com> writes:

> From: Nicole Chalhoub <n-chalhoub@ti.com>
>
> While there is CPU load, program a C-state specific one-shot timer in
> order to give CPUidle another opportunity to pick a deeper C-state
> instead of spending potentially long idle times in a shallow C-state.

Any comments on this?

This implements an idea you proposed in response to our previous
attempt[1] at something similar.

Thanks,

Kevin

[1] http://lkml.org/lkml/2011/4/7/155


* Re: [PATCH] CPUIdle: Reevaluate C-states under CPU load to favor deeper C-states
  2011-09-19 23:35 [PATCH] CPUIdle: Reevaluate C-states under CPU load to favor deeper C-states Kevin Hilman
  2011-10-19 13:11 ` Kevin Hilman
@ 2011-11-04 21:46 ` Kevin Hilman
  2011-11-09 11:13   ` Deepthi Dharwar
  1 sibling, 1 reply; 7+ messages in thread
From: Kevin Hilman @ 2011-11-04 21:46 UTC (permalink / raw)
  To: linux-kernel
  Cc: Arjan van de Ven, linux-arm-kernel, linux-omap, linux-pm,
	Nicole Chalhoub

ping v2

Kevin Hilman <khilman@ti.com> writes:

> From: Nicole Chalhoub <n-chalhoub@ti.com>
>
> While there is CPU load, program a C-state specific one-shot timer in
> order to give CPUidle another opportunity to pick a deeper C-state
> instead of spending potentially long idle times in a shallow C-state.


* Re: [PATCH] CPUIdle: Reevaluate C-states under CPU load to favor deeper C-states
  2011-11-04 21:46 ` Kevin Hilman
@ 2011-11-09 11:13   ` Deepthi Dharwar
  2011-11-09 18:06     ` Chalhoub, Nicole
  0 siblings, 1 reply; 7+ messages in thread
From: Deepthi Dharwar @ 2011-11-09 11:13 UTC (permalink / raw)
  To: Kevin Hilman
  Cc: linux-kernel, Arjan van de Ven, linux-arm-kernel, linux-omap,
	linux-pm, Nicole Chalhoub

On Saturday 05 November 2011 03:16 AM, Kevin Hilman wrote:
> ping v2
> 
> Kevin Hilman <khilman@ti.com> writes:
> 
>> From: Nicole Chalhoub <n-chalhoub@ti.com>
>>
>> While there is CPU load, program a C-state specific one-shot timer in
>> order to give CPUidle another opportunity to pick a deeper C-state
>> instead of spending potentially long idle times in a shallow C-state.

Setting a timer whenever we enter a C-state shallower than the deepest possible one, so that
when it fires we can re-evaluate and move into progressively deeper C-states, is a good
feature to have for power savings.
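A userspace model of the selection loop in the quoted menu_select() hunk may make the mechanism easier to follow. It is a sketch under stated assumptions, not kernel code: `struct model_state`, `struct model_choice`, `model_select()` and the `load_busy` flag are hypothetical stand-ins for `struct cpuidle_state`, the governor, and `get_loadavg(dev)`, and the timer is only recorded rather than started with hrtimer_start().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for struct cpuidle_state. */
struct model_state {
	unsigned int exit_latency;    /* in us */
	unsigned int hrtimer_timeout; /* in us; 0 = never arm the timer */
};

/* Outcome of one governor pass. */
struct model_choice {
	size_t idx;          /* index of the chosen C-state */
	bool timer_armed;    /* would hrtimer_start() have been called? */
	uint64_t timeout_ns; /* relative timeout that would be programmed */
};

/*
 * Models the loop in the patched menu_select(): walk the states from
 * shallow to deep and stop at the first one whose exit latency, scaled
 * by the load-dependent multiplier, exceeds the predicted idle time.
 * If that rejected state has a non-zero timeout and the system is busy
 * (load_busy stands in for get_loadavg(dev)), record that a one-shot
 * re-evaluation timer would be armed. State 0 is treated as an
 * always-acceptable fallback; the loop's other break conditions, such
 * as the latency_req check, are omitted.
 */
struct model_choice model_select(const struct model_state *states,
				 size_t count, unsigned int multiplier,
				 unsigned int predicted_us, bool load_busy)
{
	struct model_choice c = { 0, false, 0 };

	assert(states != NULL && count > 0);

	for (size_t i = 1; i < count; i++) {
		const struct model_state *s = &states[i];

		if ((uint64_t)s->exit_latency * multiplier > predicted_us) {
			if (s->hrtimer_timeout != 0 && load_busy) {
				c.timer_armed = true;
				/* 64-bit math so large timeouts cannot wrap. */
				c.timeout_ns = (uint64_t)s->hrtimer_timeout * 1000u;
			}
			break;
		}
		c.idx = i;
	}
	return c;
}
```

With exit latencies {0, 10, 100, 1000} us and timeouts {0, 1000, 5000, 20000} us, a busy pass (multiplier 10, predicted 500 us) stops at index 1 and records a 5,000,000 ns timer; once that timer fires and the load has dropped (multiplier 1, predicted 5000 us), the next pass reaches index 3 without arming a timer.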

Looking at the current implementation, is it possible to make it a configurable option
that can be enabled/disabled through the backend driver?

Also, instead of having this in the governor, wouldn't it be a good idea
to implement it in the backend driver itself?

--Deepthi



* RE: [PATCH] CPUIdle: Reevaluate C-states under CPU load to favor deeper C-states
  2011-11-09 11:13   ` Deepthi Dharwar
@ 2011-11-09 18:06     ` Chalhoub, Nicole
  2012-03-20 14:49       ` melwyn lobo
  0 siblings, 1 reply; 7+ messages in thread
From: Chalhoub, Nicole @ 2011-11-09 18:06 UTC (permalink / raw)
  To: Deepthi Dharwar, Hilman, Kevin
  Cc: linux-kernel, Arjan van de Ven, linux-arm-kernel, linux-omap, linux-pm

Hi Deepthi,

Texas Instruments France SA, 821 Avenue Jack Kilby, 06270 Villeneuve Loubet. 036 420 040 R.C.S Antibes. Capital de EUR 753.920

-----Original Message-----
> From: Deepthi Dharwar [mailto:deepthi@linux.vnet.ibm.com]
> Sent: Wednesday, November 09, 2011 12:13 PM
> To: Hilman, Kevin
> Cc: linux-kernel@vger.kernel.org; Arjan van de Ven; linux-arm-
> kernel@lists.infradead.org; linux-omap@vger.kernel.org; linux-
> pm@lists.linux-foundation.org; Chalhoub, Nicole
> Subject: Re: [PATCH] CPUIdle: Reevaluate C-states under CPU load to favor
> deeper C-states

[...]
> Setting a timer whenever we enter a C-state shallower than the deepest
> possible one, so that when it fires we can re-evaluate and move into
> progressively deeper C-states, is a good feature to have for power savings.
>
> Looking at the current implementation, is it possible to make it a
> configurable option that can be enabled/disabled through the backend
> driver?

The timeout values of the C-state timers are set in the backend driver.
Setting a state's timeout to 0 means its timer is never started, so the functionality stays disabled.

> Also, instead of having this in the governor, wouldn't it be a good idea
> to implement it in the backend driver itself?
> --Deepthi


In fact, each C-state has its own configurable timer timeout, so it is a parameter characterizing a C-state, just like the exit_latency and target_residency parameters.
And we want the timer to be armed only when we do not enter a deeper C-state because of high load. That decision is made in the CPUidle governor, so the functionality belongs in the governor.

Thanks and Regards
Nicole




* Re: [PATCH] CPUIdle: Reevaluate C-states under CPU load to favor deeper C-states
  2011-11-09 18:06     ` Chalhoub, Nicole
@ 2012-03-20 14:49       ` melwyn lobo
  2012-03-20 17:49         ` Kevin Hilman
  0 siblings, 1 reply; 7+ messages in thread
From: melwyn lobo @ 2012-03-20 14:49 UTC (permalink / raw)
  To: Chalhoub, Nicole, Hilman, Kevin
  Cc: Deepthi Dharwar, linux-kernel, Arjan van de Ven,
	linux-arm-kernel, linux-omap, linux-pm

Hey Kevin,
I would like to try out this patch on my platform to see the benefits
that you are reporting. But there is one issue with this patch: you have
not initialized the "hrtimer_timeout" variable.
This will always be 0, right?
Thanks,
-M



* Re: [PATCH] CPUIdle: Reevaluate C-states under CPU load to favor deeper C-states
  2012-03-20 14:49       ` melwyn lobo
@ 2012-03-20 17:49         ` Kevin Hilman
  0 siblings, 0 replies; 7+ messages in thread
From: Kevin Hilman @ 2012-03-20 17:49 UTC (permalink / raw)
  To: melwyn lobo
  Cc: Chalhoub, Nicole, Deepthi Dharwar, linux-kernel, linux-pm,
	linux-omap, linux-arm-kernel, Arjan van de Ven

melwyn lobo <linux.melwyn@gmail.com> writes:

> Hey Kevin,
> I would like to try out this patch on my platform to see the benefits
> that you are reporting. But there is one issue with this patch: you have
> not initialized the "hrtimer_timeout" variable.
> This will always be 0, right?

Correct.

The generic code defaults to zero, so with this patch the default
behavior is unchanged from before.  To use this feature, the
platform-specific code that creates your C-states must set the
per-C-state timer values.
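As a sketch of what Kevin describes, assuming a simplified table type: `struct model_cstate`, `platform_states`, `count_timer_states()` and every number below are hypothetical placeholders, not values from the OMAP4 code. C zero-initializes any member omitted from a designated initializer, so a state that never mentions hrtimer_timeout keeps the feature disabled, while states that set it opt in. In the quoted patch, a state's timeout is used when that state is the one rejected because of load.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified version of the per-state fields in the patch. */
struct model_cstate {
	const char *name;
	unsigned int exit_latency;     /* in us */
	unsigned int target_residency; /* in us */
	unsigned int hrtimer_timeout;  /* in us; 0 = feature disabled */
};

/*
 * Example platform table (illustrative numbers only). Members left out
 * of a designated initializer are zero-initialized, so C1 keeps
 * hrtimer_timeout == 0, while the deeper states opt in with their own
 * timeouts.
 */
static const struct model_cstate platform_states[] = {
	{ .name = "C1", .exit_latency = 2, .target_residency = 2 },
	{ .name = "C2", .exit_latency = 300, .target_residency = 600,
	  .hrtimer_timeout = 2000 },
	{ .name = "C3", .exit_latency = 1500, .target_residency = 3000,
	  .hrtimer_timeout = 10000 },
};

static const size_t platform_state_count =
	sizeof(platform_states) / sizeof(platform_states[0]);

/* Count the states that could ever arm a re-evaluation timer. */
size_t count_timer_states(const struct model_cstate *states, size_t n)
{
	size_t enabled = 0;

	assert(states != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		if (states[i].hrtimer_timeout != 0)
			enabled++;
	return enabled;
}
```

With this table, count_timer_states(platform_states, platform_state_count) returns 2, since only C2 and C3 set a timeout.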

Kevin



end of thread

Thread overview: 7+ messages
2011-09-19 23:35 [PATCH] CPUIdle: Reevaluate C-states under CPU load to favor deeper C-states Kevin Hilman
2011-10-19 13:11 ` Kevin Hilman
2011-11-04 21:46 ` Kevin Hilman
2011-11-09 11:13   ` Deepthi Dharwar
2011-11-09 18:06     ` Chalhoub, Nicole
2012-03-20 14:49       ` melwyn lobo
2012-03-20 17:49         ` Kevin Hilman
