linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2 0/2] Make iowait_boost optional and default to policy
@ 2017-05-19  6:23 Joel Fernandes
  2017-05-19  6:23 ` [PATCH v2 1/2] cpufreq: Make iowait boost a policy option Joel Fernandes
  2017-05-19  6:23 ` [PATCH v2 2/2] sched: Make iowait_boost optional in schedutil Joel Fernandes
  0 siblings, 2 replies; 14+ messages in thread
From: Joel Fernandes @ 2017-05-19  6:23 UTC (permalink / raw)
  To: linux-pm, linux-kernel
  Cc: Joel Fernandes, Srinivas Pandruvada, Len Brown,
	Rafael J . Wysocki, Viresh Kumar, Ingo Molnar, Peter Zijlstra

iowait_boost is causing power regression on our arm64 SoC. Really going to max
frequency is bad for power on mobile devices and not wise.
These patches make it optional and default to what the policy suggests coming
from the cpufreq driver as input to the governor.

Here are some power numbers collected on an arm64 based Qualcomm SoC on a mobile
device running a YouTube video for 30 seconds:

Before: 8.042533 mWh
After: 7.948377 mWh
Energy savings is ~1.2%

Joel Fernandes (2):
  cpufreq: Make iowait boost a policy option
  sched: Use iowait boost policy option in schedutil

 drivers/cpufreq/intel_pstate.c   |  1 +
 include/linux/cpufreq.h          |  3 +++
 kernel/sched/cpufreq_schedutil.c | 32 ++++++++++++++++++++++++++++++++
 3 files changed, 36 insertions(+)

Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: Len Brown <lenb@kernel.org>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
-- 
2.13.0.303.g4ebf302169-goog

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH v2 1/2] cpufreq: Make iowait boost a policy option
  2017-05-19  6:23 [PATCH v2 0/2] Make iowait_boost optional and default to policy Joel Fernandes
@ 2017-05-19  6:23 ` Joel Fernandes
  2017-05-19  9:42   ` Peter Zijlstra
  2017-05-19  6:23 ` [PATCH v2 2/2] sched: Make iowait_boost optional in schedutil Joel Fernandes
  1 sibling, 1 reply; 14+ messages in thread
From: Joel Fernandes @ 2017-05-19  6:23 UTC (permalink / raw)
  To: linux-pm, linux-kernel
  Cc: Joel Fernandes, Srinivas Pandruvada, Len Brown,
	Rafael J . Wysocki, Viresh Kumar, Ingo Molnar, Peter Zijlstra

Make iowait boost a cpufreq policy option and enable it for intel_pstate
cpufreq driver. Governors like schedutil can use it to determine if
boosting for tasks that wake up with p->in_iowait set is needed.

Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: Len Brown <lenb@kernel.org>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Ingo Molnar <mingo@redhat.com> 
Cc: Peter Zijlstra <peterz@infradead.org> 
Signed-off-by: Joel Fernandes <joelaf@google.com>
---
 drivers/cpufreq/intel_pstate.c | 1 +
 include/linux/cpufreq.h        | 3 +++
 2 files changed, 4 insertions(+)

diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c
index b7de5bd76a31..5dddc21da4f6 100644
--- a/drivers/cpufreq/intel_pstate.c
+++ b/drivers/cpufreq/intel_pstate.c
@@ -2239,6 +2239,7 @@ static int intel_cpufreq_cpu_init(struct cpufreq_policy *policy)
 
 	policy->cpuinfo.transition_latency = INTEL_CPUFREQ_TRANSITION_LATENCY;
 	policy->transition_delay_us = INTEL_CPUFREQ_TRANSITION_DELAY;
+	policy->iowait_boost_enable = true;
 	/* This reflects the intel_pstate_get_cpu_pstates() setting. */
 	policy->cur = policy->cpuinfo.min_freq;
 
diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h
index a5ce0bbeadb5..0783d8b52ec8 100644
--- a/include/linux/cpufreq.h
+++ b/include/linux/cpufreq.h
@@ -127,6 +127,9 @@ struct cpufreq_policy {
 	 */
 	unsigned int		transition_delay_us;
 
+	/* Boost switch for tasks with p->in_iowait set */
+	bool iowait_boost_enable;
+
 	 /* Cached frequency lookup from cpufreq_driver_resolve_freq. */
 	unsigned int cached_target_freq;
 	int cached_resolved_idx;
-- 
2.13.0.303.g4ebf302169-goog

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v2 2/2] sched: Make iowait_boost optional in schedutil
  2017-05-19  6:23 [PATCH v2 0/2] Make iowait_boost optional and default to policy Joel Fernandes
  2017-05-19  6:23 ` [PATCH v2 1/2] cpufreq: Make iowait boost a policy option Joel Fernandes
@ 2017-05-19  6:23 ` Joel Fernandes
  2017-05-19  6:50   ` Viresh Kumar
  1 sibling, 1 reply; 14+ messages in thread
From: Joel Fernandes @ 2017-05-19  6:23 UTC (permalink / raw)
  To: linux-pm, linux-kernel
  Cc: Joel Fernandes, Srinivas Pandruvada, Len Brown,
	Rafael J . Wysocki, Viresh Kumar, Ingo Molnar, Peter Zijlstra

We should apply the iowait boost only if cpufreq policy has iowait boost
enabled. Also make it a schedutil configuration from sysfs so it can be turned
on/off if needed (by default initialize it to the policy value).

For systems that don't need/want it enabled, such as those on arm64 based
mobile devices that are battery operated, it saves energy when the cpufreq
driver policy doesn't have it enabled (details below):

Here are some results for energy measurements collected running a YouTube video
for 30 seconds:
Before: 8.042533 mWh
After: 7.948377 mWh
Energy savings is ~1.2%

Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: Len Brown <lenb@kernel.org>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Ingo Molnar <mingo@redhat.com> 
Cc: Peter Zijlstra <peterz@infradead.org> 
Signed-off-by: Joel Fernandes <joelaf@google.com>
---
 kernel/sched/cpufreq_schedutil.c | 32 ++++++++++++++++++++++++++++++++
 1 file changed, 32 insertions(+)

diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
index 76877a62b5fa..0e392b58b9b3 100644
--- a/kernel/sched/cpufreq_schedutil.c
+++ b/kernel/sched/cpufreq_schedutil.c
@@ -24,6 +24,7 @@
 struct sugov_tunables {
 	struct gov_attr_set attr_set;
 	unsigned int rate_limit_us;
+	bool iowait_boost_enable;
 };
 
 struct sugov_policy {
@@ -171,6 +172,11 @@ static void sugov_get_util(unsigned long *util, unsigned long *max)
 static void sugov_set_iowait_boost(struct sugov_cpu *sg_cpu, u64 time,
 				   unsigned int flags)
 {
+	struct sugov_policy *sg_policy = sg_cpu->sg_policy;
+
+	if (!sg_policy->tunables->iowait_boost_enable)
+		return;
+
 	if (flags & SCHED_CPUFREQ_IOWAIT) {
 		sg_cpu->iowait_boost = sg_cpu->iowait_boost_max;
 	} else if (sg_cpu->iowait_boost) {
@@ -386,10 +392,34 @@ static ssize_t rate_limit_us_store(struct gov_attr_set *attr_set, const char *bu
 	return count;
 }
 
+static ssize_t iowait_boost_enable_show(struct gov_attr_set *attr_set,
+					char *buf)
+{
+	struct sugov_tunables *tunables = to_sugov_tunables(attr_set);
+
+	return sprintf(buf, "%u\n", tunables->iowait_boost_enable);
+}
+
+static ssize_t iowait_boost_enable_store(struct gov_attr_set *attr_set,
+					 const char *buf, size_t count)
+{
+	struct sugov_tunables *tunables = to_sugov_tunables(attr_set);
+	bool enable;
+
+	if (kstrtobool(buf, &enable))
+		return -EINVAL;
+
+	tunables->iowait_boost_enable = enable;
+
+	return count;
+}
+
 static struct governor_attr rate_limit_us = __ATTR_RW(rate_limit_us);
+static struct governor_attr iowait_boost_enable = __ATTR_RW(iowait_boost_enable);
 
 static struct attribute *sugov_attributes[] = {
 	&rate_limit_us.attr,
+	&iowait_boost_enable.attr,
 	NULL
 };
 
@@ -543,6 +573,8 @@ static int sugov_init(struct cpufreq_policy *policy)
 			tunables->rate_limit_us *= lat;
 	}
 
+	tunables->iowait_boost_enable = policy->iowait_boost_enable;
+
 	policy->governor_data = sg_policy;
 	sg_policy->tunables = tunables;
 
-- 
2.13.0.303.g4ebf302169-goog

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 2/2] sched: Make iowait_boost optional in schedutil
  2017-05-19  6:23 ` [PATCH v2 2/2] sched: Make iowait_boost optional in schedutil Joel Fernandes
@ 2017-05-19  6:50   ` Viresh Kumar
  2017-05-19 16:10     ` Joel Fernandes
  0 siblings, 1 reply; 14+ messages in thread
From: Viresh Kumar @ 2017-05-19  6:50 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: linux-pm, linux-kernel, Srinivas Pandruvada, Len Brown,
	Rafael J . Wysocki, Ingo Molnar, Peter Zijlstra

On 18-05-17, 23:23, Joel Fernandes wrote:
> We should apply the iowait boost only if cpufreq policy has iowait boost
> enabled. Also make it a schedutil configuration from sysfs so it can be turned
> on/off if needed (by default initialize it to the policy value).
> 
> For systems that don't need/want it enabled, such as those on arm64 based
> mobile devices that are battery operated, it saves energy when the cpufreq
> driver policy doesn't have it enabled (details below):
> 
> Here are some results for energy measurements collected running a YouTube video
> for 30 seconds:
> Before: 8.042533 mWh
> After: 7.948377 mWh
> Energy savings is ~1.2%
> 
> Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
> Cc: Len Brown <lenb@kernel.org>
> Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
> Cc: Viresh Kumar <viresh.kumar@linaro.org>
> Cc: Ingo Molnar <mingo@redhat.com> 
> Cc: Peter Zijlstra <peterz@infradead.org> 
> Signed-off-by: Joel Fernandes <joelaf@google.com>
> ---
>  kernel/sched/cpufreq_schedutil.c | 32 ++++++++++++++++++++++++++++++++
>  1 file changed, 32 insertions(+)
> 
> diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
> index 76877a62b5fa..0e392b58b9b3 100644
> --- a/kernel/sched/cpufreq_schedutil.c
> +++ b/kernel/sched/cpufreq_schedutil.c
> @@ -24,6 +24,7 @@
>  struct sugov_tunables {
>  	struct gov_attr_set attr_set;
>  	unsigned int rate_limit_us;
> +	bool iowait_boost_enable;

I suggested s/iowait_boost_enable/iowait_boost/ and you said okay for
that change.

>  };
>  
>  struct sugov_policy {
> @@ -171,6 +172,11 @@ static void sugov_get_util(unsigned long *util, unsigned long *max)
>  static void sugov_set_iowait_boost(struct sugov_cpu *sg_cpu, u64 time,
>  				   unsigned int flags)
>  {
> +	struct sugov_policy *sg_policy = sg_cpu->sg_policy;
> +
> +	if (!sg_policy->tunables->iowait_boost_enable)
> +		return;
> +
>  	if (flags & SCHED_CPUFREQ_IOWAIT) {
>  		sg_cpu->iowait_boost = sg_cpu->iowait_boost_max;
>  	} else if (sg_cpu->iowait_boost) {
> @@ -386,10 +392,34 @@ static ssize_t rate_limit_us_store(struct gov_attr_set *attr_set, const char *bu
>  	return count;
>  }
>  
> +static ssize_t iowait_boost_enable_show(struct gov_attr_set *attr_set,
> +					char *buf)
> +{
> +	struct sugov_tunables *tunables = to_sugov_tunables(attr_set);
> +
> +	return sprintf(buf, "%u\n", tunables->iowait_boost_enable);
> +}
> +
> +static ssize_t iowait_boost_enable_store(struct gov_attr_set *attr_set,
> +					 const char *buf, size_t count)
> +{
> +	struct sugov_tunables *tunables = to_sugov_tunables(attr_set);
> +	bool enable;
> +
> +	if (kstrtobool(buf, &enable))
> +		return -EINVAL;
> +
> +	tunables->iowait_boost_enable = enable;
> +
> +	return count;
> +}
> +
>  static struct governor_attr rate_limit_us = __ATTR_RW(rate_limit_us);
> +static struct governor_attr iowait_boost_enable = __ATTR_RW(iowait_boost_enable);
>  
>  static struct attribute *sugov_attributes[] = {
>  	&rate_limit_us.attr,
> +	&iowait_boost_enable.attr,
>  	NULL
>  };

Do we really need this right now? I mean, are you going to use it this
way? It may never get used eventually and may be better to leave the
sysfs option for now.

> @@ -543,6 +573,8 @@ static int sugov_init(struct cpufreq_policy *policy)
>  			tunables->rate_limit_us *= lat;
>  	}
>  
> +	tunables->iowait_boost_enable = policy->iowait_boost_enable;
> +
>  	policy->governor_data = sg_policy;
>  	sg_policy->tunables = tunables;

-- 
viresh

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 1/2] cpufreq: Make iowait boost a policy option
  2017-05-19  6:23 ` [PATCH v2 1/2] cpufreq: Make iowait boost a policy option Joel Fernandes
@ 2017-05-19  9:42   ` Peter Zijlstra
  2017-05-19 10:21     ` Peter Zijlstra
  2017-05-19 17:04     ` Joel Fernandes
  0 siblings, 2 replies; 14+ messages in thread
From: Peter Zijlstra @ 2017-05-19  9:42 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: linux-pm, linux-kernel, Srinivas Pandruvada, Len Brown,
	Rafael J . Wysocki, Viresh Kumar, Ingo Molnar

On Thu, May 18, 2017 at 11:23:43PM -0700, Joel Fernandes wrote:
> Make iowait boost a cpufreq policy option and enable it for intel_pstate
> cpufreq driver. Governors like schedutil can use it to determine if
> boosting for tasks that wake up with p->in_iowait set is needed.

Rather than just flat out disabling the option, is there something
better we can do on ARM?

The reason for the IO-wait boost is to ensure we feed our external
devices data ASAP, this reduces wait times, increases throughput and
decreases the duration the devices have to operate.

I realize max freq/volt might not be the best option for you, but is
there another spot that would make sense? I can imagine you want to
return your MMC to low power state ASAP as well.


So rather than a disable flag, I would really rather see an IO-wait OPP
state selector or something.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 1/2] cpufreq: Make iowait boost a policy option
  2017-05-19  9:42   ` Peter Zijlstra
@ 2017-05-19 10:21     ` Peter Zijlstra
  2017-05-19 17:04     ` Joel Fernandes
  1 sibling, 0 replies; 14+ messages in thread
From: Peter Zijlstra @ 2017-05-19 10:21 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: linux-pm, linux-kernel, Srinivas Pandruvada, Len Brown,
	Rafael J . Wysocki, Viresh Kumar, Ingo Molnar, Juri Lelli,
	Morten Rasmussen

On Fri, May 19, 2017 at 11:42:45AM +0200, Peter Zijlstra wrote:
> On Thu, May 18, 2017 at 11:23:43PM -0700, Joel Fernandes wrote:
> > Make iowait boost a cpufreq policy option and enable it for intel_pstate
> > cpufreq driver. Governors like schedutil can use it to determine if
> > boosting for tasks that wake up with p->in_iowait set is needed.
> 
> Rather than just flat out disabling the option, is there something
> better we can do on ARM?
> 
> The reason for the IO-wait boost is to ensure we feed our external
> devices data ASAP, this reduces wait times, increases throughput and
> decreases the duration the devices have to operate.
> 
> I realize max freq/volt might not be the best option for you, but is
> there another spot that would make sense? I can imagine you want to
> return your MMC to low power state ASAP as well.
> 
> 
> So rather than a disable flag, I would really rather see an IO-wait OPP
> state selector or something.

It would be even better if we can determine that point from the power
model data.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 2/2] sched: Make iowait_boost optional in schedutil
  2017-05-19  6:50   ` Viresh Kumar
@ 2017-05-19 16:10     ` Joel Fernandes
  2017-07-11 19:02       ` Saravana Kannan
  0 siblings, 1 reply; 14+ messages in thread
From: Joel Fernandes @ 2017-05-19 16:10 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: Linux PM, LKML, Srinivas Pandruvada, Len Brown,
	Rafael J . Wysocki, Ingo Molnar, Peter Zijlstra

Hi Viresh,

On Thu, May 18, 2017 at 11:50 PM, Viresh Kumar <viresh.kumar@linaro.org> wrote:
> On 18-05-17, 23:23, Joel Fernandes wrote:
>> We should apply the iowait boost only if cpufreq policy has iowait boost
>> enabled. Also make it a schedutil configuration from sysfs so it can be turned
>> on/off if needed (by default initialize it to the policy value).
>>
>> For systems that don't need/want it enabled, such as those on arm64 based
>> mobile devices that are battery operated, it saves energy when the cpufreq
>> driver policy doesn't have it enabled (details below):
>>
>> Here are some results for energy measurements collected running a YouTube video
>> for 30 seconds:
>> Before: 8.042533 mWh
>> After: 7.948377 mWh
>> Energy savings is ~1.2%
>>
>> Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
>> Cc: Len Brown <lenb@kernel.org>
>> Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
>> Cc: Viresh Kumar <viresh.kumar@linaro.org>
>> Cc: Ingo Molnar <mingo@redhat.com>
>> Cc: Peter Zijlstra <peterz@infradead.org>
>> Signed-off-by: Joel Fernandes <joelaf@google.com>
>> ---
>>  kernel/sched/cpufreq_schedutil.c | 32 ++++++++++++++++++++++++++++++++
>>  1 file changed, 32 insertions(+)
>>
>> diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
>> index 76877a62b5fa..0e392b58b9b3 100644
>> --- a/kernel/sched/cpufreq_schedutil.c
>> +++ b/kernel/sched/cpufreq_schedutil.c
>> @@ -24,6 +24,7 @@
>>  struct sugov_tunables {
>>       struct gov_attr_set attr_set;
>>       unsigned int rate_limit_us;
>> +     bool iowait_boost_enable;
>
> I suggested s/iowait_boost_enable/iowait_boost/ and you said okay for
> that change.

Yes, I somehow only picked up 'bool' from your comment.  I'll drop the
'_enable' in the next version. Sorry and thanks.

>>  };
>>
>>  struct sugov_policy {
>> @@ -171,6 +172,11 @@ static void sugov_get_util(unsigned long *util, unsigned long *max)
>>  static void sugov_set_iowait_boost(struct sugov_cpu *sg_cpu, u64 time,
>>                                  unsigned int flags)
>>  {
>> +     struct sugov_policy *sg_policy = sg_cpu->sg_policy;
>> +
>> +     if (!sg_policy->tunables->iowait_boost_enable)
>> +             return;
>> +
>>       if (flags & SCHED_CPUFREQ_IOWAIT) {
>>               sg_cpu->iowait_boost = sg_cpu->iowait_boost_max;
>>       } else if (sg_cpu->iowait_boost) {
>> @@ -386,10 +392,34 @@ static ssize_t rate_limit_us_store(struct gov_attr_set *attr_set, const char *bu
>>       return count;
>>  }
>>
>> +static ssize_t iowait_boost_enable_show(struct gov_attr_set *attr_set,
>> +                                     char *buf)
>> +{
>> +     struct sugov_tunables *tunables = to_sugov_tunables(attr_set);
>> +
>> +     return sprintf(buf, "%u\n", tunables->iowait_boost_enable);
>> +}
>> +
>> +static ssize_t iowait_boost_enable_store(struct gov_attr_set *attr_set,
>> +                                      const char *buf, size_t count)
>> +{
>> +     struct sugov_tunables *tunables = to_sugov_tunables(attr_set);
>> +     bool enable;
>> +
>> +     if (kstrtobool(buf, &enable))
>> +             return -EINVAL;
>> +
>> +     tunables->iowait_boost_enable = enable;
>> +
>> +     return count;
>> +}
>> +
>>  static struct governor_attr rate_limit_us = __ATTR_RW(rate_limit_us);
>> +static struct governor_attr iowait_boost_enable = __ATTR_RW(iowait_boost_enable);
>>
>>  static struct attribute *sugov_attributes[] = {
>>       &rate_limit_us.attr,
>> +     &iowait_boost_enable.attr,
>>       NULL
>>  };
>
> Do we really need this right now? I mean, are you going to use it this
> way? It may never get used eventually and may be better to leave the
> sysfs option for now.

I felt it is good to leave it to the system designer and have the
policy set a 'default' value, so incase it isn't touched by the
designer from userspace, then the policy default is fine, and if the
designer chooses to change it then he has the option to. This is also
how we currently set the rate limits for schedutil in android. I don't
feel strongly about one way or the other and if the general consensus
is to drop this part then I'm fine. I'm curious to know what others
think as well though.

thanks,

-Joel

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 1/2] cpufreq: Make iowait boost a policy option
  2017-05-19  9:42   ` Peter Zijlstra
  2017-05-19 10:21     ` Peter Zijlstra
@ 2017-05-19 17:04     ` Joel Fernandes
  2017-05-22  8:21       ` Peter Zijlstra
  1 sibling, 1 reply; 14+ messages in thread
From: Joel Fernandes @ 2017-05-19 17:04 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Linux PM, LKML, Srinivas Pandruvada, Len Brown,
	Rafael J . Wysocki, Viresh Kumar, Ingo Molnar

Hi Peter,

On Fri, May 19, 2017 at 2:42 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Thu, May 18, 2017 at 11:23:43PM -0700, Joel Fernandes wrote:
>> Make iowait boost a cpufreq policy option and enable it for intel_pstate
>> cpufreq driver. Governors like schedutil can use it to determine if
>> boosting for tasks that wake up with p->in_iowait set is needed.
>
> Rather than just flat out disabling the option, is there something
> better we can do on ARM?
>
> The reason for the IO-wait boost is to ensure we feed our external
> devices data ASAP, this reduces wait times, increases throughput and
> decreases the duration the devices have to operate.

Can you help understand how CPU frequency can affect I/O? The ASAP
makes me think of it as a latency thing than a throughput in which
case there should a scheduling priority increase? Also, to me it
sounds more like memory instead of CPU frequency should be boosted
instead so that DMA transfers happen quicker to feed devices data
faster.

Are you trying to boost the CPU frequency so that a process waiting on
I/O does its next set of processing quickly enough after iowaiting on
the previous I/O transaction, and is ready to feed I/O the next time
sooner?

The case I'm seeing a lot is a background thread does I/O request and
blocks for short period, and wakes up. All this while the CPU
frequency is low, but that wake up causes a spike in frequency. So
over a period of time, you see these spikes that don't really help
anything.

>
> I realize max freq/volt might not be the best option for you, but is
> there another spot that would make sense? I can imagine you want to
> return your MMC to low power state ASAP as well.
>
>
> So rather than a disable flag, I would really rather see an IO-wait OPP
> state selector or something.

We never had this in older kernels and I don't think we ever had an
issue where I/O was slow because of CPU frequency. If a task is busy a
lot, then its load tracking signal should be high and take care of
keeping CPU frequency high right? If PELT is decaying the load
tracking of iowaiting tasks too much, then I think that it should be
fixed there (probably decay an iowaiting task lesser?). Considering
that it makes power worse on newer kernels, it'd probably be best to
disable it in my opinion for those who don't need it.

thanks,

-Joel

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 1/2] cpufreq: Make iowait boost a policy option
  2017-05-19 17:04     ` Joel Fernandes
@ 2017-05-22  8:21       ` Peter Zijlstra
  2017-05-24 20:17         ` Joel Fernandes
  0 siblings, 1 reply; 14+ messages in thread
From: Peter Zijlstra @ 2017-05-22  8:21 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: Linux PM, LKML, Srinivas Pandruvada, Len Brown,
	Rafael J . Wysocki, Viresh Kumar, Ingo Molnar

On Fri, May 19, 2017 at 10:04:28AM -0700, Joel Fernandes wrote:
> Hi Peter,
> 
> On Fri, May 19, 2017 at 2:42 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> > On Thu, May 18, 2017 at 11:23:43PM -0700, Joel Fernandes wrote:
> >> Make iowait boost a cpufreq policy option and enable it for intel_pstate
> >> cpufreq driver. Governors like schedutil can use it to determine if
> >> boosting for tasks that wake up with p->in_iowait set is needed.
> >
> > Rather than just flat out disabling the option, is there something
> > better we can do on ARM?
> >
> > The reason for the IO-wait boost is to ensure we feed our external
> > devices data ASAP, this reduces wait times, increases throughput and
> > decreases the duration the devices have to operate.
> 
> Can you help understand how CPU frequency can affect I/O? The ASAP
> makes me think of it as a latency thing than a throughput in which
> case there should a scheduling priority increase? Also, to me it
> sounds more like memory instead of CPU frequency should be boosted
> instead so that DMA transfers happen quicker to feed devices data
> faster.

Suppose your (I/O) device has the task waiting for a completion for 1ms
for each request. Further suppose that feeding it the next request takes
.1ms at full speed (1 GHz).

Then we get, without contending tasks, a cycle of:


 R----------R----------					(1 GHz)


Which comes at 1/11-th utilization, which would then select something
like 100 MHz as being sufficient. But then the R part becomes 10x longer
and we end up with:


 RRRRRRRRRR----------RRRRRRRRRR----------		(100 MHz)


And since there's still plenty idle time, and the effective utilization
is still the same 1/11th, we'll not ramp up at all and continue in this
cycle.

Note however that the total time of the cycle increased from 1.1ms
to 2ms, for an ~80% decrease in throughput.

> Are you trying to boost the CPU frequency so that a process waiting on
> I/O does its next set of processing quickly enough after iowaiting on
> the previous I/O transaction, and is ready to feed I/O the next time
> sooner?

This. So we break the above pattern by boosting the task that wakes from
IO-wait. Its utilization will never be enough to cause a significant
bump in frequency on its own, as its constantly blocked on the IO
device.

> The case I'm seeing a lot is a background thread does I/O request and
> blocks for short period, and wakes up. All this while the CPU
> frequency is low, but that wake up causes a spike in frequency. So
> over a period of time, you see these spikes that don't really help
> anything.

So the background thread is doing some spurious IO but nothing
consistent?

> > I realize max freq/volt might not be the best option for you, but is
> > there another spot that would make sense? I can imagine you want to
> > return your MMC to low power state ASAP as well.
> >
> >
> > So rather than a disable flag, I would really rather see an IO-wait OPP
> > state selector or something.
> 
> We never had this in older kernels and I don't think we ever had an
> issue where I/O was slow because of CPU frequency. If a task is busy a
> lot, then its load tracking signal should be high and take care of
> keeping CPU frequency high right?

As per the above, no. If the device completion takes long enough to
inject enough idle time, the utilization signal will never be high
enough to break out of that pattern.

> If PELT is decaying the load
> tracking of iowaiting tasks too much, then I think that it should be
> fixed there (probably decay an iowaiting task lesser?).

For the above to work, we'd have to completely discard IO-wait time on
the utilization signal. But that would then give the task u=1, which
would be incorrect for placement decisions and wreck EAS.

> Considering
> that it makes power worse on newer kernels, it'd probably be best to
> disable it in my opinion for those who don't need it.

You have yet to convince me you don't need it. Sure Android might not
have much IO heavy workloads, but that's not to say nothing on ARM ever
does.

Also note that if you set the boost OPP to the lowest OPP you
effectively do disable it.

Looking at the code, it appears we already have this in
iowait_boost_max.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 1/2] cpufreq: Make iowait boost a policy option
  2017-05-22  8:21       ` Peter Zijlstra
@ 2017-05-24 20:17         ` Joel Fernandes
  2017-06-10  8:08           ` Joel Fernandes
  0 siblings, 1 reply; 14+ messages in thread
From: Joel Fernandes @ 2017-05-24 20:17 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Linux PM, LKML, Srinivas Pandruvada, Len Brown,
	Rafael J . Wysocki, Viresh Kumar, Ingo Molnar

Hi Peter,

On Mon, May 22, 2017 at 1:21 AM, Peter Zijlstra <peterz@infradead.org> wrote:
[..]
>> On Fri, May 19, 2017 at 2:42 AM, Peter Zijlstra <peterz@infradead.org> wrote:
>> > On Thu, May 18, 2017 at 11:23:43PM -0700, Joel Fernandes wrote:
>> >> Make iowait boost a cpufreq policy option and enable it for intel_pstate
>> >> cpufreq driver. Governors like schedutil can use it to determine if
>> >> boosting for tasks that wake up with p->in_iowait set is needed.
>> >
>> > Rather than just flat out disabling the option, is there something
>> > better we can do on ARM?
>> >
>> > The reason for the IO-wait boost is to ensure we feed our external
>> > devices data ASAP, this reduces wait times, increases throughput and
>> > decreases the duration the devices have to operate.
>>
>> Can you help understand how CPU frequency can affect I/O? The ASAP
>> makes me think of it as a latency thing than a throughput in which
>> case there should a scheduling priority increase? Also, to me it
>> sounds more like memory instead of CPU frequency should be boosted
>> instead so that DMA transfers happen quicker to feed devices data
>> faster.
>
> Suppose your (I/O) device has the task waiting for a completion for 1ms
> for each request. Further suppose that feeding it the next request takes
> .1ms at full speed (1 GHz).
>
> Then we get, without contending tasks, a cycle of:
>
>
>  R----------R----------                                 (1 GHz)
>
>
> Which comes at 1/11-th utilization, which would then select something
> like 100 MHz as being sufficient. But then the R part becomes 10x longer
> and we end up with:
>
>
>  RRRRRRRRRR----------RRRRRRRRRR----------               (100 MHz)
>
>
> And since there's still plenty idle time, and the effective utilization
> is still the same 1/11th, we'll not ramp up at all and continue in this
> cycle.
>
> Note however that the total time of the cycle increased from 1.1ms
> to 2ms, for an ~80% decrease in throughput.

Got it, thanks for the explanation.

>> Are you trying to boost the CPU frequency so that a process waiting on
>> I/O does its next set of processing quickly enough after iowaiting on
>> the previous I/O transaction, and is ready to feed I/O the next time
>> sooner?
>
> This. So we break the above pattern by boosting the task that wakes from
> IO-wait. Its utilization will never be enough to cause a significant
> bump in frequency on its own, as its constantly blocked on the IO
> device.

It sounds like this problem can happen with any other use-case where
one task blocks on the other, not just IO. Like a case where 2 tasks
running on different CPUs block on a mutex, then on either task can
wait on the other causing their utilization to be low right?

>> The case I'm seeing a lot is a background thread does I/O request and
>> blocks for short period, and wakes up. All this while the CPU
>> frequency is low, but that wake up causes a spike in frequency. So
>> over a period of time, you see these spikes that don't really help
>> anything.
>
> So the background thread is doing some spurious IO but nothing
> consistent?

Yes, its not a consistent pattern. Its actually a 'kworker' that woke
up to read/write something related to the video being played by the
YouTube app and is asynchronous to the app itself. It could be writing
to the logs or other information. But this definitely not a consistent
pattern as in the use case you described but intermittent spikes. The
frequency boosts don't help the actual activity of playing the video
except increasing power.

>> > I realize max freq/volt might not be the best option for you, but is
>> > there another spot that would make sense? I can imagine you want to
>> > return your MMC to low power state ASAP as well.
>> >
>> >
>> > So rather than a disable flag, I would really rather see an IO-wait OPP
>> > state selector or something.
>>
>> We never had this in older kernels and I don't think we ever had an
>> issue where I/O was slow because of CPU frequency. If a task is busy a
>> lot, then its load tracking signal should be high and take care of
>> keeping CPU frequency high right?
>
> As per the above, no. If the device completion takes long enough to
> inject enough idle time, the utilization signal will never be high
> enough to break out of that pattern.
>
>> If PELT is decaying the load
>> tracking of iowaiting tasks too much, then I think that it should be
>> fixed there (probably decay an iowaiting task lesser?).
>
> For the above to work, we'd have to completely discard IO-wait time on
> the utilization signal. But that would then give the task u=1, which
> would be incorrect for placement decisions and wreck EAS.

Not completely discard but cap the decay of the signal during IO wait.

>
>> Considering
>> that it makes power worse on newer kernels, it'd probably be best to
>> disable it in my opinion for those who don't need it.
>
> You have yet to convince me you don't need it. Sure Android might not
> have much IO heavy workloads, but that's not to say nothing on ARM ever
> does.
>
> Also note that if you set the boost OPP to the lowest OPP you
> effectively do disable it.
>
> Looking at the code, it appears we already have this in
> iowait_boost_max.

Currently it is set to:
 sg_cpu->iowait_boost_max = policy->cpuinfo.max_freq

Are you proposing to make this a sysfs tunable so we can override what
the iowait_boost_max value is?

thanks,

-Joel

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 1/2] cpufreq: Make iowait boost a policy option
  2017-05-24 20:17         ` Joel Fernandes
@ 2017-06-10  8:08           ` Joel Fernandes
  2017-06-10 13:56             ` Peter Zijlstra
  0 siblings, 1 reply; 14+ messages in thread
From: Joel Fernandes @ 2017-06-10  8:08 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Linux PM, LKML, Srinivas Pandruvada, Len Brown,
	Rafael J . Wysocki, Viresh Kumar, Ingo Molnar, Juri Lelli,
	Patrick Bellasi

Adding Juri and Patrick as well to share any thoughts. Replied to
Peter in the end of this email.

On Wed, May 24, 2017 at 1:17 PM, Joel Fernandes <joelaf@google.com> wrote:
> Hi Peter,
>
> On Mon, May 22, 2017 at 1:21 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> [..]
>>> On Fri, May 19, 2017 at 2:42 AM, Peter Zijlstra <peterz@infradead.org> wrote:
>>> > On Thu, May 18, 2017 at 11:23:43PM -0700, Joel Fernandes wrote:
>>> >> Make iowait boost a cpufreq policy option and enable it for intel_pstate
>>> >> cpufreq driver. Governors like schedutil can use it to determine if
>>> >> boosting for tasks that wake up with p->in_iowait set is needed.
>>> >
>>> > Rather than just flat out disabling the option, is there something
>>> > better we can do on ARM?
>>> >
>>> > The reason for the IO-wait boost is to ensure we feed our external
>>> > devices data ASAP, this reduces wait times, increases throughput and
>>> > decreases the duration the devices have to operate.
>>>
>>> Can you help understand how CPU frequency can affect I/O? The ASAP
>>> makes me think of it as a latency thing than a throughput in which
>>> case there should a scheduling priority increase? Also, to me it
>>> sounds more like memory instead of CPU frequency should be boosted
>>> instead so that DMA transfers happen quicker to feed devices data
>>> faster.
>>
>> Suppose your (I/O) device has the task waiting for a completion for 1ms
>> for each request. Further suppose that feeding it the next request takes
>> .1ms at full speed (1 GHz).
>>
>> Then we get, without contending tasks, a cycle of:
>>
>>
>>  R----------R----------                                 (1 GHz)
>>
>>
>> Which comes at 1/11-th utilization, which would then select something
>> like 100 MHz as being sufficient. But then the R part becomes 10x longer
>> and we end up with:
>>
>>
>>  RRRRRRRRRR----------RRRRRRRRRR----------               (100 MHz)
>>
>>
>> And since there's still plenty idle time, and the effective utilization
>> is still the same 1/11th, we'll not ramp up at all and continue in this
>> cycle.
>>
>> Note however that the total time of the cycle increased from 1.1ms
>> to 2ms, for an ~80% decrease in throughput.
>
> Got it, thanks for the explanation.
>
>>> Are you trying to boost the CPU frequency so that a process waiting on
>>> I/O does its next set of processing quickly enough after iowaiting on
>>> the previous I/O transaction, and is ready to feed I/O the next time
>>> sooner?
>>
>> This. So we break the above pattern by boosting the task that wakes from
>> IO-wait. Its utilization will never be enough to cause a significant
>> bump in frequency on its own, as its constantly blocked on the IO
>> device.
>
> It sounds like this problem can happen with any other use-case where
> one task blocks on the other, not just IO. Like a case where 2 tasks
> running on different CPUs block on a mutex, then on either task can
> wait on the other causing their utilization to be low right?
>
>>> The case I'm seeing a lot is a background thread does I/O request and
>>> blocks for short period, and wakes up. All this while the CPU
>>> frequency is low, but that wake up causes a spike in frequency. So
>>> over a period of time, you see these spikes that don't really help
>>> anything.
>>
>> So the background thread is doing some spurious IO but nothing
>> consistent?
>
> Yes, its not a consistent pattern. Its actually a 'kworker' that woke
> up to read/write something related to the video being played by the
> YouTube app and is asynchronous to the app itself. It could be writing
> to the logs or other information. But this definitely not a consistent
> pattern as in the use case you described but intermittent spikes. The
> frequency boosts don't help the actual activity of playing the video
> except increasing power.
>
>>> > I realize max freq/volt might not be the best option for you, but is
>>> > there another spot that would make sense? I can imagine you want to
>>> > return your MMC to low power state ASAP as well.
>>> >
>>> >
>>> > So rather than a disable flag, I would really rather see an IO-wait OPP
>>> > state selector or something.
>>>
>>> We never had this in older kernels and I don't think we ever had an
>>> issue where I/O was slow because of CPU frequency. If a task is busy a
>>> lot, then its load tracking signal should be high and take care of
>>> keeping CPU frequency high right?
>>
>> As per the above, no. If the device completion takes long enough to
>> inject enough idle time, the utilization signal will never be high
>> enough to break out of that pattern.
>>
>>> If PELT is decaying the load
>>> tracking of iowaiting tasks too much, then I think that it should be
>>> fixed there (probably decay an iowaiting task lesser?).
>>
>> For the above to work, we'd have to completely discard IO-wait time on
>> the utilization signal. But that would then give the task u=1, which
>> would be incorrect for placement decisions and wreck EAS.
>
> Not completely discard but cap the decay of the signal during IO wait.
>
>>
>>> Considering
>>> that it makes power worse on newer kernels, it'd probably be best to
>>> disable it in my opinion for those who don't need it.
>>
>> You have yet to convince me you don't need it. Sure Android might not
>> have much IO heavy workloads, but that's not to say nothing on ARM ever
>> does.
>>
>> Also note that if you set the boost OPP to the lowest OPP you
>> effectively do disable it.
>>
>> Looking at the code, it appears we already have this in
>> iowait_boost_max.
>
> Currently it is set to:
>  sg_cpu->iowait_boost_max = policy->cpuinfo.max_freq
>
> Are you proposing to make this a sysfs tunable so we can override what
> the iowait_boost_max value is?
>

Peter I didn't hear back from you. Maybe my comment here did not make
much sense to you? That could be because I was confused what you meant
by iowait_boost_max setting to 0. Currently afaik there isn't an
upstream way of doing this. Were you suggesting making
iowait_boost_max as a tunable and setting it to 0?

Or did you mean to have us carry an out of tree patch that sets it to
0? One of the reason I am pushing this patch is to not have to carry
an out of tree patch that disables it. Looking forward to your reply.

Thanks a lot,
Joel

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 1/2] cpufreq: Make iowait boost a policy option
  2017-06-10  8:08           ` Joel Fernandes
@ 2017-06-10 13:56             ` Peter Zijlstra
  2017-06-11  6:59               ` Joel Fernandes
  0 siblings, 1 reply; 14+ messages in thread
From: Peter Zijlstra @ 2017-06-10 13:56 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: Linux PM, LKML, Srinivas Pandruvada, Len Brown,
	Rafael J . Wysocki, Viresh Kumar, Ingo Molnar, Juri Lelli,
	Patrick Bellasi

On Sat, Jun 10, 2017 at 01:08:18AM -0700, Joel Fernandes wrote:

> Adding Juri and Patrick as well to share any thoughts. Replied to
> Peter in the end of this email.

Oh sorry, I completely missed your earlier reply :-(

> On Wed, May 24, 2017 at 1:17 PM, Joel Fernandes <joelaf@google.com> wrote:
> > On Mon, May 22, 2017 at 1:21 AM, Peter Zijlstra <peterz@infradead.org> wrote:

> >> Suppose your (I/O) device has the task waiting for a completion for 1ms
> >> for each request. Further suppose that feeding it the next request takes
> >> .1ms at full speed (1 GHz).
> >>
> >> Then we get, without contending tasks, a cycle of:
> >>
> >>
> >>  R----------R----------                                 (1 GHz)
> >>
> >>
> >> Which comes at 1/11-th utilization, which would then select something
> >> like 100 MHz as being sufficient. But then the R part becomes 10x longer
> >> and we end up with:
> >>
> >>
> >>  RRRRRRRRRR----------RRRRRRRRRR----------               (100 MHz)
> >>
> >>
> >> And since there's still plenty idle time, and the effective utilization
> >> is still the same 1/11th, we'll not ramp up at all and continue in this
> >> cycle.
> >>
> >> Note however that the total time of the cycle increased from 1.1ms
> >> to 2ms, for an ~80% decrease in throughput.
> >
> > Got it, thanks for the explanation.
> >
> >>> Are you trying to boost the CPU frequency so that a process waiting on
> >>> I/O does its next set of processing quickly enough after iowaiting on
> >>> the previous I/O transaction, and is ready to feed I/O the next time
> >>> sooner?
> >>
> >> This. So we break the above pattern by boosting the task that wakes from
> >> IO-wait. Its utilization will never be enough to cause a significant
> >> bump in frequency on its own, as its constantly blocked on the IO
> >> device.
> >
> > It sounds like this problem can happen with any other use-case where
> > one task blocks on the other, not just IO. Like a case where 2 tasks
> > running on different CPUs block on a mutex, then on either task can
> > wait on the other causing their utilization to be low right?

No, with two tasks bouncing on a mutex this does not happen. For both
tasks are visible and consume time on the CPU. So, if for example, a
task A blocks on a task B, then B will still be running, and cpufreq
will still see B and provide it sufficient resource to keep running.
That is, if B is cpu bound, and we recognise it as such, it will get
full CPU.

The difference with the IO is that the IO device is completely
invisible. This makes sense in that cpufreq cannot affect the devices
performance, but it does lead to the above issue.

> >>> The case I'm seeing a lot is a background thread does I/O request and
> >>> blocks for short period, and wakes up. All this while the CPU
> >>> frequency is low, but that wake up causes a spike in frequency. So
> >>> over a period of time, you see these spikes that don't really help
> >>> anything.
> >>
> >> So the background thread is doing some spurious IO but nothing
> >> consistent?
> >
> > Yes, its not a consistent pattern. Its actually a 'kworker' that woke
> > up to read/write something related to the video being played by the
> > YouTube app and is asynchronous to the app itself. It could be writing
> > to the logs or other information. But this definitely not a consistent
> > pattern as in the use case you described but intermittent spikes. The
> > frequency boosts don't help the actual activity of playing the video
> > except increasing power.

Right; so one thing we can try is to ramp-up the boost. Because
currently its a bit of an asymmetric thing in that we'll instantly boost
to max and then slowly back off again.

If instead we need to 'earn' full boost by repeatedly blocking on IO
this might sufficiently damp your spikes.

> >> Also note that if you set the boost OPP to the lowest OPP you
> >> effectively do disable it.
> >>
> >> Looking at the code, it appears we already have this in
> >> iowait_boost_max.
> >
> > Currently it is set to:
> >  sg_cpu->iowait_boost_max = policy->cpuinfo.max_freq
> >
> > Are you proposing to make this a sysfs tunable so we can override what
> > the iowait_boost_max value is?

Not sysfs, but maybe cpufreq driver / platform. For example have it be
the OPP that provides the max Instructions per Watt.

> Peter I didn't hear back from you. Maybe my comment here did not make
> much sense to you? 

Again sorry; I completely missed it :/

> That could be because I was confused what you meant by
> iowait_boost_max setting to 0. Currently afaik there isn't an upstream
> way of doing this. Were you suggesting making iowait_boost_max as a
> tunable and setting it to 0?

Tunable as in exposed to the driver, not userspace.

But I'm hoping an efficient OPP and the ramp-up together would be enough
for your case and also still work for our desktop/server loads.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 1/2] cpufreq: Make iowait boost a policy option
  2017-06-10 13:56             ` Peter Zijlstra
@ 2017-06-11  6:59               ` Joel Fernandes
  0 siblings, 0 replies; 14+ messages in thread
From: Joel Fernandes @ 2017-06-11  6:59 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Linux PM, LKML, Srinivas Pandruvada, Len Brown,
	Rafael J . Wysocki, Viresh Kumar, Ingo Molnar, Juri Lelli,
	Patrick Bellasi

Hi Peter,

On Sat, Jun 10, 2017 at 6:56 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Sat, Jun 10, 2017 at 01:08:18AM -0700, Joel Fernandes wrote:
>
>> Adding Juri and Patrick as well to share any thoughts. Replied to
>> Peter in the end of this email.
>
> Oh sorry, I completely missed your earlier reply :-(

No problem. I appreciate you taking time to reply, thanks.

>> >>> Are you trying to boost the CPU frequency so that a process waiting on
>> >>> I/O does its next set of processing quickly enough after iowaiting on
>> >>> the previous I/O transaction, and is ready to feed I/O the next time
>> >>> sooner?
>> >>
>> >> This. So we break the above pattern by boosting the task that wakes from
>> >> IO-wait. Its utilization will never be enough to cause a significant
>> >> bump in frequency on its own, as its constantly blocked on the IO
>> >> device.
>> >
>> > It sounds like this problem can happen with any other use-case where
>> > one task blocks on the other, not just IO. Like a case where 2 tasks
>> > running on different CPUs block on a mutex, then on either task can
>> > wait on the other causing their utilization to be low right?
>
> No, with two tasks bouncing on a mutex this does not happen. For both
> tasks are visible and consume time on the CPU. So, if for example, a
> task A blocks on a task B, then B will still be running, and cpufreq
> will still see B and provide it sufficient resource to keep running.
> That is, if B is cpu bound, and we recognise it as such, it will get
> full CPU.
>
> The difference with the IO is that the IO device is completely
> invisible. This makes sense in that cpufreq cannot affect the devices
> performance, but it does lead to the above issue.

But if task A and B are on different CPUs due to CPU affinity and
these CPUs are on different frequency domains and bouncing on a mutex,
then you would run into the same problem right?

>> >>> The case I'm seeing a lot is a background thread does I/O request and
>> >>> blocks for short period, and wakes up. All this while the CPU
>> >>> frequency is low, but that wake up causes a spike in frequency. So
>> >>> over a period of time, you see these spikes that don't really help
>> >>> anything.
>> >>
>> >> So the background thread is doing some spurious IO but nothing
>> >> consistent?
>> >
>> > Yes, its not a consistent pattern. Its actually a 'kworker' that woke
>> > up to read/write something related to the video being played by the
>> > YouTube app and is asynchronous to the app itself. It could be writing
>> > to the logs or other information. But this definitely not a consistent
>> > pattern as in the use case you described but intermittent spikes. The
>> > frequency boosts don't help the actual activity of playing the video
>> > except increasing power.
>
> Right; so one thing we can try is to ramp-up the boost. Because
> currently its a bit of an asymmetric thing in that we'll instantly boost
> to max and then slowly back off again.
>
> If instead we need to 'earn' full boost by repeatedly blocking on IO
> this might sufficiently damp your spikes.

Cool, that sounds like a great idea.

>> >> Also note that if you set the boost OPP to the lowest OPP you
>> >> effectively do disable it.
>> >>
>> >> Looking at the code, it appears we already have this in
>> >> iowait_boost_max.
>> >
>> > Currently it is set to:
>> >  sg_cpu->iowait_boost_max = policy->cpuinfo.max_freq
>> >
>> > Are you proposing to make this a sysfs tunable so we can override what
>> > the iowait_boost_max value is?
>
> Not sysfs, but maybe cpufreq driver / platform. For example have it be
> the OPP that provides the max Instructions per Watt.
>
>> Peter I didn't hear back from you. Maybe my comment here did not make
>> much sense to you?
>
> Again sorry; I completely missed it :/

No problem, thank you for replying. :)

>> That could be because I was confused what you meant by
>> iowait_boost_max setting to 0. Currently afaik there isn't an upstream
>> way of doing this. Were you suggesting making iowait_boost_max as a
>> tunable and setting it to 0?
>
> Tunable as in exposed to the driver, not userspace.

Got it.

> But I'm hoping an efficient OPP and the ramp-up together would be enough
> for your case and also still work for our desktop/server loads.

Ok. I am trying repro this with a synthetic test and measure
throughput so that I have a predictable usecase.

I was also thinking of another approach where when a p->in_iowait task
wakes up, we don't decay its util_avg. Then we calculate the total
time it was blocked due to I/O and then use that to correct the error
in the rq's util_avg (since the task's contribution to the rq util_avg
could have decayed while it iowait-ing). This will in a sense boost
the util_avg. Do you think that's a workable approach? That way if the
task is waiting very briefly, then the error to correct would be small
and we wouldn't just end up ramping to max frequency.
I think the other way to do it could be to not decay the rq's util_avg
while a task is waiting on I/O (maybe by checking rq->nr_iowait?).

Thanks,
Joel

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 2/2] sched: Make iowait_boost optional in schedutil
  2017-05-19 16:10     ` Joel Fernandes
@ 2017-07-11 19:02       ` Saravana Kannan
  0 siblings, 0 replies; 14+ messages in thread
From: Saravana Kannan @ 2017-07-11 19:02 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: Viresh Kumar, Linux PM, LKML, Srinivas Pandruvada, Len Brown,
	Rafael J . Wysocki, Ingo Molnar, Peter Zijlstra

On 05/19/2017 09:10 AM, Joel Fernandes wrote:
> Hi Viresh,
>
> On Thu, May 18, 2017 at 11:50 PM, Viresh Kumar <viresh.kumar@linaro.org> wrote:
>> On 18-05-17, 23:23, Joel Fernandes wrote:
>>> We should apply the iowait boost only if cpufreq policy has iowait boost
>>> enabled. Also make it a schedutil configuration from sysfs so it can be turned
>>> on/off if needed (by default initialize it to the policy value).
>>>
>>> For systems that don't need/want it enabled, such as those on arm64 based
>>> mobile devices that are battery operated, it saves energy when the cpufreq
>>> driver policy doesn't have it enabled (details below):
>>>
>>> Here are some results for energy measurements collected running a YouTube video
>>> for 30 seconds:
>>> Before: 8.042533 mWh
>>> After: 7.948377 mWh
>>> Energy savings is ~1.2%
>>>
>>> Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
>>> Cc: Len Brown <lenb@kernel.org>
>>> Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
>>> Cc: Viresh Kumar <viresh.kumar@linaro.org>
>>> Cc: Ingo Molnar <mingo@redhat.com>
>>> Cc: Peter Zijlstra <peterz@infradead.org>
>>> Signed-off-by: Joel Fernandes <joelaf@google.com>
>>> ---
>>>   kernel/sched/cpufreq_schedutil.c | 32 ++++++++++++++++++++++++++++++++
>>>   1 file changed, 32 insertions(+)
>>>
>>> diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
>>> index 76877a62b5fa..0e392b58b9b3 100644
>>> --- a/kernel/sched/cpufreq_schedutil.c
>>> +++ b/kernel/sched/cpufreq_schedutil.c
>>> @@ -24,6 +24,7 @@
>>>   struct sugov_tunables {
>>>        struct gov_attr_set attr_set;
>>>        unsigned int rate_limit_us;
>>> +     bool iowait_boost_enable;
>>
>> I suggested s/iowait_boost_enable/iowait_boost/ and you said okay for
>> that change.
>
> Yes, I somehow only picked up 'bool' from your comment.  I'll drop the
> '_enable' in the next version. Sorry and thanks.
>
>>>   };
>>>
>>>   struct sugov_policy {
>>> @@ -171,6 +172,11 @@ static void sugov_get_util(unsigned long *util, unsigned long *max)
>>>   static void sugov_set_iowait_boost(struct sugov_cpu *sg_cpu, u64 time,
>>>                                   unsigned int flags)
>>>   {
>>> +     struct sugov_policy *sg_policy = sg_cpu->sg_policy;
>>> +
>>> +     if (!sg_policy->tunables->iowait_boost_enable)
>>> +             return;
>>> +
>>>        if (flags & SCHED_CPUFREQ_IOWAIT) {
>>>                sg_cpu->iowait_boost = sg_cpu->iowait_boost_max;
>>>        } else if (sg_cpu->iowait_boost) {
>>> @@ -386,10 +392,34 @@ static ssize_t rate_limit_us_store(struct gov_attr_set *attr_set, const char *bu
>>>        return count;
>>>   }
>>>
>>> +static ssize_t iowait_boost_enable_show(struct gov_attr_set *attr_set,
>>> +                                     char *buf)
>>> +{
>>> +     struct sugov_tunables *tunables = to_sugov_tunables(attr_set);
>>> +
>>> +     return sprintf(buf, "%u\n", tunables->iowait_boost_enable);
>>> +}
>>> +
>>> +static ssize_t iowait_boost_enable_store(struct gov_attr_set *attr_set,
>>> +                                      const char *buf, size_t count)
>>> +{
>>> +     struct sugov_tunables *tunables = to_sugov_tunables(attr_set);
>>> +     bool enable;
>>> +
>>> +     if (kstrtobool(buf, &enable))
>>> +             return -EINVAL;
>>> +
>>> +     tunables->iowait_boost_enable = enable;
>>> +
>>> +     return count;
>>> +}
>>> +
>>>   static struct governor_attr rate_limit_us = __ATTR_RW(rate_limit_us);
>>> +static struct governor_attr iowait_boost_enable = __ATTR_RW(iowait_boost_enable);
>>>
>>>   static struct attribute *sugov_attributes[] = {
>>>        &rate_limit_us.attr,
>>> +     &iowait_boost_enable.attr,
>>>        NULL
>>>   };
>>
>> Do we really need this right now? I mean, are you going to use it this
>> way? It may never get used eventually and may be better to leave the
>> sysfs option for now.
>
> I felt it is good to leave it to the system designer and have the
> policy set a 'default' value, so incase it isn't touched by the
> designer from userspace, then the policy default is fine, and if the
> designer chooses to change it then he has the option to. This is also
> how we currently set the rate limits for schedutil in android. I don't
> feel strongly about one way or the other and if the general consensus
> is to drop this part then I'm fine. I'm curious to know what others
> think as well though.
>

Acked-by: Saravana Kannan <skannan@codeaurora.org>



-- 
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2017-07-11 19:02 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-05-19  6:23 [PATCH v2 0/2] Make iowait_boost optional and default to policy Joel Fernandes
2017-05-19  6:23 ` [PATCH v2 1/2] cpufreq: Make iowait boost a policy option Joel Fernandes
2017-05-19  9:42   ` Peter Zijlstra
2017-05-19 10:21     ` Peter Zijlstra
2017-05-19 17:04     ` Joel Fernandes
2017-05-22  8:21       ` Peter Zijlstra
2017-05-24 20:17         ` Joel Fernandes
2017-06-10  8:08           ` Joel Fernandes
2017-06-10 13:56             ` Peter Zijlstra
2017-06-11  6:59               ` Joel Fernandes
2017-05-19  6:23 ` [PATCH v2 2/2] sched: Make iowait_boost optional in schedutil Joel Fernandes
2017-05-19  6:50   ` Viresh Kumar
2017-05-19 16:10     ` Joel Fernandes
2017-07-11 19:02       ` Saravana Kannan

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).