linux-kernel.vger.kernel.org archive mirror
* [PATCH] cpufreq: governor: Fix prev_load initialization in cpufreq_governor_start()
@ 2016-04-25  1:07 Rafael J. Wysocki
  2016-04-25  4:14 ` Viresh Kumar
  2016-04-25 15:45 ` Chen, Yu C
  0 siblings, 2 replies; 5+ messages in thread
From: Rafael J. Wysocki @ 2016-04-25  1:07 UTC
  To: Linux PM list; +Cc: Linux Kernel Mailing List, Viresh Kumar, Chen Yu

From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

The way cpufreq_governor_start() initializes j_cdbs->prev_load is
questionable.

First off, j_cdbs->prev_cpu_wall, used as the denominator in the
computation, may be zero.  This happens when get_cpu_idle_time_us()
returns -1 and get_cpu_idle_time_jiffy(), which is then used to
return that number, is called exactly at the jiffies_64 wrap time.
That error is rather hard to trigger, but it is not impossible, and
when it happens it will crash the kernel.

Second, j_cdbs->prev_load is computed as the average load during
the entire time since the system started and it may not reflect the
load in the previous sampling period (as it is expected to).
That doesn't play well with the way dbs_update() uses that value.
Namely, if the update time delta (wall_time) happens to be greater
than twice the sampling rate on the first invocation of it, the
initial value of j_cdbs->prev_load (which may be completely off) will
be returned to the caller as the current load (unless it is equal to
zero and unless another CPU sharing the same policy object has a
greater load value).

Notice, however, that the prev_load field of struct cpu_dbs_info is
only used by dbs_update(), and only in that one place.  So if
cpufreq_governor_start() is modified to always initialize it to 0,
dbs_update() will always compute the actual load the first time it
checks the update time delta against the doubled sampling rate
(after initialization), and there won't be any side effects.

Consequently, modify cpufreq_governor_start() as described.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---
 drivers/cpufreq/cpufreq_governor.c |    8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

Index: linux-pm/drivers/cpufreq/cpufreq_governor.c
===================================================================
--- linux-pm.orig/drivers/cpufreq/cpufreq_governor.c
+++ linux-pm/drivers/cpufreq/cpufreq_governor.c
@@ -508,12 +508,12 @@ static int cpufreq_governor_start(struct
 
 	for_each_cpu(j, policy->cpus) {
 		struct cpu_dbs_info *j_cdbs = &per_cpu(cpu_dbs, j);
-		unsigned int prev_load;
 
 		j_cdbs->prev_cpu_idle = get_cpu_idle_time(j, &j_cdbs->prev_cpu_wall, io_busy);
-
-		prev_load = j_cdbs->prev_cpu_wall - j_cdbs->prev_cpu_idle;
-		j_cdbs->prev_load = 100 * prev_load / (unsigned int)j_cdbs->prev_cpu_wall;
+		/*
+		 * Make the first invocation of dbs_update() compute the load.
+		 */
+		j_cdbs->prev_load = 0;
 
 		if (ignore_nice)
 			j_cdbs->prev_cpu_nice = kcpustat_cpu(j).cpustat[CPUTIME_NICE];
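
Editorial sketch (not part of the patch): the two initializations can be
compared in a small userspace model.  The struct and function names below
are hypothetical stand-ins, and the zero-denominator check is an addition
of this model; the removed kernel lines performed the division
unconditionally.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the three struct cpu_dbs_info fields used here. */
struct cdbs_model {
	uint64_t prev_cpu_wall;	/* wall time accumulated since boot */
	uint64_t prev_cpu_idle;	/* idle time accumulated since boot */
	unsigned int prev_load;	/* percent */
};

/*
 * Models the removed lines: a since-boot average load.
 * Returns -1 where the removed code would have divided by zero
 * (the check itself is an assumption of this model, not kernel code).
 */
static int old_init_prev_load(struct cdbs_model *c)
{
	unsigned int busy, denom;

	assert(c != NULL);
	busy = (unsigned int)(c->prev_cpu_wall - c->prev_cpu_idle);
	denom = (unsigned int)c->prev_cpu_wall;	/* same cast as the removed line */
	if (denom == 0)
		return -1;

	c->prev_load = 100 * busy / denom;
	return 0;
}

/* Models the added line: always start from 0. */
static void new_init_prev_load(struct cdbs_model *c)
{
	assert(c != NULL);
	c->prev_load = 0;
}
```

With the old variant, prev_load depends on the whole history since boot
and the computation can fail; with the new one, the starting value is
always 0 and no division takes place.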

* Re: [PATCH] cpufreq: governor: Fix prev_load initialization in cpufreq_governor_start()
  2016-04-25  1:07 [PATCH] cpufreq: governor: Fix prev_load initialization in cpufreq_governor_start() Rafael J. Wysocki
@ 2016-04-25  4:14 ` Viresh Kumar
  2016-04-25 11:24   ` Rafael J. Wysocki
  2016-04-25 15:45 ` Chen, Yu C
  1 sibling, 1 reply; 5+ messages in thread
From: Viresh Kumar @ 2016-04-25  4:14 UTC
  To: Rafael J. Wysocki; +Cc: Linux PM list, Linux Kernel Mailing List, Chen Yu

On 25-04-16, 03:07, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> 
> The way cpufreq_governor_start() initializes j_cdbs->prev_load is
> questionable.
> 
> First off, j_cdbs->prev_cpu_wall used as a denominator in the
> computation may be zero.  The case this happens is when
> get_cpu_idle_time_us() returns -1 and get_cpu_idle_time_jiffy()
> used to return that number is called exactly at the jiffies_64
> wrap time.  It is rather hard to trigger that error, but it is not
> impossible and it will just crash the kernel then.
> 
> Second, j_cdbs->prev_load is computed as the average load during
> the entire time since the system started and it may not reflect the
> load in the previous sampling period (as it is expected to).
> That doesn't play well with the way dbs_update() uses that value.
> Namely, if the update time delta (wall_time) happens do be greater
> than twice the sampling rate on the first invocation of it, the
> initial value of j_cdbs->prev_load (which may be completely off) will
> be returned to the caller as the current load (unless it is equal to
> zero and unless another CPU sharing the same policy object has a
> greater load value).
> 
> For this reason, notice that the prev_load field of struct cpu_dbs_info
> is only used by dbs_update() and only in that one place, so if
> cpufreq_governor_start() is modified to always initialize it to 0,
> it will make dbs_update() always compute the actual load first time
> it checks the update time delta against the doubled sampling rate
> (after initialization) and there won't be any side effects of it.
> 
> Consequently, modify cpufreq_governor_start() as described.
> 
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> ---
>  drivers/cpufreq/cpufreq_governor.c |    8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> Index: linux-pm/drivers/cpufreq/cpufreq_governor.c
> ===================================================================
> --- linux-pm.orig/drivers/cpufreq/cpufreq_governor.c
> +++ linux-pm/drivers/cpufreq/cpufreq_governor.c
> @@ -508,12 +508,12 @@ static int cpufreq_governor_start(struct
>  
>  	for_each_cpu(j, policy->cpus) {
>  		struct cpu_dbs_info *j_cdbs = &per_cpu(cpu_dbs, j);
> -		unsigned int prev_load;
>  
>  		j_cdbs->prev_cpu_idle = get_cpu_idle_time(j, &j_cdbs->prev_cpu_wall, io_busy);
> -
> -		prev_load = j_cdbs->prev_cpu_wall - j_cdbs->prev_cpu_idle;
> -		j_cdbs->prev_load = 100 * prev_load / (unsigned int)j_cdbs->prev_cpu_wall;
> +		/*
> +		 * Make the first invocation of dbs_update() compute the load.
> +		 */
> +		j_cdbs->prev_load = 0;
>  
>  		if (ignore_nice)
>  			j_cdbs->prev_cpu_nice = kcpustat_cpu(j).cpustat[CPUTIME_NICE];

I tried to understand why

commit 18b46abd0009 ("cpufreq: governor: Be friendly towards
latency-sensitive bursty workloads")

modified the START section and added this stuff, and I completely failed
to understand it. Do you remember why this was added at all?

-- 
viresh

* Re: [PATCH] cpufreq: governor: Fix prev_load initialization in cpufreq_governor_start()
  2016-04-25  4:14 ` Viresh Kumar
@ 2016-04-25 11:24   ` Rafael J. Wysocki
  2016-04-25 11:27     ` Viresh Kumar
  0 siblings, 1 reply; 5+ messages in thread
From: Rafael J. Wysocki @ 2016-04-25 11:24 UTC
  To: Viresh Kumar
  Cc: Rafael J. Wysocki, Linux PM list, Linux Kernel Mailing List, Chen Yu

On Mon, Apr 25, 2016 at 6:14 AM, Viresh Kumar <viresh.kumar@linaro.org> wrote:
> I tried to understand why the
>
> commit 18b46abd0009 ("cpufreq: governor: Be friendly towards
> latency-sensitive bursty workloads")
>
> modify the START section and added this stuff and I completely failed
> to understand it now. Do you remember why was this added at all ?

The big comment in dbs_update() explains it, but not the initialization part.

I guess the initialization tried to be smart and avoid the "almost
zero load" effect in cases when the CPU is idle to start with, but
that's questionable as explained in my changelog.  I guess I should
add a "Fixes:" tag for that commit to the patch. :-)
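
Editorial sketch (not part of the original message): the interaction
between the initial prev_load and the first dbs_update() call can be
modelled as below.  This is a simplified userspace model built from the
changelog's description; the names are hypothetical, and clearing
prev_load after it has been reused is an assumption of the sketch about
how dbs_update() behaved at the time.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-CPU state: only the remembered load matters here. */
struct load_state {
	unsigned int prev_load;	/* percent; 0 means "nothing to reuse" */
};

/*
 * wall_time and idle_time are deltas since the previous sample.
 * If the gap is longer than twice the sampling rate and a nonzero
 * prev_load is remembered, that value is reported instead of the
 * freshly computed load.
 */
static unsigned int model_load(struct load_state *s, unsigned int wall_time,
			       unsigned int idle_time,
			       unsigned int sampling_rate)
{
	unsigned int load;

	assert(s != NULL);
	assert(wall_time > 0 && idle_time <= wall_time);

	if (wall_time > 2 * sampling_rate && s->prev_load) {
		/* Long gap: reuse the remembered load once, then drop it. */
		load = s->prev_load;
		s->prev_load = 0;
	} else {
		/* Normal case: compute the load for this interval. */
		load = 100 * (wall_time - idle_time) / wall_time;
		s->prev_load = load;
	}
	return load;
}
```

In this model, a stale since-boot average of, say, 90 is returned on a
long first interval even if the CPU was almost entirely idle during it,
whereas an initial value of 0 forces the load to be computed from the
actual interval.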

* Re: [PATCH] cpufreq: governor: Fix prev_load initialization in cpufreq_governor_start()
  2016-04-25 11:24   ` Rafael J. Wysocki
@ 2016-04-25 11:27     ` Viresh Kumar
  0 siblings, 0 replies; 5+ messages in thread
From: Viresh Kumar @ 2016-04-25 11:27 UTC
  To: Rafael J. Wysocki
  Cc: Rafael J. Wysocki, Linux PM list, Linux Kernel Mailing List, Chen Yu

On 25-04-16, 13:24, Rafael J. Wysocki wrote:
> On Mon, Apr 25, 2016 at 6:14 AM, Viresh Kumar <viresh.kumar@linaro.org> wrote:
> > I tried to understand why the
> >
> > commit 18b46abd0009 ("cpufreq: governor: Be friendly towards
> > latency-sensitive bursty workloads")
> >
> > modify the START section and added this stuff and I completely failed
> > to understand it now. Do you remember why was this added at all ?
> 
> The big comment in dbs_update() explains it, but not the initialization part.
> 
> I guess the initialization tried to be smart and avoid the "almost
> zero load" effect in cases when the CPU is idle to start with, but
> that's questionable as explained in my changelog.  I guess I should
> add a "Fixes:" tag for that commit to the patch. :-)

Acked-by: Viresh Kumar <viresh.kumar@linaro.org>

:)

-- 
viresh

* RE: [PATCH] cpufreq: governor: Fix prev_load initialization in cpufreq_governor_start()
  2016-04-25  1:07 [PATCH] cpufreq: governor: Fix prev_load initialization in cpufreq_governor_start() Rafael J. Wysocki
  2016-04-25  4:14 ` Viresh Kumar
@ 2016-04-25 15:45 ` Chen, Yu C
  1 sibling, 0 replies; 5+ messages in thread
From: Chen, Yu C @ 2016-04-25 15:45 UTC
  To: Rafael J. Wysocki; +Cc: Linux Kernel Mailing List, Viresh Kumar, Linux PM list

Hi,

> -----Original Message-----
> From: Rafael J. Wysocki [mailto:rjw@rjwysocki.net]
> Sent: Monday, April 25, 2016 9:08 AM
> To: Linux PM list
> Cc: Linux Kernel Mailing List; Viresh Kumar; Chen, Yu C
> Subject: [PATCH] cpufreq: governor: Fix prev_load initialization in
> cpufreq_governor_start()
> 
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> 
> The way cpufreq_governor_start() initializes j_cdbs->prev_load is questionable.
> 
> First off, j_cdbs->prev_cpu_wall used as a denominator in the computation may
> be zero.  The case this happens is when
> get_cpu_idle_time_us() returns -1 and get_cpu_idle_time_jiffy() used to return
> that number is called exactly at the jiffies_64 wrap time.  It is rather hard to
> trigger that error, but it is not impossible and it will just crash the kernel then.
> 
> Second, j_cdbs->prev_load is computed as the average load during the entire
> time since the system started and it may not reflect the load in the previous
> sampling period (as it is expected to).
> That doesn't play well with the way dbs_update() uses that value.
> Namely, if the update time delta (wall_time) happens do be greater than twice
happens s/do/to be?
> the sampling rate on the first invocation of it, the initial value of
> j_cdbs->prev_load (which may be completely off) will be returned to the caller
> as the current load (unless it is equal to zero and unless another CPU sharing
> the same policy object has a greater load value).
> 
> For this reason, notice that the prev_load field of struct cpu_dbs_info is only
> used by dbs_update() and only in that one place, so if
> cpufreq_governor_start() is modified to always initialize it to 0, it will make
> dbs_update() always compute the actual load first time it checks the update
> time delta against the doubled sampling rate (after initialization) and there
> won't be any side effects of it.
> 
> Consequently, modify cpufreq_governor_start() as described.
> 
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> ---

Acked-by: Chen Yu <yu.c.chen@intel.com>

thanks,
Yu
