All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH][v3] cpufreq: governor: Fix overflow when calculating idle time
@ 2016-04-19  3:57 Chen Yu
  2016-04-20  1:26 ` Rafael J. Wysocki
  0 siblings, 1 reply; 3+ messages in thread
From: Chen Yu @ 2016-04-19  3:57 UTC (permalink / raw)
  To: linux-pm
  Cc: linux-kernel, Rafael J. Wysocki, Viresh Kumar, Len Brown, Chen Yu

It was reported that after Commit 0df35026c6a5 ("cpufreq: governor:
Fix negative idle_time when configured with CONFIG_HZ_PERIODIC"),
cpufreq ondemand governor started to act oddly. Without any load,
with freshly booted system, it pumped cpu frequency up to maximum
at some point of time and stayed there. The problem is caused by
jiffies overflow in get_cpu_idle_time:

After booting up 5 minutes, the jiffies will round up to zero.
As a result, the following condition in cpu governor will always be
true:
	if (cur_idle_time <= j_cdbs->prev_cpu_idle)
		idle_time = 0;

which caused problems.

For example, once cur_idle_time has rounded up to zero, meanwhile
prev_cpu_idle still remains negative(because of jiffies initial value
of -300HZ, which is very big after converted to unsigned), thus above
condition is met, thus we get a zero of idle running time during
this sample, which causes a high busy time, thus governor always
requests for the highest freq.

This patch fixes this problem by updating prev_cpu_idle for
each sample period, even if prev_cpu_idle is bigger than
cur_idle_time, thus to prevent the scenario of 'prev_cpu_idle always
bigger than cur_idle_time' from happening.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=115261
Reported-by: Timo Valtoaho <timo.valtoaho@gmail.com>
Signed-off-by: Chen Yu <yu.c.chen@intel.com>
---
v3:
 - Do not use INITIAL_JIFFIES because it should be transparent
   to user, meanwhile keep original semanteme to use delta
   of time slice.
---
v2:
 - Send this patch to a wider scope, including timing-system maintainers,
   as well as some modifications in the commit message to make it more clear.
---
 drivers/cpufreq/cpufreq.c          | 4 ++++
 drivers/cpufreq/cpufreq_governor.c | 8 +++++++-
 2 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index b87596b..b0479b3 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -132,6 +132,10 @@ struct cpufreq_frequency_table *cpufreq_frequency_get_table(unsigned int cpu)
 }
 EXPORT_SYMBOL_GPL(cpufreq_frequency_get_table);
 
+/**
+ * The wall time and idle time are both possible to round up,
+ * people should use delta rather than the value itself.
+ */
 static inline u64 get_cpu_idle_time_jiffy(unsigned int cpu, u64 *wall)
 {
 	u64 idle_time;
diff --git a/drivers/cpufreq/cpufreq_governor.c b/drivers/cpufreq/cpufreq_governor.c
index 10a5cfe..8de3fba 100644
--- a/drivers/cpufreq/cpufreq_governor.c
+++ b/drivers/cpufreq/cpufreq_governor.c
@@ -197,8 +197,14 @@ unsigned int dbs_update(struct cpufreq_policy *policy)
 			idle_time = 0;
 		} else {
 			idle_time = cur_idle_time - j_cdbs->prev_cpu_idle;
-			j_cdbs->prev_cpu_idle = cur_idle_time;
 		}
+		/*
+		 * It is possible prev_cpu_idle being bigger than cur_idle_time,
+		 * when 32bit rounds up if !CONFIG_VIRT_CPU_ACCOUNTING,
+		 * thus get a 0% idle estimation. So update prev_cpu_idle during
+		 * each sample period to avoid this situation lasting too long.
+		 */
+		j_cdbs->prev_cpu_idle = cur_idle_time;
 
 		if (ignore_nice) {
 			u64 cur_nice = kcpustat_cpu(j).cpustat[CPUTIME_NICE];
-- 
1.8.4.2

^ permalink raw reply related	[flat|nested] 3+ messages in thread

* Re: [PATCH][v3] cpufreq: governor: Fix overflow when calculating idle time
  2016-04-19  3:57 [PATCH][v3] cpufreq: governor: Fix overflow when calculating idle time Chen Yu
@ 2016-04-20  1:26 ` Rafael J. Wysocki
  2016-04-20 13:52   ` Rafael J. Wysocki
  0 siblings, 1 reply; 3+ messages in thread
From: Rafael J. Wysocki @ 2016-04-20  1:26 UTC (permalink / raw)
  To: Chen Yu
  Cc: linux-pm, linux-kernel, Rafael J. Wysocki, Viresh Kumar, Len Brown

On Tuesday, April 19, 2016 11:57:32 AM Chen Yu wrote:
> It was reported that after Commit 0df35026c6a5 ("cpufreq: governor:
> Fix negative idle_time when configured with CONFIG_HZ_PERIODIC"),
> cpufreq ondemand governor started to act oddly. Without any load,
> with freshly booted system, it pumped cpu frequency up to maximum
> at some point of time and stayed there. The problem is caused by
> jiffies overflow in get_cpu_idle_time:
> 
> After booting up 5 minutes, the jiffies will round up to zero.
> As a result, the following condition in cpu governor will always be
> true:
> 	if (cur_idle_time <= j_cdbs->prev_cpu_idle)
> 		idle_time = 0;
> 
> which caused problems.
> 
> For example, once cur_idle_time has rounded up to zero, meanwhile
> prev_cpu_idle still remains negative(because of jiffies initial value
> of -300HZ, which is very big after converted to unsigned), thus above
> condition is met, thus we get a zero of idle running time during
> this sample, which causes a high busy time, thus governor always
> requests for the highest freq.
> 
> This patch fixes this problem by updating prev_cpu_idle for
> each sample period, even if prev_cpu_idle is bigger than
> cur_idle_time, thus to prevent the scenario of 'prev_cpu_idle always
> bigger than cur_idle_time' from happening.
> 
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=115261
> Reported-by: Timo Valtoaho <timo.valtoaho@gmail.com>
> Signed-off-by: Chen Yu <yu.c.chen@intel.com>

This looks better than the previous versions to me, but ->

> ---
> v3:
>  - Do not use INITIAL_JIFFIES because it should be transparent
>    to user, meanwhile keep original semanteme to use delta
>    of time slice.
> ---
> v2:
>  - Send this patch to a wider scope, including timing-system maintainers,
>    as well as some modifications in the commit message to make it more clear.
> ---
>  drivers/cpufreq/cpufreq.c          | 4 ++++
>  drivers/cpufreq/cpufreq_governor.c | 8 +++++++-
>  2 files changed, 11 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
> index b87596b..b0479b3 100644
> --- a/drivers/cpufreq/cpufreq.c
> +++ b/drivers/cpufreq/cpufreq.c
> @@ -132,6 +132,10 @@ struct cpufreq_frequency_table *cpufreq_frequency_get_table(unsigned int cpu)
>  }
>  EXPORT_SYMBOL_GPL(cpufreq_frequency_get_table);
>  
> +/**
> + * The wall time and idle time are both possible to round up,

That's difficult to parse.  I guess you wanted to say that they may overflow?

> + * people should use delta rather than the value itself.
> + */

-> this new comment doesn't really belong to the fix.  You can send a separate
patch adding it.

Moreover -->

>  static inline u64 get_cpu_idle_time_jiffy(unsigned int cpu, u64 *wall)
>  {
>  	u64 idle_time;
> diff --git a/drivers/cpufreq/cpufreq_governor.c b/drivers/cpufreq/cpufreq_governor.c
> index 10a5cfe..8de3fba 100644
> --- a/drivers/cpufreq/cpufreq_governor.c
> +++ b/drivers/cpufreq/cpufreq_governor.c
> @@ -197,8 +197,14 @@ unsigned int dbs_update(struct cpufreq_policy *policy)
>  			idle_time = 0;
>  		} else {
>  			idle_time = cur_idle_time - j_cdbs->prev_cpu_idle;
> -			j_cdbs->prev_cpu_idle = cur_idle_time;
>  		}
> +		/*
> +		 * It is possible prev_cpu_idle being bigger than cur_idle_time,
> +		 * when 32bit rounds up if !CONFIG_VIRT_CPU_ACCOUNTING,
> +		 * thus get a 0% idle estimation. So update prev_cpu_idle during
> +		 * each sample period to avoid this situation lasting too long.
> +		 */
> +		j_cdbs->prev_cpu_idle = cur_idle_time;

--> it looks like the bug is that we are comparing signed values as unsigned.

>  
>  		if (ignore_nice) {
>  			u64 cur_nice = kcpustat_cpu(j).cpustat[CPUTIME_NICE];
> 

So what about the simple change below?

---
 drivers/cpufreq/cpufreq_governor.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Index: linux-pm/drivers/cpufreq/cpufreq_governor.c
===================================================================
--- linux-pm.orig/drivers/cpufreq/cpufreq_governor.c
+++ linux-pm/drivers/cpufreq/cpufreq_governor.c
@@ -146,7 +146,7 @@ unsigned int dbs_update(struct cpufreq_p
 		wall_time = cur_wall_time - j_cdbs->prev_cpu_wall;
 		j_cdbs->prev_cpu_wall = cur_wall_time;
 
-		if (cur_idle_time <= j_cdbs->prev_cpu_idle) {
+		if ((s64)cur_idle_time <= (s64)j_cdbs->prev_cpu_idle) {
 			idle_time = 0;
 		} else {
 			idle_time = cur_idle_time - j_cdbs->prev_cpu_idle;

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [PATCH][v3] cpufreq: governor: Fix overflow when calculating idle time
  2016-04-20  1:26 ` Rafael J. Wysocki
@ 2016-04-20 13:52   ` Rafael J. Wysocki
  0 siblings, 0 replies; 3+ messages in thread
From: Rafael J. Wysocki @ 2016-04-20 13:52 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Chen Yu, linux-pm, Linux Kernel Mailing List, Rafael J. Wysocki,
	Viresh Kumar, Len Brown

On Wed, Apr 20, 2016 at 3:26 AM, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> On Tuesday, April 19, 2016 11:57:32 AM Chen Yu wrote:
>> It was reported that after Commit 0df35026c6a5 ("cpufreq: governor:
>> Fix negative idle_time when configured with CONFIG_HZ_PERIODIC"),
>> cpufreq ondemand governor started to act oddly. Without any load,
>> with freshly booted system, it pumped cpu frequency up to maximum
>> at some point of time and stayed there. The problem is caused by
>> jiffies overflow in get_cpu_idle_time:
>>
>> After booting up 5 minutes, the jiffies will round up to zero.
>> As a result, the following condition in cpu governor will always be
>> true:
>>       if (cur_idle_time <= j_cdbs->prev_cpu_idle)
>>               idle_time = 0;
>>
>> which caused problems.
>>
>> For example, once cur_idle_time has rounded up to zero, meanwhile
>> prev_cpu_idle still remains negative(because of jiffies initial value
>> of -300HZ, which is very big after converted to unsigned), thus above
>> condition is met, thus we get a zero of idle running time during
>> this sample, which causes a high busy time, thus governor always
>> requests for the highest freq.
>>
>> This patch fixes this problem by updating prev_cpu_idle for
>> each sample period, even if prev_cpu_idle is bigger than
>> cur_idle_time, thus to prevent the scenario of 'prev_cpu_idle always
>> bigger than cur_idle_time' from happening.
>>
>> Link: https://bugzilla.kernel.org/show_bug.cgi?id=115261
>> Reported-by: Timo Valtoaho <timo.valtoaho@gmail.com>
>> Signed-off-by: Chen Yu <yu.c.chen@intel.com>
>
> This looks better than the previous versions to me, but ->
>
>> ---
>> v3:
>>  - Do not use INITIAL_JIFFIES because it should be transparent
>>    to user, meanwhile keep original semanteme to use delta
>>    of time slice.
>> ---
>> v2:
>>  - Send this patch to a wider scope, including timing-system maintainers,
>>    as well as some modifications in the commit message to make it more clear.
>> ---
>>  drivers/cpufreq/cpufreq.c          | 4 ++++
>>  drivers/cpufreq/cpufreq_governor.c | 8 +++++++-
>>  2 files changed, 11 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
>> index b87596b..b0479b3 100644
>> --- a/drivers/cpufreq/cpufreq.c
>> +++ b/drivers/cpufreq/cpufreq.c
>> @@ -132,6 +132,10 @@ struct cpufreq_frequency_table *cpufreq_frequency_get_table(unsigned int cpu)
>>  }
>>  EXPORT_SYMBOL_GPL(cpufreq_frequency_get_table);
>>
>> +/**
>> + * The wall time and idle time are both possible to round up,
>
> That's difficult to parse.  I guess you wanted to say that they may overflow?
>
>> + * people should use delta rather than the value itself.
>> + */
>
> -> this new comment doesn't really belong to the fix.  You can send a separate
> patch adding it.
>
> Moreover -->
>
>>  static inline u64 get_cpu_idle_time_jiffy(unsigned int cpu, u64 *wall)
>>  {
>>       u64 idle_time;
>> diff --git a/drivers/cpufreq/cpufreq_governor.c b/drivers/cpufreq/cpufreq_governor.c
>> index 10a5cfe..8de3fba 100644
>> --- a/drivers/cpufreq/cpufreq_governor.c
>> +++ b/drivers/cpufreq/cpufreq_governor.c
>> @@ -197,8 +197,14 @@ unsigned int dbs_update(struct cpufreq_policy *policy)
>>                       idle_time = 0;
>>               } else {
>>                       idle_time = cur_idle_time - j_cdbs->prev_cpu_idle;
>> -                     j_cdbs->prev_cpu_idle = cur_idle_time;
>>               }
>> +             /*
>> +              * It is possible prev_cpu_idle being bigger than cur_idle_time,
>> +              * when 32bit rounds up if !CONFIG_VIRT_CPU_ACCOUNTING,
>> +              * thus get a 0% idle estimation. So update prev_cpu_idle during
>> +              * each sample period to avoid this situation lasting too long.
>> +              */
>> +             j_cdbs->prev_cpu_idle = cur_idle_time;
>
> --> it looks like the bug is that we are comparing signed values as unsigned.
>
>>
>>               if (ignore_nice) {
>>                       u64 cur_nice = kcpustat_cpu(j).cpustat[CPUTIME_NICE];
>>
>
> So what about the simple change below?

Well, it doesn't make sense, sorry about the confusion.

> ---
>  drivers/cpufreq/cpufreq_governor.c |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> Index: linux-pm/drivers/cpufreq/cpufreq_governor.c
> ===================================================================
> --- linux-pm.orig/drivers/cpufreq/cpufreq_governor.c
> +++ linux-pm/drivers/cpufreq/cpufreq_governor.c
> @@ -146,7 +146,7 @@ unsigned int dbs_update(struct cpufreq_p
>                 wall_time = cur_wall_time - j_cdbs->prev_cpu_wall;
>                 j_cdbs->prev_cpu_wall = cur_wall_time;
>
> -               if (cur_idle_time <= j_cdbs->prev_cpu_idle) {
> +               if ((s64)cur_idle_time <= (s64)j_cdbs->prev_cpu_idle) {
>                         idle_time = 0;
>                 } else {
>                         idle_time = cur_idle_time - j_cdbs->prev_cpu_idle;
>

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2016-04-20 13:52 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-04-19  3:57 [PATCH][v3] cpufreq: governor: Fix overflow when calculating idle time Chen Yu
2016-04-20  1:26 ` Rafael J. Wysocki
2016-04-20 13:52   ` Rafael J. Wysocki

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.