All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] sched/cpufreq_schedutil: use now as reference when aggregating shared policy requests
@ 2017-05-03 13:30 Juri Lelli
  2017-05-04 14:29 ` Rafael J. Wysocki
                   ` (2 more replies)
  0 siblings, 3 replies; 5+ messages in thread
From: Juri Lelli @ 2017-05-03 13:30 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-pm, rjw, viresh.kumar, peterz, vincent.guittot, juri.lelli,
	Rafael J . Wysocki

Currently, sugov_next_freq_shared() uses last_freq_update_time as a
reference to decide when to start considering CPU contributions as
stale.

However, since last_freq_update_time is set by the last CPU that issued
a frequency transition, this might cause problems in certain cases. In
practice, the detection of stale utilization values fails whenever the
CPU with such values was the last to update the policy. For example (and
please note again that the SCHED_CPUFREQ_RT flag is not the problem
here, but only the detection of after how much time that flag has to be
considered stale), suppose a policy with 2 CPUs:

               CPU0                |               CPU1
                                   |
                                   |     RT task scheduled
                                   |     SCHED_CPUFREQ_RT is set
                                   |     CPU1->last_update = now
                                   |     freq transition to max
                                   |     last_freq_update_time = now
                                   |

                        more than TICK_NSEC nsecs

                                   |
     a small CFS wakes up          |
     CPU0->last_update = now1      |
     delta_ns(CPU0) < TICK_NSEC*   |
     CPU0's util is considered     |
     delta_ns(CPU1) =              |
      last_freq_update_time -      |
      CPU1->last_update = 0        |
      < TICK_NSEC                  |
     CPU1 is still considered      |
     CPU1->SCHED_CPUFREQ_RT is set |
     we stay at max (until CPU1    |
     exits from idle)              |

* delta_ns is actually negative as now1 > last_freq_update_time

While last_freq_update_time is a sensible reference for rate limiting,
it doesn't seem to be useful for working around stale CPU states.

Fix the problem by always considering now (time) as the reference for
deciding when CPUs have stale contributions.

Signed-off-by: Juri Lelli <juri.lelli@arm.com>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
---
 kernel/sched/cpufreq_schedutil.c | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
index 76877a62b5fa..622eed1b7658 100644
--- a/kernel/sched/cpufreq_schedutil.c
+++ b/kernel/sched/cpufreq_schedutil.c
@@ -245,11 +245,10 @@ static void sugov_update_single(struct update_util_data *hook, u64 time,
 	sugov_update_commit(sg_policy, time, next_f);
 }
 
-static unsigned int sugov_next_freq_shared(struct sugov_cpu *sg_cpu)
+static unsigned int sugov_next_freq_shared(struct sugov_cpu *sg_cpu, u64 time)
 {
 	struct sugov_policy *sg_policy = sg_cpu->sg_policy;
 	struct cpufreq_policy *policy = sg_policy->policy;
-	u64 last_freq_update_time = sg_policy->last_freq_update_time;
 	unsigned long util = 0, max = 1;
 	unsigned int j;
 
@@ -265,7 +264,7 @@ static unsigned int sugov_next_freq_shared(struct sugov_cpu *sg_cpu)
 		 * enough, don't take the CPU into account as it probably is
 		 * idle now (and clear iowait_boost for it).
 		 */
-		delta_ns = last_freq_update_time - j_sg_cpu->last_update;
+		delta_ns = time - j_sg_cpu->last_update;
 		if (delta_ns > TICK_NSEC) {
 			j_sg_cpu->iowait_boost = 0;
 			continue;
@@ -309,7 +308,7 @@ static void sugov_update_shared(struct update_util_data *hook, u64 time,
 		if (flags & SCHED_CPUFREQ_RT_DL)
 			next_f = sg_policy->policy->cpuinfo.max_freq;
 		else
-			next_f = sugov_next_freq_shared(sg_cpu);
+			next_f = sugov_next_freq_shared(sg_cpu, time);
 
 		sugov_update_commit(sg_policy, time, next_f);
 	}
-- 
2.10.0

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH] sched/cpufreq_schedutil: use now as reference when aggregating shared policy requests
  2017-05-03 13:30 [PATCH] sched/cpufreq_schedutil: use now as reference when aggregating shared policy requests Juri Lelli
@ 2017-05-04 14:29 ` Rafael J. Wysocki
  2017-05-04 14:40   ` Juri Lelli
  2017-05-04 14:41 ` Vincent Guittot
  2017-05-05  6:06 ` Viresh Kumar
  2 siblings, 1 reply; 5+ messages in thread
From: Rafael J. Wysocki @ 2017-05-04 14:29 UTC (permalink / raw)
  To: Juri Lelli; +Cc: linux-kernel, linux-pm, viresh.kumar, peterz, vincent.guittot

On Wednesday, May 03, 2017 02:30:48 PM Juri Lelli wrote:
> Currently, sugov_next_freq_shared() uses last_freq_update_time as a
> reference to decide when to start considering CPU contributions as
> stale.
> 
> However, since last_freq_update_time is set by the last CPU that issued
> a frequency transition, this might cause problems in certain cases. In
> practice, the detection of stale utilization values fails whenever the
> CPU with such values was the last to update the policy. For example (and
> please note again that the SCHED_CPUFREQ_RT flag is not the problem
> here, but only the detection of after how much time that flag has to be
> considered stale), suppose a policy with 2 CPUs:
> 
>                CPU0                |               CPU1
>                                    |
>                                    |     RT task scheduled
>                                    |     SCHED_CPUFREQ_RT is set
>                                    |     CPU1->last_update = now
>                                    |     freq transition to max
>                                    |     last_freq_update_time = now
>                                    |
> 
>                         more than TICK_NSEC nsecs
> 
>                                    |
>      a small CFS wakes up          |
>      CPU0->last_update = now1      |
>      delta_ns(CPU0) < TICK_NSEC*   |
>      CPU0's util is considered     |
>      delta_ns(CPU1) =              |
>       last_freq_update_time -      |
>       CPU1->last_update = 0        |
>       < TICK_NSEC                  |
>      CPU1 is still considered      |
>      CPU1->SCHED_CPUFREQ_RT is set |
>      we stay at max (until CPU1    |
>      exits from idle)              |
> 
> * delta_ns is actually negative as now1 > last_freq_update_time
> 
> While last_freq_update_time is a sensible reference for rate limiting,
> it doesn't seem to be useful for working around stale CPU states.
> 
> Fix the problem by always considering now (time) as the reference for
> deciding when CPUs have stale contributions.
> 
> Signed-off-by: Juri Lelli <juri.lelli@arm.com>
> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> Cc: Viresh Kumar <viresh.kumar@linaro.org>

OK

I'll queue this up if there are no objections from the people in the CC.

> ---
>  kernel/sched/cpufreq_schedutil.c | 7 +++----
>  1 file changed, 3 insertions(+), 4 deletions(-)
> 
> diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
> index 76877a62b5fa..622eed1b7658 100644
> --- a/kernel/sched/cpufreq_schedutil.c
> +++ b/kernel/sched/cpufreq_schedutil.c
> @@ -245,11 +245,10 @@ static void sugov_update_single(struct update_util_data *hook, u64 time,
>  	sugov_update_commit(sg_policy, time, next_f);
>  }
>  
> -static unsigned int sugov_next_freq_shared(struct sugov_cpu *sg_cpu)
> +static unsigned int sugov_next_freq_shared(struct sugov_cpu *sg_cpu, u64 time)
>  {
>  	struct sugov_policy *sg_policy = sg_cpu->sg_policy;
>  	struct cpufreq_policy *policy = sg_policy->policy;
> -	u64 last_freq_update_time = sg_policy->last_freq_update_time;
>  	unsigned long util = 0, max = 1;
>  	unsigned int j;
>  
> @@ -265,7 +264,7 @@ static unsigned int sugov_next_freq_shared(struct sugov_cpu *sg_cpu)
>  		 * enough, don't take the CPU into account as it probably is
>  		 * idle now (and clear iowait_boost for it).
>  		 */
> -		delta_ns = last_freq_update_time - j_sg_cpu->last_update;
> +		delta_ns = time - j_sg_cpu->last_update;
>  		if (delta_ns > TICK_NSEC) {
>  			j_sg_cpu->iowait_boost = 0;
>  			continue;
> @@ -309,7 +308,7 @@ static void sugov_update_shared(struct update_util_data *hook, u64 time,
>  		if (flags & SCHED_CPUFREQ_RT_DL)
>  			next_f = sg_policy->policy->cpuinfo.max_freq;
>  		else
> -			next_f = sugov_next_freq_shared(sg_cpu);
> +			next_f = sugov_next_freq_shared(sg_cpu, time);
>  
>  		sugov_update_commit(sg_policy, time, next_f);
>  	}
> 

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] sched/cpufreq_schedutil: use now as reference when aggregating shared policy requests
  2017-05-04 14:29 ` Rafael J. Wysocki
@ 2017-05-04 14:40   ` Juri Lelli
  0 siblings, 0 replies; 5+ messages in thread
From: Juri Lelli @ 2017-05-04 14:40 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: linux-kernel, linux-pm, viresh.kumar, peterz, vincent.guittot

Hi Rafael,

On 04/05/17 16:29, Rafael J. Wysocki wrote:
> On Wednesday, May 03, 2017 02:30:48 PM Juri Lelli wrote:
> > Currently, sugov_next_freq_shared() uses last_freq_update_time as a
> > reference to decide when to start considering CPU contributions as
> > stale.
> > 
> > However, since last_freq_update_time is set by the last CPU that issued
> > a frequency transition, this might cause problems in certain cases. In
> > practice, the detection of stale utilization values fails whenever the
> > CPU with such values was the last to update the policy. For example (and
> > please note again that the SCHED_CPUFREQ_RT flag is not the problem
> > here, but only the detection of after how much time that flag has to be
> > considered stale), suppose a policy with 2 CPUs:
> > 
> >                CPU0                |               CPU1
> >                                    |
> >                                    |     RT task scheduled
> >                                    |     SCHED_CPUFREQ_RT is set
> >                                    |     CPU1->last_update = now
> >                                    |     freq transition to max
> >                                    |     last_freq_update_time = now
> >                                    |
> > 
> >                         more than TICK_NSEC nsecs
> > 
> >                                    |
> >      a small CFS wakes up          |
> >      CPU0->last_update = now1      |
> >      delta_ns(CPU0) < TICK_NSEC*   |
> >      CPU0's util is considered     |
> >      delta_ns(CPU1) =              |
> >       last_freq_update_time -      |
> >       CPU1->last_update = 0        |
> >       < TICK_NSEC                  |
> >      CPU1 is still considered      |
> >      CPU1->SCHED_CPUFREQ_RT is set |
> >      we stay at max (until CPU1    |
> >      exits from idle)              |
> > 
> > * delta_ns is actually negative as now1 > last_freq_update_time
> > 
> > While last_freq_update_time is a sensible reference for rate limiting,
> > it doesn't seem to be useful for working around stale CPU states.
> > 
> > Fix the problem by always considering now (time) as the reference for
> > deciding when CPUs have stale contributions.
> > 
> > Signed-off-by: Juri Lelli <juri.lelli@arm.com>
> > Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > Cc: Viresh Kumar <viresh.kumar@linaro.org>
> 
> OK
> 
> I'll queue this up if there are no objections from the people in the CC.
> 

Thanks!

Best,

- Juri

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] sched/cpufreq_schedutil: use now as reference when aggregating shared policy requests
  2017-05-03 13:30 [PATCH] sched/cpufreq_schedutil: use now as reference when aggregating shared policy requests Juri Lelli
  2017-05-04 14:29 ` Rafael J. Wysocki
@ 2017-05-04 14:41 ` Vincent Guittot
  2017-05-05  6:06 ` Viresh Kumar
  2 siblings, 0 replies; 5+ messages in thread
From: Vincent Guittot @ 2017-05-04 14:41 UTC (permalink / raw)
  To: Juri Lelli
  Cc: linux-kernel, linux-pm, rjw, viresh kumar, Peter Zijlstra,
	Rafael J . Wysocki

On 3 May 2017 at 15:30, Juri Lelli <juri.lelli@arm.com> wrote:
> Currently, sugov_next_freq_shared() uses last_freq_update_time as a
> reference to decide when to start considering CPU contributions as
> stale.
>
> However, since last_freq_update_time is set by the last CPU that issued
> a frequency transition, this might cause problems in certain cases. In
> practice, the detection of stale utilization values fails whenever the
> CPU with such values was the last to update the policy. For example (and
> please note again that the SCHED_CPUFREQ_RT flag is not the problem
> here, but only the detection of after how much time that flag has to be
> considered stale), suppose a policy with 2 CPUs:
>
>                CPU0                |               CPU1
>                                    |
>                                    |     RT task scheduled
>                                    |     SCHED_CPUFREQ_RT is set
>                                    |     CPU1->last_update = now
>                                    |     freq transition to max
>                                    |     last_freq_update_time = now
>                                    |
>
>                         more than TICK_NSEC nsecs
>
>                                    |
>      a small CFS wakes up          |
>      CPU0->last_update = now1      |
>      delta_ns(CPU0) < TICK_NSEC*   |
>      CPU0's util is considered     |
>      delta_ns(CPU1) =              |
>       last_freq_update_time -      |
>       CPU1->last_update = 0        |
>       < TICK_NSEC                  |
>      CPU1 is still considered      |
>      CPU1->SCHED_CPUFREQ_RT is set |
>      we stay at max (until CPU1    |
>      exits from idle)              |
>
> * delta_ns is actually negative as now1 > last_freq_update_time
>
> While last_freq_update_time is a sensible reference for rate limiting,
> it doesn't seem to be useful for working around stale CPU states.
>
> Fix the problem by always considering now (time) as the reference for
> deciding when CPUs have stale contributions.
>
> Signed-off-by: Juri Lelli <juri.lelli@arm.com>
> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> Cc: Viresh Kumar <viresh.kumar@linaro.org>

FWIW

Acked-by: Vincent Guittot <vincent.guittot@linaro.org>

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] sched/cpufreq_schedutil: use now as reference when aggregating shared policy requests
  2017-05-03 13:30 [PATCH] sched/cpufreq_schedutil: use now as reference when aggregating shared policy requests Juri Lelli
  2017-05-04 14:29 ` Rafael J. Wysocki
  2017-05-04 14:41 ` Vincent Guittot
@ 2017-05-05  6:06 ` Viresh Kumar
  2 siblings, 0 replies; 5+ messages in thread
From: Viresh Kumar @ 2017-05-05  6:06 UTC (permalink / raw)
  To: Juri Lelli
  Cc: linux-kernel, linux-pm, rjw, peterz, vincent.guittot, Rafael J . Wysocki

On 03-05-17, 14:30, Juri Lelli wrote:
> Currently, sugov_next_freq_shared() uses last_freq_update_time as a
> reference to decide when to start considering CPU contributions as
> stale.
> 
> However, since last_freq_update_time is set by the last CPU that issued
> a frequency transition, this might cause problems in certain cases. In
> practice, the detection of stale utilization values fails whenever the
> CPU with such values was the last to update the policy. For example (and
> please note again that the SCHED_CPUFREQ_RT flag is not the problem
> here, but only the detection of after how much time that flag has to be
> considered stale), suppose a policy with 2 CPUs:
> 
>                CPU0                |               CPU1
>                                    |
>                                    |     RT task scheduled
>                                    |     SCHED_CPUFREQ_RT is set
>                                    |     CPU1->last_update = now
>                                    |     freq transition to max
>                                    |     last_freq_update_time = now
>                                    |
> 
>                         more than TICK_NSEC nsecs
> 
>                                    |
>      a small CFS wakes up          |
>      CPU0->last_update = now1      |
>      delta_ns(CPU0) < TICK_NSEC*   |
>      CPU0's util is considered     |
>      delta_ns(CPU1) =              |
>       last_freq_update_time -      |
>       CPU1->last_update = 0        |
>       < TICK_NSEC                  |
>      CPU1 is still considered      |
>      CPU1->SCHED_CPUFREQ_RT is set |
>      we stay at max (until CPU1    |
>      exits from idle)              |
> 
> * delta_ns is actually negative as now1 > last_freq_update_time
> 
> While last_freq_update_time is a sensible reference for rate limiting,
> it doesn't seem to be useful for working around stale CPU states.
> 
> Fix the problem by always considering now (time) as the reference for
> deciding when CPUs have stale contributions.
> 
> Signed-off-by: Juri Lelli <juri.lelli@arm.com>
> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> Cc: Viresh Kumar <viresh.kumar@linaro.org>
> ---
>  kernel/sched/cpufreq_schedutil.c | 7 +++----
>  1 file changed, 3 insertions(+), 4 deletions(-)

Acked-by: Viresh Kumar <viresh.kumar@linaro.org>

-- 
viresh

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2017-05-05  6:06 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-05-03 13:30 [PATCH] sched/cpufreq_schedutil: use now as reference when aggregating shared policy requests Juri Lelli
2017-05-04 14:29 ` Rafael J. Wysocki
2017-05-04 14:40   ` Juri Lelli
2017-05-04 14:41 ` Vincent Guittot
2017-05-05  6:06 ` Viresh Kumar

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.