linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Re: [PATCH] cpufreq: schedutil: set sg_policy->next_freq to the final cpufreq
@ 2020-10-29  1:43 zhuguangqing83
  2020-10-29  7:19 ` Viresh Kumar
  0 siblings, 1 reply; 7+ messages in thread
From: zhuguangqing83 @ 2020-10-29  1:43 UTC (permalink / raw)
  To: 'Viresh Kumar'
  Cc: rjw, mingo, peterz, juri.lelli, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, mgorman, bristot, linux-pm,
	linux-kernel, 'zhuguangqing'


> On 28-10-20, 19:03, zhuguangqing83 wrote:
> > Thanks for your comments. Maybe my description was not clear before.
> >
> > If I understand correctly, when policy->min/max get changed in the time
> > Window between get_next_freq() and sugov_fast_switch(), to be more
> > precise between cpufreq_driver_resolve_freq() and
> > cpufreq_driver_fast_switch(), the issue may happen.
> >
> > For example, the first time schedutil callback gets called from the
> > scheduler, we reached get_next_freq() and calculate the next_freq,
> > suppose next_freq is 1.0 GHz, then sg_policy->next_freq is updated
> > to 1.0 GHz in sugov_update_next_freq(). If policy->min/max get
> > change right now, suppose policy->min is changed to 1.2 GHz,
> > then the final next_freq is 1.2 GHz for there is another clamp
> > between policy->min and policy->max in cpufreq_driver_fast_switch().
> > Then sg_policy->next_freq(1.0 GHz) is not the final next_freq(1.2 GHz).
> >
> > The second time schedutil callback gets called from the scheduler, there
> > are two issues:
> > (1) Suppose policy->min is still 1.2 GHz, we reached get_next_freq() and
> > calculate the next_freq, because sg_policy->limits_changed gets set to
> > true by sugov_limits() and there is a clamp between policy->min and
> > policy->max, so this time next_freq will be greater than or equal to 1.2
> > GHz, suppose next_freq is also 1.2 GHz. Now next_freq is 1.2 GHz and
> > sg_policy->next_freq is 1.0 GHz,  then we find
> > "if (sg_policy->next_freq == next_freq)" is not satisfied and we call
> > cpufreq driver to change the cpufreq to 1.2 GHz. Actually it's already
> > 1.2 GHz, it's not necessary to change this time.
> 
> This isn't that bad, but ...
> 
> > (2) Suppose policy->min was changed again to 1.0 GHz before, we reached
> > get_next_freq() and calculate the next_freq, suppose next_freq is also
> > 1.0 GHz. Now next_freq is 1.0 GHz and sg_policy->next_freq is also 1.0 GHz,
> > then we find "if (sg_policy->next_freq == next_freq)" is satisfied and we
> > don't change the cpufreq. Actually we should change the cpufreq to 1.0 GHz
> > this time.
> 
> This is a real problem we can get into. What about this diff instead ?
> 
> diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
> index 0c5c61a095f6..bf7800e853d3 100644
> --- a/kernel/sched/cpufreq_schedutil.c
> +++ b/kernel/sched/cpufreq_schedutil.c
> @@ -105,7 +105,6 @@ static bool sugov_update_next_freq(struct sugov_policy *sg_policy, u64 time,
>         if (sg_policy->next_freq == next_freq)
>                 return false;
> 
> -       sg_policy->next_freq = next_freq;
>         sg_policy->last_freq_update_time = time;
> 
>         return true;

It's a little strange that sg_policy->next_freq and 
sg_policy->last_freq_update_time are not updated at the same time.

> @@ -115,7 +114,7 @@ static void sugov_fast_switch(struct sugov_policy *sg_policy, u64 time,
>                               unsigned int next_freq)
>  {
>         if (sugov_update_next_freq(sg_policy, time, next_freq))
> -               cpufreq_driver_fast_switch(sg_policy->policy, next_freq);
> +               sg_policy->next_freq = cpufreq_driver_fast_switch(sg_policy->policy, next_freq);
>  }
> 

Great, it also takes into account the issue that 0 is returned by the
driver's ->fast_switch() callback to indicate an error condition.

For policy->min/max may be not the real CPU frequency in OPPs, so
next_freq got from get_next_freq() which is after clamping between
policy->min and policy->max may be not the real CPU frequency in OPPs.
In that case, if we use a real CPU frequency in OPPs returned from
cpufreq_driver_fast_switch() to compare with next_freq,
"if (sg_policy->next_freq == next_freq)" will never be satisfied, so we
change the CPU frequency every time schedutil callback gets called from
the scheduler. I see the current code in get_next_freq() as following,
the issue mentioned above should not happen. Maybe it's just one of my
unnecessary worries.

static unsigned int get_next_freq(struct sugov_policy *sg_policy,
				  unsigned long util, unsigned long max)
{
......
	if (freq == sg_policy->cached_raw_freq && !sg_policy->need_freq_update)
		return sg_policy->next_freq;
......
}

>  static void sugov_deferred_update(struct sugov_policy *sg_policy, u64 time,
> @@ -124,6 +123,7 @@ static void sugov_deferred_update(struct sugov_policy *sg_policy, u64 time,
>         if (!sugov_update_next_freq(sg_policy, time, next_freq))
>                 return;
> 
> +       sg_policy->next_freq = next_freq;
>         if (!sg_policy->work_in_progress) {
>                 sg_policy->work_in_progress = true;
>                 irq_work_queue(&sg_policy->irq_work);
> 
> --
> viresh


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] cpufreq: schedutil: set sg_policy->next_freq to the final cpufreq
  2020-10-29  1:43 [PATCH] cpufreq: schedutil: set sg_policy->next_freq to the final cpufreq zhuguangqing83
@ 2020-10-29  7:19 ` Viresh Kumar
  2020-10-29 12:52   ` Rafael J. Wysocki
  0 siblings, 1 reply; 7+ messages in thread
From: Viresh Kumar @ 2020-10-29  7:19 UTC (permalink / raw)
  To: zhuguangqing83
  Cc: rjw, mingo, peterz, juri.lelli, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, mgorman, bristot, linux-pm,
	linux-kernel, 'zhuguangqing'

Your mail client is screwing the "In-reply-to" field of the message
and that prevents it to appear properly in the thread in mailboxes of
other people, please fix that.

On 29-10-20, 09:43, zhuguangqing83 wrote:
> > diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
> > index 0c5c61a095f6..bf7800e853d3 100644
> > --- a/kernel/sched/cpufreq_schedutil.c
> > +++ b/kernel/sched/cpufreq_schedutil.c
> > @@ -105,7 +105,6 @@ static bool sugov_update_next_freq(struct sugov_policy *sg_policy, u64 time,
> >         if (sg_policy->next_freq == next_freq)
> >                 return false;
> > 
> > -       sg_policy->next_freq = next_freq;
> >         sg_policy->last_freq_update_time = time;
> > 
> >         return true;
> 
> It's a little strange that sg_policy->next_freq and 
> sg_policy->last_freq_update_time are not updated at the same time.
> 
> > @@ -115,7 +114,7 @@ static void sugov_fast_switch(struct sugov_policy *sg_policy, u64 time,
> >                               unsigned int next_freq)
> >  {
> >         if (sugov_update_next_freq(sg_policy, time, next_freq))
> > -               cpufreq_driver_fast_switch(sg_policy->policy, next_freq);
> > +               sg_policy->next_freq = cpufreq_driver_fast_switch(sg_policy->policy, next_freq);
> >  }
> > 
> 
> Great, it also takes into account the issue that 0 is returned by the
> driver's ->fast_switch() callback to indicate an error condition.

Yes but even my change wasn't good enough, more on it later.

> For policy->min/max may be not the real CPU frequency in OPPs, so

No, that can't happen. If userspace tries to set a value too large or
too small, we clamp that too to policy->max/min and so the below
problem shall never occur.

> next_freq got from get_next_freq() which is after clamping between
> policy->min and policy->max may be not the real CPU frequency in OPPs.
> In that case, if we use a real CPU frequency in OPPs returned from
> cpufreq_driver_fast_switch() to compare with next_freq,
> "if (sg_policy->next_freq == next_freq)" will never be satisfied, so we
> change the CPU frequency every time schedutil callback gets called from
> the scheduler. I see the current code in get_next_freq() as following,
> the issue mentioned above should not happen. Maybe it's just one of my
> unnecessary worries.

Coming back to my patch (and yours too), it only fixes the fast-switch
case and not the slow path which can also end up clamping the
frequency. And to be honest, even the drivers can have their own
clamping code in place, no one is stopping them too.

And we also need to do something about the cached_raw_freq as well, as
it will not be in sync with next_freq anymore.

Here is another attempt from me tackling this problem. The idea is to
check if the previous freq update went as expected or not. And if not,
we can't rely on next_freq or cached_raw_freq anymore. For this to
work properly, we need to make sure policy->cur isn't getting updated
at the same time when get_next_freq() is running. For that I have
given a different meaning to work_in_progress flag, which now creates
a lockless (kind of) critical section where we won't play with
next_freq while the cpufreq core is updating the frequency.

diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
index 0c5c61a095f6..8991cc31b011 100644
--- a/kernel/sched/cpufreq_schedutil.c
+++ b/kernel/sched/cpufreq_schedutil.c
@@ -121,13 +121,8 @@ static void sugov_fast_switch(struct sugov_policy *sg_policy, u64 time,
 static void sugov_deferred_update(struct sugov_policy *sg_policy, u64 time,
                                  unsigned int next_freq)
 {
-       if (!sugov_update_next_freq(sg_policy, time, next_freq))
-               return;
-
-       if (!sg_policy->work_in_progress) {
-               sg_policy->work_in_progress = true;
+       if (sugov_update_next_freq(sg_policy, time, next_freq))
                irq_work_queue(&sg_policy->irq_work);
-       }
 }
 
 /**
@@ -159,6 +154,15 @@ static unsigned int get_next_freq(struct sugov_policy *sg_policy,
        unsigned int freq = arch_scale_freq_invariant() ?
                                policy->cpuinfo.max_freq : policy->cur;
 
+       /*
+        * The previous frequency update didn't go as we expected it to be. Lets
+        * start again to make sure we don't miss any updates.
+        */
+       if (unlikely(policy->cur != sg_policy->next_freq)) {
+               sg_policy->next_freq = 0;
+               sg_policy->cached_raw_freq = 0;
+       }
+
        freq = map_util_freq(util, freq, max);
 
        if (freq == sg_policy->cached_raw_freq && !sg_policy->need_freq_update)
@@ -337,8 +341,14 @@ static void sugov_update_single(struct update_util_data *hook, u64 time,
 
        ignore_dl_rate_limit(sg_cpu, sg_policy);
 
+       if (!sg_policy->policy->fast_switch_enabled) {
+               raw_spin_lock(&sg_policy->update_lock);
+               if (sg_policy->work_in_progress)
+                       goto unlock;
+       }
+
        if (!sugov_should_update_freq(sg_policy, time))
-               return;
+               goto unlock;
 
        /* Limits may have changed, don't skip frequency update */
        busy = !sg_policy->need_freq_update && sugov_cpu_is_busy(sg_cpu);
@@ -363,13 +373,14 @@ static void sugov_update_single(struct update_util_data *hook, u64 time,
         * concurrently on two different CPUs for the same target and it is not
         * necessary to acquire the lock in the fast switch case.
         */
-       if (sg_policy->policy->fast_switch_enabled) {
+       if (sg_policy->policy->fast_switch_enabled)
                sugov_fast_switch(sg_policy, time, next_f);
-       } else {
-               raw_spin_lock(&sg_policy->update_lock);
+       else
                sugov_deferred_update(sg_policy, time, next_f);
+
+unlock:
+       if (!sg_policy->policy->fast_switch_enabled)
                raw_spin_unlock(&sg_policy->update_lock);
-       }
 }
 
 static unsigned int sugov_next_freq_shared(struct sugov_cpu *sg_cpu, u64 time)
@@ -405,6 +416,9 @@ sugov_update_shared(struct update_util_data *hook, u64 time, unsigned int flags)
 
        raw_spin_lock(&sg_policy->update_lock);
 
+       if (sg_policy->work_in_progress)
+               goto unlock;
+
        sugov_iowait_boost(sg_cpu, time, flags);
        sg_cpu->last_update = time;
 
@@ -419,33 +433,30 @@ sugov_update_shared(struct update_util_data *hook, u64 time, unsigned int flags)
                        sugov_deferred_update(sg_policy, time, next_f);
        }
 
+unlock:
        raw_spin_unlock(&sg_policy->update_lock);
 }
 
 static void sugov_work(struct kthread_work *work)
 {
        struct sugov_policy *sg_policy = container_of(work, struct sugov_policy, work);
-       unsigned int freq;
        unsigned long flags;
 
        /*
-        * Hold sg_policy->update_lock shortly to handle the case where:
-        * incase sg_policy->next_freq is read here, and then updated by
-        * sugov_deferred_update() just before work_in_progress is set to false
-        * here, we may miss queueing the new update.
-        *
-        * Note: If a work was queued after the update_lock is released,
-        * sugov_work() will just be called again by kthread_work code; and the
-        * request will be proceed before the sugov thread sleeps.
+        * Prevent the schedutil hook to run in parallel while we are updating
+        * the frequency here and accessing next_freq.
         */
        raw_spin_lock_irqsave(&sg_policy->update_lock, flags);
-       freq = sg_policy->next_freq;
-       sg_policy->work_in_progress = false;
+       sg_policy->work_in_progress = true;
        raw_spin_unlock_irqrestore(&sg_policy->update_lock, flags);
 
        mutex_lock(&sg_policy->work_lock);
-       __cpufreq_driver_target(sg_policy->policy, freq, CPUFREQ_RELATION_L);
+       __cpufreq_driver_target(sg_policy->policy, sg_policy->next_freq, CPUFREQ_RELATION_L);
        mutex_unlock(&sg_policy->work_lock);
+
+       raw_spin_lock_irqsave(&sg_policy->update_lock, flags);
+       sg_policy->work_in_progress = false;
+       raw_spin_unlock_irqrestore(&sg_policy->update_lock, flags);
 }
 
 static void sugov_irq_work(struct irq_work *irq_work)

-- 
viresh

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH] cpufreq: schedutil: set sg_policy->next_freq to the final cpufreq
  2020-10-29  7:19 ` Viresh Kumar
@ 2020-10-29 12:52   ` Rafael J. Wysocki
  2020-10-30  7:21     ` [PATCH] cpufreq: schedutil: Don't skip freq update if need_freq_update is set Viresh Kumar
  0 siblings, 1 reply; 7+ messages in thread
From: Rafael J. Wysocki @ 2020-10-29 12:52 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: zhuguangqing83, mingo, peterz, juri.lelli, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, mgorman, bristot, linux-pm,
	linux-kernel, 'zhuguangqing'

On Thursday, October 29, 2020 8:19:52 AM CET Viresh Kumar wrote:
> Your mail client is screwing the "In-reply-to" field of the message
> and that prevents it to appear properly in the thread in mailboxes of
> other people, please fix that.
> 
> On 29-10-20, 09:43, zhuguangqing83 wrote:
> > > diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
> > > index 0c5c61a095f6..bf7800e853d3 100644
> > > --- a/kernel/sched/cpufreq_schedutil.c
> > > +++ b/kernel/sched/cpufreq_schedutil.c
> > > @@ -105,7 +105,6 @@ static bool sugov_update_next_freq(struct sugov_policy *sg_policy, u64 time,
> > >         if (sg_policy->next_freq == next_freq)
> > >                 return false;
> > > 
> > > -       sg_policy->next_freq = next_freq;
> > >         sg_policy->last_freq_update_time = time;
> > > 
> > >         return true;
> > 
> > It's a little strange that sg_policy->next_freq and 
> > sg_policy->last_freq_update_time are not updated at the same time.
> > 
> > > @@ -115,7 +114,7 @@ static void sugov_fast_switch(struct sugov_policy *sg_policy, u64 time,
> > >                               unsigned int next_freq)
> > >  {
> > >         if (sugov_update_next_freq(sg_policy, time, next_freq))
> > > -               cpufreq_driver_fast_switch(sg_policy->policy, next_freq);
> > > +               sg_policy->next_freq = cpufreq_driver_fast_switch(sg_policy->policy, next_freq);
> > >  }
> > > 
> > 
> > Great, it also takes into account the issue that 0 is returned by the
> > driver's ->fast_switch() callback to indicate an error condition.
> 
> Yes but even my change wasn't good enough, more on it later.
> 
> > For policy->min/max may be not the real CPU frequency in OPPs, so
> 
> No, that can't happen. If userspace tries to set a value too large or
> too small, we clamp that too to policy->max/min and so the below
> problem shall never occur.
> 
> > next_freq got from get_next_freq() which is after clamping between
> > policy->min and policy->max may be not the real CPU frequency in OPPs.
> > In that case, if we use a real CPU frequency in OPPs returned from
> > cpufreq_driver_fast_switch() to compare with next_freq,
> > "if (sg_policy->next_freq == next_freq)" will never be satisfied, so we
> > change the CPU frequency every time schedutil callback gets called from
> > the scheduler. I see the current code in get_next_freq() as following,
> > the issue mentioned above should not happen. Maybe it's just one of my
> > unnecessary worries.
> 
> Coming back to my patch (and yours too), it only fixes the fast-switch
> case and not the slow path which can also end up clamping the
> frequency. And to be honest, even the drivers can have their own
> clamping code in place, no one is stopping them too.
> 
> And we also need to do something about the cached_raw_freq as well, as
> it will not be in sync with next_freq anymore.

I'm not actually sure why freq is not clamped between the policy min
and max before comparing it with the cached "raw" value.

IIRC the reason for having cached_raw_freq is because freq may not
correspond to any of the available values in the driver's table, but
why does it not take the policy min and max into account?

> Here is another attempt from me tackling this problem. The idea is to
> check if the previous freq update went as expected or not. And if not,
> we can't rely on next_freq or cached_raw_freq anymore.

Well, need_update_freq was kind of supposed to be the flag for that.

IOW, if need_update_freq is set, all of the cached frequency values should
be discarded and the update should go to the driver, or at least to
__cpufreq_driver_target().  [Note that if user space flips one of the
limits to a new value and back to the previous one between two consecutive
utilization updates from the scheduler, we'll end up invoking the driver
even though the frequency hasn't really changed, but that's part of the
design.]

IOW, when need_freq_update is set in sugov_should_update_freq(), it should
guarantee that the update will reach either cpufreq_driver_fast_switch()
or __cpufreq_driver_target().  IMO this needs to be fixed first.

Now, say this is the case.

Can we miss a limits update?

If limits_changed is set in sugov_limits() before sugov_should_update_freq()
checks it, then all should be fine if I'm not missing anything, so say it is
set after that check.

First, suppose that it wasn't set before, so the check fails and the update
goes with need_update_freq unset.  In that case the next run will have
need_freq_update set and it will reach the "lower level".

In turn, if it was set before, the check succeeds and now limits_changed is
cleared and need_freq_update is set.  Obviously, there is a race between
sugov_limits() and sugov_should_update_freq() and say the former wins, so
limits_changed is set again and the current update as well as the next update
will both have need_freq_update set and the both of them will reach the
"lower level".  This is fine.

OTOH, if sugov_should_update_freq() wins the race, limits_changed will remain
unset, so the next update will not have need_freq_update set, but it will be
set for the current one which will reach the "lower level".

So this should work in either case if I'm not mistaken.

Hence, the problem appears to be that setting need_freq_update does not guarantee
that the update will reach the "lower level" and it looks like need_freq_update
is cleared too early: it should be cleared in sugov_update_next_freq() which
should let the update reach the "lower level" if need_freq_update is set.




^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH] cpufreq: schedutil: Don't skip freq update if need_freq_update is set
  2020-10-29 12:52   ` Rafael J. Wysocki
@ 2020-10-30  7:21     ` Viresh Kumar
  2020-10-30 15:07       ` Rafael J. Wysocki
  0 siblings, 1 reply; 7+ messages in thread
From: Viresh Kumar @ 2020-10-30  7:21 UTC (permalink / raw)
  To: Rafael Wysocki, Viresh Kumar, Ingo Molnar, Peter Zijlstra,
	Juri Lelli, Vincent Guittot, Dietmar Eggemann, Steven Rostedt,
	Ben Segall, Mel Gorman, Daniel Bristot de Oliveira
  Cc: linux-pm, zhuguangqing, Rafael J . Wysocki, linux-kernel

The cpufreq policy's frequency limits (min/max) can get changed at any
point of time, while schedutil is trying to update the next frequency.
Though the schedutil governor has necessary locking and support in place
to make sure we don't miss any of those updates, there is a corner case
where the governor will find that the CPU is already running at the
desired frequency and so may skip an update.

For example, consider that the CPU can run at 1 GHz, 1.2 GHz and 1.4 GHz
and is running at 1 GHz currently. Schedutil tries to update the
frequency to 1.2 GHz, during this time the policy limits get changed as
policy->min = 1.4 GHz. As schedutil (and cpufreq core) does clamp the
frequency at various instances, we will eventually set the frequency to
1.4 GHz, while we will save 1.2 GHz in sg_policy->next_freq.

Now lets say the policy limits get changed back at this time with
policy->min as 1 GHz. The next time schedutil is invoked by the
scheduler, we will reevaluate the next frequency (because
need_freq_update will get set due to limits change event) and lets say
we want to set the frequency to 1.2 GHz again. At this point
sugov_update_next_freq() will find the next_freq == current_freq and
will abort the update, while the CPU actually runs at 1.4 GHz.

Until now need_freq_update was used as a flag to indicate that the
policy's frequency limits have changed, and that we should consider the
new limits while reevaluating the next frequency.

This patch fixes the above mentioned issue by extending the purpose of
the need_freq_update flag. If this flag is set now, the schedutil
governor will not try to abort a frequency change even if next_freq ==
current_freq.

As similar behavior is required in the case of
CPUFREQ_NEED_UPDATE_LIMITS flag as well, need_freq_update will never be
set to false if that flag is set for the driver.

We also don't need to consider the need_freq_update flag in
sugov_update_single() anymore to handle the special case of busy CPU, as
we won't abort a frequency update anymore.

Reported-by: zhuguangqing <zhuguangqing@xiaomi.com>
Suggested-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
---
 kernel/sched/cpufreq_schedutil.c | 22 ++++++++++------------
 1 file changed, 10 insertions(+), 12 deletions(-)

diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
index c03a5775d019..c6861be02c86 100644
--- a/kernel/sched/cpufreq_schedutil.c
+++ b/kernel/sched/cpufreq_schedutil.c
@@ -102,9 +102,12 @@ static bool sugov_should_update_freq(struct sugov_policy *sg_policy, u64 time)
 static bool sugov_update_next_freq(struct sugov_policy *sg_policy, u64 time,
 				   unsigned int next_freq)
 {
-	if (sg_policy->next_freq == next_freq &&
-	    !cpufreq_driver_test_flags(CPUFREQ_NEED_UPDATE_LIMITS))
-		return false;
+	if (!sg_policy->need_freq_update) {
+		if (sg_policy->next_freq == next_freq)
+			return false;
+	} else if (!cpufreq_driver_test_flags(CPUFREQ_NEED_UPDATE_LIMITS)) {
+		sg_policy->need_freq_update = false;
+	}
 
 	sg_policy->next_freq = next_freq;
 	sg_policy->last_freq_update_time = time;
@@ -162,11 +165,9 @@ static unsigned int get_next_freq(struct sugov_policy *sg_policy,
 
 	freq = map_util_freq(util, freq, max);
 
-	if (freq == sg_policy->cached_raw_freq && !sg_policy->need_freq_update &&
-	    !cpufreq_driver_test_flags(CPUFREQ_NEED_UPDATE_LIMITS))
+	if (freq == sg_policy->cached_raw_freq && !sg_policy->need_freq_update)
 		return sg_policy->next_freq;
 
-	sg_policy->need_freq_update = false;
 	sg_policy->cached_raw_freq = freq;
 	return cpufreq_driver_resolve_freq(policy, freq);
 }
@@ -442,7 +443,6 @@ static void sugov_update_single(struct update_util_data *hook, u64 time,
 	struct sugov_policy *sg_policy = sg_cpu->sg_policy;
 	unsigned long util, max;
 	unsigned int next_f;
-	bool busy;
 	unsigned int cached_freq = sg_policy->cached_raw_freq;
 
 	sugov_iowait_boost(sg_cpu, time, flags);
@@ -453,9 +453,6 @@ static void sugov_update_single(struct update_util_data *hook, u64 time,
 	if (!sugov_should_update_freq(sg_policy, time))
 		return;
 
-	/* Limits may have changed, don't skip frequency update */
-	busy = !sg_policy->need_freq_update && sugov_cpu_is_busy(sg_cpu);
-
 	util = sugov_get_util(sg_cpu);
 	max = sg_cpu->max;
 	util = sugov_iowait_apply(sg_cpu, time, util, max);
@@ -464,7 +461,7 @@ static void sugov_update_single(struct update_util_data *hook, u64 time,
 	 * Do not reduce the frequency if the CPU has not been idle
 	 * recently, as the reduction is likely to be premature then.
 	 */
-	if (busy && next_f < sg_policy->next_freq) {
+	if (sugov_cpu_is_busy(sg_cpu) && next_f < sg_policy->next_freq) {
 		next_f = sg_policy->next_freq;
 
 		/* Restore cached freq as next_freq has changed */
@@ -829,9 +826,10 @@ static int sugov_start(struct cpufreq_policy *policy)
 	sg_policy->next_freq			= 0;
 	sg_policy->work_in_progress		= false;
 	sg_policy->limits_changed		= false;
-	sg_policy->need_freq_update		= false;
 	sg_policy->cached_raw_freq		= 0;
 
+	sg_policy->need_freq_update = cpufreq_driver_test_flags(CPUFREQ_NEED_UPDATE_LIMITS);
+
 	for_each_cpu(cpu, policy->cpus) {
 		struct sugov_cpu *sg_cpu = &per_cpu(sugov_cpu, cpu);
 
-- 
2.25.0.rc1.19.g042ed3e048af


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH] cpufreq: schedutil: Don't skip freq update if need_freq_update is set
  2020-10-30  7:21     ` [PATCH] cpufreq: schedutil: Don't skip freq update if need_freq_update is set Viresh Kumar
@ 2020-10-30 15:07       ` Rafael J. Wysocki
  2020-10-30 15:23         ` Rafael J. Wysocki
  0 siblings, 1 reply; 7+ messages in thread
From: Rafael J. Wysocki @ 2020-10-30 15:07 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: Rafael Wysocki, Ingo Molnar, Peter Zijlstra, Juri Lelli,
	Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
	Mel Gorman, Daniel Bristot de Oliveira, Linux PM, zhuguangqing,
	Rafael J . Wysocki, Linux Kernel Mailing List

On Fri, Oct 30, 2020 at 8:31 AM Viresh Kumar <viresh.kumar@linaro.org> wrote:
>
> The cpufreq policy's frequency limits (min/max) can get changed at any
> point of time, while schedutil is trying to update the next frequency.
> Though the schedutil governor has necessary locking and support in place
> to make sure we don't miss any of those updates, there is a corner case
> where the governor will find that the CPU is already running at the
> desired frequency and so may skip an update.
>
> For example, consider that the CPU can run at 1 GHz, 1.2 GHz and 1.4 GHz
> and is running at 1 GHz currently. Schedutil tries to update the
> frequency to 1.2 GHz, during this time the policy limits get changed as
> policy->min = 1.4 GHz. As schedutil (and cpufreq core) does clamp the
> frequency at various instances, we will eventually set the frequency to
> 1.4 GHz, while we will save 1.2 GHz in sg_policy->next_freq.
>
> Now lets say the policy limits get changed back at this time with
> policy->min as 1 GHz. The next time schedutil is invoked by the
> scheduler, we will reevaluate the next frequency (because
> need_freq_update will get set due to limits change event) and lets say
> we want to set the frequency to 1.2 GHz again. At this point
> sugov_update_next_freq() will find the next_freq == current_freq and
> will abort the update, while the CPU actually runs at 1.4 GHz.
>
> Until now need_freq_update was used as a flag to indicate that the
> policy's frequency limits have changed, and that we should consider the
> new limits while reevaluating the next frequency.
>
> This patch fixes the above mentioned issue by extending the purpose of
> the need_freq_update flag. If this flag is set now, the schedutil
> governor will not try to abort a frequency change even if next_freq ==
> current_freq.
>
> As similar behavior is required in the case of
> CPUFREQ_NEED_UPDATE_LIMITS flag as well, need_freq_update will never be
> set to false if that flag is set for the driver.
>
> We also don't need to consider the need_freq_update flag in
> sugov_update_single() anymore to handle the special case of busy CPU, as
> we won't abort a frequency update anymore.
>
> Reported-by: zhuguangqing <zhuguangqing@xiaomi.com>
> Suggested-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

Thanks for following my suggestion!

> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
> ---
>  kernel/sched/cpufreq_schedutil.c | 22 ++++++++++------------
>  1 file changed, 10 insertions(+), 12 deletions(-)
>
> diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
> index c03a5775d019..c6861be02c86 100644
> --- a/kernel/sched/cpufreq_schedutil.c
> +++ b/kernel/sched/cpufreq_schedutil.c
> @@ -102,9 +102,12 @@ static bool sugov_should_update_freq(struct sugov_policy *sg_policy, u64 time)
>  static bool sugov_update_next_freq(struct sugov_policy *sg_policy, u64 time,
>                                    unsigned int next_freq)
>  {
> -       if (sg_policy->next_freq == next_freq &&
> -           !cpufreq_driver_test_flags(CPUFREQ_NEED_UPDATE_LIMITS))
> -               return false;
> +       if (!sg_policy->need_freq_update) {
> +               if (sg_policy->next_freq == next_freq)
> +                       return false;
> +       } else if (!cpufreq_driver_test_flags(CPUFREQ_NEED_UPDATE_LIMITS)) {
> +               sg_policy->need_freq_update = false;
> +       }
>
>         sg_policy->next_freq = next_freq;
>         sg_policy->last_freq_update_time = time;
> @@ -162,11 +165,9 @@ static unsigned int get_next_freq(struct sugov_policy *sg_policy,
>
>         freq = map_util_freq(util, freq, max);
>
> -       if (freq == sg_policy->cached_raw_freq && !sg_policy->need_freq_update &&
> -           !cpufreq_driver_test_flags(CPUFREQ_NEED_UPDATE_LIMITS))
> +       if (freq == sg_policy->cached_raw_freq && !sg_policy->need_freq_update)
>                 return sg_policy->next_freq;
>
> -       sg_policy->need_freq_update = false;
>         sg_policy->cached_raw_freq = freq;
>         return cpufreq_driver_resolve_freq(policy, freq);
>  }
> @@ -442,7 +443,6 @@ static void sugov_update_single(struct update_util_data *hook, u64 time,
>         struct sugov_policy *sg_policy = sg_cpu->sg_policy;
>         unsigned long util, max;
>         unsigned int next_f;
> -       bool busy;
>         unsigned int cached_freq = sg_policy->cached_raw_freq;
>
>         sugov_iowait_boost(sg_cpu, time, flags);
> @@ -453,9 +453,6 @@ static void sugov_update_single(struct update_util_data *hook, u64 time,
>         if (!sugov_should_update_freq(sg_policy, time))
>                 return;
>
> -       /* Limits may have changed, don't skip frequency update */
> -       busy = !sg_policy->need_freq_update && sugov_cpu_is_busy(sg_cpu);
> -
>         util = sugov_get_util(sg_cpu);
>         max = sg_cpu->max;
>         util = sugov_iowait_apply(sg_cpu, time, util, max);
> @@ -464,7 +461,7 @@ static void sugov_update_single(struct update_util_data *hook, u64 time,
>          * Do not reduce the frequency if the CPU has not been idle
>          * recently, as the reduction is likely to be premature then.
>          */
> -       if (busy && next_f < sg_policy->next_freq) {
> +       if (sugov_cpu_is_busy(sg_cpu) && next_f < sg_policy->next_freq) {
>                 next_f = sg_policy->next_freq;
>
>                 /* Restore cached freq as next_freq has changed */
> @@ -829,9 +826,10 @@ static int sugov_start(struct cpufreq_policy *policy)
>         sg_policy->next_freq                    = 0;
>         sg_policy->work_in_progress             = false;
>         sg_policy->limits_changed               = false;
> -       sg_policy->need_freq_update             = false;
>         sg_policy->cached_raw_freq              = 0;
>
> +       sg_policy->need_freq_update = cpufreq_driver_test_flags(CPUFREQ_NEED_UPDATE_LIMITS);
> +
>         for_each_cpu(cpu, policy->cpus) {
>                 struct sugov_cpu *sg_cpu = &per_cpu(sugov_cpu, cpu);
>
> --

I'll queue it up for -rc3 next week, thanks!

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] cpufreq: schedutil: Don't skip freq update if need_freq_update is set
  2020-10-30 15:07       ` Rafael J. Wysocki
@ 2020-10-30 15:23         ` Rafael J. Wysocki
  2020-11-02  4:43           ` Viresh Kumar
  0 siblings, 1 reply; 7+ messages in thread
From: Rafael J. Wysocki @ 2020-10-30 15:23 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: Rafael Wysocki, Ingo Molnar, Peter Zijlstra, Juri Lelli,
	Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
	Mel Gorman, Daniel Bristot de Oliveira, Linux PM, zhuguangqing,
	Rafael J . Wysocki, Linux Kernel Mailing List

On Fri, Oct 30, 2020 at 4:07 PM Rafael J. Wysocki <rafael@kernel.org> wrote:
>
> On Fri, Oct 30, 2020 at 8:31 AM Viresh Kumar <viresh.kumar@linaro.org> wrote:
> >
> > The cpufreq policy's frequency limits (min/max) can get changed at any
> > point of time, while schedutil is trying to update the next frequency.
> > Though the schedutil governor has necessary locking and support in place
> > to make sure we don't miss any of those updates, there is a corner case
> > where the governor will find that the CPU is already running at the
> > desired frequency and so may skip an update.
> >
> > For example, consider that the CPU can run at 1 GHz, 1.2 GHz and 1.4 GHz
> > and is running at 1 GHz currently. Schedutil tries to update the
> > frequency to 1.2 GHz, during this time the policy limits get changed as
> > policy->min = 1.4 GHz. As schedutil (and cpufreq core) does clamp the
> > frequency at various instances, we will eventually set the frequency to
> > 1.4 GHz, while we will save 1.2 GHz in sg_policy->next_freq.
> >
> > Now lets say the policy limits get changed back at this time with
> > policy->min as 1 GHz. The next time schedutil is invoked by the
> > scheduler, we will reevaluate the next frequency (because
> > need_freq_update will get set due to limits change event) and lets say
> > we want to set the frequency to 1.2 GHz again. At this point
> > sugov_update_next_freq() will find the next_freq == current_freq and
> > will abort the update, while the CPU actually runs at 1.4 GHz.
> >
> > Until now need_freq_update was used as a flag to indicate that the
> > policy's frequency limits have changed, and that we should consider the
> > new limits while reevaluating the next frequency.
> >
> > This patch fixes the above mentioned issue by extending the purpose of
> > the need_freq_update flag. If this flag is set now, the schedutil
> > governor will not try to abort a frequency change even if next_freq ==
> > current_freq.
> >
> > As similar behavior is required in the case of
> > CPUFREQ_NEED_UPDATE_LIMITS flag as well, need_freq_update will never be
> > set to false if that flag is set for the driver.
> >
> > We also don't need to consider the need_freq_update flag in
> > sugov_update_single() anymore to handle the special case of busy CPU, as
> > we won't abort a frequency update anymore.
> >
> > Reported-by: zhuguangqing <zhuguangqing@xiaomi.com>
> > Suggested-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>
> Thanks for following my suggestion!
>
> > Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
> > ---
> >  kernel/sched/cpufreq_schedutil.c | 22 ++++++++++------------
> >  1 file changed, 10 insertions(+), 12 deletions(-)
> >
> > diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
> > index c03a5775d019..c6861be02c86 100644
> > --- a/kernel/sched/cpufreq_schedutil.c
> > +++ b/kernel/sched/cpufreq_schedutil.c
> > @@ -102,9 +102,12 @@ static bool sugov_should_update_freq(struct sugov_policy *sg_policy, u64 time)
> >  static bool sugov_update_next_freq(struct sugov_policy *sg_policy, u64 time,
> >                                    unsigned int next_freq)
> >  {
> > -       if (sg_policy->next_freq == next_freq &&
> > -           !cpufreq_driver_test_flags(CPUFREQ_NEED_UPDATE_LIMITS))
> > -               return false;
> > +       if (!sg_policy->need_freq_update) {
> > +               if (sg_policy->next_freq == next_freq)
> > +                       return false;
> > +       } else if (!cpufreq_driver_test_flags(CPUFREQ_NEED_UPDATE_LIMITS)) {
> > +               sg_policy->need_freq_update = false;

One nit, though.

This can be changed into

} else {
      sg_policy->need_freq_update =
cpufreq_driver_test_flags(CPUFREQ_NEED_UPDATE_LIMITS);
}

to save a branch and because need_freq_update is there in the cache
already, this should be a fast update.

So I'm going to make this change while applying the patch.

> > +       }
> >
> >         sg_policy->next_freq = next_freq;
> >         sg_policy->last_freq_update_time = time;
> > @@ -162,11 +165,9 @@ static unsigned int get_next_freq(struct sugov_policy *sg_policy,
> >
> >         freq = map_util_freq(util, freq, max);
> >
> > -       if (freq == sg_policy->cached_raw_freq && !sg_policy->need_freq_update &&
> > -           !cpufreq_driver_test_flags(CPUFREQ_NEED_UPDATE_LIMITS))
> > +       if (freq == sg_policy->cached_raw_freq && !sg_policy->need_freq_update)
> >                 return sg_policy->next_freq;
> >
> > -       sg_policy->need_freq_update = false;
> >         sg_policy->cached_raw_freq = freq;
> >         return cpufreq_driver_resolve_freq(policy, freq);
> >  }
> > @@ -442,7 +443,6 @@ static void sugov_update_single(struct update_util_data *hook, u64 time,
> >         struct sugov_policy *sg_policy = sg_cpu->sg_policy;
> >         unsigned long util, max;
> >         unsigned int next_f;
> > -       bool busy;
> >         unsigned int cached_freq = sg_policy->cached_raw_freq;
> >
> >         sugov_iowait_boost(sg_cpu, time, flags);
> > @@ -453,9 +453,6 @@ static void sugov_update_single(struct update_util_data *hook, u64 time,
> >         if (!sugov_should_update_freq(sg_policy, time))
> >                 return;
> >
> > -       /* Limits may have changed, don't skip frequency update */
> > -       busy = !sg_policy->need_freq_update && sugov_cpu_is_busy(sg_cpu);
> > -
> >         util = sugov_get_util(sg_cpu);
> >         max = sg_cpu->max;
> >         util = sugov_iowait_apply(sg_cpu, time, util, max);
> > @@ -464,7 +461,7 @@ static void sugov_update_single(struct update_util_data *hook, u64 time,
> >          * Do not reduce the frequency if the CPU has not been idle
> >          * recently, as the reduction is likely to be premature then.
> >          */
> > -       if (busy && next_f < sg_policy->next_freq) {
> > +       if (sugov_cpu_is_busy(sg_cpu) && next_f < sg_policy->next_freq) {
> >                 next_f = sg_policy->next_freq;
> >
> >                 /* Restore cached freq as next_freq has changed */
> > @@ -829,9 +826,10 @@ static int sugov_start(struct cpufreq_policy *policy)
> >         sg_policy->next_freq                    = 0;
> >         sg_policy->work_in_progress             = false;
> >         sg_policy->limits_changed               = false;
> > -       sg_policy->need_freq_update             = false;
> >         sg_policy->cached_raw_freq              = 0;
> >
> > +       sg_policy->need_freq_update = cpufreq_driver_test_flags(CPUFREQ_NEED_UPDATE_LIMITS);
> > +
> >         for_each_cpu(cpu, policy->cpus) {
> >                 struct sugov_cpu *sg_cpu = &per_cpu(sugov_cpu, cpu);
> >
> > --
>
> I'll queue it up for -rc3 next week, thanks!

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] cpufreq: schedutil: Don't skip freq update if need_freq_update is set
  2020-10-30 15:23         ` Rafael J. Wysocki
@ 2020-11-02  4:43           ` Viresh Kumar
  0 siblings, 0 replies; 7+ messages in thread
From: Viresh Kumar @ 2020-11-02  4:43 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Rafael Wysocki, Ingo Molnar, Peter Zijlstra, Juri Lelli,
	Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
	Mel Gorman, Daniel Bristot de Oliveira, Linux PM, zhuguangqing,
	Rafael J . Wysocki, Linux Kernel Mailing List

On 30-10-20, 16:23, Rafael J. Wysocki wrote:
> On Fri, Oct 30, 2020 at 4:07 PM Rafael J. Wysocki <rafael@kernel.org> wrote:
> >
> > On Fri, Oct 30, 2020 at 8:31 AM Viresh Kumar <viresh.kumar@linaro.org> wrote:
> > >
> > > The cpufreq policy's frequency limits (min/max) can get changed at any
> > > point of time, while schedutil is trying to update the next frequency.
> > > Though the schedutil governor has necessary locking and support in place
> > > to make sure we don't miss any of those updates, there is a corner case
> > > where the governor will find that the CPU is already running at the
> > > desired frequency and so may skip an update.
> > >
> > > For example, consider that the CPU can run at 1 GHz, 1.2 GHz and 1.4 GHz
> > > and is running at 1 GHz currently. Schedutil tries to update the
> > > frequency to 1.2 GHz, during this time the policy limits get changed as
> > > policy->min = 1.4 GHz. As schedutil (and cpufreq core) does clamp the
> > > frequency at various instances, we will eventually set the frequency to
> > > 1.4 GHz, while we will save 1.2 GHz in sg_policy->next_freq.
> > >
> > > Now lets say the policy limits get changed back at this time with
> > > policy->min as 1 GHz. The next time schedutil is invoked by the
> > > scheduler, we will reevaluate the next frequency (because
> > > need_freq_update will get set due to limits change event) and lets say
> > > we want to set the frequency to 1.2 GHz again. At this point
> > > sugov_update_next_freq() will find the next_freq == current_freq and
> > > will abort the update, while the CPU actually runs at 1.4 GHz.
> > >
> > > Until now need_freq_update was used as a flag to indicate that the
> > > policy's frequency limits have changed, and that we should consider the
> > > new limits while reevaluating the next frequency.
> > >
> > > This patch fixes the above mentioned issue by extending the purpose of
> > > the need_freq_update flag. If this flag is set now, the schedutil
> > > governor will not try to abort a frequency change even if next_freq ==
> > > current_freq.
> > >
> > > As similar behavior is required in the case of
> > > CPUFREQ_NEED_UPDATE_LIMITS flag as well, need_freq_update will never be
> > > set to false if that flag is set for the driver.
> > >
> > > We also don't need to consider the need_freq_update flag in
> > > sugov_update_single() anymore to handle the special case of busy CPU, as
> > > we won't abort a frequency update anymore.
> > >
> > > Reported-by: zhuguangqing <zhuguangqing@xiaomi.com>
> > > Suggested-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> >
> > Thanks for following my suggestion!
> >
> > > Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
> > > ---
> > >  kernel/sched/cpufreq_schedutil.c | 22 ++++++++++------------
> > >  1 file changed, 10 insertions(+), 12 deletions(-)
> > >
> > > diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
> > > index c03a5775d019..c6861be02c86 100644
> > > --- a/kernel/sched/cpufreq_schedutil.c
> > > +++ b/kernel/sched/cpufreq_schedutil.c
> > > @@ -102,9 +102,12 @@ static bool sugov_should_update_freq(struct sugov_policy *sg_policy, u64 time)
> > >  static bool sugov_update_next_freq(struct sugov_policy *sg_policy, u64 time,
> > >                                    unsigned int next_freq)
> > >  {
> > > -       if (sg_policy->next_freq == next_freq &&
> > > -           !cpufreq_driver_test_flags(CPUFREQ_NEED_UPDATE_LIMITS))
> > > -               return false;
> > > +       if (!sg_policy->need_freq_update) {
> > > +               if (sg_policy->next_freq == next_freq)
> > > +                       return false;
> > > +       } else if (!cpufreq_driver_test_flags(CPUFREQ_NEED_UPDATE_LIMITS)) {
> > > +               sg_policy->need_freq_update = false;
> 
> One nit, though.
> 
> This can be changed into
> 
> } else {
>       sg_policy->need_freq_update =
> cpufreq_driver_test_flags(CPUFREQ_NEED_UPDATE_LIMITS);
> }
> 
> to save a branch and because need_freq_update is there in the cache
> already, this should be a fast update.

Nice.

-- 
viresh

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2020-11-02  4:43 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-10-29  1:43 [PATCH] cpufreq: schedutil: set sg_policy->next_freq to the final cpufreq zhuguangqing83
2020-10-29  7:19 ` Viresh Kumar
2020-10-29 12:52   ` Rafael J. Wysocki
2020-10-30  7:21     ` [PATCH] cpufreq: schedutil: Don't skip freq update if need_freq_update is set Viresh Kumar
2020-10-30 15:07       ` Rafael J. Wysocki
2020-10-30 15:23         ` Rafael J. Wysocki
2020-11-02  4:43           ` Viresh Kumar

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).