All of lore.kernel.org
 help / color / mirror / Atom feed
From: Vincent Guittot <vincent.guittot@linaro.org>
To: Qais Yousef <qyousef@layalina.io>
Cc: mingo@kernel.org, peterz@infradead.org, dietmar.eggemann@arm.com,
	rafael@kernel.org, viresh.kumar@linaro.org, vschneid@redhat.com,
	linux-pm@vger.kernel.org, linux-kernel@vger.kernel.org,
	lukasz.luba@arm.com, wvw@google.com, xuewen.yan94@gmail.com,
	han.lin@mediatek.com, Jonathan.JMChen@mediatek.com
Subject: Re: [PATCH v2] sched/fair: unlink misfit task from cpu overutilized
Date: Fri, 13 Jan 2023 09:53:49 +0100	[thread overview]
Message-ID: <CAKfTPtCPedS22+0QHS65ixMELAvjRnn6XMCu0DzickEcpjkMFQ@mail.gmail.com> (raw)
In-Reply-To: <20230112142603.yj5dtleksi6sfrzp@airbuntu>

On Thu, 12 Jan 2023 at 15:26, Qais Yousef <qyousef@layalina.io> wrote:
>
> Hi Vincent
>
> On 12/28/22 17:54, Vincent Guittot wrote:
> > By taking into account uclamp_min, the 1:1 relation between task misfit
> > and cpu overutilized is no more true as a task with a small util_avg of
> > may not may not fit a high capacity cpu because of uclamp_min constraint.
>
> Wouldn't it be better to split this into two patches
>
>         * Unlink/Decouple misfit ...
>         * Unlink/Decouple util_fits_cpu from HMP
>
> ?

I'm afraid that git bisect could then raise a false positive between
the 2 commits

>
> >
> > Add a new state in util_fits_cpu() to reflect the case that task would fit
> > a CPU except for the uclamp_min hint which is a performance requirement.
> >
> > Use -1 to reflect that a CPU doesn't fit only because of uclamp_min so we
> > can use this new value to take additional action to select the best CPU
> > that doesn't match uclamp_min hint.
>
> This part has nothing to do with the commit subject. I think it's better to
> split the patches if it's not too much work for you.
>
> >
> > Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
> > ---
> >
> > Change since v1:
> > - fix some wrong conditions
> > - take into account more cases
> >
> >  kernel/sched/fair.c | 99 +++++++++++++++++++++++++++++++++------------
> >  1 file changed, 74 insertions(+), 25 deletions(-)
> >
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index 1649e7d71d24..57077f0a897e 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -4582,8 +4582,7 @@ static inline int util_fits_cpu(unsigned long util,
> >        *     2. The system is being saturated when we're operating near
> >        *        max capacity, it doesn't make sense to block overutilized.
> >        */
> > -     uclamp_max_fits = (capacity_orig == SCHED_CAPACITY_SCALE) && (uclamp_max == SCHED_CAPACITY_SCALE);
> > -     uclamp_max_fits = !uclamp_max_fits && (uclamp_max <= capacity_orig);
> > +     uclamp_max_fits = (uclamp_max <= capacity_orig) || (capacity_orig == SCHED_CAPACITY_SCALE);
> >       fits = fits || uclamp_max_fits;
> >
> >       /*
> > @@ -4618,8 +4617,8 @@ static inline int util_fits_cpu(unsigned long util,
> >        * handle the case uclamp_min > uclamp_max.
> >        */
> >       uclamp_min = min(uclamp_min, uclamp_max);
> > -     if (util < uclamp_min && capacity_orig != SCHED_CAPACITY_SCALE)
> > -             fits = fits && (uclamp_min <= capacity_orig_thermal);
> > +     if (fits && (util < uclamp_min) && (uclamp_min > capacity_orig_thermal))
> > +             return -1;
> >
> >       return fits;
> >  }
> > @@ -4629,7 +4628,7 @@ static inline int task_fits_cpu(struct task_struct *p, int cpu)
> >       unsigned long uclamp_min = uclamp_eff_value(p, UCLAMP_MIN);
> >       unsigned long uclamp_max = uclamp_eff_value(p, UCLAMP_MAX);
> >       unsigned long util = task_util_est(p);
> > -     return util_fits_cpu(util, uclamp_min, uclamp_max, cpu);
> > +     return (util_fits_cpu(util, uclamp_min, uclamp_max, cpu) > 0);
>
> So the big difference between your approach and my approach is that
> task_fits_cpu() and asym_fits_cpu() now are very strict in regards to thermal
> pressure since with your approach we delegate the smartness to the caller.
>
> Should we add a comment for these 2 users to make it obvious we intentionally
> ignore the '-1' value and why it is okay?

I can probably add something saying that a positive value (1 in this
case) is the only one that says that a task fits to a cpu. Other
returned values imply that either the utilization or the uclamp
constraints are not meet
>
> I'm not sure I can write a reasonable rationale myself. I'm actually worried
> this might subtly break decisions made by select_idle_capacity() or feec() when
> doing the LB.
>
> Have you considered this?

Yes, that why i have keep the changes in 1 patch

>
> >  }
> >
> >  static inline void update_misfit_status(struct task_struct *p, struct rq *rq)
> > @@ -6864,6 +6863,7 @@ static int
> >  select_idle_capacity(struct task_struct *p, struct sched_domain *sd, int target)
> >  {
> >       unsigned long task_util, util_min, util_max, best_cap = 0;
> > +     int fits, best_fits = 0;
> >       int cpu, best_cpu = -1;
> >       struct cpumask *cpus;
> >
> > @@ -6879,12 +6879,28 @@ select_idle_capacity(struct task_struct *p, struct sched_domain *sd, int target)
> >
> >               if (!available_idle_cpu(cpu) && !sched_idle_cpu(cpu))
> >                       continue;
> > -             if (util_fits_cpu(task_util, util_min, util_max, cpu))
> > +
> > +             fits = util_fits_cpu(task_util, util_min, util_max, cpu);
> > +
> > +             /* This CPU fits with all capacity and performance requirements */
> > +             if (fits > 0)
> >                       return cpu;
> > +             /*
> > +              * Only the min performance (i.e. uclamp_min) doesn't fit. Look
> > +              * for the CPU with highest performance capacity.
> > +              */
> > +             else if (fits < 0)
> > +                     cpu_cap = capacity_orig_of(cpu) - thermal_load_avg(cpu_rq(cpu));
> >
> > -             if (cpu_cap > best_cap) {
> > +             /*
> > +              * First, select cpu which fits better (-1 being better than 0).
> > +              * Then, select the one with largest capacity at same level.
> > +              */
> > +             if ((fits < best_fits) ||
> > +                 ((fits == best_fits) && (cpu_cap > best_cap))) {
> >                       best_cap = cpu_cap;
> >                       best_cpu = cpu;
> > +                     best_fits = fits;
> >               }
> >       }
> >
> > @@ -6897,7 +6913,7 @@ static inline bool asym_fits_cpu(unsigned long util,
> >                                int cpu)
> >  {
> >       if (sched_asym_cpucap_active())
> > -             return util_fits_cpu(util, util_min, util_max, cpu);
> > +             return (util_fits_cpu(util, util_min, util_max, cpu) > 0);
> >
> >       return true;
> >  }
> > @@ -7264,6 +7280,9 @@ static int find_energy_efficient_cpu(struct task_struct *p, int prev_cpu)
> >       unsigned long p_util_max = uclamp_is_used() ? uclamp_eff_value(p, UCLAMP_MAX) : 1024;
> >       struct root_domain *rd = this_rq()->rd;
> >       int cpu, best_energy_cpu, target = -1;
> > +     int prev_fits = -1, best_fits = -1;
> > +     unsigned long best_thermal_cap = 0;
> > +     unsigned long prev_thermal_cap = 0;
> >       struct sched_domain *sd;
> >       struct perf_domain *pd;
> >       struct energy_env eenv;
> > @@ -7299,6 +7318,7 @@ static int find_energy_efficient_cpu(struct task_struct *p, int prev_cpu)
> >               unsigned long prev_spare_cap = 0;
> >               int max_spare_cap_cpu = -1;
> >               unsigned long base_energy;
> > +             int fits, max_fits = -1;
> >
> >               cpumask_and(cpus, perf_domain_span(pd), cpu_online_mask);
> >
> > @@ -7351,7 +7371,9 @@ static int find_energy_efficient_cpu(struct task_struct *p, int prev_cpu)
> >                                       util_max = max(rq_util_max, p_util_max);
> >                               }
> >                       }
> > -                     if (!util_fits_cpu(util, util_min, util_max, cpu))
> > +
> > +                     fits = util_fits_cpu(util, util_min, util_max, cpu);
> > +                     if (!fits)
> >                               continue;
> >
> >                       lsub_positive(&cpu_cap, util);
> > @@ -7359,7 +7381,9 @@ static int find_energy_efficient_cpu(struct task_struct *p, int prev_cpu)
> >                       if (cpu == prev_cpu) {
> >                               /* Always use prev_cpu as a candidate. */
> >                               prev_spare_cap = cpu_cap;
> > -                     } else if (cpu_cap > max_spare_cap) {
> > +                             prev_fits = fits;
> > +                     } else if ((fits > max_fits) ||
> > +                                ((fits == max_fits) && (cpu_cap > max_spare_cap))) {
> >                               /*
> >                                * Find the CPU with the maximum spare capacity
> >                                * among the remaining CPUs in the performance
> > @@ -7367,6 +7391,7 @@ static int find_energy_efficient_cpu(struct task_struct *p, int prev_cpu)
> >                                */
> >                               max_spare_cap = cpu_cap;
> >                               max_spare_cap_cpu = cpu;
> > +                             max_fits = fits;
> >                       }
> >               }
> >
> > @@ -7385,26 +7410,50 @@ static int find_energy_efficient_cpu(struct task_struct *p, int prev_cpu)
> >                       if (prev_delta < base_energy)
> >                               goto unlock;
> >                       prev_delta -= base_energy;
> > +                     prev_thermal_cap = cpu_thermal_cap;
> >                       best_delta = min(best_delta, prev_delta);
> >               }
> >
> >               /* Evaluate the energy impact of using max_spare_cap_cpu. */
> >               if (max_spare_cap_cpu >= 0 && max_spare_cap > prev_spare_cap) {
> > +                     /* Current best energy cpu fits better */
> > +                     if (max_fits < best_fits)
> > +                             continue;
> > +
> > +                     /*
> > +                      * Both don't fit performance (i.e. uclamp_min) but
> > +                      * best energy cpu has better performance.
> > +                      */
> > +                     if ((max_fits < 0) &&
> > +                         (cpu_thermal_cap <= best_thermal_cap))
> > +                             continue;
> > +
> >                       cur_delta = compute_energy(&eenv, pd, cpus, p,
> >                                                  max_spare_cap_cpu);
> >                       /* CPU utilization has changed */
> >                       if (cur_delta < base_energy)
> >                               goto unlock;
> >                       cur_delta -= base_energy;
> > -                     if (cur_delta < best_delta) {
> > -                             best_delta = cur_delta;
> > -                             best_energy_cpu = max_spare_cap_cpu;
> > -                     }
> > +
> > +                     /*
> > +                      * Both fit for the task but best energy cpu has lower
> > +                      * energy impact.
> > +                      */
> > +                     if ((max_fits > 0) &&
>
> Shouldn't this be
>
>                         if ((max_fits > 0) && (max_fits == best_fits) &&
> ?

I will use the below which match better the comment and the fact that
both fit for the task:

+                    if ((max_fits > 0) && (best_fits > 0) &&

>
> We should update best_delta unconditionally first time we hit max_fits = 1, no?
>
> I think it's worth extending the comment with something along the lines of
>
>                         * ... except for the first time max_fits becomes 1
>                         * then we must update best_delta unconditionally

With the new condition above this is not needed anymore

>
> > +                         (cur_delta >= best_delta))
> > +                             continue;
> > +
> > +                     best_delta = cur_delta;
> > +                     best_energy_cpu = max_spare_cap_cpu;
> > +                     best_fits = max_fits;
> > +                     best_thermal_cap = cpu_thermal_cap;
> >               }
> >       }
> >       rcu_read_unlock();
> >
> > -     if (best_delta < prev_delta)
> > +     if ((best_fits > prev_fits) ||
> > +         ((best_fits > 0) && (best_delta < prev_delta)) ||
> > +         ((best_fits < 0) && (best_thermal_cap > prev_thermal_cap)))
> >               target = best_energy_cpu;
>
> Overall I think the approach is sound. I tested it on my pinebook pro and
> couldn't catch obvious breakage at least.
>
> I am still worried though about spilling the knowledge outside of
> util_fits_cpu() is creating extra complexity in the callers and potentially
> more fragility when these callers evolve overtime e.g:
> task_fits_cpu()/asym_fits_cpu() gain a new user that must actually care about
> the -1 return value.

ask_fits_cpu()/asym_fits_cpu() remain simple booleans that return true
if the task fits the cpu in regards to all requirements. If a new user
wants to make smarter decisions like select_idle_capacity() or feec(),
it will have to use util_fits_cpu(). Both handle the case where
uclamp_min is not met differently.

>
> I think we can still optimize the capacity inversion logic to use no loops
> without having to spill the knowledge to the users/callers of util_fits_cpu(),
> no?

TBH, I don't know how because we are not always comparing the same
things depending of the reason it doesn't fit

>
> That said except for the few comments I had this LGTM anyway. Thanks for your
> effort!

Thanks

I'm going to prepare  v2

>
>
> Cheers
>
> --
> Qais Yousef
>
> >
> >       return target;
> > @@ -10228,24 +10277,23 @@ static struct sched_group *find_busiest_group(struct lb_env *env)
> >        */
> >       update_sd_lb_stats(env, &sds);
> >
> > -     if (sched_energy_enabled()) {
> > -             struct root_domain *rd = env->dst_rq->rd;
> > -
> > -             if (rcu_dereference(rd->pd) && !READ_ONCE(rd->overutilized))
> > -                     goto out_balanced;
> > -     }
> > -
> > -     local = &sds.local_stat;
> > -     busiest = &sds.busiest_stat;
> > -
> >       /* There is no busy sibling group to pull tasks from */
> >       if (!sds.busiest)
> >               goto out_balanced;
> >
> > +     busiest = &sds.busiest_stat;
> > +
> >       /* Misfit tasks should be dealt with regardless of the avg load */
> >       if (busiest->group_type == group_misfit_task)
> >               goto force_balance;
> >
> > +     if (sched_energy_enabled()) {
> > +             struct root_domain *rd = env->dst_rq->rd;
> > +
> > +             if (rcu_dereference(rd->pd) && !READ_ONCE(rd->overutilized))
> > +                     goto out_balanced;
> > +     }
> > +
> >       /* ASYM feature bypasses nice load balance check */
> >       if (busiest->group_type == group_asym_packing)
> >               goto force_balance;
> > @@ -10258,6 +10306,7 @@ static struct sched_group *find_busiest_group(struct lb_env *env)
> >       if (busiest->group_type == group_imbalanced)
> >               goto force_balance;
> >
> > +     local = &sds.local_stat;
> >       /*
> >        * If the local group is busier than the selected busiest group
> >        * don't try and pull any tasks.
> > --
> > 2.17.1
> >

  reply	other threads:[~2023-01-13  8:54 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-12-28 16:54 [PATCH v2] sched/fair: unlink misfit task from cpu overutilized Vincent Guittot
2023-01-12 14:26 ` Qais Yousef
2023-01-13  8:53   ` Vincent Guittot [this message]
2023-01-14 18:21     ` Qais Yousef
2023-01-13 13:49 ` Kajetan Puchalski
2023-01-13 14:28   ` Vincent Guittot
2023-01-13 14:50     ` Kajetan Puchalski
2023-01-14 21:18     ` Qais Yousef
2023-01-16  9:21       ` Dietmar Eggemann
2023-01-16 11:25         ` Vincent Guittot
2023-01-17 16:40         ` Qais Yousef
2023-01-20 16:08       ` Kajetan Puchalski
2022-12-29  5:40 kernel test robot

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAKfTPtCPedS22+0QHS65ixMELAvjRnn6XMCu0DzickEcpjkMFQ@mail.gmail.com \
    --to=vincent.guittot@linaro.org \
    --cc=Jonathan.JMChen@mediatek.com \
    --cc=dietmar.eggemann@arm.com \
    --cc=han.lin@mediatek.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pm@vger.kernel.org \
    --cc=lukasz.luba@arm.com \
    --cc=mingo@kernel.org \
    --cc=peterz@infradead.org \
    --cc=qyousef@layalina.io \
    --cc=rafael@kernel.org \
    --cc=viresh.kumar@linaro.org \
    --cc=vschneid@redhat.com \
    --cc=wvw@google.com \
    --cc=xuewen.yan94@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.