All of lore.kernel.org
 help / color / mirror / Atom feed
From: Vincent Guittot <vincent.guittot@linaro.org>
To: Valentin Schneider <valentin.schneider@arm.com>
Cc: linux-kernel <linux-kernel@vger.kernel.org>,
	Peter Zijlstra <peterz@infradead.org>,
	Ingo Molnar <mingo@kernel.org>,
	Dietmar Eggemann <dietmar.eggemann@arm.com>,
	Morten Rasmussen <morten.rasmussen@arm.com>,
	Qais Yousef <qais.yousef@arm.com>,
	Quentin Perret <qperret@google.com>,
	Pavan Kondeti <pkondeti@codeaurora.org>,
	Rik van Riel <riel@surriel.com>,
	Lingutla Chandrasekhar <clingutla@codeaurora.org>
Subject: Re: [PATCH 1/2] sched/fair: Filter out locally-unsolvable misfit imbalances
Date: Fri, 16 Apr 2021 15:29:30 +0200	[thread overview]
Message-ID: <CAKfTPtA7uq24hFv57c=W_XkXH-g9b2xo7OfJHRP+cbKoh-0TOQ@mail.gmail.com> (raw)
In-Reply-To: <20210415175846.494385-2-valentin.schneider@arm.com>

On Thu, 15 Apr 2021 at 19:58, Valentin Schneider
<valentin.schneider@arm.com> wrote:
>
> Consider the following (hypothetical) asymmetric CPU capacity topology,
> with some amount of capacity pressure (RT | DL | IRQ | thermal):
>
>   DIE [          ]
>   MC  [    ][    ]
>        0  1  2  3
>
>   | CPU | capacity_orig | capacity |
>   |-----+---------------+----------|
>   |   0 |           870 |      860 |
>   |   1 |           870 |      600 |
>   |   2 |          1024 |      850 |
>   |   3 |          1024 |      860 |
>
> If CPU1 has a misfit task, then CPU0, CPU2 and CPU3 are valid candidates to
> grant the task an uplift in CPU capacity. Consider CPU0 and CPU3 as
> sufficiently busy, i.e. don't have enough spare capacity to accommodate
> CPU1's misfit task. This would then fall on CPU2 to pull the task.
>
> This currently won't happen, because CPU2 will fail
>
>   capacity_greater(capacity_of(CPU2), sg->sgc->max_capacity)
>
> in update_sd_pick_busiest(), where 'sg' is the [0, 1] group at DIE
> level. In this case, the max_capacity is that of CPU0's, which is at this
> point in time greater than that of CPU2's. This comparison doesn't make
> much sense, given that the only CPUs we should care about in this scenario
> are CPU1 (the CPU with the misfit task) and CPU2 (the load-balance
> destination CPU).
>
> Aggregate a misfit task's load into sgs->group_misfit_task_load only if
> env->dst_cpu would grant it a capacity uplift.
>
> Note that the aforementioned capacity vs sgc->max_capacity comparison was
> meant to prevent misfit task downmigration: candidate groups classified as
> group_misfit_task but with a higher (max) CPU capacity than the destination CPU
> would be discarded. This change makes it so said group_misfit_task
> classification can't happen anymore, which may cause some undesired
> downmigrations.
>
> Further tweak find_busiest_queue() to ensure this doesn't happen. Also note
> find_busiest_queue() can now iterate over CPUs with a higher capacity than
> the local CPU's, so add a capacity check there.
>
> Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
> ---
>  kernel/sched/fair.c | 63 ++++++++++++++++++++++++++++++++-------------
>  1 file changed, 45 insertions(+), 18 deletions(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 9b8ae02f1994..d2d1a69d7aa7 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -5759,6 +5759,12 @@ static unsigned long capacity_of(int cpu)
>         return cpu_rq(cpu)->cpu_capacity;
>  }
>
> +/* Is CPU a's capacity noticeably greater than CPU b's? */
> +static inline bool cpu_capacity_greater(int a, int b)
> +{
> +       return capacity_greater(capacity_of(a), capacity_of(b));
> +}
> +
>  static void record_wakee(struct task_struct *p)
>  {
>         /*
> @@ -7486,6 +7492,7 @@ struct lb_env {
>
>         enum fbq_type           fbq_type;
>         enum migration_type     migration_type;
> +       enum group_type         src_grp_type;
>         struct list_head        tasks;
>  };
>
> @@ -8447,6 +8454,32 @@ static bool update_nohz_stats(struct rq *rq)
>  #endif
>  }
>
> +static inline void update_sg_lb_misfit_stats(struct lb_env *env,
> +                                            struct sched_group *group,
> +                                            struct sg_lb_stats *sgs,
> +                                            int *sg_status,
> +                                            int cpu)
> +{
> +       struct rq *rq = cpu_rq(cpu);
> +
> +       if (!(env->sd->flags & SD_ASYM_CPUCAPACITY) ||
> +           !rq->misfit_task_load)
> +               return;
> +
> +       *sg_status |= SG_OVERLOAD;
> +
> +       /*
> +        * Don't attempt to maximize load for misfit tasks that can't be
> +        * granted a CPU capacity uplift.
> +        */
> +       if (cpu_capacity_greater(env->dst_cpu, cpu)) {
> +               sgs->group_misfit_task_load = max(
> +                       sgs->group_misfit_task_load,
> +                       rq->misfit_task_load);
> +       }
> +
> +}
> +
>  /**
>   * update_sg_lb_stats - Update sched_group's statistics for load balancing.
>   * @env: The load balancing environment.
> @@ -8498,12 +8531,7 @@ static inline void update_sg_lb_stats(struct lb_env *env,
>                 if (local_group)
>                         continue;
>
> -               /* Check for a misfit task on the cpu */
> -               if (env->sd->flags & SD_ASYM_CPUCAPACITY &&
> -                   sgs->group_misfit_task_load < rq->misfit_task_load) {
> -                       sgs->group_misfit_task_load = rq->misfit_task_load;
> -                       *sg_status |= SG_OVERLOAD;
> -               }
> +               update_sg_lb_misfit_stats(env, group, sgs, sg_status, i);
>         }
>
>         /* Check if dst CPU is idle and preferred to this group */
> @@ -8550,15 +8578,9 @@ static bool update_sd_pick_busiest(struct lb_env *env,
>         if (!sgs->sum_h_nr_running)
>                 return false;
>
> -       /*
> -        * Don't try to pull misfit tasks we can't help.
> -        * We can use max_capacity here as reduction in capacity on some
> -        * CPUs in the group should either be possible to resolve
> -        * internally or be covered by avg_load imbalance (eventually).
> -        */
> +       /* Don't try to pull misfit tasks we can't help */
>         if (sgs->group_type == group_misfit_task &&
> -           (!capacity_greater(capacity_of(env->dst_cpu), sg->sgc->max_capacity) ||
> -            sds->local_stat.group_type != group_has_spare))
> +            sds->local_stat.group_type != group_has_spare)
>                 return false;
>
>         if (sgs->group_type > busiest->group_type)
> @@ -9288,6 +9310,8 @@ static struct sched_group *find_busiest_group(struct lb_env *env)
>         if (!sds.busiest)
>                 goto out_balanced;
>
> +       env->src_grp_type = busiest->group_type;
> +
>         /* Misfit tasks should be dealt with regardless of the avg load */
>         if (busiest->group_type == group_misfit_task)
>                 goto force_balance;
> @@ -9441,8 +9465,8 @@ static struct rq *find_busiest_queue(struct lb_env *env,
>                  * average load.
>                  */
>                 if (env->sd->flags & SD_ASYM_CPUCAPACITY &&
> -                   !capacity_greater(capacity_of(env->dst_cpu), capacity) &&
> -                   nr_running == 1)
> +                   env->src_grp_type <= group_fully_busy &&
> +                   !capacity_greater(capacity_of(env->dst_cpu), capacity))
>                         continue;
>
>                 switch (env->migration_type) {
> @@ -9504,15 +9528,18 @@ static struct rq *find_busiest_queue(struct lb_env *env,
>                 case migrate_misfit:
>                         /*
>                          * For ASYM_CPUCAPACITY domains with misfit tasks we
> -                        * simply seek the "biggest" misfit task.
> +                        * simply seek the "biggest" misfit task we can
> +                        * accommodate.
>                          */
> +                       if (!cpu_capacity_greater(env->dst_cpu, i))

Use the same level of interface as above. This makes code and the
condition easier to follow in find_busiest_queue()

capacity_greater(capacity_of(env->dst_cpu), capacity_of(i))


> +                               continue;
> +
>                         if (rq->misfit_task_load > busiest_load) {
>                                 busiest_load = rq->misfit_task_load;
>                                 busiest = rq;
>                         }
>
>                         break;
> -
>                 }
>         }
>
> --
> 2.25.1
>

  parent reply	other threads:[~2021-04-16 13:29 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-04-15 17:58 [PATCH 0/2] sched/fair: (The return of) misfit task load-balance tweaks Valentin Schneider
2021-04-15 17:58 ` [PATCH 1/2] sched/fair: Filter out locally-unsolvable misfit imbalances Valentin Schneider
2021-04-15 18:47   ` Rik van Riel
2021-04-16 13:29   ` Vincent Guittot [this message]
2021-04-19 17:13     ` Valentin Schneider
2021-04-22  9:48   ` Dietmar Eggemann
2021-04-22 19:19     ` Valentin Schneider
2021-04-15 17:58 ` [PATCH 2/2] sched/fair: Relax task_hot() for misfit tasks Valentin Schneider
2021-04-15 20:39   ` Rik van Riel
2021-04-16  9:43     ` Valentin Schneider
2021-04-19 12:59       ` Phil Auld
2021-04-19 17:17         ` Valentin Schneider
2021-04-19 20:23           ` Phil Auld
2021-04-16 13:51   ` Vincent Guittot
2021-04-19 17:13     ` Valentin Schneider
2021-04-20 14:33       ` Vincent Guittot
2021-04-21 10:52         ` Valentin Schneider
2021-04-22 17:29           ` Dietmar Eggemann
2021-04-22 19:19             ` Valentin Schneider
2021-04-30  6:58           ` Vincent Guittot
2021-05-07 13:46             ` Valentin Schneider

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAKfTPtA7uq24hFv57c=W_XkXH-g9b2xo7OfJHRP+cbKoh-0TOQ@mail.gmail.com' \
    --to=vincent.guittot@linaro.org \
    --cc=clingutla@codeaurora.org \
    --cc=dietmar.eggemann@arm.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@kernel.org \
    --cc=morten.rasmussen@arm.com \
    --cc=peterz@infradead.org \
    --cc=pkondeti@codeaurora.org \
    --cc=qais.yousef@arm.com \
    --cc=qperret@google.com \
    --cc=riel@surriel.com \
    --cc=valentin.schneider@arm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.