From: Vincent Guittot <vincent.guittot@linaro.org>
To: Peter Zijlstra <peterz@infradead.org>
Cc: linux-kernel <linux-kernel@vger.kernel.org>,
Ingo Molnar <mingo@redhat.com>, Phil Auld <pauld@redhat.com>,
Valentin Schneider <valentin.schneider@arm.com>,
Srikar Dronamraju <srikar@linux.vnet.ibm.com>,
Quentin Perret <quentin.perret@arm.com>,
Dietmar Eggemann <dietmar.eggemann@arm.com>,
Morten Rasmussen <Morten.Rasmussen@arm.com>
Subject: Re: [PATCH v2 4/8] sched/fair: rework load_balance
Date: Mon, 26 Aug 2019 11:31:10 +0200 [thread overview]
Message-ID: <CAKfTPtAS2F8ewznprk3d7fR7sAFD9Duy0PEmzyfW9Mot9ZZH=Q@mail.gmail.com> (raw)
In-Reply-To: <20190806155620.GV2332@hirez.programming.kicks-ass.net>
On Tue, 6 Aug 2019 at 17:56, Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Thu, Aug 01, 2019 at 04:40:20PM +0200, Vincent Guittot wrote:
> > The load_balance algorithm contains some heuristics which have becomes
> > meaningless since the rework of metrics and the introduction of PELT.
> >
> > Furthermore, it's sometimes difficult to fix wrong scheduling decisions
> > because everything is based on load whereas some imbalances are not
> > related to the load. The current algorithm ends up to create virtual and
> > meaningless value like the avg_load_per_task or tweaks the state of a
> > group to make it overloaded whereas it's not, in order to try to migrate
> > tasks.
> >
> > load_balance should better qualify the imbalance of the group and define
> > cleary what has to be moved to fix this imbalance.
> >
> > The type of sched_group has been extended to better reflect the type of
> > imbalance. We now have :
> > group_has_spare
> > group_fully_busy
> > group_misfit_task
> > group_asym_capacity
> > group_imbalanced
> > group_overloaded
> >
> > Based on the type of sched_group, load_balance now sets what it wants to
> > move in order to fix the imnbalance. It can be some load as before but
> > also some utilization, a number of task or a type of task:
> > migrate_task
> > migrate_util
> > migrate_load
> > migrate_misfit
> >
> > This new load_balance algorithm fixes several pending wrong tasks
> > placement:
> > - the 1 task per CPU case with asymetrics system
> > - the case of cfs task preempted by other class
> > - the case of tasks not evenly spread on groups with spare capacity
> >
> > The load balance decisions have been gathered in 3 functions:
> > - update_sd_pick_busiest() select the busiest sched_group.
> > - find_busiest_group() checks if there is an imabalance between local and
> > busiest group.
> > - calculate_imbalance() decides what have to be moved.
>
> I like this lots, very good stuff.
>
> It is weird how misfit is still load, but istr later patches cure that.
>
> Here's a whole pile of nits, some overlap with what Valentin already
> noted. His suggestions for the changelog also make sense to me.
ok.
Thanks
>
> ---
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -8205,10 +8205,10 @@ static bool update_sd_pick_busiest(struc
> {
> struct sg_lb_stats *busiest = &sds->busiest_stat;
>
> -
> /* Make sure that there is at least one task to pull */
> if (!sgs->sum_h_nr_running)
> return false;
> +
> /*
> * Don't try to pull misfit tasks we can't help.
> * We can use max_capacity here as reduction in capacity on some
> @@ -8239,11 +8239,11 @@ static bool update_sd_pick_busiest(struc
> break;
>
> case group_imbalanced:
> - /* Select the 1st imbalanced group as we don't have
> - * any way to choose one more than another
> + /*
> + * Select the 1st imbalanced group as we don't have any way to
> + * choose one more than another
> */
> return false;
> - break;
>
> case group_asym_capacity:
> /* Prefer to move from lowest priority CPU's work */
> @@ -8253,8 +8253,8 @@ static bool update_sd_pick_busiest(struc
>
> case group_misfit_task:
> /*
> - * If we have more than one misfit sg go with the
> - * biggest misfit.
> + * If we have more than one misfit sg go with the biggest
> + * misfit.
> */
> if (sgs->group_misfit_task_load < busiest->group_misfit_task_load)
> return false;
> @@ -8262,14 +8262,14 @@ static bool update_sd_pick_busiest(struc
>
> case group_fully_busy:
> /*
> - * Select the fully busy group with highest avg_load.
> - * In theory, there is no need to pull task from such
> - * kind of group because tasks have all compute
> - * capacity that they need but we can still improve the
> - * overall throughput by reducing contention when
> - * accessing shared HW resources.
> - * XXX for now avg_load is not computed and always 0 so
> - * we select the 1st one.
> + * Select the fully busy group with highest avg_load. In
> + * theory, there is no need to pull task from such kind of
> + * group because tasks have all compute capacity that they need
> + * but we can still improve the overall throughput by reducing
> + * contention when accessing shared HW resources.
> + *
> + * XXX for now avg_load is not computed and always 0 so we
> + * select the 1st one.
> */
> if (sgs->avg_load <= busiest->avg_load)
> return false;
> @@ -8277,11 +8277,11 @@ static bool update_sd_pick_busiest(struc
>
> case group_has_spare:
> /*
> - * Select not overloaded group with lowest number of
> - * idle cpus. We could also compare the spare capacity
> - * which is more stable but it can end up that the
> - * group has less spare capacity but finally more idle
> - * cpus which means less opportunity to pull tasks.
> + * Select not overloaded group with lowest number of idle cpus.
> + * We could also compare the spare capacity which is more
> + * stable but it can end up that the group has less spare
> + * capacity but finally more idle cpus which means less
> + * opportunity to pull tasks.
> */
> if (sgs->idle_cpus >= busiest->idle_cpus)
> return false;
> @@ -8289,10 +8289,10 @@ static bool update_sd_pick_busiest(struc
> }
>
> /*
> - * Candidate sg has no more than one task per CPU and
> - * has higher per-CPU capacity. Migrating tasks to less
> - * capable CPUs may harm throughput. Maximize throughput,
> - * power/energy consequences are not considered.
> + * Candidate sg has no more than one task per CPU and has higher
> + * per-CPU capacity. Migrating tasks to less capable CPUs may harm
> + * throughput. Maximize throughput, power/energy consequences are not
> + * considered.
> */
> if ((env->sd->flags & SD_ASYM_CPUCAPACITY) &&
> (sgs->group_type <= group_fully_busy) &&
> @@ -8428,6 +8428,7 @@ static inline void calculate_imbalance(s
>
> local = &sds->local_stat;
> busiest = &sds->busiest_stat;
> +
> if (busiest->group_type == group_imbalanced) {
> /*
> * In the group_imb case we cannot rely on group-wide averages
> @@ -8442,8 +8443,8 @@ static inline void calculate_imbalance(s
>
> if (busiest->group_type == group_asym_capacity) {
> /*
> - * In case of asym capacity, we will try to migrate all load
> - * to the preferred CPU
> + * In case of asym capacity, we will try to migrate all load to
> + * the preferred CPU
> */
> env->balance_type = migrate_load;
> env->imbalance = busiest->group_load;
> @@ -8483,7 +8484,8 @@ static inline void calculate_imbalance(s
> * groups.
> */
> env->balance_type = migrate_task;
> - env->imbalance = (busiest->sum_h_nr_running - local->sum_h_nr_running) >> 1;
> + env->imbalance = (busiest->sum_h_nr_running -
> + local->sum_h_nr_running) >> 1;
> return;
> }
>
> @@ -8502,7 +8504,7 @@ static inline void calculate_imbalance(s
> */
> if (local->group_type < group_overloaded) {
> /*
> - * Local will become overvloaded so the avg_load metrics are
> + * Local will become overloaded so the avg_load metrics are
> * finally needed
> */
>
> @@ -8514,12 +8516,12 @@ static inline void calculate_imbalance(s
> }
>
> /*
> - * Both group are or will become overloaded and we're trying to get
> - * all the CPUs to the average_load, so we don't want to push
> - * ourselves above the average load, nor do we wish to reduce the
> - * max loaded CPU below the average load. At the same time, we also
> - * don't want to reduce the group load below the group capacity.
> - * Thus we look for the minimum possible imbalance.
> + * Both group are or will become overloaded and we're trying to get all
> + * the CPUs to the average_load, so we don't want to push ourselves
> + * above the average load, nor do we wish to reduce the max loaded CPU
> + * below the average load. At the same time, we also don't want to
> + * reduce the group load below the group capacity. Thus we look for
> + * the minimum possible imbalance.
> */
> env->balance_type = migrate_load;
> env->imbalance = min(
> @@ -8641,7 +8643,6 @@ static struct sched_group *find_busiest_
> if (100 * busiest->avg_load <=
> env->sd->imbalance_pct * local->avg_load)
> goto out_balanced;
> -
> }
>
> /* Try to move all excess tasks to child's sibling domain */
> @@ -8651,7 +8652,7 @@ static struct sched_group *find_busiest_
>
> if (busiest->group_type != group_overloaded &&
> (env->idle == CPU_NOT_IDLE ||
> - local->idle_cpus <= (busiest->idle_cpus + 1)))
> + local->idle_cpus <= (busiest->idle_cpus + 1))) {
> /*
> * If the busiest group is not overloaded
> * and there is no imbalance between this and busiest group
> @@ -8661,15 +8662,14 @@ static struct sched_group *find_busiest_
> * group.
> */
> goto out_balanced;
> + }
>
> force_balance:
> /* Looks like there is an imbalance. Compute it */
> calculate_imbalance(env, &sds);
> -
> return env->imbalance ? sds.busiest : NULL;
>
> out_balanced:
> -
> env->imbalance = 0;
> return NULL;
> }
next prev parent reply other threads:[~2019-08-26 9:31 UTC|newest]
Thread overview: 31+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-08-01 14:40 [PATCH v2 0/8] sched/fair: rework the CFS load balance Vincent Guittot
2019-08-01 14:40 ` [PATCH v2 1/8] sched/fair: clean up asym packing Vincent Guittot
2019-08-01 14:40 ` [PATCH v2 2/8] sched/fair: rename sum_nr_running to sum_h_nr_running Vincent Guittot
2019-08-01 14:40 ` [PATCH v2 3/8] sched/fair: remove meaningless imbalance calculation Vincent Guittot
2019-08-01 14:40 ` [PATCH v2 4/8] sched/fair: rework load_balance Vincent Guittot
2019-08-05 17:07 ` Valentin Schneider
2019-08-26 9:26 ` Vincent Guittot
2019-08-28 10:25 ` Valentin Schneider
2019-08-06 15:56 ` Peter Zijlstra
2019-08-26 9:31 ` Vincent Guittot [this message]
2019-08-06 17:17 ` Valentin Schneider
2019-08-07 11:16 ` Valentin Schneider
2019-08-26 10:11 ` Vincent Guittot
2019-08-28 14:19 ` Valentin Schneider
2019-08-29 14:26 ` Vincent Guittot
2019-08-30 14:33 ` Valentin Schneider
2019-08-01 14:40 ` [PATCH v2 5/8] sched/fair: use rq->nr_running when balancing load Vincent Guittot
2019-08-01 14:40 ` [PATCH v2 6/8] sched/fair: use load instead of runnable load Vincent Guittot
2019-08-06 16:07 ` Peter Zijlstra
2019-08-26 15:45 ` Vincent Guittot
2019-08-01 14:40 ` [PATCH v2 7/8] sched/fair: evenly spread tasks when not overloaded Vincent Guittot
2019-08-01 14:40 ` [PATCH v2 8/8] sched/fair: use utilization to select misfit task Vincent Guittot
2019-08-01 16:27 ` Valentin Schneider
2019-08-02 8:29 ` Vincent Guittot
2019-08-02 10:49 ` Valentin Schneider
2019-08-02 12:56 ` [PATCH v3] " Vincent Guittot
2019-08-02 14:27 ` Valentin Schneider
2019-08-05 11:01 ` Valentin Schneider
2019-08-29 19:23 ` [PATCH v2 0/8] sched/fair: rework the CFS load balance Phil Auld
2019-08-30 6:46 ` Vincent Guittot
[not found] ` <20190809052124.13016-1-hdanton@sina.com>
2019-09-02 13:07 ` [PATCH v2 5/8] sched/fair: use rq->nr_running when balancing load Vincent Guittot
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to='CAKfTPtAS2F8ewznprk3d7fR7sAFD9Duy0PEmzyfW9Mot9ZZH=Q@mail.gmail.com' \
--to=vincent.guittot@linaro.org \
--cc=Morten.Rasmussen@arm.com \
--cc=dietmar.eggemann@arm.com \
--cc=linux-kernel@vger.kernel.org \
--cc=mingo@redhat.com \
--cc=pauld@redhat.com \
--cc=peterz@infradead.org \
--cc=quentin.perret@arm.com \
--cc=srikar@linux.vnet.ibm.com \
--cc=valentin.schneider@arm.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).