linux-kernel.vger.kernel.org archive mirror
From: Vincent Guittot <vincent.guittot@linaro.org>
To: Rik van Riel <riel@surriel.com>
Cc: Chris Mason <clm@fb.com>, Peter Zijlstra <peterz@infradead.org>,
	Johannes Weiner <hannes@cmpxchg.org>,
	linux-kernel <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH] fix scheduler regression from "sched/fair: Rework load_balance()"
Date: Mon, 26 Oct 2020 16:42:14 +0100
Message-ID: <CAKfTPtAfKn0jzOpPNR4NUb0zLs02iLQq2_UCDSCEwhTB2LDAig@mail.gmail.com>
In-Reply-To: <334f491d2887a6ed7c5347d5125412849feb8a0a.camel@surriel.com>

On Mon, 26 Oct 2020 at 16:04, Rik van Riel <riel@surriel.com> wrote:
>
> On Mon, 2020-10-26 at 15:56 +0100, Vincent Guittot wrote:
> > On Mon, 26 Oct 2020 at 15:38, Rik van Riel <riel@surriel.com> wrote:
> > > On Mon, 2020-10-26 at 15:24 +0100, Vincent Guittot wrote:
> > > > On Monday 26 Oct 2020 at 08:45:27 (-0400), Chris Mason wrote:
> > > > > On 26 Oct 2020, at 4:39, Vincent Guittot wrote:
> > > > >
> > > > > > Hi Chris
> > > > > >
> > > > > > On Sat, 24 Oct 2020 at 01:49, Chris Mason <clm@fb.com> wrote:
> > > > > > > Hi everyone,
> > > > > > >
> > > > > > > > We’re validating a new kernel in the fleet, and compared with v5.2,
> > > > > >
> > > > > > > Which version are you using?
> > > > > > > Several improvements have been added since v5.5 and the rework of
> > > > > > > load_balance.
> > > > >
> > > > > > We’re validating v5.6, but all of the numbers referenced in this patch are
> > > > > > against v5.9.  I usually try to backport my way to victory on this kind of
> > > > > > thing, but mainline seems to behave exactly the same as 0b0695f2b34a wrt
> > > > > > this benchmark.
> > > >
> > > > > OK. Thanks for the confirmation.
> > > > >
> > > > > I have been able to reproduce the problem on my setup.
> > > > >
> > > > > Could you try the fix below?
> > > >
> > > > --- a/kernel/sched/fair.c
> > > > +++ b/kernel/sched/fair.c
> > > > @@ -9049,7 +9049,8 @@ static inline void
> > > > calculate_imbalance(struct
> > > > lb_env *env, struct sd_lb_stats *s
> > > >          * emptying busiest.
> > > >          */
> > > >         if (local->group_type == group_has_spare) {
> > > > -               if (busiest->group_type > group_fully_busy) {
> > > > +               if ((busiest->group_type > group_fully_busy) &&
> > > > +                   (busiest->group_weight > 1)) {
> > > >                         /*
> > > > >                          * If busiest is overloaded, try to fill spare
> > > > >                          * capacity. This might end up creating spare capacity
> > > >
> > > >
> > > > > When we calculate an imbalance at the smallest level, i.e. between CPUs
> > > > > (group_weight == 1), we should try to spread tasks across CPUs instead of
> > > > > trying to fill spare capacity.
> > >
> > > Should we also spread tasks when balancing between
> > > multi-threaded CPU cores on the same socket?
> >
> > > My explanation is probably misleading. In fact we already try to
> > > spread tasks. We just use spare capacity instead of nr_running when
> > > there is more than 1 CPU in the group and the group is overloaded.
> > > Using spare capacity is a bit more conservative because it tries
> > > not to pull more utilization than there is spare capacity.
>
> Could utilization estimates be off, either lagging or
> simply having a wrong estimate for a task, resulting
> in no task getting pulled sometimes, while doing a
> migrate_task imbalance always moves over something?

Task and CPU utilization are not always fully in sync and may lag a
bit, which explains why LB can sometimes fail to migrate anything
because of a small difference.
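
To make the difference concrete, here is a minimal userspace sketch of
the two imbalance modes discussed above, with the proposed
group_weight check applied. It is not the kernel code: the struct
layouts, the reduced enums, and the names sketch_imbalance() and
util_pull_allowed() are assumptions made for illustration. It only
models the case where the local group has spare capacity, and it uses
a plain nr_running difference for every non-overloaded case. The
utilization check is likewise a simplified guess at how a migrate_util
imbalance filters candidate tasks.

```c
#include <assert.h>

/* Reduced stand-ins for the kernel enums; the real ones have more values. */
enum group_type { group_has_spare, group_fully_busy, group_overloaded };
enum migration_type { migrate_task, migrate_util };

/* Hypothetical per-group statistics, loosely modelled on sg_lb_stats. */
struct group_stats {
	enum group_type type;
	unsigned int weight;      /* number of CPUs in the group */
	unsigned long capacity;
	unsigned long util;
	unsigned int nr_running;
};

struct imbalance {
	enum migration_type type;
	unsigned long amount;     /* utilization or number of tasks */
};

/* Sketch of the patched branch; only valid when local has spare capacity. */
static struct imbalance sketch_imbalance(const struct group_stats *local,
					 const struct group_stats *busiest)
{
	struct imbalance imb = { migrate_task, 0 };

	assert(local->type == group_has_spare);
	assert(local->weight >= 1 && busiest->weight >= 1);

	/* Overloaded multi-CPU group: try to fill local spare capacity. */
	if (busiest->type > group_fully_busy && busiest->weight > 1) {
		imb.type = migrate_util;
		if (local->capacity > local->util)
			imb.amount = local->capacity - local->util;
		return imb;
	}

	/* Otherwise (including single-CPU groups): even out task counts. */
	if (busiest->nr_running > local->nr_running)
		imb.amount = (busiest->nr_running - local->nr_running) / 2;
	return imb;
}

/* Assumed filter: a task is skipped if its utilization exceeds the budget. */
static int util_pull_allowed(unsigned long task_util,
			     const struct imbalance *imb)
{
	return imb->type != migrate_util || task_util <= imb->amount;
}
```

With a local CPU at util 900 out of 1024, the migrate_util budget is
124, so a task whose (possibly lagging) utilization estimate is 200
would be skipped, whereas the migrate_task path always has a whole task
to move when the busiest CPU runs at least two more tasks.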

>
> Within an LLC we might not need to worry too much
> about spare capacity, considering select_idle_sibling
> doesn't give a hoot about capacity, either.
>
> --
> All Rights Reversed.
