All of lore.kernel.org
 help / color / mirror / Atom feed
From: Valentin Schneider <valentin.schneider@arm.com>
To: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Peng Liu <iwtbavbm@gmail.com>,
	Dietmar Eggemann <dietmar.eggemann@arm.com>,
	Ingo Molnar <mingo@redhat.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Juri Lelli <juri.lelli@redhat.com>,
	Steven Rostedt <rostedt@goodmis.org>,
	Ben Segall <bsegall@google.com>, Mel Gorman <mgorman@suse.de>,
	linux-kernel <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH] sched/fair: Fix nohz.next_balance update
Date: Wed, 06 May 2020 21:21:15 +0100	[thread overview]
Message-ID: <jhjr1vw7r2c.mognet@arm.com> (raw)
In-Reply-To: <CAKfTPtAujvP=kN6zuB9N+5H2xGZ2U2ScsDUcUf+3iLeKbrmNKg@mail.gmail.com>


On 06/05/20 17:56, Vincent Guittot wrote:
> On Wed, 6 May 2020 at 18:03, Valentin Schneider
> <valentin.schneider@arm.com> wrote:
>>
>>
>> On 06/05/20 14:45, Vincent Guittot wrote:
>> >> But then we may skip an update if we goto abort, no? Imagine we have just
>> >> NOHZ_STATS_KICK, so we don't call any rebalance_domains(), and then as we
>> >> go through the last NOHZ CPU in the loop we hit need_resched(). We would
>> >> end in the abort part without any update to nohz.next_balance, despite
>> >> having accumulated relevant data in the local next_balance variable.
>> >
>> > Yes but on the other end, the last CPU has not been able to run the
>> > rebalance_domain so we must not move  nohz.next_balance otherwise it
>> > will have to wait for at least another full period
>> > In fact, I think that we have a problem with current implementation
>> > because if we abort because  local cpu because busy we might end up
>> > skipping idle load balance for a lot of idle CPUs
>> >
>> > As an example, imagine that we have 10 idle CPUs with the same
>> > rq->next_balance which equal nohz.next_balance.  _nohz_idle_balance
>> > starts on CPU0, it processes idle lb for CPU1 but then has to abort
>> > because of need_resched. If we update nohz.next_balance like
>> > currently, the next idle load balance  will happen after a full
>> > balance interval whereas we still have 8 CPUs waiting for running an
>> > idle load balance.
>> >
>> > My proposal also fixes this problem
>> >
>>
>> That's a very good point; so with NOHZ_BALANCE_KICK we can reduce
>> nohz.next_balance via rebalance_domains(), and otherwise we would only
>> increase it if we go through a complete for_each_cpu() loop in
>> _nohz_idle_balance().
>>
>> That said, if for some reason we keep bailing out of the loop, we won't
>> push nohz.next_balance forward and thus may repeatedly nohz-balance only
>> the first few CPUs in the NOHZ mask. I think that can happen if we have
>> say 2 tasks pinned to a single rq, in that case nohz_balancer_kick() will
>> kick a NOHZ balance whenever now >= nohz.next_balance.
>
> If we take my example above and we have CPU0 which is idle at every
> tick and selected as ilb_cpu but unluckily CPU0 has to abort before
> running ilb for CPU1 everytime, I agree that we can end up trying to
> run ilb on CPU0 at every tick without any success. We might consider
> to kick_ilb in _nohz_idle_balance if we have to abort to let another
> CPU handle the ilb

That's an idea; maybe something like the next CPU that was due to be
rebalanced (i.e. the one for which we hit the goto abort).

  reply	other threads:[~2020-05-06 20:21 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-05-03  8:34 [PATCH] sched/fair: Fix nohz.next_balance update Peng Liu
2020-05-04  0:10 ` Valentin Schneider
2020-05-05 12:36   ` Peng Liu
2020-05-04 15:17 ` Vincent Guittot
2020-05-04 15:48   ` Valentin Schneider
2020-05-04 16:05   ` Dietmar Eggemann
2020-05-05 13:40   ` Peng Liu
2020-05-05 14:27     ` Vincent Guittot
2020-05-05 15:16       ` Peng Liu
2020-05-05 15:43         ` Vincent Guittot
2020-05-05 16:08           ` Peng Liu
2020-05-06 10:29       ` Valentin Schneider
2020-05-06 13:45         ` Vincent Guittot
2020-05-06 16:02           ` Valentin Schneider
2020-05-06 16:56             ` Vincent Guittot
2020-05-06 20:21               ` Valentin Schneider [this message]
2020-05-07 12:41             ` Peng Liu
2020-05-07 12:53               ` Vincent Guittot
2020-05-08 13:01       ` Peng Liu
2020-05-08 15:31         ` Vincent Guittot
2020-05-06 10:28   ` Valentin Schneider

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=jhjr1vw7r2c.mognet@arm.com \
    --to=valentin.schneider@arm.com \
    --cc=bsegall@google.com \
    --cc=dietmar.eggemann@arm.com \
    --cc=iwtbavbm@gmail.com \
    --cc=juri.lelli@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mgorman@suse.de \
    --cc=mingo@redhat.com \
    --cc=peterz@infradead.org \
    --cc=rostedt@goodmis.org \
    --cc=vincent.guittot@linaro.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.