All of lore.kernel.org
 help / color / mirror / Atom feed
From: Valentin Schneider <valentin.schneider@arm.com>
To: Will Deacon <will@kernel.org>
Cc: linux-kernel@vger.kernel.org,
	Peter Zijlstra <peterz@infradead.org>,
	Ingo Molnar <mingo@kernel.org>,
	Morten Rasmussen <morten.rasmussen@arm.com>,
	Qais Yousef <qais.yousef@arm.com>,
	Dietmar Eggemann <dietmar.eggemann@arm.com>,
	Quentin Perret <qperret@google.com>,
	Juri Lelli <juri.lelli@redhat.com>,
	Vincent Guittot <vincent.guittot@linaro.org>,
	kernel-team@android.com
Subject: Re: [PATCH 2/2] sched: Plug race between SCA, hotplug and migration_cpu_stop()
Date: Thu, 24 Jun 2021 10:33:07 +0100	[thread overview]
Message-ID: <87v9638v58.mognet@arm.com> (raw)
In-Reply-To: <20210622135140.GA31071@willie-the-truck>

Hi,

On 22/06/21 14:51, Will Deacon wrote:
> Hi Valentin,
>
> I've been looking at this on and off and I'm afraid I'm still not convinced,
> almost certainly due to my own ignorance, but hey :)
>

Nah, this thing is nasty, I still haven't regrown the hair I lost from last
time I stared at it.

> On Wed, Jun 02, 2021 at 08:43:31PM +0100, Valentin Schneider wrote:
>> Harumph...
>>
>> So something like all CPUs but one are running their take_cpu_down()
>> callback because one is still running migration_cpu_stop(), i.e.:
>>
>>   CPU0                   CPU1                ...             CPUN
>>   <stopper>              <stopper>                           <stopper>
>>     migration_cpu_stop()   take_cpu_down()@MULTI_STOP_PREPARE    take_cpu_down()@MULTI_STOP_PREPARE
>>
>> If CPU0 hits that else if (pending) condition, it'll queue a
>> migration_cpu_stop() elsewhere (say CPU1), then run the take_cpu_down()
>> callback which follows in its work->list.
>>
>> If the CPU being brought down is anything else than CPU1, it shouldn't
>> really matter. If it *is* CPU1, then I think we've got some guarantees.
>>
>> Namely, there's no (read: there shouldn't be any) way for a task to
>> still be on CPU1 at this point; per sched_cpu_wait_empty(),
>> migration-disabled tasks and pcpu kthreads must vacate the CPU before it
>> then (migrate_disable regions must be bounded, and pcpu kthreads are
>> expected to be moved away by their respective owners).
>
> I agree with that, but the stopper is still running on CPU1 and so
> migration_cpu_stop() could still queue work there after sched_cpu_wait_empty()
> has returned but before stop_machine_park(), afaict.
>

Right, the stopper should flush its work before parking itself - in the
above scenario migration_cpu_stop() is already on CPU1's work list when it
finishes running take_cpu_down(), so that should run before any parking
happens.

> Actually, it looks like migration_cpu_stop() ignores the return value of
> stop_one_cpu_nowait(), so if the stopper thread has been parked I think
> we'll quietly do nothing there as well.
>

There's a handful of loosely related cogs (including those added by the
patch) that *I think* give us sufficient guarantees this can't happen. Let
me try to structure this somehow, and please point out any inane bit.

1) Per the synchronize_rcu() in CPUHP_AP_ACTIVE.teardown(), the return
   value of is_cpu_allowed(p, cpu) is stable within a migration_cpu_stop()
   invocation for any p allowed on *active* CPUs.
2) Per the serialization of stopper callbacks, the return value of
   is_cpu_allowed(p, cpu) is stable within a migration_cpu_stop()
   invocation for any p allowed on *online* CPUs: migration_cpu_stop()
   has to complete before any CPU can do an online -> !online transition.

  (I would also add here that I don't expect any task to be migrated to an
  !active && online dying CPU via migration_cpu_stop(): the only tasks
  allowed on such CPUs are pcpu kthreads, which will be parked rather than
  migrated, and migrate_disabled() tasks which, unsurprisingly, aren't
  migrated - they just get to *stay* on their CPU longer than regular tasks)

3) Per 1) + 2), the move_queued_task() call in migration_cpu_stop() cannot
   silently fail.


4) No task other than a hotplug thread or a stopper thread may run on a
   CPU beyond CPUHP_AP_SCHED_WAIT_EMPTY.teardown()
5) Per 4), no (queued) task p passed to migration_cpu_stop() can have
   task_cpu(p) be a dying CPU with hotplug state below
   CPUHP_AP_SCHED_WAIT_EMPTY.
6) Per 5), migration_cpu_stop() cannot invoke stop_one_cpu_nowait() on a
   CPU with a parked stopper.
7) Per 6), migration_cpu_stop() self-queuing cannot be silently discarded
   and will always end up executed

Per 3) + 7), we're good?

      reply	other threads:[~2021-06-24  9:33 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-05-26 20:57 [PATCH 0/2] sched: SCA vs hotplug vs stopper races fixes Valentin Schneider
2021-05-26 20:57 ` [PATCH 1/2] sched: Don't defer CPU pick to migration_cpu_stop() Valentin Schneider
2021-06-01 14:04   ` [tip: sched/core] " tip-bot2 for Valentin Schneider
2021-05-26 20:57 ` [PATCH 2/2] sched: Plug race between SCA, hotplug and migration_cpu_stop() Valentin Schneider
2021-05-27  9:12   ` Peter Zijlstra
2021-06-01 16:59   ` Valentin Schneider
2021-06-02 15:26     ` Will Deacon
2021-06-02 19:43       ` Valentin Schneider
2021-06-22 13:51         ` Will Deacon
2021-06-24  9:33           ` Valentin Schneider [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87v9638v58.mognet@arm.com \
    --to=valentin.schneider@arm.com \
    --cc=dietmar.eggemann@arm.com \
    --cc=juri.lelli@redhat.com \
    --cc=kernel-team@android.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@kernel.org \
    --cc=morten.rasmussen@arm.com \
    --cc=peterz@infradead.org \
    --cc=qais.yousef@arm.com \
    --cc=qperret@google.com \
    --cc=vincent.guittot@linaro.org \
    --cc=will@kernel.org \
    --subject='Re: [PATCH 2/2] sched: Plug race between SCA, hotplug and migration_cpu_stop()' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.