From: "Paul E. McKenney" <paulmck@kernel.org>
To: Valentin Schneider <vschneid@redhat.com>
Cc: linux-kernel@vger.kernel.org, mingo@redhat.com,
	peterz@infradead.org, juri.lelli@redhat.com,
	vincent.guittot@linaro.org, dietmar.eggemann@arm.com,
	rostedt@goodmis.org, bsegall@google.com, mgorman@suse.de,
	bristot@redhat.com
Subject: Re: "Dying CPU not properly vacated" splat
Date: Tue, 26 Apr 2022 09:24:45 -0700
Message-ID: <20220426162445.GG4285@paulmck-ThinkPad-P17-Gen-1>
In-Reply-To: <xhsmhy1zr99zt.mognet@vschneid.remote.csb>

On Tue, Apr 26, 2022 at 03:48:06PM +0100, Valentin Schneider wrote:
> On 25/04/22 17:03, Paul E. McKenney wrote:
> > On Mon, Apr 25, 2022 at 10:59:44PM +0100, Valentin Schneider wrote:
> >> On 25/04/22 10:33, Paul E. McKenney wrote:
> >> >
> >> > So what did rcu_torture_reader() do wrong here?  ;-)
> >> >
> >>
> >> So on teardown, CPUHP_AP_SCHED_WAIT_EMPTY->sched_cpu_wait_empty() waits for
> >> the rq to be empty. Tasks must *not* be enqueued onto that CPU after that
> >> step has been run - if there are per-CPU tasks bound to that CPU, they must
> >> be unbound in their respective hotplug callback.
> >>
> >> For instance for workqueue.c, we have workqueue_offline_cpu() as a hotplug
> >> callback which invokes unbind_workers(cpu), the interesting bit being:
> >>
> >>                 for_each_pool_worker(worker, pool) {
> >>                         kthread_set_per_cpu(worker->task, -1);
> >>                         WARN_ON_ONCE(set_cpus_allowed_ptr(worker->task, cpu_possible_mask) < 0);
> >>                 }
> >>
> >> The rcu_torture_reader() kthreads aren't bound to any particular CPU are
> >> they? I can't find any code that would indicate they are - and in that case
> >> it means we have a problem with is_cpu_allowed() or related.
> >
> > I did not intend that the rcu_torture_reader() kthreads be bound, and
> > I am not seeing anything that binds them.
> >
> > Thoughts?  (Other than that validating any alleged fix will be quite
> > "interesting".)
> 
> IIUC the bogus scenario is is_cpu_allowed() lets one of those kthreads be
> enqueued on the outgoing CPU *after* CPUHP_AP_SCHED_WAIT_EMPTY.teardown() has
> been run, and hilarity ensues.
> 
> The cpu_dying() condition should prevent a regular kthread from getting
> enqueued there, most of the details have been evinced from my brain but I
> recall we got the ordering conditions right...
> 
> The only other "obvious" thing here is migrate_disable() which lets the
> enqueue happen, but then balance_push()->select_fallback_rq() should punt
> it away on context switch.
> 
> I need to rediscover those paths, I don't see any obvious clue right now.

Thank you for looking into this!

The only thought that came to me was to record that is_cpu_allowed()
returned true due to migration being disabled, and then use that in later
traces, printk()s or whatever.

My own favorite root-cause hypothesis was invalidated by the fact that
is_cpu_allowed() returns cpu_online(cpu) rather than just true.  ;-)
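
For concreteness, here is a rough userspace model of the "record why
is_cpu_allowed() said yes" idea. It is only a sketch under stated
assumptions: every model_* name, the reason enum, and the last_reason
field are made up for illustration and do not exist in the kernel, and
the order of the checks is a simplified reading of is_cpu_allowed()
rather than a copy of it.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_NR_CPUS 4	/* arbitrary size for the model */

/* Hypothetical: which check decided the most recent result. */
enum model_reason {
	MODEL_NOT_IN_MASK,
	MODEL_MIGRATE_DISABLED,
	MODEL_USER_TASK,
	MODEL_PER_CPU_KTHREAD,
	MODEL_CPU_DYING,
	MODEL_KTHREAD_ONLINE,
};

/* Stand-ins for cpu_online(), cpu_active() and cpu_dying(). */
struct model_cpu {
	bool online;
	bool active;
	bool dying;
};

struct model_task {
	unsigned long cpus_mask;	/* bit n set: CPU n is permitted */
	bool kthread;			/* stand-in for PF_KTHREAD */
	bool per_cpu_kthread;		/* stand-in for kthread_is_per_cpu() */
	bool migration_disabled;	/* stand-in for is_migration_disabled() */
	enum model_reason last_reason;	/* hypothetical debug record */
};

static bool model_is_cpu_allowed(struct model_task *p,
				 const struct model_cpu *cpus, int cpu)
{
	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);

	/* Not in the task's mask: no point in looking further. */
	if (!(p->cpus_mask & (1UL << cpu))) {
		p->last_reason = MODEL_NOT_IN_MASK;
		return false;
	}
	/* Migration disabled: allowed only while the CPU is online. */
	if (p->migration_disabled) {
		p->last_reason = MODEL_MIGRATE_DISABLED;
		return cpus[cpu].online;
	}
	/* User tasks need an active CPU. */
	if (!p->kthread) {
		p->last_reason = MODEL_USER_TASK;
		return cpus[cpu].active;
	}
	/* Per-CPU kthreads may stay as long as the CPU is online. */
	if (p->per_cpu_kthread) {
		p->last_reason = MODEL_PER_CPU_KTHREAD;
		return cpus[cpu].online;
	}
	/* Regular kthreads are turned away from a dying CPU. */
	if (cpus[cpu].dying) {
		p->last_reason = MODEL_CPU_DYING;
		return false;
	}
	p->last_reason = MODEL_KTHREAD_ONLINE;
	return cpus[cpu].online;
}
```

In this model, a regular kthread with migration disabled is still let
onto a CPU that is dying but online, and last_reason then reads
MODEL_MIGRATE_DISABLED, which is the case I would want to show up in
later traces. Once the CPU is no longer online, the same task is refused.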

							Thanx, Paul

