From: "Paul E. McKenney" <paulmck@kernel.org>
To: Uladzislau Rezki <urezki@gmail.com>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>,
	Thomas Gleixner <tglx@linutronix.de>,
	"tip-bot2 for Paul E. McKenney" <tip-bot2@linutronix.de>,
	linux-tip-commits@vger.kernel.org,
	Peter Zijlstra <peterz@infradead.org>,
	Frederic Weisbecker <frederic@kernel.org>,
	x86@kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [tip: core/rcu] softirq: Don't try waking ksoftirqd before it has been spawned
Date: Wed, 14 Apr 2021 11:11:58 -0700
Message-ID: <20210414181158.GU4510@paulmck-ThinkPad-P17-Gen-1>
In-Reply-To: <20210414085757.GA1917@pc638.lan>

On Wed, Apr 14, 2021 at 10:57:57AM +0200, Uladzislau Rezki wrote:
> On Wed, Apr 14, 2021 at 09:13:22AM +0200, Sebastian Andrzej Siewior wrote:
> > On 2021-04-12 11:36:45 [-0700], Paul E. McKenney wrote:
> > > > Color me confused. I did not follow the discussion around this
> > > > completely, but wasn't it agreed on that this rcu torture muck can wait
> > > > until the threads are brought up?
> > >
> > > Yes, we can cause rcutorture to wait. But in this case, rcutorture
> > > is just the messenger, and making it wait would simply be ignoring
> > > the message. The message is that someone could invoke any number of
> > > things that wait on a softirq handler's invocation during the interval
> > > before ksoftirqd has been spawned.
> >
> > My memory on this is that the only user, that required this early
> > behaviour, was kprobe which was recently changed to not need it anymore.
> > Which makes the test as the only user that remains. Therefore I thought
> > that this test will be moved to later position (when ksoftirqd is up and
> > running) and that there is no more requirement for RCU to be completely
> > up that early in the boot process.
> >
> > Did I miss anything?
> >
> Seems not. Let me wrap it up a bit, though I may miss something:
>
> 1) Initially we had an issue with booting RISC-V because of:
>
> 36dadef23fcc ("kprobes: Init kprobes in early_initcall")
>
> i.e. a developer decided to move initialization of kprobe at
> early_initcall() phase. Since kprobe uses synchronize_rcu_tasks()
> a system did not boot due to the fact that RCU-tasks were setup
> at core_initcall() step. It happens later in this chain.
>
> To address that issue, we had decided to move RCU-tasks setup
> to before early_initcall() and it worked well:
>
> https://lore.kernel.org/lkml/20210218083636.GA2030@pc638.lan/T/
>
> 2) After that fix you reported another issue. If the kernel is run
> with "threadirqs=1" - it did not boot also. Because ksoftirqd does
> not exist by that time, thus our early-rcu-self test did not pass.
>
> 3) Due to (2), Masami Hiramatsu proposed to fix kprobes by delaying
> kprobe optimization and it also addressed initial issue:
>
> https://lore.kernel.org/lkml/20210219112357.GA34462@pc638.lan/T/
>
> At the same time Paul made another patch:
>
> softirq: Don't try waking ksoftirqd before it has been spawned
>
> it allows us to keep RCU-tasks initialization before even
> early_initcall() where it is now and let our rcu-self-test
> to be completed without any hanging.

In short, this window of time in which it is not possible to reliably
wait on a softirq handler has caused trouble, just as several other
similar boot-sequence time windows have caused trouble in the past.
It therefore makes sense to just eliminate this problem, and prevent
future developers from facing inexplicable silent boot-time hangs.

We can move the spawning of ksoftirqd kthreads earlier, but that
simply narrows the window.  It does not eliminate the problem.

I can easily believe that this might have -rt consequences that need
attention.  For your amusement, I will make a few guesses as to what
these might be:

o	Back-of-interrupt softirq handlers degrade real-time response.
	This should not be a problem this early in boot, and once the
	ksoftirqd kthreads are spawned, there will never be another
	back-of-interrupt softirq handler in kernels that have
	force_irqthreads set, which includes -rt kernels.

o	That !__this_cpu_read(ksoftirqd) check remains at runtime, even
	though it always evaluates to false.  I would be surprised if
	this overhead is measurable at the system level, but if it is,
	static branches should take care of this.

o	There might be a -rt lockdep check that isn't happy with
	back-of-interrupt softirq handlers.  But such a lockdep check
	could be conditioned on __this_cpu_read(ksoftirqd), thus
	preventing it from firing during that short window at boot time.

o	The -rt kernels might be using locks to implement things like
	local_bh_disable(), in which case back-of-interrupt softirq
	handlers could result in self-deadlock.  This could be addressed
	by disabling bh the old way up to the time that the ksoftirqd
	kthreads are created.  Because these are created while the system
	is running on a single CPU (right?), a simple flag (or static
	branch) could be used to switch this behavior into lock-only
	mode long before the first real-time application can be spawned.
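
To make the check from the second item concrete, here is a minimal
userspace sketch in C.  It is only a model under stated assumptions, not
the kernel code: pick_softirq_path(), struct cpu_model, and
force_irqthreads_model are hypothetical names that stand in for the
choice between running handlers at the back of the interrupt and waking
ksoftirqd.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace model only; every name in this block is hypothetical. */
enum softirq_path {
	RUN_AT_IRQ_EXIT,	/* back-of-interrupt handler execution */
	WAKE_KSOFTIRQD,		/* defer to the per-CPU ksoftirqd kthread */
};

struct cpu_model {
	/* Stands in for __this_cpu_read(ksoftirqd) being non-NULL. */
	bool ksoftirqd_spawned;
};

/* Stands in for force_irqthreads (set on -rt kernels). */
static bool force_irqthreads_model;

static enum softirq_path pick_softirq_path(const struct cpu_model *cpu)
{
	assert(cpu != NULL);

	/*
	 * Defer to ksoftirqd only when threading is forced AND the
	 * kthread already exists.  Before it has been spawned there is
	 * nothing to wake, so fall back to back-of-interrupt execution
	 * instead of hanging.
	 */
	if (force_irqthreads_model && cpu->ksoftirqd_spawned)
		return WAKE_KSOFTIRQD;
	return RUN_AT_IRQ_EXIT;
}
```

With force_irqthreads_model set, a CPU whose ksoftirqd_spawned is still
false gets RUN_AT_IRQ_EXIT, and the same CPU gets WAKE_KSOFTIRQD once
the flag flips.  After that point the fallback is never taken again,
which is why the extra test is a candidate for a static branch.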

So my turn.  Did I miss anything?

							Thanx, Paul