[tip:,core/rcu] softirq: Don't try waking ksoftirqd before it has been spawned
diff mbox series

Message ID 161814860838.29796.15260901429057690999.tip-bot2@tip-bot2
State Accepted
Commit 1c0c4bc1ceb580851b2d76fdef9712b3bdae134b
Headers show
Series
  • [tip:,core/rcu] softirq: Don't try waking ksoftirqd before it has been spawned
Related show

Commit Message

tip-bot2 for Vitaly Kuznetsov April 11, 2021, 1:43 p.m. UTC
The following commit has been merged into the core/rcu branch of tip:

Commit-ID:     1c0c4bc1ceb580851b2d76fdef9712b3bdae134b
Gitweb:        https://git.kernel.org/tip/1c0c4bc1ceb580851b2d76fdef9712b3bdae134b
Author:        Paul E. McKenney <paulmck@kernel.org>
AuthorDate:    Fri, 12 Feb 2021 16:20:40 -08:00
Committer:     Paul E. McKenney <paulmck@kernel.org>
CommitterDate: Mon, 15 Mar 2021 13:51:48 -07:00

softirq: Don't try waking ksoftirqd before it has been spawned

If there is heavy softirq activity, the softirq system will attempt
to awaken ksoftirqd and will stop the traditional back-of-interrupt
softirq processing.  This is all well and good, but only if the
ksoftirqd kthreads already exist, which is not the case during early
boot, in which case the system hangs.

One reproducer is as follows:

tools/testing/selftests/rcutorture/bin/kvm.sh --allcpus --duration 2 --configs "TREE03" --kconfig "CONFIG_DEBUG_LOCK_ALLOC=y CONFIG_PROVE_LOCKING=y CONFIG_NO_HZ_IDLE=y CONFIG_HZ_PERIODIC=n" --bootargs "threadirqs=1" --trust-make

This commit therefore adds a couple of existence checks for ksoftirqd
and forces back-of-interrupt softirq processing when ksoftirqd does not
yet exist.  With this change, the above test passes.

Reported-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Reported-by: Uladzislau Rezki <urezki@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
[ paulmck: Remove unneeded check per Sebastian Siewior feedback. ]
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 kernel/softirq.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Thomas Gleixner April 12, 2021, 2:16 p.m. UTC | #1
On Sun, Apr 11 2021 at 13:43, tip-bot wrote:
> The following commit has been merged into the core/rcu branch of tip:
>
> Commit-ID:     1c0c4bc1ceb580851b2d76fdef9712b3bdae134b
> Gitweb:        https://git.kernel.org/tip/1c0c4bc1ceb580851b2d76fdef9712b3bdae134b
> Author:        Paul E. McKenney <paulmck@kernel.org>
> AuthorDate:    Fri, 12 Feb 2021 16:20:40 -08:00
> Committer:     Paul E. McKenney <paulmck@kernel.org>
> CommitterDate: Mon, 15 Mar 2021 13:51:48 -07:00
>
> softirq: Don't try waking ksoftirqd before it has been spawned
>
> If there is heavy softirq activity, the softirq system will attempt
> to awaken ksoftirqd and will stop the traditional back-of-interrupt
> softirq processing.  This is all well and good, but only if the
> ksoftirqd kthreads already exist, which is not the case during early
> boot, in which case the system hangs.
>
> One reproducer is as follows:
>
> tools/testing/selftests/rcutorture/bin/kvm.sh --allcpus --duration 2 --configs "TREE03" --kconfig "CONFIG_DEBUG_LOCK_ALLOC=y CONFIG_PROVE_LOCKING=y CONFIG_NO_HZ_IDLE=y CONFIG_HZ_PERIODIC=n" --bootargs "threadirqs=1" --trust-make
>
> This commit therefore adds a couple of existence checks for ksoftirqd
> and forces back-of-interrupt softirq processing when ksoftirqd does not
> yet exist.  With this change, the above test passes.

Color me confused. I did not follow the discussion around this
completely, but wasn't it agreed on that this rcu torture muck can wait
until the threads are brought up?

> diff --git a/kernel/softirq.c b/kernel/softirq.c
> index 9908ec4..bad14ca 100644
> --- a/kernel/softirq.c
> +++ b/kernel/softirq.c
> @@ -211,7 +211,7 @@ static inline void invoke_softirq(void)
>  	if (ksoftirqd_running(local_softirq_pending()))
>  		return;
>  
> -	if (!force_irqthreads) {
> +	if (!force_irqthreads || !__this_cpu_read(ksoftirqd)) {
>  #ifdef CONFIG_HAVE_IRQ_EXIT_ON_IRQ_STACK
>  		/*
>  		 * We can safely execute softirq on the current stack if

This still breaks RT which forces force_irqthreads to a compile time
const which makes the compiler optimize out the direct invocation.

Surely RT can work around that, but how is that rcu torture muck
supposed to work then? We're back to square one then.

Thanks,

        tglx
Paul E. McKenney April 12, 2021, 6:36 p.m. UTC | #2
On Mon, Apr 12, 2021 at 04:16:55PM +0200, Thomas Gleixner wrote:
> On Sun, Apr 11 2021 at 13:43, tip-bot wrote:
> > The following commit has been merged into the core/rcu branch of tip:
> >
> > Commit-ID:     1c0c4bc1ceb580851b2d76fdef9712b3bdae134b
> > Gitweb:        https://git.kernel.org/tip/1c0c4bc1ceb580851b2d76fdef9712b3bdae134b
> > Author:        Paul E. McKenney <paulmck@kernel.org>
> > AuthorDate:    Fri, 12 Feb 2021 16:20:40 -08:00
> > Committer:     Paul E. McKenney <paulmck@kernel.org>
> > CommitterDate: Mon, 15 Mar 2021 13:51:48 -07:00
> >
> > softirq: Don't try waking ksoftirqd before it has been spawned
> >
> > If there is heavy softirq activity, the softirq system will attempt
> > to awaken ksoftirqd and will stop the traditional back-of-interrupt
> > softirq processing.  This is all well and good, but only if the
> > ksoftirqd kthreads already exist, which is not the case during early
> > boot, in which case the system hangs.
> >
> > One reproducer is as follows:
> >
> > tools/testing/selftests/rcutorture/bin/kvm.sh --allcpus --duration 2 --configs "TREE03" --kconfig "CONFIG_DEBUG_LOCK_ALLOC=y CONFIG_PROVE_LOCKING=y CONFIG_NO_HZ_IDLE=y CONFIG_HZ_PERIODIC=n" --bootargs "threadirqs=1" --trust-make
> >
> > This commit therefore adds a couple of existence checks for ksoftirqd
> > and forces back-of-interrupt softirq processing when ksoftirqd does not
> > yet exist.  With this change, the above test passes.
> 
> Color me confused. I did not follow the discussion around this
> completely, but wasn't it agreed on that this rcu torture muck can wait
> until the threads are brought up?

Yes, we can cause rcutorture to wait.  But in this case, rcutorture
is just the messenger, and making it wait would simply be ignoring
the message.  The message is that someone could invoke any number of
things that wait on a softirq handler's invocation during the interval
before ksoftirqd has been spawned.

We looked at spawning the ksoftirq kthreads earlier, but that just
narrows the window -- it doesn't eliminate the problem.

We considered adding a check for this condition in order to yell at
people who invoke things that rely heavily on softirq during this time,
but there are perfectly legitimate use cases where it is OK for the
softirq handlers to just sit there until ksoftirqd is spawned.  The
problem isn't doing a raise_softirq(), but instead waiting on the
corresponding handler to complete.

We didn't see any reasonable false-positive-free way to create a reliable
diagnostic for that case, possibly due to a lack of imagination on
our part.

Ideas?

> > diff --git a/kernel/softirq.c b/kernel/softirq.c
> > index 9908ec4..bad14ca 100644
> > --- a/kernel/softirq.c
> > +++ b/kernel/softirq.c
> > @@ -211,7 +211,7 @@ static inline void invoke_softirq(void)
> >  	if (ksoftirqd_running(local_softirq_pending()))
> >  		return;
> >  
> > -	if (!force_irqthreads) {
> > +	if (!force_irqthreads || !__this_cpu_read(ksoftirqd)) {
> >  #ifdef CONFIG_HAVE_IRQ_EXIT_ON_IRQ_STACK
> >  		/*
> >  		 * We can safely execute softirq on the current stack if
> 
> This still breaks RT which forces force_irqthreads to a compile time
> const which makes the compiler optimize out the direct invocation.
> 
> Surely RT can work around that, but how is that rcu torture muck
> supposed to work then? We're back to square one then.

Ah.  So RT relies on softirq handlers never ever being directly invoked,
even during boot time?  I was not aware of that.

OK, I will bite...  What are the RT workarounds for this case?  Maybe
they apply more generally.

							Thanx, Paul
Sebastian Andrzej Siewior April 14, 2021, 7:13 a.m. UTC | #3
On 2021-04-12 11:36:45 [-0700], Paul E. McKenney wrote:
> > Color me confused. I did not follow the discussion around this
> > completely, but wasn't it agreed on that this rcu torture muck can wait
> > until the threads are brought up?
> 
> Yes, we can cause rcutorture to wait.  But in this case, rcutorture
> is just the messenger, and making it wait would simply be ignoring
> the message.  The message is that someone could invoke any number of
> things that wait on a softirq handler's invocation during the interval
> before ksoftirqd has been spawned.

My memory on this is that the only user, that required this early
behaviour, was kprobe which was recently changed to not need it anymore.
Which makes the test as the only user that remains. Therefore I thought
that this test will be moved to later position (when ksoftirqd is up and
running) and that there is no more requirement for RCU to be completely
up that early in the boot process.

Did I miss anything?

> 
> 							Thanx, Paul

Sebastian
Uladzislau Rezki April 14, 2021, 8:57 a.m. UTC | #4
On Wed, Apr 14, 2021 at 09:13:22AM +0200, Sebastian Andrzej Siewior wrote:
> On 2021-04-12 11:36:45 [-0700], Paul E. McKenney wrote:
> > > Color me confused. I did not follow the discussion around this
> > > completely, but wasn't it agreed on that this rcu torture muck can wait
> > > until the threads are brought up?
> > 
> > Yes, we can cause rcutorture to wait.  But in this case, rcutorture
> > is just the messenger, and making it wait would simply be ignoring
> > the message.  The message is that someone could invoke any number of
> > things that wait on a softirq handler's invocation during the interval
> > before ksoftirqd has been spawned.
> 
> My memory on this is that the only user, that required this early
> behaviour, was kprobe which was recently changed to not need it anymore.
> Which makes the test as the only user that remains. Therefore I thought
> that this test will be moved to later position (when ksoftirqd is up and
> running) and that there is no more requirement for RCU to be completely
> up that early in the boot process.
> 
> Did I miss anything?
> 
Seems not. Let me wrap it up a bit though i may miss something:

1) Initially we had an issue with booting RISV because of:

36dadef23fcc ("kprobes: Init kprobes in early_initcall")

i.e. a developer decided to move initialization of kprobe at
early_initcall() phase. Since kprobe uses synchronize_rcu_tasks()
a system did not boot due to the fact that RCU-tasks were setup
at core_initcall() step. It happens later in this chain.

To address that issue, we had decided to move RCU-tasks setup
to before early_initcall() and it worked well:

https://lore.kernel.org/lkml/20210218083636.GA2030@pc638.lan/T/

2) After that fix you reported another issue. If the kernel is run
with "threadirqs=1" - it did not boot also. Because ksoftirqd does
not exist by that time, thus our early-rcu-self test did not pass.

3) Due to (2), Masami Hiramatsu proposed to fix kprobes by delaying
kprobe optimization and it also addressed initial issue:

https://lore.kernel.org/lkml/20210219112357.GA34462@pc638.lan/T/

At the same time Paul made another patch:

softirq: Don't try waking ksoftirqd before it has been spawned

it allows us to keep RCU-tasks initialization before even
early_initcall() where it is now and let our rcu-self-test
to be completed without any hanging.

--
Vlad Rezki
Paul E. McKenney April 14, 2021, 6:11 p.m. UTC | #5
On Wed, Apr 14, 2021 at 10:57:57AM +0200, Uladzislau Rezki wrote:
> On Wed, Apr 14, 2021 at 09:13:22AM +0200, Sebastian Andrzej Siewior wrote:
> > On 2021-04-12 11:36:45 [-0700], Paul E. McKenney wrote:
> > > > Color me confused. I did not follow the discussion around this
> > > > completely, but wasn't it agreed on that this rcu torture muck can wait
> > > > until the threads are brought up?
> > > 
> > > Yes, we can cause rcutorture to wait.  But in this case, rcutorture
> > > is just the messenger, and making it wait would simply be ignoring
> > > the message.  The message is that someone could invoke any number of
> > > things that wait on a softirq handler's invocation during the interval
> > > before ksoftirqd has been spawned.
> > 
> > My memory on this is that the only user, that required this early
> > behaviour, was kprobe which was recently changed to not need it anymore.
> > Which makes the test as the only user that remains. Therefore I thought
> > that this test will be moved to later position (when ksoftirqd is up and
> > running) and that there is no more requirement for RCU to be completely
> > up that early in the boot process.
> > 
> > Did I miss anything?
> > 
> Seems not. Let me wrap it up a bit though i may miss something:
> 
> 1) Initially we had an issue with booting RISV because of:
> 
> 36dadef23fcc ("kprobes: Init kprobes in early_initcall")
> 
> i.e. a developer decided to move initialization of kprobe at
> early_initcall() phase. Since kprobe uses synchronize_rcu_tasks()
> a system did not boot due to the fact that RCU-tasks were setup
> at core_initcall() step. It happens later in this chain.
> 
> To address that issue, we had decided to move RCU-tasks setup
> to before early_initcall() and it worked well:
> 
> https://lore.kernel.org/lkml/20210218083636.GA2030@pc638.lan/T/
> 
> 2) After that fix you reported another issue. If the kernel is run
> with "threadirqs=1" - it did not boot also. Because ksoftirqd does
> not exist by that time, thus our early-rcu-self test did not pass.
> 
> 3) Due to (2), Masami Hiramatsu proposed to fix kprobes by delaying
> kprobe optimization and it also addressed initial issue:
> 
> https://lore.kernel.org/lkml/20210219112357.GA34462@pc638.lan/T/
> 
> At the same time Paul made another patch:
> 
> softirq: Don't try waking ksoftirqd before it has been spawned
> 
> it allows us to keep RCU-tasks initialization before even
> early_initcall() where it is now and let our rcu-self-test
> to be completed without any hanging.

In short, this window of time in which it is not possible to reliably
wait on a softirq handler has caused trouble, just as several other
similar boot-sequence time windows have caused trouble in the past.
It therefore makes sense to just eliminate this problem, and prevent
future developers from facing inexplicable silent boot-time hangs.

We can move the spawning of ksoftirqd kthreads earlier, but that
simply narrows the window.  It does not eliminate the problem.

I can easily believe that this might have -rt consequences that need
attention.  For your amusement, I will make a few guesses as to what
these might be:

o	Back-of-interrupt softirq handlers degrade real-time response.
	This should not be a problem this early in boot, and once the
	ksoftirqd kthreads are spawned, there will never be another
	back-of-interrupt softirq handler in kernels that have
	force_irqthreads set, which includes -rt kernels.

o	That !__this_cpu_read(ksoftirqd) check remains at runtime, even
	though it always evaluates to false.  I would be surprised if
	this overhead is measurable at the system level, but if it is,
	static branches should take care of this.

o	There might be a -rt lockdep check that isn't happy with
	back-of-interrupt softirq handlers.  But such a lockdep check
	could be conditioned on __this_cpu_read(ksoftirqd), thus
	preventing it from firing during that short window at boot time.

o	The -rt kernels might be using locks to implement things like
	local_bh_disable(), in which case back-of-interrupt softirq
	handlers could result in self-deadlock.  This could be addressed
	by disabling bh the old way up to the time that the ksoftirqd
	kthreads are created.  Because these are created while the system
	is running on a single CPU (right?), a simple flag (or static
	branch) could be used to switch this behavior into lock-only
	mode long before the first real-time application can be spawned.

So my turn.  Did I miss anything?

							Thanx, Paul
Thomas Gleixner April 14, 2021, 11:54 p.m. UTC | #6
Paul,

On Wed, Apr 14 2021 at 11:11, Paul E. McKenney wrote:
> On Wed, Apr 14, 2021 at 10:57:57AM +0200, Uladzislau Rezki wrote:
>> On Wed, Apr 14, 2021 at 09:13:22AM +0200, Sebastian Andrzej Siewior wrote:
>> At the same time Paul made another patch:
>> 
>> softirq: Don't try waking ksoftirqd before it has been spawned
>> 
>> it allows us to keep RCU-tasks initialization before even
>> early_initcall() where it is now and let our rcu-self-test
>> to be completed without any hanging.
>
> In short, this window of time in which it is not possible to reliably
> wait on a softirq handler has caused trouble, just as several other
> similar boot-sequence time windows have caused trouble in the past.
> It therefore makes sense to just eliminate this problem, and prevent
> future developers from facing inexplicable silent boot-time hangs.
>
> We can move the spawning of ksoftirqd kthreads earlier, but that
> simply narrows the window.  It does not eliminate the problem.
>
> I can easily believe that this might have -rt consequences that need
> attention.  For your amusement, I will make a few guesses as to what
> these might be:
>
> o	Back-of-interrupt softirq handlers degrade real-time response.
> 	This should not be a problem this early in boot, and once the
> 	ksoftirqd kthreads are spawned, there will never be another
> 	back-of-interrupt softirq handler in kernels that have
> 	force_irqthreads set, which includes -rt kernels.

Not a problem obviously.

> o	That !__this_cpu_read(ksoftirqd) check remains at runtime, even
> 	though it always evaluates to false.  I would be surprised if
> 	this overhead is measurable at the system level, but if it is,
> 	static branches should take care of this.

Agreed.

> o	There might be a -rt lockdep check that isn't happy with
> 	back-of-interrupt softirq handlers.  But such a lockdep check
> 	could be conditioned on __this_cpu_read(ksoftirqd), thus
> 	preventing it from firing during that short window at boot time.

It's not like there are only a handful of lockdep invocations which need
to be taken care of. The lockdep checks are mostly inside of lock
operations and if lockdep has recorded back-of-interrupt context once
during boot it will complain about irqs enabled context usage later on
no matter what.

If you can come up with a reasonable implementation of that without
losing valuable lockdep coverage and without creating a major mess in
the code then I'm all ears.

But lockdep is just one of the problems

> o	The -rt kernels might be using locks to implement things like
> 	local_bh_disable(), in which case back-of-interrupt softirq
> 	handlers could result in self-deadlock.  This could be addressed
> 	by disabling bh the old way up to the time that the ksoftirqd
> 	kthreads are created.  Because these are created while the system
> 	is running on a single CPU (right?), a simple flag (or static
> 	branch) could be used to switch this behavior into lock-only
> 	mode long before the first real-time application can be spawned.

That has absolutely nothing to do with the first real-time application
at all and just looking at the local_bh_disable() part does not cut it
either.

The point is that the fundamental assumption of RT to break the non-rt
semantics of interrupts and soft interrupts in order to rely on always
thread context based serialization is going down the drain with that.

Surely we can come up with yet another pile of horrible hacks to work
around that, which in turn will cause more problems than they solve.

But what for? Just to make it possible to issue rcu_whatever() during
early boot just because?

Give me _one_ serious technical reason why this is absolutely required
and why the accidental misplacement which might happen due to
unawareness, sloppyness or incompetence is reason enough to create a
horrible clusterfail just for purely academic reasons.

None of the usecases at hand did have any reason to depend on the early
boot behaviour of softirqs and unless you can come up with something
really compelling the proper solution is to revert this commit and add
the following instead:

--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -74,7 +74,10 @@ static void wakeup_softirqd(void)
 	/* Interrupts are disabled: no need to stop preemption */
 	struct task_struct *tsk = __this_cpu_read(ksoftirqd);
 
-	if (tsk && tsk->state != TASK_RUNNING)
+	if (WARN_ON_ONCE(!tsk))
+		return;
+
+	if (tsk->state != TASK_RUNNING)
 		wake_up_process(tsk);
 }
 
which will trigger on any halfways usefull CI infrastructure which cares
about the !default use cases of the kernel.

Putting a restriction like that onto the early boot process is not the
end of the world and might teach people who think that their oh so
important stuff is not that important at all especially not during the
early boot phase which is fragile enough already.

There is absolutely no need to proliferate cargo cult programming.

Keep It Simple ...

> So my turn.  Did I miss anything?

Maybe :)

Thanks,

        tglx
Paul E. McKenney April 15, 2021, 5:02 a.m. UTC | #7
On Thu, Apr 15, 2021 at 01:54:18AM +0200, Thomas Gleixner wrote:
> Paul,
> 
> On Wed, Apr 14 2021 at 11:11, Paul E. McKenney wrote:
> > On Wed, Apr 14, 2021 at 10:57:57AM +0200, Uladzislau Rezki wrote:
> >> On Wed, Apr 14, 2021 at 09:13:22AM +0200, Sebastian Andrzej Siewior wrote:
> >> At the same time Paul made another patch:
> >> 
> >> softirq: Don't try waking ksoftirqd before it has been spawned
> >> 
> >> it allows us to keep RCU-tasks initialization before even
> >> early_initcall() where it is now and let our rcu-self-test
> >> to be completed without any hanging.
> >
> > In short, this window of time in which it is not possible to reliably
> > wait on a softirq handler has caused trouble, just as several other
> > similar boot-sequence time windows have caused trouble in the past.
> > It therefore makes sense to just eliminate this problem, and prevent
> > future developers from facing inexplicable silent boot-time hangs.
> >
> > We can move the spawning of ksoftirqd kthreads earlier, but that
> > simply narrows the window.  It does not eliminate the problem.
> >
> > I can easily believe that this might have -rt consequences that need
> > attention.  For your amusement, I will make a few guesses as to what
> > these might be:
> >
> > o	Back-of-interrupt softirq handlers degrade real-time response.
> > 	This should not be a problem this early in boot, and once the
> > 	ksoftirqd kthreads are spawned, there will never be another
> > 	back-of-interrupt softirq handler in kernels that have
> > 	force_irqthreads set, which includes -rt kernels.
> 
> Not a problem obviously.
> 
> > o	That !__this_cpu_read(ksoftirqd) check remains at runtime, even
> > 	though it always evaluates to false.  I would be surprised if
> > 	this overhead is measurable at the system level, but if it is,
> > 	static branches should take care of this.
> 
> Agreed.
> 
> > o	There might be a -rt lockdep check that isn't happy with
> > 	back-of-interrupt softirq handlers.  But such a lockdep check
> > 	could be conditioned on __this_cpu_read(ksoftirqd), thus
> > 	preventing it from firing during that short window at boot time.
> 
> It's not like there are only a handful of lockdep invocations which need
> to be taken care of. The lockdep checks are mostly inside of lock
> operations and if lockdep has recorded back-of-interrupt context once
> during boot it will complain about irqs enabled context usage later on
> no matter what.
> 
> If you can come up with a reasonable implementation of that without
> losing valuable lockdep coverage and without creating a major mess in
> the code then I'm all ears.

My naive thought was something vaguely like this in invoke_softirq():

static inline void invoke_softirq(void)
{
	if (ksoftirqd_running(local_softirq_pending()))
		return;

	if (!force_irqthreads || !__this_cpu_read(ksoftirqd)) {
		if (force_irqthreads && !__this_cpu_read(ksoftirqd))
			lockdep_off();
#ifdef CONFIG_HAVE_IRQ_EXIT_ON_IRQ_STACK
		/*
		 * We can safely execute softirq on the current stack if
		 * it is the irq stack, because it should be near empty
		 * at this stage.
		 */
		__do_softirq();
#else
		/*
		 * Otherwise, irq_exit() is called on the task stack that can
		 * be potentially deep already. So call softirq in its own stack
		 * to prevent from any overrun.
		 */
		do_softirq_own_stack();
#endif
		if (force_irqthreads && !__this_cpu_read(ksoftirqd))
			lockdep_on();
	} else {
		wakeup_softirqd();
	}
}

If I am reading the code correctly (ha!), this prevents locks from being
recorded during that short piece of the boot process, but vanilla kernels
would collect lockdep information during that time as well.

Similar changes would be needed elsewhere, which could easily get into
"mess" territory, and maybe even "major mess" territory.

> But lockdep is just one of the problems
> 
> > o	The -rt kernels might be using locks to implement things like
> > 	local_bh_disable(), in which case back-of-interrupt softirq
> > 	handlers could result in self-deadlock.  This could be addressed
> > 	by disabling bh the old way up to the time that the ksoftirqd
> > 	kthreads are created.  Because these are created while the system
> > 	is running on a single CPU (right?), a simple flag (or static
> > 	branch) could be used to switch this behavior into lock-only
> > 	mode long before the first real-time application can be spawned.
> 
> That has absolutely nothing to do with the first real-time application
> at all and just looking at the local_bh_disable() part does not cut it
> either.
> 
> The point is that the fundamental assumption of RT to break the non-rt
> semantics of interrupts and soft interrupts in order to rely on always
> thread context based serialization is going down the drain with that.
> 
> Surely we can come up with yet another pile of horrible hacks to work
> around that, which in turn will cause more problems than they solve.
> 
> But what for? Just to make it possible to issue rcu_whatever() during
> early boot just because?
> 
> Give me _one_ serious technical reason why this is absolutely required
> and why the accidental misplacement which might happen due to
> unawareness, sloppyness or incompetence is reason enough to create a
> horrible clusterfail just for purely academic reasons.

I have just been burned by mid-boot madness more than once.

But yes, this is not an emergency, not that I know of, not yet, anyway.
Thus, reverting this makes sense.  The merge window is not very far away,
and attempting to get anything resembling this right in that timeframe
would be quite brave.

> None of the usecases at hand did have any reason to depend on the early
> boot behaviour of softirqs and unless you can come up with something
> really compelling the proper solution is to revert this commit and add
> the following instead:
> 
> --- a/kernel/softirq.c
> +++ b/kernel/softirq.c
> @@ -74,7 +74,10 @@ static void wakeup_softirqd(void)
>  	/* Interrupts are disabled: no need to stop preemption */
>  	struct task_struct *tsk = __this_cpu_read(ksoftirqd);
>  
> -	if (tsk && tsk->state != TASK_RUNNING)
> +	if (WARN_ON_ONCE(!tsk))
> +		return;
> +
> +	if (tsk->state != TASK_RUNNING)
>  		wake_up_process(tsk);
>  }
>  
> which will trigger on any halfways usefull CI infrastructure which cares
> about the !default use cases of the kernel.

This gives false positives because it is OK to pile up requests for
softirq handlers.  When ksoftirqd starts, it will then invoke any
handlers that were requested.  The thing that is not OK is to block the
boot process waiting on those handlers to finish.

> Putting a restriction like that onto the early boot process is not the
> end of the world and might teach people who think that their oh so
> important stuff is not that important at all especially not during the
> early boot phase which is fragile enough already.

"But it is so simple without -rt!"  ;-)

I suppose that we could put a check in the scheduling-clock interrupt
under appropriate debug.  If ksoftirqd has not started one minute into
boot, complain.  Maybe "complain" includes a call to show_state(),
just to help convince people to use faster console hardware.

> There is absolutely no need to proliferate cargo cult programming.

You lost me on this one.

> Keep It Simple ...

NR_CPUS=0!  It is the only way!!!

> > So my turn.  Did I miss anything?
> 
> Maybe :)

Another approach is to move the spawning of ksoftirqd earlier.  This
still leaves a window of vulnerability, but the window is smaller, and
thus the probablity of something needing to happen there is smaller.
Uladzislau sent out a patch that did this some weeks back.

							Thanx, Paul
Uladzislau Rezki April 15, 2021, 2:34 p.m. UTC | #8
> 
> Another approach is to move the spawning of ksoftirqd earlier.  This
> still leaves a window of vulnerability, but the window is smaller, and
> thus the probablity of something needing to happen there is smaller.
> Uladzislau sent out a patch that did this some weeks back.
> 
See below the patch that is in question, just in case:

<snip>
commit f4cd768e341486655c8c196e1f2b48a4463541f3
Author: Paul E. McKenney <paulmck@kernel.org>
Date:   Fri Feb 12 16:41:05 2021 -0800

    softirq: Don't try waking ksoftirqd before it has been spawned

    If there is heavy softirq activity, the softirq system will attempt
    to awaken ksoftirqd and will stop the traditional back-of-interrupt
    softirq processing.  This is all well and good, but only if the
    ksoftirqd kthreads already exist, which is not the case during early
    boot, in which case the system hangs.

    One reproducer is as follows:

    tools/testing/selftests/rcutorture/bin/kvm.sh --allcpus --duration 2 --configs "TREE03" --kconfig "CONFIG_DEBUG_LOCK_ALLOC=y CONFIG_PROVE_LOCKING=y" --bootargs "threadirqs=1" --trust-make

    This commit therefore moves the spawning of the ksoftirqd kthreads
    earlier in boot.  With this change, the above test passes.

    Reported-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
    Reported-by: Uladzislau Rezki <urezki@gmail.com>
    Inspired-by: Uladzislau Rezki <urezki@gmail.com>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

diff --git a/include/linux/interrupt.h b/include/linux/interrupt.h
index bb8ff90..283a02d 100644
--- a/include/linux/interrupt.h
+++ b/include/linux/interrupt.h
@@ -592,6 +592,8 @@ static inline struct task_struct *this_cpu_ksoftirqd(void)
        return this_cpu_read(ksoftirqd);
 }

+int spawn_ksoftirqd(void);
+
 /* Tasklets --- multithreaded analogue of BHs.

    This API is deprecated. Please consider using threaded IRQs instead:
diff --git a/init/main.c b/init/main.c
index c68d784..99835bb 100644
--- a/init/main.c
+++ b/init/main.c
@@ -1512,6 +1512,7 @@ static noinline void __init kernel_init_freeable(void)

        init_mm_internals();

+       spawn_ksoftirqd();
        rcu_init_tasks_generic();
        do_pre_smp_initcalls();
        lockup_detector_init();
diff --git a/kernel/softirq.c b/kernel/softirq.c
index 9d71046..45d50d4 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -724,7 +724,7 @@ static struct smp_hotplug_thread softirq_threads = {
        .thread_comm            = "ksoftirqd/%u",
 };

-static __init int spawn_ksoftirqd(void)
+__init int spawn_ksoftirqd(void)
 {
        cpuhp_setup_state_nocalls(CPUHP_SOFTIRQ_DEAD, "softirq:dead", NULL,
                                  takeover_tasklets);
@@ -732,7 +732,6 @@ static __init int spawn_ksoftirqd(void)



        return 0;
 }
-early_initcall(spawn_ksoftirqd);

 /*
  * [ These __weak aliases are kept in a separate compilation unit, so that
<snip>

Thanks.

--
Vlad Rezki

Patch
diff mbox series

diff --git a/kernel/softirq.c b/kernel/softirq.c
index 9908ec4..bad14ca 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -211,7 +211,7 @@  static inline void invoke_softirq(void)
 	if (ksoftirqd_running(local_softirq_pending()))
 		return;
 
-	if (!force_irqthreads) {
+	if (!force_irqthreads || !__this_cpu_read(ksoftirqd)) {
 #ifdef CONFIG_HAVE_IRQ_EXIT_ON_IRQ_STACK
 		/*
 		 * We can safely execute softirq on the current stack if