linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [Fix PATCH] cpu/hotplug: Fix bug report when add "nosmt" parameter with CONFIG_HOTPLUG_CPU=N
@ 2019-03-25 13:51 lantianyu1986
  2019-03-25 14:16 ` Thomas Gleixner
  2019-03-25 19:31 ` Greg KH
  0 siblings, 2 replies; 5+ messages in thread
From: lantianyu1986 @ 2019-03-25 13:51 UTC (permalink / raw)
  To: tglx, mingo, konrad.wilk, jpoimboe, peterz, mojha, peterz,
	jkosina, riel, luto, Tianyu.Lan, michael.h.kelley, kys, gregkh
  Cc: linux-kernel, stable

From: Lan Tianyu <Tianyu.Lan@microsoft.com>

When add "nosmt" parameter, kernel still boots up all logical cpus once
and set CR4.MCE on each CPU. This is to avoid shutting down machine
when a broadacasted MCE is observed CR4.MCE=0b. (Detail please see comment
in the cpu_smt_allowed()). Smt cpus will bring up and bring down during
kernel boot with "nosmt" parameter.

When CONFIG_HOTPLUG_CPU=Y, CPU_DYING callbacks will be called inside
stop-machine and irq is disabled. This happens in the take_cpu_down()
callback. When CONFIG_HOTPLUG_CPU=N,CPU_DYING callbacks will be called
with irq enabled.

smpcfd_dying_cpu() is one of CPU_DYING callbacks and it assumes to be
called when irq is disabled. smpcfd_dying_cpu() calls flush_smp_call_
function_queue() which requires to be called with irq disabled.

When CONFIG_HOTPLUG_CPU=N and add "nosmt" parameter, smpcfd_dying_cpu()
is called with irq enalbed and this triggers BUG_ON(!irqs_disabled())
in the irq_work_run_list(). This patch is to fix the issue.

Fixes: 0cc3cd21657b ("cpu/hotplug: Boot HT siblings at least once")
Signed-off-by: Lan Tianyu <Tianyu.Lan@microsoft.com>
---
 kernel/smp.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/kernel/smp.c b/kernel/smp.c
index f4cf1b0..33f1970 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -73,6 +73,8 @@ int smpcfd_dead_cpu(unsigned int cpu)
 
 int smpcfd_dying_cpu(unsigned int cpu)
 {
+	unsigned long flags;
+
 	/*
 	 * The IPIs for the smp-call-function callbacks queued by other
 	 * CPUs might arrive late, either due to hardware latencies or
@@ -82,7 +84,10 @@ int smpcfd_dying_cpu(unsigned int cpu)
 	 * ensure that the outgoing CPU doesn't go offline with work
 	 * still pending.
 	 */
+	local_irq_save(flags);
 	flush_smp_call_function_queue(false);
+	local_irq_restore(flags);
+
 	return 0;
 }
 
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [Fix PATCH] cpu/hotplug: Fix bug report when add "nosmt" parameter with CONFIG_HOTPLUG_CPU=N
  2019-03-25 13:51 [Fix PATCH] cpu/hotplug: Fix bug report when add "nosmt" parameter with CONFIG_HOTPLUG_CPU=N lantianyu1986
@ 2019-03-25 14:16 ` Thomas Gleixner
  2019-03-25 22:39   ` Thomas Gleixner
  2019-03-25 19:31 ` Greg KH
  1 sibling, 1 reply; 5+ messages in thread
From: Thomas Gleixner @ 2019-03-25 14:16 UTC (permalink / raw)
  To: lantianyu1986
  Cc: mingo, konrad.wilk, jpoimboe, peterz, mojha, peterz, jkosina,
	riel, luto, Tianyu.Lan, michael.h.kelley, kys, gregkh,
	linux-kernel, stable

Tianyu,

On Mon, 25 Mar 2019, lantianyu1986@gmail.com wrote:

thanks for tracking this down.

A few nit picks vs. the subject first:

> Subject: [Fix PATCH] cpu/hotplug: Fix bug report when add "nosmt" parameter with CONFIG_HOTPLUG_CPU=N

[PATCH] is sufficient. The extra 'Fix' has no real value. 'Fix bug report'
is weird. Bug reports cannot be fixed, only bugs can be fixed. Also please
avoid overly long subjects. Subjects should be concise and describe the
problem/change as precise as it goes.

> When add "nosmt" parameter, kernel still boots up all logical cpus once
> and set CR4.MCE on each CPU. This is to avoid shutting down machine
> when a broadacasted MCE is observed CR4.MCE=0b. (Detail please see comment
> in the cpu_smt_allowed()). Smt cpus will bring up and bring down during
> kernel boot with "nosmt" parameter.
> 
> When CONFIG_HOTPLUG_CPU=Y, CPU_DYING callbacks will be called inside
> stop-machine and irq is disabled. This happens in the take_cpu_down()
> callback. When CONFIG_HOTPLUG_CPU=N,CPU_DYING callbacks will be called
> with irq enabled.

Yes, that's bad.

> smpcfd_dying_cpu() is one of CPU_DYING callbacks and it assumes to be
> called when irq is disabled. smpcfd_dying_cpu() calls flush_smp_call_
> function_queue() which requires to be called with irq disabled.
> 
> When CONFIG_HOTPLUG_CPU=N and add "nosmt" parameter, smpcfd_dying_cpu()
> is called with irq enalbed and this triggers BUG_ON(!irqs_disabled())
> in the irq_work_run_list(). This patch is to fix the issue.

That's not a fix, it's papering over the root cause. As you figured out
already, the DYING callbacks should be called with interrupts disabled.

So rather than "fixing" that particular callback we need to look at the
reason why these callbacks are invoked with interrupts enabled and fix
that.

> Fixes: 0cc3cd21657b ("cpu/hotplug: Boot HT siblings at least once")

That has nothing to do with 'nosmt'. It's a general bug in the rollback
code when HOTPLUG_CPU=n. 'nosmt' is using the rollback mechanism and is
just a reliable way to trigger the problem. This happens in the same way
when the bringup of a CPU fails for any other reason. That bug was there
way before 0cc3cd21657b.

I'll have a look, but I fear that needs some surgery to fix.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [Fix PATCH] cpu/hotplug: Fix bug report when add "nosmt" parameter with CONFIG_HOTPLUG_CPU=N
  2019-03-25 13:51 [Fix PATCH] cpu/hotplug: Fix bug report when add "nosmt" parameter with CONFIG_HOTPLUG_CPU=N lantianyu1986
  2019-03-25 14:16 ` Thomas Gleixner
@ 2019-03-25 19:31 ` Greg KH
  1 sibling, 0 replies; 5+ messages in thread
From: Greg KH @ 2019-03-25 19:31 UTC (permalink / raw)
  To: lantianyu1986
  Cc: tglx, mingo, konrad.wilk, jpoimboe, peterz, mojha, peterz,
	jkosina, riel, luto, Tianyu.Lan, michael.h.kelley, kys,
	linux-kernel, stable

On Mon, Mar 25, 2019 at 09:51:23PM +0800, lantianyu1986@gmail.com wrote:
> From: Lan Tianyu <Tianyu.Lan@microsoft.com>
> 
> When add "nosmt" parameter, kernel still boots up all logical cpus once
> and set CR4.MCE on each CPU. This is to avoid shutting down machine
> when a broadacasted MCE is observed CR4.MCE=0b. (Detail please see comment
> in the cpu_smt_allowed()). Smt cpus will bring up and bring down during
> kernel boot with "nosmt" parameter.
> 
> When CONFIG_HOTPLUG_CPU=Y, CPU_DYING callbacks will be called inside
> stop-machine and irq is disabled. This happens in the take_cpu_down()
> callback. When CONFIG_HOTPLUG_CPU=N,CPU_DYING callbacks will be called
> with irq enabled.
> 
> smpcfd_dying_cpu() is one of CPU_DYING callbacks and it assumes to be
> called when irq is disabled. smpcfd_dying_cpu() calls flush_smp_call_
> function_queue() which requires to be called with irq disabled.
> 
> When CONFIG_HOTPLUG_CPU=N and add "nosmt" parameter, smpcfd_dying_cpu()
> is called with irq enalbed and this triggers BUG_ON(!irqs_disabled())
> in the irq_work_run_list(). This patch is to fix the issue.
> 
> Fixes: 0cc3cd21657b ("cpu/hotplug: Boot HT siblings at least once")
> Signed-off-by: Lan Tianyu <Tianyu.Lan@microsoft.com>
> ---
>  kernel/smp.c | 5 +++++
>  1 file changed, 5 insertions(+)

<formletter>

This is not the correct way to submit patches for inclusion in the
stable kernel tree.  Please read:
    https://www.kernel.org/doc/html/latest/process/stable-kernel-rules.html
for how to do this properly.

</formletter>

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [Fix PATCH] cpu/hotplug: Fix bug report when add "nosmt" parameter with CONFIG_HOTPLUG_CPU=N
  2019-03-25 14:16 ` Thomas Gleixner
@ 2019-03-25 22:39   ` Thomas Gleixner
  2019-03-26  8:44     ` Tianyu Lan
  0 siblings, 1 reply; 5+ messages in thread
From: Thomas Gleixner @ 2019-03-25 22:39 UTC (permalink / raw)
  To: lantianyu1986
  Cc: Ingo Molnar, konrad.wilk, jpoimboe, mojha, Peter Zijlstra,
	Jiri Kosina, riel, Andy Lutomirski, Tianyu.Lan, michael.h.kelley,
	kys, Greg KH, LKML, stable, Linus Torvalds, Borislav Petkov

On Mon, 25 Mar 2019, Thomas Gleixner wrote:
> That has nothing to do with 'nosmt'. It's a general bug in the rollback
> code when HOTPLUG_CPU=n. 'nosmt' is using the rollback mechanism and is
> just a reliable way to trigger the problem. This happens in the same way
> when the bringup of a CPU fails for any other reason. That bug was there
> way before 0cc3cd21657b.
> 
> I'll have a look, but I fear that needs some surgery to fix.

This is SMP=y HOTPLUG_CPU=n case is broken and was broken forever.

Of course that tglx moron did not notice this particular wreckage when
cleaning up the hotplug code. So we just kept the historical brainfart
around.

The problem is that a failure in the bringup path triggers the rollback
down to the point where the CPU is dead and resources are cleaned up.

But the interesting part of the rollback, i.e. takedown_cpu(), is not
available when CONFIG_HOTPLUG_CPU=n.

As a consequence the former CPU_DYING callbacks are invoked on the control
CPU with interrupts disabled. That rightfully explodes and even if
interrupts were disabled then it's still violating the fundamental
requirements of CPU unplug which include stop machine and various
synchronizations in facilities like RCU which are all unavailable with
HOTPLUG_CPU=n.

The pre state machine code has a different failure mode, but the end result
is a straight forward crash as well. Just less obvious to decode.

So the SMP=y HOTPLUG_CPU=n onlining is just working by chance as long as
none of the bringup callbacks fails.

'nosmt' exposes this flaw nicely because the wonderful MCE design of x86
forces us to bringup the sibling thread and then tear it down right
away. That's equivalent to a callback failure and triggers the same
rollback code which is broken...

But this is not a x86 only problem. Any architecture which supports the
SMP=y HOTPLUG_CPU=n combination suffers from the same issue. It's just less
likely to be triggered because in 99.99999% of the cases all bringup
callbacks succeed.

Now making HOTPLUG_CPU mandatory for SMP is not working on all
architectures as the following architectures have either no hotplug support
at all or not all subarchitectures support it:

alpha, arc, hexagon, openrisc, riscv, sparc (32bit), mips (partial).

So I looked into a 'minimal rollback' for the HOTPLUG=n case which prevents
at least the full crash. That would mean:

 - the CPU is brought down to the point where the stop_machine takedown
   would happen.

 - the CPU stays there forever and is idle

 - The CPU is cleared in the CPU active mask, but not in the CPU online
   mask which is a legit state.

 - Interrupts are not forced away from the CPU

 - All facilities which only look at online mask would still see it, but
   that is the case on normal hotplug/unplug operations as well. It's just
   a longer time frame.

There might be some subtle stuff broken, but that's something we should
figure out anyway as it's then also broken during unplug to the point where
it is actually taken down. Letting it stay there just increases the time
window. Same applies for the onlining path.

And indeed with that minimal rollback workaround in place this makes the
vmstat code trigger an preemtible warning because it schedules work on a
CPU which is online but inactive. The workqueue for that CPU is not bound
yet and that triggers a preemptible warning of some sort over and over.

And that's not a problem of the workaround. Just onlining the CPU to the
same point makes that problem visible as well.

A simple test which delayed the full onlining of a CPU and a test module
scheduling work on this already marked online CPU triggers the same issue
when the worker callback uses e.g. this_cpu_read(). So there is some nasty
crap lurking all over the place.

We surely want to fix the fallout of this, but I think the sane short term
solution for x86 is to enforce HOTPLUG_CPU when SMP is enabled which is
also the sane and simple fix for backporting. That 'nosmt' stuff is popular
nowadays for some reasons.

Minimal rollback patch to prevent crashing below.

Thoughts?

	tglx

8<----------------
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -564,6 +564,20 @@ static void undo_cpu_up(unsigned int cpu
 		cpuhp_invoke_callback(cpu, st->state, false, NULL, NULL);
 }
 
+static inline bool can_rollback_cpu(struct cpuhp_cpu_state *st)
+{
+	if (IS_ENABLED(CONFIG_HOTPLUG_CPU))
+		return true;
+	/*
+	 * When CPU hotplug is disabled, then taking the CPU down is not
+	 * possible because takedown_cpu() and the architecture and
+	 * subsystem specific mechanisms are not available. So the CPU
+	 * which would be completely unplugged again needs to stay around
+	 * in the current state, i.e. <= CPUHP_AP_ONLINE_IDLE.
+	 */
+	return st->state <= CPUHP_BRINGUP_CPU;
+}
+
 static int cpuhp_up_callbacks(unsigned int cpu, struct cpuhp_cpu_state *st,
 			      enum cpuhp_state target)
 {
@@ -574,8 +588,10 @@ static int cpuhp_up_callbacks(unsigned i
 		st->state++;
 		ret = cpuhp_invoke_callback(cpu, st->state, true, NULL, NULL);
 		if (ret) {
-			st->target = prev_state;
-			undo_cpu_up(cpu, st);
+			if (can_rollback_cpu(st)) {
+				st->target = prev_state;
+				undo_cpu_up(cpu, st);
+			}
 			break;
 		}
 	}


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [Fix PATCH] cpu/hotplug: Fix bug report when add "nosmt" parameter with CONFIG_HOTPLUG_CPU=N
  2019-03-25 22:39   ` Thomas Gleixner
@ 2019-03-26  8:44     ` Tianyu Lan
  0 siblings, 0 replies; 5+ messages in thread
From: Tianyu Lan @ 2019-03-26  8:44 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Ingo Molnar, Konrad Rzeszutek Wilk, jpoimboe, mojha,
	Peter Zijlstra, Jiri Kosina, Rik van Riel, Andy Lutomirski,
	Tianyu Lan, michael.h.kelley, KY Srinivasan, Greg KH, LKML,
	stable, Linus Torvalds, Borislav Petkov

Hi Thomas:
           Thanks for your review.

On Tue, Mar 26, 2019 at 6:39 AM Thomas Gleixner <tglx@linutronix.de> wrote:
>
> On Mon, 25 Mar 2019, Thomas Gleixner wrote:
> > That has nothing to do with 'nosmt'. It's a general bug in the rollback
> > code when HOTPLUG_CPU=n. 'nosmt' is using the rollback mechanism and is
> > just a reliable way to trigger the problem. This happens in the same way
> > when the bringup of a CPU fails for any other reason. That bug was there
> > way before 0cc3cd21657b.
> >
> > I'll have a look, but I fear that needs some surgery to fix.
>
> This is SMP=y HOTPLUG_CPU=n case is broken and was broken forever.
>
> Of course that tglx moron did not notice this particular wreckage when
> cleaning up the hotplug code. So we just kept the historical brainfart
> around.
>
> The problem is that a failure in the bringup path triggers the rollback
> down to the point where the CPU is dead and resources are cleaned up.
>
> But the interesting part of the rollback, i.e. takedown_cpu(), is not
> available when CONFIG_HOTPLUG_CPU=n.
>
> As a consequence the former CPU_DYING callbacks are invoked on the control
> CPU with interrupts disabled. That rightfully explodes and even if
> interrupts were disabled then it's still violating the fundamental
> requirements of CPU unplug which include stop machine and various
> synchronizations in facilities like RCU which are all unavailable with
> HOTPLUG_CPU=n.
>
> The pre state machine code has a different failure mode, but the end result
> is a straight forward crash as well. Just less obvious to decode.
>
> So the SMP=y HOTPLUG_CPU=n onlining is just working by chance as long as
> none of the bringup callbacks fails.
>
> 'nosmt' exposes this flaw nicely because the wonderful MCE design of x86
> forces us to bringup the sibling thread and then tear it down right
> away. That's equivalent to a callback failure and triggers the same
> rollback code which is broken...
>
> But this is not a x86 only problem. Any architecture which supports the
> SMP=y HOTPLUG_CPU=n combination suffers from the same issue. It's just less
> likely to be triggered because in 99.99999% of the cases all bringup
> callbacks succeed.
>
> Now making HOTPLUG_CPU mandatory for SMP is not working on all
> architectures as the following architectures have either no hotplug support
> at all or not all subarchitectures support it:
>
> alpha, arc, hexagon, openrisc, riscv, sparc (32bit), mips (partial).
>
> So I looked into a 'minimal rollback' for the HOTPLUG=n case which prevents
> at least the full crash. That would mean:
>
>  - the CPU is brought down to the point where the stop_machine takedown
>    would happen.
>
>  - the CPU stays there forever and is idle
>
>  - The CPU is cleared in the CPU active mask, but not in the CPU online
>    mask which is a legit state.
>
>  - Interrupts are not forced away from the CPU
>
>  - All facilities which only look at online mask would still see it, but
>    that is the case on normal hotplug/unplug operations as well. It's just
>    a longer time frame.
>
> There might be some subtle stuff broken, but that's something we should
> figure out anyway as it's then also broken during unplug to the point where
> it is actually taken down. Letting it stay there just increases the time
> window. Same applies for the onlining path.
>
> And indeed with that minimal rollback workaround in place this makes the
> vmstat code trigger an preemtible warning because it schedules work on a
> CPU which is online but inactive. The workqueue for that CPU is not bound
> yet and that triggers a preemptible warning of some sort over and over.
>
> And that's not a problem of the workaround. Just onlining the CPU to the
> same point makes that problem visible as well.
>
> A simple test which delayed the full onlining of a CPU and a test module
> scheduling work on this already marked online CPU triggers the same issue
> when the worker callback uses e.g. this_cpu_read(). So there is some nasty
> crap lurking all over the place.
>
> We surely want to fix the fallout of this, but I think the sane short term
> solution for x86 is to enforce HOTPLUG_CPU when SMP is enabled which is
> also the sane and simple fix for backporting. That 'nosmt' stuff is popular
> nowadays for some reasons.
>
> Minimal rollback patch to prevent crashing below.
>
> Thoughts?

Yes, you are right. The "nosmt" parameter just exposes the issue and
root cause is that
DYING callbacks should be called with interrupts disabled.I didn't
notice synchronizations
(e.g RCU) were also not available when HOTPLUG_CPU=n. Other
synchronization issues
maybe not exposed.

Agree. Force HOTPLUG_CPU=Y if SMP=Y on x86 would be a quick fix and it's easy to
be backported.

>
>         tglx
>
> 8<----------------
> --- a/kernel/cpu.c
> +++ b/kernel/cpu.c
> @@ -564,6 +564,20 @@ static void undo_cpu_up(unsigned int cpu
>                 cpuhp_invoke_callback(cpu, st->state, false, NULL, NULL);
>  }
>
> +static inline bool can_rollback_cpu(struct cpuhp_cpu_state *st)
> +{
> +       if (IS_ENABLED(CONFIG_HOTPLUG_CPU))
> +               return true;
> +       /*
> +        * When CPU hotplug is disabled, then taking the CPU down is not
> +        * possible because takedown_cpu() and the architecture and
> +        * subsystem specific mechanisms are not available. So the CPU
> +        * which would be completely unplugged again needs to stay around
> +        * in the current state, i.e. <= CPUHP_AP_ONLINE_IDLE.
> +        */
> +       return st->state <= CPUHP_BRINGUP_CPU;
> +}
> +
>  static int cpuhp_up_callbacks(unsigned int cpu, struct cpuhp_cpu_state *st,
>                               enum cpuhp_state target)
>  {
> @@ -574,8 +588,10 @@ static int cpuhp_up_callbacks(unsigned i
>                 st->state++;
>                 ret = cpuhp_invoke_callback(cpu, st->state, true, NULL, NULL);
>                 if (ret) {
> -                       st->target = prev_state;
> -                       undo_cpu_up(cpu, st);
> +                       if (can_rollback_cpu(st)) {
> +                               st->target = prev_state;
> +                               undo_cpu_up(cpu, st);
> +                       }
>                         break;
>                 }
>         }
>

I have tested your patch. It resolves the crash with "nosmt" parameter.
-- 
Best regards
Tianyu Lan

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2019-03-26  8:44 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-03-25 13:51 [Fix PATCH] cpu/hotplug: Fix bug report when add "nosmt" parameter with CONFIG_HOTPLUG_CPU=N lantianyu1986
2019-03-25 14:16 ` Thomas Gleixner
2019-03-25 22:39   ` Thomas Gleixner
2019-03-26  8:44     ` Tianyu Lan
2019-03-25 19:31 ` Greg KH

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).