linux-kernel.vger.kernel.org archive mirror
* [patch 0/2] cpu/hotplug: Prevent damage with SMP=y and HOTPLUG_CPU=n
@ 2019-03-26 16:36 Thomas Gleixner
  2019-03-26 16:36 ` [patch 1/2] cpu/hotplug: Prevent crash when CPU bringup fails on CONFIG_HOTPLUG_CPU=n Thomas Gleixner
  2019-03-26 16:36 ` [patch 2/2] x86/smp: Enforce CONFIG_HOTPLUG_CPU when SMP=y Thomas Gleixner
  0 siblings, 2 replies; 7+ messages in thread
From: Thomas Gleixner @ 2019-03-26 16:36 UTC (permalink / raw)
  To: LKML
  Cc: Tianyu Lan, Konrad Wilk, Josh Poimboeuf, Mukesh Ojha,
	Peter Zijlstra, Jiri Kosina, Rik van Riel, Andy Lutomirski,
	Micheal Kelley, K. Y. Srinivasan, Greg KH, Linus Torvalds,
	Borislav Petkov, x86

Tianyu reported a crash with SMP=y and HOTPLUG_CPU=n plus 'nosmt' on the
kernel command line.

  https://lkml.kernel.org/r/1553521883-20868-1-git-send-email-Tianyu.Lan@microsoft.com

The reason is a bug in the hotplug code, which does not handle the fact
that HOTPLUG_CPU=n cannot tear down a CPU completely.

Unfortunately HOTPLUG_CPU cannot be enforced as some architectures do not
support it at all.

The fix is only a workaround because a full solution is not possible due to
the limitations of HOTPLUG_CPU=n. So the CPU stays around in an undead state.

As 'nosmt' has become popular recently, the proper solution for X86 is to
enforce HOTPLUG_CPU when SMP is enabled.

Thanks,

	tglx
----
 arch/x86/Kconfig |    8 +-------
 kernel/cpu.c     |   20 ++++++++++++++++++--
 2 files changed, 19 insertions(+), 9 deletions(-)





* [patch 1/2] cpu/hotplug: Prevent crash when CPU bringup fails on CONFIG_HOTPLUG_CPU=n
  2019-03-26 16:36 [patch 0/2] cpu/hotplug: Prevent damage with SMP=y and HOTPLUG_CPU=n Thomas Gleixner
@ 2019-03-26 16:36 ` Thomas Gleixner
  2019-03-27  1:12   ` Greg KH
  2019-03-28 12:38   ` [tip:smp/urgent] " tip-bot for Thomas Gleixner
  2019-03-26 16:36 ` [patch 2/2] x86/smp: Enforce CONFIG_HOTPLUG_CPU when SMP=y Thomas Gleixner
  1 sibling, 2 replies; 7+ messages in thread
From: Thomas Gleixner @ 2019-03-26 16:36 UTC (permalink / raw)
  To: LKML
  Cc: Tianyu Lan, Konrad Wilk, Josh Poimboeuf, Mukesh Ojha,
	Peter Zijlstra, Jiri Kosina, Rik van Riel, Andy Lutomirski,
	Micheal Kelley, K. Y. Srinivasan, Greg KH, Linus Torvalds,
	Borislav Petkov, x86

Tianyu reported a crash in a CPU hotplug teardown callback when booting a
kernel which has CONFIG_HOTPLUG_CPU disabled with the 'nosmt' boot
parameter.

It turns out that the SMP=y CONFIG_HOTPLUG_CPU=n case has always been
broken when a bringup callback fails. Unfortunately this issue was not
recognized when the CPU hotplug code was reworked, so the shortcoming
just stayed in place.

When a bringup callback fails, the CPU hotplug code rolls back the
operation and takes the CPU offline.

The 'nosmt' command line argument uses a bringup failure to abort the
bringup of SMT sibling CPUs. This partial bringup is required due to the
MCE misdesign on Intel CPUs.

With CONFIG_HOTPLUG_CPU=y the rollback works perfectly fine, but
CONFIG_HOTPLUG_CPU=n lacks essential mechanisms to exercise the low level
teardown of a CPU including the synchronizations in various facilities like
RCU, NOHZ and others.

As a consequence the teardown callbacks which must be executed on the
outgoing CPU within stop machine with interrupts disabled are executed on
the control CPU in interrupt enabled and preemptible context causing the
kernel to crash and burn. The code predating the hotplug state machine has
a different, more subtle failure mode: a less obvious use-after-free crash,
because the control side frees resources which are still in use by the
undead CPU.

But this is not an x86-only problem. Any architecture which supports the
SMP=y HOTPLUG_CPU=n combination suffers from the same issue. It's just less
likely to be triggered because in 99.99999% of the cases all bringup
callbacks succeed.

The easy solution of making HOTPLUG_CPU mandatory for SMP does not work on
all architectures, as the following architectures either have no hotplug
support at all or have subarchitectures which do not support it:

 alpha, arc, hexagon, openrisc, riscv, sparc (32bit), mips (partial).

Crashing the kernel in such a situation is not an acceptable state
either.

Implement a minimal rollback variant by limiting the teardown to the point
where all regular teardown callbacks have been invoked and leave the CPU in
the 'dead' idle state. This has the following consequences:

 - The CPU is brought down to the point where the stop_machine takedown
   would happen.

 - The CPU stays there forever and is idle.

 - The CPU is cleared in the CPU active mask, but not in the CPU online
   mask, which is a legitimate state.

 - Interrupts are not forced away from the CPU.

 - All facilities which only look at the online mask still see the CPU,
   but that is the case during normal hotplug/unplug operations as well.
   It just lasts for a (much) longer time frame.

This will expose issues which have not been exposed before, or only seldom,
because the normally transient state of being non-active but online is now
a permanent state. Testing already exposed an issue with work queues: the
vmstat code schedules work on the almost dead CPU, which ends up in an
unbound workqueue and triggers 'preemptible context' warnings. This is not
a problem of this change; it merely exposes an already existing issue.
Still, this is better than crashing fully without a chance to debug it.

This is mainly intended as a workaround for those architectures which do
not support HOTPLUG_CPU. All others should enforce HOTPLUG_CPU for SMP.

Fixes: 2e1a3483ce74 ("cpu/hotplug: Split out the state walk into functions")
Reported-by: Tianyu Lan <Tianyu.Lan@microsoft.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Tianyu Lan <Tianyu.Lan@microsoft.com>
Cc: Konrad Wilk <konrad.wilk@oracle.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Mukesh Ojha <mojha@codeaurora.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Rik van Riel <riel@surriel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Micheal Kelley <michael.h.kelley@microsoft.com>
Cc: K. Y. Srinivasan <kys@microsoft.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: stable@vger.kernel.org
---
 kernel/cpu.c |   20 ++++++++++++++++++--
 1 file changed, 18 insertions(+), 2 deletions(-)

--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -564,6 +564,20 @@ static void undo_cpu_up(unsigned int cpu
 		cpuhp_invoke_callback(cpu, st->state, false, NULL, NULL);
 }
 
+static inline bool can_rollback_cpu(struct cpuhp_cpu_state *st)
+{
+	if (IS_ENABLED(CONFIG_HOTPLUG_CPU))
+		return true;
+	/*
+	 * When CPU hotplug is disabled, then taking the CPU down is not
+	 * possible because takedown_cpu() and the architecture and
+	 * subsystem specific mechanisms are not available. So the CPU
+	 * which would be completely unplugged again needs to stay around
+	 * in the current state.
+	 */
+	return st->state <= CPUHP_BRINGUP_CPU;
+}
+
 static int cpuhp_up_callbacks(unsigned int cpu, struct cpuhp_cpu_state *st,
 			      enum cpuhp_state target)
 {
@@ -574,8 +588,10 @@ static int cpuhp_up_callbacks(unsigned i
 		st->state++;
 		ret = cpuhp_invoke_callback(cpu, st->state, true, NULL, NULL);
 		if (ret) {
-			st->target = prev_state;
-			undo_cpu_up(cpu, st);
+			if (can_rollback_cpu(st)) {
+				st->target = prev_state;
+				undo_cpu_up(cpu, st);
+			}
 			break;
 		}
 	}




* [patch 2/2] x86/smp: Enforce CONFIG_HOTPLUG_CPU when SMP=y
  2019-03-26 16:36 [patch 0/2] cpu/hotplug: Prevent damage with SMP=y and HOTPLUG_CPU=n Thomas Gleixner
  2019-03-26 16:36 ` [patch 1/2] cpu/hotplug: Prevent crash when CPU bringup fails on CONFIG_HOTPLUG_CPU=n Thomas Gleixner
@ 2019-03-26 16:36 ` Thomas Gleixner
  2019-03-27  1:12   ` Greg KH
  2019-03-28 12:39   ` [tip:smp/urgent] " tip-bot for Thomas Gleixner
  1 sibling, 2 replies; 7+ messages in thread
From: Thomas Gleixner @ 2019-03-26 16:36 UTC (permalink / raw)
  To: LKML
  Cc: Tianyu Lan, Konrad Wilk, Josh Poimboeuf, Mukesh Ojha,
	Peter Zijlstra, Jiri Kosina, Rik van Riel, Andy Lutomirski,
	Micheal Kelley, K. Y. Srinivasan, Greg KH, Linus Torvalds,
	Borislav Petkov, x86

The SMT disable 'nosmt' command line argument does not work properly when
CONFIG_HOTPLUG_CPU is disabled. The sibling CPUs, which have to be brought
up due to the MCE issues, cannot be torn down again. The CPUs are then
kept in a half dead state.

As the 'nosmt' functionality has become popular due to the speculative
hardware vulnerabilities, the half torn down state is not a proper solution
to the problem.

Enforce CONFIG_HOTPLUG_CPU=y when SMP is enabled so the full operation is
possible.

Reported-by: Tianyu Lan <Tianyu.Lan@microsoft.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Konrad Wilk <konrad.wilk@oracle.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Mukesh Ojha <mojha@codeaurora.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Rik van Riel <riel@surriel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Micheal Kelley <michael.h.kelley@microsoft.com>
Cc: K. Y. Srinivasan <kys@microsoft.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: stable@vger.kernel.org
---
 arch/x86/Kconfig |    8 +-------
 1 file changed, 1 insertion(+), 7 deletions(-)

--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -2217,14 +2217,8 @@ config RANDOMIZE_MEMORY_PHYSICAL_PADDING
 	   If unsure, leave at the default value.
 
 config HOTPLUG_CPU
-	bool "Support for hot-pluggable CPUs"
+	def_bool y
 	depends on SMP
-	---help---
-	  Say Y here to allow turning CPUs off and on. CPUs can be
-	  controlled through /sys/devices/system/cpu.
-	  ( Note: power management support will enable this option
-	    automatically on SMP systems. )
-	  Say N if you want to disable CPU hotplug.
 
 config BOOTPARAM_HOTPLUG_CPU0
 	bool "Set default setting of cpu0_hotpluggable"




* Re: [patch 1/2] cpu/hotplug: Prevent crash when CPU bringup fails on CONFIG_HOTPLUG_CPU=n
  2019-03-26 16:36 ` [patch 1/2] cpu/hotplug: Prevent crash when CPU bringup fails on CONFIG_HOTPLUG_CPU=n Thomas Gleixner
@ 2019-03-27  1:12   ` Greg KH
  2019-03-28 12:38   ` [tip:smp/urgent] " tip-bot for Thomas Gleixner
  1 sibling, 0 replies; 7+ messages in thread
From: Greg KH @ 2019-03-27  1:12 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, Tianyu Lan, Konrad Wilk, Josh Poimboeuf, Mukesh Ojha,
	Peter Zijlstra, Jiri Kosina, Rik van Riel, Andy Lutomirski,
	Micheal Kelley, K. Y. Srinivasan, Linus Torvalds,
	Borislav Petkov, x86

On Tue, Mar 26, 2019 at 05:36:05PM +0100, Thomas Gleixner wrote:
> Tianyu reported a crash in a CPU hotplug teardown callback when booting a
> kernel which has CONFIG_HOTPLUG_CPU disabled with the 'nosmt' boot
> parameter.
> 
> It turns out that the SMP=y CONFIG_HOTPLUG_CPU=n case has been broken
> forever in case that a bringup callback fails. Unfortunately this issue was
> not recognized when the CPU hotplug code was reworked, so the shortcoming
> just stayed in place.
> 
> When a bringup callback fails, the CPU hotplug code rolls back the
> operation and takes the CPU offline.
> 
> The 'nosmt' command line argument uses a bringup failure to abort the
> bringup of SMT sibling CPUs. This partial bringup is required due to the
> MCE misdesign on Intel CPUs.
> 
> With CONFIG_HOTPLUG_CPU=y the rollback works perfectly fine, but
> CONFIG_HOTPLUG_CPU=n lacks essential mechanisms to exercise the low level
> teardown of a CPU including the synchronizations in various facilities like
> RCU, NOHZ and others.
> 
> As a consequence the teardown callbacks which must be executed on the
> outgoing CPU within stop machine with interrupts disabled are executed on
> the control CPU in interrupt enabled and preemptible context causing the
> kernel to crash and burn. The pre state machine code has a different
> failure mode which is more subtle and resulting in a less obvious use after
> free crash because the control side frees resources which are still in use
> by the undead CPU.
> 
> But this is not a x86 only problem. Any architecture which supports the
> SMP=y HOTPLUG_CPU=n combination suffers from the same issue. It's just less
> likely to be triggered because in 99.99999% of the cases all bringup
> callbacks succeed.
> 
> The easy solution of making HOTPLUG_CPU mandatory for SMP is not working on
> all architectures as the following architectures have either no hotplug
> support at all or not all subarchitectures support it:
> 
>  alpha, arc, hexagon, openrisc, riscv, sparc (32bit), mips (partial).
> 
> Crashing the kernel in such a situation is not an acceptable state
> either.
> 
> Implement a minimal rollback variant by limiting the teardown to the point
> where all regular teardown callbacks have been invoked and leave the CPU in
> the 'dead' idle state. This has the following consequences:
> 
>  - the CPU is brought down to the point where the stop_machine takedown
>    would happen.
> 
>  - the CPU stays there forever and is idle
> 
>  - The CPU is cleared in the CPU active mask, but not in the CPU online
>    mask which is a legit state.
> 
>  - Interrupts are not forced away from the CPU
> 
>  - All facilities which only look at online mask would still see it, but
>    that is the case during normal hotplug/unplug operations as well. It's
>    just a (way) longer time frame.
> 
> This will expose issues, which haven't been exposed before or only seldom,
> because now the normally transient state of being non active but online is
> a permanent state. In testing this exposed already an issue vs. work queues
> where the vmstat code schedules work on the almost dead CPU which ends up
> in an unbound workqueue and triggers 'preemptible context' warnings. This is
> not a problem of this change, it merely exposes an already existing issue.
> Still this is better than crashing fully without a chance to debug it.
> 
> This is mainly thought as workaround for those architectures which do not
> support HOTPLUG_CPU. All others should enforce HOTPLUG_CPU for SMP.
> 
> Fixes: 2e1a3483ce74 ("cpu/hotplug: Split out the state walk into functions")
> Reported-by: Tianyu Lan <Tianyu.Lan@microsoft.com>
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> Tested-by: Tianyu Lan <Tianyu.Lan@microsoft.com>
> Cc: Konrad Wilk <konrad.wilk@oracle.com>
> Cc: Josh Poimboeuf <jpoimboe@redhat.com>
> Cc: Mukesh Ojha <mojha@codeaurora.org>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Jiri Kosina <jkosina@suse.cz>
> Cc: Rik van Riel <riel@surriel.com>
> Cc: Andy Lutomirski <luto@kernel.org>
> Cc: Micheal Kelley <michael.h.kelley@microsoft.com>
> Cc: K. Y. Srinivasan <kys@microsoft.com>
> Cc: Greg KH <gregkh@linuxfoundation.org>
> Cc: Linus Torvalds <torvalds@linux-foundation.org>
> Cc: Borislav Petkov <bp@alien8.de>
> Cc: stable@vger.kernel.org
> ---
>  kernel/cpu.c |   20 ++++++++++++++++++--
>  1 file changed, 18 insertions(+), 2 deletions(-)

Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


* Re: [patch 2/2] x86/smp: Enforce CONFIG_HOTPLUG_CPU when SMP=y
  2019-03-26 16:36 ` [patch 2/2] x86/smp: Enforce CONFIG_HOTPLUG_CPU when SMP=y Thomas Gleixner
@ 2019-03-27  1:12   ` Greg KH
  2019-03-28 12:39   ` [tip:smp/urgent] " tip-bot for Thomas Gleixner
  1 sibling, 0 replies; 7+ messages in thread
From: Greg KH @ 2019-03-27  1:12 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, Tianyu Lan, Konrad Wilk, Josh Poimboeuf, Mukesh Ojha,
	Peter Zijlstra, Jiri Kosina, Rik van Riel, Andy Lutomirski,
	Micheal Kelley, K. Y. Srinivasan, Linus Torvalds,
	Borislav Petkov, x86

On Tue, Mar 26, 2019 at 05:36:06PM +0100, Thomas Gleixner wrote:
> The SMT disable 'nosmt' command line argument is not working properly when
> CONFIG_HOTPLUG_CPU is disabled. The teardown of the sibling CPUs which are
> required to be brought up due to the MCE issues, cannot work. The CPUs are
> then kept in a half dead state.
> 
> As the 'nosmt' functionality has become popular due to the speculative
> hardware vulnerabilities, the half torn down state is not a proper solution
> to the problem.
> 
> Enforce CONFIG_HOTPLUG_CPU=y when SMP is enabled so the full operation is
> possible.
> 
> Reported-by: Tianyu Lan <Tianyu.Lan@microsoft.com>
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> Cc: Konrad Wilk <konrad.wilk@oracle.com>
> Cc: Josh Poimboeuf <jpoimboe@redhat.com>
> Cc: Mukesh Ojha <mojha@codeaurora.org>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Jiri Kosina <jkosina@suse.cz>
> Cc: Rik van Riel <riel@surriel.com>
> Cc: Andy Lutomirski <luto@kernel.org>
> Cc: Micheal Kelley <michael.h.kelley@microsoft.com>
> Cc: K. Y. Srinivasan <kys@microsoft.com>
> Cc: Greg KH <gregkh@linuxfoundation.org>
> Cc: Linus Torvalds <torvalds@linux-foundation.org>
> Cc: Borislav Petkov <bp@alien8.de>
> Cc: stable@vger.kernel.org
> ---
>  arch/x86/Kconfig |    8 +-------
>  1 file changed, 1 insertion(+), 7 deletions(-)

Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


* [tip:smp/urgent] cpu/hotplug: Prevent crash when CPU bringup fails on CONFIG_HOTPLUG_CPU=n
  2019-03-26 16:36 ` [patch 1/2] cpu/hotplug: Prevent crash when CPU bringup fails on CONFIG_HOTPLUG_CPU=n Thomas Gleixner
  2019-03-27  1:12   ` Greg KH
@ 2019-03-28 12:38   ` tip-bot for Thomas Gleixner
  1 sibling, 0 replies; 7+ messages in thread
From: tip-bot for Thomas Gleixner @ 2019-03-28 12:38 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: peterz, hpa, gregkh, tglx, bp, michael.h.kelley, linux-kernel,
	jkosina, Tianyu.Lan, mingo, torvalds, konrad.wilk, mojha, luto,
	riel, kys, jpoimboe

Commit-ID:  206b92353c839c0b27a0b9bec24195f93fd6cf7a
Gitweb:     https://git.kernel.org/tip/206b92353c839c0b27a0b9bec24195f93fd6cf7a
Author:     Thomas Gleixner <tglx@linutronix.de>
AuthorDate: Tue, 26 Mar 2019 17:36:05 +0100
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Thu, 28 Mar 2019 13:34:58 +0100

cpu/hotplug: Prevent crash when CPU bringup fails on CONFIG_HOTPLUG_CPU=n

Tianyu reported a crash in a CPU hotplug teardown callback when booting a
kernel which has CONFIG_HOTPLUG_CPU disabled with the 'nosmt' boot
parameter.

It turns out that the SMP=y CONFIG_HOTPLUG_CPU=n case has been broken
forever in case that a bringup callback fails. Unfortunately this issue was
not recognized when the CPU hotplug code was reworked, so the shortcoming
just stayed in place.

When a bringup callback fails, the CPU hotplug code rolls back the
operation and takes the CPU offline.

The 'nosmt' command line argument uses a bringup failure to abort the
bringup of SMT sibling CPUs. This partial bringup is required due to the
MCE misdesign on Intel CPUs.

With CONFIG_HOTPLUG_CPU=y the rollback works perfectly fine, but
CONFIG_HOTPLUG_CPU=n lacks essential mechanisms to exercise the low level
teardown of a CPU including the synchronizations in various facilities like
RCU, NOHZ and others.

As a consequence the teardown callbacks which must be executed on the
outgoing CPU within stop machine with interrupts disabled are executed on
the control CPU in interrupt enabled and preemptible context causing the
kernel to crash and burn. The pre state machine code has a different
failure mode which is more subtle and resulting in a less obvious use after
free crash because the control side frees resources which are still in use
by the undead CPU.

But this is not a x86 only problem. Any architecture which supports the
SMP=y HOTPLUG_CPU=n combination suffers from the same issue. It's just less
likely to be triggered because in 99.99999% of the cases all bringup
callbacks succeed.

The easy solution of making HOTPLUG_CPU mandatory for SMP is not working on
all architectures as the following architectures have either no hotplug
support at all or not all subarchitectures support it:

 alpha, arc, hexagon, openrisc, riscv, sparc (32bit), mips (partial).

Crashing the kernel in such a situation is not an acceptable state
either.

Implement a minimal rollback variant by limiting the teardown to the point
where all regular teardown callbacks have been invoked and leave the CPU in
the 'dead' idle state. This has the following consequences:

 - the CPU is brought down to the point where the stop_machine takedown
   would happen.

 - the CPU stays there forever and is idle

 - The CPU is cleared in the CPU active mask, but not in the CPU online
   mask which is a legit state.

 - Interrupts are not forced away from the CPU

 - All facilities which only look at online mask would still see it, but
   that is the case during normal hotplug/unplug operations as well. It's
   just a (way) longer time frame.

This will expose issues, which haven't been exposed before or only seldom,
because now the normally transient state of being non active but online is
a permanent state. In testing this exposed already an issue vs. work queues
where the vmstat code schedules work on the almost dead CPU which ends up
in an unbound workqueue and triggers 'preemptible context' warnings. This is
not a problem of this change, it merely exposes an already existing issue.
Still this is better than crashing fully without a chance to debug it.

This is mainly thought as workaround for those architectures which do not
support HOTPLUG_CPU. All others should enforce HOTPLUG_CPU for SMP.

Fixes: 2e1a3483ce74 ("cpu/hotplug: Split out the state walk into functions")
Reported-by: Tianyu Lan <Tianyu.Lan@microsoft.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Tianyu Lan <Tianyu.Lan@microsoft.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Konrad Wilk <konrad.wilk@oracle.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Mukesh Ojha <mojha@codeaurora.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Rik van Riel <riel@surriel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Micheal Kelley <michael.h.kelley@microsoft.com>
Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: K. Y. Srinivasan <kys@microsoft.com>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20190326163811.503390616@linutronix.de

---
 kernel/cpu.c | 20 ++++++++++++++++++--
 1 file changed, 18 insertions(+), 2 deletions(-)

diff --git a/kernel/cpu.c b/kernel/cpu.c
index 025f419d16f6..6754f3ecfd94 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -564,6 +564,20 @@ static void undo_cpu_up(unsigned int cpu, struct cpuhp_cpu_state *st)
 		cpuhp_invoke_callback(cpu, st->state, false, NULL, NULL);
 }
 
+static inline bool can_rollback_cpu(struct cpuhp_cpu_state *st)
+{
+	if (IS_ENABLED(CONFIG_HOTPLUG_CPU))
+		return true;
+	/*
+	 * When CPU hotplug is disabled, then taking the CPU down is not
+	 * possible because takedown_cpu() and the architecture and
+	 * subsystem specific mechanisms are not available. So the CPU
+	 * which would be completely unplugged again needs to stay around
+	 * in the current state.
+	 */
+	return st->state <= CPUHP_BRINGUP_CPU;
+}
+
 static int cpuhp_up_callbacks(unsigned int cpu, struct cpuhp_cpu_state *st,
 			      enum cpuhp_state target)
 {
@@ -574,8 +588,10 @@ static int cpuhp_up_callbacks(unsigned int cpu, struct cpuhp_cpu_state *st,
 		st->state++;
 		ret = cpuhp_invoke_callback(cpu, st->state, true, NULL, NULL);
 		if (ret) {
-			st->target = prev_state;
-			undo_cpu_up(cpu, st);
+			if (can_rollback_cpu(st)) {
+				st->target = prev_state;
+				undo_cpu_up(cpu, st);
+			}
 			break;
 		}
 	}


* [tip:smp/urgent] x86/smp: Enforce CONFIG_HOTPLUG_CPU when SMP=y
  2019-03-26 16:36 ` [patch 2/2] x86/smp: Enforce CONFIG_HOTPLUG_CPU when SMP=y Thomas Gleixner
  2019-03-27  1:12   ` Greg KH
@ 2019-03-28 12:39   ` tip-bot for Thomas Gleixner
  1 sibling, 0 replies; 7+ messages in thread
From: tip-bot for Thomas Gleixner @ 2019-03-28 12:39 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: michael.h.kelley, jkosina, linux-kernel, luto, riel, kys, mingo,
	mojha, Tianyu.Lan, peterz, hpa, jpoimboe, tglx, torvalds, bp,
	konrad.wilk, gregkh

Commit-ID:  bebd024e4815b1a170fcd21ead9c2222b23ce9e6
Gitweb:     https://git.kernel.org/tip/bebd024e4815b1a170fcd21ead9c2222b23ce9e6
Author:     Thomas Gleixner <tglx@linutronix.de>
AuthorDate: Tue, 26 Mar 2019 17:36:06 +0100
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Thu, 28 Mar 2019 13:34:58 +0100

x86/smp: Enforce CONFIG_HOTPLUG_CPU when SMP=y

The SMT disable 'nosmt' command line argument is not working properly when
CONFIG_HOTPLUG_CPU is disabled. The teardown of the sibling CPUs which are
required to be brought up due to the MCE issues, cannot work. The CPUs are
then kept in a half dead state.

As the 'nosmt' functionality has become popular due to the speculative
hardware vulnerabilities, the half torn down state is not a proper solution
to the problem.

Enforce CONFIG_HOTPLUG_CPU=y when SMP is enabled so the full operation is
possible.

Reported-by: Tianyu Lan <Tianyu.Lan@microsoft.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Konrad Wilk <konrad.wilk@oracle.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Mukesh Ojha <mojha@codeaurora.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Rik van Riel <riel@surriel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Micheal Kelley <michael.h.kelley@microsoft.com>
Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: K. Y. Srinivasan <kys@microsoft.com>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20190326163811.598166056@linutronix.de

---
 arch/x86/Kconfig | 8 +-------
 1 file changed, 1 insertion(+), 7 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index c1f9b3cf437c..5ad92419be19 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -2217,14 +2217,8 @@ config RANDOMIZE_MEMORY_PHYSICAL_PADDING
 	   If unsure, leave at the default value.
 
 config HOTPLUG_CPU
-	bool "Support for hot-pluggable CPUs"
+	def_bool y
 	depends on SMP
-	---help---
-	  Say Y here to allow turning CPUs off and on. CPUs can be
-	  controlled through /sys/devices/system/cpu.
-	  ( Note: power management support will enable this option
-	    automatically on SMP systems. )
-	  Say N if you want to disable CPU hotplug.
 
 config BOOTPARAM_HOTPLUG_CPU0
 	bool "Set default setting of cpu0_hotpluggable"

