All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 0/3] x86/crash: Fix double NMI shootdown bug
@ 2022-05-18  0:16 Sean Christopherson
  2022-05-18  0:16 ` [PATCH v2 1/3] x86/crash: Disable virt in core NMI crash handler to avoid double shootdown Sean Christopherson
                   ` (6 more replies)
  0 siblings, 7 replies; 9+ messages in thread
From: Sean Christopherson @ 2022-05-18  0:16 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86
  Cc: H. Peter Anvin, linux-kernel, Guilherme G . Piccoli,
	Vitaly Kuznetsov, Paolo Bonzini, Sean Christopherson

Fix a double NMI shootdown bug found and debugged by Guilherme, who did all
the hard work.  NMI shootdown is a one-time thing; the handler leaves NMIs
blocked and enters halt.  At best, a second (or third...) shootdown is an
expensive nop, at worst it can hang the kernel and prevent kexec'ing into
a new kernel, e.g. prior to the hardening of register_nmi_handler(), a
double shootdown resulted in a double list_add(), which is fatal when running
with CONFIG_BUG_ON_DATA_CORRUPTION=y.

With the "right" kexec/kdump configuration, emergency_vmx_disable_all() can
be reached after kdump_nmi_shootdown_cpus() (currently the only two users
of nmi_shootdown_cpus()).

To fix, move the disabling of virtualization into crash_nmi_callback(),
remove emergency_vmx_disable_all()'s callback, and do a shootdown for
emergency_vmx_disable_all() if and only if a shootdown hasn't yet occurred.
The only thing emergency_vmx_disable_all() cares about is disabling VMX/SVM
(obviously), and since I can't envision a use case for an NMI shootdown that
doesn't want to disable virtualization, doing that in the core handler means
emergency_vmx_disable_all() only needs to ensure _a_ shootdown occurs, it
doesn't care when that shootdown happened or what callback may have run.

Patch 2 is a related bug fix found while exploring ideas for patch 1.
Patch 3 is a cleanup to try to prevent future "fixed VMX but not SVM"
style bugs.

Guilherme and Vitaly, I dropped your Tested-by and Reviewed-by tags
since the relevant patches changed a decent amount.

v2:
  - Use a NULL handler and crash_ipi_issued instead of a magic nop
    handler. [tglx]
  - Add comments to call out that modifying the existing handler
    once the NMI is sent may cause explosions.
  - Add a patch to cleanup cpu_emergency_vmxoff().

v1: https://lore.kernel.org/all/20220511234332.3654455-1-seanjc@google.com

Sean Christopherson (3):
  x86/crash: Disable virt in core NMI crash handler to avoid double
    shootdown
  x86/reboot: Disable virtualization in an emergency if SVM is supported
  x86/virt: Fold __cpu_emergency_vmxoff() into its sole caller

 arch/x86/include/asm/reboot.h  |  1 +
 arch/x86/include/asm/virtext.h | 14 +-----
 arch/x86/kernel/crash.c        | 16 +-----
 arch/x86/kernel/reboot.c       | 89 +++++++++++++++++++++++++---------
 4 files changed, 69 insertions(+), 51 deletions(-)


base-commit: a7fed5c0431dbfa707037848830f980e0f93cfb3
-- 
2.36.0.550.gb090851708-goog


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH v2 1/3] x86/crash: Disable virt in core NMI crash handler to avoid double shootdown
  2022-05-18  0:16 [PATCH v2 0/3] x86/crash: Fix double NMI shootdown bug Sean Christopherson
@ 2022-05-18  0:16 ` Sean Christopherson
  2022-05-18 15:50   ` Guilherme G. Piccoli
  2022-05-18  0:16 ` [PATCH v2 2/3] x86/reboot: Disable virtualization in an emergency if SVM is supported Sean Christopherson
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 9+ messages in thread
From: Sean Christopherson @ 2022-05-18  0:16 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86
  Cc: H. Peter Anvin, linux-kernel, Guilherme G . Piccoli,
	Vitaly Kuznetsov, Paolo Bonzini, Sean Christopherson

Disable virtualization in crash_nmi_callback() and rework the
emergency_vmx_disable_all() path to do an NMI shootdown if and only if a
shootdown has not already occurred.   NMI crash shootdown fundamentally
can't support multiple invocations as responding CPUs are deliberately
put into halt state without unblocking NMIs.  But, the emergency reboot
path doesn't have any work of its own, it simply cares about disabling
virtualization, i.e. so long as a shootdown occurred, emergency reboot
doesn't care who initiated the shootdown, or when.

If "crash_kexec_post_notifiers" is specified on the kernel command line,
panic() will invoke crash_smp_send_stop() and result in a second call to
nmi_shootdown_cpus() during native_machine_emergency_restart().

Invoke the callback _before_ disabling virtualization, as the current
VMCS needs to be cleared before doing VMXOFF.  Note, this results in a
subtle change in ordering between disabling virtualization and stopping
Intel PT on the responding CPUs.  While VMX and Intel PT do interact,
VMXOFF and writes to MSR_IA32_RTIT_CTL do not induce faults between one
another, which is all that matters when panicking.

Harden nmi_shootdown_cpus() against multiple invocations to try and
capture any such kernel bugs via a WARN instead of hanging the system
during a crash/dump, e.g. prior to the recent hardening of
register_nmi_handler(), re-registering the NMI handler would trigger a
double list_add() and hang the system if CONFIG_BUG_ON_DATA_CORRUPTION=y.

 list_add double add: new=ffffffff82220800, prev=ffffffff8221cfe8, next=ffffffff82220800.
 WARNING: CPU: 2 PID: 1319 at lib/list_debug.c:29 __list_add_valid+0x67/0x70
 Call Trace:
  __register_nmi_handler+0xcf/0x130
  nmi_shootdown_cpus+0x39/0x90
  native_machine_emergency_restart+0x1c9/0x1d0
  panic+0x237/0x29b

Extract the disabling logic to a common helper to deduplicate code, and
to prepare for doing the shootdown in the emergency reboot path if SVM
is supported.

Note, prior to commit ed72736183c4 ("x86/reboot: Force all cpus to exit
VMX root if VMX is supported"), nmi_shootdown_cpus() was subtly protected
against a second invocation by a cpu_vmx_enabled() check as the kdump
handler would disable VMX if it ran first.

Fixes: ed72736183c4 ("x86/reboot: Force all cpus to exit VMX root if VMX is supported)
Cc: stable@vger.kernel.org
Reported-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Link: https://lore.kernel.org/all/20220427224924.592546-2-gpiccoli@igalia.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/include/asm/reboot.h |  1 +
 arch/x86/kernel/crash.c       | 16 +--------
 arch/x86/kernel/reboot.c      | 66 ++++++++++++++++++++++++++++-------
 3 files changed, 56 insertions(+), 27 deletions(-)

diff --git a/arch/x86/include/asm/reboot.h b/arch/x86/include/asm/reboot.h
index 04c17be9b5fd..8f2da36435a6 100644
--- a/arch/x86/include/asm/reboot.h
+++ b/arch/x86/include/asm/reboot.h
@@ -25,6 +25,7 @@ void __noreturn machine_real_restart(unsigned int type);
 #define MRR_BIOS	0
 #define MRR_APM		1
 
+void cpu_crash_disable_virtualization(void);
 typedef void (*nmi_shootdown_cb)(int, struct pt_regs*);
 void nmi_panic_self_stop(struct pt_regs *regs);
 void nmi_shootdown_cpus(nmi_shootdown_cb callback);
diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
index e8326a8d1c5d..fe0cf83843ba 100644
--- a/arch/x86/kernel/crash.c
+++ b/arch/x86/kernel/crash.c
@@ -81,15 +81,6 @@ static void kdump_nmi_callback(int cpu, struct pt_regs *regs)
 	 */
 	cpu_crash_vmclear_loaded_vmcss();
 
-	/* Disable VMX or SVM if needed.
-	 *
-	 * We need to disable virtualization on all CPUs.
-	 * Having VMX or SVM enabled on any CPU may break rebooting
-	 * after the kdump kernel has finished its task.
-	 */
-	cpu_emergency_vmxoff();
-	cpu_emergency_svm_disable();
-
 	/*
 	 * Disable Intel PT to stop its logging
 	 */
@@ -148,12 +139,7 @@ void native_machine_crash_shutdown(struct pt_regs *regs)
 	 */
 	cpu_crash_vmclear_loaded_vmcss();
 
-	/* Booting kdump kernel with VMX or SVM enabled won't work,
-	 * because (among other limitations) we can't disable paging
-	 * with the virt flags.
-	 */
-	cpu_emergency_vmxoff();
-	cpu_emergency_svm_disable();
+	cpu_crash_disable_virtualization();
 
 	/*
 	 * Disable Intel PT to stop its logging
diff --git a/arch/x86/kernel/reboot.c b/arch/x86/kernel/reboot.c
index fa700b46588e..e67737418f7d 100644
--- a/arch/x86/kernel/reboot.c
+++ b/arch/x86/kernel/reboot.c
@@ -528,10 +528,7 @@ static inline void kb_wait(void)
 	}
 }
 
-static void vmxoff_nmi(int cpu, struct pt_regs *regs)
-{
-	cpu_emergency_vmxoff();
-}
+static inline void nmi_shootdown_cpus_on_restart(void);
 
 /* Use NMIs as IPIs to tell all CPUs to disable virtualization */
 static void emergency_vmx_disable_all(void)
@@ -554,7 +551,7 @@ static void emergency_vmx_disable_all(void)
 		__cpu_emergency_vmxoff();
 
 		/* Halt and exit VMX root operation on the other CPUs. */
-		nmi_shootdown_cpus(vmxoff_nmi);
+		nmi_shootdown_cpus_on_restart();
 	}
 }
 
@@ -802,6 +799,18 @@ static nmi_shootdown_cb shootdown_callback;
 static atomic_t waiting_for_crash_ipi;
 static int crash_ipi_issued;
 
+void cpu_crash_disable_virtualization(void)
+{
+	/*
+	 * Disable virtualization, i.e. VMX or SVM, so that INIT is recognized
+	 * during reboot.  VMX blocks INIT if the CPU is post-VMXON, and SVM
+	 * blocks INIT if GIF=0.  Note, CLGI #UDs if SVM isn't enabled, so it's
+	 * easier to just disable SVM unconditionally.
+	 */
+	cpu_emergency_vmxoff();
+	cpu_emergency_svm_disable();
+}
+
 static int crash_nmi_callback(unsigned int val, struct pt_regs *regs)
 {
 	int cpu;
@@ -817,7 +826,14 @@ static int crash_nmi_callback(unsigned int val, struct pt_regs *regs)
 		return NMI_HANDLED;
 	local_irq_disable();
 
-	shootdown_callback(cpu, regs);
+	if (shootdown_callback)
+		shootdown_callback(cpu, regs);
+
+	/*
+	 * Prepare the CPU for reboot _after_ invoking the callback so that the
+	 * callback can safely use virtualization instructions, e.g. VMCLEAR.
+	 */
+	cpu_crash_disable_virtualization();
 
 	atomic_dec(&waiting_for_crash_ipi);
 	/* Assume hlt works */
@@ -828,18 +844,32 @@ static int crash_nmi_callback(unsigned int val, struct pt_regs *regs)
 	return NMI_HANDLED;
 }
 
-/*
- * Halt all other CPUs, calling the specified function on each of them
+/**
+ * nmi_shootdown_cpus - Stop other CPUs via NMI
+ * @callback:	Optional callback to be invoked from the NMI handler
  *
- * This function can be used to halt all other CPUs on crash
- * or emergency reboot time. The function passed as parameter
- * will be called inside a NMI handler on all CPUs.
+ * The NMI handler on the remote CPUs invokes @callback, if not
+ * NULL, first and then disables virtualization to ensure that
+ * INIT is recognized during reboot.
+ *
+ * nmi_shootdown_cpus() can only be invoked once. After the first
+ * invocation all other CPUs are stuck in crash_nmi_callback() and
+ * cannot respond to a second NMI.
  */
 void nmi_shootdown_cpus(nmi_shootdown_cb callback)
 {
 	unsigned long msecs;
+
 	local_irq_disable();
 
+	/*
+	 * Avoid certain doom if a shootdown already occurred; re-registering
+	 * the NMI handler will cause list corruption, modifying the callback
+	 * will do who knows what, etc...
+	 */
+	if (WARN_ON_ONCE(crash_ipi_issued))
+		return;
+
 	/* Make a note of crashing cpu. Will be used in NMI callback. */
 	crashing_cpu = safe_smp_processor_id();
 
@@ -867,7 +897,17 @@ void nmi_shootdown_cpus(nmi_shootdown_cb callback)
 		msecs--;
 	}
 
-	/* Leave the nmi callback set */
+	/*
+	 * Leave the nmi callback set, shootdown is a one-time thing.  Clearing
+	 * the callback could result in a NULL pointer dereference if a CPU
+	 * (finally) responds after the timeout expires.
+	 */
+}
+
+static inline void nmi_shootdown_cpus_on_restart(void)
+{
+	if (!crash_ipi_issued)
+		nmi_shootdown_cpus(NULL);
 }
 
 /*
@@ -897,6 +937,8 @@ void nmi_shootdown_cpus(nmi_shootdown_cb callback)
 	/* No other CPUs to shoot down */
 }
 
+static inline void nmi_shootdown_cpus_on_restart(void) { }
+
 void run_crash_ipi_callback(struct pt_regs *regs)
 {
 }
-- 
2.36.0.550.gb090851708-goog


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH v2 2/3] x86/reboot: Disable virtualization in an emergency if SVM is supported
  2022-05-18  0:16 [PATCH v2 0/3] x86/crash: Fix double NMI shootdown bug Sean Christopherson
  2022-05-18  0:16 ` [PATCH v2 1/3] x86/crash: Disable virt in core NMI crash handler to avoid double shootdown Sean Christopherson
@ 2022-05-18  0:16 ` Sean Christopherson
  2022-05-18  0:16 ` [PATCH v2 3/3] x86/virt: Fold __cpu_emergency_vmxoff() into its sole caller Sean Christopherson
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 9+ messages in thread
From: Sean Christopherson @ 2022-05-18  0:16 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86
  Cc: H. Peter Anvin, linux-kernel, Guilherme G . Piccoli,
	Vitaly Kuznetsov, Paolo Bonzini, Sean Christopherson

Disable SVM on all CPUs via NMI shootdown during an emergency reboot.
Like VMX, SVM can block INIT and thus prevent bringing up other CPUs via
INIT-SIPI-SIPI.

Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kernel/reboot.c | 23 +++++++++++------------
 1 file changed, 11 insertions(+), 12 deletions(-)

diff --git a/arch/x86/kernel/reboot.c b/arch/x86/kernel/reboot.c
index e67737418f7d..0aba4a37f3f9 100644
--- a/arch/x86/kernel/reboot.c
+++ b/arch/x86/kernel/reboot.c
@@ -530,27 +530,26 @@ static inline void kb_wait(void)
 
 static inline void nmi_shootdown_cpus_on_restart(void);
 
-/* Use NMIs as IPIs to tell all CPUs to disable virtualization */
-static void emergency_vmx_disable_all(void)
+static void emergency_reboot_disable_virtualization(void)
 {
 	/* Just make sure we won't change CPUs while doing this */
 	local_irq_disable();
 
 	/*
-	 * Disable VMX on all CPUs before rebooting, otherwise we risk hanging
-	 * the machine, because the CPU blocks INIT when it's in VMX root.
+	 * Disable virtualization on all CPUs before rebooting to avoid hanging
+	 * the system, as VMX and SVM block INIT when running in the host.
 	 *
 	 * We can't take any locks and we may be on an inconsistent state, so
-	 * use NMIs as IPIs to tell the other CPUs to exit VMX root and halt.
+	 * use NMIs as IPIs to tell the other CPUs to disable VMX/SVM and halt.
 	 *
-	 * Do the NMI shootdown even if VMX if off on _this_ CPU, as that
-	 * doesn't prevent a different CPU from being in VMX root operation.
+	 * Do the NMI shootdown even if virtualization is off on _this_ CPU, as
+	 * other CPUs may have virtualization enabled.
 	 */
-	if (cpu_has_vmx()) {
-		/* Safely force _this_ CPU out of VMX root operation. */
-		__cpu_emergency_vmxoff();
+	if (cpu_has_vmx() || cpu_has_svm(NULL)) {
+		/* Safely force _this_ CPU out of VMX/SVM operation. */
+		cpu_crash_disable_virtualization();
 
-		/* Halt and exit VMX root operation on the other CPUs. */
+		/* Disable VMX/SVM and halt on other CPUs. */
 		nmi_shootdown_cpus_on_restart();
 	}
 }
@@ -587,7 +586,7 @@ static void native_machine_emergency_restart(void)
 	unsigned short mode;
 
 	if (reboot_emergency)
-		emergency_vmx_disable_all();
+		emergency_reboot_disable_virtualization();
 
 	tboot_shutdown(TB_SHUTDOWN_REBOOT);
 
-- 
2.36.0.550.gb090851708-goog


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH v2 3/3] x86/virt: Fold __cpu_emergency_vmxoff() into its sole caller
  2022-05-18  0:16 [PATCH v2 0/3] x86/crash: Fix double NMI shootdown bug Sean Christopherson
  2022-05-18  0:16 ` [PATCH v2 1/3] x86/crash: Disable virt in core NMI crash handler to avoid double shootdown Sean Christopherson
  2022-05-18  0:16 ` [PATCH v2 2/3] x86/reboot: Disable virtualization in an emergency if SVM is supported Sean Christopherson
@ 2022-05-18  0:16 ` Sean Christopherson
  2022-06-03 16:07 ` [PATCH v2 0/3] x86/crash: Fix double NMI shootdown bug Guilherme G. Piccoli
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 9+ messages in thread
From: Sean Christopherson @ 2022-05-18  0:16 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86
  Cc: H. Peter Anvin, linux-kernel, Guilherme G . Piccoli,
	Vitaly Kuznetsov, Paolo Bonzini, Sean Christopherson

Fold __cpu_emergency_vmxoff() into cpu_emergency_vmxoff(), its sole
remaining caller, to discourage crash/emergency code from handling VMX
but not SVM.  The behavior of VMX and SVM is so similar that it's highly
unlikely that there will be a scenario where VMX needs to be disabled,
but SVM can be left enabled.  In other words, during a crash and/or
emergency reboot, funnel all virtualization teardown sequences through
cpu_crash_disable_virtualization().

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/include/asm/virtext.h | 14 ++------------
 1 file changed, 2 insertions(+), 12 deletions(-)

diff --git a/arch/x86/include/asm/virtext.h b/arch/x86/include/asm/virtext.h
index 8757078d4442..a9414ef5269e 100644
--- a/arch/x86/include/asm/virtext.h
+++ b/arch/x86/include/asm/virtext.h
@@ -60,22 +60,12 @@ static inline int cpu_vmx_enabled(void)
 	return __read_cr4() & X86_CR4_VMXE;
 }
 
-/** Disable VMX if it is enabled on the current CPU
- *
- * You shouldn't call this if cpu_has_vmx() returns 0.
- */
-static inline void __cpu_emergency_vmxoff(void)
-{
-	if (cpu_vmx_enabled())
-		cpu_vmxoff();
-}
-
 /** Disable VMX if it is supported and enabled on the current CPU
  */
 static inline void cpu_emergency_vmxoff(void)
 {
-	if (cpu_has_vmx())
-		__cpu_emergency_vmxoff();
+	if (cpu_has_vmx() && cpu_vmx_enabled())
+		cpu_vmxoff();
 }
 
 
-- 
2.36.0.550.gb090851708-goog


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH v2 1/3] x86/crash: Disable virt in core NMI crash handler to avoid double shootdown
  2022-05-18  0:16 ` [PATCH v2 1/3] x86/crash: Disable virt in core NMI crash handler to avoid double shootdown Sean Christopherson
@ 2022-05-18 15:50   ` Guilherme G. Piccoli
  0 siblings, 0 replies; 9+ messages in thread
From: Guilherme G. Piccoli @ 2022-05-18 15:50 UTC (permalink / raw)
  To: Sean Christopherson, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, x86
  Cc: H. Peter Anvin, linux-kernel, Vitaly Kuznetsov, Paolo Bonzini

On 17/05/2022 21:16, Sean Christopherson wrote:
> [...]
> Fixes: ed72736183c4 ("x86/reboot: Force all cpus to exit VMX root if VMX is supported)
> Cc: stable@vger.kernel.org
> Reported-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
> Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Link: https://lore.kernel.org/all/20220427224924.592546-2-gpiccoli@igalia.com
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>  arch/x86/include/asm/reboot.h |  1 +
>  arch/x86/kernel/crash.c       | 16 +--------
>  arch/x86/kernel/reboot.c      | 66 ++++++++++++++++++++++++++++-------
>  3 files changed, 56 insertions(+), 27 deletions(-)

Thanks Sean, great patch! It's working fine according to my tests, and
the comments made the code very clear.
Feel free to add:

Tested-by: Guilherme G. Piccoli <gpiccoli@igalia.com>


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v2 0/3] x86/crash: Fix double NMI shootdown bug
  2022-05-18  0:16 [PATCH v2 0/3] x86/crash: Fix double NMI shootdown bug Sean Christopherson
                   ` (2 preceding siblings ...)
  2022-05-18  0:16 ` [PATCH v2 3/3] x86/virt: Fold __cpu_emergency_vmxoff() into its sole caller Sean Christopherson
@ 2022-06-03 16:07 ` Guilherme G. Piccoli
  2022-07-08 23:28 ` Sean Christopherson
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 9+ messages in thread
From: Guilherme G. Piccoli @ 2022-06-03 16:07 UTC (permalink / raw)
  To: Sean Christopherson, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, x86
  Cc: H. Peter Anvin, linux-kernel, Vitaly Kuznetsov, Paolo Bonzini

Hi Sean / Thomas, was this merged anywhere? I just checked and seems it
didn't reach mainline...patch 01 is especially relevant, IMHO.

Thanks in advance,


Guilherme

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v2 0/3] x86/crash: Fix double NMI shootdown bug
  2022-05-18  0:16 [PATCH v2 0/3] x86/crash: Fix double NMI shootdown bug Sean Christopherson
                   ` (3 preceding siblings ...)
  2022-06-03 16:07 ` [PATCH v2 0/3] x86/crash: Fix double NMI shootdown bug Guilherme G. Piccoli
@ 2022-07-08 23:28 ` Sean Christopherson
  2022-08-22 18:39 ` Guilherme G. Piccoli
  2022-10-03 16:03 ` Guilherme G. Piccoli
  6 siblings, 0 replies; 9+ messages in thread
From: Sean Christopherson @ 2022-07-08 23:28 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	H. Peter Anvin, linux-kernel, Guilherme G . Piccoli,
	Vitaly Kuznetsov, Paolo Bonzini

On Wed, May 18, 2022, Sean Christopherson wrote:
> Fix a double NMI shootdown bug found and debugged by Guilherme, who did all
> the hard work.  NMI shootdown is a one-time thing; the handler leaves NMIs
> blocked and enters halt.  At best, a second (or third...) shootdown is an
> expensive nop, at worst it can hang the kernel and prevent kexec'ing into
> a new kernel, e.g. prior to the hardening of register_nmi_handler(), a
> double shootdown resulted in a double list_add(), which is fatal when running
> with CONFIG_BUG_ON_DATA_CORRUPTION=y.

...

> Sean Christopherson (3):
>   x86/crash: Disable virt in core NMI crash handler to avoid double
>     shootdown
>   x86/reboot: Disable virtualization in an emergency if SVM is supported
>   x86/virt: Fold __cpu_emergency_vmxoff() into its sole caller
> 
>  arch/x86/include/asm/reboot.h  |  1 +
>  arch/x86/include/asm/virtext.h | 14 +-----
>  arch/x86/kernel/crash.c        | 16 +-----
>  arch/x86/kernel/reboot.c       | 89 +++++++++++++++++++++++++---------
>  4 files changed, 69 insertions(+), 51 deletions(-)

Ping!  Still applies cleanly.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v2 0/3] x86/crash: Fix double NMI shootdown bug
  2022-05-18  0:16 [PATCH v2 0/3] x86/crash: Fix double NMI shootdown bug Sean Christopherson
                   ` (4 preceding siblings ...)
  2022-07-08 23:28 ` Sean Christopherson
@ 2022-08-22 18:39 ` Guilherme G. Piccoli
  2022-10-03 16:03 ` Guilherme G. Piccoli
  6 siblings, 0 replies; 9+ messages in thread
From: Guilherme G. Piccoli @ 2022-08-22 18:39 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86
  Cc: Sean Christopherson, H. Peter Anvin, linux-kernel,
	Vitaly Kuznetsov, Paolo Bonzini

On 17/05/2022 21:16, Sean Christopherson wrote:
> Fix a double NMI shootdown bug found and debugged by Guilherme, who did all
> the hard work.  NMI shootdown is a one-time thing; the handler leaves NMIs
> blocked and enters halt.  At best, a second (or third...) shootdown is an
> expensive nop, at worst it can hang the kernel and prevent kexec'ing into
> a new kernel, e.g. prior to the hardening of register_nmi_handler(), a
> double shootdown resulted in a double list_add(), which is fatal when running
> with CONFIG_BUG_ON_DATA_CORRUPTION=y.
> 
> With the "right" kexec/kdump configuration, emergency_vmx_disable_all() can
> be reached after kdump_nmi_shootdown_cpus() (currently the only two users
> of nmi_shootdown_cpus()).
> 
> To fix, move the disabling of virtualization into crash_nmi_callback(),
> remove emergency_vmx_disable_all()'s callback, and do a shootdown for
> emergency_vmx_disable_all() if and only if a shootdown hasn't yet occurred.
> The only thing emergency_vmx_disable_all() cares about is disabling VMX/SVM
> (obviously), and since I can't envision a use case for an NMI shootdown that
> doesn't want to disable virtualization, doing that in the core handler means
> emergency_vmx_disable_all() only needs to ensure _a_ shootdown occurs, it
> doesn't care when that shootdown happened or what callback may have run.
> 
> Patch 2 is a related bug fix found while exploring ideas for patch 1.
> Patch 3 is a cleanup to try to prevent future "fixed VMX but not SVM"
> style bugs.
> 
> Guilherme and Vitaly, I dropped your Tested-by and Reviewed-by tags
> since the relevant patches changed a decent amount.
> 
> v2:
>   - Use a NULL handler and crash_ipi_issued instead of a magic nop
>     handler. [tglx]
>   - Add comments to call out that modifying the existing handler
>     once the NMI is sent may cause explosions.
>   - Add a patch to cleanup cpu_emergency_vmxoff().
> 
> v1: https://lore.kernel.org/all/20220511234332.3654455-1-seanjc@google.com
> 
> Sean Christopherson (3):
>   x86/crash: Disable virt in core NMI crash handler to avoid double
>     shootdown
>   x86/reboot: Disable virtualization in an emergency if SVM is supported
>   x86/virt: Fold __cpu_emergency_vmxoff() into its sole caller
> 
>  arch/x86/include/asm/reboot.h  |  1 +
>  arch/x86/include/asm/virtext.h | 14 +-----
>  arch/x86/kernel/crash.c        | 16 +-----
>  arch/x86/kernel/reboot.c       | 89 +++++++++++++++++++++++++---------
>  4 files changed, 69 insertions(+), 51 deletions(-)
> 
> 
> base-commit: a7fed5c0431dbfa707037848830f980e0f93cfb3


Hey folks, is there any news about this series?
I saw a ping from Sean some time ago...so, this is a re-ping heh

Thanks,


Guilherme

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v2 0/3] x86/crash: Fix double NMI shootdown bug
  2022-05-18  0:16 [PATCH v2 0/3] x86/crash: Fix double NMI shootdown bug Sean Christopherson
                   ` (5 preceding siblings ...)
  2022-08-22 18:39 ` Guilherme G. Piccoli
@ 2022-10-03 16:03 ` Guilherme G. Piccoli
  6 siblings, 0 replies; 9+ messages in thread
From: Guilherme G. Piccoli @ 2022-10-03 16:03 UTC (permalink / raw)
  To: Sean Christopherson, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen
  Cc: H. Peter Anvin, linux-kernel, Vitaly Kuznetsov, Paolo Bonzini, x86

On 17/05/2022 21:16, Sean Christopherson wrote:
> Fix a double NMI shootdown bug found and debugged by Guilherme, who did all
> the hard work.  NMI shootdown is a one-time thing; the handler leaves NMIs
> blocked and enters halt.  At best, a second (or third...) shootdown is an
> expensive nop, at worst it can hang the kernel and prevent kexec'ing into
> a new kernel, e.g. prior to the hardening of register_nmi_handler(), a
> double shootdown resulted in a double list_add(), which is fatal when running
> with CONFIG_BUG_ON_DATA_CORRUPTION=y.
> 
> With the "right" kexec/kdump configuration, emergency_vmx_disable_all() can
> be reached after kdump_nmi_shootdown_cpus() (currently the only two users
> of nmi_shootdown_cpus()).
> 
> To fix, move the disabling of virtualization into crash_nmi_callback(),
> remove emergency_vmx_disable_all()'s callback, and do a shootdown for
> emergency_vmx_disable_all() if and only if a shootdown hasn't yet occurred.
> The only thing emergency_vmx_disable_all() cares about is disabling VMX/SVM
> (obviously), and since I can't envision a use case for an NMI shootdown that
> doesn't want to disable virtualization, doing that in the core handler means
> emergency_vmx_disable_all() only needs to ensure _a_ shootdown occurs, it
> doesn't care when that shootdown happened or what callback may have run.
> 
> Patch 2 is a related bug fix found while exploring ideas for patch 1.
> Patch 3 is a cleanup to try to prevent future "fixed VMX but not SVM"
> style bugs.
> 
> Guilherme and Vitaly, I dropped your Tested-by and Reviewed-by tags
> since the relevant patches changed a decent amount.
> 
> v2:
>   - Use a NULL handler and crash_ipi_issued instead of a magic nop
>     handler. [tglx]
>   - Add comments to call out that modifying the existing handler
>     once the NMI is sent may cause explosions.
>   - Add a patch to cleanup cpu_emergency_vmxoff().
> 
> v1: https://lore.kernel.org/all/20220511234332.3654455-1-seanjc@google.com
> 
> Sean Christopherson (3):
>   x86/crash: Disable virt in core NMI crash handler to avoid double
>     shootdown
>   x86/reboot: Disable virtualization in an emergency if SVM is supported
>   x86/virt: Fold __cpu_emergency_vmxoff() into its sole caller
> 
>  arch/x86/include/asm/reboot.h  |  1 +
>  arch/x86/include/asm/virtext.h | 14 +-----
>  arch/x86/kernel/crash.c        | 16 +-----
>  arch/x86/kernel/reboot.c       | 89 +++++++++++++++++++++++++---------
>  4 files changed, 69 insertions(+), 51 deletions(-)
> 
> 
> base-commit: a7fed5c0431dbfa707037848830f980e0f93cfb3

Hi folks, monthly ping!
Any news on this fix series? Just checked, still applies cleanly.

Thanks,


Guilherme

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2022-10-03 16:04 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-05-18  0:16 [PATCH v2 0/3] x86/crash: Fix double NMI shootdown bug Sean Christopherson
2022-05-18  0:16 ` [PATCH v2 1/3] x86/crash: Disable virt in core NMI crash handler to avoid double shootdown Sean Christopherson
2022-05-18 15:50   ` Guilherme G. Piccoli
2022-05-18  0:16 ` [PATCH v2 2/3] x86/reboot: Disable virtualization in an emergency if SVM is supported Sean Christopherson
2022-05-18  0:16 ` [PATCH v2 3/3] x86/virt: Fold __cpu_emergency_vmxoff() into its sole caller Sean Christopherson
2022-06-03 16:07 ` [PATCH v2 0/3] x86/crash: Fix double NMI shootdown bug Guilherme G. Piccoli
2022-07-08 23:28 ` Sean Christopherson
2022-08-22 18:39 ` Guilherme G. Piccoli
2022-10-03 16:03 ` Guilherme G. Piccoli

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.